On Tuesday, April 14th, 2020, we saw a significant increase in ALIAS resolution failures in our Amsterdam (AMS) data center. The incident started at 07:00 UTC with an increase in SERVFAIL responses for certain requests in the AMS region. This correlated with an increase in ingress traffic, although the volume of traffic was not directly responsible for the incident. At the same time, customers in Europe began reporting resolution failures with their ALIAS records.
Multiple contributing factors were identified:
There was a loss of IPv6 traffic into our AMS data center at the same time the incident started. It is unclear if this was a contributing factor.
Team members in Europe opened an incident and began investigating the issue after receiving reports from customers of ALIAS resolution failures. We ultimately identified that the issue was at least partially due to the new software version of the name server. We rolled back to the previous version in response.
We also reverted the resolver configuration changes (that were made the previous day) to remove EDNS Client Subnet (ECS) support to mitigate impact on a small subset of ALIAS records.
Our goal is to provide you with solid authoritative ALIAS resolution that you can trust to never fail. While we failed to live up to that goal during this incident, we are working with the knowledge gained to improve our system and processes to avoid incidents like this in the future.
Thank you for your trust and your business – all of us at DNSimple appreciate it.