
Post Incident Report for Partial DNS Outage on October 10th, 2020

Anthony Eden

On Saturday, October 10th, 2020, four of our six data centers (Virginia, Amsterdam, Tokyo, and Sydney) were taken offline by what appears to be a Distributed Denial of Service attack targeted at our systems. The attack came in the form of a burst of randomized subdomain traffic for a domain that is delegated to our servers but for which we are not authoritative. The attack triggered a bug in our name servers that we were not previously aware of. This consumed all available memory on the affected systems, resulting in out-of-memory errors and the termination of the name server processes.

After triaging the incident and bringing the affected servers back online, we investigated the contributing factors. Based on information provided by our systems, we identified the name that was targeted, the type of attack that was used, and the bug that was triggered as part of the attack. Once we determined what the bug was, we implemented a fix and deployed that fix across our network.

During the incident, we also identified shortcomings in our monitoring that delayed our diagnosis of the impact on the name server software. As part of our post-incident corrective actions, we are adding monitoring to reduce the time it takes to identify these types of issues. We are also working on increasing visibility into the type of traffic we receive in our edge data centers, so we can spot attacks like these faster.
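To illustrate the kind of traffic visibility described above, here is a minimal sketch of one common way to spot a randomized subdomain burst: count the distinct subdomains queried under each zone within a time window and flag zones that exceed a threshold. This is not DNSimple's implementation; the function name, the `zones` input, and the `max_unique_labels` threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def flag_random_subdomain_bursts(query_names, zones, max_unique_labels=1000):
    """Return {zone: distinct_subdomain_count} for zones over the threshold.

    query_names: iterable of queried names observed in one time window.
    zones: set of lowercase zone apexes to group queries by (assumed input).
    max_unique_labels: hypothetical threshold; tune it to normal traffic.
    """
    unique = defaultdict(set)
    for name in query_names:
        # Normalize: drop the trailing root dot and compare case-insensitively.
        name = name.rstrip(".").lower()
        for zone in zones:
            if name.endswith("." + zone):
                # Keep only the part of the name to the left of the zone apex.
                unique[zone].add(name[: -len(zone) - 1])
                break
    return {
        zone: len(labels)
        for zone, labels in unique.items()
        if len(labels) > max_unique_labels
    }
```

A zone that normally sees a handful of distinct names but suddenly receives thousands of never-repeated ones in a single window would show up in the returned dictionary, while a zone receiving heavy traffic for the same few names would not.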

These types of randomized subdomain attacks are, unfortunately, becoming more common each year. As such, we are developing mechanisms to cope with them better and faster with each attack. We understand that your DNS is a critical part of your infrastructure, and that you depend on DNSimple to provide reliable DNS. If your systems were impacted by this incident, please accept my sincerest apologies. I assure you that the entire DNSimple team, myself included, is working hard to increase the resiliency of our systems so these types of incidents do not happen again.



Anthony Eden

I break things so Simone continues to have plenty to do. I occasionally have useful ideas, like building a domain and DNS provider that doesn't suck.