
Post-mortem for DNS outage on July 23rd

Anthony Eden

Root Cause

From approximately 09:35 UTC until 10:00 UTC on July 23rd, 2012, DNSimple experienced an outage across four name servers. This outage appears to be the result of a distributed denial of service (DDoS) attack. Due to the attack, we were unable to respond to DNS requests in a timely fashion. At the same time, our job processing queue stopped due to a failed deployment. The resulting backlog in the job queues caused new subscriptions to time out and affected customers who were updating DNS records.

Identified Issues

One issue we identified during this outage was the lack of early feedback. Our automated systems reported the event quickly and posted notices to @DNSimpleDevOps, our Twitter account specifically for operational tweets. However, that's not enough. We want to make sure that you can get in touch with someone during these types of events, and we want to provide timely feedback.

Another issue was with our analytics system. As inbound traffic increased, our outbound traffic increased as well, even though we did not need to respond to the inbound requests. This was due to the various analytics data we collect on our name servers.

The final issue was with the networks themselves. While our servers did not appear to become overloaded, the networks we use were quickly saturated, which caused requests to queue. This queueing is what ultimately appears to have slowed down our DNS responses.
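To illustrate why a saturated network slows responses even when the servers themselves are healthy, here is a minimal sketch of a simple fluid queueing model. The function name and all the numbers are hypothetical and are not measurements from this incident; the model also assumes an unbounded buffer, whereas real network gear drops packets once its buffers fill.

```python
def queue_delay_seconds(arrival_pps, capacity_pps, duration_s):
    """Estimate the wait seen by a packet arriving at the end of a burst.

    Fluid approximation (an assumption, not a model of any real network):
    packets arrive at a constant rate, the link drains at a constant rate,
    and the buffer is unbounded so nothing is dropped.
    """
    # Packets per second that cannot be forwarded immediately.
    excess_pps = max(0.0, arrival_pps - capacity_pps)
    # Backlog accumulated over the duration of the burst.
    backlog_packets = excess_pps * duration_s
    # Time needed to drain that backlog at link capacity.
    return backlog_packets / capacity_pps


# Hypothetical numbers: a link that forwards 10,000 packets/s.
below_capacity = queue_delay_seconds(8_000, 10_000, 60)
above_capacity = queue_delay_seconds(15_000, 10_000, 60)
print(below_capacity, above_capacity)
```

In this sketch, traffic below the link's capacity produces no queueing delay at all, while traffic above it builds a backlog that keeps growing for as long as the burst lasts.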

Remediations

Over the last two years we've focused on developing features and functions that help make DNSimple easy to use and save you time. In order to make that happen, we've relied on various hosting providers for our DNS servers. We will now begin moving our DNS infrastructure to dedicated systems where we will have greater control over the network and server systems. While we do not yet have an estimated launch date for the new infrastructure, we are making it a top priority.

On the communications front, we will do a better job of responding in a timely fashion when an outage occurs. We will continue to use Twitter as our primary means of notifying our customers of issues, so please follow @dnsimple there for updates as they occur.

If you have any questions or concerns about our service or our ability to handle your DNS traffic, contact us at support@dnsimple.com.

Anthony & Darrin Eden
