On December 23rd, 2013, we were alerted to an outage affecting two of our unicast name servers, followed shortly thereafter by degraded performance from the other two unicast name servers. Darrin and I logged into Boundary and started analyzing the traffic. Focusing on the moment the outage started, we noticed a large spike in NTP traffic. We do not run any public NTP servers, so we did not respond to any of those requests. However, the attack still caused two of our providers (Rackspace and Linode) to take action to protect their networks: both null-routed traffic to our IP addresses, effectively removing those two servers from the Internet. Our two other providers did not null-route. At Server Central, performance degraded to the point of being unusable for the duration of the attack because our available bandwidth was saturated. Amazon was the only provider that actually blocked the NTP traffic before it ever reached our machines (hat tip to Amazon for this, as it essentially allowed us to keep operating at some level).
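To make "blocking the NTP traffic" concrete: reflected NTP responses arrive as UDP packets with source port 123, so a host that runs no public NTP service can drop that traffic, apart from replies from the time servers it queries itself. The sketch below is a hypothetical illustration of that rule only; we don't know how Amazon actually implemented its filtering, and the `Packet` type, `should_drop` function, and allowlist are invented for this example.

```python
from dataclasses import dataclass

NTP_PORT = 123  # NTP runs over UDP port 123


@dataclass(frozen=True)
class Packet:
    """Hypothetical, minimal view of an inbound packet."""
    src_ip: str
    protocol: str  # e.g. "udp" or "tcp"
    src_port: int
    dst_port: int


def should_drop(packet: Packet, trusted_time_servers: frozenset = frozenset()) -> bool:
    """Return True for inbound NTP traffic this host never asked for.

    Reflected NTP responses are UDP packets with source port 123. A host with
    no public NTP service only expects replies from the time servers it queries
    itself (a hypothetical allowlist), so any other such packet is dropped.
    """
    is_ntp_response = packet.protocol == "udp" and packet.src_port == NTP_PORT
    return is_ntp_response and packet.src_ip not in trusted_time_servers


# A DNS query to port 53 passes; an unsolicited NTP reply is dropped.
dns_query = Packet("198.51.100.7", "udp", 53000, 53)
reflected = Packet("203.0.113.9", "udp", 123, 40000)
print(should_drop(dns_query))
print(should_drop(reflected))
```

Filtering like this has to happen upstream of the saturated link to help; dropping the packets on the server itself would not have freed any bandwidth.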
After about an hour, and a support request from us, Linode removed the null route and NS2 returned to the Internet. The same could not be said for Rackspace, which initially was unable to determine that it had set up a null-route rule at all. After some back-and-forth, we convinced them that the null route was in place. At that point they told us it would be a minimum of 24 hours before traffic would be routed to the machine again, so NS1 was effectively offline for the foreseeable future. We decided then to begin switching everyone over to our Anycast network.
At the time, we were operating both our Unicast and Anycast networks and slowly moving customers over to Anycast. The plan was to complete the move by Q1 2014, but this attack essentially forced us to move that timetable forward. We found a way to support both the nsx.dnsimple.com name and the nsxd.dnsimple.com name for each domain, and added all of the appropriate records to both the Unicast and Anycast networks. We then switched the IP address for ns1 from the old Unicast network to the new Anycast network. After verifying that the system was operating correctly and within the bounds we expected, we changed the IP addresses for the remaining name servers. By December 27th, we had migrated everyone except our vanity name server customers, who require glue records. We are still operating the Unicast system today in order to support those customers, but we will begin contacting them about moving as well.
All in all, the switch was made under duress, which is never a good time to make changes, but in this case it turned out to be the right approach. Almost all of our customers are now benefiting from faster resolution times, a more robust name server architecture, and a better overall experience. Naturally we still have work to do, and we will keep improving and iterating on this new network.