DNS is one of the most fundamental systems on the internet. Being nearly as old as the IP address itself, it was designed for maximum resiliency through distribution and redundancy.
Since the attack on Dyn at the end of October caused massive outages, several people have asked whether all providers are now vulnerable as single points of failure in another coordinated attack.
My shortest answer is "less now than ever," but of course, it's not that simple. As I said above, DNS was designed to be robust as a large distributed system, but whenever you rely on a single service provider, that provider is indeed a single point of failure.
We have experience with large DDoS attacks and are aware of the failures they can cause. Like pretty much every other DNS provider, we have been attacked in the past, and we do as much as possible to mitigate failures, whether internal or external. When you query ns1.dnsimple.com, five data centers across the world can answer, with BGP routing picking the best path. In each of those data centers, multiple machines reply for that single address, which lets us do maintenance or lose individual machines without affecting name resolution at all. This is the same methodology the root DNS servers use to maintain full uptime. We also run a DNS caching layer in front to help absorb large traffic bursts. Even with all this, I hesitate to say we are immune to outages from today's threats. Resilient, yes, but not immune.
This is why we offer AXFR transfers to other DNS providers as a backup, so you can use NS4 with DNSimple and up to nine other name servers with another provider. There is also a way to use DNSimple as a secondary, but it doesn't use AXFR. I think splitting your services across multiple providers is really the only way to maximize your uptime. If you use only one hosting platform (Heroku, for example), it is another single point of failure, and if it goes down, so will your website. As Amazon's major failures in the past have shown, the cascading effect is huge when heavily relied-upon core players have outages.
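To make the single-provider risk concrete, here is a minimal Python sketch that groups a domain's name servers by provider and flags a delegation that depends on only one. The `provider_of` heuristic (taking the last two labels of the host name) and the `example-secondary.net` host names are assumptions for illustration only. A real check would consult the Public Suffix List and look up the NS records instead of taking them as a hard-coded list.

```python
from collections import defaultdict


def provider_of(nameserver: str) -> str:
    """Guess the provider from a name server host name.

    Assumption: the last two labels identify the provider. This is wrong
    for suffixes such as co.uk and is only meant as an illustration.
    """
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(nameservers):
    """Map each guessed provider to the name servers it operates."""
    groups = defaultdict(list)
    for ns in nameservers:
        groups[provider_of(ns)].append(ns)
    return dict(groups)


def has_single_provider(nameservers) -> bool:
    """True when every name server belongs to the same guessed provider."""
    return len(group_by_provider(nameservers)) == 1


# Hypothetical delegations; example-secondary.net is a placeholder provider.
single = ["ns1.dnsimple.com", "ns2.dnsimple.com"]
mixed = single + ["ns1.example-secondary.net", "ns2.example-secondary.net"]

print(has_single_provider(single))
print(has_single_provider(mixed))
```

The first delegation is flagged because both hosts map to the same provider; adding the placeholder secondary provider clears the flag.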
If you really want to maximize redundancy, you could even configure a custom setup with up to 13 name servers spread across a wide variety of providers, but I think that would become very difficult to manage. There is a lot of systems engineering theory on scalability and uptime, but the short of it is that you need to weigh the value of your product against the cost of different levels of redundancy.
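As a rough way to weigh that trade-off, the sketch below estimates combined DNS availability when a zone is served by several providers. It assumes that providers fail independently and that resolution succeeds as long as at least one provider is up. Both are simplifications, since a coordinated attack can hit several providers at once, and the 99.9% figure is an invented input, not a measurement of any provider.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Probability that at least one provider is up, assuming independence."""
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - a  # chance that this provider is down too
    return 1.0 - all_down


def expected_downtime_hours(availabilities):
    """Expected hours per year during which no provider answers."""
    return (1.0 - combined_availability(availabilities)) * HOURS_PER_YEAR


# Hypothetical input: every provider is up 99.9% of the time.
for count in (1, 2, 3):
    providers = [0.999] * count
    print(count, expected_downtime_hours(providers))
```

With these made-up inputs, one provider corresponds to roughly 8.76 hours of expected downtime a year, and each additional independent provider cuts that by about a factor of a thousand. The gains flatten out quickly, while the effort of keeping every provider in sync keeps growing.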
This type of attack is far from new. In fact, long-time customers of ours will remember the last time we were DDoSed. It's the reason we've spent a lot of time and resources making sure we are as robust as possible. The fact that so many large players were affected by an attack on a single company is not something any of us are happy about, and we hope it helps bring to light the importance of having as many redundancies as possible, not just in your own infrastructure, but also in the providers that you use.