A few days ago we released support for Secondary DNS, after more than a month of almost full-time, almost full-team, intense development.
This feature has been one of the rare cases where the entire DNSimple team has worked together. Generally, a feature is the result of the hard work of a couple of people (or less frequently, a single person), but this time it was different.
It's not a coincidence that the entire team has been involved in the development of this feature. There are a number of challenges behind what seems to be a simple feature, on the surface. The reality is that implementing secondary DNS support is not a trivial task. We knew this long before starting work on it, and that's one of the reasons we deferred the development of this feature for so long.
Here are just a few of the technical challenges we had to solve to provide Secondary DNS support:
- Special DNS record type support
- Zone synchronization
- Zone validation
- Make it simple!™
Let me quickly go through these points.
ALIAS record and special DNS record types
DNSimple was the very first DNS provider to introduce the concept of the ALIAS record back in 2011. The ALIAS record allows you to point a root domain to an arbitrary FQDN, rather than an IP address, bypassing the limitations of the CNAME record.
Supporting this record and the other special record types we provide (such as URL) was, by far, the most significant challenge. Since ALIAS and URL records are proprietary record types, they cannot be copied verbatim to the secondary DNS replication database. First, they must be converted to some standard DNS type that the secondary DNS provider is able to understand.
For instance, the ALIAS record requires real-time resolution of the content hostname into the corresponding A and AAAA values. Our primary DNS system is an open-source DNS server written in Erlang, combined with a private module that performs this task in real time. In this case, however, we had to perform the resolution via a different process, store the result in a persistent location for a certain amount of time, and periodically refresh that record.
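To make the "resolve, store for a while, refresh" idea concrete, here is a minimal sketch in Go. It is not our production code: `AliasStore`, `Resolver` and `RecordType` are hypothetical names, the store lives in memory rather than in a persistent location, and the resolver is a stub so the example runs without network access. A real version would plug in an actual DNS lookup and would probably refresh from a background job instead of refreshing on read.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// Resolver turns a hostname into IP address strings. It is injectable so
// the sketch can run with a stub instead of a real DNS lookup.
type Resolver func(host string) ([]string, error)

// entry is one stored resolution result and the moment it becomes stale.
type entry struct {
	addrs   []string
	expires time.Time
}

// AliasStore (hypothetical) keeps resolved ALIAS targets for a fixed TTL.
type AliasStore struct {
	mu      sync.Mutex
	resolve Resolver
	ttl     time.Duration
	entries map[string]entry
}

func NewAliasStore(r Resolver, ttl time.Duration) *AliasStore {
	return &AliasStore{resolve: r, ttl: ttl, entries: make(map[string]entry)}
}

// Lookup returns the stored addresses for target and resolves again only
// when nothing is stored yet or the stored result has expired.
func (s *AliasStore) Lookup(target string) ([]string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := time.Now()
	if e, ok := s.entries[target]; ok && now.Before(e.expires) {
		return e.addrs, nil
	}
	addrs, err := s.resolve(target)
	if err != nil {
		return nil, err
	}
	s.entries[target] = entry{addrs: addrs, expires: now.Add(s.ttl)}
	return addrs, nil
}

// RecordType reports which standard record type an address maps to,
// or an empty string when addr is not a valid IP address.
func RecordType(addr string) string {
	ip := net.ParseIP(addr)
	switch {
	case ip == nil:
		return ""
	case ip.To4() != nil:
		return "A"
	default:
		return "AAAA"
	}
}

func main() {
	// The stub stands in for a real lookup of the ALIAS target.
	stub := func(host string) ([]string, error) {
		return []string{"192.0.2.10", "2001:db8::10"}, nil
	}
	store := NewAliasStore(stub, 5*time.Minute)
	addrs, err := store.Lookup("app.example.net")
	if err != nil {
		fmt.Println("resolution failed:", err)
		return
	}
	// Each address becomes a standard record the secondary can understand.
	for _, addr := range addrs {
		fmt.Printf("example.com. %s %s\n", RecordType(addr), addr)
	}
}
```

The point of the sketch is the conversion step: the proprietary ALIAS record never leaves the system, only the standard records derived from it do.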
There was also another issue: before working on this feature, the specification of our ALIAS implementation was implicitly defined by the code itself. In other words, we specified how ALIAS records should work by writing (first in Ruby and then in Erlang) code that performed the task.
We decided to take this one step further. We extracted the ALIAS feature into a Go package, so we can now compile it as a binary and re-use it from different products (including our main Ruby app), or import it as a Go library in our new projects. We also tried to write a reasonable base set of tests and documentation that describe how an ALIAS record should behave.
From this point onwards, the remaining challenges were figuring out the best way to efficiently replicate the records from our system into a database. And… yes, synchronization is not trivial!
Synchronization is hard. Probably not as hard as cache invalidation or naming, but it's quite easy to introduce inconsistencies, especially when you have to deal with a large volume of DNS records and asynchronous processes.
If something can go wrong, it will. Therefore, we had to plan for failure. In most cases, we refresh single records when it makes sense. However, every once in a while, we trigger a refresh of the entire zone, to make sure we don't leave anybody behind.
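The combination of targeted refreshes and an occasional full refresh can be sketched in a few lines of Go. This is an illustration under simplifying assumptions, not our implementation: a zone is reduced to a map from a record key to its content, and `Zone`, `RefreshRecord` and `RefreshZone` are hypothetical names.

```go
package main

import "fmt"

// Zone maps a record key (for example "www A") to its content.
type Zone map[string]string

// RefreshRecord copies a single record from source to replica, or removes
// it from the replica when it no longer exists in the source.
func RefreshRecord(source, replica Zone, key string) {
	if value, ok := source[key]; ok {
		replica[key] = value
		return
	}
	delete(replica, key)
}

// RefreshZone reconciles the whole replica against the source and returns
// how many records had drifted (missing, outdated, or left over).
func RefreshZone(source, replica Zone) int {
	repaired := 0
	for key, value := range source {
		if current, ok := replica[key]; !ok || current != value {
			replica[key] = value
			repaired++
		}
	}
	for key := range replica {
		if _, ok := source[key]; !ok {
			delete(replica, key)
			repaired++
		}
	}
	return repaired
}

func main() {
	source := Zone{"www A": "192.0.2.1", "mail A": "192.0.2.2"}
	replica := Zone{}

	// Only the "www A" change was delivered; the "mail A" one got lost.
	RefreshRecord(source, replica, "www A")
	fmt.Println("records after single refresh:", len(replica))

	// The periodic full refresh picks up whatever was missed.
	fmt.Println("records repaired by full refresh:", RefreshZone(source, replica))
}
```

The full refresh is deliberately dumb: it does not need to know why the replica drifted, it only needs to make it match again.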
Moreover, whenever it makes sense, we use a pull approach rather than a push approach. When a zone event (create, update or delete) is triggered, a notification is sent from the source to the target. The target then pulls the data from the source, rather than the event pushing data to the target. With this approach you still need to ensure that events broadcast with no guaranteed ordering do not override each other. The advantage, on the other hand, is that the data pulled from the source is always up to date.
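Here is a small Go sketch of the pull approach. The notification carries no data; the target asks the source for its current state. To keep late or duplicated deliveries from overriding newer data, the sketch assumes a monotonically increasing serial on the source, which is one possible safeguard and not necessarily the one we use. `Source`, `Target` and `Snapshot` are hypothetical names.

```go
package main

import "fmt"

// Snapshot is the state of a zone at a given serial.
type Snapshot struct {
	Serial  int
	Records map[string]string
}

// Source owns the authoritative data and bumps its serial on every change.
type Source struct {
	serial  int
	records map[string]string
}

func NewSource() *Source {
	return &Source{records: make(map[string]string)}
}

// Set changes a record and increments the serial.
func (s *Source) Set(key, value string) {
	s.records[key] = value
	s.serial++
}

// Snapshot returns a copy of the current state.
func (s *Source) Snapshot() Snapshot {
	records := make(map[string]string, len(s.records))
	for key, value := range s.records {
		records[key] = value
	}
	return Snapshot{Serial: s.serial, Records: records}
}

// Target holds the replicated data and the serial it was taken at.
type Target struct {
	applied int
	Records map[string]string
}

// Apply installs a snapshot only when it is newer than what the target
// already has, so a late or duplicate delivery cannot override newer data.
func (t *Target) Apply(snap Snapshot) bool {
	if snap.Serial <= t.applied {
		return false
	}
	t.applied = snap.Serial
	t.Records = snap.Records
	return true
}

// Notify handles a zone event. The event itself carries no data: the
// target pulls the current state from the source.
func (t *Target) Notify(src *Source) bool {
	return t.Apply(src.Snapshot())
}

func main() {
	src := NewSource()
	tgt := &Target{}

	src.Set("www A", "192.0.2.1")
	src.Set("www A", "192.0.2.2")

	// Two events were emitted. Whichever arrives first pulls the latest
	// state, and the other one becomes a no-op.
	fmt.Println("first notification applied:", tgt.Notify(src))
	fmt.Println("second notification applied:", tgt.Notify(src))
	fmt.Println("www A =", tgt.Records["www A"])
}
```

Because every notification results in the same pull, the order in which the two events arrive does not change the final state of the target.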
Our synchronization system is not bullet-proof. The feature has been recently released, and we'll keep monitoring and tweaking our system as much as we can. After all, synchronization is hard.
Another big challenge was zone validation. At DNSimple we have always tried not to limit our users' management capabilities. Some providers disallow CNAME records at certain levels, for example the apex domain, or don't allow the user to create wildcard records.
We do. But that means, in some cases, users may create invalid zones.
In the last year or two we introduced several validations with the goal of preventing the most common and evident validation errors. For instance, it is no longer possible to create a record where a CNAME exists for the same name or, vice-versa, to create a CNAME when another record already exists on a particular host name.
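As an illustration, the CNAME rule described above could look roughly like this in Go. This is a simplified sketch, not our validation code: `Record` and `ValidateNew` are hypothetical names, names are compared case-insensitively, and special cases such as DNSSEC-related records are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a simplified DNS record.
type Record struct {
	Name    string
	Type    string
	Content string
}

// ValidateNew rejects a new record when it would share a name with a
// CNAME, in either direction: a CNAME already exists for that name, or
// the new record is a CNAME and the name already has other records.
func ValidateNew(zone []Record, candidate Record) error {
	for _, existing := range zone {
		if !strings.EqualFold(existing.Name, candidate.Name) {
			continue
		}
		if existing.Type == "CNAME" {
			return fmt.Errorf("a CNAME record already exists for %q", candidate.Name)
		}
		if candidate.Type == "CNAME" {
			return fmt.Errorf("cannot add a CNAME for %q: a %s record already exists", candidate.Name, existing.Type)
		}
	}
	return nil
}

func main() {
	zone := []Record{
		{Name: "www.example.com", Type: "CNAME", Content: "app.example.net"},
		{Name: "mail.example.com", Type: "A", Content: "192.0.2.2"},
	}

	// Rejected: www.example.com already has a CNAME.
	fmt.Println(ValidateNew(zone, Record{Name: "www.example.com", Type: "A", Content: "192.0.2.1"}))

	// Rejected: mail.example.com already has an A record.
	fmt.Println(ValidateNew(zone, Record{Name: "mail.example.com", Type: "CNAME", Content: "mx.example.net"}))

	// Accepted: nothing else lives at blog.example.com.
	fmt.Println(ValidateNew(zone, Record{Name: "blog.example.com", Type: "CNAME", Content: "app.example.net"}))
}
```

A check like this only protects new records. Records that were created before the validation existed are not touched by it.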
Unfortunately, some old zones may still contain invalid data. We discussed several approaches to handle the problem: we could attempt to fix the error in the "best way" (assuming one could be found), at the risk of causing unnecessary overhead, or we could discard the invalid record, at the risk of selecting the wrong one.
We decided to leave the choice to each customer. When you enable secondary DNS for a domain and the zone is invalid, we provide a message with some debugging information and we encourage you to contact us. We will help you fix the zone, making sure to provide the best solution for that specific case and domain.
Make it simple!™
Believe it or not, managing DNS at any significant scale is not easy.
Part of our mission is to make this task easier. Therefore, we had to iterate through several revisions of our UI to make sure that managing secondary DNS with DNSimple would not be as complicated as launching a shuttle into space.
That's why our secondary DNS management area is one, single, minimal page. It asks you for the most essential details you need to get started.
Several people asked us for recommendations about secondary DNS providers. We are currently working on an update of the interface that will give you the 1-click ability to select a provider and enable secondary DNS with the necessary configuration.
We started working on the secondary DNS service on December 9th, the very first day of our company meetup in Madrid. I still remember the scene of Javier driving me from the airport to the apartment, with Anthony trying to convince me that we had to start building it that day.
It took us the entire week of the meetup to get the basic infrastructure and product in place, followed by an entire month of development and testing.
The DNSimple team worked together to get this done. It has been a very stressful end of the year for us, and I'd like to thank the entire team for the amazing work.
But to be honest, we really need to thank you, our customers, for your support and for patiently waiting for us to release this feature.