Post-mortem for Name Server Performance Degradation
Today all of our name servers experienced a performance degradation that in turn caused customer sites to fail to resolve. The impact was most evident to customers who use our ALIAS record type.
The source of the issue was a malformed TXT record that was published to the name servers. This record caused our DNS backends to churn, repeatedly failing and restarting. Customers using ALIAS records were especially susceptible for two reasons: (a) ALIAS records depend on the backends to resolve them in a timely fashion, and (b) they tend to have low time-to-live values, so they are cached for shorter periods.
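To illustrate why low time-to-live values increase exposure, here is a rough sketch of how much of a backend outage a cached answer can cover. It rests on simplifying assumptions that are not from the incident itself: a single resolver, a cache entry that was refreshed at a uniformly random moment before the outage began, and no successful refresh while the outage lasts. The function name and the numbers in the usage note are hypothetical.

```python
def cached_fraction(ttl_seconds: float, outage_seconds: float) -> float:
    """Expected share of an outage during which a cached answer is still valid.

    Assumes the remaining cache lifetime at the start of the outage is
    uniformly distributed between 0 and the TTL, and that the entry cannot
    be refreshed until the outage ends.
    """
    if ttl_seconds <= 0 or outage_seconds <= 0:
        raise ValueError("ttl_seconds and outage_seconds must be positive")
    if ttl_seconds <= outage_seconds:
        # The entry always expires during the outage; on average it lasts ttl / 2.
        return ttl_seconds / (2 * outage_seconds)
    # The entry sometimes outlives the outage entirely.
    return 1 - outage_seconds / (2 * ttl_seconds)
```

Under these assumptions, a record with a 60-second TTL is served from cache for about 5% of a 10-minute outage, while a record with a 1-hour TTL is served from cache for about 92% of it.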
After we corrected the malformed record and put a fix in place to prevent the issue from recurring, performance returned to normal. With the issue resolved, we are now improving our notifications so that we become aware of performance degradations sooner and can minimize the impact on our customers.
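This post does not describe how the record was malformed or what the actual fix looks like, so the following is only a minimal sketch of the general idea of validating a TXT record before it is published. It checks two limits from the DNS wire format (RFC 1035): each character-string may hold at most 255 octets, and the record data as a whole must fit in the 16-bit RDLENGTH field. The function name and the error messages are hypothetical.

```python
MAX_CHARACTER_STRING_OCTETS = 255  # RFC 1035 <character-string> limit
MAX_RDATA_OCTETS = 65535           # RDLENGTH is a 16-bit field


def txt_record_problems(strings: list[str]) -> list[str]:
    """Return a list of problems found in a TXT record; empty means it passed."""
    problems = []
    if not strings:
        problems.append("a TXT record needs at least one character-string")
    total_octets = 0
    for index, value in enumerate(strings):
        size = len(value.encode("utf-8"))
        if size > MAX_CHARACTER_STRING_OCTETS:
            problems.append(
                f"string {index} is {size} octets; "
                f"the limit is {MAX_CHARACTER_STRING_OCTETS}"
            )
        # On the wire, each character-string is preceded by one length octet.
        total_octets += size + 1
    if total_octets > MAX_RDATA_OCTETS:
        problems.append(
            f"record data is {total_octets} octets; the limit is {MAX_RDATA_OCTETS}"
        )
    return problems
```

A publishing step could call `txt_record_problems` and refuse to push any record for which the returned list is not empty, so that a bad record is rejected before it reaches the name servers.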
Please accept my apologies if you were affected by this issue. We will continue to work hard to minimize the likelihood of these types of issues in the future.
I break things so Simone continues to have plenty to do. I occasionally have useful ideas, like building a domain and DNS provider that doesn't suck.