Post Incident Report for Partial DNS Outage on April 7th, 2025

By Amelia Aronsohn

On April 7th, 2025, the DNSimple authoritative DNS network suffered a significant attack that overwhelmed our DDoS protections and caused intermittent SERVFAIL responses for a large number of our customers globally. This post-mortem explains what happened and how we responded.

We expose four sets of name servers: NS1, NS2, NS3, and NS4. NS2 and, to a lesser extent, NS4 were the most impacted, although all name servers globally experienced issues. The incident began at approximately 23:38 UTC and lasted until around 02:58 UTC the following day, roughly three hours and twenty minutes in total.

DNSimple takes all incidents seriously, from minor system failures to global catastrophic events. This was an all-hands event from the moment our first traffic volume alerts triggered until an hour after resolution was fully restored.

Contributing factors and mitigation

When an incident occurs, we focus on identifying the contributing factors. In this case, the most impactful was a dramatic surge in traffic. Our EDGE systems experienced a roughly 10x increase in queries per second (QPS), spiking from about 25,000 to 260,000.

This wasn't regular traffic. The attack resulted in a 4x increase in our usual UDP DNS load and a 100x increase in larger, slower, and more resource-intensive TCP traffic.

Another contributing factor was the "thundering herd" effect from some customers, especially those using our DNS service with automated deployment platforms like Kubernetes, whose systems began retrying rapidly as soon as they detected failures. This led to a surge of valid but overwhelming traffic.
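For readers running automated clients, one common way to avoid contributing to a thundering herd is to retry with exponential backoff and random jitter, so that many clients do not retry at the same instant. The following is a minimal sketch of that general technique, not a description of any particular platform's behavior. The names `backoff_delay` and `retry_with_backoff`, and the default values for `base`, `cap`, and `max_attempts`, are assumptions chosen for illustration.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized delay in seconds for a zero-based retry attempt."""
    # Exponential ceiling: base, 2*base, 4*base, ... never above cap.
    ceiling = min(cap, base * (2 ** attempt))
    # "Full jitter": pick uniformly between 0 and the ceiling so that
    # independent clients spread their retries out instead of aligning.
    return rng.uniform(0.0, ceiling)


def retry_with_backoff(operation, max_attempts=5, base=0.5, cap=30.0,
                       sleep=time.sleep, rng=random):
    """Call operation() until it succeeds or max_attempts is reached.

    The operation, sleep function, and random source are parameters
    (hypothetical, for illustration) so the helper is easy to test.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap, rng))
```

A caller would wrap its own lookup or API call, for example `retry_with_backoff(lambda: resolve("example.com"))`, where `resolve` stands in for whatever function the client already uses.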

We have several mitigation strategies for DDoS traffic, including adjusting caching and minimum TTLs at the edge, rate-limiting origin name servers, and even rate-limiting specific zones. However, these systems were not tuned for this volume or for filtering the specific traffic patterns that caused the network and systems to be overwhelmed.
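To illustrate what rate-limiting specific zones can look like in general terms, here is a minimal sketch of a per-zone token bucket, a widely used rate-limiting technique. It is not DNSimple's implementation. The class names `TokenBucket` and `ZoneRateLimiter`, and the `rate` and `burst` parameters, are hypothetical and chosen only for this example.

```python
import time


class TokenBucket:
    """Allow up to `burst` queries at once, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token for this query
            return True
        return False  # bucket empty: drop or defer the query


class ZoneRateLimiter:
    """Keep one independent token bucket per zone name."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}

    def allow(self, zone):
        bucket = self.buckets.get(zone)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.clock)
            self.buckets[zone] = bucket
        return bucket.allow()
```

Because each zone has its own bucket, a flood of queries for one zone exhausts only that zone's allowance. Tuning then comes down to choosing `rate` and `burst` values that fit the traffic volume and patterns being filtered.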

As part of our mitigation efforts, we tightened these filters across multiple zones to control traffic flow and cycled overwhelmed systems to restore normal operation. We also made several real-time adjustments to our DDoS protection, including more aggressive caching, to help handle the load.

Post-incident review

Once the emergency passed, we conducted a rigorous post-incident review to learn from the event and prevent similar issues in the future.

In the two days following the incident, we documented a record-setting number of corrective actions and future improvements. Our Incident Review meeting lasted over 90 minutes, well beyond its usual one-hour timeslot.

We identified several areas for improvement, including:

  • Enhancing our response to overwhelming traffic attacks
  • Improving general failure recovery processes
  • Making traffic filtering faster and more efficient
  • Unlocking significant performance improvements in our name servers

In the coming weeks, we'll continue our efforts to improve, optimize, and reduce the impact of similar incidents in the future.

Conclusion

Before we close, we want to acknowledge a miscommunication during the incident: a mistaken status page update incorrectly set the incident to "monitoring" instead of "identifying". We apologize for the confusion this caused. We're committed to improving our communication and being more proactive when issues arise that may affect you or your customers.

At DNSimple, we work hard to ensure maximum uptime for our DNS services. We will continue strengthening our networks and systems to respond to the growing threats on the Internet.

Thank you for your support and patience throughout this incident and beyond. If you have any questions, please contact our support team, and we'll be happy to help.

Amelia Aronsohn

Kaizen junkie, list enthusiast, automation obsessor, unrepentant otaku, constantly impressed by how amazing technology is.
