Sectigo Root Certificate Expiration Issue
On May 30th, 2020, Sectigo's Root certificate CN = AddTrust External CA Root expired. What should have been a transparent, unnoticeable change turned into an internet-wide issue. This happened mainly because some older versions of OpenSSL and other crypto libraries were unable to validate the alternate certificate chain, so they treated the certificate chain as invalid.
A wide range of software and services was affected. To name just a few: Stripe, Spreedly, and Roku all had incidents. A number of other companies posted updates, including Red Hat, cPanel, and various SSL certificate resellers, like DNSimple.
DNSimple's systems were partially affected by this issue for a brief period of time, and a number of DNSimple customers have been experiencing various issues. I want to explain what happened, why, and how DNSimple reacted.
I will also share what we at DNSimple have learned from this incident, and how we are planning on handling similar changes in the future.
DNSimple customers affected by this issue can follow these instructions to update the certificate bundle and resolve the error.
Before we get into the main section of this post, I want to provide some necessary context so we're all on the same page.
Certificate Authorities, Root, and Intermediate certificates
For this article, it's important to be familiar with terms like Root certificate, intermediate certificate chain, and Certificate Authorities (CAs). You will also want to know how a certificate works, and how a client validates the certificate and its chain to determine if it's trusted. If any of this is unfamiliar, take a look at this recent article from Scott Helme. He does a good job setting the stage before looking at the same issue we're going to discuss here.
Comodo and Sectigo
To fully understand the issue, we need to time travel a few years into the past. It's 2017 when Comodo CA, Comodo's certificate division, is acquired by Francisco Partners. Comodo continues to run its business as Comodo Cyber Security. One year later, in November 2018, Comodo CA is rebranded as Sectigo.
This rebranding will have implications for our story and for the issue we're discussing.
The rebranding was carried out in multiple phases. Initially, it was purely aesthetic (logo, site, marketing material, etc.). Certificates continued to be issued by Sectigo and signed by CN = AddTrust External CA Root via CN = COMODO RSA Certification Authority. During this period, links and support documentation changed almost monthly in an effort to replace the Comodo brand with Sectigo. Old resource links often broke. Then one day the entire support documentation disappeared from the old Comodo support site and was scattered across the new Sectigo domain name.
We still see traces of these changes in our source code, where we used to document all the links to stay on top of the changes.
Around January 2019, Sectigo started to issue new certificates under the new intermediate CN = Sectigo RSA Domain Validation Secure Server CA.
This happened essentially overnight, with no prior communication. I still recall when I was told the news - we were in our quarterly company team meeting in Lanzarote, and had to stop our morning activities to deal with a critical issue - our customers were served an incorrect bundle by our installer.
Dealing with the issue was very painful. After searching for the new bundle without success, we reached out to Sectigo support. Even they were unable to point us to the new Root chain in a publicly available source. We were left without a solution until we were finally able to extract the bundle from one of our customers' orders. At that point, we made it available to everyone via our certificate installer.
A change to the intermediate chain could (and should) have been handled significantly differently by Sectigo. Unfortunately, this is another contributing factor to the issue from the 30th of May.
Certificate intermediate chain and the DNSimple certificate installer
The last piece of this story is about our certificate installer, also called the SSL certificate installation wizard in our original announcement post back in October 2014.
During our first few years of reselling SSL certificates, we learned from our customers that the biggest difficulty of obtaining an SSL certificate was installing it. More specifically, figuring out the correct intermediate chain, and how to package it along with the server certificate.
This should come as no surprise considering that, when searching for documentation, most CAs offer you documentation similar to this:
We were selling a few different SSL products from Comodo and other certification authorities. In some cases, certificates issued by the same company needed different intermediate chains. In the years between 2012 and 2014, almost 20% of DNSimple support requests were about SSL certificate chains. Close to 100% of SSL certificate problems were related to SSL certificate chains.
Immediately following the release of the certificate installer, the number of support requests on this topic dropped close to zero.
DNSimple is not a certificate authority. We are not involved in the issuance process or the trust chain. DNSimple is not required to supply the intermediate chain. This is entirely under the control of the certificate authority.
However, at DNSimple we constantly try to improve our user experience and do our best to provide that extra personal touch that our customers appreciate and look for. That's why we decided to take this customer pain point as our responsibility, and commit to maintaining an intermediate chain builder as accurately as possible - to simplify our customers' lives.
If someone asked me to name the most successful features I've ever built at DNSimple, this one would probably be in my top 5.
On May 30th, Sectigo's Root certificate CN = AddTrust External CA Root expired. This certificate was issued 20 years ago and was the Root certificate originally used by Comodo, so it was considered the legacy Root. In 2010, the certification authority issued a new Root certificate, valid until 2038, to replace the legacy one. It then started distributing the new Root to the various Root stores. Because the new Root would only reach clients gradually, through software updates, they used a process called cross-signing. The old Root also signed a certificate for the new Root's key, so that certificates issued under the new Root could chain up to either Root.
A certificate should be considered trusted if at least one of the trust chains associated with it is trusted. Thanks to cross-signing, the chain ending in the new Root would remain trusted even after the chain ending in the old Root became invalid due to its expiration. Comodo's (and then Sectigo's) plan was that all modern browsers would initially trust both the old Root and the new Root. Browsers would then automatically switch to the new Root once the old one expired. Users should not have experienced any issues due to the expiration.
Clearly, this is not what happened. Certain users started to receive invalid certificate errors. Several online services had outages.
DNSimple had an outage as well. This is our initial incident event timeline, from our internal Post Incident Review:
- 30 May @ 10:48 UTC - A Chef run fails with an error connecting to our Chef orchestrator server.
- 30 May @ 11:26 UTC - Alerts from Pingdom regarding sandbox.dnsimple.com appear. The on-call team member is notified of the incident and starts investigating the alerts. We determine that most - but not all - Pingdom checkers fail to check sandbox.dnsimple.com due to expired certificate errors.
- 30 May @ 11:39 UTC - The on-call team member identifies the issue as caused by the Root certificate packaged within the intermediate certificate chain. We planned to replace the bundle to remove the Root. We could not consistently reproduce the issue, it did not occur in browsers, and only certain software seemed to be affected. At this point, we considered it a DNSimple-only issue.
- 30 May @ 12:26 UTC - We receive the first support request related to a similar issue.
- 30 May @ 13:56 UTC - As we monitor support, and the first issues appear with services other than DNSimple, we realize other users may be affected. We assume the issue occurs only if the bundle contains the expired Root certificate. As a precaution, we decide to publish tweets from the DNSimple Twitter account with information on how to address the issue, depending on whether the certificate is signed by Sectigo or Comodo.
- 30 May @ 14:50 UTC - We update the certificate bundles on several of our systems. Once the update is complete, our systems are no longer impacted, and we adopt this change as the remediation process to recommend to our customers.
As time passed, and we found more non-DNSimple related cases being reported, we started to realize it was not a single issue, but a combination of issues. It took a while to put all the pieces together. Let's take a look at why this happened.
Why did this happen?
We learned the hard way that a number of legacy network clients and libraries were not able to correctly detect, follow, or trust the alternate intermediate chain. As a result, devices using these clients and libraries failed to validate the certificate, returning an invalid certificate error.
As reported by a study from Carnegie Mellon University, there are two main categories of incompatibilities:
- Legacy clients - This includes old software and devices that failed to validate the alternate chain because they did not have the new Root certificate included in their Root store. This list includes software like Apple Mac OS X 10.11 (El Capitan) or earlier, Apple iOS 9 or earlier, Microsoft Windows XP, and Mozilla Firefox 35 or earlier. These clients will continue to report an invalid certificate. The only solution is to upgrade to a more recent version of the software.
- Broken clients - This includes software and devices that failed to validate the alternate chain due to a broken SSL certificate validation implementation. This list includes a few different libraries, but the majority of cases directly or indirectly correlated with the OpenSSL library.
In almost all cases we observed directly, OpenSSL was the issue. OpenSSL versions prior to 1.1.1 appear to always validate the first (invalid) trust chain, assuming that certificates form a single linear chain. Unfortunately, OpenSSL is one of the most widely used crypto libraries, and it's embedded in a large number of programming languages. For instance, the SSL implementation of the Ruby programming language is built on top of OpenSSL. As a result, once the Root certificate expired on May 30, 2020, Ruby clients compiled against an OpenSSL version lower than 1.1.1 could no longer connect to servers presenting the affected chain.
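A toy model in Go can show the difference between following the served chain literally and preferring a trusted Root. This is not OpenSSL's code. Certificates are reduced to names and expiry dates, trust is decided by name alone, and no signatures are checked.
- The hypothetical linearValid follows the certificates exactly as the server sent them, which approximates the behaviour described above.
- The hypothetical trustedFirstValid stops at the first unexpired Root found in the local store. This is similar in spirit to OpenSSL's X509_V_FLAG_TRUSTED_FIRST verification flag.

The served chain and dates are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// cert is a toy certificate: names and expiry only, no keys or signatures.
type cert struct {
	Subject, Issuer string
	NotAfter        time.Time
}

func day(y int, m time.Month, d int) time.Time { return time.Date(y, m, d, 0, 0, 0, 0, time.UTC) }

// linearValid follows the served certificates in order and only then looks up
// the last issuer in the trust store; any expired link fails the whole chain.
func linearValid(served []cert, store map[string]cert, now time.Time) bool {
	if len(served) == 0 {
		return false
	}
	for _, c := range served {
		if now.After(c.NotAfter) {
			return false
		}
	}
	root, ok := store[served[len(served)-1].Issuer]
	return ok && !now.After(root.NotAfter)
}

// trustedFirstValid checks the trust store at every step and stops at the
// first unexpired trusted Root, ignoring whatever the server sent above it.
func trustedFirstValid(served []cert, store map[string]cert, now time.Time) bool {
	for _, c := range served {
		if now.After(c.NotAfter) {
			return false
		}
		if root, ok := store[c.Issuer]; ok && !now.After(root.NotAfter) {
			return true
		}
	}
	return false
}

// Illustrative data loosely modelled on the chain described in this post.
func exampleStore() map[string]cert {
	return map[string]cert{
		"AddTrust External CA Root":             {"AddTrust External CA Root", "AddTrust External CA Root", day(2020, 5, 30)},
		"USERTrust RSA Certification Authority": {"USERTrust RSA Certification Authority", "USERTrust RSA Certification Authority", day(2038, 1, 18)},
	}
}

func exampleServed() []cert {
	return []cert{
		{"www.example.com", "Sectigo RSA Domain Validation Secure Server CA", day(2021, 1, 1)},
		{"Sectigo RSA Domain Validation Secure Server CA", "USERTrust RSA Certification Authority", day(2030, 12, 31)},
		{"USERTrust RSA Certification Authority", "AddTrust External CA Root", day(2020, 5, 30)}, // cross-signed
	}
}

func main() {
	now := day(2020, 6, 1)
	fmt.Println("linear:       ", linearValid(exampleServed(), exampleStore(), now))
	fmt.Println("trusted-first:", trustedFirstValid(exampleServed(), exampleStore(), now))
}
```

On June 1, 2020, linearValid rejects the chain because the cross-signed certificate and its Root have expired, while trustedFirstValid accepts it. In this model, dropping the last (cross-signed) entry from the served chain makes both functions succeed.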
Programming languages like Go or Java that implement their own crypto library were not affected. In the investigation we performed at DNSimple after the incident was addressed, we realized all our affected clients were software written in Erlang or Ruby, both of which rely on OpenSSL.
Go is the second language at DNSimple. Because Go implements its own crypto library, none of our Go systems showed any issues connecting to our systems while the expired certificate was included. Furthermore, modern web browsers successfully switched to the new chain, which made our investigation even more challenging.
The issue was caused by the inability of certain legacy or broken software to use the alternate and trusted chain, once the primary certificate trust chain became invalid as the primary Root certificate expired.
What did we do?
The DNSimple team reacted to the initial incident affecting our systems within 3 hours of the initial alert. That included identifying the issue, determining a mitigation strategy, and ultimately removing the expired certificate from the chain. The issue occurred on Saturday morning European time, so the direct impact on our customers was extremely limited. In fact, the impact was mostly on our internal tools.
As the issue evolved and it became clear that it was not isolated to our systems, we took a number of actions to assist our customers:
- 30 May - We posted a first public notice on our Twitter account, instructing our customers to re-download the certificate bundle. This action was based on the initial interpretation that the issue was just the expired Root certificate embedded in the certificate bundle. For historical context, the initial version of our certificate installer included the Root certificate in the intermediate chain by default. As we realized this was unnecessary, in January 2020 we changed our installer to no longer include the Root certificate. This change was mostly performance-oriented. As customers reached out to us in 2020 asking for more information about the expiring Root certificate, we recommended replacing the bundle and excluding the Root. This was a precaution, as we assumed libraries would correctly use the valid chain even if the expired certificate was present.
- 30 May - As we go through the cases reported to our customer support, we notice that a small portion of customers continue to have issues even after removing the Root certificate from the bundle. We find that an intermediate certificate belonging to the expired Root also expired the same day. We compare the chain with the one currently published on Sectigo's website and find that it has changed once again, without any communication or last-updated indicator on the site. We immediately roll out a change to update to the new chain. We had compared the chain just 3 months before. We later discovered that Sectigo continued to provide the soon-to-expire chain until roughly 30 days before the expiration, as also reported by Namecheap, who experienced a very similar issue. Essentially, customers had less than 30 days to switch, and no clear notice.
- 30 May - We receive a support request indicating that the customer is still having issues with their certificate chain. We continue to investigate and realize the certificate was issued by Comodo, not Sectigo, and signed by CN=COMODO RSA Domain Validation Secure Server CA. This is where the Comodo to Sectigo rebranding I explained above tricked us. In order to support that change, we had introduced a switch based on the server certificate signer, and the new bundle was returned only for certificates signed by Sectigo. Unfortunately, some 2- and 3-year-old certificates were still under Comodo intermediates, which were not on the Sectigo website. To immediately assist the impacted customers, we decide to offer them a free, newly issued SSL certificate so they can be back up and running in a few minutes. We publish a second update on Twitter to cover this.
- 02 Jun - We find the new Comodo intermediates that had been recently updated without notice. We continue to investigate.
- 05 Jun - We publish an update to the Comodo installer as a precaution, although all customers who reported issues have already been offered a new free SSL certificate as replacement.
Why were customers not informed?
One of the most frequently asked questions we've received via support is why we did not inform our customers about this event. The reason is that we did not expect the expiration of the Root certificate to cause any issues, nor did we expect any impact on our customers.
As the Red Hat article points out:
Root certificate expiry is a normal, if infrequent, occurrence.
We did not expect this event to turn into an issue.
DNSimple is not directly responsible for the intermediate certificates. It's the responsibility of the CA. We trusted the CA's decision, and evaluated previous similar processes. As an example, Let's Encrypt has been cross-signing certificates since 2018, and we have never received a single complaint about validation issues.
A few customers asked, as a follow up question, why we did not consider sending notifications regarding this event. The main reason is that events like this happen every day with zero impact. If we sent out emails for each of these events, your inbox would be filled with hundreds of non-actionable emails a week.
As an example, every week a number of registries rotate their DNSSEC signing keys, with the potential to take down an entire TLD space, including our customers' domains. But these events are not actionable for our customers, and in almost all cases the rotation completes without impact. Likewise, if you turn on DNSSEC at DNSimple, we rotate your signing key every 90 days. We expect these events to complete seamlessly, and generally they do.
In order to reduce the noise, we send notifications only for actionable events, or for critical events over which we have control. We did not consider the expiration of a Root certificate to be one of them. Instead, we saw it as a routine operational event that would complete seamlessly, as many others do every day.
How can I fix the issue?
You can fix the issue by re-installing the SSL certificate. From your DNSimple account, go to the certificate page, follow the instructions to download the intermediate certificate chain, and replace the existing chain on your server.
If the issue persists, send us an email, and we'll assist you.
What we learned and future improvements
While most customers praised our quick reaction and fast support turnaround, the most common criticism was that we did not effectively communicate the issue through the expected notification channels: we relied solely on Twitter. After internal discussion, we agree that our public response was ineffective. We will make sure to properly communicate similar issues in the future via our Status site.
We are also evaluating developing an automated mechanism to monitor intermediate certificates and update our certificate installer with the most recent intermediates whenever possible.
We stand by our decision to not include the Root certificate in the bundle served by our certificate installer. It turns out it was not the primary issue as we originally thought, yet there is no compelling reason to include a Root certificate in the bundle.
As we continue encouraging domain automation, we may consider no longer supporting certificate authorities that fail to provide a sufficient level of automation for our needs and the needs of our customers. We will continue encouraging short-lived certificates, as multi-year certificates have proven to be the source of several security and maintainability issues. This may soon become a non-issue, as 3-year certificates have been prohibited since 2017. Starting in September 2020, the maximum lifetime will be limited to 398 days, roughly 1 year.
We will treat Root and intermediate transitions as potentially risky events. Whenever applicable, we will inform our customers of Root transitions and of changes to the intermediate chain for certificates they ordered.
We will put our new processes into practice as part of the upcoming Let's Encrypt transition to ISRG root. This event was planned for May 2019, and postponed to July 2020. We will monitor the progress of the transition and notify our customers accordingly.
A personal note
I want to thank all of our customers for their understanding and support. This issue has been my top priority since May 30th, and a top priority of several team members who helped our customers, and worked around the clock to investigate reports and update our system.
While this issue was caused by events beyond our control, I know many customers choose DNSimple because they trust that we can reduce the challenges of dealing with domain names or, in this case, SSL certificates. I will continue to make sure we fulfill this promise to the best of our capabilities.
I hope certificate authorities will learn from this incident. I hope they will better evaluate the risk associated with changes to their trust chain, and properly communicate with their customers and resellers. I also sincerely hope that more certificate authorities will follow the lead of Let's Encrypt in treating automation as a first-class citizen in their processes, so that we can finally stop relying on convoluted manual processes.
If you have any additional questions, you can contact support or reach out to me directly at simone at dnsimple dot com.
Italian software developer, a PADI scuba instructor and a former professional sommelier. I make awesome code and troll Anthony for fun and profit.