Every system that evolves over time accrues technical debt. Depending on the number of contributors, the complexity of the system, its rate of change, and other factors, that debt may accrue more slowly or more quickly from one system to the next. DNSimple is no different. In 2010 DNSimple was a simple Rails application that just handled DNS management. Today we provide complex DNS management across an Anycast network, domain registration, transfers, renewals, SSL certificates (including Let's Encrypt certificates), a comprehensive API, secondary DNS support, and so much more.
Like all systems, technical debt in DNSimple has grown over the years. Here's what we do to pay it back.
The first step to paying back technical debt is to understand where it hurts. There are several indicators that you can use to identify technical debt.
The first indicator you should look at is customer feedback. Software is written to solve the needs of customers. Sometimes the customer is someone who is giving you money in exchange for your software, sometimes it is other developers or non-developers on your team, and sometimes you are the customer. The first step to identifying technical debt should come from paying attention to customer pain. If certain parts of your system are confusing, hard to use, slow, or otherwise result in a less-than-stellar experience, start looking in those areas for your technical debt. At DNSimple, every team member is involved in customer support.
The next indicator you should look for is errors coming from your system. If you are not already surfacing and collecting your error cases, then start. We use Bugsnag tied into Slack to surface our errors, but you can use whatever tool you prefer. The main idea is to make sure that errors get in your face. The old saying "out of sight, out of mind" is one that you should take to heart and be aware of. If you can't see errors happening then you probably won't fix them. A lot of our processes are asynchronous and for some time we tended to sweep those errors under a rug. Recently we began surfacing a bunch of errors that result in pain for our customers, even if it isn't necessarily immediate and visible pain. These errors are an indicator that we have technical debt we need to clean up.
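As a rough illustration of getting asynchronous errors "in your face", here is a minimal Python sketch of a wrapper that reports a background job failure before re-raising it. Everything in it is hypothetical: `report_error`, `reported_errors`, and `renew_certificate` are stand-ins invented for this example, and a real setup would forward to whatever error tracker you already use rather than to an in-memory list.

```python
import functools
import logging
from typing import Callable, List

logger = logging.getLogger("background_jobs")

# Hypothetical stand-in for an error tracker. A real setup would forward
# the failure to whatever service the team already watches.
reported_errors: List[dict] = []


def report_error(job_name: str, exc: BaseException) -> None:
    """Record the failure somewhere a human will actually see it."""
    reported_errors.append({"job": job_name, "error": repr(exc)})
    logger.error("job %s failed: %r", job_name, exc)


def surfaced(job: Callable) -> Callable:
    """Wrap a background job so failures are reported and then re-raised."""

    @functools.wraps(job)
    def wrapper(*args, **kwargs):
        try:
            return job(*args, **kwargs)
        except Exception as exc:
            report_error(job.__name__, exc)
            raise  # do not swallow: the job runner should still see the failure

    return wrapper


@surfaced
def renew_certificate(domain: str) -> str:
    # Hypothetical job body, used only for illustration.
    if not domain:
        raise ValueError("domain is required")
    return f"renewed {domain}"
```

The important design choice is the bare `raise`: the wrapper makes the error visible without hiding it from whatever is running the job.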
Another thing to look for is sections of code that are hard to change, or that no one on the team understands well or wants to touch. Every project I've ever worked on has a bit of code or a subsystem that should come with a warning sign: "Here be dragons, BEWARE!". Usually these bits solve very specific problems that the author may not have fully understood at the time the code was written, but that had to be fixed. These areas of code need to be addressed at some point.
Finally, keep an eye on other indicators like test speed, deployment issues, operational headaches, low-impact but growing performance issues, and so on. All of these things can help you identify where you've accrued technical debt that needs to be paid back.
Once you understand where you have technical debt and the level of pain and frustration it is causing, the next step is to get rid of the low-hanging fruit. In many cases you will have technical debt that is tied to underused features. You should consider each underused feature as a candidate for removal. We developers (and everyone else involved in creating and selling projects) get caught up in the notion that more is better. In reality more features are only better if those features are bringing more value to customers than the cost of maintaining them.
To remove a feature you should first determine who is using it and what impact removing the feature will have on them. You may need to gather additional metrics in your system for specific feature usage. Once you know who is using a feature that you are considering removing you should contact them and explain to them that the feature is deprecated and will be removed. Listen to their feedback. Perhaps this feature is essential to them and they are a significant customer. If that's the case then you can reconsider the removal - after all, you haven't actually changed anything yet. If the response to the deprecation is minimal then begin to cut. Create a specific pull request that removes all code and tests related to that feature. If you remove a method that has tests which are directly testing that method or function, then remove just those tests. Run your test suite and see if there are other uses of the method or function.
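Gathering per-feature usage metrics does not need to be elaborate. The sketch below, in Python, counts calls to a feature and remembers which accounts used it, so you know who to contact before removing it. The names (`track_usage`, `legacy_export`, `accounts_to_contact`) are hypothetical, and the in-memory stores are an assumption made to keep the example self-contained; a real system would persist this data.

```python
import functools
from collections import Counter
from typing import Callable, Dict, List, Set

# In-memory stores for illustration only; a real system would persist these.
usage_counts: Counter = Counter()
usage_by_account: Dict[str, Set[str]] = {}


def track_usage(feature: str) -> Callable:
    """Decorator that records each call to a feature and the calling account."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(account_id: str, *args, **kwargs):
            usage_counts[feature] += 1
            usage_by_account.setdefault(feature, set()).add(account_id)
            return func(account_id, *args, **kwargs)

        return wrapper

    return decorator


@track_usage("legacy_export")
def legacy_export(account_id: str) -> str:
    # Hypothetical underused feature that is a candidate for removal.
    return f"export for {account_id}"


def accounts_to_contact(feature: str) -> List[str]:
    """Accounts that used the feature and should hear about its deprecation."""
    return sorted(usage_by_account.get(feature, set()))
```

After running with this in place for a while, `accounts_to_contact("legacy_export")` gives you the list of customers to talk to, and an empty list is a strong hint that the feature can go.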
There is always a risk that you will have something that is not well tested and used somewhere where you don't expect. Use tools like ripgrep, or specific IDE features that can identify usage of functions and methods, to make sure you aren't using removed code in an unexpected location. This may also help identify ambiguous function names and provide future naming guidance to make it easier to discover how different parts of your code depend on each other.
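If you want the same kind of check inside a script, a whole-word search over the source tree is a reasonable approximation. The following Python sketch is not a replacement for ripgrep or an IDE's "find usages"; it is a plain text match, so it will also hit comments and strings and will miss dynamically constructed names. The function name `find_references` and the default `*.py` pattern are assumptions made for this example.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def find_references(
    root: Path, name: str, pattern: str = "*.py"
) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for each whole-word match of `name`."""
    # \b keeps `old_helper` from matching `old_helper_v2`.
    word = re.compile(rf"\b{re.escape(name)}\b")
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if word.search(line):
                yield path, lineno, line.strip()
```

If a name you are about to delete turns up in many unrelated places, that is often a sign the name is too generic, which is exactly the naming feedback mentioned above.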
Roll out a single code removal at a time. Monitor your error handling system to ensure that you did not introduce unexpected behavior. Bask in the glory of code deletion and congratulate yourself on taking a small step towards technical debt payback.
Once you've removed low-hanging fruit the next step is to deal with those features that people use all the time and which they complain about for one reason or another. In my experience I often find that the features that are the most challenging for customers are ones where there is integration with multiple service providers behind the scenes. Place yourself in your customer's shoes and ask yourself: what am I actually trying to do here?
Can you take a multi-step process and convert it to a single step? Minimizing the number of steps required can help clean up code that is implementing those steps. If you can get a task down to a single step then you can also reduce the need to share state across multiple steps of a process. Removing dependencies on stateful data can also really help clean up your implementation.
Can you use sensible defaults, thus minimizing what your customer needs to do to accomplish a task? This can also help clean up your code by reducing the amount of user-defined input you must process. Code spent validating and cleansing input can take up a lot of your time to implement and maintain. By removing unnecessary configuration requirements in your application, you can often remove unnecessary code.
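To make the idea of sensible defaults concrete, here is a small Python sketch where the caller supplies only what is truly required and everything else falls back to a default. The `RecordRequest` type, the `create_record` function, and the default values (`"A"` and `3600`) are all hypothetical choices for illustration; they do not describe how DNSimple actually implements record creation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRequest:
    """Input for creating a record; only name and content are required."""

    name: str
    content: str
    record_type: str = "A"  # assumed default, for illustration only
    ttl: int = 3600  # assumed default, for illustration only


def create_record(request: RecordRequest) -> dict:
    """Validate the two required fields and build the record."""
    if not request.name or not request.content:
        raise ValueError("name and content are required")
    return {
        "name": request.name,
        "content": request.content,
        "type": request.record_type,
        "ttl": request.ttl,
    }
```

A call such as `create_record(RecordRequest(name="www", content="192.0.2.1"))` works without the customer choosing a type or TTL, and the validation code only has to care about the two fields that are actually required.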
In many cases your technical debt will be tied directly to early implementations of features. You probably have a better understanding of how the feature can and should be implemented at this point, so it may make sense to improve the flow of the feature, which will likely result in removal of technical debt at the same time.
Do you hear that sound? That sound of your CPU fans firing up may be the sound of your test suite telling you that it is working too hard. Test suites can tell you a lot about the implementation of your system. Hard-to-test code is probably hard-to-understand code. Do you find yourself creating a large amount of setup code just to run a particular test? Do you have a stack of mocks, or mocks that reference other mocks? Are you using "mock" as a verb? Do you have 1000-line test files? Does your test suite take tens of minutes or more to run? All of these things may be signs that your implementation has a significant amount of technical debt.
When you have sections of your implementation that are hard to test, the first thing to do is to try to understand why it is hard to test. Are there a significant number of objects or functions that the code you are testing depends upon? Interdependencies between parts of your system may be necessary, but it is also possible you have prematurely extracted code where it doesn't necessarily need to be extracted. I know I have fallen many times into the trap of premature extraction, thinking I was de-duplicating code when in fact I didn't fully understand the behavior of the system. You may want to try inlining all dependencies in a particular implementation as an experiment. Ensure that all data is actually passed directly into a function or method as opposed to depending on state that is either attached to an instance or made available through some sort of global entity.
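The difference between depending on shared state and passing data in directly is easiest to see side by side. This Python sketch uses a made-up rule (a plan with a record limit) purely as an example; `CURRENT_PLAN`, `can_add_record_global`, and `can_add_record` are hypothetical names, not part of any real codebase.

```python
from typing import Dict

# Before: behaviour depends on module-level state, so every test has to
# arrange (and clean up) this global before calling the function.
CURRENT_PLAN: Dict[str, int] = {"record_limit": 10}


def can_add_record_global(existing_records: int) -> bool:
    """Reads the limit from global state, which the caller cannot see."""
    return existing_records < CURRENT_PLAN["record_limit"]


# After: everything the function needs arrives as an argument, so a test
# is a single call with no setup and no mocks.
def can_add_record(existing_records: int, record_limit: int) -> bool:
    """Pure function: the result depends only on the arguments."""
    return existing_records < record_limit
```

Testing the second version needs nothing more than `can_add_record(9, 10)`, while the first version forces each test to know about, and possibly patch, `CURRENT_PLAN`.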
Like it or not, all projects accumulate technical debt over time. Eventually this technical debt can slow down your development process, at which point it will have to be addressed. Pay attention to indicators, remove unused code, and improve those hard-to-test systems and you'll be on your way to reducing technical debt in a healthy fashion.
I break things so Simone continues to have plenty to do. I occasionally have useful ideas, like building a domain and DNS provider that doesn't suck.