Tales of a Chef Workflow: Keeping a tidy server
In this installment of the Tales of a Chef Workflow series, I'm going to cover a seldom-mentioned topic in the world of Chef: cleaning up resources. At DNSimple, we run a mostly bare-metal setup, with the vast majority of our servers in 5 physical locations around the globe. Because we don't use a container-based workflow to manage changes to our systems, each change we deploy must be mindful of the previous one. We don't have the luxury of simply building a new image to deploy updated code, so we must work with what is already on the server. In the development world you might call this a brownfield project, where you are working on a legacy application and need to swap old code for new. For operators who use change management systems, this means removing old code, configuration files, or system services. I'm going to cover a few ways we approach cleaning up our servers using Chef.
Leveraging Default Actions
For the DNSimple web application we employ feature flags to enable and disable sections of our software for a myriad of purposes. Maybe we want to beta test a feature, or guard a feature so only certain account types can access it. Whatever the reason, we leverage a mechanism in Chef called tags, which are stored as normal-level attributes on the node they are attached to. With these tags, we can enable or disable features in our cookbooks without altering environments or roles. You can then use these tags as indicators for which action to take in a given resource. To illustrate this technique, I'll show you how we use a Chef tag to enable and disable the Chef client runs on our systems.
In this resource, we install a crontab for running our chef-client every so often. I've removed a bit of code for brevity, but it should still help explain this technique:
cron_d 'chef-client' do # schedule and command properties omitted for brevity
end
The resource above creates a cron.d entry that runs our Chef client every so often. This resource, like many others in Chef, has a default action, which is typically one that creates something, such as :create or :add. Using this knowledge of default actions, we can take the opposite action based on a Chef tag present on the node during the Chef run.
chef_disabled = tagged?('disable-chef')
cron_d 'chef-client' do
  # other properties omitted for brevity
  action :delete if chef_disabled
end
If you look closely at the example, check out the action line at the bottom. We only run the :delete action if the disable-chef tag is present; otherwise the resource falls back to its default action and creates the crontab. It's a subtle change with a powerful result. The server now removes the chef-client cron entry on its next run, and the entry stays gone until that tag is removed and Chef is re-run to restore it. Every Chef resource with more than one action declares a default action. Here we simply call the opposite action under a given circumstance. You can use this technique to remove or disable unnecessary services, templates, files, and more from your servers to keep things neat and tidy between releases. Chef has not reached the point of self-awareness where it knows the delta of changes between cookbook versions, but it's not very hard to give it this ability, provided you are mindful of system changes.
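If the same tag check shows up in many recipes, the action choice could be pulled into a small helper. The following is only a sketch in plain Ruby: action_unless_tagged and its keyword arguments are hypothetical names, not part of Chef, and the literal arrays stand in for the tag list Chef exposes as node.tags.

```ruby
# Hypothetical helper (not part of Chef): choose a resource action from a
# list of tag names. Returns the opposite action when the tag is present,
# otherwise the default action.
def action_unless_tagged(tags, disable_tag, default_action: :create, opposite_action: :delete)
  tags.include?(disable_tag) ? opposite_action : default_action
end

# Plain-Ruby usage, with literal arrays standing in for a node's tags.
puts action_unless_tagged(['disable-chef'], 'disable-chef') # prints "delete"
puts action_unless_tagged([], 'disable-chef')               # prints "create"

# The same idea applied to a service-style pair of actions.
puts action_unless_tagged(['disable-ntp'], 'disable-ntp',
                          default_action: :start, opposite_action: :stop) # prints "stop"
```

Unlike the recipe above, which only sets an action when the tag is present, this variant always returns an explicit action, so the default has to be spelled out for resources whose default is not :create.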
The ZAP Pattern
Another variation of this cleanup pattern is the ZAP pattern. Joe Nuspl coined the term and has even codified it into a library cookbook. ZAP teaches Chef resources to be smarter about cleaning up specific resource artifacts when they are no longer defined in a Chef run. We don't use this ourselves, but it is definitely worth a look and exploring for yourself as an option.
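To make the idea concrete, here is a rough plain-Ruby sketch of the comparison at the heart of the pattern: look at what exists on disk, subtract what is still declared, and remove the rest. This is not the zap cookbook's API; undeclared_files and remove_undeclared are hypothetical names, and the declared list stands in for whatever a Chef run would still define.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: names of regular files in `dir` that are not in the
# `declared` list of file names.
def undeclared_files(dir, declared)
  on_disk = Dir.children(dir).select { |name| File.file?(File.join(dir, name)) }
  (on_disk - declared).sort
end

# Hypothetical helper: delete every undeclared file and return the names
# that were removed.
def remove_undeclared(dir, declared)
  undeclared_files(dir, declared).each { |name| FileUtils.rm(File.join(dir, name)) }
end

# Demonstration in a throwaway directory so nothing real is touched.
Dir.mktmpdir do |dir|
  %w[chef-client logrotate old-backup].each { |name| File.write(File.join(dir, name), '') }

  removed = remove_undeclared(dir, %w[chef-client logrotate])
  puts removed.inspect                # prints ["old-backup"]
  puts Dir.children(dir).sort.inspect # prints ["chef-client", "logrotate"]
end
```

The trade-off compared with explicit opposite actions is that nothing has to be remembered per resource, but anything not declared is removed, so the declared list has to be complete.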
If your systems are not re-imaged with every deploy, it would be wise to utilize this technique. We use default actions (and opposite actions) in a lot of places in our workflow to keep our environment predictable and clean, so we can deliver a stable and reliable DNS service to you every day. It keeps cleanup inline and avoids cloning of Chef resources, which is deprecated in the Chef 13.2 release.