From code to server: A DNSimple operations workflow
Hello there! It's time for another exciting installment of "The workflows of DNSimple." Last time we talked about… actually I don't know - we don't always post these things in order. ^__^
Anyway, I wanted to share our operational deploy strategy. Like many small teams trying to do big things, we lean on a lot of DevOps practices to streamline the work and keep our deploys and systems uniform. It's no surprise we use Chef for almost everything, but that's not the whole story. How do we get from code to operational happiness? It's time to find out.
It all starts with a branch and a pull request. We don't use forks on our internal repos, since it's much easier to work on each other's code from a branch. We usually cut a release per pull request, so master is more or less what is deployed at all times. In the pull request itself we have a few extra requirements: deploy steps and verification steps. Below is a real-life example of a PR's comments.
This moves process and dns_check to the new resource

## Deploy Steps:

- [x] Deploy at least version 2.3 of the dnsimple_metrics cookbook to all environments
- [ ] Deploy https://github.com/dnsimple/chef-dnsimple_erldnsimple/pull/71 to all environments
- [ ] Deploy https://github.com/dnsimple/chef-dnsimple_zoneserver/pull/32 to all environments
- [ ] deploy to canary
- [ ] verify on canary
- [ ] deploy to production
- [ ] verify production

## Verification Steps:

- [ ] check that /etc/dd-agent/conf.d/dns-checks.yaml on canary contains ALL FOUR checks
- [ ] check that /etc/dd-agent/conf.d/process.yaml on canary contains zoneserver & pdns checks
- [ ] verify dns_checks for ALIAS show up in datadog
- [ ] verify pdns_recursor process check shows up in datadog
- [ ] verify dns_checks for erldnsimple are still present in datadog
That's a lot of text for what is actually a small change! They aren't always this big; in fact they are usually much smaller and simpler, but I wanted to grab one from a more elaborate ticket I am working on, one that changes how a lot of our monitoring resources are built under the covers. I think these sections are among the more important and useful parts of our process to share.
We have a lot of cookbooks, and sometimes deploys depend on other deploys, or there are manual remediation steps in the deployment. Often this part is just "Deploy to environments x, y, z & verify", but having it well documented is useful, especially when someone wants to know whether something is blocked or whether they are blocking something.
The verification steps are my favorite part. They make you stop and think about everything you are affecting with these changes, and what you are going to check to make sure your changes work. We don't really have an acceptance environment for operations, and we aren't at the point where we have a fancy automated acceptance pipeline. We do have staging and canary environments to deploy to, and this is where we verify our changes have no breaking effects. Most often this is "Make sure the thing happened and didn't break this other thing", but in this case we are verifying that all the cookbooks play nicely together, so there is a lot to check. This is also a very common place for feedback in reviews: often someone will say "Hey, how will this affect X? Should we check that X isn't affected?"
We enforce at least one review for all PRs in operations. For large changesets we try for two reviews, but with the async nature of our team that can mean letting tickets sit anywhere from overnight to days. Once the changes are approved we squash and merge. We use GitHub's handy option to disable rebase and standard merges to prevent mistakes.
Now that it's in master it's all good! You are done, go have a coffee! J/K. While we have Travis CI run our test suite on PRs, we do not have a CD pipeline for releases. It's something we want to do, but we need to perfect our standard pipeline before it can be continuous and automated. To help facilitate this we have a script that lets you set the desired version. It then generates a changelog, updates the version in the metadata, and finishes with a Berkshelf update before uploading the cookbook to our Chef Server. With most of the bugs hammered out of this script, we are about ready to move it into something more automated soon!
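To make that flow concrete, here is a minimal sketch of what such a release script might look like. The helper names and shell commands here (`github_changelog_generator` in particular) are assumptions for illustration, not the actual internals of our script:

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of a cookbook release script: bump the version,
# regenerate the changelog, refresh the Berkshelf lockfile, and upload
# the cookbook to the Chef Server.

# Rewrite the `version` line in a metadata.rb body to the desired version.
def bump_metadata(contents, new_version)
  contents.sub(/^version\s+['"][^'"]+['"]/, "version '#{new_version}'")
end

def release!(cookbook_dir, new_version)
  metadata = File.join(cookbook_dir, "metadata.rb")
  File.write(metadata, bump_metadata(File.read(metadata), new_version))

  Dir.chdir(cookbook_dir) do
    # Changelog tool choice is an assumption; any generator would do here.
    system("github_changelog_generator") or raise "changelog generation failed"
    system("berks update") or raise "berks update failed"
    system("berks upload") or raise "berks upload failed"
  end
end
```

Keeping the version bump as a small pure function (`bump_metadata`) makes that step easy to test, which matters if the script is eventually promoted into an automated pipeline.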
Currently we control versioning with environment pins, which live in files in our base Chef repo. This lets us make PRs against those files for any and all version bumps! Once merged to master, a Thor task syncs the files to the Chef Server to keep everything up to date. This is fairly "standard", but we are currently looking at replacing all the environments with Policyfiles, since they are a big improvement over the Berkshelf workflow and come highly recommended everywhere you look. Look forward to more posts about that later!
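For anyone unfamiliar with environment pins, a pin file in a Chef repo typically looks something like this. The environment name and version numbers below are made up for illustration:

```ruby
# environments/production.rb -- hypothetical environment pin file
name "production"
description "Production environment"

# Exact version pins for the cookbooks deployed to this environment.
# Bumping a pin is just a PR against this file.
cookbook_versions(
  "dnsimple_metrics"    => "= 2.3.0",
  "dnsimple_zoneserver" => "= 1.4.2"
)
```

Because the pins are plain files under version control, every version bump goes through the same review process as any other change before the sync task pushes it to the Chef Server.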
Once your environment pins are synced up and the cookbooks are all uploaded, your changes are deployed. I hope you followed your verification and deploy steps along the way, otherwise we'll have quite the talk later.
So that's a walkthrough of our Chef workflow. I hope you learned a trick or two, or at least picked up something you can try in your own environment to speed things up.