Put down that ENV; what you should know about ENV configuration

Amelia Aronsohn on May 16, 2018

It's always a scary thing to wake up with your phone full of messages from Slack, especially when your title contains the word "operations". You tend to expect the worst. Yesterday was one of those days: I woke up, and my phone was filled with pings and emails.

To my joy, nothing had gone wrong, but there had been conversations where others wished I was around to contribute. Such indeed is the nature of a remote-first company; you have to be ready to have conversations at least somewhat asynchronously. No matter how much you wish @onlyhavecans was around, the simple fact of the matter is he lives in UTC-8 and is a big fan of sleeping in till 08:30. Anyways, you didn't click on this blog post to read about my sleep or the asynchronous conversation methods we use at DNSimple; you clicked because you saw the big clickbait title "15 reasons ENV variables are ruining your app", or hopefully something else extra catchy.

The conversation going on while I was snoring sounded something like this:

DevOne: I'm building a new app for us. I've decided to put all configuration, secrets, and certificates into environment variables because that's simple and easy.

DevTwo: Whoah there. That's a bad idea. @OnlyHaveCans has my back here.

DevThree: Yea talk to @OnlyHaveCans because he will agree ENV variables are the devil.

DevOne: Hey @OnlyHaveCans everyone says ENV_VARS will break into my house at night and steal all my frozen paella. Please tell everyone how wrong they are and that ENV variables are misunderstood entirely and very lovely people.

DevFour: Maybe we should go forward with this PR, and after @OnlyHaveCans can weigh in we can reconsider changes for the next revision.

Me, an operator: I have a far more nuanced view on environment variables for application configuration, and no, I've never heard of a proven case of them stealing frozen paella, but they aren't entirely fantastic application configuration avenues either. Before I get too deep into my feelings on environment variables for configurations, secrets, and complex data, I would like to talk about what environment variables are and how they usually work.

I can hear you shouting at me through the screen!

I know what environment variables are! It's a key-value store that I can use to pass things to my app at runtime!

Well, you aren't entirely on the money there, and that's where most of the confusion comes from.

How environment variables work

According to the POSIX standard, environment variables are all null-terminated strings in the format name=value, where name cannot contain the = character. Depending on the implementation, name may not start with a digit and may consist only of uppercase letters, digits, and the underscore _. We've all seen this with examples like LC_LOAD_LETTER.

According to the specification, the value part of our name=value string is usually not restricted, except that it is a string, ends with a null byte, and the total space used to store the environment and the arguments to the process is limited to ARG_MAX bytes. But wait, why the ARG_MAX limit? Keep this in mind; I'm about to get to why.

You might have noticed I used the word usually a lot above when describing how environment variables are formatted. That's because they are not just an operating system level construct; most of the time they are handled (read: mangled) by your shell before they reach the syscall level. Every shell implements and treats them differently, and service managers like Docker, Runit, etc. have their own ways of handling ENVs, each with its own limitations.

The one way they are all treated the same is how the kernel gives your application access to them. When Linux and BSD execute a file, they use a C call named execve (Linux man and FreeBSD man). Without jumping too deep into the weeds on how it works, it's important to know that it takes three things:

The program to run
The arguments to the command
A pointer to the environment for the new program, which is usually the parent process's environment

Once called, it allocates the memory needed and copies the program's arguments, all the environment strings from the parent program, and then the code of the program into memory. This is why ARG_MAX is a restriction. More or less the environment variables are just a masked set of arguments passed to your program!
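To make the "environment is just another argument" idea concrete, here is a minimal Python sketch. It assumes a POSIX-like system where subprocess hands the env mapping to the exec call; the helper name run_child_with_env and the variable name DEMO_GREETING are made up for illustration.

```python
import os
import subprocess
import sys


def run_child_with_env(child_env, name):
    """Start a child Python process with exactly `child_env` as its environment.

    Returns what the child sees for `name`, or "<unset>" if it is missing.
    """
    code = f"import os; print(os.environ.get({name!r}, '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,          # the environment is passed like any other argument
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from a copy of our own environment and add one hypothetical entry.
demo_env = dict(os.environ)
demo_env["DEMO_GREETING"] = "hello"
print(run_child_with_env(demo_env, "DEMO_GREETING"))
```

The child only ever sees what was in the mapping it was given; anything left out of that mapping simply does not exist for it.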

So when you launch a shell it gets environment variables from its parent process, then it changes them or sets them as needed, then that new set is passed down to programs you run there, and if your application spawns any more processes, the child gets those environment variables (including any changes your program made too!) and so forth. It's all very hereditary. This means that if I change an environment variable in one shell the change is entirely invisible to every other process other than its children¹.
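The hereditary, one-way nature of this can be checked with a small Python sketch. The function name and the DEMO_SETTING variable are hypothetical; the point is only that a change made in a child never travels back up to the parent.

```python
import os
import subprocess
import sys


def child_change_is_invisible(name="DEMO_SETTING"):
    """Set a variable, let a child overwrite its own copy, and compare.

    Returns a tuple of (value the child ended up with, value the parent still has).
    """
    os.environ[name] = "parent-value"
    # The child inherits the variable, then changes its private copy.
    code = (
        "import os; "
        f"os.environ[{name!r}] = 'child-value'; "
        f"print(os.environ[{name!r}])"
    )
    child_value = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return child_value, os.environ[name]


child_value, parent_value = child_change_is_invisible()
print("child saw:", child_value)
print("parent still has:", parent_value)
```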

So back to how this applies to the conversation I missed. In my professional opinion storing complex or sensitive data in ENV vars is generally a bad idea™. However, if you are willing to take risks and you understand the gotchas of using environment variables as configuration this is an entirely valid use case; but you must take each one very seriously.

In no particular order, I'd like to enumerate the reasons why I am not a fan of using environment variables for complex or sensitive data. This isn't to say that simple settings or environment-specific data shouldn't be there. By definition, the environment is an excellent place to tell your application about the environment it's running in, but it is not a replacement for configuration and secrets files.

Complex data is poorly supported

At its best, the environment is a set of strings in a key/value-ish format. No data types are supported.
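Because everything arrives as a string, the application has to do its own parsing. The sketch below shows one common trap (the string "false" is truthy in Python) and one possible way around it; env_bool and the FEATURE_X variable are illustrative names, and the accepted spellings are an assumption rather than any standard.

```python
import os

TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def env_bool(name, default=False, environ=None):
    """Parse an environment variable as a boolean, failing loudly on junk."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    word = raw.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")


settings = {"FEATURE_X": "false"}
# Naive truthiness is wrong: any non-empty string is truthy.
print(bool(settings["FEATURE_X"]))
# Explicit parsing gives the intended answer.
print(env_bool("FEATURE_X", environ=settings))
```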

By POSIX standards you can store any text in the value part of an environment variable, but depending on your configuration and software you will likely need to escape all shell whitespace and escape characters at a minimum. In practice, I've seen a lot of different things fall over or error in hard-to-decipher ways because of the various shell behaviors when trying to put in and take out complex information.

My rule of thumb is that unless you have excellent knowledge of your systems, you should actively handle encoding and decoding anything you put in an environment above and beyond what you can put in a URI if you want to avoid issues. If you know no text munching shell will come between you and your data feel free to put that prettifier formatted UTF-8 JSON blob in there, but know your limits.
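One way to follow that rule of thumb is to encode structured data into a URI-safe alphabet before it goes anywhere near a shell. This is only a sketch: the function names are invented here, and URL-safe base64 over compact JSON is one reasonable choice among several.

```python
import base64
import json


def encode_for_env(data):
    """Serialise `data` to JSON and wrap it in URL-safe base64.

    The result contains only letters, digits, '-', '_' and '=' padding,
    so whitespace and shell metacharacters never appear in the value.
    """
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_from_env(value):
    """Reverse encode_for_env."""
    raw = base64.urlsafe_b64decode(value.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


blob = {"motd": "hello $USER; `rm -rf`\n\"quoted\"", "ports": [80, 443]}
encoded = encode_for_env(blob)
print(encoded)
print(decode_from_env(encoded) == blob)
```

The cost is that the value is no longer readable by a human poking at the environment, which is worth weighing against the escaping problems it avoids.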

Size is vaguely restricted and errors poorly

This is more of a portability concern than anything else. There is a lot of essential information in the environment that the C runtime and the kernel need in order to run your program correctly for the situation. If you start piling kilobytes or even megabytes of data into those environment variables, you will eventually discover there is a limit to all of this data.

There is not always a single environment size maximum², but there still will be a maximum amount of space imposed on the entirety of the environment and all of the arguments used to run the programs. This varies from 2048 kilobytes to a quarter of the maximum size of the process stack.

When you violate this constraint, it will not be pretty. Applications will not launch, maybe you will get errors about argument length from your shell, or possibly you will get direct C errors from execve.

This is a LOT harder to break in a very modern Linux (anything past 2.6), but if you are working with systems that have limited resources, you can violate this constraint.
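If you want a rough idea of how close you are to the limit, something like the following Python sketch can help. It assumes a Unix-like platform where os.sysconf knows about SC_ARG_MAX, and it only counts the NAME=value strings and their terminating null bytes; kernels also account for pointers and for the argument list, so treat the number as an estimate.

```python
import os


def environment_size_bytes(environ=None):
    """Approximate the bytes used by the environment strings.

    Each entry is stored as NAME=value followed by a null byte.
    """
    environ = os.environ if environ is None else environ
    return sum(
        len(os.fsencode(name)) + 1 + len(os.fsencode(value)) + 1
        for name, value in environ.items()
    )


def arg_max():
    """Return the platform's ARG_MAX, or None if it cannot be queried."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None


print("environment uses roughly", environment_size_bytes(), "bytes")
print("ARG_MAX reported as", arg_max())
```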

Copied onto the stack

As highlighted above, your environment is copied onto the process stack alongside the arguments. While this doesn't matter to most programs, any hot-reloading system or high-availability codebase needs to keep in mind that you usually³ can't easily change environment variables for a running program externally. They are almost read-only, but that isn't a guarantee either. Nothing ensures environment immutability; your program can call setenv() to change its own environment.

Also be aware that certain languages have different behaviors for reading from the environment. Python reads from the environment on import of os and then doesn't re-read this automatically. Check the docs for more details.
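The Python behavior can be seen in a few lines. The os module documents that changes made through os.putenv() are not reflected in os.environ, because the mapping is a snapshot taken at startup. The function and variable names below are hypothetical.

```python
import os


def putenv_is_invisible_to_os_environ(name="DEMO_PUTENV_ONLY"):
    """Change the process environment behind os.environ's back.

    Returns what os.environ reports for `name` afterwards.
    """
    os.environ.pop(name, None)      # make sure we start from a clean slate
    os.putenv(name, "1")            # updates the C-level environment only
    seen = os.environ.get(name)     # the snapshot was never refreshed
    os.unsetenv(name)               # tidy up the C-level environment
    return seen


print(putenv_is_invisible_to_os_environ())
```

Assigning through os.environ itself does update both the mapping and the real environment, which is why the docs recommend it over calling os.putenv() directly.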

With a configuration file, certificate file, database, or some secret stores you can re-read and load configurations on a regular basis or when your application receives specific signals.
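As a sketch of what re-reading on a signal can look like in Python: the class below reloads a JSON file whenever asked, and can optionally hook that reload up to SIGHUP. The class name, the JSON format, and the choice of SIGHUP are all assumptions for illustration, and SIGHUP only exists on POSIX systems.

```python
import json
import signal


class ReloadableConfig:
    """Configuration backed by a JSON file that can be re-read at runtime."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        self.reload()

    def reload(self, signum=None, frame=None):
        """Re-read the file. The extra parameters let this double as a signal handler."""
        with open(self.path, encoding="utf-8") as handle:
            self.values = json.load(handle)

    def reload_on_sighup(self):
        """Re-read the file whenever the process receives SIGHUP (POSIX only)."""
        signal.signal(signal.SIGHUP, self.reload)
```

A long-running service would build one ReloadableConfig at startup, call reload_on_sighup(), and then an operator can edit the file and send the signal with kill -HUP instead of restarting the process.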

Highly varied setup in different environments

Environment variables are an easy-to-use tool on several SaaS platforms, because those platforms carefully control how programs are executed and don't want to hand out storage.

This is non-trivial to securely design, encode, store, & decode yourself depending on your software and environment options. If you are using runit's envdir program then this isn't hard to set up at all in my opinion but what if you can't guarantee what supervisor you are using? Will you do what many programs do and write shell script wrappers around your applications to set up all the environment you need?

I also would like to point out this amazing yet extensive post about the complexities of using environment variables with docker; something I have had to reference more times than I would like to admit.

A configuration file can have a whole series of standard locations as well as command line configuration options to select custom ones.
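A minimal Python sketch of that lookup might be the following. The application name myapp, the file name, and the search order are hypothetical; the idea is simply that an explicit --config flag wins, and otherwise the first existing standard location is used.

```python
import argparse
from pathlib import Path

# Hypothetical search order: most specific location first.
DEFAULT_LOCATIONS = [
    Path("myapp.toml"),
    Path.home() / ".config" / "myapp" / "config.toml",
    Path("/etc/myapp/config.toml"),
]


def find_config(argv=None, locations=DEFAULT_LOCATIONS):
    """Return the config path to use, or None if nothing was found.

    An explicit --config always wins, even if the file does not exist,
    so the caller can report a clear error instead of silently falling back.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--config", type=Path)
    args, _unknown = parser.parse_known_args(argv)
    if args.config is not None:
        return args.config
    for candidate in locations:
        if candidate.is_file():
            return candidate
    return None
```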

Security

Storing things in the environment isn't very secure for a few reasons.

First of all, how are you loading secrets into the environment? Most often there is a file you are writing all these secrets to be loaded by a shell or process supervisor. Are all these files adequately secured and marked sensitive? If you are loading these into a system user's shell, then it's going to be difficult to track.

Secondly, the environment is extremely leaky. It is passed to all child processes by default, so anything your application calls gets all your secrets, and if that process is a debugger or logs dangerously, you are in trouble.
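If you do have secrets in your environment, one mitigation is to stop handing the whole thing to every child. The sketch below builds a child environment from an allowlist; the function name and the particular list of names are assumptions you would adjust for your own system.

```python
import os

# Hypothetical allowlist: only what the child genuinely needs.
DEFAULT_ALLOWED = ("PATH", "HOME", "LANG", "TZ")


def child_environment(allowed=DEFAULT_ALLOWED, source=None):
    """Return a new mapping containing only the allowlisted variables."""
    source = os.environ if source is None else source
    return {name: source[name] for name in allowed if name in source}


example = {"PATH": "/usr/bin", "HOME": "/home/app", "API_TOKEN": "s3cr3t"}
print(child_environment(source=example))
```

The result can be passed as the env argument of subprocess.run so the child never receives anything outside the list.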

Finally, many things log the environment for debugging. From boot logs, to crash logs, to introspection tooling, a lot of it treats the environment as no more sacred than the arguments passed to the program and will upload it in plain text or write it out without a second thought.
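For logging you control yourself, a redaction pass before anything is written out is cheap insurance. This Python sketch masks values whose names look sensitive; the pattern and the function name are assumptions, and a name-based heuristic will miss secrets embedded in innocently named variables such as a connection URL.

```python
import re

# Hypothetical pattern for names that probably hold secrets.
SENSITIVE_NAME = re.compile(
    r"SECRET|TOKEN|PASSWORD|PASSWD|KEY|CREDENTIAL", re.IGNORECASE
)


def redacted_environment(environ):
    """Return a copy of `environ` that is safer to log."""
    return {
        name: "[REDACTED]" if SENSITIVE_NAME.search(name) else value
        for name, value in environ.items()
    }


example = {"LANG": "en_US.UTF-8", "API_TOKEN": "s3cr3t", "db_password": "hunter2"}
print(redacted_environment(example))
```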

So what is your advice here?

I've made a lot of statements of fact but haven't really given you much actionable advice. You might be asking yourself:

Ok, that's great; you told me a lot of things about environment variables but what SHOULD I do for my configurations, secrets, and other things?

Sadly I'm not going to give you a cookie cutter "always do this" answer. While reviewing this blog one of our developers pointed out that according to The Twelve-Factor Application Manifesto the environment is where all your application configuration belongs. I personally feel this advice is very narrow and only really applies to certain types of apps deployed in a certain way⁴. This makes configuring complex applications difficult, and debugging those configurations harder; I can't imagine if you had to configure Nginx wholly with environment variables.

My advice is to take the time to understand the platform you are deploying to and the caveats of the choices you make. You have a lot of places to store configurations from files on the filesystem, remote data and secret stores, and of course the environment. Ask yourself how much configuration you need, how complex this data is, and of course, how sensitive this data is.

After you've done all this deep contemplation, try to minimize sensitive and complex data in the environment, opting to use files or datastores where available and reasonable. Also consider the difference between standard configurations and secret material, and try to make sure that you are storing and sharing secrets in a way that's easily rotatable and not easily leaked. There are many secrets management systems out there and available.

Try to treat environment variables like command line arguments and don't put anything in them that you wouldn't also pass as a flag. It is easier to punt down the road and make a quick os.getenv(...) call in your Python code, but make sure to think about the operational and security risks of putting your SSL certificate and certificate chains in a series of environment variables.
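One pattern that fits this advice is to put only a file path in the environment, which is something you could just as easily pass as a flag, and keep the secret itself in a properly permissioned file. The sketch below assumes a NAME_FILE naming convention and an invented load_secret helper; neither is prescribed by this post.

```python
import os
from pathlib import Path


def load_secret(name, environ=None):
    """Read a secret from the file named by the NAME_FILE variable.

    Only the path lives in the environment; the secret stays on disk.
    """
    environ = os.environ if environ is None else environ
    file_variable = f"{name}_FILE"
    if file_variable not in environ:
        raise KeyError(f"{file_variable} is not set; no secret file configured")
    # Strip the trailing newline most editors and tools add.
    return Path(environ[file_variable]).read_text(encoding="utf-8").rstrip("\n")
```

Rotating the secret then means replacing the file and asking the application to re-read it, without touching the environment of the running process.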

¹ Again, handling is dependent on your implementation. Fish-Shell has a global daemon and the concept of "global" variables that can be set once and will update all your other shells. However, fish also has a much stricter sense of variable scope and exported vs. unexported variables than most shells.
² But there CAN be, because MAX_ARG_STRINGLEN is a thing in newer Linux kernels.
³ If you are root and have a good debugger, you can do anything.
⁴ Which is kinda outlined in the foreword of the Manifesto. I've honestly not studied it.
