Populating a config file when a pod starts - kubernetes

I am trying to run a legacy application inside Kubernetes. The application consists of one of more controllers, and one or more workers. The workers and controllers can be scaled independently. The controllers take a configuration file as a command line option, and the configuration looks similar to the following:
instanceId=hostname_of_machine
Memory=XXX
....
I need to be able to populate the instanceId field with the name of the machine, and this needs to be stable over time. What are the general guidelines for implementing something like this? The application doesn't support environment variables, so my first thought was to record the stateful set stable network ID in an environment variable and rewrite the configuration file with an init container. Is there a cleaner way to approach this? I haven't found a whole lot of solutions when I searched the 'net.

Nope, that's the way to do it (use an initContainer to update the config file).

Related

a very simple Kubernetes Scheduler question about custom scheduler

Just a quick question.. Do you HAVE to remove or move the default kube-scheduler.yaml from the folder? Can't I just make a new yaml(with the custom scheduler) and run that in the pod?
Kubernetes isn't file-based. It doesn't care about the file location. You use the files only to apply the configuration onto the cluster via a kubectl / kubeadm or similar CLI tools or their libraries. The yaml is only the content you manually put into it.
You need to know/decide what your folder structure and the execution/configuration flow is.
Also, you can simply have a temporary fule, the naming doesn't matter as well and it's alright to replace the content of a yaml file. Preferably though, try to have some kind of history record such as manual note, comment or a source control such as git in place, so you know what and why was changed.
So yes, you can change the scheduler yaml or you can create a new file and reorganize it however you like but you will need to adjust your flow to that - change paths, etc.

Is it possible for Deis Workflow read values from ConfigMap?

I have installed Deis Workflow v.2.11 in a GKE cluster, and some of our applications share values in common, like a proxy URL e credentials. I can use these values putting them into environment variables, or even in a .env file.
However, every new application, I need to create a .env file, with shared values and then, call
deis config:push
If one of those shared value changes, I need to adjust every configuration of every app and restart them. I would like to modify the value in ConfigMap once and, after changes, Deis restart the applications.
Does anyone know if it is possible to read values from Kubernetes ConfigMap and to put them into Deis environment variables? Moreover, if yes, how do I do it?
I believe what you're looking for is a way to set environment variables globally across all applications. That is currently not implemented. However, please feel free to hack up a PR and we'd likely accept it!
https://github.com/deis/controller/issues/383
https://github.com/deis/controller/issues/1219
Currently there is no support for configMaps in Deis Workflow v2.18.0 . We would appreciate a PR into the Hephy Workflow (open source fork of Deis Workflow). https://github.com/teamhephy/controller
There is no functionality right now to capture configMap in by the init scripts of the containers.
You could update the configMap, but each of the applications would need to run kubectl replace -f path/accessible/for/everyone/configmap.yaml to get the variables updated.
So, I would say yes, at Kubernetes level you can do it. Just figure out the best way for your apps to update the configMap. I don't have details of your use case, so I can't tell you specific ways.

How to implement the "One Binary" principle with Docker

The One Binary principle explained here:
http://programmer.97things.oreilly.com/wiki/index.php/One_Binary states that one should...
"Build a single binary that you can identify and promote through all the stages in the release pipeline. Hold environment-specific details in the environment. This could mean, for example, keeping them in the component container, in a known file, or in the path."
I see many dev-ops engineers arguably violate this principle by creating one docker image per environment (ie, my-app-qa, my-app-prod and so on). I know that Docker favours immutable infrastructure which implies not changing an image after deployment, therefore not uploading or downloading configuration post deployment. Is there a trade-off between immutable infrastructure and the one binary principle or can they complement each-other? When it comes to separating configuration from code what is the best practice in a Docker world??? Which one of the following approaches should one take...
1) Creating a base binary image and then having a configuration Dockerfile that augments this image by adding environment specific configuration. (i.e my-app -> my-app-prod)
2) Deploying a binary-only docker image to the container and passing in the configuration through environment variables and so on at deploy time.
3) Uploading the configuration after deploying the Docker file to a container
4) Downloading configuration from a configuration management server from the running docker image inside the container.
5) Keeping the configuration in the host environment and making it available to the running Docker instance through a bind mount.
Is there another better approach not mentioned above?
How can one enforce the one binary principle using immutable infrastructure? Can it be done or is there a trade-off? What is the best practice??
I've about 2 years of experience deploying Docker containers now, so I'm going to talk about what I've done and/or know to work.
So, let me first begin by saying that containers should definitely be immutable (I even mark mine as read-only).
Main approaches:
use configuration files by setting a static entrypoint and overriding the configuration file location by overriding the container startup command - that's less flexible, since one would have to commit the change and redeploy in order to enable it; not fit for passwords, secure tokens, etc
use configuration files by overriding their location with an environment variable - again, depends on having the configuration files prepped in advance; ; not fit for passwords, secure tokens, etc
use environment variables - that might need a change in the deployment code, thus lessening the time to get the config change live, since it doesn't need to go through the application build phase (in most cases), deploying such a change might be pretty easy. Here's an example - if deploying a containerised application to Marathon, changing an environment variable could potentially just start a new container from the last used container image (potentially on the same host even), which means that this could be done in mere seconds; not fit for passwords, secure tokens, etc, and especially so in Docker
store the configuration in a k/v store like Consul, make the application aware of that and let it be even dynamically reconfigurable. Great approach for launching features simultaneously - possibly even accross multiple services; if implemented with a solution such as HashiCorp Vault provides secure storage for sensitive information, you could even have ephemeral secrets (an example would be the PostgreSQL secret backend for Vault - https://www.vaultproject.io/docs/secrets/postgresql/index.html)
have an application or script create the configuration files before starting the main application - store the configuration in a k/v store like Consul, use something like consul-template in order to populate the app config; a bit more secure - since you're not carrying everything over through the whole pipeline as code
have an application or script populate the environment variables before starting the main application - an example for that would be envconsul; not fit for sensitive information - someone with access to the Docker API (either through the TCP or UNIX socket) would be able to read those
I've even had a situation in which we were populating variables into AWS' instance user_data and injecting them into container on startup (with a script that modifies containers' json config on startup)
The main things that I'd take into consideration:
what are the variables that I'm exposing and when and where am I getting their values from (could be the CD software, or something else) - for example you could publish the AWS RDS endpoint and credentials to instance's user_data, potentially even EC2 tags with some IAM instance profile magic
how many variables do we have to manage and how often do we change some of them - if we have a handful, we could probably just go with environment variables, or use environment variables for the most commonly changed ones and variables stored in a file for those that we change less often
and how fast do we want to see them changed - if it's a file, it typically takes more time to deploy it to production; if we're using environment variable
s, we can usually deploy those changes much faster
how do we protect some of them - where do we inject them and how - for example Ansible Vault, HashiCorp Vault, keeping them in a separate repo, etc
how do we deploy - that could be a JSON config file sent to an deployment framework endpoint, Ansible, etc
what's the environment that we're having - is it realistic to have something like Consul as a config data store (Consul has 2 different kinds of agents - client and server)
I tend to prefer the most complex case of having them stored in a central place (k/v store, database) and have them changed dynamically, because I've encountered the following cases:
slow deployment pipelines - which makes it really slow to change a config file and have it deployed
having too many environment variables - this could really grow out of hand
having to turn on a feature flag across the whole fleet (consisting of tens of services) at once
an environment in which there is real strive to increase security by better handling sensitive config data
I've probably missed something, but I guess that should be enough of a trigger to think about what would be best for your environment
How I've done it in the past is to incorporate tokenization into the packaging process after a build is executed. These tokens can be managed in an orchestration layer that sits on top to manage your platform tools. So for a given token, there is a matching regex or xpath expression. That token is linked to one or many config files, depending on the relationship that is chosen. Then, when this build is deployed to a container, a platform service (i.e. config mgmt) will poke these tokens with the correct value with respect to its environment. These poke values most likely would be pulled from a vault.

"Injecting" configuration files at startup

I have a number of legacy services running which read their configuration files from disk and a separate daemon which updates these files as they change in zookeeper (somewhat similar to confd).
For most of these types of configuration we would love to move to a more environment variable like model, where the config is fixed for the lifetime of the pod. We need to keep the outside config files as the source of truth as services are transitioning from the legacy model to kubernetes, however. I'm curious if there is a clean way to do this in kubernetes.
A simplified version of the current model that we are pursuing is:
Create a docker image which has a utility for fetching config files and writing them to disk ones. Then writes a /donepath/done file.
The main image waits until the done file exists. Then allows the normal service startup to progress.
Use an empty dir volume and volume mounts to get the conf from the helper image into the main image.
I keep seeing instances of this problem where I "just" need to get a couple of files into the docker image at startup (to allow per-env/canary/etc variance), and running all of this machinery each time seems like a burden throw on devs. I'm curious if there is a more simplistic way to do this already in kubernetes or on the horizon.
You can use the ADD command in your Dockerfile. It is used as ADD File /path/in/docker. This will allow you to add a complete file quickly to your container. You need to have the file you want to add to the image in the same directory as the Dockerfile when you build the container. You can also add a tar file this way which will be expanded during the build.
Another option is the ENV command in a your Dockerfile. This adds the data as an environment variable.

Using Ansible to automatically configure AWS autoscaling group instances

I'm using Amazon Web Services to create an autoscaling group of application instances behind an Elastic Load Balancer. I'm using a CloudFormation template to create the autoscaling group + load balancer and have been using Ansible to configure other instances.
I'm having trouble wrapping my head around how to design things such that when new autoscaling instances come up, they can automatically be provisioned by Ansible (that is, without me needing to find out the new instance's hostname and run Ansible for it). I've looked into Ansible's ansible-pull feature but I'm not quite sure I understand how to use it. It requires a central git repository which it pulls from, but how do you deal with sensitive information which you wouldn't want to commit?
Also, the current way I'm using Ansible with AWS is to create the stack using a CloudFormation template, then I get the hostnames as output from the stack, and then generate a hosts file for Ansible to use. This doesn't feel quite right – is there "best practice" for this?
Yes, another way is just to simply run your playbooks locally once the instance starts. For example you can create an EC2 AMI for your deployment that in the rc.local file (Linux) calls ansible-playbook -i <inventory-only-with-localhost-file> <your-playbook>.yml. rc.local is almost the last script run at startup.
You could just store that sensitive information in your EC2 AMI, but this is a very wide topic and really depends on what kind of sensitive information it is. (You can also use private git repositories to store sensitive data).
If for example your playbooks get updated regularly you can create a cron entry in your AMI that runs every so often and that actually runs your playbook to make sure your instance configuration is always up to date. Thus avoiding having "push" from a remote workstation.
This is just one approach there could be many others and it depends on what kind of service you are running, what kind data you are using, etc.
I don't think you should use Ansible to configure new auto-scaled instances. Instead use Ansible to configure a new image, of which you will create an AMI (Amazon Machine Image), and order AWS autoscaling to launch from that instead.
On top of this, you should also use Ansible to easily update your existing running instances whenever you change your playbook.
Alternatives
There are a few ways to do this. First, I wanted to cover some alternative ways.
One option is to use Ansible Tower. This creates a dependency though: your Ansible Tower server needs to be up and running at the time autoscaling or similar happens.
The other option is to use something like packer.io and build fully-functioning server AMIs. You can install all your code into these using Ansible. This doesn't have any non-AWS dependencies, and has the advantage that it means servers start up fast. Generally speaking building AMIs is the recommended approach for autoscaling.
Ansible Config in S3 Buckets
The alternative route is a bit more complex, but has worked well for us when running a large site (millions of users). It's "serverless" and only depends on AWS services. It also supports multiple Availability Zones well, and doesn't depend on running any central server.
I've put together a GitHub repo that contains a fully-working example with Cloudformation. I also put together a presentation for the London Ansible meetup.
Overall, it works as follows:
Create S3 buckets for storing the pieces that you're going to need to bootstrap your servers.
Save your Ansible playbook and roles etc in one of those S3 buckets.
Have your Autoscaling process run a small shell script. This script fetches things from your S3 buckets and uses it to "bootstrap" Ansible.
Ansible then does everything else.
All secret values such as Database passwords are stored in CloudFormation Parameter values. The 'bootstrap' shell script copies these into an Ansible fact file.
So that you're not dependent on external services being up you also need to save any build dependencies (eg: any .deb files, package install files or similar) in an S3 bucket. You want this because you don't want to require ansible.com or similar to be up and running for your Autoscale bootstrap script to be able to run. Generally speaking I've tried to only depend on Amazon services like S3.
In our case, we then also use AWS CodeDeploy to actually install the Rails application itself.
The key bits of the config relating to the above are:
S3 Bucket Creation
Script that copies things to S3
Script to copy Bootstrap Ansible. This is the core of the process. This also writes the Ansible fact files based on the CloudFormation parameters.
Use the Facts in the template.