Using Ansible to automatically configure AWS autoscaling group instances - deployment

I'm using Amazon Web Services to create an autoscaling group of application instances behind an Elastic Load Balancer. I'm using a CloudFormation template to create the autoscaling group + load balancer and have been using Ansible to configure other instances.
I'm having trouble wrapping my head around how to design things such that when new autoscaling instances come up, they can automatically be provisioned by Ansible (that is, without me needing to find out the new instance's hostname and run Ansible for it). I've looked into Ansible's ansible-pull feature but I'm not quite sure I understand how to use it. It requires a central git repository which it pulls from, but how do you deal with sensitive information which you wouldn't want to commit?
Also, the current way I'm using Ansible with AWS is to create the stack using a CloudFormation template, then I get the hostnames as output from the stack, and then generate a hosts file for Ansible to use. This doesn't feel quite right – is there "best practice" for this?

Yes, another way is just to simply run your playbooks locally once the instance starts. For example you can create an EC2 AMI for your deployment that in the rc.local file (Linux) calls ansible-playbook -i <inventory-only-with-localhost-file> <your-playbook>.yml. rc.local is almost the last script run at startup.
You could just store that sensitive information in your EC2 AMI, but this is a very wide topic and really depends on what kind of sensitive information it is. (You can also use private git repositories to store sensitive data).
If, for example, your playbooks get updated regularly, you can create a cron entry in your AMI that runs your playbook every so often to make sure your instance configuration is always up to date. This avoids having to "push" from a remote workstation.
This is just one approach; there could be many others, and it depends on what kind of service you are running, what kind of data you are using, etc.
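As a rough sketch of the local-run idea above, here is a small Python wrapper that could be called from rc.local or from a cron entry. The function names and the idea of injecting a `runner` are assumptions made for illustration; the inline inventory `localhost,` and `--connection local` are standard ansible-playbook options.

```python
import subprocess


def build_local_playbook_command(playbook, inventory="localhost,"):
    """Return the argv for running a playbook against this machine only."""
    return [
        "ansible-playbook",
        "--inventory", inventory,  # trailing comma marks an inline host list
        "--connection", "local",   # run directly instead of going over SSH
        playbook,
    ]


def run_local_playbook(playbook, runner=subprocess.run):
    """Run the playbook and return the exit code of ansible-playbook."""
    result = runner(build_local_playbook_command(playbook), check=False)
    return result.returncode
```

Calling `run_local_playbook("/opt/ansible/site.yml")` (a hypothetical path baked into the AMI) from rc.local would apply the playbook at boot; the same call from cron keeps the instance converged afterwards.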

I don't think you should use Ansible to configure new auto-scaled instances. Instead use Ansible to configure a new image, of which you will create an AMI (Amazon Machine Image), and order AWS autoscaling to launch from that instead.
On top of this, you should also use Ansible to easily update your existing running instances whenever you change your playbook.

Alternatives
There are a few ways to do this. First, I wanted to cover some alternative ways.
One option is to use Ansible Tower. This creates a dependency though: your Ansible Tower server needs to be up and running at the time autoscaling or similar happens.
The other option is to use something like packer.io and build fully-functioning server AMIs. You can install all your code into these using Ansible. This doesn't have any non-AWS dependencies, and has the advantage that it means servers start up fast. Generally speaking building AMIs is the recommended approach for autoscaling.
Ansible Config in S3 Buckets
The alternative route is a bit more complex, but has worked well for us when running a large site (millions of users). It's "serverless" and only depends on AWS services. It also supports multiple Availability Zones well, and doesn't depend on running any central server.
I've put together a GitHub repo that contains a fully-working example with Cloudformation. I also put together a presentation for the London Ansible meetup.
Overall, it works as follows:
Create S3 buckets for storing the pieces that you're going to need to bootstrap your servers.
Save your Ansible playbook and roles etc in one of those S3 buckets.
Have your Autoscaling process run a small shell script. This script fetches things from your S3 buckets and uses it to "bootstrap" Ansible.
Ansible then does everything else.
All secret values such as Database passwords are stored in CloudFormation Parameter values. The 'bootstrap' shell script copies these into an Ansible fact file.
So that you're not dependent on external services being up you also need to save any build dependencies (eg: any .deb files, package install files or similar) in an S3 bucket. You want this because you don't want to require ansible.com or similar to be up and running for your Autoscale bootstrap script to be able to run. Generally speaking I've tried to only depend on Amazon services like S3.
In our case, we then also use AWS CodeDeploy to actually install the Rails application itself.
The key bits of the config relating to the above are:
S3 Bucket Creation
Script that copies things to S3
Script to copy Bootstrap Ansible. This is the core of the process. This also writes the Ansible fact files based on the CloudFormation parameters.
Use the Facts in the template.
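The fact-file step described above could look roughly like this. It is only a sketch: the function name, the fact name `bootstrap` and the key `db_password` are made up for illustration, and the linked repo may do this differently. What it relies on is that Ansible reads `*.fact` files from `/etc/ansible/facts.d` and exposes them under `ansible_local`.

```python
import json
import os


def write_fact_file(facts, name, facts_dir="/etc/ansible/facts.d"):
    """Write `facts` as JSON to <facts_dir>/<name>.fact and return the path."""
    os.makedirs(facts_dir, exist_ok=True)
    path = os.path.join(facts_dir, name + ".fact")
    with open(path, "w") as handle:
        json.dump(facts, handle, indent=2)
    os.chmod(path, 0o600)  # the file holds secrets, so keep it owner-only
    return path
```

After `write_fact_file({"db_password": "example"}, "bootstrap")`, a template could refer to the value as `ansible_local.bootstrap.db_password`.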

Related

Best way to deploy long-running high-compute app to GCP

I have a python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete the VM needs to be shutdown so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read about so far it seems like containers might be a good way to go, but I'm inexperienced with docker.
Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container such as where to get the config file etc? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to get information regarding these basic questions, it seems like container tutorials are dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run and to trigger jobs using Cloud Pub/Sub. I see there's async capabilities too and this may be interesting (I've not explored).
Another runtime for you to consider is Kubernetes itself. While Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself) and some inertia in the capacity of the cluster's nodes vs. the needs of your jobs, as you scale the number of jobs, you will smooth these needs. A big advantage with Kubernetes is that it will scale (nodes|pods) as you need them. You tell Kubernetes to run X container jobs, it does it (and cleans-up) without much additional management on your part.
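To give an idea of what "run X container jobs" means in practice, here is a sketch that builds a Kubernetes `batch/v1` Job manifest as a Python dict. The image name, the container name `builder` and the `DATASET_CONFIG` variable are hypothetical; only the manifest fields themselves are standard Kubernetes.

```python
import json


def dataset_job_manifest(name, image, config_uri):
    """Build a batch/v1 Job that runs one container to completion."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed build automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "builder",
                        "image": image,
                        # Hypothetical variable the app would read at startup.
                        "env": [{"name": "DATASET_CONFIG", "value": config_uri}],
                    }],
                }
            },
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

One manifest per config file, each with its own `name`, would let several dataset builds run at the same time.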
I'm biased and approach the container vs image question mostly from a perspective of defaulting to container-first. In this case, you'd receive several benefits from containerizing your solution:
reproducible: the same image is more likely to produce the same results
deployability: container run vs. manage OS, app stack, test for consistency etc.
maintainable: smaller image representing your app, less work to maintain it
One (beneficial!?) workflow change if you choose to use containers is that you will need to build your images before using them. Something like Knative combines these steps but, I'd stick with doing-this-yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built but you may also run your machine-learning tasks this way too.
Your containers would contain only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt to pull the appropriate packages. Your data (models?) are better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
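On the application side, accepting configuration through a flag or an environment variable might look like the sketch below. The flag `--config` and the variable `DATASET_CONFIG` are assumptions picked for the example, not names from the question.

```python
import argparse
import os


def parse_settings(argv=None, environ=None):
    """Read the config location from --config, falling back to the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Dataset builder (sketch)")
    parser.add_argument(
        "--config",
        default=environ.get("DATASET_CONFIG"),
        help="location of the config file; falls back to DATASET_CONFIG",
    )
    args = parser.parse_args(argv)
    if not args.config:
        parser.error("pass --config or set DATASET_CONFIG")
    return args
```

With this in place, either `docker run -e DATASET_CONFIG=<location> <image>` or passing `--config <location>` as the container command would select the dataset to build.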
The use of a startup script seems to better fit the bill compared to containers. The instance always executes startup scripts as root, thus you can do anything you like, as the command will be executed as root.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. This would require you to make a small change in your Python app where the Operating system shuts off when the dataset is complete.
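The small change to the Python app could be sketched like this. `run_then_shutdown` and the injected `runner` are hypothetical names; `shutdown -h now` is the usual Linux command and needs root, which the startup script has.

```python
import subprocess


def run_then_shutdown(build, runner=subprocess.run):
    """Run `build()`, then power off the VM whether or not it succeeded."""
    try:
        build()
    finally:
        # Shutting down in `finally` avoids paying for a VM left running
        # after a crash; drop this if you want to inspect failures first.
        runner(["shutdown", "-h", "now"], check=False)
```

Whether to shut down after a failed build is a design choice: it caps cost, but it also means logs have to be shipped somewhere before the machine goes away.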
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into Docker for more information about containers.

How to implement the "One Binary" principle with Docker

The One Binary principle explained here:
http://programmer.97things.oreilly.com/wiki/index.php/One_Binary states that one should...
"Build a single binary that you can identify and promote through all the stages in the release pipeline. Hold environment-specific details in the environment. This could mean, for example, keeping them in the component container, in a known file, or in the path."
I see many dev-ops engineers arguably violate this principle by creating one docker image per environment (ie, my-app-qa, my-app-prod and so on). I know that Docker favours immutable infrastructure which implies not changing an image after deployment, therefore not uploading or downloading configuration post deployment. Is there a trade-off between immutable infrastructure and the one binary principle or can they complement each-other? When it comes to separating configuration from code what is the best practice in a Docker world??? Which one of the following approaches should one take...
1) Creating a base binary image and then having a configuration Dockerfile that augments this image by adding environment specific configuration. (i.e my-app -> my-app-prod)
2) Deploying a binary-only docker image to the container and passing in the configuration through environment variables and so on at deploy time.
3) Uploading the configuration after deploying the Docker file to a container
4) Downloading configuration from a configuration management server from the running docker image inside the container.
5) Keeping the configuration in the host environment and making it available to the running Docker instance through a bind mount.
Is there another better approach not mentioned above?
How can one enforce the one binary principle using immutable infrastructure? Can it be done or is there a trade-off? What is the best practice??
I've about 2 years of experience deploying Docker containers now, so I'm going to talk about what I've done and/or know to work.
So, let me first begin by saying that containers should definitely be immutable (I even mark mine as read-only).
Main approaches:
use configuration files by setting a static entrypoint and overriding the configuration file location by overriding the container startup command - that's less flexible, since one would have to commit the change and redeploy in order to enable it; not fit for passwords, secure tokens, etc
use configuration files by overriding their location with an environment variable - again, depends on having the configuration files prepped in advance; not fit for passwords, secure tokens, etc
use environment variables - that might need a change in the deployment code, thus lessening the time to get the config change live, since it doesn't need to go through the application build phase (in most cases), deploying such a change might be pretty easy. Here's an example - if deploying a containerised application to Marathon, changing an environment variable could potentially just start a new container from the last used container image (potentially on the same host even), which means that this could be done in mere seconds; not fit for passwords, secure tokens, etc, and especially so in Docker
store the configuration in a k/v store like Consul, make the application aware of that and let it be even dynamically reconfigurable. Great approach for launching features simultaneously - possibly even across multiple services; if implemented with a solution such as HashiCorp Vault, which provides secure storage for sensitive information, you could even have ephemeral secrets (an example would be the PostgreSQL secret backend for Vault - https://www.vaultproject.io/docs/secrets/postgresql/index.html)
have an application or script create the configuration files before starting the main application - store the configuration in a k/v store like Consul, use something like consul-template in order to populate the app config; a bit more secure - since you're not carrying everything over through the whole pipeline as code
have an application or script populate the environment variables before starting the main application - an example for that would be envconsul; not fit for sensitive information - someone with access to the Docker API (either through the TCP or UNIX socket) would be able to read those
I've even had a situation in which we were populating variables into AWS' instance user_data and injecting them into container on startup (with a script that modifies containers' json config on startup)
The main things that I'd take into consideration:
what are the variables that I'm exposing and when and where am I getting their values from (could be the CD software, or something else) - for example you could publish the AWS RDS endpoint and credentials to instance's user_data, potentially even EC2 tags with some IAM instance profile magic
how many variables do we have to manage and how often do we change some of them - if we have a handful, we could probably just go with environment variables, or use environment variables for the most commonly changed ones and variables stored in a file for those that we change less often
and how fast do we want to see them changed - if it's a file, it typically takes more time to deploy it to production; if we're using environment variables, we can usually deploy those changes much faster
how do we protect some of them - where do we inject them and how - for example Ansible Vault, HashiCorp Vault, keeping them in a separate repo, etc
how do we deploy - that could be a JSON config file sent to a deployment framework endpoint, Ansible, etc
what's the environment that we're having - is it realistic to have something like Consul as a config data store (Consul has 2 different kinds of agents - client and server)
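To make the "file for rarely changed values, environment variables for frequently changed ones" split concrete, here is a minimal sketch of layered loading. The `APP_` prefix and the `APP_CONFIG_FILE` variable are assumptions for the example, and the file is assumed to be JSON.

```python
import json
import os


def load_config(defaults, environ=None, path_var="APP_CONFIG_FILE", prefix="APP_"):
    """Merge defaults, then a JSON file, then APP_* environment variables."""
    environ = os.environ if environ is None else environ
    config = dict(defaults)
    path = environ.get(path_var)
    if path:
        with open(path) as handle:
            config.update(json.load(handle))  # rarely changed values
    for key, value in environ.items():
        if key.startswith(prefix) and key != path_var:
            # Frequently changed values; note they always arrive as strings.
            config[key[len(prefix):].lower()] = value
    return config
```

The same image can then move through every environment unchanged, which is what the One Binary principle asks for; only the file location and the variables differ per environment.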
I tend to prefer the most complex case of having them stored in a central place (k/v store, database) and have them changed dynamically, because I've encountered the following cases:
slow deployment pipelines - which makes it really slow to change a config file and have it deployed
having too many environment variables - this could really grow out of hand
having to turn on a feature flag across the whole fleet (consisting of tens of services) at once
an environment in which there is a real drive to increase security by better handling sensitive config data
I've probably missed something, but I guess that should be enough of a trigger to think about what would be best for your environment
How I've done it in the past is to incorporate tokenization into the packaging process after a build is executed. These tokens can be managed in an orchestration layer that sits on top to manage your platform tools. So for a given token, there is a matching regex or xpath expression. That token is linked to one or many config files, depending on the relationship that is chosen. Then, when this build is deployed to a container, a platform service (i.e. config mgmt) will poke these tokens with the correct value with respect to its environment. These poke values most likely would be pulled from a vault.
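A stripped-down version of the token "poke" step might look like the following. The `@@NAME@@` token syntax is an assumption chosen for the example, and the values would come from a vault rather than a literal dict.

```python
import re

# Hypothetical token syntax: @@NAME@@ with upper-case names.
TOKEN = re.compile(r"@@([A-Z0-9_]+)@@")


def poke_tokens(text, values):
    """Replace every @@NAME@@ token; fail loudly if a value is missing."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for token " + name)
        return values[name]
    return TOKEN.sub(replace, text)
```

Failing on an unknown token, rather than leaving it in place, keeps a half-configured file from reaching a running container.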

Best practice for getting RDS password to docker container on ECS

I am using Postgres Amazon RDS and Amazon ECS for running my docker containers.
The question is. What is the best practice for getting the username and password for the RDS database into the docker container running on ECS?
I see a few options:
Build the credentials into docker image. I don't like this since then everyone with access to the image can get the password.
Put the credentials in the userdata of the launch configuration used by the autoscaling group for ECS. With this approach all docker images running on my ECS cluster have access to the credentials. I don't really like that either. That way if a blackhat finds a security hole in any of my services (even services that do not use the database) he will be able to get the credentials for the database.
Put the credentials in an S3 bucket and limit access to that bucket with an IAM role that the ECS server has. Same drawbacks as putting them in the userdata.
Put the credentials in the Task Definition of ECS. I don't see any drawbacks here.
What are your thoughts on the best way to do this? Did I miss any options?
regards,
Tobias
Building it into the container is never recommended. It makes it hard to distribute and change.
Putting it into the ECS instances does not help your containers to use it. They are isolated and you'd end up with them on all instances instead of just where the containers are that need them.
Putting them into S3 means you'll have to write that functionality into your container. And it's another place to have configuration.
Putting them into your task definition is the recommended way. You can use the environment portion for this. It's flexible. It's also how PaaS offerings like Heroku and Elastic Beanstalk use DB connection strings for Ruby on Rails and other services. The last benefit is that it makes it easy to use your containers against different databases (like dev, test, prod) without rebuilding containers or building weird functionality.
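For reference, the environment portion of a container definition is a list of name/value pairs. The sketch below builds that fragment as a Python dict; the variable names are hypothetical, and a complete task definition needs more fields than shown here.

```python
import json


def container_definition(name, image, env):
    """Build the part of an ECS container definition that carries env vars."""
    return {
        "name": name,
        "image": image,
        "environment": [
            {"name": key, "value": value} for key, value in sorted(env.items())
        ],
    }


def to_json(definition):
    """Serialize the fragment for pasting into a task definition file."""
    return json.dumps(definition, indent=2)
```

Swapping the `env` dict is all it takes to point the same image at a dev, test or prod database.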
The accepted answer recommends configuring environment variables in the task definition. This configuration is buried deep in the ECS web console. You have to:
Navigate to Task Definitions
Select the correct task and revision
Choose to create a new revision (not allowed to edit existing)
Scroll down to the container section and select the correct container
Scroll down to the Env Variables section
Add your configuration
Save the configuration and task revision
Choose to update your service with the new task revision
This tutorial has screenshots that illustrate where to go.
Full disclosure: This tutorial features containers from Bitnami and I work for Bitnami. However the thoughts expressed here are my own and not the opinion of Bitnami.
For what it's worth, while putting credentials into environment variables in your task definition is certainly convenient, it's generally regarded as not particularly secure -- other processes can access your environment variables.
I'm not saying you can't do it this way -- I'm sure there are lots of people doing exactly this, but I wouldn't call it "best practice" either. Using Amazon Secrets Manager or SSM Parameter Store is definitely more secure, although getting your credentials out of there for use has its own challenges and on some platforms those challenges may make configuring your database connection much harder.
Still -- it seems like a good idea that anyone running across this question be at least aware that using the task definition for secrets is ... shall we say ... frowned upon?
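As a sketch of the Secrets Manager route, the application could fetch its credentials at startup like this. The assumption here is that the secret is stored as a JSON string with `username` and `password` keys; the function name and the secret id are made up for the example.

```python
import json


def load_db_credentials(client, secret_id):
    """Fetch a JSON secret and return (username, password).

    `client` is expected to behave like boto3.client("secretsmanager").
    """
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

In a container this would be called as `load_db_credentials(boto3.client("secretsmanager"), "prod/db")`, with the task's IAM role granting read access to that one secret only.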

cloudformation best practices in AWS

We are at the early stages of running our services on AWS. We have our servers hosted in AWS, in a VPC with private and public subnets, and have multiple instances in the private and public subnets using ELB and an autoscaling setup (using AMIs) for the frontend web servers. The whole environment (VPC, security groups, EC2 instances, DB instances, S3 buckets, CloudFront) was initially set up manually using the AWS console.
The application servers host JBoss, and WAR files are deployed on the servers.
As per AWS best practices we want to create the whole infrastructure using CloudFormation and set up test/stage/prod environments.
-Would it be a good idea to have all the above components (VPC, security groups, EC2 instances, DB instances, S3 buckets, CloudFront etc.) in one CloudFormation stack/template? Or should we create two stacks: 1) one with the network-related components and 2) one with the EC2-related components?
-Once we have a prod environment running with a CloudFormation stack, and we want to roll out new AMIs to prod in the future, how can we update the live running EC2 instances using CloudFormation without interruptions?
-What are the best practices/multiple ways for code deployment to multiple EC2 nodes when a new release is done? We don't use continuous integration at the moment.
It's a very good idea to separate your setup into multiple stacks. One obvious reason is that stacks have certain limits that you may reach eventually. A more practical reason is that you don't really need to update, say, your VPC every time you just want to deploy a new version. The network architecture typically changes less frequently. Another reason to avoid having one huge template, or to make changes to an "important" template needlessly, is that you always run the risk of messing things up. If there's an error in your template and you remove an important resource by accident (e.g. commented out) you'll be very sorry. So separating stacks out of sheer caution is probably a good idea.
If you want to update your application you can simply update the template with the new AMIs and CFN will know what needs to be recreated or updated. You can read about rolling updates here. However, I'd recommend considering using something a bit more straightforward for deploying your actual code, like Ansible or Chef.
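For the rolling-update part, the relevant piece of the template is the `UpdatePolicy` attribute on the autoscaling group. The sketch below builds such a resource as a Python dict; the sizes, batch settings and logical names are placeholder assumptions, and the fragment is not a complete template.

```python
def rolling_update_group(launch_config_ref, subnets):
    """Autoscaling group resource that replaces instances one at a time."""
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "LaunchConfigurationName": {"Ref": launch_config_ref},
            "VPCZoneIdentifier": subnets,
            "MinSize": "2",
            "MaxSize": "4",
        },
        "UpdatePolicy": {
            "AutoScalingRollingUpdate": {
                "MinInstancesInService": 1,  # keep serving during the update
                "MaxBatchSize": 1,           # replace one instance per batch
                "PauseTime": "PT5M",         # wait five minutes between batches
            }
        },
    }
```

Changing the AMI in the referenced launch configuration and updating the stack would then cycle the instances batch by batch rather than all at once.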
I'd also recommend you look into Docker for packaging and deploying your application's nodes. Very handy.

When is cloud-init run and how does it find its data?

I'm currently dealing with CoreOS, and so far I think I got the overall idea and concept. One thing that I did not yet get is execution of cloud-init.
I understand that cloud-init is a process that does some configuration for CoreOS. What I do not yet understand is…
When does CoreOS run cloud-init? On first boot? On each boot? …?
How does cloud-init know where to find its configuration data? I've seen that there is config-drive and that totally makes sense, but is this the only way? What exactly is the role of the user-data file? …?
CoreOS runs cloud-init a few times during the boot process. Right now this happens at each boot, but that functionality may change in the future.
The first pass is the OEM cloud-init, which is baked into the image to set up networking and other features required for that provider. This is done for EC2, Rackspace, Google Compute Engine, etc since they all have different requirements. You can see these files on Github.
The second pass is the user-data pass, which is handled differently per provider. For example, EC2 allows the user to input free-form text in their UI, which is stored in their metadata service. The EC2 OEM has a unit that reads this metadata and passes it to the second cloud-init run. On Rackspace/Openstack, config-drive is used to mount a read-only filesystem that contains the user-data. The Rackspace and Openstack OEMs know to mount and look for the user-data file at that location.
The latest version of CoreOS also has a flag to fetch a remote file to be evaluated for use with PXE booting.
The CoreOS distribution docs have a few more details as well.
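To illustrate the EC2 case described above, this is roughly how user-data can be read from the metadata service from inside an instance. It is a sketch of the idea only, not what the CoreOS OEM unit actually runs, and it assumes the instance allows plain (token-less) metadata requests.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(opener=urllib.request.urlopen, timeout=2):
    """Return the instance's user-data as text."""
    with opener(USER_DATA_URL, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The returned text is whatever was entered as user-data at launch, for example a document starting with `#cloud-config`.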