GCloud authentication race conditions - gcloud

I'm trying to avoid race conditions with gcloud / gsutil authentication on the same system but different CI/CD jobs on my Gitlab-Runner on a Mac Mini.
I have tried setting the auth manually with
RUN gcloud auth activate-service-account --key-file="gitlab-runner.json"
RUN gcloud config set project $GCP_PROJECT_ID
for the Dockerfile (in which I'm performing a download operation from a Google Cloud Storage bucket).
I'm using a configuration in the bash script to run the docker command and in the same script for authenticating I'm using
gcloud config configurations activate $TARGET
Where I've previously done the above two commands to save them to the configuration.
The configurations are working fine if I start the CI/CD jobs one after the other has finished. But I want to trigger them for all clients at the same time, which causes race conditions with gcloud authentication and one of the jobs trying to download from the wrong project bucket.
How to avoid a race condition? I'm already authenticating before each gsutil command but still its causing the race condition. Do I need something like CloudBuild to separate the runtime environments?

You can use Cloud Build to get separate execution environments but this might be an overkill for your use case, as a Cloud Build worker uses an entire VM which might be just too heavy, linux containers / Docker can provide necessary isolation as well.
You should make sure that each container you run has a unique config file placed in the path expected by gcloud. The issue may come from improper volume mounting (all the containers share the same location from the host/OS), or maybe you should mount a directory containing their configuration file (unique for each bucket) on running an image, or perhaps you should run gcloud config configurations activate in a Dockerfile step (thus creating image variants for different buckets if it’s feasible).
Alternatively, and I think this solution might be easier, you can switch from Cloud SDK distribution to standalone gsutil distribution. That way you can provide a path to a boto configuration file through an environment variable.
Such variables can be specified on running a Docker image.

Related

Multiple active projects under single config? Or, multiple active configurations?

I have a set of clusters split between two projects, 1 and 2. Currently, need to use gcloud init to switch between the two projects. Is there any possibility of having both projects active under the single configuration? Or, is it possible to have two configurations simultaneously active? I would hate to have to use init every time to switch between the two. Thanks!
gcloud init should only be used to (re)initialize gcloud on a host. The only time I ever use it is when I install gcloud on a new machine.
gcloud uses a global config that can be manipulated with the gcloud config command. IMO (I've been using GCP for 9 years) the less you use gcloud config, the better for your experience.
I think you're much better placed specifying config explicitly with gcloud commands.
Every gcloud command can include e.g.:
--project=${PROJECT} to specify the project to use
--account=${ACCOUNT} to specify the gcloud auth'd account to use
--region=${REGION} or --zone=${ZONE} or --location=${LOCATION}
etc.
Using gcloud commands and explicitly setting flags to specific the project, account, location etc. makes it trivial to flip between these and often (though not always) in a more intentional way.

How to run docker-compose across different lifecycle environments

How to run docker-compose across different lifecycle environments (say dev, qa, staging, production).
Sometimes a larger VM is being shared by multiple developers, so would like to start the containers with appropriate developer specific suffixes (say dev1, dev2, dev3 ..). Should port customization be handled manually via the environment file (i.e. .env file)
This is an unusual use case for docker-compose, but I'll leave some tips anyway! :)
There's two different ways to name stuff you start with docker-compose. One is to name the service that you specify under the main services: key of your docker-compose.yml file. By default, individual running containers will be assigned names indicating what project they are from (by default, the name of the directory from which your docker-compose file is in), what service they run (this is what's specified under your services: key), and which instance of that service they are (this number changes if eg. you're using replicas). Eg. default container names for a service named myservice specified in a compose file ~/my_project/docker/docker-compose.yml will have a name like docker_myservice_1 (or _2, _3, etc if more than one container is supposed to run).
You can use environment variables to specify a lot of key-value pairs in docker-compose files, but you can't conditionally specify the service name - service keys are only allowed to have alphanumeric characters in them and compose files can't look like eg:
version: "3"
services:
${ENVVAR}:
image: ubuntu:20.04
However, you can override the container naming scheme by using the container_name field in your docker-compose file (documentation for usage here). Maybe a solution you could use looks like this:
version: "3"
services:
myservice:
image: ubuntu:20.04
container_name: ${DEVELOPER_ENVVAR?err}
this will require a developer to specify DEVELOPER_ENVVAR at runtime, either by exporting it in their shell or by running docker-compose like DEVELOPER_ENVVAR=myservice_dev1 docker-compose up. Note that using container_name is incompatible with using replicas to run multiple containers for the same service - the names have to be unique for those running containers, so you'll either have to define separate services for each name, or give up on using container_name.
However, you're in a pickle if you expect multiple developers to be able to run containers with different names using the same compose file in the same directory. That's because when starting a service, docker-compose has a Recreating step where, if there's already containers implementing that service running, they'll wait for that container to finish. Ultimately, I think this is for the best - if multiple developers were trying to run the exact same compose project at once, should a developer have control over other developers' running containers? Probably not, right?
If you want multiple developers to be able to run services at once in the same VM, I think you probably want to do two things:
first, (and you may well have already done this! but it's still a good reminder) make sure that this is a good idea. Are there going to be resource contention issues (eg. for port-forwarding) that make different running instances of your project conflict? For many Docker services, there are going to be, but there probably won't be for eg. images that are meant to be run in a swarm.
second, have different compose files checked out in different directories, so that there are separate compose projects for each developer. To use .env files one way one obvious option is to just maintain separate copies, one per developer directory. If, for your use case, it's unsatisfactory to maintain one copy of .env per developer this way, you could use symlinks named .env (or whatever your env file is named) to the same file somewhere else on the VM.
After you've done this, you'll be able to tell from the container names who is running what.
If none of these are satisfactory, you might want to consider, eg. using one VM per developer, or maybe even considering using a different container management system than docker-compose.
I have done very similar automation and I've used Ansible to create "docker compose" config on the fly.
So based on input-Environment , the ansible playbook will create the relevant docker-compose file. So basically I have a docker-compose template in my git repository with values that are dynamic and ansible playbook populates them etc.
and also you can use ansible to trigger such creation or automation one after another
A similar sample has been posted at ansible_docker_splunk repository.
Basically the whole project is to automate end-to-end docker cluster from CSV file

Best way to deploy long-running high-compute app to GCP

I have a python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete the VM needs to be shutdown so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read about so far it seems like containers might be a good way to go, but I'm inexperienced with docker.
Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container such as where to get the config file etc? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to get information regarding these basic questions, it seems like container tutorials are dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run and to trigger jobs using Cloud Pub/Sub. I see there's async capabilities too and this may be interesting (I've not explored).
Another runtime for you to consider is Kubernetes itself. While Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself) and some inertia in the capacity of the cluster's nodes vs. the needs of your jobs, as you scale the number of jobs, you will smooth these needs. A big advantage with Kubernetes is that it will scale (nodes|pods) as you need them. You tell Kubernetes to run X container jobs, it does it (and cleans-up) without much additional management on your part.
I'm biased and approach the container vs image question mostly from a perspective of defaulting to container-first. In this case, you'd receive several benefits from containerizing your solution:
reproducible: the same image is more probable to produce the same results
deployability: container run vs. manage OS, app stack, test for consistency etc.
maintainable: smaller image representing your app, less work to maintain it
One (beneficial!?) workflow change if you choose to use containers is that you will need to build your images before using them. Something like Knative combines these steps but, I'd stick with doing-this-yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built but you may also run your machine-learning tasks this way too.
Your containers would container only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt to pull the appropriate packages. Your data (models?) are better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
The use of a startup script seems to better fit the bill compared to containers. The instance always executes startup scripts as root, thus you can do anything you like, as the command will be executed as root.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. This would require you to make a small change in your Python app where the Operating system shuts off when the dataset is complete.
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into Docker for more information about container.

How to use Docker in the development/deployment workflow?

I'm not sure I completely understand the role of Docker in the process of development and deployment.
Say, I create a Dockerfile with nginx, some database and something else which creates a container and runs fine.
I drop it somewhere in the cloud and execute it to install and configure all the dependencies and environment settings.
Next, I have a repository with a web application which I want to run inside the container I created and deployed in the first 2 steps. I regularly work on it and push the changes.
Now, how do I integrate the web application into the container?
Do I put it as a dependency inside the Dockerfile I create in the 1st step and recreate the container each time from scratch?
Or, do I deploy the container once but have procedures inside Dockerfile that install utils that pull the code from repo by command or via hooks?
What if a container is running but I want to change some settings of, say, nginx? Do I add these changes into Dockerfile and recreate the image?
In general, what's the role of Docker in the daily app development routine? Is it used often if the infrastructure is running fine and only code is changing?
I think there is no singl "use only this" answer - as you already outlined, there are different viable concepts available.
Deployment to staging/production/pre-production
a)
Do I put it as a dependency inside the Dockerfile I create in the 1st step and recreate the container each time from scratch?
This is for sure the most docker`ish way and aligns fully with he docker-philosophy. It is highly portable, reproducible and suites anything, from one container to "swarm" thousands of. E.g. this concept has no issue suddenly scaling horizontally when you need more containers, lets say due to heavy traffic / load.
It also aligns with the idea that only the configuration/data should be dynamic in a docker container, not code / binaries /artifacts
This strategy should be chosen for production use, so when not as frequent deployments happen. If you care about downtimes during container-rebuilds (on upgrade), there are good concepts to deal with that too.
We use this for production and pre-production intances.
b)
Or, do I deploy the container once but have procedures inside
Dockerfile that install utils that pull the code from repo by command
or via hooks?
This is a more common practice for very frequent deployment. You can go the pull ( what you said ) or the push (docker cp / ssh scp) concept, while i guess the latter is preferred in this kind of environment.
We use this for any kind strategy for staging instances, which basically should reflect the current "codebase" and its status. We also use this for smoke-tests and CI, but depending on the application. If the app actually changes its dependencies a lot and a clean build requires a rebuild with those to really ensure stuff is tested as it is supposed to, we actually rebuild the image during CI.
Configuration management
1.
What if a container is running but I want to change some settings of,
say, nginx? Do I add these changes into Dockerfile and recreate the
image?
I am not using this as c) since this is configuration management, not applications deployment and the answer to this can be very complicated, depending on your case. In general, if redeployment needs configuration changes, it depends on your configuration management, if you can go with b) or always have to go a).
E.g. if you use https://github.com/markround/tiller with consul as the backend, you can push the configuration changes into consul, regenerating the configuration with tiller, while using consul watch -prefix /configuration tiller as a watch-task to react on those value changes.
This enables you to go b) and fix the configuration
You can also use https://github.com/markround/tiller and on deployment, e.g. change ENV vars or some kind of yml file ( tiller supports different backends ), and call tiller during deployment yourself. This most probably needs you to have ssh or you ssh on the host and use docker cp and docker exec
Development
In development, you generally reuse your docker-compose.yml file you use for production, but overload it with docker-compose-dev.yml to e.g. mount your code-folder, set RAILS_ENV=development, reconfigurat / mount some other configurations like xdebug or more verbose nginx loggin, whatever you need. You can also add some fake MTA-services like fermata and so on
docker-compose -f docker-compose.yml -f docker-compose-dev.yml up
docker-compose-dev.yml only overloads some values, it does not redefine it or duplicate it.
Depending on how powerful your configuration management is, you can also do a pre-installation during development stack up.
We actually use scaffolding for that, we use https://github.com/xeger/docker-compose and after running it, we use docker exec and docker cp to preinstall a instance or stage something. Some examples are here https://github.com/EugenMayer/docker-sync/wiki/7.-Scripting-with-docker-sync
If you are developing under OSX and you face performance issues due to OSXFS / code shares, you probably want to have a look at http://docker-sync.io ( i am biased though )

Using Ansible to automatically configure AWS autoscaling group instances

I'm using Amazon Web Services to create an autoscaling group of application instances behind an Elastic Load Balancer. I'm using a CloudFormation template to create the autoscaling group + load balancer and have been using Ansible to configure other instances.
I'm having trouble wrapping my head around how to design things such that when new autoscaling instances come up, they can automatically be provisioned by Ansible (that is, without me needing to find out the new instance's hostname and run Ansible for it). I've looked into Ansible's ansible-pull feature but I'm not quite sure I understand how to use it. It requires a central git repository which it pulls from, but how do you deal with sensitive information which you wouldn't want to commit?
Also, the current way I'm using Ansible with AWS is to create the stack using a CloudFormation template, then I get the hostnames as output from the stack, and then generate a hosts file for Ansible to use. This doesn't feel quite right – is there "best practice" for this?
Yes, another way is just to simply run your playbooks locally once the instance starts. For example you can create an EC2 AMI for your deployment that in the rc.local file (Linux) calls ansible-playbook -i <inventory-only-with-localhost-file> <your-playbook>.yml. rc.local is almost the last script run at startup.
You could just store that sensitive information in your EC2 AMI, but this is a very wide topic and really depends on what kind of sensitive information it is. (You can also use private git repositories to store sensitive data).
If for example your playbooks get updated regularly you can create a cron entry in your AMI that runs every so often and that actually runs your playbook to make sure your instance configuration is always up to date. Thus avoiding having "push" from a remote workstation.
This is just one approach there could be many others and it depends on what kind of service you are running, what kind data you are using, etc.
I don't think you should use Ansible to configure new auto-scaled instances. Instead use Ansible to configure a new image, of which you will create an AMI (Amazon Machine Image), and order AWS autoscaling to launch from that instead.
On top of this, you should also use Ansible to easily update your existing running instances whenever you change your playbook.
Alternatives
There are a few ways to do this. First, I wanted to cover some alternative ways.
One option is to use Ansible Tower. This creates a dependency though: your Ansible Tower server needs to be up and running at the time autoscaling or similar happens.
The other option is to use something like packer.io and build fully-functioning server AMIs. You can install all your code into these using Ansible. This doesn't have any non-AWS dependencies, and has the advantage that it means servers start up fast. Generally speaking building AMIs is the recommended approach for autoscaling.
Ansible Config in S3 Buckets
The alternative route is a bit more complex, but has worked well for us when running a large site (millions of users). It's "serverless" and only depends on AWS services. It also supports multiple Availability Zones well, and doesn't depend on running any central server.
I've put together a GitHub repo that contains a fully-working example with Cloudformation. I also put together a presentation for the London Ansible meetup.
Overall, it works as follows:
Create S3 buckets for storing the pieces that you're going to need to bootstrap your servers.
Save your Ansible playbook and roles etc in one of those S3 buckets.
Have your Autoscaling process run a small shell script. This script fetches things from your S3 buckets and uses it to "bootstrap" Ansible.
Ansible then does everything else.
All secret values such as Database passwords are stored in CloudFormation Parameter values. The 'bootstrap' shell script copies these into an Ansible fact file.
So that you're not dependent on external services being up you also need to save any build dependencies (eg: any .deb files, package install files or similar) in an S3 bucket. You want this because you don't want to require ansible.com or similar to be up and running for your Autoscale bootstrap script to be able to run. Generally speaking I've tried to only depend on Amazon services like S3.
In our case, we then also use AWS CodeDeploy to actually install the Rails application itself.
The key bits of the config relating to the above are:
S3 Bucket Creation
Script that copies things to S3
Script to copy Bootstrap Ansible. This is the core of the process. This also writes the Ansible fact files based on the CloudFormation parameters.
Use the Facts in the template.