Managing resources (database, elasticsearch, redis, etc) for tests using Docker and Jenkins - postgresql

We need to use Jenkins to test some web apps that each need:
a database (postgres in our case)
a search service (ElasticSearch in our case, but only sometimes)
a cache server, such as redis
So far, we've just had these services running on the Jenkins master, but this causes problems when we want to upgrade Postgres, ES or Redis versions. Not all apps can move in lock step, and we want to run the tests on new versions before committing to move an app in production.
What we'd like to do is have these services provided on a per-job-run basis, each one running in its own container.
What's the best way to orchestrate these containers?
How do you start up these ancillary containers and tear them down, regardless of whether to job succeeds or not?
how do you prevent port collisions between, say, the database in a run of a job for one web app and the database in the job for another web app?

Check docker-compose and write a docker-compose file for your tests.
The latest network features of Docker (private network) will help you to isolate builds running in parallel.
However, start learning docker-compose as if you only had one build at the same time. When confident with this, look further for advanced docker documentation around networking.

Related

Possible to deploy or use several containers as one service in Google Cloud Run?

I am testing Google Cloud Run by following the official instruction:
https://cloud.google.com/run/docs/quickstarts/build-and-deploy
Is it possible to deploy or use several containers as one service in Google Cloud Run? For example: DB server container, Web server container, etc.
Short Answer NO. You can't deploy several container on the same service (as you could do with a Pod on K8S).
However, you can run several binaries in parallel on the same container -> This article has been written by a Googler that work on Cloud Run.
In addition, keep in mind
Cloud Run is a serverless product. It scales up and down (to 0) as it wants (but especially according with the traffic). If the startup duration is long and a new instance of your service is created, the query will take time to be served (and your use will wait)
You pay as you use, I means, you are billed only when HTTP requests are processed. Out of processing period, the CPU allocated to the instance is close to 0.
That implies that Cloud Run serves container that handle HTTP requests. You can't run a batch processing out of any HTTP request, in background.
Cloud Run is stateless. You have an ephemeral and in memory writable directory (/tmp) but when the instance goes down, all the data goes down. You can't run a DB server container that store data. You can interact with external services (Cloud SQL, Cloud Storage,...) but store only transient file locally
To answer your question directly, I do not think it is possible to deploy a service that has two different containers: DB server container, and Web server container. This does not include scaling (service is automatically scaled to a certain number of container instances).
However, you can deploy a container (a service) that contains multiple processes, although it might not be considered as best practices, as mentioned in this article.
Cloud Run takes a user's container and executes it on Google infrastructure, and handles the instantiation of instances (scaling) of that container, seamlessly based on parameters specified by the user.
To deploy to Cloud Run, you need to provide a container image. As the documentation points out:
A container image is a packaging format that includes your code, its packages, any needed binary dependencies, the operating system to use, and anything else needed to run your service.
In response to incoming requests, a service is automatically scaled to a certain number of container instances, each of which runs the deployed container image. Services are the main resources of Cloud Run.
Each service has a unique and permanent URL that will not change over time as you deploy new revisions to it. You can refer to the documentation for more details about the container runtime contract.
As a result of the above, Cloud Run is primarily designed to run web applications. If you are after a microservice architecture, which consists of different servers running each in unique containers, you will need to deploy multiple services. I understand that you want to use Cloud Run as database server, but perhaps you may be interested in Google's database solutions, like Cloud SQL, Datastore, BigTable or Spanner.

Best way to deploy long-running high-compute app to GCP

I have a python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete the VM needs to be shutdown so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read about so far it seems like containers might be a good way to go, but I'm inexperienced with docker.
Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container such as where to get the config file etc? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to get information regarding these basic questions, it seems like container tutorials are dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run and to trigger jobs using Cloud Pub/Sub. I see there's async capabilities too and this may be interesting (I've not explored).
Another runtime for you to consider is Kubernetes itself. While Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself) and some inertia in the capacity of the cluster's nodes vs. the needs of your jobs, as you scale the number of jobs, you will smooth these needs. A big advantage with Kubernetes is that it will scale (nodes|pods) as you need them. You tell Kubernetes to run X container jobs, it does it (and cleans-up) without much additional management on your part.
I'm biased and approach the container vs image question mostly from a perspective of defaulting to container-first. In this case, you'd receive several benefits from containerizing your solution:
reproducible: the same image is more probable to produce the same results
deployability: container run vs. manage OS, app stack, test for consistency etc.
maintainable: smaller image representing your app, less work to maintain it
One (beneficial!?) workflow change if you choose to use containers is that you will need to build your images before using them. Something like Knative combines these steps but, I'd stick with doing-this-yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built but you may also run your machine-learning tasks this way too.
Your containers would container only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt to pull the appropriate packages. Your data (models?) are better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
The use of a startup script seems to better fit the bill compared to containers. The instance always executes startup scripts as root, thus you can do anything you like, as the command will be executed as root.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. This would require you to make a small change in your Python app where the Operating system shuts off when the dataset is complete.
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into Docker for more information about container.

Can we consider using containers (and kubernetes) for monolith, stateful web applications?

I'm learning about Containers and Kubernetes and was evaluating if we can move our monolith, stateful appplication to kubernetes?
I was also looking at https://kubernetes.io/blog/2018/03/principles-of-container-app-design/ and "Self-Containment" looks close. We can consider using "storage".
Properties of my application:
1. Runs on a JVM
2. Does not have a database. Saves all its data/content to TAR files on the file-system
3. Should be able to backup and retain state if the container goes down.
In our current scenarios, we deploy the app to a VM and our IT teams generally take snapshots of these VM's as backups and restore them if the app fails or they have to restore to a point where the app was working good. I wanted to avoid this.
Please advice.
You call it as web application, but based on what it does it just a process which writes to file system.
If you move to k8s, write to NFS or persistent storage from pod. If you can only run one instance, then you can't use k8s horizontal scaling.

How to test autodiscovery with just one computer?

I'm trying to write a Kotlin server with auto discovery, however I only have one computer to develop. My server uses a port and I just can't figure it out myself how to successfully test my application. Thanks for your help!
JVMs
You may be able to run copies of the app in different JVMs, but would have to run them on different ports.
VMs
This may be slow, but may be an option
Docker
Using Docker (and optionally compose) you been run multiple copies of the app on the same port, with less overhead than using full VMs.

How to use Docker in the development/deployment workflow?

I'm not sure I completely understand the role of Docker in the process of development and deployment.
Say, I create a Dockerfile with nginx, some database and something else which creates a container and runs fine.
I drop it somewhere in the cloud and execute it to install and configure all the dependencies and environment settings.
Next, I have a repository with a web application which I want to run inside the container I created and deployed in the first 2 steps. I regularly work on it and push the changes.
Now, how do I integrate the web application into the container?
Do I put it as a dependency inside the Dockerfile I create in the 1st step and recreate the container each time from scratch?
Or, do I deploy the container once but have procedures inside Dockerfile that install utils that pull the code from repo by command or via hooks?
What if a container is running but I want to change some settings of, say, nginx? Do I add these changes into Dockerfile and recreate the image?
In general, what's the role of Docker in the daily app development routine? Is it used often if the infrastructure is running fine and only code is changing?
I think there is no singl "use only this" answer - as you already outlined, there are different viable concepts available.
Deployment to staging/production/pre-production
a)
Do I put it as a dependency inside the Dockerfile I create in the 1st step and recreate the container each time from scratch?
This is for sure the most docker`ish way and aligns fully with he docker-philosophy. It is highly portable, reproducible and suites anything, from one container to "swarm" thousands of. E.g. this concept has no issue suddenly scaling horizontally when you need more containers, lets say due to heavy traffic / load.
It also aligns with the idea that only the configuration/data should be dynamic in a docker container, not code / binaries /artifacts
This strategy should be chosen for production use, so when not as frequent deployments happen. If you care about downtimes during container-rebuilds (on upgrade), there are good concepts to deal with that too.
We use this for production and pre-production intances.
b)
Or, do I deploy the container once but have procedures inside
Dockerfile that install utils that pull the code from repo by command
or via hooks?
This is a more common practice for very frequent deployment. You can go the pull ( what you said ) or the push (docker cp / ssh scp) concept, while i guess the latter is preferred in this kind of environment.
We use this for any kind strategy for staging instances, which basically should reflect the current "codebase" and its status. We also use this for smoke-tests and CI, but depending on the application. If the app actually changes its dependencies a lot and a clean build requires a rebuild with those to really ensure stuff is tested as it is supposed to, we actually rebuild the image during CI.
Configuration management
1.
What if a container is running but I want to change some settings of,
say, nginx? Do I add these changes into Dockerfile and recreate the
image?
I am not using this as c) since this is configuration management, not applications deployment and the answer to this can be very complicated, depending on your case. In general, if redeployment needs configuration changes, it depends on your configuration management, if you can go with b) or always have to go a).
E.g. if you use https://github.com/markround/tiller with consul as the backend, you can push the configuration changes into consul, regenerating the configuration with tiller, while using consul watch -prefix /configuration tiller as a watch-task to react on those value changes.
This enables you to go b) and fix the configuration
You can also use https://github.com/markround/tiller and on deployment, e.g. change ENV vars or some kind of yml file ( tiller supports different backends ), and call tiller during deployment yourself. This most probably needs you to have ssh or you ssh on the host and use docker cp and docker exec
Development
In development, you generally reuse your docker-compose.yml file you use for production, but overload it with docker-compose-dev.yml to e.g. mount your code-folder, set RAILS_ENV=development, reconfigurat / mount some other configurations like xdebug or more verbose nginx loggin, whatever you need. You can also add some fake MTA-services like fermata and so on
docker-compose -f docker-compose.yml -f docker-compose-dev.yml up
docker-compose-dev.yml only overloads some values, it does not redefine it or duplicate it.
Depending on how powerful your configuration management is, you can also do a pre-installation during development stack up.
We actually use scaffolding for that, we use https://github.com/xeger/docker-compose and after running it, we use docker exec and docker cp to preinstall a instance or stage something. Some examples are here https://github.com/EugenMayer/docker-sync/wiki/7.-Scripting-with-docker-sync
If you are developing under OSX and you face performance issues due to OSXFS / code shares, you probably want to have a look at http://docker-sync.io ( i am biased though )