Restarting vs Building a Container on Docker - deployment

I have some difficulties understanding when to restart a container and when to build a new one.
Imagine I have a webapp whose data I hand to a container (via a symbolic link that points to the current deploy). Now I have two options when a new deploy comes in: (1) build a new container from an image, or (2) simply restart the running container.
I know that the decision depends on various things but my question is a more conceptual one:
What is generally a better practice or how is Docker meant to be used? Do you see any problems with one of these approaches?

I think that both options can have the same result in some cases.
Anyway, I think the correct approach is to prepare an image with all your prerequisites and dependencies. This is the initial state you want to use for your webapp. Then you can start your webapp in one or many containers based on this same image. Every instance gets its own unique container.
When a new deploy comes in, I think you should do a "fresh start": stop and discard the running container, and create a new container with your newly deployed app.
Sure, you can just restart the container, but every container has state, and in general you want to throw this state away and start again from your fresh initial image. Of course, if your app just prints "Hello World", no state is saved in the container, so in that case both options behave the same.
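As a minimal sketch of the fresh-start approach (the image and container names here are hypothetical):

```bash
# Bake the new deploy into a fresh image
docker build -t myapp:v2 .

# Discard the old container together with its state
docker stop myapp-live
docker rm myapp-live

# Start a clean container from the new image
docker run -d --name myapp-live -p 80:8080 myapp:v2
```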

Related

docker build builds nginx (ingress-nginx) images slowly, how to speed this up?

Sometimes we face nginx vulnerabilities, so we need to fix them inside ingress-nginx, but the docker build -t step is too slow. The reason is that the Dockerfile internally runs a make compile and make install process. What parameters can be added to make the docker build process faster? Although the docker build output suggests adding the -j parameter to make to increase the number of threads, there is no make-related parameter inside the Dockerfile, and it is not a good idea to modify the Dockerfile directly.
Source of the Dockerfile.
There is no single good solution for speeding up a Docker image build; it depends on a number of things. That is why I am posting this answer as a community wiki, to present as many proposed solutions as possible, with references to various tutorials.
There are a few tricks you can use to speed up building Docker images.
First, I will present a solution from Google Cloud:
The easiest way to increase the speed of your Docker image build is by specifying a cached image that can be used for subsequent builds. You can specify the cached image by adding the --cache-from argument in your build config file, which will instruct Docker to build using that image as a cache source.
You can read more here about Docker Layer Caching.
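As an illustrative sketch (the registry and image names are hypothetical), the plain Docker equivalent is to seed the cache from a previously pushed image:

```bash
# Pull the last published image so its layers are available locally as a cache source
docker pull registry.example.com/myapp:latest || true

# Reuse those cached layers for the new build
docker build --cache-from registry.example.com/myapp:latest \
  -t registry.example.com/myapp:latest .
```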
Another way is to structure your Dockerfile instructions like an inverted pyramid:
Each instruction in your Dockerfile results in an image layer being created. Docker uses layers to reuse work, and save bandwidth. Layers are cached and don’t need to be recomputed if:
- All previous layers are unchanged.
- In case of COPY instructions: the files/folders are unchanged.
- In case of all other instructions: the command text is unchanged.
To make good use of the Docker cache, it's a good idea to put the layers that involve lots of slow work but change infrequently early in your Dockerfile, and to put quickly-changing, fast layers last. The result is like an inverted pyramid.
You can also copy only the files which are needed for the next step, as the sketch below shows.
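A minimal sketch of this layering, assuming a Node.js app (any stack with a separate dependency manifest works the same way):

```dockerfile
FROM node:18

WORKDIR /app

# Slow, rarely-changing layer first: copy only the dependency manifest,
# so this layer stays cached while the source code changes
COPY package.json package-lock.json ./
RUN npm ci

# Fast, frequently-changing layer last: the application source
COPY . .

CMD ["node", "server.js"]
```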
Look at these great tutorials about speeding up your Docker image builds:
- 5 Tips to Speed up Your Docker Image Build
- Speed Up Your Development Flow With These Dockerfile Best Practices
- Six Ways to Build Docker Images Faster (Even in Seconds)
Finally, I will present one more method, described here: How to speed up your Docker image build? You can use the BuildKit tool.
With Docker 18.09, a new builder was released, called BuildKit. It is not used by default, so most of us are still using the old one. The thing is, BuildKit is much faster, even for such simple images!
The difference is about 18 seconds on an image that builds in about 70 seconds. That's a lot, almost 33%!
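A quick sketch of trying it out (on newer Docker versions BuildKit is already the default builder):

```bash
# Enable BuildKit for a single build
DOCKER_BUILDKIT=1 docker build -t myimage:latest .
```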
Hope it helps ;)
docker version: docker-ce 20.10
Actual testing of images built with BuildKit shows it may do little to improve efficiency when the build involves a lot of compilation internally. Basing the image on the official alpine base image and adding a -j nproc parameter in the Dockerfile to increase the number of make threads would be a good direction.
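As a hedged sketch of that direction (the package list and the elided source-fetch step are placeholders, not the real ingress-nginx Dockerfile):

```dockerfile
FROM alpine:3.18 AS build

# Toolchain plus typical nginx build dependencies (illustrative list)
RUN apk add --no-cache build-base pcre-dev zlib-dev openssl-dev

WORKDIR /src/nginx
# ... fetch and unpack the patched nginx sources here ...

# Use all available cores for the compile step
RUN ./configure && make -j"$(nproc)" && make install
```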

What's the benefit of not using ImagePullPolicy set to Always?

In the Kubernetes documentation it mentions that the caching semantic of using ImagePullPolicy: Always makes the ImagePullPolicy quite efficient. What reasons would I want to choose a different ImagePullPolicy?
It heavily depends on your versioning/tagging strategy.
When new replicas of your application are created (because your app has scaled up, or a pod has died and been replaced by a new one), if you use ImagePullPolicy: Always and you have pushed a different version of your application under the same tag (like people do when using latest), the newly created replicas might run a totally different version of the application than the rest of the replicas, leading to inconsistent behavior.
You also may want to use a different value than Always when on a development environment like minikube.
There isn't much of a disadvantage to ImagePullPolicy: Always, but having the control means that:
- If you have an underlying image provider that doesn't provide caching (i.e. you're not using Docker), you have the control to make sure it's not called every time the kubelet wants the image.
- Even with Docker caching, it's still going to be slightly faster not to attempt to pull the image every time. If you know that you never re-use a tag (which is recommended), or you're explicitly specifying the image digest, then there's no benefit to ImagePullPolicy: Always.
- If you have an installation where you pull the image onto the node using a separate mechanism, you may want it not to attempt an automatic fetch if something goes wrong and the image doesn't exist.
Note that in fiunchinho's answer it mentions that you can use it to keep various replicas of an application in sync. This would be dangerous as images are pulled per node, so you could end up with different versions of the application running on different nodes.
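A minimal sketch of the pinned-tag approach (the registry, image name, and tag are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    # Immutable tag that is never reused, so re-pulling on every start adds nothing
    image: registry.example.com/myapp:1.4.2
    imagePullPolicy: IfNotPresent
```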

if an application needs several containers running on the same host, why not just make a single container with everything you need?

I started working with Kubernetes. I have already worked with single-container pods; now I want to work with multi-container pods. I read a statement like:
if an application needs several containers running on the same host, why not just make a single container with everything you need?
meaning two containers with a single IP address. My doubt is: in which cases do two or more containers share the same host?
Could anybody please explain the above scenario with an example?
This is called "multiple processes per container".
https://docs.docker.com/config/containers/multi-service_container/
It's discussed on the internet many times, and it has many gotchas. Basically, there's not a lot of benefit to doing it.
Ideally you want a container to host one process and its threads/subprocesses.
So if your database process is in a crash loop, let it crash and let docker restart it. This should not impact your web container.
Also putting processes in separate containers lets you set separate memory/CPU limits so that you can set different limits for your web container and database container.
That's why Kubernetes exposes the Pod concept, which lets you run multiple containers in the same namespace. Read this page fully: https://kubernetes.io/docs/concepts/workloads/pods/pod/
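As an illustrative sketch (the container names and images are just examples), here is a pod with two containers that share one IP address and a volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper
    image: busybox:1.36
    # Reads the same files nginx writes, via the shared volume
    command: ["sh", "-c", "tail -F /var/log/nginx/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  volumes:
  - name: logs
    emptyDir: {}
```

If the nginx container crash-loops, the kubelet restarts just that container; the sidecar is unaffected.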
Typically it is recommended to run one process in a Docker container. Although it is very much possible to run multiple processes in a container, this is discouraged more for reasons of application architecture and DevOps practice than for technical reasons.
Following discussion gives a better insight into this debate: https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container
When you run multiple processes from different packages/tools in the same container, you can run into the dependency and upgradability issues that Docker was meant to solve in the first place. Nevertheless, for many applications it makes sense to scale and manage the application in blocks rather than as individual components; sometimes that makes life a bit easier. Pods are basically a balance between the two: they give each process its own isolation for better service manageability, yet group the processes together so that the application as a whole can be managed well.
I would say segregating responsibilities can be a boon; maintenance is a whole lot easier that way. A container can have a single entrypoint, and a pod's health is checked via that main process in the entrypoint. If your front end (say, Java on Tomcat) is working fine but the database is not (although one should never run a production database in a container), you will not get the feedback that you would get if they were different containers.
Also, one of the best parts of Docker images is that they are available as separate modules, and maintenance is super easy that way.
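A small sketch of that health-check point (the image and probe endpoint are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: tomcat
    image: tomcat:9
    ports:
    - containerPort: 8080
    # The kubelet restarts this container when the probe fails, which
    # only gives a clear signal when one process owns the container
    livenessProbe:
      httpGet:
        path: /
        port: 8080
      initialDelaySeconds: 15
```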

How to organize pods with non-replicatable containers in kubernetes?

I'm trying to get my head around kubernetes.
I understand that pods are a great way to organize containers that are related. I understand that replication controllers are a great way to ensure they are up and running.
However, I do not get how to do it in real life.
Given a webapp with, say, a rails app on unicorn, behind nginx, with a postgres database.
The nginx and rails app can autoscale horizontally (if they are shared nothing), but postgres can't out of the box.
Does that mean I can't organize the postgres database within the same pod as nginx and rails when I want to have two servers behind a loadbalancer? Does postgres need its own replication controller, and is it simply a service within the cluster?
The general question about that is: In common webscenarios, what kind of containers are going into one pod? I know that this can't be answered generally, so the ideas behind it are interesting.
The answer to this really depends on how far down the rabbit hole you want to go. You are really describing three independent parts of your app: nginx, rails, postgres. Each of those parts has different needs when it comes to monitoring, scaling, and updating.
You CAN put all 3 into a single pod. That replicates the experience of a VM, but with some advantages for manageability, deployment etc. But you're already calling out one of the major disadvantages - you want to scale (for example) the rails app but not the postgres instance. It's time to decompose.
What if you instead made 2 pods - one for rails+nginx and one for postgres. Now you can scale your frontend without messing up your database deployment. You might go even further and split your rails app and nginx into distinct pods, if that makes sense. Or split your rails app into 5 smaller apps.
This is the whole idea behind microservices (which is a very hyped word, I know). Decompose the problem into smaller and more manageable chunks. This is WHY kubernetes exists - to help you manage the resulting ocean of microservices.
To answer your last question: there is no single recipe. It's all about how your application is structured and what makes sense FOR YOU. Maybe you want to decompose along team boundaries, or along departments in your company, or along admin roles. The question to ask yourself is: "if I update or scale this pod, is there any part of it that I don't want updated/scaled at the same time?"
In some sense it is a data normalization problem. A pod should be a model of one concept or owner or scope. I hope that helps a little.
You should put containers into the same pod when you want to deploy and update them at the same time or if they need to share local state (disk, network, etc). In some edge cases, you may also want to co-locate them for performance reasons.
In your scenario, since nginx and the rails app can scale horizontally, they should be in their own pods so that you can provision the right number of replicas for each tier of your application (unless you would always scale them at the same time). The postgres database would be in a separate pod, accessed via a service.
This allows you to update to a newer version of nginx without changing anything else about your service. Same for the rails app. And they could each scale independently.
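A minimal sketch of that split (all names are illustrative): postgres lives in its own pod, and the rails pods reach it through a Service by DNS name:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  # Routes traffic to whichever pod carries this label
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
```

The rails app then connects to postgres:5432, so the database pod can be replaced or rescheduled without reconfiguring the frontend.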

Apache Marathon app and container relation

I would like to understand the relation between a Marathon app and a container. Is it really the case that a Marathon app definition can contain only a single container definition (1:1)? As far as I understand the Marathon REST API, link attached, the answer is yes.
https://mesosphere.github.io/marathon/docs/rest-api.html#post-/v2/apps
But then are we supposed to use App Groups in order to define complex applications that are built from more than a single container? I have looked at Kubernetes, and its idea of a "pod" seems very convenient for building such applications: containers in the same pod share a single network stack, and application scaling happens at pod level.
Can we say, that Kubernetes pod corresponds to Marathon App Group? Or should I not try to find any similarities, but rather I should better understand Marathon philosophy?
Thank you!
Regards,
Laszlo
An app in Marathon specifies how to spawn tasks of that application. While you can specify how many tasks you want to spawn, every single one of these tasks corresponds to exactly one command or container.
To help you, I would need to understand more about your use case.
Groups can be used to organize related apps including dependencies. The tasks of the apps will not necessarily be co-located on the same host.
If you need co-location, you either need to create a container with multiple processes or use constraints to specify directly on which host you want the tasks to run, as the sketch below illustrates.
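As a hedged sketch of a Marathon app definition with a host constraint (the id, image, and hostname are placeholders; JSON allows no comments, so all assumptions are noted here):

```json
{
  "id": "/myapp",
  "instances": 2,
  "cpus": 0.5,
  "mem": 256,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "nginx:1.25",
      "network": "BRIDGE"
    }
  },
  "constraints": [["hostname", "CLUSTER", "node-1.example.com"]]
}
```

Each of the two instances is a task running exactly one container; co-locating different containers would mean defining separate apps pinned to the same host this way.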