docker build of nginx (ingress-nginx) images is slow, how to speed this up - kubernetes

Sometimes we face nginx vulnerabilities,
so we need to fix them inside ingress-nginx,
but docker build -t is too slow.
The reason is that the Dockerfile internally runs a make compile and make install step.
What parameters can we add to make the docker build process faster?
Although the build output suggests passing the -j parameter to make to increase the number of threads,
there is no make-related parameter inside the Dockerfile.
It is not a good idea to modify the Dockerfile directly.
Source of the Dockerfile.

There is no single best way to speed up building a Docker image; it depends on a number of things. That is why I am posting this answer as a community wiki, to collect as many proposed solutions as possible, with references to various tutorials.
There are a few tricks you can use to speed up building Docker images.
First, here is a solution from Google Cloud:
The easiest way to increase the speed of your Docker image build is by specifying a cached image that can be used for subsequent builds. You can specify the cached image by adding the --cache-from argument in your build config file, which will instruct Docker to build using that image as a cache source.
You can read more here about Docker Layer Caching.
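For example, a minimal sketch of using --cache-from (the registry and image names below are placeholders, not the real ingress-nginx build setup):

```bash
# Pull the last published image so its layers can seed the cache,
# then build with --cache-from pointing at it.
docker pull registry.example.com/ingress-nginx/controller:latest || true
docker build \
  --cache-from registry.example.com/ingress-nginx/controller:latest \
  -t registry.example.com/ingress-nginx/controller:patched \
  .
```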
Another way is to structure your Dockerfile instructions like an inverted pyramid:
Each instruction in your Dockerfile results in an image layer being created. Docker uses layers to reuse work, and save bandwidth. Layers are cached and don’t need to be recomputed if:
All previous layers are unchanged.
In case of COPY instructions: the files/folders are unchanged.
In case of all other instructions: the command text is unchanged.
To make good use of the Docker cache, it's a good idea to put the layers that involve lots of slow work but change infrequently early in your Dockerfile, and to put quickly-changing, fast layers last. The result is like an inverted pyramid.
You can also only copy the files which are needed for the next step.
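A sketch of what that ordering can look like (package names, paths and the build commands are placeholders, not the real ingress-nginx Dockerfile):

```dockerfile
FROM alpine:3.18

# Slow, rarely-changing work first so its layer stays cached.
RUN apk add --no-cache build-base pcre-dev zlib-dev openssl-dev

# Copy only what the next step needs, not the whole build context.
COPY nginx-src/ /build/nginx-src/
RUN cd /build/nginx-src && ./configure && make && make install

# Fast, frequently-changing files last.
COPY conf/nginx.conf /etc/nginx/nginx.conf
```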
Look at these great tutorials about speeding building your Docker images:
- 5 Tips to Speed up Your Docker Image Build
- Speed Up Your Development Flow With These Dockerfile Best Practices
- Six Ways to Build Docker Images Faster (Even in Seconds)
At the end I will present one more method, described here - How to speed up your Docker image build? You can use the tool BuildKit.
With Docker 18.09, a new builder was released. It's called BuildKit. It is not used by default, so most of us are still using the old one. The thing is, BuildKit is much faster, even for simple images!
The difference is about 18 seconds on an image that builds in roughly 70 seconds. That's a lot, roughly a quarter of the build time!
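For example, on Docker 18.09 or later you can enable BuildKit per build or for the whole daemon:

```bash
# Enable BuildKit for a single build:
DOCKER_BUILDKIT=1 docker build -t my-image:test .

# Or enable it for the daemon in /etc/docker/daemon.json:
# { "features": { "buildkit": true } }
```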
Hope it helps ;)

Docker version: docker-ce 20.10
In actual testing, images built with BuildKit
may show little improvement if the build involves a lot of compilation internally.
Starting from the official Alpine base image
and adding the -j nproc parameter to the make invocation in the Dockerfile to increase the number of build threads
looks like a better direction.
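A hypothetical sketch of that direction (the real ingress-nginx Dockerfile and build scripts differ; the paths and build steps here are placeholders):

```dockerfile
# Pass -j to make so compilation uses all available cores.
RUN cd /build/nginx-src \
    && ./configure \
    && make -j"$(nproc)" \
    && make install
```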

Related

Kubernetes API custom image metadata

I am trying to use the Kubernetes API to read metadata via annotations from container images. The metadata applies to every instance of the respective image and is needed in order to run any resulting container properly. According to this SO question, it is not possible to read Docker image labels from the Kubernetes API directly.
My next thought was to use custom annotations added to the image manifest, although this seems to be a pretty hacky solution for such a "simple" task. Anyway, if I add the annotations to the manifest using Docker, I see no way to read them from the Kubernetes API.
I think I am on completely the wrong track here. This seems to be a rather simple task which other people have likely implemented already... anyway, I cannot find any further information about it. Is it really that hard to read image metadata via Kubernetes before deploying a container of that image?
Thanks in advance for any help!
Edit:
The reason I am asking is that I want to grant the containers of specific images access to specific serial USB devices (e.g. FTDI232) on diverse host systems. Since I have no idea which path (e.g. /dev/ttyUSB0) will be assigned to the USB devices, I wrote a program that monitors USB devices and, when an appropriate device is or gets plugged in, creates the container and passes it the corresponding path. From inside the container I want to access the serial device via a static, non-changing path (e.g. /dev/FTDI232).
Yes. The K8s API is limited when it comes to this; I believe the abstractions for container image metadata are at a lower level and were probably left out for a reason. You can always look at the CRI spec to see what's supported (note that the doc is out of date, so you might have to look at the code).
If the end goal is to use Kubernetes to run your workloads, it sounds like the more feasible route here is just to write a script that reads the image manifest outside Kubernetes, creates the manifest files that you need to run your workloads (based on that metadata), and then applies them to your cluster.
If you are using a common container image registry, you could also write something that pulls the images from that registry just to pick up metadata and metadata changes.
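A rough sketch of that approach, assuming a hypothetical image label such as serial.device.path and omitting the securityContext a container usually needs to actually access a character device:

```bash
IMAGE="registry.example.com/ftdi-app:1.0"   # placeholder image name
docker pull "$IMAGE"

# Read the label from the image config, outside Kubernetes.
DEVICE_PATH=$(docker inspect --format '{{ index .Config.Labels "serial.device.path" }}' "$IMAGE")

# Use the value when generating the pod manifest that you then kubectl apply.
cat > pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ftdi-app
spec:
  containers:
  - name: app
    image: ${IMAGE}
    volumeMounts:
    - name: serial
      mountPath: /dev/FTDI232
  volumes:
  - name: serial
    hostPath:
      path: ${DEVICE_PATH}
EOF
```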

What's the benefit of not using ImagePullPolicy set to Always?

The Kubernetes documentation mentions that the caching semantics of ImagePullPolicy: Always make it quite efficient. For what reasons would I want to choose a different ImagePullPolicy?
It heavily depends on your versioning/tagging strategy.
When new replicas of your application are created (because your app has scaled up, or a pod has died and been replaced by a new pod), if you use ImagePullPolicy: Always and you have pushed a different version of your application using the same tag (as people do when using latest), the newly created replicas might run a totally different version of the application than the rest of the replicas, leading to inconsistent behavior.
You also may want to use a different value than Always when on a development environment like minikube.
There isn't much of a disadvantage to ImagePullPolicy: Always, but having the control means that:
- if you have an underlying image provider that doesn't provide caching (i.e. you're not using Docker), you have control to make sure that it's not called every time the kubelet wants the image
- even with Docker caching, it's still slightly faster not to attempt to pull the image every time. If you know that you never re-use a tag (which is recommended) or you're explicitly specifying the image digest, then there's no benefit to ImagePullPolicy: Always
- if you have an installation where you're pulling the image onto the node using a separate mechanism, you may want it to not attempt an automatic fetch if something goes wrong and the image doesn't exist
Note that fiunchinho's answer mentions that you can use it to keep the various replicas of an application in sync. This would be dangerous, as images are pulled per node, so you could end up with different versions of the application running on different nodes.
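For illustration, a minimal sketch of a pod that pins the image by digest (placeholder registry and digest), where Always would add nothing over IfNotPresent:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    # Pinned by digest, so the tag can never be silently re-pointed.
    image: registry.example.com/web@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
    imagePullPolicy: IfNotPresent
```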

if an application needs several containers running on the same host, why not just make a single container with everything you need?

I started working with Kubernetes. I have already worked with single-container pods. Now I want to work with multiple-container pods. I read a statement like:
if an application needs several containers running on the same host, why not just make a single container with everything you need?
meaning two containers with a single IP address. My doubt is: in which cases do two or more containers use the same host?
Could anybody please explain the above scenario with an example?
This is called "multiple processes per container".
https://docs.docker.com/config/containers/multi-service_container/
It's discussed on the internet many times and it has many gotchas. Basically, there's not a lot of benefit to doing it.
Ideally you want a container to host one process and its threads/subprocesses.
So if your database process is in a crash loop, let it crash and let docker restart it. This should not impact your web container.
Also putting processes in separate containers lets you set separate memory/CPU limits so that you can set different limits for your web container and database container.
That's why Kubernetes exposes the Pod concept, which lets you run multiple containers in the same namespace. Read this page fully: https://kubernetes.io/docs/concepts/workloads/pods/pod/
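A minimal sketch of such a pod (images and the log-shipping command are placeholders): the two containers share the pod's IP address and can also share volumes, while keeping separate restart behavior and resource limits.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-shipper
spec:
  containers:
  - name: web
    image: nginx:1.25
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper
    image: busybox:1.36
    # Stand-in for a real log shipper; just follows the shared log file.
    command: ["sh", "-c", "touch /var/log/nginx/access.log && tail -f /var/log/nginx/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  volumes:
  - name: logs
    emptyDir: {}
```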
Typically it is recommended to run one process in a Docker container. Although it is very much possible to run multiple processes in a container, it is discouraged more on the basis of application architecture and a DevOps perspective than for technical reasons.
Following discussion gives a better insight into this debate: https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container
When one runs multiple processes from different packages/tools in the same container, one can run into the dependency and upgradability issues that Docker was meant to solve in the first place. Nevertheless, for many applications it makes much more sense to scale and manage the application in blocks rather than as individual components; sometimes it makes life a bit easier. So Pods are basically a balance between the two: they give each process isolation for better service manageability, yet group them together so that the application as a whole can be better managed.
I would say segregating responsibilities can be a boon. Maintenance is a whole lot easier this way. A container can have a single entrypoint, and a pod's health is checked via the main process in that entrypoint. If your front-end application (say Java on Tomcat) is working fine but the database is not (although one should never run a production database in a container), you will not get the feedback that you would get if they were separate containers.
Also, one of the best parts of Docker images is that they are available as separate modules, and maintenance is super easy that way.

Dataproc node setup

I understand Google Dataproc clusters are equipped to handle initialization actions, which are executed on creation of every node. However, this is only reasonable for small actions and would not do well when creating nodes with tons of dependencies and software for large pipelines. Thus, I was wondering: is there any way to load nodes as custom images, or to have an image spin up once the node is created that has all the installs on it, so you don't have to download things again and again?
Good question.
As you note, initialization actions are currently the canonical way to install things on clusters when they are created. If you have a ton of dependencies, or need to do things like compile from source, those initialization actions may take a while.
We have support for a better method to handle customizations on our long-term roadmap. This may be via custom images or some other mechanism.
In the interim, scaling clusters up/down may provide some relief if you want to keep some of the customizations in place and split the difference between boot time and the persistence of your cluster. Likewise, if there are any precompiled packages, those always save time.
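For reference, a hedged example of attaching an initialization action when creating a cluster (the cluster name, bucket, script, and region are placeholders):

```bash
# The script in Cloud Storage runs on every node as it is created.
gcloud dataproc clusters create my-cluster \
  --region us-central1 \
  --initialization-actions gs://my-bucket/install-deps.sh \
  --initialization-action-timeout 10m
```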

Restarting vs Building a Container on Docker

I have some difficulties understanding when to restart a container and when to build a new one.
Imagine I have a webapp whose data I give to a container (via a symbolic link which points to the current deploy). Now I have two options when a new deploy comes in: (1) build a new container from an image, or (2) simply restart the running container.
I know that the decision depends on various things but my question is a more conceptual one:
What is generally a better practice or how is Docker meant to be used? Do you see any problems with one of these approaches?
I think that both options can have the same result in some cases.
Anyway, I think that the correct approach is to prepare an image with all your prerequisites and dependencies. This is the initial state that you want to use for your webapp. Then you can start your webapp in one or many containers based on this same image. Every instance gets its own unique container.
In case of a deploy, I think that you should do a "fresh start": stop and discard the running container and create a new container with your newly deployed app.
Sure, you can just restart the container, but every container has state and, in general, you want to throw this state away and start again from your fresh initial image. Of course, if your app just prints "Hello World", there is no state saved in the container, so in this case both options behave the same.
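A minimal sketch of that "fresh start" flow (image names, tags, and ports are placeholders):

```bash
# Build a new image for the new deploy.
docker build -t webapp:2024-06-01 .

# Discard the running container and its state...
docker stop webapp && docker rm webapp

# ...and start a fresh container from the newly built image.
docker run -d --name webapp -p 8080:80 webapp:2024-06-01
```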