Somebody, please, explain to me (or point me to a detailed resource) why Kubernetes uses this wrapper (the Pod) to work with containers. Every resource I come across just repeats the same words: "it is the smallest unit in k8s". What I am looking for is the reason for it from an engineering perspective. I do understand that it provides a shared namespace for storage and networking for the containers inside, but best practice is to keep a single container per Pod anyway.
I used docker-compose a lot before I familiarized myself with k8s, and I have a hard time understanding the need for this additional layer (wrapper) around a pretty straightforward entity, the container.
The reason for this decision is simply that a Pod may contain more than one container, doing different things.
First of all, a Pod may have an init container, which is responsible for performing some startup operations to ensure that the main container (or containers) work properly. I could have an init container load some configuration and prepare it for the main application, or perform basic operations such as restoring a backup.
I can basically inject a series of operations to execute before starting the main application, without rebuilding the main application's container image.
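As a minimal sketch of that idea (the image names and the config contents here are hypothetical), an init container can prepare files on a shared volume that the main container then mounts:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: prepare-config
      image: busybox:1.36
      # hypothetical prep step: assemble config before the app starts
      command: ["sh", "-c", "echo 'setting=value' > /work/app.conf"]
      volumeMounts:
        - name: workdir
          mountPath: /work
  containers:
    - name: main-app
      image: my-app:latest   # hypothetical application image
      volumeMounts:
        - name: workdir
          mountPath: /etc/app   # the app reads /etc/app/app.conf
  volumes:
    - name: workdir
      emptyDir: {}
```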
Second, even if the majority of applications are perfectly fine with only one container per Pod, there are several situations where more than one container in the same Pod is useful.
An example could be the main application running with a sidecar container acting as a proxy in front of it, perhaps responsible for checking JWT tokens. Another example could be a secondary application extracting metrics from the main application, or similar things.
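A minimal sketch of such a proxy sidecar, assuming a hypothetical application image listening on localhost:8080 (in practice the nginx configuration holding the JWT logic would come from a ConfigMap):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-proxy
spec:
  containers:
    - name: main-app
      image: my-app:latest   # hypothetical; listens on localhost:8080
    - name: auth-proxy
      image: nginx:1.25      # proxies :80 -> localhost:8080, e.g. after token checks
      ports:
        - containerPort: 80
```

Because both containers share the Pod's network namespace, the proxy reaches the application over localhost with no extra network hop.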
Last, let me quote the Kubernetes documentation (https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/):
The primary reason that Pods can have multiple containers is to support helper applications that assist a primary application. Typical examples of helper applications are data pullers, data pushers, and proxies. Helper and primary applications often need to communicate with each other. Typically this is done through a shared filesystem, as shown in this exercise, or through the loopback network interface, localhost. An example of this pattern is a web server along with a helper program that polls a Git repository for new updates.
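A minimal sketch of that last pattern (the repository URL and the polling interval are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-git-puller
spec:
  containers:
    - name: web
      image: nginx:1.25
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
    - name: git-puller
      image: alpine/git:latest
      # placeholder repo URL; clones to a temp dir, then syncs into the shared volume
      command:
        - sh
        - -c
        - |
          while true; do
            rm -rf /tmp/site
            git clone --depth 1 https://example.com/site.git /tmp/site
            cp -r /tmp/site/. /content/
            sleep 60
          done
      volumeMounts:
        - name: content
          mountPath: /content
  volumes:
    - name: content
      emptyDir: {}
```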
Update
As you said, init containers or multiple containers in the same Pod are not a must; all the functionality I listed can also be obtained in other ways, such as entrypoints, or two separate Pods communicating with each other instead of two containers in the same Pod.
There are several benefits in using those features, though. Let me quote the Kubernetes documentation once more (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/):
Because init containers have separate images from app containers, they have some advantages for start-up related code:
- Init containers can contain utilities or custom code for setup that are not present in an app image. For example, there is no need to make an image FROM another image just to use a tool like sed, awk, python, or dig during setup.
- The application image builder and deployer roles can work independently without the need to jointly build a single app image.
- Init containers can run with a different view of the filesystem than app containers in the same Pod. Consequently, they can be given access to Secrets that app containers cannot access.
- Because init containers run to completion before any app containers start, init containers offer a mechanism to block or delay app container startup until a set of preconditions are met. Once preconditions are met, all of the app containers in a Pod can start in parallel.
- Init containers can securely run utilities or custom code that would otherwise make an app container image less secure. By keeping unnecessary tools separate you can limit the attack surface of your app container image.
The same applies to multiple containers running in the same Pod: they can communicate safely with each other without exposing that communication to others on the cluster, because they keep it local.
Related
I am currently studying distributed systems and have seen that many businesses rely on the sidecar proxy pattern for their services. For example, I know a company that uses an nginx proxy to handle authentication, roles, and permissions for its services instead of including this business logic within the services themselves.
Another one makes use of the cloud-sql-proxy on GKE to use the Cloud SQL offering on Google Cloud. So on top of deploying their services in containers running in pods, there is a proxy just for communicating with the database.
There is also Istio, a service mesh solution that is deployed as a sidecar proxy in a pod.
I am pretty sure there are other commonly known use cases where this pattern is applied, but at some point how much is too much sidecar proxy? How heavy is it on the pod running it, and what complexity comes with using two, three, or even four sidecar proxies on top of your service container?
I recommend you define what you really need and continue your research based on that, since this topic is too broad and doesn't have one correct answer.
Due to this, I decided to post a community wiki answer. Feel free to expand it.
There can be various reasons for running some containers in one pod. According to Kubernetes documentation:
A Pod can encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers form a single cohesive unit of service—for example, one container serving data stored in a shared volume to the public, while a separate sidecar container refreshes or updates those files. The Pod wraps these containers, storage resources, and an ephemeral network identity together as a single unit.
In its simplest form, a sidecar container can be used to add functionality to a primary application that might otherwise be difficult to improve.
Advantages of using sidecar containers
- a sidecar container is independent of its primary application in terms of runtime environment and programming language;
- there is no significant latency in the communication between the primary application and the sidecar container;
- the sidecar pattern entails designing modular containers. A modular container can be plugged into more than one place with minimal modifications, since you don't need to write configuration code inside each application.
Notes regarding usage of sidecar containers
- consider making the sidecar container small so that it doesn't consume many resources. The strength of sidecar containers lies in their ability to stay small and pluggable. If the sidecar container's logic gets more complex and/or becomes more tightly coupled with the main application container, it may be better to integrate it into the main application's code instead.
- to ensure that any number of sidecar containers can work successfully with the main application, it is necessary to sum up all the containers' resource requests/limits when defining resource limits for the pod, because all the containers run in parallel (see the sketch below). The whole setup works only if all the containers are running successfully, and most of the time sidecar containers are simple and small, consuming fewer resources than the main container.
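As a sketch of that accounting (the application image and the exact numbers are illustrative), requests and limits are declared per container, and the scheduler places the Pod based on the sum of all containers' requests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
    - name: main-app
      image: my-app:latest          # hypothetical application image
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: "1"
          memory: 1Gi
    - name: log-shipper
      image: fluent/fluent-bit:2.2  # kept deliberately small
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
        limits:
          cpu: 100m
          memory: 128Mi
```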
How do I run a start script on a Kubernetes cluster? I saw that there are ways to run start scripts on compute instances, but couldn't find any documentation on the same for Kubernetes clusters.
You can use init containers to execute a script before the actual container is started; see the sketch after the quote below.
Init containers are specialized containers that run before app containers in a Pod. Init containers can contain utilities or setup scripts not present in an app image.
Because init containers have separate images from app containers, they have some advantages for start-up related code:
- Init containers can contain utilities or custom code for setup that are not present in an app image. For example, there is no need to make an image FROM another image just to use a tool like sed, awk, python, or dig during setup.
- The application image builder and deployer roles can work independently without the need to jointly build a single app image.
- Init containers can run with a different view of the filesystem than app containers in the same Pod. Consequently, they can be given access to Secrets that app containers cannot access.
- Because init containers run to completion before any app containers start, init containers offer a mechanism to block or delay app container startup until a set of preconditions are met. Once preconditions are met, all of the app containers in a Pod can start in parallel.
- Init containers can securely run utilities or custom code that would otherwise make an app container image less secure. By keeping unnecessary tools separate you can limit the attack surface of your app container image.
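As a minimal sketch (the ConfigMap name, script contents, and application image are hypothetical), a start script can be stored in a ConfigMap and executed by an init container before the app container starts:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: start-script
data:
  start.sh: |
    #!/bin/sh
    echo "running one-time setup..."
    # e.g. run migrations, warm a cache, restore a backup
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-start-script
spec:
  initContainers:
    - name: run-start-script
      image: busybox:1.36
      command: ["sh", "/scripts/start.sh"]
      volumeMounts:
        - name: scripts
          mountPath: /scripts
  containers:
    - name: app
      image: my-app:latest   # hypothetical application image
  volumes:
    - name: scripts
      configMap:
        name: start-script
```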
What's the benefit of having multiple containers in a pod versus having standalone containers?
If you have multiple containers in the same pod, they can speak to each other via localhost and can share mounted volumes.
If you have multiple pods of one container each, you can restart one without restarting the other. Assuming they're controlled by deployments, you can add additional replicas of one without necessarily scaling the other. If the version or some other characteristic of one of them changes, you're not forced to restart the other. You'd need to set up a service to talk from one to the other, and they can't communicate via a filesystem.
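For the separate-Pods case, a minimal sketch of the Service mentioned above (names and port are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend   # matches the backend Pods' labels
  ports:
    - port: 8080
      targetPort: 8080
# The frontend then calls http://backend:8080 instead of localhost:8080.
```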
The general approach I've always seen is to always have one container per pod within a deployment, unless you have a specific reason to need an additional container. Usually this is some kind of special-purpose "sidecar" that talks to a credentials service, or manages logging, or runs a network proxy, or something else that's secondary to the main thing the pod does (and isn't a separate service in its own right).
Apart from the points already pointed out, CPU and memory (the latter under technical preview) are accounted for at the Pod level, so with a single container in a Pod it is easy to understand and implement the application's resource requirements. With more than one container inside the Pod, we could face issues and challenges when we want to scale horizontally.
Secondly, deployment strategies (blue/green, canary, A/B) are also more aligned with the single-container-per-Pod approach.
From the Kubernetes documentation:
A Pod might encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers might form a single cohesive unit of service–one container serving files from a shared volume to the public, while a separate “sidecar” container refreshes or updates those files. The Pod wraps these containers and storage resources together as a single manageable entity.
I am trying to deploy multiple pods in k8s, like say MySQL, Mongo, Redis, etc.
Can I create a single deployment resource for this and have multiple containers defined in the template section? Is this allowed? If so, how will replication behave in this case?
Thanks
Pavan
I am trying to deploy multiple pods in k8s, like say MySQL, Mongo, Redis, etc.
From a microservices architecture perspective, it is actually quite a bad idea to place all those containers in a single Pod. Keep in mind that a Pod is the smallest deployable unit that can be created and managed by Kubernetes. There are quite a few good reasons you don't want to have all the above-mentioned services in a single Pod; difficulty in scaling such a solution is just one of them.
Can I create a single deployment resource for this and have multiple containers defined in the template section? Is this allowed? If so, how will replication behave in this case?
No, it is not allowed in Kubernetes. As for Deployments and StatefulSets (which you need for stateful applications such as databases), both manage Pods that are based on an identical container spec, so it is not possible to have a Deployment or StatefulSet consisting of different types of Pods based on different specs.
To sum up:
Many Deployment and StatefulSet objects, each serving a different purpose, are the right solution.
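As a sketch of that approach (image tags, replica counts, and the password handling are illustrative), each datastore gets its own workload object and can be scaled and updated independently:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme   # placeholder; use a Secret in practice
```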
A deployment can have multiple containers inside its pod template.
Generally, it's used to have one main container for the app plus some sidecar containers that the app needs. I don't have an example right now.
Still, it's a best practice to split deployments for scaling purposes: your front end may need to scale more than the back end, depending on caching, and you may not want your pods to get too big. For caching purposes, like Redis, it's better to have a cluster on the side, as each time a pod starts or stops you will lose data.
It's common to have multiple containers per Pod in order to share namespaces and volumes between them: take as an example the Ambassador pattern, which is used to present the application to the outside, adding a layer for authentication while staying totally transparent to the main app.
Other examples using the sidecar pattern include log parsers or configurators that hot-reload credentials without the main app having to worry about it.
That's the theory; according to your needs, you should use one deployment per component, so a Deployment for your app, a StatefulSet for the DB, and so on. Keep in mind to use one container per process and one Kubernetes resource per backing service.
As I have been using Kubernetes more, I keep seeing references to the fact that a pod can contain one container or more, and I have even looked at examples.
My question is whether there is a case where it would be best practice and more efficient to create multi-container pods, since you can scale and replicate your pods by coupling them with a service.
Thanks in advance
A Pod can contain multiple containers, but in most situations it makes perfect sense for the Pod to be simply an abstraction over a single running container.
In what situations does it make sense to have a multi-container deployed Pod?
What comes to my mind are scenarios where you have a primary container running but need to tightly couple helper processes to it, such as a log watcher. In those situations, it makes perfect sense to have multiple containers running inside a single pod.
Another big example that comes to mind is from the Istio project, a platform made to connect, manage, and secure microservices, generally referred to as a service mesh.
A huge part of what it accomplishes in providing greater control and customization over the deployed microservices network comes from the fact that it deploys a sidecar proxy, called Envoy, throughout the environment, intercepting all network communication between microservices.
Here you can check an example of load balancing in an Istio service mesh. As you can see, the proxy is deployed inside the Pod, intercepting all communication that goes through it.
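As a minimal sketch of how that sidecar typically gets there: with Istio installed, labeling a namespace enables automatic injection of the Envoy proxy into every Pod created in it (the namespace name is arbitrary):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    istio-injection: enabled   # Istio's webhook injects Envoy into new Pods here
```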