How many sidecar proxies are too many in a pod? - Kubernetes

I am currently studying distributed systems and have seen that many businesses rely on the sidecar proxy pattern for their services. For example, I know a company that uses an nginx proxy to handle authentication, roles and permissions for their services instead of including this logic within the services themselves.
Another one uses cloud-sql-proxy on GKE to access the Cloud SQL offering on Google Cloud. So on top of deploying their services in a container that runs in a pod, there is a proxy just for communicating with the database.
There is also Istio, a service mesh solution that can be deployed as a sidecar proxy in a pod.
I am pretty sure there are other commonly known use cases for this pattern, but at what point do you have too many sidecar proxies? How heavy are they on the pod running them, and what complexity comes with using 2, 3, or even 4 sidecar proxies on top of your service container?

I recommend defining what you really need and continuing your research from there, since this topic is too broad to have a single correct answer.
Because of this, I decided to post a community wiki answer. Feel free to expand it.
There can be various reasons for running several containers in one pod. According to the Kubernetes documentation:
A Pod can encapsulate an application composed of multiple co-located
containers that are tightly coupled and need to share resources. These
co-located containers form a single cohesive unit of service—for
example, one container serving data stored in a shared volume to the
public, while a separate sidecar container refreshes or updates
those files. The Pod wraps these containers, storage resources, and an
ephemeral network identity together as a single unit.
In its simplest form, a sidecar container can be used to add functionality to a primary application that might otherwise be difficult to improve.
Advantages of using sidecar containers
sidecar container is independent from its primary application in terms of runtime environment and programming language;
no significant latency during communication between primary application and sidecar container;
the sidecar pattern entails designing modular containers. The modular container can be plugged in more than one place with minimal modifications, since you don't need to write configuration code inside each application.
Notes regarding usage of sidecar containers
consider keeping each sidecar container small so that it doesn't consume many resources. The strength of sidecar containers lies in being small and pluggable. If a sidecar's logic grows more complex or becomes more tightly coupled with the main application container, it may be better to integrate it into the main application's code instead.
to ensure that any number of sidecar containers can run successfully alongside the main application, add up the resource requests and limits of every container when sizing the pod, because all the containers run in parallel. The pod works only if both kinds of container are running successfully, and most of the time sidecar containers are simple and small and consume fewer resources than the main container.
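The "add everything up" advice above can be sketched in a few lines of Python. This is a simplified illustration, not the scheduler's actual code: it assumes the documented rule that a pod's effective request for a resource is the larger of the sum over its app containers and the highest single init container request, and it ignores Pod overhead and restartable (native sidecar) init containers. The container names and millicore figures are hypothetical.

```python
def effective_request(app_containers, init_containers=()):
    """Effective pod request for one resource, e.g. CPU in millicores.

    App containers (main app plus sidecars) run in parallel, so their
    requests add up. Init containers run one at a time before them, so
    only the largest one matters.
    """
    running_total = sum(app_containers)
    init_peak = max(init_containers, default=0)
    return max(running_total, init_peak)


# Hypothetical figures: one main container plus three sidecar proxies.
cpu_millicores = {
    "app": 500,
    "auth-proxy": 100,
    "cloud-sql-proxy": 100,
    "mesh-proxy": 100,
}

pod_cpu = effective_request(cpu_millicores.values())
sidecar_share = (pod_cpu - cpu_millicores["app"]) / pod_cpu
print(f"pod requests {pod_cpu}m CPU, sidecars account for {sidecar_share:.1%}")
```

Running the same calculation for memory, and for every replica of the pod, gives a rough idea of what each extra sidecar costs across the cluster.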

Related

Does kubernetes support non distributed applications?

Our store applications are not distributed applications. We deploy them on each node and then configure them to store specific details, so they are tightly coupled to the node. Can I use Kubernetes for this use case? Would I get any benefit from it?
Based on this information alone, it is hard to tell. But Kubernetes is designed so that it should be easy to migrate existing applications. For example, you can use a PersistentVolumeClaim for the directories where your application stores information.
That said, it will probably be challenging. A cluster administrator wants to treat the Nodes in the cluster as "cattle" and throw them away when it's time to upgrade. If your app has only one instance, it will have some downtime, and your PersistentVolume should be backed by a storage system accessed over the network; otherwise the data will be lost when the node is thrown away.
If you want to run more than one instance for fault tolerance, the app needs to be stateless, but it is likely not stateless if it stores local data on disk.
There are several ways to have applications running on fixed nodes of the cluster. It really depends on how those applications behave and why they need to run on a fixed node.
Usually such applications are stateful and may need to interact with a specific node's resources, or write directly to a volume mounted on specific nodes for performance reasons, and so on.
This can be achieved with a simple nodeSelector or with node affinity ( https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ )
Or with local persistent volumes ( https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/ )
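As a rough sketch of the nodeSelector option, the Python snippet below builds a Pod manifest as a plain dictionary and prints it as JSON, which kubectl accepts as well as YAML. The pod name, image, and the `example.com/store-id` label are hypothetical placeholders; the label would first have to be applied to the target node.

```python
import json

# Hypothetical label: assumes the node was labelled beforehand, e.g.
#   kubectl label nodes <node-name> example.com/store-id=store-042
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "store-app"},
    "spec": {
        # The scheduler only places this Pod on nodes carrying this label.
        "nodeSelector": {"example.com/store-id": "store-042"},
        "containers": [
            {
                "name": "store-app",
                # Placeholder image name, not a real registry.
                "image": "registry.example.com/store-app:1.0",
            }
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

The output could be saved to a file and applied with `kubectl apply -f`. Node affinity expresses the same constraint with more flexible matching rules.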
That said, if all the applications you need to run on the Kubernetes cluster must each run on a single node, you lose a lot of benefits. Kubernetes works really well with stateless applications, which can be moved between nodes to obtain high availability and strong resilience to node failure.
Kubernetes is complex and brings you a lot of tools to work with, but if you end up using only a few of them, I think it's overkill.
I would weigh the benefits you could get from adopting Kubernetes (an easy way to check the health of the whole cluster; easy monitoring of logs, metrics and resource usage; strong resilience to node failure for stateless applications; load balancing of requests; and a lot more) against the cons and complexity it may bring (migrating in particular can require a good amount of effort if you weren't already using containers to host your applications).

Why does Kubernetes not work directly with containers?

Could somebody please explain to me (or point me to a detailed resource) why Kubernetes uses this wrapper (the pod) to work with containers? Every resource I come across just repeats the same words: "it is the smallest unit in k8s". What I am looking for is the reason for it from an engineering perspective. I do understand that it provides a namespace for storage and networking for the containers inside, but best practice is to keep a single container in a pod anyway.
I used docker-compose a lot before I familiarized myself with k8s, and I have a hard time understanding the need for this additional layer (wrapper) around a pretty straightforward entity, the container.
The reason for this decision is simply that a Pod may contain more than one container, each doing different things.
First of all, a Pod may have an init container, which is responsible for running startup operations that ensure the main container or containers work properly. I could have an init container load some configuration and prepare it for the main application, or perform basic operations such as restoring a backup.
I can basically inject a series of operations to execute before the main application starts, without rebuilding the main application's container image.
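The init container idea described above might look roughly like the following. This is an illustrative sketch, written as a Python dictionary that serialises to a JSON manifest. The pod name, the application image, the file path, and the `mode=production` setting are all made up for the example; the init container writes a config file into a shared `emptyDir` volume, and the main container mounts the same volume.

```python
import json

# Both containers mount the same volume, so files written by the init
# container are visible to the application container.
shared_mount = {"name": "config", "mountPath": "/config"}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app-with-init"},
    "spec": {
        "volumes": [{"name": "config", "emptyDir": {}}],
        # Init containers run to completion before the app containers start.
        "initContainers": [
            {
                "name": "prepare-config",
                "image": "busybox:1.36",
                # Hypothetical setup step: generate a config file.
                "command": ["sh", "-c", "echo 'mode=production' > /config/app.conf"],
                "volumeMounts": [shared_mount],
            }
        ],
        "containers": [
            {
                "name": "app",
                # Placeholder image; it stays unchanged by the setup step.
                "image": "registry.example.com/app:1.0",
                "volumeMounts": [shared_mount],
            }
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Changing the setup step only means changing the init container entry; the application image is not rebuilt.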
Second, even if the majority of applications are perfectly fine with only one container per Pod, there are several situations where more than one container in the same Pod is useful.
One example is having the main application running with a sidecar container acting as a proxy in front of it, perhaps responsible for checking JWT tokens. Another is a secondary application that extracts metrics from the main application.
Last, let me quote the Kubernetes documentation (https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/)
The primary reason that Pods can have multiple containers is to support helper applications that assist a primary application. Typical examples of helper applications are data pullers, data pushers, and proxies. Helper and primary applications often need to communicate with each other. Typically this is done through a shared filesystem, as shown in this exercise, or through the loopback network interface, localhost. An example of this pattern is a web server along with a helper program that polls a Git repository for new updates.
Update
Like you said, init containers and multiple containers in the same Pod are not a must. Everything I listed can also be achieved in other ways, such as entrypoint scripts or two separate Pods communicating with each other instead of two containers in the same Pod.
There are several benefits to using those features though. Let me quote the Kubernetes documentation once more (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
Because init containers have separate images from app containers, they have some advantages for start-up related code:
Init containers can contain utilities or custom code for setup that are not present in an app image. For example, there is no need to make an image FROM another image just to use a tool like sed, awk, python, or dig during setup.
The application image builder and deployer roles can work independently without the need to jointly build a single app image.
Init containers can run with a different view of the filesystem than app containers in the same Pod. Consequently, they can be given access to Secrets that app containers cannot access.
Because init containers run to completion before any app containers start, init containers offer a mechanism to block or delay app container startup until a set of preconditions are met. Once preconditions are met, all of the app containers in a Pod can start in parallel.
Init containers can securely run utilities or custom code that would otherwise make an app container image less secure. By keeping unnecessary tools separate you can limit the attack surface of your app container image.
The same applies to multiple containers running in the same Pod: they can communicate safely with each other without exposing that communication to others on the cluster, because they keep it local.

Can Kubernetes work like a compute farm and route one request per pod

I've dockerized a legacy desktop app. This app does resource-intensive graphical rendering from a command line interface.
I'd like to offer this rendering as a service in a "compute farm", and I wondered if Kubernetes could be used for this purpose.
If so, how in Kubernetes would I ensure that each pod only serves one request at a time (this app is resource-intensive and likely not thread-safe)? Should I write a single-threaded wrapper/invoker app in the container and thus serialize requests? Would K8s then be smart enough to route subsequent requests to idle pods rather than letting them pile up on an overloaded pod?
Interesting question.
The built-in default Service object, together with kube-proxy, does route requests to different pods, but it does so without regard to how busy each pod is (round-robin or random selection, depending on the proxy mode), which does not fit your use case.
Your use case would require changes to the kube-proxy configuration during cluster setup. This approach is tedious and requires you to run your own cluster (managed cloud services do not support it). As described here.
Your best bet would be to set up a service mesh such as Istio, which provides these features with little configuration, along with a lot of other useful functionality.
See if this helps.
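For the single-threaded wrapper idea raised in the question, a minimal sketch could look like the one below. It is not an Istio or kube-proxy feature; it only shows the in-container part, where a non-blocking lock admits one render job at a time and rejects the rest with a "busy" status so that the caller can retry against another pod. The class name, the status codes, and the `render` callable are hypothetical.

```python
import threading


class SingleJobGate:
    """Admit at most one job at a time; reject others instead of queueing."""

    def __init__(self, render):
        self._render = render          # the legacy, non-thread-safe call
        self._lock = threading.Lock()

    def busy(self):
        # Could back a readiness endpoint so a busy pod reports "not ready".
        return self._lock.locked()

    def submit(self, job):
        # Non-blocking acquire: fail fast rather than pile up requests.
        if not self._lock.acquire(blocking=False):
            return 503, "busy"
        try:
            return 200, self._render(job)
        finally:
            self._lock.release()


def fake_render(job):
    # Stand-in for invoking the real rendering command line.
    return f"rendered:{job}"


gate = SingleJobGate(fake_render)
print(gate.submit("scene-1"))
```

An HTTP handler inside the container would call `submit` and map the returned status to the response. Whether rejected requests then reach an idle pod still depends on the routing layer in front of the pods.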

In what case is it recommended to use one pod for many containers? [duplicate]

What's the benefit of having multiple containers in a pod versus having standalone containers?
If you have multiple containers in the same pod, they can speak to each other as localhost and can share mounted volumes.
If you have multiple pods of one container each, you can restart one without restarting the other. Assuming they're controlled by deployments, you can add additional replicas of one without necessarily scaling the other. If the version or some other characteristic of one of them changes, you're not forced to restart the other. You'd need to set up a service to talk from one to the other, and they can't communicate via a filesystem.
The general approach I've always seen is to always have one container per pod within a deployment, unless you have a specific reason to need an additional container. Usually this is some kind of special-purpose "sidecar" that talks to a credentials service, or manages logging, or runs a network proxy, or something else that's secondary to the main thing the pod does (and isn't a separate service in its own right).
Apart from the points already made, CPU and memory (under technical preview) are associated with a Pod. With a single container in a Pod, the application's resource requirements are easy to understand and implement; with more than one container in the Pod, we could face challenges when we want to scale horizontally.
Secondly, deployment strategies (blue/green, canary, A/B) are also better aligned with the single-container-per-Pod approach.
From the Kubernetes documentation:
A Pod might encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers might form a single cohesive unit of service–one container serving files from a shared volume to the public, while a separate “sidecar” container refreshes or updates those files. The Pod wraps these containers and storage resources together as a single manageable entity.

Kubernetes best practices in pods

As I have been using Kubernetes more, I keep seeing references to the fact that a pod can contain one or more containers, and I have even looked at examples.
My question is whether there is a case where creating multi-container pods would be best practice and more efficient, given that you can already scale and replicate your pods and couple them with a service.
Thanks in advance
A Pod can contain multiple containers, but in most situations it makes perfect sense for the Pod to be simply an abstraction over a single running container.
In what situations does it make sense to have a multi-container deployed Pod?
What comes to mind are scenarios where you have a primary container running but need to tightly couple helper processes to it, such as a log watcher. In those situations, it makes perfect sense to have multiple containers running inside a single pod.
Another big example that comes to mind is the Istio project, a platform made to connect, manage and secure microservices, generally referred to as a service mesh.
Much of the control and customization it provides over the network of deployed microservices comes from the fact that it deploys a sidecar proxy, called Envoy, throughout the environment, intercepting all network communication between microservices.
Here, you can check an example of load balancing in an Istio service mesh. As you can see, the proxy is deployed inside the Pod, intercepting all communication that goes through it.