I am using kubernetes to deploy a simple application. The pieces are:
a RabbitMQ instance
a stateless HTTP server
a worker that takes jobs from the message queue and processes them
I want to be able to scale the HTTP server and the worker up and down independently of each other. Would it be more appropriate for me to create a single deployment containing one pod for the HTTP server and one for the worker, or separate deployments for the HTTP server / worker?
You should definitely choose separate Deployments for the HTTP server and the worker, for the following reasons:
Their scaling characteristics are different, so it does not make sense to put them in the same Deployment.
The metrics you scale on will be different too. For the HTTP server it might be requests per second, while for the worker it will be the number of items pending / waiting to be processed. You can create a HorizontalPodAutoscaler (HPA) for each and scale on the parameter that suits it best (see the sketch after this list).
The metrics and logs that you want to collect and measure for each are different as well, so it makes sense to keep them separate.
The Single Responsibility Principle fits well here too; keeping them in the same Pod/Deployment would unnecessarily mix things up.
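As a rough illustration, here is a minimal sketch of the two independent Deployments with an HPA attached to the HTTP server only. All names, images and the CPU-based target are placeholders; scaling the worker on queue depth would additionally require a custom or external metrics pipeline, which is not shown here.

```yaml
# Hypothetical sketch: two independent Deployments plus an HPA for the HTTP server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: http-server
  template:
    metadata:
      labels:
        app: http-server
    spec:
      containers:
      - name: http-server
        image: my-registry/http-server:latest   # placeholder image
        resources:
          requests:
            cpu: 100m                            # required for utilization-based HPA
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: my-registry/worker:latest         # placeholder image
---
# CPU is used here only because it works out of the box; scaling the worker on
# RabbitMQ queue depth would need an external/custom metrics adapter instead.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: http-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```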
We have an AKS cluster and we want to achieve the following two points in our architecture:
We have replicas of pods, and we want each pod to serve only one request at a time; basically a one pod, one request design.
When all pods are busy, the next incoming request should not be queued at the pod level; instead it should be queued at the service level, and only once a busy pod becomes idle or available should a queued request be dispatched to that idle pod.
How can we achieve the above?
Generally, this could be achieved by creating a custom proxy that creates pods on demand, but in practice it would be very difficult and performance would be poor. This was explained very well by David Maze in his comment:
You need to write a custom proxy with access to the Kubernetes API that can create new pods on demand; this is not a standard Kubernetes setup. This is also an extremely heavy-weight setup (if it takes tens of seconds to pull and deploy a new pod you can hit HTTP request timeouts very easily) and every Web framework supports handling multiple requests per process.
I have a test environment where HA is not important but resource efficiency is. With that in mind, would you recommend creating one Pod with multiple containers (where it makes sense, of course, i.e. where the containers are tightly coupled), or one Pod for every service? Does this have any impact on resources at all?
To give an example: if I have a PHP application, an nginx proxy, and a Filebeat service that collects its logs, would it be better to have 3 pods for these 3 things, or one pod with 3 containers? And when I say better, I mean using less memory, CPU, etc.
The resource difference between the two solutions should not be significant (probably negligible).
However, depending on your approach, the management effort can differ quite significantly.
With each component in a dedicated pod, you need to somehow synchronise the lifecycle of all the pods (php + nginx + filebeat) whenever you spin up just one new instance of the application.
With all of them in one pod you just need to create/delete one pod.
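For illustration, here is a minimal sketch of the single-Pod variant, assuming placeholder images and paths; in practice you would mount your real nginx config and point Filebeat at your actual log location.

```yaml
# Hypothetical sketch: one Pod running php-fpm, nginx and filebeat side by side.
apiVersion: v1
kind: Pod
metadata:
  name: php-app
spec:
  containers:
  - name: php
    image: php:fpm                                   # placeholder tag
    volumeMounts:
    - name: logs
      mountPath: /var/log/app                        # placeholder log path
  - name: nginx
    image: nginx:stable                              # reaches php-fpm via localhost:9000
    ports:
    - containerPort: 80
  - name: filebeat
    image: docker.elastic.co/beats/filebeat:8.13.0   # placeholder version
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}                                     # shared so filebeat can tail the PHP logs
```

Because the three containers share the Pod's network namespace and volumes, nginx can proxy to php-fpm over localhost and Filebeat can read the log files without any extra wiring.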
As I have been using Kubernetes more, I keep seeing references to the fact that a pod can contain one container or more, and I have even looked at examples.
My question is whether there is a case where creating multi-container pods would be best practice and more efficient, given that you can already scale and replicate single-container pods by coupling them with a service.
Thanks in advance
A Pod can contain multiple containers, but in most situations it makes perfect sense for the Pod to be simply an abstraction over a single running container.
In what situations does it make sense to deploy a multi-container Pod?
What comes to mind are scenarios where you have a primary container running but need to tightly couple helper processes to it, such as a log watcher. In those situations, it makes perfect sense to run multiple containers inside a single Pod.
Another big example that comes to mind is from the Istio project, which is a platform made to connect, manage and secure microservices and is generally referred to as a service mesh.
A huge part of what it accomplishes in providing greater control and customization over the deployed microservices network is due to the fact that it deploys a sidecar proxy, called Envoy, throughout the environment, intercepting all network communication between microservices.
Here you can check an example of load balancing in an Istio service mesh. As you can see, the proxy is deployed inside the Pod, intercepting all communication that goes through it.
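For reference, when automatic sidecar injection is used, enabling Envoy for every Pod in a namespace is typically just a label on that namespace. A minimal sketch, assuming Istio is already installed and using a placeholder namespace name:

```yaml
# Hypothetical sketch: label a namespace so Istio injects the Envoy sidecar
# into every Pod created in it.
apiVersion: v1
kind: Namespace
metadata:
  name: demo                  # placeholder namespace
  labels:
    istio-injection: enabled
```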
I am trying to create a Kubernetes job that consists of two pods that have to be scheduled on separate nodes in our hybrid cluster. Our requirement is that one of the pods runs on a Windows Server node and the other pod runs on a Linux node (thus we cannot just run two Docker containers from the same pod, which I know is possible, but would not work in our scenario). The Linux pod (which you can imagine as a client) will communicate over the network with the Windows pod (which you can imagine as a stateful server), exchanging data while the job runs. When the Linux pod terminates, we want to also terminate the Windows pod. However, if one of the pods fails, then we want to fail both pods (as they are designed to be a single job).
Our current design is to write a K8S service that handles the communication between the pods, and then apply the service and the two pods to the cluster to "emulate" a job. However, this is not ideal, since the two pods are not tightly coupled as a single job, and it adds quite a bit of overhead to manually manage this setup (e.g. when the job fails or completes, we probably need to manually kill the service and the deployment of the Windows pod). Plus we would need to deploy a new service for each "job", as we require the Linux pod to always communicate with the same Windows pod for the duration of the job due to underlying state (thus we cannot use a single service for all Windows pods).
Any thoughts on how this could best be achieved on Kubernetes would be much appreciated! Hopefully this scenario is supported natively, and I would not need to resort to the kind of pod-service-pod setup that I described above.
Many thanks
I am trying to distinguish your distaste for creating and wiring the Pods from your distaste at having to do so manually. Because, in theory, a Job that creates Pods is very similar to what you are describing, and would be able to have almost infinite customization for those kinds of rules. With a custom controller like that, one need not create a Service for the client(s) to speak to their server, as the Job could create the server Pod first, obtain its Pod-specific-IP, and feed that to the subsequently created client Pods.
I would expect one could create a Job controller using only bash and either curl or kubectl: generate the JSON or YAML that describes the situation you wish to have, feed it to the Kubernetes API (since the Job would have a service account, just like any other in-cluster container), and use normal traps to clean up after itself. Without more of the specific edge cases loaded in my head it's hard to say if that's a good idea or not, but I believe it's possible.
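To make that idea concrete, here is a rough sketch rather than a tested implementation: a Job whose single Linux container drives the Windows server Pod with kubectl. The ServiceAccount name, images and the client command are all placeholders, and it assumes RBAC rules granting that ServiceAccount create/get/delete on Pods.

```yaml
# Hypothetical sketch only; names, images and the client invocation are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: hybrid-job
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: hybrid-job-sa        # assumed to have pod create/get/delete rights
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: controller
        image: bitnami/kubectl:latest          # placeholder image providing bash + kubectl
        command: ["/bin/bash", "-c"]
        args:
        - |
          set -e
          SERVER=win-server-$HOSTNAME
          # Tear the Windows Pod down whenever this script exits, success or failure
          trap 'kubectl delete pod "$SERVER" --ignore-not-found' EXIT
          # Create the Windows server Pod belonging to this job instance
          kubectl apply -f - <<EOF
          apiVersion: v1
          kind: Pod
          metadata:
            name: $SERVER
          spec:
            nodeSelector:
              kubernetes.io/os: windows
            containers:
            - name: server
              image: my-registry/win-server:latest   # placeholder Windows image
          EOF
          kubectl wait --for=condition=Ready "pod/$SERVER" --timeout=300s
          SERVER_IP=$(kubectl get pod "$SERVER" -o jsonpath='{.status.podIP}')
          # Placeholder: run the Linux client workload against the server's Pod IP
          ./run-client --server "$SERVER_IP"
```

Detecting a failure of the Windows Pod while the client is still running would need an explicit check (for example, polling its phase), which is omitted here.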
I have a client that makes asynchronous calls to a gRPC service managed by Kubernetes. The function calls are computationally expensive and each takes a while to complete, so many of the calls wait for a response in a queue (as shown in this tutorial https://grpc.io/docs/tutorials/async/helloasync-cpp.html or, more specifically, https://github.com/grpc/grpc/blob/v1.4.x/examples/cpp/helloworld/greeter_async_client2.cc). What I notice is that all the calls are served by the same pod, while the other pods on my cluster remain unused.
If I launch multiple instances of the client, they get spread across different nodes or pods, but I'm interested in this happening for calls made over one async client connection.
Is this possible and if so, does it require some specific configuration?
(I realize that I could open many connections from one script, but this does not seem optimal.)
I should also mention that I'm running a local Kubernetes setup with just a few nodes, set up using kubeadm.
kube-proxy is an L4 load balancer, so it is not able to distinguish between separate HTTP requests (L7) within one stream. gRPC multiplexes all calls from one client over a single long-lived HTTP/2 connection, so every call ends up at the pod that connection was opened to. Depending on what you are trying to achieve, an L7 proxy (one that supports HTTP/2) could be a solution.
There is a nice overview in this document: https://grpc.io/blog/loadbalancing
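One commonly suggested alternative to putting an L7 proxy in front is a headless Service combined with client-side load balancing in gRPC itself. A minimal sketch, with placeholder names and port:

```yaml
# Hypothetical sketch: a headless Service (clusterIP: None) makes DNS return the
# individual pod IPs instead of one virtual IP, so a gRPC client configured for
# client-side load balancing can spread calls across the backend pods.
apiVersion: v1
kind: Service
metadata:
  name: grpc-backend          # placeholder name
spec:
  clusterIP: None             # headless
  selector:
    app: grpc-backend         # placeholder label; must match the server pods
  ports:
  - name: grpc
    port: 50051               # placeholder gRPC port
```

The client would then dial something like dns:///grpc-backend:50051 with a round_robin policy; whether that is preferable to an L7 proxy depends on how much balancing logic you want to keep in the client.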