I notice the CPU utilization of pods under the same HPA varies from 31m to 1483m. Is this expected and normal? See below for the CPU utilization of the 8 pods belonging to the same HPA.
NAME CPU(cores)
myapp-svc-pod1 31m
myapp-svc-pod2 87m
myapp-svc-pod3 1061m
myapp-svc-pod4 35m
myapp-svc-pod5 523m
myapp-svc-pod6 1483m
myapp-svc-pod7 122m
myapp-svc-pod8 562m
The HPA's main goal is to spawn more pods to keep the average load for a group of pods at a specified level.
The HPA is not responsible for load balancing or equal connection distribution.
Equal connection distribution is the responsibility of the k8s Service, which by default works in iptables mode and, according to the k8s docs, picks pods at random.
Your uneven CPU load distribution is most probably caused by the data the pods process. To make sure it's not an issue with the k8s Service, I'd recommend you export some metrics, such as the number of connections and the time it takes to process one request. Once you have gathered this data, have a look at it and see if a pattern emerges.
Now to answer your question:
Is this expected and normal?
It depends on what you consider normal, but if you were expecting a more even CPU load distribution, then you may want to rethink your design. It's hard to say what you can do to make it more even because I don't know what the myapp-svc pods do, but as I already mentioned, it may be best to have a look at the metrics.
I have doubts about the use of the HorizontalPodAutoscaler (HPA) in Kubernetes. What are the best practices for using the HPA, especially when setting maxReplicas? For example, if I have a cluster with 3 worker nodes running a single app and set up the HPA to scale up to 20 pods, is it good practice to scale to 3x more pods than there are available nodes? Or is scaling the pods up to the same quantity as the available worker nodes in the cluster a better approach?
Thank you in advance
First of all, you need to test your application and decide on reasonable resources per pod (requests and limits).
After setting the limits per pod, you know how many pods your cluster can sustain.
For example, if you have 10 CPU and 10Gi of memory total/free over the cluster and you set the limit per pod to 1 CPU and 1Gi of memory, then you can run up to 10 pods.
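As a minimal sketch of that example (the pod name and image are placeholders), each pod would declare:

```yaml
# With these values, a cluster with 10 free CPU and 10Gi free memory
# fits at most 10 such pods.
apiVersion: v1
kind: Pod
metadata:
  name: myapp            # placeholder name
spec:
  containers:
  - name: myapp
    image: myapp:latest  # placeholder image
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
      limits:
        cpu: "1"
        memory: 1Gi
```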
Then it's time to run your load test: fire the expected traffic at its peak with the lowest number of pods you're planning to run that fits the normal/daily traffic, gradually start up new pods, and check whether you can now handle the high traffic or still need to add more pods. Repeat this until you reach an appropriate number of pods; that gives you the maximum number of pods to configure in your HPA.
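As a sketch of where that result ends up, assuming a Deployment named myapp and the numbers from the question (using the autoscaling/v2 API):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp          # hypothetical Deployment name
  minReplicas: 3         # baseline that fits normal/daily traffic
  maxReplicas: 20        # ceiling found via load testing
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```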
I am running a deployment on a cluster of 1 master and 4 worker nodes (two 32GB and two 4GB machines). I want to run a maximum of 10 pods on the 4GB machines and 50 pods on the 32GB machines.
Is there a way to assign different number of pods to different nodes in Kubernetes for same deployment?
I want to run a maximum of 10 pods on 4GB machines and 50 pods in 32GB machines.
This is possible by configuring the kubelet to limit the maximum pod count on the node:
// maxPods is the number of pods that can run on this Kubelet.
MaxPods int32 `json:"maxPods,omitempty"`
The definition can be found in the kubelet configuration source on GitHub.
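For illustration, a minimal kubelet configuration file setting that field might look like this (note this applies per node, not per deployment):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 10   # cap this node at 10 pods; the 32GB nodes would use 50
```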
Is there a way to assign different number of pods to different nodes in Kubernetes for same deployment?
Adding this requirement to your request makes it not possible: there is no native mechanism in Kubernetes at this point that satisfies it. This more or less fits the spirit of how Kubernetes works and its principles: you schedule your application and let the scheduler decide where it should go, unless a very specific resource such as a GPU is required, and that case is handled with labels, affinity, etc.
If you look at the Kubernetes API, you will notice there is no field that makes your request possible. However, API functionality can be extended with custom resources, and this problem could be tackled by creating your own scheduler. But that is not the easy way of fixing this.
You may also want to set appropriate memory requests. Higher requests will lead the scheduler to place more pods on the nodes that have more memory resources. It's not ideal, but it is something.
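A sketch combining both ideas, assuming you label the 32GB nodes yourself (the size=large label and all names here are hypothetical): higher memory requests plus a preferred node affinity bias the scheduler toward the bigger machines, without enforcing an exact per-node count:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                        # hypothetical name
spec:
  replicas: 60
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: size            # assumes nodes were labeled manually
                operator: In
                values: ["large"]
      containers:
      - name: myapp
        image: myapp:latest          # placeholder image
        resources:
          requests:
            memory: 512Mi            # higher requests favor roomier nodes
```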
Well, in general, scheduling is done on the basis of algorithms like round robin, least used, etc.
We do have the option of adding node affinities via selectors, but even that won't control the per-node count.
So you may have to set this up manually across the worker nodes.
Say you ran kubectl top nodes to see the available capacity once the deployment has been done, and kubectl get po -o wide to see which nodes the pods have landed on.
Now, to force a pod to be spawned on a specific node, say one of the 32GB ones, you can temporarily mark the 4GB nodes as unschedulable by executing the following command:
kubectl cordon {node_name}
Then kill the pods that are running on the 4GB machines but that you want running on the 32GB machines. After being killed, they will automatically be spawned on one of the 32GB nodes. Then you can execute
kubectl uncordon {node_name}
to mark the node as schedulable again.
This is a bit involved and will need a fair amount of calculation as well.
What is the difference between running 2 pods (2 replicas) in Kubernetes vs. one larger pod?
I have set a pod with a 20Mi memory request. Is it better to have 2 replicas with 20Mi requests or a single pod with a 40Mi memory request?
Personally, I think it performs better to run multiple pods on the same host. I don't know what web server you use, but requests are processed with limited CPU time, even if the server has multiple worker processes or threads. Additionally, it's more efficient to utilize CPU time during network I/O waits when using multiple processes. To improve throughput, you should increase the number of processes or instances and scale horizontally, because response time degrades as the load grows.
That depends mainly on the requirements of the web/mobile application being hosted, which you can ascertain by benchmarking the app's performance under the 20Mi and 40Mi configurations. Overall, you can expect better performance for the application running at 40Mi and scaling elastically when required by user traffic. Running two pods in different data centers will give better fail-over behaviour in case of a system crash or other issues. You may see higher billing running two pods to support the same rate of web traffic.
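For concreteness, the two options from the question differ only in replicas and requests; a sketch with a hypothetical Deployment named myapp:

```yaml
# Option A: two replicas at 20Mi each
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest      # placeholder image
        resources:
          requests:
            memory: 20Mi         # Option B: replicas: 1, memory: 40Mi
```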
I think there is no golden rule for how to plan your infrastructure capacity to meet a specific level of your application's or service's objectives. You should start collecting some key performance metrics for your application and, based on these monitoring stats, start properly dimensioning your pods, for which you can use Kubernetes features like Horizontal/Vertical Pod Autoscaling.
My understanding is that in Kubernetes, when using the Horizontal Pod Autoscaler, if the targetCPUUtilizationPercentage field is set to 50% and the average CPU utilization across all of the replicas is above that value, the HPA will create more replicas. Once the average CPU drops below 50% for some time, it will lower the number of replicas. Here is the part that I am not sure about:
What if the CPU utilization on a pod is 10%, not 0%? Will the HPA still terminate the replica? 10% CPU isn't much, but since it's not 0%, some task is currently running on that pod. If it's a long-lasting task (several seconds) and the HPA decides to terminate the pod, that task will not be finished.
Does the HPA terminate pods only if their CPU utilization is 0%, or does it terminate them whenever it sees a value below targetCPUUtilizationPercentage?
How does HPA decide which pods to remove?
Thank you!
So you have two questions in there; let me address them one by one. The first part: if a pod in a ReplicaSet is consuming, let's say, 10%, will Kubernetes kill that pod? The answer is yes. Kubernetes is not looking at individual pods but at the average of that metric across all pods in the ReplicaSet. Also, scaling down is gradual, as explained in the HPA documentation.
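For reference, the documented scaling rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), so a single replica sitting at 10% CPU simply pulls the average down; if the average remains below the target, the replica count shrinks regardless of that one pod not being fully idle.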
The second part of the question: how does your application behave gracefully when a pod is about to be killed while it is still serving requests? This can be handled via the pod's termination grace period, and even better if you implement a PreStop hook, which allows you to do something like stop taking incoming requests while finishing the existing ones. The implementation will vary based on the language runtime you are using, so I won't go into the details here.
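As a minimal sketch of that shutdown handling in a pod spec (the sleep is a stand-in for whatever drain logic your runtime needs; names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp                          # placeholder name
spec:
  terminationGracePeriodSeconds: 60    # time allowed for in-flight requests
  containers:
  - name: myapp
    image: myapp:latest                # placeholder image
    lifecycle:
      preStop:
        exec:
          # stand-in: stop accepting new requests, then let existing ones drain
          command: ["sh", "-c", "sleep 15"]
```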
Lastly, one scenario you should consider: what if the VM on which the pod was running goes down abruptly? You get no chance to execute the PreStop hook. The application needs to be robust enough to handle such failures.
pod will not start due to "No nodes are available that match all of the following predicates:: Insufficient cpu"
In the question above, I had an issue starting a deployment with 3 containers.
Upon further investigation, it appears only 27% of the CPU quota is available, which seems very low. The rest of the CPU appears to be assigned to some default bundled containers.
How is this normally mitigated? Is a larger node required? Do limits need to be set manually? Are all those additional containers necessary?
1 CPU for a single-node cluster is probably too small.
Of the containers in the original answer, both the dashboard and fluentd can be removed:
the dashboard is just a web UI, which you can go without if you use kubectl (which you should, IMO);
fluentd just reads the log files on disk to ship them somewhere (GCP's log aggregation, I think).
The unnecessary containers should be tied to a Deployment or ReplicaSet, which can be listed with kubectl get deployment and kubectl get rs, respectively. You can then kubectl delete them.
Increasing the resources on the node should not change the requirements of the basic pods, meaning they should all be free to schedule.