Will pods consume the full resources specified in their requests or limits while being created? - kubernetes

I would like clarity on a pod's resource consumption when it gets created or restarted as part of a rolling update or scale-up.
I am looking to understand:
Will a pod consume the entire resources specified in its requests while it is being created? Or its limits?
Or will it just consume what it needs to start, which may be less than its request?
We are currently facing an issue with our AKS cluster: pods generate high CPU usage alerts (more than 95%) when new pods are created as part of a rollout or scale-up, even though our applications are lightweight and need little CPU for their functionality.
So I am looking for a solution:
Can we use a CPU initialization period / initial readiness, which would let the pods manage their resource consumption during startup?
Can we tweak the HPA settings during scale-up, or use a scaling policy window or stabilization window to cover pod startup?

That really depends on the resource type and the actual values you specify.
As your question focuses on CPU, I will as well.
The first consideration is what QoS class your pod ends up in:
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
Pods generally do not "consume" any CPU resources; the software running within them does, so what happens with CPU depends strictly on what software you're running. Some software has a CPU-heavy startup phase (oh, what affection I have for Java in k8s), and in that case an initial CPU spike is perfectly normal. Also, due to the scaling logic, that initial spike, if it happens before the pod is in a ready state, will be discarded in the computation of an HPA scale-up.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
So my ultimate advice would be to set your readinessProbe correctly.
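A minimal readinessProbe sketch (the endpoint path, port, and timings are assumptions; tune them to your app's real startup profile):

```yaml
# Hypothetical container snippet for the pod spec.
readinessProbe:
  httpGet:
    path: /healthz           # assumed readiness endpoint of your app
    port: 8080               # assumed container port
  initialDelaySeconds: 15    # roughly cover the CPU-heavy startup phase
  periodSeconds: 5
```

Until the probe succeeds the pod is not Ready, so its startup CPU spike should not feed the HPA scale-up computation, per the docs linked above.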

Related

Is it possible to specify sorting criteria within a Kubernetes horizontal pod autoscaler to define which pods are chosen to terminate?

I have some HPAs defined within a Kubernetes cluster, and the scaling functionality works as expected. However, I've observed that the choice of which specific pods get scaled down seems pretty arbitrary.
So the question is: can I define criteria somewhere to choose which pods are preferred for termination when a scale-down event happens, without explicitly making the pods scale on those criteria?
For example, I mainly care about CPU and scale such that the CPU percentage is maintained at 50% or less, but when scaling down I would prefer that older pods are terminated rather than newer ones, or that pods consuming the most memory are terminated in preference to those consuming less.
I'm aware that I can explicitly scale on multiple criteria like CPU and memory, but this can be problematic and prevent downward scaling unnecessarily, for example when memory is allocated to a cache but CPU usage has decreased.
As per this official doc, you can add the annotation controller.kubernetes.io/pod-deletion-cost with a value in the range [-2147483647, 2147483647], and pods with a lower value will be killed first. The default is 0, so anything negative on one pod will cause that pod to be killed first during downscaling.
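For example, a minimal sketch (the pod name is hypothetical; note the annotation value must be a quoted string):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: important-replica   # hypothetical name
  annotations:
    # higher cost = more "expensive" to delete, so this pod is killed later
    controller.kubernetes.io/pod-deletion-cost: "10000"
```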
See this GitHub issue about the implementation of this feature: Scale down a deployment by removing specific pods (PodDeletionCost) #2255.
You can also use Pod Priority and Preemption. Refer to this official doc for more information.

High total CPU request but low total usage (kubernetes resources)

I have a bunch of pods in a cluster that together request almost all (7.35/8) of the available CPU resources on a node, even though their actual total usage is almost nothing (0.34/8).
The pod that currently requests the most only requests 210m, which I guess is not an outrageous amount. Also, I would like to enforce some sensible minimum request size for all pods in the cluster, and of course that will accumulate when there are lots of pods.
It seems I could easily scale down the request by a factor of 10 and leave the limits where they are to begin with.
But is there something else that I should look into instead before doing that - reducing replica count etc.?
Also it looks a bit strange that the pods are not more evenly distributed between the nodes.
Your request values seem overestimated.
You need time and metrics to find the right request/limit for your workload.
Keep in mind that if you change those values, your pods will restart.
Also, it's normal to find some unbalanced nodes in your cluster. Kubernetes will never remove a running pod to rebalance nodes unless you ask it to.
For example, if you create a cluster with 3 nodes, fill those 3 nodes with pods, and then add another 3 nodes, the new nodes will stay empty.
You can set up a HorizontalPodAutoscaler on your cluster to adapt the number of pods to your workload.
Doing that, your workload will spread among the nodes with a correct balance (if you use the default scheduling policy).
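A minimal HPA sketch using the autoscaling/v2 API (the name and Deployment target are assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # target % of the pods' CPU *request*
```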
I suggest the following:
Resource allocation: Based on historical values, set your request to a meaningful value with some buffer. To get guaranteed resource allocation for a pod, it may be a good idea to set request and limit to the same value, but that also means the pod cannot burst to grab extra resources. One more thing to note: scheduling happens based only on the requested value, so if the node has no resources left, a pod trying to burst toward its limit may be killed and rescheduled.
Resource quotas: Check Kubernetes Resource Quotas to set sensible namespace-level quotas that control over-provisioning of resources by developers.
Affinity/anti-affinity: Check the concept of anti-affinity to have your replicas or different pods scheduled across your cluster. You can ensure, for example, that a host or availability zone holds only one replica of your pod (helps with HA), or spread different pods to different nodes (layered scheduling etc.) - check this video, and see the sketch after this list.
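A minimal anti-affinity sketch for that last point (the app=my-app label is an assumption):

```yaml
# Hypothetical pod template snippet: keep replicas on different nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-app                        # assumed pod label
        topologyKey: kubernetes.io/hostname    # one replica per node
```

Swap the topologyKey for topology.kubernetes.io/zone to spread per availability zone instead.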
There are good answers already but I would like to add some more info.
It is very important to have a good strategy when calculating how much resources you would need for each container.
Optimally, your pods should be using exactly the amount of resources you requested but that's almost impossible to achieve. If the usage is lower than your request, you are wasting resources. If it's higher, you are risking performance issues. Consider a 25% margin up and down the request value as a good starting point. Regarding limits, achieving a good setting would depend on trying and adjusting. There is no optimal value that would fit everyone as it depends on many factors related to the application itself, the demand model, the tolerance to errors etc.
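As a sketch of that margin rule, with purely illustrative numbers:

```yaml
# If steady-state usage is observed around 200m, a request of 200m with a
# +/-25% acceptable band means usage between ~150m and ~250m is "on target".
resources:
  requests:
    cpu: 200m
  limits:
    cpu: 400m   # illustrative; tune by trying and adjusting, as noted above
```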
Kubernetes best practices: Resource requests and limits is a very good guide explaining the idea behind these mechanisms with a detailed explanation and examples.
Also, Managing Resources for Containers will provide you with the official docs regarding:
Requests and limits
Resource types
Resource requests and limits of Pod and Container
Resource units in Kubernetes
How Pods with resource requests are scheduled
How Pods with resource limits are run, etc
Just in case you'll need a reference.

Kubernetes "nice" a pod's CPU usage

I have a cluster w/ 3 nodes. Hurray! The cluster doesn't autoscale nodes.
These nodes run an amazing web app, yet most of the time do almost nothing.
I also have a background process that could use an infinite amount of CPU (the usefulness drops rapidly but remains useful).
I want these background pods to run on each Node, slowed down so they leave 20% CPU headroom on the Node. Or similar.
That's the shape of a DaemonSet.
Can I tell Kubernetes to deprioritize the DaemonSet Pods w/ a 20% headroom?
Can the DaemonSet Pods detect the Nodes CPU usage and deprioritize themselves (risky if buggy)?
QoS looks like it's for scheduling and evicting pods to make room for other pods, but they don't get 'niced'.
Priority also looks like it's for eviction.
You may achieve what you're looking for in many ways.
I imagine that you've already read this and that, based on the theory of this other.
Also, RedHat has nice documentation about setting hardware limits via software.
Here you can find how to restrict cpu usage, which may be set inside a container to achieve what you're looking for.
So, to recap: with K8S you can set requests and limits, and inside the container you can set even further restrictive limits.
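To make the Kubernetes side concrete, here is a minimal sketch under stated assumptions (a 4-core node; the values are illustrative): a low request keeps the background container's cpu.shares small, so the web app wins any contention, while the limit caps it at roughly 80% of the node.

```yaml
# Hypothetical DaemonSet container snippet, assuming a 4-core node.
resources:
  requests:
    cpu: 100m    # low request -> low cpu.shares, yields under contention
  limits:
    cpu: 3200m   # hard CFS cap at ~80% of 4 cores (the 20% headroom)
```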
Hope this gives you the solution or at least the path to follow in order to achieve what you want.

Pod CPU Throttling

I'm experiencing a strange issue when using CPU Requests/Limits in Kubernetes. Prior to setting any CPU Requests/Limits at all, all my services performed very well. I recently started placing some Resource Quotas to avoid future resource starvation. These values were set based on the actual usage of those services, but to my surprise, after they were added, some services started to increase their response time drastically. My first guess was that I might have set wrong Requests/Limits, but looking at the metrics revealed that in fact none of the services facing this issue were near those values. In fact, some of them were closer to the Requests than to the Limits.
Then I started looking at CPU throttling metrics and found that all my pods are being throttled. I then increased the limits for one of the services to 1000m (from 250m) and I saw less throttling in that pod, but I don't understand why I should set that higher limit if the pod wasn't reaching its old limit (250m).
So my question is: If I'm not reaching the CPU limits, why are my pods throttling? Why is my response time increasing if the pods are not using their full capacity?
Here there are some screenshots of my metrics (CPU Request: 50m, CPU Limit: 250m):
CPU Usage (here we can see the CPU of this pod never reached its limit of 250m):
CPU Throttling:
After setting limits to this pod to 1000m, we can observe less throttling
kubectl top
P.S: Before setting these Requests/Limits there wasn't throttling at all (as expected)
P.S 2: None of my nodes are facing high usage. In fact, none of them are using more than 50% of CPU at any time.
Thanks in advance!
If you look at the documentation, you'll see that when you issue a Request for CPU, it actually uses the --cpu-shares option in Docker, which in turn sets the cpu.shares attribute of the cpu,cpuacct cgroup on Linux. A value of 50m comes out to about --cpu-shares=51, based on 1024 being the share value for a full CPU, so 51 is roughly 5% of one core's share. That's pretty low to begin with, but the important factor here is that shares are relative to how many pods/containers you have on your node and what cpu-shares those have (are they using the default?).
So let's say that on your node you have another pod/container with 1024 shares (the default) and this pod/container with 51 shares. Under contention this container will get about 5% of the CPU time, while the other pod/container will get about 95% (if it has no limits). So again, it all depends on how many pods/containers you have on the node and what their shares are.
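To put the mapping in one place (the request value is just an example; the arithmetic follows the formula above):

```yaml
resources:
  requests:
    cpu: 50m   # cpu.shares = 1024 * 50 / 1000 ≈ 51
# Under contention with one default container:
#   this container:  51 / (51 + 1024) ≈ 4.7% of CPU time
#   default (1024): 1024 / (51 + 1024) ≈ 95.3%
```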
Also, not very well documented in the Kubernetes docs: if you use a Limit on a pod, it's basically using two flags in Docker, --cpu-period and --cpu-quota, which actually set the cpu.cfs_period_us and cpu.cfs_quota_us attributes of the cpu,cpuacct cgroup on Linux. This was introduced because cpu.shares didn't provide a limit, so you'd get cases where containers would grab most of the CPU.
So, as far as this limit is concerned, you may never hit it if other containers on the same node have no limits (or higher limits) but higher cpu.shares, because they will end up grabbing the idle CPU first. This could be what you are seeing, but again, it depends on your specific case.
A longer explanation for all of the above here.
Kubernetes uses the CFS (Completely Fair Scheduler) quota to enforce CPU limits on pod containers. See "How does the CPU Manager work" described in https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/ for further details.
The CFS is a Linux feature, added in the 2.6.23 kernel, which is based on two parameters: cpu.cfs_period_us and cpu.cfs_quota_us.
To visualize these two parameters, I'd like to borrow the following picture from Daniele Polencic from his excellent blog (https://twitter.com/danielepolencic/status/1267745860256841731):
If you configure a CPU limit in K8s it will set period and quota. If a process running in a container reaches the limit it is preempted and has to wait for the next period. It is throttled.
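Concretely, a limit maps to a quota like this (illustrative value, assuming the default 100 ms CFS period):

```yaml
resources:
  limits:
    cpu: 250m
# With the default cpu.cfs_period_us = 100000 (100 ms):
#   cpu.cfs_quota_us = 250/1000 * 100000 = 25000 (25 ms of CPU per period)
# After burning its 25 ms, the container waits (is throttled) until the
# next period, even if its average usage stays well below 250m.
```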
So this is the effect you are experiencing. The period-and-quota algorithm should not be thought of as a CPU limit below which processes run unthrottled.
The behavior is confusing, and a K8s issue exists for it: https://github.com/kubernetes/kubernetes/issues/67577
The recommendation given in https://github.com/kubernetes/kubernetes/issues/51135 is to not set CPU limits for pods that shouldn't be throttled.
TL;DR: remove your CPU limits. (Unless this alert fires on metrics-server, in which case that won't work.) CPU limits are actually a bad practice, not a best practice.
Why this happens
I'll focus on what to do, but first let me give a quick example showing why this happens:
Imagine a pod with a CPU limit of 100m which is equivalent to 1/10 vCPU.
The pod does nothing for 10 minutes.
Then it uses the CPU nonstop for 200ms. The usage during the burst is equivalent to 2/10 vCPU, hence the pod is over its limit and will be throttled.
On the other hand, the average CPU usage will be incredibly low.
In a case like this you'll be throttled, but the burst is so small (200 milliseconds) that it won't show up in any graphs.
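Putting illustrative numbers on that example (assuming the default 100 ms CFS period):

```yaml
# limit: 100m -> cpu.cfs_quota_us = 10000 (10 ms of CPU per 100 ms period)
# A full-speed 200 ms burst wants ~200 ms of CPU but is granted only 10 ms
# per period, so it is throttled for most of each period it spans.
# Averaged over the 10 idle minutes: 0.2 s / 600 s ≈ 0.03% usage,
# which is invisible on a typical dashboard.
```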
What to do
You actually don't want CPU limits in most cases because they prevent pods from using spare resources. There are Kubernetes maintainers on the record saying you shouldn't use CPU limits and should only set requests.
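In practice that means setting only a request, for example:

```yaml
resources:
  requests:
    cpu: 100m   # hypothetical value; used for scheduling and cpu.shares
  # no cpu limit: the container can borrow idle CPU without CFS throttling
```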
More info
I wrote a whole wiki page on why CPU throttling can occur despite low CPU usage and what to do about it. I also go into some common edge cases like how to deal with this for metrics-server which doesn't follow the usual rules.

How to set the right cpu millicores for a container?

I want to optimally configure the CPU cores without over- or under-allocation. How can I measure the required CPU millicores for a given container? It also raises the question of how much traffic a proxy should send to any given pod based on CPU consumption, so we can use the compute optimally.
Currently I send requests and monitor with:
kubectl top pod
Is there any tool that can measure requests, CPU, and memory over time and suggest an optimal CPU recommendation for the pods?
Monitoring over time and per Pod, yes - there are suggestions at https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/ One of the more popular is the Prometheus-Grafana combination - https://grafana.com/dashboards/315
As for automatic suggestion of the requests and limits, I don't think there is anything. Keep in mind Kubernetes already tries to balance giving each Pod what it needs without letting it take too much; the limits and requests that you set help it do this more safely. There are limitations on automatic inference, as an under-resourced Pod can still work but respond a bit slower - it is up to you to decide what level of slowness you would tolerate. It is also up to you to decide what level of resource consumption is acceptable at peak load, as opposed to excessive consumption that might indicate a bug in your app or even an attack. There's a further limitation: the metric units are themselves an approximation of resource power, which can vary with the type of hardware (memory and CPUs can differ in mode of operation as well as quantity) and so can vary across clusters, or even across nodes in a cluster if the hardware isn't all equal.
What you are doing with top seems to me a good way to get started. You'll want to monitor resource usage for the cluster anyway, so keeping track of this and adjusting limits as you go is a good idea. If you can run the same app outside of Kubernetes, and read around to see what other apps using the same language do, that can help indicate whether there's anything you can do to improve utilisation (memory consumption on the JVM in containers, for example, famously requires some tweaking to get right).