How should CPU limits be sized in k8s? - kubernetes

I ran a performance test on my application.
CPU usage was 30% during the test. The test server has 4 cores.
So I thought each core had 30/4 = 7.5% usage.
I want a container in k8s to keep its CPU usage under 30%,
so I decided that the CPU limit should be 250m + 50m (extra).
Is this the right approach? If not, is there a better way to decide the CPU limit?

To answer your question ("I am wondering if this way is right?"): yes, that's the right way. You can also use a tool called goldilocks, a Kubernetes controller that collects data about running pods and provides recommendations on how to set resource requests and limits. It could help you identify a starting point for resource requests and limits.
I'm not sure if your calculations are correct, as CPU resources are defined in millicores. If your container needs two full cores to run, you would put the value “2000m”. If your container only needs ¼ of a core, you would put a value of “250m”.
So if you want 30% of 1 of your cores then 300m is the correct amount. But if you want 30% of your 4 cores, then I would say 1200m is the correct amount.
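As a quick check of the arithmetic above, here is a minimal sketch of the conversion. The function name is made up for illustration; the only fact it relies on is that 1 core equals 1000m.

```python
def percent_of_cores_to_millicores(percent: float, cores: float = 1) -> int:
    """Convert 'percent of N cores' into Kubernetes millicores.

    1 core = 1000m, so 1% of one core = 10m.
    """
    return round(percent * cores * 10)


print(percent_of_cores_to_millicores(30, cores=1))  # 300  -> "300m"
print(percent_of_cores_to_millicores(30, cores=4))  # 1200 -> "1200m"
```

So 30% measured across a 4-core machine corresponds to about 1200m, not 300m.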
There is kubernetes documentation about that.
You could also consider using Vertical Pod Autoscaler.
Vertical Pod Autoscaler (VPA) frees the users from necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.
I would also recommend reading the tutorials below about limits and requests:
Setting the right requests and limits in Kubernetes
Kubernetes best practices: Resource requests and limits

Related

High total CPU request but low total usage (kubernetes resources)

I have a bunch of pods in a cluster that are requesting almost all (7.35/8) of the available CPU resources on a node,
even though their actual total usage is almost nothing (0.34/8).
The pod that is currently requesting the most only requests 210m which I guess is not an outrageous amount - also I would like to enforce some sensible minimum request size for all pods in the cluster. Of course that will accumulate when there are lots of pods.
It seems I could easily scale down the request by a factor of 10 and leave the limits where they are to begin with.
But is there something else that I should look into instead before doing that - reducing replica count etc.?
Also it looks a bit strange that the pods are not more evenly distributed between the nodes.
Your request values seem overestimated.
You need time and metrics to find the right request/limit for your workload.
Keep in mind that if you change those values, your pods will restart.
Also, it's normal to find some unbalanced nodes in your cluster. Kubernetes will never remove a pod if you don't ask.
For example, if you create a cluster with 3 nodes, fill those 3 nodes with pods, and then add another 3 nodes, the new nodes will stay empty.
You can set up a HorizontalPodAutoscaler on your cluster to adapt the number of pods to your workload.
Doing that, your workload will spread among nodes with a correct balance (if you use the default Scheduling Policy).
I suggest following:
Resource Allocation: Based on historical values, set your request to a meaningful value with a buffer. Also, to get guaranteed pod resource allocation, it may be a good idea to set request and limit to the same value. But that means your pod cannot burst to use additional resources. One more thing to note is that scheduling only happens based on the requested value, so if the node has no more resources left when your pod tries to burst up to its limit, the pod may be killed and rescheduled.
Resource quotas: Check Kubernetes Resource Quotas to have sensible namespace level quotas to control overly provisioned resources by developers
Affinity/AntiAffinity: Check the concept of anti-affinity to have your replicas or different pods scheduled across your cluster. You can ensure, e.g., that one host or availability zone can have only one replica of your pod (helps with HA), or spread different pods to different nodes (layer scheduling etc.) - Check this video
There are good answers already but I would like to add some more info.
It is very important to have a good strategy when calculating how many resources you need for each container.
Optimally, your pods should be using exactly the amount of resources you requested but that's almost impossible to achieve. If the usage is lower than your request, you are wasting resources. If it's higher, you are risking performance issues. Consider a 25% margin up and down the request value as a good starting point. Regarding limits, achieving a good setting would depend on trying and adjusting. There is no optimal value that would fit everyone as it depends on many factors related to the application itself, the demand model, the tolerance to errors etc.
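To make the 25% margin concrete, here is a small sketch that compares observed usage with the request. The function name and the three labels are illustrative assumptions, not part of any Kubernetes tooling.

```python
def classify_usage(usage_m: float, request_m: float, margin: float = 0.25) -> str:
    """Compare observed CPU usage with the request, allowing +/- margin."""
    low = request_m * (1 - margin)
    high = request_m * (1 + margin)
    if usage_m < low:
        return "over-requested"   # usage well below request: wasted resources
    if usage_m > high:
        return "under-requested"  # usage well above request: performance risk
    return "ok"


print(classify_usage(usage_m=40, request_m=210))   # over-requested
print(classify_usage(usage_m=200, request_m=210))  # ok
print(classify_usage(usage_m=300, request_m=210))  # under-requested
```

In practice the usage figure would come from your metrics over a representative period, not from a single sample.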
Kubernetes best practices: Resource requests and limits is a very good guide explaining the idea behind these mechanisms with a detailed explanation and examples.
Also, Managing Resources for Containers will provide you with the official docs regarding:
Requests and limits
Resource types
Resource requests and limits of Pod and Container
Resource units in Kubernetes
How Pods with resource requests are scheduled
How Pods with resource limits are run, etc
Just in case you'll need a reference.

Kubernetes "nice" a pod's CPU usage

I have a cluster w/ 3 nodes. Hurray! The cluster doesn't autoscale nodes.
These nodes run an amazing web app, yet most of the time do almost nothing.
I also have a background process that could use an infinite amount of CPU (the usefulness drops rapidly but remains useful).
I want these background pods to run on each Node and slowed down to leave a 20% CPU headroom on the Node. Or similar.
That's the shape of a DaemonSet.
Can I tell Kubernetes to deprioritize the DaemonSet Pods w/ a 20% headroom?
Can the DaemonSet Pods detect the Node's CPU usage and deprioritize themselves (risky if buggy)?
QoS looks like it's for scheduling and evicting pods to make room for other pods, but they don't get 'niced'.
Priority also looks like it's for eviction.
You may achieve what you're looking for in many ways.
I imagine that you've already read this and that, based on the theory of this other.
Also, RedHat has nice documentation about setting hardware limits via software.
Here you can find how to restrict cpu usage, which may be set inside a container to achieve what you're looking for.
So, to recap: with K8S you can set requests and limits, and inside the container you can set even more restrictive limits.
Hope this gives you the solution or at least the path to follow in order to achieve what you want.

Pod CPU Throttling

I'm experiencing a strange issue when using CPU Requests/Limits in Kubernetes. Prior to setting any CPU Requests/Limits at all, all my services performed very well. I recently started placing some Resource Quotas to avoid future resource starvation. These values were set based on the actual usage of those services, but to my surprise, after those were added, the response time of some services increased drastically. My first guess was that I might have set the wrong Requests/Limits, but looking at the metrics revealed that none of the services facing this issue were near those values. In fact, some of them were closer to the Requests than the Limits.
Then I started looking at CPU throttling metrics and found that all my pods are being throttled. I then increased the limits for one of the services to 1000m (from 250m) and I saw less throttling in that pod, but I don't understand why I should set that higher limit if the pod wasn't reaching its old limit (250m).
So my question is: If I'm not reaching the CPU limits, why are my pods throttling? Why is my response time increasing if the pods are not using their full capacity?
Here are some screenshots of my metrics (CPU Request: 50m, CPU Limit: 250m):
CPU Usage (here we can see the CPU of this pod never reached its limit of 250m):
CPU Throttling:
After setting limits to this pod to 1000m, we can observe less throttling
kubectl top
P.S: Before setting these Requests/Limits there wasn't throttling at all (as expected)
P.S 2: None of my nodes are facing high usage. In fact, none of them are using more than 50% of CPU at any time.
Thanks in advance!
According to the documentation, when you set a CPU Request, Kubernetes uses the --cpu-shares option in Docker, which in turn sets the cpu.shares attribute of the cpu,cpuacct cgroup on Linux. So a value of 50m is about --cpu-shares=51, based on 1024 shares corresponding to one full core (1000m). 1024 represents 100% of that share, so 51 would be 4-5% of it. That's pretty low to begin with. But the important factor here is that this is relative to how many pods/containers you have on your system and what cpu-shares those have (are they using the default?).
So let's say that on your node you have another pod/container with 1024 shares, which is the default, and you have this pod/container with 51 shares. When both compete for CPU, this container will get about 4.7% of it (51/1075), while the other pod/container will get about 95.3% (if it has no limits). So again, it all depends on how many pods/containers you have on the node and what their shares are.
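A rough sketch of that share arithmetic is below. The function names are made up for illustration. It assumes cgroup v1 style cpu.shares, where 1000m maps to 1024 shares; the floor of 2 shares is an assumption about kubelet behaviour. Shares only decide the split when containers are actually competing for CPU.

```python
def millicores_to_shares(millicores: int) -> int:
    """Map a CPU request in millicores to cpu.shares (1000m -> 1024 shares)."""
    return max(2, millicores * 1024 // 1000)  # floor of 2 is an assumption


def cpu_split_under_contention(shares: dict) -> dict:
    """Fraction of CPU each container gets when all of them are busy."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


small = millicores_to_shares(50)
print(small)  # 51

split = cpu_split_under_contention({"small": small, "default": 1024})
print(round(split["small"] * 100, 1))  # 4.7

tiny = cpu_split_under_contention({"tiny": 5, "default": 1024})
print(round(tiny["tiny"] * 100, 1))  # 0.5
```

So a 50m request (51 shares) competing with a 1024-share container works out to roughly 4.7% of the CPU, while a figure near 0.5% would correspond to a container holding only about 5 shares.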
Also, not very well documented in the Kubernetes docs, but if you use Limit on a pod it's basically using two flags in Docker: --cpu-period and --cpu-quota, which set the cpu.cfs_period_us and cpu.cfs_quota_us attributes of the cpu,cpuacct cgroup on Linux. This was introduced because cpu.shares didn't provide a limit, so there were spill-over cases where containers would grab most of the CPU.
So, as far as this limit is concerned you will never hit it if you have other containers on the same node that don't have limits (or higher limits) but have a higher cpu.shares because they will end up optimizing and picking idle CPU. This could be what you are seeing, but again depends on your specific case.
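For reference, the mapping from a CPU limit to the CFS quota can be sketched as follows. This assumes the default period of 100ms (100000 microseconds); the function name is hypothetical.

```python
CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us (100 ms)


def limit_to_cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a container may use in each period."""
    return limit_millicores * period_us // 1000


print(limit_to_cfs_quota_us(250))   # 25000  -> 25 ms of CPU per 100 ms
print(limit_to_cfs_quota_us(1000))  # 100000 -> one full core
```

With a 250m limit, the container gets 25ms of CPU time in every 100ms window, however idle it was in the previous windows.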
A longer explanation for all of the above here.
Kubernetes uses CFS (Completely Fair Scheduler) quota to enforce CPU limits on pod containers. See "How does the CPU Manager work" described in https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/ for further details.
CFS is a Linux feature, added with the 2.6.23 kernel. Its quota mechanism is based on two parameters: cpu.cfs_period_us and cpu.cfs_quota_us.
To visualize these two parameters, I'd like to borrow the following picture from Daniele Polencic from his excellent blog (https://twitter.com/danielepolencic/status/1267745860256841731):
If you configure a CPU limit in K8s it will set period and quota. If a process running in a container reaches the limit it is preempted and has to wait for the next period. It is throttled.
So this is the effect you are experiencing. The period and quota mechanism should not be thought of as a simple cap on average CPU usage, below which processes are never throttled.
The behavior is confusing, and a K8s issue exists for it: https://github.com/kubernetes/kubernetes/issues/67577
The recommendation given in https://github.com/kubernetes/kubernetes/issues/51135 is to not set CPU limits for pods that shouldn't be throttled.
TLDR: remove your CPU limits. (Unless this alert fires on metrics-server, in which case that won't work.) CPU limits are actually a bad practice, not a best practice.
Why this happens
I'll focus on what to do, but first let me give a quick example showing why this happens:
Imagine a pod with a CPU limit of 100m which is equivalent to 1/10 vCPU.
The pod does nothing for 10 minutes.
Then it uses the CPU nonstop for 200ms. Measured over one second, the usage during the burst is equivalent to 2/10 vCPU, hence the pod is over its limit and will be throttled.
On the other hand, the average CPU usage will be incredibly low.
In a case like this you'll be throttled, but the burst is so small (200 milliseconds) that it won't show up in any graphs.
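The example can be played through with a simplified model of the quota mechanism. This is only a sketch under stated assumptions: a single-threaded burst that starts at the beginning of a period, the default 100ms period, and a made-up function name. Real scheduling is more involved.

```python
def simulate_burst(limit_millicores: int, burst_cpu_ms: int, period_ms: int = 100):
    """Return (throttled_periods, wall_clock_ms) for one CPU burst."""
    quota_ms = limit_millicores * period_ms // 1000  # CPU time allowed per period
    if quota_ms <= 0:
        raise ValueError("limit too small for this simplified model")
    remaining = burst_cpu_ms
    throttled_periods = 0
    wall_ms = 0
    while remaining > 0:
        # One thread cannot use more than the period itself, nor more than the quota.
        used = min(remaining, quota_ms, period_ms)
        remaining -= used
        if remaining > 0:
            # Work is left over: wait for the next period to start.
            if quota_ms < period_ms:
                throttled_periods += 1  # quota ran out before the period ended
            wall_ms += period_ms
        else:
            wall_ms += used
    return throttled_periods, wall_ms


print(simulate_burst(100, 200))   # (19, 1910)
print(simulate_burst(1000, 200))  # (0, 200)
```

Under these assumptions, 200ms of work under a 100m limit is throttled in 19 consecutive periods and takes about 1.9 seconds of wall-clock time, even though average usage over the preceding 10 minutes is close to zero.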
What to do
You actually don't want CPU limits in most cases because they prevent pods from using spare resources. There are Kubernetes maintainers on the record saying you shouldn't use CPU limits and should only set requests.
More info
I wrote a whole wiki page on why CPU throttling can occur despite low CPU usage and what to do about it. I also go into some common edge cases like how to deal with this for metrics-server which doesn't follow the usual rules.

How to set the right cpu millicores for a container?

I want to optimally configure the CPU cores without over- or under-allocation. How can I measure the required CPU millicores for a given container? It also raises the question of how much traffic a proxy will send to any given pod based on CPU consumption, so we can use the compute optimally.
Currently I send requests and monitor with,
kubectl top pod
Is there any tool that can measure requests, CPU, and memory over time and suggest an optimal CPU setting for the pods?
For monitoring over time and per Pod, yes, there are suggestions at https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/ and one of the more popular is the Prometheus-Grafana combination - https://grafana.com/dashboards/315
As for automatic suggestion of the requests and limits, I don't think there is anything. Keep in mind Kubernetes already tries to balance giving each Pod what it needs without it taking too much. The limits and requests that you set are to help it do this more safely. There are limitations on automatic inference, as an under-resourced Pod can still work but respond a bit slower - it is up to you to decide what level of slowness you would tolerate. It is also up to you to decide what level of resource consumption could be acceptable in peak load, as opposed to excessive consumption that might indicate a bug in your app or even an attack. There's a further limitation, as the metric units are themselves an attempt to approximate resource power that can actually vary with types of hardware (memory and CPUs can differ in mode of operation as well as quantity) and so can vary across clusters or even nodes on a cluster if the hardware isn't all equal.
What you are doing with top seems to me a good way to get started. You'll want to monitor resource usage for the cluster anyway so keeping track of this and adjusting limits as you go is a good idea. If you can run the same app outside of kubernetes and read around to see what other apps using the same language do then that can help to indicate if there's anything you can do to improve utilisation (memory consumption on the JVM in containers for example famously requires some tweaking to get right).

Is there any tool for GKE node autoscaling based on total pods requested in kubernetes?

When I resize a replication controller using kubectl and the cluster does not have enough resources, one or more pods stay in Pending indefinitely.
Is there any tool that will automatically resize a GKE cluster when resources are running out?
I had a similar requirement (for the Go build system): wanted to know when scheduled vs. available CPU or memory was > 1, and scale out nodes when that was true (or, more accurately, when it was ~.8). There's not a built-in metric, but as you suggest you can do it with a custom metric.
This was all done in Go, but it will give you the basic idea:
Create the metrics (memory and CPU, in my case)
Put values to the metrics
The key takeaway IMO is that you have to iterate over each pod in the cluster to determine how much capacity is consumed, then iterate over each node in the cluster to determine how much capacity is available. It's then just a matter of pointing your autoscaler to the custom metric(s).
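The iteration described above can be sketched roughly as follows. This is not the original Go code: it uses plain dictionaries in place of real API objects, and the field names, function names, and sample numbers are all hypothetical. The 0.8 threshold is the one mentioned earlier in this answer.

```python
def requested_vs_allocatable(pods: list, nodes: list) -> float:
    """Ratio of CPU requested by all pods to CPU allocatable on all nodes."""
    requested = sum(pod["cpu_request_m"] for pod in pods)
    allocatable = sum(node["cpu_allocatable_m"] for node in nodes)
    if allocatable == 0:
        raise ValueError("no allocatable CPU in the cluster")
    return requested / allocatable


def should_scale_out(pods: list, nodes: list, threshold: float = 0.8) -> bool:
    """True when the scheduled/available ratio reaches the threshold."""
    return requested_vs_allocatable(pods, nodes) >= threshold


# Hypothetical sample data: 7 pods of 500m each on a single 4-core node.
pods = [{"cpu_request_m": 500} for _ in range(7)]
nodes = [{"cpu_allocatable_m": 4000}]
print(requested_vs_allocatable(pods, nodes))  # 0.875
print(should_scale_out(pods, nodes))          # True
```

In a real setup the pod and node lists would come from the Kubernetes API, and the resulting ratio would be published as the custom metric the autoscaler watches.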
Big big big thing worth noting: I ultimately determined that scaling on the built-in CPU utilization metric was just as good as (if not better than, but more on that in a bit) the custom metric. Each pod we scheduled pegged the CPU, so when pods were maxed out so was CPU. The built-in CPU utilization metric is probably better because you don't have the latency that comes with periodically putting custom metrics.
You can turn on autoscaling for the Instance Group that your GKE nodes belong to.