How to set the right cpu millicores for a container? - kubernetes

I want to optimally configure the CPU cores without over or under allocation. How can I measure the required CPU millicore for a given container? It also brings the question of how much traffic a proxy will send it to any given pod based on CPU consumption so we can optimally use the compute.
Currently I send requests and monitor with,
kubectl top pod
Is there any tool that can measure, Requests, CPU and Memory over the time and suggest the optimal CPU recommendation for the pods.

Monitoring over time and per Pod yes, there's suggestions at https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/ One of the more popular is the Prometheus-Grafana combination - https://grafana.com/dashboards/315
As for automatic suggestion of the request and limits, I don't think there is anything. Keep in mind Kubernetes already tries to balance giving each Pod what it needs without it taking too much. The limits and requests that you set are to help it do this more safely. There are limitations on automatically inference as an under-resourced Pod can still work but respond a bit slower - it is up to you to decide what level of slowness you would tolerate. It is also up to you to decide what level of resource consumption could be acceptable in peak load, as opposed to excessive consumption that might indicate a bug in your app or even an attack. There's a further limitation as the metric units are themselves an attempt to approximate resource power that can actually vary with types of hardware (memory and CPUs can differ in mode of operation as well as quantity) and so can vary across clusters or even nodes on a cluster if the hardware isn't all equal.
What you are doing with top seems to me a good way to get started. You'll want to monitor resource usage for the cluster anyway so keeping track of this and adjusting limits as you go is a good idea. If you can run the same app outside of kubernetes and read around to see what other apps using the same language do then that can help to indicate if there's anything you can do to improve utilisation (memory consumption on the JVM in containers for example famously requires some tweaking to get right).

Related

What size should the cpu limits be decided in k8s?

I did performance test to my application.
The usage of cpu was 30% while testing. - the test sever has 4 cores.
so I thought one core had 30/4 usage = 7.5%.
I want to a container in k8s to keep the usage of CPU under 30%
so I decided that the cpu limits was 250m + 50m(extra core).
I am wondering if this way is right? otherwise, is there the best other way to decide the limit of cpu?
To answer your question, I am wondering if this way is right?, yes, that's the right way. You can also use a tool called goldilocks, a kubernetes controller that collects data about running pods and provides recommendations on how to set resource requests and limits. It could help you identify a starting point for resource requests and limits.
I'm not sure if your calculations are correct, as CPU resources are defined in millicores. If your container needs two full cores to run, you would put the value “2000m”. If your container only needs ¼ of a core, you would put a value of “250m”.
So if you want 30% of 1 of your cores then 300m is the correct amount. But if you want 30% of your 4 cores, then I would say 1200m is the correct amount.
There is kubernetes documentation about that.
You could also consider using Vertical Pod Autoscaler.
Vertical Pod Autoscaler (VPA) frees the users from necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.
I would also recommend to read below tutorials about limits and requests:
Setting the right requests and limits in Kubernetes
Kubernetes best practices: Resource requests and limits

How to determine resource limit for Openshift Pods required for my tomcat application?

I've a web application (soap service) running in Tomcat 8 server in Openshift. The payload size is relatively small with 5-10 elements and the traffic is also small (300 calls per day, 5-10 max threads at a time). I'm little confused on the Pod resource restriction. How do I come up with min and max cpu and memory limits for each pod if I'm going to use min 1 and max 3 pods for my application?
It's tricky to configure accurate limitation value without performance test.
Because we don't expect your application is required how much resources process per requests. A good rule of thumb is to limit the resource based on heaviest workload on your environment. Memory limitation can trigger OOM-killer, so you should set up afforded value which is based on your tomcat heap and static memory size.
As opposed to CPU limitation will not kill your pod if reached the limitation value, but slow down the process speed.
My suggestion of each limitation value's starting point is as follows.
Memory: Tomcat(Java) memory size + 30% buffer
CPU: personally I think CPU limitation is useless to maximize the
process performance and efficiency. Even though CPU usage is afforded and the pod
can use full cpu resources to process the requests as soon as
possible at that time, the limitation setting can disturb it. But if
you should spread the resource usage evenly for suppressing some
aggressive resource eater, you can consider the CPU limitation.
This answer might not be what you want to, but I hope it help you to consider your capacity planning.

What is the use case of setting memory request less than limit in . K8s

I understand the use case of setting CPU request less than limit - it allows for CPU bursts in each container if the instances has free CPU hence resulting in max CPU utilization.
However, I cannot really find the use case for doing the same with memory. Most application don't release memory after allocating it, so effectively applications will request up to 'limit' memory (which has the same effect as setting request = limit). The only exception is containers running on an instance that already has all its memory allocated. I don't really see any pros in this, and the cons are more nondeterministic behaviour that is hard to monitor (one container having higher latencies than the other due to heavy GC).
Only use case I can think of is a shaded in memory cache, where you want to allow for a spike in memory usage. But even in this case one would be running the risk of of one of the nodes underperforming.
Maybe not a real answer, but a point of view on the subject.
The difference with the limit on CPU and Memory is what happens when the limit is reached. In case of the CPU, the container keeps running but the CPU usage is limited. If memory limit is reached, container gets killed and restarted.
In my use case, I often set the memory request to the amount of memory my application uses on average, and the limit to +25%. This allows me to avoid container killing most of the time (which is good) but of course it exposes me to memory overallocation (and this could be a problem as you mentioned).
Actually the topic you mention is interesting and in the meantime complex, just as Linux memory management is. As we know when the process is using more memory than the limit it will quickly move up on the potential "to-kill" process "ladder". Going further, the purpose of limit is to tell the kernel when it should consider the process to be potentially killed. Requests on the other hand are a direct statement "my container will need this much memory", but other than that they provide valuable information to the Scheduler about where can the Pod be scheduled (based on available Node resources).
If there is no memory request and high limit, Kubernetes will default the request to the limit (this might result in scheduling fail, even if the pods real requirements are met).
If you set a request, but not limit - container will use the default limit for namespace (if there is none, it will be able to use the whole available Node memory)
Setting memory request which is lower than limit you will give your pods room to have activity bursts. Also you make sure that a memory which is available for Pod to consume during boost is actually a reasonable amount.
Setting memory limit == memory request is not desirable simply because activity spikes will put it on a highway to be OOM killed by Kernel. The memory limits in Kubernetes cannot be throttled, if there is a memory pressure that is the most probable scenario (lets also remember that there is no swap partition).
Quoting Will Tomlin and his interesting article on Requests vs Limits which I highly recommend:
You might be asking if there’s reason to set limits higher than
requests. If your component has a stable memory footprint, you
probably shouldn’t since when a container exceeds its requests, it’s
more likely to be evicted if the worker node encounters a low memory
condition.
To summarize - there is no straight and easy answer. You have to determine your memory requirements and use monitoring and alerting tools to have control and be ready to change/adjust the configuration accordingly to needs.

Pod CPU Throttling

I'm experiencing a strange issue when using CPU Requests/Limits in Kubernetes. Prior to setting any CPU Requests/Limits at all, all my services performed very well. I recently started placing some Resource Quotas to avoid future resource starvation. These values were set based in the actual usage of those services, but to my surprise, after those were added, some services started to increase their response time drastically. My first guess was that I might placed wrong Requests/Limits, but looking at the metrics revealed that in fact none of the services facing this issue were near those values. In fact, some of them were closer to the Requests than the Limits.
Then I started looking at CPU throttling metrics and found that all my pods are being throttled. I then increased the limits for one of the services to 1000m (from 250m) and I saw less throttling in that pod, but I don't understand why I should set that higher limit if the pod wasn't reaching its old limit (250m).
So my question is: If I'm not reaching the CPU limits, why are my pods throttling? Why is my response time increasing if the pods are not using their full capacity?
Here there are some screenshots of my metrics (CPU Request: 50m, CPU Limit: 250m):
CPU Usage (here we can see the CPU of this pod never reached its limit of 250m):
CPU Throttling:
After setting limits to this pod to 1000m, we can observe less throttling
kubectl top
P.S: Before setting these Requests/Limits there wasn't throttling at all (as expected)
P.S 2: None of my nodes are facing high usage. In fact, none of them are using more than 50% of CPU at any time.
Thanks in advance!
If you see the documentation you see when you issue a Request for CPUs it actually uses the --cpu-shares option in Docker which actually uses the cpu.shares attribute for the cpu,cpuacct cgroup on Linux. So a value of 50m is about --cpu-shares=51 based on the maximum being 1024. 1024 represents 100% of the shares, so 51 would be 4-5% of the share. That's pretty low, to begin with. But the important factor here is that this relative to how many pods/container you have on your system and what cpu-shares those have (are they using the default).
So let's say that on your node you have another pod/container with 1024 shares which is the default and you have this pod/container with 4-5 shares. Then this container will get about get about 0.5% CPU, while the other pod/container will
get about 99.5% of the CPU (if it has no limits). So again it all depends on how many pods/container you have on the node and what their shares are.
Also, not very well documented in the Kubernetes docs, but if you use Limit on a pod it's basically using two flags in Docker: --cpu-period and --cpu--quota which actually uses the cpu.cfs_period_us and the cpu.cfs_quota_us attributes for the cpu,cpuacct cgroup on Linux. This was introduced to the fact that cpu.shares didn't provide a limit so you'd spill over cases where containers would grab most of the CPU.
So, as far as this limit is concerned you will never hit it if you have other containers on the same node that don't have limits (or higher limits) but have a higher cpu.shares because they will end up optimizing and picking idle CPU. This could be what you are seeing, but again depends on your specific case.
A longer explanation for all of the above here.
Kubernetes uses (Completely Fair Scheduler) CFS quota to enforce CPU limits on pod containers. See "How does the CPU Manager work" described in https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/ for further details.
The CFS is a Linux feature, added with the 2.6.23 kernel, which is based on two parameters: cpu.cfs_period_us and cpu.cfs_quota
To visualize these two parameters, I'd like to borrow the following picture from Daniele Polencic from his excellent blog (https://twitter.com/danielepolencic/status/1267745860256841731):
If you configure a CPU limit in K8s it will set period and quota. If a process running in a container reaches the limit it is preempted and has to wait for the next period. It is throttled.
So this is the effect, which you are experiencing. The period and quota algorithm should not be considered to be a CPU limit, where processes are unthrottled, if not reached.
The behavior is confusing, and also a K8s issue exist for this: https://github.com/kubernetes/kubernetes/issues/67577
The recommendation given in https://github.com/kubernetes/kubernetes/issues/51135 is to not set CPU limits for pods that shouldn't be throttled.
TLDR: remove your CPU limits. (Unless this alert fires on metrics-server in which case that wont work.) CPU limits are actually a bad-practice and not a best-practice.
Why this happens
I'll focus on what to do, but first let me give a quick example showing why this happens:
Imagine a pod with a CPU limit of 100m which is equivalent to 1/10 vCPU.
The pod does nothing for 10 minutes.
Then it uses the CPU nonstop for 200ms. The usage during the burst is equivalent to 2/10 vCPU, hence the pod is over it's limit and will be throttled.
On the other hand, the average CPU usage will be incredibly low.
In a case like this you'll be throttled but the burst is so small (200 milliseconds) that it wont show up in any graphs.
What to do
You actually don't want CPU limits in most cases because they prevent pods from using spare resources. There are Kubernetes maintainers on the record saying you shouldn't use CPU limits and should only set requests.
More info
I wrote a whole wiki page on why CPU throttling can occur despite low CPU usage and what to do about it. I also go into some common edge cases like how to deal with this for metrics-server which doesn't follow the usual rules.

Multiple node pools vs single pool with many machines vs big machines

We're moving all of our infrastructure to Google Kubernetes Engine (GKE) - we currently have 50+ AWS machines with lots of APIs, Services, Webapps, Database servers and more.
As we have already dockerized everything, it's time to start moving everything to GKE.
I have a question that may sound too basic, but I've been searching the Internet for a week and did not found any reasonable post about this
Straight to the point, which of the following approaches is better and why:
Having multiple node pools with multiple machine types and always specify in which pool each deployment should be done; or
Having a single pool with lots of machines and let Kubernetes scheduler do the job without worrying about where my deployments will be done; or
Having BIG machines (in multiple zones to improve clusters' availability and resilience) and let Kubernetes deploy everything there.
List of consideration to be taken merely as hints, I do not pretend to describe best practice.
Each pod you add brings with it some overhead, but you increase in terms of flexibility and availability making failure and maintenance of nodes to be less impacting the production.
Nodes too small would cause a big waste of resources since sometimes will be not possible to schedule a pod even if the total amount of free RAM or CPU across the nodes would be enough, you can see this issue similar to memory fragmentation.
I guess that the sizes of PODs and their memory and CPU request are not similar, but I do not see this as a big issue in principle and a reason to go for 1). I do not see why a big POD should run merely on big machines and a small one should be scheduled on small nodes. I would rather use 1) if you need a different memoryGB/CPUcores ratio to support different workloads.
I would advise you to run some test in the initial phase to understand which is the size of the biggest POD and the average size of the workload in order to properly chose the machine types. Consider that having 1 POD that exactly fit in one node and assign to it is not the right to proceed(virtual machine exist for this kind of scenario). Since fragmentation of resources would easily cause to impossibility to schedule a large node.
Consider that their size will likely increase in the future and to scale vertically is not always this immediate and you need to switch off machine and terminate pods, I would oversize a bit taking this issue into account and since scaling horizontally is way easier.
Talking about the machine type you can decide to go for a machine 5xsize the biggest POD you have (or 3x? or 10x?). Oversize a bit as well the numebr of nodes of the cluster to take into account overheads, fragmentation and in order to still have free resources.
Remember that you have an hard limit of 100 pods each node and 5000 nodes.
Remember that in GCP the network egress throughput cap is dependent on the number of vCPUs that a virtual machine instance has. Each vCPU has a 2 Gbps egress cap for peak performance. However each additional vCPU increases the network cap, up to a theoretical maximum of 16 Gbps for each virtual machine.
Regarding the prices of the virtual machines notice that there is no difference in price buying two machines with size x or one with size 2x. Avoid to customise the size of machines because rarely is convenient, if you feel like your workload needs more cpu or mem go for HighMem or HighCpu machine type.
P.S. Since you are going to build a pretty big Cluster, check the size of the DNS
I will add any consideration that it comes to my mind, consider in the future to update your question with the description of the path you chose and the issue you faced.
1) makes a lot of sense as if you want, you can still allow kube deployments treat it as one large pool (by not adding nodeSelector/NodeAffinity) but you can have different machines of different sizes, you can think about having a pool of spot instances, etc. And, after all, you can have pools that are tainted and so forth excluded from normal scheduling and available to only a particular set of workloads. It is in my opinion preferred to have some proficiency with this approach from the very beginning, yet in case of many provisioners it should be very easy to migrate from 2) to 1) anyway.
2) As explained above, it's effectively a subset of 1) so better to build up exp with 1) approach from day 1, but if you ensure your provisioning solution supports easy extension to 1) model then you can get away with starting with this simplified approach.
3) Big is nice, but "big" is relative. It depends on the requirements and amount of your workloads. Remember that while you need to plan for loss of a whole AZ anyway, it will be much more frequent to loose single nodes (reboots, decommissions of underlying hardware, updates etc.) so if you have more hosts, impact of loosing one will be smaller. Bottom line is that you need to find your own balance, that makes sense for your particular scale. Maybe 50 nodes is too much, would 15 cut it? Who knows but you :)