Allocatable resources not taken into account on one Kubernetes node

I have an on-premises K8s cluster with five nodes running 1.19.8. For some time now, one of those nodes has been under a lot more pressure than the other four.
That node has twice as many CPU cores as the others, but the same amount of memory.
When I describe it, I see the same amount of memory and the same pod limit of 110.
Last night I had a look at it and it was running more than 130 pods! Sometimes it has nearly 98% of its memory used whilst the other nodes have 60 to 70% used and way fewer pods assigned. But it still gets assigned new ones.
At some point it hits a SystemOOM and the OS starts killing processes...
Does anyone have an idea what could be going wrong here, why this is happening, or where I could start checking?
Thanks in advance!

Related

How to setup Kubernetes HPA to scale based on maximum available memory in a given pod?

I’d like to autoscale the pods based not on the average memory, but on the largest amount of available memory in any single pod.
Example:
Let’s say the target maximum available memory is 50%.
If we have 7 pods already and 6 of them have 90% occupied memory, but a single pod with 40% occupied memory, that’d satisfy my criteria and we won’t need to upscale. But the moment that last pod goes below 50% available memory we’ll upscale.
I know it’s not a wise criterion for scaling in the majority of cases, but in my particular circumstance it fits.
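To make the rule above concrete, here is a minimal sketch of the decision in Python. It only illustrates the criterion; it is not an HPA configuration. As far as I know the stock resource metric in HPA works on an average across pods, so a rule like this would presumably have to be fed in as a custom or external metric. The function name and the sample numbers are made up for illustration.

```python
def should_scale_up(occupied_pct, target_available_pct=50.0):
    """Return True when even the emptiest pod has less free memory than the target."""
    # Available memory per pod is whatever is not occupied.
    available = [100.0 - occupied for occupied in occupied_pct]
    # Scale up only when the largest available share drops below the target.
    return max(available) < target_available_pct


# Six pods at 90% occupied, one at 40% occupied (60% available): no upscale.
pods = [90, 90, 90, 90, 90, 90, 40]
print(should_scale_up(pods))

# The last pod now has only 45% available: upscale.
pods[-1] = 55
print(should_scale_up(pods))
```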

Can I set the pod to use max request CPU from the beginning?

I am using Openshift 4, CPU Request: 0.2, Limit 0.4.
From the monitoring, I can see that the CPU usage started at 0.1 and increased gradually. Is that because there is a mechanism to prevent over-reserving CPU?
Can I set up the pod so that it uses the maximum requested CPU from the beginning and ramps up to the limit as fast as possible?

The max limit is already available from the beginning (presuming that the node has the CPU available to give). OCP uses CFS to enforce that limit, and CFS has nothing that kicks in gradually: the only thing it considers is the configured limit.
As for why you are seeing this in your monitoring, I'm not sure. But my first guess would be that that graph is using a moving average. (And thus, since it's a moving average it will converge towards the actual usage.)
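To illustrate that guess, here is a small sketch of how a trailing moving average turns an instant jump in usage into a gradual ramp. The window size and the sample values are assumptions picked for the example; I don't know what the OpenShift graph actually uses.

```python
from collections import deque


def moving_average(samples, window):
    """Trailing moving average; early points average over fewer samples."""
    buf = deque(maxlen=window)
    smoothed = []
    for sample in samples:
        buf.append(sample)
        smoothed.append(sum(buf) / len(buf))
    return smoothed


# Hypothetical raw usage: idle for 5 samples, then immediately at the 0.4 limit.
raw = [0.0] * 5 + [0.4] * 10
smoothed = moving_average(raw, window=5)

for raw_value, avg in zip(raw, smoothed):
    print(f"raw={raw_value:.2f}  graphed={avg:.2f}")
```

The raw series jumps straight to 0.4, while the graphed value needs a full window of samples to get there.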

Opensearch: Data node costs

I don't understand the costs of having 1 data node vs having 2 or more data nodes.
Will I have the same cost regardless of the number of nodes?
If I have 2 data nodes, that means that I will have double the cost of the instances?
Thanks

Depends on the instance size: i3.2xlarge would be ~2x more expensive than i3.xlarge.
If you use one instance size then yes, 2 nodes would be 2x more expensive than 1 node but you'll get more resilience (if one node goes down your cluster can still get updates and serve data) and rolling restarts.
That said, master election in OpenSearch works most reliably with an odd number of master-eligible nodes, so 3 smaller nodes might be better than 2 larger ones.

Set cpu requests in K8s for fluctuating load

I have a service deployed in Kubernetes and I am trying to optimize the requested cpu resources.
For now, I have deployed 10 instances and set spec.containers[].resources.limits.cpu to 0.1, based on the "average" use. However, it became obvious that this average is rather useless in practice because under sustained load the CPU usage increases significantly (to 0.3-0.4 as far as I can tell).
What happens consequently, when multiple instances are deployed on the same node, is that this node is heavily overloaded; pods are no longer responsive, are killed and restarted etc.
What is the best practice to find a good value? My current best guess is to increase the requested cpu to 0.3 or 0.4; I'm looking at Grafana visualizations and see that the pods on the heavily loaded node(s) converge there under continuous load.
However, how can I know whether they would use even more CPU if they could, before they become unresponsive because the node is overloaded?
I'm actually trying to understand how to approach this in general. I would expect an "ideal" service (presuming it is CPU-focused) to use close to 0.0 when there is no load, and close to 1.0 when requests are constantly coming in. With that assumption, should I set the cpu.requests to 1.0, taking a perspective where actual constant usage is assumed?
I have read some Kubernetes best practice guides, but none of them seem to address how to set the actual value for cpu requests in practice in more depth than "find an average".

Basically come up with a number that is your lower acceptable bound for how much the process runs. Setting a request of 100m means that you are okay with a lower limit of your process running 0.1 seconds for every 1 second of wall time (roughly). Normally that should be some kind of average utilization, usually something like a P99 or P95 value over several days or weeks. Personally I usually look at a chart of P99, P80, and P50 (median) over 30 days and use that to decide on a value.
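As a rough sketch of that last step, this is how the percentiles could be computed from exported usage samples with Python's standard library. The sample values are invented, and in practice you would pull far more points (several days or weeks) from your metrics system.

```python
import statistics


def cpu_percentiles(samples_cores):
    """Return P50, P80, P95 and P99 of CPU usage samples given in cores."""
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_cores, n=100, method="inclusive")
    return {f"P{p}": cuts[p - 1] for p in (50, 80, 95, 99)}


# Hypothetical usage samples in cores (0.1 cores corresponds to a 100m request).
samples = [0.05, 0.06, 0.08, 0.07, 0.10, 0.09, 0.12, 0.30, 0.35, 0.08]
for name, value in cpu_percentiles(samples).items():
    print(f"{name}: {value:.3f} cores = {round(value * 1000)}m")
```

Comparing the median with P95/P99 shows how spiky the workload is, which is what makes the choice of request a judgment call.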
Limits are a different beast: they set your CPU timeslice quota. This subsystem in Linux has some persistent bugs, so unless you've specifically vetted your kernel as correct, I don't recommend using it for anything but the most hostile of programs.
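For a feel of what "timeslice quota" means, here is a minimal sketch of the arithmetic. It assumes the commonly used CFS period of 100 ms; the function names are hypothetical, and real throttling behaviour depends on the kernel and on how many threads the container runs.

```python
def cfs_quota_us(limit_cores, period_us=100_000):
    """CPU time (microseconds) the container may use per scheduling period."""
    return int(round(limit_cores * period_us))


def throttled_us(demand_us, limit_cores, period_us=100_000):
    """Run time the container wants in one period but is not allowed to use."""
    return max(0, demand_us - cfs_quota_us(limit_cores, period_us))


# A limit of 0.4 cores allows 40 ms of CPU time in every 100 ms period.
print(cfs_quota_us(0.4))

# A burst that needs 60 ms of CPU within one period loses 20 ms to throttling.
print(throttled_us(60_000, 0.4))
```

This is why a limit can hurt latency even when the average usage sits well below it: the quota is enforced per period, not on the average.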

In a nutshell: Main goal is to understand how much traffic a pod can handle and how much resource it consumes to do so.
CPU limits are hard to understand and can be harmful, so you might want to avoid them; see the static policy documentation and the relevant GitHub issue.
To dimension your CPU requests, you will first want to understand how much a pod can consume under high load. To do this you can:
- disable all kinds of autoscaling (HPA, vertical pod autoscaler, ...)
- set the number of replicas to one
- lift the CPU limits
- request the highest amount of CPU you can on a node (usually 3.2 on 4-CPU nodes)
- send as much traffic as you can to the application (you can build simple load-test scenarios with locust, for example)
You will eventually end up with a ratio clients-or-requests-per-sec/cpu-consumed. You can suppose the relation is linear (this might not be true if your workload complexity is O(n^2) with n the number of clients connected, but this is not the nominal case).
You can then choose the pod resource requests based on the ratio you measured. For example, if you consume 1.2 CPU for 1000 requests per second, you know that you can give each pod 1 CPU and it will handle roughly 800 requests per second (1000 / 1.2 ≈ 833).
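The arithmetic behind that example, as a small sketch. It relies on the linearity assumption stated above; the function names and the 4000 requests per second used for the replica count are made up for illustration.

```python
import math


def pod_capacity_rps(cpu_request, measured_rps, measured_cpu):
    """Requests per second one pod can serve, assuming load scales linearly with CPU."""
    return cpu_request * (measured_rps / measured_cpu)


def replicas_needed(expected_rps, cpu_request, measured_rps, measured_cpu):
    """Number of pods needed to serve expected_rps at the given CPU request."""
    capacity = pod_capacity_rps(cpu_request, measured_rps, measured_cpu)
    return math.ceil(expected_rps / capacity)


# Measured in the load test: 1.2 CPU consumed at 1000 requests per second.
print(round(pod_capacity_rps(1.0, 1000, 1.2)))

# Hypothetical target of 4000 requests per second with a 1 CPU request per pod.
print(replicas_needed(4000, 1.0, 1000, 1.2))
```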
Once you know how much a pod can consume under its maximal load, you can start setting up CPU-based autoscaling. 70% is a good first target that can be refined if you encounter issues like latency or pods not autoscaling fast enough. This will prevent your nodes from running out of CPU if the load increases.
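To see what a 70% target does in practice, here is a sketch of the scaling formula from the HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It deliberately ignores the tolerance band, stabilization windows and min/max replica bounds that the real controller applies, and the numbers are invented.

```python
import math


def desired_replicas(current_replicas, current_utilization_pct, target_utilization_pct=70.0):
    """Replica count suggested by the basic HPA formula (no tolerance or bounds)."""
    ratio = current_utilization_pct / target_utilization_pct
    return math.ceil(current_replicas * ratio)


# 4 pods averaging 90% of their CPU request, target 70%.
print(desired_replicas(4, 90))

# 10 pods averaging 35%: the formula suggests scaling down.
print(desired_replicas(10, 35))
```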
There are a few gotchas. For example, single-threaded applications cannot consume more than one CPU. If you give such an application 1.5 CPU, it will run out of CPU at 1.0, but you won't be able to see that in the metrics, because they suggest it can still consume another 0.5 CPU.
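A quick sketch of why this gotcha matters for autoscaling, assuming utilization is reported as usage divided by the CPU request (the function name is hypothetical):

```python
def max_utilization_pct(threads, request_cores):
    """Highest utilization (usage / request) a process with this many busy threads can show."""
    # One thread can keep at most one core busy.
    max_usage = min(float(threads), request_cores)
    return 100.0 * max_usage / request_cores


# A single-threaded process with a 1.5 CPU request tops out around 67%,
# so it looks like it has headroom even when it is completely saturated.
print(f"{max_utilization_pct(1, 1.5):.1f}%")

# With a 1.0 CPU request the same process can reach 100%.
print(f"{max_utilization_pct(1, 1.0):.1f}%")
```

With a 70% autoscaling target, the first case would never trigger a scale-up even though the pod is at its limit.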

Calculating memory requests and limits in Kubernetes

We have a couple of clusters running on GKE and up until now I've only been maintaining a CPU request/limit for pods. We've recently run into issues where the cluster autoscaling isn't responding when pods begin to be evicted for low memory, and we can visibly see in the GKE console that there is memory pressure on at least one of the nodes.
I was hoping someone could tell me: is there some sort of calculation that we can make as a starting point for how much memory we should request/limit per pod for each of our services, or is that more trial and error? Is there some statistics service that can track what's being used in the cluster now?
Thanks!

There is no magic trick for calculating limits. You need to start with reasonable limits and refine using trial and error.
I can suggest a video from YouTube that explains quite well a method to refine your limits: https://youtu.be/-lsJyni7EQA
Basically, it suggests starting with low limits and load testing your application (one pod instance) until it breaks.
Then, raise the limits and load test again until you find good values.
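The loop itself could be sketched like this. Everything here is hypothetical: `survives_load_test` stands in for whatever you do to deploy one pod with a given limit and load test it, and the starting value, step and ceiling are arbitrary.

```python
def find_memory_limit(survives_load_test, start_mib=64, step_mib=64, max_mib=4096):
    """Raise the memory limit step by step until the load test passes.

    survives_load_test is a caller-supplied function that takes a limit in MiB
    and returns True if the pod got through the load test with that limit.
    """
    limit = start_mib
    while limit <= max_mib:
        if survives_load_test(limit):
            return limit
        limit += step_mib
    return None  # nothing up to max_mib was enough


# Stand-in for a real load test: pretend the app needs at least 300 MiB.
def fake_load_test(limit_mib):
    return limit_mib >= 300


print(find_memory_limit(fake_load_test))
```

In a real run you would probably add some headroom on top of the first value that passes rather than use it as is.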