Why would it be better for Kubernetes resource limits and requests to be close together?

I was told by a more experienced DevOps person that resource (CPU and memory) limits and requests should be close together to make scheduling pods easier.
Intuitively I can imagine that less scaling up and down would require less computation for Kubernetes, but can someone explain it in more detail?

The resource requests and limits do two fundamentally different things. The Kubernetes scheduler places a pod on a node based only on the sum of the resource requests: if the node has 8 GB of RAM, and the pods currently scheduled on that node requested 7 GB of RAM, then a new pod that requests 512 MB will fit there. The limits control how much resource the pod is actually allowed to use, with it getting CPU-throttled or OOM-killed if it uses too much.
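For concreteness, here is roughly what that split looks like in a pod spec (a sketch; the name, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bursty-app                  # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/app:latest # placeholder image
      resources:
        requests:
          memory: 512Mi             # the scheduler reserves this much on a node
          cpu: 250m
        limits:
          memory: 2Gi               # the pod is OOM-killed above this
          cpu: "1"                  # the pod is CPU-throttled above this
```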
In practice many workloads can be "bursty". Something might require 2 GB of RAM under peak load, but far less than that when just sitting idle. It doesn't necessarily make sense to provision enough hardware to run everything at peak load, but then to have it sit idle most of the time.
If the resource requests and limits are far apart then you can "fit" more pods on the same node. But, if the system as a whole starts being busy, you can wind up with many pods that are all using above their resource request, and actually use more memory than the node has, without any individual pod being above its limit.
Consider a node with 8 GB of RAM, and pods with 512 MB RAM resource requests and 2 GB limits. 16 of these pods "fit". But if each pod wants to use 1 GB RAM (allowed by the resource limits) that's more total memory than the node has, and you'll start getting arbitrary OOM-kills. If the pods request 1 GB RAM instead, only 8 will "fit" and you'll need twice the hardware to run them at all, but in this scenario the cluster will run happily.
One strategy for dealing with this in a cloud environment is what your ops team is asking: make the resource requests and limits very close to each other. If a node fills up, an autoscaler will automatically request another node from the cloud. Scaling down is a little trickier. But this approach avoids problems where things die randomly because the Kubernetes nodes are overcommitted, at the cost of needing more hardware for the idle state.
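Setting the requests equal to the limits, as suggested, would look like this (same illustrative values). As a side effect, a pod whose requests equal its limits for every container is placed in the Guaranteed QoS class, which makes it the last to be evicted under node pressure:

```yaml
resources:
  requests:
    memory: 2Gi
    cpu: "1"
  limits:
    memory: 2Gi                     # identical to the request
    cpu: "1"
```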

Related

Is there a way to force the use of the same physical CPU while allocating cores to a pod in Kubernetes?

I was wondering if it is possible to force Kubernetes to allocate cores from the same CPU while spinning up a pod. What I would like Kubernetes to do is, as new pods are created, allocate their cores from, let's say, CPU1 as long as there are cores still available on it. CPU2's, CPU3's, etc. cores should not be used for the newly initiated pod. I would like my pods to have cores allocated from a single CPU as long as possible.
Is there a way to achieve this?
Also, is there a way to see which physical CPUs a pod's cores come from?
Thanks a lot.
Edit: Let me explain why I want to do this.
We are running a Spark cluster on Kubernetes. The lead of the system/Linux administration team warned us about the concept of NUMA. He told us that we could improve the performance of our executor pods if we allocated their cores from the same physical CPU. That is why I started digging into this.
I found this Kubernetes CPU Manager. The documentation says:
CPU Manager allocates CPUs in a topological order on a best-effort basis. If a whole socket is free, the CPU Manager will exclusively allocate the CPUs from the free socket to the workload. This boosts the performance of the workload by avoiding any cross-socket traffic.
Also on the same page:
Allocate all the logical CPUs (hyperthreads) from the same physical CPU core if available and the container requests an entire core worth of CPUs.
So now I am starting to think maybe what I need is to enable the static policy for the CPU manager to get what I want.
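If I go down that route, the static policy is enabled in the kubelet configuration, and (as I understand it) exclusive cores are only handed out to pods in the Guaranteed QoS class that request a whole number of CPUs. A sketch of the kubelet side:

```yaml
# KubeletConfiguration fragment (per node); the static policy also needs
# some CPU reserved for the system, e.g. via reservedSystemCPUs or
# kube-reserved/system-reserved.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
```

And pod resources that would qualify for exclusive cores (requests equal to limits, integer CPU count; values illustrative):

```yaml
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "4"                        # equal to the request, whole cores only
    memory: 8Gi
```

To see which cores a pod actually got, on cgroup v1 nodes you can read the cpuset from inside the container (e.g. `cat /sys/fs/cgroup/cpuset/cpuset.cpus`) and map those core IDs to sockets with `lscpu -e` on the node.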

How do Kubernetes requests and limits work under pressure?

Let us assume a Kubernetes cluster with one worker node (1 core and 256 MB RAM), so all pods will be scheduled on that worker node.
At first I deployed a pod with the config (request: cpu 0.4, limit: cpu 0.8), and it deployed successfully; as the machine had 1 core free, the pod could use up to 0.8 CPU.
Can I deploy another pod with the same config? If yes, will the first pod's CPU be reduced to 0.4?
Resource requests and limits are considered in two different places.
Requests are only considered when scheduling a pod. If you're scheduling two pods that each request 0.4 CPU on a node that has 1.0 CPU, then they fit and could both be scheduled there (along with other pods requesting up to a total of 0.2 CPU more).
Limits throttle CPU utilization, but are also subject to the actual physical limits of the node. If one pod tries to use 1.0 CPU but its pod spec limits it to 0.8 CPU, it will get throttled. If two of these pods run on the same hypothetical node with only 1 actual CPU, they will be subject to the kernel scheduling policy and in practice will probably each get about 0.5 CPU.
(Memory follows the same basic model, except that if a pod exceeds its limits or if the total combined memory used on a node exceeds what's available, the pod will get OOM-killed. If your node has 256 MB RAM, and each pod has a memory request of 96 MB and limit of 192 MB, they can both get scheduled [192 MB requested memory fits] but could get killed if either one individually allocates more than 192 MB RAM [its own limit] or if the total memory used by all Kubernetes and non-Kubernetes processes on that node goes over the physical memory limit.)
Fractional requests are allowed. A Container with spec.containers[].resources.requests.cpu of 0.5 is guaranteed half as much CPU as one that asks for 1 CPU. The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu”. Some people say “one hundred millicores”, and this is understood to mean the same thing. A request with a decimal point, like 0.1, is converted to 100m by the API, and precision finer than 1m is not allowed. For this reason, the form 100m might be preferred.
CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.
From here
In your situation you are able to run 2 pods on the node, since their combined CPU requests (0.8 CPU) fit within its 1 core.
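Written out as a pod spec fragment, the configuration from the question (with the memory numbers borrowed from the example above) would be something like:

```yaml
resources:
  requests:
    cpu: 400m                       # 0.4 CPU; only this is checked at scheduling time
    memory: 96Mi
  limits:
    cpu: 800m                       # throttled above this, within the node's 1 real core
    memory: 192Mi                   # OOM-killed above this
```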

Which component in Kubernetes is responsible for resource limits?

When a pod is created but no resource limit is specified, which component is responsible for calculating or assigning a resource limit to that pod? Is it the kubelet or Docker?
If a pod doesn't specify any resource limits, it's ultimately up to the Linux kernel scheduler on the node to assign CPU cycles or not to the process, and to OOM-kill either the pod or other processes on the node if memory use is excessive. Neither Kubernetes nor Docker will assign or guess at any sort of limits.
So if you have a process with a massive memory leak, and it gets scheduled on a very large but quiet instance with 256 GB of available memory, it will get to use almost all of that memory before it gets OOM-killed. If a second replica gets scheduled on a much smaller instance with only 4 GB, it's liable to fail much sooner. Usually you'll want to actually set limits for consistent behavior.
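(If you do want defaults applied automatically rather than relying on the kernel, a LimitRange object in the namespace will inject them at admission time; a minimal sketch with hypothetical values:)

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits              # hypothetical name
spec:
  limits:
    - type: Container
      defaultRequest:               # used when a container sets no request
        cpu: 100m
        memory: 128Mi
      default:                      # used when a container sets no limit
        cpu: 500m
        memory: 512Mi
```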

Does Kubernetes allocate resources if the resource limit is above node capacity?

I would like to know: does the scheduler consider resource limits when scheduling a pod?
For example, if the scheduler schedules 4 pods on a specific node with total capacity <200Mi, 400m> and the total resource limits of those pods are <300Mi, 700m>, what will happen?
Only resource requests are considered during scheduling. This can result in a node being overcommitted. (Managing Compute Resources for Containers in the Kubernetes documentation says a little more.)
In your example, say your node has 1 CPU and 2 GB of RAM, and you've scheduled 4 pods that request 0.2 CPU and 400 MB RAM each. Those all "fit" (requiring 0.8 CPU and 1.6 GB RAM total) so they get scheduled. If any individual pod exceeds its own limit, its CPU usage will be throttled or memory allocation will fail or the process will be killed. But, say all 4 of the pods try to allocate 600 MB of RAM: none individually exceeds its limits, but in aggregate it's more memory than the system has, so the underlying Linux kernel will invoke its out-of-memory killer and shut down processes to free up space. You might see this as a pod restarting for no apparent reason.
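A resources fragment matching that example (the limit values are illustrative; note that the limits play no part in the scheduling decision):

```yaml
# Four pods like this "fit" on a 1-CPU / 2 GB node by their requests
# (4 x 200m = 800m CPU, 4 x 400Mi = 1600Mi), even though their combined
# limits (2 CPUs, 3200Mi) exceed what the node physically has.
resources:
  requests:
    cpu: 200m
    memory: 400Mi
  limits:
    cpu: 500m
    memory: 800Mi
```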

AWS ECS Task Memory and CPU Allocation

I'm looking for guidance on allocating memory for an ECS task. I'm running a Rails app for a client who wants to be as cheap as possible on server cost. I was looking at the medium server size that has 2 CPUs and 4 GB memory.
Most of the time I'll only need 1 container running the rails server at a time. However, there are occasional spikes and I want to scale out another server and have the container deployed to it. When traffic slows down, I want to scale back down to the single server / task.
Here's where I need help:
What should I make my task memory setting be? 4 GB? That would be the total on the box but doesn't account for system processes. I could do 3 GB, but then I'd be wasting some free memory. Same question for the CPU... should I just make it 100%?
I don't want to pay for a bigger server, i.e. 16 GB to sit there and only have 1 container needed most of the time... such a waste.
What I want seems simple. 1 task per instance. When the instance gets to 75% usage, scale a new instance and deploy the task to the second. I don't get why I have to set task memory and CPU settings when it's a one-to-one ratio.
Can anyone give me guidance on how to do what I've described? Or what the proper task definition settings should be when it's meant to be one-to-one with the instance?
Thanks for any help.
--Edit--
Based on feedback, here's a potential solution:
Task definition: memory reservation is 3 GB and memory is 4 GB.
EC2 medium nodes, which have 4 GB of memory.
ECS Service autoscaling configured:
- scale up (increase task count by 1) when Service CPU utilization is greater than 75%.
- scale down (decrease task count by 1) when Service CPU utilization is less than 25%.
ECS Cluster scaling configured:
- scale up (increase ec2 instance count by 1) when cluster memory utilization is greater than 80%.
- scale down (decrease ec2 instance count by 1) when cluster memory utilization is less than 40%.
Example:
Starts with 1 EC2 instance running a task with 3 GB reservation. This is 75% cluster utilization.
When the service spikes and CPU utilization of the service jumps to greater than 75%, it will trigger a service scale. Now the task count is increased, and the new task is asking for 3 GB again, which makes a total of 6 GB, but only 4 GB is available, so the cluster is at 150% utilization.
This triggers the cluster scale (over 80%), which adds a new EC2 node to the cluster for the new task. When it's there, we're back down to 6 GB demand / 8 GB available, which is 75% and stable.
The scale-down would happen the same way.
For setting memory for containers, I would recommend using "memoryReservation", the soft limit of memory for your container, and "memory", the hard limit on your container.
You can set "memoryReservation" to 3 GB, which will ensure another instance of the container does not end up on the same EC2 instance. The "memory" option will allow the container to use more memory when absolutely needed.
Ref: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
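A container-definition sketch with those two values (the family and image are placeholders; this is the YAML input form accepted by `aws ecs register-task-definition --cli-input-yaml` in AWS CLI v2):

```yaml
family: rails-app                   # hypothetical family name
containerDefinitions:
  - name: rails
    image: example/rails:latest     # placeholder image
    essential: true
    memoryReservation: 3072         # soft limit in MiB; used for placement
    memory: 4096                    # hard limit in MiB; the container is killed above this
```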
ECS right now does not have a built-in way to prevent the same task from being deployed twice on the same EC2 compute instance.
But you can work around it by either blocking CPU/memory (as the reservation above effectively does on 4 GB instances) or by exposing a fixed host port on your task; see the sketch below.
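For the port variant: with bridge networking, pinning a static hostPort reserves that port on the instance, so a second copy of the task cannot be placed there because the port is already taken. A sketch (port numbers are illustrative):

```yaml
containerDefinitions:
  - name: rails
    image: example/rails:latest     # placeholder image
    portMappings:
      - containerPort: 3000         # the app's port inside the container
        hostPort: 8080              # fixed host port; only one task per instance can bind it
```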