Does Kubernetes allocate resources if the resource limit is above node capacity? - kubernetes

I would like to know: does the scheduler consider resource limits when scheduling a pod?
For example, if the scheduler schedules 4 pods on a specific node with total capacity <200Mi, 400m> and the total resource limits of those pods are <300Mi, 700m>, what will happen?

Only resource requests are considered during scheduling. This can result in a node being overcommitted. (Managing Compute Resources for Containers in the Kubernetes documentation says a little more.)
In your example, say your node has 1 CPU and 2 GB of RAM, and you've scheduled 4 pods that request 0.2 CPU and 400 MB RAM each. Those all "fit" (requiring 0.8 CPU and 1.6 GB RAM total) so they get scheduled. If any individual pod exceeds its own limit, its CPU usage will be throttled or memory allocation will fail or the process will be killed. But, say all 4 of the pods try to allocate 600 MB of RAM: none individually exceeds its limits, but in aggregate it's more memory than the system has, so the underlying Linux kernel will invoke its out-of-memory killer and shut down processes to free up space. You might see this as a pod restarting for no apparent reason.
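As a concrete sketch of that example (the pod name and image are hypothetical placeholders), note that only the `requests` block below is summed at scheduling time, while the `limits` block is enforced at runtime:

```yaml
# Sketch of one of the four pods from the example above.
# The scheduler only sums the requests; the limits can overcommit the node.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod-1        # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:1.0 # placeholder image
      resources:
        requests:
          cpu: 200m          # 4 x 200m = 800m, fits a 1-CPU node
          memory: 400Mi      # 4 x 400Mi = 1.6 GB, fits a 2 GB node
        limits:
          cpu: 500m
          memory: 600Mi      # 4 x 600Mi exceeds 2 GB: node is overcommitted
```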

Related

Kubernetes - Request CPU & Limit CPU issue

I have a pod in Kubernetes with a CPU request of 100m and a CPU limit of 4000m.
The application spins up 500+ threads and it works fine. The application becomes unresponsive during heavy load, which could be because of a thread limit issue.
Question:
The number of threads is related to CPUs.
Since the CPU request is 100m, will there be any problem with the thread limitation, or can the pod still spin up more threads since the limit is 4000m?
Since the CPU request is 100m, will there be any problem with the thread limitation, or can the pod still spin up more threads since the limit is 4000m?
The CPU limitation is 4000m (4 cores), so it can use as many threads as it wants and use CPU utilization up to 4 cores.
The CPU request of 100m is mostly used for pod scheduling, so your pod might end up on a node with few resources compared to your limit, and might be evicted if there are few available CPU resources on the node.
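A sketch of the `resources` stanza being discussed (values taken from the question above):

```yaml
resources:
  requests:
    cpu: 100m   # used for scheduling: the pod fits on any node with 0.1 CPU free
  limits:
    cpu: "4"    # runtime cap: all threads together may use up to 4 cores
```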

How kubernetes request and limit works in pressure?

Let us assume a Kubernetes cluster with one worker node (1 core and 256MB RAM). All pods will be scheduled on the worker node.
At first I deployed a pod with config (request: cpu 0.4, limit: cpu 0.8), and it deployed successfully. As the machine has 1 core free, it took 0.8 CPU.
Can I deploy another pod with the same config? If yes, will the first pod's CPU reduce to 0.4?
Resource requests and limits are considered in two different places.
Requests are only considered when scheduling a pod. If you're scheduling two pods that each request 0.4 CPU on a node that has 1.0 CPU, then they fit and could both be scheduled there (along with other pods requesting up to a total of 0.2 CPU more).
Limits throttle CPU utilization, but are also subject to the actual physical limits of the node. If one pod tries to use 1.0 CPU but its pod spec limits it to 0.8 CPU, it will get throttled. If two of these pods run on the same hypothetical node with only 1 actual CPU, they will be subject to the kernel scheduling policy and in practice will probably each get about 0.5 CPU.
(Memory follows the same basic model, except that if a pod exceeds its limits or if the total combined memory used on a node exceeds what's available, the pod will get OOM-killed. If your node has 256 MB RAM, and each pod has a memory request of 96 MB and limit of 192 MB, they can both get scheduled [192 MB requested memory fits] but could get killed if either one individually allocates more than 192 MB RAM [its own limit] or if the total memory used by all Kubernetes and non-Kubernetes processes on that node goes over the physical memory limit.)
Fractional requests are allowed. A Container with spec.containers[].resources.requests.cpu of 0.5 is guaranteed half as much CPU as one that asks for 1 CPU. The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu”. Some people say “one hundred millicores”, and this is understood to mean the same thing. A request with a decimal point, like 0.1, is converted to 100m by the API, and precision finer than 1m is not allowed. For this reason, the form 100m might be preferred.
CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.
From here
In your case you are able to run 2 pods on the node.
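The scenario from the question can be sketched as the following `resources` stanza, shared by both pods:

```yaml
# Two identical pods on a 1-core node: requests sum to 0.8 CPU (fits),
# limits sum to 1.6 CPU (overcommitted; the kernel shares the real core).
resources:
  requests:
    cpu: 400m   # 0.4 CPU each; 0.8 total <= 1.0 node capacity
  limits:
    cpu: 800m   # 0.8 CPU each; under contention each gets ~0.5 in practice
```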

Kubernetes limit and request of resource would be better to be closer

I was told by a more experienced DevOps person that resource (CPU and memory) limits and requests would be better kept closer together for scheduling pods.
Intuitively I can imagine that less scaling up and down would require less computation for K8s? Or can someone explain it in more detail?
The resource requests and limits do two fundamentally different things. The Kubernetes scheduler places a pod on a node based only on the sum of the resource requests: if the node has 8 GB of RAM, and the pods currently scheduled on that node requested 7 GB of RAM, then a new pod that requests 512 MB will fit there. The limits control how much resource the pod is actually allowed to use, with it getting CPU-throttled or OOM-killed if it uses too much.
In practice many workloads can be "bursty". Something might require 2 GB of RAM under peak load, but far less than that when just sitting idle. It doesn't necessarily make sense to provision enough hardware to run everything at peak load, but then to have it sit idle most of the time.
If the resource requests and limits are far apart then you can "fit" more pods on the same node. But, if the system as a whole starts being busy, you can wind up with many pods that are all using above their resource request, and actually use more memory than the node has, without any individual pod being above its limit.
Consider a node with 8 GB of RAM, and pods with 512 MB RAM resource requests and 2 GB limits. 16 of these pods "fit". But if each pod wants to use 1 GB RAM (allowed by the resource limits) that's more total memory than the node has, and you'll start getting arbitrary OOM-kills. If the pods request 1 GB RAM instead, only 8 will "fit" and you'll need twice the hardware to run them at all, but in this scenario the cluster will run happily.
One strategy for dealing with this in a cloud environment is what your ops team is asking, make the resource requests and limits be very close to each other. If a node fills up, an autoscaler will automatically request another node from the cloud. Scaling down is a little trickier. But this approach avoids problems where things die randomly because the Kubernetes nodes are overcommitted, at the cost of needing more hardware for the idle state.
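As a sketch of the strategy the ops team is suggesting: setting requests equal to limits (which also gives the pod the Guaranteed QoS class) means the scheduler never overcommits the node:

```yaml
resources:
  requests:
    memory: 2Gi   # what the scheduler reserves on the node
    cpu: "1"
  limits:
    memory: 2Gi   # equal to the request: no overcommit, Guaranteed QoS
    cpu: "1"
```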

kubernetes / understanding CPU resources limits

Coming from numerous years of running Node/Rails apps on bare metal, I was used to being able to run as many apps as I wanted on a single machine (say, a 2 GB instance at DigitalOcean could easily handle 10 apps without worrying, given correct optimizations or a fairly low amount of traffic).
The thing is, using Kubernetes, the game sounds quite different. I've set up a "getting started" cluster with 2 standard VMs (3.75 GB).
Assigned a limit on a deployment with the following :
resources:
  requests:
    cpu: "64m"
    memory: "128Mi"
  limits:
    cpu: "128m"
    memory: "256Mi"
Then witnessing the following :
Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
---------  ----  ------------  ----------  ---------------  -------------
default    api   64m (6%)      128m (12%)  128Mi (3%)       256Mi (6%)
What does this 6% refer to?
I tried to lower the CPU limit, to like, 20Mi… the app does not start (obviously, not enough resources). The docs say it is a percentage of CPU. So, 20% of a 3.75 GB machine? Then where does this 6% come from?
Then I increased the size of the node pool to n1-standard-2, and the same pod effectively spans 3% of the node. That sounds logical, but what does it actually refer to?
I still wonder what metric is taken into account for this part.
The app seems to need a large amount of memory on startup, but then it uses only a minimal fraction of this 6%. I then feel like I'm misunderstanding something, or misusing it all.
Thanks for any experienced tips/advice for a better understanding.
Best
According to the docs, CPU requests (and limits) are always fractions of available CPU cores on the node that the pod is scheduled on (with a resources.requests.cpu of "1" meaning reserving one CPU core exclusively for one pod). Fractions are allowed, so a CPU request of "0.5" will reserve half a CPU for one pod.
For convenience, Kubernetes allows you to specify CPU resource requests/limits in millicores:
The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu” (some may say “one hundred millicores”, and this is understood to mean the same thing when talking about Kubernetes). A request with a decimal point, like 0.1 is converted to 100m by the API, and precision finer than 1m is not allowed.
As already mentioned in the other answer, resource requests are guaranteed. This means that Kubernetes will schedule pods in a way that the sum of all requests will not exceed the amount of resources actually available on a node.
So, by requesting 64m of CPU time in your deployment, you are actually requesting 64/1000 = 0.064 = 6.4% of one of the node's CPU cores' time. So that's where your 6% comes from. When upgrading to a VM with more CPU cores, the amount of available CPU resources increases, so on a machine with two available CPU cores, a request for 6.4% of one CPU's time will allocate 3.2% of the CPU time of two CPUs.
The 6% of CPU means 6% (the CPU request) of the node's CPU time is reserved for this pod. So it is guaranteed that it always gets at least this amount of CPU time. It can still burst up to 12% (the CPU limit), if there is still CPU time left.
This means if the limit is very low, your application will take more time to start up. Therefore a liveness probe may kill the pod before it is ready, because the application took too long. To solve this you may have to increase the initialDelaySeconds or the timeoutSeconds of the liveness probe.
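As a hedged sketch of those probe settings (the endpoint path, port, and timings are placeholders, not taken from the question):

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # placeholder health endpoint
    port: 8080
  initialDelaySeconds: 60   # give a CPU-throttled app time to start
  timeoutSeconds: 5         # allow slow responses before counting a failure
```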
Also note that the resource requests and limits define how many resources your pod allocates, and not the actual usage.
The resource request is what your pod is guaranteed to get on a node. This means, that the sum of the requested resources must not be higher than the total amount of CPU/memory on that node.
The resource limit is the upper bound on what your pod is allowed to use. This means the sum of these limits can be higher than the actually available CPU/memory.
Therefore the percentages tell you how much CPU and memory of the total resources your pod allocates.
Link to the docs: https://kubernetes.io/docs/user-guide/compute-resources/
Some other notable things:
If your pod uses more memory than defined in the limit, it gets OOMKilled (out of memory).
If your pod uses more memory than defined in its requests and the node runs out of memory, the pod might get OOMKilled in order to guarantee the survival of other pods which use less than their requested memory.
If your application needs more CPU than requested it can burst up to the limit.
Your pod never gets killed just because it uses too much CPU.

AWS ECS Task Memory and CPU Allocation

I'm looking for guidance on allocating memory for an ECS task. I'm running a Rails app for a client who wants to be as cheap as possible on server cost. I was looking at the medium server size that has 2 CPU and 4 gb memory.
Most of the time I'll only need 1 container running the rails server at a time. However, there are occasional spikes and I want to scale out another server and have the container deployed to it. When traffic slows down, I want to scale back down to the single server / task.
Here's where I need help:
What should I make my task memory setting be? 4 GB? That would be the total on the box but doesn't account for system processes. I could do 3 GB, but then I'd be wasting some potentially free memory. Same question for the CPU... should I just make it 100%?
I don't want to pay for a bigger server, i.e. 16 GB to sit there and only have 1 container needed most of the time... such a waste.
What I want seems simple. 1 task per instance. When the instance gets to 75% usage, scale a new instance and deploy the task to the second. I don't get why I have to set task memory and CPU settings when it's a one-to-one ratio.
Can anyone give me guidance on how to do what I've described? Or what the proper task definition settings should be when it's meant to be one-to-one with the instance?
Thanks for any help.
--Edit--
Based on feedback, here's a potential solution:
Task definition = memory reservation is 3 GB and memory is 4 GB.
EC2 medium nodes, which have 4 GB of memory
ECS Service autoscaling configured:
- scale up (increase task count by 1) when Service CPU utilization is greater than 75%.
- scale down (decrease task count by 1) when Service CPU utilization is less than 25%.
ECS Cluster scaling configured:
- scale up (increase ec2 instance count by 1) when cluster memory utilization is greater than 80%.
- scale down (decrease ec2 instance count by 1) when cluster memory utilization is less than 40%.
Example:
Starts with 1 EC2 instance running a task with 3 GB reservation. This is 75% cluster utilization.
When the service spikes and CPU utilization of the service jumps to greater than 75%, it will trigger a service scale. Now the task count is increased and the new task asks for 3 GB again, which makes a total of 6 GB, but only 4 GB is available, so the cluster is at 150% utilization.
This triggers the cluster scale (over 80%) which adds a new ec2 node to the cluster for the new service. When it's there, we're back down to 6GB demand / 8 GB available which is 75% and stable.
The scale down would happen the same.
For setting memory for containers, I would recommend using "memoryReservation" (the soft limit of memory for your container) and "memory" (the hard limit on your container).
You can set "memoryReservation" to 3 GB, which will ensure a second instance of the container does not land on the same EC2 instance. The "memory" option will allow the container to use more memory when absolutely needed.
Ref: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
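A sketch of those container-level settings, written CloudFormation-style in YAML (the container name and image are placeholders; ECS memory values are in MiB):

```yaml
ContainerDefinitions:
  - Name: rails-app           # placeholder name
    Image: my-rails-image     # placeholder image
    MemoryReservation: 3072   # soft limit: reserved at task placement time
    Memory: 4096              # hard limit: the container is killed above this
```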
ECS right now does not offer the flexibility to disallow deploying the same task twice on the same EC2 instance.
But you can work around this by either reserving CPU/memory or exposing a known host port on your task.