Coming from several years of running Node/Rails apps on bare metal, I was used to being able to run as many apps as I wanted on a single machine (say, a 2 GB droplet at DigitalOcean could easily handle 10 apps without worry, given reasonable optimizations or a fairly low amount of traffic).
The thing is, with Kubernetes the game is quite different. I've set up a "getting started" cluster with 2 standard VMs (3.75 GB each).
I assigned resource requests and limits on a deployment with the following:
resources:
  requests:
    cpu: "64m"
    memory: "128Mi"
  limits:
    cpu: "128m"
    memory: "256Mi"
Then I observe the following:
Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
---------  ----  ------------  ----------  ---------------  -------------
default    api   64m (6%)      128m (12%)  128Mi (3%)       256Mi (6%)
What does this 6% refer to?
I tried lowering the CPU limit to, say, 20Mi… and the app does not start (obviously, not enough resources). The docs say it is a percentage of CPU. So, 20% of a 3.75 GB machine? Then where does this 6% come from?
Then I increased the node pool size to n1-standard-2, and the same pod effectively spans 3% of the node. That sounds logical, but what does it actually refer to?
I still wonder which metric is actually taken into account here.
The app seems to need a large amount of memory on startup, but then it uses only a minimal fraction of this 6%. I feel like I'm misunderstanding something, or misusing it all.
Thanks for any experienced tips/advice for a better understanding.
Best
According to the docs, CPU requests (and limits) are always fractions of available CPU cores on the node that the pod is scheduled on (with a resources.requests.cpu of "1" meaning reserving one CPU core exclusively for one pod). Fractions are allowed, so a CPU request of "0.5" will reserve half a CPU for one pod.
For convenience, Kubernetes allows you to specify CPU resource requests/limits in millicores:
The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu” (some may say “one hundred millicores”, and this is understood to mean the same thing when talking about Kubernetes). A request with a decimal point, like 0.1 is converted to 100m by the API, and precision finer than 1m is not allowed.
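For illustration, a hypothetical container's resources stanza could express the same request either way (only the cpu field is shown; the rest of the container spec is omitted):
resources:
  requests:
    cpu: "100m"   # identical to writing cpu: "0.1", i.e. one tenth of a core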
As already mentioned in the other answer, resource requests are guaranteed. This means that Kubernetes will schedule pods in a way that the sum of all requests will not exceed the amount of resources actually available on a node.
So, by requesting 64m of CPU time in your deployment, you are actually requesting 64/1000 = 0.064 = 6.4% of one of the node's CPU cores. So that's where your 6% comes from. When upgrading to a VM with more CPU cores, the amount of available CPU resources increases, so on a machine with two available CPU cores, a request for 6.4% of one CPU's time will allocate 3.2% of the total CPU time of two CPUs.
The 6% of CPU means that 6% of the node's CPU time (the CPU request) is reserved for this pod, so it is guaranteed to always get at least this amount of CPU time. It can still burst up to 12% (the CPU limit) if there is CPU time left over.
This means that if the limit is very low, your application will take more time to start up. A liveness probe may then kill the pod before it is ready, because the application took too long. To solve this you may have to increase the initialDelaySeconds or the timeoutSeconds of the liveness probe.
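A minimal sketch of such a probe adjustment, assuming a hypothetical HTTP health endpoint on /healthz and placeholder timings:
livenessProbe:
  httpGet:
    path: /healthz            # hypothetical health endpoint
    port: 8080                # hypothetical port
  initialDelaySeconds: 60     # give the CPU-throttled app more time to finish starting
  timeoutSeconds: 5           # tolerate slower responses before the probe counts as failed
  periodSeconds: 10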
Also note that the resource requests and limits define how many resources your pod allocates, and not the actual usage.
The resource request is what your pod is guaranteed to get on a node. This means that the sum of the requested resources must not be higher than the total amount of CPU/memory on that node.
The resource limit is the upper limit of what your pod is allowed to use. This means the sum of these limits can be higher than the actual available CPU/memory.
Therefore the percentages tell you how much CPU and memory of the total resources your pod allocates.
Link to the docs: https://kubernetes.io/docs/user-guide/compute-resources/
Some other notable things:
If your pod uses more memory than defined in the limit, it gets OOMKilled (out of memory).
If your pod uses more memory than defined in its request and the node runs out of memory, the pod might get OOMKilled in order to let other pods survive that use less than their requested memory.
If your application needs more CPU than requested, it can burst up to the limit.
Your pod never gets killed just because it uses too much CPU; it only gets throttled.
Related
I have a pod in Kubernetes with a CPU request of 100m and a CPU limit of 4000m.
The application spins up 500+ threads and works fine. It becomes unresponsive during heavy load, which could be because of a thread limit issue.
Question:
The number of threads is related to the number of CPUs.
Since the CPU request is 100m, will there be any problem with the thread limit, or can the pod still spin up more threads given that the limit is 4000m?
Since the CPU request is 100m, will there be any problem with the thread limit, or can the pod still spin up more threads given that the limit is 4000m?
The CPU limit is 4000m (4 cores), so the pod can use as many threads as it wants and consume CPU up to 4 cores.
The CPU request of 100m is mostly used for Pod scheduling, so your pod might end up on a node with little spare CPU compared to its limit, and it may be starved for CPU when the node's CPU is contended.
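The resources stanza being discussed would look roughly like this (only the CPU fields from the question; the rest of the container spec is omitted):
resources:
  requests:
    cpu: "100m"    # only used by the scheduler to place the pod
  limits:
    cpu: "4000m"   # the pod may burst up to 4 cores; beyond that it is throttled, not killed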
I am searching for specific information regarding Kubernetes requests and limits and still haven't found an answer, or just haven't understood it well. Say I've defined two containers A and B for a single pod, each with its own resource limits and requests:
A:
RAM request: 1Gi
RAM limit: 2Gi
B:
RAM request: 1Gi
RAM limit: 2Gi
So, we have a pod limit of 4Gi in total. Suppose container A exceeds its limit (say by 1Gi), while B is consuming only 64Mi. My questions are:
What happens to the pod? Is it evicted?
Is container A restarted?
Is container A allowed to use B's available RAM?
Thanks!
What happens to the pod? Is it evicted?
If the memory limit of a container is exceeded, the kernel's OOM killer is invoked and terminates the container's process. The Pod then starts a new container on the same Node.
(CPU limits use a different mechanism (CFS Bandwidth Control) that throttles the processes' CPU cycles instead of terminating the process.)
Is container A restarted?
Yes.
Is container A allowed to use B's available RAM?
The memory is tracked separately for each container. They are not pooled together into the same limit.
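For reference, a sketch of the two-container pod described above (pod name and images are placeholders); each container's limit is enforced independently:
apiVersion: v1
kind: Pod
metadata:
  name: example-ab               # hypothetical name
spec:
  containers:
  - name: a
    image: example/a:latest      # placeholder image
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"            # A is OOM-killed if it exceeds 2Gi, no matter how little B uses
  - name: b
    image: example/b:latest      # placeholder image
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"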
Just to add some details:
Memory request: the memory reserved for the container, whether it is fully used or not.
Memory limit: the maximum amount of memory the container is allowed to use. When a container's usage exceeds its memory request, whether additional memory can be allocated depends on the free memory available on the machine running that container at that point in time.
To answer your queries, from my understanding:
If container A reaches its memory limit of 2Gi, it gets OOM-killed and the container is restarted.
If container A exceeds its memory request of 1Gi, it tries to get the required memory from what is available on the machine (up to its configured limit).
Hope this answers your queries.
Let us assume a Kubernetes cluster with one worker node (1 core and 256 MB RAM). All pods will be scheduled on this worker node.
At first I deployed a pod with the config (request: cpu 0.4, limit: cpu 0.8), and it deployed successfully. As the machine has 1 free core, the pod can use up to 0.8 CPU.
Can I deploy another pod with the same config? If yes, will the first pod's CPU be reduced to 0.4?
Resource requests and limits are considered in two different places.
Requests are only considered when scheduling a pod. If you're scheduling two pods that each request 0.4 CPU on a node that has 1.0 CPU, then they fit and could both be scheduled there (along with other pods requesting up to a total of 0.2 CPU more).
Limits throttle CPU utilization, but are also subject to the actual physical limits of the node. If one pod tries to use 1.0 CPU but its pod spec limits it to 0.8 CPU, it will get throttled. If two of these pods run on the same hypothetical node with only 1 actual CPU, they will be subject to the kernel scheduling policy and in practice will probably each get about 0.5 CPU.
(Memory follows the same basic model, except that if a pod exceeds its limits or if the total combined memory used on a node exceeds what's available, the pod will get OOM-killed. If your node has 256 MB RAM, and each pod has a memory request of 96 MB and limit of 192 MB, they can both get scheduled [192 MB requested memory fits] but could get killed if either one individually allocates more than 192 MB RAM [its own limit] or if the total memory used by all Kubernetes and non-Kubernetes processes on that node goes over the physical memory limit.)
Fractional requests are allowed. A Container with spec.containers[].resources.requests.cpu of 0.5 is guaranteed half as much CPU as one that asks for 1 CPU. The expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu”. Some people say “one hundred millicores”, and this is understood to mean the same thing. A request with a decimal point, like 0.1, is converted to 100m by the API, and precision finer than 1m is not allowed. For this reason, the form 100m might be preferred.
CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.
From here
In your case you are able to run 2 pods on the node.
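As a sketch, here is the resources stanza both pods would carry (everything else in the pod spec omitted); only the requests count for scheduling, and 0.4 + 0.4 ≤ 1.0 CPU, so both fit:
resources:
  requests:
    cpu: "0.4"   # 400m; the scheduler sums these across the pods on the node
  limits:
    cpu: "0.8"   # 800m; if both pods burst at once, the kernel shares the single core between them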
I would like to know: does the scheduler consider resource limits when scheduling a pod?
For example, if the scheduler schedules 4 pods on a specific node with total capacity <200Mi, 400m> and the total resource limits of those pods are <300Mi, 700m>, what will happen?
Only resource requests are considered during scheduling. This can result in a node being overcommitted. (Managing Compute Resources for Containers in the Kubernetes documentation says a little more.)
In your example, say your node has 1 CPU and 2 GB of RAM, and you've scheduled 4 pods that request 0.2 CPU and 400 MB RAM each. Those all "fit" (requiring 0.8 CPU and 1.6 GB RAM total) so they get scheduled. If any individual pod exceeds its own limit, its CPU usage will be throttled or memory allocation will fail or the process will be killed. But, say all 4 of the pods try to allocate 600 MB of RAM: none individually exceeds its limits, but in aggregate it's more memory than the system has, so the underlying Linux kernel will invoke its out-of-memory killer and shut down processes to free up space. You might see this as a pod restarting for no apparent reason.
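A sketch of what such an overcommitted setup might look like as a Deployment; the name, image, and the 1Gi memory limit are assumptions for illustration, while the requests match the example above:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overcommit-demo              # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: overcommit-demo
  template:
    metadata:
      labels:
        app: overcommit-demo
    spec:
      containers:
      - name: app
        image: example/app:latest    # placeholder image
        resources:
          requests:
            cpu: "200m"
            memory: "400Mi"          # 4 x 400Mi = 1.6 GB of requests fits on a 2 GB node
          limits:
            memory: "1Gi"            # but 4 x 1Gi of allowed usage exceeds the node's 2 GB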
I was told by a more experienced DevOps person that resource (CPU and memory) limits and requests should be closer to each other when scheduling pods.
Intuitively I can imagine that less scaling up and down would require less computation for K8s? Or can someone explain it in more detail?
The resource requests and limits do two fundamentally different things. The Kubernetes scheduler places a pod on a node based only on the sum of the resource requests: if the node has 8 GB of RAM, and the pods currently scheduled on that node requested 7 GB of RAM, then a new pod that requests 512 MB will fit there. The limits control how much resource the pod is actually allowed to use, with it getting CPU-throttled or OOM-killed if it uses too much.
In practice many workloads can be "bursty". Something might require 2 GB of RAM under peak load, but far less than that when just sitting idle. It doesn't necessarily make sense to provision enough hardware to run everything at peak load, but then to have it sit idle most of the time.
If the resource requests and limits are far apart then you can "fit" more pods on the same node. But, if the system as a whole starts being busy, you can wind up with many pods that are all using above their resource request, and actually use more memory than the node has, without any individual pod being above its limit.
Consider a node with 8 GB of RAM, and pods with 512 MB RAM resource requests and 2 GB limits. 16 of these pods "fit". But if each pod wants to use 1 GB RAM (allowed by the resource limits) that's more total memory than the node has, and you'll start getting arbitrary OOM-kills. If the pods request 1 GB RAM instead, only 8 will "fit" and you'll need twice the hardware to run them at all, but in this scenario the cluster will run happily.
One strategy for dealing with this in a cloud environment is what your ops team is asking: make the resource requests and limits very close to each other. If a node fills up, an autoscaler will automatically request another node from the cloud. Scaling down is a little trickier. But this approach avoids problems where things die randomly because the Kubernetes nodes are overcommitted, at the cost of needing more hardware for the idle state.
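A sketch of that strategy with arbitrary figures: setting requests equal to limits means the scheduler reserves exactly what the pod is ever allowed to use, so this pod can never push the node beyond its capacity (and it gets the Guaranteed QoS class):
resources:
  requests:
    cpu: "1"          # reserve a full core at scheduling time...
    memory: "1Gi"
  limits:
    cpu: "1"          # ...and never let the pod use more than what was reserved
    memory: "1Gi"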