Kubernetes release requested cpu

We have a Java application distributed over multiple pods on Google Cloud Platform. We set memory requests to give each pod a certain share of the memory available on the node for heap and non-heap space.
The application is very CPU-intensive while the pod is starting but barely uses the CPU once the pod is ready (only about 0.5%). If we use the container resource "requests", the pod does not release these resources after startup has finished.
Does Kubernetes allow you to specify that a pod may use (nearly) all the CPU power available during startup and release those resources afterwards? Thanks to rolling updates we can ensure that no two pods are started at the same time.
Thanks for your help.

If you specify requests without a limit, the value is used for scheduling: the pod is placed on a node that can satisfy the requested CPU bandwidth. At runtime the request also sets the container's CPU weight, but nothing prevents the container from using more than it requested; that extra CPU is effectively 'stolen' from other containers on the node.
If you specify a limit as well, your container gets throttled when it tries to exceed that value. You can combine both to allow bursting CPU usage beyond the usual request without monopolizing the node and slowing down other processes.
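A minimal sketch of that combination (names, image, and values are illustrative, not from the question): the request reserves one core at scheduling time, while the higher limit allows bursting during startup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-app                      # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/java-app:1.0   # placeholder image
    resources:
      requests:
        cpu: "1"        # reserved at scheduling time
        memory: 2Gi
      limits:
        cpu: "3"        # burst ceiling; the container is throttled above this
        memory: 2Gi
```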

"Does Kubernetes allow to specify that a pod is allowed to use
(nearly) all the cpu power available during start and release those
resources after that?"
A key word here is "available". The answer is "yes", and it can be achieved by using the Burstable QoS (Quality of Service) class. Configure the CPU request to the value you expect the container to need after starting up, and either:
configure a CPU limit higher than the CPU request, or
don't configure a CPU limit, in which case either the namespace's default CPU limit will apply if defined, or the container "...could use all of the CPU resources available on the Node where it is running".
If there is no CPU available on the Node for bursting, the container won't get anything beyond the requested value, and as a result the application may start more slowly.
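A sketch of such a Burstable pod (names, image, and values are illustrative): the small request matches the expected steady-state usage after startup, while the much higher limit lets the container use spare node CPU while it boots.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bursty-java-app               # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/java-app:1.0   # placeholder image
    resources:
      requests:
        cpu: 100m       # expected steady-state usage after startup
      limits:
        cpu: "4"        # high ceiling for startup bursts; omit this line to
                        # allow bursting into everything free on the node
```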
It is worth mentioning what the docs explain for Pods with multiple Containers:
The CPU request for a Pod is the sum of the CPU requests for all the Containers in the Pod. Likewise, the CPU limit for a Pod is the sum of the CPU limits for all the Containers in the Pod.
If you are running Kubernetes v1.12+ and have access to configure the kubelet, the Node CPU Management Policies could be of interest.
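As a minimal sketch of the latter, assuming you control the kubelet's configuration file (field names per the KubeletConfiguration v1beta1 API; the static policy additionally requires a non-zero CPU reservation):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static   # default is "none"
kubeReserved:              # static policy needs some CPU reserved for daemons
  cpu: 500m
```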

One factor in scheduling pods onto nodes is resource availability, and the Kubernetes scheduler calculates used resources from the request value of each pod. If you do not assign any value to the request parameter, the request for this deployment will be zero. The request parameter doesn't ensure that the pod will actually use that much CPU or RAM; you can get the current resource usage from "kubectl top pods" / "kubectl top nodes".
The request parameter reserves resources for a pod, whereas the limit puts a cap on the pod's resource usage.
You can get more information here: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/.
This will give you a rough idea of requests and limits.
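To make the distinction concrete, a minimal sketch (names, image, and values are illustrative) of a pod that sets only requests: the scheduler reserves the capacity, but nothing caps the container at runtime.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: requests-only            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app  # placeholder image
    resources:
      requests:                  # counted by the scheduler; no runtime cap
        cpu: 250m
        memory: 256Mi
```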

Related

If requested memory is "the minimum", why is Kubernetes killing my pod when it exceeds 10x the requested amount?

I am debugging a problem with pod eviction in Kubernetes.
It looks like it is related to the configured number of PHP-FPM child processes.
I assigned a minimum memory of 128 MB, and Kubernetes is evicting my pod apparently when it exceeds 10x that amount ("The node was low on resource: memory. Container phpfpm was using 1607600Ki, which exceeds its request of 128Mi.").
How can I prevent this? I thought that requested resources are the minimum and that the pod can use whatever is available if there's no upper limit.
Requested memory is not "the minimum"; it is what it is called: the amount of memory requested by the pod. When Kubernetes schedules a pod, it uses the request as guidance to choose a node which can accommodate the workload, but it doesn't guarantee that the pod won't be killed if the node runs short on memory.
As per the docs https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-limits-are-run
if a container exceeds its memory request and the node that it runs on becomes short of memory overall, it is likely that the Pod the container belongs to will be evicted.
If you want to guarantee a certain memory window for your pods, you should use limits, but in that case, if your pod doesn't use most of this memory, it will be "wasted".
So to answer your question "How can I prevent this?", you can:
reconfigure your php-fpm in a way that prevents it from using 10x the requested memory (i.e. reduce the worker count), and configure autoscaling; that way your overloaded pods won't be evicted, and Kubernetes will schedule new pods in the event of higher load
set a memory limit to guarantee a certain amount of memory to your pods (see the sketch after this list)
increase the memory on your nodes
use affinity to schedule your demanding pods on dedicated nodes and other workloads on separate nodes
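A minimal sketch of the second option (names, image, and values are illustrative; a realistic request would be sized from "kubectl top pods" rather than guessed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: phpfpm                   # hypothetical name
spec:
  containers:
  - name: phpfpm
    image: php:8-fpm             # placeholder image
    resources:
      requests:
        memory: 1Gi              # sized to realistic usage, not 128Mi
      limits:
        memory: 1Gi              # the container is OOM-killed above this,
                                 # instead of destabilizing the whole node
```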

QoS class of Guaranteed for Pod in Kubernetes

On my Kubernetes nodes there are
prioritized pods
dispensable pods
Therefore I would like to have a QoS class of Guaranteed for the prioritized pods.
To achieve the Guaranteed class, the CPU/memory requests and limits must meet some conditions. Therefore:
For every Container in the Pod, the CPU limit must equal the CPU request
But I would like to set a higher CPU limit than the request, so that the prioritized pods can use all free CPU resources that are available.
Simple example: a node with 4 cores has:
1 prioritized pod with a 2000m CPU request and a 3900m CPU limit
3 dispensable pods, each with a 500m CPU request and limit.
If the prioritized pod had a 2000m CPU request and limit, 2 cores would be wasted because the dispensable pods don't use CPU most of the time.
If the prioritized pod had a 3900m CPU request and limit, I would need an extra node for the dispensable pods.
Questions
Is it possible to explicitly set the Guaranteed class on a pod even with different CPU request and limit?
If it's not possible: why is there no way to explicitly set the QoS class?
Remarks
There's a system-cluster-critical option. But I think this should only be used for critical k8s add-on pods, not for critical applications.
Is it possible to explicitly set the Guaranteed class on a pod even with different CPU request and limit?
Yes, however you will need to use an additional plugin: capacity-scheduling used with PriorityClass:
There is increasing demand to use Kubernetes to manage batch workloads (ML/DL). In those cases, one challenge is to improve cluster utilization while ensuring that each user has a reasonable amount of resources. The problem can be partially addressed by the Kubernetes ResourceQuota. The native Kubernetes ResourceQuota API can be used to specify the maximum overall resource allocation per namespace. The quota enforcement is done through an admission check. A quota consumer (e.g., a Pod) cannot be created if the aggregated resource allocation exceeds the quota limit. In other words, the overall resource usage is aggregated based on Pod's spec (i.e., cpu/mem requests) when it's created.
The Kubernetes quota design has the limitation: the quota resource usage is aggregated based on the resource configurations (e.g., Pod cpu/mem requests specified in the Pod spec). Although this mechanism can guarantee that the actual resource consumption will never exceed the ResourceQuota limit, it might lead to low resource utilization as some pods may have claimed the resources but failed to be scheduled. For instance, actual resource consumption may be much smaller than the limit.
Pods can be created at a specific priority. You can control a pod's consumption of system resources based on a pod's priority, by using the scopeSelector field in the quota spec.
A quota is matched and consumed only if scopeSelector in the quota spec selects the pod.
When quota is scoped for priority class using scopeSelector field, quota object is restricted to track only following resources:
pods
cpu
memory
ephemeral-storage
limits.cpu
limits.memory
limits.ephemeral-storage
requests.cpu
requests.memory
requests.ephemeral-storage
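For orientation, a sketch using only core Kubernetes objects of a quota scoped to a PriorityClass (names and values are hypothetical; YAML for the plugin's own ElasticQuota objects is in the plugin description mentioned below):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high                     # hypothetical name
value: 1000000
globalDefault: false
description: "Prioritized workloads"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota      # hypothetical name
spec:
  hard:
    cpu: "10"
    memory: 20Gi
  scopeSelector:                 # the quota only tracks pods in this PriorityClass
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high"]
```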
This plugin also supports preemption (example for ElasticQuota):
Preemption happens when a pod is unschedulable, i.e., failed in PreFilter or Filter phases.
In particular for capacity scheduling, the failure reasons could be:
Prefilter Stage
sum(allocated res of pods in the same elasticquota) + pod.request > elasticquota.spec.max
sum(allocated res of pods in the same elasticquota) + pod.request > sum(elasticquota.spec.min)
So the preemption logic will attempt to make the pod schedulable, with a cost of preempting other running pods.
Examples of yaml files and usage can be found in the plugin description.

Are Kubernetes requests really guaranteed?

I'm running a pod on an EKS node with 2500m of requests and no limits; it happily uses around 3000m typically. I wanted to test whether requests were really guaranteed, so I ran a CPU stress test pod on the same node, again with 3000m requests and no limits.
This caused the original pod to be unable to use more than ~1500m of CPU, well below its requests. When I turned off the stress pod, it returned to using 3000m.
There are a number of Kubernetes webpages which say that requests are what the pod is "guaranteed", but does this only mean guaranteed for scheduling, or should it actually be a guarantee? If it is guaranteed, why might my pod's CPU usage have been restricted (noting that there is no throttling for pods without limits)?
Requests are not a guarantee that resources (especially CPU) will be available at runtime. If you set requests and limits very close together you have better expectations, but you need every pod in the system to cooperate to have a real guarantee.
Resource requests only affect the initial scheduling of the pod. In your example, you have one pod that requests 2.5 CPU and a second pod that requests 3 CPU. If your node has 8 CPU, both can be scheduled on the same node, but if the node only has 4 CPU, they need to go on separate nodes (if you have the cluster autoscaler, it can create a new node).
To carry on with the example, let's say the pods get scheduled on the same node with 8 CPU. Once they've been scheduled, the resource requests don't matter any more. Neither pod has resource limits, but let's say the smaller pod actually tries to use 3 CPU and the larger pod (a multi-threaded stress test) uses 13 CPU. This is more than the physical capacity of the system, so the kernel will divide processor cycles between the two processes.
For CPU usage, if the node is overcommitted, you'll just see slow-downs in all of the processes. Memory or disk ("ephemeral storage") pressure can cause pods to be evicted and rescheduled on different nodes; the pods that get evicted first are the ones that exceed their resource requests by the most. Memory pressure can also cause the node to run out of physical memory, in which case pods can get OOMKilled.
If every pod sets resource requests and limits to the same value, then you do have an approximate guarantee that resources will be available, since nothing will be able to use more resources than the scheduler has allocated to it. For an individual pod and for non-CPU resources, if requests and limits are the same, your pod won't get evicted when the node is overcommitted (because it can't exceed its requests). On the other hand, most processes won't use exactly their resource requests, so setting requests high enough that you're guaranteed not to be evicted also means the node has unused resources. Your cluster as a whole becomes less efficient (it needs more nodes to do the same work and is more expensive) but more reliable, since pods won't get killed off as often.
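A sketch of that fully reserved shape (names, image, and values are illustrative): requests equal to limits give the pod the Guaranteed QoS class, trading utilization for predictability.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: reserved-pod             # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app  # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:                    # equal to requests -> Guaranteed QoS class
        cpu: "2"
        memory: 4Gi
```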

Kubernetes: Scheduling Pod without resource limits

Kubernetes: what happens when a pod has no resource limits/requests defined?
How many resources can a pod use in Kubernetes (GKE) when it has no (or only partial) resource limits/requests defined?
For example, I have a pod with only memory limits and memory requests, but no CPU specs.
Will the CPU available to this pod be:
0
as much as is left on the node/namespace (total minus all other pods' claims)
as much as possible given the actual use by other pods on the node/namespace
If you do not specify a CPU limit for a container, then one of these situations applies:
The container has no upper bound on the CPU resources it can use; it can use all of the CPU resources available on the node where the pod is running. So in your case it will be the second option you listed in your question: as much as is left on the node/namespace.
Alternatively, a Kubernetes cluster administrator may have defined a default CPU limit for the namespace; if the container runs in a namespace with such a default, it is automatically assigned that default limit.
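A sketch of such a namespace default (names and values are illustrative): with this LimitRange in place, containers created without CPU specs get the defaults filled in automatically.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults        # hypothetical name
  namespace: team-a         # hypothetical namespace
spec:
  limits:
  - type: Container
    default:                # applied when no CPU limit is set
      cpu: "1"
    defaultRequest:         # applied when no CPU request is set
      cpu: 500m
```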
A ResourceQuota can be defined for each namespace; it comes in handy for keeping out pods that have no resource requests or limits and would otherwise eat up all the resources. Once compute resources are constrained by a quota, the API server rejects pods that don't specify requirements for them, so you cannot create such a pod in that particular namespace; defining quotas is recommended as a best practice.
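A sketch of such a namespace quota (names and values are illustrative): once these compute resources are constrained, pods in the namespace must set requests and limits for them.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota       # hypothetical name
  namespace: team-a         # hypothetical namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```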
For more information you could refer to this section : https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#if-you-do-not-specify-a-cpu-limit

What does the OutOfcpu error in Kubernetes mean?

I got an OutOfcpu error in Kubernetes on Google Cloud; what does it mean? My pods seem to be working now, however there were pods in this same revision which got OutOfcpu.
It means that the kube-scheduler can't find any node with available CPU to schedule your pods:
kube-scheduler selects a node for the pod in a 2-step operation:
Filtering
Scoring
The filtering step finds the set of Nodes where it's feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resource to meet a Pod's specific resource requests.
[...]
PodFitsResources: Checks if the Node has free resources (eg, CPU and Memory) to meet the requirement of the Pod.
Also, as per Assigning Pods to Nodes:
If the named node does not have the resources to accommodate the pod, the pod will fail and its reason will indicate why, e.g. OutOfmemory or OutOfcpu.
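To make the quoted case concrete, a hedged sketch (node, image, and pod names are hypothetical): spec.nodeName bypasses the scheduler, so the kubelet on the named node runs the capacity check itself and fails the pod with reason OutOfcpu if the requests don't fit.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod               # hypothetical name
spec:
  nodeName: gke-node-1           # hypothetical node; the scheduler is bypassed
  containers:
  - name: app
    image: registry.example/app  # placeholder image
    resources:
      requests:
        cpu: "2"                 # if gke-node-1 has less than 2 unrequested
                                 # CPUs, the pod fails with reason OutOfcpu
```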
In addition to how-kube-scheduler-schedules-pods, I think this will be helpful for understanding why the OutOfcpu error showed up.
When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled Containers is less than the capacity of the node. Note that although actual memory or CPU resource usage on nodes is very low, the scheduler still refuses to place a Pod on a node if the capacity check fails. This protects against a resource shortage on a node when resource usage later increases, for example, during a daily peak in request rate.
Ref: how-pods-with-resource-requests-are-scheduled