I am looking for a way to calculate appropriate Limit Range and Resource Quota settings for Kubernetes based on the sizing of our Load Test (LT) environment. We want to keep the LT environment flexible so we can experiment with things, and I feel that's a great way to figure out how to set up the limits, etc.
I might also have a fundamental misunderstanding of how this works, so feel free to correct that.
Does anyone have a set of equations or anything that takes into account (I know it won't be an exact science, but I am looking mostly for a jumping-off point):
Container CPU
Container memory
Right now I am pulling the CPU limits requested using this (and the memory similarly, and using some nifty shell script things I found to do full calculations for me):
kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.cpu}{"\n"}{end}' -n my-namespace
We made sure all of our containers are explicitly making requests for CPU/memory, so that works nicely.
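For reference, the "full calculation" part is roughly this kind of thing (a sketch, not the exact script; it assumes each limit is expressed either as whole cores like "1" or millicores like "500m"):

kubectl get pods -n my-namespace \
  -o=jsonpath='{range .items[*]}{range .spec.containers[*]}{.resources.limits.cpu}{"\n"}{end}{end}' \
  | awk '/m$/ { sub(/m$/, ""); total += $1; next }     # millicore values
         /./  { total += $1 * 1000 }                   # whole-core values
         END  { printf "total CPU limits: %dm\n", total }'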
The machine type is based on our testing and our target number of pods per node. We have nodeSelector declarations in use because we need to separate certain services out for some very specific needs and to be able to leverage multiple machine types.
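(The nodeSelector wiring is nothing exotic; roughly like this, with the label name changed for illustration:)

spec:
  nodeSelector:
    workload-type: high-memory   # label applied to the matching node pool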
For the Limit Range I was thinking (adding 10% just for padding):
Maximum [CPU/memory] + 10% (ensuring that the machine type holds 2x that calculation) as:
apiVersion: v1
kind: LimitRange
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  limits:
  - max:
      cpu: [calculation_from_above]
      memory: [calculation_from_above]
    type: Container
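As a worked example of that rule (numbers purely illustrative): if the largest single-container limit observed in LT is 800m CPU and 1.5Gi memory, the LimitRange max would be roughly 880m / 1.65Gi, and the machine type should have at least ~1760m / ~3.3Gi allocatable to satisfy the 2x check.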
For the Resource Quota I was thinking (50% to handle estimated overflow in "emergency" HPA):
Total of all CPU/memory in the Load Test environment + 50% as:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  hard:
    limits.cpu: [calculation_from_above]
    limits.memory: [calculation_from_above]
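Again as an illustrative calculation: if all the container limits in the LT namespace add up to 6 CPU and 12Gi memory, the quota would come out to 9 CPU (limits.cpu: "9") and 18Gi (limits.memory: 18Gi).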
Related
I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start up new instances once the average CPU Utilization passes 35%. However this does not seem to work as expected.
The HPA triggers a rescale even though the CPU Utilization is far below the defined target utilization. As seen below the "current" utilization is 10% which is far away from 35%. But still, it rescaled the number of pods from 5 to 6.
I've also checked the metrics in my Google Cloud Platform dashboard (the place at which we host the application). This also shows me that the requested CPU utilization hasn't surpassed the threshold of 35%. But still, several rescales occurred.
The content of my HPA
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  {{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
  {{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
  {{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35
Does anyone know what the cause of this might be?
This is tricky and could be a bug, but I don't think it is; most of the time people simply configure values that are too low, as I'll explain.
How targetCPUUtilizationPercentage relates to the Pod's resource limits.
The targetCPUUtilizationPercentage configures a percentage based on all the CPU a pod can use. In Kubernetes we can't create an HPA without specifying some limit on CPU usage.
Let's assume that this is our limits:
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 1000m
And in our targetCPUUtilizationPercentage inside HPA we specify 75%.
That is easy to explain: we are asking for 100% of a single core (1000m = 1 CPU core), so when that core is at about 75% usage, the HPA will start to act.
But if we define our limits as this:
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 500m
Now the total CPU our pod can utilize is only 50% of a single core, so 100% CPU usage from this pod means, on the hardware, 50% usage of a single core.
targetCPUUtilizationPercentage is indifferent to this: if we keep our value of 75%, the HPA will start to act when the single core is at about 37.5% usage, because that is 75% of all the CPU this pod is allowed to consume.
From the perspective of a pod/hpa, they never know that they are limited on CPU or memory.
Understanding the scenario in the question above
With some programs, like the one used in the question above, CPU spikes do occur, but only over small timeframes (for example 10-second spikes). Because of their short duration, the metrics server doesn't record these spikes separately; it only stores the metric over a 1m window, so a spike that falls between such windows is excluded. This explains why the spike cannot be seen in the metrics dashboards, but is picked up by the HPA.
Thus, for services with low CPU limits, a larger scale-up time window (the scaleUp behavior settings in the HPA) can be ideal.
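If the cluster supports the autoscaling/v2 API, a sketch of that idea looks like the following (the 120-second value is an assumption to tune, not a recommendation):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 5
  maxReplicas: 35
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 35
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120   # ignore short-lived spikes before scaling up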
Scaling is based on the % of requests, not limits. I think we should change this answer, as the examples in the accepted answer show:
limits:
  cpu: 1000m
But the targetCPUUtilizationPercentage is based on requests like:
requests:
  cpu: 1000m
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
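The same docs page gives the formula the controller applies. Plugging in assumed numbers (a brief averaged spike, not the 10% steady state from the question) shows how 5 replicas become 6:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
                = ceil[5 * (42% / 35%)] = ceil(6.0) = 6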
We have a test cluster of 3 nodes on Alicloud
We haven't set any memory limits, per namespace or per pod
However, when looking at the nodes in the Alicloud console, we see Requested/Limited/Used values for memory. The pods are running out of memory when Used goes over the Limited threshold.
Does anyone know where this limit comes from? It seems to be different for each of our nodes, so it creates an arbitrary limit per pod?
To be honest, I can't find any relevant information about where your current default limits come from, but either way you need to solve the issue somehow.
I suggest you manually set the required limits to avoid OOMs in the future. What you will need to do is calculate approximate resource usage and apply limits accordingly.
This can help:
Kubernetes: Assign Memory Resources and Limits to Containers, especially this part
LimitRange for Memory
Kubernetes administrators can define RAM limits for their nodes. These limits are enforced at higher priority over how much RAM your Pod declares and wants to use.
Let's define our first LimitRange: 25Mi RAM as min, 200Mi as max.
nano myRAM-LimitRange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-ram-limit
spec:
  limits:
  - max:
      memory: 200Mi
    min:
      memory: 25Mi
    type: Container
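If you also want containers that declare nothing to get sane values injected (rather than relying on whatever the platform defaults to), a LimitRange can carry default and defaultRequest too; a minimal sketch, with the numbers as placeholders:

apiVersion: v1
kind: LimitRange
metadata:
  name: my-ram-limit
spec:
  limits:
  - type: Container
    default:            # memory limit injected when a container declares none
      memory: 200Mi
    defaultRequest:     # memory request injected when a container declares none
      memory: 100Mi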
Running OpenShift 3.11 with project ResourceQuotas and LimitRanges enforced, I am trying to understand how I can utilise my entire project CPU quota based on "actual current usage" rather than what I have "reserved".
As a simple example, if my question is not clear:
If I have a project with a ResourceQuota of 2 Core CPU
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    limits.cpu: "2"
I have a number of long-running containers which are often idle, waiting for requests, so they are not actually using much CPU. When requests start appearing I want the affected container to be able to "burst", allowing CPU usage up to the remaining CPU quota available in the project based on what is actually being used (I have no issue with the 100ms CFS resolution).
I need to enforce the maximum the project can have in total, hence the limits.cpu ResourceQuota. But I must therefore also provide the limits.cpu for each container I create (explicitly or via LimitRange defaults), e.g.:
...
spec:
  containers:
  - ...
    resources:
      limits:
        cpu: "2"
      requests:
        cpu: 200m
This however will only work for the first container I create; a second container with the same settings will exceed the project quota's limits.cpu, even though that container is just sitting almost idle after its initial startup sequence.
Is it not possible in my scenario above to have each container count only its 200m requests.cpu against the quota and burst up to 1800m? (1600m of the 2000m quota unused + the initial 200m requested)
I have read through the following, the overcommit link seemed promising, but I am still stuck.
https://docs.openshift.com/container-platform/3.11/admin_guide/quota.html
https://docs.openshift.com/container-platform/3.11/admin_guide/limits.html
https://docs.openshift.com/container-platform/3.11/admin_guide/overcommit.html
Is what I am trying to do possible?
I am trying to understand how I can utilise my entire project CPU quota based on "actual current usage" rather than what I have "reserved"
You can't. If your quota is on limits.cpu then the cluster admin doesn't want you to burst higher than that value.
If you can get your cluster admin to set your quota differently, to have a low request.cpu quota, and a higher limit.cpu quota, you might be able to size your containers as you'd like.
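A sketch of that quota shape (the numbers are illustrative, and it is the cluster admin who would have to apply it):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "2"   # caps what the project actually reserves
    limits.cpu: "8"     # leaves headroom for per-container bursting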
The other option is to use low limits, and a Horizontal Pod Autoscaler to scale up the number of pods for a specific service that is getting a burst in traffic.
Kubernetes on Google Cloud Platform configures a default CPU request and limit.
I make use of DaemonSets, and DaemonSet pods should use as much CPU as possible.
Manually increasing the upper limit is possible, but the upper bound must be reconfigured whenever new nodes appear, and it must be set much lower than what is available on the node so that rolling updates can still schedule pods.
This requires a lot of manual actions and some resources are just not used most of the time. Is there a way to completely remove the default CPU limit so that pods can use all available CPUs?
GKE, by default, creates a LimitRange object named limits in the default namespace looking like this:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - defaultRequest:
      cpu: 100m
    type: Container
So, if you want to change this, you can either edit it:
kubectl edit limitrange limits
Or you can delete it altogether:
kubectl delete limitrange limits
Note: the policies in the LimitRange objects are enforced by the LimitRanger admission controller which is enabled by default in GKE.
Limit Range is a policy to constrain resources by Pod or Container in a namespace.
A limit range, defined by a LimitRange object, provides constraints that can:
Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
Enforce a ratio between request and limit for a resource in a namespace.
Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
You need to find the LimitRange resource of your namespace and remove the spec.limits.default.cpu and spec.limits.defaultRequest.cpu that are defined (or simply delete the LimitRange to remove all constraints).
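Something along these lines (assuming the GKE-created object named limits in the default namespace; adjust the name and namespace to what you actually find):

kubectl get limitrange --all-namespaces
kubectl describe limitrange limits -n default
kubectl edit limitrange limits -n default    # drop the default/defaultRequest cpu entries, or delete the object entirely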
The resource limitation can be configured in 2 ways.
At object level:
kubectl edit limitrange limits
This object is created by default with a value of 100m (1/10 of a CPU). Note that a container exceeding its CPU limit is throttled rather than killed; it is exceeding a memory limit that gets a container OOM-killed.
At manifest level:
Using a StatefulSet, DaemonSet, etc., through a YAML file, configured under
spec.containers[].resources
It looks like this:
spec:
  containers:
  - resources:
      limits:
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
As mentioned, you can modify the configuration or simply delete these objects to remove the limitations.
However, there are reasons why these limitations have been implemented.
I found a video from a Googler talking about it, take a look! [1]
On top of the Limit Range mentioned by Eduardo Baitello, you should also look out for admission controllers, which can intercept requests to the Kubernetes API and modify them (e.g. add limits, and other defaults).
We currently have around 20 jobs. These jobs create one pod each, but we want to make sure that only one of these pods can run at a time, keeping the rest of them in pending status. Increasing the resource limits makes them run one by one, but I want to be sure that this is always the behaviour.
Is there any way of limiting this concurrency to 1, let's say per label or something similar?
Use ResourceQuota resource:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pod-demo
spec:
  hard:
    pods: "5"
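For the one-pod-at-a-time requirement in the question, the same idea with pods: "1" in the namespace the jobs run in would serialize them; a minimal sketch (namespace name assumed). Note that pods over the quota are rejected at creation and the Job controller retries, rather than the extra pods sitting in Pending:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: one-pod-at-a-time
  namespace: batch-jobs   # assumed namespace where the job pods are created
spec:
  hard:
    pods: "1"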