Alicloud node memory Requested/Limited/Used - kubernetes

We have a test cluster of 3 nodes on Alicloud.
We haven't set any memory limits, per namespace or per pod.
However, when looking at the nodes in the Alicloud console, we see Requested/Limited/Used values for memory. The pods run out of memory when Used goes over the Limited threshold.
Does anyone know where this limit comes from? It seems to be different for each of our nodes, so does it create an arbitrary limit per pod?

To be honest, I can't find any relevant information about where your current default limits come from, but you still need to solve the issue somehow.
I suggest manually setting the required limits to avoid OOMs in the future. To do that, estimate your approximate resource usage and apply requests and limits accordingly.
This can help:
Kubernetes: Assign Memory Resources and Limits to Containers, especially this part
LimitRange for Memory
Kubernetes administrators can define RAM limits for their nodes.
These limits are enforced at higher priority over how much RAM your
Pod declares and wants to use.
Let's define our first LimitRange : 25Mi RAM as min, 200Mi as max.
nano myRAM-LimitRange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-ram-limit
spec:
  limits:
  - max:
      memory: 200Mi
    min:
      memory: 25Mi
    # "type" is required on each limits entry; Container is assumed here
    type: Container
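As a rough illustration of what the 25Mi/200Mi bounds mean, here is a minimal Python sketch that checks whether a memory quantity falls inside that range. The helper names (`parse_memory`, `within_limit_range`) are hypothetical, only the `Ki`/`Mi`/`Gi` suffixes are handled, and the real admission check is more detailed (roughly, min is compared against requests and max against limits), so treat this as a simplification.

```python
# Binary suffixes used by Kubernetes memory quantities (subset, for illustration).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '200Mi' to bytes; a plain integer is bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def within_limit_range(value: str, minimum: str = "25Mi", maximum: str = "200Mi") -> bool:
    """True if value lies inside [minimum, maximum] (defaults match the example)."""
    return parse_memory(minimum) <= parse_memory(value) <= parse_memory(maximum)


print(within_limit_range("100Mi"))  # True
print(within_limit_range("10Mi"))   # False: below the 25Mi minimum
print(within_limit_range("1Gi"))    # False: above the 200Mi maximum
```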

Related

Programmatic calculation of Kubernetes Limit Range

I am looking for a way to calculate appropriate Limit Range and Resource Quota settings for Kubernetes based on the sizing of our Load Test (LT) environment. The LT environment we want to keep flexible in order to play with things and I feel that's a great way to figure out how to set up the limits, etc.
I might also have a fundamental misunderstanding of how this works, so feel free to correct that.
Does anyone have a set of equations or anything that takes into account (I know it won't be an exact science, but I am looking mostly for a jumping-off point):
Container CPU
Container memory
Right now I am pulling the CPU limits requested using this (and the memory similarly, and using some nifty shell script things I found to do full calculations for me):
kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.cpu}{"\n"}{end}' -n my-namespace
We made sure all of our containers are explicitly making requests for CPU/memory, so that works nicely.
The machine type is based on our testing and target number of pods per node. We have nodeSelector declarations in use as we need to separate things out for some very specific needs by the services being deployed and to be able to leverage multiple machine types.
For the Limit Range I was thinking (adding 10% just for padding):
Maximum [CPU/memory] + 10% (ensuring that the machine type holds 2x that calculation) as:
apiVersion: v1
kind: LimitRange
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  limits:
  - max:
      cpu: [calculation_from_above]
      memory: [calculation_from_above]
    type: Container
For the Resource Quota I was thinking (50% to handle estimated overflow in "emergency" HPA):
Total of all CPU/memory in the Load Test environment + 50% as:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  hard:
    limits.cpu: [calculation_from_above]
    limits.memory: [calculation_from_above]
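The two calculations described in the question (largest container limit + 10% for the LimitRange, total of all limits + 50% for the ResourceQuota) could be sketched in Python as follows. This is only a starting point under stated assumptions: the function names and the `observed` sample values are hypothetical, CPU is handled in millicores, and the input would in practice come from the kubectl output above.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity to millicores: '250m' -> 250, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def padded(value: int, percent: int) -> int:
    """Add `percent` padding, rounding up, using integer math only."""
    return (value * (100 + percent) + 99) // 100


def limit_range_max(limits_m: list[int], padding_pct: int = 10) -> int:
    """Largest per-container limit plus padding (LimitRange max)."""
    return padded(max(limits_m), padding_pct)


def quota_total(limits_m: list[int], headroom_pct: int = 50) -> int:
    """Sum of all limits plus headroom (ResourceQuota limits.cpu)."""
    return padded(sum(limits_m), headroom_pct)


# Hypothetical CPU limits as they might be printed by kubectl.
observed = ["250m", "500m", "1", "0.5"]
millis = [cpu_millicores(q) for q in observed]

print(f"LimitRange max cpu: {limit_range_max(millis)}m")    # 1100m
print(f"ResourceQuota limits.cpu: {quota_total(millis)}m")  # 3375m
```

The same `padded` helper works for memory once the quantities are converted to a common unit such as Mi.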

HPA Scaling even though Current CPU is below Target CPU

I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start up new instances once the average CPU Utilization passes 35%. However this does not seem to work as expected.
The HPA triggers a rescale even though the CPU Utilization is far below the defined target utilization. As seen below the "current" utilization is 10% which is far away from 35%. But still, it rescaled the number of pods from 5 to 6.
I've also checked the metrics in my Google Cloud Platform dashboard (the place at which we host the application). This also shows me that the requested CPU utilization hasn't surpassed the threshold of 35%. But still, several rescales occurred.
The content of my HPA
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  {{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
  {{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
  {{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35
Does anyone know what the cause of this might be?
This is tricky and could be a bug, but I don't think so; most of the time people simply configure values that are too low, as I'll explain.
How targetCPUUtilizationPercentage relates to a Pod's requests and limits
The targetCPUUtilizationPercentage is a percentage of all the CPU a pod can use. On Kubernetes, an HPA that targets CPU utilization will not work unless the pod's containers declare their CPU resources.
Let's assume these are our limits:
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 1000m
And in the targetCPUUtilizationPercentage of our HPA we specify 75%.
That is easy to reason about: we ask for 100% of a single core (1000m = 1 CPU core), so when that core reaches about 75% usage, the HPA will start to scale.
But if we define our limits like this:
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 500m
Now, 100% of the CPU our pod can use is only 50% of a single core. So 100% CPU usage from this pod means, on the hardware, 50% usage of a single core.
This makes no difference to targetCPUUtilizationPercentage: if we keep our value of 75%, the HPA will start to scale when our single core is at about 37.5% usage, because that is 75% of all the CPU this pod can consume.
From the perspective of a pod/hpa, they never know that they are limited on CPU or memory.
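The arithmetic in the two examples above can be checked with a few lines of Python. This is just a worked calculation, with a hypothetical helper name. The base value is the CPU figure the percentage is taken from: 1000m and 500m in the examples (when a container sets only a limit, Kubernetes defaults its request to the same value).

```python
def trigger_millicores(base_m: int, target_pct: int) -> float:
    """CPU usage, in millicores, at which the target percentage is reached."""
    return base_m * target_pct / 100


for base in (1000, 500):
    threshold = trigger_millicores(base, 75)
    # One core is 1000m, so dividing by 10 gives the share of a single core.
    print(f"{base}m base -> scale above {threshold:.0f}m ({threshold / 10:.1f}% of one core)")

# 1000m base -> scale above 750m (75.0% of one core)
# 500m base -> scale above 375m (37.5% of one core)
```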
Understanding the scenario in the question above
With some programs, like the one in the question above, CPU spikes do occur, but only over short timeframes (for example, 10-second spikes). Because these spikes are so short, the metrics server does not record the spike itself; it only stores the metric after a 1m window, so a spike that falls between such windows is excluded. This explains why the spike cannot be seen in the metrics dashboards but is still picked up by the HPA.
Thus, for services with low CPU limits, a larger scale-up time window (the scaleUp settings in the HPA) can be ideal.
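To see why a short spike can vanish from a one-minute view, here is a toy Python calculation. The per-second samples and the 500m request are made up for illustration, and this is not how the metrics pipeline actually samples; it only shows how averaging flattens a 10-second burst.

```python
# Hypothetical per-second CPU usage (millicores) over one minute:
# idle at 20m, with a 10-second spike to 400m in the middle.
samples = [20] * 25 + [400] * 10 + [20] * 25

request_m = 500  # assumed CPU request of the container

window_average = sum(samples) / len(samples)
peak = max(samples)

print(f"1m average: {window_average:.1f}m = {100 * window_average / request_m:.1f}% of request")
print(f"peak:       {peak}m = {100 * peak / request_m:.1f}% of request")
```

With these numbers the minute average sits at about 17% of the request while the peak reaches 80%, so a reading taken during the spike and a dashboard showing minute averages tell very different stories.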
Scaling is based on a percentage of requests, not limits. I think this answer should be changed, as the examples in the accepted answer show:
    limits:
      cpu: 1000m
But the targetCPUUtilizationPercentage is based on requests, like:
    requests:
      cpu: 1000m
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
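The ratio mentioned in the quoted documentation is used as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), and no scaling happens while the ratio stays within a tolerance of 1.0 (0.1 by default). A minimal Python sketch of that core formula follows; the function name is hypothetical, and it deliberately ignores minReplicas/maxReplicas clamping, pod readiness, missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Core HPA formula; clamping and readiness handling are left out."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * current_utilization / target_utilization)


print(desired_replicas(5, 42, 35))  # 6: 20% over target, scale up
print(desired_replicas(5, 36, 35))  # 5: within tolerance, no change
print(desired_replicas(5, 10, 35))  # 2: before minReplicas is applied
```

For the HPA in the question, a result of 2 would then be raised to minReplicas (5), so a move from 5 to 6 pods suggests the controller saw an average above roughly 38.5% (35% plus the 10% tolerance) at some point.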

Kubernetes Deployment makes great use of cpu and memory without stressing it

I have deployed an app on Kubernetes and would like to test the HPA.
With the kubectl top nodes command, I noticed that CPU and memory usage increase without stressing the app.
Does that make sense?
Also, while stressing the deployment with Apache Bench, CPU and memory do not increase enough to pass the target and create a replica.
My Deployment YAML file is too big to post in full. This is one of my containers:
- name: web
  image: php_apache:1.0
  imagePullPolicy: Always
  resources:
    requests:
      memory: 50Mi
      cpu: 80m
    limits:
      memory: 100Mi
      cpu: 120m
  volumeMounts:
  - name: shared-data
    mountPath: /var/www/html
  ports:
  - containerPort: 80
The Deployment consists of 15 containers.
I have a VM that contains a cluster with 2 nodes (master, worker).
I would like to stress the deployment so that I can see it scale up.
But here I think there is a problem: without any load on the app, the CPU/memory usage of the Pod has already passed the target and 2 replicas have been created.
I know that the higher the requests I give the containers, the lower that percentage is.
But does it make sense for memory/CPU usage to be elevated from the beginning, without stressing it?
I would like the left part of the target (the memory usage of the pods) to start at 0% and to increase as I stress the app, creating replicas.
But when I stress it with Apache Bench, the value increases by a maximum of 10%.
We can see here the usage of CPU:
kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
x-app-55b54b6fc8-7dqjf   76m          765Mi
!!59% is the memory usage of the pod, computed as memory usage / sum of memory requests. In my case 59% = 765Mi/1310Mi
HPA yaml file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 35
With the kubectl top nodes command, I noticed that CPU and memory usage increase without stressing the app. Does that make sense?
Yes, it makes sense. See Google Cloud's explanation of requests and limits:
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
But does it make sense for memory/CPU usage to be elevated from the beginning, without stressing it?
Yes. For example, your web container can start with memory: 50Mi and cpu: 80m, but it is allowed to grow to memory: 100Mi and cpu: 120m. Also, as you mentioned, you have 15 containers in total, so depending on their requests and limits, usage can exceed 35% of the requested memory.
In HPA documentation - algorithm-details you can find information:
When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target. Before checking the tolerance and deciding on the final values, we take pod readiness and missing metrics into consideration, however.
All Pods with a deletion timestamp set (i.e. Pods in the process of being shut down) and all failed Pods are discarded.
If a particular Pod is missing metrics, it is set aside for later; Pods with missing metrics will be used to adjust the final scaling amount.
Not sure about the last question:
!!59% is the memory usage of the pod, computed as memory usage / sum of memory requests. In my case 59% = 765Mi/1310Mi
In your HPA you asked for another pod to be created when averageUtilization reaches 35% of memory. It reached 59%, so the HPA created another pod. Because the HPA target is memory, the HPA does not take CPU into account at all. Also keep in mind that, because this is an average, it takes about 1 minute for the values to change.
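The numbers from the question can be plugged into a short Python calculation to see how far the idle pod already is above the target. The variable names are illustrative; the figures (765Mi used, 1310Mi requested in total, 35% target) are the ones quoted above.

```python
usage_mi = 765          # from `kubectl top pods`
requests_sum_mi = 1310  # sum of memory requests over the pod's containers
target_pct = 35         # averageUtilization in the HPA

# Utilization as the HPA sees it: usage relative to the requested amount.
utilization_pct = 100 * usage_mi / requests_sum_mi
print(f"utilization: {utilization_pct:.1f}%")                 # 58.4%
print(f"above target: {utilization_pct > target_pct}")        # True

# Idle usage would have to stay below this to avoid scaling at rest.
max_idle_usage_mi = target_pct * requests_sum_mi / 100
print(f"usage allowed at 35%: {max_idle_usage_mi:.1f}Mi")     # 458.5Mi
```

So with an idle footprint of 765Mi, either the memory requests or the target percentage would need to be higher for the pod to start below the threshold.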
For better understanding how HPA is working, please try this walkthrough.
If this was not helpful, please clarify what exactly you are asking.

Openshift ResourceQuota usage rather than reservation?

Running OpenShift 3.11 with project ResourceQuotas and LimitRanges enforced, I am trying to understand how I can utilise the entirety of my project CPU quota based on the "actual current usage" rather than what I have "reserved".
As a simple example, if my question is not clear:
If I have a project with a ResourceQuota of 2 Core CPU
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    limits.cpu: "2"
I have a number of long-running containers which are often idle, waiting for requests, so they are not actually using much CPU. When requests start appearing, I want the affected container to be able to "burst", allowing CPU usage up to the remaining CPU quota available in the project based on what is actually being used (I have no issue with the 100ms CFS resolution).
I need to enforce the maximum the project can have in total, hence the limits.cpu ResourceQuota. But, I must therefore also provide the limits.cpu for each container I create (explicitly or via LimitRange defaults) e.g:
...
spec:
  containers:
  ...
    resources:
      limits:
        cpu: "2"
      requests:
        cpu: 200m
This, however, will only work for the first container I create; a second container with the same settings will exceed the project quota's limits.cpu. But the container is just idle, doing almost nothing after its initial startup sequence.
Is it not possible in my scenario above to have only 200m deducted from the quota for each container, based on requests.cpu, and let it burst up to 1800m? (1600m of the 2000m quota unused + the initial 200m requested)
I have read through the following, the overcommit link seemed promising, but I am still stuck.
https://docs.openshift.com/container-platform/3.11/admin_guide/quota.html
https://docs.openshift.com/container-platform/3.11/admin_guide/limits.html
https://docs.openshift.com/container-platform/3.11/admin_guide/overcommit.html
Is what I am trying to do possible?
I am trying to understand how I can utilise the entirety of my project CPU quota based on the "actual current usage" rather than what I have "reserved"
You can't. If your quota is on limits.cpu, then the cluster admin doesn't want you to burst higher than that value.
If you can get your cluster admin to set your quota differently, with a low requests.cpu quota and a higher limits.cpu quota, you might be able to size your containers as you'd like.
The other option is to use low limits, and a Horizontal Pod Autoscaler to scale up the number of pods for a specific service that is getting a burst in traffic.
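The reason the second container is rejected can be shown with a small Python sketch of the quota check: a limits.cpu quota is charged with the declared limits of all containers in the project, regardless of what they actually use. The function name is hypothetical and values are in millicores.

```python
def fits_quota(existing_limits_m: list[int], new_limit_m: int,
               quota_m: int = 2000) -> bool:
    """A limits.cpu quota counts declared limits, not actual usage."""
    return sum(existing_limits_m) + new_limit_m <= quota_m


print(fits_quota([], 2000))              # True: first container with cpu: "2"
print(fits_quota([2000], 2000))          # False: second one exceeds the quota
print(fits_quota([500, 500, 500], 500))  # True: four containers at 500m each
```

The last line corresponds to the "low limits plus Horizontal Pod Autoscaler" option: smaller per-container limits leave room for more replicas under the same quota.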

Removing default CPU request and limits on GCP Kubernetes

Kubernetes on Google Cloud Platform configures a default CPU request and limit.
I make use of DaemonSets, and DaemonSet pods should use as much CPU as possible.
Manually increasing the upper limit is possible, but the upper bound must be reconfigured whenever new nodes are added, and it must be set much lower than what is available on the node so that pods can still be scheduled during rolling updates.
This requires a lot of manual actions and some resources are just not used most of the time. Is there a way to completely remove the default CPU limit so that pods can use all available CPUs?
GKE, by default, creates a LimitRange object named limits in the default namespace looking like this:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - defaultRequest:
      cpu: 100m
    type: Container
So, if you want to change this, you can either edit it:
kubectl edit limitrange limits
Or you can delete it altogether:
kubectl delete limitrange limits
Note: the policies in the LimitRange objects are enforced by the LimitRanger admission controller which is enabled by default in GKE.
Limit Range is a policy to constrain resource by Pod or Container in a namespace.
A limit range, defined by a LimitRange object, provides constraints that can:
Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
Enforce a ratio between request and limit for a resource in a namespace.
Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
You need to find the LimitRange resource of your namespace and remove the spec.limits.default.cpu and spec.limits.defaultRequest.cpu that are defined (or simply delete the LimitRange to remove all constraints).
Resource limits can be configured in two ways.
At object level:
kubectl edit limitrange limits
This object is created by default with a value of 100m (1/10 of a CPU). Note that this value is a default request, not a limit, and that a container reaching a CPU limit is throttled rather than killed; only exceeding a memory limit gets a container OOM-killed.
At manifest level:
Using a StatefulSet, DaemonSet, etc., through a YAML file, configured under
spec.containers.resources
It looks like this:
spec:
  containers:
  - resources:
      limits:
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
As mentioned, you can modify the configuration or simply delete it to remove the limitations.
However, there are reasons why these limitations have been put in place.
I found a video from a Googler talking about it, take a look! [1]
On top of the Limit Range mentioned by Eduardo Baitello, you should also look out for admission controllers, which can intercept requests to the Kubernetes API and modify them (e.g. add limits, and other defaults).