I am trying to understand how the HPA works, but I have some concerns.
If my service is configured like this:
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi
and I configure the HPA this way:
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-service
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
It prevents my service from reaching the limits (500m), right?
Is it better to configure a higher value like 80%?
I ask because with this configuration I see the pods scaled to the maximum number even though they are using less CPU than the limits:
NAME                            CPU(cores)   MEMORY(bytes)
test-service-76f8b8c894-2f944   189m         283Mi
test-service-76f8b8c894-2ztt6   183m         278Mi
test-service-76f8b8c894-4htzg   117m         233Mi
test-service-76f8b8c894-5hxhv   142m         193Mi
test-service-76f8b8c894-6bzbj   140m         200Mi
test-service-76f8b8c894-6sj5m   149m         261Mi
The amount of CPU used is less than the request configured in the definition of the service.
Moreover, I have seen that this has been discussed here as well, but I didn't get the answer:
Using Horizontal Pod Autoscaling along with resource requests and limits
It prevents my service from reaching the limits (500m), right?
No, the HPA is not preventing it (although resources.limits is). What the HPA does is start new replicas when the average CPU utilization across all pods goes above 50% of the requested CPU resources, i.e. above 125m.
Is it better to configure a higher value like 80%?
Can't say; it is application-specific.
Horizontal autoscaling is pretty well described in the documentation.
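As a rough check against the kubectl top output in the question, here is a back-of-the-envelope calculation using the documented HPA formula, assuming those readings are representative of what the metrics server reports:

  average usage   = (189 + 183 + 117 + 142 + 140 + 149) / 6 ≈ 153m
  utilization     = 153m / 250m (request)                  ≈ 61%   (above the 50% target)
  desiredReplicas = ceil(6 × 61 / 50) = 8  →  capped at maxReplicas = 6

So even though every pod is well below its 500m limit, the average usage is above 50% of the 250m request, which is why the HPA keeps the Deployment at the maximum of 6 replicas.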
I have a container running in a GKE autopilot K8s cluster. I have the following in my deployment manifest (only relevant parts included):
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - resources:
          requests:
            memory: "250Mi"
            cpu: "512m"
So I've requested the minimum resources that GKE Autopilot allows for normal pods. Note that I have not specified any limits.
However, after applying the manifest and looking at the resulting YAML, I see that it does not match what I applied:
resources:
  limits:
    cpu: 750m
    ephemeral-storage: 1Gi
    memory: 768Mi
  requests:
    cpu: 750m
    ephemeral-storage: 1Gi
    memory: 768Mi
Any idea what's going on here? Why has GKE scaled up the resources? This is costing me more money unnecessarily.
Interestingly it was working as intended until recently. This behaviour only seemed to start in the past few days.
If the resources that you've requested are the following:
memory: "250Mi"
cpu: "512m"
then they are not compliant with the minimum amount of resources that GKE Autopilot will assign. Please take a look at the documentation:
NAME                 Normal Pods
CPU                  250 mCPU
Memory               512 MiB
Ephemeral storage    10 MiB (per container)
-- Cloud.google.com: Kubernetes Engine: Docs: Concepts: Autopilot overview: Allowable resource ranges
As you can see, the amount of memory you've requested was too small, and that's why you saw the following message (and the manifest was modified to increase the requests/limits):
Warning: Autopilot increased resource requests for Deployment default/XYZ to meet requirements. See http://g.co/gke/autopilot-resources.
To fix that you will need to assign resources that are within the ranges from the documentation I've linked above.
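A minimal sketch of a compliant request block, using the documented minimums from the table above; the values are illustrative only, and Autopilot may still round the CPU to its allowed increments and adjust the CPU-to-memory ratio:

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"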
I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start up new instances once the average CPU Utilization passes 35%. However this does not seem to work as expected.
The HPA triggers a rescale even though the CPU Utilization is far below the defined target utilization. As seen below the "current" utilization is 10% which is far away from 35%. But still, it rescaled the number of pods from 5 to 6.
I've also checked the metrics in my Google Cloud Platform dashboard (the place at which we host the application). This also shows me that the requested CPU utilization hasn't surpassed the threshold of 35%. But still, several rescales occurred.
The content of my HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
{{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
{{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
{{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35
Does anyone know what the cause of this might be?
This is tricky and could be a bug, but I don't think so; most of the time people configure values that are too low, as I'll explain.
How targetCPUUtilizationPercentage relates to the Pod's resource limits
The targetCPUUtilizationPercentage configures a percentage based on all the CPU a pod can use. On Kubernetes we can't create an HPA without specifying some limits on CPU usage.
Let's assume that these are our limits:
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 1000m
And in targetCPUUtilizationPercentage inside the HPA we specify 75%.
That is easy to reason about, because we ask for 100% (1000m = 1 CPU core) of a single core, so when this core is at about 75% usage, the HPA will start to act.
But if we define our limits like this:
spec:
  containers:
  - name: apache
    image: httpd:alpine
    resources:
      limits:
        cpu: 500m
Now 100% of the CPU our pod can utilize is only 50% of a single core; in other words, 100% of CPU usage from this pod means, on the hardware, 50% usage of a single core.
This makes no difference to targetCPUUtilizationPercentage: if we keep our value of 75%, the HPA will start to act when our single core is at about 37.5% usage, because that is 75% of all the CPU this pod can consume.
From the perspective of the pod and the HPA, they never know that they are limited on CPU or memory.
Understanding the scenario in the question above
With some programs, like the one used in the question above, CPU spikes do occur, but only over small timeframes (for example 10-second spikes). Due to the short duration of these spikes, the metrics server doesn't save the spike itself but only the metric aggregated over a roughly one-minute window; a spike that falls between such windows is therefore excluded. This explains why the spike cannot be seen in the metrics dashboards but is still picked up by the HPA.
Thus, for services with low CPU limits, a larger scale-up window (the scaleUp behavior settings in the HPA) can be ideal.
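For example, with the autoscaling/v2 API (rather than the autoscaling/v1 used in the question) the scale-up behaviour can be given a stabilization window, so a short spike has to persist before new pods are added. A sketch based on the HPA from the question; the 120-second window is just an illustration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 5
  maxReplicas: 35
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 35
  behavior:
    scaleUp:
      # require the metric to stay high for 2 minutes before adding pods
      stabilizationWindowSeconds: 120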
Scaling is based on the % of requests, not limits. I think we should change this answer, as the examples in the accepted answer show:
limits:
  cpu: 1000m
But the targetCPUUtilizationPercentage is based on requests like:
requests:
  cpu: 1000m
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
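The formula behind that paragraph, from the same page, is:

  desiredReplicas = ceil[ currentReplicas × ( currentMetricValue / desiredMetricValue ) ]

With a measured average of 10% against a 35% target this formula would never scale up, so a rescale implies the controller briefly saw an average above the target between dashboard samples, which is consistent with the spike explanation in the accepted answer.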
I have deployed an app on Kubernetes and would like to test hpa.
With the kubectl top nodes command, I noticed that CPU and memory increase without my stressing the app.
Does it make sense?
Also, while stressing the deployment with Apache Bench, CPU and memory don't increase enough to pass the target and create a replica.
My Deployment YAML file is too big to provide here. This is one of my containers:
- name: web
  image: php_apache:1.0
  imagePullPolicy: Always
  resources:
    requests:
      memory: 50Mi
      cpu: 80m
    limits:
      memory: 100Mi
      cpu: 120m
  volumeMounts:
  - name: shared-data
    mountPath: /var/www/html
  ports:
  - containerPort: 80
It consists of 15 containers in total.
I have a VM that contains a cluster with 2 nodes (master, worker).
I would like to stress deployment so that i can see it scale up.
But here I think there is a problem! Without stressing the app, the CPU/memory of the pod has already passed the target and 2 replicas have been created.
I know that the more requests I provide to the containers, the lower that percentage is.
But does it make sense for the memory/CPU usage to be increased from the beginning, without stressing it?
I would like the left part of the target (the memory usage of the pods) to start at 0% and increase as I stress the app, creating replicas.
But as I'm stressing it with Apache Bench, the value increases by a maximum of 10%.
We can see here the usage of CPU:
kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
x-app-55b54b6fc8-7dqjf   76m          765Mi
59% is the memory utilization of the pod, calculated as the memory usage divided by the sum of memory requests. In my case 59% = 765Mi / 1310Mi.
HPA yaml file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 35
With the kubectl top nodes command, I noticed that CPU and memory increase without my stressing the app. Does it make sense?
Yes, it makes sense. If you check the Google Cloud documentation about Requests and Limits:
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
But does it make sense for the memory/CPU usage to be increased from the beginning, without stressing it?
Yes, as, for example, your container web can start with memory: 50Mi and cpu: 80m but is allowed to increase up to memory: 100Mi and cpu: 120m. Also, as you mentioned, you have 15 containers in total, so depending on their requests and limits the pod can reach more than 35% of your memory.
In the HPA documentation (algorithm details) you can find this information:
When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target. Before checking the tolerance and deciding on the final values, we take pod readiness and missing metrics into consideration, however.
All Pods with a deletion timestamp set (i.e. Pods in the process of being shut down) and all failed Pods are discarded.
If a particular Pod is missing metrics, it is set aside for later; Pods with missing metrics will be used to adjust the final scaling amount.
Not sure about the last question:
59% is the memory utilization of the pod, calculated as the memory usage divided by the sum of memory requests. In my case 59% = 765Mi / 1310Mi.
In your HPA you configured it to create another pod when averageUtilization reaches 35% of memory. It reached 59%, so it created another pod. As the HPA target is memory, the HPA is not counting CPU at all. Also, please keep in mind that since this is an average, it needs about a minute for the values to change.
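Putting numbers on it, using the figures from the question and assuming the 1310Mi is the sum of the memory requests of all 15 containers in the pod:

  average utilization = 765Mi / 1310Mi ≈ 58%   (above the 35% target)
  desiredReplicas     = ceil(1 × 58 / 35) = 2

which matches the 2 replicas you observed without applying any load.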
For a better understanding of how the HPA works, please try this walkthrough.
If this was not helpful, please clarify what exactly you are asking.
I have a Kubernetes cluster running on GKE, and I created a new namespace with a ResourceQuota:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: bot-quota
spec:
  hard:
    requests.cpu: '500m'
    requests.memory: 1Gi
    limits.cpu: '1000m'
    limits.memory: 2Gi
which I apply to my namespace (called bots), which gives me kubectl describe resourcequota --namespace=bots:
Name:            bot-quota
Namespace:       bots
Resource         Used  Hard
--------         ----  ----
limits.cpu       0     1
limits.memory    0     2Gi
requests.cpu     0     500m
requests.memory  0     1Gi

Name:            gke-resource-quotas
Namespace:       bots
Resource                    Used  Hard
--------                    ----  ----
count/ingresses.extensions  0     5k
count/jobs.batch            0     10k
pods                        0     5k
services                    0     1500
This is what I expect, and my expectation is that the bots namespace is hard-limited to the limits above.
Now I would like to deploy a single pod onto that namespace, using this simple yaml:
apiVersion: v1
kind: Pod
metadata:
  name: podname
  namespace: bots
  labels:
    app: someLabel
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: default-pool
  containers:
  - name: containername
    image: something-image-whatever:latest
    resources:
      requests:
        memory: '96Mi'
        cpu: '300m'
      limits:
        memory: '128Mi'
        cpu: '0.5'
    args: ['0']
Given the resources specified, I'd expect to be well within range when deploying a single instance. When I apply the YAML, though:
Error from server (Forbidden): error when creating "pod.yaml": pods "podname" is forbidden: exceeded quota: bot-quota, requested: limits.cpu=2500m, used: limits.cpu=0, limited: limits.cpu=1
If I change the pod's YAML to use a CPU limit of 0.3, the same error appears with limits.cpu=2300m requested.
In other words, it seems to miraculously add 2000m (= 2) CPU to my limit.
We do NOT have any LimitRange applied.
What am I missing?
As discussed in the comments above, it is indeed related to Istio. How?
As is (now) obvious, the requests and limits are specified at the container level, NOT at the pod/deployment level. Why is that relevant?
Running Istio (in our case, managed Istio on GKE), our container is not alone in the workload; it is joined by istio-init (which terminates soon after starting) plus istio-proxy.
And these additional containers come with their own requests and limits; in the current pod I am looking at, for example:
Limits:
  cpu:     2
  memory:  1Gi
Requests:
  cpu:     100m
  memory:  128Mi
on istio-proxy (using: kubectl describe pods <podid>)
This indeed explains why the WHOLE pod has 2 CPU more in its limits than expected.
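Spelled out, the quota admission check sums the limits of every regular container in the pod, which is exactly where the 2500m in the error message comes from (istio-init, being an init container, is not simply added to this sum and is omitted here):

  containername (app)   limits.cpu = 500m  (0.5)
  istio-proxy           limits.cpu = 2000m (2)
  pod total             limits.cpu = 2500m  >  bot-quota limits.cpu = 1000m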
I have tried using an HPA for an RC that contains only one container and it works perfectly fine. But when I have an RC with multiple containers (i.e., a pod containing multiple containers), the HPA is unable to scrape the CPU utilization and shows the status as "unknown", as shown below. How can I successfully implement an HPA for an RC with multiple containers? The Kubernetes docs have no information regarding this, and I also didn't find any mention of it not being possible. Can anyone please share their experience or point of view with regard to this issue? Thanks a lot.
NAME                              REFERENCE                          TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
prometheus-watch-ssltargets-hpa   ReplicationController/prometheus   <unknown> / 70%   1         10        0          4s
Also, for your reference, below is my HPA YAML file.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-watch-ssltargets-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: prometheus
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
By all means it is possible to set an HPA for an RC/Deployment/ReplicaSet with multiple containers. In my case the problem was the format of the resource request. I figured out from this link that if the pod's containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the HPA will not take any action for that metric. In my case I was using the resource request below, which caused the error. (Please note that the following resource request format works absolutely fine when I use it with Deployments, ReplicationControllers, etc.; it was only when I additionally wanted to implement the HPA that it caused the problem mentioned in the question.)
resources:
  limits:
    cpu: 2
    memory: 200M
  requests:
    cpu: 1
    memory: 100Mi
But after changing it as below (i.e., with a relevant resource request set that the HPA can understand), it works fine.
resources:
  limits:
    cpu: 2
    memory: 200Mi
  requests:
    cpu: 1
    memory: 100Mi
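For a multi-container pod, the key point is simply that every container has a CPU request set, since the pod's utilization is its total usage divided by the total of those requests. A minimal sketch; the second container's name and values are illustrative, not taken from the original ReplicationController:

containers:
- name: prometheus
  resources:
    requests:
      cpu: 1
      memory: 100Mi
    limits:
      cpu: 2
      memory: 200Mi
- name: config-reloader        # hypothetical sidecar; it needs its own CPU request too
  resources:
    requests:
      cpu: 100m
      memory: 50Mi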