GCP Kubernetes HPA not working as described

We have been running our workload on Kubernetes on GCP for around a year now. Last week, however, one of our websites was hit hard by a promo campaign launched by the site owner, without us pre-scaling the cluster.
Since then we've been load testing the stack on a test cluster. The issue we're having is that, using a Google node, the HPA doesn't scale up as the documentation describes; instead it always scales
2 → 4 → 8 → 16 → 32 → 64 → 128
regardless of CPU load.
For instance, in our test we had 251%/60% CPU with 8 pods running. My math would suggest (251/60) × 8 ≈ 33 pods are needed; however, it always goes to 16 next, then waits 3 minutes before going to 32.
The scale-out required for the test to function is around 64 pods, which it only reaches after about 25 minutes instead of 8. Is there a way to get the GCP version operating more like the manual? Our HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"False","lastTransitionTime":"2017-11-22T14:17:35Z","reason":"BackoffBoth","message":"the time since the previous scale is still within both the downscale and upscale forbidden windows"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2017-11-21T19:09:34Z","reason":"ValidMetricFound","message":"the HPA was able to succesfully calculate a replica count from cpu resource utilization (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2017-11-22T14:11:05Z","reason":"DesiredWithinRange","message":"the desired replica count is within the acceptible range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":64,"currentAverageValue":"193m"}}]'
  creationTimestamp: 2017-11-20T15:44:48Z
  name: varnish-7
  namespace: default
  resourceVersion: "373498"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/varnish-7
  uid: bd60211b-ce09-11e7-af0d-42010a8e0099
spec:
  maxReplicas: 60
  minReplicas: 2
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: varnish-7
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 64
  currentReplicas: 8
  desiredReplicas: 8
  lastScaleTime: 2017-11-22T14:16:05Z

Related

What is the unit of targetAverageValue for external metrics kubernetes.io|container|accelerator|duty_cycle for horizontal pod autoscaling?

I referred to this Stack Overflow question to set up my HPA (Horizontal Pod Autoscaler) for a Google Kubernetes Engine (GKE) workload. According to the details of that question and the details specified here, I set my targetAverageValue to 50, which should be treated as 50%, but when I run kubectl describe hpa I see this line in the output:
Metrics: ( current / target ) "kubernetes.io|container|accelerator|duty_cycle" (target average value): 33500m / 50
This is my HPA YAML:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-metric
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: parabole-dj-u1
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metricName: kubernetes.io|container|accelerator|duty_cycle
      targetAverageValue: 50
It seems to be measuring in some other unit. What should my targetAverageValue be if I want it to autoscale at 50% duty_cycle?
Adding a screenshot of the duty_cycle metric from the console, as @Alberto Pau asked (duty_cycle image).
Your configuration is correct; the HPA always reports external metric values in milli-units.
The current utilization is probably 33.5%: just divide the number with the "m" suffix by 1000 to get the percentage.
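To make the units concrete, here is the same metric block again, annotated with the conversion (the comments are mine; the values come from the kubectl describe output above):
metrics:
- type: External
  external:
    metricName: kubernetes.io|container|accelerator|duty_cycle
    # duty_cycle is a percentage, so 50 here means a 50% target.
    # The reported current value of 33500m is in milli-units: 33500 / 1000 = 33.5%.
    targetAverageValue: 50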

How do I stop replicaSet terminating requested number of pods

I have the minimum number of replicas set to 3, the maximum to 10, and the number of requested replicas at 6. When I deploy, the ReplicaSet looks good, and I have 6 pods running as expected.
However, after a few minutes I get this message: "Scaled down replica set my-first-app to 3". It then terminates my pods so I am left with only 3. How do I stop it doing this? I want to have 6 replicas.
The HorizontalPodAutoscaler (which I assume you are talking about, since a ReplicaSet on its own would not scale your Pods horizontally) automatically scales resources up and down based on the configured metrics (e.g. CPU usage, memory usage).
At a high level, it uses the following algorithm to determine the number of replicas:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
It then clamps the result between minReplicas and maxReplicas. In your case the measured CPU usage is low enough that the computed desiredReplicas falls below 6, so the HPA keeps scaling the Deployment down until it reaches minReplicas (3).
See the docs on the Horizontal Pod Autoscaler for more details.
To answer your question: if you always want at least 6 Pods, just make sure minReplicas is set to 6, e.g.:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-example
  minReplicas: 6
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

GKE HPA only targeting Node CPU utilisation rather than targeted deployments

I have two Deployments, A and B, running on a node, and I've set up the HPAs like so:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: A
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: A
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 75
(and the same for B, but with the names replaced, of course).
However, when monitoring the HPAs, the target CPU utilisation is ALWAYS the same for both HPAs, and hence A and B always scale at the same time even if their simulated workloads are different; so it seems the HPA is targeting the node's CPU utilisation rather than the deployment's. Further testing by running jobs independent of A and B on the node still triggers HPA scaling of A and B.
How can I configure it so each HPA ONLY targets the CPU utilisation of its target deployment?
The metric name cpu is too vague and does not target pod CPU utilization. Since you are just looking to use standard pod CPU, I recommend using the v1 HPA API instead of v2beta1 and setting 75 in the targetCPUUtilizationPercentage field, as this refers specifically to pod CPU utilization.
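A minimal sketch of what that could look like for Deployment A, reusing the names and limits from your manifest above:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: A
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: A
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 75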

Horizontal pod Autoscaler scales custom metric too aggressively on GKE

I have the Horizontal Pod Autoscaler configuration below on Google Kubernetes Engine, scaling a deployment by a custom metric: the RabbitMQ messages-ready count for a specific queue, foo-queue.
It picks up the metric value correctly.
When I insert 2 messages, it scales the deployment to the maximum of 10 replicas.
I expect it to scale to 2 replicas, since the targetValue is 1 and there are 2 messages ready.
Why does it scale so aggressively?
HPA configuration:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      targetValue: 1
I think you did a great job explaining how targetValue works with HorizontalPodAutoscalers. However, based on your question, I think you're looking for targetAverageValue instead of targetValue.
In the Kubernetes docs on HPAs, it mentions that using targetAverageValue instructs Kubernetes to scale pods based on the average metric exposed by all Pods under the autoscaler. While the docs aren't explicit about it, an external metric (like the number of jobs waiting in a message queue) counts as a single data point. By scaling on an external metric with targetAverageValue, you can create an autoscaler that scales the number of Pods to match a ratio of Pods to jobs.
Back to your example:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      # Aim for one Pod per message in the queue
      targetAverageValue: 1
will cause the HPA to try keeping one Pod around for every message in your queue (with a max of 10 pods).
As an aside, targeting one Pod per message is probably going to cause you to start and stop Pods constantly. If you end up starting a ton of Pods and process all of the messages in the queue, Kubernetes will scale your Pods down to 1. Depending on how long it takes to start your Pods and how long it takes to process your messages, you may have lower average message latency by specifying a higher targetAverageValue. Ideally, given a constant amount of traffic, you should aim to have a constant number of Pods processing messages (which requires you to process messages at about the same rate that they are enqueued).
According to https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
From the most basic perspective, the Horizontal Pod Autoscaler controller operates on the ratio between desired metric value and current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
From the above I understand that as long as the queue has messages, the k8s HPA will continue to scale up, since currentReplicas is part of the desiredReplicas calculation.
For example, if:
currentReplicas = 1
currentMetricValue / desiredMetricValue = 2/1
then:
desiredReplicas = 2
If the metric stays the same in the next HPA cycle, currentReplicas becomes 2 and desiredReplicas is raised to 4.
Try following this guide, which describes horizontal autoscaling settings for RabbitMQ in Kubernetes:
Kubernetes Workers Autoscaling based on RabbitMQ queue size
In particular, targetValue: 20 for the metric rabbitmq_queue_messages_ready is recommended instead of targetValue: 1:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: myqueue
      targetValue: 20
Now our deployment my-workers will grow if the RabbitMQ queue myqueue has more than 20 unprocessed jobs in total.
I'm using the same Prometheus metrics from RabbitMQ (I'm using Celery with RabbitMQ as the broker).
Has anyone here considered using the rabbitmq_queue_messages_unacked metric rather than rabbitmq_queue_messages_ready?
The thing is that rabbitmq_queue_messages_ready decreases as soon as a message is pulled by a worker, and I'm afraid that a long-running task might be killed by the HPA, while rabbitmq_queue_messages_unacked stays until the task is completed.
For example, I have a message that triggers a new pod (celery-worker) to run a task that takes 30 minutes. rabbitmq_queue_messages_ready will decrease as soon as the pod is running, and the HPA cooldown/delay will terminate the pod.
EDIT: it seems a third metric, rabbitmq_queue_messages, is the right one; it is the sum of both unacked and ready:
sum of ready and unacknowledged messages - total queue depth
documentation
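If you go that route, a minimal sketch of the metrics block might look like this (assuming your exporter also publishes rabbitmq_queue_messages to Stackdriver under the same custom.googleapis.com prefix; the queue name and target value are placeholders):
metrics:
- type: External
  external:
    metricName: "custom.googleapis.com|rabbitmq_queue_messages"
    metricSelector:
      matchLabels:
        metric.labels.queue: foo-queue
    # Total queue depth (ready + unacked), so a long-running task keeps
    # counting toward the metric until its message is acknowledged.
    targetAverageValue: 20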

How to prevent Kubernetes horizontal auto-scaler from scaling down?

I have created a horizontal autoscaler based on CPU usage and it works fine. I want to know how I can configure the autoscaler so that it only scales up and never scales down. The reason I want this is that when I have high load/requests I create some operators, and I want to keep them alive even if they do nothing for a while; but the autoscaler kills the pods and scales down to the minimum replicas after some time when there is no load.
My autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: gateway
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 20
Edit:
By operator, I mean small applications/programs that are running in a pod.
You can add the --horizontal-pod-autoscaler-downscale-stabilization flag to kube-controller-manager, as described in the docs. The default delay is 5 minutes.
To add the flag to kube-controller-manager, edit /etc/kubernetes/manifests/kube-controller-manager.yaml on the master node; the pod will then be recreated.
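A minimal sketch of the relevant part of that manifest with the flag added (the 30m window is just an example value; the rest of the file stays as generated for your cluster):
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # ...existing flags...
    - --horizontal-pod-autoscaler-downscale-stabilization=30m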