Is it possible to have the HPA scale based on the number of available running pods?
I have set up a readiness probe that takes a pod out of service based on its internal state (idle, working, busy). When a pod is 'busy', it no longer receives new requests, but its CPU and memory demands are low.
I don't want to scale based on CPU, memory, or other metrics.
Since the readiness probe removes a busy pod from active service, can I scale based on the number of active (not busy) pods, so that more pods are added when that number drops below a certain point?
You can create a custom metric for the HPA, such as the number of busy pods.
That is, the application should emit a metric value when it is busy, and you then use that metric in a HorizontalPodAutoscaler.
Something like this:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-sd
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: custom-metric-sd
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metricName: busy-pods
      targetAverageValue: 4
How does the Kubernetes HPA (HorizontalPodAutoscaler) determine which pod's metrics to use if multiple pods have the same metric?

Suppose we have the HPA (HorizontalPodAutoscaler) below deployed in the demo namespace, and multiple pods (POD-A, POD-B) in this namespace expose the same metric, "istio_requests_per_second". How does the HPA determine which pod's "istio_requests_per_second" metric to use? Or is every pod with this metric evaluated against the HPA target?
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: httpbin
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: istio_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpbin
If you're using Prometheus, then it's the adapter that correlates the Kubernetes pod name with the metric value to return. Basically, the HPA asks the Prometheus adapter for the metric istio_requests_per_second by calling /apis/custom.metrics.k8s.io/v1beta1/namespaces/myNamespace/pods/mypod; the adapter takes that request and looks at its configured rules to decide what to query for.
Based on my test, I think HPA uses the 'scaleTargetRef' to determine which POD's metrics should be used, and pull these metrics from the metrics server and evaluate them against the target config.
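To make the per-pod evaluation concrete, here is a minimal Python sketch of how a Pods metric with an AverageValue target could be evaluated. It assumes that only the pods belonging to the scaleTargetRef target contribute readings, and it ignores details of the real controller such as the tolerance band and unready pods. The function name and the pod names are hypothetical.

```python
import math

def desired_replicas_for_pods_metric(per_pod_values, target_average,
                                     min_replicas, max_replicas):
    """Average a per-pod metric over the target's pods, then apply the HPA ratio."""
    current_replicas = len(per_pod_values)
    current_average = sum(per_pod_values.values()) / current_replicas
    desired = math.ceil(current_replicas * current_average / target_average)
    # Clamp to the bounds configured on the HPA.
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical readings: only pods of the httpbin Deployment are included.
# A pod from another Deployment exposing the same metric would not appear here.
httpbin_pods = {"httpbin-abc": 18.0, "httpbin-def": 12.0}

print(desired_replicas_for_pods_metric(httpbin_pods, target_average=10,
                                       min_replicas=1, max_replicas=5))
```

With an average of 15 requests per second against a target of 10, this sketch asks for 3 replicas.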
As per Kubernetes documentation:
For object metrics and external metrics, a single metric is fetched, which describes the object in question. This metric is compared to the target value, to produce a ratio as above. In the autoscaling/v2 API version, this value can optionally be divided by the number of Pods before the comparison is made.
GKE HPA only targeting Node CPU utilisation rather than targeted deployments

I have two Deployments A and B running on a node, I've set up the hpas as so:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: A
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: A
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 75
(and the same for B, but with the names replaced of course).
However, when monitoring the HPAs, the target CPU utilisation is ALWAYS the same for both, so A and B always scale at the same time even if their simulated workloads are different. It seems the HPA is targeting the node's CPU utilisation rather than the deployment's. Further testing shows that running jobs independent of A and B on the node still triggers HPA scaling of A and B.
How can I configure it so each HPA ONLY targets the CPU utilisation of its target deployment?
Horizontal Pod Autoscaler scales custom metric too aggressively on GKE

I have the Horizontal Pod Autoscaler configuration below on Google Kubernetes Engine to scale a deployment by a custom metric: the RabbitMQ messages-ready count for a specific queue, foo-queue.
It picks up the metric value correctly.
When inserting 2 messages it scales the deployment to the maximum 10 replicas.
I expect it to scale to 2 replicas since the targetValue is 1 and there are 2 messages ready.
Why does it scale so aggressively?
HPA configuration:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      targetValue: 1
I think you did a great job explaining how targetValue works with HorizontalPodAutoscalers. However, based on your question, I think you're looking for targetAverageValue instead of targetValue.
In the Kubernetes docs on HPAs, it mentions that using targetAverageValue instructs Kubernetes to scale pods based on the average metric exposed by all Pods under the autoscaler. While the docs aren't explicit about it, an external metric (like the number of jobs waiting in a message queue) counts as a single data point. By scaling on an external metric with targetAverageValue, you can create an autoscaler that scales the number of Pods to match a ratio of Pods to jobs.
Back to your example:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      # Aim for one Pod per message in the queue
      targetAverageValue: 1
will cause the HPA to try keeping one Pod around for every message in your queue (with a max of 10 pods).
As an aside, targeting one Pod per message is probably going to cause you to start and stop Pods constantly. If you end up starting a ton of Pods and process all of the messages in the queue, Kubernetes will scale your Pods down to 1. Depending on how long it takes to start your Pods and how long it takes to process your messages, you may have lower average message latency by specifying a higher targetAverageValue. Ideally, given a constant amount of traffic, you should aim to have a constant number of Pods processing messages (which requires you to process messages at about the same rate that they are enqueued).
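As a rough illustration of the ratio described above, the following Python sketch treats the external metric as a single total and divides it by targetAverageValue. This is a simplification under stated assumptions: the function name is hypothetical, and the real controller also applies a tolerance band and stabilization that are left out here.

```python
import math

def replicas_for_external_average(metric_total, target_average_value,
                                  min_replicas, max_replicas):
    """Replicas needed so that metric_total / replicas <= target_average_value."""
    desired = math.ceil(metric_total / target_average_value)
    # Clamp to minReplicas / maxReplicas from the HPA spec.
    return max(min_replicas, min(max_replicas, desired))

# One Pod per queued message, bounded to 1..10 as in the manifest above.
for queue_length in (0, 2, 7, 50):
    print(queue_length, replicas_for_external_average(queue_length, 1, 1, 10))
```

Raising the target, for example to 5 messages per Pod, makes the replica count change less often for the same queue length, which is the trade-off the aside describes.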
According to https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
From the most basic perspective, the Horizontal Pod Autoscaler controller operates on the ratio between desired metric value and current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
From the above, I understand that as long as the queue has messages, the Kubernetes HPA will continue to scale up, since currentReplicas is part of the desiredReplicas calculation.
For example if:
currentReplicas = 1
currentMetricValue / desiredMetricValue = 2/1
desiredReplicas = 2
If the metric stays the same in the next HPA cycle, currentReplicas will become 2 and desiredReplicas will rise to 4.
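The compounding effect can be checked with a short Python sketch that applies the documented formula repeatedly while the metric stays constant. The helper names are hypothetical, and the sketch leaves out the tolerance band and scale-up rate limits of the real controller, so it shows the trend rather than exact timing.

```python
import math

def next_replicas(current_replicas, current_metric, target_value,
                  min_replicas, max_replicas):
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    desired = math.ceil(current_replicas * (current_metric / target_value))
    return max(min_replicas, min(max_replicas, desired))

def simulate(cycles, metric, target_value, min_replicas=1, max_replicas=10):
    """Apply the formula once per HPA cycle with an unchanged metric value."""
    replicas = min_replicas
    history = [replicas]
    for _ in range(cycles):
        replicas = next_replicas(replicas, metric, target_value,
                                 min_replicas, max_replicas)
        history.append(replicas)
    return history

# 2 messages ready, targetValue 1, maxReplicas 10 (values from the question).
print(simulate(cycles=5, metric=2, target_value=1))  # [1, 2, 4, 8, 10, 10]
```

Because a targetValue on an external metric does not shrink as replicas are added, the ratio stays at 2 and the replica count doubles each cycle until it reaches maxReplicas.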
Try following this guide, which describes horizontal autoscaling settings for RabbitMQ in Kubernetes:
Kubernetes Workers Autoscaling based on RabbitMQ queue size
In particular, targetValue: 20 of metric rabbitmq_queue_messages_ready is recommended instead of targetValue: 1:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: myqueue
      targetValue: 20
Now our deployment my-workers will grow if the RabbitMQ queue myqueue has more than 20 unprocessed jobs in total.
I'm using the same Prometheus metrics from RabbitMQ (I'm using Celery with RabbitMQ as broker).
Has anyone here considered using the rabbitmq_queue_messages_unacked metric rather than rabbitmq_queue_messages_ready?
The thing is that rabbitmq_queue_messages_ready decreases as soon as a message is pulled by a worker, and I'm afraid that a long-running task might be killed by the HPA, while rabbitmq_queue_messages_unacked stays up until the task has completed.
For example, I have a message that triggers a new pod (celery-worker) to run a task that takes 30 minutes. rabbitmq_queue_messages_ready will decrease while the pod is running, and after the HPA cooldown/delay the pod will be terminated.
EDIT: it seems a third metric, rabbitmq_queue_messages, is the right one; it is the sum of both unacked and ready.
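The difference between the metrics can be sketched in Python. This assumes one Pod per message (a targetAverageValue of 1) and a hypothetical snapshot in which three workers have each pulled one long-running task and nothing else is waiting. The function name and the numbers are illustrative only.

```python
import math

def replicas_for_queue(metric_value, target_average_value,
                       min_replicas=1, max_replicas=10):
    """Replica count an HPA-style ratio would ask for, clamped to the bounds."""
    desired = math.ceil(metric_value / target_average_value)
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical snapshot: 3 tasks in progress (unacked), 0 waiting (ready).
ready, unacked = 0, 3
total = ready + unacked  # what rabbitmq_queue_messages reports, per the edit above

print("scaling on ready:", replicas_for_queue(ready, 1))  # 1
print("scaling on total:", replicas_for_queue(total, 1))  # 3
```

Scaling on the ready count alone would ask for the minimum of one replica while three tasks are still running, whereas the total keeps the requested replica count at three until the tasks are acknowledged.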
How to prevent Kubernetes horizontal auto-scaler from scaling down?

I have created a horizontal autoscaler based on CPU usage and it works fine. How can I configure the autoscaler so that it only scales up and never scales down? When I have a high load of requests I create some operators, and I want to keep them alive even if they do nothing for some time. However, when there is no load, the autoscaler kills the pods after a while and scales down to the minimum number of replicas.
My autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: gateway
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 20
By operator, I mean small applications/programs that are running in a pod.
You can add the --horizontal-pod-autoscaler-downscale-stabilization flag to kube-controller-manager, as described in the docs. The default delay is 5 minutes.
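The idea behind the stabilization window can be sketched in Python: the controller remembers recent recommendations and acts on the highest one seen within the window. The class below is a hypothetical simplification for illustration, not the controller's actual code, and the timeline values are made up.

```python
from collections import deque

class DownscaleStabilizer:
    """Keep recent recommendations and apply the highest one in the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.recommendations = deque()  # (timestamp_seconds, desired_replicas)

    def recommend(self, now, desired_replicas):
        self.recommendations.append((now, desired_replicas))
        # Forget recommendations that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.recommendations and self.recommendations[0][0] < cutoff:
            self.recommendations.popleft()
        return max(replicas for _, replicas in self.recommendations)

stabilizer = DownscaleStabilizer(window_seconds=300)
# Load drops after t=0: the raw recommendation falls from 8 to 1.
timeline = [(0, 8), (60, 1), (240, 1), (301, 1), (400, 1)]
applied = [stabilizer.recommend(t, desired) for t, desired in timeline]
print(applied)  # [8, 8, 8, 1, 1]
```

In this sketch the scale-down is delayed until the high recommendation leaves the window, not prevented, so a longer window keeps idle pods around for longer but does not keep them forever.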
Is it possible to define multiple replica counts for different values of a custom metric in HorizontalPodAutoscaling?

I am using HPA (HorizontalPodAutoscaling) along with custom metrics in Kubernetes. I can scale my pod count according to my custom metric's value.
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2alpha1
metadata:
  name: sample-metrics-app-hpa
spec:
  scaleTargetRef:
    kind: Deployment
    name: sample-metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: sample-metrics-app
      metricName: http_requests
      targetValue: 100
Is it possible to define multiple target values? For example, if http_requests hits 100, the pods should scale by 10 (with a minimum of 2 replicas), and if it hits 1000, the pods should scale by 20 (with a minimum of 10 replicas).
As far as I know, it is not possible to achieve this with the HPA. Moreover, your configuration says that the HPA will add one more pod for every 100 HTTP requests: not 2 --> 10 at once, but 1 --> 2 --> 3, one step per 100 requests.
