How do I stop replicaSet terminating requested number of pods - kubernetes

I have a minimum number of replicas set to 3, maximum 10 and the number of requested replicas as 6. When I deploy, the replicaSet looks good, and I have 6 pods running as expected.
However, after a few minutes I get this message: "Scaled down replica set my-first-app to 3". It then terminates my pods so that I am left with only 3. How do I stop it from doing this? I want to have 6 replicas.

The HorizontalPodAutoscaler (which I assume you are talking about, as the ReplicaSet resource would not actually be able to scale your Pods horizontally) automatically scales down resources based on the configured metrics (e.g. CPU usage, Memory usage).
At a high level, it uses the following algorithm to determine the number of replicas:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
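It then clamps the result so that it is never lower than minReplicas or higher than maxReplicas. With minReplicas set to 3 and low resource usage, the autoscaler therefore scales your 6 Pods down to 3.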
See the docs on the Horizontal Pod Autoscaler for more details.
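As a rough illustration of the formula and the clamping step, here is a minimal Python sketch. The function name and the CPU figures (6 Pods averaging 20% CPU against a 70% target) are hypothetical, and the real controller also applies a tolerance and stabilization logic that this sketch leaves out.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Apply the documented ratio formula, then clamp to [min, max]."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))

# Hypothetical load: 6 Pods averaging 20% CPU against a 70% target.
# The raw result is ceil(6 * 20 / 70) = 2, so minReplicas decides the outcome.
print(desired_replicas(6, 20, 70, min_replicas=3, max_replicas=10))  # 3
print(desired_replicas(6, 20, 70, min_replicas=6, max_replicas=10))  # 6
```

With a low metric value the raw result falls below the minimum, so the replica count ends up at whatever minReplicas is set to.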
To answer your question: if you always want at least 6 Pods, make sure minReplicas is set to 6, e.g.:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-example
  minReplicas: 6
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

Related

Can I autoscale Kind : Pod?

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: testingHPA
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my_app
  minReplicas: 3
  maxReplicas: 5
  targetCPUUtilizationPercentage: 85
Above is the normal hpa.yaml structure. Is it possible to use Pod as the kind and autoscale it?
As others have already pointed out, it is not possible to set Pod as the kind of the target resource for an HPA.
The documentation describes the HPA as:
The Horizontal Pod Autoscaler automatically scales the number of Pods
in a replication controller, deployment, replica set or stateful set
based on observed CPU utilization (or, with custom metrics support, on
some other application-provided metrics). Note that Horizontal Pod
Autoscaling does not apply to objects that can't be scaled, for
example, DaemonSets.
The documentation also describes how the algorithm is implemented:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
Since the Pod resource does not have a replicas field in its spec, we can conclude that it cannot be autoscaled by the HPA.
A single Pod is only ever one Pod. It does not have any mechanism for horizontal scaling because it is that mechanism for everything else.

Kubernetes horizontal pod autoscaler - target replicas computation

I am running a Kubernetes horizontal pod autoscaler to scale kafka consumers based on the consumer group lag. The HPA yaml file is shown below.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-consumer-application
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer-application
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumergroup_lag
      target:
        type: AverageValue
        averageValue: 5
I observed that the HPA is scaling replicas not strictly according to the formula ceil(currentReplicas * currentMetricValue/desiredMetricValue ).
For instance, when the metric (consumer lag) was 108 with only one replica, Kubernetes scaled up to only 4 replicas (as shown in the screenshot below), while theoretically it should scale to 10 (the maximum replicas allowed).
Any idea what the reason is? Am I missing something, such as a maximum number of replicas that can be added per iteration of the HPA reconciliation loop?
Please note the message in the screenshot: 'ScalingLimited True ScaleUpLimit the desired replica count is increasing faster than the maximum scale rate'. What does it mean?
Thanks.
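One possible reading of the ScaleUpLimit message, sketched in Python under an assumption: the controller caps each scale-up step at the larger of double the current replica count and 4 replicas. The constants and function names below are illustrative, not copied from any particular Kubernetes release, and the cap can differ by cluster version and by any scaling behavior configured on the HPA.

```python
import math

# Assumed per-step cap: at most double the current count, but never less than 4.
SCALE_UP_LIMIT_FACTOR = 2.0
SCALE_UP_LIMIT_MINIMUM = 4.0

def scale_up_limit(current_replicas):
    """Largest replica count allowed after one scale-up step (assumed rule)."""
    return int(max(SCALE_UP_LIMIT_FACTOR * current_replicas,
                   SCALE_UP_LIMIT_MINIMUM))

def next_replicas(current_replicas, current_metric, desired_metric, max_replicas):
    """Formula from the question, then capped by the step limit and maxReplicas."""
    wanted = math.ceil(current_replicas * current_metric / desired_metric)
    return min(wanted, scale_up_limit(current_replicas), max_replicas)

# One replica, lag 108, target 5: the formula asks for 22, the step cap allows 4.
print(next_replicas(1, 108, 5, max_replicas=10))  # 4
```

Under that assumption the formula is still being followed; its result is simply limited per reconciliation step, and the remaining scale-up happens in later steps.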

Kubernetes HPA based on available healthy pods

Is it possible to have the HPA scale based on the number of available running pods?
I have set up a readiness probe that cuts out a pod based on its internal state (idle, working, busy). When a pod is 'busy', it no longer receives new requests. But the CPU and memory demands are low.
I don't want to scale based on cpu, mem, or other metrics.
Seeing as the readiness probe removes it from active service, can I scale based on the average number of active (not busy) pods? When that number drops below a certain point more pods are scaled.
TIA for any suggestions.
You can create a custom metric for the HPA, such as the number of busy pods.
That is, the application should emit a metric value when it is busy. You can then use that metric in a HorizontalPodAutoscaler.
Something like this:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-sd
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: custom-metric-sd
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metricName: busy-pods
      targetAverageValue: 4
Here is another reference for HPA with custom metrics.

Horizontal pod Autoscaler scales custom metric too aggressively on GKE

I have the below Horizontal Pod Autoscaler configuration on Google Kubernetes Engine to scale a deployment by a custom metric - RabbitMQ messages ready count for a specific queue: foo-queue.
It picks up the metric value correctly.
When inserting 2 messages it scales the deployment to the maximum 10 replicas.
I expect it to scale to 2 replicas since the targetValue is 1 and there are 2 messages ready.
Why does it scale so aggressively?
HPA configuration:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      targetValue: 1
I think you did a great job explaining how targetValue works with HorizontalPodAutoscalers. However, based on your question, I think you're looking for targetAverageValue instead of targetValue.
In the Kubernetes docs on HPAs, it mentions that using targetAverageValue instructs Kubernetes to scale pods based on the average metric exposed by all Pods under the autoscaler. While the docs aren't explicit about it, an external metric (like the number of jobs waiting in a message queue) counts as a single data point. By scaling on an external metric with targetAverageValue, you can create an autoscaler that scales the number of Pods to match a ratio of Pods to jobs.
Back to your example:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      # Aim for one Pod per message in the queue
      targetAverageValue: 1
will cause the HPA to try keeping one Pod around for every message in your queue (with a max of 10 pods).
As an aside, targeting one Pod per message is probably going to cause you to start and stop Pods constantly. If you end up starting a ton of Pods and process all of the messages in the queue, Kubernetes will scale your Pods down to 1. Depending on how long it takes to start your Pods and how long it takes to process your messages, you may have lower average message latency by specifying a higher targetAverageValue. Ideally, given a constant amount of traffic, you should aim to have a constant number of Pods processing messages (which requires you to process messages at about the same rate that they are enqueued).
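To make the Pods-to-messages ratio concrete, here is a small Python sketch of how an average-value target on a single external metric works out. It assumes the external value is divided by the current Pod count before being compared with the target, so the replica count depends only on the queue depth. The function name and queue depths are made up, and tolerance and stabilization are ignored.

```python
import math

def replicas_for_average_target(metric_total, target_average_value,
                                min_replicas, max_replicas):
    """Pods needed so that metric_total / pods is at most the target, clamped."""
    wanted = math.ceil(metric_total / target_average_value)
    return max(min_replicas, min(max_replicas, wanted))

# targetAverageValue: 1 -> one Pod per ready message, between 1 and 10 Pods.
for messages in (0, 2, 7, 25):
    print(messages, replicas_for_average_target(messages, 1, 1, 10))
# prints: 0 1, 2 2, 7 7, 25 10 (one pair per line)

# A higher target such as 5 keeps the Pod count steadier for the same queue.
print(replicas_for_average_target(25, 5, 1, 10))  # 5
```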
According to https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
From the most basic perspective, the Horizontal Pod Autoscaler controller operates on the ratio between desired metric value and current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
From the above I understand that as long as the queue has messages the k8 HPA will continue to scale up since currentReplicas is part of the desiredReplicas calculation.
For example if:
currentReplicas = 1
currentMetricValue / desiredMetricValue = 2/1
then:
desiredReplicas = 2
If the metric stays the same in the next HPA cycle, currentReplicas will become 2 and desiredReplicas will be raised to 4.
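The compounding described above can be traced with a short Python sketch. It assumes the metric stays at 2 with targetValue: 1 across cycles (that is, the queue is not drained between cycles), and the function name is hypothetical.

```python
import math

def simulate_target_value(metric_value, target_value, max_replicas, cycles):
    """Replica count after each HPA cycle when the raw metric does not change."""
    replicas = 1
    history = [replicas]
    for _ in range(cycles):
        wanted = math.ceil(replicas * metric_value / target_value)
        replicas = min(max_replicas, wanted)
        history.append(replicas)
    return history

# 2 ready messages, targetValue 1, maxReplicas 10, four cycles.
print(simulate_target_value(2, 1, 10, 4))  # [1, 2, 4, 8, 10]
```

Because currentReplicas feeds back into the next calculation, a metric that stays above the target keeps multiplying the replica count until maxReplicas is reached.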
Try following this guide, which describes horizontal autoscaling settings for RabbitMQ in k8s:
Kubernetes Workers Autoscaling based on RabbitMQ queue size
In particular, it recommends targetValue: 20 for the rabbitmq_queue_messages_ready metric instead of targetValue: 1:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: myqueue
      targetValue: 20
Now our deployment my-workers will grow if the RabbitMQ queue myqueue has more than 20 unprocessed jobs in total.
I'm using the same Prometheus metrics from RabbitMQ (I'm using Celery with RabbitMQ as broker).
Has anyone here considered using the rabbitmq_queue_messages_unacked metric rather than rabbitmq_queue_messages_ready?
The thing is that rabbitmq_queue_messages_ready decreases as soon as a message is pulled by a worker, and I'm afraid that a long-running task might be killed by the HPA, whereas rabbitmq_queue_messages_unacked stays until the task has completed.
For example, I have a message that will trigger a new pod (celery-worker) to run a task that will take 30 minutes. The rabbitmq_queue_messages_ready value will decrease while the pod is running, and after the HPA cooldown/delay the pod will be terminated.
EDIT: seems like a third one rabbitmq_queue_messages is the right one - which is the sum of both unacked and ready:
sum of ready and unacknowledged messages - total queue depth
documentation

GCP kubernetes HPA not working as described

We have been running our workload on kubernetes on GCP for around a year now, however, last week, one of our websites was hit hard by a promo campaign launched by the site owner but without us prescaling the cluster.
Since then we've been load testing the stack on a test cluster. The issue we're having is that, using a Google node, the HPA doesn't scale up as written in the documentation. Rather, it always scales
2 --> 4 --> 8 --> 16 --> 32 --> 64 --> 128
regardless of CPU load.
For instance, in our test we had 251%/60% CPU with 8 pods running. My math would suggest (251/60)*8 ≈ 33.5, so about 34 pods are needed; however, it will always go to 16 next, then wait 3 mins before going to 32.
The scale-out required for the test to function is around 64 pods, which it arrives at after about 25 mins, instead of 8. Is there a way to get the GCP version operating more like the manual?
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"False","lastTransitionTime":"2017-11-22T14:17:35Z","reason":"BackoffBoth","message":"the
      time since the previous scale is still within both the downscale and upscale
      forbidden windows"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2017-11-21T19:09:34Z","reason":"ValidMetricFound","message":"the
      HPA was able to succesfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2017-11-22T14:11:05Z","reason":"DesiredWithinRange","message":"the
      desired replica count is within the acceptible range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":64,"currentAverageValue":"193m"}}]'
  creationTimestamp: 2017-11-20T15:44:48Z
  name: varnish-7
  namespace: default
  resourceVersion: "373498"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/varnish-7
  uid: bd60211b-ce09-11e7-af0d-42010a8e0099
spec:
  maxReplicas: 60
  minReplicas: 2
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: varnish-7
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 64
  currentReplicas: 8
  desiredReplicas: 8
  lastScaleTime: 2017-11-22T14:16:05Z
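The doubling pattern in the question is consistent with a per-step cap on scale-up. As a hedged Python sketch, assume each step may at most double the current replica count (with a floor of 4); this is an assumption about the controller on this cluster, not something the manifest above shows. The function name and the maxReplicas value of 128 are hypothetical.

```python
def scale_up_steps(start, needed, max_replicas):
    """Replica counts visited when each step can at most double (assumed cap)."""
    ceiling = min(needed, max_replicas)
    replicas = start
    steps = [replicas]
    while replicas < ceiling:
        replicas = min(ceiling, max(2 * replicas, 4))
        steps.append(replicas)
    return steps

# From 2 Pods to the roughly 64 Pods the load test needs.
print(scale_up_steps(2, 64, 128))  # [2, 4, 8, 16, 32, 64]

# From 8 Pods when the formula asks for 34.
print(scale_up_steps(8, 34, 128))  # [8, 16, 32, 34]
```

If each step is also followed by a wait of a few minutes, as the question reports, several such steps add up to a long scale-out even when the formula asks for the final count straight away.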