Is there a way to speed up the horizontal pod autoscaler metrics scan? Now it takes two minutes to upscale new pods

I have an application with some endpoints that are quite CPU intensive. Because of that I have configured a Horizontal Pod Autoscaler like this:
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: DeploymentConfig
    name: some-app
  targetCPUUtilizationPercentage: 30
The problem: suppose a request keeps a pod working at 100% CPU for 5 minutes. It takes two minutes until OpenShift/Kubernetes schedules new pods.
Is there a way to speed up this process? It forces us to be almost unresponsive for two minutes.
The same thing happens on downscale: we have to wait two minutes until it destroys the unnecessary pods.
Ideally there should be some config option to set this up.

For OpenShift, modify /etc/origin/master/master-config.yaml:
kubernetesMasterConfig:
  controllerArguments:
    horizontal-pod-autoscaler-downscale-delay: 2m0s
    horizontal-pod-autoscaler-upscale-delay: 2m0s
and restart the OpenShift master.

It's not a scalar; each value must be a list, so set it like this:
kubernetesMasterConfig:
  controllerArguments:
    horizontal-pod-autoscaler-downscale-delay:
    - 2m0s
    horizontal-pod-autoscaler-upscale-delay:
    - 2m0s
At least, that is the case in OpenShift Origin v3.11.
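As a rough sketch of what speeding this up might look like, the fragment below uses the list form with shorter delays. The 30s and 15s values are illustrative assumptions, not recommendations, and the sync-period entry assumes the master passes the kube-controller-manager flag of that name through unchanged; verify this against your OpenShift version before relying on it.

```yaml
kubernetesMasterConfig:
  controllerArguments:
    # Assumed value: wait 30s after a scaling event before scaling up again
    horizontal-pod-autoscaler-upscale-delay:
    - 30s
    # Assumed value: wait 30s after a scaling event before scaling down
    horizontal-pod-autoscaler-downscale-delay:
    - 30s
    # Assumed value: how often the controller re-evaluates metrics
    horizontal-pod-autoscaler-sync-period:
    - 15s
```

Very short downscale delays can make the replica count flap when load is bursty, so the downscale value in particular is worth testing under realistic traffic.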

Related

How to make kubernetes cluster elastic?

Hello, I am running a .NET application in Azure Kubernetes Service as a 3-pod cluster (1 pod per node).
I am trying to understand how I can make my cluster elastic depending on load.
How can I configure the deployment.yaml so that, after a certain percentage of CPU utilization and/or memory per pod, it spawns another pod? Likewise, when load decreases, how do I shut down instances?
Is there any guide/tutorial to set this up based on percentage (ideally)?

The basic feature you need is the HorizontalPodAutoscaler, or HPA for short. There you configure CPU or memory targets, and if a target is exceeded, the number of pod replicas is increased. E.g. from this walkthrough:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
This will scale out the php-apache deployment as soon as the pods' average CPU utilization is greater than 50%. Be aware that calculating the resource utilization and the resulting number of replicas is not as intuitive as it might seem. Also see docs (the whole page should be quite interesting too). You can also combine criteria for scale out.
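Combining criteria could look roughly like the sketch below, which adds a memory target next to the CPU target from the walkthrough. The 70% memory threshold is an assumed value for illustration only. Utilization targets are measured against the containers' resource requests, so the pods need requests set for both CPU and memory.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Scale out when average CPU utilization across pods exceeds 50% of requests
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  # Assumed threshold: also scale out when average memory utilization exceeds 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

When several metrics are listed, the HPA computes a desired replica count for each one and uses the largest.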
There are also add-ons that help you scale based on other parameters, like the number of messages in a queue. Check out KEDA; it provides different scalers, such as RabbitMQ, Kafka, AWS CloudWatch, Azure Monitor, etc.
And since you wrote
1 pod per node
you might be running a DaemonSet. In that case your only option to scale out would be to add additional nodes, since with DaemonSets there is always exactly one pod per node. If that's the case, you could think about using a Deployment combined with a podAntiAffinity instead, see docs. That way you can configure pods to preferably run on nodes where pods of the same deployment are not running yet, e.g.:
[...]
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
[...]
From docs:
The anti-affinity rule says that the scheduler should try to avoid scheduling the Pod onto a node that is in the same zone as one or more Pods with the label security=S2. More precisely, the scheduler should try to avoid placing the Pod on a node that has the topology.kubernetes.io/zone=R label if there are other nodes in the same zone currently running Pods with the Security=S2 Pod label.
That makes scaling out more flexible than it is with a DaemonSet, yet you get a similar effect of pods being evenly distributed throughout the cluster.
If you want or need to stick to a DaemonSet, you can check out the AKS Cluster Autoscaler, which can automatically add and remove nodes from your cluster based on resource consumption.

Horizontal Auto-scaling based on other pod healthcheck

Is it possible to scale down a pod to 0 replicas when another pod is down? I'm familiar with the basics of the Horizontal Auto-Scaling concept, but as I understand it, it scales pods up or down only when the demand for resources (CPU, memory) changes.
My CI pipeline follows a green/blue pattern, so when the new version of the application is being deployed the second one is scaled down to 0 replicas, leaving other pods belonging to the same environment up wasting resources. Do you have any idea how to solve it using kubernetes or helm features?
Thanks

If you have a CI pipeline, you can just run a kubectl command to scale down the deployment before doing the blue-green deployment; that way no resources are wasted.
That said, yes, you can scale the deployment or application up and down based on custom metrics.
I would recommend checking out the cloud-native project KEDA: https://keda.sh/
Keda:
KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can
drive the scaling of any container in Kubernetes based on the number
of events needing to be processed.
Example
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
spec:
  scaleTargetRef:
    deploymentName: {deployment-name} # must be in the same namespace as the ScaledObject
    containerName: {container-name} # Optional. Default: deployment.spec.template.spec.containers[0]
  pollingInterval: 30 # Optional. Default: 30 seconds
  cooldownPeriod: 300 # Optional. Default: 300 seconds
  minReplicaCount: 0 # Optional. Default: 0
  maxReplicaCount: 100 # Optional. Default: 100
  triggers:
  # {list of triggers to activate the deployment}
ScaledObject reference: https://keda.sh/docs/1.4/concepts/scaling-deployments/#scaledobject-spec

How to scale up all OpenShift pods before scaling down old ones

I have a basic OpenShift deployment configuration:
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    type: Rolling
Additionally I've put:
maxSurge: 3
maxUnavailable: 0%
because I want to scale up all new pods first and only then scale down the old pods (so there will be 6 pods running during deployment, which is why I set maxSurge).
I want to have all old pods running until all new pods are up, but with this set of parameters something is wrong. During deployment:
all 3 new pods are initialized at once and try to start, while the old pods keep running (as expected)
if the first new pod starts successfully, then one old pod is terminated
if the second new pod is ready, then another old pod is terminated
I want to terminate all old pods ONLY if all new pods are ready to handle requests; otherwise all the old pods should handle requests.
What did I miss in this configuration?

The behavior you describe is expected for a deployment rollout: OpenShift shuts down each old pod as a new pod becomes ready. It will also start routing traffic to the new pods as they become available, which you say you don't want either.
A service is pretty much by definition going to route to pods as they are available. And a deployment pretty much handles pods independently, so I don't believe that anything will really give you the behavior you are looking for there either.
If you want a blue-green style deployment like you describe, you are essentially going to have to deploy the new pods as a separate deployment. Then, once the new deployment is completely up, you can change the corresponding service to point at the new pods. Then you can shut down the old deployment.
A service mesh can help with some of that. So could an operator. Or you could do it manually.
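Done manually, the switch described above could be sketched as a Service whose selector names the active color. All names, labels, and ports here are hypothetical: the sketch assumes the two deployments label their pods with version: blue and version: green respectively.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: some-app        # hypothetical Service name
spec:
  selector:
    app: some-app       # hypothetical label shared by both deployments
    version: green      # was "blue"; switch only once all green pods are ready
  ports:
  - port: 80            # assumed service port
    targetPort: 8080    # assumed container port
```

Because the selector changes in a single update, traffic moves from the old pods to the new ones at once, and the old deployment can be scaled down afterwards.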

You can combine the rollout strategy with readiness checks that have an initial delay, to ensure that all the new pods have time to start up before the old ones are shut down.
In the case below, the 3 new pods will be spun up (for a total of 6 pods); after 60 seconds the readiness check will occur and the old pods will be shut down. Adjust the readiness delay to be long enough to give all of your new pods time to start up.
apiVersion: v1
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    rollingParams:
      maxSurge: 3
      maxUnavailable: 0
    type: Rolling
  template:
    spec:
      containers:
      - readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8099
          initialDelaySeconds: 60

Horizontal pod Autoscaler scales custom metric too aggressively on GKE

I have the Horizontal Pod Autoscaler configuration below on Google Kubernetes Engine to scale a deployment by a custom metric: the RabbitMQ messages-ready count for a specific queue, foo-queue.
It picks up the metric value correctly.
When inserting 2 messages it scales the deployment to the maximum 10 replicas.
I expect it to scale to 2 replicas since the targetValue is 1 and there are 2 messages ready.
Why does it scale so aggressively?
HPA configuration:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      targetValue: 1

I think you did a great job explaining how targetValue works with HorizontalPodAutoscalers. However, based on your question, I think you're looking for targetAverageValue instead of targetValue.
In the Kubernetes docs on HPAs, it mentions that using targetAverageValue instructs Kubernetes to scale pods based on the average metric exposed by all Pods under the autoscaler. While the docs aren't explicit about it, an external metric (like the number of jobs waiting in a message queue) counts as a single data point. By scaling on an external metric with targetAverageValue, you can create an autoscaler that scales the number of Pods to match a ratio of Pods to jobs.
Back to your example:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      # Aim for one Pod per message in the queue
      targetAverageValue: 1
This will cause the HPA to try to keep one Pod around for every message in your queue (with a max of 10 pods).
As an aside, targeting one Pod per message is probably going to cause you to start and stop Pods constantly. If you end up starting a ton of Pods and process all of the messages in the queue, Kubernetes will scale your Pods down to 1. Depending on how long it takes to start your Pods and how long it takes to process your messages, you may have lower average message latency by specifying a higher targetAverageValue. Ideally, given a constant amount of traffic, you should aim to have a constant number of Pods processing messages (which requires you to process messages at about the same rate that they are enqueued).

According to https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
From the most basic perspective, the Horizontal Pod Autoscaler controller operates on the ratio between desired metric value and current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
From the above I understand that, as long as the queue has messages, the Kubernetes HPA will continue to scale up, since currentReplicas is part of the desiredReplicas calculation.
For example, if:
currentReplicas = 1
currentMetricValue / desiredMetricValue = 2/1
then:
desiredReplicas = 2
If the metric stays the same in the next HPA cycle, currentReplicas will become 2 and desiredReplicas will be raised to 4.
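Worked through with the question's numbers, and assuming the metric stays at 2 ready messages with targetValue 1 and maxReplicas 10 (and ignoring any additional scale-up limits the controller may apply), the formula compounds on each HPA cycle:

```latex
\begin{align*}
\text{cycle 1: } & \lceil 1 \times (2/1) \rceil = 2 \\
\text{cycle 2: } & \lceil 2 \times (2/1) \rceil = 4 \\
\text{cycle 3: } & \lceil 4 \times (2/1) \rceil = 8 \\
\text{cycle 4: } & \min\bigl(10,\ \lceil 8 \times (2/1) \rceil\bigr) = \min(10, 16) = 10
\end{align*}
```

Under these assumptions the deployment reaches the 10-replica cap within four cycles, which matches the behavior described in the question.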

Try following these instructions, which describe horizontal autoscaling settings for RabbitMQ in Kubernetes:
Kubernetes Workers Autoscaling based on RabbitMQ queue size
In particular, targetValue: 20 of metric rabbitmq_queue_messages_ready is recommended instead of targetValue: 1:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: myqueue
      targetValue: 20
Now our deployment my-workers will grow if RabbitMQ queue myqueue has more than 20 non-processed jobs in total

I'm using the same Prometheus metrics from RabbitMQ (I'm using Celery with RabbitMQ as the broker).
Has anyone here considered using the rabbitmq_queue_messages_unacked metric rather than rabbitmq_queue_messages_ready?
The thing is that rabbitmq_queue_messages_ready decreases as soon as a message is pulled by a worker, and I'm afraid that a long-running task might be killed by the HPA, whereas rabbitmq_queue_messages_unacked stays up until the task has completed.
For example, I have a message that triggers a new pod (celery-worker) to run a task that takes 30 minutes. rabbitmq_queue_messages_ready will decrease while the pod is running, and after the HPA cooldown/delay the pod will be terminated.
EDIT: it seems a third metric, rabbitmq_queue_messages, is the right one; it is the sum of both unacked and ready:
sum of ready and unacknowledged messages - total queue depth
documentation

How to prevent Kubernetes horizontal auto-scaler from scaling down?

I have created a horizontal autoscaler based on CPU usage and it works fine. I want to know how I can configure the autoscaler so that it only scales up and never scales down. The reason is that under high load I create some operators, and I want to keep them alive even if they do nothing for some time; but the autoscaler kills the pods and scales down to the minimum replicas after some time without load.
My autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: gateway
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 20
Edit:
By operator, I mean small applications/programs that are running in a pod.

You can add the --horizontal-pod-autoscaler-downscale-stabilization flag to kube-controller-manager, as described in the docs. The default delay is 5 minutes.
To add the flag, edit /etc/kubernetes/manifests/kube-controller-manager.yaml on the master node; the pod will then be recreated.
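The flag applies cluster-wide and only delays scale-down. A different, per-autoscaler technique is the behavior field, which is not available in the autoscaling/v1 API used in the question. The sketch below is therefore a rewrite of the question's HPA under the assumption that the cluster serves autoscaling/v2; it is one possible approach, not the method this answer describes.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Same 20% CPU target as targetCPUUtilizationPercentage in the v1 object
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
  behavior:
    scaleDown:
      # Disabled turns off automatic scale-down for this HPA only
      selectPolicy: Disabled
```

With scale-down disabled, replicas added under load stay until someone scales the deployment down manually, so it may be worth keeping maxReplicas conservative.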
