Kubernetes, HPA scales up with every deployment

I have an EKS Kubernetes cluster and I set up HPA so it can scale up if there is any traffic, but there is unexpected behavior with every deployment: HPA scales up to the maximum number of pods, then after 5 minutes it scales down again.
After a lot of searching I found that there is a CPU spike right after the app is redeployed; the spike lasts only milliseconds, which is probably why it triggers scaling.
So, do you have an idea how to prevent this spike from causing a scale-up? Alternatively, I would like to disable the HPA while deploying, or delay the controller manager so it only scales up after, for example, 1 minute instead of the default value.
Thanks

This is a known issue; there is a bug in the HPA implementation. You can check it out here:
Issues: https://github.com/kubernetes/kubernetes/issues/78712 and https://github.com/kubernetes/kubernetes/issues/72775
The fix was rolled out in version 1.16+: https://github.com/kubernetes/kubernetes/pull/79035
I was facing the same issue and implemented a workaround for it - a small script that works like a charm. The steps are:
1. Set the HPA max to the current number of replicas
2. Set the new image
3. Wait for the deployment to complete, or for a specified time
4. Set the HPA max back to the original number
https://gist.github.com/shukla2112/035d4e633dded6441f5a9ba0743b4f08
It's a bash script; if you are using Jenkins for deployment you can integrate it directly, or use it independently by fitting it into your deployment workflow.
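On clusters where the configurable HPA scaling behavior is available (autoscaling/v2beta2 on 1.18+, autoscaling/v2 on 1.23+), another option is a scale-up stabilization window, so a spike that lasts only milliseconds does not immediately trigger a scale-up. A minimal sketch; names and thresholds here are placeholders, not taken from the question:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                         # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                       # placeholder target
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # use the lowest recommendation of the last 60s, so a brief spike is ignored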

Related

Kubernetes vertical-pod-autoscaler without Updater and how to use it with benchmarks

Good morning all. I am working with the Kubernetes Vertical Pod Autoscaler (VPA), https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler, and I'd like to use it to optimise resource consumption on the cluster. There are around 40+ microservices deployed; I can easily automate creation of the VPA objects in a shell script, and I get recommendations. I cannot use the Admission Controller on the cluster - it is not deployed.
Do I understand the documentation correctly that the Updater part needs the Admission Controller to work, and that without it no eviction / update is done at all?
https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/updater/README.md
Updater runs in Kubernetes cluster and decides which pods should be restarted based on resources allocation recommendation calculated by Recommender. (...) Updater does not perform the actual resources update, but relies on Vertical Pod Autoscaler admission plugin to update pod resources when the pod is recreated after eviction.
Another question I have is about the difference between updatePolicy.updateMode: Auto and Off. How do they differ in a deployment where the Updater is not used?
Again, documentation / VPA code says:
// UpdateModeOff means that autoscaler never changes Pod resources.
// The recommender still sets the recommended resources in the
// VerticalPodAutoscaler object. This can be used for a "dry run".
UpdateModeOff UpdateMode = "Off"
// UpdateModeInitial means that autoscaler only assigns resources on pod
// creation and does not change them during the lifetime of the pod.
UpdateModeInitial UpdateMode = "Initial"
My VPA cannot evict / recreate a pod (no admission controller), so I should use the Off mode, if I understand correctly? Recreating a pod (scale to 0, then scale to 1) recreates it with the same resources as defined in the Deployment, which is expected in my opinion. Is there any difference if I use Off or Initial?
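For reference, a minimal recommendation-only VPA object of the kind I'm creating looks something like this (names here are placeholders):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa           # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service             # placeholder deployment
  updatePolicy:
    updateMode: "Off"            # recommendations only; no eviction or admission-time mutation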
And the last question: as I cannot use the Updater part of VPA, I have set Prometheus --history-length=14d (two weeks of history) and I can run performance tests on demand. Should I point VPA at an environment with a steadier load, or can I use a DEV-like environment with artificial load (via jMeter in my case) to get the recommendations? Should VPA be deployed at the same time the load tests are being run, so that it gets the live metrics for the recommendation?
Thank you for your responses.

Is interrupt functionality available in Kubernetes minikube autoscaling?

I have used the custom metrics API and have successfully autoscaled the service based on some metrics. But here is the point: the autoscaling works in such a way that minikube's HPA (horizontal pod autoscaler) checks a particular URL/API and fetches the metric value repeatedly with some polling period. For example, the HPA will check the value every 15 seconds. So this is continuous polling from the HPA to the URL/API to fetch the value of that metric. After that it simply compares the value with the given target reference value and tries to scale.
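To illustrate, my current polling-based setup is a custom-metric HPA roughly like this (metric and deployment names are placeholders, not my exact config):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa           # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: my_custom_metric   # placeholder metric exposed through the custom metrics API
      target:
        type: AverageValue
        averageValue: "10"       # the HPA polls this value and compares it with the target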
What I want is for the API/URL to trigger the HPA whenever needed; in simple words, the HPA should work like an interrupt here.
The call for autoscaling should go from the Service/API to the HPA, not from the HPA to the Service!
Is this feature available in Kubernetes? Or do you have any comments on this scenario? Please share your view; I am currently in the last stage and this question is stopping my progress!

Kubernetes Deployment Terminate Oldest Pod

I'm using Azure Kubernetes Service to run a Go application that pulls from RabbitMQ, runs some processing, and returns it. The pods scale to handle an increase of jobs. Pretty run-of-the-mill stuff.
The HPA is set up like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
production Deployment/production 79%/80% 2 10 10 4d11h
staging Deployment/staging 2%/80% 1 2 1 4d11h
What happens is that as the HPA scales up and down, there are always 2 pods that stay running. We've found that after running for so long, the Go app on those pods will time out. Sometimes that's days, sometimes it's weeks. Yes, we could probably dig into the code and figure this out, but it's kind of a low priority for that team.
Another solution I've thought of would be to have the HPA remove the oldest pods first. This would mean that the oldest pod would never be more than a few hours old. A first-in, first-out model.
However, I don't see any clear way to do that. It's entirely possible that it isn't, but it seems like something that could work.
Am I missing something? Is there a way to make this work?
In my opinion (I also mentioned this in a comment), the simplest (not sure about the most elegant) way is to have a CronJob that periodically cleans up timed-out pods.
One CronJob object is like one line of a crontab (cron table) file. It runs a job periodically on a given schedule, written in Cron format. CronJobs are useful for creating periodic and recurring tasks, like running backups or sending emails. CronJobs can also schedule individual tasks for a specific time, such as scheduling a Job for when your cluster is likely to be idle.
CronJob examples and how-tos:
How To Use Kubernetes' Job and CronJob
Kubernetes: Delete pods older than X days
https://github.com/dignajar/clean-pods (a real example)
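A rough sketch of such a CronJob; the ServiceAccount name, image, label selector, and schedule are all placeholders you would adapt:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clean-old-pods
spec:
  schedule: "0 */6 * * *"                  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner  # needs RBAC permission to list and delete pods
          restartPolicy: Never
          containers:
          - name: cleaner
            image: bitnami/kubectl:latest  # any image with kubectl and a shell will do
            command:
            - /bin/sh
            - -c
            - |
              # delete the single oldest pod of the deployment; run often enough
              # and no pod ever gets much older than the schedule interval
              oldest=$(kubectl get pods -l app=production \
                --sort-by=.metadata.creationTimestamp \
                -o jsonpath='{.items[0].metadata.name}')
              [ -n "$oldest" ] && kubectl delete pod "$oldest" || true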

Kubernetes Deployment Rolling Updates

I have an application that I deploy on Kubernetes.
This application has 4 replicas and I'm doing a rolling update on each deployment.
This application has a graceful shutdown which can take tens of minutes (it has to wait for running tasks to finish).
My problem is that during updates, I have over-capacity since all the older version pods are stuck at "Terminating" status while all the new pods are created.
During the updates, I end up running with 8 containers and it is something I'm trying to avoid.
I tried to set maxSurge to 0, but this setting doesn't take into consideration the "Terminating" pods, so the load on my servers during the deployment is too high.
The behaviour I'm trying to get is that new pods will only get created after the old version pods finished successfully, so at all times I'm not exceeding the number of replicas I set.
I wonder if there is a way to achieve such behaviour.
What I ended up doing is creating a StatefulSet with podManagementPolicy: Parallel and updateStrategy to OnDelete.
I also set terminationGracePeriodSeconds to the maximum time it takes for a pod to terminate.
As a part of my deployment process, I apply the new StatefulSet with the new image and then delete all the running pods.
This way all the pods enter the Terminating state, and whenever a pod finishes its task and terminates, a new pod with the new image replaces it.
This way I'm able to keep a static number of replicas during the whole deployment process.
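A trimmed-down sketch of that StatefulSet (image, labels, and the grace period are placeholders for your own values):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  replicas: 4
  serviceName: my-app                      # headless Service name; required for a StatefulSet
  podManagementPolicy: Parallel            # create/delete pods in parallel instead of one by one
  updateStrategy:
    type: OnDelete                         # new-image pods are only created when old pods are deleted
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 3600  # worst-case time a pod needs to finish its running tasks
      containers:
      - name: app
        image: myrepo/my-app:v2            # placeholder image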
Let me suggest the following strategy:
Deployments implement the concept of ready pods to aide rolling updates. Readiness probes allow the deployment to gradually update pods while giving you the control to determine when the rolling update can proceed.
A Ready pod is one that is considered successfully updated by the Deployment and will no longer count towards the surge count for deployment. A pod will be considered ready if its readiness probe is successful and spec.minReadySeconds have passed since the pod was created. The default for these options will result in a pod that is ready as soon as its containers start.
So, what you can do (if you haven't done so yet) is implement a readiness probe for your pods, in addition to setting spec.minReadySeconds to a value that makes sense (worst case) for the time it takes your pods to terminate.
This will ensure the rollout happens gradually and in accordance with your requirements.
In addition to that, don't forget to configure a deadline for the rollout.
By default, when a rollout can't make any progress for 10 minutes, it is considered failed. The time after which the Deployment is considered failed is configurable through the progressDeadlineSeconds property in the Deployment spec.
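A minimal sketch of a Deployment combining these settings; the image, probe endpoint, and timing values are placeholders you would tune to your worst-case shutdown time:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  minReadySeconds: 120              # a new pod must stay ready this long before it counts as available
  progressDeadlineSeconds: 3600     # generous rollout deadline, since pods take long to drain
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0                   # no extra pods above the desired replica count during the rollout
      maxUnavailable: 1             # replace pods one at a time
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: myrepo/my-app:v2     # placeholder
        readinessProbe:
          httpGet:
            path: /healthz          # placeholder endpoint
            port: 8080
          periodSeconds: 10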

HPA scale-down - Kubernetes pods

My requirement is to scale up pods based on custom metrics: as the number of pending messages in the queue increases, the number of pods has to increase to process the jobs. In Kubernetes, scale-up is working fine with the Prometheus adapter and Prometheus operator.
I have long-running processes in the pods, but the HPA checks the custom metrics and tries to scale down; because of this, processes are killed in the middle of an operation and that message is lost. How can I make the HPA kill only free pods where no process is running?
AdapterService to collect custom metrics
- seriesQuery: '{namespace="default",service="hpatest-service"}'
  resources:
    overrides:
      namespace:
        resource: "namespace"
      service:
        resource: "service"
  name:
    matches: "msg_consumergroup_lag"
  metricsQuery: 'avg_over_time(msg_consumergroup_lag{topic="test",consumergroup="test"}[1m])'
HPA Configuration
- type: Object
  object:
    describedObject:
      kind: Service
      name: custommetric-service
    metric:
      name: msg_consumergroup_lag
    target:
      type: Value
      value: 2
At present the HPA cannot be configured to accommodate workloads of this nature. The HPA simply sets the replica count on the deployment to a desired value according to the scaling algorithm, and the deployment chooses one or more pods to terminate.
There is a lot of discussion on this topic in this Kubernetes issue that may be of interest to you. It is not solved by the HPA, and may never be. There may need to be a different kind of autoscaler for this type of workload. Some suggestions are given in the link that may help you in defining one of these.
If I was to take this on myself, I would create a new controller, with corresponding CRD containing a job definition and the scaling requirements. Instead of scaling deployments, I would have it launch jobs. I would have the jobs do their work (process the queue) until they became idle (no items in the queue) then exit. The controller would only scale up, by adding jobs, never down. The jobs themselves would scale down by exiting when the queue is empty.
This would require that your jobs be able to detect when they become idle, by checking the queue and exiting if there is nothing there. If your queue read blocks forever, this would not work and you would need a different solution.
The kubebuilder project has an excellent example of a job controller. I would start with that and extend it with the ability to check your published metrics and start the jobs accordingly.
Also see Fine Parallel Processing Using a Work Queue in the Kubernetes documentation.
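As a rough illustration of that pattern, a work-queue style Job could look like this (the image is a placeholder; the worker is assumed to exit successfully once the queue is empty):
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  parallelism: 4                            # the controller raises this (or adds Jobs); it never scales down
  # completions is left unset on purpose: a work-queue Job finishes once the
  # workers drain the queue and exit successfully
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: myrepo/queue-worker:latest   # placeholder: consumes messages and exits when the queue is empty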
I will suggest an idea here: you can run a custom script that disables the HPA as soon as it scales up; the script keeps checking the resources and the running processes, and when no process is running it re-enables the HPA so it can scale down (or kills the idle pods with a kubectl command and then re-enables the HPA).
I had a similar use case, scaling deployments based on the queue length, and I used KEDA (keda.sh); it does exactly that. Just know that it will scale down the additional pods created for that deployment even if a pod is currently processing data/input - you will have to configure the cooldown parameter to scale down appropriately.
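A minimal sketch of such a ScaledObject (deployment name, broker address, queue name, and thresholds are placeholders; check the current KEDA scaler docs for the exact metadata keys of your trigger):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaler
spec:
  scaleTargetRef:
    name: consumer-deployment                   # placeholder Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 600                           # seconds to wait after the last active trigger before scaling back to zero
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://user:pass@rabbitmq:5672/     # placeholder; normally supplied via a TriggerAuthentication
      queueName: jobs                           # placeholder queue
      queueLength: "5"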
KEDA ScaledJobs are best for such scenarios and can be triggered through a queue, storage, etc. (the currently available scalers can be found here). ScaledJobs are not killed in the middle of execution and are recommended for long-running executions.
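A minimal ScaledJob sketch for a consumer-group-lag trigger like the one in the question (all values are placeholders; the consumer image is assumed to process its work and exit):
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: lag-consumer
spec:
  pollingInterval: 30                      # seconds between trigger checks
  maxReplicaCount: 10
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: consumer
          image: myrepo/consumer:latest    # placeholder: consumes a batch and exits when done
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092         # placeholder
      consumerGroup: test
      topic: test
      lagThreshold: "2"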