I have used the custom metrics API and have successfully autoscaled a service based on a custom metric. But here is the point: the autoscaling works in such a way that minikube's HPA (Horizontal Pod Autoscaler) repeatedly queries the particular URL/API for the metric value on a polling period, for example every 15 seconds. So this is continuous polling from the HPA to the URL/API to fetch the value of that metric. The HPA then simply compares the value with the given target reference value and scales accordingly.
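For what it's worth, that 15-second polling period does not come from the HPA object itself; it is a kube-controller-manager setting. A rough sketch of where it lives (excerpt of a kube-controller-manager static pod manifest; everything except the flag is abbreviated/assumed):

# The HPA polling period is a controller-wide flag, not per-HPA; 15s is the default.
spec:
  containers:
    - name: kube-controller-manager
      command:
        - kube-controller-manager
        - --horizontal-pod-autoscaler-sync-period=15s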
What I want is for the API/URL to trigger minikube's HPA whenever needed; in simple words, the HPA should work like an interrupt here.
The call for autoscaling should go from the Service/API to the HPA, not from the HPA to the Service!
Is this feature available in Kubernetes? Or do you have any comments on this scenario? Please share your view; I am currently in the last stage and this question is blocking my progress!
Good morning all. I am working now with the Kubernetes Vertical Pod Autoscaler (VPA) https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler and I'd like to use it to optimise resource consumption on the cluster. There are around 40+ microservices deployed; I can easily automate creation of the VPA objects in a shell script, and I do get recommendations. I cannot use the Admission Controller on the cluster - it is not deployed.
Do I understand the documentation correctly that the Updater part needs the Admission Controller to work, and that without it no eviction / update is done at all?
https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/updater/README.md
Updater runs in Kubernetes cluster and decides which pods should be restarted based on resources allocation recommendation calculated by Recommender. (...) Updater does not perform the actual resources update, but relies on Vertical Pod Autoscaler admission plugin to update pod resources when the pod is recreated after eviction.
Another question I have is about the difference between updatePolicy.updateMode: Auto and Off. How are they different in a deployment where the Updater is not used?
Again, documentation / VPA code says:
// UpdateModeOff means that autoscaler never changes Pod resources.
// The recommender still sets the recommended resources in the
// VerticalPodAutoscaler object. This can be used for a "dry run".
UpdateModeOff UpdateMode = "Off"
// UpdateModeInitial means that autoscaler only assigns resources on pod
// creation and does not change them during the lifetime of the pod.
UpdateModeInitial UpdateMode = "Initial"
My VPA cannot evict / recreate a pod, as there is no admission controller. So I should use the Off mode, if I understand correctly? Recreating a pod (scale -> 0, scale -> 1) recreates it with the same resources as defined in the Deployment, which in my opinion is expected. Is there any difference between using Off and Initial?
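For reference, the VPA objects my script creates look roughly like this (the deployment name is just an example), with the mode in question set in updatePolicy:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa            # example name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service              # example target Deployment
  updatePolicy:
    updateMode: "Off"             # recommendations only, no eviction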
And the last question: as I cannot use the Updater part of VPA, I have set --history-length=14d so that two weeks of Prometheus history are used, and I can run performance tests on demand. Should I point VPA at an environment with a more steady load, or can I rather use a DEV-like environment with artificial load (via jMeter in my case) to get the recommendations? Should VPA be deployed at the same time as the load tests are being run, so it gets the live metrics for the recommendation?
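In case it matters, this is roughly how the history settings are wired into the recommender (a sketch; the image tag and Prometheus address are placeholders, flag names taken from the VPA recommender docs):

# Excerpt from the vpa-recommender container spec (placeholders, not my exact values)
containers:
  - name: recommender
    image: k8s.gcr.io/autoscaling/vpa-recommender:0.9.2
    args:
      - --storage=prometheus
      - --prometheus-address=http://prometheus.monitoring.svc:9090
      - --history-length=14d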
Thank you for your responses.
I am trying to understand the metrics emitted by Argo Workflows, but their explanation isn't helping enough:
For example
argo_workflows_pods_count
It is possible for a workflow to start, but no pods be running (e.g.
cluster is too busy to run them). This metric sheds light on actual
work being done.
Does it mean the count of all running pods for all workflows across all namespaces (if this is the case then, at least for me, it doesn't seem correct)?
Is there a difference between this metric and the kubernetes_state.pod.* metrics (which would give me the pods in different states, e.g. running)?
Enabling and scraping the endpoint shows the following data exposed:
# HELP argo_workflows_pods_count Number of Pods from Workflows currently accessible by the controller by status (refreshed every 15s)
# TYPE argo_workflows_pods_count gauge
argo_workflows_pods_count{status="Pending"} 0
argo_workflows_pods_count{status="Running"} 0
As we are querying the workflow controller here and there are no additional labels attached to the metric, we can assume that this is indeed the total number of pods created by Argo. However, this is not necessarily the same as kubernetes_state.pod.*, as those metrics will also include pods created by other processes.
My requirement is to scale up pods based on a custom metric: as the number of pending messages in the queue increases, the number of pods has to increase to process the jobs. In Kubernetes, scale up is working fine with the Prometheus adapter & Prometheus operator.
However, I have long running processes in the pods, and when the HPA checks the custom metric and tries to scale down, it kills a pod in the middle of an operation and that message is lost. How can I make the HPA kill only free pods, where no process is running?
AdapterService to collect custom metrics
seriesQuery: '{namespace="default",service="hpatest-service"}'
resources:
  overrides:
    namespace:
      resource: "namespace"
    service:
      resource: "service"
name:
  matches: "msg_consumergroup_lag"
metricsQuery: 'avg_over_time(msg_consumergroup_lag{topic="test",consumergroup="test"}[1m])'
HPA Configuration
type: Object
object:
  describedObject:
    kind: Service
    name: custommetric-service
  metric:
    name: msg_consumergroup_lag
  target:
    type: Value
    value: 2
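For context, that metric block sits inside an HPA that looks roughly like the following sketch (autoscaling/v2beta2 assumed; the HPA and Deployment names are placeholders):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: consumer-hpa                  # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer-deployment         # placeholder Deployment being scaled
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        describedObject:
          apiVersion: v1
          kind: Service
          name: custommetric-service
        metric:
          name: msg_consumergroup_lag
        target:
          type: Value
          value: 2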
At present the HPA cannot be configured to accommodate workloads of this nature. The HPA simply sets the replica count on the deployment to a desired value according to the scaling algorithm, and the deployment chooses one or more pods to terminate.
There is a lot of discussion on this topic in this Kubernetes issue that may be of interest to you. It is not solved by the HPA, and may never be. There may need to be a different kind of autoscaler for this type of workload. Some suggestions are given in the link that may help you in defining one of these.
If I was to take this on myself, I would create a new controller, with corresponding CRD containing a job definition and the scaling requirements. Instead of scaling deployments, I would have it launch jobs. I would have the jobs do their work (process the queue) until they became idle (no items in the queue) then exit. The controller would only scale up, by adding jobs, never down. The jobs themselves would scale down by exiting when the queue is empty.
This would require that your jobs be able to detect when they become idle, by checking the queue and exiting if there is nothing there. If your queue read blocks forever, this would not work and you would need a different solution.
The kubebuilder project has an excellent example of a job controller. I would start with that and extend it with the ability to check your published metrics and start the jobs accordingly.
Also see Fine Parallel Processing Using a Work Queue in the Kubernetes documentation.
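As a rough illustration of that pattern (all names and the image are hypothetical), the controller would create Jobs along these lines and only ever raise parallelism; the workers themselves exit when the queue is empty:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker                            # hypothetical name
spec:
  parallelism: 4                                # raised by the custom controller, never lowered
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: worker
          image: example.com/queue-worker:latest  # hypothetical image; exits when the queue is empty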
I will suggest an idea here: you can run a custom script that disables the HPA as soon as it scales up. The script should keep checking the resource and the running processes, and when no process is running it can re-enable the HPA so it scales down, or kill the idle pods with a kubectl command and then enable the HPA again.
I had a similar use case, scaling deployments based on the queue length, and I used KEDA (keda.sh); it does exactly that. Just know that it will scale down the additional pods created for that deployment even if a pod is currently processing data/input - you will have to configure the cooldown parameter to scale down appropriately.
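A minimal sketch of what that looks like, assuming a Kafka-style lag trigger; the deployment name, brokers and threshold are placeholders:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaler               # placeholder name
spec:
  scaleTargetRef:
    name: consumer-deployment         # placeholder Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 300                 # seconds to wait after the last active trigger before scaling down
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092  # placeholder brokers
        consumerGroup: test
        topic: test
        lagThreshold: "5"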
KEDA ScaledJobs are best for such scenarios and can be triggered through a queue, storage, etc. (the currently available scalers can be found here). ScaledJobs are not killed mid-execution and are recommended for long running executions.
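A rough sketch of a ScaledJob for the same kind of lag-based trigger (again, the image, brokers and names are placeholders):

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: consumer-job                  # placeholder name
spec:
  pollingInterval: 30                 # how often KEDA checks the trigger source
  maxReplicaCount: 10
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: example.com/queue-worker:latest  # placeholder image; each Job runs to completion
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092  # placeholder brokers
        consumerGroup: test
        topic: test
        lagThreshold: "5"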
I have an EKS Kubernetes cluster and I set up an HPA to be able to scale up if there is any traffic, but there is unexpected behaviour with every deployment: the HPA scales up to the maximum number of pods and then, after 5 minutes, scales down again.
After a lot of searching I found that there is a CPU spike after the app is redeployed, and although this spike lasts only milliseconds, it might be what triggers the scaling.
So, do you have an idea how to prevent this spike from causing a scale-up? Or should I just disable the HPA while deploying, or delay the controller manager's scale-up, for example to 1 minute instead of the default value?
Thanks
This is a known issue. There is a bug in the HPA implementation. You can check it out here -
Issues - https://github.com/kubernetes/kubernetes/issues/78712 and https://github.com/kubernetes/kubernetes/issues/72775
The fix is rolled out in version 1.16+ - https://github.com/kubernetes/kubernetes/pull/79035
I was facing the same issue and implemented a workaround: a small script that works like a charm. The steps are -
Set the HPA max to current replicas
Set the image
Wait for deployment to be completed or specified time
Set the HPA max back to the original number
https://gist.github.com/shukla2112/035d4e633dded6441f5a9ba0743b4f08
It's a bash script; if you are using Jenkins for deployment you can simply integrate it, or use it independently by fitting it into your deployment workflow.
The fix would be rolled out in 1.16+ - https://github.com/kubernetes/kubernetes/pull/79035 (see the comments on #79035).
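Separately from the fix above, if the cluster is on 1.18 or newer, a sketch of another option is the HPA behavior field, which gives scale-up its own stabilization window so a momentary spike has to persist before it triggers scaling (all names and numbers below are just examples):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa                       # example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                         # example Deployment
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # a scale-up recommendation must hold for 60s before it is applied
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70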
We are using the Prometheus adapter to allow for custom metric autoscaling in k8s. We have everything set up so the HPA is getting the metric, but it is exhibiting improper scale-up even when the metric is below the target.
Polling the metrics API server, we can see the correct metric.
We can see the metric in Prometheus over time, where the value never gets above 25%, yet the container still scales from 1 to 4.
We can see that the pod has a target value of 100 but shows a history of scaling from 1 to 4.
And the config for the rule in the Prometheus adapter.
Pod Helm Config
Any help or direction on what to check next would be great. The expected result is that we should not see any autoscaling, as the target, 100, is always greater than the metric.
UPDATE 1: Ran on watch and got this
with this coming from prometheus -v 8