Track time taken to scale up in Kubernetes using HPA and CA - kubernetes

I am trying to track and monitor how long a pod takes to come online/healthy/Running.
I am using EKS, and I have HPA and cluster-autoscaler installed on my cluster.
Let's say I have a deployment with a HorizontalPodAutoscaler scaling policy with 70% targetAverageUtilization.
Whenever the deployment's average utilization goes beyond 70%, the HPA will trigger creation of a new pod. Depending on several factors, such as whether a node with spare capacity is available (if not, the cluster-autoscaler has to add one), and whether the image needs to be downloaded or is already cached on the node, scaling can take anywhere from a few seconds to a few minutes.
I want to track this duration: every time a pod is scheduled, how long does it take to reach the Running state? Any suggestions, or any direction I should be looking in?
I found this Cluster Autoscaler Visibility Logs, but this is only available in GCE.
I am open to any solution: an out-of-the-box integration, raising events and storing them in some time-series DB, or scraping data with Prometheus. But I couldn't find any solution for this so far.
Thanks in advance.

There is nothing out of the box for this; you will need to build something yourself.
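Since there is nothing built in, one minimal approach is to read each pod's `creationTimestamp` and the `lastTransitionTime` of its `Ready` condition and record the difference. A stdlib-only sketch, assuming `kubectl get pods -o json` output on stdin (the export step to Prometheus or a time-series DB is left out; you could also diff against the `PodScheduled` condition to separate scheduling latency from container startup):

```python
# Sketch only: per-pod startup latency (creation -> Ready) from
# `kubectl get pods -o json`. Field names follow the standard Pod API.
import json
import sys
from datetime import datetime
from typing import Optional

ISO = "%Y-%m-%dT%H:%M:%SZ"

def startup_latency_seconds(pod: dict) -> Optional[float]:
    """Seconds from Pod creation until its Ready condition turned True, else None."""
    created = datetime.strptime(pod["metadata"]["creationTimestamp"], ISO)
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            ready = datetime.strptime(cond["lastTransitionTime"], ISO)
            return (ready - created).total_seconds()
    return None

if __name__ == "__main__":
    # Usage: kubectl get pods -o json | python startup_latency.py
    for pod in json.load(sys.stdin)["items"]:
        print(pod["metadata"]["name"], startup_latency_seconds(pod))
```

Running this from a periodic job (or a watch loop) and pushing the numbers to your metrics backend gives you a history of scale-up durations.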

Related

With GKE Autopilot banning the cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation, is there a way to ensure job pods do not get evicted?

Our GKE Autopilot cluster was recently upgraded to version 1.21.6-gke.1503, which apparently causes the cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation to be banned.
I totally get this for deployments, as Google doesn't want a deployment preventing scale-down, but for jobs I'd argue this annotation makes perfect sense in certain cases. We start complex jobs that start and monitor other jobs themselves, which makes it hard to make them restart-resistant given the sheer number of moving parts.
Is there any way to make it as unlikely as possible for job pods to be restarted/moved around when using Autopilot? Prior to switching to Autopilot, we used to make sure our jobs filled a single node by requesting all of its available resources; combined with a Guaranteed QoS class, this made sure the only way for a pod to be evicted was if the node somehow failed, which almost never happened. Now all we seem to have left is the Guaranteed QoS class, but that doesn't prevent pods from being evicted.
At this point the only thing left is to ask for this feature back on the IssueTracker: raise a new feature request and hope for the best.
Linking this thread as well, since it contains quite a lot of troubleshooting and may be useful.

GKE Autoscaling: How do I tell the autoscaler to remove older pods first? (FILO instead of FIFO)

There is a small memory leak in our application. For certain business reasons we do not have the resources to fix this memory leak. Instead, it would be better if our pods were deleted or scaled out after a certain period.
Rather than debugging this memory leak would it be possible to change the Google Kubernetes Engine autoscaling profile to scale down by removing older pods instead of newer pods first? Essentially, I am looking for a "First In Last Out" method of scaling down pods instead of a "First In First Out" method, which is what GKE currently uses (from my understanding) when autoscaling.
Is this possible? I'm not finding anything about this in the documentation. Thank you!
Scale-down in cluster-autoscaler isn't really either of those. It looks for nodes with low utilization and simulates whether the cluster would have enough capacity if those nodes' pods were evicted. In practice FIFO, or close to it, is common because newer pods end up on newer nodes, and those have lower utilization. You can also use a tool like Descheduler to help balance things out a bit.
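If what you mainly need is a workaround for the leak, a more direct option than relying on autoscaler ordering is a periodic job that restarts pods past a maximum age (the Deployment then recreates fresh replicas). A stdlib-only sketch of the selection logic, assuming the same Pod JSON shape as `kubectl get pods -o json`; the age threshold and the follow-up `kubectl delete pod` call are up to you:

```python
# Sketch: pick pods older than a maximum age so a CronJob can delete them.
# Timestamps use the standard Pod API format.
from datetime import datetime, timedelta

ISO = "%Y-%m-%dT%H:%M:%SZ"

def pods_older_than(pods, max_age_hours, now):
    """Return the names of pods created more than max_age_hours before `now`."""
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        p["metadata"]["name"]
        for p in pods
        if datetime.strptime(p["metadata"]["creationTimestamp"], ISO) < cutoff
    ]
```

Deleting at most one pod per run keeps the rollout gentle, since the Deployment replaces each deleted pod before the next run.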

Way to configure notifications/alerts for a Kubernetes pod which is reaching 90% memory and which is not exposed to the internet (backend microservice)

I am currently working on an alerting/notification solution for microservices deployed on Kubernetes as frontend and backend services. There have been multiple occasions where backend services could not restart, or hit 90% of their allocated pod memory limit, after exhausting memory. To identify such pods, we want an alerting mechanism that catches them when they fail or reach saturation. We have Prometheus and Grafana as monitoring services, but I have not been able to configure alerts, as my knowledge here is quite limited; any suggestions and references on how to achieve this would be helpful.
I did search the internet for this, but almost everything points to node-level or cluster-level monitoring only. :(
The query used to check the memory usage is:
sum (container_memory_working_set_bytes{image!="",name=~"^k8s_.*",namespace=~"^$namespace$",pod_name=~"^$deployment-[a-z0-9]+-[a-z0-9]+"}) by (pod_name)
I saw this recently on Google; it might be helpful to you: https://groups.google.com/u/1/g/prometheus-users/c/1n_z3cmDEXE?pli=1
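For the alert itself, the usual approach is a Prometheus alerting rule that compares working-set memory to each pod's configured memory limit. A sketch, untested: label names (`pod` vs. the older `pod_name`) depend on your cAdvisor/kube-state-metrics versions, `kube_pod_container_resource_limits` assumes kube-state-metrics is installed, and the rule-file wiring depends on how your Prometheus is deployed:

```yaml
groups:
  - name: pod-memory
    rules:
      - alert: PodMemoryNearLimit
        # working-set bytes divided by the configured memory limit, per pod
        expr: |
          sum(container_memory_working_set_bytes{image!=""}) by (namespace, pod)
            / sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod)
            > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is above 90% of its memory limit"
```

Note that firing alerts only become notifications (email, Slack, etc.) once Alertmanager is configured alongside Prometheus; this works the same for backend pods with no internet exposure, since Prometheus scrapes them from inside the cluster.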

GCP Kubernetes spreading pods across nodes instead of filling available resources

I have a few kubefiles defining Kubernetes services and deployments. When I create a cluster of 4 nodes on GCP (never changes), all the small kube-system pods are spread across the nodes instead of filling one at a time. Same with the pods created when I apply my kubefiles.
The problem is sometimes I have plenty of available total CPU for a deployment, but its pods can't be provisioned because no single node has that much free. It's fragmented, and it would obviously fit if the kube-system pods all went into one node instead of being spread out.
I can avoid problems by using bigger/fewer nodes, but I feel like I shouldn't have to do that. I'd also rather not deal with pod affinity settings for such a basic testing setup. Is there a solution to this, maybe a setting to have it prefer filling nodes in order? Like using an already opened carton of milk instead of opening a fresh one each time.
Haven't tested this, but the order I apply files in probably matters, meaning applying the biggest CPU users first could help. But that seems like a hack.
I know there's some discussion on rescheduling that gets complicated because they're dealing with a dynamic node pool, and it seems like they don't have it ready, so I'm guessing there's no way to have it rearrange my pods dynamically.
You can write your own scheduler; almost all components in k8s are replaceable.
I know you won't. If you don't want to deal with affinity, you definitely won't write your own scheduler. But know that you have that option.
Sticking with what GCP provides natively, make sure all your pods have resource requests and limits set.
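To expand on that last point: with requests set, the scheduler at least packs pods based on accurate numbers instead of defaults. A minimal example of what "requests and limits set up" looks like on a container (the name, image, and values are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels: {app: my-app}
  template:
    metadata:
      labels: {app: my-app}
    spec:
      containers:
        - name: my-app
          image: my-app:1.0   # placeholder image
          resources:
            requests:         # what the scheduler packs against
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
```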

Auto scale kubernetes pods based on downstream api results

I have seen that HPA can scale based on CPU usage. That is super cool.
However, the scenario I have is: the replica count of the stateful app (container in pod) maps one-to-one to a downstream API result. For example, the downstream API returns the maximum/expected capacity, like {response: 10}. I would like the ReplicaSet, StatefulSet, or some other Kubernetes controller to obtain this value and auto-scale the pods to 10. Unfortunately, the replica count is hardcoded in the YAML file.
If I were doing it manually, I could run a scheduler whose job is to watch the API and run the kubectl scale command based on the downstream API results. But that is error-prone, and it is another system I would need to maintain. I guess this logic should belong in a Kubernetes controller?
Has someone done this before, and how would you configure it?
Thanks in advance.
Unfortunately, it is not possible to use an HPA in that mode, but your idea of how to scale is right.
The HPA is designed to analyze metrics and decide how many pods need to be running based on those metrics. It applies its scaling rules to adjust the replica count step by step based on the result of its decision.
Moreover, it uses the standard Kubernetes API to scale pods.
Because the scaling logic is already in your application, you can use the same API to scale your pods. Incidentally, kubectl scale interacts with the cluster in the same way.
So you can use, e.g., a CronJob running a small application which calls your application's API every 5 minutes and runs kubectl scale with the proper deployment name to scale your app.
But please keep in mind that you need to somehow control the frequency of up- and downscaling to keep your application stable. That's why I think scaling no more often than once per 5 minutes is OK; trying to do it every minute is generally not the best idea.
And of course you could create a daemon and run it as a Deployment, but I think the CronJob solution is easier and faster to implement.
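A stdlib-only sketch of the script such a CronJob could run; the API URL, the `{response: N}` payload shape, and the deployment name are assumptions taken from the question, and the CronJob's ServiceAccount needs RBAC permission to scale deployments:

```python
# Sketch: poll the downstream capacity API and scale the Deployment to match.
# Intended to run inside a CronJob; names and URL below are placeholders.
import json
import subprocess
import urllib.request

def desired_replicas(payload: dict, minimum: int = 1, maximum: int = 50) -> int:
    """Clamp the capacity reported by the API to a sane replica range."""
    return max(minimum, min(maximum, int(payload.get("response", minimum))))

def main() -> None:
    # Hypothetical endpoint returning e.g. {"response": 10}
    with urllib.request.urlopen("http://capacity-api/capacity") as resp:
        payload = json.load(resp)
    subprocess.run(
        ["kubectl", "scale", "deployment/my-app",
         f"--replicas={desired_replicas(payload)}"],
        check=True,
    )

if __name__ == "__main__":
    main()
```

The clamp in `desired_replicas` is deliberate: it guards against a bad or missing API response scaling you to zero or to an absurd replica count.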