I'm running a Kubernetes cluster (GCP) with 10 deployments. Each deployment is configured to auto scale on stress.
From my website statistics, I found that Monday is the day with the most load. I want to configure the Kubernetes deployments to have a higher minimum replica count on that day.
Is this possible?
I read somewhere that I can run a cronjob script before and after this day to change the minimum number of machines. Is that the current way to do it? Is it safe? What if the cronjob doesn't fire? If this is the current way, please point me to some instructions on how to do it.
Thanks!
You seem to be talking about two things here.
Pod autoscaling (adding more pods when the load on existing pods increases): HPA will help with this. If your workloads show a spike in CPU or memory and can handle horizontal scaling, then HPA would work fine.
Example : https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
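To make the HPA behaviour concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current * currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The function name and the min/max defaults are assumptions chosen for illustration, and the real controller also handles details such as not-yet-ready pods and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the replica count an HPA would aim for.

    min_replicas / max_replicas are illustrative defaults, not HPA defaults.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 3 pods averaging 90% CPU against a 50% target
print(desired_replicas(3, 90, 50))
```

With a higher min_replicas, the same calculation can never drop below that floor, which is why raising the minimum ahead of a known busy day guarantees capacity regardless of the metrics.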
Now HPA can increase pods only if the cluster has enough nodes to schedule them.
If it is desired to have more nodes with more traffic and reduce them when traffic is low, a cluster autoscaler could be a good option.
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
Of course, node scaling is not instantaneous: the autoscaler watches for pods stuck in the Pending state because of resource constraints, then requests additional nodes from the cloud provider. Once these nodes join the cluster, the workloads get scheduled.
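On the scheduled-minimum part of the question, one possible approach is a small script, run from cron or a Kubernetes CronJob, that derives the minimum from the current weekday and patches the HPA. The sketch below is only an illustration: the replica counts and function names are hypothetical, and it only builds the patch body for spec.minReplicas rather than talking to the cluster.

```python
import datetime
import json

# Hypothetical values for illustration; derive real ones from your traffic data.
BASELINE_MIN_REPLICAS = 2
BUSY_DAY_MIN_REPLICAS = 6
BUSY_WEEKDAYS = {0}  # datetime.date.weekday(): Monday is 0


def min_replicas_for(day):
    """Minimum replica count wanted on the given date."""
    if day.weekday() in BUSY_WEEKDAYS:
        return BUSY_DAY_MIN_REPLICAS
    return BASELINE_MIN_REPLICAS


def hpa_patch(day):
    """JSON patch body that only touches spec.minReplicas of an HPA."""
    return json.dumps({"spec": {"minReplicas": min_replicas_for(day)}})


# The output could be passed to: kubectl patch hpa <name> --patch '<output>'
print(hpa_patch(datetime.date.today()))
```

Because the value is computed from the date instead of being a separate "raise" and "lower" action, running the script several times a day is harmless, and a missed run is corrected by the next one. The HPA keeps scaling on load in the meantime, so a missed run costs some headroom rather than availability.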
Related
I have a case on my hands right now. I have a deployment in my AWS EKS 1.21 cluster with 3 replicas. Under high traffic, this app can scale up to 15-20 pods. My problem is that when traffic is low, all pods can sometimes be scheduled on the same node, and when that node is removed during a cluster autoscaler resize, all pod replicas shut down and restart at the same time. I want to prevent this. But I don't want to use NodeAffinity with a fixed number, because I also want to keep the node count under control. What I am trying to achieve is to prevent more than, say, 33% of the deployment's total replicas from being scheduled on the same node. I did my research, but I couldn't find any component that works with percentages.
Is there a possibility to accomplish this?
Any help/recommendation will be highly appreciated. Thank you very much.
Some context: I have multiple cron jobs running daily, weekly, and hourly, some of which require significant processing power.
I would like to add requests and limits to these cron job pods to enable vertical scaling and to ensure that the assigned node has enough capacity when a pod is initialized. This would save me from having to keep multiple large nodes available at all times, and would also let me easily change how many crons I can run in parallel.
I would like to try and avoid timed scaling since the cron jobs processing time can increase as the application grows.
Edit - Additional Information :
Currently I am using Digital Ocean and its UI for cluster autoscaling. I have it working with HPAs on deployments but not crons. Adding limits to crons does not trigger cluster autoscaling, to my knowledge.
I have tried to enable HPA scaling with the cron but with no success. Basically it just sits on a pending status signalling that there is insufficient CPU available and does not generate a new node.
Does HPA scaling work with cron job pods and is there a way to achieve the same type of scaling?
HPA is used to scale more pods when pod loads are high, but this won't increase the resources on your cluster.
I think you're looking for the cluster autoscaler (works on AWS, GKE and Azure), which will increase cluster capacity when pods can't be scheduled.
This is a Community Wiki answer so feel free to edit it and add any additional details you consider important.
As Dom already mentioned, "this won't increase the resources on your cluster." More specifically, it won't create an additional node, as the Horizontal Pod Autoscaler has no such capability; in fact, it has nothing to do with cluster scaling. Its name is pretty self-explanatory: HPA can only scale Pods, and it scales them horizontally. In other words, it can automatically increase or decrease the number of replicas of your "replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics)", as per the docs.
As for cluster autoscaling, as Dom already said, such solutions are implemented in so-called managed Kubernetes offerings such as GKE on GCP, EKS on AWS, AKS on Azure, and many more. They are generally supported out of the box, although you may still need to turn autoscaling on for your cluster or node pool.
You may wonder how HPA and CA fit together. It's really well explained in the FAQ section of the Cluster Autoscaler project:
How does Horizontal Pod Autoscaler work with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's
number of replicas based on the current CPU load. If the load
increases, HPA will create new replicas, for which there may or may
not be enough space in the cluster. If there are not enough resources,
CA will try to bring up some nodes, so that the HPA-created pods have
a place to run. If the load decreases, HPA will stop some of the
replicas. As a result, some nodes may become underutilized or
completely empty, and then CA will terminate such unneeded nodes.
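The interplay quoted above can be illustrated with a rough capacity calculation: the autoscaler reasons about pod resource requests versus node capacity, so a pod without requests gives it little to work with. This is a deliberately simplified sketch that assumes identical nodes, CPU as the only constraint, and no other workloads; the function name and the numbers are hypothetical.

```python
import math


def nodes_needed(replicas, pod_cpu_request_m, node_allocatable_cpu_m):
    """Nodes required to fit `replicas` pods, counting CPU requests only.

    Values are in millicores. Simplified: identical nodes, nothing else running.
    """
    pods_per_node = node_allocatable_cpu_m // pod_cpu_request_m
    if pods_per_node == 0:
        # No amount of scale-out helps if one pod is larger than a node.
        raise ValueError("a single pod does not fit on this node type")
    return math.ceil(replicas / pods_per_node)


# HPA raises a deployment from 3 to 8 replicas of 500m each on 1900m nodes.
before = nodes_needed(3, 500, 1900)
after = nodes_needed(8, 500, 1900)
print(f"nodes before: {before}, nodes after: {after}")
```

In this example the extra replicas stay Pending until the cluster autoscaler adds nodes. The same arithmetic applies to a cron job pod: if its request is larger than what any configured node type can offer, no new node will help.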
I'm trying (learning) to figure out the best way to utilize CPU (and RAM) on k8s nodes.
My final goal is to make sure CPU utilization on each node in the cluster is above X%
Till now I've read about cluster-autoscaler and HPA, but not sure if they'd help me with the use case.
From what I've read:
cluster-autoscaler is used to autoscale nodes based on a comparison between replica count and resources.request Vs available CPU on the target ec2 instance - which is NOT based on traffic/actual CPU usage
HPA is based on CPU/actual cpu usage, but for individual pods
I essentially wanna get to a point where kubectl top nodes would show all nodes are using > X% (let's say 60%) - and ideally trigger the autoscaling if we reach X2% (let's say 80%)
Any suggestions/pointers on how to go about this use case? (Or should I somehow use a combination of these 2 autoscaling mechanisms?)
You can use a combination of HPA, the Cluster Autoscaler, and/or your cloud provider's autoscaling group.
HPA: scales your K8s Deployments (for example) up and down based on the CPU/memory usage of your pods.
Cloud provider ASG (autoscaling group): scales the VMs or instances up and down based on their own CPU and memory metrics.
Cluster autoscaler: works when pods are pending and have nowhere to run. If you are already handling the case above, this is more of a fail-safe mechanism, or perhaps suited to workloads that don't need to come up very quickly.
In summary, you can use all three (or fewer), but you have to see what works for you so that they don't conflict with each other. One potential problem: if your cloud ASG starts scaling down while you have pods in the Pending state, the cluster autoscaler (if enabled) will kick in, and the two may work against each other, leaving your cluster unable to schedule any pods.
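Whichever combination you pick, it helps to check the result against the thresholds from the question. The sketch below only classifies nodes by CPU usage percentage, such as the figures reported by kubectl top nodes. It is a monitoring aid under assumed thresholds (60% and 80%, taken from the question), not something that HPA or the cluster autoscaler evaluates themselves, and the node names are hypothetical.

```python
def classify_nodes(cpu_percent_by_node, low=60.0, high=80.0):
    """Group nodes by CPU usage: below `low`, at or above `high`, or in between."""
    report = {"underused": [], "ok": [], "hot": []}
    for node, pct in sorted(cpu_percent_by_node.items()):
        if pct < low:
            report["underused"].append(node)
        elif pct >= high:
            report["hot"].append(node)
        else:
            report["ok"].append(node)
    return report


# Hypothetical readings, e.g. copied from `kubectl top nodes`.
usage = {"node-a": 35.0, "node-b": 72.0, "node-c": 91.0}
print(classify_nodes(usage))
```

Nodes that stay in the "underused" group usually point at pod requests that are set much higher than real usage, since requests are what the scheduler and the cluster autoscaler work from.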
✌️☮️
We have a bunch of pods running in dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating available resources in EKS.
Is there a way without jenkins/k8s jobs to simply put some parameter on the pod manifest to tell it to self destruct say in 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod will be stopped for good with the status DeadlineExceeded.
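Since the pods in the question are created by an application, the deadline can be set where the manifest is generated. Here is a minimal sketch that builds a Pod manifest with activeDeadlineSeconds computed from a lifetime; the function name, pod name, and image are hypothetical placeholders.

```python
import datetime
import json


def pod_with_deadline(name, image, lifetime):
    """Build a minimal Pod manifest that is stopped after `lifetime`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The field expects whole seconds: 24 hours is 86400.
            "activeDeadlineSeconds": int(lifetime.total_seconds()),
            "containers": [{"name": name, "image": image}],
        },
    }


manifest = pod_with_deadline("demo", "busybox", datetime.timedelta(hours=24))
# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```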
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes can autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load increases and terminate surplus pods when the load decreases.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics. In practice, this means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The Kubernetes feature mentioned above, called HPA (Horizontal Pod Autoscaler), is described in this document.
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
Last, but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create/manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics
I have a small Web Application running on a Google Kubernetes Cluster. But I want to save some money, because the web app does not get much traffic.
Thus my goal is to automatically downscale my Kubernetes cluster to 0 nodes if there was no traffic for more than some amount of time. And of course it should automatically spin up a node if there is incoming traffic.
Any ideas on how to do this?
The GKE autoscaler scales up only when there are pods to be scheduled that do not fit on any current node and scaling up would allow them to be scheduled.
Scaling down occurs whenever the pods on a node request less than half of its total memory and CPU, and all the pods running on the node can be scheduled on another node.
This being said, the autoscaler will never scale a cluster down to 0, as the requirements for that can't be met.
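The scale-down rule above can be sketched as a simple predicate. This version assumes the comparison is made on the sum of pod resource requests against the node's allocatable capacity with a 50% threshold, as described in the upstream Cluster Autoscaler FAQ; the function name, the dictionary keys, and the pods_fit_elsewhere flag are illustrative, and the real autoscaler checks more conditions.

```python
def is_scale_down_candidate(requested, allocatable, pods_fit_elsewhere,
                            threshold=0.5):
    """Return True if a node looks removable under the simplified rule.

    `requested` and `allocatable` hold "cpu_m" (millicores) and "memory_mi" (MiB).
    """
    cpu_ratio = requested["cpu_m"] / allocatable["cpu_m"]
    mem_ratio = requested["memory_mi"] / allocatable["memory_mi"]
    # Both resources must be under the threshold, and every pod needs a new home.
    return cpu_ratio < threshold and mem_ratio < threshold and pods_fit_elsewhere


node = {"cpu_m": 2000, "memory_mi": 8192}
light = {"cpu_m": 600, "memory_mi": 2048}
# On a single-node cluster the pods have nowhere else to go.
print(is_scale_down_candidate(light, node, pods_fit_elsewhere=False))
```

This is why the last node is never removed: as long as it runs any pods, including system pods, there is no other node for them to move to.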
However, you can configure Horizontal Pod Autoscaling for your application deployment. You can configure HPA to scale up or down based on the number of HTTP requests using a custom metric. Despite this, HPA should also not scale the deployment all the way down to 0, nor should it scale up from 0.
If you configure HPA properly, enable cluster autoscaling, and plan how your pods are deployed by leveraging taints, tolerations, and affinity, then you can optimize autoscaling so that your cluster will scale down to a minimal size. But it still will not be 0.
All this being said, if you are running just a simple application with extended downtime, you may want to consider using Cloud Run or App Engine as those will be easier to manage than GKE and will have far less overhead (and likely less cost).