Automatic Down- and Upscaling of Kubernetes Cluster Depending on Request Frequency - kubernetes

I have a small Web Application running on a Google Kubernetes Cluster. But I want to save some money, because the web app does not get much traffic.
Thus my goal is to automatically downscale my Kubernetes cluster to 0 nodes if there was no traffic for more than some amount of time. And of course it should automatically spin up a node if there is incoming traffic.
Any ideas on how to do this?

The GKE autoscaler scales up only when there are pods to be scheduled that do not fit on any current node and scaling up would allow the pod to be scheduled.
Scaling down occurs whenever a node is using less than half of its total memory and CPU, and all the pods running on the node can be scheduled on another node.
This being said, the autoscaler will never scale a cluster down to 0, as the requirements for that can't be met.
However, you can configure Horizontal Pod Autoscaling (HPA) for your application's Deployment. You can configure HPA to scale up or down based on the number of HTTP requests using a custom metric. That said, HPA will not scale the Deployment all the way down to 0, nor will it scale up from 0.
If you configure HPA properly, enable cluster autoscaling, and plan how your pods are deployed by leveraging taints, tolerations, and affinity, then you can optimize autoscaling so that your cluster scales down to a minimal size. But it still will not be 0.
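As an illustration, a request-based HPA could look roughly like the sketch below. This assumes a custom metrics adapter (for example the Stackdriver or Prometheus adapter) exposes a per-pod metric called http_requests_per_second; the Deployment name, metric name, and thresholds are placeholders, not anything from the original setup.

```yaml
# Sketch only: assumes a custom metrics adapter exposes a per-pod metric
# named "http_requests_per_second" for the pods of the "webapp" Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp             # hypothetical Deployment name
  minReplicas: 1             # HPA will not go below 1, i.e. never to zero
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"   # scale out when pods average more than 10 req/s
```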
All this being said, if you are running just a simple application with extended downtime, you may want to consider using Cloud Run or App Engine as those will be easier to manage than GKE and will have far less overhead (and likely less cost).

Related

Scaling a Kubernetes cluster to process jobs in a queue?

(I'm new to Kubernetes and not sure this is best practice)
I have a pipeline of jobs in a Firestore database that need to be completed as quickly as possible.
I want to set up a Kubernetes cluster (on GKE) that will scale up when there is a large backlog of tasks to complete. Each pod/node needs a single GPU to complete the task.
Is it possible to use a cloud function to manually scale the number of nodes depending on the number of jobs in the pipeline?
I was planning on using the clusters.nodePools.setSize() function from the GKE client library but I'm not sure if this is just intended for initial cluster setup rather than manual scaling.
Thanks
https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters.nodePools/setSize
You can configure and use Horizontal Pod Autoscaling on your cluster to scale the number of Pods.
As suggested by #somethingsomething, refer to these links on the Horizontal Pod Autoscaler and Cluster autoscaler:
The Horizontal Pod Autoscaler changes the shape of your Kubernetes workload by automatically increasing or decreasing the number of Pods in response to the workload's CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
Horizontal Pod autoscaling helps to ensure that your workload functions consistently in different situations, and allows you to control costs by only paying for extra capacity when you need it.
It's not always easy to predict the indicators that show whether your workload is under-resourced or under-utilized. The Horizontal Pod Autoscaler can automatically scale the number of Pods in your workload based on one or more metrics.
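As a rough sketch of how that could be wired up for a job queue: assuming the number of pending jobs is exported as an external metric to Cloud Monitoring (for example by a small exporter you write) and made available through the external metrics adapter, an HPA on a worker Deployment could look like this. The Deployment name and metric name are illustrative.

```yaml
# Sketch only: assumes the backlog size is available as the external
# metric "pending_jobs" via a custom/external metrics adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpu-worker         # hypothetical worker Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: pending_jobs   # illustrative metric name
      target:
        type: AverageValue
        averageValue: "1"    # aim for roughly one pending job per worker pod
```

If each worker pod also requests a GPU (for example nvidia.com/gpu: 1 in its resource limits) and cluster autoscaling is enabled on a GPU node pool, pending worker pods will cause new GPU nodes to be added, and they will be removed again once the backlog drains.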

kubernetes - use-case to choose deployment over daemon-set

Normally when we scale up an application we do not deploy more than one pod of the same service on the same node. With a DaemonSet we can make sure that we have one pod of our service on each node, which makes pods very easy to manage when nodes are scaled up and down. If I use a Deployment instead, scaling becomes troublesome: there may be multiple pods on the same node, and a new node may have no pod at all.
I want to know the use-case where deployment will be more suitable than daemon-set.
Your cluster runs dozens of services, and therefore runs hundreds of nodes, but for scale and reliability you only need a couple of copies of each service. Deployments make more sense here; if you ran everything as DaemonSets you'd have to be able to fit the entire stack into a single node, and you wouldn't be able to independently scale components.
I would almost always pick a Deployment over a DaemonSet, unless I was running some sort of management tool that must run on every node (a metric collector, log collector, etc.). You can combine that with a HorizontalPodAutoscaler to make the size of the Deployment react to the load of the system, and in turn combine that with the cluster autoscaler (if you're in a cloud environment) to make the size of the cluster react to the resource requirements of the running pods.
Cluster scale-up and scale-down isn't particularly a problem. If the cluster autoscaler removes a node, it will first move all of the existing pods off of it, so you'll keep the cluster-wide total replica count for your service. Similarly, it's not usually a problem if every node isn't running every service, so long as there are enough replicas of each service running somewhere in the cluster.
There are two levels (or say layers) of scaling when using deployments:
Let's say a website running on kubernetes has high traffic only on Fridays.
The deployment is scaled up to launch more pods as the traffic increases and scaled down later when traffic subsides. This is service/pod auto scaling.
To accommodate the increase in the pods more nodes are added to the cluster, later when there are less pods some nodes are shutdown and released. This is cluster auto scaling.
Unlike the above case, a DaemonSet has a 1-to-1 mapping to the nodes. And the "N nodes = N pods" kind of scaling is useful only when one pod fits exactly into one node's resources. This, however, is very unlikely in real-world scenarios.
A DaemonSet has the downside that if you need to scale the application, you also have to scale the number of nodes just to add more pods. And if you only need a few pods of the application but have a large cluster, you might end up running a lot of unused pods that block resources for other applications.
A Deployment solves this problem, because two or more pods of the same application can run on one node and the number of pods is decoupled from the number of nodes by default. But this brings another problem: if your cluster is rather small and you have a small number of pods, they might all end up running on a few nodes. There is no good distribution over all available nodes, and if some of those nodes fail for some reason you lose the majority of your application pods.
You can solve this using PodAntiAffinity, so that a pod cannot be scheduled on a node where another pod of the same application is already running. That way you get behavior similar to a DaemonSet but with far fewer pods and more flexibility regarding scaling and resource usage.
So a use case would be when you don't need one pod per node but still want them distributed over your nodes. Say you have 50 nodes and an application of which you need 15 pods. Using a Deployment with PodAntiAffinity you can run those 15 pods on 15 different nodes. When you suddenly need 20, you can scale up the application (not the nodes) so 20 pods run on 20 different nodes. But you never have 50 pods by default where you only need 15 (or 20).
You could achieve the same with a Daemonset using nodeSelector or taints/tolerations but that would be far more complicated and less flexible.
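A minimal sketch of such a Deployment, assuming the pods are labelled app: my-app (the name, image and replica count are placeholders): the required anti-affinity on the hostname topology key keeps at most one pod of this app per node.

```yaml
# Sketch: a Deployment whose pods refuse to share a node with another pod
# of the same app, approximating DaemonSet-style spreading with a fixed
# replica count. Name, image and replica count are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 15
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: kubernetes.io/hostname   # at most one pod per node
      containers:
      - name: my-app
        image: my-app:1.0                          # placeholder image
```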

Is it possible to use Kubernetes autoscale on cron job pods

Some context: I have multiple cron jobs running daily, weekly, hourly and some of which require significant processing power.
I would like to add requests and limits to these cron job pods to try and enable vertical scaling and ensure that the assigned node will have enough capacity when they are initialized. This will prevent me from having to keep multiple large nodes available at all times and also let me easily modify how many crons I can run in parallel.
I would like to try and avoid timed scaling since the cron jobs processing time can increase as the application grows.
Edit - Additional Information :
Currently I am using DigitalOcean and utilizing its UI for cluster autoscaling. I have it working with HPAs on Deployments, but not crons. Adding limits to crons does not trigger cluster autoscaling to my knowledge.
I have tried to enable HPA scaling with the cron but with no success. Basically the pod just sits in a Pending status, signalling that there is insufficient CPU available, and no new node is created.
Does HPA scaling work with cron job pods and is there a way to achieve the same type of scaling?
HPA is used to scale more pods when pod loads are high, but this won't increase the resources on your cluster.
I think you're looking for the cluster autoscaler (available on AWS, GKE, AKS and others), which will increase cluster capacity when pods can't be scheduled.
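For the cron side specifically, one common approach is simply to put resource requests on the cron pods so that, when a run cannot be scheduled, the cluster autoscaler adds a node. A rough sketch is below; the name, schedule, image and sizes are placeholders, and batch/v1 CronJob requires Kubernetes 1.21+ (older clusters use batch/v1beta1).

```yaml
# Sketch: a CronJob whose pods declare resource requests/limits so that,
# when they cannot fit on existing nodes, the cluster autoscaler brings
# up an extra node for the duration of the run.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: heavy-cron            # placeholder name
spec:
  schedule: "0 2 * * *"       # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: my-batch-image:1.0   # placeholder image
            resources:
              requests:
                cpu: "2"
                memory: 4Gi
              limits:
                cpu: "2"
                memory: 4Gi
```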
As Dom already mentioned, "this won't increase the resources on your cluster." More specifically, it won't create an additional node, as the Horizontal Pod Autoscaler doesn't have such a capability and in fact has nothing to do with cluster scaling. Its name is pretty self-explanatory. HPA is only able to scale Pods, and it scales them horizontally; in other words, it can automatically increase or decrease the number of replicas of your "replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics)" as per the docs.
As to cluster autoscaling, as already said by Dom, such solutions are implemented in so-called managed Kubernetes offerings such as GKE on GCP, EKS on AWS or AKS on Azure, and many more. They are straightforward to use, although you typically have to enable the cluster autoscaler for your cluster or node pool rather than getting it out of the box.
You may wonder how HPA and CA fit together. It's really well explained in FAQ section of the Cluster Autoscaler project:
How does Horizontal Pod Autoscaler work with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's number of replicas based on the current CPU load. If the load increases, HPA will create new replicas, for which there may or may not be enough space in the cluster. If there are not enough resources, CA will try to bring up some nodes, so that the HPA-created pods have a place to run. If the load decreases, HPA will stop some of the replicas. As a result, some nodes may become underutilized or completely empty, and then CA will terminate such unneeded nodes.

Kubernetes node CPU utilization

I'm trying (learning) to figure out the best way to utilize CPU (and RAM) on k8s nodes.
My final goal is to make sure CPU utilization on each node in the cluster is above X%
Till now I've read about cluster-autoscaler and HPA, but not sure if they'd help me with the use case.
From what I've read:
cluster-autoscaler is used to autoscale nodes based on a comparison between replica count and resources.requests vs. the allocatable CPU on the target EC2 instances - which is NOT based on traffic/actual CPU usage
HPA is based on actual CPU usage, but for individual pods
I essentially wanna get to a point where kubectl top nodes would show all nodes are using > X% (let's say 60%) - and ideally trigger the autoscaling if we reach X2% (let's say 80%)
any suggestion/pointer on how to go about this use case? (or I should somehow use the combination of these 2 autoscaling mechanisms)
You can use a combination of the HPA and/or the cluster autoscaler and/or the cloud provider's autoscaling group:
HPA: scales based on the CPU/memory of your pods and scales your K8s Deployments up and down, for example.
Cloud provider ASG or autoscaling group: works at the VM/instance level and scales the instances up and down based on their own CPU and memory metrics.
Cluster autoscaler: kicks in when pods are pending and have nowhere to run; if you are already handling the cases above, this is more of a fail-safe mechanism, or useful for workloads that don't need to come up very quickly.
In summary, you can use all three (or fewer), but you have to see what works for you so that they don't conflict with each other. One potential problem: if your cloud ASG starts scaling down while you also have pods in a Pending state, the cluster autoscaler (if you have it enabled) will kick in, and the two may end up working against each other, leaving your cluster unable to schedule any pods.
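If you run the open-source cluster autoscaler yourself (for example on EC2), its scale-down behaviour can be tuned to pack nodes more tightly. A rough sketch of the relevant part of its Deployment is below; the image tag, node-group discovery tag and thresholds are placeholders, RBAC and the service account are omitted, and note that "utilization" here means requested resources, not live CPU usage.

```yaml
# Sketch (self-managed cluster autoscaler; RBAC/service account omitted).
# --scale-down-utilization-threshold: nodes whose requested resources fall
# below this fraction become candidates for removal, so the remaining
# nodes end up more densely packed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler   # assumes RBAC exists already
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.3  # match your cluster version
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled
        - --scale-down-utilization-threshold=0.6   # remove nodes below ~60% requested
        - --scale-down-unneeded-time=5m
```

This only nudges node density; it still won't directly guarantee that kubectl top nodes shows more than X% actual usage, since the autoscaler works off requests rather than live metrics.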

Kubernetes automatic shutdown after some idle time

Does Kubernetes or Helm support shutting down pods if they have been idle for more than a given threshold time?
This would be very useful in the development environment, to provide room for other processes to consume it and save cost.
Kubernetes has the ability to autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load is increasing and terminate excessive pods when the load is decreasing.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics provided by a metrics pipeline running in the cluster (historically the Heapster application, nowadays metrics-server). From the practical side, it means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The mentioned Kubernetes feature, called HPA (Horizontal Pod Autoscaler), is described in this document.
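For reference, a minimal CPU-based HPA could look roughly like this; the Deployment name and numbers are placeholders, and note that a standard HPA will not go below minReplicas: 1, so getting all the way to zero pods needs manual scaling or extra tooling on top.

```yaml
# Minimal sketch of a CPU-based HPA for a hypothetical "dev-app" Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dev-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-app
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # target 60% of the pods' CPU requests
```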
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics