Google Kubernetes Engine node idle timeout

We use GKE for one of our services, and the cluster is autoscaled. The workload is variable, and based on the workload the cluster scales up to hundreds of nodes. However, I see that when the workload goes down, many idle nodes stay alive for a very long time, which increases our bill. Is there a setting where we can specify a time after which an idle node will be terminated and removed from the cluster?

The Kubernetes scale-down process typically includes a delay as protection from traffic spikes that can occur while the resize is being performed.
There are also several aspects of the autoscaler to consider. Please check the following docs for details:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-down-work
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
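For context, the timing that FAQ describes is controlled by flags on the cluster-autoscaler binary itself. On GKE the autoscaler is managed for you, so these flags cannot be set directly, but on a self-managed cluster autoscaler the scale-down timing lives in the Deployment args. A rough sketch (flag values are the documented defaults; the image tag is only an example, and RBAC, service account and cloud-provider flags are omitted for brevity):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.1  # example tag, pick one matching your cluster version
          command:
            - ./cluster-autoscaler
            - --scale-down-delay-after-add=10m        # wait this long after a scale-up before considering scale-down
            - --scale-down-unneeded-time=10m          # how long a node must stay underutilized before it is removed
            - --scale-down-utilization-threshold=0.5  # underutilized = pod requests below 50% of the node's allocatable CPU/memory
```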
Furthermore, when using the GKE autoscaler, there are some constraints to take into account:
When scaling down, cluster autoscaler honors a graceful termination period of 10 minutes for rescheduling the node's Pods onto a different node before forcibly terminating the node.
Occasionally, cluster autoscaler cannot scale down completely and an extra node exists after scaling down. This can occur when required system Pods are scheduled onto different nodes, because there is no trigger for any of those Pods to be moved to a different node. See "I have a couple of nodes with low utilization, but they are not scaled down. Why?". To work around this limitation, you can configure a Pod disruption budget.
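As a sketch of that last workaround, a PodDisruptionBudget for the system Pods that are keeping the extra node alive might look like the following (the kube-dns selector is just an example; match it to whatever Pods are actually pinning the node):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb         # example name
  namespace: kube-system
spec:
  maxUnavailable: 1          # allow the autoscaler to evict one of these Pods at a time
  selector:
    matchLabels:
      k8s-app: kube-dns      # assumption: adjust to the system Pods blocking scale-down
```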
Disclaimer: Comments and opinions are my own and not the views of my employer.

Related

Is it possible to use Kubernetes autoscaling on cron job pods?

Some context: I have multiple cron jobs running daily, weekly and hourly, some of which require significant processing power.
I would like to add requests and limits to these cron job pods to enable vertical scaling and to ensure that the assigned node has enough capacity when the pod is initialized. This would keep me from having to keep multiple large nodes available at all times, and would also let me easily adjust how many crons I can run in parallel.
I would like to try and avoid timed scaling, since the cron jobs' processing time can increase as the application grows.
Edit - Additional information:
Currently I am using DigitalOcean and its UI for cluster autoscaling. I have it working with HPAs on deployments, but not on crons. Adding limits to crons does not trigger cluster autoscaling, to my knowledge.
I have tried to enable HPA scaling with the cron, but with no success. Basically the pod just sits in Pending status, signalling that there is insufficient CPU available, and no new node is created.
Does HPA scaling work with cron job pods and is there a way to achieve the same type of scaling?
HPA is used to run more pods when pod load is high, but this won't increase the resources in your cluster.
I think you're looking for the cluster autoscaler (which works on AWS, GKE and Azure); it will increase cluster capacity when pods can't be scheduled.
As Dom already mentioned, "this won't increase the resources on your cluster." To be more specific, it won't create an additional node, as the Horizontal Pod Autoscaler doesn't have that capability and in fact has nothing to do with cluster scaling. Its name is pretty self-explanatory: HPA is only able to scale Pods, and it scales them horizontally. In other words, it can automatically increase or decrease the number of replicas of your "replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics)", as per the docs.
As for cluster autoscaling, as Dom already said, such solutions are implemented in so-called managed Kubernetes offerings such as GKE on GCP, EKS on AWS or AKS on Azure, among others. You typically don't need to install anything yourself to enable it, as it is available out of the box.
You may wonder how HPA and CA fit together. It's explained really well in the FAQ section of the Cluster Autoscaler project:
How does Horizontal Pod Autoscaler work with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's number of replicas based on the current CPU load. If the load increases, HPA will create new replicas, for which there may or may not be enough space in the cluster. If there are not enough resources, CA will try to bring up some nodes, so that the HPA-created pods have a place to run. If the load decreases, HPA will stop some of the replicas. As a result, some nodes may become underutilized or completely empty, and then CA will terminate such unneeded nodes.
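Coming back to the CronJob part of the question: for a CronJob it is the resource requests on the Pod template, not an HPA, that give the cluster autoscaler something to act on. If the requested CPU/memory does not fit on any existing node, the Pod goes Pending and the autoscaler adds a node for it. A rough sketch, with made-up names and sizes:

```yaml
apiVersion: batch/v1                   # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: nightly-report                 # hypothetical job name
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: registry.example.com/report-runner:latest   # placeholder image
              resources:
                requests:
                  cpu: "2"             # large enough that scheduling may require a fresh node
                  memory: 4Gi
                limits:
                  cpu: "2"
                  memory: 4Gi
```

Once the Job finishes and the node becomes idle, the cluster autoscaler can remove it again, subject to the scale-down delays discussed at the top of this page.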

How to avoid the last pod being killed on automatic node scale down in AKS

We are using Azure AKS v1.17.9 with auto-scaling both for pods (using HorizontalPodAutoscaler) and for nodes. Overall it works well, but we have seen outages in some cases. We have some deployments where minReplicas=1 and maxReplicas=4. Most of the time there will only be one pod running for such a deployment. In some cases where the auto-scaler has decided to scale down a node, the last remaining pod has been killed. Later a new pod is started on another node, but this means an outage.
I would have expected the auto-scaler to first create a new pod running on another node (bringing the number of replicas up to the allowed value of 2) and then scale down the old pod. That would have worked without downtime. As it is, it kills first and asks questions later.
Is there a way around this, other than the obvious alternative of setting minReplicas=2 (which increases the cost, as all these pods are doubled and need additional VMs)? And is this expected, or is it a bug?
In some cases where the auto-scaler has decided to scale down a node, the last remaining pod has been killed. Later a new pod is started on another node, but this means an outage.
For this reason, you should always have at least 2 replicas for a Deployment in a production environment. And you should use Pod Anti-Affinity so that those two pods are not scheduled to the same Availability Zone (a minimal example follows below). E.g. if there are network problems in one Availability Zone, your app is still available.
It is common to have at least 3 replicas, one in each Availability Zone, since cloud providers typically have 3 Availability Zones in each Region. That way you can keep traffic within a zone, which is typically cheaper than cross-zone traffic.
You can always use fewer replicas to save cost, but it is a trade-off: you get worse availability.
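A minimal sketch of such an anti-affinity rule, assuming a hypothetical Deployment called my-app (replace names, image and zone key as needed):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical name
spec:
  replicas: 2                  # one replica keeps serving while the other is rescheduled
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: topology.kubernetes.io/zone   # spread replicas across Availability Zones
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0         # placeholder image
```

Pairing this with a PodDisruptionBudget (minAvailable: 1), as mentioned in the first answer above, also prevents the node autoscaler from evicting the last healthy replica while draining a node.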

Automatic Down- and Upscaling of Kubernetes Cluster Depending on Request Frequency

I have a small web application running on a Google Kubernetes Engine cluster, but I want to save some money because the web app does not get much traffic.
Thus my goal is to automatically scale my Kubernetes cluster down to 0 nodes if there has been no traffic for more than some amount of time. And of course it should automatically spin up a node if there is incoming traffic.
Any ideas on how to do this?
The GKE autoscaler scales up only when there are pods to be scheduled that do not fit on any current node and scaling up would allow the pod to be scheduled.
Scaling down occurs whenever a node is using less than half of its total memory and CPU, and all of the pods running on the node can be scheduled onto another node.
This being said, the autoscaler will never scale a cluster down to 0, as the requirements for that can't be met.
However, you can configure Horizontal Pod Autoscaling for your application deployment. You can configure HPA to scale up or down based on the number of HTTP requests using a custom metric. Even so, HPA will not scale the deployment all the way down to 0, nor will it scale up from 0.
If you configure HPA properly, enable cluster autoscaling, and plan how your pods are deployed by leveraging taints, tolerations, and affinity, then you can optimize autoscaling so that your cluster scales down to a minimal size. But it still will not be 0.
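As a sketch of the request-based HPA mentioned above: this assumes a custom/external metrics adapter (for example the Stackdriver adapter on GKE) is installed and exporting a request-count metric; the metric name below is illustrative and depends on the adapter you use:

```yaml
apiVersion: autoscaling/v2        # autoscaling/v2beta2 on older clusters
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app                 # hypothetical Deployment name
  minReplicas: 1                  # HPA will not go below 1, matching the note above
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          name: loadbalancing.googleapis.com|https|request_count   # illustrative; the exact name comes from the metrics adapter
        target:
          type: AverageValue
          averageValue: "100"     # target roughly 100 requests per replica
```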
All this being said, if you are running just a simple application with extended periods of no traffic, you may want to consider using Cloud Run or App Engine, as those will be easier to manage than GKE and will have far less overhead (and likely lower cost).

Can you force GKE node autoscaling and how long should autoscaling take?

When running an autoscaling cluster on Google Cloud GKE, it can take 15 minutes, or sometimes half an hour, with unschedulable pods before the autoscaler kicks in and provisions another node.
This is especially the case when I have manually deleted a node. The list of nodes shows as correct, but the node count in the cluster console shows as if the node had not been deleted for at least 30 minutes.
Is there a way to force the autoscaler to take stock and scale up immediately?
I have also tried turning off autoscaling and just setting a static number of nodes, but when I deleted one of those nodes it did not come back even after waiting 45 minutes.
Is this expected behavior, is something up with GKE, or do I potentially have something configured incorrectly?
I have checked and confirmed that I am not hitting any quotas. I have both node auto-repair and autoscaling activated.
There is no way to force the autoscaler to scale up (aside from manually changing the settings in the node pool). The autoscaler should scale up as soon as there are unschedulable pods, as long as adding a new node would allow the pod to be scheduled.
Currently, only autoscaler actions are sent to Stackdriver, so it's a lot harder to diagnose why an action was not taken. Your best bet would be to open a case with Google support (if you have a support plan) or file an issue in the public Issue Tracker to have a Googler check the logs for you.

How does multiple replicas/pods scale Kubernetes?

From what I understand, using multiple replicas as well as auto-scaling is supposed to help in the case that lots of people visit your website and make calls to services provided by your Kubernetes cluster.
How do the replicas help with scaling?
Aren't these extra pods all just running on the same computer with constant resources? That would mean that they're all limited by a constant amount of CPU and memory.
Kubernetes has a couple of scaling mechanisms, the Horizontal Pod Autoscaler being the most basic, but not the only one.
With HPA you can spin up additional Pods according to some metric (most commonly CPU and memory). At some point you will hit a moment when your cluster nodes do not have enough resources to satisfy the resource requirements of your pods (you will have pods stuck in the Pending state due to a lack of nodes available for scheduling).
At that point the Cluster Autoscaler can kick in and, for example, scale an AWS Auto Scaling group (or some other cloud node pool) to add a new node to the cluster and make room for the pending pod(s).
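A rough sketch of that interplay, with made-up names and sizes: a Deployment whose Pods request real resources, plus an HPA on top. When the HPA raises the replica count beyond what the current nodes can hold, the extra Pods go Pending and the Cluster Autoscaler grows the node group to make room for them.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25        # placeholder workload
          resources:
            requests:
              cpu: 500m            # requests are what the scheduler and the Cluster Autoscaler reason about
              memory: 256Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20                  # at 500m CPU each, high replica counts will not fit on a single node
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU usage exceeds 70% of the request
```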