GKE Cluster with 0 Node and Autopilot Enabled - kubernetes

I have used GKE for years and wanted to experiment with GKE in Autopilot mode. My initial expectation was that it starts with 0 worker nodes and, whenever I deploy a workload, automatically scales the nodes based on requested memory and CPU. However, after I created a GKE cluster, nothing related to nodes appears in the UI, but the output of kubectl get nodes shows 2 nodes. Do you have any idea how to start that cluster with no nodes initially?

The principle of GKE Autopilot is that you do NOT worry about the nodes; they are managed for you. Whether there are 1, 2 or 10 nodes in your cluster, you don't pay for them; you pay only when a Pod runs in your cluster (CPU and memory usage over time).
So you can't control the number of nodes, the number of pools, or other low-level settings like that. It is similar to a serverless product (Google prefers to call it a "nodeless" cluster).
On the other hand, it's great to already have provisioned resources in your cluster that you don't pay for: you will deploy and scale more quickly!
EDIT 1
You can have a look at the pricing. You pay a flat fee of $74.40 per month ($0.10/hour) for the control plane, and then you pay for your Pods (CPU + memory).
You have 1 free cluster per Billing account.
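To make the arithmetic concrete, here is a rough sketch of that billing model in Python. The 744-hour month is what makes $0.10/hour come out to $74.40; the function names and the per-vCPU and per-GiB rates are hypothetical placeholders, not actual GKE prices, so check the pricing page for real numbers.

```python
HOURS_PER_MONTH = 744  # 31 days * 24 h; $0.10/hour * 744 h = $74.40


def control_plane_fee(hourly_fee=0.10, hours=HOURS_PER_MONTH, free_cluster=False):
    """Flat control-plane fee in dollars; zero for the one free cluster."""
    if free_cluster:
        return 0.0
    return round(hourly_fee * hours, 2)


def pod_cost(vcpu, memory_gib, hours, vcpu_rate, gib_rate):
    """Cost of Pod resources over a number of hours.

    vcpu_rate and gib_rate are caller-supplied hourly prices
    (hypothetical here; they are not given in the answer above).
    """
    return round(hours * (vcpu * vcpu_rate + memory_gib * gib_rate), 2)


if __name__ == "__main__":
    print(control_plane_fee())                   # flat fee for a paid cluster
    print(control_plane_fee(free_cluster=True))  # the free cluster
    print(pod_cost(1, 2, 10, vcpu_rate=0.05, gib_rate=0.005))
```

Note that the number of nodes never appears in the calculation, which is the point of the answer.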

Related

GKE Cluster autoscaler profile for older cluster

GKE now has a new tab when creating a new K8s cluster:
Automation - Set cluster-level criteria for automatic maintenance, autoscaling, and auto-provisioning. Edit the node pool for automation like auto-scaling, auto-upgrades, and repair.
It has two options: Balanced (default) and Optimize utilization (beta).
Can't we set this for an older cluster? Is there any workaround?
We are running an old GKE version (1.14) and want to auto-scale the cluster when resource utilization of the existing nodes reaches 70%.
Currently we have 2 different pools, only one of which has node auto-provisioning enabled. During peak hours, when the HPA scales up Pods, a new node takes some time to join the cluster, and sometimes existing nodes start crashing due to resource pressure.
You can set the autoscaling profile by going into:
GCP Cloud Console (Web UI) -> Kubernetes Engine -> CLUSTER-NAME -> Edit -> Autoscaling profile
This screenshot was made on GKE version 1.14.10-gke.50
You can also run:
gcloud beta container clusters update CLUSTER-NAME --autoscaling-profile optimize-utilization
The official documentation states:
You can specify which autoscaling profile to use when making such decisions. The currently available profiles are:
balanced: The default profile.
optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.
-- Cloud.google.com: Kubernetes Engine: Cluster autoscaler: Autoscaling profiles
This setting (optimize-utilization) may not be the best option for serving workloads. It tries to scale down (remove nodes) more aggressively, which reduces the spare resources available in your cluster and can leave it more vulnerable to workload spikes.
Answering the part of the question:
We are running an old GKE version (1.14) and want to auto-scale the cluster when resource utilization of the existing nodes reaches 70%.
As stated in the documentation:
Cluster autoscaler increases or decreases the size of the node pool automatically, based on the resource requests (rather than actual resource utilization) of Pods running on that node pool's nodes. It periodically checks the status of Pods and nodes, and takes action:
If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool.
-- Cloud.google.com: Kubernetes Engine: Cluster autoscaler: How cluster autoscaler works
You can't directly scale the cluster based on the percentage of resource utilization (70%).
The autoscaler acts on the cluster's inability to schedule Pods on the currently existing nodes.
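A simplified sketch of that decision, assuming a CPU-only model in millicores (the real autoscaler also considers memory, taints, affinity and more; the function names here are made up for illustration):

```python
def fits(allocatable_m, requested_m, pod_request_m):
    """True if a Pod's CPU request fits in what is left unrequested on a node."""
    return requested_m + pod_request_m <= allocatable_m


def needs_scale_up(nodes, pending_pod_request_m):
    """nodes is a list of (allocatable_millicores, sum_of_requests_millicores).

    Actual CPU usage is deliberately absent: only requests matter.
    """
    return not any(fits(alloc, req, pending_pod_request_m) for alloc, req in nodes)


if __name__ == "__main__":
    # Nodes are nearly fully *requested*, so a 500m Pod is unschedulable
    # and a new node is needed, even if real usage is close to idle.
    print(needs_scale_up([(4000, 3800), (4000, 3900)], 500))
    # Plenty of unrequested CPU: no scale-up, even if real usage is high.
    print(needs_scale_up([(4000, 1000)], 500))
```

This is why a "scale at 70% utilization" rule cannot be expressed at the node level: the trigger is a pending Pod, not a usage percentage.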
You can scale the number of replicas of your Deployment by CPU usage with the Horizontal Pod Autoscaler. These Pods can keep a buffer to handle increased traffic; past a specific threshold the HPA spawns new Pods, and the CA (Cluster Autoscaler) requests a new node if the new Pods are unschedulable. This buffer is the mechanism that protects against sudden spikes the application couldn't otherwise handle.
The buffer and over-provisioning are explained in detail in:
Cloud.google.com: Solutions: Best practices for running cost effective kubernetes applications on gke: Autoscaler and over-provisioning
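As a rough illustration of how a 70% target gives you that buffer, the sketch below follows the replica formula described in the Kubernetes HPA documentation (desired = ceil(current replicas * current metric / target metric), with a default tolerance of 0.1). The function name and the sample numbers are only for illustration.

```python
import math


def desired_replicas(current_replicas, current_util_pct, target_util_pct, tolerance=0.1):
    """Replica count the HPA would aim for, given average CPU utilization.

    Utilization is a percentage of the Pods' CPU *requests*.
    """
    ratio = current_util_pct / target_util_pct
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # With a 70% target, each Pod keeps about 30% of its request as headroom.
    print(desired_replicas(4, 90, 70))  # above target: scale out
    print(desired_replicas(4, 75, 70))  # within tolerance: unchanged
    print(desired_replicas(10, 35, 70))  # well below target: scale in
```

If the extra replicas do not fit on the existing nodes they become pending, and that is the point where the cluster autoscaler adds a node.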
There is extensive documentation about running cost-effective apps on GKE:
Cloud.google.com: Solutions: Best practices for running cost effective kubernetes applications on gke
I encourage you to check the link above, as it has a lot of tips and insights on scaling, over-provisioning, workload spikes, HPA, VPA, etc.
Additional resources:
Cloud.google.com: Kubernetes Engine: Node auto provisioning

Kubernetes node CPU utilization

I'm trying to figure out (and learn) the best way to utilize CPU (and RAM) on k8s nodes.
My final goal is to make sure CPU utilization on each node in the cluster is above X%.
So far I've read about cluster-autoscaler and HPA, but I'm not sure whether they'd help with my use case.
From what I've read:
cluster-autoscaler autoscales nodes by comparing replica count and resources.requests against the available CPU on the target EC2 instance, which is NOT based on traffic or actual CPU usage
HPA is based on actual CPU usage, but for individual pods
I essentially want to get to a point where kubectl top nodes shows all nodes using > X% (let's say 60%), and ideally trigger autoscaling if we reach X2% (let's say 80%).
Any suggestions or pointers on how to go about this use case? (Or should I somehow use a combination of these 2 autoscaling mechanisms?)
You can use a combination of the HPA, the Cluster Autoscaler, and/or your cloud provider's autoscaling group.
HPA: based on the CPU/memory of your pods; it scales your K8s Deployments (for example) up and down.
Cloud provider ASG (autoscaling group): works at the VM or instance level; it scales up and down based on the instances' own CPU and memory metrics.
Cluster autoscaler: it works when pods are pending and have nowhere to run. If you are already handling the case above, this is more of a fail-safe mechanism, or perhaps suited to workloads that don't need to come up very quickly.
In summary, you can use all 3 of the above (or fewer), but you have to see what works for you so that they don't conflict with each other. One potential problem: if your cloud ASG starts scaling down while you also have pods in the pending state, your cluster autoscaler (if enabled) will kick in, and the two may try to do opposite things, leaving your cluster unable to schedule any pods.
✌️☮️

Can I force Kubernetes not to run more than X replicas of a pod in the same node?

I have a tiny Kubernetes cluster consisting of just two nodes running on t3a.micro AWS EC2 instances (to save money).
I have a small web app that I am trying to run in this cluster. I have a single Deployment for this app. This deployment has spec.replicas set to 4.
When I run this Deployment, I noticed that Kubernetes scheduled 3 of its pods in one node and 1 pod in the other node.
Is it possible to force Kubernetes to schedule at most 2 pods of this Deployment per node? Having 3 instances on the same node puts me dangerously close to running out of memory on these tiny EC2 instances.
Thanks!
The correct solution is to set memory requests and limits on every pod so that they match your steady-state and burst RAM consumption; the scheduler will then do all this math for you.
But for the future and for others, there is a new feature that more or less allows this: https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/. It's not an exact match: you can't set a global cap, but you can require that pods be spread evenly over the cluster, subject to a maximum skew.
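Both ideas can be sketched with a little arithmetic. The first function shows how a memory request caps the number of pods that fit on a node; the other two compute the skew that a topology spread constraint (its maxSkew field) limits, where a node's skew is its pod count minus the lowest count on any node. The 600 MiB allocatable figure and 250 MiB request are made-up numbers, and the function names are hypothetical.

```python
def max_pods_by_memory(node_allocatable_mib, pod_request_mib):
    """How many pods the scheduler can place on a node, by memory request alone."""
    return node_allocatable_mib // pod_request_mib


def skew(pods_per_node):
    """Per-node skew: pods on the node minus the minimum across all nodes."""
    lowest = min(pods_per_node.values())
    return {node: count - lowest for node, count in pods_per_node.items()}


def satisfies_max_skew(pods_per_node, max_skew):
    """True if no node exceeds the allowed skew."""
    return max(skew(pods_per_node).values()) <= max_skew


if __name__ == "__main__":
    # Assumed numbers: 600 MiB allocatable per node, 250 MiB requested per pod.
    print(max_pods_by_memory(600, 250))
    # The 3/1 split from the question has skew 2, so maxSkew 1 rejects it.
    print(satisfies_max_skew({"node-a": 3, "node-b": 1}, max_skew=1))
    print(satisfies_max_skew({"node-a": 2, "node-b": 2}, max_skew=1))
```

With two nodes and 4 replicas, a maximum skew of 1 happens to give the 2/2 split asked for, but it is a balance rule rather than a hard per-node cap, which is why it is not an exact match.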

GCP Kubernetes scale too high

I have Kubernetes cluster hosted on GCP (Master version: 1.12.7-gke.7, Node version: 1.12.7-gke.7).
Recently I noticed that too many nodes are being created without any load on the system. My expected average number of nodes is 30, but after an unwanted scale-up it goes to around 60.
I tried to investigate this issue with
kubectl get hpa
and saw that the average CPU is near 0%, so no scaling should occur here.
Also checked
kubectl get deployments
and saw that the DESIRED number of pods is equal to the AVAILABLE, so the system isn't asking for more resources.
After inspecting the node utilization I saw that around 25 nodes utilize only 200 mCPU which is very low consumption (5% of the node potential).
After a while, the cluster goes back to normal (around 30 nodes) without any significant event.
What's going on here? What should I check next?
The Horizontal Pod Autoscaler automatically scales the number of pods, so on its own it can't be responsible for scaling the nodes. However, if you have enabled the cluster autoscaler, this could be possible. To debug what is going on you would need logs from your master node, which you have no access to in GKE because it is maintained by Google.
In this case my advice is to contact Google Cloud Support.

AWS EKS Cluster Autoscaler - Scale-In Policy

I have a CA (Cluster Autoscaler) deployed on EKS, set up by following this post. What I'm wondering is whether CA automatically scales the cluster per pod, i.e. if there are 3 nodes with a total capacity of 8 pods and a 9th pod comes up, CA would provision a 4th node to run that 9th pod. What I see is CA continuously terminating and creating a node, chosen at random from within the cluster, which disturbs other pods and nodes.
How can I tell EKS (without defining minimal nodes or disabling the scale-in policy in the ASG) not to kill a node that has at least 1 pod running on it? Any suggestion would be appreciated.
You cannot use the pod as a unit. CA works with CPU and memory resources.
If the cluster does not have enough CPU or memory, it adds a new node.
You have to adjust your resource requests (in the pod definition), or redefine your nodes to use an instance type with more or fewer resources, depending on how many pods you want on each.
Or you can tune the scale-down-utilization-threshold parameter:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca
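As I understand the FAQ, that threshold is compared against the sum of the pods' requests on a node divided by the node's resources, with a default of 0.5. A minimal sketch of the check follows, assuming utilization is the larger of the CPU and memory ratios and ignoring the other scale-down conditions (such as whether the pods can be moved elsewhere). The function names are hypothetical.

```python
def node_utilization(cpu_requests_m, cpu_allocatable_m, mem_requests_mib, mem_allocatable_mib):
    """Request-based utilization of a node: the larger of the CPU and memory ratios."""
    return max(cpu_requests_m / cpu_allocatable_m,
               mem_requests_mib / mem_allocatable_mib)


def is_scale_down_candidate(utilization, threshold=0.5):
    """A node below the threshold may be considered for removal."""
    return utilization < threshold


if __name__ == "__main__":
    # One small pod on a large node: low utilization, so the node is a candidate.
    low = node_utilization(1000, 4000, 2048, 8192)
    print(low, is_scale_down_candidate(low))
    # A heavily requested node is left alone.
    high = node_utilization(3000, 4000, 1024, 8192)
    print(high, is_scale_down_candidate(high))
```

So a node with a single small pod can still be drained; lowering the threshold, or sizing requests so nodes sit above it, makes CA less eager to remove nodes.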