Container CPU request value for Kubernetes - kubernetes

I have several microservices and I am deploying them to GCP Kubernetes. I am using the free credits and trying out my deployments. My question is when we define CPU requests, based on what we define it? I have set it to 250mCPU but that fills my cluster nodes which are small in with CPU.
Currently, I have 3 nodes with 940mCPU allocable CPU and 3 nodes of the same kind. Now I deployed one API with 3 replicas and assigned 250mCPU for each. With all Kubernetes internal items, all nodes are almost full.
So my question is, based on what we can assign a value for CPU for a service. 250mCPU was a random value. What are others doing to find a minimum CPU for Kubernetes? I have one ASP.NET Core API and 8 NodeJS APIs. If it's based on usage, what's' the best values to start with for a new product?

Mostly by doing stress testing on your application, also, you could use a vertical pod autoscaler in "recomendation" mode, which will monitor your application for some time and then make a recomendation for you to set limits.
Documentation: https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
Remember you cannot use a vertical pod autoscaler and horizontal pod autoscaler at the same time, unless the vertical pod autoscaler its in recomendation mode.

Related

Scaling a Kubernetes cluster to process jobs in a queue?

(I'm new to Kubernetes and not sure this is best practice)
I have a pipeline of jobs in a Firestore database that need to be completed as quickly as possible.
I want to set up a Kubernetes cluster (on GKE) that will scale up when there is a large backlog of tasks to complete. Each pod/node needs a single GPU to complete the task.
Is it possible to use a cloud function to manually scale the number of nodes depending on the number of jobs in the pipeline?
I was planning on using the clusters.nodePools.setSize() function from the GKE client library but I'm not sure if this is just intended for initial cluster setup rather than manual scaling.
Thanks
https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters.nodePools/setSize
You can configure and use Horizontal pod scaling on your cluster to scale the number of Pods .
As suggested by #somethingsomething refer these links on Horizontal Pod Autoscaler and Cluster autoscaler :
The Horizontal Pod Autoscaler changes the shape of your Kubernetes workload by automatically increasing or decreasing the number of Pods in response to the workload's CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
Horizontal Pod autoscaling helps to ensure that your workload functions consistently in different situations, and allows you to control costs by only paying for extra capacity when you need it.
It's not always easy to predict the indicators that show whether your workload is under-resourced or under-utilized. The Horizontal Pod Autoscaler can automatically scale the number of Pods in your workload based on one or more metrics.

Is it possible to use Kubernetes autoscale on cron job pods

Some context: I have multiple cron jobs running daily, weekly, hourly and some of which require significant processing power.
I would like to add requests and limitations to these container cron pods to try and enable vertical scaling and ensure that the assigned node will have enough capacity when being initialized. This will prevent me from having to have multiple large node available at all times and also letting me modify how many crons I can run in parallel easily.
I would like to try and avoid timed scaling since the cron jobs processing time can increase as the application grows.
Edit - Additional Information :
Currently I am using Digital Ocean and utilizing it's UI for cluster autoscaling. I have it working with HPA's on deployments but not crons. Adding limits to crons does not trigger cluster autoscaling to my knowledge.
I have tried to enable HPA scaling with the cron but with no success. Basically it just sits on a pending status signalling that there is insufficient CPU available and does not generate a new node.
Does HPA scaling work with cron job pods and is there a way to achieve the same type of scaling?
HPA is used to scale more pods when pod loads are high, but this won't increase the resources on your cluster.
I think you're looking for cluster autoscaler (works on AWS, GKE and Azure) and will increase cluster capacity when pods can't be scheduled.
This is a Community Wiki answer so feel free to edit it and add any additional details you consider important.
As Dom already mentioned "this won't increase the resources on your cluster." Saying more specifically, it won't create an additional node as Horizontal Pod Autoscaler doesn't have such capability and in fact it has nothing to do with cluster scaling. It's name is pretty self-exlpanatory. HPA is able only to scale Pods and it scales them horizontaly, in other words it can automatically increase or decrease number of replicas of your "replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics)" as per the docs.
As to cluster autoscaling, as already said by Dom, such solutions are implemented in so called managed kubernetes solutions such as GKE on GCP, EKS on AWS or AKS on Azure and many more. You typically don't need to do anything to enable them as they are available out of the box.
You may wonder how HPA and CA fit together. It's really well explained in FAQ section of the Cluster Autoscaler project:
How does Horizontal Pod Autoscaler work with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's
number of replicas based on the current CPU load. If the load
increases, HPA will create new replicas, for which there may or may
not be enough space in the cluster. If there are not enough resources,
CA will try to bring up some nodes, so that the HPA-created pods have
a place to run. If the load decreases, HPA will stop some of the
replicas. As a result, some nodes may become underutilized or
completely empty, and then CA will terminate such unneeded nodes.

Kubernetes node CPU utilization

I'm trying(learning) to figure out the best way to utilize CPU (and RAM) on k8s nodes.
My final goal is to make sure CPU utilization on each node in the cluster is above X%
Till now I've read about cluster-autoscaler and HPA, but not sure if they'd help me with the use case.
From what I've read:
cluster-autoscaler is used to autoscale nodes based on a comparison between replica count and resources.request Vs available CPU on the target ec2 instance - which is NOT based on traffic/actual CPU usage
HPA is based on CPU/actual cpu usage, but for individual pods
I essentially wanna get to a point where kubectl top nodes would show all nodes are using > X% (let's say 60%) - and ideally trigger the autoscaling if we reach X2% (let's say 80%)
any suggestion/pointer on how to go about this use case? (or I should somehow use the combination of these 2 autoscaling mechanisms)
You can a combination of the HPA or/and Cluster autoscaler and/or the cloud providers' autoscaling group.
HPA based on CPU/Memory of your pods and scale up and down your K8s Deployments for example.
Cloud provider ASG or autoscaling group. Using the VMs or instances based and you can scale up and down based on their own CPU and memory metrics.
Cluster autoscaler. It works when pods are pending and they have nowhere to run, but if you are handling the case above this is more of a safe fail mechanism or perhaps for workloads that don't require to come up very quickly.
In summary, you can use all 3 above (or less) but you have to see what works for you so that they don't conflict with each other. One potential problem is that when your cloud ASG starts scaling down then you also have pods in pending state then your cluster autoscaler (if you have it enabled) will kick in and you may have both of them trying to do the opposite causing your cluster to just not being able to schedule any pod.
✌️☮️

Can I force Kubernetes not to run more than X replicas of a pod in the same node?

I have a tiny Kubernetes cluster consisting of just two nodes running on t3a.micro AWS EC2 instances (to save money).
I have a small web app that I am trying to run in this cluster. I have a single Deployment for this app. This deployment has spec.replicas set to 4.
When I run this Deployment, I noticed that Kubernetes scheduled 3 of its pods in one node and 1 pod in the other node.
Is it possible to force Kubernetes to schedule at most 2 pods of this Deployment per node? Having 3 instances in the same pod puts me dangerously close to running out of memory in these tiny EC2 instances.
Thanks!
The correct solution for this would be to set memory requests and limits correctly matching your steady state and burst RAM consumption levels on every pod, then the scheduler will do all this math for you.
But for the future and for others, there is a new feature which kind of allows this https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/. It's not an exact match, you can't put a global cap, rather you can require pods be evenly spaced over the cluster subject to maximum skew caps.

How to auto scale on Kubernetes (GKE) with a pod that runs one per node and uses all available resources?

I think I have a pretty simple scenario: I need to auto-scale on Google Kubernetes Engine with a pod that runs one per node and uses all available remaining resources on the node.
"Remaining" resources means that there are certain basic pod services running on each node such logging and metrics, which need their requested resources. But everything left should go to this particular pod, which is in fact the main web service for my cluster.
Also, these remaining resources should be available when the pod's container starts up, rather than through vertical autoscaling with pod restarts. The reason is that the container has certain constraints that make restarts sort of expensive: heavy disk caching, and issues with licensing of some 3rd party software I use. So although certainly the container/pod is restartable, I'd like to avoid except for rolling updates.
The cluster should scale nodes when CPU utilization gets too high (say, 70%). And I don't mean requested CPU utilization of a node's pods, but rather the actual utilization, which is mainly determined by the web service's load.
How should I configure the cluster for this scenario? I've seen there's cluster auto scaling, vertical pod autoscaling, and horizontal pod autoscaling. There's also Deployment vs DaemonSet, although it does not seem that DaemonSet is designed for pods that need to scale. So I think Deployment may be necessary, but in a way that limits one web service pod per node (pod anti affinity??).
How do I put all this together?
You could set up a Deployment with a resource request that equals a single node's allocatable resources (i.e., total resources minus auxiliary services as you mentioned). Then configure Horizontal Pod Autoscaling to scale up your deployment when CPU request utilization goes above 70%; this should do the trick as in this case request utilization rate is essentially the same as total node resource utilization rate, right? However if you do want to base scaling on actual node CPU utilization, there's always scaling by external metrics.
Technically the Deployment's resource request doesn't have to exactly equal remaining resources; rather it's enough for the request to be large enough to prevent two pods being ran on the same node. As long as that's the case and there's no resource limits, the pod ends up consuming all the available node resources.
Finally configure cluster autoscaling on your GKE node pool and we should be good to go. Vertical Pod Autoscaling doesn't really come into play here as pod resource request stays constant, and DaemonSets aren't applicable as they're not scalable via HPA as mentioned.