Hello community,
I have doubts about the use of the HorizontalPodAutoscaler (HPA) in Kubernetes. What are the best practices for using HPA, especially when setting maxReplicas? For example, if I have a cluster with 3 worker nodes running a single app and I set up the HPA to scale up to 20 pods, is it good practice to scale pods to more than 3x the number of available nodes? Or is it a better approach to scale the pods only up to the number of available worker nodes in the cluster?
Thank you in advance.
First of all, you need to test your application and decide on reasonable resources per pod (requests and limits).
After setting the limits per pod, you know how many pods your cluster can sustain.
For example, if you have 10 CPUs and 10 Gi of memory free across the cluster, and you set the limit per pod to 1 CPU and 1 Gi of memory, then you can run up to 10 pods.
Then it's time to run your load test: fire the expected traffic at its peak against the lowest number of pods you plan to run for normal/daily traffic, gradually start up new pods, and check whether you can now handle the high traffic or still need to add more. Repeat this until you reach an appropriate number of pods; that is the maximum number of pods to configure in your HPA.
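To make this concrete, here is a minimal sketch of what the result of that exercise could look like, assuming the hypothetical 1 CPU / 1 Gi sizing above and a load-tested maximum of 10 pods (the name my-app and all numbers are placeholders, not a definitive recommendation):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                  # hypothetical name
    spec:
      replicas: 3                   # baseline that fits normal/daily traffic
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: my-app:1.0       # hypothetical image
            resources:
              requests:
                cpu: "1"
                memory: 1Gi
              limits:
                cpu: "1"
                memory: 1Gi
    ---
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      minReplicas: 3                # lowest count that handles daily traffic
      maxReplicas: 10               # found via load testing; fits cluster capacity
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70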
Normally when we scale up an application we do not deploy more than one pod of the same service on the same node. Using a DaemonSet we can make sure that our service runs on every node, which makes it very easy to manage pods when nodes are scaled up and down. If I use a Deployment instead, there will be trouble when scaling: there may be multiple pods on the same node, and a new node may have no pod at all.
I want to know the use cases where a Deployment is more suitable than a DaemonSet.
Your cluster runs dozens of services, and therefore runs hundreds of nodes, but for scale and reliability you only need a couple of copies of each service. Deployments make more sense here; if you ran everything as DaemonSets you'd have to be able to fit the entire stack into a single node, and you wouldn't be able to independently scale components.
I would almost always pick a Deployment over a DaemonSet, unless I was running some sort of management tool that must run on every node (a metric collector, log collector, etc.). You can combine that with a HorizontalPodAutoscaler to make the size of the Deployment react to the load of the system, and in turn combine that with the cluster autoscaler (if you're in a cloud environment) to make the size of the cluster react to the resource requirements of the running pods.
Cluster scale-up and scale-down isn't particularly a problem. If the cluster autoscaler removes a node, it will first move all of the existing pods off of it, so you'll keep the cluster-wide total replica count for your service. Similarly, it's not usually a problem if every node isn't running every service, so long as there are enough replicas of each service running somewhere in the cluster.
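As a quick sketch of that combination (the deployment name and thresholds are hypothetical), a Deployment can be put under HPA control with a single command, and the cluster autoscaler then reacts to the resulting pods' resource requests:

    # Scale my-app between 2 and 10 replicas, targeting 70% CPU utilization
    kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=70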
There are two levels (or layers) of scaling when using Deployments:
Let's say a website running on Kubernetes has high traffic only on Fridays.
The Deployment is scaled up to launch more pods as the traffic increases and scaled down later when traffic subsides. This is service/pod autoscaling.
To accommodate the increase in pods, more nodes are added to the cluster; later, when there are fewer pods, some nodes are shut down and released. This is cluster autoscaling.
Unlike the above case, a DaemonSet has a 1-to-1 mapping to the nodes, and the "N nodes = N pods" kind of scaling is useful only when one pod fits exactly into one node's resources. That, however, is very unlikely in real-world scenarios.
Having a DaemonSet has the downside that if you need to scale the application, you also have to scale the number of nodes to add more pods. Also, if you only need a few pods of the application but have a large cluster, you might end up running a lot of unused pods that block resources for other applications.
Having a Deployment solves this problem, because two or more pods of the same application can run on one node, and the number of pods is decoupled from the number of nodes by default. But this brings another problem: if your cluster is rather small and you have a small number of pods, they might all end up running on a few nodes. There is no good distribution over all available nodes, and if some of those nodes fail for some reason, you lose the majority of your application's pods.
You can solve this using PodAntiAffinity, so that a pod cannot run on a node where a defined other pod is already running. That gives you behavior similar to a DaemonSet, but with far fewer pods and more flexibility regarding scaling and resource usage.
So a use case would be when you don't need one pod per node but still want the pods distributed over your nodes. Say you have 50 nodes and an application of which you need 15 pods. Using a Deployment with PodAntiAffinity you can run those 15 pods spread over 15 different nodes. When you suddenly need 20, you can scale up the application (not the nodes) so 20 pods run on 20 different nodes. But you never run 50 pods by default when you only need 15 (or 20).
You could achieve something similar with a DaemonSet using nodeSelector or taints/tolerations, but that would be far more complicated and less flexible.
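A minimal sketch of the Deployment described above, assuming a hypothetical app label my-app; the required anti-affinity rule keeps at most one of these pods per node (a preferredDuringScheduling... rule would allow co-location instead of leaving excess pods unschedulable):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 15                  # 15 pods spread over 15 of the 50 nodes
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: my-app
                topologyKey: kubernetes.io/hostname   # at most one pod per node
          containers:
          - name: my-app
            image: my-app:1.0       # hypothetical image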
I am new to K8s autoscaling. I have a stateful application, and I am trying to find out which autoscaling method works for me. According to the documentation:
if pods don't have the correct resources set, the Updater component
of VPA kills them so that they can be recreated by their controllers
with the updated requests.
I want to know the downtime involved in killing the existing pods and creating the new ones, or at least how I can measure it for my application.
I am comparing the HPA and VPA approaches for my application.
The follow-up question is: how long does it take HPA to create a new pod when scaling up?
There are a few things to clear up here:
VPA does not create nodes; the Cluster Autoscaler is used for that. The Vertical Pod Autoscaler allocates more (or less) CPU and memory to existing pods, while CA scales your cluster's nodes based on the number of pending pods.
Whether to use HPA, VPA, CA, or some combination depends on the needs of your application. Experimentation is the most reliable way to find which option works best for you, so it might take a few tries to find the right setup. HPA and VPA depend on metrics and some historic data. CA is recommended if you have a good understanding of your pods' and containers' needs.
HPA and VPA should not be used together to evaluate the same CPU/memory metrics. However, VPA can be used to evaluate CPU or memory while HPA evaluates external metrics (like the number of HTTP requests or the number of active users). Also, you can use VPA together with CA.
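For illustration, a minimal VPA sketch (the target name is hypothetical); in Auto mode the Updater component may evict pods so their controllers recreate them with the recommended requests, which is exactly the restart behavior asked about above:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app              # hypothetical Deployment to autoscale
      updatePolicy:
        updateMode: "Auto"        # Updater may evict pods to apply new requests
        # use "Off" to get recommendations only, with no restarts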
It's hard to evaluate the exact time needed for VPA to adjust and restart pods, or for HPA to scale up. The difference between the best-case and the worst-case scenario depends on many factors and can amount to a significant gap in time. You need to rely on metrics and observations in order to evaluate it.
Kubernetes Metrics Server collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API for use by Horizontal Pod Autoscaler and Vertical Pod Autoscaler.
Below are some useful sources that would help you understand and choose the right solution for you:
AutoScaling in Kubernetes ( HPA / VPA )
Kubernetes Autoscaling in Production: Best Practices for Cluster Autoscaler, HPA and VPA
Kubernetes Autoscaling Options: Horizontal Pod Autoscaler, Vertical Pod Autoscaler and Cluster Autoscaler
EDIT:
Scaling up is a time-sensitive operation. You should consider the average time it can take your pods to scale up. Two example scenarios:
Best case scenario - 4 minutes:
30 seconds: target metric values are updated
30 seconds: HPA checks the metric values
< 2 seconds: pods are created and go into the Pending state
< 2 seconds: CA sees the pending pods and fires up the calls to provision nodes
3 minutes: the cloud provider provisions the nodes and Kubernetes waits for them until they are ready
(Reasonable) Worst case scenario - 12 minutes:
60 seconds: target metric values are updated
30 seconds: HPA checks the metric values
< 2 seconds: pods are created and go into the Pending state
< 2 seconds: CA sees the pending pods and fires up the calls to provision nodes
10 minutes: the cloud provider provisions the nodes and Kubernetes waits for them until they are ready (depends on multiple factors, such as provider latency, OS latency, bootstrapping tools, etc.)
Again, it is hard to estimate the exact time it would take so observation and metrics are the key here.
I'm trying (and learning) to figure out the best way to utilize CPU (and RAM) on k8s nodes.
My final goal is to make sure the CPU utilization on each node in the cluster is above X%.
So far I've read about the cluster-autoscaler and the HPA, but I'm not sure whether they'd help with this use case.
From what I've read:
the cluster-autoscaler is used to autoscale nodes, based on a comparison between the replica count with its resources.requests and the CPU available on the target EC2 instances - which is NOT based on traffic/actual CPU usage
HPA is based on actual CPU usage, but of individual pods
I essentially want to get to a point where kubectl top nodes shows that all nodes are using > X% (let's say 60%) - and ideally trigger the autoscaling if we reach X2% (let's say 80%).
Any suggestion/pointer on how to go about this use case? (Or should I somehow use a combination of these two autoscaling mechanisms?)
You can use a combination of the HPA and/or the Cluster Autoscaler and/or the cloud provider's autoscaling group.
HPA: scales your Kubernetes Deployments (for example) up and down based on the CPU/memory of your pods.
Cloud provider ASG (autoscaling group): works on the VMs or instances themselves; you can scale them up and down based on their own CPU and memory metrics.
Cluster Autoscaler: kicks in when pods are pending and have nowhere to run. If you are handling the cases above, this is more of a fail-safe mechanism, or perhaps an option for workloads that don't need to come up very quickly.
In summary, you can use all three of the above (or fewer), but you have to see what works for you so that they don't conflict with each other. One potential problem: when your cloud ASG starts scaling down while you have pods in the Pending state, your Cluster Autoscaler (if you have it enabled) will kick in, and you may end up with the two fighting each other, leaving your cluster unable to schedule any pods.
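One knob worth knowing here is the Cluster Autoscaler's scale-down threshold, which is the closest thing to "keep node utilization above X%". Note that it is computed from the pods' resource requests, not from the actual usage shown by kubectl top nodes. A sketch of the relevant flags (the cloud provider and node-group name are hypothetical):

    # Excerpt from a cluster-autoscaler container spec
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --nodes=2:10:my-node-group                # hypothetical min:max:ASG-name
    - --scale-down-utilization-threshold=0.6    # nodes requesting <60% become scale-down candidates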
✌️☮️
I understand the logic behind replicating a pod across nodes, but is there any benefit to replicating pods on the same node? From my understanding this doesn't increase the degree of replication, because one pod can take down a node.
Replication of pods within a node?
You can control the replica count using metrics, but you can't control which nodes the pods are spawned on unless some affinity is set; that is handled by the kube-scheduler. However, you can set various pod/node affinities, anti-affinities, and a topologyKey (if you wish to maintain a minimum number of pods on a particular node).
One pod can take a node down?
I highly doubt that. It can only happen if you don't have any requests/limits set for CPU/memory, or if the maximum replicas in your HPA are set to grow beyond the capacity of the node while a pod affinity forces all the pods to be spawned on the same node.
When can such a scenario make sense?
It can make sense if your nodes are of irregular sizes and you want to utilize a particular node more than the other nodes in the cluster.
In a best-practices environment, all the nodes are of a regular size, an HPA is set up, and limits/quotas are provided so that a pod can't crash a node by exhausting its resources.
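As a hedged sketch of the "limits/quotas" part, a namespace-level LimitRange can inject default requests/limits and cap any single container, so no pod can claim a whole node (all names and values below are hypothetical):

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
      namespace: my-namespace       # hypothetical namespace
    spec:
      limits:
      - type: Container
        defaultRequest:             # injected when a container sets no requests
          cpu: 250m
          memory: 256Mi
        default:                    # injected when a container sets no limits
          cpu: 500m
          memory: 512Mi
        max:                        # hard cap per container
          cpu: "2"
          memory: 2Gi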
EDIT from comments:
So what if I want to run multiple instances of a nodejs app to utilize all the cores on my kube node? Let's say 8 cores: does it make sense to replicate the nodejs app pod 8 times on the same kube node, or is it better to have 1 pod spin up 8 instances of the nodejs app?
For a single-threaded application such as yours, a pod is considered one instance. A single pod spinning up 8 instances would be a multi-container pod with 8 containers of the same image - that would really be a bad practice and not even worth a test environment. However, running multiple replicas of the same Deployment is a perfectly doable practice (see the sketch after the links below). The remaining question is routing: if a pod is already serving a request and is locked up, how will the next request be routed to another pod by the Kubernetes Service? With HPA that is only manageable if the maximum requests per pod is set to 1.
Why not use the Node.js Cluster module on Kubernetes to utilise all the cores of the node where the app is deployed? - suggestion by #DanielKobe
node-js-scaling-out-on-kubernetes
Should-you-use-pm2-node-cluster-or-neither-in-kubernetes
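To sketch the replica-per-core option from the question above (names and sizes are hypothetical): with a CPU request of one core per single-threaded Node.js pod, eight replicas collectively occupy eight cores, and the scheduler places them wherever capacity exists:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: node-app                # hypothetical name
    spec:
      replicas: 8                   # one single-threaded process per core
      selector:
        matchLabels:
          app: node-app
      template:
        metadata:
          labels:
            app: node-app
        spec:
          containers:
          - name: node-app
            image: node-app:1.0     # hypothetical image
            resources:
              requests:
                cpu: "1"            # ~1 core per process
              limits:
                cpu: "1"

Whether the eight replicas land on the same node or spread out is up to the scheduler; a nodeSelector or pod affinity would be needed to pin them to one specific 8-core node.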
I'm running a Kubernetes cluster (GCP) with 10 deployments. Each deployment is configured to autoscale under load.
From my website statistics, I found that Monday is the day with the most load. I want to configure the Kubernetes deployments to have a higher minimum replica count on that day.
Is this possible?
I read somewhere that I can run a cron job before and after this day to change the minimum number of replicas. Is that the current way to do it? Is it safe? What if the cron job isn't fired? If this is the current way, please point me to some instructions on how to do it.
Thanks!
You seem to be talking about two things here.
Pod autoscaling (adding more pods when the load on existing pods increases): HPA will help with this. If your workloads show a spike in CPU or memory and can handle horizontal scaling, HPA will work fine.
Example : https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
Now, HPA can increase pods only if the cluster has enough nodes to schedule them.
If you want more nodes when traffic is high and fewer when it is low, a cluster autoscaler could be a good option.
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
Of course, the scaling of nodes is not instantaneous, as the autoscaler watches for pods that are in the Pending state due to resource constraints. It then requests additional nodes from the cloud provider, and once those nodes join the cluster, the workloads get scheduled.
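Regarding the cron-based part of the question: one common pattern (sketched below with hypothetical names; it needs a ServiceAccount with RBAC permission to patch HorizontalPodAutoscalers) is an in-cluster CronJob that patches the HPA's minReplicas up before Monday, paired with a second CronJob that patches it back down afterwards. If a run is missed, the HPA simply keeps its previous minimum, so the failure mode is a stale value rather than an outage:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: monday-scale-up
    spec:
      schedule: "0 0 * * 1"                     # every Monday at 00:00
      jobTemplate:
        spec:
          template:
            spec:
              serviceAccountName: hpa-patcher   # hypothetical SA allowed to patch HPAs
              restartPolicy: OnFailure
              containers:
              - name: kubectl
                image: bitnami/kubectl:latest
                command:
                - kubectl
                - patch
                - hpa
                - my-app                        # hypothetical HPA name
                - -p
                - '{"spec":{"minReplicas":10}}'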