I'm looking to scale my pods/nodes based on disk space. Is it possible? I see that I can scale based on CPU or memory, but how can I scale based on disk usage?
Yes, you can use a tool named KEDA; basically, it lets you scale based on almost any metric.
Here is an example of scaling based on the sum of HTTP requests to your service; KEDA will take the number directly from Prometheus.
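A minimal sketch of such a ScaledObject; the Deployment name, Prometheus address, query, and threshold below are placeholder assumptions, not values from your cluster:

```yaml
# Sketch of a KEDA ScaledObject scaling on a Prometheus query.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-requests-scaler
spec:
  scaleTargetRef:
    name: my-service                  # Deployment to scale (placeholder)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # placeholder
        query: sum(rate(http_requests_total{app="my-service"}[2m]))
        threshold: "100"              # target query value per replica
```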
So yes, you can scale pods based on disk space if you know which metric to use (for example, the kubelet's volume stats, which Prometheus can scrape).
I have a k8s cluster in GKE with the node autoscaler turned on. I want to maximize resource utilization, and I have applied all the suggestions on request/limit changes recommended by GKE. At the moment there are 4 nodes, all n2-standard-2, i.e. 4 GB of memory per vCPU.
The memory request-to-allocatable ratio is quite high compared to the CPU request-to-allocatable ratio.
I'm wondering if another machine type would better suit my case, or if there are any other resource optimization recommendations.
In GKE you can select custom machine types (custom compute sizes).
We find most workloads work best at a 1:4 vCPU-to-memory ratio (hence the default), but it's possible to support other workload types. For your workload, it looks like a 1:2 vCPU-to-memory ratio would be appropriate.
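For instance, a 1:2 node pool could be created like this; the cluster and pool names are placeholders:

```sh
# N2 custom machine types use the n2-custom-VCPUS-MEMORY_MB format,
# so 2 vCPUs with 4096 MB gives the 1:2 ratio discussed above.
gcloud container node-pools create ratio-1-2-pool \
  --cluster=my-cluster \
  --machine-type=n2-custom-2-4096 \
  --enable-autoscaling --min-nodes=0 --max-nodes=5
```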
Also, it's hard to know exactly what resource limits to set. You should look into generating some load for your cluster and using the Vertical Pod Autoscaler (VPA) to get recommendations from GKE so you can right-size the limits.
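A minimal VPA in recommendation-only mode will surface suggested requests without evicting pods; the names here are placeholders:

```yaml
# Recommendation-only VPA sketch; the Deployment name is a placeholder.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # only compute recommendations; never evict/resize pods
```

You can then read the suggested requests with `kubectl describe vpa my-app-vpa`.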
**Can the following be done?**
VPA relies on a number of different measurements and is different from the HPA. We can therefore use VPA without interference in relation to the HPA. For truly efficient scaling, the HPA and VPA complement each other. HPA creates new replicas if the load rises. If the space for these replicas is not sufficient, VPA will provide some nodes, allowing HPA-made pods to run.
Can they use the same metrics? If we use the same metrics, will both of them execute, or do we need to define different metrics for each?
I would also like to clarify one thing:
If the space for these replicas is not sufficient, VPA will provide some nodes, allowing HPA-made pods to run
If the number of nodes provided changes, that is horizontal scaling. Vertical scaling would mean changing the resource capacity of a node, such as the number of CPUs or the amount of memory.
As for VPA working with HPA:
No. According to this article:
Avoid using HPA and VPA in tandem
HPA and VPA are currently incompatible, and a best practice is to avoid using both together for the same set of pods. VPA can, however, be used with an HPA that is configured to use either external or custom metrics.
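In other words, the combination is only safe when the HPA does not scale on CPU/memory (which VPA adjusts). A sketch of an HPA driven by a custom per-pod metric; the Deployment name, metric name, and target are placeholder assumptions:

```yaml
# HPA on a custom per-pod metric (safe to pair with VPA, per the quote above).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"
```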
AFAIK, k8s is better suited for HPA; the k8s documentation also has an HPA page.
We have configured the HPA to use 2 metrics:
- CPU utilization
- App-specific custom metrics
When testing, we observed the scaling happening, but the calculation of the number of replicas is not very clear. I am not able to locate any documentation on this.
Questions:
Can someone point to documentation or code on the calculation part?
Is it a good practice to use multiple metrics for scaling?
Thanks in advance!
From https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-the-horizontal-pod-autoscaler-work
If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of those metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs), scaling is skipped.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window, choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scaledowns will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
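The per-metric formula from that page is the standard one, so with hypothetical numbers the two-metric case from the question works out like this:

```
desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

# Hypothetical numbers with 4 current replicas and two metrics:
# CPU utilization:  target 50%, observed 80%   -> ceil(4 * 80 / 50)   = 7
# Custom metric:    target 100, observed 120   -> ceil(4 * 120 / 100) = 5
# The HPA takes the largest result, so it scales to 7 replicas.
```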
In Kubernetes, I am a little unclear about what criteria need to be met for OpenFaaS to scale a function's replicas up or down.
According to the documentation:
Auto-scaling in OpenFaaS allows a function to scale up or down depending on demand represented by different metrics.
It sounds like, by default, the trigger for scaling is requests/second increasing or decreasing.
OpenFaaS ships with a single auto-scaling rule defined in the mounted configuration file for AlertManager. AlertManager reads usage (requests per second) metrics from Prometheus in order to know when to fire an alert to the API Gateway.
And this "alert" sent to the API Gateway would cause a function's replica count to scale up.
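For context, the kind of Prometheus alerting rule the docs describe would look roughly like this; the rule name, metric, and the `> 5` threshold here are my assumptions for illustration, not necessarily what OpenFaaS ships:

```yaml
# Illustrative Prometheus alerting rule firing on requests/second per function.
groups:
  - name: openfaas-example
    rules:
      - alert: APIHighInvocationRate          # assumed rule name
        expr: sum(rate(gateway_function_invocation_total{code="200"}[10s])) by (function_name) > 5
        for: 5s
        labels:
          service: gateway
          severity: major
```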
I don't see where, in the documentation or the AlertManager configuration, the requests/second threshold for scaling up/down is set.
My overall questions:
What is the default threshold of requests/second that would cause a scale up?
Is this threshold configurable? If so, how?
We have a couple of clusters running on GKE and up until now I've only been maintaining a CPU request/limit for pods. We've recently run into issues where the cluster autoscaling isn't responding when pods begin to be evicted for low memory, and we can visibly see in the GKE console that there is memory pressure on at least one of the nodes.
I was hoping someone could tell me: is there some sort of calculation we can make as a starting point for how much memory we should request/limit per pod for each of our services, or is that more a matter of trial and error? Is there some kind of statistics service that can track what's being used in the cluster now?
Thanks!
There is no magic trick for calculating limits. You need to start with reasonable limits and refine using trial and error.
I can suggest a YouTube video that explains a method for refining your limits quite well: https://youtu.be/-lsJyni7EQA
Basically, it suggests starting with low limits and load testing your application (one pod instance) until it breaks.
Then, raise the limits and load test again until you find good values.
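As a concrete starting point, requests/limits live on each container in the pod spec; the numbers below are placeholder assumptions to be refined with the load-testing method described above:

```yaml
# Placeholder starting values -- tune them via load testing.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      resources:
        requests:
          memory: "256Mi"   # what the scheduler reserves on the node
          cpu: "250m"
        limits:
          memory: "512Mi"   # the container is OOM-killed if it exceeds this
          cpu: "500m"
```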