Horizontal Pod Autoscaler with custom metrics from Prometheus with percentiles for CPU usage - kubernetes

So I am trying to figure out how can I configure an Horizontal Pod Autoscaler from a custom metric reading from Prometheus that returns CPU usage with percentile 0.95
I have everything set up to use custom metrics with prometheus-adapter, but I don't understand how to create the rule in Prometheus. For example, if I go to Grafana to check some of the Graphs that comes by default I see this metric:
sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name)
But how can I modify that to be percentile 95? I tried with histogram_quantile function but it says no datapoints found:
histogram_quantile(0.95, sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name))
But even if that works, will the pod name and namespace be filled by prometheus-adapter or prometheus when using custom metrics?
And every example I find using custom metrics are not related with CPU. So... other question I have is how people is using autoscaling metrics in production? I'm used to scale based on percentiles but I don't understand how is this managed in Kubernetes.

If I understand you correctly you don't have to use custom metrics in order to horizontally autoscale your pods. By default, you can automatically scale the number of Kubernetes pods based on the observed CPU utilization.
Here is the official documentation with necessary details.
The Horizontal Pod Autoscaler automatically scales the number of pods
in a replication controller, deployment or replica set based on
observed CPU utilization (or, with custom metrics support, on some
other application-provided metrics).
The Horizontal Pod Autoscaler is implemented as a Kubernetes API
resource and a controller. The resource determines the behavior of the
controller. The controller periodically adjusts the number of replicas
in a replication controller or deployment to match the observed
average CPU utilization to the target specified by user.
And here you can find the walkthrough of how to set it up.
Also, here is the kubectl autoscale command documentation.
Example: kubectl autoscale rc foo --max=5 --cpu-percent=80
Auto scale a replication controller "foo", with the number of pods between 1 and 5, target CPU utilization at 80%
I believe that it is the easiest way so no need to complicate it with some custom metrics.
Please let me know if that helped.

If you want to add HPA based on custom metrics you can use Prometheus adapter.
Prometheus adapter helps you in exposing custom metrics to HPA.
Helm Chart - https://github.com/helm/charts/tree/master/stable/prometheus-adapter
Prometheus adapter - https://github.com/DirectXMan12/k8s-prometheus-adapter
Note - You have to enable 6443 port from public to cluster, because prometheus doesn’t provide override option.
https://github.com/helm/charts/blob/master/stable/prometheus-adapter/templates/custom-metrics-apiserver-deployment.yaml#L34
Make sure that Prometheus is getting custom metrics data
Install Prometheus adapter on the same kubernetes cluster where you want apply hpa
helm install --name my-release stable/prometheus-adapter -f values.yaml
Pass the following config file to helm - values.yaml
prometheus-adapter:
enabled: true
prometheus:
url: http://prometheus.namespace.svc.cluster.local
rules:
default: true
custom:
- seriesQuery: '{__name__="cpu",namespace!="",pod!="",service="svc_name"}'
seriesFilters: []
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
matches: "cpu"
as: "cpu_95"
metricsQuery: "histogram_quantile(0.95, sum(irate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>,le))"
Above config will expose
cpu metrics as cpu_95 to HPA.
To Verify, if data is exposed properly run following command -
Fetch the data curl raw query command - kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/namespace_name/pods/\*/cpu_95 | jq .
HPA config -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: test-cpu-manual
labels:
app: app_name
spec:
scaleTargetRef:
apiVersion: apps/v1beta2
kind: Deployment
name: app_name
minReplicas: 1
maxReplicas: 15
metrics:
- type: Pods
pods:
metricName: cpu_95
targetAverageValue: 75

Related

How does the Kubernetes HPA(HorizontalPodAutoscaler) determine which pod's metrics should be used if multiple PODs have the same metrics

suppose we have below HPA(HorizontalPodAutoscaler) deployed in the demo namespace, and multiple pods (POD-A,POD-B) in this demo namespace have the same metric "istio_requests_per_second", How does the HPA determine the metric "istio_requests_per_second" from which pod should be used? Or every POD with this metric will be evaluate against the HPA target?
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: httpbin
spec:
minReplicas: 1
maxReplicas: 5
metrics:
- type: Pods
pods:
metric:
name: istio_requests_per_second
target:
type: AverageValue
averageValue: "10"
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: httpbin
test...
If you're using prometheus then its the adapter thats correlating between k8's pod name and what metric value to return. Basically the HPA is asking the prometheus adapter for metric istio_requests_per_second. By calling /apis/custom.metrics.k8s.io/v1beta1/namespaces/myNamespace/pods/mypod the adapter takes that and looks at its rules configured for what it should query for.
https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/config-walkthrough.md
Based on my test, I think HPA uses the 'scaleTargetRef' to determine which POD's metrics should be used, and pull these metrics from the metrics server and evaluate them against the target config.
As per Kubernetes documentation:
For object metrics and external metrics, a single metric is fetched, which describes the object in question. This metric is compared to the target value, to produce a ratio as above. In the autoscaling/v2 API version, this value can optionally be divided by the number of Pods before the comparison is made.
It will calculate the ratio based on the mean across the target pods.
References:
1.-https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work

How to make kubernetes cluster elastic?

Helo i am running a .NET application in Azure Kubernetes Services as a 3 pod cluster (1 pod per node).
I am trying to understand how can i make my cluster elastic depending on load ?
How can i configure the deployment.yaml so that after a certain % of the cpu utilization and/or % of memory per pod it spawns another pod? The same thing when load decreases, how do i shut down instances.
Is there any guide/tutorial to set this up based on percentage (ideally) ?
The basic feature you need to use is called HorizontalPodAutoscaler or for short HPA. There you can configure cpu or memory limits and if the limit is exceeded, the pod replica number will be increased. E.g. from this walkthrough:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: php-apache
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php-apache
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
This will scale out the php-apache deployment, as soon as the pods cpu utilization is greater than 50 %. Be aware that calculating the resource utilization and the resulting number of replicas is not as intuitive, as it might seam. Also see docs (the whole page should be quite interesting too). You can also combine criteria for scale out.
There are also addons that help you scale based on other parameters, like the number of messages in a queue. Check out keda, they provide different scalers, like RabbitMQ, Kafka, AWS CloudWatch, Azure Monitor, etc.
And since you wrote
1 pod per node
you might be running a DaemonSet. In that case your only option to scale out would be to add additional nodes, since with daemonsets there is always exactly one pod per node. If that's the case you could think about using a Deployment combined with a PodAntiAffinity instead, see docs. By that you can configure pods to preferably run on nodes where pods of the same deployment are not running yet, e.g.:
[...]
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: security
operator: In
values:
- S2
topologyKey: topology.kubernetes.io/zone
[...]
From docs:
The anti-affinity rule says that the scheduler should try to avoid scheduling the Pod onto a node that is in the same zone as one or more Pods with the label security=S2. More precisely, the scheduler should try to avoid placing the Pod on a node that has the topology.kubernetes.io/zone=R label if there are other nodes in the same zone currently running Pods with the Security=S2 Pod label.
That would make scale out more flexible as it is with a daemonset, yet you have a similar effect of pods being equally distributed through out the cluster.
If you want/need to stick to a daemonset you can check out the AKS Cluster Autoscaler, that can be used to automatically add/remove additional nodes from your cluster, based on resource consumption.

How to make HPA scale a deployment based on metrics produced by another deployment

What I am trying to achieve is creating a Horizontal Pod Autoscaler able to scale worker pods according to a custom metric produced by a controller pod.
I already have Prometheus scraping, Prometheus Adapater, Custom Metric Server fully operational and scaling the worker deployment with a custom metric my_controller_metric produced by the worker pods already works.
Now my workerpods don't produce this metric anymore, but the controller does.
It seems that the API autoscaling/v1 does not support this feature. I am able to specify the HPA with the autoscaling/v2beta1 API if necessary though.
Here is my spec for this HPA:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: my-worker-hpa
namespace: work
spec:
maxReplicas: 10
minReplicas: 1
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Deployment
name: my-worker-deployment
metrics:
- type: Object
object:
target:
kind: Deployment
name: my-controller-deployment
metricName: my_controller_metric
targetValue: 1
When the configuration is applied with kubectl apply -f my-worker-hpa.yml I get the message:
horizontalpodautoscaler "my-worker-hpa" configured
Though this message seems to be OK, the HPA does not work. Is this spec malformed?
As I said, the metric is available in the Custom Metric Server with a kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . | grep my_controller_metric.
This is the error message from the HPA:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetObjectMetric the HPA was unable to compute the replica count: unable to get metric my_controller_metric: Deployment on work my-controller-deployment/unable to fetch metrics from custom metrics API: the server could not find the metric my_controller_metric for deployments
Thanks!
In your case problem is HPA configuration: spec.metrics.object.target should also specify API version.
Putting apiVersion: extensions/v1beta1 under spec.metrics.object.target should fix it.
In addition, there is an open issue about better config validation in HPA: https://github.com/kubernetes/kubernetes/issues/60511

Difference between API versions v2beta1 and v2beta2 in Horizontal Pod Autoscaler?

The Kubernetes Horizontal Pod Autoscaler walkthrough in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/ explains that we can perform autoscaling on custom metrics. What I didn't understand is when to use the two API versions: v2beta1 and v2beta2. If anybody can explain, I would really appreciate it.
Thanks in advance.
The first metrics autoscaling/V2beta1 doesn't allow you to scale your pods based on custom metrics. That only allows you to scale your application based on CPU and memory utilization of your application
The second metrics autoscaling/V2beta2 allows users to autoscale based on custom metrics. It allow autoscaling based on metrics coming from outside of Kubernetes. A new External metric source is added in this api.
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
It will identify a specific metric to autoscale on based on metric name and a label selector. Those metrics can come from anywhere like a stackdriver or prometheus monitoring application and based on some query from prometheus you want to scale your application.
It would always better to use V2beta2 api because it can do scaling on CPU and memory as well as on custom metrics, while V2beta1 API can scale only on internal metrics.
The snippet I mentioned in answer denotes how you can specify the target CPU utilisation in V2beta2 API
UPDATE: v2beta1 is deprecated in 1.19 and you should use v2beta2 going forward.
Also, v2beta2 added the new api field spec.behavior in 1.18 which allows you to define how fast or slow pods are scaled up and down.
Originally, both versions were functionally identical but had different APIs.
autoscaling/v2beta2 was released in Kubernetes version 1.12 and the release notes state:
We released autoscaling/v2beta2, which cleans up and unifies the API
The "cleans up and unifies the API" is referring to that fact that v2beta2 consistently uses the MetricIdentifier and MetricTarget objects:
spec:
metrics:
external:
metric: MetricIdentifier
target: MetricTarget
object:
describedObject: CrossVersionObjectReference
metric: MetricIdentifier
target: MetricTarget
pods:
metric: MetricIdentifier
target: MetricTarget
resource:
name: string
target: MetricTarget
type: string
In v2beta1, those fields have pretty different specs, making it (in my opinion) more difficult to figure out how to use.
How to check differences between HPA versions in general?
I would provide additional answer which I think would be also suitable for other version differences in the future.
Run kubectl api-versions and check which version your cluster is supporting.
Go to the K8S API site and comapre autoscaling versions:
MetricSpec v2beta2 autoscaling Vs MetricSpec v2beta1 autoscaling .
(*) Just notice that you're in the correct K8S version in the url:
https:// kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#metricspec-v2beta1-autoscaling
In case you need to drive the horizontal pod autoscaler with a custom external metric, and only v2beta1 is available to you (I think this is true of GKE still), we do this routinely in GKE. You need:
A stackdriver monitoring metric, possibly one you create yourself,
If the metric isn't derived from sampling Stackdriver logs, a way to publish data to the stackdriver monitoring metric, such as a cronjob that runs no more than once per minute (we use a little python script and Google's python library for monitoring_v3), and
A custom metrics adapter to expose Stackdriver monitoring to the HPA (e.g., in Google, gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0). There's a tutorial on how to deploy this adapter here. You'll need to ensure that you grant the required RBAC stuff to the service account running the adapter, as shown here. You may or may not want to grant the principal that deploys the configuration cluster-admin role as described in the tutorial; we use Helm 2 w/ Tiller and are careful to grant least privilege to Tiller to deploy.
Configure your HPA this way:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
...
spec:
scaleTargetRef:
kind: e.g., StatefulSet
name: name-of-pod-to-scale
apiVersion: e.g., apps/v1
minReplicas: 1
maxReplicas: ...
metrics:
type: External
external:
metricName: "custom.googleapis.com|your_metric_name"
metricSelector:
matchLabels:
resource.type: "generic_task"
resource.labels.job: ...
resource.labels.namespace: ...
resource.labels.project_id: ...
resourcel.labels.task_id: ...
targetValue: e.g., 0.7 (i.e., if you publish a metric that measures the ratio between demand and current capacity)
If you ask kubectl for your HPA object, you won't see autoscaling/v2beta1 settings, but this works well:
kubectl get --raw /apis/autoscaling/v2beta1/namespaces/your-namespace/horizontalpodautoscalers/your-autoscaler | jq
So far, we've only exercised this on GKE. It's clearly Stackdriver-specific. To the extent that Stackdriver can be deployed on other public managed k8s platforms, it might actually be portable. Or you might end up with a different way to publish a custom metric for each platform, using a different metrics publishing library in your cronjob, and a different custom metrics adapter. We know that one exists for Azure, for example.

How to use custom metric with specific filter in Horizontal Pod Autoscaling

I am trying to setup HPA for Ingress Controller based on custom metric nginx_ingress_controller_nginx_process_connections_total.
But while fetching the metrics from localhost:10254/metrics, I could see three such metrics with filter as follows:
# HELP nginx_ingress_controller_nginx_process_connections_total total number of connections with state {active, accepted, handled}
# TYPE nginx_ingress_controller_nginx_process_connections_total counter
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="ingress-nginx",controller_pod="nginx-ingress-controller-7dddd-mssssf",state="accepted"} 479707
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="ingress-nginx",controller_pod="nginx-ingress-controller-7dddd-mssssf",state="active"} 3
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="ingress-nginx",controller_pod="nginx-ingress-controller-7dddd-mssssf",state="handled"} 479707
Out of these metrics, I want to use the below metric for HPA.
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="ingress-nginx",controller_pod="nginx-ingress-controller-7dddd-mssssf",state="active"}
How can I use the specified metric from these different values. My yaml file for HPA is given below.
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
name: ingress-hpa
spec:
scaleTargetRef:
kind: Deployment
name: nginx-ingress-controller
minReplicas: 3
maxReplicas: 10
metrics:
- type: Pods
pods:
metricName: <I need to set the custom metric here>
targetAverageValue: 10000
You can use HPA custom metrics. You need to expose endpoint in POD to fetch the metrics also setup Prometheus and custom metric api server.