Horizontal pod Autoscaling without custom metrics - kubernetes

We want to scale our pods horizontally based on the amount of messages in our Kafka Topic. The standard solution is to publish the metrics to the custom metrics API of Kubernetes. However, due to company guidelines we are not allowed to use the custom metrics API of Kubernetes. We are only allowed to use non-admin functionality. Is there a solution for this with kubernetes-nativ features or do we need to implement a customized solution?

I'm not exactly sure if this would fit your needs but you could use Autoscaling on metrics not related to Kubernetes objects.
Applications running on Kubernetes may need to autoscale based on metrics that don’t have an obvious relationship to any object in the Kubernetes cluster, such as metrics describing a hosted service with no direct correlation to Kubernetes namespaces. In Kubernetes 1.10 and later, you can address this use case with external metrics.
Using external metrics requires knowledge of your monitoring system; the setup is similar to that required when using custom metrics. External metrics allow you to autoscale your cluster based on any metric available in your monitoring system. Just provide a metric block with a name and selector, as above, and use the External metric type instead of Object. If multiple time series are matched by the metricSelector, the sum of their values is used by the HorizontalPodAutoscaler. External metrics support both the Value and AverageValue target types, which function exactly the same as when you use the Object type.
For example if your application processes tasks from a hosted queue service, you could add the following section to your HorizontalPodAutoscaler manifest to specify that you need one worker per 30 outstanding tasks.
- type: External
external:
metric:
name: queue_messages_ready
selector: "queue=worker_tasks"
target:
type: AverageValue
averageValue: 30
When possible, it’s preferable to use the custom metric target types instead of external metrics, since it’s easier for cluster administrators to secure the custom metrics API. The external metrics API potentially allows access to any metric, so cluster administrators should take care when exposing it.
You may also have a look at zalando-incubator/kube-metrics-adapter and use Prometheus collector external metrics.
This is an example of an HPA configured to get metrics based on a Prometheus query. The query is defined in the annotation metric-config.external.prometheus-query.prometheus/processed-events-per-second where processed-events-per-second is the query name which will be associated with the result of the query. A matching query-name label must be defined in the matchLabels of the metric definition. This allows having multiple prometheus queries associated with a single HPA.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
annotations:
# This annotation is optional.
# If specified, then this prometheus server is used,
# instead of the prometheus server specified as the CLI argument `--prometheus-server`.
metric-config.external.prometheus-query.prometheus/prometheus-server: http://prometheus.my->namespace.svc
# metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
# <configKey> == query-name
metric-config.external.prometheus-query.prometheus/processed-events-per-second: |
scalar(sum(rate(event-service_events_count{application="event-service",processed="true"}[1m])))
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: custom-metrics-consumer
minReplicas: 1
maxReplicas: 10
metrics:
- type: External
external:
metric:
name: prometheus-query
selector:
matchLabels:
query-name: processed-events-per-second
target:
type: AverageValue
averageValue: "10"

Related

How does the Kubernetes HPA(HorizontalPodAutoscaler) determine which pod's metrics should be used if multiple PODs have the same metrics

suppose we have below HPA(HorizontalPodAutoscaler) deployed in the demo namespace, and multiple pods (POD-A,POD-B) in this demo namespace have the same metric "istio_requests_per_second", How does the HPA determine the metric "istio_requests_per_second" from which pod should be used? Or every POD with this metric will be evaluate against the HPA target?
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: httpbin
spec:
minReplicas: 1
maxReplicas: 5
metrics:
- type: Pods
pods:
metric:
name: istio_requests_per_second
target:
type: AverageValue
averageValue: "10"
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: httpbin
test...
If you're using prometheus then its the adapter thats correlating between k8's pod name and what metric value to return. Basically the HPA is asking the prometheus adapter for metric istio_requests_per_second. By calling /apis/custom.metrics.k8s.io/v1beta1/namespaces/myNamespace/pods/mypod the adapter takes that and looks at its rules configured for what it should query for.
https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/config-walkthrough.md
Based on my test, I think HPA uses the 'scaleTargetRef' to determine which POD's metrics should be used, and pull these metrics from the metrics server and evaluate them against the target config.
As per Kubernetes documentation:
For object metrics and external metrics, a single metric is fetched, which describes the object in question. This metric is compared to the target value, to produce a ratio as above. In the autoscaling/v2 API version, this value can optionally be divided by the number of Pods before the comparison is made.
It will calculate the ratio based on the mean across the target pods.
References:
1.-https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work

K8S Ingress: How to limit requests in flight per pod

I am porting an application to run within k8s. I have run into an issue with ingress. I am trying to find a way to limit the number of REST API requests in flight at any given time to each backend pod managed by a deployment.
See the image below the shows the architecture.
Ingress is being managed by nginx-ingress. For a given set of URL paths, the ingress forwards the request to a service that targets a deployment of REST API backend processes. The deployment is also managed by an HPA based upon CPU load.
What I want to do is find a way to queue up ingress requests such that there are never more than X requests in flight to any pod running our API backend process. (ex. only allow 50 requests in flight at once per pod)
Does anyone know how to put a request limit in place like this?
As a bonus question, the next thing I would need to do is have the HPA monitor the request queuing and automatically scale up/down the deployment to match the number of pods to the number of requests currently being processed / queued. For example if each pod can handle 100 requests in flight at once and we currently have load levels of 1000 requests to handle, then autoscale to 10 pods.
If it is useful, I am also planning to have linkerd in place for this cluster. Perhaps it has a capability that could help.
Autoscaling in Network request requires the custom metrics. Given that you are using the NGINX ingress controller, you can first install prometheus and prometheus adaptor to export the metrics from NGINX ingress controller. By default, NGINX ingress controller has already exposed the prometheus endpoint.
The relation graph will be like this.
NGINX ingress <- Prometheus <- Prometheus Adaptor <- custom metrics api service <- HPA controller
The arrow means the calling in API. So, in total, you will have three more extract components in your cluster.
Once you have set up the custom metric server, you can scale your app based on the metrics from NGINX ingress. The HPA will look like this.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: srv-deployment-custom-hpa
spec:
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Deployment
name: srv-deployment
minReplicas: 1
maxReplicas: 100
metrics:
- type: Pods
pods:
metricName: nginx_srv_server_requests_per_second
targetAverageValue: 100
I won't go through the actual implementation here because it will include a lot of environment specific configuration.
Once you have set that up, you can see the HPA object will show up the metrics which is pulling from the adaptor.
For the rate limiting in the Service object level, you will need a powerful service mesh to do so. Linkerd2 is designed to be lightweight so it does not ship with the function in rate limiting. You can refer to this issue under linkerd2. The maintainer rejected to implement the rate limiting in the service level. They would suggest you to do this on Ingress level instead.
AFAIK, Istio and some advanced serivce mesh provides the rate limiting function. In case you haven't deployed the linkerd as your service mesh option, you may try Istio instead.
For Istio, you can refer this document to see how to do the rate limiting. But I need to let you know that Istio with NGINX ingress may cause you a trouble. Istio is shipped with its own ingress controller. You will need to have extra work for making it work.
To conclude, if you can use the HPA with custom metrics in the number of requests, it will be the quick solution to resolve your issue in traffic control. Unless you still have a really hard time with the traffic control, you will then need to consider the Service level rate limiting.
Nginx ingress allow to have rate limiting with annotations. You may want to have a look at the limit-rps one:
nginx.ingress.kubernetes.io/limit-rps: number of requests accepted from a given IP each second. The burst limit is set to this limit multiplied by the burst multiplier, the default multiplier is 5. When clients exceed this limit, limit-req-status-code default: 503 is returned.
On top of that NGINX will queue your requests with the leaky bucket algorithm so the incoming requests will buffered in the queue with FIFO (first-in-first-out) algorithm and then consumed at limited rate. The burst value in this case defines the size of the queue which allows the request to exceed the beyond limit. When this queue become full the next requests will be rejected.
For more detailed reading about limit traffic and shaping:
Nginx rate limiting in nutshell
Rate limiting nginx
Perhaps you should consider implementing Kubernetes Service APIs
Based on the latest kubernetes docs . We can do hpa based on custom metrics.
Doc reference : https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
Adding code below :
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: php-apache
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php-apache
minReplicas: 1
maxReplicas: 10
metrics:
type: Object
object:
metric:
name: requests-per-second
describedObject:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
name: main-route
target:
type: Value
value: 10k
What i would suggest is having an ingress resource specifically for this service (load balancing in round robing) and then if you can do autoscaling based on the number of request(you expect) * no of min replica nodes . This should do a optimal hpa .
PS I will test it out myself and comment.

Dynamically update prometheus scrape config based on pod labels

I'm trying to enhance my monitoring and want to expand the amount of metrics pulled into Prometheus from our Kube estate. We already have a stand alone Prom implementation which has a hard coded config file monitoring some bare metal servers, and hooks into cadvisor for generic Pod metrics.
What i would like to do is configure Kube to monitor the apache_exporter metrics from a webserver deployed in the cluster, but also dynamically add a 2nd, 3rd etc webserver as the instances are scaled up.
I've looked at the kube-prometheus project, but this seems to be more geared to instances where there is no established Prometheus deployed. Is there a simple way to get prometheus to scrape the Kube API or etcd to pull in the current list of pods which match a certain criteria (ie, a tag like deploymentType=webserver) and scrape the apache_exporter metrics for these pods, and scrape the mysqld_exporter metrics where deploymentType=mysql
There's a project called kube-prometheus-stack (formerly prometheus-operator): https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
It has concepts called ServiceMonitor and PodMonitor:
https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#servicemonitor
https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#podmonitor
Basically, this is a selector that points your Prometheus instance to scrape targets. In the case of service selector, it discovers all the pods behind the service. In the case of a pod selector, it discovers pods directly. Prometheus scrape config is updated and reloaded automatically in both cases.
Example PodMonitor:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: example
namespace: monitoring
spec:
podMetricsEndpoints:
- interval: 30s
path: /metrics
port: http
namespaceSelector:
matchNames:
- app
selector:
matchLabels:
app.kubernetes.io/name: my-app
Note that this PodMonitor object itself must be discovered by the controller. To achieve this you write a PodMonitorSelector(link). This additional explicit linkage is done intentionally - in this way, if you have 2 Prometheus instances on your cluster (say Infra and Product) you can separate which Prometheus will get which Pods to its scraping config.
The same applies to a ServiceMonitor.

Horizontal Pod Autoscaler with custom metrics from Prometheus with percentiles for CPU usage

So I am trying to figure out how can I configure an Horizontal Pod Autoscaler from a custom metric reading from Prometheus that returns CPU usage with percentile 0.95
I have everything set up to use custom metrics with prometheus-adapter, but I don't understand how to create the rule in Prometheus. For example, if I go to Grafana to check some of the Graphs that comes by default I see this metric:
sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name)
But how can I modify that to be percentile 95? I tried with histogram_quantile function but it says no datapoints found:
histogram_quantile(0.95, sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name))
But even if that works, will the pod name and namespace be filled by prometheus-adapter or prometheus when using custom metrics?
And every example I find using custom metrics are not related with CPU. So... other question I have is how people is using autoscaling metrics in production? I'm used to scale based on percentiles but I don't understand how is this managed in Kubernetes.
If I understand you correctly you don't have to use custom metrics in order to horizontally autoscale your pods. By default, you can automatically scale the number of Kubernetes pods based on the observed CPU utilization.
Here is the official documentation with necessary details.
The Horizontal Pod Autoscaler automatically scales the number of pods
in a replication controller, deployment or replica set based on
observed CPU utilization (or, with custom metrics support, on some
other application-provided metrics).
The Horizontal Pod Autoscaler is implemented as a Kubernetes API
resource and a controller. The resource determines the behavior of the
controller. The controller periodically adjusts the number of replicas
in a replication controller or deployment to match the observed
average CPU utilization to the target specified by user.
And here you can find the walkthrough of how to set it up.
Also, here is the kubectl autoscale command documentation.
Example: kubectl autoscale rc foo --max=5 --cpu-percent=80
Auto scale a replication controller "foo", with the number of pods between 1 and 5, target CPU utilization at 80%
I believe that it is the easiest way so no need to complicate it with some custom metrics.
Please let me know if that helped.
If you want to add HPA based on custom metrics you can use Prometheus adapter.
Prometheus adapter helps you in exposing custom metrics to HPA.
Helm Chart - https://github.com/helm/charts/tree/master/stable/prometheus-adapter
Prometheus adapter - https://github.com/DirectXMan12/k8s-prometheus-adapter
Note - You have to enable 6443 port from public to cluster, because prometheus doesn’t provide override option.
https://github.com/helm/charts/blob/master/stable/prometheus-adapter/templates/custom-metrics-apiserver-deployment.yaml#L34
Make sure that Prometheus is getting custom metrics data
Install Prometheus adapter on the same kubernetes cluster where you want apply hpa
helm install --name my-release stable/prometheus-adapter -f values.yaml
Pass the following config file to helm - values.yaml
prometheus-adapter:
enabled: true
prometheus:
url: http://prometheus.namespace.svc.cluster.local
rules:
default: true
custom:
- seriesQuery: '{__name__="cpu",namespace!="",pod!="",service="svc_name"}'
seriesFilters: []
resources:
overrides:
namespace: {resource: "namespace"}
pod: {resource: "pod"}
name:
matches: "cpu"
as: "cpu_95"
metricsQuery: "histogram_quantile(0.95, sum(irate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>,le))"
Above config will expose
cpu metrics as cpu_95 to HPA.
To Verify, if data is exposed properly run following command -
Fetch the data curl raw query command - kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/namespace_name/pods/\*/cpu_95 | jq .
HPA config -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: test-cpu-manual
labels:
app: app_name
spec:
scaleTargetRef:
apiVersion: apps/v1beta2
kind: Deployment
name: app_name
minReplicas: 1
maxReplicas: 15
metrics:
- type: Pods
pods:
metricName: cpu_95
targetAverageValue: 75

Difference between API versions v2beta1 and v2beta2 in Horizontal Pod Autoscaler?

The Kubernetes Horizontal Pod Autoscaler walkthrough in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/ explains that we can perform autoscaling on custom metrics. What I didn't understand is when to use the two API versions: v2beta1 and v2beta2. If anybody can explain, I would really appreciate it.
Thanks in advance.
The first metrics autoscaling/V2beta1 doesn't allow you to scale your pods based on custom metrics. That only allows you to scale your application based on CPU and memory utilization of your application
The second metrics autoscaling/V2beta2 allows users to autoscale based on custom metrics. It allow autoscaling based on metrics coming from outside of Kubernetes. A new External metric source is added in this api.
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
It will identify a specific metric to autoscale on based on metric name and a label selector. Those metrics can come from anywhere like a stackdriver or prometheus monitoring application and based on some query from prometheus you want to scale your application.
It would always better to use V2beta2 api because it can do scaling on CPU and memory as well as on custom metrics, while V2beta1 API can scale only on internal metrics.
The snippet I mentioned in answer denotes how you can specify the target CPU utilisation in V2beta2 API
UPDATE: v2beta1 is deprecated in 1.19 and you should use v2beta2 going forward.
Also, v2beta2 added the new api field spec.behavior in 1.18 which allows you to define how fast or slow pods are scaled up and down.
Originally, both versions were functionally identical but had different APIs.
autoscaling/v2beta2 was released in Kubernetes version 1.12 and the release notes state:
We released autoscaling/v2beta2, which cleans up and unifies the API
The "cleans up and unifies the API" is referring to that fact that v2beta2 consistently uses the MetricIdentifier and MetricTarget objects:
spec:
metrics:
external:
metric: MetricIdentifier
target: MetricTarget
object:
describedObject: CrossVersionObjectReference
metric: MetricIdentifier
target: MetricTarget
pods:
metric: MetricIdentifier
target: MetricTarget
resource:
name: string
target: MetricTarget
type: string
In v2beta1, those fields have pretty different specs, making it (in my opinion) more difficult to figure out how to use.
How to check differences between HPA versions in general?
I would provide additional answer which I think would be also suitable for other version differences in the future.
Run kubectl api-versions and check which version your cluster is supporting.
Go to the K8S API site and comapre autoscaling versions:
MetricSpec v2beta2 autoscaling Vs MetricSpec v2beta1 autoscaling .
(*) Just notice that you're in the correct K8S version in the url:
https:// kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#metricspec-v2beta1-autoscaling
In case you need to drive the horizontal pod autoscaler with a custom external metric, and only v2beta1 is available to you (I think this is true of GKE still), we do this routinely in GKE. You need:
A stackdriver monitoring metric, possibly one you create yourself,
If the metric isn't derived from sampling Stackdriver logs, a way to publish data to the stackdriver monitoring metric, such as a cronjob that runs no more than once per minute (we use a little python script and Google's python library for monitoring_v3), and
A custom metrics adapter to expose Stackdriver monitoring to the HPA (e.g., in Google, gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0). There's a tutorial on how to deploy this adapter here. You'll need to ensure that you grant the required RBAC stuff to the service account running the adapter, as shown here. You may or may not want to grant the principal that deploys the configuration cluster-admin role as described in the tutorial; we use Helm 2 w/ Tiller and are careful to grant least privilege to Tiller to deploy.
Configure your HPA this way:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
...
spec:
scaleTargetRef:
kind: e.g., StatefulSet
name: name-of-pod-to-scale
apiVersion: e.g., apps/v1
minReplicas: 1
maxReplicas: ...
metrics:
type: External
external:
metricName: "custom.googleapis.com|your_metric_name"
metricSelector:
matchLabels:
resource.type: "generic_task"
resource.labels.job: ...
resource.labels.namespace: ...
resource.labels.project_id: ...
resourcel.labels.task_id: ...
targetValue: e.g., 0.7 (i.e., if you publish a metric that measures the ratio between demand and current capacity)
If you ask kubectl for your HPA object, you won't see autoscaling/v2beta1 settings, but this works well:
kubectl get --raw /apis/autoscaling/v2beta1/namespaces/your-namespace/horizontalpodautoscalers/your-autoscaler | jq
So far, we've only exercised this on GKE. It's clearly Stackdriver-specific. To the extent that Stackdriver can be deployed on other public managed k8s platforms, it might actually be portable. Or you might end up with a different way to publish a custom metric for each platform, using a different metrics publishing library in your cronjob, and a different custom metrics adapter. We know that one exists for Azure, for example.