Custom prometheus query with k8s for HPA - kubernetes

I want to do Horizontal Pod Autoscaling based on the input request rate. I wrote a Prometheus query that returns the aggregated input TPS:
sum(sum by (path) (rate(dapr_http_server_request_count{app_id="governor",path=~"/v1.0/invoke/app/method/interceptor/.*"}[10s])))
I want to use this output in a Kubernetes HPA.
I am using prometheus-adapter for that. The adapter configuration is as follows:
default: true
custom:
- seriesQuery: '{__name__=~"dapr_http_server_request_avg_.*",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
      pod:
        resource: pod
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(sum by (path) (rate(dapr_http_server_request_count{app_id="governor",path=~"/v1.0/invoke/app/method/interceptor/.*"}[10s])))'
When I try to query the custom metrics API of Kubernetes, it returns:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/app/pods/*/dapr_http_server_request_avg_total" | jq .
Error from server (NotFound): the server could not find the metric dapr_http_server_request_avg_total for pods

You should check your prometheus-adapter config file; it looks inconsistent to me:
seriesQuery: dapr_http_server_request_avg_.*
metricsQuery: dapr_http_server_request_count
Is it possible to get the metric dapr_http_server_request_count from the series above?
name:
  matches: "^(.*)_total"
  as: "${1}_per_second"
Also take care here: this rule renames your metric, so you need to find the right (renamed) metric name for your query.
Try this: kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
You will see what your Kubernetes API server actually gets from the Prometheus Adapter.
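As a rough sketch (assuming the series you actually want to scale on is dapr_http_server_request_count, and that a [1m] rate window suits your scrape interval), a rule where the seriesQuery, the name rewrite and the metricsQuery all refer to the same metric family could look like this:
custom:
- seriesQuery: '{__name__="dapr_http_server_request_count",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
      pod:
        resource: pod
  name:
    matches: "^(.*)_count"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,app_id="governor",path=~"/v1.0/invoke/app/method/interceptor/.*"}[1m])) by (<<.GroupBy>>)'
With a rule like that, the metric should show up in the discovery listing as dapr_http_server_request_per_second rather than dapr_http_server_request_avg_total.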

Related

How to format the output in Kubernetes?

I want to get specific output for a command like getting the nodeports and loadbalancer of a service. How do I do that?
The question is pretty light on what exactly you want to retrieve from Kubernetes, but I think I can provide a good baseline.
When you use Kubernetes, you are most probably using kubectl to interact with the kube-apiserver.
Some of the commands you can use to retrieve the information from the cluster:
$ kubectl get RESOURCE --namespace NAMESPACE RESOURCE_NAME
$ kubectl describe RESOURCE --namespace NAMESPACE RESOURCE_NAME
Example:
Let's assume that you have a Service of type LoadBalancer (I've redacted some output to make it more readable):
$ kubectl get service nginx -o yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  clusterIP: 10.2.151.123
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30531
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: A.B.C.D
Getting a nodePort from this output could be done like this:
kubectl get svc nginx -o jsonpath='{.spec.ports[].nodePort}'
30531
Getting a loadBalancer IP from this output could be done like this:
kubectl get svc nginx -o jsonpath="{.status.loadBalancer.ingress[0].ip}"
A.B.C.D
You can also use kubectl with custom-columns:
kubectl get service -o=custom-columns=NAME:metadata.name,IP:.spec.clusterIP
NAME         IP
kubernetes   10.2.0.1
nginx        10.2.151.123
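Custom columns also accept JSONPath-style expressions, so you could, for example, pull the nodePort from the example above into its own column (the column names are arbitrary):
kubectl get service nginx -o custom-columns=NAME:.metadata.name,NODEPORT:".spec.ports[0].nodePort"
NAME    NODEPORT
nginx   30531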
There are a lot of possible ways to retrieve data with kubectl, which you can read more about in:
kubectl get --help:
-o, --output='': Output format. One of:
json|yaml|wide|name|custom-columns=...|custom-columns-file=...|go-template=...|go-template-file=...|jsonpath=...|jsonpath-file=...
See custom columns, golang template and jsonpath template.
Kubernetes.io: Docs: Reference: Kubectl: Cheatsheet: Formatting output
Additional resources:
Kubernetes.io: Docs: Reference: Kubectl: Overview
Github.com: Kubernetes client: Python - if you would like to retrieve this information with Python
Stackoverflow.com: Answer: How to parse kubectl describe output and get the required field value
If you want to extract just single values, perhaps as part of scripts, then what you are searching for is -ojsonpath such as this example:
kubectl get svc service-name -ojsonpath='{.spec.ports[0].port}'
which will extract just the value of the first port listed in the service spec.
docs - https://kubernetes.io/docs/reference/kubectl/jsonpath/
If you want to extract the whole definition of an object, such as a service, then what you are searching for is -oyaml such as this example:
kubectl get svc service-name -oyaml
which will output the whole service definition, all in yaml format.
If you want to get a more user-friendly description of a resource, such as a service, then you are searching for a describe command, such as this example:
kubectl describe svc service-name
docs - https://kubernetes.io/docs/reference/kubectl/overview/#output-options

kubernetes Autoscaler - Cannot obtain loadbalancing.googleapis.com|https|request_count

I'm trying to define a Horizontal Pod Autoscaler for two Kubernetes services.
The Autoscaler strategy relies in 3 metrics:
cpu
pubsub.googleapis.com|subscription|num_undelivered_messages
loadbalancing.googleapis.com|https|request_count
CPU and num_undelivered_messages are correctly obtained, but no matter what I do, I cannot get the request_count metric.
The first service is a backend service (Service A), and the other (Service B) is an API that uses an Ingress to manage the external access to the service.
The Autoscaling strategy is based on Google documentation: Autoscaling Deployments with External Metrics.
For service A, the following defines the metrics used for Autoscaling:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: ServiceA
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: ServiceA
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - external:
      metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
      metricSelector:
        matchLabels:
          resource.labels.subscription_id: subscription_id
      targetAverageValue: 100
    type: External
For service B, the following defines the metrics used for Autoscaling:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: ServiceB
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: ServiceB
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - external:
      metricName: loadbalancing.googleapis.com|https|request_count
      metricSelector:
        matchLabels:
          resource.labels.forwarding_rule_name: k8s-fws-default-serviceb--3a908157de956ba7
      targetAverageValue: 100
    type: External
As defined in the above article, the metrics server is running, and the metrics server adapter is deployed:
$ kubectl get apiservices |egrep metrics
v1beta1.custom.metrics.k8s.io     custom-metrics/custom-metrics-stackdriver-adapter   True   2h
v1beta1.external.metrics.k8s.io   custom-metrics/custom-metrics-stackdriver-adapter   True   2h
v1beta1.metrics.k8s.io            kube-system/metrics-server                          True   2h
v1beta2.custom.metrics.k8s.io     custom-metrics/custom-metrics-stackdriver-adapter   True   2h
For service A, all metrics, CPU and num_undelivered_messages, are correctly obtained:
$ kubectl get hpa ServiceA
NAME       REFERENCE             TARGETS               MINPODS   MAXPODS   REPLICAS   AGE
ServiceA   Deployment/ServiceA   0/100 (avg), 1%/80%   1         3         1          127m
For service B, HPA cannot obtain the Request Count:
$ kubectl get hpa ServiceB
NAME       REFERENCE             TARGETS                              MINPODS   MAXPODS   REPLICAS   AGE
ServiceB   Deployment/ServiceB   <unknown>/100 (avg), <unknown>/80%   1         3         1          129m
When accessing the Ingress, I get this warning:
unable to get external metric default/loadbalancing.googleapis.com|https|request_count/&LabelSelector{MatchLabels:map[string]string{resource.labels.forwarding_rule_name: k8s-fws-default-serviceb--3a908157de956ba7,},MatchExpressions:[],}: no metrics returned from external metrics API
The metricSelector for the forwarding rule is correct, as confirmed when describing the Ingress (only the relevant information is shown):
$ kubectl describe ingress serviceb
Annotations:
ingress.kubernetes.io/https-forwarding-rule: k8s-fws-default-serviceb--3a908157de956ba7
I've tried to use a different metric selector, for example url_map_name, to no avail; I got a similar error.
I've followed the exact guidelines in the Google documentation, and checked a few online tutorials that describe the exact same process, but I haven't been able to understand what I'm missing.
I'm probably lacking some configuration or some specific detail, but I cannot find it documented anywhere.
What am I missing that explains why I'm not able to obtain the loadbalancing.googleapis.com|https|request_count metric?
It seems the metric that you're defining isn't available in the External Metrics API. To find out what's going on, you can inspect the External Metrics API directly:
kubectl get --raw="/apis/external.metrics.k8s.io/v1beta1" | jq
Is the loadbalancing.googleapis.com|https|request_count metric reported in the output?
You can then dig deeper by making requests of the following form:
kubectl get --raw="/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace_name>/<metric_name>?labelSelector=<selector>" | jq
And see what's returned given your metric name and a specific metric selector.
These are precisely the requests that the Horizontal Pod Autoscaler also makes at runtime. By replicating them manually, you should be able to pinpoint the source of the problem.
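For example, with the metric and selector from your HPA, that request would look roughly like this (the = inside the selector is written as %3D to keep the query string unambiguous):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/loadbalancing.googleapis.com|https|request_count?labelSelector=resource.labels.forwarding_rule_name%3Dk8s-fws-default-serviceb--3a908157de956ba7" | jq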
Comments about additional information:
1) 83m is the Kubernetes way of writing 0.083 (read as 83 "milli-units").
2) In your HorizontalPodAutoscaler definition, you use a targetAverageValue. So, if there exist multiple targets with this metric, the HPA calculates their average. So, 83m might be an average of multiple targets. To make sure, you use only the metric of a single target, you can use the targetValue field (see API reference).
3) Not sure why the items: [] array in the API response is empty. The documentation mentions that after sampling, the data is not visible for 210 seconds... You could try making the API request when the HPA is not running.
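Regarding point 2, a sketch of the external metric block using targetValue instead of targetAverageValue (same metric and selector as in your HPA) would be:
- type: External
  external:
    metricName: loadbalancing.googleapis.com|https|request_count
    metricSelector:
      matchLabels:
        resource.labels.forwarding_rule_name: k8s-fws-default-serviceb--3a908157de956ba7
    targetValue: 100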
Thank you very much for your detailed response.
When using the metricSelector to select the specific forwarding_rule_name, we need to use the exact forwarding_rule_name as defined by the ingress:
metricSelector:
  matchLabels:
    resource.labels.forwarding_rule_name: k8s-fws-default-serviceb--3a908157de956ba7
$ kubectl describe ingress
Name: serviceb
...
Annotations:
ingress.kubernetes.io/https-forwarding-rule: k8s-fws-default-serviceb--9bfb478c0886702d
...
kubernetes.io/ingress.allow-http: false
kubernetes.io/ingress.global-static-ip-name: static-ip
The problem is that the suffix of the forwarding_rule_name (3a908157de956ba7) changes for every deployment and is created dynamically on Ingress creation:
k8s-fws-default-serviceb--3a908157de956ba7
We have a fully automated deployment using Helm, and, as such, when the HPA is created, we don't know what the forwarding_rule_name will be.
And, it seems that the matchLabels does not accept regular expressions, or else we would simply do something like:
metricSelector:
  matchLabels:
    resource.labels.forwarding_rule_name: k8s-fws-default-serviceb--*
I've tried several approaches, all without success:
Use annotations to force the forwarding_rule_name
Use a different matchLabel, such as backend_target_name
Obtain the forwarding_rule_name using a command, so I can insert it later in the YAML file.
Use Annotations to force the forwarding_rule_name:
When creating the Ingress, I can use specific annotations to change the default behavior, or define specific values, for example, in Ingress.yaml:
annotations:
  kubernetes.io/ingress.global-static-ip-name: static-ip
I tried to use the https-forwarding-rule annotation to force a specific "static" name, but this didn't work:
annotations:
  ingress.kubernetes.io/https-forwarding-rule: some_name
annotations:
  kubernetes.io/https-forwarding-rule: some_name
Use a different matchLabel, such as backend_target_name
metricSelector:
  matchLabels:
    resource.labels.backend_target_name: serviceb
Also failed.
Obtain the forwarding_rule_name using a command
When executing the following command, I get the list of forwarding rules, but for all the clusters. And according to the documentation, it is not possible to filter by cluster:
gcloud compute forwarding-rules list
NAME                                         IP_ADDRESS   IP_PROTOCOL   TARGET
k8s-fws-default-serviceb--4e1c268b39df8462   xx           TCP           k8s-tps-default-serviceb--4e1c268b39df8462
k8s-fws-default-serviceb--9bfb478c0886702d   xx           TCP           k8s-tps-default-serviceb--9bfb478c0886702d
Is there any way to allow me to select the resource I need, in order to get the request count metric?
It seems everything was OK with my code, but there is a time delay (approx. 10 minutes) before the request_count metric becomes available. After this period, the metric is computed and available:
$ kubectl get hpa ServiceB
NAME       REFERENCE             TARGETS                 MINPODS   MAXPODS   REPLICAS   AGE
ServiceB   Deployment/ServiceB   83m/100 (avg), 1%/80%   1         3         1          18m
Now, regarding the loadbalancing.googleapis.com|https|request_count metric, I'm not understanding how it is presented. What does 83m mean?
According to the Google documentation for Load balancing metrics:
https/request_count  Request count
DELTA, INT64, 1
GA
The number of requests served by the HTTP/S load balancer. Sampled every 60
seconds. After sampling, data is not visible for up to 210 seconds.
According to Metric Details:
In a DELTA metric, each data point represents the change in a value
over the time interval. For example, the number of service requests
received since the previous measurement would be a delta metric.
I've made one single request to the service, so I was expecting a value of 1, and I can't understand what the 83m means.
Another possibility could be that I'm not using the correct metric.
I've selected the loadbalancing.googleapis.com|https|request_count metric, assuming it would provide the number of requests that were executed against the service, via the load balancer.
Isn't that exactly the information that the loadbalancing.googleapis.com|https|request_count metric provides?
Regarding the above comment, when executing:
kubectl get --raw="/apis/external.metrics.k8s.io/v1beta1/namespaces/default/pubsub.googleapis.com|subscription|num_undelivered_messages" | jq
I get the correct data:
...
{
  "metricName": "pubsub.googleapis.com|subscription|num_undelivered_messages",
  "metricLabels": {
    "resource.labels.project_id": "project-id",
    "resource.labels.subscription_id": "subscription_id",
    "resource.type": "pubsub_subscription"
  },
  "timestamp": "2019-10-22T15:39:58Z",
  "value": "4"
}
...
but, when executing:
kubectl get --raw="/apis/external.metrics.k8s.io/v1beta1/namespaces/default/loadbalancing.googleapis.com|https|request_count" | jq
I get nothing back:
{ "kind": "ExternalMetricValueList", "apiVersion":
"external.metrics.k8s.io/v1beta1", "metadata": {
"selfLink": >"/apis/external.metrics.k8s.io/v1beta1/namespaces/default/loadbalancing.googleapis.com%7Chttps%7Crequest_count"
}, "items": [] }

Horizontal Pod Autoscaler with custom metrics from Prometheus with percentiles for CPU usage

So I am trying to figure out how I can configure a Horizontal Pod Autoscaler from a custom metric, read from Prometheus, that returns CPU usage at the 95th percentile.
I have everything set up to use custom metrics with prometheus-adapter, but I don't understand how to create the rule in Prometheus. For example, if I go to Grafana to check some of the Graphs that comes by default I see this metric:
sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name)
But how can I modify that to be percentile 95? I tried with histogram_quantile function but it says no datapoints found:
histogram_quantile(0.95, sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name))
But even if that works, will the pod name and namespace be filled by prometheus-adapter or prometheus when using custom metrics?
And every example I find using custom metrics is not related to CPU. So... the other question I have is: how are people using autoscaling metrics in production? I'm used to scaling based on percentiles, but I don't understand how this is managed in Kubernetes.
If I understand you correctly you don't have to use custom metrics in order to horizontally autoscale your pods. By default, you can automatically scale the number of Kubernetes pods based on the observed CPU utilization.
Here is the official documentation with necessary details.
The Horizontal Pod Autoscaler automatically scales the number of pods
in a replication controller, deployment or replica set based on
observed CPU utilization (or, with custom metrics support, on some
other application-provided metrics).
The Horizontal Pod Autoscaler is implemented as a Kubernetes API
resource and a controller. The resource determines the behavior of the
controller. The controller periodically adjusts the number of replicas
in a replication controller or deployment to match the observed
average CPU utilization to the target specified by user.
And here you can find the walkthrough of how to set it up.
Also, here is the kubectl autoscale command documentation.
Example: kubectl autoscale rc foo --max=5 --cpu-percent=80
Auto scale a replication controller "foo", with the number of pods between 1 and 5, target CPU utilization at 80%
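For reference, roughly the same autoscaler written out as a manifest (a sketch; autoscaling/v1 only supports the CPU utilization target) would be:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: foo
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80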
I believe that this is the easiest way, so there is no need to complicate it with custom metrics.
Please let me know if that helped.
If you want to add HPA based on custom metrics you can use Prometheus adapter.
Prometheus adapter helps you in exposing custom metrics to HPA.
Helm Chart - https://github.com/helm/charts/tree/master/stable/prometheus-adapter
Prometheus adapter - https://github.com/DirectXMan12/k8s-prometheus-adapter
Note - you have to open port 6443 from public to the cluster, because Prometheus doesn't provide an override option.
https://github.com/helm/charts/blob/master/stable/prometheus-adapter/templates/custom-metrics-apiserver-deployment.yaml#L34
Make sure that Prometheus is getting custom metrics data
Install the Prometheus adapter on the same Kubernetes cluster where you want to apply the HPA:
helm install --name my-release stable/prometheus-adapter -f values.yaml
Pass the following config file to helm - values.yaml
prometheus-adapter:
  enabled: true
  prometheus:
    url: http://prometheus.namespace.svc.cluster.local
  rules:
    default: true
    custom:
    - seriesQuery: '{__name__="cpu",namespace!="",pod!="",service="svc_name"}'
      seriesFilters: []
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "cpu"
        as: "cpu_95"
      metricsQuery: "histogram_quantile(0.95, sum(irate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>,le))"
The above config will expose the cpu metric as cpu_95 to the HPA.
To verify that the data is exposed properly, run the following raw query:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/namespace_name/pods/\*/cpu_95 | jq .
HPA config -
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: test-cpu-manual
  labels:
    app: app_name
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta2
    kind: Deployment
    name: app_name
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metricName: cpu_95
      targetAverageValue: 75
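If you save that manifest as, say, hpa.yaml (the filename is just an example), you can apply it and check that the HPA is actually reading the custom metric:
kubectl apply -f hpa.yaml
kubectl describe hpa test-cpu-manual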

Prometheus Adapter custom metrics HPA

I was following this walkthrough (partially, as I am using EKS):
https://itnext.io/horizontal-pod-autoscale-with-custom-metrics-8cb13e9d475
I managed to get one deployment scaled up with the http_requests_total metric.
Now I am trying to add a new metric. I have a Prometheus server that already scrapes CloudWatch, and the aws_sqs_approximate_age_of_oldest_message_maximum value is there for many of my queues.
In a similar manner to the mentioned tutorial, I am adding the definition of a metric:
- seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
vs
- seriesQuery: 'aws_sqs_approximate_age_of_oldest_message_maximum{queue_name!=""}'
  resources:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
Or some version of the bottom one. But, I can never see it in:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq
No matter what I try.
Any ideas how to move forward?
Thanks!
If you don't see the metric in /apis/custom.metrics.k8s.io/v1beta1 it means that the Prometheus Adapter couldn't discover it.
The Prometheus Adapter discovers metrics by using the value of your seriesQuery field for an /api/v1/series request to Prometheus (done periodically with a frequency defined by the relist interval).
Things to try:
What do you get if you make the following request to Prometheus?
http://<prometheus-ip>:9090/api/v1/series?match[]=aws_sqs_approximate_age_of_oldest_message_maximum{queue_name!=""}&start=<current-timestamp-sec>
What do you get if you drop the following in the query text box of the Prometheus UI and press Execute?
aws_sqs_approximate_age_of_oldest_message_maximum{queue_name!=""}
If you get no data back in either case, then you just don't have any time series in Prometheus that match your seriesQuery specification.
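If your Prometheus server is only reachable inside the cluster, one way to issue that series request is to port-forward to it first (the Service name prometheus-server and port 80 below are the usual Helm-chart defaults; adjust them to your install):
# in one terminal
kubectl port-forward svc/prometheus-server 9090:80
# in another terminal
curl -G 'http://localhost:9090/api/v1/series' --data-urlencode 'match[]=aws_sqs_approximate_age_of_oldest_message_maximum{queue_name!=""}' | jq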

Prometheus Adapter Custom Metrics for Libvirt in a K8S Cluster

I have a K8S cluster which is also managing VMs via virtlet. This K8S cluster is running K8S v1.13.2, with prometheus and the prometheus-adapter, and a custom-metrics server. I have written a custom metrics exporter for libvirtd which pulls in VM metrics and have configured prometheus to scrape that exporter for those VM metrics -- this is working and working well.
What I need to do next, is to have the prometheus-adapter push those metrics into K8S. Nothing I have done is working. Funny thing is, I can see the metrics in prometheus, but I am unable to present them to the custom metrics API.
Example metric visible in prometheus:
libvirt_cpu_stats_cpu_time_nanosecs{app="prometheus-lex",domain="virtlet-c91822c8-5e82-beta-deflect",instance="192.168.2.32:9177",job="kubernetes-pods",kubernetes_namespace="default",kubernetes_pod_name="prometheus-lex-866694b884-9z8v6",name="prometheus-lex",pod_template_hash="866694b884"}
Prometheus Adapter configuration for this metric:
- seriesQuery: 'libvirt_cpu_stats_cpu_time_nanosecs{job="kubernetes-pods", app="prometheus-lex"}'
  seriesFilters: []
  resource:
    overrides:
      kubernetes_pod_name:
        resource: pod
      kubernetes_namespace:
        resource: namespace
  name:
    matches: libvirt_cpu_stats_cpu_time_nanosecs
    as: libvirt_cpu_stats_cpu_time_rate
  metricsQuery: rate(libvirt_cpu_stats_cpu_time_nanosecs{job="kubernetes-pods", app="prometheus-lex", <<.LabelMatchers>>}[5m])
When I query the custom metrics API, I do not see what I am looking for:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1|grep libvirt
returns nothing
Additionally, I can see the prometheus-adapter is able to query the series from prometheus. So I know that side of the adapter is working. I am just trying to figure out why it's not presenting them to the custom metrics server.
From the prometheus-adapter
I0220 19:12:58.442937 1 api.go:74] GET http://prometheus-server.default.svc.cluster.local:80/api/v1/series?match%5B%5D=libvirt_cpu_stats_cpu_time_nanosecs%7Bkubernetes_namespace%21%3D%22%22%2Ckubernetes_pod_name%21%3D%22%22%7D&start=1550689948.392 200 OK
Any ideas what I am missing here?
Update:
I have also tried the following new configuration, and it's still not working.
- seriesQuery: 'libvirt_cpu_stats_cpu_time_nanosecs{kubernetes_namespace!="",kubernetes_pod_name!=""}'
  seriesFilters: []
  resource:
    overrides:
      kubernetes_namespace: {resource: "namespace"}
      kubernetes_pod_name: {resource: "pod"}
  name:
    matches: 'libvirt_cpu_stats_cpu_time_nanosecs'
    as: 'libvirt_cpu_stats_cpu_time_rate'
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
It actually depends on how you install the Prometheus Adapter. If you install it via Helm and use YAML to configure the rules, you need to follow this README https://github.com/helm/charts/blob/master/stable/prometheus-adapter/README.md and declare the rules like this:
rules:
  custom:
  - seriesQuery: '{__name__=~"^some_metric_count$"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "my_custom_metric"
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
Pay attention to the custom keyword. If you miss it, the metric won't be available via the custom metrics API.
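Once the rule is picked up (the adapter re-reads the series list on its relist interval), the renamed metric should appear in the discovery document, which you can confirm with something like:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name' | grep my_custom_metric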