I'm looking to reduce the number of Grafana alerts by making the existing PromQL slightly more DRY. How can I alert on all persistentvolumeclaims within the same namespace? Ideally, I would have one PromQL query targeting one namespace, and the alerts could indicate something like "Volume 95% Utilized".
Example PromQL queries that I would like to reduce to one:
sum without(instance, node) (topk(1, (kubelet_volume_stats_available_bytes{cluster="", job="kubelet", metrics_path="/metrics", namespace="develop", persistentvolumeclaim="chserver-logs"})))
sum without(instance, node) (topk(1, (kubelet_volume_stats_available_bytes{cluster="", job="kubelet", metrics_path="/metrics", namespace="develop", persistentvolumeclaim="chserver-vol2"})))
Context
We have a Spring Boot application deployed into a K8s cluster (with 2 instances), configured with the Micrometer exporter for Prometheus and visualized in Grafana.
My custom metrics
I've implemented a couple of additional Micrometer metrics that report some information about business data in the database (PostgreSQL). I can see those metrics in Grafana, but separately for each pod.
Problem:
For our 2 pods, Grafana shows a separate set of the same metrics, and the most recent value can be found by choosing (by label) one of the pods.
However there is no way to tell which pod reported the most recent values.
Is there a way to always show the metric values from the pod that was scraped last (i.e. the one with the freshest metric data)?
Right now, to see the freshest metric data, I have to switch pods and guess which one has the latest values.
(The metrics in question relate to the database, so they yield the same values no matter which pod they are requested from.)
In Prometheus, you can obtain the labels of the latest scrape using the topk() and timestamp() functions:
topk(1,timestamp(up{job="micrometer"}))
This can then be used in Grafana to populate a (hidden) variable containing the instance name:
Name: instance
Type: Query
Query: topk(1,timestamp(up{job="micrometer"}))
Regex: /.*instance="([^"]*)".*/
I advise activating refresh on time range change, so that you get the last scrape within your time range.
Then you can use the variable in all your dashboard's queries:
micrometer_metric{instance="${instance}"}
EDIT: requester wants to update it on each data refresh
If you want to update it on each data refresh, it needs to be used in every query of your dashboard via the AND logical operator:
micrometer_other_metric AND ON(instance) topk(1,timestamp(up{job="micrometer"}))
vector1 AND vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped.
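To make the filtering concrete, here is a minimal Python sketch of what AND ON(instance) does. It is only a model of the semantics, not how Prometheus implements it: with ON(instance), only the instance label is compared rather than the full label set. The sample data, the pod names, and the helper name and_on are hypothetical.

```python
# Each sample is (labels, value); labels is a plain dict.
def and_on(left, right, on):
    """Keep left-hand samples whose `on` labels match some right-hand sample."""
    right_keys = {tuple(labels.get(name, "") for name in on) for labels, _ in right}
    return [
        (labels, value)
        for labels, value in left
        if tuple(labels.get(name, "") for name in on) in right_keys
    ]

# Hypothetical data: two pods report the same business metric.
metric = [
    ({"__name__": "micrometer_other_metric", "instance": "pod-a:8080"}, 41.0),
    ({"__name__": "micrometer_other_metric", "instance": "pod-b:8080"}, 42.0),
]
# Stand-in for topk(1, timestamp(up{job="micrometer"})): pod-b was scraped last.
latest = [({"instance": "pod-b:8080", "job": "micrometer"}, 1507641522.0)]

result = and_on(metric, latest, on=["instance"])
print(result)
```

Only the sample from the most recently scraped instance survives; the value of the right-hand side (the timestamp) is never used, it only acts as a filter.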
Is there a way to easily query Kubernetes resources in an intuitive way? Basically, I want to run queries to extract info about objects that match my criteria. Currently I face an issue where my match labels aren't quite working, and I would like to run the match-labels query manually to debug it.
Basically in a pseudo code way:
Select * from pv where labels in [red,blue,green]
Are there any third-party tools that do something like this? Currently all I have to work with is the search box on the dashboard, which isn't quite robust enough.
You could use kubectl with JSONPath (https://kubernetes.io/docs/reference/kubectl/jsonpath/). More information on JSONPath: https://github.com/json-path/JsonPath
It allows you to query any resource property, for example:
kubectl get pods -o=jsonpath='{$.items[?(@.metadata.namespace=="default")].metadata.name}'
This would list all pod names in namespace "default". Your pseudo code would be something along the lines of:
kubectl get pv -o=jsonpath='{$.items[?(@.metadata.labels in ["red","blue","green"])]}'
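I'm not certain that kubectl's JSONPath accepts an `in` filter, so here are two hedged alternatives. kubectl itself supports set-based label selectors, e.g. kubectl get pv -l 'color in (red,blue,green)', where the label key color is a hypothetical stand-in for whatever key you use. If you want to experiment with the matching logic offline, the Python sketch below applies the same test to JSON shaped like the output of kubectl get pv -o json; the sample data and the helper name select_by_label are made up for illustration.

```python
import json

def select_by_label(resource_list, key, allowed):
    """Return names of items whose label `key` has one of the `allowed` values."""
    allowed = set(allowed)
    return [
        item["metadata"]["name"]
        for item in resource_list.get("items", [])
        # `labels` may be missing or null on objects without labels.
        if (item["metadata"].get("labels") or {}).get(key) in allowed
    ]

# Hypothetical sample shaped like the output of `kubectl get pv -o json`.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "pv-1", "labels": {"color": "red"}}},
  {"metadata": {"name": "pv-2", "labels": {"color": "yellow"}}},
  {"metadata": {"name": "pv-3"}},
  {"metadata": {"name": "pv-4", "labels": {"color": "green"}}}
]}
""")

print(select_by_label(sample, "color", ["red", "blue", "green"]))
```

Against a real cluster you would feed it the parsed output of kubectl get pv -o json instead of the inline sample.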
I have used a variable in Grafana which looks like this:
label_values(some_metric, service)
If the metric is not being emitted by the data source at the current time, the variable values are not available for the charts. The variable in my case is the release name, and all the Grafana charts depend on this variable.
After the server I was monitoring crashed, this metric is no longer emitted. Even if I set a time range matching the period when the metric was emitted, it has no effect, because the query for the variable does not take the time range into account.
In Prometheus I can see the values for the metric using the query:
some_metric[24h]
In Grafana this is invalid:
label_values(some_metric[24h], service)
Also, as per the documentation, it's invalid to provide $__range etc. for label_values.
If I have to use query_result instead, how do I write the above invalid Grafana query correctly so that I get the same result as label_values?
Is there any other way to do this?
The data source is Prometheus.
I'd suggest query_result(count by (somelabel)(count_over_time(some_metric[$__range]))) and then use regular expressions to extract out the label value you want.
That I'm using count here isn't too important; what matters is that I'm using an over_time function and then aggregating.
The most straightforward and lightweight solution is to use the last_over_time function. For example, the following Grafana query template would return all the unique service label values for all the some_metric time series that were available during the last 24 hours:
label_values(last_over_time(some_metric[24h]), service)
I am trying to put a dropdown for each API end point which will show the QPS and Latency of http requests (RED metrics).
I used Grafana's templating with the following Prometheus query:
label_values(http_duration_milliseconds_count, api_path)
But the problem here is the sort order. The dropdown also shows long-tail API requests such as /admin/phpMyAdmin.
I want only the top 10 endpoints by count to be shown in this dropdown. How do I achieve this?
Attached an image for reference on my first dashboard.
We can use query_result to achieve this.
https://grafana.com/docs/grafana/latest/datasources/prometheus/template-variables/#use-query-variables
query_result(topk(10, sort_desc(sum(http_tt_ms_count) by (api_path))))
http_tt_ms_count is my Prometheus metric for time taken.
api_path is my label name.
This query_result returns a three-part value (labels, value, timestamp) like this:
{api_path="/search/query"} 25704195 1507641522000
I then used the Regex field of the query variable to extract only the API names:
.*api_path="(.*)".*
This looks like a roundabout way, but
label_values((topk(10, sort_desc(sum(http_tt_ms_count) by (api_path)))), api_path)
does not work in Grafana, which is why I took this route.
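The Regex step can be sanity-checked offline against the sample row from the query_result output. This is only a sketch: it uses Python's re module, assuming it behaves like Grafana's regex engine for a pattern this simple, and it uses a slightly stricter variant with [^"]* so the capture stops at the first closing quote.

```python
import re

# Sample row copied from the query_result output shown earlier in this answer.
row = '{api_path="/search/query"} 25704195 1507641522000'

# [^"]* stops at the first closing quote, so extra labels would not leak in.
pattern = re.compile(r'.*api_path="([^"]*)".*')

match = pattern.fullmatch(row)
api_path = match.group(1) if match else None
print(api_path)
```

The capture group is what ends up as the dropdown value, so only the API path remains once the value and timestamp are stripped.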
I annotate my Kubernetes objects with things like version and whom to contact when there are failures. How would I relay this information to Prometheus, knowing that these annotation values will frequently change? I can't capture this information in Prometheus labels, as they serve as the primary key for a target (e.g. if the version changes, it's a new target altogether, which I don't want). Thanks!
I just wrote a blog post about this exact topic! https://www.weave.works/aggregating-pod-resource-cpu-memory-usage-arbitrary-labels-prometheus/
The trick is Kubelet/cAdvisor doesn't expose them directly, so I run a little exporter which does, and join this with the pod name in PromQL. The exporter is: https://github.com/tomwilkie/kube-api-exporter
You can do a join in Prometheus like this:
sum by (namespace, name) (
  sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (pod_name, namespace)
  * on (pod_name) group_left(name)
  k8s_pod_labels{job="monitoring/kube-api-exporter"}
)
Here I'm using a label called "name", but it could be any label.
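If the join is hard to picture, here is a rough Python model of it. It is a sketch under assumptions, not how Prometheus evaluates the query: I assume each k8s_pod_labels series has the value 1 and that there is exactly one such series per pod_name. The pod names, values, and helper names are hypothetical.

```python
from collections import defaultdict

def join_group_left(left, right, on, copy_label):
    """Multiply each left sample by the right sample sharing label `on`,
    copying `copy_label` from the right side onto the result."""
    # Assumes exactly one right-hand sample per key (many-to-one matching).
    right_by_key = {labels[on]: (labels, value) for labels, value in right}
    joined = []
    for labels, value in left:
        partner = right_by_key.get(labels[on])
        if partner is None:
            continue  # no partner on the right: the sample is dropped
        right_labels, right_value = partner
        joined.append(({**labels, copy_label: right_labels[copy_label]}, value * right_value))
    return joined

def sum_by(samples, keys):
    """Model of `sum by (keys)`: add up values that share the given labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[tuple(labels[k] for k in keys)] += value
    return dict(totals)

# Hypothetical per-pod CPU rates (left) and label info series with value 1 (right).
cpu = [
    ({"pod_name": "web-1", "namespace": "default"}, 0.25),
    ({"pod_name": "web-2", "namespace": "default"}, 0.5),
    ({"pod_name": "db-1", "namespace": "default"}, 1.0),
]
pod_labels = [
    ({"pod_name": "web-1", "name": "web"}, 1.0),
    ({"pod_name": "web-2", "name": "web"}, 1.0),
    ({"pod_name": "db-1", "name": "db"}, 1.0),
]

by_name = sum_by(join_group_left(cpu, pod_labels, "pod_name", "name"), ["namespace", "name"])
print(by_name)
```

Because the right-hand value is 1, the multiplication leaves the CPU rate unchanged and only serves to attach the name label, which the outer sum then groups on.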
We use the same trick to get metrics (such as error rate) by version, which we then use to drive our continuous deployment system. kube-api-exporter exports a bunch of useful meta-information about Kubernetes objects to Prometheus.
Hope this helps!