Prometheus OR when using rate() - kubernetes

Summary
I'm trying to figure out how to properly use the OR | operator in a Prometheus query because my imported Grafana dashboard is not working.
Long version
I'm trying to debug a Grafana dashboard based on some data scraped from my Kubernetes pods running AppMetrics/Prometheus; the dashboard is here. Basically what happens is that when the value "All" for the server is selected on the Grafana dashboard (server is an individual pod in this case), no data appears. However, when I select an individual pod, then data does appear.
Here's an example of the same metric scraped from the two pods:
# HELP application_httprequests_transactions
# TYPE application_httprequests_transactions summary
application_httprequests_transactions_sum{server="myapp-test-58d94bf78d-jdq78",app="MyApp",env="test"} 5.006965628
application_httprequests_transactions_count{server="myapp-test-58d94bf78d-jdq78",app="MyApp",env="test"} 1367
application_httprequests_transactions{server="myapp-test-58d94bf78d-jdq78",app="MyApp",env="test",quantile="0.5"} 0.000202825
application_httprequests_transactions{server="myapp-test-58d94bf78d-jdq78",app="MyApp",env="test",quantile="0.75"} 0.000279318
application_httprequests_transactions{server="myapp-test-58d94bf78d-jdq78",app="MyApp",env="test",quantile="0.95"} 0.000329862
application_httprequests_transactions{server="myapp-test-58d94bf78d-jdq78",app="MyApp",env="test",quantile="0.99"} 0.055584233
# HELP application_httprequests_transactions
# TYPE application_httprequests_transactions summary
application_httprequests_transactions_sum{server="myapp-test-58d94bf78d-l9tdv",app="MyApp",env="test"} 6.10214788
application_httprequests_transactions_count{server="myapp-test-58d94bf78d-l9tdv",app="MyApp",env="test"} 1363
application_httprequests_transactions{server="myapp-test-58d94bf78d-l9tdv",app="MyApp",env="test",quantile="0.5"} 0.000218548
application_httprequests_transactions{server="myapp-test-58d94bf78d-l9tdv",app="MyApp",env="test",quantile="0.75"} 0.000277483
application_httprequests_transactions{server="myapp-test-58d94bf78d-l9tdv",app="MyApp",env="test",quantile="0.95"} 0.033821094
application_httprequests_transactions{server="myapp-test-58d94bf78d-l9tdv",app="MyApp",env="test",quantile="0.99"} 0.097113234
I ran the Query inspector in Grafana to find out which query it is calling, and then ran the PromQL query in Prometheus itself. Basically, when I execute the following PromQL queries individually, they return data:
rate(application_httprequests_transactions_count{env="test",app="MyApp",server="myapp-test-58d94bf78d-l9tdv"}[15m])*60
rate(application_httprequests_transactions_count{env="test",app="MyApp",server="myapp-test-58d94bf78d-jdq78"}[15m])*60
However, when I try to use PromQL's | operator to combine them, I don't get data back:
rate(application_httprequests_transactions_count{env="test",app="MyApp",server="myapp-test-58d94bf78d-l9tdv|myapp-test-58d94bf78d-jdq78"}[15m])*60
Here's the raw output from Grafana's query inspector:
xhrStatus:"complete"
request:Object
method:"GET"
url:"api/datasources/proxy/56/api/v1/query_range?query=rate(application_httprequests_transactions_count%7Benv%3D%22test%22%2Capp%3D%22MyApp%22%2Cserver%3D%22myapp-test-58d94bf78d-jdq78%7Cmyapp-test-58d94bf78d-l9tdv%7Cmyapp-test-5b8c9845fb-7lklm%7Cmyapp-test-5b8c9845fb-8jf7n%7Cmyapp-test-5b8c9845fb-d9x5c%7Cmyapp-test-5b8c9845fb-fw4gj%7Cmyapp-test-5b8c9845fb-vtl9z%7Cmyapp-test-5b8c9845fb-vv7xv%7Cmyapp-test-5b8c9845fb-wq9bs%7Cmyapp-test-5b8c9845fb-xqfrt%7Cmyapp-test-69999d58b5-549vd%7Cmyapp-test-69999d58b5-lmp8x%7Cmyapp-test-69999d58b5-nbvt9%7Cmyapp-test-69999d58b5-qphj2%7Cmyapp-test-6b8dcc5ffb-gjjvj%7Cmyapp-test-6b8dcc5ffb-rxfk2%7Cmyapp-test-7fdf446767-bzhm2%7Cmyapp-test-7fdf446767-hp46w%7Cmyapp-test-7fdf446767-rhqhq%7Cmyapp-test-7fdf446767-wxmm2%22%7D%5B1m%5D)*60&start=1540574190&end=1540574505&step=15"
response:Object
status:"success"
data:Object
resultType:"matrix"
result:Array[0] => []
I opened a GitHub issue for this as well; it has a quick GIF screen recording showing what I mean: AppMetrics/Prometheus#43

| is for regular expressions; PromQL doesn't have a | operator (though it does have an or operator). You need to tell Prometheus that the matcher is a regex rather than an exact match by using =~ instead of =:
rate(application_httprequests_transactions_count{env="test",app="MyApp",server=~"myapp-test-58d94bf78d-l9tdv|myapp-test-58d94bf78d-jdq78"}[15m])*60
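To see why the original query returned nothing, here is a minimal Python sketch that mimics the two matcher types. It is an illustration only, not Prometheus code: the helper names matches_exact and matches_regex are made up for this example, and Python's re.fullmatch stands in for Prometheus's fully anchored regex matching.

```python
import re


def matches_exact(label_value, matcher):
    """Mimic label="matcher": plain string equality, no regex involved."""
    return label_value == matcher


def matches_regex(label_value, pattern):
    """Mimic label=~"pattern": the regex must match the whole label value."""
    return re.fullmatch(pattern, label_value) is not None


# Pod names and selector taken from the question.
pods = ["myapp-test-58d94bf78d-l9tdv", "myapp-test-58d94bf78d-jdq78"]
selector = "myapp-test-58d94bf78d-l9tdv|myapp-test-58d94bf78d-jdq78"

# With "=", the whole "a|b" string is compared literally, so nothing matches.
exact_hits = [p for p in pods if matches_exact(p, selector)]
# With "=~", "a|b" is an alternation, so both pods match.
regex_hits = [p for p in pods if matches_regex(p, selector)]

print(exact_hits)
print(regex_hits)
```

In the Grafana dashboard itself, the equivalent fix is probably to write server=~"$server" in the panel query, since the query inspector output shows that selecting "All" expands the variable into an a|b|c alternation.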

Related

Fill Grafana Dashboard Variable From LogQL

I have a Grafana dashboard and I'd like to define a variable for it. I'd like the values of this variable to come from a LogQL query. To be more specific: each log has a field called "site_ids", and I want the values of the variable to be all the distinct "site_ids" (longs).
So I wrote this query:
{_namespace_="namespace",_schema_="schema"} | logfmt | line_format "{{.site_ids}}"
This seems to work when I run it in the query executor. This is the output (the actual site_ids):
0
-1
196
2
3
...
But when I use it as the query for a new variable, I see nothing in the Preview of values.
Unfortunately, I can barely find any documentation about this.
Any suggestions?
Thanks!
Use label_values, as described in the Query Variable section.

Grafana - Is it possible to use variables in Loki-based dashboard query?

I am working on a Loki-based Dashboard on Grafana. I have one panel for searching text in the Loki trace logs, the current query is like:
{job="abc-service"}
|~ "searchTrace"
|json
|line_format "{{if .trace_message}} Message: \t{{.trace_message}} {{end}}"
Where searchTrace is a variable of type "Text box" for the user to input search text.
I want to include another variable, skipTestLog, to skip logs created by some test cron tasks. skipTestLog is a custom variable with two options: Yes and No.
Suppose the logs created by test cron tasks contain the text CronTest in the field trace_message after the json parser. Is there a way to filter them out based on the selected value of skipTestLog?
Create a key/value custom variable like in the following example:
Use the variable like in the following example:

Grafana dashboard to display a metric for a key in JSON Loki record

I'm having trouble understanding how to create a dashboard time series plot to display a single key/value from a Loki log which is in JSON format.
eg:
here is my query in the Explorer:
{job="railsdevlogs"}|json
which returns log lines such as:
{"date":"2022-01-05T21:27:21.895Z","pool":{"Pool Size":50,"Current":5,"Active":1,"Idle":4,"Dead":0,"Timeout":"5 sec"},"puma":{"Started At":"2022-01-05T20:35:26Z","Max Threads":16,"Pool Capacity":16,"Running":1,"Backlog":0,"IO Handles":15,"File Handles":2,"Socket Handles":4,"Server Log Size":46750072},"process":[{"Name":"ruby.exe","Process ID":656,"Threads":11,"Working Set":150728704,"Virtual Size":288079872},{"Name":"mysqld.exe","Process ID":4836,"Threads":3,"Working Set":360448,"Virtual Size":4445065216},{"Name":"mysqld.exe","Process ID":5808,"Threads":49,"Working Set":69906432,"Virtual Size":4924059648},{"Name":"aaaaa.exe","Process ID":14460,"Threads":18,"Working Set":49565696,"Virtual Size":5478469632},{"Name":"bbbbb.exe","Process ID":9584,"Threads":14,"Working Set":35012608,"Virtual Size":4496551936},{"Name":"ccccc.exe","Process ID":11944,"Threads":14,"Working Set":29609984,"Virtual Size":4481880064}],"gc":{"count":242,"heap_allocated_pages":1277,"heap_sorted_length":1279,"heap_allocatable_pages":9,"heap_available_slots":869213,"heap_live_slots":464541,"heap_free_slots":404672,"heap_final_slots":0,"heap_marked_slots":411311,"heap_swept_slots":457903,"heap_eden_pages":1268,"heap_tomb_pages":9,"total_allocated_pages":1278,"total_freed_pages":1,"total_allocated_objects":74364715,"total_freed_objects":73900174,"malloc_increase_bytes":640096,"malloc_increase_bytes_limit":16777216,"minor_gc_count":131,"major_gc_count":111,"remembered_wb_unprotected_objects":57031,"remembered_wb_unprotected_objects_limit":114062,"old_objects":349257,"old_objects_limit":698512,"oldmalloc_increase_bytes":640288,"oldmalloc_increase_bytes_limit":16777216},"os":{"System Name":"xxxxx","Description":"","Organization":"","Operating System":"Microsoft Windows 10 Enterprise LTSC","OS Version":"10.0.17763","OS Serial Number":"xxxxx-xxxxx-xxxxx-xxxxx","System Time":"2022-01-05T16:27:22.000-05:00","System Time Zone":-300,"Last Boot Time":"2021-12-15T23:26:38.000-05:00","System Drive":"C:","Total Physical Memory":34204393472,"Free Physical Memory":20056260608,"Total Virtual Memory":39304667136,"Free Virtual Memory":13915041792,"Number of Processes":307,"Number of Users":2,"volumes":[{"Drive":"C:\\","Type":"NTFS","Total Space":1023563264000,"Free Space":681182343168,"Block Size":4096}]},"symbol":{"size":28106},"stats_collection_time":387}
using |json will automatically create dynamic labels for all the key/values in the json log line:
gc_count = 123
os_Free_Virtual_Memory = 456789
etc.
Now I would like to plot one of these values in a grafana time series plot, but I am struggling to understand how to isolate one dynamic label and plot it.
Perhaps I'm using |json incorrectly. The documentation and examples I have read so far show how to filter the logs using the dynamic labels, but I don't need that since I want to plot every log line.
thanks
I think this should help https://grafana.com/go/observabilitycon/2020/keynote-what-is-observability/ if you go to minute 41.
There's an example which is very similar to what you're trying to achieve.
Your query should look something like:
quantile_over_time(0.99, {job="railsdevlogs"}
| json
| unwrap gc_count [1m])
by (job)
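Conceptually, | json flattens nested keys into labels, and unwrap then uses one of those labels as the numeric sample for each log line. The Python sketch below imitates that behaviour on two shortened log lines. It is only an approximation of what Loki does, not its implementation: the function names flatten and unwrap are chosen for this example, the first line reuses values from the question, and the second line is invented sample data.

```python
import json
import re


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into parent_child keys; arrays are skipped."""
    labels = {}
    for key, value in obj.items():
        # Replace characters that are not valid in a label name (e.g. spaces).
        name = prefix + re.sub(r"[^a-zA-Z0-9_]", "_", key)
        if isinstance(value, dict):
            labels.update(flatten(value, name + "_"))
        elif not isinstance(value, list):
            labels[name] = value
    return labels


def unwrap(lines, label):
    """Turn each log line into one numeric sample taken from the given label."""
    return [float(flatten(json.loads(line))[label]) for line in lines]


lines = [
    # Values copied from the log line in the question.
    '{"gc": {"count": 242}, "os": {"Free Virtual Memory": 13915041792}}',
    # Invented second line, only here to have more than one sample.
    '{"gc": {"count": 243}, "os": {"Free Virtual Memory": 13915000000}}',
]

first_labels = flatten(json.loads(lines[0]))
samples = unwrap(lines, "gc_count")
print(first_labels)
print(samples)
```

If the goal is to plot the value itself rather than a quantile, a range aggregation such as avg_over_time or max_over_time over the same unwrap expression may be a closer fit.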

Host's pods panel in Grafana

I want to have a panel in Grafana which displays what pods are currently running in a host.
For the host variable I have the following query (the job variable is just label_values(node_uname_info, job).):
label_values(node_uname_info{job="$job"}, instance)
This gives me an array of sockets: host_ip:port
I can get the pod names from kube_pod_info{job="$job", host_ip="$host_ip"}, but in order to get the IP I need to remove the port part of the socket:
label_replace(node_uname_info{job="$job", instance="$node"}, "host_ip", "$1", "instance", "(.*):.*")
I haven't found how to use the new host_ip label in the pod query to eventually get all the pod label values of kube_pod_info. I don't want to put the label_replace in Prometheus to avoid data duplication - is there a way to use the new host_ip label in the pod query?
Edit:
I added the host_ip variable with the regex as shan1024 showed in his answer and changed the panel's query to:
sum by (pod) (kube_pod_info{job="$job", host_ip="$host_ip"})
Then I changed the panel's visualization to table and added column styles to Time and Value (chose type Hidden). This allows me to display the host's running pods in a list-like fashion.
This is actually quite easy to do in Grafana, and there is no need to change labels in Prometheus. You just need to add a regex to the instance variable (when the regex has a capturing group, the value(s) of the first captured group become the value(s) of the variable).
e.g.
Variable definition without Regex (you get host_ip:port)-
Variable definition with Regex (you only get host_ip)-
Then you can add a new variable with value kube_pod_info{ host_ip="$instance" } to get all pods in the selected host.
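As a rough illustration of what the capturing group does to each value, here is a small Python sketch. It reuses the (.*):.* pattern from the label_replace call in the question; the instance values are made up, strip_port is a hypothetical helper, and Grafana evaluates its variable regex itself, so treat this only as a way to check the pattern.

```python
import re

# Same pattern as in the label_replace call from the question.
HOST_PATTERN = re.compile(r"(.*):.*")


def strip_port(instance):
    """Return the first capture group (the host part), or the input if no match."""
    match = HOST_PATTERN.fullmatch(instance)
    return match.group(1) if match else instance


# Made-up host_ip:port values standing in for the instance label.
instances = ["10.0.0.5:9100", "10.0.0.6:9100"]
host_ips = [strip_port(value) for value in instances]
print(host_ips)
```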

How can I filter the result of label_values(label) to get a list of labels that match a regex?

I have several metrics with the label "service". I want to get a list of all the "service" label values that begin with "abc" and end with "xyz". These will be the values of a Grafana template variable.
This is what I have tried:
label_values(service) =~ "abc.*xyz"
However, this produces an error: Template variables could not be initialized: parse error at char 13: could not parse remaining input "(service_name) "...
Any ideas on how to filter the label values?
This should work (replacing up with the metric you mention):
label_values(up{service=~"abc.*xyz"}, service)
Or, in case you actually need to look across multiple metrics (assuming that for some reason some metrics have some service label values and other metrics have other values):
label_values({__name__=~"metric1|metric2|metric3", service=~"abc.*xyz"}, service)
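The second form can be pictured as "keep only the series whose labels match every regex, then collect the distinct service values". The Python sketch below imitates that over a hand-written list of series. The series data and the label_values helper are hypothetical and only model the filtering logic; they do not call Grafana or Prometheus.

```python
import re


def label_values(series, matchers, label):
    """Collect sorted distinct values of a label from series matching all regex matchers."""
    values = set()
    for labels in series:
        # Every matcher must match the whole label value (anchored regex).
        if all(re.fullmatch(pattern, labels.get(name, ""))
               for name, pattern in matchers.items()):
            if label in labels:
                values.add(labels[label])
    return sorted(values)


# Hypothetical series, each described by its label set.
series = [
    {"__name__": "metric1", "service": "abc-api-xyz"},
    {"__name__": "metric2", "service": "abc-db-xyz"},
    {"__name__": "metric3", "service": "other"},
    {"__name__": "metric4", "service": "abc-cache-xyz"},
]

matchers = {"__name__": "metric1|metric2|metric3", "service": "abc.*xyz"}
result = label_values(series, matchers, "service")
print(result)
```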