PromQL metric query returning other metrics than what I want - kubernetes

I must just not understand PromQL yet, but everything I read says this query should work fine:
node_cpu
Really simple right? Name of my metric, and I do get them in my result set.
node_cpu{app="prometheus",chart="prometheus-6.2.1",component="node-exporter",cpu="cpu0",heritage="Tiller",instance="10.85.166.16:9100",io_cattle_field_appId="prometheus",job="kubernetes-service-endpoints",kubernetes_name="prometheus-node-exporter",kubernetes_namespace="prometheus",mode="guest_nice",release="prometheus"} 0
node_cpu{app="prometheus",chart="prometheus-6.2.1",component="node-exporter",cpu="cpu0",heritage="Tiller",instance="10.85.166.16:9100",io_cattle_field_appId="prometheus",job="kubernetes-service-endpoints",kubernetes_name="prometheus-node-exporter",kubernetes_namespace="prometheus",mode="idle",release="prometheus"} 1784679.96
node_cpu{app="prometheus",chart="prometheus-6.2.1",component="node-exporter",cpu="cpu0",heritage="Tiller",instance="10.85.166.16:9100",io_cattle_field_appId="prometheus",job="kubernetes-service-endpoints",kubernetes_name="prometheus-node-exporter",kubernetes_namespace="prometheus",mode="iowait",release="prometheus"} 2897.73
But I also get a ton of other, unwanted metrics:
kubelet_runtime_operations_latency_microseconds_count{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="la-1pk8s-w4",job="kubernetes-nodes",kubernetes_io_hostname="la-1pk8s-w4",node_role_kubernetes_io_worker="true",operation_type="image_status"}
container_start_time_seconds{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",id="/docker/8effa9b35affbf17118e7cc83a586d70da9fa960097ab717076c7251bf4eb324",image="rancher/rke-tools:v0.1.13",instance="la-1pk8s-w2",job="kubernetes-nodes-cadvisor",kubernetes_io_hostname="la-1pk8s-w2",name="rke-log-linker-nginx-proxy",node_role_kubernetes_io_worker="true"}
storage_operation_duration_seconds_bucket{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="la-1pk8s-w4",job="kubernetes-nodes",kubernetes_io_hostname="la-1pk8s-w4",le="0.1",node_role_kubernetes_io_worker="true",operation_name="volume_unmount",volume_plugin="kubernetes.io/configmap"}
Not sure why they are there, strange. So I figure I'll filter on the label component="node-exporter" since that label only exists in the metrics I want.
node_cpu{component="node-exporter"} yields the same result set.
node_cpu{component=~"node-exporter"} yields the same result set.
Why can't I just get all node_cpu metrics and why is the filtering not working? Thanks.

Either this is a bug that was fixed in 2.3.0, or you have a remote_read that's returning undesired results.

Related

Sum query result by name specified by regex

I am using Grafana together with Prometheus to display data of my Pods from Kubernetes Cluster. Here I am displaying Memory Usage for each Pod by name:
sum (container_memory_working_set_bytes{namespace="namespace1", image!="",name=~"^k8s_.*",kubernetes_io_hostname=~"^$Node$"}) by (pod_name)
It gives the correct result for each pod. For example:
namespace1-eventstore-1
namespace1-eventstore-0
avsandbox-X-64ff4d-rl9z6
avsandbox-X-64ff4d-ldfnx
avsandbox-Y-7d9df9ddff-asdf
avsandbox-Y-7d9df9ddff-dfas
avsandbox-Z-5957dbaf58dt-gds24
avsandbox-Z-5957dbaf58dt-g4gd7
Now I want to sum them by their respective names to get the following result, or the closest I can get to it:
namespace1-eventstore
avsandbox-X
avsandbox-Y
avsandbox-Z
So, in conclusion, I want to sum everything that has the same name before the second hyphen (-). How can I achieve that?
Edit: Here is a further example of what I'm looking for (hopefully a practical example helps convey the general idea):
sum (container_memory_working_set_bytes{namespace="namespace1", image!="",name=~"^k8s_.*",kubernetes_io_hostname=~"^$Node$"}) by (pod_name="([a-zA-Z0-9]+-[a-zA-Z0-9])-.*")
But that's not possible because of syntax.
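One way to get this kind of grouping in PromQL is label_replace, which copies a regex capture group from one label into a new label that sum ... by can then group on. The sketch below emulates that step in Python on pod names like the ones in the question. The label name pod_group, the helper names, and the sample values are made up for illustration, and the regex assumes names of the form <word>-<word>-<rest>.

```python
import re
from collections import defaultdict

# Roughly equivalent PromQL (pod_group is a hypothetical label name):
#   sum by (pod_group) (
#     label_replace(
#       container_memory_working_set_bytes{namespace="namespace1", image!=""},
#       "pod_group", "$1", "pod_name", "([a-zA-Z0-9]+-[a-zA-Z0-9]+)-.*"
#     )
#   )
# label_replace anchors its regex at both ends, which fullmatch emulates here.
PREFIX_RE = re.compile(r"([a-zA-Z0-9]+-[a-zA-Z0-9]+)-.*")


def pod_group(pod_name):
    """Return the part of pod_name before the second hyphen, or "" if it does not match."""
    match = PREFIX_RE.fullmatch(pod_name)
    return match.group(1) if match else ""


def sum_by_group(samples):
    """Sum sample values per derived group, like sum by (pod_group)."""
    totals = defaultdict(float)
    for pod_name, value in samples.items():
        totals[pod_group(pod_name)] += value
    return dict(totals)


# Invented memory values, only to show the aggregation.
samples = {
    "namespace1-eventstore-1": 100.0,
    "namespace1-eventstore-0": 150.0,
    "avsandbox-X-64ff4d-rl9z6": 10.0,
    "avsandbox-X-64ff4d-ldfnx": 20.0,
}

for group, total in sorted(sum_by_group(samples).items()):
    print(group, total)
```

Note that the attempt in the question has [a-zA-Z0-9] without a + after the first hyphen, so it would only allow a single character there; the sketch uses + in both places.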

grafana define variable with prometheus query based on metrics

I am pretty new to Grafana, so the question might be an easy one:
I am trying to store a metric value in a variable, so I set up a variable with a Prometheus query:
metrics(passed_tests_total{job="MyJob"})
To my surprise, the variable's value is None, although metric values with that label exist. I verified that by setting up a 'singlestat' panel with the query passed_tests_total{job="MyJob"}, which works perfectly fine.
So my question is: how can I store a metric value in a variable?
Remark: my approach is based on the documentation at http://docs.grafana.org/features/datasources/prometheus/
If you want to retrieve the value of a metric, you should use query_result(); metrics() gives you the names of matching metrics, not the values themselves.
Your query should be: query_result(passed_tests_total{job="MyJob"})
And the Regex to extract just the value of metric should be /.* ([^\ ]*) .*/.
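To see what that regex does, here is a small Python sketch. It assumes each query_result() entry is rendered as a string of the form metric{labels} value timestamp; the sample string and its numbers are invented for illustration, and extract_value is a hypothetical helper name.

```python
import re

# The regex from the answer, without Grafana's surrounding slashes.
VALUE_RE = re.compile(r".* ([^\ ]*) .*")


def extract_value(entry):
    """Return the token between the last two spaces that bracket a value, or None."""
    match = VALUE_RE.fullmatch(entry)
    return match.group(1) if match else None


# Hypothetical entry: metric with labels, then value, then timestamp.
sample = 'passed_tests_total{job="MyJob"} 42 1546300800000'
print(extract_value(sample))
```

The captured value is a string, which is what a Grafana variable holds anyway.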

Prometheus many-to-many problem for kube cronjobs

Hi there,
I'm trying to configure Kubernetes CronJob monitoring and alerts with Prometheus. I found this helpful guide.
But I always get a many-to-many matching not allowed: matching labels must be unique on one side error.
For example, this is the PromQL query which triggers this error:
max(
kube_job_status_start_time
* ON(job_name) GROUP_RIGHT()
kube_job_labels{label_cronjob!=""}
) BY (job_name, label_cronjob)
Run on their own, the two sides return series such as these:
kube_job_status_start_time:
kube_job_status_start_time{app="kube-state-metrics",chart="kube-state-metrics-0.12.1",heritage="Tiller",instance="REDACTED",job="kubernetes-service-endpoints",job_name="test-1546295400",kubernetes_name="kube-state-metrics",kubernetes_namespace="monitoring",kubernetes_node="REDACTED",namespace="test-develop",release="kube-state-metrics"}
kube_job_labels{label_cronjob!=""}:
kube_job_labels{app="kube-state-metrics",chart="kube-state-metrics-0.12.1",heritage="Tiller",instance="REDACTED",job="kubernetes-service-endpoints",job_name="test-1546295400",kubernetes_name="kube-state-metrics",kubernetes_namespace="monitoring",kubernetes_node="REDACTED",label_cronjob="test",label_environment="test-develop",namespace="test-develop",release="kube-state-metrics"}
Is there something I'm missing here? The same many-to-many error happens for every query I tried from the guide.
Even building the query myself from the ground up resulted in the same error.
Hope you can help me out here :)
In my case I don't get this extra label from Prometheus when installed via helm (stable/prometheus-operator).
You need to configure it in Prometheus. The option is called honor_labels, and it has to be set to false:
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
So you have to set honor_labels: false in your prometheus.yaml config file.
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved
Anyway, even with this set (I now have exported_job labels), I still can't run the query properly, but I guess that is still because of my LHS:
Error executing query: found duplicate series for the match group
{exported_job="kube-state-metrics"} on the left hand-side of the operation:
[{__name__=
I ran into the same issue when I followed that article, but for me, I actually get duplicate job names but in different namespaces.
For example, when running kube_job_status_start_time:
kube_job_status_start_time{instance="REDACTED",job="kube-state-metrics",job_name="job-abc-123",namespace="us"}
kube_job_status_start_time{instance="REDACTED",job="kube-state-metrics",job_name="job-abc-123",namespace="ca"}
So I had to either add a filter for the namespace or add namespace into the ON/BY clauses to get it to be unique.
e.g. for one of the subqueries I had to do this:
max(
kube_job_status_start_time
* ON(namespace, job_name) GROUP_RIGHT()
kube_job_labels{label_cronjob!=""}
) BY (namespace, label_cronjob)
Essentially had to apply that principle to all the rest of the queries for it to work for me. Not sure if that applies in your case.
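The uniqueness requirement can also be checked outside Prometheus. The sketch below groups the "one" side of the match (the left-hand side when GROUP_RIGHT is used) by the ON labels and reports any group that holds more than one series. The helper name duplicate_match_groups is made up, and the two series are the ones from the example above.

```python
from collections import defaultdict


def duplicate_match_groups(series, on_labels):
    """Return the match groups (tuples of ON label values) that hold more than one series."""
    groups = defaultdict(list)
    for labels in series:
        # A missing label is treated as an empty value, as in PromQL matching.
        key = tuple(labels.get(name, "") for name in on_labels)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Left-hand side: kube_job_status_start_time series from the example.
lhs = [
    {"instance": "REDACTED", "job": "kube-state-metrics",
     "job_name": "job-abc-123", "namespace": "us"},
    {"instance": "REDACTED", "job": "kube-state-metrics",
     "job_name": "job-abc-123", "namespace": "ca"},
]

# ON(job_name): both series fall into the same group, so the match is ambiguous.
print(duplicate_match_groups(lhs, ["job_name"]))

# ON(namespace, job_name): every group holds exactly one series.
print(duplicate_match_groups(lhs, ["namespace", "job_name"]))
```

If the first call returns anything, the ON clause needs more labels, or the duplicates have to be filtered or aggregated away first.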
Replacing kube_job_status_start_time with max(kube_job_status_start_time) by (job_name) will aggregate out any duplicates and should resolve the error.
The resulting query will look like this:
max(
max(kube_job_status_start_time) by (job_name)
* ON(job_name) GROUP_RIGHT()
kube_job_labels{label_cronjob!=""}
) BY (job_name, label_cronjob)
I dug into this issue a bit more, and I guess the root cause of it is within this one-to-many vector matching expression:
kube_job_status_start_time * ON(job_name) GROUP_RIGHT() kube_job_labels{label_cronjob!=""}
where the group modifier "GROUP_RIGHT()" says that each vector element from the left side (kube_job_status_start_time) can match multiple elements on the right side (kube_job_labels), based on the common label (job_name). The thing is that we are really dealing with many-to-many matching here, as each vector element from the right side can also match multiple elements from the left vector.
I think that what we are missing here is a way for Prometheus to uniquely identify the Job objects exported from K8S. The author of the blog post mentions this feature in his setup:
...Prometheus resolves this collision of label names by including the
raw metric’s label as an exported_job label...
In my case I don't get this extra label from Prometheus when installed via helm (stable/prometheus-operator).
Regarding the missing labels - make sure that your kube-state-metrics is configured with a --metric-labels-allowlist. This is "new" since kube-state-metrics v2. See https://kubernetes.io/blog/2021/04/13/kube-state-metrics-v-2-0/#what-is-new-in-v2-0
By default, the metric contains only name and namespace labels.
But... the original guide does not work with newer kube-state-metrics anyway. I can recommend this guide, which is a rework and does not need the labels.

BadRequest when adding expandClause for JobStatistics

I want to get some statistics about the job I'm running on my pool, and for that I am trying to use the JobStatistics class, but job.Statistics has been null in most of my runs, except for a few where the result was magically not null. I read in the documentation (https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.batch.cloudjob.statistics?view=azurebatch-6.1.0#Microsoft_Azure_Batch_CloudJob_Statistics) that for the statistics not to be null, I need to use an expand clause with DetailLevel, but each time I do, I get the error: "operation returned an invalid status code 'badrequest' ". This is what I have for that:
ODATADetailLevel detailExJob = new ODATADetailLevel();
detailExJob.SelectClause = "id,executionInfo,stats";
detailExJob.ExpandClause = "id,executionInfo,stats";
await job.RefreshAsync(detailExJob);
What am I missing here? How can I get job.Statistics not to be null?
Thanks!
I'll try to answer your question, but it looks like you have two separate issues.
1. Job lifetime statistics may not be immediately available. The Batch service performs periodic roll-up of statistics. I believe the typical delay is about 30 minutes, but this is not documented.
2. The expand clause currently only supports stats. If you modify your detailExJob.ExpandClause statement to be assigned just "stats", then your job query should work. Moreover, you can simplify your detail level object to omit the expand clause altogether since you specified stats in the select clause.

NDepend CQL queries returning N/A for LOC

Using the following CQL query:
SELECT NAMESPACES WHERE NameLike "Test$" ORDER BY NbLinesOfCode DESC
I am getting some results that show "N/A" instead of a number for NbLinesOfCode. Anyone know why this is happening and how to resolve it?
Note: I tried changing NbLinesOfCode to NbILInstructions, and none of the result records showed N/A.
There can be two things here:
If both NbLinesOfCode and NbILInstructions show N/A, it means that the namespace doesn't have any code and contains only types without code (like interfaces, delegates or enumerations).
If only NbLinesOfCode shows N/A but NbILInstructions shows something, then it means NDepend cannot access the assemblies' PDB files. More information in: Understanding NDepend Analysis Inputs