Auto join in prometheus with max over time with no output - kubernetes

prometheus:v2.15.2
kubernetes:v1.14.9
I have a query that returns the maximum CPU usage over a set period.
I would like to join it with the request configured for the container, which is exposed by the kube_pod_container_resource_requests_cpu_cores metric.
I want to know how close the usage is to the configured request, displayed as a percentage.
I have other queries working with this same structure:
jvm_memory_bytes_used{instance="url.instance.com.br"} / jvm_memory_bytes_max{area="heap"} * 100 > 80
but this one is not working.
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / kube_pod_container_resource_requests_cpu_cores * 100 < 70
The first step was a query that collects the maximum CPU usage of a container in a pod over a short period:
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="xpto-92838241",container_name!="POD", container_name!=""}[1m])) [1h:1s])
Element: {} Value: 0.25781324101515
Running the raw metric on its own returns the full label set:
container_cpu_usage_seconds_total{pod="xpto-92838241",container_name!="POD", container_name!=""}
Element: container_cpu_usage_seconds_total{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="t3.small",beta_kubernetes_io_os="linux",cluster="teste.k8s.xpto",container="xpto",container_name="xpto",cpu="total",failure_domain_beta_kubernetes_io_region="sa-east-1",failure_domain_beta_kubernetes_io_zone="sa-east-1c",generic="true",id="/kubepods/burstable/poda9999e9999e999e9-/99999e9999999e9",image="nginx",instance="kubestate-dev.internal.xpto",job="kubernetes-cadvisor",kops_k8s_io_instancegroup="nodes",kubernetes_io_arch="amd64",kubernetes_io_hostname="ip-99-999-9-99.sa-east-1.compute.internal",kubernetes_io_os="linux",kubernetes_io_role="node",name="k8s_nginx_nginx-99999e9999999e9",namespace="nmpc",pod="pod-92838241",pod_name="pod-92838241",spot="false"} Value: 22533.2
This is the configured request:
kube_pod_container_resource_requests_cpu_cores{pod="xpto-92838241"}
Element: kube_pod_container_resource_requests_cpu_cores{container="xpto",instance="kubestate-dev.internal.xpto",job="k8s-http",namespace="nmpc",node="ip-99-999-999-99.sa-east-1.compute.internal",pod="pod-92838241"} Value: 1
My expectation was that dividing one metric by the other would give the percentage, like this:
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="xpto-dev-92838241",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / kube_pod_container_resource_requests_cpu_cores * 100 < 70
Element: no data Value:
But the two metrics do not combine. I cannot understand why, and I cannot find an explanation in the documentation.
Regards

The cAdvisor metric labels pod_name and container_name were removed and replaced by pod and container, respectively, only in Kubernetes 1.16.
Since you are using Kubernetes 1.14, you should still use pod_name and container_name.
Let me know if it helps.

The Prometheus Operator documentation and a blog walkthrough on CPU aggregation were useful here.
I solved my problem with vector matching:
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / on(pod_name) group_left(container_name) kube_pod_container_resource_requests_cpu_cores{pod="pod-name-here"}
thank you all
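Why the original division returned no data can be seen by modelling how PromQL pairs series in a binary operation. The sketch below is a simplified Python model, not Prometheus code: the helper names signature and divide are hypothetical, only one-to-one matching is handled (no group_left), and a missing label is treated as an empty string. Without on(...), both sides must carry the same label set; the sum(...) side carries none (the {} element shown in the question), so nothing pairs up.

```python
def signature(labels, on=None):
    """Return the matching key for a series.

    With on=None every label takes part (the metric name is not modelled);
    otherwise only the listed labels do. Missing labels count as "".
    """
    if on is None:
        return tuple(sorted(labels.items()))
    return tuple((name, labels.get(name, "")) for name in on)


def divide(left, right, on=None):
    """One-to-one division of two instant vectors, given as (labels, value) pairs."""
    right_by_sig = {signature(lbls, on): val for lbls, val in right}
    out = []
    for lbls, val in left:
        sig = signature(lbls, on)
        if sig in right_by_sig:
            out.append((lbls, val / right_by_sig[sig]))
    return out


# Values and labels taken from the question's output (label set shortened).
usage = [({}, 0.25781324101515)]  # sum(...) without "by" drops every label
requests = [({"container": "xpto", "namespace": "nmpc", "pod": "pod-92838241"}, 1.0)]

no_match = divide(usage, requests)                  # label sets differ: no pairs
matched = divide(usage, requests, on=["pod_name"])  # both sides have pod_name == ""
```

In this model no_match is empty, while matched holds a single ratio, because neither side carries a pod_name label and both therefore share the same key.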

The following PromQL query returns per-pod CPU usage as a percentage of its configured requests:
100 * sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
/
sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
The following query returns the maximum per-pod CPU usage over the last hour:
max_over_time((
100 * sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
/
sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
)[1h:5m])
Note that the first query is simply wrapped in max_over_time((...)[1h:5m]). This construct is called a subquery. It may run slower than the original query.
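What a subquery costs can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Prometheus code: the function names are hypothetical, cpu_percent is a made-up stand-in for the inner expression, timestamps are aligned to multiples of the step, and both ends of the range are included (the exact boundary handling may differ between Prometheus versions).

```python
def subquery_timestamps(eval_time, range_s, step_s):
    """Timestamps (seconds) at which the inner expression is evaluated."""
    start = eval_time - range_s
    first = -(-start // step_s) * step_s  # smallest multiple of step_s >= start
    return list(range(first, eval_time + 1, step_s))


def max_over_subquery(inner, eval_time, range_s, step_s):
    """Evaluate inner(t) at every step and keep the largest value."""
    return max(inner(t) for t in subquery_timestamps(eval_time, range_s, step_s))


# Hypothetical inner expression standing in for the per-pod CPU percentage.
def cpu_percent(t):
    return 40.0 + (t // 300) % 7


peak = max_over_subquery(cpu_percent, eval_time=3600, range_s=3600, step_s=300)
evals_5m = len(subquery_timestamps(3600, 3600, 300))  # resolution [1h:5m]
evals_1s = len(subquery_timestamps(3600, 3600, 1))    # resolution [1h:1s]
```

In this model a [1h:1s] resolution, as used in the question, evaluates the inner expression a few thousand times per result, while [1h:5m] needs about a dozen evaluations.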

Related

Translating PromQL 'sum(rate(value))' queries to InfluxQL

I'm creating a custom k8s Grafana dashboard with InfluxDB (v1.8.6) as the data source. I have gone through the InfluxDB documentation and found that the analogous construct for the Prometheus rate() function in InfluxQL is non_negative_derivative(mean(value), interval). But when I convert the Prometheus query to InfluxQL, the resulting values differ when executed against the same time intervals. I'm trying to compute the k8s cluster network I/O pressure.
PromQL :
sum (rate (container_network_receive_bytes_total{kubernetes_io_hostname=~"^$Node$", job="kubernetes-nodes-cadvisor"}[1m]))
Output is in bytes : 7321180
InfluxQL :
SELECT SUM(bytes_used) FROM (SELECT non_negative_derivative(mean(value), 1s) AS bytes_used FROM container_network_receive_bytes_total WHERE ("job" = 'kubernetes-nodes-cadvisor' AND "kubernetes_io_hostname" =~ /^$Node$/) AND $timeFilter GROUP BY time(1m)) group by time($__interval)
Output is in Mb/s : 36.7 MB/s
Could someone help identify the issue and correct me?

prometheus: counting the number of time series in a time range

I'm trying to get deployment frequency in kubernetes. Counting replicasets doesn't work for us since we want to see the frequency of 'code deploys'. For this we can look at how many different images were deployed in a period of time. I'm trying to get this data from prometheus which has kubernetes metrics, we also have kube-state-metrics deployed.
I have tried things like -
count(
group by (image) (
(label_replace(kube_pod_owner{owner_kind="ReplicaSet", owner_name=~"app-.*"}, "replicaset", "$1", "owner_name", "(.*)")
* on (replicaset) group_left kube_replicaset_spec_replicas{}
> 0)
* on (pod) group_right kube_pod_container_info{container="app"}
)
)
count(
group by (image) (
(count_over_time(kube_pod_container_info{container="app", pod=~"app.*"}[1d])
* on(pod) group_left kube_pod_container_status_ready{}
) > 1
)
)
Neither gives me quite what I want. I get a spike in the graph for the period where two images exist simultaneously, but that's not what I'm looking for.

Grafana - How To plot the metrics for each variable which is passed dynamically

I'm using Prometheus with Grafana. I have a use case where a variable is passed in dynamically, and I need to perform a division for each of its values so that I can plot a graph per value.
For example, the first metric is:
rate(container_cpu_usage_seconds_total{id="/",instance=~'${INSTANCE:pipe}'}[5m])
where ${INSTANCE:pipe} is populated dynamically.
It needs to be divided by:
machine_cpu_cores{kubernetes_io_hostname=~'${INSTANCE:pipe}'}
I want the result in the following format, with one entry per variable value, for example:
vars result
var1 - 102
var2 - 23
var3 - 453
Note: var1, var2, and var3 are the dynamically passed variables, and result is the value returned by the division.
Thanks in advance
After trying some queries, I found the solution.
My use case involves two metrics:
container_cpu_usage_seconds_total
machine_cpu_cores
Both metrics share the label kubernetes_io_hostname.
I grouped both metrics by that label with the following queries:
sort_desc(max(rate(container_cpu_usage_seconds_total{id="/",kubernetes_io_role="node"}[5m])) by (kubernetes_io_hostname))
sort_desc(max(machine_cpu_cores{kubernetes_io_role="node"}) by (kubernetes_io_hostname))
So each result carries only one label, kubernetes_io_hostname.
I then divided the first result by the second and got one value per kubernetes_io_hostname.
If you need more information, let me know in the comments.
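The grouping and division described above can be sketched in Python as follows. This is only an illustration of the idea: the helper max_by, the node names, and all sample values are hypothetical, and the dictionary lookup stands in for PromQL's one-to-one matching on the shared label.

```python
from collections import defaultdict


def max_by(series, label):
    """Aggregate (labels, value) pairs, keeping the max per value of `label`."""
    grouped = defaultdict(lambda: float("-inf"))
    for labels, value in series:
        key = labels[label]
        grouped[key] = max(grouped[key], value)
    return dict(grouped)


# Hypothetical samples: per-node CPU usage rate (cores) and core counts.
cpu_rate = [
    ({"kubernetes_io_hostname": "node-a", "id": "/"}, 1.5),
    ({"kubernetes_io_hostname": "node-b", "id": "/"}, 0.5),
]
cpu_cores = [
    ({"kubernetes_io_hostname": "node-a"}, 2.0),
    ({"kubernetes_io_hostname": "node-b"}, 4.0),
]

used = max_by(cpu_rate, "kubernetes_io_hostname")
total = max_by(cpu_cores, "kubernetes_io_hostname")

# Divide only where both sides have the hostname, then sort descending.
ratio = {host: used[host] / total[host] for host in used if host in total}
ranked = sorted(ratio.items(), key=lambda item: item[1], reverse=True)
```

Because both aggregations keep only kubernetes_io_hostname, each hostname appears once on each side and the division yields one entry per node.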

How to turn prometheus irate function to sql

I need to translate the Prometheus irate function to SQL, and I cannot find the calculation logic anywhere.
I have the following PromQL query:
100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[40s])) * 100)
Let's say I have the following data for a CPU:
v   20     50     100     200     201      230
----x-+----x------x-------x-------x--+-----x-----
t   10     20     30      40      50       60
      |      <-- range=40s -->       |
                                     t
My question is not really about Postgres, since I could solve this in SQL if I knew which formula to implement.
I understand that I have to take the difference between the last two data points and divide the value difference by the time difference:
(201-200)/(50-40), but how does the 40s window come into the picture?
((201-200)/(50-40))/40 ?
What is the proper mathematical calculation for the above Prometheus query?
And how should I do the same if I have data for 8 CPUs?
I searched for documentation but could not find a proper explanation of what happens behind the scenes.
Thanks
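A minimal Python sketch of the calculation, which should translate to SQL fairly directly. irate uses only the last two samples inside the lookback window, so the 40s range decides which samples are eligible and is never used as a divisor. The function below is an illustration rather than Prometheus code; the evaluation time of 55 is an assumption read off the diagram (the t marker sits between 50 and 60), and the two-CPU list is hypothetical.

```python
def irate(samples, eval_time, range_s):
    """Per-second rate from the last two samples inside the lookback window.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples fall inside the window.
    """
    window = [(t, v) for t, v in samples if eval_time - range_s <= t <= eval_time]
    if len(window) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = window[-2], window[-1]
    # A counter that went down is treated as a reset: count from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


samples = [(10, 20), (20, 50), (30, 100), (40, 200), (50, 201), (60, 230)]
idle_rate = irate(samples, eval_time=55, range_s=40)  # (201 - 200) / (50 - 40)

# avg by (instance) over several CPUs, then 100 - avg * 100 as in the query.
per_cpu_idle = [idle_rate, idle_rate]  # hypothetical: two CPUs with equal rates
busy_percent = 100 - (sum(per_cpu_idle) / len(per_cpu_idle)) * 100
```

For 8 CPUs the same per-CPU value would be computed for each series and averaged per instance before applying 100 - avg * 100.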

Prometheus average over a time period into Grafana table

I can't seem to figure out the Prometheus query to calculate the single value of, say, average CPU usage per instance over a time period and create the Grafana table out of it:
Period: last 3h
Instance A: CPU usage A
Instance B: CPU usage B
Simply put, I want to:
select a time period in Grafana
have Prometheus average the values per instance within that period to a single value
use that data to populate a Grafana table
Any hints?
Thanks!
To answer myself:
avg_over_time(instance:cpu_usage:irate[$__range])
So, for example, if I want to get the CPU utilisation, will this PromQL query work well?
(((count(count(node_cpu_seconds_total{job="vi-prod-node-exporter-ec2-vsat2-spen"}) by (cpu))) - avg(sum by (mode)(rate(node_cpu_seconds_total{mode='idle',job="vi-prod-node-exporter-ec2-vsat2-spen"}[$__rate_interval])))) * 100) / count(count(node_cpu_seconds_total{job="vi-prod-node-exporter-ec2-vsat2-spen"}) by (cpu))