I'm trying to get deployment frequency in Kubernetes. Counting ReplicaSets doesn't work for us, since we want to see the frequency of 'code deploys'. For this we can look at how many different images were deployed in a period of time. I'm trying to get this data from Prometheus, which has Kubernetes metrics; we also have kube-state-metrics deployed.
I have tried things like -
count(
group by (image) (
(label_replace(kube_pod_owner{owner_kind="ReplicaSet", owner_name=~"app-.*"}, "replicaset", "$1", "owner_name", "(.*)")
* on (replicaset) group_left kube_replicaset_spec_replicas{}
> 0)
* on (pod) group_right kube_pod_container_info{container="app"}
)
)
count(
group by (image) (
(count_over_time(kube_pod_container_info{container="app", pod=~"app.*"}[1d])
* on(pod) group_left kube_pod_container_status_ready{}
) > 1
)
)
These aren't giving me quite what I want. I get a spike in the graph for the period where two images exist simultaneously, which isn't what I'm looking for.
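For what it's worth, a minimal sketch of one way to count the distinct images seen over a window, assuming kube_pod_container_info from kube-state-metrics, a container named app, and pods prefixed app (the 1d window is an assumption):
count(
  count by (image) (
    count_over_time(kube_pod_container_info{container="app", pod=~"app.*"}[1d])
  )
)
Graphed as a range query, this shows at each point the number of different images that ran during the trailing day, which may be closer to a 'code deploys' count than counting ReplicaSets.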
I need to show, in a Grafana status panel (plugin), the sum of two different measurements:
in_bytes + out_bytes
Preferably the latest data.
Any idea/hack to work around this? As far as I know there is no join in InfluxDB, and I can't prepare the data server-side (merge the two and write a single summed measurement to Influx).
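One possible starting point, assuming both measurements expose a field named value (an assumption): query them together and let the panel do the addition, since InfluxQL itself cannot add values across measurements:
SELECT last("value") FROM "in_bytes", "out_bytes"
This returns one series per measurement; the sum then has to come from the panel side, e.g. a total/sum reduction over the returned series, if the status panel supports it.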
prometheus:v2.15.2
kubernetes:v1.14.9
I have a query that shows exactly the maximum usage over the set period.
But I would like to join it with what is already configured in the kube_pod_container_resource_requests_cpu_cores metric.
I would like to know whether the actual usage is close to the configured value or not, displaying it as a percentage.
I have other examples working with this same query structure,
jvm_memory_bytes_used{instance="url.instance.com.br"} / jvm_memory_bytes_max{area="heap"} * 100 > 80
but this one is not working.
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / kube_pod_container_resource_requests_cpu_cores * 100 < 70
Well, the first idea was to create a query that collects the maximum historical CPU usage of a container in a pod over a brief period:
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="xpto-92838241",container_name!="POD", container_name!=""}[1m])) [1h:1s])
Element: {} Value: 0.25781324101515
If we execute it this way:
container_cpu_usage_seconds_total{pod="xpto-92838241",container_name!="POD", container_name!=""}
Element: container_cpu_usage_seconds_total{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="t3.small",beta_kubernetes_io_os="linux",cluster="teste.k8s.xpto",container="xpto",container_name="xpto",cpu="total",failure_domain_beta_kubernetes_io_region="sa-east-1",failure_domain_beta_kubernetes_io_zone="sa-east-1c",generic="true",id="/kubepods/burstable/poda9999e9999e999e9-/99999e9999999e9",image="nginx",instance="kubestate-dev.internal.xpto",job="kubernetes-cadvisor",kops_k8s_io_instancegroup="nodes",kubernetes_io_arch="amd64",kubernetes_io_hostname="ip-99-999-9-99.sa-east-1.compute.internal",kubernetes_io_os="linux",kubernetes_io_role="node",name="k8s_nginx_nginx-99999e9999999e9",namespace="nmpc",pod="pod-92838241",pod_name="pod-92838241",spot="false"} Value: 22533.2
Now we have what is configured:
kube_pod_container_resource_requests_cpu_cores{pod="xpto-92838241"}
Element: kube_pod_container_resource_requests_cpu_cores{container="xpto",instance="kubestate-dev.internal.xpto",job="k8s-http",namespace="nmpc",node="ip-99-999-999-99.sa-east-1.compute.internal",pod="pod-92838241"} Value: 1
Well, my idea was to use these two metrics to get something close to the percentage, like this:
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="xpto-dev-92838241",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / kube_pod_container_resource_requests_cpu_cores * 100 < 70
Element: no data Value:
But these two metrics do not interact; I cannot understand why, and I cannot find anything about it in the documentation.
Regards
As you can see here, it was only in Kubernetes 1.16 that the cAdvisor metric labels pod_name and container_name were removed and replaced by pod and container, respectively.
As you are using Kubernetes 1.14, you should still use pod_name and container_name.
Let me know if it helps.
Here's the Prometheus Operator, with its documentation and this blog post with a CPU aggregation walkthrough.
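To illustrate that advice (a rough sketch only, not tested against this cluster): one likely reason for the 'no data' is that sum() without a by() clause drops all labels, so the division has nothing to match on. Keeping the pod_name label and mapping kube-state-metrics' pod label onto it would look roughly like this:
max_over_time(
  sum by (pod_name) (
    rate(container_cpu_usage_seconds_total{pod_name="xpto-92838241", container_name!="POD", container_name!=""}[1m])
  )[1h:1m]
)
/ on (pod_name)
  # map kube-state-metrics' pod label to cAdvisor's pod_name so the two sides can match
  label_replace(
    sum by (pod) (kube_pod_container_resource_requests_cpu_cores{pod="xpto-92838241"}),
    "pod_name", "$1", "pod", "(.*)"
  )
* 100
The [1h:1m] subquery resolution is an assumption; the [1h:1s] from the question also works but is far more expensive.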
I solved my problem with vector matching.
max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / on(pod_name) group_left(container_name) kube_pod_container_resource_requests_cpu_cores{pod="pod-name-here"}
thank you all
The following PromQL query returns per-pod CPU usage as a percentage of its configured requests:
100 * sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
/
sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
The following query returns the maximum per-pod CPU usage over the last hour:
max_over_time((
  100 * sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
    /
  sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
)[1h:5m])
Note that the second query is basically the first query wrapped into max_over_time((...)[1h:5m]). Such a construction is called a subquery. It may be slower than the original query.
I can't seem to figure out the Prometheus query to calculate the single value of, say, average CPU usage per instance over a time period and create the Grafana table out of it:
Period: last 3h
Instance A: CPU usage A
Instance B: CPU usage B
Simply put, I want to:
select a time period in Grafana
have Prometheus average the values per instance within that period to a single value
use that data to populate a Grafana table
Any hints?
Thanks!
To answer myself:
avg_over_time(instance:cpu_usage:irate[$__range])
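Note that instance:cpu_usage:irate here is a recording rule rather than a built-in metric. A minimal sketch of what such a rule might look like (the expression itself is an assumption; adapt it to whatever CPU metric you actually scrape):
groups:
  - name: cpu
    rules:
      - record: instance:cpu_usage:irate
        expr: sum by (instance) (irate(node_cpu_seconds_total{mode!="idle"}[5m]))
With a rule like that in place, avg_over_time(...[$__range]) collapses it to a single averaged value per instance for the period selected in Grafana, which is what the table needs.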
So, for example, if I wanted to get the CPU utilisation, does that mean this PromQL will work well?
(
  (
    count(count(node_cpu_seconds_total{job="vi-prod-node-exporter-ec2-vsat2-spen"}) by (cpu))
    -
    avg(sum by (mode)(rate(node_cpu_seconds_total{mode='idle',job="vi-prod-node-exporter-ec2-vsat2-spen"}[$__rate_interval])))
  ) * 100
)
/
count(count(node_cpu_seconds_total{job="vi-prod-node-exporter-ec2-vsat2-spen"}) by (cpu))
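For comparison, a shorter form that is commonly used for the same idea and should give roughly the same result (assuming exactly one idle series per CPU per instance):
100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle", job="vi-prod-node-exporter-ec2-vsat2-spen"}[$__rate_interval])))
Adding by (instance) to the avg gives the same figure per instance instead of for the whole job.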
I have an InfluxDB table consisting of
> SELECT * FROM results
name: results
time artnum duration
---- ------ --------
1539084104865933709 1234 34
1539084151822395648 1234 81
1539084449707598963 2345 56
1539084449707598123 2345 52
and other tags. Both artnum and duration are fields (that is changeable, though). I'm now trying to create a query (to use in Grafana) that gives me the following result, with a calculated mean() and the number of measurements for each artnum:
artnum  mean_duration  no. measurements
------  -------------  ----------------
1234    58             2
2345    54             2
First of all: is it possible to exclude the time column? Secondly, what is the InfluxDB way to create such a table? I started with
SELECT mean("duration"), "artnum" FROM "results"
resulting in ERR: mixing aggregate and non-aggregate queries is not supported. Then I found https://docs.influxdata.com/influxdb/v1.6/guides/downsampling_and_retention/, which looked like what I wanted to do. I then created an infinite retention policy (duration 0s) and a continuous query
> CREATE CONTINUOUS QUERY "cq" ON "test" BEGIN
SELECT mean("duration"),"artnum"
INTO infinite.mean_duration
FROM infinite.test
GROUP BY time(1m)
END
I followed the instructions, but after I fed some data to the database and waited for a minute, SELECT * FROM "infinite"."mean_duration" did not return anything.
Is this approach the right one, or should I look somewhere else? The end goal is to see the updated table in Grafana, refreshing once a minute.
InfluxDB is a time series database, so you really need the time dimension, also in the response. You will have a hard time with Grafana if your query returns non-time-series data, so don't try to remove time from the query. A better option is to hide the time column in the Grafana table panel: use column styles and set Type: Hidden.
InfluxDB doesn't have tables, but measurements. I guess you only need a query with proper grouping; no advanced continuous queries, etc. Try and improve this query*:
SELECT
MEAN("duration"),
COUNT("duration")
FROM results
GROUP BY "artnum" fill(null)
*You may have a problem with grouping in your case, because artnum is an InfluxDB field; a better option is to save artnum as an InfluxDB tag.
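For illustration, writing artnum as a tag instead of a field would look like this in line protocol (measurement, values and timestamps taken from the question's data):
results,artnum=1234 duration=34 1539084104865933709
results,artnum=1234 duration=81 1539084151822395648
results,artnum=2345 duration=56 1539084449707598963
results,artnum=2345 duration=52 1539084449707598123
With artnum as a tag, GROUP BY "artnum" returns one series per article number, which matches the desired table.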