Determining average JVM metrics for a set of pods in Kubernetes

I am analyzing JVM metrics on a Prometheus dashboard for a service deployed in Kubernetes. There are several pods, each running an instance of the service.
When I do:
jvm_memory_max_bytes{area="heap", app="my-application",job="my-job"}
This fetches all the entries for all the pods.
Now I apply sum function:
sum(jvm_memory_max_bytes{area="heap", app="my-application",job="my-job"})
It sums up all the results from the first query.
My objective is to find the average JVM statistics, which probably requires knowing the number of running pods.
In Grafana, I searched the kube_* metrics but couldn't find a suitable one.
How can I get average jvm metrics for a set of pods?

You can get the number of running pods with:
sum(kube_pod_status_phase{phase="Running"})
So your final query might look like this:
sum(jvm_memory_max_bytes{area="heap", app="my-application",job="my-job"}) / sum(kube_pod_status_phase{phase="Running"})
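Note that kube_pod_status_phase{phase="Running"} counts every running pod in the cluster unless you filter it further. As a hedged alternative, if each series returned by the first query corresponds to one pod of your application, Prometheus can compute the average directly:
avg(jvm_memory_max_bytes{area="heap", app="my-application",job="my-job"})
which is equivalent to dividing the sum by the number of matching series:
sum(jvm_memory_max_bytes{area="heap", app="my-application",job="my-job"}) / count(jvm_memory_max_bytes{area="heap", app="my-application",job="my-job"})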

Related

k8s resources overview planning

We are planning on delivering small k8s clusters to clients with our application on top.
Currently we are struggling to see what resources we actually need. On average we run 20-30 pods in the system.
Getting the resource requests and limits per deployment is not hard.
What is hard is getting a full view of all requests and limits for all pods running in the cluster, at least in an automated way.
Is there a prebuilt Grafana dashboard or some kind of kubectl command that would collect all of the requests and limits for all pods running in the k8s cluster?
The result should be a "nice" report of all resource requirements.
Since we are delivering a "static" cluster to clients, there are no HPA rules in our clusters.
So far we have done a manual check for each pod and written the results into an Excel table, which is neither time-efficient nor repeatable.
Hi skolko, you can use Prometheus to monitor your Kubernetes cluster. There are various options available, such as monitoring individual deployments, monitoring the entire cluster, and monitoring each pod individually. Follow this document to set up Prometheus monitoring for Kubernetes, and this document for an overview of the metrics available for monitoring.
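If you only need a one-off report rather than a full monitoring stack, a hedged kubectl-only sketch (the output columns are just an illustration) is to dump every pod's requests and limits with a JSONPath template and paste the result into your spreadsheet:
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests.cpu}{"\t"}{.spec.containers[*].resources.requests.memory}{"\t"}{.spec.containers[*].resources.limits.cpu}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}'
kubectl describe nodes also prints an "Allocated resources" section for each node, which gives the same totals from the node side.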

Finding the duration of Kubernetes Pods in Datadog?

I am trying to aggregate the time pods spend in the Running status, grouped by another custom business-logic tag, so that I can calculate how much it costs me to run this service.
I have tried to use docker.uptime, but it has not been fruitful, as I imagine multiple containers can run in parallel per node. I saw that Datadog KSM provides pods.age and pods.uptime metrics, but they do not appear in the Datadog metric explorer.
I do not want to use Prometheus/Grafana to do this because I think this should be possible with Datadog.

You can aggregate container uptime per pod with:
avg:container.uptime{kube_namespace:<your namespace>} by {pod_name}
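To fold in the business-logic tag from the question, a hedged variation of the same query groups by that tag instead of by pod (the tag name business_unit is hypothetical):
sum:container.uptime{kube_namespace:<your namespace>} by {business_unit}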

How to get information about scaled up pods in a Kubernetes cluster

I have a Kubernetes cluster and I am trying to figure out, using kubectl, by how much the pods have been scaled up.
What is a possible way to get the details of all the scaled-up and scaled-down pods within a month?
That is not information Kubernetes records. The Events system keeps some debugging messages about pod startup and sometimes shutdown, but those are only retained for a few hours. For long-term metrics, look at something like Prometheus + kube-state-metrics.
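For example, with kube-state-metrics scraped by Prometheus and at least a month of retention, a hedged sketch of the replica history for one Deployment (the deployment name is hypothetical) is:
max_over_time(kube_deployment_status_replicas{deployment="my-deployment"}[30d])
min_over_time(kube_deployment_status_replicas{deployment="my-deployment"}[30d])
These give the highest and lowest replica counts observed over the last 30 days; graphing kube_deployment_status_replicas directly shows when each scale-up and scale-down happened.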
If you have a k8s audit policy in place, you can find the events that pass through the k8s API by filtering them in Cloud Audit Logs or Elasticsearch; it depends on your current setup.

Duplicate metrics with multiple instances of kube-state-metrics

Problem:
Duplicate data when querying Prometheus for metrics from kube-state-metrics.
Sample query and result with 3 instances of kube-state-metrics running:
Query:
kube_pod_container_resource_requests_cpu_cores{namespace="ns-dummy"}
Metrics
kube_pod_container_resource_requests_cpu_cores{container="appname",endpoint="http",instance="172.232.35.142:8080",job="kube-state-metrics",namespace="ns-dummy",node="ip-172-232-34-25.ec2.internal",pod="app1-appname-6bd9d8d978-gfk7f",service="prom-kube-state-metrics"}
1
kube_pod_container_resource_requests_cpu_cores{container="appname",endpoint="http",instance="172.232.35.142:8080",job="kube-state-metrics",namespace="ns-dummy",node="ip-172-232-35-22.ec2.internal",pod="app2-appname-ccbdfc7c8-g9x6s",service="prom-kube-state-metrics"}
1
kube_pod_container_resource_requests_cpu_cores{container="appname",endpoint="http",instance="172.232.35.17:8080",job="kube-state-metrics",namespace="ns-dummy",node="ip-172-232-34-25.ec2.internal",pod="app1-appname-6bd9d8d978-gfk7f",service="prom-kube-state-metrics"}
1
kube_pod_container_resource_requests_cpu_cores{container="appname",endpoint="http",instance="172.232.35.17:8080",job="kube-state-metrics",namespace="ns-dummy",node="ip-172-232-35-22.ec2.internal",pod="app2-appname-ccbdfc7c8-g9x6s",service="prom-kube-state-metrics"}
1
kube_pod_container_resource_requests_cpu_cores{container="appname",endpoint="http",instance="172.232.37.171:8080",job="kube-state-metrics",namespace="ns-dummy",node="ip-172-232-34-25.ec2.internal",pod="app1-appname-6bd9d8d978-gfk7f",service="prom-kube-state-metrics"}
1
kube_pod_container_resource_requests_cpu_cores{container="appname",endpoint="http",instance="172.232.37.171:8080",job="kube-state-metrics",namespace="ns-dummy",node="ip-172-232-35-22.ec2.internal",pod="app2-appname-ccbdfc7c8-g9x6s",service="prom-kube-state-metrics"}
Observation:
Every metric shows up N times when N pods of kube-state-metrics are running. With a single pod running, we get the correct data.
Possible solutions:
Scale down to a single instance of kube-state-metrics. (Reduced availability is a concern.)
Enable sharding. (Solves the duplication problem, but availability is still reduced.)
According to the docs, for horizontal scaling we have to pass sharding arguments to the pods.
Shards are zero-indexed, so we have to pass each pod its shard index and the total number of shards.
We are using the Helm chart, and it is deployed as a Deployment.
Questions:
How can we pass different arguments to different pods in this scenario, if that is possible? (see the sketch after this list)
Should we be worried about the availability of kube-state-metrics, considering the self-healing nature of k8s workloads?
When should we really scale it to multiple instances, and how?
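On the first question, a hedged sketch assuming you deploy kube-state-metrics from the prometheus-community Helm chart: recent versions of that chart expose an autosharding option which switches the workload to a StatefulSet and lets each pod derive its shard index and the total shard count from its StatefulSet ordinal, so you do not have to pass different arguments by hand:
helm upgrade --install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring --set replicas=3 --set autosharding.enabled=true
Check your chart version's values for the exact key.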
You could use a 'self-healing' Deployment with only a single replica of kube-state-metrics: if the container goes down, the Deployment will start a new one. A brief outage only matters if your cluster is very large and produces many object changes per second.
kube-state-metrics is not focused on the health of the individual Kubernetes components, but rather on the health of the various objects inside the cluster, such as deployments, nodes and pods.
For a small cluster there is no problem using it this way. If you really need a highly available monitoring platform, I recommend you take a look at these two articles:
creating a well designed and highly available monitoring stack for kubernetes and
kubernetes monitoring
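Separately, as a query-time workaround for the duplication itself (a hedged sketch: it only deduplicates the query result, it does not reduce the scrape load), you can aggregate away the one label that differs between the copies, which in the samples above is instance:
max without (instance) (kube_pod_container_resource_requests_cpu_cores{namespace="ns-dummy"})
Each group of N identical series then collapses into one, and max is safe here because the duplicated values are identical.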

How to summarize metrics per service?

For example, the kubelet (cAdvisor) metric container_cpu_usage_seconds_total has values broken down by labels (e.g. pod, namespace).
I wonder how to summarize these values per Service (for example, CPU usage per service). I understand that a Service is a set of pods, so it should just be a matter of aggregating the per-pod values up to the Service, but I do not know how.
Is there any aggregation method for Services? Or is process_cpu_seconds_total a kind of per-service aggregation of container_cpu_usage_seconds_total?
Thank you for your help!
What about
sum(rate(container_cpu_usage_seconds_total{job="kubelet", cluster="", namespace="default", pod_name=~"your_pod_name.*"}[3m]))
Taken from kubernetes-mixin
In general, cAdvisor collects metrics about containers and doesn't know anything about Services. If you want to aggregate by Service, you need to manually select the metrics of the Pods that belong to that Service.
For example, if your cAdvisor metrics are in Prometheus, you can use this PromQL query:
sum(rate(container_cpu_usage_seconds_total{pod_name=~"myprefix-.*"}[2m]))
This adds up the CPU usages of all containers of all Pods that have a name starting with myprefix-.
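If you want a per-pod breakdown instead of a single total, a hedged variant keeps the pod label in the aggregation (note that on newer clusters the cAdvisor label is pod rather than pod_name):
sum by (pod_name) (rate(container_cpu_usage_seconds_total{pod_name=~"myprefix-.*"}[2m]))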
Or if you have the Resource Metrics API enabled (i.e. the Metrics Server installed), you can query the CPU usage of a specific Pod (in fractions of a CPU core) with:
kubectl get --raw="/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/{pod}"
To get the total usage of a Service, you would need to iterate through all the Pods of the Service, extract the values, and add them together.
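A hedged kubectl-only sketch of that iteration, assuming the Service's Pods share a label such as app=my-service and that kubectl top reports CPU in millicores:
kubectl top pods -n my-namespace -l app=my-service --no-headers | awk '{ gsub(/m$/, "", $2); total += $2 } END { print total "m total CPU" }'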
In general, Service is a Kubernetes concept and does not exist in cAdvisor, which is an independent project and just happens to be used in Kubernetes.