How can I get correct metric for pod status? - kubernetes

I'm trying to get the pod status in Grafana through Prometheus in a GKE cluster.
kube-state-metrics has been installed together with Prometheus by using the prometheus-community/prometheus and grafana Helm charts.
I tried to query the pod status with kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+"}, but I only get "Running" as a result.
In other words, the resulting graph shows only a straight line at the value 1 for the running server. I can't see when a given pod was Pending or in any state other than Running.
I am interested in the startup phase, after the pod is created but before it is running.
Am I using the query correctly? Is there another query I should use, or could this be due to something in the installation?

If you mean the Pending status for the Pod, I think you should instead use kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+", phase="Pending"}. I'm not sure what it does when you don't put the phase in your request, but I suspect it just renders the number of Pods whatever the state is, which in your case is always 1.
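As a rough sketch (the namespace and pod regexes below are copied from the question and are placeholders for your own values), you could graph the non-Running phases explicitly:
# Only the Pending phase (1 while the pod is Pending, 0 otherwise)
kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+", phase="Pending"}
# All phases other than Running, keeping only the series that are currently active
kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+", phase!="Running"} == 1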

Related

Grafana consolidate pod metrics

I have a Kubernetes Pod which serves metrics for prometheus.
Once in a while I update the release and thus the pod gets restarted.
Prometheus saves the metrics but labels them according to the new pod name:
this is by Prometheus' design, so it's OK.
But if I display this data with Grafana, each redeploy shows up as a separate series (the pods have been redeployed twice):
So, for example, the metric "Registered Users" now has three different colors because its data comes from three different pods.
I have some options. I could disregard the pod name in Prometheus, but I consider that bad practice because I don't want to lose data.
So I think I have to consolidate this in Grafana. But how can I tell Grafana that I want to merge all data with container name api-gateway-narkuma and disregard the pod label?
You can do something like
max(users) without (instance, pod)
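A few hedged variants of the same idea (the metric name users comes from the answer above; the container label name is an assumption that depends on your scrape/relabel configuration):
# Aggregate away the labels that change on every redeploy
max(users) without (instance, pod)   # gauge that every replica reports identically
sum(users) without (instance, pod)   # per-pod values you want added together
# Or keep only the label you care about, e.g. the container name
max(users) by (container)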

check if grafana agent operator is up and running

I have a grafana agent operator and I was trying to create some metrics to monitor if it's up.
If I had a simple Grafana Agent process I would just use something along the lines of absent(up{instance="1.2.3.4:8000"} == 1), but with the Grafana Agent operator the components are dynamic.
I don't see issues with monitoring the metrics part. For example, if the grafana-agent-0 StatefulSet pod for metrics goes down and a new pod is created, the name stays the same.
But for logs, the Grafana Agent operator runs a pod (via a DaemonSet) on every node, with a different name each time.
In the logs case, if the pod grafana-agent-log-vsq5r goes down or a new node is added to the cluster, I get a new pod to monitor with a different name, which makes it hard to keep monitoring in step with changes in the cluster. Has anyone already had this issue, or does anyone know a good way of tackling it?
I would like to suggest using Labels in Grafana Alerting
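For example, rather than keying alerts on individual pod names, a rough PromQL sketch could compare the number of log-agent pods that are up against the number of nodes; the job label value here is an assumption and depends on your scrape configuration:
# Alert when fewer log agents are up than there are nodes
count(up{job="grafana-agent-logs"} == 1) < count(kube_node_info)
# Or alert when the whole job disappears
absent(up{job="grafana-agent-logs"})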

Is there any kubectl command to poll until all the pods roll to the new code?

I am building a deploy pipeline. I need a kubectl command that tells me that the rollout has completed on all the pods, so I can then deploy to the next stage.
The Deployment documentation suggests kubectl rollout status, which among other things will return a non-zero exit code if the deployment isn't complete. kubectl get deployment will print out similar information (how many replicas are expected, available, and up-to-date), and you can add a -w option to watch it.
For this purpose you can also consider using one of the Kubernetes APIs. You can "get" or "watch" the deployment object, and get back something matching the structure of a Deployment object. Using that you can again monitor the replica count, or the embedded condition list, and decide if it's ready or not. If you're using the "watch" API you'll continue to get updates as the object status changes.
The one trick here is detecting failed deployments. Say you're deploying a pod that depends on a database; usual practice is to configure the pod with the hostname you expect the database to have, and just crash (and get restarted) if it's not there yet. You can briefly wind up in CrashLoopBackOff state when this happens. If your application or deployment is totally wrong, of course, you'll also wind up in CrashLoopBackOff state, and your deployment will stop progressing. There's not an easy way to tell these two cases apart; consider an absolute timeout.
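A minimal shell sketch of that approach (my-server is a placeholder deployment name; adjust the timeout to your own absolute limit):
# Block until the rollout finishes; exits non-zero on failure or timeout
kubectl rollout status deployment/my-server --timeout=5m
# Roughly equivalent, waiting on the Deployment's Available condition
kubectl wait --for=condition=Available deployment/my-server --timeout=5m
# Or watch the replica counts by hand
kubectl get deployment my-server -w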

Kubernetes / Prometheus Metrics Mismatch

I have an application running in Kubernetes (Azure AKS) in which each pod contains two containers. I also have Grafana set up to display various metrics some of which are coming from Prometheus. I'm trying to troubleshoot a separate issue and in doing so I've noticed that some metrics don't appear to match up between data sources.
For example, kube_deployment_status_replicas_available returns the value 30, whereas kubectl -n XXXXXXXX get pod lists 100 pods, all of which are Running, and kube_deployment_status_replicas_unavailable returns 0. Also, if I get the deployment in question using kubectl, I see the expected value.
$ kubectl get deployment XXXXXXXX
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
XXXXXXXX   100       100       100          100         49d
There are other applications (namespaces) in the same cluster where all the values correlate correctly so I'm not sure where the fault may be or if there's any way to know for sure which value is the correct one. Any guidance would be appreciated. Thanks
Based on having the kube_deployment_status_replicas_available metric I assume that you have Prometheus scraping your metrics from kube-state-metrics. It sounds like there's something quirky about its deployment. It could be:
Cached metric data
And/or simply it can't pull current metrics from the kube-apiserver
I would (rough commands for these steps are sketched after the list):
Check the version that you are running for kube-state-metrics and see if it's compatible with your K8s version.
Restart the kube-state-metrics pod.
Check the logs with kubectl logs for the kube-state-metrics pod
Check the Prometheus logs
If you don't see anything try starting Prometheus with the --log.level=debug flag.
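A rough sketch of those checks (the kube-system and monitoring namespaces and the resource names below are assumptions; adjust them to match your install):
kubectl -n kube-system get deploy kube-state-metrics -o wide       # check the image/version
kubectl -n kube-system rollout restart deploy kube-state-metrics   # restart the pod
kubectl -n kube-system logs deploy/kube-state-metrics              # kube-state-metrics logs
kubectl -n monitoring logs deploy/prometheus-server                # Prometheus logs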
Hope it helps.

How to get number of pods running in prometheus

I am scraping Kubernetes metrics with Prometheus and need to extract the number of running pods.
I can see the container_last_seen metric, but how should I get the number of running pods? Can someone help with this?
If you need to get the number of running pods, you can use a metric from the list of pod metrics at https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md (to get info purely about pods, it makes sense to use pod-specific metrics).
For example if you need to get the number of pods per namespace, it'll be:
count(kube_pod_info{namespace="$namespace_name"}) by (namespace)
To get the number of all pods running on the cluster, then just do:
count(kube_pod_info)
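If you specifically want pods in the Running phase rather than every pod known to the API server, a close variant (sketch) would be:
# kube_pod_status_phase is 1 for a pod's current phase, so summing counts the Running pods
sum(kube_pod_status_phase{phase="Running"}) by (namespace)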
Assuming you want to display that in Grafana according to your question tags, from this Kubernetes App Metrics dashboard for example:
count(count(container_memory_usage_bytes{container_name="$container", namespace="$namespace"}) by (pod_name))
You can just import the dashboard and play with the queries.
Depending on your configuration/deployment, you can adjust the container_name and namespace variables; grouping by (pod_name) and counting does the trick. A label other than pod_name can be used, as long as it's shared between the pods you want to count.
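Note that on newer Kubernetes versions the cAdvisor labels are container and pod rather than container_name and pod_name, so (as an adjustment you may need) the equivalent query becomes:
count(count(container_memory_usage_bytes{container="$container", namespace="$namespace"}) by (pod))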
If you want to see only the number of "deployed" pods in some namespace, you can use the solutions in previous answers.
My use case was to see the current running pods in some namespace and below is my solution:
min_over_time(sum(group(kube_pod_container_status_ready{namespace="BC_NAME"}) by (pod,uid))[5m:1m]) OR on() vector(0)
Please replace BC_NAME with your namespace name.
The [5m:1m] timespan lets you fine-tune the data.
If no data is found (no pod is currently running), it returns 0.
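A close variant (sketch) that counts only pods whose phase is Running, with the same zero fallback when nothing matches:
min_over_time(sum(kube_pod_status_phase{namespace="BC_NAME", phase="Running"})[5m:1m]) OR on() vector(0)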