Grafana variable still catches old metrics info

I use Grafana + Prometheus to monitor Kubernetes pods. When a pod is removed, I clean up all the metrics belonging to it, but I can still see them in a Grafana variable.
For example, I defined a variable named node with the query expression {instance=~"(.+)", job="node status"}, which matches all metrics, and I use the regex /instance="([^"]+):9100"/ to extract the IP of each monitored target. When I click the node label on the dashboard, it displays all the target IPs. When one of these targets is removed, I use the HTTP API provided by Prometheus to delete all the metrics belonging to that target, but when I click the node label it still displays the removed target's IP. Why, and how can I delete this IP?

It seems that the Prometheus targets are not updated, even though some of the pods have been evicted. You can check this on the Prometheus http://yourprometheus/targets page.
Does Prometheus run inside the K8s cluster?
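For reference, the deletion flow described in the question goes through the TSDB admin API, which only responds when Prometheus is started with the --web.enable-admin-api flag. A minimal sketch (the instance value is illustrative; curl's -g flag stops it from glob-interpreting the brackets in match[]):
curl -g -X POST 'http://yourprometheus/api/v1/admin/tsdb/delete_series?match[]={instance="10.0.0.1:9100"}'
curl -X POST 'http://yourprometheus/api/v1/admin/tsdb/clean_tombstones'
Even after a successful delete, if the target still shows as up on the /targets page, Prometheus keeps scraping it, so the series (and the variable's IP entry) reappear on the next scrape.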

Related

Check if the Grafana Agent Operator is up and running

I have a grafana agent operator and I was trying to create some metrics to monitor if it's up.
If I had a simple Grafana Agent process I would just use something along the lines of absent(up{instance="1.2.3.4:8000"} == 1), but with the Grafana Agent Operator the components are dynamic.
I don't see issues with monitoring the metrics part. For example, if the grafana-agent-0 pod of the metrics StatefulSet goes down and a new pod is built, the name stays the same.
But for logs, the Grafana Agent Operator runs a pod (via a DaemonSet) on every node, with a different name each time.
In the log case, if a pod such as grafana-agent-log-vsq5r goes down, or a new node is added to the cluster, I get a new pod to monitor with a different name, which makes it hard to track changes in the cluster. Has anyone already had this issue, or does anyone know a good way of tackling it?
I would like to suggest using labels in Grafana Alerting.
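One label-based way to side-step the changing DaemonSet pod names is to alert on coverage rather than on individual instances: compare how many log agents report up against how many nodes exist. A sketch, assuming kube-state-metrics is installed and that the log agents share a job label named grafana-agent-logs (both names are assumptions; adjust them to your setup):
sum(up{job="grafana-agent-logs"}) < count(kube_node_info)
This fires whenever fewer agents are healthy than there are nodes, regardless of what the individual pods are called; pairing it with absent(up{job="grafana-agent-logs"}) also covers the case where every agent disappears at once.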

How can I get the correct metric for pod status?

I'm trying to get the pod status in Grafana through Prometheus in a GKE cluster.
kube-state-metrics has been installed together with Prometheus by using the prometheus-community/prometheus and grafana Helm charts.
I tried to get the pod status with kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+"}, but I get only "Running" as a result.
In other words, the resulting graph shows only a straight line at the value 1 for the running server. I can't see when the given pod was Pending or in another state different from Running.
I am interested in the starting phase, after the pod is created, but before it is running.
Am I using the query correctly? Is there another query or it could be due to something in the installation?
If you mean the Pending status for the Pod, you should instead use kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+", phase="Pending"}. Without the phase label, the metric returns one series per pod and per phase, with value 1 for the pod's current phase and 0 for all the others, which is why your running server shows up as a flat line at 1.
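If you want to see every phase in one panel rather than a single phase, you can keep all the phase series and filter on the value instead; a sketch reusing the asker's selectors:
kube_pod_status_phase{exported_namespace=~".+-my-namespace", pod=~"my-server-.+"} == 1
Graphed over time, this returns only the series whose value is 1 at each instant, so the phase label of the result tells you which state the pod was in. Note that a short Pending spell only shows up if it lasts long enough to be caught by the scrape interval.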

Having an issue while creating a custom dashboard in Grafana (data source is Prometheus)

I have set up Prometheus and Grafana for monitoring my Kubernetes cluster and everything works fine. Then I created a custom dashboard in Grafana for my application. The metric available in Prometheus is as follows, and I have added the same in Grafana as the metric:
sum(irate(container_cpu_usage_seconds_total{namespace="test", pod_name="my-app-65c7d6576b-5pgjq", container_name!="POD"}[1m])) by (container_name)
The issue is that my application runs as a pod in Kubernetes, so when the pod is deleted or recreated its name changes and no longer matches the pod name hard-coded in the metric above ("my-app-65c7d6576b-5pgjq"). The data for the metric then stops appearing and I have to add a new metric in Grafana again. Please let me know how I can overcome this situation.
Answer was provided by manu thankachan:
I have done it. Made some change in the query as follow:
sum(irate(container_cpu_usage_seconds_total{namespace="test", container_name="my-app", container_name!="POD"}[1m])) by (container_name)
If a pod is created directly (not as part of a Deployment), then its name is exactly the one we specify.
If the pod is part of a Deployment, its name carries a unique string from the ReplicaSet and also ends with 5 random characters to keep the name unique.
So always try to use the container_name label, or, if your Kubernetes version is newer than v1.16.0, use the container label.
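If you also want the numbers broken down per pod rather than per container, another option is to match the stable Deployment prefix of the pod name with a regex. A sketch using the newer label names (swap in pod_name/container_name on pre-1.16 clusters):
sum(irate(container_cpu_usage_seconds_total{namespace="test", pod=~"my-app-.*", container!="POD"}[1m])) by (pod)
The regex survives pod recreation because the Deployment name prefix stays constant while only the ReplicaSet hash and random suffix change.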

Why does my Prometheus expression not find data?

I use Prometheus to monitor a Kubernetes cluster. When I use sum(container_fs_reads_total), the result is 0. How can I find a pod's filesystem reads per second?
The Prometheus graphing dashboard may or may not be getting the values for that metric. Since this metric comes from cAdvisor, work through the following checks:
Verify the k8s pods associated with cAdvisor are up and running.
Check that your cAdvisor web site has data under /containers for the metric.
Verify in the config map for Prometheus that you are scraping /containers inside the scrape_config.
Once you have the Prometheus dashboard up, go to the Graph tab and see if the metric has any values for the last couple of days or so.
Then check the Targets tab and make sure the cAdvisor host is a target and is up.
Those are some suggestions to narrow down your search for verifying the data is being collected and scraped.
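As for turning the counter into a per-second figure: summing a raw counter only gives the accumulated total, so apply a rate over a window and group by pod. A sketch, assuming the post-1.16 pod label (use pod_name on older clusters):
sum(rate(container_fs_reads_total[5m])) by (pod)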

How to update services in Kubernetes?

Background:
Let's say I have a replication controller with some pods. When these pods were first deployed, they were configured to expose port 8080. A service (of type LoadBalancer) was also created to expose port 8080 publicly. Later we decide that we want to expose an additional port from the pods (port 8081). We change the pod definition and do a rolling update with no downtime, great! But we want this port to be publicly accessible as well.
Question:
Is there a good way to update a service without downtime (for example, by adding an additional port to expose)? If I just do:
kubectl replace -f my-service-with-an-additional-port.json
I get the following error message:
Replace failed: spec.clusterIP: invalid value '': field is immutable
If you name the ports in the pods, you can specify the target ports by name in the service rather than by number, and then the same service can direct traffic to pods using different port numbers.
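A minimal sketch of the named-port approach (all names, images, and port numbers here are illustrative):
# In the pod template, give each container port a name:
containers:
- name: my-app
  image: my-app:latest
  ports:
  - name: http
    containerPort: 8080
  - name: admin
    containerPort: 8081
---
# In the Service, reference the targetPort by name:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - name: http
    port: 8080
    targetPort: http
  - name: admin
    port: 8081
    targetPort: admin
Because the Service resolves targetPort by name at runtime, you can later remap a name to a different containerPort in the pods without touching the Service.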
Or, as Yu-Ju suggested, you can do a read-modify-write of the live state of the service, such as via kubectl edit. The error message was due to not specifying the clusterIP that had already been set.
In such a case you can create a second service to expose the second port; it won't conflict with the first one and you'll have no downtime.
If you have more than one pod running for the same service, you may use Kubernetes Engine within the Google Cloud Console as follows:
Under "Workloads", select your Replication Controller. Within that screen, click "EDIT" then update and save your replication controller details.
Under "Discover & Load Balancing", select your Service. Within that screen, click "EDIT" then update and save your service details. If you changed ports you should see those reflecting under the column "Endpoints" when you've finished editing the details.
Assuming you have at least two pods running on a machine (and a restart policy of Always), if you wanted to update the pods with the new configuration or container image:
Under "Workloads", select your Replication Controller. Within that screen, scroll down to "Managed pods". Select a pod, then in that screen click "KUBECTL" -> "Delete". Note, you can do the same with the command line: kubectl delete pod <podname>. This would delete and restart it with the newly downloaded configuration and container image. Delete each pod one at a time, making sure to wait until a pod has fully restarted and working (i.e. check logs, debug) etc, before deleting the next.