Grafana query error for Prometheus Node Exporter

A query for Prometheus Node Exporter RAM usage that used to work has stopped working, and I would like to ask for hints on where to debug further.
100 - ((node_memory_MemAvailable_bytes{instance="localhost:9100",job="Node Exporter"} * 100) / node_memory_MemTotal_bytes{instance="localhost:9100",job="Node Exporter"})
My dashboard shows nothing because of the failed query:
The query used to work, I changed nothing, and I don't know how to debug the issue.
The Node Exporter target in Prometheus is working:
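One way to narrow this down (assuming the instance is still scraped as localhost:9100) is to run the pieces of the expression separately in the Prometheus expression browser, without the job matcher, and see which one returns no data:
up{instance="localhost:9100"}
node_memory_MemAvailable_bytes{instance="localhost:9100"}
node_memory_MemTotal_bytes{instance="localhost:9100"}
If up is 1 but the memory series are empty, or the series come back with a different job value, the job="Node Exporter" matcher in the dashboard query may no longer match the label Prometheus actually records.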

Related

Request URI too large for Grafana - kubernetes dashboard

We are running nearly 100 instances in production for a Kubernetes cluster and use a Prometheus server to build Grafana dashboards. To monitor disk usage, the query below is used:
(sum(node_filesystem_size_bytes{instance=~"$Instance"}) - sum(node_filesystem_free_bytes{instance=~"$Instance"})) / sum(node_filesystem_size_bytes{instance=~"$Instance"})
As the $Instance variable is expanded with the IPs of nearly 80 instances, I am getting the error "Request URI too large". Can someone help fix this issue?
You only need to specify the instances once and use the on matching operator to get their matching series (sum by(instance) keeps the instance label so the matching works, giving one ratio per instance):
(sum by(instance) (node_filesystem_size_bytes{instance=~"$Instance"})
- on(instance) sum by(instance) (node_filesystem_free_bytes))
/ on(instance) sum by(instance) (node_filesystem_size_bytes)
Consider also adding a unifying label to your time series so you can do something like ...{instance_type="group-A"} instead of explicitly specifying instances.
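For example, with a hypothetical instance_type label attached to the node exporter targets at relabeling time, the same per-instance calculation needs no instance list at all:
(sum by(instance) (node_filesystem_size_bytes{instance_type="group-A"})
- on(instance) sum by(instance) (node_filesystem_free_bytes{instance_type="group-A"}))
/ on(instance) sum by(instance) (node_filesystem_size_bytes{instance_type="group-A"})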

Grafana (v. 8.4.1) not connecting to InfluxDB (v.2.1.1) database

I have three Docker containers running. The first runs a Python script that writes data from a sensor to the InfluxDB emon_data bucket running in a second container. This works perfectly and I can run queries and create dashboards within InfluxDB. The third container runs Grafana. The data source setting in Grafana that establishes the connection to InfluxDB seems to be correct, as it confirms having a connection to the data source - see picture.
However, when I go to set up a dashboard in Grafana it keeps throwing an error stating that the database cannot be found - see picture.
I have tried to find information on this error but have not found much, and what I did find seems to apply to much older versions of InfluxDB and Grafana. Any suggestions or pointers on how to resolve this would be much appreciated.

How to query the total memory available to kubernetes nodes

Many Grafana dashboards use a metric named machine_memory_bytes to query the total memory available to a node. Unfortunately this metric seems to be deprecated and is not exported any more.
I cannot find any substitute to get the desired information except node_memory_MemTotal_bytes from the node exporter, but that is not very helpful when it comes to building Grafana dashboards.
Is there any way to query the desired information from cAdvisor?
After a little more research I found (on Kubernetes 1.19) that kube_node_status_allocatable_memory_bytes is suitable for the job.
Additionally, kube_node_status_capacity_cpu_cores can be used to calculate CPU utilisation.
Check out the official list of node metrics.
Here is example usage:
CPU: kube_node_status_capacity{resource="cpu", unit="core"}
Memory: kube_node_status_capacity{resource="memory", unit="byte"}
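As a rough sketch (assuming a kube-state-metrics v2 setup, where both metrics carry a node label), the fraction of allocatable memory requested on each node could be queried like this:
sum by(node) (kube_pod_container_resource_requests{resource="memory", unit="byte"})
/ on(node) kube_node_status_allocatable{resource="memory", unit="byte"}
The label names and availability of these metrics depend on the kube-state-metrics version, so treat this as a starting point rather than a drop-in dashboard query.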

Prometheus Exporter is unreachable, what is way to investigate?

I would assume this is a more or less common case.
Sometimes we observe gaps in the time series data in Prometheus.
After investigation, we found:
Prometheus was up the whole time, and data from other exporters was present.
According to the "up" metric, the exporter was unreachable.
The exporter pod was alive.
The exporter application itself also appears to have been alive, judging by messages in syslog.
Hence, I can conclude that either we have a network problem, which I have no idea how to debug in Kubernetes, or Prometheus ignores one exporter (usually the same one) from time to time.
Thanks for any hints.
One thing you can do to confirm the availability of the exporter is to scrape it periodically yourself (with a curl script, for example), or to use a scraping tool such as Metricat (https://metricat.dev/).
If you set the interval small enough, you might see short windows of unavailability.
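From the Prometheus side, the gaps can also be pinned down with a range query over the up metric (the job value here is only a placeholder for the affected exporter's job name):
min_over_time(up{job="my-exporter"}[5m]) == 0
Any result marks a 5-minute window in which at least one scrape of that exporter failed, which helps correlate the gaps with events in the cluster.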

Datadog: Slow queries from MongoDB

I have a MongoDB instance using the database profiler to collect the slowest queries. How can I send this information to Datadog and analyze it in my Datadog dashboard?
Once the Datadog Agent is properly installed on your server, you can use the custom metrics feature to let Datadog read your query results into a custom metric and then use that metric to build a dashboard.
You can find more on custom metrics in the Datadog documentation.
Custom metrics are configured with YAML files, so be careful with the formatting of the YAML file that will hold your custom metric.