Prometheus: extract a substring from a label value? - grafana

I have similar services on the same host. For example, I have two MySQL servers named mysql01 and mysql02, and I have installed two mysqld_exporter instances, one for each MySQL server. The values of the instance label are instance="<host>-mysql01" and instance="<host>-mysql02". I have also installed node_exporter on the same host.
I want to relate mysqld_exporter series to node_exporter series. For example, on the same Grafana dashboard, dedicated to mysql0x, I want to visualize metrics about swap memory and the buffer pool size. So I need to visualize the following series:
mysql_global_variables_innodb_buffer_pool_size{instance="<host>-mysql0x"}
node_memory_SwapTotal_bytes{instance="<host>"}
How can I extract from instance label value <host>-mysql0x the host part?
As a bonus question: is there a best practice for labeling similar services on the same host?

You can do that with a regex capture group on the Grafana template variable, as described in the documentation:
http://docs.grafana.org/reference/templating/#filter-and-modify-the-options-using-a-regex-capture-group-to-return-part-of-the-text
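A minimal sketch of what such a capture group does, written in Python so it can be tried outside Grafana. It assumes instance values of the form `<host>-mysqlNN` as in the question; the host name `db-host-01` and the helper name are hypothetical. In Grafana itself, the variable query could be something like `label_values(mysql_global_variables_innodb_buffer_pool_size, instance)` with a Regex field of `/(.*)-mysql\d+/`, and a variable (hypothetically named `host`) could then be used as `node_memory_SwapTotal_bytes{instance="$host"}`.

```python
import re
from typing import Optional

# Assumption: instance labels look like "<host>-mysqlNN", as in the question.
# The single capture group keeps everything before the trailing "-mysql<digits>".
HOST_RE = re.compile(r"^(.*)-mysql\d+$")


def host_from_instance(instance: str) -> Optional[str]:
    """Return the <host> part of an instance label value, or None if it doesn't match."""
    match = HOST_RE.match(instance)
    return match.group(1) if match else None


for value in ["db-host-01-mysql01", "db-host-01-mysql02", "db-host-01"]:
    print(value, "->", host_from_instance(value))
```

The greedy `(.*)` still works for host names that contain hyphens, because the regex engine backtracks until only the final `-mysql<digits>` suffix is left outside the group.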

Related

How to force cAdvisor not to export empty labels to Prometheus in a Kubernetes cluster?

We have a problem: cAdvisor is installed as a DaemonSet with a hostPort. When we request metrics at, for example, worker5:31194/metrics, the request takes a very long time, about 40 seconds. As far as I understand, the problem is that cAdvisor exports a large number of extra empty labels.
A sample series looks like this:
container_cpu_cfs_periods_total{container_label_annotation_cni_projectcalico_org_containerID="",container_label_annotation_cni_projectcalico_org_podIP="",container_label_annotation_cni_projectcalico_org_podIPs="",container_label_annotation_io_kubernetes_container_hash="",container_label_annotation_io_kubernetes_container_ports="",container_label_annotation_io_kubernetes_container_preStopHandler="",container_label_annotation_io_kubernetes_container_restartCount="",container_label_annotation_io_kubernetes_container_terminationMessagePath="",container_label_annotation_io_kubernetes_container_terminationMessagePolicy="",container_label_annotation_io_kubernetes_pod_terminationGracePeriod="",container_label_annotation_kubernetes_io_config_seen="",container_label_annotation_kubernetes_io_config_source="",container_label_app="",container_label_app_kubernetes_io_component="",container_label_app_kubernetes_io_instance="",container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_version="",container_label_architecture="",container_label_build_date="",container_label_build_id="",container_label_com_redhat_build_host="",container_label_com_redhat_component="",container_label_com_redhat_license_terms="",container_label_control_plane="",container_label_controller_revision_hash="",container_label_description="",container_label_distribution_scope="",container_label_git_commit="",container_label_io_k8s_description="",container_label_io_k8s_display_name="",container_label_io_kubernetes_container_logpath="",container_label_io_kubernetes_container_name="",container_label_io_kubernetes_docker_type="",container_label_io_kubernetes_pod_name="",container_label_io_kubernetes_pod_namespace="",container_label_io_kubernetes_pod_uid="",container_label_io_kubernetes_sandbox_id="",container_label_io_openshift_expose_services="",container_label_io_openshift_tags="",container_label_io_rancher_rke_container_name="",container_label_k8s_app="",container_label_license="",container_label_maintainer="",container_label_name="",container_label_org_label_schema_build_date="",container_label_org_label_schema_license="",container_label_org_label_schema_name="",container_label_org_label_schema_schema_version="",container_label_org_label_schema_url="",container_label_org_label_schema_vcs_ref="",container_label_org_label_schema_vcs_url="",container_label_org_label_schema_vendor="",container_label_org_label_schema_version="",container_label_org_opencontainers_image_created="",container_label_org_opencontainers_image_description="",container_label_org_opencontainers_image_documentation="",container_label_org_opencontainers_image_licenses="",container_label_org_opencontainers_image_revision="",container_label_org_opencontainers_image_source="",container_label_org_opencontainers_image_title="",container_label_org_opencontainers_image_url="",container_label_org_opencontainers_image_vendor="",container_label_org_opencontainers_image_version="",container_label_pod_template_generation="",container_label_pod_template_hash="",container_label_release="",container_label_summary="",container_label_url="",container_label_vcs_ref="",container_label_vcs_type="",container_label_vendor="",container_label_version="",id="/kubepods/burstable/pod080e6da8-7f00-403d-a8de-3f93db373776",image="",name=""} 3.572708e+06
Is there any way to remove the empty labels, or to drop these labels altogether?
I found two cAdvisor flags. The first one solved my problem, but someone else may need the second, and there is little information about them, so I decided to post this answer:
-store_container_labels
convert container labels and environment variables into labels on prometheus
metrics for each container. If flag set to false, then only metrics exported are
container name, first alias, and image name (default true)
-whitelisted_container_labels string
comma separated list of container labels to be converted to labels on prometheus
metrics for each container. store_container_labels must be set to false for this to
take effect.
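To make the interaction of the two flags concrete, here is a simplified Python model of the help text above. It is not cAdvisor's actual code, and the function names are hypothetical. It assumes container label keys become `container_label_*` names by replacing characters that are invalid in Prometheus label names with `_`, which matches the sample output in the question.

```python
import re
from typing import Dict


def _prometheus_label_name(container_label: str) -> str:
    # "io.kubernetes.pod.name" -> "container_label_io_kubernetes_pod_name"
    return "container_label_" + re.sub(r"[^a-zA-Z0-9_]", "_", container_label)


def exported_container_labels(
    container_labels: Dict[str, str],
    store_container_labels: bool = True,
    whitelisted_container_labels: str = "",
) -> Dict[str, str]:
    """Simplified model of which container labels end up on cAdvisor metrics."""
    if store_container_labels:
        # Default (true): every container label becomes a Prometheus label.
        selected = container_labels
    else:
        # With -store_container_labels=false, only the comma-separated whitelist is kept.
        allowed = {s.strip() for s in whitelisted_container_labels.split(",") if s.strip()}
        selected = {k: v for k, v in container_labels.items() if k in allowed}
    return {_prometheus_label_name(k): v for k, v in selected.items()}


labels = {"io.kubernetes.pod.name": "web-0", "maintainer": "ops"}
print(exported_container_labels(labels))
print(exported_container_labels(labels, False, "io.kubernetes.pod.name"))
```

In a DaemonSet, the flags would go into the cAdvisor container's args, for example `-store_container_labels=false` together with `-whitelisted_container_labels=io.kubernetes.pod.name` if a few container labels are still needed.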

Prometheus query to get memory limit commitment for the entire cluster

I'm using Prometheus 2.21.0 (the latest release) and the latest node-exporter.
When I run the following query I get "no datapoints found", even though both kube_pod_container_resource_limits_memory_bytes and node_memory_MemTotal_bytes return data when queried on their own:
(sum(kube_pod_container_resource_limits_memory_bytes) / :node_memory_MemTotal_bytes:sum)*100
So, two questions:
I have never seen syntax like :node_memory_MemTotal_bytes:sum before. Is it a valid Prometheus query?
If the syntax is correct, what is wrong with the query?
This is a naming convention widely used in the Prometheus world. It means the metric is not scraped directly from some target(s) but is instead the result of a recording rule. The convention is described in the Prometheus documentation on recording rules.
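The convention is `level:metric:operations`: the aggregation level (labels kept in the output), the metric the rule is based on, and the operations applied to it. The small Python helper below (hypothetical, for illustration only) splits such a name and shows that `:node_memory_MemTotal_bytes:sum` has an empty level, so it is presumably a `sum` of `node_memory_MemTotal_bytes` with no labels kept. Because it is produced by a recording rule, the series only exists if a rule with that name is loaded into your Prometheus.

```python
from typing import NamedTuple


class RuleName(NamedTuple):
    level: str       # aggregation level / labels kept in the output (may be empty)
    metric: str      # the metric the rule is based on
    operations: str  # operations applied to the metric


def parse_rule_name(name: str) -> RuleName:
    """Split a recording-rule name that follows the level:metric:operations convention."""
    parts = name.split(":")
    if len(parts) != 3:
        raise ValueError(f"not a level:metric:operations name: {name!r}")
    return RuleName(*parts)


print(parse_rule_name(":node_memory_MemTotal_bytes:sum"))
```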
If the queries on the left and right sides each return data on their own, but the arithmetic between them returns nothing, the label sets on the two sides probably don't match exactly: a binary operator only pairs series whose labels are identical. Run each side separately and compare the labels in the results. Assuming that :node_memory_MemTotal_bytes:sum does return data, you'll probably have to wrap it in sum(...) as well to remove any remaining labels.
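A toy Python model (not Prometheus's implementation) of why the result is empty: by default `/` uses one-to-one matching and only pairs series whose label sets are identical, ignoring the metric name. The label names and sample values below are made up.

```python
from typing import Dict, FrozenSet, Tuple

# A toy instant vector: label set -> sample value (metric names left out,
# since PromQL ignores __name__ when matching the two sides).
LabelSet = FrozenSet[Tuple[str, str]]
Vector = Dict[LabelSet, float]


def divide(left: Vector, right: Vector) -> Vector:
    """left / right with default one-to-one matching: pair only identical label sets."""
    return {ls: value / right[ls] for ls, value in left.items() if ls in right}


def sum_all(vector: Vector) -> Vector:
    """sum(...) without 'by': drop every label, leaving a single series."""
    return {frozenset(): sum(vector.values())} if vector else {}


limits = {frozenset({("pod", "a")}): 2.0, frozenset({("pod", "b")}): 2.0}
mem_total = {frozenset({("instance", "node1")}): 8.0}

print(divide(sum_all(limits), mem_total))           # labels differ -> no data
print(divide(sum_all(limits), sum_all(mem_total)))  # both sides label-free -> ratio
```

The second call corresponds to wrapping both sides in `sum(...)`. PromQL's `on(...)` and `ignoring(...)` modifiers are the other way to relax which labels must match.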

Grafana and Prometheus: add metrics automatically

I'm using Grafana and Prometheus to monitor our server. We have a lot of database procedures like "select_users" or "insert_task". To monitor how many database procedure calls are pending on the server, we dynamically add a data point in Prometheus for every procedure call. Now we have data points like "pending_select_users" and "pending_insert_task" in Prometheus.
However, since there are so many database procedures (and the number will grow as development continues), it's not practical for us to add a metric in Grafana for each data point manually. Is there a way to add metrics dynamically in Grafana? Since all the data points share a common name prefix ("pending_"), can we add metrics in Grafana with a wildcard? Or is there a better way to do this?
Since Grafana uses JSON as the underlying dashboard DSL, you could generate a dashboard dynamically every time you add a new metric and import it into Grafana via the API.
I'd add an automation on top of your Prometheus targets, scrape the metrics, and if new metrics (with the required prefix) are found without a matching dashboard, the automation would create it and import it into Grafana.
Grafana API: http://docs.grafana.org/http_api/ (specifically the Dashboards API).
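A minimal sketch of that automation in Python, assuming a Grafana API token and the default Prometheus data source. It keeps only metric names with the `pending_` prefix from the question, builds one panel per metric, and posts the result to Grafana's `POST /api/dashboards/db` endpoint. The dashboard title, the panel layout, and the `timeseries` panel type (newer Grafana versions; older ones use `graph`) are assumptions, as are all function names.

```python
import json
import urllib.request
from typing import Iterable, List

PREFIX = "pending_"  # naming convention taken from the question


def pending_metrics(metric_names: Iterable[str]) -> List[str]:
    """Keep only the metric names that carry the pending_ prefix, sorted."""
    return sorted(name for name in metric_names if name.startswith(PREFIX))


def build_dashboard_payload(metric_names: Iterable[str], title: str = "Pending DB procedures") -> dict:
    """Build the JSON body for POST /api/dashboards/db with one panel per metric."""
    panels = []
    for i, name in enumerate(pending_metrics(metric_names)):
        panels.append({
            "id": i + 1,
            "type": "timeseries",
            "title": name[len(PREFIX):],  # e.g. "select_users"
            "gridPos": {"x": (i % 2) * 12, "y": (i // 2) * 8, "w": 12, "h": 8},  # two per row
            "targets": [{"expr": name, "refId": "A"}],  # PromQL query for the panel
        })
    # overwrite=True asks Grafana to replace an existing dashboard instead of failing.
    return {"dashboard": {"id": None, "uid": None, "title": title, "panels": panels}, "overwrite": True}


def import_dashboard(grafana_url: str, api_token: str, payload: dict) -> dict:
    """POST the dashboard to Grafana and return its JSON response."""
    req = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_dashboard_payload(["pending_select_users", "up", "pending_insert_task"])
print(json.dumps(payload, indent=2))
```

The list of metric names could come from Prometheus's `/api/v1/label/__name__/values` HTTP endpoint, and the automation would then call `import_dashboard` with your Grafana URL and token whenever a new `pending_` metric appears.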
The solution described by @Eitan is definitely feasible. The same goes for using a library like grafonnet to generate dashboards dynamically.
But the simplest approach in my opinion would be to create a variable in Grafana that contains all the label values you are interested in. Something like
label_values(metric_name{label_name=~"prefix.*"}, label_name)
should work for that. And then use the repeating panels / rows feature of Grafana to repeat a set of panels for every value in the variable. Though this could get out of hand if you have dozens / hundreds of distinct values.
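One caveat with the selector: Prometheus label matchers use RE2 regular expressions that are anchored at both ends, and `*` repeats only the preceding character. A prefix match therefore needs `.*` (for example `prefix.*`), not a shell-style `prefix*`. This Python sketch emulates the anchoring with `re.fullmatch`; Python's `re` and RE2 behave the same for these simple patterns, and the helper name is hypothetical.

```python
import re


def prom_regex_matches(pattern: str, value: str) -> bool:
    """Emulate a PromQL =~ matcher: the regex must match the whole label value."""
    # Prometheus anchors =~ regexes at both ends; re.fullmatch does the same in Python.
    return re.fullmatch(pattern, value) is not None


for pattern in ["pending_*", "pending_.*"]:
    print(pattern, prom_regex_matches(pattern, "pending_select_users"))
```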
https://grafana.com/docs/grafana/latest/variables/repeat-panels-or-rows/
https://grafana.com/blog/2020/06/09/learn-grafana-how-to-automatically-repeat-rows-and-panels-in-dynamic-dashboards/
If you want to generate just a single dashboard from a sample of your Prometheus metrics, you can use this service:
http://eljah.tatar/micrometer2grafana/

How to properly monitor ELB latency on AWS using Grafana?

I am trying to monitor Latency on ElasticBeanstalk environment using Grafana.
Some things work, but others do not return any data.
I am using "CloudWatch" data source.
There is ELB and ApplicationELB.
The ApplicationELB does not offer Latency metric. In fact, every metric I select here will result with "no data".
When I configure monitoring in the AWS console, I get a latency graph for the environment.
Using Grafana, I am able to query Latency for a whole region, and I do get some correlation with the AWS graph.
Around 13:50, for example, some requests timed out. But Grafana is clearly also showing data from other environments, which I would like to ignore.
My current query is too broad, but I do not know how to refine it.
I tried using "InstanceName" as a dimension, but it is not clear to me which ELB I should look for. It seems to me that ApplicationELB should be what I am looking for, but it does not offer Latency and does not return any data either way.
Using AvailabilityZone does not help, and that's the only other option for dimension (other than InstanceName).
I need a way to refine the query so I see the same result in AWS and Grafana.
A clarification about ApplicationELB and ELB would be great also!
Application ELB vs ELB: they are just different types of load balancers provided by AWS https://aws.amazon.com/elasticloadbalancing/ - I'm not sure which one is used by ElasticBeanstalk.
You need to add a dimension to filter your metrics; some metrics need multiple dimensions for correct filtering. The available dimensions are listed in the docs. For example, LoadBalancerName is a valid dimension for the AWS/ELB namespace: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-cloudwatch-metrics.html
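In Grafana's CloudWatch query that means Namespace `AWS/ELB`, metric `Latency`, and a `LoadBalancerName` dimension. The same filter, expressed as arguments for boto3's CloudWatch `get_metric_statistics` call, is sketched below in Python. The load balancer name is a placeholder (Elastic Beanstalk generates its own), and the time range, period and statistics are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def elb_latency_request(load_balancer_name: str, minutes: int = 60, period: int = 60) -> Dict[str, Any]:
    """Build get_metric_statistics arguments for one Classic ELB's Latency metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ELB",
        "MetricName": "Latency",
        # Filtering by LoadBalancerName keeps other environments' load balancers out.
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds per data point
        "Statistics": ["Average", "Maximum"],
    }


# Hypothetical load balancer name; with boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("cloudwatch").get_metric_statistics(**elb_latency_request("my-eb-env-elb"))
print(elb_latency_request("my-eb-env-elb")["Dimensions"])
```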
I recommend using existing published AWS dashboards (https://github.com/monitoringartist/grafana-aws-cloudwatch-dashboards - I'm the author) and then customizing them for your needs.

How to differentiate between equally-named Prometheus metrics from dynamically discovered micro-services in Kubernetes

I’m looking for a way to differentiate between Prometheus metrics gathered from different dynamically discovered services running in a Kubernetes cluster (we’re using https://github.com/coreos/prometheus-operator). E.g. for the metrics written into the db, I would like to understand from which service they actually came.
I guess you can do this via a label from within the respective services, however, swagger-stats (http://swaggerstats.io/) which we’re using does not yet offer this functionality (to enhance this, there is an issue open: https://github.com/slanatech/swagger-stats/issues/50).
Is there a way to implement this over Prometheus itself, e.g. that Prometheus adds a service-specific label per time series after a scrape?
Appreciate your feedback!
Is there a way to implement this over Prometheus itself, e.g. that Prometheus adds a service-specific label per time series after a scrape?
Yes, this is how Prometheus is designed to be used: a target doesn't know how the monitoring system views it, and prefixing metric names makes cross-service analysis harder. Having the application set labels across all of its own metrics and prefixing metric names are both considered anti-patterns.
What you want is called a target label; target labels usually come from relabelling applied to metadata from service discovery.
When using the Prometheus Operator, you can specify targetLabels as a list of labels to copy from the Kubernetes Service to the Prometheus targets.
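A sketch of such a ServiceMonitor, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts as well as YAML). The resource names, the `app` label and the `http` port name are hypothetical. `targetLabels: ["app"]` asks the Operator to copy the Service's `app` label onto every target, so each scraped series carries the name of the service it came from, without any change to swagger-stats.

```python
import json
from typing import List


def service_monitor(name: str, namespace: str, app_label: str, port: str, target_labels: List[str]) -> dict:
    """Build a Prometheus Operator ServiceMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select the Kubernetes Services to scrape by their labels.
            "selector": {"matchLabels": {"app": app_label}},
            # Scrape the Service port with this name.
            "endpoints": [{"port": port}],
            # Copy these labels from the Kubernetes Service onto every scraped target.
            "targetLabels": list(target_labels),
        },
    }


manifest = service_monitor("swagger-apps", "monitoring", "my-api", "http", ["app"])
print(json.dumps(manifest, indent=2))
```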