How to properly monitor ELB latency on AWS using Grafana? - grafana

I am trying to monitor Latency on ElasticBeanstalk environment using Grafana.
Some things work, while others return no data at all.
I am using "CloudWatch" data source.
There is ELB and ApplicationELB.
The ApplicationELB does not offer a Latency metric. In fact, every metric I select there results in "no data".
When I configure monitoring on AWS, I get the following graph:
I am able to query for Latency on a region using Grafana, and I do get some correlation.
As you can see around 13:50 some requests timed-out. But it is also obvious Grafana is showing additional information from other environments which I would like to ignore.
My query currently looks like this:
I know this is too broad, but I do not know how to refine it.
I tried using "InstanceName" as a dimension, but it is not clear to me which ELB I should look for. It seems to me that ApplicationELB should be what I am looking for, but that one does not offer Latency and does not return any data either way.
Using AvailabilityZone does not help, and that is the only other option for a dimension (other than InstanceName).
I need a way to refine the query so I see the same result in AWS and Grafana.
A clarification about ApplicationELB and ELB would be great also!

Application ELB vs ELB: they are simply different types of load balancer provided by AWS (https://aws.amazon.com/elasticloadbalancing/). I'm not sure which one is used by ElasticBeanstalk.
You need to add a dimension to filter your metrics. Some metrics may need multiple dimensions for correct filtering. The available dimensions are listed in the docs. For example, LoadBalancerName is a valid dimension for the AWS/ELB namespace: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-cloudwatch-metrics.html
I recommend using the existing published AWS dashboards (https://github.com/monitoringartist/grafana-aws-cloudwatch-dashboards - I'm the author) and then customizing them for your needs.
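As a rough illustration of the dimension filtering described above, the sketch below builds the parameters for a CloudWatch GetMetricStatistics request restricted to a single load balancer. The helper name build_latency_query and the load balancer name are hypothetical. It assumes that the Classic namespace AWS/ELB exposes Latency under the LoadBalancerName dimension (as in the linked docs), and that the AWS/ApplicationELB namespace reports the comparable value as TargetResponseTime under the LoadBalancer dimension; check the docs for your balancer type before relying on either.

```python
from datetime import datetime, timedelta, timezone

# Metric and dimension names per namespace. The Classic entry follows the
# linked AWS docs; the Application entry is an assumption to verify.
NAMESPACES = {
    "AWS/ELB": ("Latency", "LoadBalancerName"),
    "AWS/ApplicationELB": ("TargetResponseTime", "LoadBalancer"),
}


def build_latency_query(namespace, load_balancer, minutes=60, period=60):
    """Return keyword arguments for a CloudWatch GetMetricStatistics call."""
    metric, dimension = NAMESPACES[namespace]
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric,
        # The dimension is what narrows the result to one load balancer.
        "Dimensions": [{"Name": dimension, "Value": load_balancer}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average"],
    }


# "my-eb-classic-elb" is a placeholder; use the name shown in the EC2 console.
params = build_latency_query("AWS/ELB", "my-eb-classic-elb")
print(params["MetricName"], params["Dimensions"])
```

With boto3 installed and credentials configured, these arguments could be passed as boto3.client("cloudwatch").get_metric_statistics(**params). In Grafana, the same namespace, metric, and dimension go into the CloudWatch query editor.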

Related

Best fit prometheus metric data model for grafana sysdig

I am using Prometheus metrics in the Grafana UI, emitted from a Sysdig dashboard.
I am implementing a state change metric, i.e. pod states, and my data model is below:
pod_request_state_duration(id,method="create",demoapi,state=creating-running)
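I want to use PromQL to find the changing state and display it in the Grafana UI. Please help.
As the query is not exact, I will try to give the best possible solution.
Try using delta(). Replace time_duration with a range such as 5m; the id and demoapi matchers from the question each need a label name and a quoted value before they can be added.
delta(pod_request_state_duration{method="create",state="creating-running"}[time_duration])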

Prometheus query to get memory limit commitment for the entire cluster

I'm using the latest prometheus 2.21.0 and latest node-exporter
I'm trying to run the query below and getting "no datapoints found", even though both metrics, kube_pod_container_resource_limits_memory_bytes and node_memory_MemTotal_bytes, work independently and return data:
(sum(kube_pod_container_resource_limits_memory_bytes) / :node_memory_MemTotal_bytes:sum)*100
So two questions
I have never seen syntax like :node_memory_MemTotal_bytes:sum before. Is it a valid Prometheus query?
What is wrong with the query if the syntax is correct?
This is a convention widely used in Prometheus land. It means the metric is not scraped directly from some target(s), but is instead the result of a recording rule. This convention is described here.
If the queries on the left and right sides each return data individually, but you are left with no data after performing arithmetic on them, it probably means their labels are not exactly the same. Execute them separately and compare the labels on the results. Assuming that :node_memory_MemTotal_bytes:sum does return data, you will probably have to wrap it in sum() too, to remove any remaining labels.
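The label-matching behaviour described above can be sketched in a few lines of Python. This is a simplified model, not Prometheus code: each vector is a dict keyed by its label set, division only pairs samples whose label sets are identical, and sum() without a "by" clause collapses everything into one sample with no labels. The sample values and the "cluster" label are made up for illustration.

```python
def divide(left, right):
    """Mimic PromQL one-to-one matching: pair samples with identical label sets."""
    return {labels: value / right[labels]
            for labels, value in left.items() if labels in right}


def sum_all(vector):
    """Mimic sum() with no 'by' clause: one sample with an empty label set."""
    return {frozenset(): sum(vector.values())}


# Hypothetical samples, keyed by their label sets.
limits = sum_all({
    frozenset({("pod", "a")}): 2e9,
    frozenset({("pod", "b")}): 1e9,
})
# Assume the recorded series still carries a label after the recording rule.
mem_total = {frozenset({("cluster", "demo")}): 12e9}

print(divide(limits, mem_total))           # empty: the label sets differ
print(divide(limits, sum_all(mem_total)))  # one sample with value 0.25
```

In the real query, the equivalent change would be wrapping the right-hand side in sum(...) so that both sides end up with an empty label set.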

Grafana and Prometheus: add metrics automatically

I'm using Grafana and Prometheus to monitor our server. We have a lot of database procedures like "select_users" or "insert_task". In order to monitor how many pending database procedure calls are there in the server, we add data points for every procedure call in Prometheus dynamically. Now we have data points like "pending_select_users", "pending_insert_task" in Prometheus.
However, since there are so many database procedures (and the number will increase during development), it's not very practical for us to add metrics in Grafana for each data point manually. Is there a way we can add metrics dynamically in Grafana? Since all the data points have a common name prefix ("pending_"), can we add metrics in Grafana with a wildcard? Or is there a better way to do this?
Since Grafana uses JSON as the underlying dashboard DSL, you could dynamically create dashboards, every time you add a new metric, and import it (via API) into Grafana.
I'd add an automation on top of your Prometheus targets, scrape the metrics, and if new metrics (with the required prefix) are found without a matching dashboard, the automation would create it and import it into Grafana.
Grafana API: http://docs.grafana.org/http_api/ (specifically for Dashboards).
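A minimal sketch of the dashboard-generation step in such an automation might look like the following. It only builds the JSON body for Grafana's POST /api/dashboards/db endpoint and does not send it. The function name, the dashboard title, and the panel layout are illustrative choices, and the panel fields (for example "type": "graph") may need adjusting for your Grafana version.

```python
import json


def build_dashboard_payload(metric_names, title="Pending procedure calls"):
    """Build a request body for Grafana's POST /api/dashboards/db endpoint."""
    panels = []
    for index, name in enumerate(sorted(metric_names)):
        panels.append({
            "id": index + 1,
            "type": "graph",
            "title": name,
            # One Prometheus query per panel, plotting the metric as-is.
            "targets": [{"expr": name, "refId": "A"}],
            # Two panels per row on Grafana's 24-column grid.
            "gridPos": {"x": 12 * (index % 2), "y": 8 * (index // 2), "w": 12, "h": 8},
        })
    dashboard = {"id": None, "uid": None, "title": title, "panels": panels}
    # overwrite=True replaces an existing dashboard with the same title.
    return {"dashboard": dashboard, "overwrite": True}


# Metric names taken from the question; in practice they would be discovered.
payload = build_dashboard_payload(["pending_select_users", "pending_insert_task"])
print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent to /api/dashboards/db with an "Authorization: Bearer <API key>" header, using whichever HTTP client the automation already has.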
The solution described by @Eitan is definitely feasible. The same goes for using a library like grafonnet to generate dashboards dynamically.
But the simplest approach in my opinion would be to create a variable in Grafana that contains all the label values you are interested in. Something like
label_values(metric_name{label_name=~"prefix.*"}, label_name)
should work for that. And then use the repeating panels / rows feature of Grafana to repeat a set of panels for every value in the variable. Though this could get out of hand if you have dozens / hundreds of distinct values.
https://grafana.com/docs/grafana/latest/variables/repeat-panels-or-rows/
https://grafana.com/blog/2020/06/09/learn-grafana-how-to-automatically-repeat-rows-and-panels-in-dynamic-dashboards/
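When writing the regex for such a variable, keep in mind that Prometheus regex matchers are anchored at both ends, so the pattern has to cover the whole value. The sketch below imitates that with Python's re.fullmatch (Prometheus uses RE2, so this is only an approximation) and shows why a prefix match needs ".*" rather than a bare "*". The helper name is hypothetical; the same kind of pattern can be applied to metric names in PromQL through the __name__ label, e.g. {__name__=~"pending_.*"}.

```python
import re


def select_by_pattern(values, pattern):
    """Keep values that the pattern matches in full, as Prometheus =~ does."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v)]


# Metric names from the question plus one unrelated name.
names = ["pending_select_users", "pending_insert_task", "http_requests_total"]

print(select_by_pattern(names, "pending_.*"))  # both pending_ metrics
print(select_by_pattern(names, "pending_*"))   # nothing: "_*" only repeats "_"
```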
If you want to generate just a single dashboard from a sample of your Prometheus metrics, you can use this service:
http://eljah.tatar/micrometer2grafana/

How to differentiate between equally-named Prometheus metrics from dynamically discovered micro-services in Kubernetes

I’m looking for a way to differentiate between Prometheus metrics gathered from different dynamically discovered services running in a Kubernetes cluster (we’re using https://github.com/coreos/prometheus-operator). E.g. for the metrics written into the db, I would like to understand from which service they actually came.
I guess you can do this via a label from within the respective services, however, swagger-stats (http://swaggerstats.io/) which we’re using does not yet offer this functionality (to enhance this, there is an issue open: https://github.com/slanatech/swagger-stats/issues/50).
Is there a way to implement this over Prometheus itself, e.g. that Prometheus adds a service-specific label per time series after a scrape?
Appreciate your feedback!
Is there a way to implement this over Prometheus itself, e.g. that Prometheus adds a service-specific label per time series after a scrape?
This is how Prometheus is designed to be used: a target doesn't know how the monitoring system views it, and prefixing metric names makes cross-service analysis harder. Both setting labels across an entire target and prefixing metric names are considered anti-patterns.
What you want is called a target label. These usually come from relabelling applied to metadata from service discovery.
When using the Prometheus Operator, you can specify targetLabels as a list of labels to copy from the Kubernetes Service to the Prometheus targets.
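As a sketch of what that could look like, the snippet below builds a ServiceMonitor manifest as a Python dict and prints it as JSON, which kubectl also accepts. The resource name, the "app" selector, the port name "http", and the copied labels "app" and "team" are all hypothetical; only the overall shape (a ServiceMonitor with a targetLabels list) follows the description above.

```python
import json


def service_monitor(name, app_label, port, target_labels):
    """Build a ServiceMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Select the Kubernetes Services to scrape by their labels.
            "selector": {"matchLabels": {"app": app_label}},
            "endpoints": [{"port": port}],
            # Labels copied from the Kubernetes Service onto each target.
            "targetLabels": target_labels,
        },
    }


manifest = service_monitor("orders", "orders", "http", ["app", "team"])
print(json.dumps(manifest, indent=2))
```

Every time series scraped through this ServiceMonitor would then carry the Service's "app" and "team" label values, which is what lets equally named metrics be told apart per service.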

Billing by tag in Google Compute Engine

Google Compute Engine allows for a daily export of a project's itemized bill to a storage bucket (.csv or .json). In the daily file I can see X-number of seconds of N1-Highmem-8 VM usage. Is there a mechanism for further identifying costs, such as per tag or instance group, when a project has many of the same resource type deployed for different functional operations?
As an example, Qty:10 N1-Highmem-8 VM's are deployed to a region in a project. In the daily bill they just display as X-seconds of N1-Highmem-8.
Functionally:
2 VM's might run a database 24x7
3 VM's might run batch analytics operation averaging 2-5 hrs each night
5 VM's might perform a batch operation which runs in sporadic 10 minute intervals through the day
final operation writes data to a specific GS Buckets, other operations read/write to different buckets.
How might costs be broken out across these four operations each day?
The usage logs do not provide per-tag granularity at this time, and they can be a little tricky to work with, but here is what I recommend.
To break the usage logs down further and get better information out of them, try working like this:
Your usage logs provide the following fields:
Report Date
MeasurementId
Quantity
Unit
Resource URI
ResourceId
Location
If you look at the MeasurementId, you can filter by the machine type you want to check. For example, VmimageN1Standard_1 represents an n1-standard-1 machine type.
You can then use the MeasurementId in combination with the Resource URI to find out what your usage is on a more granular (per-instance) scale. For example, the Resource URI for my test machine would be:
https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/boyan-test-instance
*Note: I've replaced the project and zone with "MY_PROJECT" and "ZONE" here; those values, along with the instance name, will be specific to your output.
If you look at the end of the URI, you can clearly see which instance that is for. You could then use this to look for a specific instance you're checking.
If you are better skilled with Excel or other spreadsheet/analysis software, you may be able to do even better as this is just an idea on how you could use the logs. At that point it becomes somewhat a question of creativity. I am sure you could find good ways to work with the data you gain from an export.
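If you would rather script it than use a spreadsheet, the per-instance breakdown described above can be sketched in Python. The sample rows are invented, the instance names db-1 and batch-1 are hypothetical, and the column names are assumed to match the field list given earlier; adjust them to whatever your exported file actually contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the column names listed above.
SAMPLE = """Report Date,MeasurementId,Quantity,Unit,Resource URI,ResourceId,Location
2015-01-01,VmimageN1Standard_1,86400,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/db-1,1,ZONE
2015-01-01,VmimageN1Standard_1,7200,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/batch-1,2,ZONE
2015-01-01,VmimageN1Standard_1,3600,seconds,https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/batch-1,2,ZONE
"""


def usage_per_instance(csv_text, measurement):
    """Sum Quantity per instance name for rows matching a MeasurementId."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if measurement not in row["MeasurementId"]:
            continue
        # The instance name is the last path segment of the Resource URI.
        instance = row["Resource URI"].rstrip("/").rsplit("/", 1)[-1]
        totals[instance] += float(row["Quantity"])
    return dict(totals)


print(usage_per_instance(SAMPLE, "VmimageN1Standard_1"))
```

If instance names follow a convention per function (database, nightly batch, and so on), the per-instance totals can then be rolled up into per-operation totals.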
9/2017 update.
It is now possible to add user-defined labels, then track usage and billing by these labels for Compute and GCS.
Additionally, by enabling the billing export to BigQuery, it is possible to create custom views or query BigQuery from a tool that is friendlier to finance people, such as Google Docs, Data Studio, or anything else that can connect to BigQuery. Here is a great example of labels across multiple projects to split costs into something friendlier to organizations, in this case a Data Studio report.