I'm logging custom metric data into AWS CloudWatch and trying to graph it. I assumed that dimensions in CloudWatch were metadata for enriching my data, but it seems that once you add dimensions you can no longer query across different combinations of dimensions. So for one I don't really see the point of dimensions, as any unique combination is basically just a new metric. But more importantly, is there a way to log one set of data with different labels or dimensions and then slice and dice that data (e.g., in Grafana)?
To make it more concrete, I am logging cache load times in my application. I have one metric called "cache-miss", with several dimensions, for example:
the cached collection
the customer associated with the cached data
I want to create several different graphs:
Total cache misses (i.e., ignore dimensions, just see a count over time)
Total cache misses per collection (aggregate by first dimension)
Total cache misses per customer (aggregate by second dimension)
Is there some way to achieve this with CloudWatch metrics and/or Grafana (or an alternate tool)?
As you have mentioned (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html):
CloudWatch treats each unique combination of dimensions as a separate metric, even if the metrics have the same metric name. You can only retrieve statistics using combinations of dimensions that you specifically published. When you retrieve statistics, specify the same values for the namespace, metric name, and dimension parameters that were used when the metrics were created.
So if you have pushed Total cache misses with 2 dimensions, you can query this metric only with those same 2 dimensions. So you really can't just see a count over time directly.
Possible workarounds:
CloudWatch metric math - see the example in "CloudWatch does not aggregate across dimensions for your custom metrics", and the sketch after this list
in theory also the Grafana 7+ transformations feature: https://grafana.com/blog/2020/06/11/new-in-grafana-7.0-data-transformations-for-all-visualizations-that-support-queries/
Or you can switch from CloudWatch to a TSDB better suited to your use case.
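A minimal sketch of the metric-math approach, assuming a namespace MyApp and dimension names Collection and Customer (all hypothetical names). SEARCH matches every series of the metric regardless of its dimension values, and SUM collapses the matches into one line. Total cache misses, ignoring dimensions:
SUM(SEARCH('{MyApp,Collection,Customer} MetricName="cache-miss"', 'Sum', 300))
Cache misses for a single collection across all customers (SEARCH cannot group by a dimension, so you repeat this per value):
SUM(SEARCH('{MyApp,Collection,Customer} MetricName="cache-miss" Collection="users"', 'Sum', 300))
Grafana's CloudWatch data source can run metric math expressions as well, so the same sketch should work from a dashboard.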
Related
I want to monitor the CPU usage of a Kafka container, but the graph is chopped up into different pieces. There seem to be gaps in the graph, and after each gap a line of a different color follows. The time range is the last 30 days. For the exporter we use danielqsj/kafka-exporter:v1.4.2.
The promql query used to create this graph is:
rate(container_cpu_usage_seconds_total{container="cp-kafka-broker"}[1m])
Can I merge these lines into one continuous line? If so, with what PromQL expression/dashboard configuration?
This happens when at least 1 of the labels attached to the metric changes. The rate function keeps all the original labels from the underlying time series. In Prometheus, each time series is uniquely identified by the metric name container_cpu_usage_seconds_total and any labels (key-value pairs) attached to the metric (container, for instance). This is why Grafana uses different colors: they are different time series.
If you want to get a single series in Grafana you can aggregate using the sum operator:
sum(rate(container_cpu_usage_seconds_total{container="cp-kafka-broker"}[1m]))
which by default will not keep any of the original labels.
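If you instead want one line per label value rather than a single total (assuming your series carry a pod label, which depends on your scrape configuration), sum by keeps only the labels you name:
sum by (pod) (rate(container_cpu_usage_seconds_total{container="cp-kafka-broker"}[1m]))
Any label not listed in the by clause is dropped, so series that differ only in dropped labels get merged into one.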
I performed a clustering analysis of the media usage of different users in order to find groups that use a specific set of media (e.g., group 1 uses media A, B and C and group 2 uses media B, C and D). Then I divided the dataset into different groups, since each user belongs to a specific group (as a consequence the original dataset and the new datasets have different sizes). Within these groups I would like to cluster again to see which different media sets are used.
How can I determine the number of clusters to guarantee that the results are comparable?
Thank you in advance!
Don't rely on clustering to be stable.
It's a hypothesis generation tool.
You clustered, and now you have the hypothesis that there are groups ABCD of media usage. You should first evaluate whether this hypothesis is adequate. What you want to do in your next step is assign the labels to subsets of the data. First of all, you should be able to simply subset this from the previous labels. But if this really is different data, you can label the new data, for example using the most similar record (nearest-neighbor classification). But that is classification now, because your classes are fixed.
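A minimal sketch of that nearest-neighbor labeling step in Python, assuming scikit-learn and NumPy; the toy media-usage matrix and the cluster count are hypothetical stand-ins for your data:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy media-usage matrix: rows = users, columns = media A..D (made-up data)
rng = np.random.default_rng(0)
X_original = rng.random((100, 4))

# First clustering pass produces the group labels (the hypothesis)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_original)

# New/other data gets the label of its most similar already-labeled record,
# which is classification with fixed classes, not a fresh clustering run
X_new = rng.random((20, 4))
new_labels = KNeighborsClassifier(n_neighbors=1).fit(X_original, labels).predict(X_new)
Because the classes are fixed after the first pass, the groups in each subset stay comparable by construction instead of depending on a re-run of the clustering.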
We have configured HPA to use 2 metrics:
CPU Utilization
App specific custom metrics
When testing, we observed the scaling happening, but the calculation of the number of replicas is not very clear. I am not able to locate any documentation on this.
Questions:
Can someone point to documentation or code on the calculation part?
Is it a good practice to use multiple metrics for scaling?
Thanks in Advance!
From https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-the-horizontal-pod-autoscaler-work
If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of those metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs), scaling is skipped.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window, choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scaledowns will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
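For the per-metric calculation itself, the same page gives the formula the controller applies before taking the largest result:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
For example, with 2 replicas and a current CPU usage of 200m against a target of 100m, the desired count is ceil(2 * 200/100) = 4. With your two metrics, this is computed once for CPU and once for the custom metric, and the larger replica count wins, so using multiple metrics is a documented, supported pattern: the most loaded resource drives the scaling.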
I'm looking for a function in Grafana which looks like it should be trivial, but until now I haven't been able to find out how, if at all, it is possible to do.
With the recent templating options, I can easily create my dashboard once, and quickly change the displayed data to look at different subsets of my data, and that's great.
What I'm looking for is a way to combine this functionality to create interactive graphs that show aggregations on different subsets of my data.
E.g., the relevant measurement for me is a "clicks per views" measurement.
For each point in the series, I can calculate this ratio for each state (or node) in code before sending it to the graphite layer, and this is what I've been doing until now.
My problem starts where I want to combine several states together, interactively: I could use the "*" in one of the nodes, and use an aggregate function like "avg" or "sum" to collect the different values covered in the sub-nodes together.
Problem is, I can't just use an average of averages - as the numbers may be calculated on very different sample sizes, the results will be highly inaccurate. For example, averaging 9 clicks / 10 views (90%) with 10 clicks / 1000 views (1%) gives 45.5%, while the true combined ratio is 19 / 1010, about 1.9%.
Instead, I'd like to send to Graphite the "raw data" - number of clicks and number of views per state for each point in the series, and have Grafana calculate something like "per specified states, aggregate number of clicks AND DIVIDE BY aggregate number of views".
Is there a way to do this? As far as I can tell, the asPercent function doesn't seem to do the trick.
You can use a query like this in edit mode:
SELECT (aggregate_function1(number_of_clicks)/aggregate_function2(number_of_views)) as result
FROM measurement_name
WHERE $timeFilter
GROUP BY time($__interval), state
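That query is InfluxDB-style; if the backend is actually Graphite, as the question suggests, the same ratio-of-sums can be built with series functions, assuming hypothetical paths clicks.<state>.count and views.<state>.count and a multi-value Grafana template variable $state selecting the states:
divideSeries(sumSeries(clicks.$state.count), sumSeries(views.$state.count))
divideSeries requires a single series as the divisor, which sumSeries guarantees; clicks and views are summed across the selected states first and divided only afterwards, which avoids the average-of-averages problem.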
I am new to Graphite and can't understand how to do this:
I have a large number of time-metrics (celery metrics) in format stats.timers.*.median
I want to show:
Top N metrics with average value above X
Display them on one graph with the names of metrics
Now I have averageAbove(stats.timers.*.median,50) but it displays the graphs without names, and renders strangely and at a bad scale. Help, please! :)
You will need to chain a few functions together in order to get the desired result.
limit(sortByMaxima(averageAbove(stats.timers.*.median, X)), N)
Start with averageAbove as the base.
The next thing you want to do is get all the metrics in order, "top-to-bottom", by using sortByMaxima.
Then you can limit the results that are rendered with the limit function.
You might not be rendering the legend if you have too many metrics for the size of the graph. You can do 3 things:
Make the graph larger
Reduce the number of metrics using limit
Force the legend to be displayed via hideLegend=false
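Putting the pieces together, a sketch with hypothetical values (threshold 50, limit 10, and node index 2, which names each series by the wildcard segment of stats.timers.*.median, counting path nodes from 0):
aliasByNode(limit(sortByMaxima(averageAbove(stats.timers.*.median, 50)), 10), 2)
aliasByNode gives each series a short readable name for the legend, and appending &hideLegend=false to the Graphite render URL forces the legend to be drawn even when there are many series.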