Can you calculate active users using time series - grafana

My Atomist client exposes metrics on commands that are run. Each command is a metric with a username element as well as a status element.
I've been scraping this data for months without resetting the counts.
My requirement is to show the number of active users over a time period, i.e. 1h, 1d, 7d and 30d, in Grafana.
The original query was:
count(count({Username=~".+"}) by (Username))
This is an issue because I don't clear the metrics, so it's always a count since inception.
I then tried this:
count(max_over_time(help_command{job="Application Name",Username=~".+"}[1w]) -
max_over_time(help_command{job="Application name",Username=~".+"}[1w] offset 1w) > 0)
which works, but only for one command; I have about 50 other commands that need to be added to that count.
I then tried:
{__name__=~".+_command",job="app name"}[1w] offset 1w
but this is obviously very expensive (it times out in the browser) and it has issues when combined with max_over_time, which doesn't support it.
Any help? Am I using the metric in the wrong way? Is there a better way to query this? My only option at the moment is to repeat the working count format above for each command.
Thanks in advance.

To start, I will point out a number of issues with your approach.
First, the Prometheus documentation recommends against using arbitrarily large sets of values for labels (as your usernames are). As you can see (based on your experience with the query timing out) they're not entirely wrong to advise against it.
Second, Prometheus may not be the right tool for analytics (such as active users). Partly due to the above, partly because it is inherently limited by the fact that it samples the metrics (which does not appear to be an issue in your case, but may turn out to be).
Third, you collect separate metrics per command (i.e. help_command, foo_command) instead of a single metric with the command name as a label (i.e. command_usage{command="help"}, command_usage{command="foo"}).
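With that layout, counting active usage becomes one query over one metric. As a minimal sketch (command_usage is the hypothetical metric named above, assumed to keep the same Username label and counters that never reset; it mirrors the per-metric query further down):
count by(command)(
  (command_usage{job="Application Name"} - command_usage{job="Application Name"} offset 1w) > 0
)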
To get back to your question though, you don't need the max_over_time, you can simply write your query as:
count by(__name__)(
  (
    {__name__=~".+_command",job="Application Name"}
    -
    {__name__=~".+_command",job="Application name"} offset 1w
  ) > 0
)
This only works, though, because you say that whatever exports the counts never resets them. If that is simply because the exporter has never restarted, and the counts will drop to zero when it does, then you'd need to use increase instead of subtraction, and you'd run into the exact same performance issues as with max_over_time.
count by(__name__)(
  increase({__name__=~".+_command",job="Application Name"}[1w]) > 0
)
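If you want the same panel to serve the 1h, 1d, 7d and 30d views, one option (a sketch assuming you drive the window from the dashboard time picker) is to let Grafana substitute the range via its built-in $__range variable:
count by(__name__)(
  increase({__name__=~".+_command",job="Application Name"}[$__range]) > 0
)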

Related

Graphite: keepLastValue for a given time period instead of number of points

I'm using Graphite + Grafana to monitor (by sampling) queue lengths in a test system at work. The data that gets uploaded to Graphite is grouped into different series/metrics by properties of the payloads in the queue. These properties can be somewhat arbitrary, at least to the point where they are not all known at the time when the data collection script is run.
For example, a property could be the project that the payload belongs to and this could be uploaded as a separate series/metric so that we can monitor the queues broken down by the different projects.
This has the consequence that Graphite sends a lot of null values for certain metrics if the queues in the test system did not contain any payloads with properties that would group it into that specific series/metric.
For example, if a certain project did not have any payloads in the queue at the time when the data collection was run.
In Grafana this is not so nice as the line graphs don't show up as connected lines and gauges will show either null or the last non-null value.
For line graphs I can just choose to connect null values in Grafana, but for gauges that's not possible.
I know about the keepLastValue function in Graphite, but that includes a limit for how long to keep the value, which I like very much, as I would like to keep the last value only until the next time data collection is run. Data collection is run periodically at known intervals.
The problem with keepLastValue is that it expects a number of points as this limit. I would rather give it a time period instead. In Grafana the relationship between time and data points is very dynamic, so it's not easy to hard-code a good limit for keepLastValue.
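For illustration (the series path is made up), what I can write today is something like:
keepLastValue(queues.projectA.length, 10)
where 10 is a number of points, whereas what I would really like to express is roughly "keep the last value for up to 15 minutes".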
Thus, my question is: Is there a way to tell Graphite to keep the last value for a given time instead of a given number of points?

Is there a way to force the output of a Stat panel in Grafana to only show values in days

I'm building a Grafana dashboard with some Stat panels that show average, minimum, and maximum time values (see below) for specific fields in my database. I'm storing the data in seconds and setting the value's units to seconds after which the panel displays the time in weeks, days, hours, etc. For the sake of consistency, I would like everything to be shown in days but I haven't been able to find a way to force units for the output value. If it's possible to do this, could someone please point me to some docs, or something that could show me which configurations to make in my panels?
So far, I've tried (without success):
Configure each panel to use units of days
The result of this was that everything showed up in years, etc.
Configure each panel to create a new field by performing a binary calculation where I converted from seconds to days and then I updated the units to be days
The result was that the values were not changed at all -> instead of showing X days, or whatever, it just showed the value in seconds without the units. I'm not sure what I messed up there, but it didn’t change anything.
I found this link that discusses setting a time range for queries
This didn’t end up being useful for what I was trying to do because it was actually geared towards changing the query to a specific date range rather than the output.
I looked through the transformations documentation, stat panel documentation, and a few other panel documentation pages in an effort to see if there was any information on how to do it but I was unable to find anything on forcing the output value to use a specific unit.
Edit:
So I kept messing around with the dashboard and got a solution that works - i.e. it's a "good-enough" solution (see below) - but, now, I'm curious if it's possible to show the units along with the value without it converting it to some other unit. Does anyone have any ideas about this?
One thing to note is that the data for this image is different than the data for the previous one so I'm expecting an inexact conversion to days.
You can use a custom unit. It is a bit tricky to enter the unit in the UI because of the automatic selection: enter e.g. "days" (without the quotes) in the Unit field and, instead of leaving the field with Tab or a mouse click, use the scrollbar of the combobox and select the last entry, "Custom unit: days".
Hope it helps. And for the record: I am using Grafana 7.1.4.
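As a small worked example (numbers made up): a custom unit is only a suffix, so the value itself must already be converted to days, e.g. 432000 stored seconds / 86400 seconds per day = 5, which the custom "days" unit then renders as "5 days" without rescaling it to weeks.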

Prometheus query to get memory limit commitment for the entire cluster

I'm using the latest Prometheus (2.21.0) and the latest node-exporter.
Trying to run the query below I get "no datapoints found"; however, both metrics kube_pod_container_resource_limits_memory_bytes and node_memory_MemTotal_bytes work independently and return data:
(sum(kube_pod_container_resource_limits_memory_bytes) / :node_memory_MemTotal_bytes:sum)*100
So two questions
I have never seen the :node_memory_MemTotal_bytes:sum syntax before - is it a valid Prometheus query?
What is wrong with the query if the syntax is correct?
This is a convention widely used in Prometheus land. It means this metric is not one directly scraped from some target(s), but instead the result of a recording rule. This convention is described here.
If queries on both the left and right side return data individually, but after performing arithmetic on them you are left with no data, then it probably means the labels on them are not exactly the same. Execute them separately and compare the labels you get on your results. Assuming that :node_memory_MemTotal_bytes:sum does return data, you'll probably have to add sum there too to remove any remaining labels.
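For illustration (a sketch, not necessarily how your cluster is configured), a rule producing a name like :node_memory_MemTotal_bytes:sum would typically be defined along these lines:
groups:
  - name: node.rules
    rules:
      - record: :node_memory_MemTotal_bytes:sum
        expr: sum(node_memory_MemTotal_bytes)
and a query that does not depend on any recording rule at all, with both sides aggregated down to a single label-free value, would be:
(sum(kube_pod_container_resource_limits_memory_bytes) / sum(node_memory_MemTotal_bytes)) * 100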

How do we change the "precision:ms" setting in the Grafana Query Inspector?

I have an InfluxDB database with only 11 data points in it. These data are not displaying correctly (or at least as I would expect) in Grafana when the time between them is shorter than 1 ms.
If I insert data points 1 ms apart, then everything works as expected and I see all 11 points at the correct times, as shown below:
However, if I delete these points and upload new ones, this time one point per 100 μs, then although the data displays correctly in InfluxDB, in Grafana I see only two points in my graph:
It seems like the data is being rounded/binned to the nearest millisecond, and that this is related to the "precision=ms" setting in the query here:
but I cannot find any way to change this setting. What is the correct way to fix this?
You can't configure Grafana to support different time precision for the InfluxDB. It is hardcoded in the source code: https://github.com/grafana/grafana/blob/36fd746c5df1438f27aa33fc74b24be77debc7ff/public/app/plugins/datasource/influxdb/datasource.ts#L364 (It may need to be fixed in multiple places of the source, not only in this one.)
So the correct way to fix it is to code it, which is of course not in the scope of this question.

JMeter to record results on hourly basis

I have a JMeter project with multiple GET and POST requests and assertions for these. I use the Aggregate Results and View Results Tree listeners, but none of these can store results on an hourly basis. I tried the JMeterPlugins-Standard and JMeterPlugins-Extras packages and the jp@gc - Graphs Generator listener, but all of them use aggregated data instead of hourly data. So I would like to get the number of successful and failed requests/assertions per hour; maybe a bar chart would be most suitable for this purpose.
I'm going to suggest a non-conventional design-level solution: name your samplers dynamically with the hour (or date and hour), so that each hour the name will change and thus they will appear in a different category, i.e.:
The code for such name is:
${__time(dd:hh,)} the rest of sampler name
Such a sampler will appear in the following way in the Aggregate Report (here I simulated it with minutes/seconds, but the same will happen with days/hours, just on a larger scale):
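For example (sampler label made up), a sampler named ${__time(dd:hh,)} Login request shows up as separate rows such as 05:14 Login request, 05:15 Login request, and so on, one row per hour of day 05.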
Pros and cons of such an approach:
Simple: you can aggregate anything by hour, minute, or any other time slice while the test is running, not by analysis after execution.
Not listener-dependent, can be used with pretty much any listener or visualizer.
If you also want overall stats, it will require summing up every sub-category. So it alters the data, but in a way that can still be added back to the original relatively easily.
Calculating __time before every sampler will not go completely unnoticed from a performance perspective, but I don't think it will add visible overhead to a script.
You could get the same data by properly aggregating the JTL or CSV (whichever you use) after execution, so it doesn't provide anything that is not possible to achieve using standard methods.
The script needs altering to make this happen. If you have 100s of samplers, it's going to take a while. And if you want to change back...
You might want to use the Filter Results Tool, which has --start-offset and --end-offset parameters; you can "cut" your results file into "interesting" pieces and plot them according to your requirements.
You can install the Filter Results Tool using the JMeter Plugins Manager.
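As a sketch (file names made up, and option spellings worth double-checking against the plugin's documentation for your version), slicing out the second hour of a run could look like:
FilterResults.bat --input-file results.jtl --output-file results-hour2.jtl --start-offset 3600 --end-offset 7200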
Also be aware that according to JMeter Best Practices you should
Use as few Listeners as possible; if using the -l flag as above they can all be deleted or disabled.
Don't use "View Results Tree" or "View Results in Table" listeners during the load test, use them only during scripting phase to debug your scripts.
You can get whatever information you need from the .jtl results file; you can specify the test results location via the -l command-line argument.
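A typical non-GUI invocation (file names made up) would be:
jmeter -n -t test_plan.jmx -l results.jtl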
To get summarized results per hour, add a Generate Summary Results listener to your test plan:
Generates a summary of the test run so far to the log file and/or standard output
Update the interval in jmeter.properties to your needs, e.g. 1 hour = 3600 seconds:
summariser.interval=3600
You will get a summary of your requests per hour.
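For context, the related summariser.* properties in a stock jmeter.properties look roughly like this (only the interval is changed here; verify the rest against your JMeter version):
summariser.name=summary
summariser.interval=3600
summariser.log=true
summariser.out=true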
You can try the JMeter Backend Listener. It has integrations with Graphite and InfluxDB. After storing the results in one of these time series databases you can display them in a Grafana dashboard. Grafana has its own options for showing the results on an hourly, daily, monthly basis and so on.
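As a sketch of the Grafana side (this assumes the default schema written by JMeter's InfluxDB Backend Listener, i.e. a jmeter measurement with a statut tag and count field; adjust to whatever your listener actually writes), a panel counting successful requests per hour could use:
SELECT sum("count") FROM "jmeter" WHERE "statut" = 'ok' AND $timeFilter GROUP BY time(1h)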