How can I add a line per grouping in grafana without hardcoding multiple queries? - grafana

I have data that can be aggregated by the company that produced the data item. There are around 96 such companies. As such I don't want to use 96 queries, as this seems inefficient.
How can I get Grafana to do this with time series data, so that all the lines appear on the same graph?
CAVEAT: I get that 96 data streams is a lot on one graph. However I'm interested in boundary breaches and outliers which don't occur very often per supplier.

Grafana creates multiple lines if your query returns three columns called time, metric and value. metric has to be a string, and in this case I suppose it would be the company id. If it is an integer id then you need to cast it to a string. The query type also needs to be time series.
For me, this works:
SELECT
  date AS 'time',
  CAST(runDate AS CHAR) AS 'metric',
  value / 1000 AS 'value'
FROM forecast
WHERE $__timeFilter(runDate)
ORDER BY date
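Applied to the question above, a minimal sketch might look like the following (measurements, recorded_at, company_id and reading are assumed names, not your actual schema; swap in your real table and columns):
SELECT
  recorded_at AS 'time',
  CAST(company_id AS CHAR) AS 'metric',  -- one line per company
  reading AS 'value'
FROM measurements
WHERE $__timeFilter(recorded_at)
ORDER BY recorded_at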

Related

Find count of active users in the last 29 days in Tableau

I need assistance in calculating the total active users from Feb 16 2020 to March 16 2020.
I have tried using calculated fields, but not getting the correct results. Please advise.
Thank you,
Nirmal
To find the number of unique values that appear in a field, say [user_code], you can use the COUNT DISTINCT function, COUNTD(), as in COUNTD([user_code]).
To restrict the data to a particular time range, one way is to put your date field on the Filter shelf and choose the settings that include only the data rows you want, say the range from 2/16 to 3/16 as you stated.
Alternatively, you can push the filtering condition into the calculation with an IF function call, as in COUNTD(IF <data is relevant> THEN [user_code] END), effectively combining the two techniques. That works because if there is no ELSE clause and the IF condition is False, the IF statement evaluates to null. Since COUNTD() silently ignores nulls, like other aggregation functions, the expression acts as if the irrelevant data rows were filtered out.
So, for example,
COUNTD(IF [dates] >= #2/16/2020# AND [dates] <= #3/16/2020# THEN [user_code] END)
will tell you the number of unique user codes during the period between 2/16 and 3/16. The DATEDIFF() function will probably be useful in more elaborate tests.
Finally, what if you want more flexibility? You could easily use Parameters or Filter controls to let the user choose the date range interactively.
If you want this calculation repeated for each possible day, showing the unique users in the preceding 30-day period as some sort of rolling calculation, then you'll need to learn about some more advanced features: either multiple calculations as above for different time ranges, Table Calculations, or some data prep and/or data padding with Tableau Prep Builder, Python or some other technique. That is mostly because in that scenario each data row contributes to multiple rolling counts, rather than to a single count when partitioning the data by some dimension.
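For what it's worth, the reason each row feeds several rolling counts is easiest to see outside Tableau. Here is a minimal SQL sketch of a rolling 30-day distinct count, assuming hypothetical logins(login_date, user_code) and calendar(cal_date) tables, not anything from your workbook:
-- Sketch only: every login joins to each of the next 30 calendar days,
-- so one data row contributes to 30 different rolling counts.
SELECT c.cal_date,
       COUNT(DISTINCT l.user_code) AS active_users_30d
FROM calendar c
LEFT JOIN logins l
  ON l.login_date >  c.cal_date - 30
 AND l.login_date <= c.cal_date
GROUP BY c.cal_date
ORDER BY c.cal_date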

Prometheus query equivalent to SQL DISTINCT

I have multiple Prometheus instances providing the same metric, such as:
my_metric{app="foo", state="active", instance="server-1"} 20
my_metric{app="foo", state="inactive", instance="server-1"} 30
my_metric{app="foo", state="active", instance="server-2"} 20
my_metric{app="foo", state="inactive", instance="server-2"} 30
Now I want to display this metric in a Grafana singlestat widget. When I use the following query...
sum(my_metric{app="foo", state="active"})
...it, of course, sums up all values and returns 40. So I tell Prometheus to sum it by instance...
sum(my_metric{app="foo", state="active"}) by (instance)
...which results in a "Multiple Series Error" in Grafana. Is there a way to tell Prometheus/Grafana to only use the first of the results?
I don't know of a distinct, but I think this would work too:
topk(1, sum(my_metric{app="foo", state="active"}) by (instance))
Check out the second to last example in here:
https://prometheus.io/docs/prometheus/latest/querying/examples/
One way I just found is to additionally do an average over all values:
avg(sum(my_metric{app="foo", state="active"}) by(instance))
If you need to return an arbitrary time series out of multiple matching time series, then this can be done with topk() or bottomk() functions. For example, the following query returns a single time series with the maximum value out of multiple time series which match my_metric{app="foo", state="active"}:
topk(1, my_metric{app="foo", state="active"})
You need to set the instant query option in Grafana when using topk(). Otherwise topk(1, ...) may return multiple time series when it is used for building a graph with a range query. This is because topk(1, ...) selects the single time series with the max value individually at each point on the graph, and different points on the graph may have different time series with the max value. There is a workaround which allows returning a single series out of many series on a graph in alternative Prometheus-like systems such as VictoriaMetrics: it provides topk_* and bottomk_* functions for this purpose. See, for example, topk_last or topk_avg.
Note that topk() has nothing in common with DISTINCT from SQL. If you need to select distinct label values with PromQL, then you need to use count(...) by (label). It will return the unique values for the given label alongside the number of unique time series for each label value. For example, count(my_metric) by (app) will return the unique app label values for time series with the my_metric name. This is roughly equivalent to the following SQL with a DISTINCT clause:
SELECT DISTINCT app FROM my_metric
See count() docs for details.

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which is always going to have a field DATE. Before, the dates that came as input (after some process) were put in a table as output. Now, I am asked to output both the input dates and every date between the minimum date received and one year from that moment. If there was originally no input for some day between these two dates, all fields must come out as 0, or equivalent.
Example: I have two inputs, one with '18/03/2017' and the other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an aggregator, create 365 fields, each holding day0+1, day0+2, and so on, to create an artificial year.
After that we do several transformations, like sorting the dates and a union between them, to get the data ready for a joiner. The idea of the joiner is to do a Full Outer Join between the original data and the data whose fields will all be set to 0, which we got from the previous aggregator.
Then a router sends the data that had actual dates (fields without nulls) to one group and the rows where all fields are null to another group; those null fields are then given a 0 before finally being written to a table.
I am wondering how this can be achieved while, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years instead of one, the task gets ridiculous really quickly.
I was wondering about an XOR type of operation, or some other function, that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates, a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!
I'm not sure what the aggregator is supposed to do.
The same goes for the 'full outer' join? A normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the joiner? In that case a lookup configured to return 'all rows' and a less-than-or-equal predicate can help make the mapping much more readable.
In any case you will need a helper table (or file) with a sequence of numbers between 1 and the number of potential duplicates (or more).
I use our time dimension in the warehouse, which has one row per day from 1753-01-01 for the next 200,000 days, and a primary integer column with values from 1 and up.
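As a hedged sketch of that idea in plain SQL (dim_date with a cal_date column and input_data with a date_field column are assumed names, not a known schema), the missing dates are simply the dimension rows with no matching input row:
-- Sketch only: dates in the one-year window that have no input row at all
SELECT d.cal_date
FROM dim_date d
LEFT JOIN input_data i
  ON i.date_field = d.cal_date
WHERE i.date_field IS NULL
  AND d.cal_date >= (SELECT MIN(date_field) FROM input_data)
  AND d.cal_date <  (SELECT MIN(date_field) FROM input_data) + 365
ORDER BY d.cal_date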
You've identified that you know how to do this programmatically, and to be fair this problem is more suited to that sort of solution... but that doesn't exclude PowerCenter by any means. Just feed the 2 dates into a Java transformation and apply some code to produce all dates between them, outputting a record for each. The Java transformation is ideal for record generation.
Ok... so you could override your source qualifier to achieve this in the selection query itself (I'm giving an Oracle-based example as it's what I'm used to, and I'm assuming your input data comes from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT MIN(tablea.DATEFIELD) + levquery.n - 1 AS Port1 FROM tablea, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery GROUP BY levquery.n
(Check whether the query works for you - I don't have access to a PC to test it at the minute)
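Building on that, a hedged sketch of the full fill-with-zeros step in the source qualifier could be a left join from the generated dates back to the real rows (DATEFIELD and tablea come from the query above; the_date, some_value and Port2 are assumed names, not your actual ports):
SELECT gen.the_date         AS Port1,
       NVL(t.some_value, 0) AS Port2   -- missing days come out as 0
FROM (SELECT (SELECT MIN(DATEFIELD) FROM tablea) + LEVEL - 1 AS the_date
      FROM DUAL
      CONNECT BY LEVEL <= 365) gen
LEFT JOIN tablea t
  ON t.DATEFIELD = gen.the_date
ORDER BY gen.the_date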

Best way to store metric data used for graphs

What is the best way to store metrics data used in displaying graphs?
Currently I have a table analytics(domain::text, interval_in_days::int, grouping::text, metric::text, type::text, labels[], data[], summary::json)
domain is the overall category of the metrics. Like what part of the application they're under. Could be sales or support etc.
the interval_in_days and grouping are 'view options' the end user can specify at the interface level to have a different view of the data points.
grouping can be date, day_of_week or time_of_day
interval_in_days can be 7, 30 or 90
labels is an array of the labels on the x-axis and data are the corresponding datapoints.
type is either data_series or summary. If data_series, the row represents the data used for drawing the graph, while a summary row has the summary::json field populated with an object like {total_number_of_X: 132, median_X: 320, ...}
metric is simply the metric the corresponding graph represents, so there's a separate graph for each value of metric
From this it follows that for each metric/graph I display, I have 9 (3 intervals * 3 groupings). For each domain I have a single row with type summary.
Every few hours I aggregate a lot of data across multiple tables into the analytics table. So I don't have to perform expensive queries adhoc.
I feel this is not the optimal approach, so I'm really interested in seeing how other people accomplish the same task, and I'm open to any suggestions.
There is nothing wrong with storing 9 rows of raw data and later aggregating them to something more comfortable. It's a common approach and has performance benefits in some situations.
What I would really re-think in your design are the datatypes. From your description it seems you can transform all ::text fields into something like ::varchar(20). Then you can use STORAGE PLAIN on these columns and your table will become more efficient.
Also, consider adding foreign keys to describe what is stored in individual columns. For example, you stated grouping can be date, day_of_week or time_of_day, so you could have a groupings table that lists these options. But again, the foreign key would have to be covered by an index, so you may want to skip it for performance reasons.
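A hedged sketch of both suggestions in Postgres DDL (column names follow the question; the groupings lookup table and the 20-character limit are assumptions to adjust to your real data):
-- Lookup table listing the allowed grouping values
CREATE TABLE groupings (
    grouping_name varchar(20) PRIMARY KEY  -- 'date', 'day_of_week', 'time_of_day'
);

CREATE TABLE analytics (
    domain           varchar(20),
    interval_in_days int,
    "grouping"       varchar(20) REFERENCES groupings (grouping_name),
    metric           varchar(20),
    type             varchar(20),
    labels           text[],
    data             numeric[],
    summary          json
);

-- Keep the short text columns inline and uncompressed
ALTER TABLE analytics ALTER COLUMN domain SET STORAGE PLAIN;
ALTER TABLE analytics ALTER COLUMN metric SET STORAGE PLAIN;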

Best way to query 4 B+ records in Tableau

I am looking for the best way to analyse 4B records (1TB of data) stored in Vertica using Tableau. I tried using an extract of 1M records, which works perfectly, but I don't know how to manage 4B records because querying them takes too long.
I have following dataset :
timestamp id url domain keyword nor_word cat_1 cat_2 cat_3
So here I need to create a descending list of the top 10 id, top 10 url, top 10 domain, top 10 keyword, top 10 nor_word, top 10 cat_1, top 10 cat_2 and top 10 cat_3 values, based on the count of each field value, each in a separate worksheet, and then combine all worksheets in one dashboard.
There is no primary key. This dataset covers one month, so I want to make a global filter on start date and end date to reduce the query size, but I don't know how to create a global date filter and display it on the dashboard.
You have two questions, one about Vertica and one about Tableau. You should split these up.
Regarding Vertica, you need to know that Vertica stores data in ascending sort order in physical storage. This means that an additional step will always be required anytime you want to get a descending sort order.
I would suggest creating a partition on the date, and subsequently running Database Designer (DBD) in incremental mode, using your queries as samples. By partitioning the data, Vertica can eliminate partitions that fall outside your filters during optimization.
Running the DBD will generate some better optimized projections. You should consider the trade-off between how often you will need this data and whether it's worth creating these additional projections as it will impact your load performance.
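As a hedged sketch of the partitioning suggestion (events and event_ts are assumed names standing in for your table and timestamp column, and the dates are placeholders; verify the Vertica syntax against your version):
-- Partition the fact table by day so a date-range filter can prune partitions
ALTER TABLE events PARTITION BY event_ts::DATE REORGANIZE;

-- One of the "top 10" worksheets, restricted by the dashboard's date range
SELECT domain, COUNT(*) AS hits
FROM events
WHERE event_ts >= '2019-01-01' AND event_ts < '2019-02-01'
GROUP BY domain
ORDER BY hits DESC
LIMIT 10;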