What is the best way to store metrics data used in displaying graphs?
Currently I have a table analytics(domain::text, interval_in_days::int, grouping::text, metric::text, type::text, labels[], data[], summary::json)
domain is the overall category of the metrics, i.e. the part of the application they belong to. It could be sales, support, etc.
interval_in_days and grouping are 'view options' the end user can specify at the interface level to get a different view of the data points.
grouping can be date, day_of_week or time_of_day
interval_in_days can be 7, 30 or 90
labels is an array of the labels on the x-axis and data are the corresponding datapoints.
type is either data_series or summary. A data_series row represents the data used for drawing the graph, while a summary row has the summary::json field populated with an object like {total_number_of_X: 132, median_X: 320.. etc}
metric is simply the metric the corresponding graph represents, so there's a separate graph for each value of metric.
From this it follows that for each metric/graph I display, I have 9 rows (3 intervals * 3 groupings). For each domain I have a single row with type summary.
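The row count above can be checked with a short Python sketch. It only enumerates the view-option combinations described in the question; the metric name "tickets_opened" is a made-up placeholder.

```python
from itertools import product

# View options as listed in the question.
INTERVALS_IN_DAYS = (7, 30, 90)
GROUPINGS = ("date", "day_of_week", "time_of_day")


def data_series_keys(metric):
    """Return one (metric, interval_in_days, grouping) key per data_series row."""
    return [
        (metric, interval, grouping)
        for interval, grouping in product(INTERVALS_IN_DAYS, GROUPINGS)
    ]


# "tickets_opened" is a hypothetical metric name used only for illustration.
keys = data_series_keys("tickets_opened")
print(len(keys))  # 3 intervals * 3 groupings
```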
Every few hours I aggregate a lot of data across multiple tables into the analytics table, so I don't have to perform expensive queries ad hoc.
I feel this is not the optimal approach, so I'm really interested in seeing how other people accomplish the same task, or in any suggestions.
There is nothing wrong with storing 9 rows of raw data and later aggregating them to something more comfortable. It's a common approach and has performance benefits in some situations.
What I would really rethink in your design are the datatypes. From your description it seems you could turn all the ::text fields into something like ::varchar(20). You can then use STORAGE PLAIN on these columns, which may make the table slightly more efficient, although PostgreSQL stores text and varchar the same way, so expect the gain to be small.
Also, consider adding foreign keys to describe what is stored in individual columns. For example, you stated grouping can be date, day_of_week or time_of_day, so you could have a groupings table that lists these options. But again, the referenced column needs a unique index and every write pays for the constraint check, so you may want to skip that for performance reasons.
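A minimal sketch of the lookup-table idea, using Python's built-in sqlite3 module as a stand-in for the real database (an assumption made only so the example runs anywhere). The groupings table and its name column are hypothetical, and only a few of the analytics columns are reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical lookup table listing the allowed grouping values.
conn.execute("CREATE TABLE groupings (name TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE analytics (
        domain TEXT NOT NULL,
        interval_in_days INTEGER NOT NULL,
        "grouping" TEXT NOT NULL REFERENCES groupings(name),
        metric TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO groupings VALUES (?)",
    [("date",), ("day_of_week",), ("time_of_day",)],
)


def try_insert(grouping):
    """Return True if the row is accepted, False if the foreign key rejects it."""
    try:
        conn.execute(
            "INSERT INTO analytics VALUES (?, ?, ?, ?)",
            ("sales", 7, grouping, "revenue"),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling try_insert("date") succeeds, while a value missing from groupings, such as try_insert("hour_of_day"), is rejected by the constraint.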
I have data that can be aggregated by the company that produced the data item. There are around 96 such companies. As such I don't want to use 96 queries, as this seems inefficient.
How can I get Grafana to do this with time series data, so that I get all the lines on the same graph?
CAVEAT: I get that 96 data streams is a lot on one graph. However I'm interested in boundary breaches and outliers which don't occur very often per supplier.
Grafana creates multiple lines if the query returns three columns called time, metric and value. Metric has to be a string, and in this case I suppose it is the company id. If it is an integer id then you need to cast it to a string. Also, the query type needs to be time series.
For me, this works:
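SELECT
  date AS 'time',                      -- x-axis timestamp
  CAST(runDate AS char) AS 'metric',   -- one line per distinct value; must be a string
  value/1000 AS 'value'                -- y-axis value
FROM forecast
WHERE $__timeFilter(runDate)           -- Grafana macro: restrict to the dashboard time range
ORDER BY date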
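To illustrate why a single query is enough, here is a rough Python sketch of how rows in the (time, metric, value) shape split into one line per metric. It shows the data shape only and is not Grafana's actual implementation; the sample rows and company ids are made up.

```python
from collections import defaultdict


def split_into_series(rows):
    """Group (time, metric, value) rows into one time-ordered series per metric."""
    series = defaultdict(list)
    for time, metric, value in sorted(rows, key=lambda row: row[0]):
        # The metric is cast to a string, mirroring the CAST in the query.
        series[str(metric)].append((time, value))
    return dict(series)


# Hypothetical rows: two companies (ids 17 and 42) over two days.
rows = [
    ("2024-01-01", 17, 1.2),
    ("2024-01-01", 42, 0.7),
    ("2024-01-02", 17, 1.4),
    ("2024-01-02", 42, 0.9),
]
lines = split_into_series(rows)
```

With 96 companies the same single result set would simply yield 96 keys, one per line on the graph.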
Does anyone know if I can add two rows together so that I end up with just one row in Tableau (see screenshot)? So, if both rows are city Aachen and one row has a value for cost but not for purchasing power and the other row has a value for purchasing power but not cost, I would want just one row with both values. I am not interested in the columns "Table Name" and "Document Index(...". Thankful for any help!
Manipulating data like that in Tableau is usually a no-go. Nevertheless, you can try Tableau Prep, and you should be able to do what you need there. Or maybe use a different tool (even Excel).
With that said, even though you have the info in two rows, the default approach for Tableau is always to aggregate data, so even if you have many rows with similar cases, once you take it to a viz using City (for example) as a dimension, this issue shouldn't really matter.
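If the rows do need to be merged before they reach Tableau, the operation is a group-by on city that keeps the non-null value of each column. A minimal Python sketch, assuming the data is available as a list of dicts; the column names and numbers are invented for illustration.

```python
def merge_rows_by_city(rows):
    """Collapse rows sharing a city, keeping the first non-null value per column."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row["city"], {"city": row["city"]})
        for column, value in row.items():
            # Only fill a column that is still missing or null.
            if target.get(column) is None:
                target[column] = value
    return list(merged.values())


# Hypothetical data matching the situation described in the question.
rows = [
    {"city": "Aachen", "cost": 100.0, "purchasing_power": None},
    {"city": "Aachen", "cost": None, "purchasing_power": 98.5},
]
merged_rows = merge_rows_by_city(rows)
```

This mirrors what an aggregation that ignores nulls does once City is used as a dimension in the viz.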
I'm using Dataprep on GCP to wrangle a large file with a billion rows. I would like to limit the number of rows in the output of the flow, as I am prototyping a Machine Learning model.
Let's say I would like to keep one million rows out of the original billion. Is it possible to do this with Dataprep? I have reviewed the documentation on sampling, but that only applies to the input of the Transformer tool and not the outcome of the process.
You can do this, but it takes a bit of extra work in your recipe. Set up a formula in a new column using something like RANDBETWEEN to give you a random integer between 1 and 1,000 (in this million-out-of-a-billion case). From there, pick any single integer between 1 and 1,000 and filter to keep only the rows with that value; your output will then contain only the randomized subset. Have the last step of the recipe remove the temporary column.
So indeed there are 2 approaches to this.
As Courtney Grimes said, you can use one of the 2 functions that generate a random number within a range.
randbetween :
rand :
These methods can be used to slice an "even" portion of your data. As suggested, use randbetween(1,1000), then keep only the rows where the result equals one chosen value x (1 <= x <= 1000), because that is 1/1000 of the data (a million out of a billion).
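The random-bucket technique can be sketched in Python as follows. This is an illustration of the idea rather than Dataprep syntax; the function name and parameters are hypothetical, and a smaller input is used so that it runs quickly.

```python
import random


def sample_rows(rows, buckets=1000, keep=1, seed=None):
    """Keep rows whose random bucket number equals `keep` (about 1/buckets of them)."""
    rng = random.Random(seed)
    # Each row gets a random integer in [1, buckets], like randbetween(1, 1000).
    return [row for row in rows if rng.randint(1, buckets) == keep]


# 200,000 stand-in rows; roughly 200 of them should survive the filter.
subset = sample_rows(range(200_000), seed=0)
```

Note that the size of the result is only approximately 1/1000 of the input, so this gives about a million rows out of a billion, not exactly a million.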
Alternatively, if you just want a million records in your output, but either
don't want to rely on knowing the size of the entire table, or
just want the first million rows, regardless of how many rows there are,
you can use 2 of the 3 row-filtering methods: top rows or range.
P.S
By understanding the $sourcerownumber metadata parameter (see the in-product documentation), you can filter or keep a portion of the data (as per the first scenario) in 1 step, that is, without creating an additional column.
BTW, an easy way to discover how-tos in Trifacta is to type what you're looking for in the "search-transformation" pane (accessed via ctrl-k). By searching "filter", you'll get most of the relevant options for your problem.
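As a rough Python sketch of filtering on the source row number directly: the limit case corresponds to "top rows", while the stride case (keeping every 1000th row via a modulo test) is one assumed reading of how a row number could give a 1/1000 slice in a single step. The function and parameter names are hypothetical, and no Dataprep expression syntax is implied.

```python
def keep_by_row_number(rows, stride=None, limit=None):
    """Filter on a 1-based source row number, with no helper column.

    stride keeps every stride-th row; limit keeps rows numbered 1..limit.
    """
    kept = []
    for row_number, row in enumerate(rows, start=1):
        if limit is not None and row_number > limit:
            break  # everything past the limit is dropped
        if stride is None or row_number % stride == 0:
            kept.append(row)
    return kept


first_three = keep_by_row_number(range(1, 11), limit=3)
every_thousandth = keep_by_row_number(range(1, 3001), stride=1000)
```

Unlike the random approach, a stride is deterministic, so it is only a fair sample if the row order is unrelated to the values being modelled.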
Cheers!
I am creating an interactive 'calculator' using Tableau. I have a series of dataframes that I have crossed with one another, such that the resulting dataframe is every possible combination between the tables, and every row is unique.
Each column is its own worksheet as a table. Each table in the dashboard is a pane. So, here we have a series of tables with selectable units of measurement, and the final pane on the dashboard should filter to the cell for its respective column, on the unique row of the dataset that the user has selected and 'filtered out'.
I'm having some issues getting this to work and I'm not sure why.
The closest thing I can think of to solving this would be 'Cascading Filters.' Here are a couple of resources:
General Use
In dashboard action-filter form
The critical piece, however, is that the filters must be selected in a specific order, which is what makes them 'cascading.' This may differ from your presumed concept of clicking/filtering in any order on the worksheets to then arrive at a final answer. I do think that this may be a limitation of Tableau: I don't think that a 'many to many' type of relationship can be set up within Action Filters.
I have a calculation and it outputs multiple values. Then I am creating a table on those values. For example, in the data below my formula is
if data is 1 then calculation is `one`
if data is 2 then calculation is `two`
if data is 3 then calculation is `three`
As three doesn't appear in the output, three is not displayed when I create a table. Is there any way to display it?
I tried table layout >> show empty rows and columns, and it didn't work.
data calculation
1 one
2 two
Tableau discovers the possible values for a dimension field dynamically from the query results.
If ‘three’ does not appear in your data, then how do you expect Tableau to know to make a column header for that non existent, but potential, value? It can’t read your mind.
This situation does occur often though - perhaps you want row or column headers to remain stable, even when you change filters in a way that causes some to no longer appear in the query results.
There are a few ways you can force Tableau to pad or complete a domain:
One solution is to pad your data to make sure each value for your dimension field appears in at least one data row.
You can often do this easily by using a union to append some extra rows to your original data. You can often add padding rows that don’t impact any results by leaving all your Measure columns null, since nulls are ignored by aggregation functions.
Another common solution that is a bit more effort is to make what is known as a scaffolding data source that is not much more than a list of your dimension members. You can then use that data source as a primary data source with data blending, making your original data source secondary.
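Outside Tableau, the padding and scaffolding ideas both amount to a left join from the full list of dimension members onto the data. A minimal Python sketch, assuming the rows are dicts with the data and calculation columns from the question; the function name is hypothetical.

```python
def pad_domain(scaffold, rows, key="calculation"):
    """Left-join rows onto the scaffold so every member appears at least once."""
    by_member = {}
    for row in rows:
        by_member.setdefault(row[key], []).append(row)
    padded = []
    for member in scaffold:
        # A member with no matching rows still gets one row, with a null measure.
        padded.extend(by_member.get(member, [{key: member, "data": None}]))
    return padded


rows = [{"data": 1, "calculation": "one"}, {"data": 2, "calculation": "two"}]
padded = pad_domain(["one", "two", "three"], rows)
```

Here 'three' ends up with a row whose measure is null, so it gets a header without affecting any aggregate.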
There are two situations where Tableau can detect the absence of data and leave space for it in the visualization automatically:
For numeric types, you can create a bin field that will automatically pad for missing bins.
Similarly, date fields can show missing values because, like bins, Tableau can tell when a month doesn’t appear in the data and leave room for it in the view.
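The reason bins can be padded automatically is that the full set of bins is known from the bin size and the data range alone. A small Python sketch of that idea, assuming non-empty integer input; it is an illustration and not how Tableau implements bins.

```python
from collections import Counter


def counts_per_bin(values, bin_size):
    """Count values per fixed-width bin, including empty bins inside the range."""
    counts = Counter((value // bin_size) * bin_size for value in values)
    first, last = min(counts), max(counts)
    # Walk every bin start between the first and last occupied bin.
    return {
        start: counts.get(start, 0)
        for start in range(first, last + bin_size, bin_size)
    }


histogram = counts_per_bin([1, 3, 12, 27], bin_size=5)
```

The bins starting at 5, 15 and 20 contain no values but still appear with a count of zero, which is the same effect as Tableau leaving room for missing bins or months.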