I have an application where some values are stored in DB, e.g. one value per second. It is 604800 values per 7 days and if I want to view this value in graph I need some effective way how to get only e.g. 800 values from DB if I have chart with 800px width.
I use some aggregation logic where mean value is computed for values in 2, 3, 4, 5, 6, 10, 12 minute interval and then hour and day interval aggregates are computed.
I use PostgreSQL and this aggregations are computed with statement:
"INSERT INTO aggre_table_ ... SELECT sum(...)/count(*) ... WHERE timestamp > ... and timestamp < ..."
Is there any better way how to do this or what is the best way of data aggregation for later displaying in charts?
Is it better to do this by some trigger or calling stored procedures?
Is there any DB support for aggregations for D3js, Highcharts or Google Charts?
How to aggregate your data is a large topic that is independent of your technology choices. It depends largely on how sensitive the data is, what the important indicators of the data are, what the implications of those indicators are, etc.
Is a single out of range point significant? Or are you looking for the overall trend? These are big questions with answers that aren't always easy.
My general suggestion:
to display a week worth of data, aggregate to hourly averages.
provide a range around that line indicating the distribution of points around each average
if something significant happened within that aggregated point, indicate it with a separate marker
provide drill down capability for each aggregated point to see the full detail charted, if that level of detail is important (chances are, it's not)
In Highcharts (Highstock in the fact) dataGrouping is used for approximation (see demo).
Also, here you can find more about Highstock.
Related
I need to plot trend charts on the react app based on user inputs such as timestamps, devices, etc. I have related time series data in DynamoDB and S3 (which I can query using Athena).
Returning all those millions of data points for a graph seems unreasonable and is super laggy.
I guess one option is "binning" where I decide the number of bins based on how big the time range is and take averages of the readings in that bin. However, concerned about how well it will show the drops and high we need to show them accurately.
Athena queries and DDB queries (due to the 1MB limit) - both seem fairly slow so far.
Of course the size of the response payload is another concern as API and Lambda both limit it to 10 and 6Mb respectively.
Any ideas?
I can't suggest anything smarter than "binning", but if you are concerned that the bucket interval might become too wide and performance might suffer, you can fixate the interval. Then create more than one table. For example, the interval can be 1 hour and you can have a new table for each week.
This is what we did when we had to deal with time series in dynamo. At some point, we decided to switch to Amazon Timestream
I am trying to show the change in moving average by county on a map.
Currently, I have the calculated field for this:
IF ISNULL(LOOKUP(SUM([Covid Count]),-14)) THEN NULL ELSE
WINDOW_AVG(SUM([Covid Count]), -7, 0)-WINDOW_AVG(SUM([Covid Count]), -14, -7)
END
This works in creating a line graph where I filter the dates to only include 15 consecutive dates. This results in one point with the correct change in average.
I would like this to number to be plotted on a map but it says there are just null values.
The formula is only one part of defining a table calculation (a class of calculations performed client side tableau taking the aggregate query results returned from the data source)
Equally critical are the dimensions in play on the view to determine the level of detail of the query, and the instructions you provide to tell Tableau how to slice up or layout the query results before applying the table calc formula. This critical step is known as setting the “partitioning and addressing” for the table calc, sometimes also as setting the “compute using”. Read about it in the online help for table calcs. You can experiment with using the Edit Table Calc dialog by clicking on the corresponding pill.
In short, you probably have to a dimension, such as your Date field to some shelf - likely the detail shelf, and the set the partitioning and addressing, probably to partition by county and address by state.
If you have more than a couple of weeks of data, then you’ll get multiple marks per county. You may need to decide how to handle that on your map.
I have a dataset that contains various wait-time metrics for all appointments in a practice for a year (check-in to call-back, call-back to check-out, etc). It contains appt time (one of about 40 15 minute slots), provider, various wait times.
I can get Tableau to show me, for each 15 minute slot, the average wait times for each provider in the practice.
What I can't seem to be able to do is also display the overall average for the practice for that given time slot so as to be able to compare that provider vs. the "office standard".
I'm super new to trying out Tableau, so I am sure it is something very simple.
Thanks in advance.
Use a level-of-detail (LOD) calculated field. An LOD calculation occurs at whatever aggregation level you specify, rather than what's on the row or column shelf.
You didn't provide any info about your data set so I will use made up names here.
This gives you the overall average wait time, regardless of other dimensions on row/column shelves:
{FIXED : avg([wait time])}
This gives you the overall average wait time per provider, regardless of other dimensions on row/column shelves:
{FIXED [Provider Name] : avg([wait time])}
See the online Tableau help at https://onlinehelp.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod_overview.html for more information. If you have filtering and need to calculate the overall without filters applied, look at the INCLUDE LOD keyword.
It is a Tableau 8.3 Desktop Edition question.
I am trying to aggregate data using two different dimensions. So, I want to aggregate twice: first I want to sum over all the rows and then multiply the results in a cummulative manner (so I can build a graph). How do I do that? Ok, too vague, here follow some more details:
I have a set of historical data. The columns are the date, the rows are the categories.
Easy part: I would like to sum all the rows.
Hard part: Given this those summations I want to build a graph that for each date it shows the product of all the summations from the earlier date till this date.
In another words:
Take the sum of all rows, call it x_i, where i is the date.
For each date i find y_i such that y_i = x_0 * x_1 * ... * x_i (if there is missing data, consider it to be one)
Then show a line graph for the y values versus the date.
I have searched for a solution for this and tried to figure it out by myself, but failed.
Thank you very much for your time and help :)
You need n calculated fields (number of columns you have), and manually do the calculation you need:
y_i = sum(field0)*sum(field1)
Basically because you cannot iterate on columns. For tableau, each column represent a different dimension or measure. So it won't consider that there is a logic order among them, meaning, it won't assume that column A comes before column B. It will assume A and B are different things.
Tableau works better with tables organized as databases. So if you have year columns, you should reorganize your data, eliminate all those columns and create a single field called 'Date', which will identify the value of your measure for that date. Yes, you will have less columns but far more rows. But Tableau works better this way (for very good reasons).
Tableau 9.0 allows you to do that directly. I only watched a demo (it was launched yesterday), but I understand that now there is an option to selected those columns (in the Data Connection tab) and convert them to a database format.
With that done, you can use a PREVIOUS_VALUE function to help you. I'm not with Tableau right now. As soon as I get to it I'll update this with the final answer . Unless you take the lead and discover yourself before that ;)
We have data that is submitted that is only YTD numbers. I'm wondering how I could display numbers that are subtracted along the Date field.
Ie, if I want to show the MTD movement on March. I will have to go March less February.
Now I know I can do this for individual measure fields. But having around 40+ measures seems a bit tedious.
http://kb.tableausoftware.com/articles/knowledgebase/creating-ytd-mtd-calculations
I tried to enter "Measure Values" but that is not a valid measure to put in the calculation.
Is there a way to set up a custom dimension?
Thanks,
Gem
After days of research, can't be done in tableau unless you want to labour for a week creating an almost cell by cell calculation. Data transformation in SQL will be a more feasible solution.
I had pivoted the data previously in SQL, so that I end up with 1 measure column instead of 40+. That enables you to minimise the calculation fields, so that you don't have to repeat all the calculation for individual measures.
Works well. Not for ratios though, as you will need to extract individual measures again so that you can divide them against each other. It's got pros and cons. Number of rows in the DB also multiplies.
Other solutions that preserves the table structure will be to use temp tables and do calculations on several temp tables.