Say I have a data set that has x-sections of country (US, CANADA), then state/province, then year, then week. My data stack has 2 countries, 57 states/provinces, 3 years, and 52 weeks. I want to create a variable, revenue, that for each week sums the last 52 weeks within the x-section.
Right now I have a loop, but it's very, very slow.
for each country,
for each state,
for the last 2 years,
for each week,
sum the last 52 elements
Does anyone know how I can do this with vectorization?
You might want to look into using the function s=sum(X,DIM). Without more info about your dataset (please provide an example), we cannot go into great detail.
For weighted sums over a rolling window, filter() can work well.
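The rolling sum asked about here does not need a nested loop per week: a cumulative sum gives every 52-week total as a difference of two running totals. The following is a minimal sketch in Python rather than the asker's environment. The function names, the (group_key, value) row layout, and the choice to return None until a full window exists are all assumptions made for illustration.

```python
from itertools import accumulate

def trailing_sum(values, window):
    """Sum of the `window` most recent values at each position.

    Positions without a full window behind them get None.
    """
    csum = [0] + list(accumulate(values))  # csum[k] = sum of values[:k]
    return [
        csum[i + 1] - csum[i + 1 - window] if i + 1 >= window else None
        for i in range(len(values))
    ]

def trailing_sum_by_group(rows, window):
    """rows: iterable of (group_key, value), in week order within each group."""
    series = {}
    for key, value in rows:
        series.setdefault(key, []).append(value)
    return {key: trailing_sum(vals, window) for key, vals in series.items()}
```

With weekly data, calling trailing_sum_by_group(rows, 52) with keys such as ("US", "NY") would give one 52-week trailing series per x-section. With floating-point revenue, the difference of running totals can pick up small rounding error on long series.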
Related
I'm trying to understand if it's possible to calculate a 1 month sum of revenue data in one of my measurements. For each day, I would like the sum of the previous 30 days.
Is this possible in InfluxDB or through Grafana's query interface?
A moving average is a moving sum, divided by the number of samples. So if you want a moving sum of the past 30 values:
select 30*moving_average(field_name, 30) from measurement
Edited to add:
As Peter Halicky points out in the comments, this is not the past 30 days. It's the past 30 data points.
If you will always have data for every single day, it's not an issue.
If you're missing a day's data, you'll still get a 30-sample average, but it'll stretch over 31 days instead of 30.
If you don't actually care about the calendar, but want to know the past 30 days of activity, this is not a problem.
If it is a problem, there are a few work-arounds. One that's probably trickier than it sounds: ensure that there is always an entry for each day.
A more robust way is to have the reporting app do this in two steps. Something like this (haven't worked out all the details, but you get the idea):
Find the number of data points in the past 30 days, using a query like select count(field_name) from measurement where time > now() - 30d.
Use this number (call it n) to form the query: select n*moving_average(field_name, n) from measurement where time > now() - 30d.
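The difference between "past 30 data points" and "past 30 days" can be seen outside the database. This is a small Python sketch, not InfluxDB code; the (day, value) layout and both function names are assumptions for illustration.

```python
from datetime import date, timedelta

def last_n_samples_sum(points, n):
    """points: list of (day, value) sorted by day. Sums the n most recent samples."""
    return sum(value for _, value in points[-n:])

def last_n_days_sum(points, n, today):
    """Sums samples whose day falls in the n calendar days ending at `today`."""
    start = today - timedelta(days=n - 1)
    return sum(value for day, value in points if start <= day <= today)

# One sample of 1 per day for January 2024, with January 20 missing.
today = date(2024, 1, 31)
points = [
    (date(2024, 1, d), 1)
    for d in range(1, 32)
    if d != 20
]
by_samples = last_n_samples_sum(points, 30)   # reaches back to January 1
by_days = last_n_days_sum(points, 30, today)  # only January 2 through 31
```

With one day missing, the sample-based sum covers 31 calendar days, while the calendar-based sum covers exactly 30 days and simply has one fewer sample in it.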
Yes, definitely it's possible.
Just set this part of your query like this:
SELECT sum("value") FROM "YOUR_TAG_NAME"
WHERE $timeFilter GROUP BY time(30d) fill(null)
Just make sure that your dashboard time range includes the last 30 days (at least).
I have a chart that shows the number of departures for a given 15 minute interval as seen here.
I need to compound these counts backwards for one hour. For example, the 3 departures shown at 11:00 need to also be represented at the 10:00, 10:15, 10:30, and 10:45 columns. When completed, the 10:00 would have a total of 6 departures (10:15 -> 6, 10:30 ->5, 10:45 -> 4, 11:00 -> 4).
I have done this via VBA in Excel, but now need to replicate the chart in Tableau and have been beating my head in for about two weeks. I'd love to hear any and all suggestions.
You can use a Cartesian join against a large enough date range of your choosing to in effect resample your data and add the additional time intervals you desire.
For example, if you have a month's worth of data (min date -> max date = 30 days), then you have (30 * 24 * 4) 2880 15 minute intervals.
Create all those intervals in a separate data sheet
Add a bogus column with value of link for all rows
Create the same bogus column in your actual data source
Join the two sheets together on the link column
Create a calculated field that is something along the following:
[Interval] <= [Flight Time] AND [Interval] >= DATEADD('hour',-1,[Flight Time])
This calculated field will evaluate to TRUE when the interval time is within one hour before the flight time. You can then drag this field onto your filter shelf and select TRUE value only. Effectively your [Interval] field becomes your new date field.
I would recommend adding that filter to the context and applying it across the entire data source. Before you add this filter you'll have 2880 times the amount of data, so be sure to do a live view first. Be careful with extracts using Cartesian joins, as you could potentially be extracting more than you bargained for.
See the following link for different techniques on how to do this and on re-sampling dates in general in Tableau.
https://community.tableau.com/thread/151387
Depending on the size of your data (and if a live view is not necessary), it is often easier and more efficient to do this type of pre-processing outside of Tableau, in SQL or something like Python's pandas library.
Here is another solution provided from the Tableau Community Forum. I have not tried tyvich's solution yet, but I know this one got me where I needed. Please follow the link to see the solution using moving table calculations.
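As a rough idea of what that pre-processing could look like, here is a plain-Python sketch of the same Cartesian-join-then-filter logic: pair every interval with every flight, keep the pairs where the interval lies within the hour before the flight time, and count per interval. The function name and sample data are made up for illustration; in pandas a cross merge followed by a boolean filter would play the same role.

```python
from collections import Counter
from datetime import datetime, timedelta

def counts_within_hour_before(flight_times, intervals):
    """Count, per interval, the flights departing at or up to one hour after it."""
    counts = Counter()
    for interval in intervals:        # every interval ...
        for flight in flight_times:   # ... paired with every flight (cross join)
            # Same test as the calculated field:
            # interval <= flight AND interval >= flight - 1 hour
            if flight - timedelta(hours=1) <= interval <= flight:
                counts[interval] += 1
    return counts

start = datetime(2017, 1, 1, 10, 0)
intervals = [start + timedelta(minutes=15 * k) for k in range(5)]  # 10:00 .. 11:00
flights = [datetime(2017, 1, 1, 11, 0)] * 3 + [datetime(2017, 1, 1, 10, 15)]
counts = counts_within_hour_before(flights, intervals)
```

Here the three 11:00 departures are counted at every interval from 10:00 through 11:00, and the 10:15 departure is counted at 10:00 and 10:15. The nested loop is quadratic, which is the same blow-up the Cartesian join causes, so it only suits modest data sizes.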
https://community.tableau.com/thread/251154
I'm in healthcare and we're trying to assess the number of discharges we have per hour of day, but we'd also like to be able to filter them down by day of week, or specific month, or even a particular day of week in a particular month (e.g. "What is the average number of discharges per hour on Mondays in January?").
I'm confident that Tableau can do this, but haven't been able to make the averages show up in my line graph... every time that I convert it from COUNT to AVG, the line simply goes straight. I got close when I did a table calculation to find the Average (dividing the count per hour by the number of days captured in the report), but when I add a filter for either the month or day of week, selecting one of the options of the filter reduces the total number that is being counted, rather than re-averaging the non-filtered items. (i.e. if the average of the 7 days of the week is "10" for a particular hour, and I deselect the first three days of the week, it's now saying that my average for that hour is roughly 6, despite the fact that all of the days are very close to 10 at that hour.)
Currently, my data table has the following columns:
Account# / MonthYear / HourOfDay / DayOfWeek
e.g. 12345678 / Jan-17 / 12 / Sunday
I would just create a few calculated fields to differentiate the parts of the calendar you might want to filter/aggregate on. Mixing the month and day of the week with filtering is pretty straightforward with the calculated fields. Then do standard summing to get what you are looking for, because an average count of records is always one unless you are throwing some other calculation into the mix. I threw a quick example up on Tableau Public for you to get the idea.
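To make the arithmetic concrete: the average per hour is the count for that hour divided by the number of days that survive the filter, not by the number of days in the whole report. The sketch below is Python, not Tableau, and it assumes each record carries a full calendar date, which the table described in the question does not have. It also ignores days with no discharges at all.

```python
from collections import Counter
from datetime import date

def avg_per_hour(records, keep):
    """records: list of (day, hour) pairs, one per discharge.
    keep: predicate on `day`, standing in for the dashboard filters."""
    kept = [(day, hour) for day, hour in records if keep(day)]
    n_days = len({day for day, _ in kept})      # days that survive the filter
    counts = Counter(hour for _, hour in kept)  # discharges per hour of day
    return {hour: count / n_days for hour, count in counts.items()}

# Two Mondays and one Tuesday in January 2017, all discharges at hour 12.
records = (
    [(date(2017, 1, 2), 12)] * 2
    + [(date(2017, 1, 9), 12)] * 4
    + [(date(2017, 1, 3), 12)] * 10
)
mondays_in_january = lambda d: d.weekday() == 0 and d.month == 1
result = avg_per_hour(records, mondays_in_january)
```

Because the denominator is recomputed from the filtered rows, deselecting days changes both the count and the number of days, so the average does not shrink the way it does with a fixed divisor.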
My input text source always contains the last 12 months' worth of data. For example, if the current month is October, my input source contains data from October 1st of last year to date. But I want the aggregate statistics to be displayed on a daily basis for the last 10 days of sales, 30 days of sales, and 45 days of sales per product across various regions.
I am trying to use the window_avg function with something like window_avg(sum(sales), first() + datediff('day', window_min(min([date]))-1, dateadd('month',1,window_min(min([Date]))-1)) * 13,13). But I am not able to crack the exact logic.
Could you please suggest a better way to achieve this, rather than using these kinds of calculations? Also, I am afraid this will go wrong if a day or two of data is missing in the middle.
Any help is appreciated.
A very simple thing is to use a relative date filter. There's a UI for you to select the last N days.
Put the date on the columns shelf and set it to the date truncation of year-month-day. Put your measure on the rows shelf. Put the date pill on the filter shelf too and use a relative date filter.
If you are doing a simple aggregate like the sum of sales for a day, it's easy and you'll not need to do anything else. You can also fairly easily create a table calculation by right-clicking on the measure and choosing one of the quick table calculations. Even when I'm doing a more sophisticated calculation, I start with a quick table calculation and then start editing.
If you are doing something like a moving average, the filter and the moving average can interact. For example, if I'm showing a 5-day trailing moving average over a 30-day period, the first few days do not get averaged in the same way -- the filter has removed the days from more than 30 days ago. If that's not really an issue for you, that's cool and you are done.
If it is an issue, it's going to be trickier. I'd suggest creating a second filter based on a table calc. The reason is the order of operations in Tableau. The raw data is filtered and then aggregated by the database, then the table calcs are performed. If there are any filters on table calculations, they are applied after that. So basically, in my example, you want to create a filter for 35 days on the date, then create a table calc on the date -- like using the INDEX() function. Filter the index function to show 30 days' worth, and then you've got a moving average that uses 35 days to compute the average, but only shows 30.
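The effect of that ordering can be sketched outside Tableau. In the Python below, trailing_mean is a hypothetical helper that averages whatever values are available in the window, which is an assumption about how the partial windows at the start behave. The two results compare filtering to 30 days before averaging with averaging over 35 days and then hiding the first 5.

```python
def trailing_mean(values, window):
    """Mean of up to `window` values ending at each position (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

daily = list(range(1, 36))  # 35 days of made-up daily values: 1, 2, ..., 35

# Filter first: only 30 days reach the calculation, so day 1 averages itself.
filtered_first = trailing_mean(daily[-30:], 5)

# Compute on all 35 days, then hide the first 5: every shown day has a full window.
padded_then_trimmed = trailing_mean(daily, 5)[-30:]
```

The first displayed value is 6.0 in the filter-first version (a one-day "average") and 4.0 in the padded version (the mean of 2, 3, 4, 5, 6). From the fifth displayed day onward the two series agree.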
Setting
As I'm sure many of you do in your vizs, I use date parameters for my data. This is great for creating trend analyses and all types of time series representations. Currently I'm using a line graph to show our sales hit rate history.
Picture
Question
The problem I'm running into is in creating a four week moving average. As you can see the four week moving average doesn't become just that until four weeks in! This creates quite the problem for me. What methods will enable the average at t=0 to show the average for the preceding four weeks?
Formula Used
This is my formula for creating the four week moving average:
WINDOW_AVG([Hit Ratio],-27,0)
Remove your date filter and try:
IIF(ATTR([DATE_FIELD])<T=0,NULL,WINDOW_AVG([Hit Ratio],-27,0))