Why are my values multiplying when I apply Month/Year to my values? - tableau-api

When I apply Month/Year to Cases or Deaths from my data, the values explode. For Cases it goes from approximately 48 million to over 1 billion, and for Deaths it goes from about 700 thousand to over 22 million. However, when I try the same thing with Initial Claims or the Stringency Index, my values remain correct. I'm trying to find the month over month percentage change by the way. And I'm using the Date column. I only select 2020 and 2021 in the filter for Year.
What I'm asking about is Sheet 21.
Link to workbook: https://public.tableau.com/app/profile/nilajah.rivers/viz/CoronaVirusProject_16323687296770/Sheet21

Your problem is that the data points are daily cumulative deaths. If you change the date aggregation to anything other than days, Tableau will default to summing the numbers for all the days in the month. This will give the wrong result, obviously.
If you want to show the correct total deaths or cases regardless of the time aggregation (months, days, weeks etc.) then you could use the New Case or New Death numbers plus a running sum table calculation. This will always give the correct total for the time period.
Table calculations will also allow automatic calculation of the period to period % change from the same data fields.
This is a common problem when working with datasets that offer pre-calculated aggregations. Tableau doesn't need that as it can dynamically calculate the aggregation of a field over any given time period but it is easy to forget which field has pre-aggregated data and which has raw data. Pre-aggregated fields assume a particular time period and can't be used for different time periods without disentangling that assumption (which is unnecessary if you also have the raw data (in this case daily new deaths/cases).

Related

Tableau Summing up aggregated data with FIXED

Data granularity is per customer, per invoice date, per product type.
Generally the idea is simple:
We have a moving average calculation of the volume per week. MA based on last 12 weeks (MA Volume):
window_sum(sum([Volume]),-11,0)/window_count(count([Volume]), -11,0)
We need to see the deviation of the current week vs the MA for that week (Vol DIFF):
SUM([Volume])-[MA Calc]
We need to sum up the deviations for a fixed period of time (Year/Month)
Basically this should show us whether on average, for a given period of time, we deviate positively or negatively vs the base.
enter image description here
Unfortunately I get errors like:
"Argument to SUM (an aggregate function) is already an aggregation, and cannot be further aggregated."
Or
"Level of detail expressions cannot contain table calculations or the ATTR function"
Any ideas how I can go around this one?
Managed to solve this one. Needed to add months to the view and then just WINDOW_SUM(Vol_DIFF).
Simple as that!

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers and if the value is 0 (zero) then the server is stable, else it is unstable. And the graph we have is at a minute level. So, I want to show an aggregated value to know how many hours the server is unstable in the selected time range.
Lets say, if I select "Last 7 days" as the time duration...we have get X hours of instability of server.
And one more thing, I have a line graph (time series graph) that shows the state of server...but, the thing is when I select "Last 24 hours or 48 hours" I am getting the graph at a minute level...when I increase the duration to a quarter I am getting the graph for every 5 min or something like that....I understand it's aggregating the values....but does any body know how the grafana is doing the aggregation ??
I have tried "scaleToSeconds" function and "ConsolidateBy" functions and many more to first get the count of non zero value minutes, but no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this, there are 2 places that aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to be used in the same query if you eg have a panel that queries 3 days worth of data and you store 2 days at 1-minute and 7 days at 5-minute intervals in whisper then you'd have 72 * 60 / 5 = 864 points from the 5-minute archive in whisper, but if your graph is only 500px wide then at runtime that would be consolidated down to 10-minute intervals and return 432 points.
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 60), 'sum')
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
When you say the value is zero (0), the server is healthy - what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
Grafana does aggregate - or consolidate - data typically by using the average aggregation function. You can override this using the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining this with a time template variable so that as the period grows/shrinks, the summarize period will increase/decrease accordingly.

Tableau: graphically show compounded leadtimes

I have a chart that shows the number of departures for a given 15 minute interval as seen here.
I need to compound these counts backwards for one hour. For example, the 3 departures shown at 11:00 need to also be represented at the 10:00, 10:15, 10:30, and 10:45 columns. When completed, the 10:00 would have a total of 6 departures (10:15 -> 6, 10:30 ->5, 10:45 -> 4, 11:00 -> 4).
I have done this via VBA in excell, but am now needing to replicate the chart in Tableau and have been beating my head in for about two weeks now. I'd love to hear any and all suggestions.
You can use a Cartesian join against a large enough date range of your choosing to in effect resample your data and add the additional time intervals you desire.
For example, if you have a month's worth of data (min date -> max date = 30 days), then you have (30 * 24 * 4) 2880 15 minute intervals.
Create all those intervals in a separate data sheet
Add a bogus column with value of link for all rows
Create the same bogus in your actual data source
Join the two sheets together on the link column
Create a calculated field that is something along the following:
[Interval] <= [Flight Time] AND [Interval] >= DATEADD('hour',-1,[Flight Time])
This calculated field will evaluate to TRUE when the interval time is within one hour before the flight time. You can then drag this field onto your filter shelf and select TRUE value only. Effectively your [Interval] field becomes your new date field.
I would recommend adding that filter to the context and applying across the entire datasource. Before you add this filter you'll have 2880 times the about of data so be sure to do a live view first. Be careful with extracts using Cartesian joins as you could potentially be extracting more than you bargained for.
See the following links for different techniques on how to do this and re-sampling dates in general in tableau.
https://community.tableau.com/thread/151387
Depending on the size of your data (and if a live view is not necessary) it is often times easier and more efficient to do this type of pre-processing outside of tableau in SQL or something like python's pandas library.
Here is another solution provided from the Tableau Cumunity Forum. I have not tried tyvich's solution yet, but I know this one got me where I needed. Please follow the link to see the solution using moving table calculations.
https://community.tableau.com/thread/251154

Aggregating data from the US stock market in Tableau, using different time frames

I am a very basic user of tableau and I have not found an answer to my question.
I have a txt file that has historical daily data for 98% of all the stocks in the US, with their daily capitalization. Each stocks has its TICKER, Daily Market Value for every trading day of the year, and its SECTOR.
I did a simple time series that display SUM([Mktval]) (sum of all individual market values) across all stocks, on a daily daily, and where I can see that the total value as of 2016 is about 24 Trillion USD, as in the image below.
When I change the view column from DAY to YEAR, I don't see the right values, but something a lot larger. So I realized that I need to do SUM([Mktval])/252 to get the right value for a year (there are 252 trading days in a year).
If I change the view to MONTH, as in the chart below, the numbers are again wrong because 252 is not the right value to use in the division.
Is there any way that Tableau can adjust the values automatically to reflect the AVG MktVal across different time intervals?
Thanks
Replace SUM(Mktval) on the Rows shelf with the following calculated field
avg({ fixed day(Date1) : sum(Mktval) })
That solution is all in one step. It is perhaps a bit more clear to use 2 steps. First, create a calculated field called total_daily_market_value defined as
{ fixed day(Date1) : sum(Mktval) }
Then make sure that calculated field is a measure. It is an LOD calculation that you can think of as a separate table with one value for each day showing the total market value for that day.
Drag that measure to a shelf, and then change the aggregation function to AVG(), MEDIAN(), MIN(), MAX() or STDEV() as desired. Tableau will aggregate the total_daily_market_value using your chosen aggregation function for whatever values of Date1 are in your view.

Tableau : How to get the Dashboard always display the latest 10 days worth of group statistics

My input text source always contains last 12 months worth of data. e.g: Current month is October. So My input source contains data starting from last Oct 1st to till date. But I want the aggregate statistics to be displayed on a daily basis for last 10 days of sales , 30 days of sales, 45 days of sale per product across various regions
I am trying to use window_avg fuction with something like window_avg(sum(sales), first() + datediff('day', window_min(min([date]))-1, dateadd('month',1,window_min(min([Date]))-1)) * 13,13) something like that. But I am not able to crack the exact logic.
Could you please suggest me some better way to achieve this, rather than using these kind of calculations. Also I am afraid if this goes wrong if there is data missing in the middle one or two days.
Any help is appreciated.
A very simple thing is to use a relative date filter. There's a UI for you to select they last N days.
Put the date on the columns shelf and set it to the date truncation of year-month-days. Put your measure row shelf. Put the date pill on the filter shelf too and use a relative date filter.
If you are doing simple aggregate like the sum of sales for a day it's easy and you'll not need to do anything else. You can can also fairly easily create a table calculation by right clicking on the measure and choosing one of the quick table calculations. Even when I'm doing a more sophisticated calculation, I start with a quick table calculation and then start editing.
If you are doing something like a moving average, the filter and the moving average can interact. For example, if I'm showing a 5 day trailing moving average over 30 day period, the first few days do not get averaged in the same way -- you don't have days over 30 days ago. If that's not really an issue for you, that's cool and you are done.
If it is an issue, it's going to be trickier. I'd suggest creating a second filter based on a table calc. The reason is the order of operations in Tableau. The raw data is filtered then aggregated by the database, then the table calcs are performed. If there are any filters on table calculations, then they are filtered after that. So basically, in my example, you want create a filter for 35 days on the date, then create a table calc on the date -- like using the INDEX() function. Filter the index function to show 30 days worth, then you've got a moving average that uses 35 days to compute the average, but only shows 30.