calculation to determine average per event by year - tableau-api

I have a very large table of cricket data. At the moment I am trying to get the average number of runs per match for Australia (and other countries) in the years 2013, 2014, and 2015. I was able to get the total runs per year into a graph, and currently I have a bar chart that looks like this:
year       | 2013 | 2014 | 2015
total runs | 1037 | 1835 |  177
but I would like one that divides that total by the number of matches per year (6, 13, and 1 respectively) and looks like this:
year               | 2013 | 2014 | 2015
avg runs per match |  173 |  141 |  177
I don't know how to build a calculation that divides that total by the number of games played. There is a column in my data set called 'MID' for Match ID; counting the distinct MID values for 2013 would give me the needed number, 6.
Ideally, I would divide the total number of runs by the number of unique items in the MID column, but I do not know how to do this. If this makes any sense at all, does anyone have a simple way of doing it? I would really appreciate it, as I'm essentially experimenting with this on my own and falling way behind on my other projects.

Assuming you have a column named "Runs" and a column named "MID", then a calculation for Runs per Match would be as follows:
SUM([Runs]) / COUNTD([MID])
This gives total runs divided by distinct count of Match ID.
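As a quick sanity check against the numbers above: for 2013 this returns 1037 / 6 ≈ 173, for 2014 it returns 1835 / 13 ≈ 141, and for 2015 it returns 177 / 1 = 177, matching the desired "avg runs per match" row.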

Related

Create subset and overall value line chart

This may sound very simple, but I am finding it hard to get. I have data with a value in one column and a category in another column, i.e.,
Value  Category  Month
100    A         Jan
300    A         Feb
200    A         Mar
459    B         Jan
334    B         Feb
765    B         Mar
I am trying to use a line chart in Tableau with Month on the X-axis and Value on the Y-axis. Basically, I am trying to add two lines: one for the overall value for a given month and another for Category A alone in that month. For example, the overall value for Jan is 559, and the Category A line for Jan would be 100.
Though it sounds basic, I am finding it hard to achieve. Should I create a calculated field for this, or is there a simpler method that works?
You'd probably want a dual axis to achieve this.
http://onlinehelp.tableau.com/current/pro/desktop/en-us/multiplemeasures_dualaxes.html
See: Dual axis chart from the same measure in Tableau
Then you can color one of the measures based upon the category. Make sure to synchronize the axes.
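One way to get the second line is a calculated field that isolates Category A. A minimal sketch, assuming the field names from the sample data (the field name "Category A Value" is made up for illustration):
// Calculated field "Category A Value" (hypothetical name)
IF [Category] = "A" THEN [Value] END
Place SUM([Value]) on one axis and SUM([Category A Value]) on the other; SUM ignores the NULLs this field returns for other categories, so the second line shows Category A alone.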

Monthly average from daily data

I have a dataset of energy efficient smart AC units. Each one has an ID, and each unit has daily data that represents the cost saved (in dollars) for each day.
I want to create a bar graph that shows the average savings per month, per unit. I'm really struggling, however. AVG([Elecsavingscost]) only gets me the average daily savings in a given month. AVG([Elecsavingscost]) * 30 gets me pretty close to what I want, but of course, not all months have 30 days.
Is there a more intelligent way to do this? I'm presuming it's possible...
This can be done easily in R. The following code converts daily data to monthly data:
install.packages(c("zoo", "hydroTSM"))
library(zoo)
library(hydroTSM)
# data.csv should contain 2 or more columns: the 1st column should be the date
# in English (U.K.) format, and the 2nd and subsequent columns your daily data
data <- read.csv("data.csv")
data
date Elecsavingscost
01-01-1984 18.8
02-01-1984 20.2
03-01-1984 19
04-01-1984 19.6
05-01-1984 21.8
06-01-1984 21.5
.
.
.
25-12-2014 13.6
26-12-2014 13.6
27-12-2014 16.2
28-12-2014 18.2
29-12-2014 16.7
30-12-2014 19.4
31-12-2014 18.5
x1 <- zooreg(data$Elecsavingscost, start = as.Date("1984-01-01"))
## Daily to monthly conversion
Elecsavingscost_monthly <- daily2monthly(x1, FUN=sum, na.rm=TRUE)
write.csv(Elecsavingscost_monthly,"Elecsavingscost_monthly.csv")
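Note that the question asks for a monthly average rather than a total; daily2monthly passes FUN through to the aggregation, so swapping in mean should give monthly means instead (this line is an addition, not part of the original answer):
## Daily to monthly mean instead of monthly total
Elecsavingscost_monthly_avg <- daily2monthly(x1, FUN=mean, na.rm=TRUE)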

Calculating percentage of survivors per cohort over time in Tableau

In my dataset, I have three columns of data:
CustomerID  BoxCount  MonthCreated
1001        1         Aug 2015
1001        2         Aug 2015
1001        3         Aug 2015
1002        1         Sep 2015
1002        2         Sep 2015
In the screenshot below, I have built a table that displays the count of unique CustomerIDs at each BoxCount level, by cohort (MonthCreated, which is when the customer signed up).
BoxCount level 1 is the full number of people who signed up in MonthCreated X, because everyone who signs up receives at least 1 box. Then people start cancelling. The number of people who reached BoxCount level 2 for May 2015 (according to the screenshot) is 156,823, or 86.87% of the total number of people who signed up in May (180,525).
I need to create a second column next to the count of customers that displays the % of customers remaining at each BoxCount level, per cohort (people who signed up in the same month).
I have tried using the quick table calculation Percent of Total with the computation method set to "Table (Down)", but it only seems to work for the first MonthCreated month. I would like each subsequent month to show 100% at BoxCount level 1, with the following percentages expressed as a portion of that month's level-1 count. I can't figure out why, for July, the % starts at 83.89% instead of 100%.
Can anyone help me figure out how to calculate this percentage and also to add it as a new column instead of replacing the column of raw counts?
Thanks!
Looks like you're almost there. Did you try editing the table calculation's definition by changing the values it summarizes or the level at which it computes?
And to keep it from replacing the column of raw counts, just add the raw-count field to your view again and you'll have both columns.
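If adjusting the compute-using settings doesn't get you there, an explicit table calculation is another option. A minimal sketch, assuming the count in your table is COUNTD([CustomerID]) and that the calculation is set to compute along BoxCount (so it restarts for each MonthCreated cohort):
// % of cohort remaining relative to BoxCount level 1
COUNTD([CustomerID]) / LOOKUP(COUNTD([CustomerID]), FIRST())
LOOKUP(..., FIRST()) fetches the value at the first row of each partition, i.e. BoxCount level 1 within each cohort, so level 1 always shows 100%.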

How to sample from KDB table to reduce data before querying?

I have a table of tick data representing prices of various financial instruments up to millisecond precision. The problem is, there are over 5 billion entries, and even the most basic queries take several minutes.
I only need data with a precision of up to 1 second - is there an efficient way to sample the table so that the precision is reduced to roughly 1 second prior to querying? This should dramatically cut the amount of data and hence execution time.
So far, as a quick hack I've added the condition where i mod 2 = 0 to my query, but is there a better way?
The best way to bucket time data is with xbar:
q)select last price, sum size by 10 xbar time.minute from trade where sym=`IBM
minute| price size
------| -----------
09:30 | 55.32 90094
09:40 | 54.99 48726
09:50 | 54.93 36511
10:00 | 55.23 35768
...
More info: http://code.kx.com/q/ref/arith-integer/#xbar
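For the 1-second precision asked about here, the same pattern applies. A sketch assuming the same trade schema as in the example above; casting the time column to seconds truncates it into 1-second buckets, so no xbar multiplier is needed:
q)select last price, sum size by sym, time.second from trade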

Sum last 52 elements without loop

Say I have a data set that has cross-sections of country (US, CANADA), then state/province, then year, then week. My data stack has 2 countries, 57 states/provinces, 3 years, and 52 weeks. I want to create a variable, revenue, that for each week sums the last 52 weeks within the cross-section.
Right now I have a loop, but it's very, very slow:
for each country,
  for each state,
    for the last 2 years,
      for each week,
        sum the last 52 elements
Does anyone know how I can do this with vectorization?
You might want to look into using the function s = sum(X, DIM). Without more info about your dataset (please provide an example), we cannot go into great detail.
For weighted sums over a rolling window, filter() can work well.
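A minimal sketch of the filter() approach, assuming weekly revenue is arranged as a weeks-by-cross-sections matrix (the variable names here are assumptions):
% weeklyRev: [nWeeks x nSections] matrix, one column per country/state pair
window = ones(52, 1);                      % unweighted 52-week window
rolling52 = filter(window, 1, weeklyRev);  % trailing sum down each column
% The first 51 rows hold partial sums (fewer than 52 weeks of history);
% set them to NaN if you only want complete windows:
rolling52(1:51, :) = NaN;
filter() runs down each column independently, so as long as each column is a single cross-section in week order, this replaces the nested loop with one vectorized call.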