Is there any way to compare two table calculations made in Tableau to create a calculated field? - tableau-api

I am relatively new to Tableau. I am working on a problem that requires me to compare a table calculation against specified thresholds. I categorize my data into five time windows: 0-30, 30-60, 60-90, 90-120 and 120 onwards. These windows are spread across the data. I calculate the number of events that fall within each time window with '{FIXED [time window] : COUNT([time window])}'. This gives a count for each category: 50 events lasted 0-30s, 30 events lasted 30-60s, 10 events lasted 60-90s, and 5 events each fell in the remaining two classes. The cumulative percentages I am required to meet are 75, 90, 95, 97.5 and 100.
I have created this using IF, ELSEIF and ELSE statements like:
IF [time window] = '0-30s' THEN 75
ELSEIF [time window] = '30-60s' THEN 90
ELSEIF [time window] = '60-90s' THEN 95
ELSEIF [time window] = '90-120s' THEN 97.5
ELSE 100
END
and named this as specified cumulative share.
I then build a table calculation on these counts, using a running total as the primary calculation and percent of total as the secondary one. This gives 50%, 80%, 90%, 95% and 100% for the respective classes. Now I need to compare each of these with the specified share and create another calculated field that says greater than, equal to or less than. How do I do it?
The current table looks like this:
**Time window** | **Obtained cumulative share** | **Specified cumulative share**
0 - 30 s | 50 % | 75
30 - 60 s | 80 % | 90
60 - 90 s | 90 % | 95
90 - 120 s | 95 % | 97.5
120 onwards | 100 % | 100
**Obtained cumulative share** is an alias for percent of total ( running total (counts for each class))

I created some sample data and did it like this:
Instead of calculating the cumulative sum through the table calculation method, use the RUNNING_SUM function like this:
RUNNING_SUM(SUM([Count of Class] ))
I named this field calculated cum sum.
Then create another calculated field for your T/F condition:
MIN([specified Cum Share])>=([calculated cum share])
I tweaked your specified shares just to check that the formula is correct; the resulting view shows that it works.
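The comparison itself can be checked outside Tableau. The following is a minimal Python sketch, not Tableau syntax, that uses the counts and specified shares from the question; the helper name `compare` and the variable names are hypothetical.

```python
from itertools import accumulate

# Event counts per time window, taken from the question.
windows = ["0-30s", "30-60s", "60-90s", "90-120s", "120s onwards"]
counts = [50, 30, 10, 5, 5]
# Specified cumulative shares in percent, also from the question.
specified = [75, 90, 95, 97.5, 100]

total = sum(counts)
# Running total of the counts, expressed as a percent of the grand total.
obtained = [100 * running / total for running in accumulate(counts)]


def compare(obtained_share, specified_share):
    """Label the obtained share relative to the specified share."""
    if obtained_share > specified_share:
        return "greater than"
    if obtained_share < specified_share:
        return "less than"
    return "equal to"


labels = [compare(o, s) for o, s in zip(obtained, specified)]
for window, o, s, label in zip(windows, obtained, specified, labels):
    print(f"{window}: obtained {o:g}% is {label} specified {s:g}%")
```

With the question's numbers, only the last window reaches its specified share.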

Related

How does the Graphite summarize function with avg work?

I'm trying to figure out how the Graphite summarize function works. I've the following data points, where X-axis represents time, and Y-axis duration in ms.
+-------+------+
| X | Y |
+-------+------+
| 10:20 | 0 |
| 10:30 | 1585 |
| 10:40 | 356 |
| 10:50 | 0 |
+-------+------+
When I pick any time window on Grafana more than or equal to 2 hours (why?), and apply summarize('1h', avg, false), I get a triangle starting at (9:00, 0) and ending at (11:00, 0), with the peak at (10:00, 324).
A formula that a colleague came up with to explain the above observation is as follows.
Let:
a = Number of data points for a peak, in this case 4.
b = Number of non-zero data points, in this case 2.
Then avg = sum / (a + b). It produces (1585+356) / 6 = 324 but doesn't match with the definition of any mean I know of. What is the math behind this?
Your data is at 10-minute intervals, so there are 6 points in each 1-hour period. Graphite simply takes the sum of the non-null values in each period and divides it by their count (a standard average). If you look at the raw series, you'll likely find that there are also zero values at 10:00 and 10:10, which gives (0 + 0 + 0 + 1585 + 356 + 0) / 6 = 323.5.
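The arithmetic can be reproduced with a short Python sketch. It only mimics the averaging described in the answer and is not Graphite code; `bucket_average` is a hypothetical helper, and the zeros at 10:00 and 10:10 are an assumption taken from the answer's guess.

```python
def bucket_average(values):
    """Mean of the non-null values in one bucket, or None if all are null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# One 1-hour bucket at 10-minute resolution: 10:00, 10:10, ..., 10:50.
# The zeros at 10:00 and 10:10 are assumed, as the answer suggests.
hour_bucket = [0, 0, 0, 1585, 356, 0]
avg = bucket_average(hour_bucket)
print(avg)

# If 10:00 and 10:10 were null instead of zero, they would be skipped
# and the divisor would drop from 6 to 4.
avg_with_nulls = bucket_average([None, None, 0, 1585, 356, 0])
print(avg_with_nulls)
```

The first result rounds to the 324 seen on the graph, which suggests the two extra points are stored as zeros and not as nulls.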

count number values exceeds given threshold in moving window in matlab

I have a plot of values against time, with 100 time points. I want to select times 1 to 4 and count how many values exceed 20; for example, for times 1 to 4 the values are 16 43 94 21, so 3 values exceed 20 and the count should be 3. Then I want to move the window to times 2 to 5 and count again, and so on, so the last window would be 97 to 100. I tried the following code, but it only shows 0 and 1.
N=4;% length of window
d=length(t);% t has 100 values so took length
for e=0:d-N;
for x=1+e:N+e;
y(x)=sum(t(x)>20); % t contains values so took t(x)
end
end
How can I do this?
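You can use a logical index showing where t is greater than 20, then use movsum to count how many values in each sliding window exceed 20:
N = 4; % length of window
idx = t > 20; % logical index, true where the value exceeds 20
% By default movsum shrinks the window at the ends and returns one value per element of t;
% add 'Endpoints','discard' to keep only the full windows (1 to 4, ..., 97 to 100).
result = movsum(idx,N)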
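For comparison, the same counting can be sketched in plain Python, keeping only full windows. The function name `count_exceeding` is hypothetical, and only the first four sample values come from the question; the rest are made up for illustration.

```python
def count_exceeding(values, window, limit):
    """Count the values greater than limit in each full sliding window."""
    if len(values) < window:
        return []
    flags = [1 if v > limit else 0 for v in values]
    current = sum(flags[:window])
    counts = [current]
    # Slide the window: add the entering flag, drop the leaving one.
    for i in range(window, len(flags)):
        current += flags[i] - flags[i - window]
        counts.append(current)
    return counts


# First four values are from the question; the last three are hypothetical.
t = [16, 43, 94, 21, 5, 30, 12]
result = count_exceeding(t, 4, 20)
print(result)
```

A series of 100 values with a window of 4 yields 97 counts this way, one per window from 1 to 4 up to 97 to 100.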

Matlab: Count till sum equals 360 > insert event1, next 360 >insert event 2 etc

I have been trying to solve this problem for a while now and I would appreciate a push in the right direction.
I have a matrix called Turn. This matrix contains 1 column of data with somewhere between 10000 and 15000 rows (the number varies). What I would like to do is as follows:
start at row 1 and add the values of row 2, row 3 etc. until sum==360. When sum==360, insert 'event 1' in column 2 at that specific row.
Start counting at the next row (after 'event 1') until sum==360. When sum==360, insert 'event 2' in column 2 at that specific row, etc.
So I basically want to group my data into partitions with sum==360;
these will be called events.
The row number at which sum==360 is important to me as well (every row is a time point, so it tells me the duration of an event). I want to put those row numbers in a new matrix, with the row number at which event 1 happened in row 1, the row number at which event 2 happened in row 2, etc.
You can find the row indices where events occur using the following code. Basically you're going to use the modulo operator to find where the sum of the first column of Turn is a multiple of 360.
mod360 = mod(cumsum(Turn(:,1)),360);
eventInds = find(mod360 == 0);
You could then loop over eventInds to place whatever values you'd like in the appropriate rows in the second column of Turn.
I don't think you'll be able to place the string 'event 1' in the column, though, because a character array acts like a vector and will result in a dimension mismatch. You could just store the numerical value 1 for the first event, 2 for the second event, and so on.
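The modulo idea can be illustrated with a small Python sketch. It assumes integer values whose running sum lands exactly on multiples of 360; the function name `event_rows` and the sample data are hypothetical.

```python
from itertools import accumulate


def event_rows(turn, target=360):
    """1-based rows where the cumulative sum is an exact multiple of target."""
    return [
        row
        for row, total in enumerate(accumulate(turn), start=1)
        if total % target == 0
    ]


# Hypothetical data whose running sum hits 360, 720 and 1080 exactly.
turn = [90, 180, 90, 120, 240, 100, 260]
rows = event_rows(turn)

# Second column: the event number at each event row, 0 elsewhere.
event_column = [0] * len(turn)
for number, row in enumerate(rows, start=1):
    event_column[row - 1] = number

print(rows)
print(event_column)
```

Storing the event number in the second column follows the suggestion above of using numeric values in place of strings.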
Ryan's answer looks like the way to go. But if your condition is such that you need to find row numbers where the cumulative sum is not exactly 360, then you would be required to do a little more work. For that case, you may use this -
Try this vectorized (and no loops) code to get the row IDs where the 360 grouping occurs -
threshold = 360;
cumsum_val = cumsum(Turn);
ind1 = find(cumsum_val>=threshold,1)
num_events = floor(cumsum_val(end)/threshold);
[x1,y1] = find(bsxfun(@gt,cumsum_val,threshold.*(1:num_events)));
[~,b,~] = unique(y1,'first');
row_nums = x1(b)
After that you can get the event data, like this -
event1 = Turn(1:row_nums(1));
event2 = Turn(row_nums(1)+1:row_nums(2));
event3 = Turn(row_nums(2)+1:row_nums(3));
...
event21 = Turn(row_nums(20)+1:row_nums(21));
...
eventN = Turn(row_nums(N-1)+1:row_nums(N));
Edit 1
Sample case:
We create a small data of 20 random integer numbers instead of 15000 as used for the original problem. Also, we are using a threshold of 30 instead of 360 to account for the small datasize.
Code
Turn = randi(10,[20 1]);
threshold = 30;
cumsum_val = cumsum(Turn);
ind1 = find(cumsum_val>=threshold,1)
num_events = floor(cumsum_val(end)/threshold);
[x1,y1] = find(bsxfun(@gt,cumsum_val,threshold.*(1:num_events)));
[~,b,~] = unique(y1,'first');
row_nums = x1(b);
Run
Turn =
7
6
3
4
5
3
9
2
3
2
3
5
4
10
5
2
10
10
5
2
threshold =
30
row_nums =
7
14
18
The run results show row_nums as 7, 14, 18, which means that the second grouping starts with the 7th index in Turn, the third grouping starts at the 14th index, and so on. Of course, you can append 1 at the beginning of row_nums to indicate that the first grouping starts at the 1st index.
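The sample run can be cross-checked with a Python sketch of the same threshold-crossing logic: for each multiple of the threshold, find the first row whose cumulative sum is strictly greater than it. The function name `crossing_rows` is hypothetical; the data is the Turn vector printed in the run.

```python
from itertools import accumulate


def crossing_rows(turn, threshold):
    """1-based rows where the cumulative sum first exceeds each multiple of threshold."""
    totals = list(accumulate(turn))
    if not totals:
        return []
    num_events = totals[-1] // threshold
    rows = []
    next_level = threshold
    for row, total in enumerate(totals, start=1):
        # One large value may cross several multiples at once.
        while len(rows) < num_events and total > next_level:
            rows.append(row)
            next_level += threshold
    return rows


# Turn vector from the sample run above.
turn = [7, 6, 3, 4, 5, 3, 9, 2, 3, 2, 3, 5, 4, 10, 5, 2, 10, 10, 5, 2]
row_nums = crossing_rows(turn, 30)
print(row_nums)
```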
Given a column vector x, say,
x = randi(100,10,1)
the following would give you the index of the last row at which the cumulative sum of all the items up to and including that row is still at most 360:
i = max( find( cumsum(x) <= 360) )
Then, you would have to use that index to find the next set of cumulative sums that add up to 360, something like
offset = max( find( cumsum(x(i+1:end)) <= 360 ) )
i_new = i + offset
You might need to add +1/-1 to the offset and the index.
>> x = randi(100,10,1)'
x =
90 47 47 44 8 79 45 9 91 6
>> cumsum(x)
ans =
90 137 184 228 236 315 360 369 460 466
>> i = max(find(cumsum(x)<=360))
i =
7
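Repeating that step until the data runs out can be sketched in Python as below. This assumes non-negative values and restarts the sum after each group; the function name `group_ends` is hypothetical, and the data is the x vector from the session above.

```python
def group_ends(x, limit=360):
    """1-based index of the last row of each group whose sum stays at or below limit."""
    ends = []
    start = 0
    while start < len(x):
        total = 0
        end = start
        # Extend the group while the next value still fits under the limit.
        while end < len(x) and total + x[end] <= limit:
            total += x[end]
            end += 1
        if end == start:
            break  # a single value exceeds the limit, so no group can form
        ends.append(end)
        start = end
    return ends


# x vector from the MATLAB session above.
x = [90, 47, 47, 44, 8, 79, 45, 9, 91, 6]
ends = group_ends(x)
print(ends)
```

The first group ends at row 7, as in the session; the remaining three values never reach 360, so they form a final, incomplete group ending at row 10.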

Brain teaser - filtering algorithm using moving averages

I have a 1 second dataset of 86400 wind speed (WS) values in Matlab and need assistance in filtering it. It requires a certain level of cleverness.
If the average WS exceeds:
25m/s in a 600s time interval
28m/s in a 30s time interval
30m/s in a 3 s time interval
If any of these parameters are met, the WS is deemed 'invalid' until the average WS remains below 22m/s in a 300 s time interval.
Here is what I have for the 600 second requirement. I do a 600 and 300 second moving average on the data contained in 'dataset'. I filter the intervals from the first appearance of an average 25m/s to the next appearance of a value below 22m/s as 'NaN'. After filtering, I will do another 600 second average, and the intervals with values flagged with a NaN will be left a NaN.
i.e.
Rolling600avg(:,1) = tsmovavg(dataset(:,2), 's', 600, 1);
Rolling300avg(:,1) = tsmovavg(dataset(:,2), 's', 300, 1);
a = find(Rolling600avg(:,2)>25)
b = find(Rolling300avg(:,2)<22)
dataset(a:b(a:find(b==1)),2)==NaN; %?? Not sure
This is going to require a clever use of 'find' and some indexing. Could someone help me out? The 28m/s and 30m/s filters will follow the same method.
If I follow your question, one approach is to use a for loop to identify where the NaNs should begin and end.
m = [19 19 19 19 28 28 19 19 28 28 17 17 17 19 29 18 18 29 18 29]; %Example data
a = find(m>25);
b = find(m<22);
m2 = m;
% Use a loop to isolate segments that should be NaNs
for ii = 1:length(a)
    firstNull = a(ii);
    % Find the first index in b greater than a(ii); the segment ends just before it
    lastNull = b( find(b>firstNull,1) )-1;
    % If there is no such index, the NaNs should fill to the end of the vector
    if isempty(lastNull)
        lastNull = length(m);
    end
    m2(firstNull:lastNull) = NaN;
end
Note that this only works if tsmovavg returns a vector of the same length as the one passed to it. If not, it's trickier and will require some modifications.
There's probably some way of avoiding the for loop, but this is a pretty straightforward solution.
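The same trigger-and-release masking can be sketched in Python on the example vector. Like the example above, it applies both thresholds to a single series instead of separate 600 s and 300 s averages; the function name `mask_invalid` and its parameter names are hypothetical.

```python
import math


def mask_invalid(values, trigger=25, release=22):
    """Replace values with NaN from each trigger until the next value below release."""
    out = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] > trigger:
            j = i
            # Stay invalid until a value drops below the release level.
            while j < n and not values[j] < release:
                j += 1
            for k in range(i, j):
                out[k] = math.nan
            i = j
        else:
            i += 1
    return out


# Example data from the answer above.
m = [19, 19, 19, 19, 28, 28, 19, 19, 28, 28,
     17, 17, 17, 19, 29, 18, 18, 29, 18, 29]
m2 = mask_invalid(m)
nan_positions = [i for i, v in enumerate(m2) if math.isnan(v)]
print(nan_positions)
```

The positions are 0-based, so they correspond to MATLAB indices 5, 6, 9, 10, 15, 18 and 20.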

How to count matches in several matrices?

I am doing a dichotomous study and have to count how many times a condition occurs.
The study is based on two kinds of matrices, ones with forecasts and others with analyzed data.
In both the forecast and analysis matrices, when a condition is satisfied we add 1 to a counter. This process is repeated for points distributed on a grid.
Are there any functions in MATLAB that help me with counting or any script that supports this procedure?
Thanks guys!
EDIT:
The case concerns registered and forecasted precipitation. When both exceed a threshold, I consider it a hit. I have Europe divided into several grid points, and I have to count how many times the forecast is correct. I also have 50 forecasts for each year, so the result (hit/no hit) must be accumulated.
I've tried the count and sum functions, but they reduce the spatial dimension of the matrices.
It's difficult to tell exactly what you are trying to do but the following may help.
forecasted = [ 40 10 50 0 15];
registered = [ 0 15 30 0 10];
mismatch = abs( forecasted - registered );
maxDelta = 10;
forecastCorrect = mismatch <= maxDelta
totalCorrectForecasts = sum(forecastCorrect)
Results:
forecastCorrect =
0 1 0 1 1
totalCorrectForecasts =
3
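The hit criterion described in the question's edit (forecast and registered value both above a threshold) differs from the tolerance check above. It can be accumulated per grid point without collapsing the spatial dimensions, as in this Python sketch; the function name `count_hits`, the 2x2 grids and the threshold of 10 are all hypothetical.

```python
def count_hits(forecasts, observations, threshold):
    """Per-grid-point count of cases where forecast and observation both exceed threshold."""
    rows = len(forecasts[0])
    cols = len(forecasts[0][0])
    hits = [[0] * cols for _ in range(rows)]
    for f_grid, o_grid in zip(forecasts, observations):
        for r in range(rows):
            for c in range(cols):
                if f_grid[r][c] > threshold and o_grid[r][c] > threshold:
                    hits[r][c] += 1
    return hits


# Two hypothetical forecast/observation pairs on a 2x2 grid.
forecasts = [[[12, 3], [8, 20]], [[15, 11], [2, 30]]]
observations = [[[11, 1], [12, 25]], [[9, 14], [1, 40]]]
hits = count_hits(forecasts, observations, 10)
total_hits = sum(sum(row) for row in hits)
print(hits)
print(total_hits)
```

The result keeps one counter per grid point, and summing over the grid gives the overall number of hits.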