Brain teaser - filtering algorithm using moving averages - matlab

I have a dataset of 86400 wind speed (WS) values sampled at 1 second intervals in Matlab and need assistance in filtering it. It requires a certain level of cleverness.
The WS is deemed 'invalid' when its average exceeds any of the following:
25m/s in a 600s time interval
28m/s in a 30s time interval
30m/s in a 3s time interval
Once any of these conditions is met, the WS stays 'invalid' until the average WS remains below 22m/s in a 300s time interval.
Here is what I have for the 600 second requirement. I compute 600 and 300 second moving averages of the data contained in 'dataset'. I want to set the intervals from the first appearance of an average above 25m/s to the next appearance of an average below 22m/s to NaN. After filtering, I will compute another 600 second average, and the intervals flagged with NaN will be left as NaN.
i.e.
Rolling600avg(:,1) = tsmovavg(dataset(:,2), 's', 600, 1);
Rolling300avg(:,1) = tsmovavg(dataset(:,2), 's', 300, 1);
a = find(Rolling600avg(:,1)>25)
b = find(Rolling300avg(:,1)<22)
dataset(a:b(a:find(b==1)),2)==NaN; %?? Not sure
This is going to require a clever use of 'find' and some indexing. Could someone help me out? The 28m/s and 30m/s filters will follow the same method.

If I follow your question, one approach is to use a for loop to identify where the NaNs should begin and end.
m = [19 19 19 19 28 28 19 19 28 28 17 17 17 19 29 18 18 29 18 29]; %Example data
a = find(m>25);
b = find(m<22);
m2 = m;
% Use a loop to isolate the segments that should be NaNs
for ii = 1:length(a)
    firstNull = a(ii);
    % Try to find the first value in b greater than a(ii)
    lastNull = b( find(b>firstNull,1) )-1;
    % If there is no such value, the NaNs should fill to the end of the vector
    if isempty(lastNull)
        lastNull = length(m);
    end
    m2(firstNull:lastNull) = NaN;
end
Note that this only works if tsmovavg returns a vector of the same length as the one passed to it. If not, it's trickier and will require some modifications.
There's probably some way of avoiding a for loop, but this is a pretty straightforward solution.
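As a hedged sketch of how the loop could be avoided (this is not the method from the answer above), each sample can be marked as a trigger or a release event, and the most recent event can be carried forward with cummax. This assumes MATLAB R2014b or later, where cummax is available. The names trigger, release, ev and lastEv are made up for illustration; with the real data, trigger and release would be computed from the rolling averages rather than from m directly.

```matlab
m = [19 19 19 19 28 28 19 19 28 28 17 17 17 19 29 18 18 29 18 29]; % example data from above
trigger = m > 25;                              % samples that start an invalid segment
release = m < 22;                              % samples that end an invalid segment
ev = double(trigger) - double(release);        % +1 trigger, -1 release, 0 neither

% Index of the most recent event at or before each sample (0 if none yet)
lastEv = cummax((ev ~= 0) .* (1:numel(m)));

invalid = false(size(m));
hasEv = lastEv > 0;
invalid(hasEv) = ev(lastEv(hasEv)) == 1;       % invalid while the last event was a trigger

m2 = m;
m2(invalid) = NaN;
```

For the example data this flags samples 5, 6, 9, 10, 15, 18 and 20, the same ones the loop sets to NaN.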

Related

Extract the same part of slices of a 3D matrix by using linear index

Indeed, my problem is a follow-up to my previous problem:
1) Extract submatrices, 2) vectorize and then 3) put back
Thanks to Dan, whose ideas work perfectly for that purpose.
My new problem is this:
If I have a 3D matrix, 8 by 8 by 12, e.g. A = randn(8,8,12).
Let's see the linear index of the first slice:
From Dan's solution, I understand that A(4:6, 4:6, :) can extract the corresponding parts of all slices.
However, going back to my real situation, extracting parts by actually counting rows and columns does not seem to suit my purpose, because my matrix is huge and I have many sub-matrices to extract.
So, I prefer to work with linear indices and want to ask if there is any way to do so.
Here is my trial:
By defining sub_group = [28 29 30 36 37 38 44 45 46], A(sub_group) extracts the sub-matrix from the first slice of the 3D matrix A.
I understand that A(sub_group + 8*8*(n-1)) extracts the sub-matrix from the nth slice.
I aim to work only with my sub_group and then extract the same part of every slice.
Most importantly, I have to put the sub-matrices back after updating their values.
So, is there any quick syntax in Matlab that serves my purpose?
I appreciate your help.
Approach #1
For cases like this when you need to calculate linear indices, you can use bsxfun as shown here -
%// Store number of rows in A as a variable
M = size(A,1)
%// Get start and offset linear indices for the first slice and thus sub_group
start_idx = (colstart-1)*M + rowstart
offset_idx = bsxfun(@plus,[0:rowstop - rowstart]', [0:colstop-colstart]*M) %//'
sub_group = reshape(start_idx + offset_idx,1,[])
%// Calculate sub_groups for all 3D slices
all_sub_groups = bsxfun(@plus,sub_group',[0:size(A,3)-1]*numel(A(:,:,1)))
Sample run -
A(:,:,1) =
0.096594 0.52368 0.76285 0.83984 0.27019
0.84588 0.65035 0.57569 0.42683 0.4008
0.9094 0.38515 0.63192 0.63162 0.55425
0.011341 0.6493 0.2782 0.83347 0.44387
A(:,:,2) =
0.090384 0.037262 0.38325 0.89456 0.89451
0.74438 0.9758 0.88445 0.39852 0.21417
0.032615 0.52234 0.25502 0.62502 0.0038592
0.42974 0.90963 0.90905 0.5676 0.88058
rowstart =
2
rowstop =
4
colstart =
3
colstop =
5
sub_group =
10 11 12 14 15 16 18 19 20
all_sub_groups =
10 30
11 31
12 32
14 34
15 35
16 36
18 38
19 39
20 40
Approach #2
For a quick syntax based solution, sub2ind could be suggested here. The implementation would look something like this -
[X,Y] = ndgrid(rowstart:rowstop,colstart:colstop);
sub_group = sub2ind(size(A(:,:,1)),X,Y);
[X,Y,Z] = ndgrid(rowstart:rowstop,colstart:colstop,1:size(A,3));
all_sub_groups = sub2ind(size(A),X,Y,Z);
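Since the question also asks how to put the sub-matrices back after updating them, here is a small sketch of that round trip with linear indices. The test matrix, the copy A0, the helper name sliceOffsets and the update itself (negating the values) are made up for illustration.

```matlab
A  = reshape(1:8*8*12, 8, 8, 12);              % illustrative 8x8x12 matrix
A0 = A;                                        % copy kept only for comparison
sub_group = [28 29 30 36 37 38 44 45 46];      % rows 4:6, columns 4:6 of slice 1

% Shift the first-slice indices by one slice size per slice
sliceOffsets   = (0:size(A,3)-1) * size(A,1) * size(A,2);
all_sub_groups = bsxfun(@plus, sub_group', sliceOffsets);   % 9 x 12 linear indices

vals = A(all_sub_groups);                      % extract: one column per slice
vals = -vals;                                  % hypothetical update
A(all_sub_groups) = vals;                      % put the updated values back in place
```

Indexing with a matrix of linear indices returns a result of the same shape, so column n of vals holds the sub-matrix of slice n, and the same index matrix can be reused for the assignment.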

Calculate a "running" maximum of a vector

I have the following matrix which keeps track of the starting and ending points of data ranges (the first column represents "starts" and the second column represents the "ends"):
myMatrix = [
162 199; %// this represents the range 162:199
166 199; %// this represents the range 166:199
180 187; %// and so on...
314 326;
323 326;
397 399;
419 420;
433 436;
576 757;
579 630;
634 757;
663 757;
668 757;
676 714;
722 757;
746 757;
799 806;
951 953;
1271 1272
];
I need to eliminate all the ranges (i.e. rows) which are contained within a larger range present in the matrix. For example, the ranges [166:199] and [180:187] are contained within the range [162:199], and thus rows 2 and 3 would need to be removed.
The solution I thought of was to calculate a sort of "running" max on the second column to which subsequent values of the column are compared to determine whether or not they need to be removed. I implemented this with the use of a for loop as follows:
currentMax = myMatrix(1,2);                 %//set first value as the maximum
[sizeOfMatrix,~] = size(myMatrix);          %//determine the number of rows
rowsToRemove = false(sizeOfMatrix,1);       %//pre-allocate final vector of logicals
for m=2:sizeOfMatrix
    if myMatrix(m,2) > currentMax           %//if new max is reached, update currentMax...
        currentMax = myMatrix(m,2);
    else
        rowsToRemove(m) = true;             %//... else mark that row for removal
    end
end
myMatrix(rowsToRemove,:) = [];
This correctly removes the "redundant" ranges in myMatrix and produces the following matrix:
myMatrix =
162 199
314 326
397 399
419 420
433 436
576 757
799 806
951 953
1271 1272
Onto the questions:
1) It would seem that there has to be a better way of calculating a "running" max than a for loop. I looked into accumarray and filter, but could not figure out a way to do it with those functions. Is there a potential alternative that skips the for loop (some kind of vectorized code that is more efficient)?
2) Is there a completely different (that is, more efficient) way to accomplish the final goal of removing all the ranges that are contained within larger ranges in myMatrix? I don't know if I'm over-thinking this whole thing...
Approach #1
A bsxfun based brute-force approach -
myMatrix(sum(bsxfun(@ge,myMatrix(:,1),myMatrix(:,1)') & ...
    bsxfun(@le,myMatrix(:,2),myMatrix(:,2)'),2)<=1,:)
A few explanations of the proposed solution:
Compare all start indices against each other for "contained-ness", and similarly for the end indices. The "contained-ness" criterion has to be one of these two:
Greater than or equal to for starts and less than or equal to for ends
Less than or equal to for starts and greater than or equal to for ends.
I happened to go with the first option.
Every row satisfies the criterion against itself, so a row that satisfies it more than once is contained in another range; remove those rows to get the desired result.
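To make the counting step concrete, here is a small sketch on three made-up ranges, where the second lies inside the first. The names R, insideOf, nContainers and kept are illustrative only.

```matlab
R = [1 10; 2 5; 12 15];                        % hypothetical ranges; row 2 lies inside row 1

% insideOf(i,j) is true when range i lies within range j
insideOf = bsxfun(@ge, R(:,1), R(:,1)') & ...
           bsxfun(@le, R(:,2), R(:,2)');

nContainers = sum(insideOf, 2);                % every row counts itself once
kept = R(nContainers <= 1, :);                 % keep rows contained only in themselves
```

Here nContainers is [1; 2; 1], so only the second row is dropped.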
Approach #2
If you are okay with an output whose rows are sorted by the first column, and if there are only a few local maxima, you can try this alternative approach -
myMatrix_sorted = sortrows(myMatrix,1);
col2 = myMatrix_sorted(:,2);
max_idx = 1:numel(col2);
while 1
    col2_selected = col2(max_idx);
    N = numel(col2_selected);
    labels = cumsum([true ; diff(col2_selected)>0]);
    idx1 = accumarray(labels, 1:N ,[], @(x) findmax(x,col2_selected));
    if numel(idx1)==N
        break;
    end
    max_idx = max_idx(idx1);
end
out = myMatrix_sorted(max_idx,:); %// desired output
Associated function code -
function ix = findmax(indx, s)
    [~,ix] = max(s(indx));
    ix = indx(ix);
return;
I ended up using the following for the "running maximum" problem (but have no comment on its efficiency relative to other solutions):
function x = cummax(x)
% Cumulative maximum along dimension 1
% Adapted from http://www.mathworks.com/matlabcentral/newsreader/view_thread/126657
% Is recursive, but magically so, such that the number of recursions is proportional to log(n).
n = size(x, 1);
%fprintf('%d\n', n)
if n == 2
    x(2, :) = max(x);
elseif n % had to add this condition relative to the web version, otherwise it would recurse infinitely with n=0
    x(2:2:n, :) = cummax(max(x(1:2:n-1, :), x(2:2:n, :)));
    x(3:2:n, :) = max(x(3:2:n, :), x(2:2:n-1, :));
end
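For comparison, MATLAB R2014b and later ship a built-in cummax, so the running-maximum test from the question can be sketched without a loop or a hand-written function. This assumes such a release and that no user-defined cummax shadows the built-in; the names ends, runMax and prevMax are illustrative, and only the first five rows of the example matrix are used.

```matlab
myMatrix = [162 199; 166 199; 180 187; 314 326; 323 326];  % first rows of the example
ends = myMatrix(:,2);

runMax  = cummax(ends);                        % running maximum of the end points
prevMax = [-Inf; runMax(1:end-1)];             % maximum over all earlier rows
rowsToRemove = ends <= prevMax;                % same test as in the for loop

myMatrix(rowsToRemove,:) = [];
```

Like the loop in the question, this relies on the rows already being sorted by their start points.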

Matlab: spatial average in a 4d matrix (time, case, x, y)

Here is my dataset:
pressure(time, case, x, y)
>> size(pressure)
ans =
100 1 289 570
How do I get the spatial nanmean of pressure for x from 30 to 60 and y from 40 to 70 at each time step?
For example: a nanmean value for that particular region for each timestep from time 1 to time 100.
I tried "spatial_mean_pressure = nanmean(pressure(:,:,30:60,40:70))", but it averaged the pressure over the time series, which is not the result I want.
>> size(spatial_mean_pressure)
ans =
1 1 31 31
I would like to get a result like this:
>> size(spatial_mean_pressure)
ans =
100 1 1 1
You are trying to get the mean over an entire block of the matrix, so you should apply nanmean twice rather than once, each time along a particular dimension. I think this is what you want.
x=randi(10,[100 1 10 25]);
First take the mean along the third dimension.
mean_x_3=nanmean(x,3);
You would get an answer of size = [100 1 1 25]. Then take the mean along 4th dimension.
mean_x_4=nanmean(mean_x_3,4);
This should give you the desired answer. You can write this in one line as,
mean_x = nanmean(nanmean(x,3),4);
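One caveat worth checking: averaging twice gives a mean of per-column means, which can differ from the mean over all valid points in the region when the NaNs are unevenly spread. The sketch below contrasts the two on a tiny made-up array. It uses mean with the 'omitnan' flag, which assumes MATLAB R2015a or later, instead of nanmean; the names xr, yr, sub, flat, pooled and twoStep are illustrative.

```matlab
x = nan(2,1,2,2);                              % tiny stand-in for pressure(time, case, x, y)
x(1,1,:,:) = [1 2; NaN 6];                     % time step 1 has one missing value
x(2,1,:,:) = [4 4; 4 4];                       % time step 2 is complete

xr = 1:2;  yr = 1:2;                           % region of interest (30:60 and 40:70 in the question)
sub  = x(:,:,xr,yr);
flat = reshape(sub, size(sub,1), []);          % one row per time step

pooled  = mean(flat, 2, 'omitnan');                       % mean over all valid points
twoStep = mean(mean(sub, 3, 'omitnan'), 4, 'omitnan');    % mean of per-column means
```

Here pooled is [3; 4] while twoStep is [2.5; 4], so which one is "the" spatial mean depends on how the missing values should be weighted.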

MATLAB checking if part of a matrix is within x % of the first column

I've got a quick question about matrices in MATLAB.
Given a 3x4 matrix, how would you check if everything on the right side of the first column is within 80% of the first column? I can't really seem to come up with anything.
Example:
Temperature = [60 59 55 50; 60 48 30 46; 60 45 37 47]
Thank you.
For example:
relTemp = bsxfun(@rdivide,Temperature(:,2:end),Temperature(:,1));
%# be within +/- 80% of first column
isWithin80Perc = all(relTemp > 0.2 & relTemp < 1.8,2);
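The question can also be read differently, namely that every value must be at least 80% of the first column. A sketch of that reading, which is an assumption about the intent and not part of the answer above, with the illustrative name isAtLeast80Perc:

```matlab
Temperature = [60 59 55 50; 60 48 30 46; 60 45 37 47];   % example from the question
relTemp = bsxfun(@rdivide, Temperature(:,2:end), Temperature(:,1));

% Alternative reading: each value is at least 80% of the first column
isAtLeast80Perc = all(relTemp >= 0.8, 2);      % one flag per row
```

For the example matrix only the first row passes this stricter test, whereas all three rows pass the +/- 80% test.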

matlab updating time vector

I have 19 cells (19x1) with temperature data for an entire year where the first 18 cells represent 20 days (each) and the last cell represents 5 days, hence (18*20)+5 = 365days.
In each cell there should be 7200 measurements (apart from cell 19) where each measurement is taken every 4 minutes thus 360 measurements per day (360*20 = 7200).
The time vector for the measurements is only expressed as day number i.e. 1,2,3...and so on (thus no decimal day),
which is therefore displayed as 360 x 1's... and so on.
As the sensor failed during some days, some of the cells contain less than 7200 measurements, where one in
particular only contains 858 rows, which looks similar to the following example:
a=rand(858,3);
a(1:281,1)=1;
a(281:327,1)=2;
a(327:328,1)=5;
a(329:330,1)=9;
a(331:498,1)=19;
a(499:858,1)=20;
Where column 1 = day, column 2 and 3 are the data.
Knowing that each day number should be repeated 360 times, is there a method for adding the missing rows for every value from 1:20 in order to make up the 360? For example, the first column requires
79 x 1's, 46 x 2's, 360 x 3's... and so on, so that the final array has 7200 values in
order from 1 to 20.
If this is possible, the second and third columns should be set to NaN in the rows where these values have been added.
I realise that this is an unusual question and that it is difficult to understand what is being asked, but I hope I have been clear in expressing what I'm attempting to
achieve. Any advice would be much appreciated.
Here's one way to do it for a given element of the cell matrix:
full = nan(7200,3);
for i = 1:20 % for each day
    starti = (i-1)*360; % find corresponding 360 indices into full array
    full( starti + (1:360), 1 ) = i; % assign the day
    idx = find(a(:,1)==i); % find any matching data in a for that day
    full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % copy matching data over
end
You could probably use arrayfun to make this slicker, and maybe (??) faster.
You could make this into a function and use cellfun to apply it to your cell.
PS - if you ask your question at the Matlab help forums you'll most definitely get a slicker & more efficient answer than this. Probably involving bsxfun or arrayfun or accumarray or something like that.
Update - to do this for each element in the cell array, the main change is that instead of searching for i as the day number, you calculate it based on how far along the cell array you are. You'd do something like (untested):
filled = cell(size(cellarray));
for k = 1:length(cellarray)
    a = cellarray{k}; % data for this block of days
    nDays = min(20, 365 - (k-1)*20); % the last cell only covers 5 days
    full = nan(nDays*360, 3);
    for i = 1:nDays
        starti = (i-1)*360; % ... as before
        day = (k-1)*20 + i; % first cell is days 1-20, second is 21-40,...
        full( starti + (1:360),1 ) = day; % <-- replace i with day
        idx = find(a(:,1)==day); % <-- replace i with day
        full( starti + (1:length(idx)), 2:3 ) = a(idx,2:3); % same as before
    end
    filled{k} = full; % keep the padded block for this cell
end
I am not sure I understood correctly what you want to do, but the code below works out how many measurements are missing for each day and appends the corresponding rows to the bottom of your 'a' matrix, so that you get the full 7200x3 matrix.
nbMissing = 7200-size(a,1);
a1 = nan(nbMissing,3);
l = 0;
for i = 1:20
    nbMissing_i = 360-sum(a(:,1)==i);
    a1(l+1:l+nbMissing_i,1) = i;
    l = l+nbMissing_i;
end
a_filled = [a;a1];
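A loop-free variant of the same idea can be sketched with accumarray and repelem. This assumes MATLAB R2015a or later for repelem, and that no day has more than 360 rows. The names counts, missing, padDays and pad are illustrative, and the final sortrows call is an addition that orders the rows by day number.

```matlab
a = rand(858,3);                               % example from the question
a(1:281,1)=1;   a(281:327,1)=2;  a(327:328,1)=5;
a(329:330,1)=9; a(331:498,1)=19; a(499:858,1)=20;

counts  = accumarray(a(:,1), 1, [20 1]);       % rows present for each day
missing = 360 - counts;                        % rows to add per day (assumed >= 0)
padDays = repelem((1:20)', missing);           % each day repeated as often as needed
pad     = [padDays, nan(numel(padDays), 2)];   % data columns of the added rows are NaN

a_filled = sortrows([a; pad], 1);              % order the rows by day number
```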