I have a 500x500 matrix with values ranging from 1-100.
I need to look at 5 rows at a time and see if those 5 rows contain values that are greater than 75. I then need to get the index of the first column where the value is greater than 75 and the index of the last column where the value is greater than 75.
So far, I have the following:
i = 1;
while i < size(data,1)
if (i + 5) <= size(data,1)
if any(envNoClutterscansV(i:i + 5, 1:500) > 75)
% do something
end
end
i = i + 5;
end
The idea here is that I am looking at 5 rows at a time. For every 5 rows, I'm looking through all the columns to see if there are values that meet my criteria. So far, this doesn't find any values, even though I'm sure that my dataset contains the values. Additionally, I am not sure what to do from here.
I think the trouble might be that the result of any in the above code is a vector of 500 true and false values. You should sum them if you e=want to respond every time there are larger than 75 values:
if sum(any(envNoClutterscansV(i:i + 5, 1:500) > 75))
If you want to speed it up, you can avoid the loop and vectorize it, for example like this:
data = [
11 76 25 44 55 75;
11 75 95 44 85 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 90 25 44 55 88;
11 0 25 44 55 0;
91 0 25 44 55 80;
];
% Geting the number of rows
nRows=size(data,1);
% Retting a logical matrix with all the cells that are above the treshold
cellsOverTreshold=data>75;
% Getting a logical index to all the rows that contain values above
% treshold
matchingRows=any(cellsOverTreshold,2);
% In nexy line of code "reshape" rearange the data to put in columns the
% values associated to each goup of 5 rows
% So colum 1 have group one corresponding to data columns 1,2,3,4,5
% colum 2 have group two corresponding to data columns 6,7,8,9,10
% and so on
% Now we can get all the row groups that have velues above threshold
matchingRowGroups=find(any(reshape(matchingRows,5,[])));
% Now e put each row of on a cell array to be able to operate row-wise
cellRows = num2cell(cellsOverTreshold, 2);
% We now get the first and last column over the threshold for each row
firstColumOfRow = cellfun(#(x)find(x,1,'first'), cellRows,'UniformOutput',false);
lastColumOfRow = cellfun(#(x)find(x,1,'last'), cellRows,'UniformOutput',false);
% We replace the empty cells with NaNs so we can convert them to vectors
% without losing the indexing
firstColumOfRow(~matchingRows)={NaN};
lastColumOfRow(~matchingRows)={NaN};
% We rearrange the data as above and get the minimum of the first columns
% of each group, that is the first colum of the group above the threshold
firstColInGroup=nanmin(reshape([firstColumOfRow{:}]',5,[]));
% With the maximum of the last colums we get the last column of each group
lastColInGroup=nanmax(reshape([lastColumOfRow{:}]',5,[]));
% We finaly keep only the data of the groups with at that have at least one
% element above the threshold
firstColInGroup=firstColInGroup(matchingRowGroups);
lastColInGroup=lastColInGroup(matchingRowGroups);
In this way the variable "matchingRowGroups" have the indexes of each group of 5 rows that matchs. The variable "firstColInGroup" have the first column matching for each group and "lastColInGroup" the last one.
In addition to my previous answer, here is another option of vectorization, avoiding to transform data into cell arrays and avoiding using cellfun too, therefore, it is probably faster. Here it is:
data = [
11 76 25 44 55 75;
11 75 95 44 85 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 90 25 44 55 88;
11 0 25 44 55 0;
91 0 25 44 55 80;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 84 55 0;
11 0 25 44 55 0;
];
% Geting the number of rows
[nRows, nCols]=size(data);
% Retting a logical matrix with all the cells that are above the treshold
cellsOverTreshold=data>75;
% Getting a logical index to all the rows that contain values above
% treshold
matchingRows=any(cellsOverTreshold,2);
% In nexy line of code "reshape" rearange the data to put in columns the
% values associated to each goup of 5 rows
% So colum 1 have group one corresponding to data columns 1,2,3,4,5
% colum 2 have group two corresponding to data columns 6,7,8,9,10
% and so on
% Now we can get all the row groups that have velues above threshold
matchingRowGroups=find(any(reshape(matchingRows,5,[])))
%We find the rows and columns of all the first and last columns of each row
% that have values above threshold
[firstRow, firstCol]=find(cumsum(cumsum(cellsOverTreshold,2),2)==1);
[lastRow, lastCol]=find(cumsum(cumsum(cellsOverTreshold,2,'reverse'),2,'reverse')==1);
% Sort this data in vectors with one value per row, leaving NANs for rows
% with no element above threshold
firstColumOfRow=NaN(nRows,1);
lastColumOfRow=NaN(nRows,1);
firstColumOfRow(firstRow)=firstCol;
lastColumOfRow(lastRow)=lastCol;
% We rearrange the data as above and get the minimum of the first columns
% of each group, that is the first colum of the group above the threshold
firstColInGroup=nanmin(reshape(firstColumOfRow,5,[]));
% With the maximum of the last colums we get the last column of each group
lastColInGroup=nanmax(reshape(lastColumOfRow,5,[]));
% We finaly keep only the data of the groups with at that have at least one
% element above the threshold
firstColInGroup=firstColInGroup(matchingRowGroups)
lastColInGroup=lastColInGroup(matchingRowGroups)
This code looks 5 rows a time. Use find to locate the values > 75 and ind2sub to convert the indices returned by find to rows (ignored) and columns cols.
data = [
11 76 25 44 55 78;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 0 25 44 55 88;
11 0 25 44 55 0;
11 0 25 44 55 0;
];
for row = 1:5:size(data, 1)
fprintf('Row %d - %d\n', row, row+4);
indices = find(data(row:row+4,:) > 75);
if ~isempty(indices)
[~, cols] = ind2sub([5 size(data, 2)], indices);
col_min = min(cols);
col_max = max(cols);
fprintf('Column: %d and %d\n', col_min, col_max);
end
end
After thinking a bit more, here you have yet another simpler, faster and more compact solution. See my first solution for more datils on the naming of variables, but they are quite self explanatory
data = [
11 76 25 44 55 75;
11 75 95 44 85 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 44 55 0;
11 0 25 44 55 0;
11 90 25 44 55 88;
11 0 25 44 55 0;
91 0 25 44 55 80;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 75 25 44 55 75;
11 0 25 84 55 0;
11 0 25 44 55 0;
];
% Geting the number of rows and columns
[nRows, nCols]=size(data);
%We create arrays with rows and column numbers of each element
[colNum,rowNum]=meshgrid(1:nCols,1:nRows);
% Set NaN the column numbers that do not match the treshold
colNum(data<=75)=NaN;
% Get the group number of each element
groupNum=ceil(rowNum/5);
%The matching groups are those that have at least one non-NaN element
matchingRowGroups = accumarray(groupNum(:),colNum(:),[],#(x)any(~isnan(x)))
%We get the minimum of the column numbers matching thershold on each group
firstColumOfGroup = accumarray(groupNum(:),colNum(:),[],#nanmin)
%We get the maximum of the column numbers matching thershold on each group
lastColumOfGroup = accumarray(groupNum(:),colNum(:),[],#nanmax)
The only difference with the previous solutions is that matchingRowGroups is a logical index, and firstColumOfGroup and lastColumOfGroup have one entry per group, instead of entries only for groups with elements above the threshold. Groups with no entry above threshold have NaN values
I have a vector like A = [20 30 40 50 60 55 54 60 70]. It is always ascending until the invalid value (here for ex. 55), I need to find the indices of this element and remove all elements after that. my desired vectror is [20 30 40 50 60]
any suggestion?
short answer:
A(find(diff(A)<0,1)+1:end) = []
longer answer with explanation:
diff calculates the difference between adjacent elements:
>> diff(A)
ans =
10 10 10 10 -5 -1 6 10
We then search the first index of those differences that is less than zero and remove this and all succeeding elements.
>>> idx = find(diff(A)<0,1)+1
idx =
6
>>> A(idx:end)
ans =
55 54 60 70
>> A(idx:end) = []
A =
20 30 40 50 60
I have a p-by-p-by-n tensor. I want to extract diagonal element for each p-by-p slice. Are there anyone know how to do this without looping?
Thank you.
Behold the ever mighty and ever powerful bsxfun for vectorizing MATLAB problems to do this task very efficiently using MATLAB's linear indexing -
diags = A(bsxfun(#plus,[1:p+1:p*p]',[0:n-1]*p*p))
Sample run with 4 x 4 x 3 sized input array -
A(:,:,1) =
0.7094 0.6551 0.9597 0.7513
0.7547 0.1626 0.3404 0.2551
0.2760 0.1190 0.5853 0.5060
0.6797 0.4984 0.2238 0.6991
A(:,:,2) =
0.8909 0.1493 0.8143 0.1966
0.9593 0.2575 0.2435 0.2511
0.5472 0.8407 0.9293 0.6160
0.1386 0.2543 0.3500 0.4733
A(:,:,3) =
0.3517 0.9172 0.3804 0.5308
0.8308 0.2858 0.5678 0.7792
0.5853 0.7572 0.0759 0.9340
0.5497 0.7537 0.0540 0.1299
diags =
0.7094 0.8909 0.3517
0.1626 0.2575 0.2858
0.5853 0.9293 0.0759
0.6991 0.4733 0.1299
Benchmarking
Here's few runtime tests comparing this bsxfun based approach against repmat + eye based approach for big datasizes -
***** Datasize: 500 x 500 x 500 *****
----------------------- With BSXFUN
Elapsed time is 0.008383 seconds.
----------------------- With REPMAT + EYE
Elapsed time is 0.163341 seconds.
***** Datasize: 800 x 800 x 500 *****
----------------------- With BSXFUN
Elapsed time is 0.012977 seconds.
----------------------- With REPMAT + EYE
Elapsed time is 0.402111 seconds.
***** Datasize: 1000 x 1000 x 500 *****
----------------------- With BSXFUN
Elapsed time is 0.017058 seconds.
----------------------- With REPMAT + EYE
Elapsed time is 0.690199 seconds.
One suggestion I have is to create a p x p logical identity matrix, replicate this n times in the third dimension, and then use this matrix to access your tensor. Something like this, supposing that your tensor was stored in A:
ind = repmat(logical(eye(p)), [1 1 n]);
out = A(ind);
Example use:
>> p = 5; n = 3;
>> A = reshape(1:75, p, p, n)
A(:,:,1) =
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
A(:,:,2) =
26 31 36 41 46
27 32 37 42 47
28 33 38 43 48
29 34 39 44 49
30 35 40 45 50
A(:,:,3) =
51 56 61 66 71
52 57 62 67 72
53 58 63 68 73
54 59 64 69 74
55 60 65 70 75
>> ind = repmat(logical(eye(p)), [1 1 n]);
>> out = A(ind)
out =
1
7
13
19
25
26
32
38
44
50
51
57
63
69
75
You'll notice that we grab the diagonals of the first slice, followed by the diagonals of the second slice, etc. up until the last slice. These values are all concatenated into a single vector.
Just reading Divakar's answer and trying to understand why he again is roughly 10 times faster than my idea I put together code mixing both, and ended up with code which is faster:
A=reshape(A,[],n);
diags2 = A(1:p+1:p*p,:);
For a 500x500x500 tensor I get 0.008s for Divakar's solution and 0.005s for my solution using Matlab 2013a. Probably plain indexing is the only way to beat bsxfun.
all,
I have a large dataset with a lot of continuous NAs, is there any fast way to replace the NAs with the average of previous and next non-missing value by column?
Thanks a lot
Lou
Interesting question... if only you explained clearly what you want. Maybe it's this?
data = [1 3 NaN 7 6 NaN NaN 2].'; %'// example data: column vector
isn = isnan(data); %// determine which values are NaN
inum = find(~isn); %// indices of numbers
inan = find(isn); %// indices of NaNs
comp = bsxfun(#lt,inan.',inum); %'// for each (number,NaN): 1 if NaN precedes num
[~, upper] = max(comp); %// next number to each NaN (max finds *first* maximum)
data(isn) = (data(inum(upper))+data(inum(upper-1)))/2; %// fill with average
In this example: original data:
>> data.'
ans =
1 3 NaN 7 6 NaN NaN 2
Result:
>> data.'
ans =
1 3 5 7 6 4 4 2
If you have a 2D array and want to work by columns, a for loop over columns is probably the best option.
And of course, if there can be NaN's at the beginning or end of a column, the problem is undefined.
Assuming NaNs are not in the first/last row in any column, here is how I would do it:
(If there are multiple consecutive NaNs, it searches for previous ann next non-missing values and averages them).
% Creating A
A=magic(7);
newA=A; %Result will be in newA
A(3,4)=NaN;
A(2,1)=NaN;
A(5,6)=NaN;
A(6,6)=NaN;
A(4,6)=NaN;
% Finding NaN position and calculating positions where we have to average numbers
ind=find(isnan(A));
otherInd=setdiff(1:numel(A(:)),ind);
for i=1:size(ind,1)
temp=otherInd(otherInd<ind(i));
prevInd(i,1)=temp(end);
temp=otherInd(otherInd>ind(i));
nextInd(i,1)=temp(1);
end
% For faster processing purposes
allInd(1:2:2*length(prevInd))=prevInd;
allInd(2:2:2*length(prevInd))=nextInd;
fun=#(block_struct) mean(block_struct.data)
prevNextNums=A(allInd);
A
newA(ind)=blockproc(prevNextNums,[1 2],fun)
%-----------------------Answer--------------------------
A =
30 39 48 1 10 19 28
NaN 47 7 9 18 27 29
46 6 8 NaN 26 35 37
5 14 16 25 34 NaN 45
13 15 24 33 42 NaN 4
21 23 32 41 43 NaN 12
22 31 40 49 2 11 20
newA =
30 39 48 1 10 19 28
38 47 7 9 18 27 29
46 6 8 17 26 35 37
5 14 16 25 34 23 45
13 15 24 33 42 23 4
21 23 32 41 43 23 12
22 31 40 49 2 11 20
Assume we have the following data:
H_T = [36 66 21 65 52 67 73; 31 23 19 33 36 39 42]
P = [40 38 39 40 35 32 37]
Using MATLAB 7.0, I want to create three new matrices that have the following properties:
The matrix H (the first part in matrix H_T) will be divided to 3 intervals:
Matrix 1: the 1st interval contains the H values between 20 to 40
Matrix 2: the 2nd interval contains the H values between 40 to 60
Matrix 3: the 3rd interval contains the H values between 60 to 80
The important thing is that the corresponding T and P will also be included in their new matrices meaning that H will control the new matrices depending on the specifications defined above.
So, the resultant matrices will be:
H_T_1 = [36 21; 31 19]
P_1 = [40 39]
H_T_2 = [52; 36]
P_2 = [35]
H_T_3 = [66 65 67 73; 23 33 39 42]
P_3 = [38 40 32 37]
Actually, this is a simple example and it is easy by looking to create the new matrices depending on the specifications, BUT in my values I have thousands of numbers which makes it very difficult to do that.
Here's a quick solution
[~,bins] = histc(H_T(1,:), [20 40 60 80]);
outHT = cell(3,1);
outP = cell(3,1);
for i=1:3
idx = (bins == i);
outHT{i} = H_T(:,idx);
outP{i} = P(idx);
end
then you access the matrices as:
>> outHT{3}
ans =
66 65 67 73
23 33 39 42
>> outP{3}
ans =
38 40 32 37