Row selection with a condition - Matlab

I want to select the rows of the following table (NEW) where the first column of the Edge variable (i.e. NEW.Edge(i,1)) equals a given number (N) and the second column of Edge (i.e. NEW.Edge(i,2)) is not one of the values in an array (IDs). For example, if N=2 and IDs=[15,20], then I should get rows 1 to 9.
The code I have tried so far is:
Y =[];
for i = 1:size(NEW,1)
if ((NEW.Edge(i,1)==N) & (sum(ismember(IDs,NEW.Edge(i,2))==0)))
Y = NEW.Edge(i,1)&NEW.Edge(i,2);
end
end
And
Y = ((NEW.Edge(i,1)==N) & (sum(ismember(IDs,NEW.Edge(i,2))==0)))
Lines = NEW (Y,:)
----Table 'NEW'
Event Node Edge
_________ ____ ________
edgetonew NaN 2 6
edgetonew NaN 2 7
edgetonew NaN 2 8
edgetonew NaN 2 9
edgetonew NaN 2 10
edgetonew NaN 2 11
edgetonew NaN 2 12
edgetonew NaN 2 13
edgetonew NaN 2 14
edgetonew NaN 2 15
edgetonew NaN 15 16
edgetonew NaN 15 17
edgetonew NaN 15 18
edgetonew NaN 15 19
edgetonew NaN 15 20
edgetonew NaN 20 21

You can use ismember and logical indexing to accomplish this. ismember returns a logical array the same size as its first input, which is true wherever the corresponding element of the first input appears anywhere in the second input.
% Will be TRUE when the first column == N and the second column isn't in IDs
rows_to_keep = NEW.Edge(:,1) == N & ~ismember(NEW.Edge(:,2), IDs);
% Now use this logical array to grab just the rows that satisfy the condition
out = NEW(rows_to_keep,:);
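As a self-contained check, the sketch below rebuilds the table from the question and applies the same mask. The way the table is constructed here is an assumption made for illustration; only the two statements that build rows_to_keep and out come from the answer.

```matlab
% Rebuild the example table from the question (construction is illustrative).
Edge  = [2*ones(10,1), (6:15).'; 15*ones(5,1), (16:20).'; 20, 21];
Event = repmat({'edgetonew'}, size(Edge,1), 1);
Node  = NaN(size(Edge,1), 1);
NEW   = table(Event, Node, Edge);

N   = 2;
IDs = [15 20];

% TRUE where column 1 equals N and column 2 is not listed in IDs
rows_to_keep = NEW.Edge(:,1) == N & ~ismember(NEW.Edge(:,2), IDs);
out = NEW(rows_to_keep, :);

disp(out.Edge)   % rows 1 to 9 of the table: [2 6] through [2 14]
```

Row 10 ([2 15]) is dropped because 15 is in IDs, and rows 11 to 16 are dropped because their first column is not 2.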

Related

Creating cumulative matrix which accounts for column start points

I have a simple example matrix as follows: (The actual matrix I'm working on is 674x11 and is not simply all '1' elements).
a =
1 1 1 NaN NaN
1 1 1 NaN NaN
1 1 1 1 NaN
1 1 1 1 1
1 1 1 1 1
I want to create a cumulative matrix which accounts for the fact that numeric elements start in each column at different rows. I want to achieve this by replacing the NaN value above the first numeric element in each column with the mean of that row.
So instead of:
cumsum(a)=
1 1 1 NaN NaN
2 2 2 NaN NaN
3 3 3 1 NaN
4 4 4 2 1
5 5 5 3 2
what I want to achieve is:
cumsum(a) =
1 1 1 NaN NaN
2 2 2 2 NaN
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
where element (2,4) is the mean of a(2,1:3) and element (3,5) is the mean of a(3,1:4).
You can compute the mean of each row, ignoring the NaN values, with nanmean. We can then use find to identify the row of each NaN and replace that NaN with the mean of its row, and finally apply cumsum:
% Get the rows of each NaN value
bool = isnan(a);
[row,col] = find(bool);
% Compute the mean value of each row
rowmeans = nanmean(a, 2);
% Replace the NaN values with their row means
a(bool) = rowmeans(row);
% Perform the cumulative sum
result = cumsum(a);
If you want to leave the initial NaN values as NaN values afterwards, then you can follow it up with
result(bool) = NaN;
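Note that nanmean ships with the Statistics and Machine Learning Toolbox. As a hedged alternative, assuming R2015a or later, mean with the 'omitnan' flag gives the same row means in base MATLAB. A minimal sketch on the matrix from the question:

```matlab
a = [1 1 1 NaN NaN
     1 1 1 NaN NaN
     1 1 1 1   NaN
     1 1 1 1   1
     1 1 1 1   1];

bool = isnan(a);                   % remember where the NaNs were
[row, ~] = find(bool);             % row index of every NaN

rowmeans = mean(a, 2, 'omitnan');  % row means, ignoring NaN
a(bool)  = rowmeans(row);          % fill each NaN with its row mean

result = cumsum(a);                % every column now counts 1, 2, 3, 4, 5
```

Because find and logical indexing both walk the matrix in column-major order, rowmeans(row) lines up element by element with a(bool).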

MATLAB: combine rows with similar values

I am new to MATLAB and I am trying to combine rows with similar values (I have thousands of rows), for example
1 NaN
1 NaN
1 NaN
2 9
2 26.5
2 21.5
2 18
2 24.5
2 12
2 22.5
3 NaN
3 NaN
3 NaN
3 NaN
4 18.5
4 22
4 35.5
...
...
...
to
1 NaN NaN NaN
2 9 26.5 21.5 18 24.5 12 22.5
3 NaN NaN NaN NaN
4 18.5 22 35.5
Can anyone please help me with this?
This can't be done with normal arrays: every row must have the same number of columns, which your desired output does not. You can work with cell arrays if you wish.
If cell arrays are an option, the best way to tackle this IMHO would be to use an accumarray/sort/cellfun pipeline. First use accumarray to group together all of the values that belong to the same ID (the first column in your case), so that each group becomes a cell. However, accumarray does not guarantee the order in which the values of a group arrive. Therefore, what you group instead are the locations (row indices) of the values. You then sort these locations, and the output is a cell array where each cell is a list of indices into the original data.
You'd then call cellfun as the last step to use the indices to access the actual data itself.
Something like this comes to mind, assuming your data is stored in X and it's a two-column array.
ind = (1 : size(X,1)).';
out_ind = accumarray(X(:,1), ind, [], @(x) {sort(x)});
out = cellfun(@(x) X(x,2), out_ind, 'uni', 0);
We thus get:
>> celldisp(out)
out{1} =
NaN
NaN
NaN
out{2} =
9.0000
26.5000
21.5000
18.0000
24.5000
12.0000
22.5000
out{3} =
NaN
NaN
NaN
NaN
out{4} =
18.5000
22.0000
35.5000
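To try the pipeline end to end, here is a minimal sketch on the sample data from the question. It assumes the IDs in the first column are positive integers (accumarray needs them as subscripts), and it transposes each group into a row so the result resembles the layout asked for; that transpose is an addition, not part of the answer above.

```matlab
X = [1 NaN; 1 NaN; 1 NaN; ...
     2 9; 2 26.5; 2 21.5; 2 18; 2 24.5; 2 12; 2 22.5; ...
     3 NaN; 3 NaN; 3 NaN; 3 NaN; ...
     4 18.5; 4 22; 4 35.5];

ind     = (1:size(X,1)).';                               % row numbers
out_ind = accumarray(X(:,1), ind, [], @(x) {sort(x)});   % sorted rows per ID
out     = cellfun(@(x) X(x,2).', out_ind, 'UniformOutput', false);

% Print one line per ID, with that ID's values in their original order
for k = 1:numel(out)
    fprintf('%d: %s\n', k, num2str(out{k}));
end
```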

Include rows of NaN in matrix at predetermined row numbers.

Initially, I have
Matrix A=
[ 1 2 3
4 255 6
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11 ];
I find out the row numbers which are all NaN.
Row_NaN_MatA = [3 6];
After eliminating these rows, I am left with:
Matrix B1 =
[ 1 2 3
4 255 6
7 8 9
10 11 12
10 9 11 ];
After applying a filter, I make the second row of Matrix B = NaN NaN NaN. Therefore
Matrix B2 =
[ 1 2 3
NaN NaN NaN
7 8 9
10 11 12
10 9 11 ];
Now, the question is: after all this processing, I want to get the initial matrix back, but with all the deleted elements as NaN. So the required output is:
Output Matrix=
[ 1 2 3
NaN NaN NaN
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11 ];
I know the dimensions of output I want (= initial Matrix A dimensions), and the row numbers which should be NaN (= Row_NaN_MatA) . The rest of the rows should be equal to rows of Matrix B2.
How can I do this?
Use setdiff to get the row IDs that are not part of Row_NaN_MatA, by taking the set difference between a 1D array of indices covering all rows of A and Row_NaN_MatA, like so -
output = A
output(setdiff(1:size(A,1),Row_NaN_MatA),:) = B2
You can also use ismember for the same effect -
output(~ismember(1:size(A,1),Row_NaN_MatA),:) = B2
Or use bsxfun there -
output(all(bsxfun(@ne,Row_NaN_MatA(:),1:size(A,1)),1),:) = B2
Sample run -
>> A
A =
1 2 3
4 255 6
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11
>> B1
B1 =
1 2 3
4 255 6
7 8 9
10 11 12
10 9 11
>> B2
B2 =
1 2 3
NaN NaN NaN
7 8 9
10 11 12
10 9 11
>> output
output =
1 2 3
NaN NaN NaN
NaN NaN NaN
7 8 9
10 11 12
NaN NaN NaN
10 9 11
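The whole round trip can be sketched as follows. Two details are assumptions made for illustration: the all-NaN rows are detected with all(isnan(A),2), and the filter step is replaced by simply overwriting row 2 of B1. The output is also initialized with NaN(size(A)) rather than a copy of A, which gives the same result here because the skipped rows of A are NaN anyway.

```matlab
A = [ 1   2   3
      4 255   6
    NaN NaN NaN
      7   8   9
     10  11  12
    NaN NaN NaN
     10   9  11];

Row_NaN_MatA = find(all(isnan(A), 2)).';   % -> [3 6]
B1 = A;  B1(Row_NaN_MatA, :) = [];         % drop the all-NaN rows
B2 = B1; B2(2, :) = NaN;                   % stand-in for the filter step

output = NaN(size(A));                     % start from all NaN
keep   = ~ismember(1:size(A,1), Row_NaN_MatA);
output(keep, :) = B2;                      % scatter B2 back into place
```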

Concatenate matrices with different start index and different end index (Aligning)

for i = 1 : numel(T);
j = 1 : numel(T(i).n);
P(i,j) = (T(i).n);
G(i) = (T(i).lastPulse)-1100;
Y = P(1,G(1):length(T(1).n));
S = P(2,G(2):length(T(2).n));
end
I have the preceding code. P is a (191x10000) matrix. I want to take a specific portion of each row, as shown with S and Y, and then concatenate S, Y and the row vectors taken from the other rows of P into a matrix A (191 x [max length of (S,Y,...)]). The tricky part is that I cannot get S and Y aligned.
EXAMPLE:
P = [1 2 1 3 1 1 1 0 3 1 0]
[3 0 2 0 1 1 4 1 1 2 0];
S = P(1,1:7) = [1 2 1 3 1 1 1];
Y = P(2,5:10) = [1 1 4 1 1 2];
% A = concatenated S and Y aligned to original P.
A = [ 1 2 1 3 1 1 1 nan nan nan nan]
[nan nan nan nan 1 1 4 1 1 2 nan];
Preferably I would like to use a loop instead of separated matrices such as S and Y since I have many rows.
Suggested Answer:
I have the idea that probably I have to use indices corresponding to P and use them to concatenate Y and S, I just don't know how to execute this thought especially in a loop.
If I understood the question correctly, bsxfun could be used here to create a mask, keep only the masked elements of P, and thus get an aligned output. Here's an implementation along those lines -
%// Random input array
P = randi(9,5,11)
%// Define the start and stop(end) indices as vectors
start_idx = [1 5 3 4 11]
stop_idx = [7 10 3 6 11]
%// Get the size of P and initialize output array
[M,N] = size(P);
P_out = NaN(M,N);
%// Create the mask for extracting specific elements from P
mask = bsxfun(@le,start_idx(:),1:N) & bsxfun(@ge,stop_idx(:),1:N);
%// Put masked elements from P into output array
P_out(mask) = P(mask)
Another way to get the output without initializing it, would be like this -
P_out = P.*mask;
P_out(~mask) = NaN;
So, to correlate with the variables used in the question, start_idx would be G and stop_idx would be [length(T(1).n),length(T(2).n),length(T(3).n),...].
Sample run -
P =
1 6 8 8 8 1 9 1 2 4 2
8 8 6 3 7 6 7 2 5 1 2
6 8 9 5 6 6 6 8 6 5 2
9 9 5 9 3 7 9 5 1 2 1
7 1 5 6 6 9 6 8 6 2 6
start_idx =
1 5 3 4 11
stop_idx =
7 10 3 6 11
P_out =
1 6 8 8 8 1 9 NaN NaN NaN NaN
NaN NaN NaN NaN 7 6 7 2 5 1 NaN
NaN NaN 9 NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN 9 3 7 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6
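For comparison with the question, here is the same masking idea applied to the two-row example given there. The start and stop indices are taken from the question's S = P(1,1:7) and Y = P(2,5:10); the variable name cols is introduced only for readability.

```matlab
P = [1 2 1 3 1 1 1 0 3 1 0
     3 0 2 0 1 1 4 1 1 2 0];
start_idx = [1 5];     % first column to keep in each row
stop_idx  = [7 10];    % last column to keep in each row

[M, N] = size(P);
cols = 1:N;
% mask(i,j) is true when start_idx(i) <= j <= stop_idx(i)
mask = bsxfun(@le, start_idx(:), cols) & bsxfun(@ge, stop_idx(:), cols);

A = NaN(M, N);         % everything outside the kept range stays NaN
A(mask) = P(mask);
```

A then matches the aligned matrix requested in the question: row 1 keeps columns 1 to 7 and row 2 keeps columns 5 to 10, with NaN elsewhere.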

finding clustered NaNs but leaving lonely NaNs alone

I have an incomplete dataset,
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]'
I wish to identify clusters of NaNs, that is, runs in which the number of consecutive NaNs exceeds 2. How do I do that?
You could do something like this:
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
Then clusters will be an M-by-2 matrix, where M is the number of NaN clusters (all of them), and each row gives you the start and end index of one cluster.
In this example, that would be:
clusters =
1 1
5 5
8 9
15 19
It means you have 4 NaN clusters, and cluster one ranges from index 1 to index 1, cluster two ranges from 5 to 5, cluster three ranges from 8 to 9 and cluster four ranges from 15 to 19.
If you want only the clusters with at least K NaNs, you could do it like this (for example, with K = 2):
K = 2;
clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
That would give you this:
ans =
8 9
15 19
That is, clusters 8-9 and 15-19 have 2 or more NaNs.
Explanation:
Finding the clusters
isnan(N) gives you a logical vector containing the NaNs as ones:
N --------> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
isnan(N) -> 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1
We want to know where each sequence of ones start, so we use diff, which calculates each value minus the previous one, and gives us this:
aux = diff(isnan(N));
N ----> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
aux --> -1 0 0 1 -1 0 1 0 -1 0 0 0 0 1 0 0 0 0
Here a 1 marks the start of a group and a -1 marks its end. But this misses the first group's start and the last group's end: the leading 1 is absent because the first element of N has no previous element, and the trailing -1 is absent because nothing follows the last element of N. A common fix is to add a zero before and after the array, which gives us this:
aux = diff([0; isnan(N); 0]);
N ----> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
aux --> 1 -1 0 0 1 -1 0 1 0 -1 0 0 0 0 1 0 0 0 0 -1
Notice two things:
If the diff at index i is 1, N(i) is the start of the NaN block.
If the diff at index i is -1, N(i - 1) is the end of the NaN block.
To get the start and end, we use find to get the indexes where aux == 1 and aux == -1. Hence, we call find twice, and concatenate both calls using [ and ]:
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
Filtering the clusters which have K or more elements
The last step is to find the clusters that have K or more elements. To do that, we take the cluster matrix, subtract the first column from the second, and add 1, like this:
clusters(:,2) - clusters(:,1) + 1
ans =
1
1
2
5
It means clusters 1 and 2 have 1 NaN each, cluster 3 has 2 NaNs and cluster 4 has 5 NaNs. If we ask which values are greater than or equal to K, we get this:
clusters(:,2) - clusters(:,1) + 1 >= K
ans =
0
0
1
1
It's a logical array. We can use that to index only the 1 (true) rows of the cluster matrix, like this:
clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
ans =
8 9
15 19
It's like asking: give us only the clusters where the rows match the ones on this logical vector, and give us all columns (denoted by the :).
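Putting the steps together, a minimal runnable sketch might look like this. It uses K = 3 because the question asks for runs where the count exceeds 2; the names starts, stops, lengths and big are introduced here for readability and are not part of the answer above.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
K = 3;                                 % minimum run length to report

aux      = diff([0; isnan(N); 0]);     % +1 at run starts, -1 just past run ends
starts   = find(aux == 1);
stops    = find(aux == -1) - 1;
lengths  = stops - starts + 1;         % -> [1; 1; 2; 5]

clusters = [starts stops];             % all NaN runs, one per row
big      = clusters(lengths >= K, :);  % -> [15 19]
```

With K = 2 the same filter returns both [8 9] and [15 19], as in the walkthrough.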
Here is a modular solution:
% the number of NaN you consider as a cluster
num = 3;
% moving average filter
Z = filter(ones(num,1),1,isnan(N));
x = arrayfun(@(x) find(Z == num) - num + x, 1:num,'uni',0)
y = unique(cell2mat(x))
(UPDATE: faster version below)
gives for num = 1:
y = 1 5 8 9 15 16 17 18 19
for num = 2:
y = 8 9 15 16 17 18 19
for num = 3, num = 4 and num = 5:
y = 15 16 17 18 19
and finally for num = 6 ... and more
y = Empty matrix: 1-by-0
Explanation
isnan(N) returns a logical array with ones at the positions of NaN.
Z = filter(ones(num,1),1,isnan(N)); implements a moving-sum filter (an unnormalized moving average) with a window of ones(num,1) = [1;1;1] (for num = 3). The window of size 3 slides over the array and reaches the value num = 3 only where there are 3 NaNs in a row.
So it basically looks like:
%// N isnan(N) Z
NaN 1 1
1 0 1
2 0 1
3 0 0
NaN 1 1
5 0 1
6 0 1
NaN 1 1
NaN 1 2
7 0 2
8 0 1
10 0 0
12 0 0
20 0 0
NaN 1 1
NaN 1 2
NaN 1 3
NaN 1 3
NaN 1 3
Now it is easy to find all elements which are 3: find(Z == num). But you also need the index right before each of them, find(Z == num) - num + 2, and the one before that, find(Z == num) - num + 1. Instead of a loop, arrayfun is used, which is basically the same. As a result you get a matrix with a lot of indices, many of them repeated, but you only need the unique ones. I hope everything is clear now.
Actually, it would be much faster to move find out of arrayfun, which can then be replaced by bsxfun, so that cell2mat is no longer needed. That leads to the following form:
Faster:
Z = find( filter(ones(num,1),1,isnan(N)) == num ) - num;
y = unique( bsxfun(@plus, Z,1:num) );
or, as the obligatory fancy one-liner:
y = unique(bsxfun(@plus,find(filter(ones(num,1),1,isnan(N))==num)-num,1:num));
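To compare several values of num quickly, the one-liner can be wrapped in an anonymous function. The name runIdx is hypothetical, and the double(...) cast around isnan is a defensive addition in case filter is given a logical input.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';

% runIdx(v, num): indices of all NaNs that sit in a run of at least num NaNs
runIdx = @(v, num) unique(bsxfun(@plus, ...
    find(filter(ones(num,1), 1, double(isnan(v(:)))) == num) - num, 1:num));

y2 = runIdx(N, 2);   % -> 8 9 15 16 17 18 19
y3 = runIdx(N, 3);   % -> 15 16 17 18 19
y6 = runIdx(N, 6);   % -> empty, no run is that long
```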
STRFIND Approach
I. Fancy One Liner:
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]
out = [strfind(num2str(isnan([ 0 N 0]),'%1d'),'011');strfind(num2str(isnan([ 0 N 0]),'%1d'),'110')]'
Output
out =
8 9
15 19
II. Detailed one with explanation:
Basically you are trying to do sliding window checks, for which there is no direct method when working with double arrays, but after converting to strings, one can use strfind. This trick is used here.
I would suggest following the comments used in the code and the output numbers to understand it. Please note that for this particular case a cluster means a group of two or more consecutive NaNs
Code
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]
%%// Set the locations where NaNs are present and then
%%// append at the start and end with zeros
N2 = isnan([ 0 N 0])
%%// Find the start indices of all NaN clusters
start_ind = strfind(num2str(N2,'%1d'),'011')
%%// Find the stop indices of all NaN clusters
stop_ind = [strfind(num2str(N2,'%1d'),'110')]
%%// Put start and stop indices into a Mx2 matrix
out = [start_ind' stop_ind']
Output
N =
NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
N2 =
0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0
start_ind =
8 15
stop_ind =
9 19
out =
8 9
15 19
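The patterns '011' and '110' hard-code a minimum run length of two. A hedged generalization to any minimum length K builds the patterns with repmat; K and the char('0' + ...) conversion are choices made for this sketch rather than part of the answer above.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN];
K = 3;                                     % minimum number of consecutive NaNs

% Zero-padded NaN mask as a string of '0' and '1' characters
s = char('0' + isnan([0 N 0]));

% A run starts where a '0' is followed by K ones. Because of the padding,
% the match position in s equals the index of the first NaN in N.
start_ind = strfind(s, ['0' repmat('1', 1, K)]);

% A run ends where K ones are followed by a '0'. Shift the match position
% to the last NaN of the run, expressed as an index into N.
stop_ind = strfind(s, [repmat('1', 1, K) '0']) + K - 2;

out = [start_ind' stop_ind'];              % -> [15 19]
```

Setting K = 2 reproduces the output of the original patterns, [8 9; 15 19].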
This uses diff, like Rafael Monteiro's answer, but seems to be simpler:
ind = diff([0; isnan(N(:))]);
result = find(ind(1:end-1)==1 & ind(2:end)==0);
In your example, this gives [8 15].
How it works: ind takes the values:
1 where a run of (one or more) NaN values starts;
0 where there is no change between NaN and numeric with respect to the previous value;
-1 where a run of (one or more) numeric values starts.
The second line selects the positions at which a run of NaN starts and such that the next position is also NaN. Thus it gives the start of each run of more than one NaN, as desired.