Sum of groups of four in a matrix - matlab

I have the following matrix: the first column holds the values 1 to 5, the second column holds 1 to 20, and the third column holds random values.
1 1 2545
1 2 0
1 3 0
1 4 0
2 5 0
2 6 0
2 7 231
2 8 54587
3 9 41
3 10 1111
3 11 0
3 12 1213
4 13 0
4 14 0
4 15 0
4 16 0
5 17 898
5 18 6887
5 19 522
5 20 23
What I am trying to do is get the sum of each group of four, but only when all values in the group are nonzero. For example, for the matrix above the output I want is:
1 NaN
2 NaN
3 NaN
4 NaN
5 8330

Assuming that the first column determines which group each value in the third column belongs to, the easiest approach is to change all zero values to NaN and then use accumarray to sum the values in each group. This works because a sum that includes any NaN is itself NaN, so each group's sum will be NaN if at least one of its values was 0 before the change.
I'm going to assume that your matrix is stored in X like so:
X = [1 1 2545
1 2 0
1 3 0
1 4 0
2 5 0
2 6 0
2 7 231
2 8 54587
3 9 41
3 10 1111
3 11 0
3 12 1213
4 13 0
4 14 0
4 15 0
4 16 0
5 17 898
5 18 6887
5 19 522
5 20 23 ];
Make a copy of the third column, and let's do some magic:
col = X(:,3);
col(col == 0) = NaN;
out = accumarray(X(:,1), col);
We thus get:
out =
NaN
NaN
NaN
NaN
8330
The nice thing about this approach is that the group ID for each value in your matrix doesn't have to be in order as you have placed in your post.
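If you would rather not rely on NaN propagation, one variant is to sum each group and flag groups that contain a zero separately. The following is a minimal sketch and not part of the answer above; the names g, vals, sums and hasZero are illustrative, and the rows are deliberately reversed to show that the group order does not matter.

```matlab
% Same data as in the question, but with the rows in reverse order
vals = [2545 0 0 0 0 0 231 54587 41 1111 0 1213 0 0 0 0 898 6887 522 23].';
g    = kron((1:5).', ones(4, 1));   % group IDs: 1 1 1 1 2 2 2 2 ... 5 5 5 5
vals = flipud(vals);
g    = flipud(g);

sums    = accumarray(g, vals);                    % plain per-group sums
hasZero = accumarray(g, double(vals == 0)) > 0;   % true if the group contains a zero

out = sums;
out(hasZero) = NaN;   % only group 5 keeps its sum (8330)
```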
If, however, your matrix is guaranteed to be ordered so that each group consists of four consecutive rows, you can do the same NaN assignment but avoid accumarray: reshape the third column into a matrix with four rows, so that each column holds one group, then sum down each column. Note that the result is a row vector rather than a column:
col = X(:,3);
col(col == 0) = NaN;
out = sum(reshape(col, 4, []), 1);
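As a quick sanity check, the two approaches can be compared on the example data. This is only a sketch under the assumption that the groups are consecutive blocks of four rows; the names groups, outAccum and outReshape are illustrative.

```matlab
col = [2545 0 0 0 0 0 231 54587 41 1111 0 1213 0 0 0 0 898 6887 522 23].';
col(col == 0) = NaN;
groups = kron((1:5).', ones(4, 1));         % 1 1 1 1 2 2 2 2 ... 5 5 5 5

outAccum   = accumarray(groups, col);       % 5-by-1 column vector
outReshape = sum(reshape(col, 4, []), 1);   % 1-by-5 row vector

% isequaln treats NaN values as equal, unlike isequal
sameResult = isequaln(outAccum, outReshape.');
```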

Related

How to efficiently compare elements in two vectors in MATLAB without using loops?

Say I have a matrix A whose first column contains item IDs with repetition and second column contains their weights.
A= [1 40
3 33
2 12
4 22
2 10
3 6
1 15
6 29
4 10
1 2
5 18
5 11
2 8
6 25
1 14
2 11
4 28
3 38
5 35
3 9];
I now want to find the difference of each instance of A and its associated minimum weight. For that, I make a matrix B with its first column containing the unique IDs from column 1 of A, and its column 2 containing the associated minimum weight found from column 2 of A.
B=[1 2
2 8
3 6
4 10
5 11
6 25];
Then, I want to store in column 3 of A the difference of each entry and its associated minimum weight.
A= [1 40 38
3 33 27
2 12 4
4 22 12
2 10 2
3 6 0
1 15 13
6 29 4
4 10 0
1 2 0
5 18 7
5 11 0
2 8 0
6 25 0
1 14 12
2 11 3
4 28 18
3 38 32
5 35 24
3 9 3];
This is the code I wrote to do this:
for i=1:size(A,1)
A(i,3) = A(i,2) - B(B(:,1)==A(i,1),2);
end
But this code takes a long time to execute, because it searches through B on every iteration over A. That is, its complexity is size(A,1) x size(B,1). Is there a better way to do this without using loops, that would execute faster?
You can use accumarray to first compute the minimum value in the second column of A for each unique value in the first column of A. We can then index into the result using the first column of A and compare to the second column of A to create the third column.
% Compute the mins
min_per_group = accumarray(A(:,1), A(:,2), [], @min);
% Compute the difference between the second column and the minima
A(:,3) = A(:,2) - min_per_group(A(:,1));
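accumarray needs positive integer subscripts, so the code above assumes the IDs are small positive integers. If they are not, one option is to map them to 1..n first using the third output of unique. A minimal sketch with made-up IDs (10, 20, 30), not taken from the question:

```matlab
% Hypothetical data: IDs are not 1..n
A = [10 40
     30 33
     20 12
     10  2
     20  8
     30  6];

[ids, ~, idx] = unique(A(:,1));            % idx maps each row to 1..numel(ids)
mins = accumarray(idx, A(:,2), [], @min);  % minimum weight per unique ID
A(:,3) = A(:,2) - mins(idx);               % difference to the group's minimum
```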

Dividing a vector to form different matrices

I have two long vectors. Vector one contains the values 0, 1, 2, 3 and 4: 0 represents no action, 1 represents action 1, 2 represents action 2, and so on. Each action lasts 720 sample points, so you could find, for example, 720 consecutive 2s followed by 720 consecutive 4s. Vector two contains the raw data corresponding to each action. I need to create a matrix for each action (1, 2, 3 and 4) that contains the corresponding data from the second vector. For example, matrix 1 should have all the data (from vector 2) that occur at the same indices as action 1. Any help?
Example on small amount of data:
Vector 1: 0 0 1 1 1 0 0 2 2 2 0 0 1 1 1 0 0 2 2 2
Vector 2: 6 7 5 6 4 6 5 9 8 7 9 7 0 5 6 4 1 5 8 0
Result:
Matrix 1:
5 6 4
0 5 6
Matrix 2:
9 8 7
5 8 0
Here is one approach. I used a cell array to store the output matrices, since hard-coding names for such variables isn't a good plan.
V1=[0 0 1 1 1 0 0 2 2 2 0 0 1 1 1 0 0 2 2 2]
V2=[6 7 5 6 4 6 5 9 8 7 9 7 0 5 6 4 1 5 8 0]
%// Find length of sequences of 1's/2's
len=find(diff(V1(find(diff(V1)~=0,1)+1:end))~=0,1)
I=unique(V1(V1>0)); %// This just finds how many matrices to make, 1 and 2 in this case
C=bsxfun(@eq,V1,I.'); %// The i-th row of C contains 1's where there are i's in V1
%// Now pick out the elements of V2 based on C, and store them in cell arrays
Matrix=arrayfun(@(m) reshape(V2(C(m,:)),len,[]).',I,'uni',0);
%// Note, the reshape converts from a vector to a matrix
%// Display results
Matrix{1}
Matrix{2}
Since there is a regular pattern in the lengths of the groups within Vector 1, that can be exploited to vectorize much of the solution. Here's one such implementation -
%// Form new vectors out of input vectors for non-zero elements in vec1
vec1n = vec1(vec1~=0)
vec2n = vec2(vec1~=0)
%// Find positions of group shifts and length of groups
df1 = diff(vec1n)~=0
grp_change = [true df1]
grplen = find(df1,1)
%// Reshape vec2n, so that we end up with N x grplen sized array
vec2nr = reshape(vec2n,grplen,[]).' %//'
%// ID/tag each group change based on its vector 1 value
[R,C] = sort(vec1n(grp_change))
%// Re-arrange rows of reshaped vector2, s.t. same ID rows are grouped successively
vec2nrs = vec2nr(C,:)
%// Find extents of each group & use those extents to have final cell array output
grp_extent = diff(find([1 diff(R) 1]))
out = mat2cell(vec2nrs,grp_extent,grplen)
Sample run for the given inputs -
>> vec1
vec1 =
0 0 1 1 1 0 0 2 2 2 ...
0 0 1 1 1 0 0 2 2 2
>> vec2
vec2 =
6 7 5 6 4 6 5 9 8 7 ...
9 7 0 5 6 4 1 5 8 0
>> celldisp(out)
out{1} =
5 6 4
0 5 6
out{2} =
9 8 7
5 8 0
Here is another solution:
v1 = [0 0 1 1 1 0 0 2 2 2 0 0 1 1 1 0 0 2 2 2];
v2 = [6 7 5 6 4 6 5 9 8 7 9 7 0 5 6 4 1 5 8 0];
m1 = reshape(v2(v1 == 1), 3, [])'
m2 = reshape(v2(v1 == 2), 3, [])'
EDIT: David's solution is more flexible and probably more efficient.
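The reshape solution can be generalized with a short loop so that the action labels are not hard-coded. This is a sketch that assumes every run of an action has the same known length (runLen, 3 here and 720 in the real data); the names runLen, actions and M are illustrative.

```matlab
v1 = [0 0 1 1 1 0 0 2 2 2 0 0 1 1 1 0 0 2 2 2];
v2 = [6 7 5 6 4 6 5 9 8 7 9 7 0 5 6 4 1 5 8 0];
runLen = 3;                      % assumed fixed run length per action

actions = unique(v1(v1 > 0));    % action labels present, ignoring 0
M = cell(1, numel(actions));
for k = 1:numel(actions)
    % Pick the data for this action and put one run per row
    M{k} = reshape(v2(v1 == actions(k)), runLen, []).';
end
```

Here M{1} holds the rows for action 1 and M{2} the rows for action 2.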

finding clustered NaNs but leaving lonely NaNs alone

I have an incomplete dataset,
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]'
I wish to identify clusters of NaNs, that is, runs in which the number of consecutive NaNs exceeds 2. How do I do that?
You could do something like this:
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
Then clusters will be an M-by-2 matrix, where M is the number of NaN clusters (all of them), and each row gives you the start and end index of a cluster.
In this example, that would be:
clusters =
1 1
5 5
8 9
15 19
It means you have 4 NaN clusters, and cluster one ranges from index 1 to index 1, cluster two ranges from 5 to 5, cluster three ranges from 8 to 9 and cluster four ranges from 15 to 19.
If you want only the clusters with at least K NaNs, you could do it like this (for example, with K = 2):
K = 2;
clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
That would give you this:
ans =
8 9
15 19
That is, clusters 8-9 and 15-19 have 2 or more NaNs.
Explanation:
Finding the clusters
isnan(N) gives you a logical vector containing the NaNs as ones:
N --------> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
isnan(N) -> 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1
We want to know where each sequence of ones start, so we use diff, which calculates each value minus the previous one, and gives us this:
aux = diff(isnan(N));
N ----> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
aux --> -1 0 0 1 -1 0 1 0 -1 0 0 0 0 1 0 0 0 0
Here a 1 indicates a group start and a -1 indicates a group end. But it misses the first group start and the last group end: the leading 1 is absent because the first element of N has no previous element, and the trailing -1 is absent because nothing follows the last NaN in N. A common fix is to add a zero before and after the array, which gives us this:
aux = diff([0; isnan(N); 0]);
N ----> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
aux --> 1 -1 0 0 1 -1 0 1 0 -1 0 0 0 0 1 0 0 0 0 -1
Notice two things:
If the diff at index i is 1, N(i) is the start of the NaN block.
If the diff at index i is -1, N(i - 1) is the end of the NaN block.
To get the start and end, we use find to get the indexes where aux == 1 and aux == -1. Hence, we call find twice, and concatenate both calls using [ and ]:
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
Filtering the clusters which have K or more elements
The last step is to find the clusters which have K or more elements. To do that, we take the cluster matrix, subtract the first column from the second, and add 1, like this:
clusters(:,2) - clusters(:,1) + 1
ans =
1
1
2
5
It means clusters 1 and 2 have 1 NaN each, cluster 3 has 2 NaNs and cluster 4 has 5 NaNs. If we ask which values are greater than or equal to K, we get this:
clusters(:,2) - clusters(:,1) + 1 >= K
ans =
0
0
1
1
It's a logical array. We can use that to index only the 1 (true) rows of the cluster matrix, like this:
clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
ans =
8 9
15 19
It's like asking: give us only the clusters where the rows match the ones on this logical vector, and give us all columns (denoted by the :).
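If you need the indices of every NaN inside the retained clusters rather than just the start and end of each one, the cluster table can be expanded into a logical mask. A minimal sketch building on the steps above; the names starts, stops, keep, mask and idx are illustrative.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
K = 2;

aux    = diff([0; isnan(N); 0]);
starts = find(aux == 1);            % first index of each NaN run
stops  = find(aux == -1) - 1;       % last index of each NaN run
keep   = (stops - starts + 1) >= K; % runs with at least K NaNs

mask = false(size(N));
for c = find(keep).'
    mask(starts(c):stops(c)) = true;   % mark every element of a kept run
end
idx = find(mask);                   % indices of all clustered NaNs
```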
Here is a modular solution:
% the number of NaN you consider as a cluster
num = 3;
% moving-sum filter (an unnormalized moving average)
Z = filter(ones(num,1),1,isnan(N));
x = arrayfun(@(x) find(Z == num) - num + x, 1:num,'uni',0)
y = unique(cell2mat(x))
(UPDATE: faster version below)
gives for num = 1:
y = 1 5 8 9 15 16 17 18 19
for num = 2:
y = 8 9 15 16 17 18 19
for num = 3, num = 4 and num = 5:
y = 15 16 17 18 19
and finally for num = 6 ... and more
y = Empty matrix: 1-by-0
Explanation
isnan(N) returns a logical array with ones at the positions of NaN.
Z = filter(ones(num,1),1,isnan(N)); implements a moving-sum filter (an unnormalized moving average) with a window of ones(num,1) = [1; 1; 1] (for num = 3). The window of size 3 glides over the array and reaches the value num = 3 only when there are 3 NaNs in a row.
So it basically looks like:
%// N isnan(N) Z
NaN 1 1
1 0 1
2 0 1
3 0 0
NaN 1 1
5 0 1
6 0 1
NaN 1 1
NaN 1 2
7 0 2
8 0 1
10 0 0
12 0 0
20 0 0
NaN 1 1
NaN 1 2
NaN 1 3
NaN 1 3
NaN 1 3
Now it is easy to find all elements which are 3: find(Z == num). But you also need the element right before each of them, find(Z == num) - num + 2, and the one before that, find(Z == num) - num + 1. Instead of a loop, arrayfun is used, which is basically the same. The result is a matrix of indices, many of them duplicated, but you just need the unique ones. I hope everything is clear now.
Actually, it is much faster to move find out of arrayfun; arrayfun can then be replaced by bsxfun, and cell2mat is no longer needed, which leads to the following form:
Faster:
Z = find( filter(ones(num,1),1,isnan(N)) == num ) - num;
y = unique( bsxfun(@plus, Z,1:num) );
or, as the obligatory fancy one-liner:
y = unique(bsxfun(@plus,find(filter(ones(num,1),1,isnan(N))==num)-num,1:num));
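The moving sum can also be computed with conv instead of filter. The following sketch, which is not part of the answer above, checks that the two agree on the example data; the cast to double and the names isn, Zfilter, Zconv and ends are choices made here for illustration.

```matlab
N   = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
num = 3;
isn = double(isnan(N));            % explicit cast to double

Zfilter = filter(ones(num, 1), 1, isn);
Zconv   = conv(isn, ones(num, 1));
Zconv   = Zconv(1:numel(N));       % drop the tail so the lengths match

agree = isequal(Zfilter, Zconv);   % both are the same moving sum

ends = find(Zconv == num);         % positions where a full window of NaNs ends
y = unique(bsxfun(@plus, ends - num, 1:num));   % all indices in such windows
```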
STRFIND Approach
I. Fancy One Liner:
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]
out = [strfind(num2str(isnan([ 0 N 0]),'%1d'),'011');strfind(num2str(isnan([ 0 N 0]),'%1d'),'110')]'
Output
out =
8 9
15 19
II. Detailed one with explanation:
Basically you are trying to do sliding window checks, for which there is no direct method when working with double arrays, but after converting to strings, one can use strfind. This trick is used here.
I would suggest following the comments used in the code and the output numbers to understand it. Please note that for this particular case a cluster means a group of two or more consecutive NaNs
Code
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]
%%// Set the locations where NaNs are present and then
%%// append at the start and end with zeros
N2 = isnan([ 0 N 0])
%%// Find the start indices of all NaN clusters
start_ind = strfind(num2str(N2,'%1d'),'011')
%%// Find the stop indices of all NaN clusters
stop_ind = [strfind(num2str(N2,'%1d'),'110')]
%%// Put start and stop indices into a Mx2 matrix
out = [start_ind' stop_ind']
Output
N =
NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
N2 =
0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0
start_ind =
8 15
stop_ind =
9 19
out =
8 9
15 19
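The patterns '011' and '110' hard-code a minimum cluster length of two. A possible generalization to a minimum length K builds the patterns with repmat and shifts the stop index accordingly. This is a sketch under that assumption; K, s, startInd and stopInd are illustrative names, and the string is built with char rather than num2str.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN];
K = 3;                                       % minimum cluster length

% Zero-padded NaN indicator as a string of '0' and '1' characters
s = char(double(isnan([0 N 0])) + '0');

% A '0' followed by K ones marks a cluster start
startInd = strfind(s, ['0' repmat('1', 1, K)]);
% K ones followed by a '0' marks a cluster end; shift the match position
% to the index (in N) of the last NaN of the run
stopInd = strfind(s, [repmat('1', 1, K) '0']) + K - 2;

out = [startInd.' stopInd.'];
```

With K = 2 this reduces to the '011' and '110' patterns used above.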
This uses diff, like Rafael Monteiro's answer, but seems to be simpler:
ind = diff([0; isnan(N(:))]);
result = find(ind(1:end-1)==1 & ind(2:end)==0);
In your example, this gives [8 15].
How it works: ind takes the values:
1 where a run of (one or more) NaN values starts;
0 where there is no change between NaN and numeric with respect to the previous value;
-1 where a run of (one or more) numeric values starts.
The second line selects the positions at which a run of NaN starts and such that the next position is also NaN. Thus it gives the start of each run of more than one NaN, as desired.

Summation of matrices

I want to sum together the cells in the same position across several matrices. I have k matrices of size (i,j) stored in MATLAB as an (i,j,k) array, and I want to create one matrix which is the sum of all of them. However, the MATLAB sum command by default adds up the values in each column, whereas I want to add the cells in the same position from each matrix.
1 3 4     3 4 0     2 4 4
0 3 1     2 7 8     0 3 1
9 0 2     0 1 2     1 2 3
I want to create a matrix that is:
1+3+2 3+4+4 4+0+4
0+2+0 3+7+3 1+8+1
9+0+1 0+1+2 2+2+3
=
6 11 8
2 13 10
10 3 7
Use a second input to sum specifying the dimension along which to sum (in your case, 3):
>> A(:,:,1) = [ 1 3 4
0 3 1
9 0 2 ];
>> A(:,:,2) = [ 3 4 0
2 7 8
0 1 2 ];
>> A(:,:,3) = [ 2 4 4
0 3 1
1 2 3 ];
>> sum(A,3)
ans =
6 11 8
2 13 10
10 3 7
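To see why the dimension argument matters, the following sketch builds the same array with cat and compares the default behaviour with the sum along the third dimension. The names S and defaultSize are illustrative.

```matlab
% Stack the three example matrices along the third dimension
A = cat(3, [1 3 4; 0 3 1; 9 0 2], ...
           [3 4 0; 2 7 8; 0 1 2], ...
           [2 4 4; 0 3 1; 1 2 3]);

S = sum(A, 3);                % element-wise sum across the pages: 3-by-3
defaultSize = size(sum(A));   % default sums down the columns of each page: 1-by-3-by-3
```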

How can you remove matrix rows in Matlab based on some criteria?

In Matlab, how can I remove specific rows from a matrix? For example, how would I remove all rows from a matrix which contain a specific value (like 0 or NaN)?
Let's say you have A
A = [1 2 3;4 5 0; 7 8 9; 10 NaN 12]
A =
1 2 3
4 5 0
7 8 9
10 NaN 12
Then, you can choose the rows as follows:
any(isnan(A'))
ans =
0 0 0 1
To delete those NaN-containing rows, you can do:
A(any(isnan(A')),:) = []
A =
1 2 3
4 5 0
7 8 9
You can choose 0-containing rows with any(A' == 0). If you want to select only the rows whose elements are all 0s or all NaNs, use all instead of any.
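Both conditions can be combined in one step, and the transpose can be avoided by passing the dimension to any. A minimal sketch using the same example matrix; the name bad is illustrative.

```matlab
A = [1 2 3; 4 5 0; 7 8 9; 10 NaN 12];

% Flag rows that contain at least one NaN or one zero (any along dimension 2)
bad = any(isnan(A) | A == 0, 2);

A(bad, :) = [];   % delete the flagged rows
```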