MATLAB: combine rows with similar values

I am new to MATLAB and I am trying to combine rows that share the same value in the first column (I have thousands of rows). For example, I want to go from
1 NaN
1 NaN
1 NaN
2 9
2 26.5
2 21.5
2 18
2 24.5
2 12
2 22.5
3 NaN
3 NaN
3 NaN
3 NaN
4 18.5
4 22
4 35.5
...
...
...
to
1 NaN NaN NaN
2 9 26.5 21.5 18 24.5 12 22.5
3 NaN NaN NaN NaN
4 18.5 22 35.5
Can anyone please help me with this?

This can't be done with a regular numeric array: every row must have the same number of columns, and your desired output does not. You can use a cell array instead if you wish.
If cell arrays are an option, the best way to tackle this, IMHO, is an accumarray/sort/cellfun pipeline. First, use accumarray to group together all values that share the same ID (the first column in your case), so that each group becomes one cell. However, accumarray does not guarantee the order in which the values of a group are passed to the grouping function, so the values within a group may come out unordered. Therefore, group the row indices of the values instead of the values themselves, and sort the indices within each group. The output is a cell array in which each cell holds a sorted list of indices into the original data.
Finally, call cellfun to use these indices to access the actual data.
Something like this comes to mind, assuming your data is stored in X as a two-column array (accumarray requires the first column to contain positive integers):
ind = (1 : size(X,1)).'; % row index of every value
out_ind = accumarray(X(:,1), ind, [], @(x) {sort(x)}); % sorted row indices per ID
out = cellfun(@(x) X(x,2), out_ind, 'UniformOutput', false); % look up the values
We thus get:
>> celldisp(out)
out{1} =
NaN
NaN
NaN
out{2} =
9.0000
26.5000
21.5000
18.0000
24.5000
12.0000
22.5000
out{3} =
NaN
NaN
NaN
NaN
out{4} =
18.5000
22.0000
35.5000
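If the rows are already sorted by ID, as in the example, a shorter route may be to count the rows per ID and cut the second column into consecutive blocks with mat2cell. This is a swapped-in technique, not the accumarray pipeline above, and it is only a sketch under the stated assumption (sorted, positive-integer IDs that run from 1 to the largest ID):

```matlab
% Sample data from the question (first column = ID, second column = value)
X = [1 NaN; 1 NaN; 1 NaN;
     2 9; 2 26.5; 2 21.5; 2 18; 2 24.5; 2 12; 2 22.5;
     3 NaN; 3 NaN; 3 NaN; 3 NaN;
     4 18.5; 4 22; 4 35.5];

% Assumption: X is sorted by its first column, so each ID forms one
% contiguous block of rows and the original order inside a block is kept.
counts = accumarray(X(:,1), 1);      % number of rows per ID
out = mat2cell(X(:,2), counts, 1);   % one cell per ID, values in original order
```

If the rows are not sorted, `[~, p] = sort(X(:,1)); X = X(p, :);` should bring them into that form first, since sort keeps tied elements in their original order.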

Related

Matlab: Plot non-equal matrices in a cell array without a loop

Knowing that:
There is a lot of discussion about plotting equal-sized matrices in a cell array, and it is quite easy to do without a loop.
For example, to plot the 2-by-2 matrices in mycell:
mycell = {[1 1; 2 1], [1 1; 3 1], [1 1; 4 1]};
We can use cellfun to add a row of NaN at the bottom of each matrix and then convert the cell to a matrix:
mycellnaned = cellfun(@(x) {[x;nan(1,2)]}, mycell);
mymat = cell2mat(mycellnaned);
mymat looks like:
1 1 1 1 1 1
2 1 3 1 4 1
NaN NaN NaN NaN NaN NaN
Then we can plot it easily:
mymatx = mymat(:,1:2:end);
mymaty = mymat(:,2:2:end);
figure;
plot(mymatx, mymaty,'+-');
The problem:
The problem is: how do I do something similar with a cell containing matrices of different sizes? For example:
mycell = {
[1:2; ones(1,2)]';
[1:4; ones(1,4)*2]';
[1:6; ones(1,6)*3]';
[1:8; ones(1,8)*4]';
[1:10; ones(1,10)*5]';
[1:12; ones(1,12)*6]';
};
mycell = repmat(mycell,1000,1);
I can't convert these into one matrix as before. I could use a loop, as suggested in this answer, but that would be very inefficient when the cell contains thousands of matrices.
Therefore, I'm looking for a more efficient way of plotting non-equal sized matrices in a cell array.
Note that different colours should be used for different matrices in the figure.
Well, while I was writing the question, I figured it out...
I'd like to keep the question open since there might be better solutions.
For everyone else's reference, the solution is simple: pad the matrices with NaN so that they are all the same size:
% find out the maximum length of all matrices in the array
cellLengthMax = max(cellfun('length', mycell));
% fill the matrices so they are equal in size.
mycellfilled = cellfun(@(x) {[
x
nan(cellLengthMax-size(x,1), 2)
nan(1, 2)
]}, mycell);
Then convert to a matrix and plot:
mymat = cell2mat(mycellfilled');
mymatx = mymat(:,1:2:end);
mymaty = mymat(:,2:2:end);
figure;
plot(mymatx, mymaty,'+-');
mymat looks like:
1 1 1 2 1 3 1 4 1 5 1 6
2 1 2 2 2 3 2 4 2 5 2 6
NaN NaN 3 2 3 3 3 4 3 5 3 6
NaN NaN 4 2 4 3 4 4 4 5 4 6
NaN NaN NaN NaN 5 3 5 4 5 5 5 6
NaN NaN NaN NaN 6 3 6 4 6 5 6 6
NaN NaN NaN NaN NaN NaN 7 4 7 5 7 6
NaN NaN NaN NaN NaN NaN 8 4 8 5 8 6
NaN NaN NaN NaN NaN NaN NaN NaN 9 5 9 6
NaN NaN NaN NaN NaN NaN NaN NaN 10 5 10 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Update:
Time cost for plotting 6000 matrices:
using the solution proposed here: 1.183546 seconds.
using a loop: 3.450423 seconds.
Still not very satisfactory. I would really like to reduce the time to 0.1 seconds, because I'm trying to design an interactive UI where the user can change a few parameters and the result gets plotted instantly.
I don't want to reduce the resolution of the figure.
Update:
I ran the profiler, and it seems that 99% of the time is spent in the call plot(mymatx, mymaty,'+-'). So the conclusion is that there is probably no other way to speed this up.
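If a single colour for all matrices were acceptable (the question rules this out), another option is to stack every matrix into one pair of long x/y vectors separated by NaN rows and call plot once. This is a sketch, assuming each cell holds an N-by-2 [x y] matrix as in mycell. It creates one line object instead of thousands, which may render considerably faster, but it cannot give each matrix its own colour.

```matlab
% Example input with matrices of different lengths (each row is [x y])
mycell = {[1:2; ones(1,2)]'; [1:4; ones(1,4)*2]'; [1:6; ones(1,6)*3]'};

% Append one NaN row to each matrix so plot() breaks the line between them
padded = cellfun(@(m) [m; NaN(1, 2)], mycell(:), 'UniformOutput', false);
xy = vertcat(padded{:});   % all matrices stacked into one tall array

x = xy(:, 1);
y = xy(:, 2);
% figure; plot(x, y, '+-');   % one line object, drawn in one colour
```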

Row selection with a condition - Matlab

I want to select the rows of the following table (NEW) for which the first column of the Edge variable (i.e. NEW.Edge(i,1)) is equal to a certain number N and the second column of Edge (i.e. NEW.Edge(i,2)) is not equal to any of the values in the array IDs. For example, if N=2 and IDs=[15,20], I should get rows 1 to 9.
The code that I have tried so far is:
Y =[];
for i = 1:size(NEW,1)
if ((NEW.Edge(i,1)==N) & (sum(ismember(IDs,NEW.Edge(i,2))==0)))
Y = NEW.Edge(i,1)&NEW.Edge(i,2);
end
end
And
Y = ((NEW.Edge(i,1)==N) & (sum(ismember(IDs,NEW.Edge(i,2))==0)))
Lines = NEW (Y,:)
Table NEW:
Event Node Edge
_________ ____ ________
edgetonew NaN 2 6
edgetonew NaN 2 7
edgetonew NaN 2 8
edgetonew NaN 2 9
edgetonew NaN 2 10
edgetonew NaN 2 11
edgetonew NaN 2 12
edgetonew NaN 2 13
edgetonew NaN 2 14
edgetonew NaN 2 15
edgetonew NaN 15 16
edgetonew NaN 15 17
edgetonew NaN 15 18
edgetonew NaN 15 19
edgetonew NaN 15 20
edgetonew NaN 20 21
You can use ismember and logical indexing to accomplish this. ismember returns a logical array the same size as its first input, which is true wherever the corresponding element of the first input appears anywhere in the second input.
% Will be TRUE when the first column == N and the second column isn't in IDs
rows_to_keep = NEW.Edge(:,1) == N & ~ismember(NEW.Edge(:,2), IDs);
% Now use this logical array to grab just the rows that satisfy the condition
out = NEW(rows_to_keep,:);
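To check this against the example, the sketch below rebuilds the table NEW from the listing in the question (the construction with table and repmat is an assumption about how the data was created) and applies the same condition with N = 2 and IDs = [15 20]:

```matlab
% Rebuild the example table from the question
Edge = [2*ones(10,1), (6:15).';
        15*ones(5,1), (16:20).';
        20, 21];
NEW = table(repmat({'edgetonew'}, 16, 1), NaN(16, 1), Edge, ...
            'VariableNames', {'Event', 'Node', 'Edge'});

N = 2;
IDs = [15 20];

% TRUE where the first Edge column equals N and the second is not in IDs
rows_to_keep = NEW.Edge(:,1) == N & ~ismember(NEW.Edge(:,2), IDs);
out = NEW(rows_to_keep, :);   % rows 1 to 9 of NEW
```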

Replace non-NaN values with their row indices within matrix

I have the 4x2 matrix A:
A = [2 NaN 5 8; 14 NaN 23 NaN]';
I want to replace the non-NaN values with their row indices within each column of A. The output should look like this:
out = [1 NaN 3 4; 1 NaN 3 NaN]';
I know how to do it for each column manually, but I would like an automatic solution, as I have much larger matrices to handle. Does anyone have an idea?
out = bsxfun(@times, A-A+1, (1:size(A,1)).');
How it works:
A-A+1 replaces actual numbers in A by 1, and keeps NaN as NaN
(1:size(A,1)).' is a column vector of row indices
bsxfun(@times, ...) multiplies both of the above with singleton expansion.
As pointed out by @thewaywewalk, from MATLAB R2016b onwards bsxfun(@times, ...) can be replaced by .*, because singleton expansion (implicit expansion) is enabled by default:
out = (A-A+1) .* (1:size(A,1)).';
An alternative suggested by @Dev-Il is
out = bsxfun(@plus, A*0, (1:size(A,1)).');
This works because multiplying by 0 replaces actual numbers by 0, and keeps NaN as is.
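A quick way to convince yourself that these variants agree is to run them side by side on the example A. The sketch uses isequaln for the comparison, because plain isequal treats NaN as unequal to NaN; the implicit-expansion line assumes R2016b or later.

```matlab
A = [2 NaN 5 8; 14 NaN 23 NaN]';
rowIdx = (1:size(A,1)).';                % column vector of row indices

out1 = bsxfun(@times, A-A+1, rowIdx);    % (1 or NaN) times row index
out2 = (A-A+1) .* rowIdx;                % same, via implicit expansion
out3 = bsxfun(@plus, A*0, rowIdx);       % (0 or NaN) plus row index

expected = [1 NaN 3 4; 1 NaN 3 NaN]';
```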
Applying ind2sub to a mask created with isnan will do.
mask = find(~isnan(A));
[rows,~] = ind2sub(size(A),mask)
A(mask) = rows;
Note that the second output of ind2sub must also be requested (and discarded with ~), i.e. [rows,~], so that ind2sub treats A as a 2-D matrix; with a single output it would return the linear indices unchanged.
A =
1 1
NaN NaN
3 3
4 NaN
A.' =
1 NaN 3 4
1 NaN 3 NaN
Also be careful with the two different transpose operators: ' (complex conjugate transpose) and .' (non-conjugate transpose).
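The remark about requesting two outputs from ind2sub can be checked directly: with a single output, ind2sub collapses all dimensions into one and returns the linear indices it was given. A small sketch on the same 4-by-2 A:

```matlab
A = [2 NaN 5 8; 14 NaN 23 NaN].';
idx = find(~isnan(A));               % linear indices of the non-NaN entries

oneOut = ind2sub(size(A), idx);      % single output: the linear indices again
[rows, ~] = ind2sub(size(A), idx);   % two outputs: the row subscripts

B = A;
B(idx) = rows;                       % replace each value by its row index
```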
Alternative
[n,m] = size(A);
B = ndgrid(1:n,1:m);
B(isnan(A)) = NaN;
or even (with a little inspiration from Luis Mendo)
[n,m] = size(A);
B = A-A + ndgrid(1:n,1:m)
or in one line
B = A-A + ndgrid(1:size(A,1),1:size(A,2))
This can be done using repmat and isnan as follows:
A = [ 2 NaN 5 8;
14 NaN 23 NaN];
out=repmat([1:size(A,2)],size(A,1),1); % out contains indexes of all the values
out(isnan(A))= NaN % Replacing the indexes where NaN exists with NaN
Output:
1 NaN 3 4
1 NaN 3 NaN
You can take the transpose if you want.
I'm adding another answer for a couple of reasons:
Because overkill (*ahem* kron *ahem*) is fun.
To demonstrate that A*0 does the same as A-A.
A = [2 NaN 5 8; 14 NaN 23 NaN].';
out = A*0 + kron((1:size(A,1)).', ones(1,size(A,2)))
out =
1 1
NaN NaN
3 3
4 NaN

Replace NaN sequence according to values before and after the sequence

I would appreciate it if someone could help me with this problem.
I have a vector
A = [NaN 1 1 1 1 NaN NaN NaN NaN NaN 2 2 2 NaN NaN NaN 2 NaN NaN 3 NaN NaN];
I would like to fill the NaN values according to this logic.
1) if the value that precedes the sequence of NaN is different from the one that follows the sequence => assign half of the NaNs to the first value and half to the second value
2) if the NaN sequence is between two equal values => fill the NaNs with that value.
A should then be:
A = [1 1 1 1 1 1 1 (1) 2 2 2 2 2 2 2 2 2 2 3 3 3]
I have put one 1 in brackets because the NaN sequence has an odd length, and I assigned that middle value to the first half.
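I am typing this on my phone, without MATLAB, so there may be some issues. But this should be close. Nearest-neighbour interpolation covers both rules: a gap between two equal values is filled with that value, and a gap between different values is split roughly in half: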
t = 1:numel(A);
Anew = interp1(t(~isnan(A)),A(~isnan(A)),t,'nearest','extrap');
If you have the Image Processing Toolbox, you can use bwdist to calculate the index of the nearest non-NaN neighbour:
nanMask = isnan(A);
[~,idx] = bwdist(~nanMask);
A(nanMask) = A(idx(nanMask));
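Both answers pick the nearest non-NaN neighbour, so which value the middle element of an odd-length run receives depends on how ties are broken. If you need the exact rule from the question (the extra element goes to the preceding value), an explicit pass over the NaN runs is one option. The sketch below is a hypothetical helper, fillNanRuns, not a MATLAB function; it also assumes that leading and trailing NaN runs should copy the nearest available value, which the question does not state explicitly.

```matlab
function B = fillNanRuns(A)
% FILLNANRUNS  Hypothetical helper (not part of MATLAB); save as fillNanRuns.m.
% Fills each run of NaNs in the vector A:
%   - between two values: the first ceil(len/2) NaNs take the preceding
%     value and the rest take the following value, so a run between two
%     equal values is filled entirely with that value;
%   - at the start or end (assumption): copy the nearest non-NaN value.
B = A;
n = numel(B);
isn = isnan(B);
k = 1;
while k <= n
    if ~isn(k)
        k = k + 1;
        continue
    end
    s = k;                          % first NaN of the run
    while k <= n && isn(k)
        k = k + 1;
    end
    e = k - 1;                      % last NaN of the run
    if s > 1 && e < n
        h = ceil((e - s + 1) / 2);  % odd runs give the extra NaN to the left value
        B(s:s+h-1) = B(s-1);
        B(s+h:e) = B(e+1);
    elseif s > 1
        B(s:e) = B(s-1);            % trailing run
    elseif e < n
        B(s:e) = B(e+1);            % leading run
    end                             % an all-NaN input is left unchanged
end
end
```

With the question's A, fillNanRuns(A) returns eight 1s, ten 2s and four 3s.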

Formatting matrix data for ANOVAN in MATLAB

I have a matrix of data values that looks a bit like this, though significantly larger (2000+ rows, 30+ columns):
NaN 12 3 NaN 18 NaN 42 NaN NaN NaN NaN...
68 NaN 14 NaN NaN NaN NaN NaN NaN NaN 26 ...
...
As you can see, it is largely populated by NaN values. What I am interested in, naturally, are the cells that contain values.
I want to be able to run anovan on this data set, and unfortunately it is too large to reformat by hand. What I want to do is have a script run through the matrix, find every value that is not NaN and its index in the matrix, and create three arrays for the anovan input:
Values=[ 12 3 18 42 68 14 26 ...]
Rows= [ 1 1 1 1 2 2 2 ...]
Columns= [ 2 3 5 7 1 3 11 ...]
The rows and columns correspond to raters and ratees in a study, which is why it is so important for me to preserve the exact index of each value.
I cannot figure out how to do this, though.
I have tried using find, but can't get it to do what I want.
[r c v] = find(~isnan(datamatrix)) %% doesn't work
EDIT: It occurs to me I could just do:
[r c v] = find(datamatrix)
This will include all of the NaN values in the [r c v] output, though. In that situation, how would I go through the v array and delete the NaN values AND their corresponding r and c values?
EDIT2: Scratch that. I forgot that some of my values are 0, so I can't use the FIND command.
You can extract all the non-NaN numbers and their indices from data matrix like this:
i = find(~isnan(datamatrix));
values = datamatrix(i);
[rows,columns] = ind2sub(size(datamatrix),i);
For the example data you included, this results in:
rows' = [2 1 1 2 1 1 2]
columns' = [ 1 2 3 3 5 7 11]
values' = [68 12 3 14 18 42 26]
That's all the indices and all their corresponding values. If you need them ordered in a particular way, you'll have to do that separately.
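The original attempt [r c v] = find(~isnan(datamatrix)) was close: its row and column outputs are already right, but v holds the entries of the logical mask (all true), not the data. Indexing the data with the same mask returns the values in the same column-major order, and zeros are kept because only NaN is excluded. A sketch on the two example rows, with sortrows added to get the rater-by-rater order listed in the question:

```matlab
datamatrix = [NaN 12 3 NaN 18 NaN 42 NaN NaN NaN NaN;
              68 NaN 14 NaN NaN NaN NaN NaN NaN NaN 26];

mask = ~isnan(datamatrix);   % true for every value, including zeros
[r, c] = find(mask);         % subscripts in column-major order
v = datamatrix(mask);        % values in that same order

% Optional: order by rater (row), then by ratee (column)
sorted = sortrows([r c v]);
Rows = sorted(:,1);
Columns = sorted(:,2);
Values = sorted(:,3);
```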