Formatting matrix data for ANOVAN in MATLAB

I have a matrix of data values that looks a bit like this, though significantly larger (2000+ rows, 30+ columns):
NaN 12 3 NaN 18 NaN 42 NaN NaN NaN NaN...
68 NaN 14 NaN NaN NaN NaN NaN NaN NaN 26 ...
...
As you can see, it is largely populated by NaN values. What I am interested in, naturally, are the cells that contain values.
I want to be able to run anovan on this data set, and unfortunately it is too large to reformat by hand. What I want to do is have a script run through the matrix, find every value that is not NaN and its index in the matrix, and create three arrays for the anovan input:
Values=[ 12 3 18 42 68 14 26 ...]
Rows= [ 1 1 1 1 2 2 2 ...]
Columns= [ 2 3 5 7 1 3 11 ...]
The rows and columns correspond to raters and ratees in a study, which is why it is so important for me to preserve the exact index of each value.
I cannot figure out how to do this, though.
I have tried using find, but can't get it to do what I want.
[r c v] = find(~isnan(datamatrix)) %% doesn't work
EDIT: It occurs to me I could just do:
[r c v] = find(datamatrix)
This will include all of the NaN values in the [r c v] output, though. In that situation, how would I go through the V array and delete the NaN values AND their corresponding R and C values?
EDIT2: Scratch that. I forgot that some of my values are 0, so I can't use the FIND command.

You can extract all the non-NaN numbers and their indices from the data matrix like this:
i = find(~isnan(datamatrix));
values = datamatrix(i);
[rows,columns] = ind2sub(size(datamatrix),i);
For the example data you included, this results in:
rows' = [2 1 1 2 1 1 2]
columns' = [ 1 2 3 3 5 7 11]
values' = [68 12 3 14 18 42 26]
That's all the indices and all their corresponding values. find returns them in column-major order; if you need a different order (for example, row by row), you'll have to sort them separately.
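As a sketch of the reordering step, the subscripts can be sorted by row and then by column with sortrows, which reproduces the layout requested in the question. The sample matrix below is an assumption built from the two rows shown in the question, and the variable name order is made up for illustration.

```matlab
% Hypothetical sample data, mirroring the two rows shown in the question
datamatrix = [NaN  12   3 NaN  18 NaN  42 NaN NaN NaN NaN;
               68 NaN  14 NaN NaN NaN NaN NaN NaN NaN  26];

% find on the logical mask returns row/column subscripts directly,
% in column-major order
mask = ~isnan(datamatrix);
[rows, columns] = find(mask);
values = datamatrix(mask);   % logical indexing uses the same column-major order

% Sort by row first, then by column, to get a row-by-row ordering
[~, order] = sortrows([rows, columns]);
rows    = rows(order).';
columns = columns(order).';
values  = values(order).';
```

If the Statistics and Machine Learning Toolbox is available, these vectors could then be passed on as anovan(values, {rows, columns}); that call is not run here.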

Related

Replace non-NaN values with their row indices within matrix

I have the 4x2 matrix A:
A = [2 NaN 5 8; 14 NaN 23 NaN]';
I want to replace the non-NaN values with their associated indices within each column in A. The output looks like this:
out = [1 NaN 3 4; 1 NaN 3 NaN]';
I know how to do it for each column manually, but I would like an automatic solution, as I have much larger matrices to handle. Does anyone have any idea?
out = bsxfun(@times, A-A+1, (1:size(A,1)).');
How it works:
A-A+1 replaces actual numbers in A by 1, and keeps NaN as NaN
(1:size(A,1)).' is a column vector of row indices
bsxfun(@times, ...) multiplies both of the above with singleton expansion.
As pointed out by @thewaywewalk, from MATLAB R2016b onwards bsxfun(@times, ...) can be replaced by .*, as singleton expansion is enabled by default:
out = (A-A+1) .* (1:size(A,1)).';
An alternative suggested by @Dev-Il is
out = bsxfun(@plus, A*0, (1:size(A,1)).');
This works because multiplying by 0 replaces actual numbers by 0, and keeps NaN as is.
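A minimal sketch that runs both bsxfun variants on the example matrix and compares them with the expected result. The names rowIdx, out1, out2, and expected are made up for illustration. isequaln is used because isequal treats NaN as unequal to itself.

```matlab
A = [2 NaN 5 8; 14 NaN 23 NaN].';   % 4x2 example from the question
rowIdx = (1:size(A,1)).';           % column vector of row indices

out1 = bsxfun(@times, A-A+1, rowIdx);   % numbers become 1, NaN stays NaN
out2 = bsxfun(@plus,  A*0,   rowIdx);   % numbers become 0, NaN stays NaN

expected = [1 NaN 3 4; 1 NaN 3 NaN].';
sameResult = isequaln(out1, out2) && isequaln(out1, expected);
```

One caveat: if A contained Inf, both A-A and A*0 would produce NaN at those positions, so infinite entries would be treated like NaN.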
Applying ind2sub to a mask created with isnan will do.
mask = find(~isnan(A));
[rows,~] = ind2sub(size(A),mask)
A(mask) = rows;
Note that the second output of ind2sub must be requested as well (and discarded with ~), as in [rows,~], to indicate that you want subscripts for a 2D matrix.
A =
1 1
NaN NaN
3 3
4 NaN
A.' =
1 NaN 3 4
1 NaN 3 NaN
Also be careful with the two different transpose operators ' and .'.
Alternative
[n,m] = size(A);
B = ndgrid(1:n,1:m);
B(isnan(A)) = NaN;
or even (with a little inspiration by Luis Mendo)
[n,m] = size(A);
B = A-A + ndgrid(1:n,1:m)
or in one line
B = A-A + ndgrid(1:size(A,1),1:size(A,2))
This can be done using repmat and isnan as follows:
A = [ 2 NaN 5 8;
14 NaN 23 NaN];
out=repmat([1:size(A,2)],size(A,1),1); % out contains indexes of all the values
out(isnan(A))= NaN % Replacing the indexes where NaN exists with NaN
Output:
1 NaN 3 4
1 NaN 3 NaN
You can take the transpose if you want.
I'm adding another answer for a couple of reasons:
Because overkill (*ahem* kron *ahem*) is fun.
To demonstrate that A*0 does the same as A-A.
A = [2 NaN 5 8; 14 NaN 23 NaN].';
out = A*0 + kron((1:size(A,1)).', ones(1,size(A,2)))
out =
1 1
NaN NaN
3 3
4 NaN

MATLAB: combine rows with similar values

I am new to MATLAB and I am trying to combine rows with similar values (I have thousands of rows), for example
1 NaN
1 NaN
1 NaN
2 9
2 26.5
2 21.5
2 18
2 24.5
2 12
2 22.5
3 NaN
3 NaN
3 NaN
3 NaN
4 18.5
4 22
4 35.5
...
...
...
to
1 NaN NaN NaN
2 9 26.5 21.5 18 24.5 12 22.5
3 NaN NaN NaN NaN
4 18.5 22 35.5
Can anyone please help me with this?
This can't be done with normal arrays. Every row must have the same number of columns, and your desired output does not. You can work with cell arrays if you wish.
If cell arrays are an option, the best way to tackle this IMHO is an accumarray/sort/cellfun pipeline. First, use accumarray to group everything that belongs to the same ID (the first column in your case), so that each group becomes one cell. However, accumarray does not guarantee the order in which values arrive within a group. Therefore, group the row positions of the values rather than the values themselves, and sort those positions. The result is a cell array in which each cell is a list of row indices into the original data.
Then call cellfun as the last step to use those indices to access the actual data.
Something like this comes to mind, assuming your data is stored in X and it's a two-column array.
ind = (1 : size(X,1)).';
out_ind = accumarray(X(:,1), ind, [], @(x) {sort(x)});
out = cellfun(@(x) X(x,2), out_ind, 'uni', 0);
We thus get:
>> celldisp(out)
out{1} =
NaN
NaN
NaN
out{2} =
9.0000
26.5000
21.5000
18.0000
24.5000
12.0000
22.5000
out{3} =
NaN
NaN
NaN
NaN
out{4} =
18.5000
22.0000
35.5000
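The pipeline above uses the first column directly as accumarray subscripts, which requires the IDs to be positive integers. As a hedged variation, the third output of unique can map arbitrary IDs to group numbers 1..k first. The names ids and grp are made up for illustration, and transposing each group into a row is only a choice made here to match the layout in the question.

```matlab
% Sample data from the question: column 1 is the ID, column 2 the value
X = [1 NaN; 1 NaN; 1 NaN; 2 9; 2 26.5; 2 21.5; 2 18; 2 24.5; 2 12; 2 22.5; ...
     3 NaN; 3 NaN; 3 NaN; 3 NaN; 4 18.5; 4 22; 4 35.5];

% Map each ID to a group number 1..k (ids holds the sorted unique IDs)
[ids, ~, grp] = unique(X(:,1));

% Group the row positions, sorted so the original order is kept
ind = (1:size(X,1)).';
out_ind = accumarray(grp, ind, [], @(x) {sort(x)});

% Pull out the values for each group as a row vector
out = cellfun(@(x) X(x,2).', out_ind, 'UniformOutput', false);
```

Here out{k} holds the values belonging to ids(k), so the ID does not need to be stored in the cell itself.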

Replace NaN sequence according to values before and after the sequence

I would appreciate if someone can help me with this problem...
I have a vector
A = [NaN 1 1 1 1 NaN NaN NaN NaN NaN 2 2 2 NaN NaN NaN 2 NaN NaN 3 NaN NaN];
I would like to fill the NaN values according to this logic.
1) if the value that precedes the sequence of NaN is different from the one that follows the sequence => assign half of the NaNs to the first value and half to the second value
2) if the NaN sequence is between 2 equal values => fill the NaN with that value.
A should be then:
A = [1 1 1 1 1 1 1 (1) 2 2 2 2 2 2 2 2 2 2 3 3 3]
I have put one 1 within brackets because I assigned that value to the first half, since that sequence of NaNs has odd length.
I am typing this on my phone, without MATLAB, so there may be some issues. But this should be close:
t = 1:numel(A);
Anew = interp1(t(~isnan(A)),A(~isnan(A)),t,'nearest','extrap');
If you have the image processing toolbox, you can use bwdist to calculate the index of the nearest non-NaN-neighbor:
nanMask = isnan(A);
[~,idx] = bwdist(~nanMask);
A(nanMask) = A(idx(nanMask));
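For comparison, the rule from the question can also be written out explicitly, which makes the handling of odd-length runs unambiguous. This is only a sketch under stated assumptions: A is a row vector with at least one non-NaN entry, leading and trailing NaN runs take the nearest available value, and for an odd-length run the extra element goes to the preceding value. The names filled, runStart, runEnd, and nLeft are made up for illustration.

```matlab
A = [NaN 1 1 1 1 NaN NaN NaN NaN NaN 2 2 2 NaN NaN NaN 2 NaN NaN 3 NaN NaN];
filled = A;

% Locate the first and last index of every run of NaNs
d = diff([false isnan(A) false]);
runStart = find(d == 1);
runEnd   = find(d == -1) - 1;

for k = 1:numel(runStart)
    s = runStart(k);
    e = runEnd(k);
    if s == 1
        filled(s:e) = A(e+1);          % leading run: copy the value that follows
    elseif e == numel(A)
        filled(s:e) = A(s-1);          % trailing run: copy the value that precedes
    else
        nLeft = ceil((e - s + 1) / 2); % odd runs give the extra element to the left
        filled(s:s+nLeft-1) = A(s-1);
        filled(s+nLeft:e)   = A(e+1);
    end
end
```

When the values on both sides of a run are equal, the two assignments write the same value, so rule 2 needs no separate branch.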

How can I use values within a MATLAB matrix as indices to determine the location of data in a new matrix?

I have a matrix that looks like the following.
I want to take the column 3 values and put them in another matrix, according to the following rule.
The value in Column 5 is the row index for the new matrix, and Column 6 is the column index. Therefore 20 (taken from 29,3) should be in Row 1 Column 57 of the new matrix, 30 (from 30,3) should be in Row 1 Column 4 of the new matrix, and so on.
If the value in column 3 is NaN then I want NaN to be copied over to the new matrix.
Example:
% matrix of values and row/column subscripts
A = [
20 1 57
30 1 4
25 1 16
nan 1 26
nan 1 28
25 1 36
nan 1 53
50 1 56
nan 2 1
nan 2 2
nan 2 3
80 2 5
];
% fill new matrix
B = zeros(5,60);
idx = sub2ind(size(B), A(:,2), A(:,3));
B(idx) = A(:,1);
There are a couple other ways to do this, but I think the above code is easy to understand. It is using linear indexing.
Assuming you don't have duplicate subscripts, you could also use:
B = full(sparse(A(:,2), A(:,3), A(:,1), m, n));
(where m and n are the output matrix size)
Another one:
B = accumarray(A(:,[2 3]), A(:,1), [m,n]);
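A small sketch that runs the three variants side by side on a shortened version of the example and checks that they agree. The output size m-by-n and the names B1, B2, B3 are assumptions chosen for illustration, and the subscripts are assumed to be unique.

```matlab
% Shortened example: [value, row subscript, column subscript]
A = [ 20 1 57
      30 1  4
     NaN 1 26
      80 2  5];
m = 2;  n = 60;   % assumed size of the output matrix

% Variant 1: linear indexing via sub2ind
B1 = zeros(m, n);
B1(sub2ind([m n], A(:,2), A(:,3))) = A(:,1);

% Variant 2: build a sparse matrix and convert it to full
B2 = full(sparse(A(:,2), A(:,3), A(:,1), m, n));

% Variant 3: accumarray with [row column] subscript pairs
B3 = accumarray(A(:,[2 3]), A(:,1), [m n]);

allAgree = isequaln(B1, B2) && isequaln(B1, B3);
```

With duplicate subscripts the variants would differ: sparse and accumarray sum the values, while the indexed assignment keeps only one of them. Positions that are never assigned are 0 in all three; starting from NaN(m,n) in the first variant would mark them as NaN instead.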
I am not sure if I understood your question clearly but this might help:
(Assuming your main matrix is A)
nRows = max(A(:,5));
nColumns = max(A(:,6));
FinalMatrix = zeros(nRows,nColumns);
for i=1:size(A,1)
FinalMatrix(A(i,5),A(i,6))=A(i,3);
end
Note that above code sets the rest of the elements equal to zero.

How to get rid of NaN while moving cells up column wise in matlab

I have a relatively big matrix (800'000 x 1'000) which contains NaNs at the end of some columns, and I need to get rid of them while moving the remaining cells up. When I remove a NaN, the next cell should move up. I have not managed to move the values of the next column into the cells after the non-NaN values of the column right before it. It is important that the number of rows remains the same as in the initial matrix (which I fix), but the number of columns will obviously change.
Here is an example on a smaller matrix A1 4x5:
A1 = [
1 5 8 9 11
2 6 NaN 10 12
3 7 NaN NaN 13
4 NaN NaN NaN NaN ]
I need A1 to become:
A2 = [
1 5 9 13
2 6 10 NaN
3 7 11 NaN
4 8 12 NaN ]
In this example A1(1,3)=8 moved to A2(4,2)=8, A1(1,4)=9 moved to A2(1,3)=9, A1(2,4)=10 moved to A2(2,3)=10 and so on. The number of rows is still 4 but the number of columns becomes 4. The NaN cells in the last column are needed to avoid a 'matrix dimension mismatch' error, but I do not need them, so afterwards (or at the same time) I should get rid of the last column of the matrix, which may still contain NaNs. Finally, my matrix should become:
A3 = [
1 5 9
2 6 10
3 7 11
4 8 12 ]
I tried to use A1(~isnan(A1)) but this command put the values in a single column vector, while I need to still have a matrix of predetermined number of rows or at least a cell array which contains each column of the matrix A3 in each cell array.
Is there a way to go from A1 to A3?
What you'd have to do is to first filter out the NaN's, then reshape the remaining data. Try this:
reshape(A1(isfinite(A1)),4,[])
You might need to tweak this a bit, but I think it'll do what you want in a single step.
I'm not sure if reshape will work when the number of remaining values is not a multiple of 4, however, so you might need something like this:
A2=A1(isfinite(A1))
A3=reshape(A2(1:(4*floor(length(A2)/4))),4,[])
Here's a very straightforward approach - there's almost certainly a more efficient way of doing it, given the amount of copying going on here.
vals = A1(~isnan(A1));
A2 = NaN(size(A1));
A2(1:length(vals)) = vals;
A3 = A2(:,~any(isnan(A2)));
n = size(A1,1); %// number of rows
A1 = A1(~isnan(A1)); %// linearize to a column and remove NaN's
A1 = A1(1:floor(numel(A1)/n)*n); %// trim last values if needed
A1 = reshape(A1,n,[]); %// put into desired shape
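Applied to the 4x5 example from the question, the filter, trim, and reshape steps can be checked as follows. This is a sketch that keeps A1 intact and uses the made-up names vals and nKeep.

```matlab
A1 = [1 5   8   9   11
      2 6   NaN 10  12
      3 7   NaN NaN 13
      4 NaN NaN NaN NaN];

n = size(A1, 1);                      % number of rows to preserve
vals = A1(~isnan(A1));                % non-NaN values, in column-major order
nKeep = floor(numel(vals) / n) * n;   % largest multiple of n that fits
A3 = reshape(vals(1:nKeep), n, []);   % leftover values (here 13) are dropped
```

The 13 non-NaN values do not fill a whole number of 4-row columns, so the last value is discarded, which matches the A3 shown in the question.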