Matlab convert columnar data into ndarray - matlab

Is there a simple (ideally without multiple for loops) way to group a vector of values according to a set of categories in Matlab?
I have data matrix in the form
CATEG_A CATEG_B CATEG_C ... VALUE
1 1 1 ... 0.64
1 2 1 ... 0.86
1 1 1 ... 0.74
1 1 2 ... 0.56
...
etc.
and what I want is an N-dimensional array
all_VALUE( CATEG_A, CATEG_B, CATEG_C, ..., index ) = VALUE_i
of course there may be any number of values with the same category combination, so size(end) would be the number of value in the biggest category -- and the remaining items would be padded with nan.
Alternatively I'd be happy with
all_VALUE { CATEG_A, CATEG_B, CATEG_C, ... } ( index )
i.e. a cell array of vectors. I suppose it's a bit like creating a pivot table, but with n-dimensions, and not computing the mean.
I found this function in the help
A = accumarray(subs,val,[],#(x) {x})
but I couldn't fathom how to make it do what I wanted!

This is also a mess, but works. It goes the ND-array way.
X = [1 1 1 0.64
1 2 1 0.86
1 1 1 0.74
1 1 2 0.56]; %// data
N = size(X,1); %// number of values
[~, ~, label] = unique(X(:,1:end-1),'rows'); %// unique labels for indices
cumLabel = cumsum(sparse(1:N, label, 1),1); %// used for generating a cumulative count
%// for each label. The trick here is to separate each label in a different column
lastInd = full(cumLabel((1:N).'+(label-1)*N)); %'// pick appropriate values from
%// cumLabel to generate the cumulative count, which will be used as last index
%// for the result array
sizeY = [max(X(:,1:end-1),[],1) max(lastInd)]; %// size of result
Y = NaN(sizeY); %// initiallize result with NaNs
ind = mat2cell([X(:,1:end-1) lastInd], ones(1,N)); %// needed for comma-separated list
Y(sub2ind(sizeY, ind{:})) = X(:,end); %// linear indexing of values into Y
The result in your example is the following 4D array:
>> Y
Y(:,:,1,1) =
0.6400 0.8600
Y(:,:,2,1) =
0.5600 NaN
Y(:,:,1,2) =
0.7400 NaN
Y(:,:,2,2) =
NaN NaN

It's a mess but here is one solution
[U,~,subs] = unique(X(:,1:end-1),'rows');
sz = max(U);
Uc = mat2cell(U, size(U,1), ones(1,size(U,2)));
%// Uc is converted to cell matrices so that we can take advantage of the {:} notation which returns a comma-separated-list which allows us to pass a dynamic number of arguments to functions like sub2ind
I = sub2ind(sz, Uc{:});
G = accumarray(subs, X(:,end),[],#(x){x});
A{prod(max(U))} = []; %// Pre-assign the correct number of cells to A so we can reshape later
A(I) = G;
reshape(A, sz)
On your example data (ignoring the ...s) this returns:
A(:,:,1) =
[2x1 double] [0.8600]
A(:,:,2) =
[0.5600] []
where A(1,1,1) is [0.74; 0.64]

Related

How to find maximum value within multiple intervals?

I have two arrays, der_pos and der_neg, which contain indices of an array interpolated. I want to get all the indices for the maximum values of interpolated in the intervals:
der_pos(1):der_neg(1)
der_pos(2):der_neg(2)
etc...
E.g.:
interpolated = [1,5,3,2,7,10,8,14,4]
der_pos = [1,6]
der_neg = [4,9]
So, I would like to obtain the indices:
[2,8]
Because:
in the interval der_pos(1):der_neg(1) → 1:4 → interpolated(1:4) = [1,5,3,2] the max is 5 which is at index 2.
in the interval der_pos(2):der_neg(2) → 6:9 → interpolated(6:9) = [10,8,14,4] the max is 14 which is at index 8.
I managed to do it using a for loop:
interpolated = [1,5,3,2,7,10,8,14,4];
der_pos = [1,6];
der_neg = [4,9];
indices = zeros(1,length(der_pos));
for i = [1:length(der_pos)]
[peak, index] = max(interpolated(der_pos(i):der_neg(i)));
indices(i) = index + der_pos(i) - 1;
endfor
indices % gives [2,8]
But is there a more concise way of doing this?
Here's a sample code. The function findpeaks returns all the peak. The loop then keeps indices of peaks that are in the wanted range.
I added a test to avoid errors in case of no found peak (index will be -1), and to keep the first peak if two peaks are found. You can use a cell if you want to keep all peaks in an interval.
interpolated = [1,5,3,2,7,10,8,14,4];
der_pos = [1 6 7 ];
der_neg = [4 9 8];
[~,i]=findpeaks(interpolated);
indices= -1+zeros(size(der_pos,2),1);
for loopi = 1:length(i)
val = i(i>=der_pos(loopi)&i<=der_neg(loopi));
if ~isempty(val)
indices(loopi) = val(1);
end
end
Here's a way:
interpolated = [1,5,3,2,7,10,8,14,4]; % data
der_pos = [1,6]; % data
der_neg = [4,9]; % data
m = bsxfun(#ge, 1:numel(interpolated), der_pos(:)) .* ...
bsxfun(#le, 1:numel(interpolated), der_neg(:)); % each row is a mask that contains
% 1 for values in the interval and 0 for values outside the interval
m(~m) = NaN; % replace 0 by NaN
[val, result] = max(bsxfun(#times, m, interpolated(:).'), [], 2); % val is the maximum
% of each row, and result is its column index. The maximum is 1 for rows that
% contain at least a 1, and NaN for rows that only contain NaN
result(isnan(val)) = 0; % If the maximum value was NaN the result is set to 0
% (or maybe use NaN), to indicate that the interval was empty
This gives 0 for empty intervals. For example, der_pos = [1,6,8]; der_neg = [4,9,6]; produce result = [2;8;0].
The intervals may overlap. For example, der_pos = [1,6,3]; der_neg = [4,9,7]; produce result = [2;8;6].

Get the first four minimum values in a matrix

I have a matrix:
X =
0 81 13 15 100 2
11 0 6 10 200 8
19 22 0 20 300 23
I want to get the first four minimal values in the whole array X with the indices of each value in the array. For example I should get vector v = [2 6 8 10] and the index of each value in X.
Also, I want to ignore the zero values when the row number equals the column number.
I have tried to use the min and sort functions, but I am not sure how to do it.
I would suggest the following
X2 = X;
X2(~~eye(size(X2))) = inf; %// or X2(logical(eye(size(X2)))) = inf
[val, idx] = sort(X2(:));
result = val(1:4);
[idxRow, idxCol] = ind2sub(size(X), idx(1:4));
Use:
vals = sort(X(~eye(size(X)))); %takes non diagonal values and sort the result
res = vals(1:4) %finds the first 4 elements (which are the smallest)
[row, col] = find(ismember(X,res)); %gets the indices
result:
res = [2; 6; 8; 10]
By The way, if you don't want to ignore all the diagonal values, only the zero ones, use:
vals = sort(X(~eye(size(X)) | (eye(size(X)) & X~=0)));
Sort all but the ones on the diagonal and then find the indices of the ones which are smaller than or equal to the 4th element of sorted array and not on the diagonal:
T=sort(X(~eye(size(X))));
v = T(1:4);
[I,J] = find(X <= v(end) & ~eye(size(X)));
Just want to add to drorco's perfect answer how to find indexes of this first elements:
indexes = arrayfun( #(a) find(X==a), res);
or if you want to get numbers of rows and columns:
[r,c] = arrayfun( #(a) find(X==a), res);
P.S. it works perfectly if all elements except zeros in X are unique.

Sort elements of rows in a matrix with another matrix

I have a matrix D of distances between 3 places and 4 persons
example D(2,3) = 10 means person 3 is far away from place 2 of 10 units.
D=[23 54 67 32
32 5 10 2
3 11 13 5]
another matrix A with the same number of rows (3 places) and where A(i,:) correspond to the persons that picked place i
example for place 1, persons 1 and 3 picked it
no one picked place 2
and persons 2 and 4 picked place 3
A=[1 3 0
0 0 0
2 4 0]
I want to reorder each row of A by the persons who are closest to the place it represents.
In this example, for place 1, person 1 is closer to it than person 3 based on D so nothing to do.
nothing to do for place 2
and there is a change for place 3 since person 4 is closer than 2 to place 3 D(3,2)>D(3,4)
The result should be
A=[1 3
0 0
4 2 ]
each row(place) in A can have 0 or many non zeros elements in it (persons that picked it)
Basically, I want to reorder elements in each row of A based on the rows of D (the closest to the location comes first), something like this but here A and D are not of the same size (number of columns).
[SortedD,Ind] = sort(D,2)
for r = 1:size(A,1)
A(r,:) = A(r,Ind(r,:));
end
There is another Matlab function sortrows(C,colummn_index) that can do the trick. It can sort rows based on the elements in a particular column. So if you transpose your matrix A (C = A') and extend the result by adding to the end the proper column, according to which you want to sort a required row, then you will get what you want.
To be more specific, you can do something like this:
clear all
D=[23 54 67 32;
32 5 10 2;
3 11 13 5];
A=[1 0;
3 0;
4 2 ];
% Sort elements in each row of the matrix A,
% because indices of elements in each row of the matrix D are always
% ascending.
A_sorted = sort(A,2);
% shifting all zeros in each row to the end
for i = 1:length(A_sorted(:,1))
num_zeros = sum(A_sorted(i,:)==0);
if num_zeros < length(A_sorted(i,:))
z = zeros(1,num_zeros);
A_sorted(i,:) = [A_sorted(i,num_zeros+1:length(A_sorted(i,:))) z];
end;
end;
% Prelocate in memory an associated array of the corresponding elements in
% D. The matrix Dr is just a reduced derivation from the matrix D.
Dr = zeros(length(A_sorted(:,1)),length(A_sorted(1,:)));
% Create a matrix Dr of elements in D corresponding to the matrix A_sorted.
for i = 1:length(A_sorted(:,1)) % i = 1:3
for j = 1:length(A_sorted(1,:)) % j = 1:2
if A_sorted(i,j) == 0
Dr(i,j) = 0;
else
Dr(i,j) = D(i,A_sorted(i,j));
end;
end;
end;
% We don't need the matrix A_sorted anymore
clear A_sorted
% In order to use the function SORTROWS, we need to transpose matrices
A = A';
Dr = Dr';
% The actual sorting procedure starts here.
for i = 1:length(A(1,:)) % i = 1:3
C = zeros(length(A(:,1)),2); % buffer matrix
C(:,1) = A(:,i);
C(:,2) = Dr(:,i);
C = sortrows(C,2);
A(:,i) = C(:,1);
% shifting all zeros in each column to the end
num_zeros = sum(A(:,i)==0);
if num_zeros < length(A(:,i))
z = zeros(1,num_zeros);
A(:,i) = [A(num_zeros+1:length(A(:,i)),i) z]';
end;
end;
% Transpose the matrix A back
A = A';
clear C Dr z

Permuting columns of a matrix in MATLAB

Say I have an n by d matrix A and I want to permute the entries of some columns. To do this, I compute permutations of 1 ... n as
idx1 = randperm(n)'
idx2 = randperm(n)'
Then I could do:
A(:,1) = A(idx1,1)
A(:,2) = A(idx2,2)
However, I dont want to do this using a for-loop, as it'll be slow. Say I have an n by d matrix A and an n by d index matrix IDX that specifies the permutations, is there a quicker equivalent of the following for-loop:
for i = 1:d
A(:,i) = A(IDX(:,i),i);
end
Using linear-indexing with the help of bsxfun -
[n,m] = size(A);
newA = A(bsxfun(#plus,IDX,[0:m-1]*n))
I guess another rather stupid way to do it is with cellfun, stupid because you have to convert it into a cell and then convert it back, but it is there anyways.
N=ones(d,1)*n; %//create a vector of d length with each element = n
M=num2cell(N); %//convert it into a cell
P=cellfun(#randperm, M,'uni',0); %//cellfun applys randperm to each cell
res = cell2mat(P); %//Convert result back into a matrix (since results are numeric).
This also allows randperm of type Char and String, but the cell2mat will not work for those cases, results are in Cell Array format instead.
for d = 5, n = 3:
>> res =
1 3 2
1 2 3
2 3 1
3 1 2
3 2 1

Replacing zeros (or NANs) in a matrix with the previous element row-wise or column-wise in a fully vectorized way

I need to replace the zeros (or NaNs) in a matrix with the previous element row-wise, so basically I need this Matrix X
[0,1,2,2,1,0;
5,6,3,0,0,2;
0,0,1,1,0,1]
To become like this:
[0,1,2,2,1,1;
5,6,3,3,3,2;
0,0,1,1,1,1],
please note that if the first row element is zero it will stay like that.
I know that this has been solved for a single row or column vector in a vectorized way and this is one of the nicest way of doing that:
id = find(X);
X(id(2:end)) = diff(X(id));
Y = cumsum(X)
The problem is that the indexing of a matrix in Matlab/Octave is consecutive and increments columnwise so it works for a single row or column but the same exact concept cannot be applied but needs to be modified with multiple rows 'cause each of raw/column starts fresh and must be regarded as independent. I've tried my best and googled the whole google but coukldn’t find a way out. If I apply that same very idea in a loop it gets too slow cause my matrices contain 3000 rows at least. Can anyone help me out of this please?
Special case when zeros are isolated in each row
You can do it using the two-output version of find to locate the zeros and NaN's in all columns except the first, and then using linear indexing to fill those entries with their row-wise preceding values:
[ii jj] = find( (X(:,2:end)==0) | isnan(X(:,2:end)) );
X(ii+jj*size(X,1)) = X(ii+(jj-1)*size(X,1));
General case (consecutive zeros are allowed on each row)
X(isnan(X)) = 0; %// handle NaN's and zeros in a unified way
aux = repmat(2.^(1:size(X,2)), size(X,1), 1) .* ...
[ones(size(X,1),1) logical(X(:,2:end))]; %// positive powers of 2 or 0
col = floor(log2(cumsum(aux,2))); %// col index
ind = bsxfun(#plus, (col-1)*size(X,1), (1:size(X,1)).'); %'// linear index
Y = X(ind);
The trick is to make use of the matrix aux, which contains 0 if the corresponding entry of X is 0 and its column number is greater than 1; or else contains 2 raised to the column number. Thus, applying cumsum row-wise to this matrix, taking log2 and rounding down (matrix col) gives the column index of the rightmost nonzero entry up to the current entry, for each row (so this is a kind of row-wise "cummulative max" function.) It only remains to convert from column number to linear index (with bsxfun; could also be done with sub2ind) and use that to index X.
This is valid for moderate sizes of X only. For large sizes, the powers of 2 used by the code quickly approach realmax and incorrect indices result.
Example:
X =
0 1 2 2 1 0 0
5 6 3 0 0 2 3
1 1 1 1 0 1 1
gives
>> Y
Y =
0 1 2 2 1 1 1
5 6 3 3 3 2 3
1 1 1 1 1 1 1
You can generalize your own solution as follows:
Y = X.'; %'// Make a transposed copy of X
Y(isnan(Y)) = 0;
idx = find([ones(1, size(X, 1)); Y(2:end, :)]);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)), [], size(X, 1)).'; %'// Reshape back into a matrix
This works by treating the input data as a long vector, applying the original solution and then reshaping the result back into a matrix. The first column is always treated as non-zero so that the values don't propagate throughout rows. Also note that the original matrix is transposed so that it is converted to a vector in row-major order.
Modified version of Eitan's answer to avoid propagating values across rows:
Y = X'; %'
tf = Y > 0;
tf(1,:) = true;
idx = find(tf);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)),fliplr(size(X)))';
x=[0,1,2,2,1,0;
5,6,3,0,1,2;
1,1,1,1,0,1];
%Do it column by column is easier
x=x';
rm=0;
while 1
%fields to replace
l=(x==0);
%do nothing for the first row/column
l(1,:)=0;
rm2=sum(sum(l));
if rm2==rm
%nothing to do
break;
else
rm=rm2;
end
%replace zeros
x(l) = x(find(l)-1);
end
x=x';
I have a function I use for a similar problem for filling NaNs. This can probably be cutdown or sped up further - it's extracted from pre-existing code that has a bunch more functionality (forward/backward filling, maximum distance etc).
X = [
0 1 2 2 1 0
5 6 3 0 0 2
1 1 1 1 0 1
0 0 4 5 3 9
];
X(X == 0) = NaN;
Y = nanfill(X,2);
Y(isnan(Y)) = 0
function y = nanfill(x,dim)
if nargin < 2, dim = 1; end
if dim == 2, y = nanfill(x',1)'; return; end
i = find(~isnan(x(:)));
j = 1:size(x,1):numel(x);
j = j(ones(size(x,1),1),:);
ix = max(rep([1; i],diff([1; i; numel(x) + 1])),j(:));
y = reshape(x(ix),size(x));
function y = rep(x,times)
i = find(times);
if length(i) < length(times), x = x(i); times = times(i); end
i = cumsum([1; times(:)]);
j = zeros(i(end)-1,1);
j(i(1:end-1)) = 1;
y = x(cumsum(j));