Ranking values with repeats. Matlab - matlab

My question is simple, but I can't find the answer...
I have a 100,000 rows x 30 colums matrix for a simulation, and I need to rank the 100k values of each column.
I'm looking for something similar to tiedrank but I need that repeats does count (and not the average).
Suppose: data = [-1 2 0 -2 0] what I need is rank= [2 5 3 1 4]
Any suggestion?
Thank you very much!
Juan

It seems like you need sort:
data = [-1 2 0 -2 0];
[ignore, idx] = sort( data );
rank( idx ) = 1:numel(idx)
rank =
2 5 3 1 4
To rank all columns of matrix as once you can use the following code
data = [ -1 2 0 -2 0; -1 -1 -2 2 2]' ; %'
[n m] = size( data ); % number of rows and columns
[ignore idx] = sort(data); % sort columns
rank = zeros( size(data) ); % allocate
rank( sub2ind( size(rank), idx, bsxfun(#times, 1:m, ones(n,1) ) ) ) = ...
repmat( (1:n)', 1, m )
rank =
2 2
5 3
3 1
1 4
4 5

Related

MATLAB: After reshaping matrix to array, how can we know back where the value originally belongs to?

Let z = [1 3 5 6] and by getting all the difference between each elements:
we get:
bsxfun(#minus, z', z)
ans =
0 -2 -4 -5
2 0 -2 -3
4 2 0 -1
5 3 1 0
I now want to order these values in ascending order and remove the duplicates. So:
sort(reshape(bsxfun(#minus, z', z),1,16))
ans =
Columns 1 through 13
-5 -4 -3 -2 -2 -1 0 0 0 0 1 2 2
Columns 14 through 16
3 4 5
C = unique(sort(reshape(bsxfun(#minus, z', z),1,16)))
C =
-5 -4 -3 -2 -1 0 1 2 3 4 5
But by looking at -5 in [-5 -4 -3 -2 -1 0 1 2 3 4 5],
how can I tell where -5 comes from. By reading myself the matrix,
0 -2 -4 -5
2 0 -2 -3
4 2 0 -1
5 3 1 0
I know it comes from z(1) - z(4), i.e. row 1 column 4.
Also 2 comes from both z(3) - z(2) and z(2) - z(1), which comes from two cases. Without reading the originally matrix itself, how can we know that the 2 in [-5 -4 -3 -2 -1 0 1 2 3 4 5] is originally in row 3 column 2 and row 2 column 1 of the original matrix?
So by looking at each element in [-5 -4 -3 -2 -1 0 1 2 3 4 5], how do we know, for example, where -5 comes from in the original matrix index efficiently. I want to know as I need to do operation on ,e.g.,-5 and two indices that produce this: for example, for each difference, say -5, i do (-5)*1*6, as z(1)- z(6) = -5. But for 2, I need to do 2*(3*2+2*1) as z(3) - z(2) = 2, z(2) - z(1) = 2 which is not distinct.
Thinking hard, I think i should not reshape bsxfun(#minus, z', z) to array. I will also create two index array such that I can do operations like (-5)*1*6 stated above effectively. However, this is easier said than done and I also have to take care of nondistinct sources. Or should I do the desired operations first?
Use the third output from unique. And don't sort, unique will do that for you.
[sortedOutput,~,linearIndices] = unique(reshape(bsxfun(#minus, z', z),[1 16]))
You can reconstruct the result from bsxfun like so:
distances = reshape(sortedOutput(linearIndices),[4 4]);
If you want to know where a certain value appears, you write
targetValue = -5;
targetValueIdx = find(sortedOutput==targetValue);
linearIndexIntoDistances = find(targetValueIdx==linearIndices);
[row,col] = ind2sub([4 4],linearIndexIntoDistances);
Because linearIndices is 1 wherever the first value in sortedOutput appears in the original vector.
If you save the result of bsxfun in an intermediate variable:
distances=bsxfun(#minus, z', z)
Then you can look for the values of C in distances using find iteratively.
[rows,cols]=find(C(i)==distances)
This will give all rows and cols if the values are repeated. You just need to then use them for your equation.
You can use accumarray to collect all row and column indices that correspond to the same value in the matrix of differences:
z = [1 3 5 6]; % data vector
zd = bsxfun(#minus, z.', z); % matrix of differences
[C, ~, ind] = unique(zd); % unique values and indices
[rr, cc] = ndgrid(1:numel(z)); % template for row and col indices
f = #(x){x}; % anonymous function to collect row and col indices
row = accumarray(ind, rr(:), [], f); % group row indices according to ind
col = accumarray(ind, cc(:), [], f); % same for col indices
For example, C(6) is value 0, which appears four times in zd, at positions given by row{6} and col{6}:
>> row{6}.'
ans =
3 2 1 4
>> col{6}.'
ans =
3 2 1 4
As you see, the results are not guaranteed to be sorted. If you need to sort them in linear order:
rowcol = cellfun(#(r,c)sortrows([r c]), row, col, 'UniformOutput', false);
so now
>> rowcol{6}
ans =
1 1
2 2
3 3
4 4
I'm not sure I've followed exactly but some points to consider:
unique will sort the data for you by default so you don't need to call sort first
unique actually has three outputs and you can recover your original vector (i.e. with duplicates) using the third output so
[C,~,ic] = unique(reshape(bsxfun(#minus, z', z),1,16))
now you can get back to bsxfun(#minus, z', z),1,16) by calling
reshape(C(ic), numel(z), numel(z))
You might be more interested in the second output of unique which tells you what index each unique value was at in your 1-by-16 vector. It really depends on what you're trying to do though. But with this you could get a list of row column pairs to match your unique values:
[rows, cols] = ndgrid(1:4);
coords = [rows(:), cols(:)];
[C, ia] = unique(reshape(bsxfun(#minus, z', z),1,16));
coords_pairs = coords(ia,:)
which results in
coords_pairs =
1 4
1 3
2 4
2 3
3 4
4 4
4 3
3 2
4 2
3 1
4 1

Cartesian product of row-indices in Matlab

I have a binary matrix A of dimension mxn with m>n in Matlab. I want to construct a matrix B of dimension cxn listing row wise each element of the Cartesian product of the row indices of the ones contained in A. To be more clear consider the following example.
Example:
%m=4;
%n=3;
A=[1 0 1;
0 0 1;
1 1 0;
0 0 1];
%column 1: "1" are at rows {1,3}
%column 2: "1" are at row {3}
%column 3: "1" are at rows {1,2,4}
%Hence, the Cartesian product {1,3}x{3}x{1,2,4} is
%{(1,3,1),(1,3,2),(1,3,4),(3,3,1),(3,3,2),(3,3,4)}
%I construct B by disposing row-wise each 3-tuple in the Cartesian product
%c=6
B=[1 3 1;
1 3 2;
1 3 4;
3 3 1;
3 3 2;
3 3 4];
You can get the cartesian product with the combvec command, for your example:
A=[1 0 1;...
0 0 1;...
1 1 0;...
0 0 1];
[x y]=find(A);
B=combvec(x(y==1).',x(y==2).',x(y==3).').';
% B =
% 1 3 1
% 3 3 1
% 1 3 2
% 3 3 2
% 1 3 4
% 3 3 4
You can expand this to an unknown number of columns by using the associative property of the product.
[x y]=find(A);
u_y=unique(y);
B=x(y==u_y(1)).';
for i=2:length(u_y)
B=combvec(B, x(y==u_y(i)).');
end
B=B.';
One solution (without toolbox):
A= [1 0 1;
0 0 1;
1 1 0;
0 0 1];
[ii,jj] = find(A)
kk = unique(jj);
for i = 1:length(kk)
v{i} = ii(jj==kk(i));
end
t=cell(1,length(kk));
[t{:}]= ndgrid(v{:});
product = []
for i = 1:length(kk)
product = [product,t{i}(:)];
end
You can use use accumarray to obtain vectors with the row indices of nonzero elements for each column. This works for an arbitrary number of columns:
[ii, jj] = find(A);
vectors = accumarray(jj, ii, [], #(x){sort(x.')});
Then apply this answer to efficiently compute the Cartesian product of those vectors:
n = numel(vectors);
B = cell(1,n);
[B{end:-1:1}] = ndgrid(vectors{end:-1:1});
B = cat(n+1, B{:});
B = reshape(B,[],n);
In your example, this gives
B =
1 3 1
1 3 2
1 3 4
3 3 1
3 3 2
3 3 4
In short, I would use find to generate the indices needed for the Cartesian product and then use ndgrid to perform the Cartesian product of these indices. The code to do so is:
clear
close all
clc
A = [1 0 1;
0 0 1;
1 1 0;
0 0 1];
[row,col] = find(A);
[~,ia,~] = unique(col);
n_cols = size(A,2);
indices = cell(n_cols,1);
for ii = 1:n_cols-1
indices{ii} = row(ia(ii):ia(ii+1)-1);
end
indices{end} = row(ia(end):end);
cp_temp = cell(n_cols,1);
[cp_temp{:}] = ndgrid(indices{:});
cp = NaN(numel(cp_temp{1}),n_cols);
for ii = 1:n_cols
cp(:,ii) = cp_temp{ii}(:);
end
cp = sortrows(cp);
cp

Fastest way of generating a logical matrix by given row indices of true values?

What is the most efficient way of generating
>> A
A =
0 1 1
1 1 0
1 0 1
0 0 0
with
>> B = [2 3; 1 2; 1 3]
B =
2 3
1 2
1 3
in MATLAB?
E.g., B(1, :), which is [2 3], means that A(2, 1) and A(3, 1) are true.
My attempt still requires one for loop, iterating through B's row. Is there a loop-free or more efficient way of doing this?
This is one way of many, though sub2ind is the dedicated function for that:
%// given row indices
B = [2 3; 1 2; 1 3]
%// size of row index matrix
[n,m] = size(B)
%// size of output matrix
[N,M] = deal( max(B(:)), n)
%// preallocation of output matrix
A = zeros(N,M)
%// get col indices to given row indices
cols = bsxfun(#times, ones(n,m),(1:n).')
%// set values
A( sub2ind([N,M],B,cols) ) = 1
A =
0 1 1
1 1 0
1 0 1
If you want a logical matrix, change the following to lines
A = false(N,M)
A( sub2ind([N,M],B,cols) ) = true
Alternative solution
%// given row indices
B = [2 3; 1 2; 1 3];
%// number if rows
r = 4; %// e.g. = max(B(:))
%// number if cols
c = 3; %// size(B,1)
%// preallocation of output matrix
A = zeros(r,c);
%// set values
A( bsxfun(#plus, B.', 0:r:(r*(c-1))) ) = 1;
Here's a way, using the sparse function:
A = full(sparse(cumsum(ones(size(B))), B, 1));
This gives
A =
0 1 1
1 1 0
1 0 1
If you need a predefined number of rows in the output, say r (in your example r = 4):
A = full(sparse(cumsum(ones(size(B))), B, 1, 4, size(B,1)));
which gives
A =
0 1 1
1 1 0
1 0 1
0 0 0
You can equivalently use the accumarrray function:
A = accumarray([repmat((1:size(B,1)).',size(B,2),1), B(:)], 1);
gives
A =
0 1 1
1 1 0
1 0 1
Or with a predefined number of rows, r = 4,
A = accumarray([repmat((1:size(B,1)).',size(B,2),1), B(:)], 1, [r size(B,1)]);
gives
A =
0 1 1
1 1 0
1 0 1
0 0 0

MATLAB - finding max/min in selected rows/columns of a matrix

if i have a matrix, say:
A = [ 0 2 4 0
2 0 5 0
4 5 0 3
0 0 3 0 ]
and i want to find the maximum value in the matrix i can type:
max(max(A))
or
max(A(:))
if i only want to find the maximum of rows 1 and 2 and columns 3 and 4 i can do this:
a = [1 2]
b = [3 4]
max(max(A(a,b))
but what if i want to find the indices of the rows and columns that correspond to that value?
according to the matlab documentation, if i am using the whole matrix i can use the ind2sub function:
[val,idx] = max(A(:))
[row,col] = ind2sub(size(A),idx)
but how can i get that working for my example where i am using vectors a and b to determine the rows and columns it finds the values over?
here is the only way i have been able to work it out so far:
max_val = 0;
max_idx = [1 1];
for ii = a
[val,idx] = max(A(ii,b))
if val > max_val
max_val = val
max_idx = [ii idx]
but that seems rather clunky to me.. any ideas?
Assuming that the submatrix A(a,b) is contiguous (like in your example):
A = [ 0 2 4 0
2 0 5 0
4 5 0 3
0 0 3 0 ]
a = [1 2]; b = [3 4];
B = A(a,b)
[val,idx] = max(B(:));
[row,col] = ind2sub(size(B),idx);
maxrow = row + a(1) - 1;
maxcol = col + b(1) - 1;
You are finding the relative index in the submatrix B. Which is equivalent to the additional rows and columns from the upper left corner of the submatrix.
Now assuming that a and b result in a set of rows and columns that are NOT a contiguous submatrix, e.g. a = [1 3], b = [3 4], the result is very similar. "row" and "col" are the index in the a and b vectors:
A = [ 0 2 4 0
2 0 5 0
4 5 0 3
0 0 3 0 ]
a = [1 3]; b = [3 4];
B = A(a,b)
[val,idx] = max(B(:));
[row,col] = ind2sub(size(B),idx);
maxrow = a(row);
maxcol = b(col);
Now you're working in the index of indexes.

MATLAB: Fastest Way to Count Unique # of 2 Number Combinations in a Vector of Integers

Given a vector of integers such as:
X = [1 2 3 4 5 1 2]
I would like to find a really fast way to count the number of unique combinations with 2-elements.
In this case the two-number combinations are:
[1 2] (occurs twice)
[2 3] (occurs once)
[3 4] (occurs once)
[4 5] (occurs once)
[5 1] (occurs once)
As it stands, I am currently doing this in MATLAB as follows
X = [1 2 3 4 5 1 2];
N = length(X)
X_max = max(X);
COUNTS = nan(X_max); %store as a X_max x X_max matrix
for i = 1:X_max
first_number_indices = find(X==1)
second_number_indices = first_number_indices + 1;
second_number_indices(second_number_indices>N) = [] %just in case last entry = 1
second_number_vals = X(second_number_indices);
for j = 1:X_max
COUNTS(i,j) = sum(second_number_vals==j)
end
end
Is there a faster/smarter way of doing this?
Here is a super fast way:
>> counts = sparse(x(1:end-1),x(2:end),1)
counts =
(5,1) 1
(1,2) 2
(2,3) 1
(3,4) 1
(4,5) 1
You could convert to a full matrix simply as: full(counts)
Here is an equivalent solution using accumarray:
>> counts = accumarray([x(1:end-1);x(2:end)]', 1)
counts =
0 2 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0
EDIT: #Amro has provided a much better solution (well, better in the vast majority of cases, I suspect my method would work better if MaxX is very large and X contains zeros - this is because the presence of zeros will rule out the use of sparse while a large MaxX will slow down the accumarray approach as it creates a matrix of size MaxX by MaxX).
EDIT: Thanks to #EitanT for pointing out an improvement that can be made using accumarray.
Here is how I would solve it:
%Generate some random data
T = 20;
MaxX = 3;
X = randi(MaxX, T, 1);
%Get the unique combinations and an index. Note, I am assuming X is a column vector.
[UniqueComb, ~, Ind] = unique([X(1:end-1), X(2:end)], 'rows');
NumComb = size(UniqueComb, 1);
%Count the number of occurrences of each combination
Count = accumarray(Ind, 1);
All unique sequential two element combinations are now stored in UniqueComb, while the corresponding counts for each unique combination are stored in Count.