Intersection indices by rows - matlab

Given these two matrices:
m1 = [ 1 1;
2 2;
3 3;
4 4;
5 5 ];
m2 = [ 4 2;
1 1;
4 4;
7 5 ];
I'm looking for a function, such as:
indices = GetIntersectionIndecies (m1,m2);
the output of which would be
indices =
1
0
0
1
0
How can I find the intersection indices of rows between these two matrices without using a loop?

One possible solution:
function [Index] = GetIntersectionIndicies(m1, m2)
[~, I1] = intersect(m1, m2, 'rows');
Index = zeros(size(m1, 1), 1);
Index(I1) = 1;
By the way, I love the inventive solution of @Shai, and it is much faster than my solution if your matrices are small. But if your matrices are large, then my solution will dominate. This is because, if we set T = size(m1, 1), the tmp variable in @Shai's answer will be T-by-T, i.e. a very large matrix if T is large. Here's some code for a quick speed test:
%# Set parameters
T = 1000;
M = 10;
%# Build test matrices
m1 = randi(5, T, 2);
m2 = randi(5, T, 2);
%# My solution
tic
for m = 1:M
[~, I1] = intersect(m1, m2, 'rows');
Index = zeros(size(m1, 1), 1);
Index(I1) = 1;
end
toc
%# @Shai's solution
tic
for m = 1:M
tmp = bsxfun( @eq, permute( m1, [ 1 3 2 ] ), permute( m2, [ 3 1 2 ] ) );
tmp = all( tmp, 3 ); % tmp(i,j) is true iff m1(i,:) == m2(j,:)
indices = any( tmp, 2 );
end
toc
Set T = 10 and M = 1000, and we get:
Elapsed time is 0.404726 seconds. %# My solution
Elapsed time is 0.017669 seconds. %# @Shai's solution
But set T = 1000 and M = 100 and we get:
Elapsed time is 0.068831 seconds. %# My solution
Elapsed time is 0.508370 seconds. %# @Shai's solution
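A caveat on the intersect-based function above: intersect returns each common row only once, so if m1 contains repeated rows, only one copy of each is flagged. The following is a minimal sketch of a variant, not the answer's own method, that uses ismember with the 'rows' option to flag every matching row of m1. The variable names follow the question.

```matlab
% Matrices from the question
m1 = [1 1; 2 2; 3 3; 4 4; 5 5];
m2 = [4 2; 1 1; 4 4; 7 5];

% ismember(...,'rows') returns one logical flag per row of m1;
% double() converts the flags to the 0/1 column shown in the question
indices = double(ismember(m1, m2, 'rows'));
disp(indices.')
```

For the matrices in the question this marks rows 1 and 4 of m1, which matches the requested output.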

How about using bsxfun?
function indices = GetIntersectionIndecies( m1, m2 )
tmp = bsxfun( @eq, permute( m1, [ 1 3 2 ] ), permute( m2, [ 3 1 2 ] ) );
tmp = all( tmp, 3 ); % tmp(i,j) is true iff m1(i,:) == m2(j,:)
indices = any( tmp, 2 );
end
Cheers!
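On MATLAB R2016b or later, implicit expansion lets the == operator do what bsxfun does in the function above. The following sketch assumes such a release and reuses the matrices from the question. It shows the shapes involved: m1 is permuted to 5-by-1-by-2 and m2 to 1-by-4-by-2, so the comparison expands to 5-by-4-by-2.

```matlab
% Matrices from the question
m1 = [1 1; 2 2; 3 3; 4 4; 5 5];
m2 = [4 2; 1 1; 4 4; 7 5];

% Compare every row of m1 with every row of m2, column by column
tmp = permute(m1, [1 3 2]) == permute(m2, [3 1 2]);   % 5-by-4-by-2 logical
tmp = all(tmp, 3);        % tmp(i,j) is true iff m1(i,:) equals m2(j,:)
indices = any(tmp, 2);    % true for rows of m1 that appear anywhere in m2
disp(indices.')
```

The memory concern raised in the other answer still applies, since the intermediate array grows with size(m1,1)*size(m2,1).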

Related

"symmetrical" rows detection in matlab

I have an integer matrix A (nA x c) with an even number of columns (i.e. mod(c,2) = 0) and unique rows.
How can I efficiently (with a function symmetricRows optimized for speed and memory) find the "symmetrical" rows of matrix A, iA1 and iA2, where "symmetric" rows iA1 and iA2 are defined as:
all(A(iA1,1:end/2) == A(iA2,end/2+1:end) & A(iA1,end/2+1:end) == A(iA2,1:end/2),2) = true
Example:
A = [1 1 1 1;
2 2 2 2;
1 2 3 4;
4 3 2 1;
2 2 3 3;
3 4 1 2;
3 3 2 2]
[iA1, iA2] = symmetricRows(A)
iA1 =
1
2
3
5
iA2 =
1
2
6
7
Typical size of matrices A: nA ~ 1e4 to 1e6, c ~ 60 to 120
The problem is motivated by pre-processing of a large dataset, where "symmetrical" rows are irrelevant from the point of view of a user-defined distance metric.
Example 2: a larger test data set can be prepared with the allcomb function, for example:
N = 10;
A = allcomb([1:N],[1:N],[1:N],[1:N]);
iA = symmetricRows(A)
If you have the Statistics Toolbox:
d = ~pdist2(A(:,1:end/2), A(:,end/2+1:end));
[iA1, iA2] = find(triu(d & d.'));
You could do this with implicit expansion to create a 3D matrix of comparisons, if you have enough memory.
AL = A(:,1:end/2);
AR = A(:,end/2+1:end);
AcompLR = squeeze( all( AL == reshape( AR.', 1, size(AR,2), [] ), 2 ) ); % size(AR,2) is 2 in the example
AcompRL = squeeze( all( reshape( AL.', 1, size(AL,2), [] ) == AR, 2 ) );
[iA(:,1), iA(:,2)] = find( AcompLR & AcompRL );
iA = unique( sort(iA,2), 'rows' );
This returns iA where column 1 is your iA1 and column 2 is your iA2.
Note that I needed the unique to avoid reversed matches, i.e. [5,7]/[7,5].
I've not done any benchmarking, but this might be quicker than looping as it is all done in single operations. We could instead be clever about the indexing and do only the necessary comparisons; this would save memory and a call to unique:
% Create row indices to cover all combinations of rows
rIdx = arrayfun( @(x) [ones(x,1)*x,(1:x).'], 1:size(A,1), 'uni', 0 );
rIdx = vertcat( rIdx{:} );
% Logical indexing comparisons
iA = rIdx( all( A( rIdx(:,1), 1:end/2 ) == A( rIdx(:,2), end/2+1:end ), 2 ) & ...
all( A( rIdx(:,2), 1:end/2 ) == A( rIdx(:,1), end/2+1:end ), 2 ), : );
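For the matrix sizes mentioned in the question, an nA-by-nA comparison may not fit in memory. The following sketch uses a different technique from the answers above and relies on the question's statement that the rows of A are unique. Row i is "symmetrical" to row j exactly when row i with its two halves swapped equals row j, so a single ismember call with 'rows' can look up the partner of each row. The names h, swapped, idx and keep are illustrative only.

```matlab
% Example matrix from the question
A = [1 1 1 1;
     2 2 2 2;
     1 2 3 4;
     4 3 2 1;
     2 2 3 3;
     3 4 1 2;
     3 3 2 2];

h = size(A, 2) / 2;                        % width of each half
swapped = [A(:, h+1:end), A(:, 1:h)];      % every row with its halves exchanged

% loc(i) is the row of A equal to swapped(i,:), or 0 if there is none
[tf, loc] = ismember(swapped, A, 'rows');

idx  = (1:size(A, 1)).';
keep = tf & idx <= loc;                    % report each pair once, as (smaller, larger)
iA1  = idx(keep);
iA2  = loc(keep);
disp([iA1, iA2])
```

For the example this gives the pairs (1,1), (2,2), (3,6) and (5,7), the same iA1 and iA2 as listed in the question. I have not benchmarked it against the other approaches.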

Matlab Vectorization - non-zero matrix row indices to cell

I am working with Matlab.
I have a binary square matrix. For each row, there is one or more entries of 1. I want to go through each row of this matrix and return the index of those 1s and store them in the entry of a cell.
I was wondering if there is a way to do this without looping over all the rows of this matrix, as a for loop is really slow in Matlab.
For example, my matrix
M = 0 1 0
1 0 1
1 1 1
Then eventually, I want something like
A = [2]
[1,3]
[1,2,3]
So A is a cell.
Is there a way to achieve this goal without using for loop, with the aim of calculating the result more quickly?
At the bottom of this answer is some benchmarking code, since you clarified that you're interested in performance rather than arbitrarily avoiding for loops.
In fact, I think for loops are probably the most performant option here. Since the "new" (2015b) JIT engine was introduced, for loops are not inherently slow; in fact, they are optimised internally.
You can see from the benchmark that the mat2cell option offered by ThomasIsCoding here is very slow...
If we get rid of that line to make the scale clearer, then my splitapply method is fairly slow, obchardon's accumarray option is a bit better, but the fastest (and comparable) options are either using arrayfun (as also suggested by Thomas) or a for loop. Note that arrayfun is basically a for loop in disguise for most use-cases, so this isn't a surprising tie!
I would recommend you use a for loop for increased code readability and the best performance.
Edit:
If we assume that looping is the fastest approach, we can make some optimisations around the find command.
Specifically
Make M logical. As the below plot shows, this can be faster for relatively small M, but slower with the trade-off of type conversion for large M.
Use a logical M to index an array 1:size(M,2) instead of using find. This avoids the slowest part of the loop (the find command) and outweighs the type conversion overhead, making it the quickest option.
Here is my recommendation for best performance:
function A = f_forlooplogicalindexing( M )
M = logical(M);
k = 1:size(M,2);
N = size(M,1);
A = cell(N,1);
for r = 1:N
A{r} = k(M(r,:));
end
end
I've added this to the benchmark below, here is the comparison of loop-style approaches:
Benchmarking code:
rng(904); % Gives OP example for randi([0,1],3)
p = 2:12;
T = NaN( numel(p), 7 );
for ii = p
N = 2^ii;
M = randi([0,1],N);
fprintf( 'N = 2^%.0f = %.0f\n', log2(N), N );
f1 = @()f_arrayfun( M );
f2 = @()f_mat2cell( M );
f3 = @()f_accumarray( M );
f4 = @()f_splitapply( M );
f5 = @()f_forloop( M );
f6 = @()f_forlooplogical( M );
f7 = @()f_forlooplogicalindexing( M );
T(ii, 1) = timeit( f1 );
T(ii, 2) = timeit( f2 );
T(ii, 3) = timeit( f3 );
T(ii, 4) = timeit( f4 );
T(ii, 5) = timeit( f5 );
T(ii, 6) = timeit( f6 );
T(ii, 7) = timeit( f7 );
end
plot( (2.^p).', T(2:end,:) );
legend( {'arrayfun','mat2cell','accumarray','splitapply','for loop',...
'for loop logical', 'for loop logical + indexing'} );
grid on;
xlabel( 'N, where M = random N*N matrix of 1 or 0' );
ylabel( 'Execution time (s)' );
disp( 'Done' );
function A = f_arrayfun( M )
A = arrayfun(@(r) find(M(r,:)),1:size(M,1),'UniformOutput',false);
end
function A = f_mat2cell( M )
[i,j] = find(M.');
A = mat2cell(i,arrayfun(@(r) sum(j==r),min(j):max(j)));
end
function A = f_accumarray( M )
[val,ind] = ind2sub(size(M),find(M.'));
A = accumarray(ind,val,[],@(x) {x});
end
function A = f_splitapply( M )
[r,c] = find(M);
A = splitapply( @(x) {x}, c, r );
end
function A = f_forloop( M )
N = size(M,1);
A = cell(N,1);
for r = 1:N
A{r} = find(M(r,:));
end
end
function A = f_forlooplogical( M )
M = logical(M);
N = size(M,1);
A = cell(N,1);
for r = 1:N
A{r} = find(M(r,:));
end
end
function A = f_forlooplogicalindexing( M )
M = logical(M);
k = 1:size(M,2);
N = size(M,1);
A = cell(N,1);
for r = 1:N
A{r} = k(M(r,:));
end
end
You can try arrayfun like below, which sweeps through the rows of M
A = arrayfun(@(r) find(M(r,:)),1:size(M,1),'UniformOutput',false)
A =
{
[1,1] = 2
[1,2] =
1 3
[1,3] =
1 2 3
}
or (a slower approach by mat2cell)
[i,j] = find(M.');
A = mat2cell(i,arrayfun(@(r) sum(j==r),min(j):max(j)))
A =
{
[1,1] = 2
[2,1] =
1
3
[3,1] =
1
2
3
}
Using accumarray:
M = [0 1 0
1 0 1
1 1 1];
[val,ind] = find(M.');
A = accumarray(ind,val,[],@(x) {x});
You can use strfind:
A = strfind(cellstr(char(M)), char(1));
Edit: I added a benchmark, the results show that a for loop is more efficient than accumarray.
You can use find and accumarray:
[c, r] = find(A');
C = accumarray(r, c, [], @(v) {v'});
The matrix is transposed (A') because find groups by column.
Example:
A = [1 0 0 1 0
0 1 0 0 0
0 0 1 1 0
1 0 1 0 1];
% Find the row (r) and column (c) indices of the nonzero entries of A
[c, r] = find(A');
% Group the column indices by row
C = accumarray(r, c, [], @(v) {v'});
% Display cell array contents
celldisp(C)
Output:
C{1} =
1 4
C{2} =
2
C{3} =
3 4
C{4} =
1 3 5
Benchmark:
m = 10000;
n = 10000;
A = randi([0 1], m,n);
disp('accumarray:')
tic
[c, r] = find(A');
C = accumarray(r, c, [], @(v) {v'});
toc
disp(' ')
disp('For loop:')
tic
C = cell([size(A,1) 1]);
for i = 1:size(A,1)
C{i} = find(A(i,:));
end
toc
Result:
accumarray:
Elapsed time is 2.407773 seconds.
For loop:
Elapsed time is 1.671387 seconds.
A for loop is more efficient than accumarray...

Matlab: Increment of matrix values with indices

I have a vector of indices and want to increase values in matrix in every index. For example:
ind = [1 2 2 5];
m = zeros(3);
m(ind) = m(ind) + 1;
The result is as follow:
m = [1 0 0
1 1 0
0 0 0]
But I need the results to be
m = [1 0 0
2 1 0
0 0 0]
The time complexity is very important to me, and I can't use a for loop. Thanks.
Here's a way. I haven't timed it.
ind = [1 2 2 5];
N = 3;
m = full(reshape(sparse(ind, 1, 1, N^2, 1), N, N));
Equivalently, you can use
ind = [1 2 2 5];
N = 3;
m = reshape(accumarray(ind(:), 1, [N^2 1]), N, N);
or its variation (thanks to @beaker)
ind = [1 2 2 5];
N = 3;
m = zeros(N);
m(:) = accumarray(ind(:), 1, [N^2 1]);
This one is probably slower than the others:
ind = [1 2 2 5];
N = 3;
m = zeros(N);
[ii, ~, vv] = find(accumarray(ind(:), 1));
m(ii) = vv;
For a sorted array of indices, we can make use of diff -
out = zeros(M,N); % Output array of size(M,N)
df = diff([0,ind,ind(end)+1]);
put_idx = diff(find(df)); % gets count of dups
out(ind(df(1:end-1)~=0)) = put_idx;
The basic idea being we count the duplicates along the length with diff. Those counts are the values to be assigned into the zeros array. The indices at which those values are to be assigned are simply the unique indices, which could be found out by looking for the start of each group of duplicate indices.
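To make the idea concrete, here is the same computation traced on the index vector from the question. The intermediate values in the comments were worked out by hand for this specific input, and the 3-by-3 output size is taken from the question.

```matlab
ind = [1 2 2 5];                    % sorted indices from the question
out = zeros(3);                     % 3-by-3 output, as in the question

df = diff([0, ind, ind(end)+1]);    % [1 1 0 3 1]: nonzero where a new index value starts
put_idx = diff(find(df));           % [1 2 1]: length of each run of equal indices
out(ind(df(1:end-1) ~= 0)) = put_idx;   % write the counts at linear indices 1, 2 and 5

disp(out)
```

Index 2 appears twice, so element (2,1) receives the value 2, which is the result the question asks for.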
Benchmarking
Script to create sorted indices array (create_data.m) -
function ind = create_data(M,N, num_unq_ind, max_repeats)
unq_ind = unique(randi([1,M*N],1,num_unq_ind));
num_repeats = randi(max_repeats, [1,numel(unq_ind)]);
ind = repelem(unq_ind, num_repeats);
Benchmarking script (bench1.m) to test out various scenarios -
clear all; close all;
M = 5000; % Array size
N = 5000;
% Input params and setup input indices array (edited for various runs)
num_unq_ind = 100000;
max_repeats = 100;
ind = create_data(M,N, num_unq_ind, max_repeats);
num_iter = 100; % No. of iterations to have reliable benchmarking
disp('Input params :')
disp(['num_unq_ind = ' int2str(num_unq_ind)])
disp(['max_repeats = ' int2str(max_repeats)])
disp('------------------ Using diff ----------------')
tic
for i=1:num_iter
out = zeros(M,N);
df = diff([0,ind,ind(end)+1]);
put_idx = diff(find(df));
out(ind(df(1:end-1)~=0)) = put_idx;
end
toc
% Luis's soln
disp('------------------ Using accumaray ----------------')
tic
for i=1:num_iter
m = reshape(accumarray(ind(:), 1, [N^2 1]), N, N);
end
toc
Various scenario runs -
>> bench1
Input params :
num_unq_ind = 10000
max_repeats = 10
------------------ Using diff ----------------
Elapsed time is 0.948544 seconds.
------------------ Using accumaray ----------------
Elapsed time is 1.502658 seconds.
>> bench1
Input params :
num_unq_ind = 100000
max_repeats = 10
------------------ Using diff ----------------
Elapsed time is 1.784576 seconds.
------------------ Using accumaray ----------------
Elapsed time is 1.533280 seconds.
>> bench1
Input params :
num_unq_ind = 10000
max_repeats = 100
------------------ Using diff ----------------
Elapsed time is 1.315998 seconds.
------------------ Using accumaray ----------------
Elapsed time is 1.391323 seconds.
>> bench1
Input params :
num_unq_ind = 100000
max_repeats = 100
------------------ Using diff ----------------
Elapsed time is 6.180565 seconds.
------------------ Using accumaray ----------------
Elapsed time is 3.576154 seconds.
With less sparse indices and more repeats, accumarray seems to do better.
You can use histcounts
n = 3;
m = reshape(histcounts(ind, [1:n^2 n^2]), n, n);

How should I average groups of rows in a matrix to produce a new, smaller matrix?

I have a very large matrix (216 rows, 31286 cols) of doubles. For reasons specific to the data, I want to average every 9 rows to produce one new row. So, the new matrix will have 216/9=24 rows.
I am a Matlab beginner so I was wondering if this solution I came up with can be improved upon. Basically, it loops over every group, sums up the rows, and then divides the new row by 9. Here's a simplified version of what I wrote:
matrix_avg = []
for group = 1:216/9
new_row = zeros(1, 31286);
idx_low = (group - 1) * 9 + 1;
idx_high = idx_low + 9 - 1;
% Add the 9 rows to new_row
for j = idx_low:idx_high
new_row = new_row + M(j,:);
end
% Compute the mean
new_row = new_row ./ 9
matrix_avg = [matrix_avg; new_row];
end
You can reshape your big matrix from 216 x 31286 to 9 x (216/9 * 31286).
Then you can use mean, which operates on each column. Since your matrix only has 9 rows per column, this takes the 9-row average.
Then you can just reshape your matrix back.
% generate big matrix
M = rand([216 31286]);
n = 9; % want 9-row average.
% reshape
tmp = reshape(M, [n prod(size(M))/n]);
% mean column-wise (and only 9 rows per col)
tmp = mean(tmp);
% reshape back
matrix_avg = reshape(tmp, [ size(M,1)/n size(M,2) ]);
In a one-liner (but why would you?):
matrix_avg = reshape(mean(reshape(M,[n prod(size(M))/n])), [size(M,1)/n size(M,2)]);
Note - this will have problems if the number of rows in M isn't exactly divisible by 9, but so will your original code.
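The reshape trick works because MATLAB stores matrices column by column, so every group of n consecutive rows within a column becomes one column of the reshaped matrix. Here is a minimal sketch on a small matrix, with sizes chosen purely for illustration (6 rows, 2 columns, groups of 3 rows), where the result is easy to check by hand.

```matlab
M = reshape(1:12, 6, 2);    % column 1 holds 1..6, column 2 holds 7..12
n = 3;                      % average every 3 rows

tmp = reshape(M, n, []);    % 3-by-4: each column is one group of 3 rows from one column of M
tmp = mean(tmp, 1);         % column means: [2 5 8 11]
matrix_avg = reshape(tmp, size(M,1)/n, size(M,2));

disp(matrix_avg)
```

The first output row is the average of rows 1 to 3 of M and the second is the average of rows 4 to 6.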
I measured the 4 solutions and here are the results:
reshape: Elapsed time is 0.017242 seconds.
blockproc [9 31286]: Elapsed time is 0.242044 seconds.
blockproc [9 1]: Elapsed time is 44.477094 seconds.
accumarray: Elapsed time is 103.274071 seconds.
This is the code I used:
M = rand(216,31286);
fprintf('reshape: ');
tic;
n = 9;
matrix_avg1 = reshape(mean(reshape(M,[n prod(size(M))/n])), [size(M,1)/n size(M,2)]);
toc
fprintf('blockproc [9 31286]: ');
tic;
fun = @(block_struct) mean(block_struct.data);
matrix_avg2 = blockproc(M,[9 31286],fun);
toc
fprintf('blockproc [9 1]: ');
tic;
fun = @(block_struct) mean(block_struct.data);
matrix_avg3 = blockproc(M,[9 1],fun);
toc
fprintf('accumarray: ');
tic;
[nR,nC] = size(M);
n2average = 9;
[xx,yy] = ndgrid(1:nR,1:nC);
xx = ceil(xx/n2average); %# makes xx 1 1 1 1 2 2 2 2 etc
matrix_avg4 = accumarray([xx(:),yy(:)],M(:),[],@mean);
toc
Here's an alternative based on accumarray. You create an array with row and column indices into matrix_avg that tells you which element in matrix_avg a given element in M contributes to, then you use accumarray to average the elements that contribute to the same element in matrix_avg. This solution works even if the number of rows in M is not divisible by 9.
M = rand(216,31286);
[nR,nC] = size(M);
n2average = 9;
[xx,yy] = ndgrid(1:nR,1:nC);
xx = ceil(xx/n2average); %# makes xx 1 1 1 1 2 2 2 2 etc
matrix_avg = accumarray([xx(:),yy(:)],M(:),[],@mean);
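As a small check of the claim about row counts that are not divisible by the group size, the sketch below applies the same accumarray idea to a 5-row matrix with groups of 2 rows. The sizes and the variable name grp are chosen only for illustration. Note that the group index, not the raw row index, must be passed to accumarray as the row subscript.

```matlab
M = reshape(1:10, 5, 2);        % column 1 holds 1..5, column 2 holds 6..10
n2average = 2;                  % average every 2 rows; the last group has only 1 row

[nR, nC] = size(M);
[xx, yy] = ndgrid(1:nR, 1:nC);
grp = ceil(xx / n2average);     % row groups: 1 1 2 2 3 in every column

% Elements sharing the same (group, column) subscript are averaged together
matrix_avg = accumarray([grp(:), yy(:)], M(:), [], @mean);

disp(matrix_avg)
```

The last output row is simply row 5 of M, since it is the only member of its group.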

Concatenation of N^2 3x3 matrices into a 3Nx3N matrix

I have N^2 matrices.
Each one is a 3x3 matrix.
One way to concatenate them into a 3Nx3N matrix is to write
A(:,:,i)= # 3x3 matrix i=1:N^2
B=[A11 A12 ..A1N;A21 ...A2N;...]
But when N is large this is tedious work.
What do you suggest?
Here's a really fast one-liner that only uses RESHAPE and PERMUTE:
B = reshape(permute(reshape(A,3,3*N,N),[2 1 3]),3*N,3*N).';
And a test:
>> N=2;
>> A = rand(3,3,N^2)
A(:,:,1) =
0.5909 0.6571 0.8082
0.7118 0.6090 0.7183
0.4694 0.9588 0.5582
A(:,:,2) =
0.1791 0.6844 0.6286
0.4164 0.4140 0.5833
0.1380 0.1099 0.8970
A(:,:,3) =
0.2232 0.2355 0.1214
0.1782 0.6873 0.3394
0.5645 0.4745 0.9763
A(:,:,4) =
0.5334 0.7559 0.9984
0.8454 0.7618 0.1065
0.0549 0.5029 0.3226
>> B = reshape(permute(reshape(A,3,3*N,N),[2 1 3]),3*N,3*N).'
B =
0.5909 0.6571 0.8082 0.1791 0.6844 0.6286
0.7118 0.6090 0.7183 0.4164 0.4140 0.5833
0.4694 0.9588 0.5582 0.1380 0.1099 0.8970
0.2232 0.2355 0.1214 0.5334 0.7559 0.9984
0.1782 0.6873 0.3394 0.8454 0.7618 0.1065
0.5645 0.4745 0.9763 0.0549 0.5029 0.3226
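The printed test suggests that the blocks are laid out row by row, so that block (r,c) of B is A(:,:,(r-1)*N+c). The sketch below checks that layout programmatically for N = 3. It uses sequential integers instead of random numbers, an assumption made only so that any misplaced block would be obvious.

```matlab
N = 3;
A = reshape(1:9*N^2, 3, 3, N^2);    % distinct entries make mismatches easy to spot

% One-liner from the answer above
B = reshape(permute(reshape(A, 3, 3*N, N), [2 1 3]), 3*N, 3*N).';

% Compare every 3-by-3 block of B with the page of A expected at that position
ok = true;
for r = 1:N
    for c = 1:N
        blk = B(3*r-2:3*r, 3*c-2:3*c);
        ok = ok && isequal(blk, A(:, :, (r-1)*N + c));
    end
end
disp(ok)
```

The inner reshape places N consecutive pages side by side as one block row per page, and the permute, outer reshape and final transpose stack those block rows vertically.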
Try the following code:
N = 4;
A = rand(3,3,N^2); %# 3-by-3-by-N^2
c1 = squeeze( num2cell(A,[1 2]) );
c2 = cell(N,1);
for i=0:N-1
c2{i+1} = cat(2, c1{i*N+1:(i+1)*N});
end
B = cat(1, c2{:}); %# 3N-by-3N
Another possibility involving mat2cell and reshape
N = 2;
A = rand(3,3,N^2);
C = mat2cell(A,3,3,ones(N^2,1));
C = reshape(C,N,N)'; %'# make a N-by-N cell array and transpose
%# concatenate into a 3N-by-3N matrix
B = cell2mat(C);
Here's the same in one line if you like that better
B = cell2mat(reshape(mat2cell(A,3,3,ones(N^2,1)),N,N)');
For N=2
>> A = rand(3,3,N^2)
A(:,:,1) =
0.40181 0.12332 0.41727
0.075967 0.18391 0.049654
0.23992 0.23995 0.90272
A(:,:,2) =
0.94479 0.33772 0.1112
0.49086 0.90005 0.78025
0.48925 0.36925 0.38974
A(:,:,3) =
0.24169 0.13197 0.57521
0.40391 0.94205 0.05978
0.096455 0.95613 0.23478
A(:,:,4) =
0.35316 0.043024 0.73172
0.82119 0.16899 0.64775
0.015403 0.64912 0.45092
B =
0.40181 0.12332 0.41727 0.94479 0.33772 0.1112
0.075967 0.18391 0.049654 0.49086 0.90005 0.78025
0.23992 0.23995 0.90272 0.48925 0.36925 0.38974
0.24169 0.13197 0.57521 0.35316 0.043024 0.73172
0.40391 0.94205 0.05978 0.82119 0.16899 0.64775
0.096455 0.95613 0.23478 0.015403 0.64912 0.45092
Why not do the old fashioned pre-allocate and loop? Should be pretty fast.
N = 4;
A = rand(3,3,N^2); % Assuming column major order for Aij
B = zeros(3*N, 3*N);
for j = 1:N^2
ix = mod(j-1, N)*3 + 1;
iy = floor((j-1)/N)*3 + 1;
fprintf('%02d - %02d\n', ix, iy);
B(ix:ix+2, iy:iy+2) = A(:,:,j);
end
EDIT: For the speed junkies out there, here are the rankings:
N = 200;
A = rand(3,3,N^2); % test set
@gnovice solution: Elapsed time is 0.013069 seconds.
@Amro solution: Elapsed time is 0.203308 seconds.
@Rich C solution: Elapsed time is 0.887077 seconds.
@Jonas solution: Elapsed time is 7.065174 seconds.