"symmetrical" rows detection in matlab - matlab

I have an integer matrix A (nA x c) with an even number of columns (i.e. mod(c,2) == 0) and unique rows.
How can I efficiently (optimized for both speed and memory) implement a function symmetricRows that finds the "symmetrical" rows of matrix A, iA1 and iA2, where "symmetric" rows iA1 and iA2 are defined as:
all(A(iA1,1:end/2) == A(iA2,end/2+1:end) & A(iA1,end/2+1:end) == A(iA2,1:end/2),2) = true
Example:
A = [1 1 1 1;
2 2 2 2;
1 2 3 4;
4 3 2 1;
2 2 3 3;
3 4 1 2;
3 3 2 2]
[iA1, iA2] = symmetricRows(A)
iA1 =
1
2
3
5
iA2 =
1
2
6
7
Typical size of matrices A: nA ~ 1e4 to 1e6, c ~ 60 to 120
The problem is motivated by the pre-processing of a large dataset, where "symmetrical" rows are irrelevant from the point of view of a user-defined distance metric.
Example 2: to prepare a larger test data set, it is possible to use the allcomb function and then, for example:
N = 10;
A = allcomb([1:N],[1:N],[1:N],[1:N]);
iA = symmetricRows(A)

If you have the Statistics Toolbox:
d = ~pdist2(A(:,1:end/2), A(:,end/2+1:end));
[iA1, iA2] = find(triu(d & d.'));

You could do this with implicit expansion to create a 3D matrix of comparisons, if you have enough memory.
AL = A(:,1:end/2);
AR = A(:,end/2+1:end);
% The second reshape dimension is the half-width c/2 (2 in the example above)
AcompLR = squeeze( all( AL == reshape( AR.', 1, size(AR,2), [] ), 2 ) );
AcompRL = squeeze( all( reshape( AL.', 1, size(AL,2), [] ) == AR, 2 ) );
[iA(:,1), iA(:,2)] = find( AcompLR & AcompRL );
iA = unique( sort(iA,2), 'rows' );
This returns iA, where column 1 is your iA1 and column 2 is your iA2.
Note that I needed the unique to avoid reversed matches, i.e. [5,7]/[7,5].
I've not done any benchmarking, but this might be quicker than looping as it is all done in single operations. We could instead be clever about the indexing and do only the necessary comparisons; this would save memory and a call to unique:
% Create row indices to cover all combinations of rows
rIdx = arrayfun( @(x) [ones(x,1)*x,(1:x).'], 1:size(A,1), 'uni', 0 );
rIdx = vertcat( rIdx{:} );
% Logical indexing comparisons
iA = rIdx( all( A( rIdx(:,1), 1:end/2 ) == A( rIdx(:,2), end/2+1:end ), 2 ) & ...
all( A( rIdx(:,2), 1:end/2 ) == A( rIdx(:,1), end/2+1:end ), 2 ), : );
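Both approaches above compare every pair of rows, which gets heavy for nA around 1e6. As a rough sketch of an alternative (not benchmarked, and not taken from the answers above): swap the two halves of every row and look the swapped rows up in A with ismember(...,'rows'). This relies on the rows of A being unique, as stated in the question; the names h, Aswap, tf, loc and keep are only illustrative.

```matlab
A = [1 1 1 1; 2 2 2 2; 1 2 3 4; 4 3 2 1; 2 2 3 3; 3 4 1 2; 3 3 2 2];

h = size(A,2)/2;                          % half-width of a row
Aswap = [A(:,h+1:end), A(:,1:h)];         % every row with its halves swapped
[tf, loc] = ismember(Aswap, A, 'rows');   % swapped row i equals row loc(i) of A
iA1 = find(tf);
iA2 = loc(tf);
keep = iA1 <= iA2;                        % drop reversed duplicates such as [6,3]
iA1 = iA1(keep);                          % [1;2;3;5] for the example
iA2 = iA2(keep);                          % [1;2;6;7] for the example
```

Memory use here stays proportional to the size of A, since no nA-by-nA comparison matrix is built.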

Related

Merge array rows based on the first digit in a column

I have two arrays, A and B. The first digit of each row is the serial number.
How do I combine A and B to have an array C, such that all rows in A with the same serial number in B are concatenated horizontally?
A = [ 12345;
47542;
32673;
65436;
75343;
23496;
54765 ]
B = [ 23566;
33425;
65438;
75354 ]
y = ismember(A(:,1), B(:,1), 'rows');
t=find(y);
C= [A(t,1:12),B(t,1:12)];
I need C to be:
C = [ 12345, 00000;
23496, 23566;
32673, 33425;
47542, 00000;
54765, 00000;
65436, 00000;
75343, 75354]
My approach would be the following: extract the leading digits of both arrays and compare those:
a=num2str(A)-'0';
b=num2str(B)-'0';
[ida,idb]=ismember(a(:,1),b(:,1));
Now get the sorting index of A:
[~,ids]=sort(a(:,1));
Create the output array:
C=zeros(size(A,1),2);
Finally, assign and sort the output:
C(:,1)=A;
C(ida,2)=B(idb(idb>0));
%sort result
C=C(ids,:)
If it's only the first digit, we only need to check if the first digit (i.e. floor(A/1e4)) matches 0 to 9, and index accordingly...
% Add some zeros at the front to make indexing work with the unmatched ismember outputs
Az = [0; A]; Bz = [0; B];
% Find the indices for 0 to 9 within the first digits of A and B
[~,ia] = ismember( 0:9, floor( A/1e4 ) );
[~,ib] = ismember( 0:9, floor( B/1e4 ) );
% Assign to C and discard unmatched rows
C = [Az(ia+1), Bz(ib+1)];
C( all( C==0, 2 ), : ) = [];
Note that keeping things numeric with the floor operation should always be preferable to flipping between numeric and character data with things like num2str...
Edit
You changed the scope of the question by commenting with new data. Here is the same method, written to be more generic so that it handles A and B with more columns and IDs of different magnitudes:
% Add some zeros at the front to make indexing work with the unmatched ismember outputs
Az = [zeros(1,size(A,2)); A]; Bz = [zeros(1,size(B,2)); B];
% Function for getting first digit
f = @(x) floor(x./(10.^floor(log10(x))));
% Find the indices for 0 to 9 within the first digits of A and B
[~,ia] = ismember( 0:9, f(A(:,1)) );
[~,ib] = ismember( 0:9, f(B(:,1)) );
% Assign to C and discard unmatched rows
C = [Az(ia+1,:), Bz(ib+1,:)];
C( all( C==0, 2 ), : ) = [];
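To make the first-digit helper f from the generic version a bit more concrete, here is a small sketch with made-up IDs of different magnitudes (the values in ids are hypothetical, not from the question). It assumes strictly positive inputs: for x = 0, log10 returns -Inf and the expression evaluates to NaN.

```matlab
% Leading decimal digit of a positive integer (same helper as above)
f = @(x) floor(x ./ (10.^floor(log10(x))));

ids = [12345; 475; 9; 23566];   % hypothetical IDs of different magnitudes
d = f(ids);                     % d is [1; 4; 9; 2]
```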
First of all, the whole script. At first glance, I couldn't find a solution without using loops.
A = [ 12345;
47542;
32673;
65436;
75343;
23496;
54765; ]
B = [ 23566;
33425;
65438;
75354; ]
A = sort(A); % Sort A and B.
B = sort(B);
A_str = int2str(A); % Convert integers to chars.
B_str = int2str(B);
A_sn = A_str(:, 1); % Extract first columns.
B_sn = B_str(:, 1); % Basically, these are the serial numbers.
C = zeros(size(A, 1), size(A, 2) * 2); % Initialize C.
C(:, 1) = A; % First column of C is just A.
for i = 1:length(A_sn) % For all serial numbers in A...
for j = 1:length(B_sn) % For all serial numbers in B...
if (A_sn(i) == B_sn(j)) % Check if serial number in B equals the serial number in A.
C(i, 2) = B(j); % If so, set i-th row in C to the corresponding value in B.
end
end
end
C
Results in:
A =
12345
47542
32673
65436
75343
23496
54765
B =
23566
33425
65438
75354
C =
12345 0
23496 23566
32673 33425
47542 0
54765 0
65436 65438
75343 75354

filling sparse matrices efficiently matlab

I am working with a sparse matrix of very large size:
U = sparse(a,b) % a and b are very large
On the other hand, there is a cell array Ind which has 'a' rows. Each row holds a variable number of elements, e.g.:
Ind{1} = [1 3 5 19 1000 1340]
Ind{2} = [9 100 1500 1600 8000 b]
...
Ind{a} = [3 5 6 90 1000 4300 5712 9480]
As can be seen, the maximum index in Ind{i} can be 'b'. For these index vectors there also exists a content vector 'c':
c = [2 3 1 6 3 5 1 3 4 1 2 ... 5]
Here is the question: for each element in Ind{i}, I want to fill row i and columns Ind{i} of U with c(Ind{i}), i.e.
for i = 1 : a
U(i,Ind{i}) = c(Ind{i}) ;
end
The problem is that 'a' is very large and the loop takes a long time to run. Any idea how to avoid looping?
I'm not sure if there is a way to avoid the loop, but I do get a factor of 2-to-20 speed increase (I ranged a from 3 to 5,000 with b fixed at 10,000) by building three large vectors (two for row and column indices and one for values) and building the sparse matrix after the loop:
strides = cellfun(@numel,Ind);
n = sum(strides);
I(n,1) = 0;
J(n,1) = 0;
S(n,1) = 0;
bot = 1;
for k = 1:a
top = bot + strides(k) - 1 ;
mask = bot:top ;
%
I(mask) = k ;
J(mask) = Ind{k} ;
S(mask) = c(Ind{k}) ;
%
bot = top + 1;
end
U = sparse(I,J,S,a,b);
This is the recommended usage of sparse, because assignments to a sparse matrix are more costly than assignments to regular arrays.
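If repelem is available (newer MATLAB releases), the remaining loop can probably be dropped as well. The following is only a sketch under a few assumptions: every Ind{k} is a row vector, no column index is repeated within one Ind{k} (sparse sums duplicate entries instead of overwriting them), and the small a, b, c and Ind values are made-up stand-ins for the real data.

```matlab
% Small stand-in data (hypothetical values, same layout as in the question)
a = 3; b = 8;
c = [2 3 1 6 3 5 1 3];               % one value per column index
Ind = {[1 3 5], [2 8], [3 5 6 7]};   % row vectors of column indices

strides = cellfun(@numel, Ind);      % number of entries per row
I = repelem(1:a, strides(:).');      % row index, repeated once per entry
J = [Ind{:}];                        % all column indices, concatenated
S = c(J);                            % matching values
U = sparse(I, J, S, a, b);           % build the sparse matrix in one call
```

Whether this beats the preallocated loop for very large 'a' would need to be timed on the real data.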

How to find the mapping after permutation of a 2-d matrix in Matlab

I have two 2-dimensional matrices A and B, where B is produced by a (row-wise) permutation of A. There are a few repeated rows in A (and so in B). I want to find the mapping that produced B. I am using Matlab. A single solution is sufficient for me.
Example:
A = [ 2 3 4; 4 5 6; 2 3 4];
B = [ 4 5 6; 2 3 4; 2 3 4];
The mapping would be:
p = [3 1 2] // I want this mapping, however the solution p= [2 1 3] is also correct and acceptable
where A = B(p,:) in Matlab. // EDITED
Regards
Low-hanging fruit first.
Suppose there are no duplicate rows:
% compute the permutation matrix
P = all( bsxfun( @eq, permute( A, [1 3 2]),permute(B,[3 1 2]) ), 3 );
[~, p] = max(P, [], 2 ); % gives you what you want
If there are duplicates, we need to "break ties" in the rows/columns of P:
n = size(A,1);
bt = abs( bsxfun(@minus, 1:n, (1:n)' ) )/n; %//'
[~, p] = max( P+bt, [], 2 );
Since we know that A and B always have the same rows, let's look for a transformation that will convert each one to a common identical representation. How about sort?
[As, Ai] = sortrows(A);
[Bs, Bi] = sortrows(B);
Now A(Ai,:) == B(Bi,:), so all we have to do is find the indices for Bi that match Ai. Bi is a forward mapping, Ai is a reverse mapping. So:
p = zeros(size(A,1),1);
p(Ai) = Bi;
(Answer edited to match edit of problem statement)
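As a quick sanity check of the sortrows idea, here is a sketch that runs it on the example from the question and verifies that the resulting p satisfies A = B(p,:). The variable ok is just an illustrative name.

```matlab
A = [2 3 4; 4 5 6; 2 3 4];
B = [4 5 6; 2 3 4; 2 3 4];

[~, Ai] = sortrows(A);     % A(Ai,:) is A in sorted row order
[~, Bi] = sortrows(B);     % B(Bi,:) is the same sorted matrix
p = zeros(size(A,1), 1);
p(Ai) = Bi;                % row Ai(k) of A is row Bi(k) of B

ok = isequal(A, B(p,:));   % true when p reproduces A from B
```

Because identical rows end up next to each other after sorting, duplicates need no special tie-breaking here.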
Here is a solution using sort() to get around the problem of needing to generate all permutations.
The idea is to sort both A and B, which will produce the same sorted matrix. The permutation can now be found by using the indices IA and IB that produce the two sorted matrices.
A = [ 2 3 4; 4 5 6; 2 3 4];
B = [ 4 5 6; 2 3 4; 2 3 4];
% Note: sort(...,1) orders each column independently and only the first
% column's order is used below, so this relies on the first column alone
% being enough to order the rows
[CA,IA]=sort(A,1)
[CB,IB]=sort(B,1)
idxA = IA(:,1)
idxB = IB(:,1)
[~, idxB_inverse] = sort(idxB)
idxA(idxB_inverse) % mapping q such that B = A(q,:)

Intersection indices by rows

Given these two matrices:
m1 = [ 1 1;
2 2;
3 3;
4 4;
5 5 ];
m2 = [ 4 2;
1 1;
4 4;
7 5 ];
I'm looking for a function, such as:
indices = GetIntersectionIndecies (m1,m2);
whose output would be
indices =
1
0
0
1
0
How can I find the intersection indices of rows between these two matrices without using a loop?
One possible solution:
function [Index] = GetIntersectionIndicies(m1, m2)
[~, I1] = intersect(m1, m2, 'rows');
Index = zeros(size(m1, 1), 1);
Index(I1) = 1;
By the way, I love the inventive solution of @Shai, and it is much faster than my solution if your matrices are small. But if your matrices are large, then my solution will dominate. This is because if we set T = size(m1, 1), then the tmp variable in the answer of @Shai will be T-by-T, i.e. a very large matrix if T is large. Here's some code for a quick speed test:
%# Set parameters
T = 1000;
M = 10;
%# Build test matrices
m1 = randi(5, T, 2);
m2 = randi(5, T, 2);
%# My solution
tic
for m = 1:M
[~, I1] = intersect(m1, m2, 'rows');
Index = zeros(size(m1, 1), 1);
Index(I1) = 1;
end
toc
%# @Shai solution
tic
for m = 1:M
tmp = bsxfun( @eq, permute( m1, [ 1 3 2 ] ), permute( m2, [ 3 1 2 ] ) );
tmp = all( tmp, 3 ); % tmp(i,j) is true iff m1(i,:) == m2(j,:)
indices = any( tmp, 2 );
end
toc
Set T = 10 and M = 1000, and we get:
Elapsed time is 0.404726 seconds. %# My solution
Elapsed time is 0.017669 seconds. %# @Shai solution
But set T = 1000 and M = 100 and we get:
Elapsed time is 0.068831 seconds. %# My solution
Elapsed time is 0.508370 seconds. %# @Shai solution
How about using bsxfun?
function indices = GetIntersectionIndecies( m1, m2 )
tmp = bsxfun( @eq, permute( m1, [ 1 3 2 ] ), permute( m2, [ 3 1 2 ] ) );
tmp = all( tmp, 3 ); % tmp(i,j) is true iff m1(i,:) == m2(j,:)
indices = any( tmp, 2 );
end
Cheers!
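For completeness, a shorter route that neither answer uses: ismember with the 'rows' option returns this kind of mask directly, as a logical column vector. A minimal sketch on the matrices from the question:

```matlab
m1 = [1 1; 2 2; 3 3; 4 4; 5 5];
m2 = [4 2; 1 1; 4 4; 7 5];

% true where a row of m1 also appears as a row of m2
indices = ismember(m1, m2, 'rows');   % logical column [1;0;0;1;0]
```

If a numeric 0/1 vector is needed rather than a logical one, double(indices) converts it.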

how to repeat element matrix in matlab

How can I repeat each element of
A = [ 1 2 ;
3 4 ]
the number of times given by the corresponding element of
B = [ 1 2 ;
2 1 ]
So I want my answer to look like matrix C:
C = [ 1 2 2;
3 3 4 ]
Thanks for your help.
Just for the fun of it, another solution making use of arrayfun:
res = cell2mat(arrayfun(@(a,b) ones(b,1).*a, A', B', 'uniformoutput', false))'
This results in:
res =
1 2 2
3 3 4
To make this simple, I assume that you're only going to add more columns, and that you've checked that you have the same number of columns for each row.
Then it becomes a simple combination of repeating elements and reshaping.
EDIT I've modified the code so that it also works if A and B are 3D arrays.
%# get the number of rows from A, transpose both
%# A and B so that linear indexing works
[nRowsA,~,nValsA] = size(A);
A = permute(A,[2 1 3]);
B = permute(B,[2 1 3]);
%# create an index vector from B
%# so that we know what to repeat
nRep = sum(B(:));
repIdx = zeros(1,nRep);
repIdxIdx = cumsum([1 B(1:end-1)]);
repIdx(repIdxIdx) = 1;
repIdx = cumsum(repIdx);
%# assemble the array C
C = A(repIdx);
C = permute(reshape(C,[],nRowsA,nValsA),[2 1 3]);
C =
1 2 2
3 3 4
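In newer MATLAB releases (R2015a onward, as far as I know) repelem can do the repetition directly. A minimal sketch for the 2D case, assuming as above that every row of B sums to the same total so that the result is rectangular:

```matlab
A = [1 2; 3 4];
B = [1 2; 2 1];

% Walk through A row by row, repeating each element B(i,j) times
v = repelem(reshape(A.', 1, []), reshape(B.', 1, []));   % [1 2 2 3 3 4]
C = reshape(v, [], size(A,1)).';                          % [1 2 2; 3 3 4]
```

The transposes are only there because MATLAB stores matrices column by column, while the repetition has to proceed along the rows.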