Efficiently find unique triplets of three char vectors in MATLAB

Given three char arrays with, say, size(a) = [N,80], size(b) = [N,100], size(c) = [N,10].
When N = 5, a, b and c each look something like:
ans =
5×80 char array
'efawefref'
'Afreafraef'
'afeafefaef'
'afeafeaffa'
'afeafaefae'
I want to find the unique entries (not combinations), that is, the unique rows of x = [a, b, c].
Of course I can do unique([a, b, c], 'rows'), but this is terribly slow for this data, where N ~ 1e7.
Example,
a = ['timon ';
'simba ';
'nala  ';
'timon ';
'mufasa'];
b = ['boar   ';
'lion   ';
'lionese';
'boar   ';
'lion   '];
c = ['chubby';
'small ';
'fat   ';
'chubby';
'fit   '];
unique([a,b,c],'rows')
ans =
4×19 char array
'mufasalion   fit   '
'nala  lionesefat   '
'simba lion   small '
'timon boar   chubby'
size(unique([a,b,c],'rows'),1)
ans =
4
Is there a smarter way to do this?
EDIT: results from answers
For entries of these sizes,
>> size(a)
ans =
11724952 76
>> size(b)
ans =
11724952 64
>> size(c)
ans =
11724952 6
Results
#myradio
>> tic, size(unique(horzcat(a,b,c),'rows')), toc
ans =
1038303 146
Elapsed time is 74.402044 seconds.
#gnovice 1
>> tic, size(unique(cellstr([a b c]))), toc
ans =
1038303 1
Elapsed time is 77.044463 seconds.
#gnovice 2
>> tic, map = containers.Map(cellstr([a b c]), ones(length(a), 1)); size(map.keys.'), toc
ans =
1038303 1
Elapsed time is 58.732947 seconds.
#Wolfie
>> tic, size(unique( [categorical(cellstr(a)),categorical(cellstr(b)),categorical(cellstr(c))], 'rows' )), toc
ans =
1038303 3
Elapsed time is 189.517131 seconds.
#obchardon
>> tic, x = primes(2000); a1 = prod(x(a+0),2); b1 = prod(x(b+0),2); c1 = prod(x(c+0),2); size(unique([a1,b1,c1],'rows')), toc
ans =
1038258 3
Elapsed time is 46.889431 seconds.
I am puzzled by this last one: I tried it with other examples and it always gives a slightly lower count.

To mimic the larger set of data in the question, I created the following randomized character arrays using randi:
a = char(randi([65 90], [100 76])); % Generate 100 76-character arrays
a = a(randi([1 100], [11724952 1]), :); % Replicate rows: 11724952-by-76 result
b = char(randi([65 90], [100 64])); % Generate 100 64-character arrays
b = b(randi([1 100], [11724952 1]), :); % Replicate rows: 11724952-by-64 result
c = char(randi([65 90], [100 6])); % Generate 100 6-character arrays
c = c(randi([1 100], [11724952 1]), :); % Replicate rows: 11724952-by-6 result
With up to 100 unique strings in each of a, b, and c, this will yield close to 1,000,000 unique combinations when concatenated.
I then tested 3 solutions: the original using unique, a variant that converts the character array to a cell array of strings using cellstr to avoid using the 'rows' argument, and one using a containers.Map object. The last one feeds the strings as keys to the containers.Map class (with dummy associated values) and lets it create a map that will have only the unique strings as its keys, which you can then extract.
Since these tests took a minimum of 1 minute to run, it wasn't feasible to use the more accurate timing routine timeit (which runs the function many times over to get an average measurement). I therefore used tic/toc. Here are some typical results using version R2018a:
>> clear d
>> tic; d = unique(horzcat(a, b, c), 'rows'); toc
Elapsed time is 726.324408 seconds.
>> clear d
>> tic; d = unique(cellstr([a b c])); toc
Elapsed time is 99.312927 seconds.
>> clear d
>> tic; map = containers.Map(cellstr([a b c]), ones(size(a, 1), 1)); d = map.keys.'; toc
Elapsed time is 89.853430 seconds.
The two faster solutions typically averaged around the same, with the containers.Map being slightly faster on average. They are both much faster than using unique with the 'rows' argument, although this is in disagreement with the results in the post using version R2018b. Maybe unique had significant updates in the newer version, or maybe the specific content of the character arrays matters greatly (e.g. whether all strings repeat with roughly equal frequency, if the arrays are sorted versus unsorted, etc.).
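To make the containers.Map idea concrete at a small scale, here is a minimal sketch on the five-row example from the question (padded so each array has a fixed width). The variable names rows and uniq are only illustrative, and the sketch relies on the map keeping a single entry per distinct key, as the timing code above does.

```matlab
% Five-row example from the question, padded to fixed widths
a = ['timon '; 'simba '; 'nala  '; 'timon '; 'mufasa'];
b = ['boar   '; 'lion   '; 'lionese'; 'boar   '; 'lion   '];
c = ['chubby'; 'small '; 'fat   '; 'chubby'; 'fit   '];

% One string per concatenated row (cellstr drops trailing whitespace)
rows = cellstr([a b c]);

% Duplicate keys collapse into a single map entry; the values are dummies
map = containers.Map(rows, ones(numel(rows), 1));
uniq = map.keys.';          % column cell array holding the distinct strings

disp(map.Count)             % number of distinct rows
```

Because the keys come back as a cell array of strings rather than a char matrix, a final char(uniq) would be needed if the fixed-width layout matters downstream.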

Categorical arrays are often quicker for this sort of thing, as they are roughly treated as ordinals internally.
% Set up your example
a = ['timon '; 'simba '; 'nala  '; 'timon '; 'mufasa'];
b = ['boar   '; 'lion   '; 'lionese'; 'boar   '; 'lion   '];
c = ['chubby'; 'small '; 'fat   '; 'chubby'; 'fit   '];
% Join into one categorical array (one column per input)
k = [categorical(cellstr(a)),categorical(cellstr(b)),categorical(cellstr(c))];
% Get unique rows
u = unique( k, 'rows' );
We can make the categorical(cellstr(...)) look a bit cleaner if operating on lots of variables by using an anonymous function:
cc = @(x) categorical(cellstr(x));
u = unique( [cc(a), cc(b), cc(c)], 'rows' );
Edit: I'm not sure this actually shows a speed-up. The categorical call is really slow for large char arrays; my test was rubbish.

I don't know whether unique works faster with integers. If it does, we could use this code to speed up the operation:
% get at least ~200 prime numbers
x = primes(2000);
% a product of primes factors uniquely (fundamental theorem of arithmetic)
a1 = prod(x(a+0),2);
b1 = prod(x(b+0),2);
c1 = prod(x(c+0),2);
% now apply unique to integers instead of chars
[~,ind] = unique([a1,b1,c1],'rows')
% get the unique sentences
r = [a(ind,:),b(ind,:),c(ind,:)]
Of course, if the rows are too long the product of primes will overflow to Inf.
EDIT:
As pointed out by @gnovice, my hashing function is far from injective: the product ignores character order, so anagrams collide.
So we can use another "hashing" function:
% each sentence is converted to a single number
x = ([a,b,c]+0)*(10.^(0:18)).'
% get index
[~,ind] = unique(x)
% unique sentences:
r = [a(ind,:),b(ind,:),c(ind,:)]
This version takes character order into account, but the sentences should again be shorter than ~110 characters. Because character codes exceed 9 and doubles hold only about 16 significant digits, collisions are still possible.
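If the goal is to hand unique integers instead of characters without risking collisions, one option (not part of the answer above, so treat it as a sketch under that assumption) is to let unique itself number the distinct rows of each array through its third output, and then deduplicate the resulting N-by-3 index matrix. The names ia, ib and ic are illustrative. Whether this is faster on the full data set is untested here.

```matlab
% Five-row example from the question, padded to fixed widths
a = ['timon '; 'simba '; 'nala  '; 'timon '; 'mufasa'];
b = ['boar   '; 'lion   '; 'lionese'; 'boar   '; 'lion   '];
c = ['chubby'; 'small '; 'fat   '; 'chubby'; 'fit   '];

% Third output of unique: for every row, the index of its distinct value
[~, ~, ia] = unique(a, 'rows');
[~, ~, ib] = unique(b, 'rows');
[~, ~, ic] = unique(c, 'rows');

% Deduplicate the integer triplets instead of the wide char rows
[~, ind] = unique([ia, ib, ic], 'rows');

% Recover the unique sentences from one representative row each
r = [a(ind,:), b(ind,:), c(ind,:)];
disp(r)
```

Each index column is an exact label for a distinct string, so two rows share a triplet only if all three strings match.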

Related

MATLAB: Block matrix multiplying without loops

I have a block matrix [A B C...] and a matrix D (all 2-dimensional). D has dimensions y-by-y, and A, B, C, etc are each z-by-y. Basically, what I want to compute is the matrix [D*(A'); D*(B'); D*(C');...], where X' refers to the transpose of X. However, I want to accomplish this without loops for speed considerations.
I have been playing with the reshape command for several hours now, and I know how to use it in other cases, but this use case is different from the other ones and I cannot figure it out. I also would like to avoid using multi-dimensional matrices if at all possible.
Honestly, a loop is probably the best way to do it. In my image-processing work I found a well-written loop that takes advantage of Matlab's JIT compiler is often faster than all the extra overhead of manipulating the data to be able to use a vectorised operation. A loop like this:
[m n] = size(A);
T = zeros(m, n);
AT = A';
for ii=1:m:n
T(:, ii:ii+m-1) = D * AT(ii:ii+m-1, :);
end
contains only built-in operators and the bare minimum of copying and, given the JIT, is going to be hard to beat. Even if you want to factor in interpreter overhead it's still only a single statement with no functions to consider.
The "loop-free" version with extra faffing around and memory copying, is to split the matrix and iterate over the blocks with a hidden loop:
blksize = size(D, 1);
blkcnt = size(A, 2) / blksize;
blocks = mat2cell(A, blksize, repmat(blksize,1,blkcnt));
blocks = cellfun(@(x) D*x', blocks, 'UniformOutput', false);
T = cell2mat(blocks);
Of course, if you have access to the Image Processing Toolbox, you can also cheat horribly:
T = blockproc(A, size(D), @(x) D*x.data');
Prospective approach & Solution Code
Given:
M is the block matrix [A B C...], where each A, B, C etc. are of size z x y. Let the number of such matrices be num_mat for easy reference later on.
If those matrices are concatenated along the columns, then M would be of size z x num_mat*y.
D is the matrix to be multiplied with each of those matrices A, B, C etc. and is of size y x y.
Now, as stated in the problem, the output you are after is [D*(A'); D*(B'); D*(C');...], i.e. the multiplication results being concatenated along the rows.
If you are okay with those multiplication results to be concatenated along the columns instead i.e. [D*(A') D*(B') D*(C') ...],
you can achieve the same with some reshaping and then performing the
matrix multiplications for the entire M with D and thus have a vectorized no-loop approach. Thus, to get such a matrix multiplication result, you can do -
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
But, if you HAVE to get an output with the multiplication results being concatenated along the rows, you need to do some more reshaping like so -
out = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
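As a quick sanity check of the two reshape/permute steps above, the following sketch compares them with a plain loop on small random inputs. The sizes and the names ref and blk are arbitrary choices for illustration; z and y are deliberately different so that a mixed-up dimension would show.

```matlab
% Small, non-square blocks so that dimension mix-ups are visible
z = 3; y = 2; num_mat = 4;
M = rand(z, num_mat*y);      % [A B C ...], each block z-by-y
D = rand(y, y);

% Vectorized: results side by side, then restacked along the rows
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
out   = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);

% Reference: explicit loop building [D*A'; D*B'; ...]
ref = zeros(y*num_mat, z);
for k = 1:num_mat
    blk = M(:, (k-1)*y+1 : k*y);            % k-th z-by-y block
    ref((k-1)*y+1 : k*y, :) = D*blk.';
end

disp(max(abs(out(:) - ref(:))))             % should be at round-off level
```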
Benchmarking
This section covers benchmarking codes comparing the proposed vectorized approach against a naive JIT powered loopy approach to get the desired output. As discussed earlier, depending on how the output array must hold the multiplication results, you can have two cases.
Case I: Multiplication results concatenated along the columns
%// Define size parameters and then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(z,y*num_mat);
for k1 = 1:y:y*num_mat
out1(:,k1:k1+y-1) = D*M(:,k1:k1+y-1).'; %//'
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
toc
Case II: Multiplication results concatenated along the rows
%// Define size parameters and then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(y*num_mat,z);
for k1 = 1:y:y*num_mat
out1(k1:k1+y-1,:) = D*M(:,k1:k1+y-1).'; %//'
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
out2 = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
toc
Runtimes
Case I:
---------------------------- With loopy approach
Elapsed time is 3.889852 seconds.
---------------------------- With proposed approach
Elapsed time is 3.051376 seconds.
Case II:
---------------------------- With loopy approach
Elapsed time is 3.798058 seconds.
---------------------------- With proposed approach
Elapsed time is 3.292559 seconds.
Conclusions
The runtimes suggest a speedup of roughly 15% to 27% with the proposed vectorized approach. So, hopefully this works out for you!
If you want to get A, B, and C from a bigger matrix you can do this, assuming the bigger matrix is called X:
A = X(:,1:y)
B = X(:,y+1:2*y)
C = X(:,2*y+1:3*y)
If there are N such matrices, the best way is to use reshape, where x is the number of rows of X:
F = reshape(X, x,y,N)
Then use a loop to generate a new matrix, which I call F1:
F1=[];
for n=1:N
F1 = [F1 F(:,:,n)'];
end
Then compute F2 as:
F2 = D*F1;
and finally get your result as:
R = reshape(permute(reshape(F2,y,x,N),[1 3 2]),N*y,x)
Note: this for loop should not slow you down much, as it only reformats the matrix; the multiplication is done in matrix form.

Using find with a struct

I have a struct that holds thousands of samples of data. Each data point contains multiple objects. For example:
Structure(1).a = 7
Structure(1).b = 3
Structure(2).a = 2
Structure(2).b = 6
Structure(3).a = 1
Structure(3).b = 6
...
... (thousands more)
...
Structure(2345).a = 4
Structure(2345).b = 9
... and so on.
If I wanted to find the index number of all the '.b' objects containing the number 6, I would have expected the following function would do the trick:
find(Structure.b == 6)
... and I would expect the answer to contain '2' and '3' (for the input shown above).
However, this doesn't work. What is the correct syntax and/or could I be arranging my data in a more logical way in the first place?
The syntax Structure.b for an array of structs gives you a comma-separated list, so you'll have to concatenate them all (for instance, using brackets []) in order to obtain a vector:
find([Structure.b] == 6)
For the input shown above, the result is as expected:
ans =
2 3
As Jonas noted, this would work only if there are no fields containing empty matrices, because empty matrices will not be reflected in the concatenation result.
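The effect of an empty field is easy to see on a tiny struct array. This is a minimal sketch; the variable names naive and safe are illustrative only.

```matlab
% Four elements, the third one has an empty field
S = struct('b', {3, 6, [], 6});

% Concatenation silently drops the empty, so later indices shift by one
naive = find([S.b] == 6);

% Testing each element separately keeps the original positions
safe = find(cellfun(@(x) isequal(x, 6), {S.b}));

disp(naive)
disp(safe)
```

Here naive reports positions in the shortened vector [3 6 6], while safe reports positions in S itself.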
Handling structs with empty fields
If you suspect that these fields may contain empty matrices, either convert them to NaNs (if possible...) or consider using one of the safer solutions suggested by Rody.
In addition, I've thought of another interesting workaround for this using strings. We can concatenate everything into a delimited string to keep the information about empty fields, and then tokenize it back (this, in my humble opinion, is easier to be done in MATLAB than handle numerical values stored in cells).
Inspired by Jonas' comment, we can convert empty fields to NaNs like so:
str = sprintf('%f,', Structure.b)
B = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN)
and this allows you to apply find on the contents of B:
find(B{:} == 6)
ans =
2
3
Building on EitanT's answer with Jonas' comment, a safer way could be
>> S(1).a = 7;
S(1).b = 3;
S(2).a = 2;
S(2).b = 6;
S(3).a = 1;
S(3).b = [];
S(4).a = 1;
S(4).b = 6;
>> find( cellfun(@(x)isequal(x,6),{S.b}) )
ans =
2 4
It's probably not very fast though (compared to EitanT's version), so only use this when needed.
Another answer to this question! This time, we'll compare the performance of the following 5 methods:
My original method
EitanT's original method (which does not handle empties)
EitanT's improved method using strings
A new method: a simple for-loop
Another new method: a vectorized, empty-safe version
Test code:
% Set up test
N = 1e5;
S(N).b = [];
for ii = 1:N
S(ii).b = randi(6);
end
% Rody Oldenhuis 1
tic
sol1 = find( cellfun(@(x)isequal(x,6),{S.b}) );
toc
% EitanT 1
tic
sol2 = find([S.b] == 6);
toc
% EitanT 2
tic
str = sprintf('%f,', S.b);
values = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN);
sol3 = find(values{:} == 6);
toc
% Rody Oldenhuis 2
tic
ids = false(N,1);
for ii = 1:N
ids(ii) = isequal(S(ii).b, 6);
end
sol4 = find(ids);
toc
% Rody Oldenhuis 3
tic
idx = false(size(S));
SS = {S.b};
inds = ~cellfun('isempty', SS);
idx(inds) = [SS{inds}]==6;
sol5 = find(idx);
toc
% make sure they are all equal
all(sol1(:)==sol2(:))
all(sol1(:)==sol3(:))
all(sol1(:)==sol4(:))
all(sol1(:)==sol5(:))
Results on my machine at work (AMD A6-3650 APU (4 cores), 4GB RAM, Windows 7 64 bit):
Elapsed time is 28.990076 seconds. % Rody Oldenhuis 1 (cellfun)
Elapsed time is 0.119165 seconds. % EitanT 1 (no empties)
Elapsed time is 22.430720 seconds. % EitanT 2 (string manipulation)
Elapsed time is 0.706631 seconds. % Rody Oldenhuis 2 (loop)
Elapsed time is 0.207165 seconds. % Rody Oldenhuis 3 (vectorized)
ans =
1
ans =
1
ans =
1
ans =
1
On my Homebox (AMD Phenom(tm) II X6 1100T (6 cores), 16GB RAM, Ubuntu64 12.10):
Elapsed time is 0.572098 seconds. % cellfun
Elapsed time is 0.119557 seconds. % no empties
Elapsed time is 0.220903 seconds. % string manipulation
Elapsed time is 0.107345 seconds. % loop
Elapsed time is 0.180842 seconds. % cellfun-with-string
Gotta love that JIT :)
and wow...anyone know why the two systems behave so differently?
Also, little known fact -- cellfun with one of the possible string arguments is incredibly fast (which goes to show how much overhead anonymous functions require...).
Still, if you can be absolutely sure there are no empties, go for EitanT's original answer; that's what Matlab is for. If you can't be sure, just go for the loop.

Finding the index of a specific value in a cell in MATLAB

I have a two dimensional cell where every element is either a) empty or b) a vector of varying length with values ranging from 0 to 2. I would like to get the indices of the cell elements where a certain value occurs or even better, the "complete" index of every occurrence of a certain value.
I'm currently working on an agent based model of disease spreading and this is done in order to find the positions of infected agents.
Thanks in advance.
Here's how I would do it:
% some example data
A = { [], [], [3 4 5]
[4 8 ], [], [0 2 3 0 1] };
p = 4; % value of interest
% Finding the indices:
% -------------------------
% use cellfun to find indices
I = cellfun(@(x) find(x==p), A, 'UniformOutput', false);
% check again for empties
% (just for consistency; you may skip this step)
I(cellfun('isempty', I)) = {[]};
Call this method1.
A loop is also possible:
I = cell(size(A));
for ii = 1:numel(I)
I{ii} = find(A{ii} == p);
end
I(cellfun('isempty',I)) = {[]};
Call this method2.
Comparing the two methods for speed like so:
tic; for ii = 1:1e3, [method1], end; toc
tic; for ii = 1:1e3, [method2], end; toc
gives
Elapsed time is 0.483969 seconds. % method1
Elapsed time is 0.047126 seconds. % method2
on Matlab R2010b/32bit w/ Intel Core i3-2310M @ 2.10GHz w/ Ubuntu 11.10/2.6.38-13. This is mostly due to JIT on loops (and how terribly cellfun and anonymous functions seem to be implemented, mumblemumble..)
Anyway, in short, use the loop: it's more readable, and an order of magnitude faster than the vectorized solution.
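The question also asks for the "complete" index of every occurrence. One way to get that from the loop, sketched here under the assumption that a three-column list of [row, column, position-in-vector] is an acceptable format, is to convert the loop counter back to cell subscripts with ind2sub. The name hits is illustrative.

```matlab
% Same example data as above
A = { [], [], [3 4 5]
      [4 8], [], [0 2 3 0 1] };
p = 4;                                  % value of interest

hits = zeros(0, 3);                     % rows of [cellRow, cellCol, position]
for ii = 1:numel(A)
    pos = find(A{ii} == p);             % positions inside this cell's vector
    [r, c] = ind2sub(size(A), ii);      % subscripts of the cell itself
    hits = [hits; repmat([r c], numel(pos), 1), pos(:)];
end

disp(hits)
```

Growing hits inside the loop is fine for a sketch; for large cells it would be worth preallocating or collecting the pieces in a cell array first.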

Use a vector to index a matrix without linear index

G'day,
I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB.
Usually, I would convert the subscript points to linear indices into the matrix (e.g. as in "Use a vector as an index to a matrix"). However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that share the same 1st and 2nd subscripts. Let me demonstrate with an example:
Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time)
Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(4,4,:,:) = 124;
Now, I want to be able to index with the subscripts (1,2), (3,4), etc. and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but also the contents at Matrix(:,:,1,2), Matrix(:,:,2,1), Matrix(:,:,2,2), and so on. (In real life, the dimensions of Matrix might be more like size(Matrix) = [300 250 30 200].)
I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like:
ans(time=1)
999 888 124
999 888 124
ans(time=2)
etc etc etc
etc etc etc
I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data.
I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind), since I want all the information stored in the extra dimensions 3 and 4 at the (i,j)th indices, I don't think a slightly faster version of sub2ind would cut it.
I can think of three ways to go about this
Simple loop
Just loop over all the 2D indices you have, and use colons to access the remaining dimensions:
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
Vectorized calculation of Linear indices
Skip sub2ind and vectorize the computation of linear indices:
% generalized for arbitrary dimensions of M
sz = size(M);
nd = ndims(M);
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
% the linear indices
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:nd-1)).';
Sub2ind
Just use the ready-made tool that ships with Matlab:
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
Speed
So which one's the fastest? Let's find out:
clc
M = nan(4,4,2,2);
sz = size(M);
nd = ndims(M);
twoDinds = [...
1 2
4 3
3 4
4 4
2 1];
tic
for ii = 1:1e3
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
end
toc
tic
twoDinds_prev = twoDinds;
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
M(inds) = rand;
end
toc
tic
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
M(inds) = rand;
end
toc
Results:
Elapsed time is 0.004778 seconds. % loop
Elapsed time is 0.807236 seconds. % vectorized linear inds
Elapsed time is 0.839970 seconds. % linear inds with sub2ind
Conclusion: use the loop.
Granted, the tests above are largely influenced by JIT's failure to compile the last two loops, and by their generality (the last two methods also work on N-D arrays, not just 4-D ones). Making a specialized version for 4-D will undoubtedly be much faster.
Nevertheless, the indexing with simple loop is, well, simplest to do, easiest on the eyes and very fast too, thanks to JIT.
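To read values out in the layout the question asks for (one column per [x y] point, one page per time step), the simple loop can be written as in the sketch below, using the question's own example matrix. The second half shows a loop-free variant that is not taken from the answer above: it merges the first two dimensions with reshape and indexes the merged dimension with 2-D linear indices. The names pts, result, M2 and alt are illustrative.

```matlab
% Example matrix from the question: (x, y, depth, time)
Matrix = nan(4,4,2,2);
Matrix(1,2,:,:) = 999;
Matrix(3,4,:,:) = 888;
Matrix(4,4,:,:) = 124;
pts = [1 2; 3 4; 4 4];                  % one [x y] pair per row

nDepth = size(Matrix,3);
nTime  = size(Matrix,4);

% Loop version: result is depth-by-point-by-time
result = zeros(nDepth, size(pts,1), nTime);
for k = 1:size(pts,1)
    slice = Matrix(pts(k,1), pts(k,2), :, :);
    result(:,k,:) = reshape(slice, nDepth, 1, nTime);
end

% Loop-free variant: merge dims 1 and 2, then pick rows of the merged array
sz  = size(Matrix);
M2  = reshape(Matrix, sz(1)*sz(2), nDepth, nTime);
idx = sub2ind(sz(1:2), pts(:,1), pts(:,2));
alt = permute(M2(idx,:,:), [2 1 3]);    % depth-by-point-by-time

disp(result(:,:,1))
```

The reshape does not copy or reorder data, so the variant only pays for the indexing itself; which of the two is faster on a 300-by-250-by-30-by-200 array would need to be measured.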
So, here is a possible answer... but it is messy. I suspect it is more computationally expensive than a more direct method, and it would definitely not be my preferred answer. It would be great if we could get the answer without any for loops!
Matrix = rand(100,200,30,400);
grabthese_x = [1 30 50 90];
grabthese_y = [61 9 180 189];
result = nan(length(grabthese_x), size(Matrix,3), size(Matrix,4));
for tt = 1:size(Matrix,4)
subset = squeeze(Matrix(grabthese_x,grabthese_y,:,tt));
for NN=1:size(Matrix,3)
result(:,NN,tt) = diag(subset(:,:,NN));
end
end
The resulting matrix, result, has size [4 30 400] here, i.e. number of points by depth by time.
I think this should work, even if Matrix isn't square. However, it is not ideal, as I said above.

How do I compare elements of one row with every other row in the same matrix

I have the matrix:
a = [ 1 2 3 4;
2 4 5 6;
4 6 8 9]
and I want to compare every row with each of the other rows, one pair at a time. If two rows share a value (a key), the result should report the common keys.
Using @gnovice's idea of getting all combinations with nchoosek, I propose yet another two solutions:
one using ismember (as noted by @loren)
the other using bsxfun with the eq function handle
The only difference is that intersect sorts and keeps only the unique common keys.
a = randi(30, [100 20]);
%# a = sort(a,2);
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys1 = cell(N,1);
keys2 = cell(N,1);
keys3 = cell(N,1);
tic
for i=1:N
keys1{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
toc
tic
for i=1:N
query = a(comparisons(i,1),:);
set = a(comparisons(i,2),:);
keys2{i} = query( ismember(query, set) ); %# unique(...)
end
toc
tic
for i=1:N
query = a(comparisons(i,1),:);
set = a(comparisons(i,2),:)';
keys3{i} = query( any(bsxfun(@eq, query, set),1) ); %'# unique(...)
end
toc
... with the following time comparisons:
Elapsed time is 0.713333 seconds.
Elapsed time is 0.289812 seconds.
Elapsed time is 0.135602 seconds.
Note that even by sorting a beforehand and adding a call to unique inside the loops (commented parts), these two methods are still faster than intersect.
Here's one solution (which is generalizable to larger matrices than the sample in the question):
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys = cell(N,1);
for i = 1:N
keys{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
The function NCHOOSEK is used to generate all of the unique combinations of row comparisons. For the matrix a in your question, you will get comparisons = [1 2; 1 3; 2 3], meaning that we will need to compare rows 1 and 2, then 1 and 3, and finally 2 and 3. keys is a cell array that stores the results of each comparison. For each comparison, the function INTERSECT is used to find the common values (i.e. keys). For the matrix a given in the question, you will get keys = {[2 4]; 4; [4 6]}.
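Run on the 3-by-4 matrix from the question, the approach looks as follows. This is a minimal sketch; the extra cell array keysIsm, which repeats the comparison with ismember, is added only to show that both give the same keys when the rows contain no repeated values.

```matlab
a = [1 2 3 4;
     2 4 5 6;
     4 6 8 9];

comparisons = nchoosek(1:size(a,1), 2);    % all pairs of row indices
N = size(comparisons, 1);
keys    = cell(N, 1);
keysIsm = cell(N, 1);
for i = 1:N
    r1 = a(comparisons(i,1), :);
    r2 = a(comparisons(i,2), :);
    keys{i}    = intersect(r1, r2);        % sorted, duplicates removed
    keysIsm{i} = r1(ismember(r1, r2));     % keeps the order and repeats of r1
end

celldisp(keys)
```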