in matlab, calculate mean in a part of one column where another column satisfies a condition

in matlab, calculate mean in a part of one column where another column satisfies a condition - matlab

I'm quite new to matlab, and I'm curious how to do this:
I have a rather large (27000x11) matrix, and the 8th column contains a number which changes sometimes but is constant for like 2000 rows (not necessarily consecutive).
I would like to calculate the mean of the entries in the 3rd column for those rows where the 8th column has the same value. This for each value of the 8th column.
I would also like to plot the 3rd column's means as a function of the 8th column's value but that I can do if I can get a new matrix (2x2) containing [mean_of_3rd,8th].
Ex: (smaller matrix for convenience)
1 2 3 4 5
3 7 5 3 2
1 3 2 5 3
4 5 7 5 8
2 4 7 4 4
Since the 4th column has the same value in row 1 and 5 I'd like to calculate the mean of 2 and 4 (the corresponding elements of column 2, italic bold) and put it in another matrix together with the 4th column's value. The same for 3 and 5 (bold) since the 4th column has the same value for these two.
3 4
4 5
and so on... is this possible in an easy way?

Use the all-mighty, underused accumarray :
This line gives you mean values of 4th column accumulated by 2nd column:
means = accumarray( A(:,4) ,A(:,2),[],#mean)
This line gives you number of element in each set:
count = accumarray( A(:,4) ,ones(size(A(:,4))))
Now if you want to filter only those that have at least one occurence:
>> filtered = means(count>1)
filtered =
3
4
This will work only for positive integers in the 4th column.
Another possibility for counting amount of elements in each set:
count = accumarray( A(:,4) ,A(:,4),[],#numel)

A slightly refined approach based on the ideas of Andrey and Rody. We can not use accumarray directly, since the data is real, not integer. But, we can use unique to find the indices of the repeating entries. Then we operate on integers.
% get unique entries in 4th column
[R, I, J] = unique(A(:,4));
% count the repeating entries: now we have integer indices!
counts = accumarray(J, 1, size(R));
% sum the 2nd column for all entries
sums = accumarray(J, A(:,2), size(R));
% compute means
means = sums./counts;
% choose only the entries that show more than once in 4th column
inds = counts>1;
result = [means(inds) R(inds)];
Time comparison for the following synthetic data:
A=randi(100, 1000000, 5);
% Rody's solution
Elapsed time is 0.448222 seconds.
% The above code
Elapsed time is 0.148304 seconds.

My official answer:
A4 = A(:,4);
R = unique(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(R)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
result = [means(inds) R(inds)];
This is because of the following. Here's all of the alternatives we've come up with, in profiling form:
%# sample data
A = [
1 2 3 4 5
3 7 5 3 2
1 3 2 5 3
4 5 7 5 8
2 4 7 4 4];
%# accumarray
%# works only on positive integers in A(:,4)
tic
for ii = 1:1e4
means = accumarray( A(:,4) ,A(:,2),[],#mean);
count = accumarray( A(:,4) ,ones(size(A(:,4))));
filtered = means(count>1);
end
toc
%# arrayfun
%# works only on integers in A(:,4)
tic
for ii = 1:1e4
B = arrayfun(#(x) A(A(:,4)==x, 2), min(A(:,4)):max(A(:,4)), 'uniformoutput', false);
filtered = cellfun(#mean, B(cellfun(#(x) numel(x)>1, B)) );
end
toc
%# ordinary loop
%# works only on integers in A(:,4)
tic
for ii = 1:1e4
A4 = A(:,4);
R = min(A4):max(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(R)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
filtered = means(inds);
end
toc
Results:
Elapsed time is 1.238352 seconds. %# (accumarray)
Elapsed time is 7.208585 seconds. %# (arrayfun + cellfun)
Elapsed time is 0.225792 seconds. %# (for loop)
The ordinary loop is clearly the way to go here.
Note the absence of mean in the inner loop. This is because mean is not a Matlab builtin function (at least, on R2010), so that using it inside the loop makes the loop unqualified for JIT compilation, which slows it down by a factor of over 10. Using the form above accelerates the loop to almost 5.5 times the speed of the accumarray solution.
Judging on your comment, it is almost trivial to change the loop to work on all entries in A(:,4) (not just the integers):
A4 = A(:,4);
R = unique(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(A4)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
filtered = means(inds);
Which I will copy-paste to the top as my official answer :)

Related

How to get all the possible combinations of elements in a matrix, but don't allow exchange of elements inbetween columns?

Lets say I have this matrice A: [3 x 4]
1 4 7 10
2 5 8 11
3 6 9 12
I want to permute the element of in each column, but they can't change to a different column, so 1 2 3 need to always be part of the first column. So for exemple I want:
3 4 8 10
1 5 7 11
2 6 9 12
3 4 8 11
1 6 7 10
2 5 9 12
1 6 9 11
. . . .
So in one matrix I would like to have all the possible permutation, in this case, there are 3 different choices 3x3x3x3=81possibilities.So my result matrixe should be 81x4, because I only need each time one [1x4]line vector answer, and that 81 time.
An other way to as the question would be (for the same end for me), would be, if I have 4 column vector:
a=[1;2;3]
b=[4;5;6]
c=[7;8;9]
d=[10;11;12;13]
Compare to my previous exemple, each column vector can have a different number of row. Then is like I have 4 boxes, A, B C, D and I can only put one element of a in A, b in B and so on; so I would like to get all the permutation possible with the answer [A B C D] beeing a [1x4] row, and in this case, I would have 3x3x3x4=108 different row. So where I have been missunderstood (my fault), is that I don't want all the different [3x4] matrix answers but just [1x4]lines.
so in this case the answer would be:
1 4 7 10
and 1 4 7 11
and 1 4 7 12
and 1 4 7 13
and 2 4 8 10
and ...
until there are the 108 combinations
The fonction perms in Matlab can't do that since I don't want to permute all the matrix (and btw, this is already a too big matrix to do so).
So do you have any idea how I could do this or is there is a fonction which can do that? I, off course, also could have matrix which have different size. Thank you

Basically you want to get all combinations of 4x the permutations of 1:3.
You could generate these with combvec from the Neural Networks Toolbox (like #brainkz did), or with permn from the File Exchange.
After that it's a matter of managing indices, applying sub2ind (with the correct column index) and rearranging until everything is in the order you want.
a = [1 4 7 10
2 5 8 11
3 6 9 12];
siz = size(a);
perm1 = perms(1:siz(1));
Nperm1 = size(perm1,1); % = factorial(siz(1))
perm2 = permn(1:Nperm1, siz(2) );
Nperm2 = size(perm2,1);
permidx = reshape(perm1(perm2,:)', [Nperm2 siz(1), siz(2)]); % reshape unnecessary, easier for debugging
col_base_idx = 1:siz(2);
col_idx = col_base_idx(ones(Nperm2*siz(1) ,1),:);
lin_idx = reshape(sub2ind(size(a), permidx(:), col_idx(:)), [Nperm2*siz(1) siz(2)]);
result = a(lin_idx);
This avoids any loops or cell concatenation and uses straigh indexing instead.
Permutations per column, unique rows
Same method:
siz = size(a);
permidx = permn(1:siz(1), siz(2) );
Npermidx = size(permidx, 1);
col_base_idx = 1:siz(2);
col_idx = col_base_idx(ones(Npermidx, 1),:);
lin_idx = reshape(sub2ind(size(a), permidx(:), col_idx(:)), [Npermidx siz(2)]);
result = a(lin_idx);

Your question appeared to be a very interesting brain-teaser. I suggest the following:
in = [1,2,3;4,5,6;7,8,9;10,11,12]';
b = perms(1:3);
a = 1:size(b,1);
c = combvec(a,a,a,a);
for k = 1:length(c(1,:))
out{k} = [in(b(c(1,k),:),1),in(b(c(2,k),:),2),in(b(c(3,k),:),3),in(b(c(4,k),:),4)];
end
%and if you want your result as an ordinary array:
out = vertcat(out{:});
b is a 6x3 array that contains all possible permutations of [1,2,3]. c is 4x1296 array that contains all possible combinations of elements in a = 1:6. In the for loop we use number from 1 to 6 to get the permutation in b, and that permutation is used as indices to the column.
Hope that helps

this is another octave friendly solution:
function result = Tuples(A)
[P,n]= size(A);
M = reshape(repmat(1:P, 1, P ^(n-1)), repmat(P, 1, n));
result = zeros(P^ n, n);
for i = 1:n
result(:, i) = A(reshape(permute(M, circshift((1:n)', i)), P ^ n, 1), i);
end
end
%%%example
A = [...
1 4 7 10;...
2 5 8 11;...
3 6 9 12];
result = Tuples(A)
Update:
Question updated that: given n vectors of different length generates a list of all possible tuples whose ith element is from vector i:
function result = Tuples( A)
if exist('repelem') ==0
repelem = #(v,n) repelems(v,[1:numel(v);n]);
end
n = numel(A);
siz = [ cell2mat(cellfun(#numel, A , 'UniformOutput', false))];
tot_prd = prod(siz);
cum_prd=cumprod(siz);
tot_cum = tot_prd ./ cum_prd;
cum_siz = cum_prd ./ siz;
result = zeros(tot_prd, n);
for i = 1: n
result(:, i) = repmat(repelem(A{i},repmat(tot_cum(i),1,siz(i))) ,1,cum_siz(i));
end
end
%%%%example
a = {...
[1;2;3],...
[4;5;6],...
[7;8;9],...
[10;11;12;13]...
};
result =Tuples(a)

This is a little complicated but it works without the need for any additional toolboxes:
You basically want a b element 'truth table' which you can generate like this (adapted from here) if you were applying it to each element:
[b, n] = size(A)
truthtable = dec2base(0:power(b,n)-1, b) - '0'
Now you need to convert the truth table to linear indexes by adding the column number times the total number of rows:
idx = bsxfun(#plus, b*(0:n-1)+1, truthtable)
now you instead of applying this truth table to each element you actually want to apply it to each permutation. There are 6 permutations so b becomes 6. The trick is to then create a 6-by-1 cell array where each element has a distinct permutation of [1,2,3] and then apply the truth table idea to that:
[m,n] = size(A);
b = factorial(m);
permutations = reshape(perms(1:m)',[],1);
permCell = mat2cell(permutations,ones(b,1)*m,1);
truthtable = dec2base(0:power(b,n)-1, b) - '0';
expandedTT = cell2mat(permCell(truthtable + 1));
idx = bsxfun(#plus, m*(0:n-1), expandedTT);
A(idx)

Another answer. Rather specific just to demonstrate the concept, but can easily be adapted.
A = [1,4,7,10;2,5,8,11;3,6,9,12];
P = perms(1:3)'
[X,Y,Z,W] = ndgrid(1:6,1:6,1:6,1:6);
You now have 1296 permutations. If you wanted to access, say, the 400th one:
Permutation_within_column = [P(:,X(400)), P(:,Y(400)), P(:,Z(400)), P(:,W(400))];
ColumnOffset = repmat([0:3]*3,[3,1])
My_permutation = Permutation_within_column + ColumnOffset; % results in valid linear indices
A(My_permutation)
This approach allows you to obtain the 400th permutation on demand; if you prefer to have all possible permutations concatenated in the 3rd dimension, (i.e. a 3x4x1296 matrix), you can either do this with a for loop, or simply adapt the above and vectorise; for example, if you wanted to create a 3x4x2 matrix holding the first two permutations along the 3rd dimension:
Permutations_within_columns = reshape(P(:,X(1:2)),3,1,[]);
Permutations_within_columns = cat(2, Permutations_within_columns, reshape(P(:,Y(1:2)),3,1,[]));
Permutations_within_columns = cat(2, Permutations_within_columns, reshape(P(:,Z(1:2)),3,1,[]));
Permutations_within_columns = cat(2, Permutations_within_columns, reshape(P(:,W(1:2)),3,1,[]));
ColumnOffsets = repmat([0:3]*3,[3,1,2]);
My_permutations = Permutations_within_columns + ColumnOffsets;
A(My_permutations)
This approach enables you to collect a specific subrange, which may be useful if available memory is a concern (i.e. for larger matrices) and you'd prefer to perform your operations by blocks. If memory isn't a concern you can get all 1296 permutations at once in one giant matrix if you wish; just adapt as appropriate (e.g. replicate ColumnOffsets the right number of times in the 3rd dimension)

Shifting repeating rows to a new column in a matrix

I am working with a n x 1 matrix, A, that has repeating values inside it:
A = [0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4]
which correspond to an n x 1 matrix of B values:
B = [2;4;6;8;10; 3;5;7;9;11; 4;6;8;10;12; 5;7;9;11;13]
I am attempting to produce a generalised code to place each repetition into a separate column and store it into Aa and Bb, e.g.:
Aa = [0 0 0 0 Bb = [2 3 4 5
1 1 1 1 4 5 6 7
2 2 2 2 6 7 8 9
3 3 3 3 8 9 10 11
4 4 4 4] 10 11 12 13]
Essentially, each repetition from A and B needs to be copied into the next column and then deleted from the first column
So far I have managed to identify how many repetitions there are and copy the entire column over to the next column and then the next for the amount of repetitions there are but my method doesn't shift the matrix rows to columns as such.
clc;clf;close all
A = [0;1;2;3;4;0;1;2;3;4;0;1;2;3;4;0;1;2;3;4];
B = [2;4;6;8;10;3;5;7;9;11;4;6;8;10;12;5;7;9;11;13];
desiredCol = 1; %next column to go to
destinationCol = 0; %column to start on
n = length(A);
for i = 2:1:n-1
if A == 0;
A = [ A(:, 1:destinationCol)...
A(:, desiredCol+1:destinationCol)...
A(:, desiredCol)...
A(:, destinationCol+1:end) ];
end
end
A = [...] retrieved from Move a set of N-rows to another column in MATLAB
Any hints would be much appreciated. If you need further explanation, let me know!
Thanks!

Given our discussion in the comments, all you need is to use reshape which converts a matrix of known dimensions into an output matrix with specified dimensions provided that the number of elements match. You wish to transform a vector which has a set amount of repeating patterns into a matrix where each column has one of these repeating instances. reshape creates a matrix in column-major order where values are sampled column-wise and the matrix is populated this way. This is perfect for your situation.
Assuming that you already know how many "repeats" you're expecting, we call this An, you simply need to reshape your vector so that it has T = n / An rows where n is the length of the vector. Something like this will work.
n = numel(A); T = n / An;
Aa = reshape(A, T, []);
Bb = reshape(B, T, []);
The third parameter has empty braces and this tells MATLAB to infer how many columns there will be given that there are T rows. Technically, this would simply be An columns but it's nice to show you how flexible MATLAB can be.

If you say you already know the repeated subvector, and the number of times it repeats then it is relatively straight forward:
First make your new A matrix with the repmat function.
Then remap your B vector to the same size as you new A matrix
% Given that you already have the repeated subvector Asub, and the number
% of times it repeats; An:
Asub = [0;1;2;3;4];
An = 4;
lengthAsub = length(Asub);
Anew = repmat(Asub, [1,An]);
% If you can assume that the number of elements in B is equal to the number
% of elements in A:
numberColumns = size(Anew, 2);
newB = zeros(size(Anew));
for i = 1:numberColumns
indexStart = (i-1) * lengthAsub + 1;
indexEnd = indexStart + An;
newB(:,i) = B(indexStart:indexEnd);
end
If you don't know what is in your original A vector, but you do know it is repetitive, if you assume that the pattern has no repeats you can use the find function to find when the first element is repeated:
lengthAsub = find(A(2:end) == A(1), 1);
Asub = A(1:lengthAsub);
An = length(A) / lengthAsub
Hopefully this fits in with your data: the only reason it would not is if your subvector within A is a pattern which does not have unique numbers, such as:
A = [0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0;]
It is worth noting that from the above intuitively you would have lengthAsub = find(A(2:end) == A(1), 1) - 1;, But this is not necessary because you are already effectively taking the one off by only looking in the matrix A(2:end).

How to vectorize searching function in Matlab?

Here is a Matlab coding problem (A little different version with intersect not setdiff here:
a rating matrix A with 3 cols, the 1st col is user'ID which maybe duplicated, 2nd col is the item'ID which maybe duplicated, 3rd col is rating from user to item, ranging from 1 to 5.
Now, I have a subset of user IDs smallUserIDList and a subset of item IDs smallItemIDList, then I want to find the rows in A that rated by users in smallUserIDList, and collect the items that user rated, and do some calculations, such as setdiff with smallItemIDList and count the result, as the following code does:
userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
A2= A(A(:,1) == smallUserIDList(i), :);
itemIDList_each = unique(A2(:,2));
setDiff = setdiff(itemIDList_each , smallItemIDList);
userStat(i) = length(setDiff);
end
userStat
Finally, I find the profile viewer showing that the loop above is inefficient, the question is how to improve this piece of code with vectorization but the help of for loop?
For example:
Input:
A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5
]
smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]
Output:
userStat =
0
1
0
2

Vanilla MATLAB:
As far as I can tell your code is equivalent to:
%// Create matrix such that: user_item_rating(user,item)==rating
user_item_rating = sparse(A(:,1),A(:,2),A(:,3));
%// Keep all BUT the items in smallItemIDList
user_item_rating(:,smallItemIDList) = [];
%// Keep only those users in `smallUserIDList` and use order of this list
user_item_rating = user_item_rating(smallUserIDList,:);
%// Count the number of ratings
userStat = sum(user_item_rating~=0, 2);
This will work if there is at most one rating per (user,item)-combination. Also it should be quite efficient.
Clean approach without reinventing the wheel:
Check out grpstats from the Statistics Toolbox!
An implementation could look similar to this:
%// Create ratings table
ratings = array2table(A, 'VariableNames', {'user','item','rating'});
%// Remove items we don't care about (smallItemIDList)
ratings = ratings(~ismember(ratings.item, smallItemIDList),:);
%// Keep only users we care about (smallUserIDList)
ratings = ratings(ismember(ratings.user, smallUserIDList),:);
%// Compute the statistics grouped by 'user'.
userStat = grpstats(ratings, 'user');

This could be one vectorized approach -
%// Take care of equality between first column of A and smallUserIDList to
%// find the matching row and column indices.
%// NOTE: This corresponds to "A(:,1) == smallUserIDList(i)" from OP.
[R,C] = find(bsxfun(#eq,A(:,1),smallUserIDList.')); %//'
%// Take care of non-equality between second column of A and smallItemIDList.
%// NOTE: This corresponds to SETDIFF in the original loopy code from OP.
mask1 = ~ismember(A(R,2),smallItemIDList);
AR2 = A(R,2); %// Elements from 2nd col of A that has matches from first step
%// Get only those elements from C and AR2 that has ONES in mask1
C1 = C(mask1);
AR2 = AR2(mask1);
%// Initialized output array
userStat = zeros(numel(smallUserIDList),1);
if ~isempty(C1)%//There is at least one element in C, so do further processing
%// Find the count of duplicate elements for each ID in C1 indexed into AR2.
%// NOTE: This corresponds to "unique(A2(:,2))" from OP.
dup_counts = accumarray(C1,AR2,[],#(x) numel(x)-numel(unique(x)));
%// Get the count of matches for each ID in C in the mask1.
%// NOTE: This corresponds to:
%// "length(setdiff(itemIDList_each , smallItemIDList))" from OP.
accums = accumarray(C,mask1);
%// Store the counts in output array and also subtract the dup counts
userStat(1:numel(accums)) = accums;
userStat(1:numel(dup_counts)) = userStat(1:numel(dup_counts)) - dup_counts;
end
Benchmarking
The code listed next compares runtimes for proposed approach against the original loopy code -
%// Size parameters and random inputs with them
A_nrows = 5000;
IDlist_len = 5000;
max_userID = 1000;
max_itemID = 1000;
A = [randi(max_userID,A_nrows,1) randi(max_itemID,A_nrows,1) randi(5,A_nrows,2)];
smallUserIDList = randi(max_userID,IDlist_len,1);
smallItemIDList = randi(max_itemID,IDlist_len,1);
disp('---------------------------- With Original Approach')
tic
%// Original posted code
toc
disp('---------------------------- With Proposed Approach'))
tic
%// Proposed approach code
toc
The runtimes thus obtained with three sets of datasizes were -
Case #1:
A_nrows = 500;
IDlist_len = 500;
max_userID = 100;
max_itemID = 100;
---------------------------- With Original Approach
Elapsed time is 0.136630 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.004163 seconds.
Case #2:
A_nrows = 5000;
IDlist_len = 5000;
max_userID = 100;
max_itemID = 100;
---------------------------- With Original Approach
Elapsed time is 1.579468 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.050498 seconds.
Case #3:
A_nrows = 5000;
IDlist_len = 5000;
max_userID = 1000;
max_itemID = 1000;
---------------------------- With Original Approach
Elapsed time is 1.252294 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.044198 seconds.
Conclusion: The speedups with the proposed approach over the original loopy code thus seem to be huge!!

I think you are trying to remove a fixed set of ratings for a subset of users and count the number of remaining ratings:
Does the following work:
Asub = A(ismember(A(:,1), smallUserIDList),1:2);
Bremove = allcomb(smallUserIDList, smallItemIDList);
Akeep = setdiff(Asub, Bremove, 'rows');
T = varfun(#sum, array2table(Akeep), 'InputVariables', 'Akeep2', 'GroupingVariables', 'Akeep1');
% userStat = T.GroupCount;
you need the allcomb function from the file exchange from matlab central, it gives a cartesian product of two vectors, and is easy to implement anyway.

Finding maxima in 2D matrix along certain dimension with indices

I have a <206x193> matrix A. It contains the values of a parameter at 206 different locations at 193 time steps. I am interested in the maximum value at each location over all times as well as the corresponding indices. I have another matrix B with the same dimensions of A and I'm interested in values for each location at the time that A's value at that location was maximal.
I've tried [max_val pos] = max(A,[],2), which gives the right maximum values, but A(pos) does not equal max_val.
How exactly does this function work?
I tried a smaller example as well. Still I don't understand the meaning of the indices....
>> H
H(:,:,1) =
1 2
3 4
H(:,:,2) =
5 6
7 8
>> [val pos] = max(H,[],2)
val(:,:,1) =
2
4
val(:,:,2) =
6
8
pos(:,:,1) =
2
2
pos(:,:,2) =
2
2

The indices in idx represent the index of the max value in the corresponding row. You can use sub2ind to create a linear index if you want to test if A(pos)=max_val
A=rand(206, 193);
[max_val, idx]=max(A, [], 2);
A_max=A(sub2ind(size(A), (1:size(A,1))', idx));
Similarly, you can access the values of B with:
B_Amax=B(sub2ind(size(A), (1:size(A,1))', idx));
From your example:
H(:,:,2) =
5 6
7 8
[val pos] = max(H,[],2)
val(:,:,2) =
6
8
pos(:,:,2) =
2
2
The reason why pos(:,:,2) is [2; 2] is because the maximum is at position 2 for both rows.

max is a primarily intended for use with vectors. In normal mode, even the multi-dimensional arrays are treated as a series of vectors along which the max function is applied.
So, to get the values in B at each location at the time where A is maximum, you should
// find the maximum values and positions in A
[c,i] = max(A, [], 2);
// iterate along the first dimension, to retrieve the corresponding values in B
C = [];
for k=1:size(A,1)
C(k) = B(k,i(k));
end
You can refer to #Jigg's answer for a more concise way of creating matrix C

How do I compare elements of one row with every other row in the same matrix

I have the matrix:
a = [ 1 2 3 4;
2 4 5 6;
4 6 8 9]
and I want to compare every row with every other two rows one by one. If they share the same key then the result will tell they have a common key.

Using #gnovice's idea of getting all combinations with nchoosek, I propose yet another two solutions:
one using ismember (as noted by #loren)
the other using bsxfun with the eq function handle
The only difference is that intersect sorts and keeps only the unique common keys.
a = randi(30, [100 20]);
%# a = sort(a,2);
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys1 = cell(N,1);
keys2 = cell(N,1);
keys3 = cell(N,1);
tic
for i=1:N
keys1{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
toc
tic
for i=1:N
query = a(comparisons(i,1),:);
set = a(comparisons(i,2),:);
keys2{i} = query( ismember(query, set) ); %# unique(...)
end
toc
tic
for i=1:N
query = a(comparisons(i,1),:);
set = a(comparisons(i,2),:)';
keys3{i} = query( any(bsxfun(#eq, query, set),1) ); %'# unique(...)
end
toc
... with the following time comparisons:
Elapsed time is 0.713333 seconds.
Elapsed time is 0.289812 seconds.
Elapsed time is 0.135602 seconds.
Note that even by sorting a beforehand and adding a call to unique inside the loops (commented parts), these two methods are still faster than intersect.

Here's one solution (which is generalizable to larger matrices than the sample in the question):
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys = cell(N,1);
for i = 1:N
keys{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
The function NCHOOSEK is used to generate all of the unique combinations of row comparisons. For the matrix a in your question, you will get comparisons = [1 2; 1 3; 2 3], meaning that we will need to compare rows 1 and 2, then 1 and 3, and finally 2 and 3. keys is a cell array that stores the results of each comparison. For each comparison, the function INTERSECT is used to find the common values (i.e. keys). For the matrix a given in the question, you will get keys = {[2 4], 4, [4 6]}.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

in matlab, calculate mean in a part of one column where another column satisfies a condition - matlab

Related

How to get all the possible combinations of elements in a matrix, but don't allow exchange of elements inbetween columns?

Shifting repeating rows to a new column in a matrix

How to vectorize searching function in Matlab?

Finding maxima in 2D matrix along certain dimension with indices

How do I compare elements of one row with every other row in the same matrix

Categories

Resources