MATLAB: Fastest Way to Count Unique # of 2 Number Combinations in a Vector of Integers - matlab

Given a vector of integers such as:
X = [1 2 3 4 5 1 2]
I would like to find a really fast way to count the number of unique combinations with 2-elements.
In this case the two-number combinations are:
[1 2] (occurs twice)
[2 3] (occurs once)
[3 4] (occurs once)
[4 5] (occurs once)
[5 1] (occurs once)
As it stands, I am currently doing this in MATLAB as follows
X = [1 2 3 4 5 1 2];
N = length(X)
X_max = max(X);
COUNTS = nan(X_max); %store as a X_max x X_max matrix
for i = 1:X_max
first_number_indices = find(X==1)
second_number_indices = first_number_indices + 1;
second_number_indices(second_number_indices>N) = [] %just in case last entry = 1
second_number_vals = X(second_number_indices);
for j = 1:X_max
COUNTS(i,j) = sum(second_number_vals==j)
end
end
Is there a faster/smarter way of doing this?

Here is a super fast way:
>> counts = sparse(x(1:end-1),x(2:end),1)
counts =
(5,1) 1
(1,2) 2
(2,3) 1
(3,4) 1
(4,5) 1
You could convert to a full matrix simply as: full(counts)
Here is an equivalent solution using accumarray:
>> counts = accumarray([x(1:end-1);x(2:end)]', 1)
counts =
0 2 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0

EDIT: #Amro has provided a much better solution (well, better in the vast majority of cases, I suspect my method would work better if MaxX is very large and X contains zeros - this is because the presence of zeros will rule out the use of sparse while a large MaxX will slow down the accumarray approach as it creates a matrix of size MaxX by MaxX).
EDIT: Thanks to #EitanT for pointing out an improvement that can be made using accumarray.
Here is how I would solve it:
%Generate some random data
T = 20;
MaxX = 3;
X = randi(MaxX, T, 1);
%Get the unique combinations and an index. Note, I am assuming X is a column vector.
[UniqueComb, ~, Ind] = unique([X(1:end-1), X(2:end)], 'rows');
NumComb = size(UniqueComb, 1);
%Count the number of occurrences of each combination
Count = accumarray(Ind, 1);
All unique sequential two element combinations are now stored in UniqueComb, while the corresponding counts for each unique combination are stored in Count.

Related

How can I randomize two binary vectors that are similar while making sure all possible combinations are respected?

Been trying to solve this simple problem for a while but just couldn't find the solution for the life of me...
I'm programming an experiment in PsychToolbox but I'll spare you the details, basically I have two vectors A and B of equal size with the same number of ones and zeroes:
A = [0 0 1 1]
B = [0 0 1 1]
Both vectors A and B must be randomized independently but in such a way that one combination of items between the two vectors is never repeated. That is, I must end up with this
A = [1 1 0 0]
B = [1 0 0 1]
or this:
A = [0 0 1 1]
B = [0 1 0 1]
but I should never end up with this:
A = [1 1 0 0]
B = [1 1 0 0]
or this
A = [0 1 0 1]
B = [0 1 0 1]
One way to determine this is to check the sum of items between the two vectors A+B, which should always contain only one 2 or only one 0:
A = [1 1 0 0]
B = [1 0 0 1]
A+B = 2 1 0 1
Been trying to make this a condition within a 'while' loop (e.g. so long as the number of zeroes in the vector obtained by A+B is superior to 1, keep randomizing A and B), but either it still produces repeated combination or it just never stops looping. I know this is a trivial problem but I just can't get my head around it somehow. Anyone care to help?
This is a simplified version of the script I got:
A = [1 1 0 0];
B = A;
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
while nnz(~(A+B)) > 1
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
end
Still, I end up with repeated combinations.
% If you are only looking for an answer to this scenario the easiest way is
% as follows:
A = [0 0 1 1];
B = [0 0 1 1];
nn = length(A);
keepset = [0 0 1 1;0 1 0 1];
keepset = keepset(:,randperm(nn))
% If you want a more general solution for arbitrary A & B (for example)
A = [0 0 0 1 1 1 2 2 2];
B = [0 0 0 1 1 1 2 2 2];
nn = length(A);
Ai = A(randperm(nn));
Bi = B(randperm(nn));
% initialize keepset with the first combination of A & B
keepset = [Ai(1);Bi(1)];
loopcnt = 0;
while (size(keepset,2) < nn)
% randomize the elements in A and B independently
Ai = A(randperm(nn));
Bi = B(randperm(nn));
% test each combination of Ai and Bi to see if it is already in the
% keepset
for ii = 1:nn
tstcombo = [Ai(ii);Bi(ii)];
matchtest = bsxfun(#eq,tstcombo,keepset);
matchind = find((matchtest(1,:) & matchtest(2,:)));
if isempty(matchind)
keepset = [keepset tstcombo];
end
end
loopcnt = loopcnt + 1;
if loopcnt > 1000
disp('Execution halted after 1000 attempts')
break
elseif (size(keepset,2) >= nn)
disp(sprintf('Completed in %0.f iterations',loopcnt))
end
end
keepset
It's much more efficient to permute the combinations randomly than shuffling the arrays independently and handling the inevitable matching A/B elements.
There are lots of ways to generate all possible pairs, see
How to generate all pairs from two vectors in MATLAB using vectorised code?
For this example I'll use
allCombs = combvec([0,1],[0,1]);
% = [ 0 1 0 1
% 0 0 1 1 ]
Now you just want to select some amount of unique (non-repeating) columns from this array in a random order. In all of your examples you select all 4 columns. The randperm function is perfect for this, from the docs:
p = randperm(n,k) returns a row vector containing k unique integers selected randomly from 1 to n inclusive.
n = size(allCombs,2); % number of combinations (or columns) to choose from
k = 4; % number of columns to choose for output
AB = allCombs( :, randperm(n,k) ); % random selection of pairs
If you need this split into two variables then you have
A = AB(1,:);
B = AB(2,:);
Here's a possible solution:
A = [0 0 1 1];
B = [0 0 1 1];
% Randomize A and B independently
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
% Keep randomizing A and B until the condition is met
while sum(A+B) ~= 1 && sum(A+B) ~= length(A)
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
end
This solution checks if the sum of the elements in A+B is either 1 or the length of A, which indicates that only one element in A+B is either a 0 or a 2, respectively. If either of these conditions is not met, the vectors A and B are randomized again.

Generating a Matrix from a given vector

Im trying to generate a specific type o matrix in Matlab.
I need to modify it for specific types of data with the following rules:
First I have to choose a grade g (max 6 let's say) then I have to choose the number of elements per row n (max 18) ;
These numbers are the powers of a specific polynomial of grade g ;
The sum per row in the matrix is not allowed to be bigger than the chosen g grade ;
The biggest element per row is the chosen g.
For g = 2, n = 2 the matrix will look like this:
A = [0 0;
0 1;
1 0;
0 2;
2 0;
1 1]
For g = 2, n = 3 the matrix will look like this:
A = [0 0 0;
0 0 1;
0 0 2;
0 1 0;
0 2 0;
1 0 0;
2 0 0;
0 1 1;
1 0 1;
1 1 0]
How can I generate all possible combinations of an array elements?
Ex : given v = [0 1 2];
Result = [0 0 0;
0 0 1;
0 1 0;
0 1 1;
1 0 0;
1 0 1;
1 1 0;
1 1 1;
0 0 2;
0 2 0;
2 0 0;
2 0 1;
2 1 1;
2 1 2;
...]
and so on...
I've tried this with perms, nchoosek, repelem, repmat, for-loops, unique, matrix concatenations, everything but I couldn't be able to find and algorithm.
You can first generate all n permutations of [0..g] with repetition and rearrangement, then select allowed combinations:
% all n permutations of [0..g] (with repetition and rearrangement)
powers = unique(nchoosek(repmat(0:g, 1, n), n), 'row');
% allowed set of powers
powers = powers(sum(powers, 2) <= g, :);
As I already said in comments, the above code is extremely time and memory inefficient. For example when I run it for g=6 and n=9, MATLAB gives the following error:
Error using zeros Requested 23667689815x9 (1587.0GB) array exceeds
maximum array size preference.
...
To reduce memory consumption, you can do the following:
% all n permutations of [0..g] (with repetition)
gPerms = nmultichoosek(0:g, n);
% allowed set of powers
allowed = gPerms(sum(gPerms, 2) <= g, :);
% all permutations of [1..n] (no repetition)
nPerms = perms(1:n);
% all n permutations of [0..g] (with repetition and rearrangement)
arranges = arrayfun(#(i) allowed(:, nPerms(i, :)), ...
1:size(nPerms, 1), 'UniformOutput', false)';
powers = cell2mat(arranges);
% unique set of permutations
powers = unique(powers, 'rows');
In the above code, first n permutations with repetition of g is generated using #knedlsepp's implementation. The filtered to keep only combinations that their sums is less than or equal to g. In next step all rearrangements of these combinations are calculated. It still takes more than 13 seconds here to find 5005 combinations for the g=6 and n=9 case.

Extract unique elements from each row of a matrix (Matlab)

I would like to apply the function unique to each row of a given matrix, without involving any for loop. Suppose I have the following 4-by-5 matrix
full(A) = [0 1 0 0 1
2 1 0 3 0
1 2 0 0 2
0 3 1 0 0]
where A is the corresponding sparse matrix. As an example using a for loop, I can do
uniq = cell(4,1);
for i = 1:4
uniq{i} = unique(A(i,:));
end
and I would obtain the cell structure uniq given by
uniq{1} = {1}
uniq{2} = {[1 2 3]}
uniq{3} = {[1 2]}
uniq{4} = {[1 3]}
Is there a quicker way to vectorize this and avoid for loops?
I need to apply this to matrices M-by-5 with M large.
Note that I'm not interested in the number of unique elements per row (I know there are around answers for such a problem).
You can use accumarray with a custom function:
A = sparse([0 1 0 0 1; 2 1 0 3 0; 1 2 0 0 2; 0 3 1 0 0]); % data
[ii, ~, vv] = find(A);
uniq = accumarray(ii(:), vv(:), [], #(x){unique(x.')});
This gives:
>> celldisp(uniq)
uniq{1} =
1
uniq{2} =
1 2 3
uniq{3} =
1 2
uniq{4} =
1 3
you can use num2cell(A,2) to convert each row into cell and then cellfun with unique to get cell array of unique values from each row:
% generate large nX5 matrix
n = 5000;
A = randi(5,n,5);
% convert each row into cell
C = num2cell(A,2);
% take unique values from each cell
U = cellfun(#unique,C,'UniformOutput',0);

Finding the column index for the 1 in each row of a matrix

I have the following matrix in Matlab:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
Each row has exactly one 1. How can I (without looping) determine a column vector so that the first element is a 2 if there is a 1 in the second column, the second element is a 3 for a one in the third column etc.? The above example should turn into:
M = [ 3
1
2
1
3];
You can actually solve this with simple matrix multiplication.
result = M * (1:size(M, 2)).';
3
1
2
1
3
This works by multiplying your M x 3 matrix with a 3 x 1 array where the elements of the 3x1 are simply [1; 2; 3]. Briefly, for each row of M, element-wise multiplication is performed with the 3 x 1 array. Only the 1's in the row of M will yield anything in the result. Then the result of this element-wise multiplication is summed. Because you only have one "1" per row, the result is going to be the column index where that 1 is located.
So for example for the first row of M.
element_wise_multiplication = [0 0 1] .* [1 2 3]
[0, 0, 3]
sum(element_wise_multiplication)
3
Update
Based on the solutions provided by #reyryeng and #Luis below, I decided to run a comparison to see how the performance of the various methods compared.
To setup the test matrix (M) I created a matrix of the form specified in the original question and varied the number of rows. Which column had the 1 was chosen randomly using randi([1 nCols], size(M, 1)). Execution times were analyzed using timeit.
When run using M of type double (MATLAB's default) you get the following execution times.
If M is a logical, then the matrix multiplication takes a hit due to the fact that it has to be converted to a numerical type prior to matrix multiplication, whereas the other two have a bit of a performance improvement.
Here is the test code that I used.
sizes = round(linspace(100, 100000, 100));
times = zeros(numel(sizes), 3);
for k = 1:numel(sizes)
M = generateM(sizes(k));
times(k,1) = timeit(#()M * (1:size(M, 2)).');
M = generateM(sizes(k));
times(k,2) = timeit(#()max(M, [], 2), 2);
M = generateM(sizes(k));
times(k,3) = timeit(#()find(M.'), 2);
end
figure
plot(range, times / 1000);
legend({'Multiplication', 'Max', 'Find'})
xlabel('Number of rows in M')
ylabel('Execution Time (ms)')
function M = generateM(nRows)
M = zeros(nRows, 3);
col = randi([1 size(M, 2)], 1, size(M, 1));
M(sub2ind(size(M), 1:numel(col), col)) = 1;
end
You can also abuse find and observe the row positions of the transpose of M. You have to transpose the matrix first as find operates in column major order:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
[out,~] = find(M.');
Not sure if this is faster than matrix multiplication though.
Yet another approach: use the second output of max:
[~, result] = max(M.', [], 1);
Or, as suggested by #rayryeng, use max along the second dimension instead of transposing M:
[~, result] = max(M, [], 2);
For
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
this gives
result =
3 1 2 1 3
If M contains more than one 1 in a given row, this will give the index of the first such 1.

How to shift zero in the last column of a matrix

I have one matrix like below-
A=[1 1 1 1 1;
0 1 1 1 2;
0 0 1 1 3]
But I want to place all the 0 at the end of the row, so A should be like-
A=[1 1 1 1 1;
1 1 1 2 0;
1 1 3 0 0]
How can I do this? Matlab experts please help me.
There you go. Whole matrix, no loops, works even for non-contiguous zeros:
A = [1 1 1 1 1; 0 1 1 1 2; 0 0 1 1 3];
At = A.'; %// It's easier to work with the transpose
[~, rows] = sort(At~=0,'descend'); %// This is the important part.
%// It sends the zeros to the end of each column
cols = repmat(1:size(At,2),size(At,1),1);
ind = sub2ind(size(At),rows(:),cols(:));
sol = repmat(NaN,size(At,1),size(At,2));
sol(:) = At(ind);
sol = sol.'; %'// undo transpose
As usual, for Matlab versions that do not support the ~ symbol on function return, change ~ by a dummy variable, for example:
[nada, rows] = sort(At~=0,'descend'); %// This is the important part.
A more generic example:
A = [1 3 0 1 1;
0 1 1 1 2;
0 0 1 1 3]
% Sort columns directly
[~,srtcol] = sort(A == 0,2);
% Sorted positions
sz = size(A);
pos = bsxfun(#plus, (srtcol-1)*sz(1), (1:sz(1))'); % or use sub2ind
The result
B = A(pos)
B =
1 3 1 1 0
1 1 1 2 0
1 1 3 0 0
there are many ways to do this. one fast way can be easily like this:
a = [1 2 3 4 0 5 7 0];
idx=(find(a==0));
idx =
5 8
b=a; % save a new copy of the vector
b(idx)=[]; % remove zero elements
b =
1 2 3 4 5 7
c=[b zeros(size(idx))]
c =
1 2 3 4 5 7 0 0
You may modify this code as well.
If your zeros are always together, you could use the circshift command. This shifts values in an array by a specified number of places, and wraps values that run off the edge over to the other side. It looks like you would need to do this separately for each row in A, so in your example above you could try:
A(2,:) = circshift(A(2,:), [1 -1]); % shift the second row one to the left with wrapping
A(3,:) = circshift(A(3,:), [1 -2]); % shift the third row two to the left with wrapping
In general, if your zeros are always at the front of the row in A, you could try something like:
for ii = 1:size(A,1) % iterate over rows in A
numShift = numel(find(A(ii,:) == 0)); % assuming zeros at the front of the row, this is how many times we have to shift the row.
A(ii,:) = circshift(A(ii,:), [1 -numShift]); % shift it
end
Try this (just a fast hack):
for row_k = 1:size(A, 1)
[A_sorted, A_sortmap] = sort(A(row_k, :) == 0, 'ascend');
% update row in A:
A(row_k, :) = A(row_k, A_sortmap);
end
Now optimized for versions of MATLAB not supporting ~ as garbage lhs-identifier.
#LuisMendo's answer is inspiring in its elegance, but I couldn't get it to work (perhaps a matlab version thing). The following (based on his answer) worked for me:
Aaux = fliplr(reshape([1:numel(A)],size(A)));
Aaux(find(A==0))=0;
[Asort iso]=sort(Aaux.',1,'descend');
iso = iso + repmat([0:size(A,1)-1]*size(A,2),size(A,2),1);
A=A.';
A(iso).'
I've also asked this question and got a super elegant answer (non of above answers is same) here:
Optimize deleting matrix leading zeros in MATLAB