Generating a Matrix from a given vector - matlab

I'm trying to generate a specific type of matrix in MATLAB.
I need it for specific types of data, with the following rules:
First I choose a degree (grade) g (say, at most 6), then the number of elements per row n (at most 18);
These numbers are the exponents of the terms of a polynomial of degree g;
The sum of each row in the matrix must not exceed g;
No element in a row may be larger than g.
For g = 2, n = 2 the matrix will look like this:
A = [0 0;
0 1;
1 0;
0 2;
2 0;
1 1]
For g = 2, n = 3 the matrix will look like this:
A = [0 0 0;
0 0 1;
0 0 2;
0 1 0;
0 2 0;
1 0 0;
2 0 0;
0 1 1;
1 0 1;
1 1 0]
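As a sanity check on these examples (a standard stars-and-bars count, not stated in the question): adding a slack term a_{n+1} = g - (a_1 + ... + a_n) turns "row sum at most g" into "n + 1 non-negative terms summing to exactly g", so the number of rows is

$$
\#\{(a_1,\dots,a_n)\in\mathbb{Z}_{\ge 0}^{n} : a_1+\dots+a_n \le g\} = \binom{g+n}{n}.
$$

This gives 6 rows for g = 2, n = 2 and 10 rows for g = 2, n = 3, matching the matrices above, and 5005 rows for g = 6, n = 9 and 134596 rows for g = 6, n = 18.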
How can I generate all possible combinations of an array's elements?
For example, given v = [0 1 2];
Result = [0 0 0;
0 0 1;
0 1 0;
0 1 1;
1 0 0;
1 0 1;
1 1 0;
1 1 1;
0 0 2;
0 2 0;
2 0 0;
2 0 1;
2 1 1;
2 1 2;
...]
and so on...
I've tried this with perms, nchoosek, repelem, repmat, for-loops, unique, matrix concatenations, everything, but I couldn't find an algorithm.

You can first generate all length-n permutations of [0..g] (with repetition and rearrangement), then select the allowed combinations:
g = 2; n = 3; % example degree and number of elements per row
% all length-n permutations of [0..g] (with repetition and rearrangement)
powers = unique(nchoosek(repmat(0:g, 1, n), n), 'rows');
% allowed set of powers: the row sum must not exceed g
powers = powers(sum(powers, 2) <= g, :);
As I already said in the comments, the above code is extremely inefficient in both time and memory. For example, when I run it for g=6 and n=9, MATLAB gives the following error:
Error using zeros Requested 23667689815x9 (1587.0GB) array exceeds
maximum array size preference.
...
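A sketch of a middle ground (an alternative, not part of the original answer): ndgrid can enumerate the (g+1)^n tuples directly, so the duplicate-laden nchoosek input is never built. For g = 6, n = 9 that input has nchoosek(63, 9) = 23667689815 rows, which is where the size in the error comes from, whereas the grid has 7^9 = 40353607 rows. Memory still grows as (g+1)^n, so this only moves the limit. The values of g and n below are examples.

```matlab
g = 2;  % example degree
n = 3;  % example number of elements per row
% n-dimensional grid over 0:g, one output array per column of the result
grids = cell(1, n);
[grids{:}] = ndgrid(0:g);
% stack the grids along dimension n+1 and flatten: column k holds grids{k}(:)
allTuples = reshape(cat(n + 1, grids{:}), [], n);
% keep only the rows whose sum does not exceed g
powers = allTuples(sum(allTuples, 2) <= g, :);
```

The rows come out in a different order than in the question (the first column varies fastest); sortrows(powers) gives lexicographic order if that matters.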
To reduce memory consumption, you can do the following:
% all length-n combinations of [0..g] with repetition (multisets)
gPerms = nmultichoosek(0:g, n); % @knedlsepp's helper function (not shown here)
% allowed set of powers
allowed = gPerms(sum(gPerms, 2) <= g, :);
% all permutations of [1..n] (no repetition)
nPerms = perms(1:n);
% all n permutations of [0..g] (with repetition and rearrangement)
arranges = arrayfun(@(i) allowed(:, nPerms(i, :)), ...
1:size(nPerms, 1), 'UniformOutput', false)';
powers = cell2mat(arranges);
% unique set of permutations
powers = unique(powers, 'rows');
In the above code, the length-n combinations with repetition of 0..g are first generated using @knedlsepp's implementation. These are then filtered to keep only the combinations whose sum is less than or equal to g. In the next step, all rearrangements of these combinations are computed, and duplicates are removed with unique. It still takes more than 13 seconds here to find the 5005 combinations for the g=6, n=9 case.
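A different technique that never generates rows only to discard them (a sketch, not taken from the answers above): build the rows one column at a time and, for each partial row, append only the values 0..(g - current row sum). Every row produced is valid, so exactly the allowed rows are generated (5005 for g = 6, n = 9). The values of g and n are examples.

```matlab
g = 6;  % example degree
n = 9;  % example number of elements per row
powers = zeros(1, 0);  % start with a single empty row (row sum 0)
for col = 1:n
    rowSums = sum(powers, 2);
    blocks = cell(size(powers, 1), 1);
    for r = 1:size(powers, 1)
        % values allowed in this column without pushing the row sum above g
        vals = (0:(g - rowSums(r))).';
        blocks{r} = [repmat(powers(r, :), numel(vals), 1), vals];
    end
    powers = vertcat(blocks{:});
end
```

The rows come out in lexicographic order. The inner loop only runs once per partial row, so even g = 6, n = 18 (134596 rows) stays small; wrapping the loop in a function is straightforward if several (g, n) pairs are needed.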


How can I randomize two binary vectors that are similar while making sure all possible combinations are respected?

Been trying to solve this simple problem for a while but just couldn't find the solution for the life of me...
I'm programming an experiment in Psychtoolbox, but I'll spare you the details. Basically, I have two vectors A and B of equal size with the same number of ones and zeroes:
A = [0 0 1 1]
B = [0 0 1 1]
Both vectors A and B must be randomized independently but in such a way that one combination of items between the two vectors is never repeated. That is, I must end up with this
A = [1 1 0 0]
B = [1 0 0 1]
or this:
A = [0 0 1 1]
B = [0 1 0 1]
but I should never end up with this:
A = [1 1 0 0]
B = [1 1 0 0]
or this
A = [0 1 0 1]
B = [0 1 0 1]
One way to check this is the element-wise sum A+B, which should always contain exactly one 2 and exactly one 0:
A = [1 1 0 0]
B = [1 0 0 1]
A+B = 2 1 0 1
Been trying to make this a condition within a 'while' loop (e.g. as long as the number of zeroes in A+B is greater than 1, keep randomizing A and B), but either it still produces repeated combinations or it never stops looping. I know this is a trivial problem but I just can't get my head around it. Anyone care to help?
This is a simplified version of the script I got:
A = [1 1 0 0];
B = A;
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
while nnz(~(A+B)) > 1
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
end
Still, I end up with repeated combinations.
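The A+B test above only works for two 0/1 vectors. A sketch of a more general check (an addition, not from the question or the answers): treat each column [A(i); B(i)] as a pair and require all pairs to be distinct, which works for any values. The example vectors are taken from the question.

```matlab
A = [1 1 0 0];   % example vectors from the question
B = [1 0 0 1];
pairs = [A(:), B(:)];   % one row per (A(i), B(i)) pair
% true when no (A(i), B(i)) pair occurs more than once
noRepeats = size(unique(pairs, 'rows'), 1) == numel(A);
```

Used as a loop condition (reshuffle while it is false), it generalizes the loop above, although with many distinct values random reshuffling may need many attempts before it succeeds.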
% If you are only looking for an answer to this scenario the easiest way is
% as follows:
A = [0 0 1 1];
B = [0 0 1 1];
nn = length(A);
keepset = [0 0 1 1;0 1 0 1];
keepset = keepset(:,randperm(nn))
% If you want a more general solution for arbitrary A & B (for example)
A = [0 0 0 1 1 1 2 2 2];
B = [0 0 0 1 1 1 2 2 2];
nn = length(A);
Ai = A(randperm(nn));
Bi = B(randperm(nn));
% initialize keepset with the first combination of A & B
keepset = [Ai(1);Bi(1)];
loopcnt = 0;
while (size(keepset,2) < nn)
% randomize the elements in A and B independently
Ai = A(randperm(nn));
Bi = B(randperm(nn));
% test each combination of Ai and Bi to see if it is already in the
% keepset
for ii = 1:nn
tstcombo = [Ai(ii);Bi(ii)];
matchtest = bsxfun(@eq,tstcombo,keepset);
matchind = find((matchtest(1,:) & matchtest(2,:)));
if isempty(matchind)
keepset = [keepset tstcombo];
end
end
loopcnt = loopcnt + 1;
if loopcnt > 1000
disp('Execution halted after 1000 attempts')
break
elseif (size(keepset,2) >= nn)
disp(sprintf('Completed in %0.f iterations',loopcnt))
end
end
keepset
It's much more efficient to permute the combinations randomly than to shuffle the arrays independently and then handle the inevitable matching A/B elements.
There are lots of ways to generate all possible pairs, see
How to generate all pairs from two vectors in MATLAB using vectorised code?
For this example I'll use
allCombs = combvec([0,1],[0,1]);
% = [ 0 1 0 1
% 0 0 1 1 ]
Now you just want to select a number of unique (non-repeating) columns from this array, in random order. In all of your examples you select all 4 columns. The randperm function is perfect for this; from the docs:
p = randperm(n,k) returns a row vector containing k unique integers selected randomly from 1 to n inclusive.
n = size(allCombs,2); % number of combinations (or columns) to choose from
k = 4; % number of columns to choose for output
AB = allCombs( :, randperm(n,k) ); % random selection of pairs
If you need this split into two variables then you have
A = AB(1,:);
B = AB(2,:);
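combvec is part of the Deep Learning Toolbox. If that toolbox is unavailable, here is a sketch of the same idea using base MATLAB's ndgrid; the value lists valsA and valsB are assumptions matching the question's 0/1 vectors.

```matlab
valsA = [0 1];   % assumed values taken by A
valsB = [0 1];   % assumed values taken by B
[a, b] = ndgrid(valsA, valsB);
allCombs = [a(:).'; b(:).'];   % 2-by-4: [0 1 0 1; 0 0 1 1], same as combvec([0,1],[0,1])
AB = allCombs(:, randperm(size(allCombs, 2)));   % every pair exactly once, in random order
A = AB(1, :);
B = AB(2, :);
```

For the three-valued example in the previous answer, valsA = valsB = [0 1 2] yields all nine pairs in the same way.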
Here's a possible solution:
A = [0 0 1 1];
B = [0 0 1 1];
% Randomize A and B independently
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
% Keep randomizing A and B until A+B contains exactly one 2
while nnz(A + B == 2) ~= 1
ARand = randperm(length(A));
A = A(ARand);
BRand = randperm(length(B));
B = B(BRand);
end
This solution reshuffles until A+B contains exactly one 2. Because A and B each contain exactly two 1s, exactly one 2 in A+B also forces exactly one 0 (and two 1s), so each of the pairs (0,0), (0,1), (1,0) and (1,1) occurs exactly once. A test on sum(A+B) alone cannot detect this, since the sum is always 4 whatever the order.

Finding the column index for the 1 in each row of a matrix

I have the following matrix in Matlab:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
Each row has exactly one 1. How can I (without looping) build a column vector holding, for each row, the column index of its 1? That is, an element should be 2 if the 1 is in the second column, 3 if it is in the third column, etc. The above example should turn into:
M = [ 3
1
2
1
3];
You can actually solve this with simple matrix multiplication.
result = M * (1:size(M, 2)).';
3
1
2
1
3
This works by multiplying your N x 3 matrix M by the 3 x 1 column vector [1; 2; 3]. For each row of M, the row is multiplied element-wise with [1 2 3] and the products are summed. Only the 1 in the row contributes anything, and because there is exactly one 1 per row, the sum is the column index where that 1 is located.
For example, for the first row of M:
element_wise_multiplication = [0 0 1] .* [1 2 3]
[0, 0, 3]
sum(element_wise_multiplication)
3
Update
Based on the solutions provided by @rayryeng and @Luis below, I decided to run a comparison of the performance of the various methods.
To set up the test matrix (M), I created a matrix of the form specified in the original question and varied the number of rows. The column holding the 1 was chosen randomly using randi([1 nCols], 1, size(M, 1)). Execution times were measured using timeit.
When run using M of type double (MATLAB's default) you get the following execution times.
If M is logical, the matrix multiplication takes a hit because M has to be converted to a numeric type first, whereas the other two methods get a bit faster.
Here is the test code that I used.
sizes = round(linspace(100, 100000, 100));
times = zeros(numel(sizes), 3);
for k = 1:numel(sizes)
M = generateM(sizes(k));
times(k,1) = timeit(@()M * (1:size(M, 2)).');
M = generateM(sizes(k));
times(k,2) = timeit(@()max(M, [], 2), 2);
M = generateM(sizes(k));
times(k,3) = timeit(@()find(M.'), 2);
end
figure
plot(sizes, times * 1000); % timeit returns seconds; convert to ms
legend({'Multiplication', 'Max', 'Find'})
xlabel('Number of rows in M')
ylabel('Execution Time (ms)')
function M = generateM(nRows)
M = zeros(nRows, 3);
col = randi([1 size(M, 2)], 1, size(M, 1));
M(sub2ind(size(M), 1:numel(col), col)) = 1;
end
You can also abuse find and observe the row positions of the transpose of M. You have to transpose the matrix first as find operates in column major order:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
[out,~] = find(M.');
Not sure if this is faster than matrix multiplication though.
Yet another approach: use the second output of max:
[~, result] = max(M.', [], 1);
Or, as suggested by @rayryeng, use max along the second dimension instead of transposing M:
[~, result] = max(M, [], 2);
For
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
this gives
result =
3 1 2 1 3
If M contains more than one 1 in a given row, this will give the index of the first such 1.
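A quick consistency check of the three one-liners on the example matrix from the question, assuming (as stated there) that each row has exactly one 1:

```matlab
M = [0 0 1; 1 0 0; 0 1 0; 1 0 0; 0 0 1];   % example matrix from the question
byMult = M * (1:size(M, 2)).';   % dot product of each row with [1 2 3]
[~, byMax] = max(M, [], 2);      % position of the (first) maximum in each row
[byFind, ~] = find(M.');         % row indices of the 1s in M.', i.e. column indices in M
```

The three results agree only under that assumption: for an all-zero row the multiplication returns 0, max returns 1, and find skips the row entirely, so the outputs would no longer line up.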

How can I vectorise this loop in MATLAB

I have a loop that iterates over a matrix and sets all rows and columns with only one non-zero element to all zeroes.
so for example, it will transform this matrix:
A = [ 1 0 1 1
0 0 1 0
1 1 1 1
1 0 1 1 ]
to the matrix:
A' = [ 1 0 1 1
0 0 0 0
1 0 1 1
1 0 1 1 ]
Row/column 2 of A has only one non-zero element, so every element in row/column 2 is set to 0 in A'.
(It is assumed that the matrices will always be symmetric.)
Here is my non-vectorised code:
for ii = 1:length(A)
if nnz(A(ii,:)) == 1
A(ii,:) = 0;
A(:,ii) = 0;
end
end
Is there a more efficient way of writing this code in MATLAB?
EDIT:
I have been asked in the comments for some clarification, so I will oblige.
The purpose of this code is to remove edges from a graph that lead to a vertex of degree 1.
If A is the adjacency matrix representing an undirected graph G, then a row or column of that matrix with only one non-zero element represents a vertex of degree one, as it has only one edge incident to it.
My objective is to remove such edges from the graph, as these vertices will never be visited in a solution to the problem I am trying to solve, and reducing the graph will also reduce the size of the input to my search algorithm.
@TimeString, I understand that in the example you gave, recursively applying the algorithm to your matrix will result in a zero matrix; however, the matrices that I am applying it to represent large, connected graphs, so there will never be a case like that. As for your question of why I only count the elements in a row but then clear both the row and the column: the matrix is always symmetric, so I know that if something is true for a row, it is also true for the corresponding column.
so, just to clarify using another example:
I want to turn this graph G:
represented by matrix:
A = [ 0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0 ]
to this graph G':
represented by this matrix:
A' = [ 0 1 1 0
1 0 1 0
1 1 0 0
0 0 0 0 ]
(I realise that this matrix should actually be a 3x3 matrix because vertex D has been removed, but I already know how to shrink the matrix in this instance. My question is about efficiently setting the rows/columns with only one non-zero element to all zeros.)
I hope that is a good enough clarification.
Not sure if it's really faster (it depends on MATLAB's JIT), but you can try the following:
To find out which columns (equivalently, rows, since the matrix is symmetric) have more than one non zero element use:
sum(A ~= 0) > 1
The ~= 0 is probably not needed in your case since the matrix consists of 1/0 elements only (graph edges if I understand correctly).
Transform the above into a diagonal matrix in order to eliminate unwanted columns:
D = diag(sum(A~=0) > 1)
And multiply with A from left to zero rows and from right to zero columns:
res = D * A * D
Thanks to nimrodm's suggestion of using sum(A ~= 0) instead of nnz, I managed to find a better solution than my original one.
To clear the rows with one element I use:
A(sum(A ~= 0) == 1,:) = 0;
and then to clear columns with one element:
A(:,sum(A ~= 0) == 1) = 0;
For those of you who are interested, I did a tic/toc comparison on a 1000 x 1000 matrix:
% establish matrix
A = magic(1000);
rem_rows = [200,555,950];
A(rem_rows,:) = 0;
A(:,rem_rows) = 0;
% insert single element into empty rows/columns
A(rem_rows,500) = 5;
A(500,rem_rows) = 5;
% testing original version
A_temp = A;
for test = 1
tic
for ii = 1:length(A_temp)
if nnz(A_temp(ii,:)) == 1
A_temp(ii,:) = 0;
A_temp(:,ii) = 0;
end
end
toc
end
Elapsed time is 0.041104 seconds.
% testing new version
A_temp = A;
for test = 1
tic
A_temp(sum(A_temp ~= 0) == 1,:) = 0;
A_temp(:,sum(A_temp ~= 0) == 1) = 0;
toc
end
Elapsed time is 0.010378 seconds
% testing matrix operations based solution suggested by nimrodm
A_temp = A;
for test = 1
tic
B = diag(sum(A_temp ~= 0) > 1);
res = B * A_temp * B;
toc
end
Elapsed time is 0.258799 seconds
So it appears that the single-line version I came up with, inspired by nimrodm's suggestion, is the fastest.
Thanks for all your help!
Using bsxfun:
A(bsxfun(@or,(sum(A~=0,2)==1),(sum(A~=0,1)==1))) = 0
Sample run -
>> A
A =
1 0 1 1
0 0 1 0
1 1 1 1
1 0 1 1
>> A(bsxfun(@or,(sum(A~=0,2)==1),(sum(A~=0,1)==1))) = 0
A =
1 0 1 1
0 0 0 0
1 0 1 1
1 0 1 1
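On MATLAB R2016b or later, implicit expansion makes the bsxfun call unnecessary. A sketch of the same mask, assuming A is symmetric as stated in the question, so that a single degree vector serves for both rows and columns:

```matlab
A = [1 0 1 1
     0 0 1 0
     1 1 1 1
     1 0 1 1];
deg = sum(A ~= 0, 2);           % non-zeros per row (equal to per column, since A is symmetric)
A(deg == 1 | deg.' == 1) = 0;   % n-by-n mask built by implicit expansion of column | row
```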

Performance of vectorizing code to create a sparse matrix with a single 1 per row from a vector of indexes

I have a large column vector y containing integer values from 1 to 10. I wanted to convert it to a matrix where each row is full of 0s except for a 1 at the index given by the value at the respective row of y.
This example should make it clearer:
y = [3; 4; 1; 10; 9; 9; 4; 2; ...]
% gets converted to:
Y = [
0 0 1 0 0 0 0 0 0 0;
0 0 0 1 0 0 0 0 0 0;
1 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 1;
0 0 0 0 0 0 0 0 1 0;
0 0 0 0 0 0 0 0 1 0;
0 0 0 1 0 0 0 0 0 0;
0 1 0 0 0 0 0 0 0 0;
...
]
I have written the following code for this (it works):
m = length(y);
Y = zeros(m, 10);
for i = 1:m
Y(i, y(i)) = 1;
end
I know there are ways I could remove the for loop in this code (vectorizing). This post contains a few, including something like:
Y = full(sparse(1:length(y), y, ones(length(y),1)));
But I had to convert y to doubles to be able to use this, and the result is actually about 3x slower than my "for" approach, using 10,000,000 as the length of y.
Is it likely that doing this kind of vectorization will lead to better performance for a very large y? I've read many times that vectorizing calculations leads to better performance (not only in MATLAB), but this kind of solution seems to result in more calculations.
Is there a way to actually improve performance over the for approach in this example? Maybe the problem here is simply that acting on doubles instead of ints isn't the best thing for comparison, but I couldn't find a way to use sparse otherwise.
Here is a test to compare:
function [t,v] = testIndicatorMatrix()
y = randi([1 10], [1e6 1], 'double');
funcs = {
@() func1(y);
@() func2(y);
@() func3(y);
@() func4(y);
};
t = cellfun(@timeit, funcs, 'UniformOutput',true);
v = cellfun(@feval, funcs, 'UniformOutput',false);
assert(isequal(v{:}))
end
function Y = func1(y)
m = numel(y);
Y = zeros(m, 10);
for i = 1:m
Y(i, y(i)) = 1;
end
end
function Y = func2(y)
m = numel(y);
Y = full(sparse(1:m, y, 1, m, 10, m));
end
function Y = func3(y)
m = numel(y);
Y = zeros(m,10);
Y(sub2ind([m,10], (1:m).', y)) = 1;
end
function Y = func4(y)
m = numel(y);
Y = zeros(m,10);
Y((y-1).*m + (1:m).') = 1;
end
I get:
>> testIndicatorMatrix
ans =
0.0388
0.1712
0.0490
0.0430
Such a simple for-loop can be dynamically JIT-compiled at runtime, and would run really fast (even slightly faster than vectorized code)!
It seems you are looking for the full numeric matrix Y as the output. So, you can try this approach:
m = numel(y);
Y1(m,10) = 0; %// Faster way to pre-allocate zeros than calling `zeros` (Y1 must not already exist)
%// Source - http://undocumentedmatlab.com/blog/preallocation-performance
linear_idx = (y-1)*m+(1:m)'; %// y is a column vector, so y can be used directly instead of y(:)
Y1(linear_idx)=1; %// Y1 is the desired output
Benchmarking
Using Amro's benchmark and increasing the data size a bit:
y = randi([1 10], [1.5e6 1], 'double');
and using the faster preallocation scheme mentioned earlier (Y(m,10)=0; instead of Y = zeros(m,10);), I got these results on my system:
>> testIndicatorMatrix
ans =
0.1798
0.4651
0.1693
0.1457
That is, the vectorized approach mentioned here (the last one in the benchmark suite) gives more than 15% performance improvement over your for-loop code (the first one in the benchmark suite). So, if you are using large data sizes and intend to get full versions of sparse matrices, this approach would make sense (in my personal opinion).
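One more vectorized variant, not benchmarked in this thread: compare y against the row vector 1:10 using implicit expansion (MATLAB R2016b or later). This yields a logical m-by-10 matrix, wrapped in double() here because the question asks for a numeric one. Whether it beats the loop will depend on the MATLAB version and data size; the y below is the example from the question.

```matlab
y = [3; 4; 1; 10; 9; 9; 4; 2];   % example column vector from the question
Y = double(y == 1:10);           % m-by-10; Y(i, j) is 1 exactly when y(i) == j
```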
Does something like this not work for you?
tic;
N = 1e6;
y = randperm( N );
Y = spalloc( N, N, N ); % preallocation; Y is overwritten by the sparse call below
inds = sub2ind( size(Y), y(:), (1:N)' ); % not used by the sparse call below
Y = sparse( 1:N, y, 1, N, N, N );
toc
The above outputs
Elapsed time is 0.144683 seconds.

MATLAB: Fastest Way to Count Unique # of 2 Number Combinations in a Vector of Integers

Given a vector of integers such as:
X = [1 2 3 4 5 1 2]
I would like to find a really fast way to count the occurrences of each unique combination of two consecutive elements.
In this case the two-number combinations are:
[1 2] (occurs twice)
[2 3] (occurs once)
[3 4] (occurs once)
[4 5] (occurs once)
[5 1] (occurs once)
As it stands, I am currently doing this in MATLAB as follows
X = [1 2 3 4 5 1 2];
N = length(X);
X_max = max(X);
COUNTS = nan(X_max); % store as an X_max x X_max matrix
for i = 1:X_max
first_number_indices = find(X==i);
second_number_indices = first_number_indices + 1;
second_number_indices(second_number_indices>N) = []; % just in case the last entry = i
second_number_vals = X(second_number_indices);
for j = 1:X_max
COUNTS(i,j) = sum(second_number_vals==j);
end
end
Is there a faster/smarter way of doing this?
Here is a super fast way:
>> counts = sparse(X(1:end-1),X(2:end),1)
counts =
(5,1) 1
(1,2) 2
(2,3) 1
(3,4) 1
(4,5) 1
You could convert to a full matrix simply as: full(counts)
Here is an equivalent solution using accumarray:
>> counts = accumarray([X(1:end-1);X(2:end)]', 1)
counts =
0 2 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0
EDIT: @Amro has provided a much better solution (well, better in the vast majority of cases). I suspect my method would work better if MaxX is very large and X contains zeros: the presence of zeros rules out the use of sparse, while a large MaxX slows down the accumarray approach because it creates a MaxX-by-MaxX matrix.
EDIT: Thanks to @EitanT for pointing out an improvement that can be made using accumarray.
Here is how I would solve it:
%Generate some random data
T = 20;
MaxX = 3;
X = randi(MaxX, T, 1);
%Get the unique combinations and an index. Note, I am assuming X is a column vector.
[UniqueComb, ~, Ind] = unique([X(1:end-1), X(2:end)], 'rows');
NumComb = size(UniqueComb, 1);
%Count the number of occurrences of each combination
Count = accumarray(Ind, 1);
All unique sequential two element combinations are now stored in UniqueComb, while the corresponding counts for each unique combination are stored in Count.
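Regarding the concern about zeros: sparse and accumarray both need positive integer subscripts. A sketch combining the two answers (not taken from either), assuming X may contain arbitrary integers including zero or negative values: first map each value to its position in unique(X), then count consecutive pairs with accumarray. The count matrix is then only K-by-K for K distinct values, rather than MaxX-by-MaxX. The data below is a made-up example.

```matlab
X = [0 -3 0 -3 7 0 -3];            % hypothetical data with zero and negative values
[vals, ~, idx] = unique(X(:));     % vals(idx) reproduces X(:); idx runs over 1..K
K = numel(vals);
% counts(a, b) = number of times vals(a) is immediately followed by vals(b)
counts = accumarray([idx(1:end-1), idx(2:end)], 1, [K K]);
```

Here counts(2, 1) is 3, because the pair [0 -3] occurs three times.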