MATLAB: efficient generation of a large integer matrix of multi-indices

Let d and p be two integers. I need to generate a large matrix A of integers, having d columns and N=nchoosek(d+p,p) rows. Note that nchoosek(d+p,p) increases quickly with d and p, so it's very important that I can generate A quickly. The rows of A are all the multi-indices with components from 0 to p, such that the sum of the components is less than or equal to p. This means that, if d=3 and p=3, then A is an [N=nchoosek(3+3,3)=20x3] matrix with the following structure:
A=[0 0 0;
1 0 0;
0 1 0;
0 0 1;
2 0 0;
1 1 0;
1 0 1;
0 2 0;
0 1 1;
0 0 2;
3 0 0;
2 1 0;
2 0 1;
1 2 0;
1 1 1;
1 0 2;
0 3 0;
0 2 1;
0 1 2;
0 0 3]
It is not indispensable to follow exactly the row ordering I used, although it would make my life easier (for those interested, it's called graded lexicographical ordering and it's described here:
http://en.wikipedia.org/wiki/Monomial_order).
In case you are curious about the origin of this weird matrix, let me know!

Solution using nchoosek and diff
The following solution is based on this clever answer by Mark Dickinson.
function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
degrees = (0:maxDegree).';
return;
end
degrees = cell(maxDegree+1,1);
k = numVars;
for n = 0:maxDegree
dividers = flipud(nchoosek(1:(n+k-1), k-1));
degrees{n+1} = [dividers(:,1), diff(dividers,1,2), (n+k)-dividers(:,end)]-1;
end
degrees = cell2mat(degrees);
You can get your matrix by calling monomialDegrees(d,p).
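To see how the dividers turn into exponents, here is a minimal trace of a single loop iteration, assuming n=2 and k=3 (values picked only for illustration). It reuses the two expressions from the function above and nothing else.

```matlab
n = 2; k = 3;
% every way of placing k-1 dividers into n+k-1 slots ("stars and bars")
dividers = flipud(nchoosek(1:(n+k-1), k-1));
% the gaps between consecutive dividers, minus one, are the exponents
block = [dividers(:,1), diff(dividers,1,2), (n+k)-dividers(:,end)] - 1;
disp(block)
```

Each row of block sums to n, and the rows come out in the same order as the degree-2 rows of the matrix in the question.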
Solution using nchoosek and accumarray/histc
This approach is based on the following idea: there is a bijection between all k-multicombinations and the rows of the matrix we are looking for. Each multicombination lists the column positions whose entries are incremented. For example, the multicombination [1,1,1,1,3] is mapped to [4,0,1], as there are four 1s, no 2s, and one 3. The conversion can be done with either accumarray or histc. Here is the accumarray approach:
function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
degrees = (0:maxDegree).';
return;
end
degrees = cell(maxDegree+1,1);
degrees{1} = zeros(1,numVars);
for n = 1:maxDegree
pos = nmultichoosek(1:numVars, n);
degrees{n+1} = accumarray([reshape((1:size(pos,1)).'*ones(1,n),[],1),pos(:)],1);
end
degrees = cell2mat(degrees);
And here the alternative using histc:
function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
degrees = (0:maxDegree).';
return;
end
degrees = cell(maxDegree+1,1);
degrees(1:2) = {zeros(1,numVars); eye(numVars);};
for n = 2:maxDegree
pos = nmultichoosek(1:numVars, n);
degrees{n+1} = histc(pos.',1:numVars).';
end
degrees = cell2mat(degrees(1:maxDegree+1));
Both use the following function to generate multicombinations:
function combs = nmultichoosek(values, k)
if numel(values)==1
n = values;
combs = nchoosek(n+k-1,k);
else
n = numel(values);
combs = bsxfun(@minus, nchoosek(1:n+k-1,k), 0:k-1);
combs = reshape(values(combs),[],k);
end
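A small sketch of both steps, assuming numVars=3 (chosen only for illustration): first the shift that turns ordinary combinations into multicombinations, then the counting step that maps the multicombination [1,1,1,1,3] from the text to a row of exponents.

```matlab
numVars = 3; k = 2;
% same expression as in nmultichoosek: shifting strictly increasing
% combinations down by 0:k-1 gives non-decreasing multicombinations
combs = bsxfun(@minus, nchoosek(1:numVars+k-1, k), 0:k-1);
disp(combs)

% count how often each variable index occurs in one multicombination
pos = [1 1 1 1 3];
counts = accumarray(pos(:), 1, [numVars 1]).';
disp(counts)
```

Passing the size [numVars 1] to accumarray keeps trailing zero counts, which matters when the highest variable index does not occur.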
Benchmarking:
Benchmarks of the code above show that the diff solution is faster when numVars is low and maxDegree is high. If numVars is higher than maxDegree, the histc solution is faster.
Old approach:
This is an alternative to Dennis' approach of dec2base, which has a limit on the maximum base. It is still a lot slower than the above solutions.
function degrees = monomialDegrees(numVars, maxDegree)
Cs = cell(1,numVars);
[Cs{:}] = ndgrid(0:maxDegree);
degrees = reshape(cat(numVars+1, Cs{:}),(maxDegree+1)^numVars,[]); % concatenate along a new trailing dimension so each grid becomes one column
degrees = degrees(sum(degrees,2)<=maxDegree,:);

I would solve it this way:
ncols=d;
colsum=p;
base=(0:colsum)';
v=@(dm)permute(base,[dm:-1:1]);
M=bsxfun(@plus,base,v(2));
for idx=3:ncols
M=bsxfun(@plus,M,v(idx));
end
L=M<=colsum;
A=cell(1,ncols);
[A{:}]=ind2sub(size(L),find(L));
a=cell2mat(A);
%subtract 1 because 1 based indexing but base starts at 0
a=a-1+min(base);
It builds up a d-dimensional array (one dimension per column of the result) which contains the sums. The efficiency of this code depends on sum(L(:))/numel(L); this quotient tells you how much of the created array is actually used for solutions. If it gets low for your input, there probably exists a better solution.
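As a rough sketch, the quotient can be estimated before building anything, since the question states that the number of valid rows is nchoosek(d+p,p) and the full grid has (p+1)^d entries. The values of d and p below are assumptions chosen for illustration.

```matlab
p = 3;
d = [3 10];
used  = arrayfun(@(k) nchoosek(k+p, p), d);  % rows that survive the filter
total = (p+1).^d;                              % entries in the full grid
ratio = used ./ total;
fprintf('d=%d: %d of %d entries used (%.4g)\n', [d; used; total; ratio]);
```

For d=3 about a third of the grid is used, while for d=10 it is only a tiny fraction, which is the regime where the grid-based approaches waste most of their work.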

Here is a very easy way to do it:
L = dec2base(0:4^3-1,4);
idx=sum(num2str(L)-'0',2)<=3;
L(idx,:)
I think the first line can be very time efficient for creating a list of candidates, but unfortunately I don't know how to reduce the list in an efficient way after that.
So the second line works, but could use improvement performance wise.

Related

How to get indexes of logical matrix without using find in matlab?

Let's assume my matrix A is the output of comparison function i.e. logical matrix having values 0 and 1's only. For a small matrix of size 3*4, we might have something like:
A =
1 1 0 0
0 0 1 0
0 0 1 1
Now, I am generating another matrix B which is of the same size as A, but its rows are filled with indexes of A and any leftover values in each row are set to zero.
B =
1 2 0 0
3 0 0 0
3 4 0 0
Currently, I am using find function on each row of A to get matrix B. Complete code can be written as:
A=[1,1,0,0;0,0,1,0;0,0,1,1];
[rows,columns]=size(A);
B=zeros(rows,columns);
for i=1:rows
currRow=find(A(i,:));
B(i,1:length(currRow))=currRow;
end
For large matrices, the find function takes up a lot of the computation time according to the MATLAB Profiler. Is there any way to generate matrix B faster?
Note:
Matrix A has more than 1000 columns, but a row never has more than 50 non-zero elements. Here I made matrix B the same size as A, but B can be much smaller column-wise.
I would suggest using parfor, but the overhead is too much here, and there are more issues with it, so it is not a good solution.
rows = 5e5;
cols = 1000;
A = rand(rows, cols) < 0.050;
I = uint16(1:cols);
B = zeros(size(A), 'uint16');
% [r,c] = find(A);
tic
for i=1:rows
% currRow = find(A(i,:));
currRow = I(A(i,:));
B(i,1:length(currRow)) = currRow;
end
toc
@Cris suggests replacing find with an indexing operation. It increases the performance by about 10%.
Apparently, there is no better optimization if B must have the specific form you describe. I suggest using [r,c] = find(A); if the indexes are not required in matrix form.
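A minimal sketch of both suggestions on the 3-by-4 example from the question: logical indexing into a precomputed index vector gives the same column numbers as find on a row, and a single find call returns all row/column pairs at once.

```matlab
A = logical([1 1 0 0; 0 0 1 0; 0 0 1 1]);
I = uint16(1:size(A,2));

% per-row column indices without calling find
rowIdx = I(A(3,:));

% all indices at once; the pairs come back in column-major order
[r, c] = find(A);
disp([r c])
```

Note that the pairs from find are ordered column by column, so they would need regrouping by r if a per-row layout like B is required.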

Why does the rowsize of A matter in fmincon

I have MATLAB code that uses fmincon with some constraints. To be able to modify the code, I wondered whether the position of a row within the constraint matrix A makes a difference.
I set up a test file so I can change some variables. It turns out that the position of the constraint is irrelevant for the result, but the number of rows in A and b plays a role. I'm surprised by that, because I would expect a row with only zeros in A and b to just cancel out.
fun = @(x)100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
options1 = optimoptions('fmincon','Display','off');
A=zeros(2,2); %setup A
A(2,2)=1; %x2<0
b=[0 0]'; %setup b
x = fmincon(fun,[-1,2],A,b,[],[],[],[],[],options1);x
%change condition position inside A
A=zeros(2,2);
A(1,2)=1; %x2<0
b=[0 0]';
x = fmincon(fun,[-1,2],A,b,[],[],[],[],[],options1);x
% no change; the position doesn't influence fmincon
%change row size of A
A=zeros(1,2);
A(1,2)=1; %x2<0
b=[0]';
x = fmincon(fun,[-1,2],A,b,[],[],[],[],[],options1);x
%change in x2
%increase size of A
A=zeros(10,2);
A(1,2)=1; %x2<0
b=[0 0 0 0 0 0 0 0 0 0]';
x = fmincon(fun,[-1,2],A,b,[],[],[],[],[],options1);x
%change in x2
Can someone explain to me why fmincon is influenced by the row number? What is the "right" rownumber in A and b? The number of variables or the number of conditions?
EDIT
For reasons of completeness:
I agree that different values are possible because of the iteration process. Nevertheless I can find situations where the difference is bigger than the tolerance:
Added +log(x(3)) to the function:
fun = @(x)100*(x(2)-x(1)^2)^2 + (1-x(1))^2+log(x(3));
options1 = optimoptions('fmincon','Display','off');
options = optimoptions('fmincon')
A=zeros(2,3); %setup A
A(2,3)=1; %x2<0
b=[0 0]'; %setup b
x = fmincon(fun,[-1,2,1],A,b,[],[],[],[],[],options1);x
%change row size of A
A=zeros(1,3);
A(1,3)=1; %x2<0
b=[0]';
x = fmincon(fun,[-1,2,1],A,b,[],[],[],[],[],options1);x
%change in x2
%increase size of A
A=zeros(10,3);
A(1,3)=1; %x2<0
b=[0 0 0 0 0 0 0 0 0 0]';
x = fmincon(fun,[-1,2,1],A,b,[],[],[],[],[],options1);x
%change in x2
x =
-0.79876 **0.49156** 2.3103e-11
x =
-0.79921 0.49143 1.1341e-11
x =
-0.80253 **0.50099** 5.8733e-12
Matlab support told me that the A matrix should not have more rows than conditions. Each condition makes it more difficult for the algorithm.
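Following that advice, one possible way to strip padding rows before calling fmincon is sketched below. It only drops rows where A is all zeros and b is non-negative, since such a row reads 0 <= b and is always satisfied; the variable name redundant is made up for this example.

```matlab
A = zeros(10,2);
A(1,2) = 1;            % single constraint: x2 <= 0
b = zeros(10,1);

% a row of zeros with b >= 0 can never be violated
redundant = ~any(A, 2) & b >= 0;
A = A(~redundant, :);
b = b(~redundant);
disp(size(A))
```

A zero row with a negative b is deliberately kept, because it makes the problem infeasible and silently removing it would hide that.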
Note that fmincon doesn't necessarily give the exact solution, but a good approximation of the solution according to certain criteria.
The differences in the results are plausible, since fmincon is an iterative algorithm and these matrix multiplications (even if the entries are mainly zeros) will eventually end with different results. MATLAB will actually perform these matrix multiplications until it finds the best result. So these results are all correct in the sense that they are all close to the solution.
x =
0.161261791015350 -0.000000117317860
x =
0.161261791015350 -0.000000117317860
x =
0.161261838607809 -0.000000077614999
x =
0.161261877075196 -0.000000096088746
The difference in your results is around 1.0e-07, which is a decent result considering you don't specify stopping criteria. You can see the defaults with the command
options = optimoptions('fmincon')
My result is
Default properties:
Algorithm: 'interior-point'
CheckGradients: 0
ConstraintTolerance: 1.0000e-06
Display: 'final'
FiniteDifferenceStepSize: 'sqrt(eps)'
FiniteDifferenceType: 'forward'
HessianApproximation: 'bfgs'
HessianFcn: []
HessianMultiplyFcn: []
HonorBounds: 1
MaxFunctionEvaluations: 3000
MaxIterations: 1000
ObjectiveLimit: -1.0000e+20
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
ScaleProblem: 0
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-10
SubproblemAlgorithm: 'factorization'
TypicalX: 'ones(numberOfVariables,1)'
UseParallel: 0
For example, I can reach closer results with the option:
options1 = optimoptions('fmincon','Display','off', 'OptimalityTolerance', 1.0e-09);
Result is
x =
0.161262015455003 -0.000000000243997
x =
0.161262015455003 -0.000000000243997
x =
0.161262015706777 -0.000000000007691
x =
0.161262015313928 -0.000000000234186
You can also try playing with other criteria such as MaxFunctionEvaluations, MaxIterations, etc. to see if you can get even closer results...

vectorising while loop insertion sort matlab

array = [2 1 3 2 1]
for i = 2:length(array)
value = array(i);
j = i - 1;
array_j=array(1:j);
array_j_indices=cumsum(array_j>value);
[~,n]=find(array_j_indices==1);
newArray=array;
array(n+1:i)=array_j(array_j>value);
j=j-max(array_j_indices);
array(j+1) = value;
end %forLoop
disp(array);
Hello,
I saw this code for vectorising the while loop of an insertion sort, but I cannot understand how it works.
How does cumsum(array_j>value) work? I understand and have tested cumsum, but I don't understand how the relational operator in (array_j>value) works inside cumsum within the for loop.
Also, I don't understand how [~,n]=find(array_j_indices==1) stores values in n. Does it store only the column indices because the row output is discarded with the tilde (~)?
cumsum(array_j>value)?
array_j>value: due to the sorted nature of array_j, the result is always some zeros followed by some ones, e.g. [0 0 0 0 1 1 1 1]
cumsum(array_j>value) = [0 0 0 0 1 2 3 4]: at most one element will be equal to 1.
[~,n]=find(array_j_indices==1); ?
Because there is only one row, this is equal to n=find(array_j_indices==1);.
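A minimal trace of one iteration, assuming a sorted prefix [1 2 3 5 6] and the value 4 (both picked for illustration):

```matlab
array_j = [1 2 3 5 6];   % already sorted prefix
value = 4;               % element to insert

mask = array_j > value;              % zeros, then ones
array_j_indices = cumsum(mask);      % running count of larger elements
n = find(array_j_indices == 1);      % position of the first larger element

% two-output form: the row index is discarded, n2 is the column index
[~, n2] = find(array_j_indices == 1);
disp([mask; array_j_indices])
```

Both n and n2 point at the first element larger than value, which is where the shift of the larger elements starts.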
Fastest implementation?
Note that this 'vectorised' code is slower than the following (simpler) implementation:
for i = 2:length(array)
value = array(i);
j = i - 1;
n=find(array(1:j)>value,1);
array(n+1:i)=array(n:j);
array(n) = value;
end
and much slower than the built-in matlab sort method.

Condition for columns based on same index as vector

I'm trying to get a logical matrix as a result of a condition that is specific for each column M(:,i) of the original matrix, based on the value of the same index i in vector N, that is, N(i).
I have looked this up online, but can't find anything quite like it. There must be a simple and clean way of doing this.
M =
3 -1 100 8
200 2 300 4
-10 0 0 400
N =
4 0 90 7
and my desired solution is, for each column of M(:,i), the values less than N(i):
1 1 0 0
0 0 0 1
1 0 1 0
It's a standard use-case for bsxfun:
O = bsxfun(@lt, M, N)
Here @lt is calling the "less than" function, i.e. it is the function handle to the < operator. bsxfun will then "expand" N along its singleton dimension by applying the function @lt to each row of M and the whole of N.
Note that you can easily achieve the same thing using a for-loop:
O = zeros(size(M));
for row = 1:size(M,1)
O(row,:) = M(row,:) < N;
end
Or by using repmat:
O = M < repmat(N, size(M,1), 1);
but in MATLAB the bsxfun is usually the most efficient.
Possible two-line solution using arrayfun to apply the comparison to each column and index pair:
T = arrayfun(@(jj)M(:,jj) < N(jj), 1:numel(N), 'UniformOutput', false);
result = cat(2,T{:});
Edit: Of course, the bsxfun solution is much more efficient.
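For reference, a short check on the data from the question. It also shows the plain operator form, which relies on implicit expansion and therefore assumes MATLAB R2016b or newer; on older releases only the bsxfun line will work.

```matlab
M = [3 -1 100 8; 200 2 300 4; -10 0 0 400];
N = [4 0 90 7];

O1 = bsxfun(@lt, M, N);   % works on old and new releases
O2 = M < N;               % implicit expansion, assumes R2016b or newer
disp(O1)
```

Both forms compare every row of M against N and return the same logical matrix.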

Matlab: Convert elements larger (smaller) than 1 (-1) into a sequence of 1 (-1)

UPDATE: I've done some testing, and the solution of Jonas is the fastest for a range of different size input vectors. In particular, as angainor points out, the solution scales up to large sizes incredibly well - an important test as it is usually the large size problems that prompt us to pose these kind of questions on SO. Thanks to both Jonas and tmpearce for your solutions - based on the efficiency of the solution for large size problems I'm giving the answer tick to Jonas.
My Question: I have this column vector:
Vec = [0; 1; 2; -1; -3; 0; 0; 2; 1; -1];
I would like to convert every element greater than one into a sequence of ones that has length equal to the value of the element. Similarly, I want to convert every element less than minus one into a sequence of minus ones. Thus my output vector should look like this:
VecLong = [0; 1; 1; 1; -1; -1; -1; -1; 0; 0; 1; 1; 1; -1];
Note that each 2 has been changed into two 1's, while the -3 has been changed into three -1's. Currently, I solve the problem like this:
VecTemp = Vec;
VecTemp(VecTemp == 0) = 1;
VecLong = NaN(sum(abs(VecTemp)), 1);
c = 1;
for n = 1:length(Vec)
if abs(Vec(n)) <= 1
VecLong(c) = Vec(n);
c = c + 1;
else
VecLong(c:c + abs(Vec(n)) - 1) = sign(Vec(n)); % -1 so exactly abs(Vec(n)) entries are written
c = c + abs(Vec(n));
end
end
This doesn't feel very elegant. Can anyone suggest a better method? Note: You can assume that Vec will contain only integer values. Thanks in advance for all suggestions.
You can use the good old cumsum-approach to repeating the entries properly. Note that I'm assigning a few temporary variables that you can get rid of, if you want to put everything into one line.
%# create a list of values to repeat
signVec = sign(Vec);
%# create a list of corresponding indices that repeat
%# as often as the value in signVec has to be repeated
tmp = max(abs(Vec),1); %# max: zeros have to be repeated once
index = zeros(sum(tmp),1);
index([1;cumsum(tmp(1:end-1))+1])=1; %# assign ones as pivots for cumsum
index = cumsum(index); %# create repeating indices
%# repeat
out = signVec(index);
out'
out =
0 1 1 1 -1 -1 -1 -1 0 0 1 1 1 -1
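The intermediate values make the trick easier to follow. The sketch below repeats the steps above with the pivot positions pulled out into a variable named starts (a name introduced here), and cross-checks the result against repelem, which is assumed to be available (R2015a or newer).

```matlab
Vec = [0; 1; 2; -1; -3; 0; 0; 2; 1; -1];

tmp = max(abs(Vec), 1);                    % repeat counts, zeros count once
starts = [1; cumsum(tmp(1:end-1)) + 1];    % first output position of each entry
index = zeros(sum(tmp), 1);
index(starts) = 1;
index = cumsum(index);                     % source index for every output position
out = sign(Vec(index));

% assumption: repelem exists in this release
out2 = repelem(sign(Vec), tmp);
disp([index out out2])
```

On releases that have repelem, the last line is a shorter alternative to the manual cumsum indexing.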
Edit: I thought of another (slightly obscure) but shorter way to do this, and it is faster than the loop you've got.
tic;
for rep=1:100000
% original loop-based solution
end
toc
Elapsed time is 2.768822 seconds.
% bsxfun-based indexing alternative
tic;
for rep=1:100000
TempVec=abs(Vec);TempVec(Vec==0)=1;
LongVec = sign(Vec(sum(bsxfun(@gt,1:sum(TempVec),cumsum(TempVec)))+1));
end
toc
Elapsed time is 1.798339 seconds.
This answer scales pretty well too, compared to the original - at least, to a point. There's a performance sweet spot.
Vec = repmat(OrigVec,10,1);
% test with 100,000 loops
% loop-based solution:
Elapsed time is 19.005226 seconds.
% bsxfun-based solution:
Elapsed time is 4.411316 seconds.
Vec = repmat(OrigVec,1000,1);
% test with 1,000 loops - 100,000 would be horribly slow
% loop-based solution:
Elapsed time is 18.105728 seconds.
% bsxfun-based solution:
Elapsed time is 98.699396 seconds.
bsxfun is expanding the vector into a matrix, then collapsing it with sum. With very large vectors this is needlessly memory heavy compared to the loop, so it ends up losing. Before then though, it does quite well.
Original, slow answer:
Here's a one-liner:
out=cell2mat(arrayfun(@(x) repmat(((x>0)*2)-1+(x==0),max(1,abs(x)),1),Vec,'uni',0));
out' =
0 1 1 1 -1 -1 -1 -1 0 0 1 1 1 -1
What's going on:
((x>0)*2)-1 + (x==0) % if an integer is >0, make it a 1, <0 becomes -1, 0 stays 0
max(1,abs(x)) % figure out how many times to replicate the value
arrayfun(@(x) (the above stuff), Vec, 'uni', 0) % apply the function
% to each element in the array, generating a cell array output
cell2mat( (the above stuff) ) % convert back to a matrix