Consider a Matlab matrix B which lists all possible unordered pairs (without repetitions) from [1 2 ... n]. For example, if n=4,
B=[1 2;
1 3;
1 4;
2 3;
2 4;
3 4]
Note that B has size n(n-1)/2 x 2
I want to take a random draw of m rows from B and store them in a matrix C. Continuing the example above, I could do that as
m=2;
C=B(randi([1 size(B,1)],m,1),:);
However, in my actual case, n=371293. Hence, I cannot create B and, then, run the code above to obtain C. This is because storing B would require a huge amount of memory.
Could you advise on how I could proceed to create C, without having to first store B? Comments on a different question suggest to
Draw at random m integers between 1 and n(n-1)/2.
I=randi([1 n*(n-1)/2],m,1);
Use ind2sub to obtain C.
Here, I'm struggling to implement the second step.
Thanks to the comments below, I wrote this
n=4;
m=10;
coord=NaN(m,2);
R= randi([1 n^2],m,1);
for i=1:m
[cr, cc]=ind2sub([n,n],R(i));
if cr>cc
coord(i,1)=cc;
coord(i,2)=cr;
elseif cr<cc
coord(i,1)=cr;
coord(i,2)=cc;
end
end
coord(any(isnan(coord),2),:) = []; %delete NaN rows from coord
I guess there are more efficient ways to implement the same thing.
You can use the function named myind2ind in this post to take random rows of all possible unordered pairs without generating all of them.
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end
I=randi([1 n*(n-1)/2],m,1);
[C1 C2] = myind2ind (I, n);
If you look at the odds, for i=1:n-1, the number of combinations where the first value is equal to i is (n-i) and the total number of cominations is n*(n-1)/2. You can use this law to generate the first column of C. The values of the second column of C can then be generated randomly as integers uniformly distributed in the range [i+1, n]. Here is a code that performs the desired tasks:
clc; clear all; close all;
% Parameters
n = 371293; m = 10;
% Generation of C
R = rand(m,1);
C = zeros(m,2);
s = 0;
t = n*(n-1)/2;
for i=1:n-1
if (i<n-1)
ind_i = R>=s/t & R<(s+n-i)/t;
else % To avoid rounding errors for n>>1, we impose (s+n-i)=t at the last iteration (R<(s+n-i)/t=1 always true)
ind_i = R>=s/t;
end
C(ind_i,1) = i;
C(ind_i,2) = randi([i+1,n],sum(ind_i),1);
s = s+n-i;
end
% Display
C
Output:
C =
84333 266452
46609 223000
176395 328914
84865 94391
104444 227034
221905 302546
227497 335959
188486 344305
164789 266497
153603 354932
Good luck!
I am trying to implement a simplex algorithm following the rules I was given at my optimization course. The problem is
min c'*x s.t.
Ax = b
x >= 0
All vectors are assumes to be columns, ' denotes the transpose. The algorithm should also return the solution to dual LP. The rules to follow are:
Here, A_J denotes columns from A with indices in J and x_J, x_K denotes elements of vector x with indices in J or K respectively. Vector a_s is column s of matrix A.
Now I do not understand how this algorithm takes care of condition x >= 0, but I decided to give it a try and follow it step by step. I used Matlab for this and got the following code.
X = zeros(n, 1);
Y = zeros(m, 1);
% i. Choose starting basis J and K = {1,2,...,n} \ J
J = [4 5 6] % for our problem
K = setdiff(1:n, J)
% this while is for goto
while 1
% ii. Solve system A_J*\bar{x}_J = b.
xbar = A(:,J) \ b
% iii. Calculate value of criterion function with respect to current x_J.
fval = c(J)' * xbar
% iv. Calculate dual solution y from A_J^T*y = c_J.
y = A(:,J)' \ c(J)
% v. Calculate \bar{c}^T = c_K^T - u^T A_K. If \bar{c}^T >= 0, we have
% found the optimal solution. If not, select the smallest s \in K, such
% that c_s < 0. Variable x_s enters basis.
cbar = c(K)' - c(J)' * inv(A(:,J)) * A(:,K)
cbar = cbar'
tmp = findnegative(cbar)
if tmp == -1 % we have found the optimal solution since cbar >= 0
X(J) = xbar;
Y = y;
FVAL = fval;
return
end
s = findnegative(c, K) %x_s enters basis
% vi. Solve system A_J*\bar{a} = a_s. If \bar{a} <= 0, then the problem is
% unbounded.
abar = A(:,J) \ A(:,s)
if findpositive(abar) == -1 % we failed to find positive number
disp('The problem is unbounded.')
return;
end
% vii. Calculate v = \bar{x}_J / \bar{a} and find the smallest rho \in J,
% such that v_rho > 0. Variable x_rho exits basis.
v = xbar ./ abar
rho = J(findpositive(v))
% viii. Update J and K and goto ii.
J = setdiff(J, rho)
J = union(J, s)
K = setdiff(K, s)
K = union(K, rho)
end
Functions findpositive(x) and findnegative(x, S) return the first index of positive or negative value in x. S is the set of indices, over which we look at. If S is omitted, whole vector is checked. Semicolons are omitted for debugging purposes.
The problem I tested this code on is
c = [-3 -1 -3 zeros(1,3)];
A = [2 1 1; 1 2 3; 2 2 1];
A = [A eye(3)];
b = [2; 5; 6];
The reason for zeros(1,3) and eye(3) is that the problem is inequalities and we need slack variables. I have set starting basis to [4 5 6] because the notes say that starting basis should be set to slack variables.
Now, what happens during execution is that on first run of while, variable with index 1 enters basis (in Matlab, indices go from 1 on) and 4 exits it and that is reasonable. On the second run, 2 enters the basis (since it is the smallest index such that c(idx) < 0 and 1 leaves it. But now on the next iteration, 1 enters basis again and I understand why it enters, because it is the smallest index, such that c(idx) < 0. But here the looping starts. I assume that should not have happened, but following the rules I cannot see how to prevent this.
I guess that there has to be something wrong with my interpretation of the notes but I just cannot see where I am wrong. I also remember that when we solved LP on the paper, we were updating our subjective function on each go, since when a variable entered basis, we removed it from the subjective function and expressed that variable in subj. function with the expression from one of the equalities, but I assume that is different algorithm.
Any remarks or help will be highly appreciated.
The problem has been solved. Turned out that the point 7 in the notes was wrong. Instead, point 7 should be
I am doing a very large calculation (atmospheric absorption) that has a lot of individual narrow peaks that all get added up at the end. For each peak, I have pre-calculated the range over which the value of the peak shape function is above my chosen threshold, and I am then going line by line and adding the peaks to my spectrum. A minimum example is given below:
X = 1:1e7;
K = numel(a); % count the number of peaks I have.
spectrum = zeros(size(X));
for k = 1:K
grid = X >= rng(1,k) & X <= rng(2,k);
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(k),b(k),c(k)]);
end
Here, each peak has some parameters that define the position and shape (a,b,c), and a range over which to do the calculation (rng). This works great, and on my machine it benchmarks at around 220 seconds to do a complete data set. However, I have a 4 core machine and I would eventually like to run this on a cluster, so I'd like to parallelize it and make it scaleable.
Because each loop relies on the results of the previous iteration, I cannot use parfor, so I am taking my first step into learning how to use spmd blocks. My first try looked like this:
X = 1:1e7;
cores = matlabpool('size');
K = numel(a);
spectrum = zeros(size(X),cores);
spmd
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid,labindex) = spectrum(grid,labindex) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]);
end
end
finalSpectrum = sum(spectrum,2);
This almost works. The program crashes at the last line because spectrum is of type Composite, and the documentation for 2013a is spotty on how to turn Composite data into a matrix (cell2mat does not work). This also does not scale well because the more cores I have, the larger the matrix is, and that large matrix has to get copied to each worker, which then ignores most of the data. Question 1: how do I turn a Composite data type into a useable array?
The second thing I tried was to use a codistributed array.
spmd
spectrum = codistributed.zeros(K,cores);
disp(size(getLocalPart(spectrum)))
end
This tells me that each worker has a single vector of size [K 1], which I believe is what I want, but when I try to then meld the above methods
spmd
spectrum = codistributed.zeros(K,cores);
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]); end
finalSpectrum = gather(spectrum);
end
finalSpectrum = sum(finalSpectrum,2);
I get Matrix dimensions must agree errors. Since it's in a parallel block, I can't use my normal debugging crutch of stepping through the loop and seeing what the size of each block is at each point to see what's going on. Question 2: what is the proper way to index into and out of a codistributed array in an spmd block?
Regarding question#1, the Composite variable in the client basically refers to a non-distributed variant array stored on the workers. You can access the array from each worker by {}-indexing using its corresponding labindex (e.g: spectrum{1}, spectrum{2}, ..).
For your code that would be: finalSpectrum = sum(cat(2,spectrum{:}), 2);
Now I tried this problem myself using random data. Below are three implementations to compare (see here to understand the difference between distributed and nondistributed arrays). First we start with the common data:
len = 100; % spectrum length
K = 10; % number of peaks
X = 1:len;
% random position and shape parameters
a = rand(1,K); b = rand(1,K); c = rand(1,K);
% random peak ranges (lower/upper thresholds)
ranges = sort(randi([1 len], [2 K]));
% dummy peakfn() function
fcn = #(x,a,b,c) x+a+b+c;
% prepare a pool of MATLAB workers
matlabpool open
1) Serial for-loop:
spectrum = zeros(size(X));
for i=1:size(ranges,2)
r = ranges(:,i);
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + fcn(X(idx), a(i), b(i), c(i));
end
s1 = spectrum;
clear spectrum i r idx
2) SPMD with Composite array
spmd
spectrum = zeros(1,len);
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s2 = sum(vertcat(spectrum{:}));
clear spectrum i r idx ind
3) SPMD with co-distributed array
spmd
spectrum = zeros(numlabs, len, codistributor('1d',1));
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(labindex,idx) = spectrum(labindex,idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s3 = sum(gather(spectrum));
clear spectrum i r idx ind
All three results should be equal (to within an acceptably small margin of error)
>> max([max(s1-s2), max(s1-s3), max(s2-s3)])
ans =
2.8422e-14
I would like to find a clean way so that I can iterate over all the vectors of positive integers of length, say n (called x), such that sum(x) == 100 in MATLAB.
I know it is an exponentially complex task. If the length is sufficiently small, say 2-3 I can do it by a for loop (I know it is very inefficient) but how about longer vectors?
Thanks in advance,
Here is a quick and dirty method that uses recursion. The idea is that to generate all vectors of length k that sum to n, you first generate vectors of length k-1 that sum to n-i for each i=1..n, and then add an extra i to the end of each of these.
You could speed this up by pre-allocating x in each loop.
Note that the size of the output is (n + k - 1 choose n) rows and k columns.
function x = genperms(n, k)
if k == 1
x = n;
elseif n == 0
x = zeros(1,k);
else
x = zeros(0, k);
for i = 0:n
y = genperms(n-i,k-1);
y(:,end+1) = i;
x = [x; y];
end
end
Edit
As alluded to in the comments, this will run into memory issues for large n and k. A streaming solution is preferable, which generates the outputs one at a time. In a non-strict language like Haskell this is very simple -
genperms n k
| k == 1 = return [n]
| n == 0 = return (replicate k 0)
| otherwise = [i:y | i <- [0..n], y <- genperms (n-i) (k-1)]
viz.
>> mapM_ print $ take 10 $ genperms 100 30
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,99]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,98]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,97]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,96]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,95]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,94]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,93]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,92]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,91]
which runs virtually instantaneously - no memory issues to worry about.
In Python you could achieve something nearly as simple using generators and the yield keyword. In Matlab it is certainly possible, but I leave the translation up to you!
This is one possible method to generate all vectors at once (will give memory problems for moderately large n):
s = 10; %// desired sum
n = 3; %// number of digits
vectors = cell(1,n);
[vectors{:}] = ndgrid(0:s); %// I assume by "integer" you mean non-negative int
vectors = cell2mat(cellfun(#(c) reshape(c,1,[]), vectors, 'uni', 0).');
vectors = vectors(:,sum(vectors)==s); %// each column is a vector
Now you can iterate over those vectors:
for vector = vectors %// take one column at each iteration
%// do stuff with the vector
end
To avoid memory problems it is better to generate each vector as needed, instead of generating all of them initially. The following approach iterates over all possible n-vectors in one for loop (regardless of n), rejecting those vectors whose sum is not the desired value:
s = 10; %// desired sum
n = 3;; %// number of digits
for number = 0: s^n-1
vector = dec2base(number,s).'-'0'; %// column vector of n rows
if sum(vector) ~= s
continue %// reject that vector
end
%// do stuff with the vector
end
I want to vectorize the following MATLAB code. I think it must be simple but I'm finding it confusing nevertheless.
r = some constant less than m or n
[m,n] = size(C);
S = zeros(m-r,n-r);
for i=1:m-r+1
for j=1:n-r+1
S(i,j) = sum(diag(C(i:i+r-1,j:j+r-1)));
end
end
The code calculates a table of scores, S, for a dynamic programming algorithm, from another score table, C.
The diagonal summing is to generate scores for individual pieces of the data used to generate C, for all possible pieces (of size r).
Thanks in advance for any answers! Sorry if this one should be obvious...
Note
The built-in conv2 turned out to be faster than convnfft, because my eye(r) is quite small ( 5 <= r <= 20 ). convnfft.m states that r should be > 20 for any benefit to manifest.
If I understand correctly, you're trying to calculate the diagonal sum of every subarray of C, where you have removed the last row and column of C (if you should not remove the row/col, you need to loop to m-r+1, and you need to pass the entire array C to the function in my solution below).
You can do this operation via a convolution, like so:
S = conv2(C(1:end-1,1:end-1),eye(r),'valid');
If C and r are large, you may want to have a look at CONVNFFT from the Matlab File Exchange to speed up calculations.
Based on the idea of JS, and as Jonas pointed out in the comments, this can be done in two lines using IM2COL with some array manipulation:
B = im2col(C, [r r], 'sliding');
S = reshape( sum(B(1:r+1:end,:)), size(C)-r+1 );
Basically B contains the elements of all sliding blocks of size r-by-r over the matrix C. Then we take the elements on the diagonal of each of these blocks B(1:r+1:end,:), compute their sum, and reshape the result to the expected size.
Comparing this to the convolution-based solution by Jonas, this does not perform any matrix multiplication, only indexing...
I would think you might need to rearrange C into a 3D matrix before summing it along one of the dimensions. I'll post with an answer shortly.
EDIT
I didn't manage to find a way to vectorise it cleanly, but I did find the function accumarray, which might be of some help. I'll look at it in more detail when I am home.
EDIT#2
Found a simpler solution by using linear indexing, but this could be memory-intensive.
At C(1,1), the indexes we want to sum are 1+[0, m+1, 2*m+2, 3*m+3, 4*m+4, ... ], or (0:r-1)+(0:m:(r-1)*m)
sum_ind = (0:r-1)+(0:m:(r-1)*m);
create S_offset, an (m-r) by (n-r) by r matrix, such that S_offset(:,:,1) = 0, S_offset(:,:,2) = m+1, S_offset(:,:,3) = 2*m+2, and so on.
S_offset = permute(repmat( sum_ind, [m-r, 1, n-r] ), [1, 3, 2]);
create S_base, a matrix of base array addresses from which the offset will be calculated.
S_base = reshape(1:m*n,[m n]);
S_base = repmat(S_base(1:m-r,1:n-r), [1, 1, r]);
Finally, use S_base+S_offset to address the values of C.
S = sum(C(S_base+S_offset), 3);
You can, of course, use bsxfun and other methods to make it more efficient; here I chose to lay it out for clarity. I have yet to benchmark this to see how it compares with the double-loop method though; I need to head home for dinner first!
Is this what you're looking for? This function adds the diagonals and puts them into a vector similar to how the function 'sum' adds up all of the columns in a matrix and puts them into a vector.
function [diagSum] = diagSumCalc(squareMatrix, LLUR0_ULLR1)
%
% Input: squareMatrix: A square matrix.
% LLUR0_ULLR1: LowerLeft to UpperRight addition = 0
% UpperLeft to LowerRight addition = 1
%
% Output: diagSum: A vector of the sum of the diagnols of the matrix.
%
% Example:
%
% >> squareMatrix = [1 2 3;
% 4 5 6;
% 7 8 9];
%
% >> diagSum = diagSumCalc(squareMatrix, 0);
%
% diagSum =
%
% 1 6 15 14 9
%
% >> diagSum = diagSumCalc(squareMatrix, 1);
%
% diagSum =
%
% 7 12 15 8 3
%
% Written by M. Phillips
% Oct. 16th, 2013
% MIT Open Source Copywrite
% Contact mphillips#hmc.edu fmi.
%
if (nargin < 2)
disp('Error on input. Needs two inputs.');
return;
end
if (LLUR0_ULLR1 ~= 0 && LLUR0_ULLR1~= 1)
disp('Error on input. Only accepts 0 or 1 as input for second condition.');
return;
end
[M, N] = size(squareMatrix);
if (M ~= N)
disp('Error on input. Only accepts a square matrix as input.');
return;
end
diagSum = zeros(1, M+N-1);
if LLUR0_ULLR1 == 1
squareMatrix = rot90(squareMatrix, -1);
end
for i = 1:length(diagSum)
if i <= M
countUp = 1;
countDown = i;
while countDown ~= 0
diagSum(i) = squareMatrix(countUp, countDown) + diagSum(i);
countUp = countUp+1;
countDown = countDown-1;
end
end
if i > M
countUp = i-M+1;
countDown = M;
while countUp ~= M+1
diagSum(i) = squareMatrix(countUp, countDown) + diagSum(i);
countUp = countUp+1;
countDown = countDown-1;
end
end
end
Cheers