Take a random draw of all possible pairs of indices in Matlab - matlab

Consider a Matlab matrix B which lists all possible unordered pairs (without repetitions) from [1 2 ... n]. For example, if n=4,
B=[1 2;
1 3;
1 4;
2 3;
2 4;
3 4]
Note that B has size n(n-1)/2 x 2
I want to take a random draw of m rows from B and store them in a matrix C. Continuing the example above, I could do that as
m=2;
C=B(randi([1 size(B,1)],m,1),:);
However, in my actual case, n=371293. Hence, I cannot create B and, then, run the code above to obtain C. This is because storing B would require a huge amount of memory.
Could you advise on how I could proceed to create C, without having to first store B? Comments on a different question suggest to
Draw at random m integers between 1 and n(n-1)/2.
I=randi([1 n*(n-1)/2],m,1);
Use ind2sub to obtain C.
Here, I'm struggling to implement the second step.
Thanks to the comments below, I wrote this
n=4;
m=10;
coord=NaN(m,2);
R= randi([1 n^2],m,1);
for i=1:m
[cr, cc]=ind2sub([n,n],R(i));
if cr>cc
coord(i,1)=cc;
coord(i,2)=cr;
elseif cr<cc
coord(i,1)=cr;
coord(i,2)=cc;
end
end
coord(any(isnan(coord),2),:) = []; %delete NaN rows from coord
I guess there are more efficient ways to implement the same thing.

You can use the function named myind2ind in this post to take random rows of all possible unordered pairs without generating all of them.
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end
I=randi([1 n*(n-1)/2],m,1);
[C1 C2] = myind2ind (I, n);

If you look at the odds, for i=1:n-1, the number of combinations where the first value is equal to i is (n-i) and the total number of cominations is n*(n-1)/2. You can use this law to generate the first column of C. The values of the second column of C can then be generated randomly as integers uniformly distributed in the range [i+1, n]. Here is a code that performs the desired tasks:
clc; clear all; close all;
% Parameters
n = 371293; m = 10;
% Generation of C
R = rand(m,1);
C = zeros(m,2);
s = 0;
t = n*(n-1)/2;
for i=1:n-1
if (i<n-1)
ind_i = R>=s/t & R<(s+n-i)/t;
else % To avoid rounding errors for n>>1, we impose (s+n-i)=t at the last iteration (R<(s+n-i)/t=1 always true)
ind_i = R>=s/t;
end
C(ind_i,1) = i;
C(ind_i,2) = randi([i+1,n],sum(ind_i),1);
s = s+n-i;
end
% Display
C
Output:
C =
84333 266452
46609 223000
176395 328914
84865 94391
104444 227034
221905 302546
227497 335959
188486 344305
164789 266497
153603 354932
Good luck!

Related

How to programmatically compute this summation

I want to compute the above summation, for a given 'x'. The summation is to be carried out over a block of lengths specified by an array , for example block_length = [5 4 3]. The summation is carried as follows: from -5 to 5 across one dimension, -4 to 4 in the second dimension and -3 to 3 in the last dimension.
The pseudo code will be something like this:
sum = 0;
for i = -5:5
for j = -4:4
for k = -3:3
vec = [i j k];
tv = vec * vec';
sum = sum + 1/(1+tv)*cos(2*pi*x*vec'));
end
end
end
The problem is that I want to find the sum when the number of dimensions are not known ahead of time, using some kind of variable nested loops hopefully. Matlab uses combvec, but it returns all possible combinations of vectors, which is not required as we only compute the sum. When there are many dimensions, combvec returning all combinations is not feasible memory wise.
Appreciate any ideas towards solutions.
PS: I want to do this at high number of dimensions, for example 650, as in machine learning.
Based on https://www.mathworks.com/matlabcentral/answers/345551-function-with-varying-number-of-for-loops I came up with the following code (I haven't tested it for very large number of indices!):
function sum = fun(x, block_length)
sum = 0;
n = numel(block_length); % Number of loops
vec = -ones(1, n) .* block_length; % Index vector
ready = false;
while ~ready
tv = vec * vec';
sum = sum + 1/(1+tv)*cos(2*pi*x*vec');
% Update the index vector:
ready = true; % Assume that the WHILE loop is ready
for k = 1:n
vec(k) = vec(k) + 1;
if vec(k) <= block_length(k)
ready = false;
break; % v(k) increased successfully, leave "for k" loop
end
vec(k) = -1 * block_length(k); % v(k) reached the limit, reset it
end
end
end
where x and block_length should be both 1-x-n vectors.
The idea is that, instead of using explicitly nested loops, we use a vector of indices.
How good/efficient is this when tackling the suggested use case where block_length can have 650 elements? Not much! Here's a "quick" test using merely 16 dimensions and a [-1, 1] range for the indices:
N = 16; tic; in = 0.1 * ones(1, N); sum = fun(in, ones(size(in))), toc;
which yields an elapsed time of 12.7 seconds on my laptop.

Memory usage by Matlab

I'm running a Matlab code in the HPC of my university. I have two versions of the code. The second version, despite generating a smaller array, seems to require more memory. I would like your help to understand if this is in fact the case and why.
Let me start from some preliminary lines:
clear
rng default
%Some useful components
n=7^4;
vectors{1}=[1,20,20,20,-1,Inf,-Inf];
vectors{2}=[-19,19,19,19,-20,Inf,-Inf];
vectors{3}=[-19,0,0,0,-20,Inf,-Inf];
vectors{4}=[-19,0,0,0,-20,Inf,-Inf];
T_temp = cell(1,4);
[T_temp{:}] = ndgrid(vectors{:});
T_temp = cat(4+1, T_temp{:});
T = reshape(T_temp,[],4); %all the possible 4-tuples from vectors{1}, ..., vectors{4}
This is the first version 1 of the code: I construct the matrix D1 listing all possible pairs of unordered rows from T
indices_pairs=pairIndices(n);
D1=[T(indices_pairs(:,1),:) T(indices_pairs(:,2),:)];
This is the second version of the code: I construct the matrix D2 listing a random draw of m=10^6 unordered pairs of rows from T
m=10^6;
p=n*(n-1)/2;
random_indices_pairs = randperm(p, m).';
[C1, C2] = myind2ind (random_indices_pairs, n);
indices_pairs=[C1 C2];
D2=[T(indices_pairs(:,1),:) T(indices_pairs(:,2),:)];
My question: when generating D2 the HPC goes out of memory. When generating D1 the HPC works fine, despite D1 being a larger array than D2. Why is that the case?
These are complementary functions used above:
function indices = pairIndices(n)
[y, x] = find(tril(logical(ones(n)), -1)); %#ok<LOGL>
indices = [x, y];
end
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end

Matlab function for cumulative power

Is there a function in MATLAB that generates the following matrix for a given scalar r:
1 r r^2 r^3 ... r^n
0 1 r r^2 ... r^(n-1)
0 0 1 r ... r^(n-2)
...
0 0 0 0 ... 1
where each row behaves somewhat like a power analog of the CUMSUM function?
You can compute each term directly using implicit expansion and element-wise power, and then apply triu:
n = 5; % size
r = 2; % base
result = triu(r.^max((1:n)-(1:n).',0));
Or, maybe a little faster because it doesn't compute unwanted powers:
n = 5; % size
r = 2; % base
t = (1:n)-(1:n).';
u = find(t>=0);
t = t(u);
result = zeros(n);
result(u) = r.^t;
Using cumprod and triu:
% parameters
n = 5;
r = 2;
% Create a square matrix filled with 1:
A = ones(n);
% Assign the upper triangular part shifted by one with r
A(triu(A,1)==1)=r;
% cumprod along the second dimension and get only the upper triangular part
A = triu(cumprod(A,2))
Well, cumsum accumulates the sum of a vector but you are asking for a specially design matrix, so the comparison is a bit problematic....
Anyway, it might be that there is a function for this if this is a common special case triangular matrix (my mathematical knowledge is limited here, sorry), but we can also build it quite easily (and efficiently=) ):
N = 10;
r = 2;
% allocate arry
ary = ones(1,N);
% initialize array
ary(2) = r;
for i = 3:N
ary(i) = ary(i-1)*r;
end
% build matrix i.e. copy the array
M = eye(N);
for i = 1:N
M(i,i:end) = ary(1:end-i+1);
end
This assumes that you want to have a matrix of size NxN and r is the value that you want calculate the power of.
FIX: a previous version stated in line 13 M(i,i:end) = ary(i:end);, but the assignment needs to start always at the first position of the ary

Implementing Simplex Method infinite loop

I am trying to implement a simplex algorithm following the rules I was given at my optimization course. The problem is
min c'*x s.t.
Ax = b
x >= 0
All vectors are assumes to be columns, ' denotes the transpose. The algorithm should also return the solution to dual LP. The rules to follow are:
Here, A_J denotes columns from A with indices in J and x_J, x_K denotes elements of vector x with indices in J or K respectively. Vector a_s is column s of matrix A.
Now I do not understand how this algorithm takes care of condition x >= 0, but I decided to give it a try and follow it step by step. I used Matlab for this and got the following code.
X = zeros(n, 1);
Y = zeros(m, 1);
% i. Choose starting basis J and K = {1,2,...,n} \ J
J = [4 5 6] % for our problem
K = setdiff(1:n, J)
% this while is for goto
while 1
% ii. Solve system A_J*\bar{x}_J = b.
xbar = A(:,J) \ b
% iii. Calculate value of criterion function with respect to current x_J.
fval = c(J)' * xbar
% iv. Calculate dual solution y from A_J^T*y = c_J.
y = A(:,J)' \ c(J)
% v. Calculate \bar{c}^T = c_K^T - u^T A_K. If \bar{c}^T >= 0, we have
% found the optimal solution. If not, select the smallest s \in K, such
% that c_s < 0. Variable x_s enters basis.
cbar = c(K)' - c(J)' * inv(A(:,J)) * A(:,K)
cbar = cbar'
tmp = findnegative(cbar)
if tmp == -1 % we have found the optimal solution since cbar >= 0
X(J) = xbar;
Y = y;
FVAL = fval;
return
end
s = findnegative(c, K) %x_s enters basis
% vi. Solve system A_J*\bar{a} = a_s. If \bar{a} <= 0, then the problem is
% unbounded.
abar = A(:,J) \ A(:,s)
if findpositive(abar) == -1 % we failed to find positive number
disp('The problem is unbounded.')
return;
end
% vii. Calculate v = \bar{x}_J / \bar{a} and find the smallest rho \in J,
% such that v_rho > 0. Variable x_rho exits basis.
v = xbar ./ abar
rho = J(findpositive(v))
% viii. Update J and K and goto ii.
J = setdiff(J, rho)
J = union(J, s)
K = setdiff(K, s)
K = union(K, rho)
end
Functions findpositive(x) and findnegative(x, S) return the first index of positive or negative value in x. S is the set of indices, over which we look at. If S is omitted, whole vector is checked. Semicolons are omitted for debugging purposes.
The problem I tested this code on is
c = [-3 -1 -3 zeros(1,3)];
A = [2 1 1; 1 2 3; 2 2 1];
A = [A eye(3)];
b = [2; 5; 6];
The reason for zeros(1,3) and eye(3) is that the problem is inequalities and we need slack variables. I have set starting basis to [4 5 6] because the notes say that starting basis should be set to slack variables.
Now, what happens during execution is that on first run of while, variable with index 1 enters basis (in Matlab, indices go from 1 on) and 4 exits it and that is reasonable. On the second run, 2 enters the basis (since it is the smallest index such that c(idx) < 0 and 1 leaves it. But now on the next iteration, 1 enters basis again and I understand why it enters, because it is the smallest index, such that c(idx) < 0. But here the looping starts. I assume that should not have happened, but following the rules I cannot see how to prevent this.
I guess that there has to be something wrong with my interpretation of the notes but I just cannot see where I am wrong. I also remember that when we solved LP on the paper, we were updating our subjective function on each go, since when a variable entered basis, we removed it from the subjective function and expressed that variable in subj. function with the expression from one of the equalities, but I assume that is different algorithm.
Any remarks or help will be highly appreciated.
The problem has been solved. Turned out that the point 7 in the notes was wrong. Instead, point 7 should be

Vectorizing sums of different diagonals in a matrix

I want to vectorize the following MATLAB code. I think it must be simple but I'm finding it confusing nevertheless.
r = some constant less than m or n
[m,n] = size(C);
S = zeros(m-r,n-r);
for i=1:m-r+1
for j=1:n-r+1
S(i,j) = sum(diag(C(i:i+r-1,j:j+r-1)));
end
end
The code calculates a table of scores, S, for a dynamic programming algorithm, from another score table, C.
The diagonal summing is to generate scores for individual pieces of the data used to generate C, for all possible pieces (of size r).
Thanks in advance for any answers! Sorry if this one should be obvious...
Note
The built-in conv2 turned out to be faster than convnfft, because my eye(r) is quite small ( 5 <= r <= 20 ). convnfft.m states that r should be > 20 for any benefit to manifest.
If I understand correctly, you're trying to calculate the diagonal sum of every subarray of C, where you have removed the last row and column of C (if you should not remove the row/col, you need to loop to m-r+1, and you need to pass the entire array C to the function in my solution below).
You can do this operation via a convolution, like so:
S = conv2(C(1:end-1,1:end-1),eye(r),'valid');
If C and r are large, you may want to have a look at CONVNFFT from the Matlab File Exchange to speed up calculations.
Based on the idea of JS, and as Jonas pointed out in the comments, this can be done in two lines using IM2COL with some array manipulation:
B = im2col(C, [r r], 'sliding');
S = reshape( sum(B(1:r+1:end,:)), size(C)-r+1 );
Basically B contains the elements of all sliding blocks of size r-by-r over the matrix C. Then we take the elements on the diagonal of each of these blocks B(1:r+1:end,:), compute their sum, and reshape the result to the expected size.
Comparing this to the convolution-based solution by Jonas, this does not perform any matrix multiplication, only indexing...
I would think you might need to rearrange C into a 3D matrix before summing it along one of the dimensions. I'll post with an answer shortly.
EDIT
I didn't manage to find a way to vectorise it cleanly, but I did find the function accumarray, which might be of some help. I'll look at it in more detail when I am home.
EDIT#2
Found a simpler solution by using linear indexing, but this could be memory-intensive.
At C(1,1), the indexes we want to sum are 1+[0, m+1, 2*m+2, 3*m+3, 4*m+4, ... ], or (0:r-1)+(0:m:(r-1)*m)
sum_ind = (0:r-1)+(0:m:(r-1)*m);
create S_offset, an (m-r) by (n-r) by r matrix, such that S_offset(:,:,1) = 0, S_offset(:,:,2) = m+1, S_offset(:,:,3) = 2*m+2, and so on.
S_offset = permute(repmat( sum_ind, [m-r, 1, n-r] ), [1, 3, 2]);
create S_base, a matrix of base array addresses from which the offset will be calculated.
S_base = reshape(1:m*n,[m n]);
S_base = repmat(S_base(1:m-r,1:n-r), [1, 1, r]);
Finally, use S_base+S_offset to address the values of C.
S = sum(C(S_base+S_offset), 3);
You can, of course, use bsxfun and other methods to make it more efficient; here I chose to lay it out for clarity. I have yet to benchmark this to see how it compares with the double-loop method though; I need to head home for dinner first!
Is this what you're looking for? This function adds the diagonals and puts them into a vector similar to how the function 'sum' adds up all of the columns in a matrix and puts them into a vector.
function [diagSum] = diagSumCalc(squareMatrix, LLUR0_ULLR1)
%
% Input: squareMatrix: A square matrix.
% LLUR0_ULLR1: LowerLeft to UpperRight addition = 0
% UpperLeft to LowerRight addition = 1
%
% Output: diagSum: A vector of the sum of the diagnols of the matrix.
%
% Example:
%
% >> squareMatrix = [1 2 3;
% 4 5 6;
% 7 8 9];
%
% >> diagSum = diagSumCalc(squareMatrix, 0);
%
% diagSum =
%
% 1 6 15 14 9
%
% >> diagSum = diagSumCalc(squareMatrix, 1);
%
% diagSum =
%
% 7 12 15 8 3
%
% Written by M. Phillips
% Oct. 16th, 2013
% MIT Open Source Copywrite
% Contact mphillips#hmc.edu fmi.
%
if (nargin < 2)
disp('Error on input. Needs two inputs.');
return;
end
if (LLUR0_ULLR1 ~= 0 && LLUR0_ULLR1~= 1)
disp('Error on input. Only accepts 0 or 1 as input for second condition.');
return;
end
[M, N] = size(squareMatrix);
if (M ~= N)
disp('Error on input. Only accepts a square matrix as input.');
return;
end
diagSum = zeros(1, M+N-1);
if LLUR0_ULLR1 == 1
squareMatrix = rot90(squareMatrix, -1);
end
for i = 1:length(diagSum)
if i <= M
countUp = 1;
countDown = i;
while countDown ~= 0
diagSum(i) = squareMatrix(countUp, countDown) + diagSum(i);
countUp = countUp+1;
countDown = countDown-1;
end
end
if i > M
countUp = i-M+1;
countDown = M;
while countUp ~= M+1
diagSum(i) = squareMatrix(countUp, countDown) + diagSum(i);
countUp = countUp+1;
countDown = countDown-1;
end
end
end
Cheers