How to randomly select multiple small and non-overlapping matrices from a large matrix? - matlab

Let's say I have a large N x M matrix A (e.g. 1000 x 1000). Selecting k random elements from A without replacement is relatively straightforward in MATLAB:
A = rand(1000,1000); % Generate random data
k = 5; % Number of elements to be sampled
sizeA = numel(A); % Number of elements in A
idx = randperm(sizeA); % Random permutation
B = A(idx(1:k)); % Random selection of k elements from A
However, I'm looking for a way to extend this so that I can randomly select k non-overlapping n x m sub-matrices (e.g. 5 x 5) from A. What would be the most convenient way to achieve this? I'd very much appreciate any help!

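This probably isn't the most efficient way to do this; I'm sure that with more thought I (or somebody else) could find a better way, but it should help you get started. Note that it reuses k both as the number of sub-matrices and as their side length, so it draws k blocks of size k x k.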
First I take the original idx(1:k) and reshape it into a 1 x 1 x k array with reshape(idx(1:k), 1, 1, k). Then I extend it to k x k x k, padding with zeros, with idx(k, k, 1) = 0. Lastly, I use two for loops to create the correct linear indices: row m of idx(:, :, n) holds column m of the n-th block, so C(:, :, n) in the script below is that block transposed.
for n = 1:k
    for m = 1:k
        % row m of block n: k consecutive linear indices, shifted right by m-1 columns
        idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1;
    end
end
The complete script, built onto the end of yours:
A = rand(1000, 1000);
k = 5;
idx = randperm(numel(A));
B = A(idx(1:k));
idx = reshape(idx(1:k), 1, 1, k);
idx(k, k, 1) = 0; % Extend padding with zeros
for n = 1:k
    for m = 1:k
        % size(A, 1) is the number of rows, i.e. the stride between columns
        idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1;
    end
end
C = A(idx);
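The index trick above starts each block at a random linear index, so it does not stop two blocks from overlapping, and a block that starts near the bottom of a column wraps into the next column (or runs past the end of A). A minimal sketch that avoids both by rejection sampling, assuming n x m blocks (here 5 x 5) and that k blocks fit comfortably in A; the retry cap maxTries and the names topRow/leftCol are arbitrary choices for illustration:

```matlab
% Sample k non-overlapping n-by-m blocks from A by rejection sampling
A = rand(1000, 1000);      % example data, as in the question
k = 5;                     % number of blocks to draw
n = 5; m = 5;              % block size (rows x columns)
[N, M] = size(A);

taken = false(N, M);       % marks cells already covered by a chosen block
blocks = zeros(n, m, k);   % blocks(:, :, b) is the b-th sampled block
topRow = zeros(k, 1);      % top-left row of each block
leftCol = zeros(k, 1);     % top-left column of each block
count = 0;
tries = 0;
maxTries = 1000 * k;       % arbitrary cap so an over-dense request cannot loop forever

while count < k
    tries = tries + 1;
    if tries > maxTries
        error('Could not place %d non-overlapping blocks.', k);
    end
    r = randi(N - n + 1);  % corner chosen so the whole block lies inside A
    c = randi(M - m + 1);
    if any(any(taken(r:r+n-1, c:c+m-1)))
        continue;          % overlaps an earlier block: draw again
    end
    taken(r:r+n-1, c:c+m-1) = true;
    count = count + 1;
    blocks(:, :, count) = A(r:r+n-1, c:c+m-1);
    topRow(count) = r;
    leftCol(count) = c;
end
```

If blocks on a fixed grid are acceptable, an alternative with no retries is to tile A into floor(N/n) x floor(M/m) blocks and pick k of them with randperm.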

Related

Take a random draw of all possible pairs of indices in Matlab

Consider a Matlab matrix B which lists all possible unordered pairs (without repetitions) from [1 2 ... n]. For example, if n=4,
B=[1 2;
1 3;
1 4;
2 3;
2 4;
3 4]
Note that B has size n(n-1)/2 x 2.
I want to take a random draw of m rows from B and store them in a matrix C. Continuing the example above, I could do that as
m=2;
C=B(randi([1 size(B,1)],m,1),:);
However, in my actual case n = 371293, so I cannot create B and then run the code above to obtain C: B would have n(n-1)/2, about 6.9e10, rows, roughly 1.1 TB stored as doubles.
Could you advise on how to create C without first storing B? Comments on a different question suggest two steps:
1. Draw m random integers between 1 and n(n-1)/2:
I=randi([1 n*(n-1)/2],m,1);
2. Use ind2sub to obtain C.
I'm struggling to implement the second step.
Thanks to the comments below, I wrote this
n=4;
m=10;
coord=NaN(m,2);
R= randi([1 n^2],m,1);
for i=1:m
[cr, cc]=ind2sub([n,n],R(i));
if cr>cc
coord(i,1)=cc;
coord(i,2)=cr;
elseif cr<cc
coord(i,1)=cr;
coord(i,2)=cc;
end
end
coord(any(isnan(coord),2),:) = []; %delete NaN rows from coord
I guess there are more efficient ways to implement the same thing.
You can use the function named myind2ind in this post to take random rows of all possible unordered pairs without generating all of them. It maps a linear index ii into B directly to its pair (R, C):
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end
I=randi([1 n*(n-1)/2],m,1);
[C1, C2] = myind2ind(I, n);   % C = [C1 C2] is the m x 2 sample
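For small n you can check the mapping against nchoosek(1:n, 2), which lists the same pairs in the same lexicographic order as B. A minimal sketch that inlines the body of myind2ind (so it runs as a plain script) and maps every index 1..n(n-1)/2; the value n = 7 is arbitrary:

```matlab
% Map every linear pair index to its (first, second) pair with the myind2ind formula
n = 7;                          % arbitrary small size for the check
t = n*(n-1)/2;                  % number of unordered pairs
ii = (1:t)';                    % all linear indices into B
jj = t + 1 - ii;                % count from the end of the list
r = (1 + sqrt(8*jj)) / 2;       % inverts the triangular-number layout
first = n - floor(r);           % first element of each pair (R in myind2ind)
second = (floor(r + 1) .* floor(r)) / 2 - jj + first + 1;   % second element (C)
pairs = [first second];
isequal(pairs, nchoosek(1:n, 2))   % true when the mapping reproduces B
```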
If you look at the odds: for i = 1:n-1, the number of pairs whose first value equals i is n-i, out of n*(n-1)/2 pairs in total. You can use this law to generate the first column of C. Given a first value i, the second value is uniformly distributed over the integers in [i+1, n], so the second column can then be drawn with randi. Here is code that performs these steps:
clc; clear all; close all;
% Parameters
n = 371293; m = 10;
% Generation of C
R = rand(m,1);
C = zeros(m,2);
s = 0;
t = n*(n-1)/2;
for i=1:n-1
if (i<n-1)
ind_i = R>=s/t & R<(s+n-i)/t;
else % To avoid rounding errors for n>>1, we impose (s+n-i)=t at the last iteration (R<(s+n-i)/t=1 always true)
ind_i = R>=s/t;
end
C(ind_i,1) = i;
C(ind_i,2) = randi([i+1,n],sum(ind_i),1);
s = s+n-i;
end
% Display
C
Output:
C =
84333 266452
46609 223000
176395 328914
84865 94391
104444 227034
221905 302546
227497 335959
188486 344305
164789 266497
153603 354932
Good luck!
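The loop above runs n-1 (about 3.7e5) times to handle only m draws. A loop-free sketch of the same cumulative-count idea, working with integer pair indices instead of rand so the block boundaries are exact: s(i) is the number of pairs whose first value is at most i, the first column comes from counting how many s(i) lie below the drawn index, and the offset inside that block gives the second column. It assumes implicit expansion (R2016b or later); the comparison K > s is an m x (n-1) logical array, so it suits small m. Like the loop, it samples with replacement.

```matlab
% Draw m random unordered pairs from 1..n without building B
n = 371293;                    % as in the question
m = 10;                        % number of pairs to draw
t = n*(n-1)/2;                 % total number of pairs (exact in double)
s = cumsum(n - (1:n-1));       % s(i): pairs whose first value is <= i, 1 x (n-1)
sprev = [0, s(1:end-1)].';     % pairs before the block with first value i, column vector
K = randi(t, m, 1);            % uniform pair indices, with replacement
first = sum(K > s, 2) + 1;     % complete blocks before K, plus one
second = first + (K - sprev(first));   % offset inside the block is 1..n-first
C = [first second]
```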

How to fill columns of a matrix with random numbers of specific range?

I have a 100 x 10 matrix. The objective is to fill each column of the matrix with random integers within a specific range. The problem is that the range changes from column to column: for instance, for the first column the range is [1, 100], for the second it's [-10, 1], and so on up to the 10th column.
This is what I've tried:
b = [0,100;-10,1;0,1;-1,1;10,20]
a = []
for i=1 to 10
a[] = [(i:100)' randi(1,100)]
end
How do I generate a matrix of this form?
I don't have MATLAB installed right now, but I would do something like this:
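m = 100;
n = size(b, 1);
range = b(:, 2) - b(:, 1);   % width of each column's range
offset = b(:, 1);            % lower bound of each column's range
% scale rand to the integers 0..range per column (transposed to rows so bsxfun expands across columns), then add the lower bound
A = bsxfun(@plus, floor(bsxfun(@times, rand(m, n), range.' + 1)), offset.');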
With a loop, randi can take each column's [min max] range directly:
m = 100;
n = size(b, 1);
A = zeros(m, n); % preallocate to avoid matrix expansion
for ii = 1:n
    A(:, ii) = randi(b(ii,:), m, 1);   % integers in [b(ii,1), b(ii,2)]
end
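On R2016b or later, implicit expansion makes bsxfun unnecessary. A minimal sketch using the five ranges in b from the question (one output column per row of b), drawing uniform integers with floor(rand * (hi - lo + 1)) so both endpoints are as likely as any other value:

```matlab
% Fill column ii with random integers from b(ii,1) to b(ii,2), inclusive
b = [0,100; -10,1; 0,1; -1,1; 10,20];   % ranges from the question, one row per column
m = 100;                                % number of rows
lo = b(:, 1).';                         % 1 x n lower bounds
hi = b(:, 2).';                         % 1 x n upper bounds
A = lo + floor(rand(m, numel(lo)) .* (hi - lo + 1));   % m x n, column ii in [lo(ii), hi(ii)]
```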

Smarter way to generate a matrix of zeros and ones in Matlab

I would like to generate all the possible adjacency matrices (zero diagonal) of an undirected graph with n nodes.
For example, with no relabeling, for n = 3 we get 2^(3(3-1)/2) = 2^3 = 8 possible network configurations (or adjacency matrices).
One solution that works for n = 3 (and which I think is quite stupid) would be the following:
n = 3;
A = [];
for k = 0:1
for j = 0:1
for i = 0:1
m = [0 , i , j ; i , 0 , k ; j , k , 0 ];
A = [A, m];
end
end
end
I also thought of the following, which seems to be faster, but something is wrong with my indexing, since 2 matrices are missing:
n = 3
C = [];
E = [];
A = zeros(n);
for i = 1:n
for j = i+1:n
A(i,j) = 1;
A(j,i) = 1;
C = [C,A];
end
end
B = ones(n);
B = B- diag(diag(ones(n)));
for i = 1:n
for j = i+1:n
B(i,j) = 0;
B(j,i) = 0;
E = [E,B];
end
end
D = [C,E]
Is there a faster way of doing this?
I would definitely generate the off-diagonal elements of the adjacency matrices with binary encoding:
n = 4; %// number of nodes
m = n*(n-1)/2;
offdiags = dec2bin(0:2^m-1,m)-48; %// all 2^m possible configurations, one per row (48 is the character code of '0')
If you have the Statistics and Machine Learning Toolbox, then squareform will easily create the matrices for you, one by one:
%// this is basically a for loop
tmpcell = arrayfun(@(k) squareform(offdiags(k,:)),1:size(offdiags,1),...
'uniformoutput',false);
A = cat(2,tmpcell{:}); %// concatenate the matrices in tmpcell
Although I'd consider concatenating along dimension 3 instead (cat(3,tmpcell{:})), so that you can inspect each matrix individually.
Alternatively, you can build the array yourself in a vectorized way; it's probably even quicker (at the cost of more memory):
A = zeros(n,n,2^m);
%// lazy person's indexing scheme:
[ind_i,ind_j,ind_k] = meshgrid(1:n,1:n,1:2^m);
A(ind_i>ind_j) = offdiags.'; %'// watch out for the transpose
%// mirror into the lower triangle (ind_i holds column indices, so the line above filled the upper one):
A = A + permute(A,[2 1 3]); %// n x n x 2^m matrix
%// reshape to n*[] matrix if you wish
A = reshape(A,n,[]); %// n x (n*2^m) matrix
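A variant of the same vectorized idea that needs no toolbox and replaces the three meshgrid arrays with a single logical mask: triu(true(n), 1) selects the m entries above the diagonal, and repmat stretches it across the 2^m pages. A sketch for n = 4:

```matlab
% All 2^m adjacency matrices of an undirected graph on n nodes (zero diagonal)
n = 4;                                  % number of nodes
m = n*(n-1)/2;                          % number of possible edges
offdiags = dec2bin(0:2^m-1, m) - '0';   % row p: edge pattern of the p-th graph
mask = triu(true(n), 1);                % strictly upper triangle: m entries
A = zeros(n, n, 2^m);
A(repmat(mask, [1 1 2^m])) = offdiags.';   % fill each page's upper triangle in turn
A = A + permute(A, [2 1 3]);            % mirror into the lower triangle
```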

Vectorizing a nested for loop which fills a dynamic programming table

I was wondering if there is a way to vectorize the nested for loop in this function, which fills the entries of the 2D dynamic programming table DP. I believe that at least the inner loop could be vectorized, as each row only depends on the previous row, but I'm not sure how to do it. Note that this function is called on large 2D arrays (images), so the nested for loop really doesn't cut it.
function [cols] = compute_seam(energy)
[r, c, ~] = size(energy);
cols = zeros(r);
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
[~, j] = min(DP(r, :));
j = j - 1;
for i = r : -1 : 1
cols(i) = j;
j = BP(i, j);
end
end
Vectorization of the innermost nested loop
You were right in postulating that at least the inner loop is vectorizable. Here's the modified code for the nested loops part -
rows_DP = size(DP,1); %// rows in DP
%// Get first row linear indices for a group of neighboring three columns,
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
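An equivalent way to write the per-row update, building the 3-by-c matrix of candidates from three shifted slices of the previous (padded) row instead of precomputed linear indices. It has not been benchmarked here; the sketch pads with Inf by hand so it does not need padarray, and min keeps the first minimum just as the original does, so BP comes out the same:

```matlab
% Vectorized DP fill using shifted slices of the previous row
energy = rand(50, 60);                  % small example input
[r, c] = size(energy);
DP = [Inf(r, 1), energy, Inf(r, 1)];    % same Inf padding as padarray(energy, [0, 1], Inf)
BP = zeros(r, c);
for i = 2 : r
    prev = DP(i - 1, :);                % previous row, 1 x (c + 2)
    % column j holds the three parents of pixel j: up-left, up, up-right
    [x, l] = min([prev(1:c); prev(2:c+1); prev(3:c+2)], [], 1);
    DP(i, 2:c+1) = DP(i, 2:c+1) + x;    % add the cheapest parent cost
    BP(i, :) = (1:c) + l - 2;           % column of the chosen parent, in energy coordinates
end
```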
Benchmarking
Benchmarking Code -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc
Results -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
Thus, you might enjoy a roughly 26x speedup (44.2 s vs. 1.69 s) with that little vectorization tweak.
More tweaks
A few more tweaks you could try for performance:
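cols = zeros(r) could be replaced with cols(r,r) = 0. Note that zeros(r) allocates an r x r matrix although only r entries are used, so cols(r,1) = 0 would be enough.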
DP = padarray(energy, [0, 1], Inf) could be replaced with
DP(1:size(energy,1),1:size(energy,2)+2)=Inf;
DP(:,2:end-1) = energy;
BP = zeros(r, c) could be replaced with BP(r, c) = 0.
The pre-allocation tweaks used here are inspired by this blog post.

Matlab: Argmax and dot product for each row in a matrix

I have two matrices, X in R^(n x m) and W in R^(k x m), where k << n.
Let x_i be the i-th row of X and w_j be the j-th row of W.
I need to find, for each x_i, the j that maximizes <w_j, x_i>.
I can't see a way around iterating over all the rows in X, but is there a way to find the maximum dot product without iterating over all of W every time?
A naive implementation would be:
n = 100;
m = 50;
k = 10;
X = rand(n,m);
W = rand(k,m);
Y = zeros(n, 1);
for i = 1 : n
max_ind = 1;
max_val = dot(W(1,:), X(i,:));
for j = 2 : k
cur_val = dot(W(j,:),X(i,:));
if cur_val > max_val
max_val = cur_val;
max_ind = j;
end
end
Y(i,:) = max_ind;
end
Dot product is essentially matrix multiplication:
[~, Y] = max(W*X');
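W*X' is a k-by-n matrix, so Y comes back as a 1-by-n row. If you want the n-by-1 column that the loop builds, take the maximum along dimension 2 of X*W' instead. A sketch using the question's sizes:

```matlab
% Argmax of <w_j, x_i> for every row of X with one matrix product
n = 100; m = 50; k = 10;
X = rand(n, m);
W = rand(k, m);
scores = X * W.';                 % scores(i, j) = dot(X(i,:), W(j,:)), n x k
[~, Y] = max(scores, [], 2);      % Y(i) = index j of the largest score, n x 1
```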
A bsxfun-based approach to speed things up for you:
[~,Y] = max(sum(bsxfun(@times,X,permute(W,[3 2 1])),2),[],3)
On my system, using your dataset, I am getting a 100x+ speedup with this.
One can think of two more closely related approaches, but they don't seem to give any large improvement over the one above:
[~,Y] = max(squeeze(sum(bsxfun(@times,X,permute(W,[3 2 1])),2)),[],2)
and
[~,Y] = max(squeeze(sum(bsxfun(@times,X',permute(W,[2 3 1]))))')