Vectorizing a nested for loop which fills a dynamic programming table - matlab

I was wondering if there was a way to vectorize the nested for loop in this function which is filling up the entries of the 2D dynamic programming table DP. I believe that at the very least the inner loop could be vectorized as each row only depends on the previous row. I'm not sure how to do it though. Note this function is called on large 2D arrays (images) so the nested for loop really doesn't cut it.
function [cols] = compute_seam(energy)
[r, c, ~] = size(energy);
cols = zeros(r);
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
[~, j] = min(DP(r, :));
j = j - 1;
for i = r : -1 : 1
cols(i) = j;
j = BP(i, j);
end
end

Vectorization of the innermost nested loop
You were right in postulating that at least the inner loop is vectorizable. Here's the modified code for the nested loops part -
rows_DP = size(DP,1); %// rows in DP
%// Get first row linear indices for a group of neighboring three columns,
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
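As an alternative to the linear-index setup above, the same row update can be sketched by stacking three shifted slices of the previous row. This is not the code that is benchmarked below. The small random energy matrix and the names DP_ref, BP_ref and same_result are assumptions added only to check the result against the original nested loop, and the padding is done by concatenation so the sketch does not need padarray.

```matlab
% Sketch: vectorize the inner loop by stacking three shifted views of row i-1.
energy = rand(6, 8);                 % small stand-in for an energy image (assumption)
[r, c] = size(energy);

% One Inf column on each side, same effect as padarray(energy, [0, 1], Inf).
DP = [Inf(r, 1), energy, Inf(r, 1)];
BP = zeros(r, c);
for i = 2:r
    prev = DP(i - 1, :);
    % Rows of the 3-by-c stack: upper-left, upper and upper-right neighbours.
    [x, l] = min([prev(1:c); prev(2:c + 1); prev(3:c + 2)], [], 1);
    DP(i, 2:c + 1) = DP(i, 2:c + 1) + x;   % update the whole row at once
    BP(i, :) = (1:c) + l - 2;              % back-pointers for the whole row
end

% Reference: the original nested loop, kept only for comparison.
DP_ref = [Inf(r, 1), energy, Inf(r, 1)];
BP_ref = zeros(r, c);
for i = 2:r
    for j = 1:c
        [x, l] = min([DP_ref(i - 1, j), DP_ref(i - 1, j + 1), DP_ref(i - 1, j + 2)]);
        DP_ref(i, j + 1) = DP_ref(i, j + 1) + x;
        BP_ref(i, j) = j + (l - 2);
    end
end
same_result = isequal(DP, DP_ref) && isequal(BP, BP_ref);
```

Whether this form is faster or slower than the linear-index version would need to be measured on the actual image sizes.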
Benchmarking
Benchmarking Code -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc
Results -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
Thus, in this test the vectorized inner loop gives roughly a 26x speedup (44.2 s down to 1.69 s).
More tweaks
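A few more optimization tweaks could be tried in your code -
cols = zeros(r) could be replaced with cols(r,r) = 0. Note that zeros(r) allocates an r-by-r matrix; since only one column index per row is stored, zeros(r,1) would likely be enough.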
DP = padarray(energy, [0, 1], Inf) could be replaced with
DP(1:size(energy,1),1:size(energy,2)+2)=Inf;
DP(:,2:end-1) = energy;
BP = zeros(r, c) could be replaced with BP(r, c) = 0.
The pre-allocation tweaks used here are inspired by this blog post.

Related

MATLAB find the average time using tic toc

Construct an experiment to study the performance of Cramer's rule (with two implementations
of the determinant) in relation to Gauss's algorithm.
In each iteration, 10 random arrays A (NxN) and vectors b (Nx1) will be created.
The 10 linear systems will be solved with Cramer's rule ("cramer.m"), once using
rec_det(A) and once using det(A), and with the Gaussian algorithm
("GaussianElimination.m"); the time for each technique will be the average of the 10 values.
Repeat the above for N = 2 to 10 and make a graph of the average time
in relation to the dimension N.
This is my task. I don't know whether the way I calculate the average time is correct, and the graph is not displayed.
T1=0;
T2=0;
T3=0;
for N=2:10
for i=1:10
A=rand(N,N);
b=rand(N,1);
t1=[1,i];
t2=[1,i];
t3=[1,i];
tic;
crammer(A,b);
t1(i)=toc;
tic
crammer_rec(A,b);
t2(i)=toc;
tic
gaussianElimination(A,b);
t3(i)=toc;
T1=T1+t1(i);
T2=T2+t2(i);
T3=T3+t3(i);
end
avT1=T1/10;
avT2=T2/10;
avT3=T3/10;
end
plot(2:10 , avT1 , 2:10 , avT2 , 2:10 , avT3);
function x = cramer(A, b)
n = length(b);
d = det(A);
% d = rec_det(A);
x = zeros(n, 1);
for j = 1:n
x(j) = det([A(:,1:j-1) b A(:,j+1:end)]) / d;
% x(j) = rec_det([A(:,1:j-1) b A(:,j+1:end)]) / d;
end
end
function x = cramer(A, b)
n = length(b);
d = rec_det(A);
x = zeros(n, 1);
for j = 1:n
x(j) = rec_det([A(:,1:j-1) b A(:,j+1:end)]) / d;
end
end
function deta = rec_det(R)
if size(R,1)~=size(R,2)
error('Error.Matrix must be square.')
else
n = size(R,1);
if ( n == 2 )
deta=(R(1,1)*R(2,2))-(R(1,2)*R(2,1));
else
for i=1:n
deta_temp=R;
deta_temp(1,:)=[ ];
deta_temp(:,i)=[ ];
if i==1
deta=(R(1,i)*((-1)^(i+1))*rec_det(deta_temp));
else
deta=deta+(R(1,i)*((-1)^(i+1))*rec_det(deta_temp));
end
end
end
end
end
function x = gaussianElimination(A, b)
[m, n] = size(A);
if m ~= n
error('Matrix A must be square!');
end
n1 = length(b);
if n1 ~= n
error('Vector b should be equal to the number of rows and columns of A!');
end
Aug = [A b]; % build the augmented matrix
C = zeros(1, n + 1);
% elimination phase
for k = 1:n - 1
% ensure that the pivoting point is the largest in its column
[pivot, j] = max(abs(Aug(k:n, k)));
C = Aug(k, :);
Aug(k, :) = Aug(j + k - 1, :);
Aug(j + k - 1, :) = C;
if Aug(k, k) == 0
error('Matrix A is singular');
end
for i = k + 1:n
r = Aug(i, k) / Aug(k, k);
Aug(i, k:n + 1) = Aug(i, k:n + 1) - r * Aug(k, k: n + 1);
end
end
% back substitution phase
x = zeros(n, 1);
x(n) = Aug(n, n + 1) / Aug(n, n);
for k = n - 1:-1:1
x(k) = (Aug(k, n + 1) - Aug(k, k + 1:n) * x(k + 1:n)) / Aug(k, k);
end
end
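In your code, avT1, avT2 and avT3 are scalars that are overwritten on every pass of the outer loop, and T1, T2 and T3 are never reset, so there is no per-N average left to plot. I think the easiest way to do this is by creating a 9-by-3 matrix to hold the total times (one row per N, one column per method), and then dividing by the number of trials at the end.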
allTimes = zeros(9, 3);
for N=2:10
for ii=1:10
A=rand(N,N);
b=rand(N,1);
tic;
crammer(A,b);
temp = toc;
allTimes(N-1,1) = allTimes(N-1,1) + temp;
tic
crammer_rec(A,b);
temp = toc;
allTimes(N-1,2) = allTimes(N-1,2) + temp;
tic
gaussianElimination(A,b);
temp = toc;
allTimes(N-1,3) = allTimes(N-1,3) + temp;
end
end
allTimes = allTimes/10;
figure; plot(2:10, allTimes);
You can use this approach because the numbers are quite straightforward and simple. If you had a more complicated setup, the way to store the times/calculate the averages would have to be tweaked.
If you had more functions you could also use function handles and create a third inner loop, but this is a little more advanced.
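The function-handle idea can be sketched as follows. The three anonymous solvers are stand-ins (assumptions) so that the snippet runs on its own; in the actual task they would be replaced by handles to the Cramer and Gaussian elimination functions. The names solvers, sizes, nTrials and avgTimes are also assumptions.

```matlab
% Sketch: time several solvers through a cell array of function handles.
% The handles below are placeholders; swap in your own functions.
solvers = {@(A, b) A \ b, @(A, b) inv(A) * b, @(A, b) pinv(A) * b};
sizes   = 2:10;      % matrix dimensions N
nTrials = 10;        % repetitions per dimension
avgTimes = zeros(numel(sizes), numel(solvers));

for n = 1:numel(sizes)
    N = sizes(n);
    for trial = 1:nTrials
        A = rand(N, N);
        b = rand(N, 1);
        for s = 1:numel(solvers)         % third inner loop over the solvers
            tic;
            solvers{s}(A, b);            % result is discarded, only timed
            avgTimes(n, s) = avgTimes(n, s) + toc;
        end
    end
end
avgTimes = avgTimes / nTrials;           % totals -> averages
```

Plotting then works as before with figure; plot(sizes, avgTimes);, which draws one line per column. Adding a fourth method only means adding one more handle to the cell array.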

Filling MATLAB array using formula and arrays of values

I want to fill a 10x15 matrix in MATLAB using the formula z(i, j) = 2 * x(i) + 3 * y(j)^2, so that each entry at (i, j) = z(i, j). I have arrays for x and y, which are of size 10 and 15, respectively.
I've accomplished the task using the code below, but I want to do it in one line, since I'm told it's possible. Any thoughts?
x = linspace(0,1,10);
y = linspace(-0.5,0.5,15);
z = zeros(10,15);
m_1 = 2;
m_2 = 3;
for i = 1:length(x)
for j = 1:length(y)
z(i, j) = m_1*x(i) + m_2*y(i)^2;
end
end
It looks like you have a bug in your original loop:
You are using i index twice: m_1*x(i) + m_2*y(i)^2.
The result is that all the columns of z matrix are the same.
For applying the formula z(i, j) = 2*x(i) + 3*y(j)^2 use the following loop:
x = linspace(0,1,10);
y = linspace(-0.5,0.5,15);
z = zeros(10,15);
m_1 = 2;
m_2 = 3;
for i = 1:length(x)
for j = 1:length(y)
z(i, j) = m_1*x(i) + m_2*y(j)^2;
end
end
For implementing the above loop using one line, we may use meshgrid first.
Replace the loop with:
[Y, X] = meshgrid(y, x);
Z = m_1*X + m_2*Y.^2;
For an explanation, read the documentation of meshgrid; it is much better than any explanation I could write...
The following command gives the same output as your original loop (but it's probably irrelevant):
Z = repmat((m_1*x + m_2*y(1:length(x)).^2)', [1, length(y)]);
Testing:
max(max(abs(Z - z)))
ans =
0
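If a literal one-liner is wanted, a sketch using implicit expansion (available from R2016b) avoids meshgrid altogether: a 10-by-1 column added to a 1-by-15 row expands to 10-by-15. The names Z_bsx and max_err are assumptions added for the comparison; bsxfun is shown as the equivalent for older releases.

```matlab
x = linspace(0, 1, 10);
y = linspace(-0.5, 0.5, 15);
m_1 = 2;
m_2 = 3;

% One line: column (10-by-1) plus row (1-by-15) expands to 10-by-15.
Z = m_1 * x.' + m_2 * y.^2;

% Equivalent form for releases before R2016b.
Z_bsx = bsxfun(@plus, m_1 * x.', m_2 * y.^2);

% Check against the corrected double loop.
z = zeros(10, 15);
for i = 1:length(x)
    for j = 1:length(y)
        z(i, j) = m_1 * x(i) + m_2 * y(j)^2;
    end
end
max_err = max(max(abs(Z - z)));   % largest deviation from the loop result
```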

Plotting numbers in a Cell array

I want to plot the data, which is all real numbers, stored in a cell array. My cell array is 1-by-100, but I am confused about how to apply the plot() function together with hold on.
Here is my code:
% Initialize arrays for storing data
C = cell(1,100); % Store output vector from floww()
D = cell(1,6); % User inputted initial point
I1 = cell(1,100);
I2 = cell(1,100);
I3 = cell(1,100);
%Declare alpha and beta variables detailed in Theorem 1 of paper
a1 = 0; a2 = 2; a3 = 4; a4 = 6;
b1 = 2; b2 = 3; b3 = 7; b4 = 10;
% Declare the \lambda_i, i=1,..., 6, variables
L = cell(1,6);
L1 = abs((b2 - b3)/(a2 - a3));
L2 = abs((b1 - b3)/(a1 - a3));
L3 = abs((b1 - b2)/(a1 - a2));
L4 = abs((b1 - b4)/(a1 - a4));
L5 = abs((b2 - b4)/(a2 - a4));
L6 = abs((b3 - b4)/(a3 - a4));
L{1,1} = L1;
L{1,2} = L2;
L{1,3} = L3;
L{1,4} = L4;
L{1,5} = L5;
L{1,6} = L6;
% Create function handle for floww()
F = @floww;
for j = 1:6
D{1,j} = input('Input in1 through in6: ');
end
% Iterate through floww()
k = [0:5:100];
for i = 1: 100
C{1,i} = F(D{1,1}, D{1,2}, D{1,3}, D{1,4}, D{1,5}, D{1,6},L); % Output from floww() is a 6-by-1 vector
for j = 1:6
D{1,j} = C{1,i}(j,1); % Reassign input values to put back into floww()
end
% First integrals as described in the paper
I1{1,i} = 2*(C{1,i}(1,1)).^2 + 2*(C{1,i}(2,1)).^2 + 2*(C{1,i}(3,1)).^2 + 2*(C{1,i}(4,1)).^2 + 2*(C{1,i}(5,1)).^2 + 2*(C{1,i}(6,1)).^2;
I2{1,i} = (-C{1,i}(3,1))*(-C{1,i}(6,1)) - (C{1,i}(2,1))*(-C{1,i}(5,1)) + (-C{1,i}(1,1))*(-C{1,i}(4,1));
I3{1,i} = 2*L1*(C{1,i}(1,1)).^2 + 2*L2*(C{1,i}(2,1)).^2 + 2*L3*(C{1,i}(3,1)).^2 + 2*L4*(C{1,i}(4,1)).^2 + 2*L5*(C{1,i}(5,1)).^2 + 2*L6*(C{1,i}(6,1)).^2;
plot(k, I1{1,i});
hold on;
end
% This function will solve the linear system
% Bx^(n+1) = x detailed in the research notes
function [out1] = floww(in1, in2, in3, in4, in5, in6, L)
% A_ij = (lambda_i - lambda_j)
% Declare relevant A_ij values
A32 = L{1,3} - L{1,2};
A65 = L{1,6} - L{1,5};
A13 = L{1,1} - L{1,3};
A46 = L{1,4} - L{1,6};
A21 = L{1,2} - L{1,1};
A54 = L{1,5} - L{1,4};
A35 = L{1,3} - L{1,5};
A62 = L{1,6} - L{1,2};
A43 = L{1,4} - L{1,3};
A16 = L{1,1} - L{1,6};
A24 = L{1,2} - L{1,4};
A51 = L{1,5} - L{1,1};
% Declare del(T)
delT = 1;
% Declare the 6-by-6 coefficient matrix B
B = [1, -A32*(delT/2)*in3, -A32*(delT/2)*in2, 0, -A65*(delT/2)*in6, -A65*(delT/2)*in5;
-A13*(delT/2)*in3, 1, -A13*(delT/2)*in1, -A46*(delT/2)*in6, 0, A46*(delT/2)*in4;
-A21*(delT/2)*in2, -A21*(delT/2)*in1, 1, -A54*(delT/2)*in5, -A54*(delT/2)*in4, 0;
0, -A62*(delT/2)*in6, -A35*(delT/2)*in5, 1, -A35*(delT/2)*in3, -A62*(delT/2)*in2;
-A16*(delT/2)*in6, 0, -A43*(delT/2)*in4, -A43*(delT/2)*in3, 1, -A16*(delT/2)*in1;
-A51*(delT/2)*in5, -A24*(delT/2)*in4, 0, -A24*(delT/2)*in2, -A51*(delT/2)*in1, 1];
% Declare input vector
N = [in1; in2; in3; in4; in5; in6];
% Solve the system Bx = N for x where x
% denotes the X_i^(n+1) vector in research notes
x = B\N;
% Assign output variables
out1 = x;
%disp(x);
%disp(out1(2,1));
end
The plotting takes place in the for-loop with plot(k, I1{1,i});. The resulting figure is not what I expected or wanted.
Can someone please explain to me what I am doing wrong and/or how to get what I want?
You need to stop using cell arrays for numeric data, and indexed variable names when an array would be way simpler.
I've edited your code, below, to plot the I1 array.
To make it work, I changed almost all cell arrays to numeric arrays and simplified a bunch of the indexing. Note initialisation is now with zeros instead of cell, therefore indexing with parentheses () not curly braces {}.
I didn't change the structure too much, because your comments indicated you were following some literature with this layout.
For the plotting, you were plotting a single point on each loop iteration. A single point has no line to draw, so nothing shows unless you specify a marker, e.g. plot(x,y,'o'). Instead, I plot once after the loop, since you're storing the resulting I1 array anyway.
% Initialize arrays for storing data
C = cell(1,100); % Store output vector from floww()
D = zeros(1,6); % User inputted initial point
I1 = zeros(1,100);
I2 = zeros(1,100);
I3 = zeros(1,100);
%Declare alpha and beta variables detailed in Theorem 1 of paper
a1 = 0; a2 = 2; a3 = 4; a4 = 6;
b1 = 2; b2 = 3; b3 = 7; b4 = 10;
% Declare the \lambda_i, i=1,..., 6, variables
L = zeros(1,6);
L(1) = abs((b2 - b3)/(a2 - a3));
L(2) = abs((b1 - b3)/(a1 - a3));
L(3) = abs((b1 - b2)/(a1 - a2));
L(4) = abs((b1 - b4)/(a1 - a4));
L(5) = abs((b2 - b4)/(a2 - a4));
L(6) = abs((b3 - b4)/(a3 - a4));
for j = 1:6
D(j) = input('Input in1 through in6: ');
end
% Iterate through floww()
for i = 1:100
C{i} = floww(D(1), D(2), D(3), D(4), D(5), D(6), L); % Output from floww() is a 6-by-1 vector
for j = 1:6
D(j) = C{i}(j,1); % Reassign input values to put back into floww()
end
% First integrals as described in the paper
I1(i) = 2*(C{i}(1,1)).^2 + 2*(C{i}(2,1)).^2 + 2*(C{i}(3,1)).^2 + 2*(C{i}(4,1)).^2 + 2*(C{i}(5,1)).^2 + 2*(C{i}(6,1)).^2;
I2(i) = (-C{i}(3,1))*(-C{i}(6,1)) - (C{i}(2,1))*(-C{i}(5,1)) + (-C{i}(1,1))*(-C{i}(4,1));
I3(i) = 2*L(1)*(C{i}(1,1)).^2 + 2*L(2)*(C{i}(2,1)).^2 + 2*L(3)*(C{i}(3,1)).^2 + 2*L(4)*(C{i}(4,1)).^2 + 2*L(5)*(C{i}(5,1)).^2 + 2*L(6)*(C{i}(6,1)).^2;
end
plot(1:100, I1);
% This function will solve the linear system
% Bx^(n+1) = x detailed in the research notes
function [out1] = floww(in1, in2, in3, in4, in5, in6, L)
% A_ij = (lambda_i - lambda_j)
% Declare relevant A_ij values
A32 = L(3) - L(2);
A65 = L(6) - L(5);
A13 = L(1) - L(3);
A46 = L(4) - L(6);
A21 = L(2) - L(1);
A54 = L(5) - L(4);
A35 = L(3) - L(5);
A62 = L(6) - L(2);
A43 = L(4) - L(3);
A16 = L(1) - L(6);
A24 = L(2) - L(4);
A51 = L(5) - L(1);
% Declare del(T)
delT = 1;
% Declare the 6-by-6 coefficient matrix B
B = [1, -A32*(delT/2)*in3, -A32*(delT/2)*in2, 0, -A65*(delT/2)*in6, -A65*(delT/2)*in5;
-A13*(delT/2)*in3, 1, -A13*(delT/2)*in1, -A46*(delT/2)*in6, 0, A46*(delT/2)*in4;
-A21*(delT/2)*in2, -A21*(delT/2)*in1, 1, -A54*(delT/2)*in5, -A54*(delT/2)*in4, 0;
0, -A62*(delT/2)*in6, -A35*(delT/2)*in5, 1, -A35*(delT/2)*in3, -A62*(delT/2)*in2;
-A16*(delT/2)*in6, 0, -A43*(delT/2)*in4, -A43*(delT/2)*in3, 1, -A16*(delT/2)*in1;
-A51*(delT/2)*in5, -A24*(delT/2)*in4, 0, -A24*(delT/2)*in2, -A51*(delT/2)*in1, 1];
% Declare input vector
N = [in1; in2; in3; in4; in5; in6];
% Solve the system Bx = N for x where x
% denotes the X_i^(n+1) vector in research notes
x = B\N;
% Assign output variables
out1 = x;
end
Output with in1..6 = 1 .. 6: [figure: plot of I1 against iterations 1 to 100; image not available]
Note: you could simplify this code a lot if you embraced arrays over clunky variable names. The below achieves the exact same result for the body of your script, but is much more flexible and maintainable:
See how much simpler your integral expressions become!
% Initialize arrays for storing data
C = cell(1,100); % Store output vector from floww()
D = zeros(1,6); % User inputted initial point
I1 = zeros(1,100);
I2 = zeros(1,100);
I3 = zeros(1,100);
%Declare alpha and beta variables detailed in Theorem 1 of paper
a = [0, 2, 4, 6];
b = [2, 3, 7, 10];
% Declare the \lambda_i, i=1,..., 6, variables
L = abs( ( b([2 1 1 1 2 3]) - b([3 3 2 4 4 4]) ) ./ ...
( a([2 1 1 1 2 3]) - a([3 3 2 4 4 4]) ) );
for j = 1:6
D(j) = input('Input in1 through in6: ');
end
% Iterate through floww()
k = 1:100;
for i = k
C{i} = floww(D(1), D(2), D(3), D(4), D(5), D(6), L); % Output from floww() is a 6-by-1 vector
D = C{i}; % Reassign input values to put back into floww()
% First integrals as described in the paper
I1(i) = 2*sum(D.^2);
I2(i) = sum( D(1:3).*D(4:6) );
I3(i) = 2*sum((L.').*D.^2); % same as 2*L(1)*D(1)^2 + ... + 2*L(6)*D(6)^2
end
plot( k, I1 );
Edit:
You can simplify the floww function by using a couple of things
A can be declared really easily as a single matrix.
Notice delT/2 is a factor in almost every element, factor it out!
The only non-zero elements where delT/2 isn't a factor are the diagonal of ones... add this in using eye instead.
Input the in1..6 variables as a vector. You already have the vector when you call floww - makes no sense breaking it up.
With the input as a vector, we can use utility functions like hankel to do some neat indexing. This one is a stretch for a beginner, but I include it as a demo.
Code:
% In code body, call floww with an array input
C{i} = floww(D, L);
% ...
function [out1] = floww(D, L)
% A_ij = (lambda_i - lambda_j)
% Declare A_ij values in a matrix
A = L.' - L;
% Declare del(T)
delT = 1;
% Declare the 6-by-6 coefficient matrix B
% Factored out (delT/2) and the D coefficients
B = eye(6,6) - (delT/2) * D( hankel( [4 3 2 1 6 5], [5 4 3 2 1 6] ) ) .*...
[ 0, A(3,2), A(3,2), 0, A(6,5), A(6,5);
A(1,3), 0, A(1,3), A(4,6), 0, -A(4,6);
A(2,1), A(2,1), 0, A(5,4), A(5,4), 0;
0, A(6,2), A(3,5), 0, A(3,5), A(6,2);
A(1,6), 0, A(4,3), A(4,3), 0, A(1,6);
A(5,1), A(2,4), 0, A(2,4), A(5,1), 0];
% Solve the system Bx = N for x where x
% denotes the X_i^(n+1) vector in research notes
out1 = B\D(:);
end
When things are simplified like this, the code is easier to read. For instance, it looks to me (without knowing the literature at all) like you've got a sign error in your B(2,6) element: it has the opposite sign to all the other off-diagonal elements...
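To see what the two building blocks of the compact floww produce, here is a small sketch that can be run on its own. L is computed from the a and b values given earlier; the test vector D (10, 20, ..., 60) and the names H and DH are assumptions chosen only to make the index pattern easy to read.

```matlab
% lambda values from the a and b arrays used in the script
a = [0, 2, 4, 6];
b = [2, 3, 7, 10];
L = abs((b([2 1 1 1 2 3]) - b([3 3 2 4 4 4])) ./ ...
        (a([2 1 1 1 2 3]) - a([3 3 2 4 4 4])));

% A(i,j) = L(i) - L(j); column minus row relies on implicit expansion (R2016b+)
A = L.' - L;

% Index pattern: first column [4 3 2 1 6 5], last row [5 4 3 2 1 6]
H = hankel([4 3 2 1 6 5], [5 4 3 2 1 6]);

% Indexing a vector with a matrix returns a matrix of the same shape as H
D  = 10 * (1:6).';     % illustrative input vector (assumption)
DH = D(H);             % DH(i,j) = D(H(i,j))
```

For example, DH(1,2) picks D(3), which matches the in3 factor in B(1,2) of the original matrix. Entries of DH that fall on a zero of the coefficient pattern are multiplied away, so their values do not matter.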

Optimizing DP in matlab

I have the following DP which I am applying on a binarized image (either 0 or 1) in Matlab
[x, y] = size(img);
dp = zeros(x, y);
dp(1,:) = img(1,:);
dp(:,1) = img(:,1);
for i = 2:x
for j = 2:y
if img(i, j) == 0
dp(i, j) = min([dp(i, j - 1), dp(i - 1, j), dp(i - 1, j - 1)]) + 1;
end
end
end
For large x and y the code takes a long time, perhaps because of the if condition and the use of for loops instead of vectorized code.
Can anyone optimize it?
Or is there an approach that speeds up the above code by exploiting the fact that the matrix img contains only 0s and 1s (fewer 1s than 0s)?
Also, is it possible to use parallel for loops to speed it up?
As far as I am aware, you cannot really speed up this computation in general. But if you know that there are only very few entries where img(i,j)==0, the following approach might save you a little bit of time:
[x, y] = size(img);
dp = zeros(x, y);
dp(1,:) = img(1,:);
dp(:,1) = img(:,1);
[i, j] = find(img(2:end, 2:end) == 0); % Extract only these pixels where we actually need to do something
i = i + 1; %correct for removing the first row and column
j = j + 1;
for k = 1:numel(i);
dp(i(k), j(k)) = min([dp(i(k), j(k) - 1), dp(i(k) - 1, j(k)), dp(i(k) - 1, j(k) - 1)]) + 1;
end
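Another option, not part of the answer above, is to sweep the table along anti-diagonals: every cell with i + j = d depends only on cells from diagonals d-1 and d-2, so one whole diagonal can be updated in a single vectorized step. The following is a minimal sketch under that assumption. The random test image and the names d, idx, vals, mask and dp_ref are assumptions, and whether it actually beats the plain loop depends on the image size and MATLAB version.

```matlab
img = double(rand(40, 60) > 0.8);     % example binary image (assumption)
[x, y] = size(img);

dp = zeros(x, y);
dp(1, :) = img(1, :);
dp(:, 1) = img(:, 1);
for d = 4:(x + y)                      % d = i + j, starting at cell (2,2)
    i = max(2, d - y):min(x, d - 2);   % rows of the interior cells on this diagonal
    if isempty(i), continue; end
    j = d - i;                         % matching columns
    idx = i + (j - 1) * x;             % linear indices of the cells (i, j)
    % Neighbours: left is idx - x, up is idx - 1, upper-left is idx - x - 1.
    vals = min([dp(idx - x); dp(idx - 1); dp(idx - x - 1)], [], 1) + 1;
    mask = img(idx) == 0;              % only cells with img == 0 are updated
    dp(idx(mask)) = vals(mask);
end

% Reference: the original double loop, kept only for comparison.
dp_ref = zeros(x, y);
dp_ref(1, :) = img(1, :);
dp_ref(:, 1) = img(:, 1);
for ii = 2:x
    for jj = 2:y
        if img(ii, jj) == 0
            dp_ref(ii, jj) = min([dp_ref(ii, jj - 1), dp_ref(ii - 1, jj), dp_ref(ii - 1, jj - 1)]) + 1;
        end
    end
end
```

The outer loop runs x + y - 3 times instead of (x - 1)*(y - 1) times, at the cost of building index vectors for each diagonal.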

How to randomly select multiple small and non-overlapping matrices from a large matrix?

Let's say I've a large N x M -sized matrix A (e.g. 1000 x 1000). Selecting k random elements without replacement from A is relatively straightforward in MATLAB:
A = rand(1000,1000); % Generate random data
k = 5; % Number of elements to be sampled
sizeA = numel(A); % Number of elements in A
idx = randperm(sizeA); % Random permutation
B = A(idx(1:k)); % Random selection of k elements from A
However, I'm looking for a way to expand the above concept so that I could randomly select k non-overlapping n x m -sized sub-matrices (e.g. 5 x 5) from A. What would be the most convenient way to achieve this? I'd very much appreciate any help!
This probably isn't the most efficient way to do this. I'm sure if I (or somebody else) gave it more thought there would be a better way but it should help you get started.
First I take the original idx(1:k) and reshape it into a 3D array with reshape(idx(1:k), 1, 1, k). Then I extend it to the required size, padding with zeros, using idx(k, k, 1) = 0. Lastly I use two for loops to create the correct indices:
for n = 1:k
for m = 1:k
idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1; % size(A, 1) keeps the colon operands scalar
end
end
The complete script built onto the end of yours
A = rand(1000, 1000);
k = 5;
idx = randperm(numel(A));
B = A(idx(1:k));
idx = reshape(idx(1:k), 1, 1, k);
idx(k, k, 1) = 0; % Extend padding with zeros
for n = 1:k
for m = 1:k
idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1; % size(A, 1) keeps the colon operands scalar
end
end
C = A(idx);
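Note that the script above does not check whether two blocks overlap or whether a block runs past the edge of A. A different approach, sketched below, is simple rejection sampling: draw a random top-left corner, and accept the block only if none of its cells is already taken. The variable names (taken, blocks, corners, maxAttempts) and the attempt limit are assumptions, and this is only one of several possible ways to do it.

```matlab
A = rand(1000, 1000);
k = 5;                 % number of sub-matrices
n = 5; m = 5;          % sub-matrix size (rows, columns)

[N, M] = size(A);
taken   = false(N, M);         % cells already claimed by an accepted block
blocks  = zeros(n, m, k);      % blocks(:, :, p) is the p-th sub-matrix
corners = zeros(k, 2);         % top-left [row, column] of each block
count = 0;
attempts = 0;
maxAttempts = 10000;           % guard against requests that cannot be met (assumption)
while count < k && attempts < maxAttempts
    attempts = attempts + 1;
    r0 = randi(N - n + 1);     % corner chosen so the block stays inside A
    c0 = randi(M - m + 1);
    rows = r0:r0 + n - 1;
    cols = c0:c0 + m - 1;
    if ~any(any(taken(rows, cols)))    % accept only if no overlap
        count = count + 1;
        taken(rows, cols) = true;
        blocks(:, :, count) = A(rows, cols);
        corners(count, :) = [r0, c0];
    end
end
```

With a few small blocks in a large matrix almost every draw is accepted. If the blocks cover a large share of A, rejection becomes slow and a grid-based scheme may be a better fit.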