Optimizing DP in MATLAB

I have the following DP (dynamic programming) recurrence, which I apply to a binarized image (values 0 or 1) in MATLAB:
[x, y] = size(img);
dp = zeros(x, y);
dp(1,:) = img(1,:);
dp(:,1) = img(:,1);
for i = 2:x
    for j = 2:y
        if img(i, j) == 0
            dp(i, j) = min([dp(i, j - 1), dp(i - 1, j), dp(i - 1, j - 1)]) + 1;
        end
    end
end
For large x and y the code takes a long time, probably because of the if condition and because it uses for loops instead of vectorized code.
Can anyone optimize it?
Or is there an approach that speeds it up by exploiting the fact that the matrix img contains only 0s and 1s (with fewer 1s than 0s)?
Also, is it possible to use parallel for loops to speed it up?

As far as I am aware, you cannot really speed up this computation in general. But if you know that there are only very few entries where img(i,j)==0, the following approach might save you a little time:
[x, y] = size(img);
dp = zeros(x, y);
dp(1,:) = img(1,:);
dp(:,1) = img(:,1);
[i, j] = find(img(2:end, 2:end) == 0); % Extract only these pixels where we actually need to do something
i = i + 1; %correct for removing the first row and column
j = j + 1;
% find returns the pixels in column-major order, so every neighbour read below is already final
for k = 1:numel(i)
    dp(i(k), j(k)) = min([dp(i(k), j(k) - 1), dp(i(k) - 1, j(k)), dp(i(k) - 1, j(k) - 1)]) + 1;
end
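
A different technique that is sometimes used for recurrences of this shape is a wavefront sweep: every entry on one anti-diagonal (i + j constant) depends only on earlier anti-diagonals, so a whole anti-diagonal can be updated in one vectorized step. The following is only a sketch under that assumption. The test image, the variable names `ref`, `s`, `keep`, `left`, `up` and `upleft` are made up for illustration, and whether it is actually faster than the loop has to be measured on real data.

```matlab
rng(0);
img = double(rand(6, 8) > 0.7);   % hypothetical small binary test image
[x, y] = size(img);

% Reference: the original double loop.
ref = zeros(x, y);
ref(1, :) = img(1, :);
ref(:, 1) = img(:, 1);
for i = 2:x
    for j = 2:y
        if img(i, j) == 0
            ref(i, j) = min([ref(i, j-1), ref(i-1, j), ref(i-1, j-1)]) + 1;
        end
    end
end

% Wavefront: process one anti-diagonal (i + j == s) at a time.
dp = zeros(x, y);
dp(1, :) = img(1, :);
dp(:, 1) = img(:, 1);
for s = 4:(x + y)
    i = (max(2, s - y):min(x, s - 2)).';   % rows on this anti-diagonal
    j = s - i;                             % matching columns, all >= 2
    keep = img(sub2ind([x, y], i, j)) == 0;
    i = i(keep);
    j = j(keep);
    if isempty(i)
        continue;                          % nothing to update on this diagonal
    end
    left   = dp(sub2ind([x, y], i, j - 1));
    up     = dp(sub2ind([x, y], i - 1, j));
    upleft = dp(sub2ind([x, y], i - 1, j - 1));
    dp(sub2ind([x, y], i, j)) = min(min(left, up), upleft) + 1;
end
same = isequal(dp, ref);   % true if both versions agree
```

The entries within one anti-diagonal are independent of each other, which is also the only place where parallel execution could be applied to this recurrence; rows and columns themselves cannot be processed independently.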


Filling MATLAB array using formula and arrays of values

I want to fill a 10x15 matrix in MATLAB using the formula z(i, j) = 2 * x(i) + 3 * y(j)^2, so that each entry at (i, j) = z(i, j). I have arrays for x and y, which are of size 10 and 15, respectively.
I've accomplished the task using the code below, but I want to do it in one line, since I'm told it's possible. Any thoughts?
x = linspace(0,1,10);
y = linspace(-0.5,0.5,15);
z = zeros(10,15);
m_1 = 2;
m_2 = 3;
for i = 1:length(x)
    for j = 1:length(y)
        z(i, j) = m_1*x(i) + m_2*y(i)^2;
    end
end
It looks like you have a bug in your original loop:
You are using the index i twice: m_1*x(i) + m_2*y(i)^2.
As a result, all the columns of the matrix z are the same.
To apply the formula z(i, j) = 2*x(i) + 3*y(j)^2, use the following loop:
x = linspace(0,1,10);
y = linspace(-0.5,0.5,15);
z = zeros(10,15);
m_1 = 2;
m_2 = 3;
for i = 1:length(x)
    for j = 1:length(y)
        z(i, j) = m_1*x(i) + m_2*y(j)^2;
    end
end
For implementing the above loop using one line, we may use meshgrid first.
Replace the loop with:
[Y, X] = meshgrid(y, x);
Z = m_1*X + m_2*Y.^2;
For explanations, read the documentation of meshgrid; it is much better than any explanation I can write.
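
As an alternative sketch that really is a single line: from R2016b on, MATLAB expands a column vector and a row vector automatically when they are combined element-wise, so the grid does not have to be built explicitly. On older releases `bsxfun` does the same job. The names `z_loop` and `z_bsx` below are only for illustration.

```matlab
x = linspace(0, 1, 10);
y = linspace(-0.5, 0.5, 15);
m_1 = 2;
m_2 = 3;

% One line: (10x1 column) + (1x15 row) expands to a 10x15 matrix (R2016b+).
z = m_1*x(:) + m_2*(y(:).').^2;

% Same result on releases without implicit expansion.
z_bsx = bsxfun(@plus, m_1*x(:), m_2*(y(:).').^2);

% Reference loop for comparison.
z_loop = zeros(10, 15);
for i = 1:length(x)
    for j = 1:length(y)
        z_loop(i, j) = m_1*x(i) + m_2*y(j)^2;
    end
end
maxDiff = max(abs(z(:) - z_loop(:)));
```

Using `x(:)` and `y(:).'` forces the orientation, so the line works whether the inputs are stored as rows or columns.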
The following command gives the same output as your original loop (but it's probably irrelevant):
Z = repmat((m_1*x + m_2*y(1:length(x)).^2)', [1, length(y)]);
max(max(abs(Z - z)))
ans =

Matlab euler's method

I'm supposed to use Euler's method to find the zonal wind field for all latitudes (90°S to 90°N) and altitudes (0 to 22 km) on Earth:
%Wind speed = 0 at surface
a = -12;
b_1 = 40;
x = -90:1:90; %latitude
y = 0:1:22; %altitude
z_r = 12;
[X, Y] = meshgrid(x, y);
%dy_dx = (y_2 - y_1)/(x_2 - x_1)
T = a + (b_1*(1-Y/z_r)).*(3./2*(2./3 + (sin(X*pi/180)).^2).*(cos(X*pi/180)).^3);
title("Temperature field")
xlabel("Latitude (degrees)")
ylabel("Altitude (km)")
%all code works above this line as it should. You can plot this and you will see what is happening in this simple model.
%gravity in km/s
g = 0.0981
%f = coriolis force
f = (1.458*10^(-4))*sin(X*pi/180);
%here I'm trying to use Euler's method to find the zonal wind field everywhere on earth
for i = 1:22
    for j = 1:180
        u(i, j) = 0
    end
end
for i = 1:22
    for j = 1:180
        dtdy(i, j+1) = (T(i, j+1) - T(i, j))./(Y(i, j+1) - Y(i, j))
        u(i+1, j) = u(i, j) - ((g./(f*T(i, j))).*dtdy(i, j+1)*(X(i+1, j) - X(i, j)))./111.21
    end
end
%I get an error saying the matrix dimensions must agree but I'm not very proficient in matlab, so I'm unsure why.
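
One likely source of the dimension error can be checked directly. In the update line, `T(i, j)` and `dtdy(i, j+1)` are scalars, but `f` is used without indices, so it is the whole 23x181 matrix and the right-hand side no longer fits into the single element `u(i+1, j)`. The sketch below only demonstrates that size mismatch at one arbitrary grid point; the names `rhs_matrix` and `rhs_scalar` are illustrative, and it makes no claim about the physics of the scheme.

```matlab
a = -12;
b_1 = 40;
z_r = 12;
x = -90:1:90;              % latitude (degrees)
y = 0:1:22;                % altitude (km)
[X, Y] = meshgrid(x, y);   % both 23x181: rows follow altitude, columns follow latitude
T = a + (b_1*(1 - Y/z_r)).*(3./2*(2./3 + (sin(X*pi/180)).^2).*(cos(X*pi/180)).^3);
g = 0.0981;
f = (1.458*10^(-4))*sin(X*pi/180);

i = 2;
j = 3;                     % arbitrary grid point, for illustration only

% f without indices is the full matrix, so this term is 23x181 ...
rhs_matrix = g./(f*T(i, j));
disp(size(rhs_matrix))

% ... while indexing f like the other terms gives a scalar.
rhs_scalar = g./(f(i, j)*T(i, j));
disp(size(rhs_scalar))
```

Note also that with `meshgrid(x, y)` the altitude `Y` changes along rows and the latitude `X` along columns, so differences such as `Y(i, j+1) - Y(i, j)` and `X(i+1, j) - X(i, j)` are zero everywhere. That is worth checking once the dimension error is resolved.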

Compute weighted summation of matrix power (matrix polynomial) in Matlab

Given an nxn matrix A_k and an nx1 vector x, is there any smart way to compute
$J = \sum_{i=1}^{n} x_i A_k^i$
using MATLAB? The x_i are the elements of the vector x, so J is a sum of matrices. So far I have used a for loop, but I was wondering whether there is a smarter way.
Short answer: you can use the builtin matlab function polyvalm for matrix polynomial evaluation as follows:
x = x(end:-1:1); % flip the order of the elements
x(end+1) = 0; % append 0
J = polyvalm(x, A);
Long answer: MATLAB uses a loop internally, so the builtin does not gain you much; it can even be slower than an optimised implementation of your own (see my calcJ_loopOptimised function):
% construct random input
n = 100;
A = rand(n);
x = rand(n, 1);
% calculate the result using different methods
Jbuiltin = calcJ_builtin(A, x);
Jloop = calcJ_loop(A, x);
JloopOptimised = calcJ_loopOptimised(A, x);
% check if the functions are mathematically equivalent (should be in the order of `eps`)
relativeError1 = max(max(abs(Jbuiltin - Jloop)))/max(max(Jbuiltin))
relativeError2 = max(max(abs(Jloop - JloopOptimised)))/max(max(Jloop))
% measure the execution time
t_loopOptimised = timeit(@() calcJ_loopOptimised(A, x))
t_builtin = timeit(@() calcJ_builtin(A, x))
t_loop = timeit(@() calcJ_loop(A, x))
% check if builtin function is faster
builtinFaster = t_builtin < t_loopOptimised
% calculate J using Matlab builtin function
function J = calcJ_builtin(A, x)
    x = x(end:-1:1);
    x(end+1) = 0;
    J = polyvalm(x, A);
end
% naive loop implementation
function J = calcJ_loop(A, x)
    n = size(A, 1);
    J = zeros(n,n);
    for i=1:n
        J = J + A^i * x(i);
    end
end
% optimised loop implementation (cache result of matrix power)
function J = calcJ_loopOptimised(A, x)
    n = size(A, 1);
    J = zeros(n,n);
    A_ = eye(n);
    for i=1:n
        A_ = A_*A;
        J = J + A_ * x(i);
    end
end
For n=100, I get the following:
t_loopOptimised = 0.0077
t_builtin = 0.0084
t_loop = 0.0295
For n=5, I get the following:
t_loopOptimised = 7.4425e-06
t_builtin = 4.7399e-05
t_loop = 1.0496e-04
Note that my timings fluctuate somewhat between runs, but the optimised loop is almost always faster than the builtin function (up to 6x for small n).
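
For comparison, the same sum can also be written with Horner's scheme, which is a different technique from the loops above: it folds the coefficients from the highest power down and needs one matrix product per term without storing a running power. This is only a sketch, checked against `polyvalm` on a small random input; the name `J_horner` is illustrative and no timing claim is made.

```matlab
rng(0);
n = 5;
A = rand(n);
x = rand(n, 1);

% Horner's scheme for J = x(1)*A + x(2)*A^2 + ... + x(n)*A^n.
J_horner = zeros(n);
I = eye(n);
for i = n:-1:1
    J_horner = (J_horner + x(i)*I) * A;
end

% Reference: builtin evaluation with flipped coefficients and a zero constant term.
J_ref = polyvalm([flipud(x); 0].', A);
relErr = max(abs(J_horner(:) - J_ref(:))) / max(abs(J_ref(:)));
```

For n = 2 the loop gives (x(2)*A + x(1)*I)*A = x(2)*A^2 + x(1)*A, which matches the definition term by term.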

How to randomly select multiple small and non-overlapping matrices from a large matrix?

Let's say I've a large N x M -sized matrix A (e.g. 1000 x 1000). Selecting k random elements without replacement from A is relatively straightforward in MATLAB:
A = rand(1000,1000); % Generate random data
k = 5; % Number of elements to be sampled
sizeA = numel(A); % Number of elements in A
idx = randperm(sizeA); % Random permutation
B = A(idx(1:k)); % Random selection of k elements from A
However, I'm looking for a way to expand the above concept so that I could randomly select k non-overlapping n x m -sized sub-matrices (e.g. 5 x 5) from A. What would be the most convenient way to achieve this? I'd very much appreciate any help!
This probably isn't the most efficient way to do this. I'm sure that if I (or somebody else) gave it more thought there would be a better way, but it should help you get started.
First I take the original idx(1:k) and reshape it into a 3D matrix: reshape(idx(1:k), 1, 1, k). Then I extend it to the required size, padding with zeros: idx(k, k, 1) = 0. Lastly I use 2 for loops to create the correct indices:
for n = 1:k
    for m = 1:k
        % size(A, 1) is the number of rows, i.e. the linear-index step between columns
        idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1;
    end
end
The complete script, built onto the end of yours:
A = rand(1000, 1000);
k = 5;
idx = randperm(numel(A));
B = A(idx(1:k));
idx = reshape(idx(1:k), 1, 1, k);
idx(k, k, 1) = 0; % Extend padding with zeros
for n = 1:k
    for m = 1:k
        % size(A, 1) is the number of rows, i.e. the linear-index step between columns
        idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1;
    end
end
C = A(idx);
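
The script above draws the starting elements from all of A, so it does not check that a block stays inside the matrix or that two blocks avoid each other. One simple way to enforce both is rejection sampling: draw a random top-left corner that leaves room for the block, and accept it only if the block does not intersect any block already accepted. The sketch below assumes that k blocks of size n x m fit comfortably into A, otherwise the loop may run for a long time. The names `corners`, `found` and `overlaps` are illustrative.

```matlab
rng(0);
A = rand(1000, 1000);   % data to sample from
k = 5;                  % number of sub-matrices
n = 5;                  % rows per sub-matrix
m = 5;                  % columns per sub-matrix
[N, M] = size(A);

B = zeros(n, m, k);     % B(:, :, t) holds the t-th sub-matrix
corners = zeros(k, 2);  % accepted top-left corners [row, column]
found = 0;
while found < k
    r = randi(N - n + 1);   % candidate corner that keeps the block inside A
    c = randi(M - m + 1);
    % Two blocks intersect only if they overlap in rows AND in columns.
    overlaps = any(abs(corners(1:found, 1) - r) < n & ...
                   abs(corners(1:found, 2) - c) < m);
    if ~overlaps
        found = found + 1;
        corners(found, :) = [r, c];
        B(:, :, found) = A(r:r+n-1, c:c+m-1);
    end
end
```

Each accepted block is uniformly placed among the positions still free at that moment, which is usually good enough when the blocks are small relative to A.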

Vectorizing a nested for loop which fills a dynamic programming table

I was wondering if there was a way to vectorize the nested for loop in this function which is filling up the entries of the 2D dynamic programming table DP. I believe that at the very least the inner loop could be vectorized as each row only depends on the previous row. I'm not sure how to do it though. Note this function is called on large 2D arrays (images) so the nested for loop really doesn't cut it.
function [cols] = compute_seam(energy)
    [r, c, ~] = size(energy);
    cols = zeros(r);
    DP = padarray(energy, [0, 1], Inf);
    BP = zeros(r, c);
    for i = 2 : r
        for j = 1 : c
            [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
            DP(i, j + 1) = DP(i, j + 1) + x;
            BP(i, j) = j + (l - 2);
        end
    end
    [~, j] = min(DP(r, :));
    j = j - 1;
    for i = r : -1 : 1
        cols(i) = j;
        j = BP(i, j);
    end
end
Vectorization of the innermost nested loop
You were right in postulating that at least the inner loop is vectorizable. Here's the modified code for the nested loops part -
rows_DP = size(DP,1); %// rows in DP
%// Get first row linear indices for a group of neighboring three columns,
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
    ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
    [x,l] = min(DP(ind1),[],1); %// get x and l values in one go
    DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
    BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
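
The same row-at-a-time idea can also be sketched without precomputed linear indices, by stacking three shifted copies of the previous row and taking the column-wise minimum. This is a variant written for illustration, not the code benchmarked in this answer. It builds the Inf padding by concatenation so that it runs without `padarray`, and the names `prev`, `DP_ref` and `BP_ref` are made up.

```matlab
rng(0);
energy = rand(6, 7);    % hypothetical small energy map
[r, c] = size(energy);

% Reference: the original nested loops.
DP_ref = [Inf(r, 1), energy, Inf(r, 1)];
BP_ref = zeros(r, c);
for i = 2:r
    for j = 1:c
        [x, l] = min([DP_ref(i-1, j), DP_ref(i-1, j+1), DP_ref(i-1, j+2)]);
        DP_ref(i, j+1) = DP_ref(i, j+1) + x;
        BP_ref(i, j) = j + (l - 2);
    end
end

% Row-at-a-time version: left, middle and right neighbours as three rows.
DP = [Inf(r, 1), energy, Inf(r, 1)];
BP = zeros(r, c);
for i = 2:r
    prev = DP(i-1, :);
    [x, l] = min([prev(1:c); prev(2:c+1); prev(3:c+2)], [], 1);
    DP(i, 2:c+1) = DP(i, 2:c+1) + x;   % add the cheapest predecessor to each entry
    BP(i, :) = (1:c) + l - 2;          % column of that predecessor
end
same = isequal(DP, DP_ref) && isequal(BP, BP_ref);
```

Row i only reads row i-1, which is why the inner loop can be collapsed while the outer loop has to stay sequential.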
Benchmarking Code -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
    for j = 1 : c
        [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
        DP(i, j + 1) = DP(i, j + 1) + x;
        BP(i, j) = j + (l - 2);
    end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
    ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
    [x,l] = min(DP(ind1),[],1); %// get x and l values in one go
    DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
    BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc
Results -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
Thus, you might enjoy a good 26x speedup with that little vectorization tweak.
More tweaks
A few more optimization tweaks could be tried in your code for performance -
cols = zeros(r) could be replaced with cols(r,r) = 0.
DP = padarray(energy, [0, 1], Inf) could be replaced with
DP = Inf(r, c + 2); %// Inf-filled array with one padding column on each side
DP(:,2:end-1) = energy;
BP = zeros(r, c) could be replaced with BP(r, c) = 0.
The pre-allocation tweaks used here are inspired by this blog post.