Construct an experiment to study the performance of the Cramer rule (with two implementations
determinants) in relation to Gauss's algorithm.
In each iteration 10 random arrays A (NxN), and vectors b (Nx1) will be created.
The 10 linear systems will be solved using the Cramer rule ("cramer.m") using
of rec_det (A) and using det (A), and the Gaussian algorithm
(“GaussianElimination.m”), and the time for each technique will be the average of 10 values.
Repeat the above for N = 2 to 10 and make a graph of the average time
in relation to the dimension N.
This is my task. I dont know if the way that I calculate the average time is correct and the graphic is not displayed.
T1=0;
T2=0;
T3=0;
for N=2:10
for i=1:10
A=rand(N,N);
b=rand(N,1);
t1=[1,i];
t2=[1,i];
t3=[1,i];
tic;
crammer(A,b);
t1(i)=toc;
tic
crammer_rec(A,b);
t2(i)=toc;
tic
gaussianElimination(A,b);
t3(i)=toc;
T1=T1+t1(i);
T2=T2+t2(i);
T3=T3+t3(i);
end
avT1=T1/10;
avT2=T2/10;
avT3=T3/10;
end
plot(2:10 , avT1 , 2:10 , avT2 , 2:10 , avT3);
function x = cramer(A, b)
n = length(b);
d = det(A);
% d = rec_det(A);
x = zeros(n, 1);
for j = 1:n
x(j) = det([A(:,1:j-1) b A(:,j+1:end)]) / d;
% x(j) = rec_det([A(:,1:j-1) b A(:,j+1:end)]) / d;
end
end
function x = cramer(A, b)
n = length(b);
d = rec_det(A);
x = zeros(n, 1);
for j = 1:n
x(j) = rec_det([A(:,1:j-1) b A(:,j+1:end)]) / d;
end
end
function deta = rec_det(R)
if size(R,1)~=size(R,2)
error('Error.Matrix must be square.')
else
n = size(R,1);
if ( n == 2 )
deta=(R(1,1)*R(2,2))-(R(1,2)*R(2,1));
else
for i=1:n
deta_temp=R;
deta_temp(1,:)=[ ];
deta_temp(:,i)=[ ];
if i==1
deta=(R(1,i)*((-1)^(i+1))*rec_det(deta_temp));
else
deta=deta+(R(1,i)*((-1)^(i+1))*rec_det(deta_temp));
end
end
end
end
end
function x = gaussianElimination(A, b)
[m, n] = size(A);
if m ~= n
error('Matrix A must be square!');
end
n1 = length(b);
if n1 ~= n
error('Vector b should be equal to the number of rows and columns of A!');
end
Aug = [A b]; % build the augmented matrix
C = zeros(1, n + 1);
% elimination phase
for k = 1:n - 1
% ensure that the pivoting point is the largest in its column
[pivot, j] = max(abs(Aug(k:n, k)));
C = Aug(k, :);
Aug(k, :) = Aug(j + k - 1, :);
Aug(j + k - 1, :) = C;
if Aug(k, k) == 0
error('Matrix A is singular');
end
for i = k + 1:n
r = Aug(i, k) / Aug(k, k);
Aug(i, k:n + 1) = Aug(i, k:n + 1) - r * Aug(k, k: n + 1);
end
end
% back substitution phase
x = zeros(n, 1);
x(n) = Aug(n, n + 1) / Aug(n, n);
for k = n - 1:-1:1
x(k) = (Aug(k, n + 1) - Aug(k, k + 1:n) * x(k + 1:n)) / Aug(k, k);
end
end
I think the easiest way to do this is by creating a 9 * 3 dimensional matrix to contain all the total times, and then take the average at the end.
allTimes = zeros(9, 3);
for N=2:10
for ii=1:10
A=rand(N,N);
b=rand(N,1);
tic;
crammer(A,b);
temp = toc;
allTimes(N-1,1) = allTimes(N-1,1) + temp;
tic
crammer_rec(A,b);
temp = toc;
allTimes(N-1,2) = allTimes(N-1,2) + temp;
tic
gaussianElimination(A,b);
temp = toc;
allTimes(N-1,3) = allTimes(N-1,3) + temp;
end
end
allTimes = allTimes/10;
figure; plot(2:10, allTimes);
You can use this approach because the numbers are quite straightforward and simple. If you had a more complicated setup, the way to store the times/calculate the averages would have to be tweaked.
If you had more functions you could also use function handles and create a third inner loop, but this is a little more advanced.
I want to realize component-wise matrix multiplication in MATLAB, which can be done using numpy.einsum in Python as below:
import numpy as np
M = 2
N = 4
I = 2000
J = 300
A = np.random.randn(M, M, I)
B = np.random.randn(M, M, N, J, I)
C = np.random.randn(M, J, I)
# using einsum
D = np.einsum('mki, klnji, lji -> mnji', A, B, C)
# naive for-loop
E = np.zeros(M, N, J, I)
for i in range(I):
for j in range(J):
for n in range(N):
E[:,n,j,i] = B[:,:,i] # A[:,:,n,j,i] # C[:,j,i]
print(np.sum(np.abs(D-E))) # expected small enough
So far I use for-loop of i, j, and n, but I don't want to, at least for-loop of n.
Option 1: Calling numpy from MATLAB
Assuming your system is set up according to the documentation, and you have the numpy package installed, you could do (in MATLAB):
np = py.importlib.import_module('numpy');
M = 2;
N = 4;
I = 2000;
J = 300;
A = matpy.mat2nparray( randn(M, M, I) );
B = matpy.mat2nparray( randn(M, M, N, J, I) );
C = matpy.mat2nparray( randn(M, J, I) );
D = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', A, B, C) );
Where matpy can be found here.
Option 2: Native MATLAB
Here the most important part is to get the permutations right, so we need to keep track of our dimensions. We'll be using the following order:
I(1) J(2) K(3) L(4) M(5) N(6)
Now, I'll explain how I got the correct permute order (let's take the example of A): einsum expects the dimension order to be mki, which according to our numbering is 5 3 1. This tells us that the 1st dimension of A needs to be the 5th, the 2nd needs to be 3rd and the 3rd needs to be 1st (in short 1->5, 2->3, 3->1). This also means that the "sourceless dimensions" (meaning those that have no original dimensions becoming them; in this case 2 4 6) should be singleton. Using ipermute this is really simple to write:
pA = ipermute(A, [5,3,1,2,4,6]);
In the above example, 1->5 means we write 5 first, and the same goes for the other two dimensions (yielding [5,3,1]). Then we just add the singletons (2,4,6) at the end to get [5,3,1,2,4,6]. Finally:
A = randn(M, M, I);
B = randn(M, M, N, J, I);
C = randn(M, J, I);
% Reference dim order: I(1) J(2) K(3) L(4) M(5) N(6)
pA = ipermute(A, [5,3,1,2,4,6]); % 1->5, 2->3, 3->1; 2nd, 4th & 6th are singletons
pB = ipermute(B, [3,4,6,2,1,5]); % 1->3, 2->4, 3->6, 4->2, 5->1; 5th is singleton
pC = ipermute(C, [4,2,1,3,5,6]); % 1->4, 2->2, 3->1; 3rd, 5th & 6th are singletons
pD = sum( ...
permute(pA .* pB .* pC, [5,6,2,1,3,4]), ... 1->5, 2->6, 3->2, 4->1; 3rd & 4th are singletons
[5,6]);
(see note regarding sum at the bottom of the post.)
Another way to do it in MATLAB, as mentioned by #AndrasDeak, is the following:
rD = squeeze(sum(reshape(A, [M, M, 1, 1, 1, I]) .* ...
reshape(B, [1, M, M, N, J, I]) .* ...
... % same as: reshape(B, [1, size(B)]) .* ...
... % same as: shiftdim(B,-1) .* ...
reshape(C, [1, 1, M, 1, J, I]), [2, 3]));
See also: squeeze, reshape, permute, ipermute, shiftdim.
Here's a full example that shows that tests whether these methods are equivalent:
function q55913093
M = 2;
N = 4;
I = 2000;
J = 300;
mA = randn(M, M, I);
mB = randn(M, M, N, J, I);
mC = randn(M, J, I);
%% Option 1 - using numpy:
np = py.importlib.import_module('numpy');
A = matpy.mat2nparray( mA );
B = matpy.mat2nparray( mB );
C = matpy.mat2nparray( mC );
D = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', A, B, C) );
%% Option 2 - native MATLAB:
%%% Reference dim order: I(1) J(2) K(3) L(4) M(5) N(6)
pA = ipermute(mA, [5,3,1,2,4,6]); % 1->5, 2->3, 3->1; 2nd, 4th & 6th are singletons
pB = ipermute(mB, [3,4,6,2,1,5]); % 1->3, 2->4, 3->6, 4->2, 5->1; 5th is singleton
pC = ipermute(mC, [4,2,1,3,5,6]); % 1->4, 2->2, 3->1; 3rd, 5th & 6th are singletons
pD = sum( permute( ...
pA .* pB .* pC, [5,6,2,1,3,4]), ... % 1->5, 2->6, 3->2, 4->1; 3rd & 4th are singletons
[5,6]);
rD = squeeze(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
reshape(mB, [1, M, M, N, J, I]) .* ...
reshape(mC, [1, 1, M, 1, J, I]), [2, 3]));
%% Comparisons:
sum(abs(pD-D), 'all')
isequal(pD,rD)
Running the above we get that the results are indeed equivalent:
>> q55913093
ans =
2.1816e-10
ans =
logical
1
Note that these two methods of calling sum were introduced in recent releases, so you might need to replace them if your MATLAB is relatively old:
S = sum(A,'all') % can be replaced by ` sum(A(:)) `
S = sum(A,vecdim) % can be replaced by ` sum( sum(A, dim1), dim2) `
As requested in the comments, here's a benchmark comparing the methods:
function t = q55913093_benchmark(M,N,I,J)
if nargin == 0
M = 2;
N = 4;
I = 2000;
J = 300;
end
% Define the arrays in MATLAB
mA = randn(M, M, I);
mB = randn(M, M, N, J, I);
mC = randn(M, J, I);
% Define the arrays in numpy
np = py.importlib.import_module('numpy');
pA = matpy.mat2nparray( mA );
pB = matpy.mat2nparray( mB );
pC = matpy.mat2nparray( mC );
% Test for equivalence
D = cat(5, M1(), M2(), M3());
assert( sum(abs(D(:,:,:,:,1) - D(:,:,:,:,2)), 'all') < 1E-8 );
assert( isequal (D(:,:,:,:,2), D(:,:,:,:,3)));
% Time
t = [ timeit(#M1,1), timeit(#M2,1), timeit(#M3,1)];
function out = M1()
out = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', pA, pB, pC) );
end
function out = M2()
out = permute( ...
sum( ...
ipermute(mA, [5,3,1,2,4,6]) .* ...
ipermute(mB, [3,4,6,2,1,5]) .* ...
ipermute(mC, [4,2,1,3,5,6]), [3,4]...
), [5,6,2,1,3,4]...
);
end
function out = M3()
out = squeeze(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
reshape(mB, [1, M, M, N, J, I]) .* ...
reshape(mC, [1, 1, M, 1, J, I]), [2, 3]));
end
end
On my system this results in:
>> q55913093_benchmark
ans =
1.3964 0.1864 0.2428
Which means that the 2nd method is preferable (at least for the default input sizes).
Suppose I have a m-by-n-by-p matrix "A", each indices stores a real number, now I want to create another matrix "B" and B(i, j, k) = f(A(i, j, k), i, j, k, otherVars), is there a faster way to do it in matlab rather than looping through all the elements? (notice the function requires the index number (i, j, k))
An example is as follows(The actual function f could be more complex):
A = rand(3, 4, 5);
B = zeros(size(A));
C = 10;
for x = 1:size(A, 1)
for y = 1:size(A, 2)
for z = 1:size(A, 3)
B(x, y, z) = A(x,y,z) + x - y * z + C;
end
end
end
I've tried creating a cell "B", and
B{i, j, k} = [A(i, j, k), i, j, k];
I then applied cellfun() to do the parallel computing, but it's even slower than a for-loop over each elements in A.
In my real implementation, function f is much more complex than B = A + X - Y.*Z + C; it takes four scaler values and I don't want to modify it since it's a function written in an external package. Any suggestions?
Vectorize it by building an ndgrid of the appropriate values:
[X,Y,Z] = ndgrid(1:size(A,1), 1:size(A,2), 1:size(A,3));
B = A + X - Y.*Z + C;
I have the following code:
n = 10000;
s = 100;
Z = rand(n, 2);
x = rand(s, 1);
y = rand(s, 1);
fun = #(a) exp(a);
In principle, the anonymous function f can have a different form. I need to create two arrays.
First, I need to create an array of size n x s x s with generic elements
fun(Z(i, 1) - x(j)) * fun(Z(i, 2) - y(k))
where i=1,...n while j,k=1,...,s. What I can easily do, is to construct matrices using bsxfun, e.g.
bsxfun(#(x, y) fun(x - y), Z(:, 1), x');
bsxfun(#(x, y) fun(x - y), Z(:, 2), y');
But then I would need to combine them into 3D array by multiplying element-wise each column of those two matrices.
In the second step, I need to create an array of size n x 3 x s x s, which would look from one side as the following matrix
[ones(n, 1), Z(:, 1) - x(i), Z(:, 2) - y(j);]
where i=1,...s, j=1,...s. I could loop over the two extra dimensions with something like
A = [ones(n, 1), Z(:, 1) - x(1), Z(:, 2) - y(1)];
for i = 1:s
for j = 1:s
A(:, :, i, j) = [ones(n, 1), Z(:, 1) - x(i), Z(:, 2) - y(j);];
end
end
Is there a way to avoid loops?
In the third step, suppose that after obtaining array out1 (output from first step), I want to create a new array out3 of dimension n x n x s x s, which contains the original array out1 on the main diagonal, i.e. out3(i,i,s,s) = out1(i, s, s) and out3(i,j,s,s)=0 for all i~=j. Is there some kind of alternative of diag for creating "diagonal arrays"? Alternatively, if I create n x n x s x s array of zeros, is there a way to put out1 on the main diagonal?
Code
exp_Z_x = exp(bsxfun(#minus,Z(:,1),x.')); %//'
exp_Z_y = exp(bsxfun(#minus,Z(:,2),y.')); %//'
out1 = bsxfun(#times,exp_Z_x,permute(exp_Z_y,[1 3 2]));
Z1 = [ones(n,1) Z(:,1) Z(:,2)];
X1 = permute([ zeros(s,1) x zeros(s,1)],[3 2 1]);
Y1 = permute([ zeros(s,1) zeros(s,1) y],[4 2 3 1]);
out2 = bsxfun(#minus,bsxfun(#minus,Z1,X1),Y1);
out3 = zeros(n,n,s,s); %// out3(n,n,s,s) = 0; could be used for performance
out3(bsxfun(#plus,[1:n+1:n*n]',[0:s*s-1]*n*n)) = out1; %//'
%// out1, out2 and out3 are the desired outputs
I was wondering if there was a way to vectorize the nested for loop in this function which is filling up the entries of the 2D dynamic programming table DP. I believe that at the very least the inner loop could be vectorized as each row only depends on the previous row. I'm not sure how to do it though. Note this function is called on large 2D arrays (images) so the nested for loop really doesn't cut it.
function [cols] = compute_seam(energy)
[r, c, ~] = size(energy);
cols = zeros(r);
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
[~, j] = min(DP(r, :));
j = j - 1;
for i = r : -1 : 1
cols(i) = j;
j = BP(i, j);
end
end
Vectorization of the innermost nested loop
You were right in postulating that at least the inner loop is vectorizable. Here's the modified code for the nested loops part -
rows_DP = size(DP,1); %// rows in DP
%// Get first row linear indices for a group of neighboring three columns,
%// which would be incremented as we move between rows with the row iterator
start_ind1 = bsxfun(#plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
Benchmarking
Benchmarking Code -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
for j = 1 : c
[x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
DP(i, j + 1) = DP(i, j + 1) + x;
BP(i, j) = j + (l - 2);
end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(#plus,[1:rows_DP:2*rows_DP+1]',[0:c-1]*rows_DP); %//'
for i = 2 : r
ind1 = start_ind1 + i-2; %// setup linear indices for the row of this iteration
[x,l] = min(DP(ind1),[],1); %// get x and l values in one go
DP(i,2:c+1) = DP(i,2:c+1) + x; %// set DP values of a row in one go
BP(i,1:c) = [1:c] + l-2; %// set BP values of a row in one go
end
toc
Results -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
Thus, you might enjoy a good 26x speedup improvement in performance with that little vectorization tweak.
More tweaks
Few more optimization tweaks could be tried into your code for performance -
cols = zeros(r) could be replaced with col(r,r) = 0.
DP = padarray(energy, [0, 1], Inf) could be replaced with
DP(1:size(energy,1),1:size(energy,2)+2)=Inf;
DP(:,2:end-1) = energy;
BP = zeros(r, c) could be replaced with BP(r, c) = 0.
The pre-allocation tweaks used here are inspired by this blog post.