I want to implement component-wise matrix multiplication in MATLAB, which can be done using numpy.einsum in Python as below:
import numpy as np
M = 2
N = 4
I = 2000
J = 300
A = np.random.randn(M, M, I)
B = np.random.randn(M, M, N, J, I)
C = np.random.randn(M, J, I)
# using einsum
D = np.einsum('mki, klnji, lji -> mnji', A, B, C)
# naive for-loop
E = np.zeros((M, N, J, I))
for i in range(I):
    for j in range(J):
        for n in range(N):
            E[:, n, j, i] = A[:, :, i] @ B[:, :, n, j, i] @ C[:, j, i]
print(np.sum(np.abs(D - E)))  # expected to be small
So far I have been using for-loops over i, j, and n, but I'd like to avoid them, or at least the loop over n.
Option 1: Calling numpy from MATLAB
Assuming your system is set up according to the documentation, and you have the numpy package installed, you could do (in MATLAB):
np = py.importlib.import_module('numpy');
M = 2;
N = 4;
I = 2000;
J = 300;
A = matpy.mat2nparray( randn(M, M, I) );
B = matpy.mat2nparray( randn(M, M, N, J, I) );
C = matpy.mat2nparray( randn(M, J, I) );
D = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', A, B, C) );
Where matpy can be found here.
Option 2: Native MATLAB
Here the most important part is to get the permutations right, so we need to keep track of our dimensions. We'll be using the following order:
I(1) J(2) K(3) L(4) M(5) N(6)
Now, I'll explain how I got the correct permute order (let's take the example of A): einsum expects the dimension order to be mki, which according to our numbering is 5 3 1. This tells us that the 1st dimension of A needs to be the 5th, the 2nd needs to be the 3rd, and the 3rd needs to be the 1st (in short: 1->5, 2->3, 3->1). This also means that the "sourceless dimensions" (meaning those that have no original dimensions becoming them; in this case 2 4 6) should be singletons. Using ipermute this is really simple to write:
pA = ipermute(A, [5,3,1,2,4,6]);
In the above example, 1->5 means we write 5 first, and the same goes for the other two dimensions (yielding [5,3,1]). Then we just add the singletons (2,4,6) at the end to get [5,3,1,2,4,6]. Finally:
A = randn(M, M, I);
B = randn(M, M, N, J, I);
C = randn(M, J, I);
% Reference dim order: I(1) J(2) K(3) L(4) M(5) N(6)
pA = ipermute(A, [5,3,1,2,4,6]); % 1->5, 2->3, 3->1; 2nd, 4th & 6th are singletons
pB = ipermute(B, [3,4,6,2,1,5]); % 1->3, 2->4, 3->6, 4->2, 5->1; 5th is singleton
pC = ipermute(C, [4,2,1,3,5,6]); % 1->4, 2->2, 3->1; 3rd, 5th & 6th are singletons
pD = sum( ...
    permute(pA .* pB .* pC, [5,6,2,1,3,4]), ... % 1->5, 2->6, 3->2, 4->1; 3rd & 4th are singletons
    [5,6]);
(see note regarding sum at the bottom of the post.)
Another way to do it in MATLAB, as mentioned by @AndrasDeak, is the following:
rD = squeeze(sum(reshape(A, [M, M, 1, 1, 1, I]) .* ...
reshape(B, [1, M, M, N, J, I]) .* ...
... % same as: reshape(B, [1, size(B)]) .* ...
... % same as: shiftdim(B,-1) .* ...
reshape(C, [1, 1, M, 1, J, I]), [2, 3]));
See also: squeeze, reshape, permute, ipermute, shiftdim.
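Note that the element-wise products above rely on implicit expansion, which requires R2016b or newer (and sum with a vector of dimensions requires R2018b; see the note regarding sum at the bottom of the post). On older releases the same computation can be written with bsxfun and nested sums; this is just an equivalent sketch:
rD = squeeze(sum(sum(bsxfun(@times, bsxfun(@times, ...
                     reshape(A, [M, M, 1, 1, 1, I]), ...
                     reshape(B, [1, M, M, N, J, I])), ...
                     reshape(C, [1, 1, M, 1, J, I])), 2), 3));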
Here's a full example that tests whether these methods are equivalent:
function q55913093
M = 2;
N = 4;
I = 2000;
J = 300;
mA = randn(M, M, I);
mB = randn(M, M, N, J, I);
mC = randn(M, J, I);
%% Option 1 - using numpy:
np = py.importlib.import_module('numpy');
A = matpy.mat2nparray( mA );
B = matpy.mat2nparray( mB );
C = matpy.mat2nparray( mC );
D = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', A, B, C) );
%% Option 2 - native MATLAB:
%%% Reference dim order: I(1) J(2) K(3) L(4) M(5) N(6)
pA = ipermute(mA, [5,3,1,2,4,6]); % 1->5, 2->3, 3->1; 2nd, 4th & 6th are singletons
pB = ipermute(mB, [3,4,6,2,1,5]); % 1->3, 2->4, 3->6, 4->2, 5->1; 5th is singleton
pC = ipermute(mC, [4,2,1,3,5,6]); % 1->4, 2->2, 3->1; 3rd, 5th & 6th are singletons
pD = sum( permute( ...
    pA .* pB .* pC, [5,6,2,1,3,4]), ... % 1->5, 2->6, 3->2, 4->1; 3rd & 4th are singletons
    [5,6]);
rD = squeeze(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
reshape(mB, [1, M, M, N, J, I]) .* ...
reshape(mC, [1, 1, M, 1, J, I]), [2, 3]));
%% Comparisons:
sum(abs(pD-D), 'all')
isequal(pD,rD)
Running the above we get that the results are indeed equivalent:
>> q55913093
ans =
2.1816e-10
ans =
logical
1
Note that these two methods of calling sum were introduced in recent releases, so you might need to replace them if your MATLAB is relatively old:
S = sum(A,'all') % can be replaced by ` sum(A(:)) `
S = sum(A,vecdim) % can be replaced by ` sum( sum(A, dim1), dim2) `
As requested in the comments, here's a benchmark comparing the methods:
function t = q55913093_benchmark(M,N,I,J)
if nargin == 0
    M = 2;
    N = 4;
    I = 2000;
    J = 300;
end
% Define the arrays in MATLAB
mA = randn(M, M, I);
mB = randn(M, M, N, J, I);
mC = randn(M, J, I);
% Define the arrays in numpy
np = py.importlib.import_module('numpy');
pA = matpy.mat2nparray( mA );
pB = matpy.mat2nparray( mB );
pC = matpy.mat2nparray( mC );
% Test for equivalence
D = cat(5, M1(), M2(), M3());
assert( sum(abs(D(:,:,:,:,1) - D(:,:,:,:,2)), 'all') < 1E-8 );
assert( isequal (D(:,:,:,:,2), D(:,:,:,:,3)));
% Time
t = [ timeit(@M1,1), timeit(@M2,1), timeit(@M3,1) ];
function out = M1()
out = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', pA, pB, pC) );
end
function out = M2()
out = permute( ...
sum( ...
ipermute(mA, [5,3,1,2,4,6]) .* ...
ipermute(mB, [3,4,6,2,1,5]) .* ...
ipermute(mC, [4,2,1,3,5,6]), [3,4]...
), [5,6,2,1,3,4]...
);
end
function out = M3()
out = squeeze(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
reshape(mB, [1, M, M, N, J, I]) .* ...
reshape(mC, [1, 1, M, 1, J, I]), [2, 3]));
end
end
On my system this results in:
>> q55913093_benchmark
ans =
1.3964 0.1864 0.2428
This means that the 2nd method is preferable (at least for the default input sizes).
I'm trying to write code that solves linear systems A*x = b.
I wrote the code below using Gaussian elimination, and it works every time if A doesn't have any zeros in it. If A has zeros in it, it sometimes works and sometimes doesn't. Basically, I'm trying to build an alternative to MATLAB's "A\b".
Is there a better/simpler way of doing this?
A = randn(5,5);
b = randn(5,1);
nn = size(A);
n = nn(1,1);
U = A;
u = b;
for c = 1:1:n
    k = U(:,c);
    for r = n:-1:c
        if k(r,1) == 0
            continue;
        else
            U(r,:) = U(r,:)/k(r,1);
            u(r,1) = u(r,1)/k(r,1);
        end
    end
    for r = n:-1:(c+1)
        if k(r,1) == 0
            continue;
        else
            U(r,:) = U(r,:) - U(r-1,:);
            u(r,1) = u(r,1) - u(r-1,1);
        end
    end
end
x = zeros(size(b));
for r = n:-1:1
    if r == n
        x(r,1) = u(r,1);
    else
        x(r,1) = u(r,1);
        x(r,1) = x(r,1) - U(r,r+1:n)*x(r+1:n,1);
    end
end
error = A*x - b;
for i = 1:1:n
    if abs(error(i)) > 0.001
        disp('ERROR!');
        break;
    else
        continue;
    end
end
disp('x:');
disp(x);
Working example with 0's:
A = [1, 3, 1, 3;
3, 4, 4, 1;
3, 0, 3, 9;
0, 4, 0, 1];
b = [3;
4;
5;
6];
Example that fails (A*x-b isn't [0])
A = [1, 3, 1, 3;
3, 4, 4, 1;
0, 0, 3, 9;
0, 4, 0, 1];
b = [3;
4;
5;
6];
Explanation of my algorithm:
Let's say I have the following A matrix:
|4, 1, 9|
|3, 4, 5|
|1, 3, 5|
For the first column, I divide each row by its first entry, so every row starts with 1:
|1, 1/4, 9/4|
|1, 4/3, 5/3|
|1, 3, 5|
Then I subtract from the last row the row above it, then do the same for the row above, and so on (shown symbolically, then numerically):
|1, 1/4, 9/4|
|0, 4/3-1/4, 5/3-9/4|
|0, 3-4/3, 5-5/3|

|1, 0.25, 2.250|
|0, 1.083, -0.5833|
|0, 1.667, 3.333|
Then I repeat the same for the rest of the columns.
|1, 0.25, 2.250|
|0, 1, -0.5385|
|0, 1, 1.999|

|1, 0.25, 2.250|
|0, 1, -0.5385|
|0, 0, -8.7700|

|1, 0.25, 2.250|
|0, 1, -0.5385|
|0, 0, 1|
The same operations I do on A I also do on b, so the system stays equivalent.
re-UPDATE:
I added this right after "for c = 1:1:n"
So before doing anything, it sorts the rows of A (and b) so that the entries of column "c" are in decreasing order of absolute value (zeros end up in the bottom rows of A). Right now it seems to work for any invertible square matrix, although I'm not sure it always will.
r = c;
a = r + 1;
while r <= n
    if r == n
        r = r + 1;
    elseif a <= n
        while a <= n
            if abs(U(r,c)) < abs(U(a,c))
                UU = U(r,:);
                U(r,:) = U(a,:);
                U(a,:) = UU;
                uu = u(r,1);
                u(r,1) = u(a,1);
                u(a,1) = uu;
            else
                a = a+1;
            end
        end
    else
        r = r+1;
        a = r+1;
    end
end
Gaussian elimination with pivoting is as follows.
function [L,U,P] = my_lu_piv(A)
    n = size(A,1);
    I = eye(n);
    O = zeros(n);
    L = I;
    U = O;
    P = I;
    function change_rows(k,p)
        x = P(k,:); P(k,:) = P(p,:); P(p,:) = x;
        x = A(k,:); A(k,:) = A(p,:); A(p,:) = x;
        x = v(k); v(k) = v(p); v(p) = x;
    end
    function change_L(k,p)
        x = L(k,1:k-1); L(k,1:k-1) = L(p,1:k-1);
        L(p,1:k-1) = x;
    end
    for k = 1:n
        if k == 1, v(k:n) = A(k:n,k);
        else
            z = L(1:k-1,1:k-1) \ A(1:k-1,k);
            U(1:k-1,k) = z;
            v(k:n) = A(k:n,k) - L(k:n,1:k-1)*z;
        end
        if k < n
            x = v(k:n); p = (k-1) + find(abs(x) == max(abs(x))); % find index p
            change_rows(k,p);
            L(k+1:n,k) = v(k+1:n)/v(k);
            if k > 1, change_L(k,p); end
        end
        U(k,k) = v(k);
    end
end
In order to solve the system:
% Ax = b       (1) original system
% LU = PA      (2) factorization of PA (or A(p,:)) into the product LU
% PAx = Pb     (3) multiply both sides of (1) by P
% LUx = Pb     (4) substitute (2) into (3)
% let y = Ux   (5) define y as Ux
% let c = Pb   (6) define c as Pb
% Ly = c       (7) substitute (5) and (6) into (4)
% U*x = y      (8) a rewrite of (5)
To do this:
[L, U, P] = lu(A);   % factorize
y = L \ (P*b);       % forward solve of (7), a lower triangular system
x = U \ y;           % back solve of (8), an upper triangular system
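Using the my_lu_piv function above (assuming its outputs satisfy P*A = L*U, as described), solving a system would look like this sketch:
[L, U, P] = my_lu_piv(A);   % P*A = L*U
y = L \ (P*b);              % forward solve, step (7)
x = U \ y;                  % back solve, step (8)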
The Gaussian algorithm requires the matrix to be reduced to an upper triangular matrix. This does not happen in your example. The result of your algorithm is
A =
1 3 1 3
3 4 4 1
0 0 3 9
0 4 0 1
U =
1.00000 3.00000 1.00000 3.00000
-0.00000 1.00000 -0.20000 1.60000
0.00000 0.00000 1.00000 3.00000
0.00000 4.00000 -0.00000 1.00000
As you can see, it's not upper triangular. You are skipping rows if the pivot element is zero. That does not work. To fix this, you need to swap columns in the matrix and rows in the vector when the pivot element is zero. At the end you have to swap the rows back in your result b (or u, respectively).
The Gaussian algorithm is:
1 Set n = 1
2 Take pivot element (n, n)
3 If (n, n) == 0, swap column n with column m, so that m > n and (n, m) != 0 (swap row m and n in vector b)
4 Divide n-th row by pivot element (divide n-th row in vector b)
5 For each m > n
6 If (m, n) != 0
7 Divide row m by its element (m, n) and subtract row n element-wise (same for vector b)
8 n = n + 1
9 If n <= number of rows, go to line 2
In terms of numerical stability it would be best to use the maximum of each row as the pivot element. You can also use the maximum of the whole matrix as the pivot element by swapping both columns and rows. But remember to apply the swaps to b and to swap back in your solution.
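For illustration, here is a minimal sketch of the more common row-swapping variant (partial pivoting), where rows of both the matrix and the vector are swapped, so nothing needs to be swapped back at the end. This is an alternative named plainly, not the column-swapping scheme described above and not the OP's original code:
n = size(A, 1);
U = A; u = b;
for c = 1:n
    [~, p] = max(abs(U(c:n, c)));        % largest pivot candidate in column c
    p = p + c - 1;
    U([c p], :) = U([p c], :);           % swap rows of the matrix ...
    u([c p]) = u([p c]);                 % ... and the same rows of the vector
    for r = c+1:n
        f = U(r, c) / U(c, c);
        U(r, :) = U(r, :) - f * U(c, :); % eliminate below the pivot
        u(r) = u(r) - f * u(c);
    end
end
x = zeros(n, 1);                         % back substitution
for r = n:-1:1
    x(r) = (u(r) - U(r, r+1:n) * x(r+1:n)) / U(r, r);
end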
Try this:
Ab = [A,b] % Augmented matrix of the system of equations
rref(Ab) % Result of applying Gauss-Jordan elimination to the augmented matrix
See rref documentation for more details and examples.
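For example, assuming A is square and nonsingular (so the system has a unique solution), the solution can be read off the last column of the reduced matrix:
R = rref([A, b]);   % Gauss-Jordan elimination of the augmented matrix
x = R(:, end);      % for a nonsingular A, the last column is the solution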
Suppose I have an m-by-n-by-p array "A" where each entry stores a real number. Now I want to create another array "B" with B(i, j, k) = f(A(i, j, k), i, j, k, otherVars). Is there a faster way to do this in MATLAB than looping through all the elements? (Note that the function requires the index numbers (i, j, k).)
An example is as follows (the actual function f could be more complex):
A = rand(3, 4, 5);
B = zeros(size(A));
C = 10;
for x = 1:size(A, 1)
    for y = 1:size(A, 2)
        for z = 1:size(A, 3)
            B(x, y, z) = A(x,y,z) + x - y * z + C;
        end
    end
end
I've tried creating a cell "B", and
B{i, j, k} = [A(i, j, k), i, j, k];
I then applied cellfun() to do the computation, but it's even slower than a for-loop over each element of A.
In my real implementation, the function f is much more complex than B = A + X - Y.*Z + C; it takes four scalar values, and I don't want to modify it since it's a function from an external package. Any suggestions?
Vectorize it by building an ndgrid of the appropriate values:
[X,Y,Z] = ndgrid(1:size(A,1), 1:size(A,2), 1:size(A,3));
B = A + X - Y.*Z + C;
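If f really cannot be vectorized (as with the external, scalar-only f from the question), the same index grids can be passed to arrayfun, which hides the loop but still calls f once per element; f and otherVars here are just the placeholder names from the question:
[X, Y, Z] = ndgrid(1:size(A,1), 1:size(A,2), 1:size(A,3));
B = arrayfun(@(a, x, y, z) f(a, x, y, z, otherVars), A, X, Y, Z);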
I was wondering if there is a way to vectorize the nested for loops in this function, which fills in the entries of the 2D dynamic programming table DP. I believe that at the very least the inner loop could be vectorized, as each row only depends on the previous row, but I'm not sure how to do it. Note that this function is called on large 2D arrays (images), so the nested for loops really don't cut it.
function [cols] = compute_seam(energy)
    [r, c, ~] = size(energy);
    cols = zeros(r);
    DP = padarray(energy, [0, 1], Inf);
    BP = zeros(r, c);
    for i = 2 : r
        for j = 1 : c
            [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
            DP(i, j + 1) = DP(i, j + 1) + x;
            BP(i, j) = j + (l - 2);
        end
    end
    [~, j] = min(DP(r, :));
    j = j - 1;
    for i = r : -1 : 1
        cols(i) = j;
        j = BP(i, j);
    end
end
Vectorization of the innermost nested loop
You were right in postulating that at least the inner loop is vectorizable. Here's the modified code for the nested loops part -
rows_DP = size(DP,1); %// rows in DP
%// Get first-row linear indices for a group of three neighboring columns,
%// which are incremented as we move between rows with the row iterator
start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);
for i = 2 : r
    ind1 = start_ind1 + i-2;        %// setup linear indices for the row of this iteration
    [x,l] = min(DP(ind1),[],1);     %// get x and l values in one go
    DP(i,2:c+1) = DP(i,2:c+1) + x;  %// set DP values of a row in one go
    BP(i,1:c) = (1:c) + l-2;        %// set BP values of a row in one go
end
Benchmarking
Benchmarking Code -
N = 3000; %// Datasize
energy = rand(N);
[r, c, ~] = size(energy);
disp('------------------------------------- With Original Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
for i = 2 : r
    for j = 1 : c
        [x, l] = min([DP(i - 1, j), DP(i - 1, j + 1), DP(i - 1, j + 2)]);
        DP(i, j + 1) = DP(i, j + 1) + x;
        BP(i, j) = j + (l - 2);
    end
end
toc,clear DP BP x l
disp('------------------------------------- With Vectorized Code')
DP = padarray(energy, [0, 1], Inf);
BP = zeros(r, c);
tic
rows_DP = size(DP,1); %// rows in DP
start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);
for i = 2 : r
    ind1 = start_ind1 + i-2;        %// setup linear indices for the row of this iteration
    [x,l] = min(DP(ind1),[],1);     %// get x and l values in one go
    DP(i,2:c+1) = DP(i,2:c+1) + x;  %// set DP values of a row in one go
    BP(i,1:c) = (1:c) + l-2;        %// set BP values of a row in one go
end
toc
Results -
------------------------------------- With Original Code
Elapsed time is 44.200746 seconds.
------------------------------------- With Vectorized Code
Elapsed time is 1.694288 seconds.
Thus, you might enjoy a good 26x speedup with that little vectorization tweak.
More tweaks
A few more optimization tweaks could be tried in your code for performance -
cols = zeros(r) could be replaced with cols(r,r) = 0.
DP = padarray(energy, [0, 1], Inf) could be replaced with
DP(1:size(energy,1),1:size(energy,2)+2)=Inf;
DP(:,2:end-1) = energy;
BP = zeros(r, c) could be replaced with BP(r, c) = 0.
The pre-allocation tweaks used here are inspired by this blog post.
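For reference, here is a sketch of how the vectorized inner loop and the pre-allocation tweaks above could be folded back into the original function (same logic as the snippets above, just assembled for convenience; compute_seam_vec is only a placeholder name):
function cols = compute_seam_vec(energy)
    [r, c, ~] = size(energy);
    cols(r, r) = 0;                       % instead of cols = zeros(r)
    DP(1:r, 1:c+2) = Inf;                 % instead of padarray(energy, [0, 1], Inf)
    DP(:, 2:end-1) = energy;
    BP(r, c) = 0;                         % instead of BP = zeros(r, c)
    rows_DP = size(DP, 1);
    start_ind1 = bsxfun(@plus, (1:rows_DP:2*rows_DP+1)', (0:c-1)*rows_DP);
    for i = 2:r
        ind1 = start_ind1 + i - 2;        % linear indices into row i-1 of DP
        [x, l] = min(DP(ind1), [], 1);
        DP(i, 2:c+1) = DP(i, 2:c+1) + x;
        BP(i, 1:c) = (1:c) + l - 2;
    end
    [~, j] = min(DP(r, :));               % backtracking, unchanged from the original
    j = j - 1;
    for i = r:-1:1
        cols(i) = j;
        j = BP(i, j);
    end
end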