Loop optimization with row-wise sum - matlab

I am trying to optimize the following nested loop; the problem is how to deal with the need for a row-wise sum.
for i=1:N
for j=1:N
F(i,i)=(exp(-B(j,j))) * P(i,j)+ F(i,i);
end
end
Specifically, I want to eliminate the loops, and it looks achievable according to this solution, but the problem is how to store the changing value of F in each iteration.
I came up with this idea:
for j=1:N
F(:)=(exp(-B(j,j))) * P(:,j)+ F(:);
end
With this approach, F gets overwritten in every iteration! Any ideas?

I think your code can be simplified to the one-liner below. Each diagonal entry is F(i,i) = sum over j of exp(-B(j,j))*P(i,j), which is exactly the i-th entry of the vector P*exp(-diag(B)):
F = diag(P*exp(-diag(B)));
Example
N = 3;
B = rand(N,N);
P = rand(N,N);
F = zeros(N);
for i=1:N
for j=1:N
F(i,i)=(exp(-B(j,j))) * P(i,j)+ F(i,i);
end
end
d = isequal(F,diag(P*exp(-diag(B)))); % check if F is identical to diag(P*exp(-diag(B)))
such that
>> d
d = 1 % indicating that they are identical
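If you prefer not to build the full N-by-N matrix with diag, a minimal variant (my own sketch, not part of the answer above) writes the vector of row-wise sums straight onto the diagonal using linear indexing:
e = exp(-diag(B));      % N-by-1 vector of exp(-B(j,j))
F = zeros(N);
F(1:N+1:end) = P * e;   % linear indices 1, N+2, 2N+3, ... address the diagonal of F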

Applying a function to a matrix, which depends on the indices?

Suppose I have a matrix A and I want to apply a function f to each of its elements. I can then use f(A) if f is vectorized, or arrayfun(f,A) if it's not.
But what if I had a function that depends on the entry and its indices: f = @(i,j,x) something? How do I apply this function to the matrix A without using a for loop like the following?
for j=1:size(A,2)
for i=1:size(A,1)
fA(i,j) = f(i,j,A(i,j));
end
end
I'd like to consider the function f to be vectorized. Hints on shorter notation for non-vectorized functions are welcome, though.
I have read your answers and came up with another idea using indexing, which turned out to be the fastest. Here is my test script:
%// Test function
f = @(i,j,x) i.*x + j.*x.^2;
%// Initialize times
tfor = 0;
tnd = 0;
tsub = 0;
tmy = 0;
%// Do the calculation 100 times
for it = 1:100
%// Random input data
A = rand(100);
%// Clear all variables
clear fA1 fA2 fA3 fA4;
%// Use the for loop
tic;
fA1(size(A,1),size(A,2)) = 0;
for j=1:size(A,2)
for i=1:size(A,1)
fA1(i,j) = f(i,j,A(i,j));
end
end
tfor = tfor + toc;
%// Use ndgrid, like @Divakar suggested
clear I J;
tic;
[I,J] = ndgrid(1:size(A,1),1:size(A,2));
fA2 = f(I,J,A);
tnd = tnd + toc;
%// Test if the calculation is correct
if max(max(abs(fA2-fA1))) > 0
max(max(abs(fA2-fA1)))
end
%// Use ind2sub, like @DennisKlopfer suggested
clear I J;
tic;
[I,J] = ind2sub(size(A),1:numel(A));
fA3 = arrayfun(f,reshape(I,size(A)),reshape(J,size(A)),A);
tsub = tsub + toc;
%// Test if the calculation is correct
if max(max(abs(fA3-fA1))) > 0
max(max(abs(fA3-fA1)))
end
%// My suggestion using indexing
clear sA1 sA2 ssA1 ssA2;
tic;
sA1=size(A,1);
ssA1=1:sA1;
sA2=size(A,2);
ssA2=1:sA2;
fA4 = f(ssA1(ones(1,sA2),:)', ssA2(ones(1,sA1),:), A);
tmy = tmy + toc;
%// Test if the calculation is correct
if max(max(abs(fA4-fA1))) > 0
max(max(abs(fA4-fA1)))
end
end
%// Print times
tfor
tnd
tsub
tmy
I get the result
tfor =
0.6813
tnd =
0.0341
tsub =
10.7477
tmy =
0.0171
Assuming that the function is vectorized (no dependency or recursion involved), as mentioned in the comments earlier, you could use ndgrid to create 2D meshes corresponding to the two nested-loop iterators i and j, of the same size as A. When these are fed to the particular function f, it operates on the input 2D arrays in a vectorized manner. Thus, the implementation would look something like this -
[I,J] = ndgrid(1:size(A,1),1:size(A,2));
out = f(I,J,A);
Sample run -
>> f = @(i,j,k) i.^2+j.^2+sin(k);
A = rand(4,5);
for j=1:size(A,2)
for i=1:size(A,1)
fA(i,j) = f(i,j,A(i,j));
end
end
>> fA
fA =
2.3445 5.7939 10.371 17.506 26.539
5.7385 8.282 13.538 20.703 29.452
10.552 13.687 18.076 25.804 34.012
17.522 20.684 25.054 32.13 41.331
>> [I,J] = ndgrid(1:size(A,1),1:size(A,2)); out = f(I,J,A);
>> out
out =
2.3445 5.7939 10.371 17.506 26.539
5.7385 8.282 13.538 20.703 29.452
10.552 13.687 18.076 25.804 34.012
17.522 20.684 25.054 32.13 41.331
Using arrayfun(), ind2sub() and reshape() you can create index matrices matching the shape of A, which makes arrayfun() applicable. There might be a better version, as this feels a little bit like a hack, but it should work on both vectorized and unvectorized functions.
[I,J] = ind2sub(size(A),1:numel(A));
fA = arrayfun(f,reshape(I,size(A)),reshape(J,size(A)),A)

Vectorization - Sum and Bessel function

Can anyone help vectorize this MATLAB code? The specific problem is the sum and the Bessel function with vector inputs.
Thank you!
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
for ii = 1:length(rho_g)
for jj = 1:length(phi_g)
% Coordinates
rho_o = rho_g(ii);
phi_o = phi_g(jj);
% factors
fc = cos(n.*(phi_o-phi_s));
fs = sin(n.*(phi_o-phi_s));
Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
end
end
You could try to vectorize this code, which might be possible with some bsxfun or so, but the result would be hard-to-understand code, and it is questionable whether it would run any faster, since your code already uses vector math in the inner loop (even though your vectors only have length 3). The resulting code would become very difficult to read, so you or your colleague will have no idea what it does when you look at it in two years' time.
Before wasting time on vectorization, it is much more important that you learn about loop-invariant code motion, which is easy to apply to your code. Some observations:
you do not use fs, so remove that.
the term tau.*besselj(n,k(3)*rho_s) does not depend on any of your loop variables ii and jj, so it is constant. Calculate it once before your loop.
you should probably pre-allocate the matrix Ez_t.
the only terms that change during the loop are fc, which depends on jj, and besselh(n,2,k(3)*rho_o), which depends on ii. I guess that the latter costs much more time to calculate, so it is better not to calculate it N*N times in the inner loop, but only N times in the outer loop. If the calculation that depends on jj took more time, you could swap the for-loops over ii and jj, but that does not seem to be the case here.
The resulting code would look something like this (untested):
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
% constant part, does not depend on ii and jj, so calculate only once!
temp1 = tau.*besselj(n,k(3)*rho_s);
Ez_t = nan(length(rho_g), length(phi_g)); % preallocate space
for ii = 1:length(rho_g)
% calculate stuff that depends on ii only
rho_o = rho_g(ii);
temp2 = besselh(n,2,k(3)*rho_o);
for jj = 1:length(phi_g)
phi_o = phi_g(jj);
fc = cos(n.*(phi_o-phi_s));
Ez_t(ii,jj) = sum(temp1.*temp2.*fc);
end
end
Initialization -
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
Nested-loops form (copied from your code and shown here for comparison only) -
for ii = 1:length(rho_g)
for jj = 1:length(phi_g)
% Coordinates
rho_o = rho_g(ii);
phi_o = phi_g(jj);
% factors
fc = cos(n.*(phi_o-phi_s));
fs = sin(n.*(phi_o-phi_s));
Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
end
end
Vectorized solution -
%%// Term - 1
term1 = repmat(tau.*besselj(n,k(3)*rho_s),[N*N 1]);
%%// Term - 2
[n1,rho_g1] = meshgrid(n,rho_g);
term2_intm = besselh(n1,2,k(3)*rho_g1);
term2 = transpose(reshape(repmat(transpose(term2_intm),[N 1]),N,N*N));
%%// Term -3
angle1 = repmat(bsxfun(@times,bsxfun(@minus,phi_g,phi_s')',n),[N 1]);
fc = cos(angle1);
%%// Output
Ez_t = sum(term1.*term2.*fc,2);
Ez_t = transpose(reshape(Ez_t,N,N));
Points to note about this vectorization or code simplification -
fs doesn't change the output of the script, Ez_t, so it can be removed for now.
The output is Ez_t, which requires three basic terms in the code:
tau.*besselj(n,k(3)*rho_s), besselh(n,2,k(3)*rho_o) and fc. These are calculated separately for vectorization as terms 1, 2 and 3 respectively.
Inside the loops, all three terms are of size 1xN. Our aim thus becomes calculating these three terms without loops. The two loops run N times each, giving a total loop count of NxN, so each vectorized term must hold NxN times the data it held inside the nested loops.
This is basically the essence of the vectorization done here, with the three terms represented by term1, term2 and fc itself.
In order to give a self-contained answer, I'll copy the original initialization
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
and generate some missing data (k(3), plus rho_s and phi_s with the same dimensions as n)
rho_s = rand(size(n));
phi_s = rand(size(n));
k(3) = rand(1);
then you can compute the same Ez_t with multidimensional arrays:
[RHO_G, PHI_G, N] = meshgrid(rho_g, phi_g, n);
[~, ~, TAU] = meshgrid(rho_g, phi_g, tau);
[~, ~, RHO_S] = meshgrid(rho_g, phi_g, rho_s);
[~, ~, PHI_S] = meshgrid(rho_g, phi_g, phi_s);
FC = cos(N.*(PHI_G - PHI_S));
FS = sin(N.*(PHI_G - PHI_S)); % not used
EZ_T = sum(TAU.*besselj(N, k(3)*RHO_S).*besselh(N, 2, k(3)*RHO_G).*FC, 3).';
You can check afterwards that both matrices are the same
norm(Ez_t - EZ_T)

Finding matrix inverse by Gaussian Elimination With Partial Pivoting

Hello guys, I am writing a program to compute the determinant (this part I already did) and the inverse of a matrix with GEPP. Here the problem arises, since I have no idea how to invert a matrix using GEPP; I know how to invert using Gaussian elimination ([A|I] => [I|B]). I have searched the internet but still have no clue. Could you please explain it to me?
Here is my MATLAB code (maybe someone will find it useful); as of now it solves AX=b and computes the determinant:
function [det1,X ] = gauss_czesciowy( A, b )
%GEPP
perm=0;
m = size(A,1); % number of rows of A
n = length(b);
if n~=m
error('vector has wrong size');
end
for j = 1:n
p=j;
% choice of main element
for i = j:n
if abs(A(i,j)) >= abs(A(p,j))
p = i;
end
end
if A(p,j) == 0
error('Matrix A is singular');
end
%rows permutation
t = A(p,:);
A(p,:) = A(j,:);
A(j,:) = t;
t = b(p);
b(p) = b(j);
b(j) = t;
if ~(p==j) % count the swap only if rows were actually exchanged
perm=perm+1;
end
% reduction
for i = j+1:n
t = (A(i,j)/A(j,j));
A(i,:) = A(i,:)-A(j,:)*t;
b(i) = b(i)-b(j)*t;
end
end
%determinant
mn=1;
for i=1:n
mn=mn*A(i,i);
end
det1=mn*(-1)^perm;
% solution
X = zeros(1,n);
X(n) = b(n)/A(n,n);
if (det1~=0)
for i = n-1:-1:1 % back substitution runs from the last row upwards
s = sum( A(i, (i+1):n) .* X((i+1):n) );
X(i) = (b(i) - s) / A(i,i);
end
end
end
Here is the algorithm for Gaussian elimination with partial pivoting. Basically you do Gaussian elimination as usual, but at each step you exchange rows to pick the largest-valued pivot available.
To get the inverse, you have to keep track of how you switch rows and build a permutation matrix P. The permutation matrix is just the identity matrix of the same size as your A-matrix, with the same row switches performed. Then you have:
[A] --> GEPP --> [B] and [P]
[A]^(-1) = [B]*[P]
I would try this on a couple of matrices just to be sure.
EDIT: Rather than empirically testing this, let's reason it out. Basically, switching rows in A is the same as multiplying it by your permutation matrix P. You could just do this before you start GE and end up with the same result, which would be:
[P*A|I] --> GE --> [I|B] or
(P*A)^(-1) = B
Due to the properties of the inverse operation, this can be rewritten:
A^(-1) * P^(-1) = B
And you can multiply both sides by P on the right to get:
A^(-1) * P^(-1)*P = B*P
A^(-1) * I = B*P
A^(-1) = B*P
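For what it's worth, here is a minimal sketch (my own, not the poster's routine) of the [A|I] => [I|B] approach with partial pivoting. Because the row swaps are applied to the whole augmented matrix, the permutation ends up folded into the right half, so the result is inv(A) directly and no separate B*P product is needed:
function Ainv = gepp_inverse(A)
n = size(A,1);
M = [A eye(n)];                    % augmented matrix [A | I]
for j = 1:n
    % partial pivoting: pick the row with the largest entry in column j
    [pivot, p] = max(abs(M(j:n, j)));
    p = p + j - 1;
    if pivot == 0
        error('Matrix A is singular');
    end
    M([j p], :) = M([p j], :);     % swap rows j and p of the augmented matrix
    M(j, :) = M(j, :) / M(j, j);   % normalize the pivot row
    for i = [1:j-1, j+1:n]         % eliminate column j in every other row
        M(i, :) = M(i, :) - M(i, j) * M(j, :);
    end
end
Ainv = M(:, n+1:end);              % right half now holds inv(A)
end
A quick check such as norm(gepp_inverse(A) - inv(A)) on a random A should come out near machine precision.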

MATLAB How to vectorize these for loops?

I've searched a lot but didn't find any solution to my problem. Could you please help me vectorize these loops (or just find a way to make them much faster)?
% n is the size of C
h = 1/(n-1)
dt = 1e-6;
a = 1e-2;
F=zeros(n,n);
F2=zeros(n,n);
C2=zeros(n,n);
t = 0.0;
for iter=1:12000
F2=F.^3-F;
for i=1:n
for j=1:n
F2(i,j)=F2(i,j)-(C(ij(i-1),j)+C(ij(i+1),j)+C(i,ij(j-1))+C(i,ij(j+1))-4*C(i,j)).*(a.^2)./(h.^2);
end
end
F=F2;
for i=1:n
for j=1:n
C2(i,j)=C(i,j)+(F(ij(i-1),j)+F(ij(i+1),j)+F(i,ij(j-1))+F(i,ij(j+1))-4*F(i,j)).*dt./(h^2);
end
end
C=C2;
t = t + dt;
end
function i=ij(i) % wrap the indices around (index n+1 maps to 1 and index 0 maps to n)
if i==0
i=n;
return
elseif i==n+1
i=1;
end
return
end
thanks a lot
EDIT: Found an answer; it was ridiculously simple and I was searching way too far afield.
%n is still the size of C
h = 1/((n-1))
dt = 1e-6;
a = 1e-2;
F=zeros(n,n);
var1=(a^2)/(h^2); % precomputed to cut down on repeated arithmetic
var2=dt/(h^2); % likewise
t = 0.0;
for iter=1:12000
F=C.^3-C-var1*(C([n 1:n-1],1:n) + C([2:n 1], 1:n) + C(1:n, [n 1:n-1]) + C(1:n, [2:n 1]) - 4*C);
C = C + var2*(F([n 1:n-1], 1:n) + F([2:n 1], 1:n) + F(1:n, [n 1:n-1]) + F(1:n,[2:n 1]) - 4*F);
t = t + dt;
end
Found an answer; it was ridiculously simple and I was searching way too far afield.
%n is still the size of C
h = 1/((n-1))
dt = 1e-6;
a = 1e-2;
F=zeros(n,n);
var1=(a^2)/(h^2); % precomputed to cut down on repeated arithmetic
var2=dt/(h^2); % likewise
prev = [n 1:n-1];
next = [2:n 1];
t = 0.0;
for iter=1:12000
F = C.*C.*C - C - var1*(C(:,next)+C(:,prev)+C(next,:)+C(prev,:)-4*C);
C = C + var2*(F(:,next)+F(:,prev)+F(next,:)+F(prev,:)-4*F);
t = t + dt;
end
The behavior of the inner loop looks like a 2-dimensional circular convolution. That's the same as multiplication in the FFT domain. Subtraction is invariant across a linear operation such as FFT.
You'll want to use the fft2 and ifft2 functions.
Once you do that, I think you'll find that the repeated convolution can be eliminated by raising the convolution kernel (element-wise) to the power iter. If that optimization is correct, I'm predicting a speedup of 5 orders of magnitude.
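To make the fft2 idea concrete, here is a rough sketch (my own, untested against the original timings) of computing just the periodic 5-point Laplacian term as a circular convolution; the kernel K holds the stencil weights, and the circular convolution becomes a pointwise product in the frequency domain:
K = zeros(n);                         % n is the grid size from the question
K(1,1) = -4;
K(2,1) = 1; K(n,1) = 1; K(1,2) = 1; K(1,n) = 1;
Khat = fft2(K);                       % can be computed once, outside the time loop
lapC = real(ifft2(fft2(C) .* Khat));  % = C(i-1,j)+C(i+1,j)+C(i,j-1)+C(i,j+1)-4*C(i,j), with wrap-around
Whether this ends up faster than the pure-indexing version the asker found would need profiling on the actual grid size.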
You can replace, for example, C(ij(i-1),j) by using circshift(C,[1,0]) or circshift(C,[-1,0]) (I can't figure out which of the two is correct).
http://www.mathworks.com/help/matlab/ref/circshift.htm
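For reference, my reading of the shift directions (worth double-checking) is that circshift(C,[1,0]) moves every row down by one, so entry (i,j) of the result is C(i-1,j) with wrap-around:
up    = circshift(C, [ 1, 0]); % C(i-1,j), i.e. C(ij(i-1),j)
down  = circshift(C, [-1, 0]); % C(i+1,j)
left  = circshift(C, [ 0, 1]); % C(i,j-1)
right = circshift(C, [ 0,-1]); % C(i,j+1)
lapC  = up + down + left + right - 4*C; % the same wrapped 5-point sum used in the loops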

Generalized Matrix Product

I'm fairly new to MATLAB. Normal matrix multiplication of an M x K matrix by a K x N matrix -- C = A * B -- has c_ij = sum(a_ik * b_kj, k = 1:K). What if I want this to be c_ij = sum(op(a_ik, b_kj), k = 1:K) instead, for some simple binary operation op? Is there any nice way to vectorize this in MATLAB (or maybe even a built-in function)?
EDIT: This is currently the best I can do.
% A is M x K, B is K x N
% op is min
C = zeros(M, N);
for i = 1:M
C(i, :) = sum(bsxfun(@min, A(i, :)', B));
end
Listed in this post is a vectorized approach that sticks with bsxfun, using permute to create the singleton dimensions that bsxfun needs for its singleton expansion, thereby replacing the loop in the original post. Please be reminded that bsxfun is a memory-hungry implementation, so expect a speedup with it only until it is stretched too far. Here's the final solution code -
op = @min; %// Edit this with your own function/operation
C = sum(bsxfun(op, permute(A,[1 3 2]),permute(B,[3 2 1])),3)
NB - The above solution was inspired by Removing four nested loops in Matlab.
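As a quick sanity check (with made-up test sizes), the bsxfun result can be compared against the loop from the question:
A = rand(4,3); B = rand(3,5);   % hypothetical test data
op = @min;
C1 = sum(bsxfun(op, permute(A,[1 3 2]), permute(B,[3 2 1])), 3);
C2 = zeros(size(A,1), size(B,2));
for i = 1:size(A,1)
    C2(i,:) = sum(bsxfun(op, A(i,:)', B));
end
isequal(C1, C2)                 % should return 1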
if the operator can operate element-by-element (like .*):
if(size(A,2)~=size(B,1))
error('blah, blah, blah...');
end
C = zeros(size(A,1),size(B,2));
for i = 1:size(A,1)
for j = 1:size(B,2)
C(i,j) = sum(binaryOp(A(i,:)',B(:,j)));
end
end
You can always write the loops yourself:
A = rand(2,3);
B = rand(3,4);
op = @times; %# use your own function here
C = zeros(size(A,1),size(B,2));
for i=1:size(A,1)
for j=1:size(B,2)
for k=1:size(A,2)
C(i,j) = C(i,j) + op(A(i,k),B(k,j));
end
end
end
isequal(C,A*B)
Depending on your specific needs, you may be able to use bsxfun in 3D to trick the binary operator. See this answer for more info: https://stackoverflow.com/a/23808285/1121352
Another alternative would be to use cellfun with a custom function:
http://matlabgeeks.com/tips-tutorials/computation-using-cellfun/
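For completeness, one way the cellfun alternative could look (a rough sketch of my own, not taken from the linked page): split A into row cells and B into column cells, then apply the binary op to each pair and sum over k:
op = @min;                 % assumed example operator
% assumes A is M x K and B is K x N, as in the question
rowsA = num2cell(A, 2);    % M cells, each holding one 1 x K row of A
colsB = num2cell(B, 1);    % N cells, each holding one K x 1 column of B
C = cellfun(@(r) cellfun(@(c) sum(op(r', c)), colsB), rowsA, 'UniformOutput', false);
C = vertcat(C{:});         % M x N result
It will likely be slower than the bsxfun approach, but it maps closely onto the c_ij definition.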