I am writing a program in MATLAB as a part of my project based on DFT.
Let the N x N data matrix be X and the corresponding DFT matrix be Y, then the DFT coefficients can be expressed as
Y(k1,k2) = ∑(n1=0:N-1)∑(n2=0:N-1)[X(n1,n2)*(WN^(n1k1+n2k2))] (1)
0≤k1,k2≤N-1
Where WN^k=e^((-j2πk)/N)
Since the twiddle factor WN is periodic, (1) can be expressed as
Y(k1,k2)=∑(n1=0:N-1)∑(n1=0:N-1)[X(n1,n2)*(WN^([(n1k1+n2k2)mod N) ] (2)
The exponent ((n1k1 +n2k2)) N = p is satisfied by a set of (n1,n2) for a given (k1,k2). Hence, by grouping such data and applying the property that WN^(p+N /2) = -(WN^P),
(2) can be expressed as
Y(k1,k2)= ∑(p=0:M-1)[Y(k1,k2,p)*(WN^p)] (3)
Where
Y(k1,k2,p)= ∑(∀(n1,n2)|z=p)X(n1,n2) - ∑(∀(n1,n2)|z=p+M)X(n1,n2) (4)
z=[(n1k1+n2k2)mod N] (5)
I am coding a program to find Y(k1,k2,p).ie I need to create slices of 2d matrices(ie a 3D matrix in which each slice is a 2D matrix )from a given 2D square matrix (which is the matrix X)..Dimensions of X can be upto 512.
Based on the above equations,I have written a code as follows.I need to vectorise it.
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
for k2 = 1:N
for p= 1:M
for n1=1:N
for n2=1:N
N1=n1-1; N2=n2-1; P=p-1; K1=k1-1; K2=k2-1;
z=mod((N1*K1+N2*K2),N);
if (z==P)
Y(k1,k2,p)= Y(k1,k2,p)+ X(n1,n2);
elsif (z==(P+M))
Y(k1,k2,p)= Y(k1,k2,p)- X(n1,n2);
end
end
end
end
end
As there is 5 FOR loops, the execution time is very large for large dimensions of N. Hence please provide me a solution for eliminating the FOR loops and vectorising the code..I need to make the code execute in maximum speed...Thanks Again..
Here is a first hint to vectorize the most inner loop.
From your code, we can notice that n1, N1, P, K1 and K2 are constant in this loop.
So we can rewrite z as a mask vector as follows:
z = mod(N1*K1+K2*(0:N-1));
Then your if-statement is equivalent to adding the sum of all elements in X so that z==P minus the sum of all elements in X so that z==P+M. Rewriting this is straightforward:
Y(k1,k2,p)= Y(k1,k2,p)+sum(X(n1,z==P))-sum(X(n1,z==P+M));
So your program can be first written as follows:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
for k2 = 1:N
for p= 1:M
for n1=1:N
N1=n1-1; P=p-1; K1=k1-1; K2=k2-1;
z=mod(N1*K1+K2*(0:N-1),N);
Y(k1,k2,p) = sum(X(n1,z==P))-sum(X(n1,z==P+M));
end
end
end
end
Then you can do the same thing with n1; for that, you need to construct a 2D array for z, such as:
z = mod(K1*repmat(0:N-1,N,1)+K2*repmat((0:N-1).',1,N));
Notice that size(z)==size(X).Then the 2D sum for Y becomes:
Y(k1,k2,p) = Y(k1,k2,p)+sum(X(z==P))-sum(X(z==P+M));
The += operation is here no longer needed, since you access only once to each element of Y:
Y(k1,k2,p)= sum(X(n1,z==P))-sum(X(n1,z==P+M));
And so we discard one more loop:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
for k2 = 1:N
for p= 1:M
P=p-1; K1=k1-1; K2=k2-1;
z = mod(K1*repmat(0:N-1,N,1)+K2*repmat((0:N-1).',1,N));
Y(k1,k2,p) = sum(X(z==P))-sum(X(z==P+M));
end
end
end
Concerning the other loops, I don't think it worths it to vectorize them, as you have to build a 5D array, which could be very huge in memory. My advise is to keep z as a 2D array, as it is of the size of X. If it does not fit well in memory, just vectorize the most inner loop.
Related
I have a matlab program with 5 nested
for
loops and a
if
condition like this:
for x0=1:N
for y0=1:N
for k=1:N
for x1=1:N
for y1=1:N
if ~((y1-x1>N/2)||(x1-y1>N/2)) && ~((y0-x0>N/2)||(x0-y0>N/2))
A(x0,y0)=A(x0,y0)+2^(k*((x0-y0)+(x1-y1)))*B(x1,y1)
end
end
end
end
end
end
where A and B are two matrices. How can I make this program run faster?
I've tried to use meshgrid but it seems doesn't work because there's a
if
condition.
Lets be smart about loops and conditions first, as you are using the loop indices as condition variables.
We start with
~(y1-x1>N/2)||(x1-y1>N/2), or way clearer, abs(y1-x1)<N/2.
Instead of having an if condition, why not enforce y1 to be in range, always?
The last loop can be written as y1=max(x1-N/2,1):min(x1+N/2,N), and thus the entirety of the first part of the if condition is not needed. We can do the same for the other variables, of course:
for x0=1:N
for y0=max(x0-N/2,1):min(x0+N/2,N)
for k=1:N
for x1=1:N
for y1=max(x1-N/2,1):min(x1+N/2,N)
A(x0,y0)=A(x0,y0)+2^(k*((x0-y0)+(x1-y1)))*B(x1,y1)
end
end
end
end
end
Now, for clarity, lets reshuffle and vectorize that k. There is no need for it to be the middle loop, in fact, its only feature as the middle loop is to confuse the person reading the code. But aside from that, there is no need for it to be a loop either.
k=1:N;
for x0=1:N
for y0=max(x0-N/2,1):min(x0+N/2,N)
for x1=1:N
for y1=max(x1-N/2,1):min(x1+N/2,N)
A(x0,y0)=A(x0,y0)+sum(2.^(k*((x0-y0)+(x1-y1))))*B(x1,y1)
end
end
end
end
Now, is this faster?
No. MATLAB is really good at optimizing your code, so it is not faster. But at least its way way clearer, so I guess you got that going for you. But you need it faster! Well.... I am not sure you can. You have a 5 nested loops, that is just super slow. I don't think you can meshgrid this, even without the conditions, because you intermingle all loops. meshgrid is good when well, you do operations on a mesh grid, but in your case you use all x1,y1 for every x0,y0 and thus its not a mesh operation.
Here is a vectorized solution:
x0 = (1:N).';
y0 = 1:N;
x1 = (1:N).';
y1 = 1:N;
k = reshape(1:N, 1, 1, N);
conditiona = ~((y0-x0 > N/2) | (x0-y0 > N/2));
conditionb = ~((y1-x1 > N/2) | (x1-y1 > N/2));
a = 2 .^ (k .* (x0-y0)) .* conditiona;
b = 2 .^ (k .* (x1-y1)) .* B .* conditionb;
bsum = squeeze(sum(sum(b, 1) ,2));
A = A + reshape(reshape(a, [] , N) * bsum ,N ,N);
Note that two 3D arrays a and b are created that may/may not require a lot of memory. In such a case you need to loop over k. For example in the first iteration set k to 1:5. In the second iteration set it to 6:10 and so on. You need to addv the result of each iteration to the previous iteration to form the final A.
Explanation
This function can be vectorized by implicit expansion (that is more efficient than using meshgrid) and using element-wise operators like .^ and .* instead of ^ and * operators. As a result a 5D array is created (because we have 5 loop variables) that should be summed over 3-5th dimensions to produce the final 2D matrix. However that may require a lot of memory. Another point is that functions that contains the sum of products usually can be written as efficient matrix multiplication.
The expression:
2^(k*((x0-y0)+(x1-y1)))*B(x1,y1);
can be written as:
2 .^ (k .* (x0-y0)) .* 2 .^ (k .* (x1-y1)) .* B(x1, y1)
------- a -------- ------------- b ---------------
that is the multiplication of two sub-expressions that each has 3 dimensions, because each contains just 3 loop variables. So the 5D problem is reduced to 3D.
The if condition has also two sub-expressions that each can be multiplied by a and b sub-expressions:
conditiona = ~((y0-x0 > N/2) | (x0-y0 > N/2));
conditionb = ~((y1-x1 > N/2) | (x1-y1 > N/2));
a = 2 .^ (k .* (x0-y0)) .* conditiona;
b = 2 .^ (k .* (x1-y1)) .* B .* conditionb;
A for loop can be formed just by using two loop variables x0 and y0:
for x0=1:N
for y0=1:N
A(x0,y0)=A(x0,y0)+ sum(sum(sum(a(x0,x0, :) .* b, 3), 2), 1);
%or simply A(x0,y0)=A(x0,y0)+ sum(a(x0,x0, :) .* b, "all");
end
end
That can be simplified to the following loop by precomputing sum of b:
bsum = sum(sum(b, 1) ,2);
% bsum = sum(b ,[1, 2]);
for x0=1:N
for y0=1:N
A(x0,y0)=A(x0,y0)+ sum(a(x0,x0, :) .* bsum, 3);
% or as vector x vector multiplication
% A(x0,y0)=A(x0,y0)+ squeeze(a(x0,x0, :)).' * squeeze(bsum);
end
end
Here the loop can be prevented by using the matrix x vector multiplication:
A = A + reshape(reshape(a, [] , N) * bsum ,N ,N);
Update this solution may not be faster under Matlab, because the execution engine can optimise the loops in the original code. (It does provide a speedup under Octave.)
One trick to deal with if statements within loops is to turn the if statement (or part of it) into a logical matrix. You can then multiply the logical matrix elementwise by the matrix of values you are adding in each step. A false value will evaluate to zero and will not change the result.
This only works if each element can be computed independently of the others.
It will generally make the actual calculation line slower, but in Matlab this is often outweighed by the huge speed improvement from the the removal of the for loops.
In your example, we can use this idea with meshgrid to remove the loops over x0 and y0.
The calculation line needs to become an elementwise matrix caluclation, so elementwise operators .*, .^ and | replace *, ^ and |.
% Warning: Y0 and X0 are swapped in this example
[Y0, X0] = meshgrid(1:N,1:N);
% Create a logical matrix which represents part of the if statement
C = ~((Y0-X0>N/2) | (X0-Y0>N/2));
for k=1:N
for x1=1:N
for y1=1:N
if ~((y1-x1>N/2)||(x1-y1>N/2))
% Multiply by C elementwise
A = A + C.*2.^(k*((X0-Y0)+(x1-y1)))*B(x1,y1);
end
end
end
end
You could even take this concept further and remove more loops using multidemensional ndgrids, but it becomes more complex (you have to start summing over dimensions) and the multidimensional arrays become unwieldy if N is large.
Note: be careful with index order. meshgrid defines y as rows and x as columns, so matrix indexing is A(y,x) but you are using A(x,y). So to make your example work I've switched x and y in the output of meshgrid.
I am currently comparing elements of two matrices A and B in two nested for loops to create a new matrix F with the higher value at each position. All three matrices have equal dimensions.
My current code looks like this (MWE):
F = zeros(n, n)
Indeces_max_i = zeros(n, n)
for i=1 : max
%generate the matrix, mock example
B = randn(n, n)
for c=1 : size(F,1) %Loop over each element, possible to optimise?
for l=1 : size(F,2)
if B(c,l) > F(c,l)
F(c,l) = B(c,l);
Indeces_max_i(c,l) = i;
end
end
end
end
As this can get quite computational heavy, I was wondering if there is a possibility to improve performance of the for loop 1 - and potentially if this solution could also apply if B had size n x n x m (so it would take the highest value of each element along dimensions m)?
I am trying to evaluate the matrices Y(p,k) and Z(p,k) using the following simplified Matlab code.
They depend on some matrices A(j,k), B(j,p) and C(j,k) which I am able to precalculate, so I have just initialised them as random arrays for this MWE. (Note that B is a different size to A and C).
Nj = 5000;
Nk = 1000;
Np = 500; % max loop iterations
A = rand(Nj,Nk); % dummy precalculated matrices
B = rand(Nj,Np);
C = rand(Nj,Nk);
Y = zeros(Np,Nk); % allocate storage
Z = zeros(Np,Nk);
tic
for p = 1:Np
X = A .* B(:,p);
Y(p,:) = sum( X , 1 );
Z(p,:) = sum( C .* X , 1 );
end
toc % Evaluates to 11 seconds on my system
As can be seen above, I am repeating my calculation by looping over index p (because the matrix B depends on p).
I have managed to get this far by moving everything which can be precalculated outside the loop (contained in A, B and C), but on my system this code still takes around 11 seconds to execute. Can anyone see a way in Matlab to speed this up, or perhaps even remove the loop and process all at once?
Thank you
I think the following should be equivalent and much faster:
Y = B' * A;
Z = B' * (A.*C);
Notes:
If B is complex-valued then you should use .' for transposition instead.
You may also want to pre-compute B directly in transposed form (i.e. as a Np by Nj matrix) to avoid the transposition altogether.
If C is not needed anywhere else, then pre-compute it as A.*C instead in order to avoid the extra element-wise multiplication.
I need to numerically evaluate some integrals which are all of the form shown in this image:
These integrals are the matrix elements of a N x N matrix, so I need to evaluate them for all possible combinations of n and m in the range of 1 to N. The integrals are symmetric in n and m which I have implemented in my current nested for loop approach:
function [V] = coulomb3(N, l, R, R0, c, x)
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2];
rl1 = r1.^(2*l);
rl2 = r2.^(2*l);
sines = zeros(N, length(r));
V = zeros(N, N);
for i = 1:N;
sines(i, :) = sin(i*pi*r/R0);
end
x1 = length(r1);
x2 = length(r);
for nn = 1:N
for mm = 1:nn
f1 = (1/6)*rl1.*r1.^2.*sines(nn, 1:x1).*sines(mm, 1:x1);
f2 = ((R^2/2)*rl2 - (R^3/3)*rl2.*r2.^(-1)).*sines(nn, x1+1:x2).*sines(mm, x1+1:x2);
value = 4*pi*c*x*trapz([f1 f2]);
V(nn, mm) = value;
V(mm, nn) = value;
end
end
I figured that calling sin(x) in the loop was a bad idea, so I calculate all the needed values and store them. To evaluate the integrals I used trapz, but as the first and the second/third integrals have different ranges the function values need to be calculated separately and then combined.
I've tried a couple different ways of vectorization but the only one that gives the correct results takes much longer than the above loop (used gmultiply but the arrays created are enourmous). I've also made an analytical solution (which is possible assuming m and n are integers and R0 > R > 0) but these solutions involve a cosine integral (cosint in MATLAB) function which is extremely slow for large N.
I'm not sure the entire thing can be vectorized without creating very large arrays, but the inner loop at least should be possible. Any ideas would be be greatly appreciated!
The inputs I use currently are:
R0 = 1000;
R = 8.4691;
c = 0.393*10^(-2);
x = 0.01;
l = 0 # Can reasonably be 0-6;
N = 20; # Increasing the value will give the same results,
# but I would like to be able to do at least N = 600;
Using these values
V(1, 1:3) = 873,379900963549 -5,80688363271849 -3,38139152472590
Although the diagonal values never converge with increasing R0 so they are less interesting.
You will lose the gain from the symmetricity of the problem with my approach, but this means a factor of 2 loss. Odds are that you'll still benefit in the end.
The idea is to use multidimensional arrays, making use of trapz supporting these inputs. I'll demonstrate the first term in your figure, as the two others should be done similarly, and the point is the technique:
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2].';
rl1 = r1.'.^(2*l);
rl2 = r2.'.^(2*l);
sines = zeros(length(r),N); %// CHANGED!!
%// V = zeros(N, N); not needed now, see later
%// you can define sines in a vectorized way as well:
sines = sin(r*(1:N)*pi/R0); %//' now size [Nr, N] !
%// note that implicitly r is of size [Nr, 1, 1]
%// and sines is of size [Nr, N, 1]
sines2mat = permute(sines,[1, 3, 2]); %// size [Nr, 1, N]
%// the first term in V: perform integral along first dimension
%//V1 = 1/6*squeeze(trapz(bsxfun(#times,bsxfun(#times,r.^(2*l+2),sines),sines2mat),1))*x; %// 4*pi*c prefactor might be physics, not math
V1 = 1/6*permute(trapz(bsxfun(#times,bsxfun(#times,r.^(2*l+2),sines),sines2mat),1),[2,3,1])*x; %// 4*pi*c prefactor might be physics, not math
The key point is that bsxfun(#times,r.^(2*l+2),sines) is a matrix of size [Nr,N,1], which is again multiplied by sines2mat using bsxfun, the result is of size [Nr,N,N] and an element (k1,k2,k3) corresponds to an integrand at radial point k1, n=k2 and m=k3. Using trapz() with explicitly the first dimension (which would be default) reduces this to an array of size [1,N,N], which is just what you need after a good squeeze(). Update: as per #Dev-iL's comment you should use permute instead of squeeze to get rid of the leading singleton dimension, as that might be more efficent.
The two other terms can be handled the same way, and of course it might still help if you restructure the integrals based on overlapping and non-overlapping parts.
Is given:
stationary mass ms=1;
Eta-constant eta=0.45;
variable number of repetitions, e.g. N=5;
omega OM=sqrt(ks/ms);
angular frequency om=eta*OM;
time period T=2*pi/om;
upper bound TTT=1.5;
variable for creating function t=0:0.001:TTT;
I made a function like that:
kt=zeros(size(t));
for j=1:2*N+1
n= j-(N+1);
if n==0
k(j)=ks/2;
else
k(j)=i/pi/n;
end
kt=kt+k(j)*exp(i*n*om*t);
end
It’s a Sawtooth wave and there is my problem. From the complex vector kt with value 1x1501 double I have to make the Hermitean matrix for variable N. This means that N can be 5, can be 50, 100, etc. The matrix should look like (picture):
Where k1 is k for N=1, k0 is k for N=0 or k-1 is k for N=-1. Size of matrix is 2*N+1 and 2*N+1.
Thank you for your help and responding!
That's a Toeplitz matrix, you can use the toeplitz command to generate the matrix above. In the general case, this would have been written as:
H = toeplitz(kt(N:end), kt(1:N + 1))
where the first N values in kt correspond to k-N, ... k-1, and the last N + 1 values are k0, ... kN. However, since H is Hermitian, this can be simplified to:
H = toeplitz(kt(N:end));
Try this code:
k=[1 2+i 3+i 4+i 5+i];
N=7;
M=diag(k(1)*ones(N,1));
for j=1:length(k)-1
M=M+diag(k(j+1)*ones(N-j,1),j)+diag(conj(k(j+1))*ones(N-j,1),-j)
end;
Here N should be equal or greater than the length of k array