I'm having trouble simplifying this into a series of loops. I'm not sure whether to use nested for loops or while loops. The variable Pnt could potentially be extended to an arbitrary number of rows, but the indices of D and N always run from 1 to 6.
Pnt(7,:) = Pnt(1,:) +(-2*D(2)*N(2,:));
Pnt(8,:) = Pnt(1,:) + (-2*D(3)*N(3,:));
Pnt(9,:) = Pnt(1,:) + (-2*D(4)*N(4,:));
Pnt(10,:) = Pnt(1,:) + (-2*D(5)*N(5,:));
Pnt(11,:) = Pnt(1,:) + (-2*D(6)*N(6,:));
Pnt(12,:) = Pnt(2,:) + (-2*D(3)*N(3,:));
Pnt(13,:) = Pnt(2,:) + (-2*D(4)*N(4,:));
Pnt(14,:) = Pnt(2,:) + (-2*D(5)*N(5,:));
Pnt(15,:) = Pnt(2,:) + (-2*D(6)*N(6,:));
Pnt(16,:) = Pnt(2,:) + (-2*D(1)*N(1,:));
Pnt(17,:) = Pnt(3,:) + (-2*D(4)*N(4,:));
Pnt(18,:) = Pnt(3,:) + (-2*D(5)*N(5,:));
Pnt(19,:) = Pnt(3,:) + (-2*D(6)*N(6,:));
Pnt(20,:) = Pnt(3,:) + (-2*D(1)*N(1,:));
Pnt(21,:) = Pnt(3,:) + (-2*D(2)*N(2,:));
Pnt(22,:) = Pnt(4,:) + (-2*D(5)*N(5,:));
Pnt(23,:) = Pnt(4,:) + (-2*D(6)*N(6,:));
Pnt(24,:) = Pnt(4,:) + (-2*D(1)*N(1,:));
Pnt(25,:) = Pnt(4,:) + (-2*D(2)*N(2,:));
Pnt(26,:) = Pnt(4,:) + (-2*D(3)*N(3,:));
Pnt(27,:) = Pnt(5,:) + (-2*D(6)*N(6,:));
Pnt(28,:) = Pnt(5,:) + (-2*D(1)*N(1,:));
Pnt(29,:) = Pnt(5,:) + (-2*D(2)*N(2,:));
Pnt(30,:) = Pnt(5,:) + (-2*D(3)*N(3,:));
Pnt(31,:) = Pnt(5,:) + (-2*D(4)*N(4,:));
Pnt(32,:) = Pnt(6,:) + (-2*D(1)*N(1,:));
Pnt(33,:) = Pnt(6,:) + (-2*D(2)*N(2,:));
Pnt(34,:) = Pnt(6,:) + (-2*D(3)*N(3,:));
Pnt(35,:) = Pnt(6,:) + (-2*D(4)*N(4,:));
Pnt(36,:) = Pnt(6,:) + (-2*D(5)*N(5,:));
Here's a possible solution using 2 loops:
% Don't forget to pre-allocate memory for Pnt
p=7; q=1; r=1; %Initializing the variables to be used in the loop
while p<=36 % Since last row of Pnt to be calculated is 36th
for s=1:5 % Since each row of Pnt is used 5 times
r = mod(r,6); %Since maximum value for rows of D and N is 6
Pnt(p,:) = Pnt(q,:) + (-2*D(r+1)*N(r+1,:)); %Expression
p=p+1; r=r+1;
end
q=q+1; r=q;
end
bsxfun provides a handy way of performing element-wise operations on two arrays with implicit size extension. The following implements what you want more compactly using this function.
% compute the product of D and N beforehand for efficiency
DN = -2 * bsxfun(@times, N, D(:));
% compute the new points in a for-loop; 5 new points are generated
% in each iteration. Run longer for more points.
for ii=1:6
Pnt(1+ii*5+(1:5), :) = bsxfun(@plus, DN(mod(ii+(0:4), 6)+1, :), Pnt(mod(ii-1, 6)+1, :));
end
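The index pattern that both answers rely on can be checked independently of MATLAB. The following is a minimal Python sketch (the function name `reflection_indices` is made up for illustration) that lists, for every new row of Pnt, which existing row and which D/N index are combined, using the 1-based numbering of the question.

```python
def reflection_indices(n_faces=6):
    """Return (target_row, source_row, dn_index) triples, all 1-based."""
    triples = []
    per_source = n_faces - 1            # each source row is used 5 times
    for q in range(1, n_faces + 1):     # source rows Pnt(1..6)
        for s in range(1, per_source + 1):
            target = n_faces + (q - 1) * per_source + s   # rows 7..36
            dn = (q + s - 1) % n_faces + 1                # wraps 6 -> 1, never equals q
            triples.append((target, q, dn))
    return triples


for target, source, dn in reflection_indices()[:6]:
    print(f"Pnt({target},:) = Pnt({source},:) + (-2*D({dn})*N({dn},:))")
```

Comparing the printed lines with the hand-written assignments is a quick way to confirm a loop before trusting it for longer Pnt arrays.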
I'm attempting to use the Runge-Kutta method to compare it with the lsode function, but it performs rather poorly. Every other method I tried (forward and backward Euler, Heun) does a far better job, to the point where the results are almost indistinguishable from lsode.
This is what my code returns
https://i.stack.imgur.com/vJ6Yi.png
If anyone can point out a way to improve it, or tell me if I'm doing something wrong, I'd appreciate it.
The following is what I use for the Runge-Kutta method
%Initial conditions
u(1) = 1;
v(1) = 2;
p(1) = -1/sqrt(3);
q(1) = 1/sqrt(3);
%Graph interval / step size
s0 = 0;
sf = 50;
h = 0.25;
n=(sf-s0)/h;
s(1) = s0;
%-----------------------------------------------------------------------%
for j = 2:n
i = j-1;
k1_u(j) = p(i);
k1_v(j) = q(i);
k1_p(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1);
k1_q(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1);
u1(j) = p(i) + (1/2)*k1_u(j)*h;
v1(j) = q(i) + (1/2)*k1_v(j)*h;
p1(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k1_p(j)*h;
q1(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k1_q(j)*h;
k2_u(j) = p1(j);
k2_v(j) = q1(j);
k2_p(j) = (-2*v1(j)*p1(j)*q1(j)) / (u1(j)*u1(j) + v1(j)*v1(j) + 1);
k2_q(j) = (-2*u1(j)*p1(j)*q1(j)) / (u1(j)*u1(j) + v1(j)*v1(j) + 1);
u2(j) = p(i) + (1/2)*k2_u(j)*h;
v2(j) = q(i) + (1/2)*k2_v(j)*h;
p2(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k2_p(j)*h;
q2(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k2_q(j)*h;
k3_u(j) = p2(j);
k3_v(j) = q2(j);
k3_p(j) = (-2*v2(j)*p2(j)*q2(j)) / (u2(j)*u2(j) + v2(j)*v2(j) + 1);
k3_q(j) = (-2*u2(j)*p2(j)*q2(j)) / (u2(j)*u2(j) + v2(j)*v2(j) + 1);
u3(j) = p(i) + k3_u(j)*h;
v3(j) = q(i) + k3_v(j)*h;
p3(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + k3_p(j)*h;
q3(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + k3_q(j)*h;
k4_u(j) = p3(j);
k4_v(j) = q3(j);
k4_p(j) = (-2*v3(j)*p3(j)*q3(j)) / (u3(j)*u3(j) + v3(j)*v3(j) + 1);
k4_q(j) = (-2*u3(j)*p3(j)*q3(j)) / (u3(j)*u3(j) + v3(j)*v3(j) + 1);
s(j) = s(j-1) + h;
u(j) = u(j-1) + (h/6)*(k1_u(j) + 2*k2_u(j) + 2*k3_u(j) + k4_u(j));
v(j) = v(j-1) + (h/6)*(k1_v(j) + 2*k2_v(j) + 2*k3_v(j) + k4_v(j));
p(j) = p(j-1) + (h/6)*(k1_p(j) + 2*k2_p(j) + 2*k3_p(j) + k4_p(j));
q(j) = q(j-1) + (h/6)*(k1_q(j) + 2*k2_q(j) + 2*k3_q(j) + k4_q(j));
endfor
subplot(2,3,1), plot(s,u);
hold on; plot(s,v); hold off;
title ("Runge-Kutta");
h = legend ("u(s)", "v(s)");
legend (h, "location", "northwestoutside");
set (h, "fontsize", 10);
You misunderstood something in the method. The intermediate values for p,q are computed the same way as the intermediate values for u,v, and both are "Euler steps" with the last computed slopes, not separate slope computations. For the first ones that is
u1(j) = u(i) + (1/2)*k1_u(j)*h;
v1(j) = v(i) + (1/2)*k1_v(j)*h;
p1(j) = p(i) + (1/2)*k1_p(j)*h;
q1(j) = q(i) + (1/2)*k1_q(j)*h;
The computation of the k2 values is then correct; the next midpoints must likewise be computed via "Euler steps" from u(i), v(i), p(i), q(i) with the k2 slopes, and so on for the final stage.
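To make the structure concrete, here is a minimal sketch of a classical RK4 step in Python, treating (u, v, p, q) as one state vector so that all four components get the same "Euler step" treatment at each stage. The helper names `rhs`, `euler_point` and `rk4_step` are chosen for this example only; the right-hand side is the system from the question.

```python
import math


def rhs(state):
    """Slopes of the system in the question: u' = p, v' = q, plus p' and q'."""
    u, v, p, q = state
    denom = u * u + v * v + 1.0
    return [p, q, -2.0 * v * p * q / denom, -2.0 * u * p * q / denom]


def euler_point(state, slope, dt):
    """Advance every component from `state` along `slope` by dt."""
    return [x + dt * k for x, k in zip(state, slope)]


def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(euler_point(state, k1, h / 2))   # midpoint using k1
    k3 = f(euler_point(state, k2, h / 2))   # midpoint using k2
    k4 = f(euler_point(state, k3, h))       # endpoint using k3
    return [x + (h / 6) * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]


def integrate(f, state, h, n_steps):
    """Apply rk4_step n_steps times and return the final state."""
    for _ in range(n_steps):
        state = rk4_step(f, state, h)
    return state


start = [1.0, 2.0, -1.0 / math.sqrt(3), 1.0 / math.sqrt(3)]
final = integrate(rhs, start, 0.25, 200)    # s from 0 to 50 with h = 0.25
```

Working on the whole state at once avoids the mistake in the question, because there is only one place where the intermediate points are formed.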
I've been trying to port a set of differential equations from Matlab R2014b to python 3.4.
I've used both odeint and ode, with no satisfactory results. The expected results are the ones I get from Matlab where xi = xl and xj = xk with an offset of 180 in phase for each group, as you can see from the image below.
The code I'm running in Matlab is the following:
function Fv = cpg(t, ini_i);
freq = 2;
%fixed parameters
alpha = 5;
beta = 50;
miu = 1;
b = 1;
%initial conditions
xi = ini_i(1);
yi = ini_i(2);
xj = ini_i(3);
yj = ini_i(4);
xk = ini_i(5);
yk = ini_i(6);
xl = ini_i(7);
yl = ini_i(8);
ri = sqrt(xi^2 + yi^2);
rj = sqrt(xj^2 + yj^2);
rk = sqrt(xk^2 + yk^2);
rl = sqrt(xl^2 + yl^2);
%frequency for all oscillators
w_swing = 1;
w_stance = freq*w_swing;
%Coupling matrix that determines a walking gait for
%a quadruped robot.
k = [ 0, -1, -1, 1;
-1, 0, 1, -1;
-1, 1, 0, -1;
1, -1, -1, 0];
%Hopf oscillator 1
omegai = w_stance/(exp(-b*yi)+1) + w_swing/(exp(b*yi)+1);
xi_dot = alpha*(miu - ri^2)*xi - omegai*yi;
yi_dot = beta*(miu - ri^2)*yi + omegai*xi + k(1,2)*yj + k(1,3)*yk + k(1,4)*yl;
%Hopf oscillator 2
omegaj = w_stance/(exp(-b*yj)+1) + w_swing/(exp(b*yj)+1);
xj_dot = alpha*(miu - rj^2)*xj - omegaj*yj;
yj_dot = beta*(miu - rj^2)*yj + omegaj*xj + k(2,1)*yi + k(2,3)*yk + k(2,4)*yl;
%Hopf oscillator 3
omegak = w_stance/(exp(-b*yk)+1) + w_swing/(exp(b*yj)+1);
xk_dot = alpha*(miu - rk^2)*xk - omegak*yk;
yk_dot = beta *(miu - rk^2)*yk + omegak*xk + k(3,4)*yl + k(3,2)*yj + k(3,1)*yi;
%Hopf oscillator 4
omegal = w_stance/(exp(-b*yl)+1) + w_swing/(exp(b*yl)+1);
xl_dot = alpha*(miu - rl^2)*xl - omegal*yl;
yl_dot = beta *(miu - rl^2)*yl + omegal*xl + k(4,3)*yk + k(4,2)*yj + k(4,1)*yi;
%Outputs
Fv(1,1) = xi_dot;
Fv(2,1) = yi_dot;
Fv(3,1) = xj_dot;
Fv(4,1) = yj_dot;
Fv(5,1) = xk_dot;
Fv(6,1) = yk_dot;
Fv(7,1) = xl_dot;
Fv(8,1) = yl_dot;
However, when I moved the code to python I get an output like this.
In python I ran the ODE solver using the same time step as in Matlab and using solver 'dopri5', which is supposed to be equivalent to the one I'm using in Matlab, ode45. I use the same initial conditions in both cases. I've used both odeint and ode with similar results.
I just started programming in Python and it is my first implementation using Scipy and Numpy so maybe I'm misinterpreting something?
def cpg(t, ini, k, freq):
alpha = 5
beta = 50
miu = 1
b = 1
assert freq == 2
'Initial conditions'
xi = ini[0]
yi = ini[1]
xj = ini[2]
yj = ini[3]
xk = ini[4]
yk = ini[5]
xl = ini[6]
yl = ini[7]
ri = sqrt(xi**2 + yi**2)
rj = sqrt(xj**2 + yj**2)
rk = sqrt(xk**2 + yk**2)
rl = sqrt(xl**2 + yl**2)
'Frequencies for each oscillator'
w_swing = 1
w_stance = freq * w_swing
'First Oscillator'
omegai = w_stance/(exp(-b*yi)+1) + w_swing/(exp(b*yi)+1)
xi_dot = alpha*(miu - ri**2)*xi - omegai*yi
yi_dot = beta*(miu - ri**2)*yi + omegai*xi + k[1]*yj + k[2]*yk + k[3]*yl
'Second Oscillator'
omegaj = w_stance/(exp(-b*yj)+1) + w_swing/(exp(b*yj)+1)
xj_dot = alpha*(miu - rj**2)*xj - omegaj*yj
yj_dot = beta*(miu - rj**2)*yj + omegaj*xj + k[4]*yi + k[6]*yk + k[7]*yl
'Third Oscillator'
omegak = w_stance/(exp(-b*yk)+1) + w_swing/(exp(b*yk)+1)
xk_dot = alpha*(miu - rk**2)*xk - omegak*yk
yk_dot = beta*(miu - rk**2)*yk + omegak*xk + k[8]*yi + k[9]*yj + k[11]*yl
'Fourth Oscillator'
omegal = w_stance/(exp(-b*yl)+1) + w_swing/(exp(b*yl)+1)
xl_dot = alpha*(miu - rl**2)*xl - omegal*yl
yl_dot = beta*(miu - rl**2)*yl + omegal*xl + k[12]*yi + k[13]*yj + k[14]*yk
return [xi_dot, yi_dot, xj_dot, yj_dot, xk_dot, yk_dot, xl_dot, yl_dot]
The way that I'm calling the ODE is as follows:
X0 = [1,-1, 0,-1, 1,1, 0,1]
r = ode(cpg).set_integrator('dopri5')
r.set_initial_value(X0).set_f_params(k_trot, 2)
t1 = 30.
dt = .012
while r.successful() and r.t < (t1-dt):
r.integrate(r.t+dt)
I hope I was clear enough.
Any suggestions?
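One thing that can be checked in isolation when comparing the two listings is the index mapping between the MATLAB matrix k(row,col) and the flat Python sequence k[...]. The sketch below assumes that k_trot (not shown above) is the MATLAB coupling matrix flattened row by row; `flat_index` and `k_flat` are names invented for this illustration.

```python
# Coupling matrix copied from the MATLAB listing (rows = oscillators i, j, k, l).
K = [
    [0, -1, -1, 1],
    [-1, 0, 1, -1],
    [-1, 1, 0, -1],
    [1, -1, -1, 0],
]


def flat_index(row, col, n_cols=4):
    """Map a 1-based MATLAB (row, col) pair to a 0-based row-major flat index."""
    return (row - 1) * n_cols + (col - 1)


# Hypothetical stand-in for k_trot: the matrix flattened in row-major order.
k_flat = [value for row in K for value in row]

# Example: MATLAB k(2,3) should be the same entry as Python k[6].
assert k_flat[flat_index(2, 3)] == K[1][2]
```

Running the same comparison term by term against both derivative functions is a cheap way to rule out indexing differences before looking at the solver settings.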
I use a function with multiple outputs, farina4, that computes the coefficients a, b, e, f and a vector out_p5tads_final (1 x n array) by minimizing a system of equations over the input data set p5tads (1 x n array):
function [a b e f fval out_p5tads_final] = farina4(p5tads)
f = @(coeff)calculs_farina4(coeff,p5tads);
[ans,fval] = fminsearchcon(f,coeff0,[0 0 0 0],[1 1 1 1]);% fminsearch with constrains
a = ans(1);
b = ans(2);
e = ans(3);
f = ans(4);
out_p5tads_final = p5tads_farina4(a,b,e,f);
function out_coeff = calculs_farina4(coeff0,p5tads)
%bla-bla-bla
end
function out_p5tads = p5tads_farina4(a,b,e,f)
%bla-bla-bla
end
end
After calculating a, b, e, f and out_p5tads_final I need to calculate and minimize the RMS error with respect to out_p5tads_f4:
RMS = sqrt(mean((p5tads(:) - out_p5tads_f4(:)).^2))*100
and to repeat the function farina4 in order to find the optimal set of parameters a, b, e, f and out_p5tads_final.
I am trying to build such an optimization algorithm but do not see a way so far.
For instance, it does not seem possible to use a function with multiple outputs inside the RMS expression above, unless there is some way to index the output of farina4.
Is there an alternative optimization algorithm for the RMS that does not use fminsearch (or similar)?
a, b, e and f are values between 0 and 1
out_p5tads_final is a (1 x 10) array
%
function out_coeff = calculs_farina4(coeff0,p5tads)
%
mmmm = p5tads(1);
mmmr = p5tads(2);
rmmr = p5tads(3);
mmrm = p5tads(4);
mmrr = p5tads(5);
rmrm = p5tads(6);
rmrr = p5tads(7);
mrrm = p5tads(8);
mrrr = p5tads(9);
rrrr = p5tads(10);
%
a = coeff0(1);
b = coeff0(2);
e = coeff0(3);
f = coeff0(4);
%
f_mmmm = mmmm - ((a^2*b^2*(a + b) + e^2*f^2*(e + f))/2);
f_mmmr = mmmr - (a^2*b^2*(e + f) + e^2*f^2*(a + b));
f_rmmr = rmmr - ((a^2*f^2*(b + e) + b^2*e^2*(a + f))/2);
f_mmrm = mmrm - 2*a*b*e*f;
f_mmrr = mmrr - (b*f*(a^3 + e^3) + a*e*(a^3 + f^3));
f_rmrm = rmrm - 2*a*b*e*f;
f_rmrr = rmrr - 2*a*b*e*f;
f_mrrm = mrrm - ((a^2*b^2*(e + f) + e^2*f^2*(a + b))/2);
f_mrrr = mrrr - (a^2*f^2*(b + e) + b^2*e^2*(a + f));
f_rrrr = rrrr - ((a^2*f^2*(a + f) + b^2*e^2*(b + e))/2);
%
out_coeff = f_mmmm^2 + f_mmmr^2 + f_rmmr^2 + f_mmrm^2 + f_mmrr^2 + f_rmrm^2 + f_rmrr^2 + f_mrrm^2 + f_mrrr^2 + f_rrrr^2;
end
%
function out_p5tads = p5tads_farina4(a,b,e,f)
%
p_mmmm = ((a^2*b^2*(a + b) + e^2*f^2*(e + f))/2);
p_mmmr = (a^2*b^2*(e + f) + e^2*f^2*(a + b));
p_rmmr = ((a^2*f^2*(b + e) + b^2*e^2*(a + f))/2);
p_mmrm = 2*a*b*e*f;
p_mmrr = b*f*(a^3 + e^3) + a*e*(a^3 + f^3);
p_rmrm = 2*a*b*e*f;
p_rmrr = 2*a*b*e*f;
p_mrrm = ((a^2*b^2*(e + f) + e^2*f^2*(a + b))/2);
p_mrrr = (a^2*f^2*(b + e) + b^2*e^2*(a + f));
p_rrrr = ((a^2*f^2*(a + f) + b^2*e^2*(b + e))/2);
%
out_p5tads = [p_mmmm,p_mmmr,p_rmmr,p_mmrm,p_mmrr,p_rmrm,p_rmrr,p_mrrm,p_mrrr,p_rrrr];
end
end
Thanks much in advance !
19/08/2014 3:35 pm
I need an optimal set of coefficients a, b, e, f such that the RMS value, calculated from
RMS = sqrt(mean((p5tads(:) - out_p5tads_f4(:)).^2))*100
is minimal. Here, the vector p5tads is used to optimize the coefficients a, b, e, f, which are in turn used to calculate the vector out_p5tads_f4. The code should run a chosen number of optimization cycles (e.g. 100 by default) and then select the set of a, b, e, f and out_p5tads_f4 that gives the minimal RMS error (with respect to out_p5tads_f4).
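The "repeat and keep the best" idea described above can be sketched as a multi-start loop. The Python below is only an illustration under stated assumptions: the model mirrors p5tads_farina4, but the inner optimizer is a crude bounded random search standing in for fminsearchcon, and all function names (`model_pentads`, `rms_percent`, `local_search`, `multi_start`) are hypothetical. Since the RMS is a monotone function of a sum of squared residuals, ranking runs by RMS is equivalent to ranking them by that sum.

```python
import math
import random


def model_pentads(a, b, e, f):
    """Pentad values predicted from the coefficients (mirrors p5tads_farina4)."""
    t = 2 * a * b * e * f
    return [
        (a**2 * b**2 * (a + b) + e**2 * f**2 * (e + f)) / 2,  # mmmm
        a**2 * b**2 * (e + f) + e**2 * f**2 * (a + b),        # mmmr
        (a**2 * f**2 * (b + e) + b**2 * e**2 * (a + f)) / 2,  # rmmr
        t,                                                    # mmrm
        b * f * (a**3 + e**3) + a * e * (a**3 + f**3),        # mmrr
        t,                                                    # rmrm
        t,                                                    # rmrr
        (a**2 * b**2 * (e + f) + e**2 * f**2 * (a + b)) / 2,  # mrrm
        a**2 * f**2 * (b + e) + b**2 * e**2 * (a + f),        # mrrr
        (a**2 * f**2 * (a + f) + b**2 * e**2 * (b + e)) / 2,  # rrrr
    ]


def rms_percent(observed, predicted):
    """RMS error in percent, as in the expression from the question."""
    squares = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squares) / len(squares)) * 100


def local_search(observed, start, rng, steps=200, scale=0.1):
    """Bounded random-perturbation descent; a stand-in for fminsearchcon."""
    best = list(start)
    best_rms = rms_percent(observed, model_pentads(*best))
    for _ in range(steps):
        # Perturb and clip each coefficient to the allowed range [0, 1].
        cand = [min(1.0, max(0.0, x + rng.gauss(0, scale))) for x in best]
        cand_rms = rms_percent(observed, model_pentads(*cand))
        if cand_rms < best_rms:
            best, best_rms = cand, cand_rms
    return best, best_rms


def multi_start(observed, cycles=100, seed=0):
    """Run `cycles` searches from random starts and keep the lowest RMS."""
    rng = random.Random(seed)
    runs = [local_search(observed, [rng.random() for _ in range(4)], rng)
            for _ in range(cycles)]
    return min(runs, key=lambda run: run[1])
```

A call such as `coeffs, rms = multi_start(p5tads, cycles=100)` would return the best coefficient set found, after which `model_pentads(*coeffs)` gives the corresponding fitted vector.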
I have a 1024*1024*51 matrix. I do calculations inside for loops that change some values of the matrix on each iteration. I find that the computation gets slower and slower until my computer finally runs into trouble, even though the size of the matrix doesn't change. Can anyone shed some light on this problem?
function ActiveContours3D(method,grad,im,mu,nu,lambda1,lambda2,TimeSteps)
epsilon = 10e-10;
tic
fid=fopen('Chr18_z_25of25tiles-C=0_c0_n000.raw','rb','ieee-le');
Xdim = 1024;
Ydim = 1024;
Zdim = 51;
A = fread(fid,[Xdim Ydim*Zdim],'int16');
A = double(A);
size_of_A = size(A)
for(i=1:Zdim)
u0_color(:,:,i) = A(1 : Xdim , (i-1)*Ydim+1 : i*Ydim);
end
fclose(fid)
time = toc
[M,N,P,color] = size(u0_color);
size(u0_color );
u0_color = double(u0_color); % Convert u0_color values to double;
u0 = u0_color(:,:,:,1); % Define the Grayscale volumetric image.
u0_color = uint8(u0_color); % Necessary for color visualization
x = 1:M;
y = 1:N;
z = 1:P;
dx = 1
dy = 1
dz = 1
dim_approx = 2*M*N*P / sqrt(M*N*P);
if(method == 'Explicit')
dt = 0.9 / ((2*mu/(dx^2)) + (2*mu/(dy^2)) + (2*mu/(dz^2))) % 90% CFL
elseif(method == 'Implicit')
dt = (10e7) * 0.9 / ((2*mu/(dx^2)) + (2*mu/(dy^2)) + (2*mu/(dz^2)))
end
[X,Y,Z] = meshgrid(x,y,z);
x0 = (M+1)/2;
y0 = (N+1)/2;
z0 = (P+1)/2;
r0 = min(min(M,N),P)/3;
phi = sqrt((X-x0).^2 + (Y-y0).^2 + (Z-z0).^2) - r0;
phi_visualize = phi; % Use this for visualization in 3D
phi = permute(phi,[2,1,3]); % Use this for computations in 3D
write_to_binary_file(phi_visualize,0); % record initial conditions
tic
for(n=1:TimeSteps)
n
c1 = C1_3d(u0,phi);
c2 = C2_3d(u0,phi);
% x
phi_xp = [phi(2:M,:,:); phi(M,:,:)]; % vertical concatenation
phi_xm = [phi(1,:,:); phi(1:M-1,:,:)]; % (since x values are rows)
% cat(1,A,B) is the same as [A;B]
Dx_m = (phi - phi_xm)/dx; % first derivatives
Dx_p = (phi_xp - phi)/dx;
Dxx = (Dx_p - Dx_m)/dx; % second derivative
% y
phi_yp = [phi(:,2:N,:) phi(:,N,:)]; % horizontal concatenation
phi_ym = [phi(:,1,:) phi(:,1:N-1,:)]; % (since y values are columns)
% cat(2,A,B) is the same as [A,B]
Dy_m = (phi - phi_ym)/dy;
Dy_p = (phi_yp - phi)/dy;
Dyy = (Dy_p - Dy_m)/dy;
% z
phi_zp = cat(3,phi(:,:,2:P),phi(:,:,P));
phi_zm = cat(3,phi(:,:,1) ,phi(:,:,1:P-1));
Dz_m = (phi - phi_zm)/dz;
Dz_p = (phi_zp - phi)/dz;
Dzz = (Dz_p - Dz_m)/dz;
% x,y,z
Dx_0 = (phi_xp - phi_xm) / (2*dx);
Dy_0 = (phi_yp - phi_ym) / (2*dy);
Dz_0 = (phi_zp - phi_zm) / (2*dz);
phi_xp_yp = [phi_xp(:,2:N,:) phi_xp(:,N,:)];
phi_xp_ym = [phi_xp(:,1,:) phi_xp(:,1:N-1,:)];
phi_xm_yp = [phi_xm(:,2:N,:) phi_xm(:,N,:)];
phi_xm_ym = [phi_xm(:,1,:) phi_xm(:,1:N-1,:)];
phi_xp_zp = cat(3,phi_xp(:,:,2:P),phi_xp(:,:,P));
phi_xp_zm = cat(3,phi_xp(:,:,1) ,phi_xp(:,:,1:P-1));
phi_xm_zp = cat(3,phi_xm(:,:,2:P),phi_xm(:,:,P));
phi_xm_zm = cat(3,phi_xm(:,:,1) ,phi_xm(:,:,1:P-1));
phi_yp_zp = cat(3,phi_yp(:,:,2:P),phi_yp(:,:,P));
phi_yp_zm = cat(3,phi_yp(:,:,1) ,phi_yp(:,:,1:P-1));
phi_ym_zp = cat(3,phi_ym(:,:,2:P),phi_ym(:,:,P));
phi_ym_zm = cat(3,phi_ym(:,:,1) ,phi_ym(:,:,1:P-1));
if(grad == 'Dirac')
Grad = DiracDelta(phi); % Dirac delta
%Grad = 1;
elseif(grad == 'Grad ')
Grad = (((Dx_0.^2)+(Dy_0.^2)+(Dz_0.^2)).^(1/2)); % |grad phi|
end
if(method == 'Explicit')
% CURVATURE: *mu*k|grad phi|* (central differences):
K = zeros(M,N,P);
Dxy = (phi_xp_yp - phi_xp_ym - phi_xm_yp + phi_xm_ym) / (4*dx*dy);
Dxz = (phi_xp_zp - phi_xp_zm - phi_xm_zp + phi_xm_zm) / (4*dx*dz);
Dyz = (phi_yp_zp - phi_yp_zm - phi_ym_zp + phi_ym_zm) / (4*dy*dz);
K = ( (Dx_0.^2).*Dyy - 2*Dx_0.*Dy_0.*Dxy + (Dy_0.^2).*Dxx ...
+ (Dx_0.^2).*Dzz - 2*Dx_0.*Dz_0.*Dxz + (Dz_0.^2).*Dxx ...
+ (Dy_0.^2).*Dzz - 2*Dy_0.*Dz_0.*Dyz + (Dz_0.^2).*Dyy) ./ ((Dx_0.^2 + Dy_0.^2 + Dz_0.^2).^(3/2) + epsilon);
phi_temp = phi + dt * Grad .* ( mu.*K + lambda1*(u0 - c1).^2 - lambda2*(u0 - c2).^2 );
elseif(method == 'Implicit')
C1x = 1 ./ sqrt(Dx_p.^2 + Dy_0.^2 + Dz_0.^2 + (10e-7)^2);
C2x = 1 ./ sqrt(Dx_m.^2 + Dy_0.^2 + Dz_0.^2 + (10e-7)^2);
C3y = 1 ./ sqrt(Dx_0.^2 + Dy_p.^2 + Dz_0.^2 + (10e-7)^2);
C4y = 1 ./ sqrt(Dx_0.^2 + Dy_m.^2 + Dz_0.^2 + (10e-7)^2);
C5z = 1 ./ sqrt(Dx_0.^2 + Dy_0.^2 + Dz_p.^2 + (10e-7)^2);
C6z = 1 ./ sqrt(Dx_0.^2 + Dy_0.^2 + Dz_m.^2 + (10e-7)^2);
% m = (dt/(dx*dy)) * Grad .* mu; % 2D
m = (dt/(dx*dy)) * Grad .* mu;
C = 1 + m.*(C1x + C2x + C3y + C4y + C5z + C6z);
C1x_2x = C1x.*phi_xp + C2x.*phi_xm;
C3y_4y = C3y.*phi_yp + C4y.*phi_ym;
C5z_6z = C5z.*phi_zp + C6z.*phi_zm;
phi_temp = (1 ./ C) .* ( phi + m.*(C1x_2x+C3y_4y) + (dt*Grad).*(lambda1*(u0 - c1).^2) - (dt*Grad).*(lambda2*(u0 - c2).^2) );
end
phi = phi_temp;
phi_visualize = permute(phi,[2,1,3]);
write_to_binary_file(phi_visualize,n); % record
end
time = toc
n = n
T = dt*n
clear
clear all
In general, MATLAB stores every variable as a matrix. When you work with many large, multi-dimensional variables, RAM is allocated to hold each of them. So when code that uses many such variables runs for many iterations, it helps to clear variables from memory once they are no longer needed. To do so, use the command
clear variable_name_1 variable_name_2 variable_name_3;
Keeping all the variables makes the code look organised, but when you run into issues like this, try clearing the unneeded ones.
Check this link to use clear command in detail: http://www.mathworks.in/help/matlab/ref/clear.html
I'm not really familiar with vectorization, but I am aware that, amongst MATLAB's strengths, code vectorization is probably the most rewarded.
I have this code:
ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
Cij = zeros(3,3,Nx,Ny,Nz);
count = 0;
for ii = 1:Nx
for jj = 1:Ny
for kk = 1:Nz
if ~isequal(k1(ii,jj,kk),0)
count = count +1;
fprintf('iteration step %i\r\n',count)
E_int = interp1(k_vec,E_vec,k(ii,jj,kk),'spline','extrap');
beta = c*gamma./(k(ii,jj,kk).*sqrt(E_int));
k30 = k3(ii,jj,kk) + beta*k1(ii,jj,kk);
k0 = sqrt(k1(ii,jj,kk)^2 + k2(ii,jj,kk)^2 + k30^2);
Ek0 = 1.453*(k0^4/((1 + k0^2)^(17/6)));
B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1(ii,jj,kk).^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1(ii,jj,kk)))./(k(ii,jj,kk).^2.*(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2));
C2 = ((k2(ii,jj,kk).*(k0.^2))./((k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2).^(3/2))).*atan2((beta.*k1(ii,jj,kk).*sqrt(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2)),(k0.^2 - k30.*beta.*k1(ii,jj,kk)));
xhsi1 = C1 - C2.*(k2(ii,jj,kk)./k1(ii,jj,kk));
xhsi2 = C1.*(k2(ii,jj,kk)./k1(ii,jj,kk)) + C2;
Cij(1,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi1)./(k0));
Cij(1,2,ii,jj,kk) = B.*((k3(ii,jj,kk)-k1(ii,jj,kk).*xhsi1+beta.*k1(ii,jj,kk))./(k0));
Cij(1,3,ii,jj,kk) = B.*(-k2(ii,jj,kk)./(k0));
Cij(2,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi2-k3(ii,jj,kk)-beta.*k1(ii,jj,kk))./(k0));
Cij(2,2,ii,jj,kk) = B.*((-k1(ii,jj,kk).*xhsi2)./(k0));
Cij(2,3,ii,jj,kk) = B.*(k1(ii,jj,kk)./(k0));
Cij(3,1,ii,jj,kk) = B.*(k2(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
Cij(3,2,ii,jj,kk) = B.*(-k1(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
end
end
end
end
Generally I would avoid nested for loops; nonetheless, the if statement on the k1 values is currently pushing me towards this classical, old-fashioned code structure.
I would like to replace the for loops with a vectorized, more elegant solution.
Any support is more than welcome.
EDIT
To help explain what the code is expected to do, here are some basics:
EDIT2
As @Floris advised, I came up with this alternative solution:
ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
ii = (ikx ~= 0);
k1w = k1(ii,:,:);
k2w = k2(ii,:,:);
k3w = k3(ii,:,:);
kw = k(ii,:,:);
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));
k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = (1.453*k0.^4)./((1 + k0.^2).^(17/6));
B = sqrt((2*(pi^2)*(l^3))*(Ek0./(V*k0.^4)));
k1w_2 = k1w.^2;
k2w_2 = k2w.^2;
k30_2 = k30.^2;
k0_2 = k0.^2;
kw_2 = kw.^2;
C1 = ((beta.*k1w_2).*(k0_2 - 2.*k30_2 + beta.*k1w.*k30))./(kw_2.*(k1w_2 + k2w_2));
C2 = ((k2w.*k0_2)./((k1w_2 + k2w_2).^(3/2))).*atan2((beta.*k1w).*sqrt(k1w_2 + k2w_2),(k0_2 - k30.*k1w.*beta));
xhsi1 = C1 - (k2w./k1w).*C2;
xhsi2 = (k2w./k1w).*C1 + C2;
Cij = zeros(3,3,Nx,Ny,Nz);
Cij(1,1,ii,:,:) = B.*(k2w.*xhsi1);
Cij(1,2,ii,:,:) = B.*(k3w - k1w.*xhsi1 + beta.*k1w);
Cij(1,3,ii,:,:) = B.*(-k2w);
Cij(2,1,ii,:,:) = B.*(k2w.*xhsi2 - k3w - beta.*k1w);
Cij(2,2,ii,:,:) = B.*(-k1w.*xhsi2);
Cij(2,3,ii,:,:) = B.*(k1w);
Cij(3,1,ii,:,:) = B.*((k0_2./kw_2).*k2w);
Cij(3,2,ii,:,:) = B.*(-(k0_2./kw_2).*k1w);
You can do your test just once, and then create arrays of "just the elements you need". Example:
% create an index of all the elements that are worth computing:
worthComputing = find(k1(:)~=0);
% now create sub-arrays of all the other arrays... a little bit expensive on memory,
% but much faster for computation:
kw = k(worthComputing);
k1w = k1(worthComputing);
k2w = k2(worthComputing);
k3w = k3(worthComputing);
% now we'll compute all the results of the innermost for loop in single statements:
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));
k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = 1.453*(k0.^4./((1 + k0.^2).^(17/6)));
% the next line has dk1, dk2, dk3 ... not sure what they are? Not shown to be initialized. Assuming scalars as they are not indexed.
B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1w.^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1w))./(kw.^2.*(k1w.^2 + k2w.^2));
C2 = ((k2w.*(k0.^2))./((k1w.^2 + k2w.^2).^(3/2))).*atan2((beta.*k1w.*sqrt(k1w.^2 + ...
k2w.^2)),(k0.^2 - k30.*beta.*k1w));
xhsi1 = C1 - C2.*(k2w./k1w);
xhsi2 = C1.*(k2w./k1w) + C2;
% in the next lines I am using the trick of "collapsing" the remaining indices
% in other words, Matlab figures out that I want to access the elements in Cij
% that correspond to the ii, jj, kk that were picked before...
Cij(1,1,worthComputing) = B.*((k2w.*xhsi1)./(k0));
Cij(1,2,worthComputing) = B.*((k3w-k1w.*xhsi1+beta.*k1w)./(k0));
Cij(1,3,worthComputing) = B.*(-k2w./(k0));
Cij(2,1,worthComputing) = B.*((k2w.*xhsi2-k3w-beta.*k1w)./(k0));
Cij(2,2,worthComputing) = B.*((-k1w.*xhsi2)./(k0));
Cij(2,3,worthComputing) = B.*(k1w./(k0));
Cij(3,1,worthComputing) = B.*(k2w.*k0./(kw.^2));
Cij(3,2,worthComputing) = B.*(-k1w.*k0./(kw.^2));
It is entirely possible there's a typo or two in the above - but this is the basic approach to vectorization.
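The same "test once, compute on the subset, write back" pattern can be shown in a few lines of plain Python. This is only a toy sketch with made-up sample values, using a single ratio k2/k1 in place of the full set of formulas.

```python
k1 = [-1.0, 0.0, 0.5, 0.0, 2.0]   # made-up sample values
k2 = [3.0, 1.0, 1.0, 4.0, 1.0]

# 1. Do the test once and remember which positions are worth computing.
worth_computing = [i for i, value in enumerate(k1) if value != 0]

# 2. Gather the sub-arrays and compute on them without any per-element test.
k1w = [k1[i] for i in worth_computing]
k2w = [k2[i] for i in worth_computing]
ratio_w = [b / a for a, b in zip(k1w, k2w)]   # plays the role of k2w./k1w

# 3. Scatter the results back into a pre-allocated output of full size.
ratio = [0.0] * len(k1)
for i, value in zip(worth_computing, ratio_w):
    ratio[i] = value
```

Positions where k1 is zero keep their pre-allocated value, which matches the behaviour of the original loop that simply skips them.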