speed up code - vectorization

speed up code - vectorization - matlab

I'm not really familiar with vectorization, but I am aware that, amongst MATLAB's strengths, code vectorization is probably the most rewarded.
I have this code:
ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
Cij = zeros(3,3,Nx,Ny,Nz);
count = 0;
for ii = 1:Nx
for jj = 1:Ny
for kk = 1:Nz
if ~isequal(k1(ii,jj,kk),0)
count = count +1;
fprintf('iteration step %i\r\n',count)
E_int = interp1(k_vec,E_vec,k(ii,jj,kk),'spline','extrap');
beta = c*gamma./(k(ii,jj,kk).*sqrt(E_int));
k30 = k3(ii,jj,kk) + beta*k1(ii,jj,kk);
k0 = sqrt(k1(ii,jj,kk)^2 + k2(ii,jj,kk)^2 + k30^2);
Ek0 = 1.453*(k0^4/((1 + k0^2)^(17/6)));
B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1(ii,jj,kk).^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1(ii,jj,kk)))./(k(ii,jj,kk).^2.*(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2));
C2 = ((k2(ii,jj,kk).*(k0.^2))./((k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2).^(3/2))).*atan2((beta.*k1(ii,jj,kk).*sqrt(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2)),(k0.^2 - k30.*beta.*k1(ii,jj,kk)));
xhsi1 = C1 - C2.*(k2(ii,jj,kk)./k1(ii,jj,kk));
xhsi2 = C1.*(k2(ii,jj,kk)./k1(ii,jj,kk)) + C2;
Cij(1,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi1)./(k0));
Cij(1,2,ii,jj,kk) = B.*((k3(ii,jj,kk)-k1(ii,jj,kk).*xhsi1+beta.*k1(ii,jj,kk))./(k0));
Cij(1,3,ii,jj,kk) = B.*(-k2(ii,jj,kk)./(k0));
Cij(2,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi2-k3(ii,jj,kk)-beta.*k1(ii,jj,kk))./(k0));
Cij(2,2,ii,jj,kk) = B.*((-k1(ii,jj,kk).*xhsi2)./(k0));
Cij(2,3,ii,jj,kk) = B.*(k1(ii,jj,kk)./(k0));
Cij(3,1,ii,jj,kk) = B.*(k2(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
Cij(3,2,ii,jj,kk) = B.*(-k1(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
end
end
end
end
Generally, I might avoid the nested for loops; nonetheless, the if statement on k1 values is currently directing me towards the classical and old-fashion code structure.
I blatantly would like to bypass the presence of the for loops in favour of vectorized and more elegant solution.
Any support is more than welcome.
EDIT
To let better understand what the code is expected to perform, I hereby provide you with some basics:
EDIT2
As #Floris advised, I came up with this alternative solution:
ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
ii = (ikx ~= 0);
k1w = k1(ii,:,:);
k2w = k2(ii,:,:);
k3w = k3(ii,:,:);
kw = k(ii,:,:);
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));
k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = (1.453*k0.^4)./((1 + k0.^2).^(17/6));
B = sqrt((2*(pi^2)*(l^3))*(Ek0./(V*k0.^4)));
k1w_2 = k1w.^2;
k2w_2 = k2w.^2;
k30_2 = k30.^2;
k0_2 = k0.^2;
kw_2 = kw.^2;
C1 = ((beta.*k1w_2).*(k0_2 - 2.*k30_2 + beta.*k1w.*k30))./(kw_2.*(k1w_2 + k2w_2));
C2 = ((k2w.*k0_2)./((k1w_2 + k2w_2).^(3/2))).*atan2((beta.*k1w).*sqrt(k1w_2 + k2w_2),(k0_2 - k30.*k1w.*beta));
xhsi1 = C1 - (k2w./k1w).*C2;
xhsi2 = (k2w./k1w).*C1 + C2;
Cij = zeros(3,3,Nx,Ny,Nz);
Cij(1,1,ii,:,:) = B.*(k2w.*xhsi1);
Cij(1,2,ii,:,:) = B.*(k3w - k1w.*xhsi1 + beta.*k1w);
Cij(1,3,ii,:,:) = B.*(-k2w);
Cij(2,1,ii,:,:) = B.*(k2w.*xhsi2 - k3w - beta.*k1w);
Cij(2,2,ii,:,:) = B.*(-k1w.*xhsi2);
Cij(2,3,ii,:,:) = B.*(k1w);
Cij(3,1,ii,:,:) = B.*((k0_2./kw_2).*k2w);
Cij(3,2,ii,:,:) = B.*(-(k0_2./kw_2).*k1w);

You can do your test just once, and then create arrays of "just the elements you need". Example:
% create an index of all the elements that are worth computing:
worthComputing = find(k1(:)~=0);
% now create sub-arrays of all the other arrays... a little bit expensive on memory,
% but much faster for computation:
kw = k(worthComputing);
k1w = k1(worthComputing);
k2w = k2(worthComputing);
k3w = k3(worthComputing);
% now we'll compute all the results of the innermost for loop in single statements:
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./kw.*sqrt(E_int));
k30 = k3w + beta*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = 1.453*(k0.^4/((1 + k0.^2).^(17/6)));
% the next line has dk1, dk2, dk3 ... not sure what they are? Not shown to be initialized. Assuming scalars as they are not indexed.
B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1w.^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1w))./(kw.^2.*(k1w.^2 + k2w.^2));
C2 = ((k2w.*(k0.^2))./((k1w.^2 + k2w.^2).^(3/2))).*atan2((beta.*k1w.*sqrt(k1w.^2 + ...
k2w.^2)),(k0.^2 - k30.*beta.*k1w));
xhsi1 = C1 - C2.*(k2w./k1w);
xhsi2 = C1.*(k2w./k1w) + C2;
% in the next lines I am using the trick of "collapsing" the remaining indices
% in other words, Matlab figures out that I want to access the elements in C
% that correspond to the ii, jj, kk that were picked before...
Cij(1,1,worthComputing) = B.*((k2w.*xhsi1)./(k0));
Cij(1,2,worthComputing) = B.*((k3w-k1w.*xhsi1+beta.*k1w)./(k0));
Cij(1,3,worthComputing) = B.*(-k2w./(k0));
Cij(2,1,worthComputing) = B.*((k2w.*xhsi2-k3w-beta.*k1w)./(k0));
Cij(2,2,worthComputing) = B.*((-k1w.*xhsi2)./(k0));
Cij(2,3,worthComputing) = B.*(k1w./(k0));
Cij(3,1,worthComputing) = B.*(k2w.*k0./(kw.^2));
Cij(3,2,worthComputing) = B.*(-k1w.*k0./(kw.^2));
It is entirely possible there's a typo or two in the above - but this is the basic approach to vectorization.

Related

How do I implement six initial conditions for a system of two coupled second order differential equations?

I am new to coding. Basic information about my problem is: r1 and r2 are two variables; u1 = dr1/dt, u2 = dr2/dt, and du1/dt = d^2r1/dt^2, du2/dt = d^2r2/dt^2. In Matlab code: r(1) implies r1, r(2) -> u1, r(3) -> r2, r(4) -> u2. rdot(2) is the expression for du1/dt and rdot(4) is the expression for du2/dt.
Ideally I should need just 4 initial conditions: r1(0), u1(0), r2(0), u2(0), which are 10d-6, 0, 5d-6, 0. But in my case du1/dt has dependence on du2/dt and vice versa. See last term of T1_1 and T2_1. And an ideal IC for both du1/dt and du2/dt is 0. But how do I implement this in my code?
My code is here.
function rdot = f(t, r)
P_stat = 1.01325d5;
P_v = 2.3388d3;
mu = 1.002d-3;
sigma = 72.8d-3;
c_s = 1481d0;
poly_exp = 1.4d0;
rho = 998.2071d0;
f_s = 20d3;
P_s = 1.01325d5;
r1_eq = 10d-6;
r2_eq = 4d-6;
d = 1d-3;
rdot(1) = r(2);
P1_bw = ( (P_stat - P_v + (2.d0*sigma/r1_eq))*((r1_eq/r(1))^(3.d0*poly_exp)) ) - (2.d0*sigma/r(1)) - (4.d0*mu*r(2)/r(1));
P1_ext = P_s*sin(2.d0*pi*f_s*(t + (r(1)/c_s)));
T2_1 = ((2.d0*r(3)*(r(4)^2.d0)) + ((r(3)^2.d0)*rdot(4)))/d;
T2_4 = (1.d0 - (r(2)/c_s))*r(1);
T2_5 = 1.5d0*(1.d0 - (r(2)/(3.d0*c_s)))*(r(2)^2.d0);
T2_6 = (1.d0 + (r(2)/c_s))*(P1_bw - P_stat + P_v - P1_ext)/rho;
T2_8 = ( (-3.d0*poly_exp*r(2)*(P_stat - P_v + (2.d0*sigma/r1_eq))*((r1_eq/r(1))^(3.d0*poly_exp)) ) + (2.d0*sigma*r(2)/r(1)) - (4.d0*mu*(- ((r(2)^2.d0)/r(1)))) )/r(1);
T2_9 = 2.d0*pi*f_s*P_s*(cos(2.d0*pi*f_s*(t + (r(1)/c_s))))*(1.d0 + (r(2)/c_s) );
T2_7 = (r(1)/(rho*c_s))*(T2_8 - T2_9);
rdot(2) = (T2_6 + T2_7 - T2_1 - T2_5)/(T2_4 + (4.d0*mu/(rho*c_s)));
rdot(3) = r(4);
P2_bw = ( (P_stat - P_v + (2.d0*sigma/r1_eq))*((r1_eq/r(3))^(3.d0*poly_exp)) ) - (2.d0*sigma/r(3)) - (4.d0*mu*r(4)/r(3));
P2_ext = P_s*sin(2.d0*pi*f_s*(t + (r(3)/c_s)));
T1_1 = ((2.d0*r(1)*(r(2)^2.d0)) + ((r(1)^2.d0)*rdot(2)))/d;
T1_4 = (1.d0 - (r(4)/c_s))*r(3);
T1_5 = 1.5d0*(1.d0 - (r(4)/(3.d0*c_s)))*(r(4)^2.d0);
T1_6 = (1.d0 + (r(4)/c_s))*(P2_bw - P_stat + P_v - P2_ext)/rho;
T1_8 = ( (-3.d0*poly_exp*r(4)*(P_stat - P_v + (2.d0*sigma/r1_eq))*((r1_eq/r(3))^(3.d0*poly_exp)) ) + (2.d0*sigma*r(4)/r(3)) - (4.d0*mu*(- ((r(4)^2.d0)/r(3)))) )/r(3);
T1_9 = 2.d0*pi*f_s*P_s*(cos(2.d0*pi*f_s*(t + (r(3)/c_s))))*(1.d0 + (r(4)/c_s) );
T1_7 = (r(3)/(rho*c_s))*(T1_8 - T1_9);
rdot(4) = (T1_6 + T1_7 - T1_1 - T1_5)/(T1_4 + (4.d0*mu/(rho*c_s)));
rdot = rdot';
clc;
clear all;
close all;
time_range = [0 3000d-6];
initial_conditions = [10d-6 0.d0 5d-6 0.d0];
[t, r] = ode45('bubble', time_range, initial_conditions);
plot(t, r(:, 1), t, r(:, 3));

For each degree of an ODE you'll need one initial condition. This is due to the amount of functions you are calculating. In your case you got a system of second order ODE's, which after being solved will provide a total of four functions: r(1), r(2), r(3), r(4)
Why do we even need initial conditions? Imagine you have a simple derivative:y'= y
We know that the function y = exp(x) * C solves this problem, but we need to adjust C in order to get "The one Solution". On the other hand side it makes no sense to give y' an initial condition, as it is fully defined, once y is defined. It doesn't matter whether the "foreign" variable appears in the form of a derivative or as a linear factor. It is independent from the amount of IC's.
I hope I could clarify it a bit. From my point of view your program should work that way, but I haven't had the chance to try it out.

I have fixed the problem, by introducing two new variables with initial conditions but then they are updated as the code runs. The new code is here with two variables: r2ddot and r1ddot;
function rdot = f(t, r)
P_stat = 1.01325d5;
P_v = 2.3388d3;
mu = 1.002d-3;
sigma = 72.8d-3;
c_s = 1481d0;
poly_exp = 1.4d0;
rho = 998.2071d0;
f_s = 20d3;
P_s = 1.01325d5;
r1_eq = 4d-6;
r2_eq = 5d-6;
d = 50*(r1_eq + r2_eq);
r2ddot = 0;
r1ddot = 0;
rdot(1) = r(2);
P2_bw = ( (P_stat - P_v + (2.d0*sigma/r2_eq))*((r2_eq/r(3))^(3.d0*poly_exp)) ) - (2.d0*sigma/r(3)) - (4.d0*mu*r(4)/r(3));
P2_ext = P_s*sin(2.d0*pi*f_s*(t + (r(3)/c_s)));
T1_1 = ((2.d0*r(1)*(r(2)^2.d0)) + ((r(1)^2.d0)*r1ddot))/d;
T1_4 = (1.d0 - (r(4)/c_s))*r(3);
T1_5 = 1.5d0*(1.d0 - (r(4)/(3.d0*c_s)))*(r(4)^2.d0);
T1_6 = (1.d0 + (r(4)/c_s))*(P2_bw - P_stat + P_v - P2_ext)/rho;
T1_8 = ( (-3.d0*poly_exp*r(4)*(P_stat - P_v + (2.d0*sigma/r2_eq))*((r2_eq/r(3))^(3.d0*poly_exp)) ) + (2.d0*sigma*r(4)/r(3)) - (4.d0*mu*(- ((r(4)^2.d0)/r(3)))) )/r(3);
T1_9 = 2.d0*pi*f_s*P_s*(cos(2.d0*pi*f_s*(t + (r(3)/c_s))))*(1.d0 + (r(4)/c_s) );
T1_7 = (r(3)/(rho*c_s))*(T1_8 - T1_9);
rdot(2) = (T1_6 + T1_7 - T1_1 - T1_5)/(T1_4 + (4.d0*mu/(rho*c_s))) ;
r2ddot = rdot(2);
rdot(3) = r(4);
P1_bw = ( (P_stat - P_v + (2.d0*sigma/r1_eq))*((r1_eq/r(1))^(3.d0*poly_exp)) ) - (2.d0*sigma/r(1)) - (4.d0*mu*r(2)/r(1));
P1_ext = P_s*sin(2.d0*pi*f_s*(t + (r(1)/c_s)));
T2_1 = ((2.d0*r(3)*(r(4)^2.d0)) + ((r(3)^2.d0)*r2ddot))/d;
T2_4 = (1.d0 - (r(2)/c_s))*r(1);
T2_5 = 1.5d0*(1.d0 - (r(2)/(3.d0*c_s)))*(r(2)^2.d0);
T2_6 = (1.d0 + (r(2)/c_s))*(P1_bw - P_stat + P_v - P1_ext)/rho;
T2_8 = ( (-3.d0*poly_exp*r(2)*(P_stat - P_v + (2.d0*sigma/r1_eq))*((r1_eq/r(1))^(3.d0*poly_exp)) ) + (2.d0*sigma*r(2)/r(1)) - (4.d0*mu*(- ((r(2)^2.d0)/r(1)))) )/r(1);
T2_9 = 2.d0*pi*f_s*P_s*(cos(2.d0*pi*f_s*(t + (r(1)/c_s))))*(1.d0 + (r(2)/c_s) );
T2_7 = (r(1)/(rho*c_s))*(T2_8 - T2_9);
rdot(4) = (T2_6 + T2_7 - T2_1 - T2_5)/(T2_4 + (4.d0*mu/(rho*c_s)));
r1ddot = rdot(4);
rdot = rdot';
clc;
clear all;
close all;
time_range = [0 1d-3];
initial_conditions = [4d-6 0.d0 5d-6 0.d0];
[t, r] = ode45('bubble', time_range, initial_conditions);
plot(t, r(:, 1), t, r(:, 3));
But I am not able to get the desired result. Actually I am trying to reproduce the results from the attached paper, see equation 7. enter link description here
The second set of equations can be obtained by interchanging indices 1 and 2.
Important note: There is a typo in the last term of equation 7 in Mettin's paper, that can be verified by using check on the dimensions of the various term. The correct last term can be seen from another paper https://journals.aps.org/pre/abstract/10.1103/PhysRevE.83.066313
See last term in eq.(1) below. Ignore the other extra terms in the equation. Important point is that the equation is of second order in Rj and the equation has a second order term in Ri at the end. And This is what I have tried to code.
Any help will be highly appreciated.

You have essentially the situation that
rdot(2) = a2 + b2*rdot(4)
rdot(4) = a4 + b4*rdot(2)
where a2,b2,a4,b4 contain all the other terms in your expressions.
This is a linear system that you have to solve to get the correct values to return. You can use a linear solver of Matlab or do in this simple 2-dimensional case do it by hand,
rdot(2) = a2 + b2*(a4 + b4*rdot(2)) ==> rdot(2) = (a2 + b2*a4) / (1 - b2*b4)
rdot(4) = a4 + b4*(a2 + b2*rdot(4)) ==> rdot(2) = (a4 + b4*a2) / (1 - b2*b4)
To apply this you need to split
T1_1 as T1_1a + T1_1b*rdot(4),
you can compute T1_1a and T1_1b from the given constant and state variables. Then where you have in the end
rdot(2)=(other + coeff*T1_1)/denom
you have to split into
a2 =(other + coeff*T1_1a) / denom and
b2 = coeff*T1_1b / denom
and do the same to the second part to get a4,b4 and then apply the solution formulas above.

MATLAB - Adaptive Step Size Runge-Kutta

I've programmed in MATLAB an adaptive step size RK4 to solve a system of ODEs. The code runs without error, however it does not produce the desired curve when I try to plot x against y. Instead of being a toroidal shape, I simply get a flat line. This is evident from the fact that r is outputting a constant value. After checking the outputs of each line, they are not outputting constants or errors or inf or NaN, rather they are outputting both a real and imaginary component (complex numbers). I have no idea as to why this is occurring and I believe it to be the source of my trouble.
function AdaptRK4()
parsec = 3.08*10^18;
r_1 = 8.5*1000.0*parsec; % in cm
theta_1 = 0.0;
a = 0.5*r_1;
gam = 1;
grav = 6.6720*10^-8;
amsun = 1.989*10^33;
amg = 1.5d11*amsun;
gm = grav*amg;
u_1 = 20.0*10^5;
v = sqrt(gm/r_1);
time = 0.0;
epsilon = 0.00001;
m1 = 0.5;
m2 = 0.5;
m3 = 0.5;
i = 1;
nsteps = 50000;
deltat = 5.0*10^12;
angmom = r_1*v;
angmom2 = angmom^2.0;
e = -2*10^5.0*gm/r_1+u_1*u_1/2.0+angmom2/(2.0*r_1*r_1);
for i=1:nsteps
deltat = min(deltat,nsteps-time);
fk3_1 = deltat*u_1;
fk4_1 = deltat*(-gm*r_1*r_1^(-gam)/(a+r_1)^(3- gam)+angmom2/(r_1^3.0));
fk5_1 = deltat*(angmom/(r_1^2.0));
r_2 = r_1+fk3_1/4.0;
u_2 = u_1+fk4_1/4.0;
theta_2 = theta_1+fk5_1/4.0;
fk3_2 = deltat*u_2;
fk4_2 = deltat*(-gm*r_2*r_2^(-gam)/(a+r_2)^(3-gam)+angmom2/(r_2^3.0));
fk5_2 = deltat*(angmom/(r_2^2.0));
r_3 = r_1+(3/32)*fk3_1 + (9/32)*fk3_2;
u_3 = u_1+(3/32)*fk4_1 + (9/32)*fk4_2;
theta_3 = theta_1+(3/32)*fk5_1 + (9/32)*fk5_2;
fk3_3 = deltat*u_3;
fk4_3 = deltat*(-gm*r_3*r_3^(-gam)/(a+r_3)^(3-gam)+angmom2/(r_3^3.0));
fk5_3 = deltat*(angmom/(r_3^2.0));
r_4 = r_1+(1932/2197)*fk3_1 - (7200/2197)*fk3_2 + (7296/2197)*fk3_3;
u_4 = u_1+(1932/2197)*fk4_1 - (7200/2197)*fk4_2 + (7296/2197)*fk4_3;
theta_4 = theta_1+(1932/2197)*fk5_1 - (7200/2197)*fk5_2 + (7296/2197)*fk5_3;
fk3_4 = deltat*u_4;
fk4_4 = deltat*(-gm*r_4*r_4^(-gam)/(a+r_4)^(3-gam)+angmom2/(r_4^3.0));
fk5_4 = deltat*(angmom/(r_4^2.0));
r_5 = r_1+(439/216)*fk3_1 - 8*fk3_2 + (3680/513)*fk3_3 - (845/4104)*fk3_4;
u_5 = u_1+(439/216)*fk4_1 - 8*fk4_2 + (3680/513)*fk4_3 - (845/4104)*fk4_4;
theta_5 = theta_1+(439/216)*fk5_1 - 8*fk5_2 + (3680/513)*fk5_3 - (845/4104)*fk5_4;
fk3_5 = deltat*u_5;
fk4_5 = deltat*(-gm*r_5*r_5^(-gam)/(a+r_5)^(3-gam)+angmom2/(r_5^3.0));
fk5_5 = deltat*(angmom/(r_5^2.0));
r_6 = r_1-(8/27)*fk3_1 - 2*fk3_2 - (3544/2565)*fk3_3 + (1859/4104)*fk3_4-(11/40)*fk3_5;
u_6 = u_1-(8/27)*fk4_1 - 2*fk4_2 - (3544/2565)*fk4_3 + (1859/4104)*fk4_4-(11/40)*fk4_5;
theta_6 = theta_1-(8/27)*fk5_1 - 2*fk5_2 - (3544/2565)*fk5_3 + (1859/4104)*fk5_4-(11/40)*fk5_5;
fk3_6 = deltat*u_6;
fk4_6 = deltat*(-gm*r_6*r_6^(-gam)/(a+r_6)^(3-gam)+angmom2/(r_6^3.0));
fk5_6 = deltat*(angmom/(r_6^2.0));
fm3_1 = m1 + 25*fk3_1/216+1408*fk3_3/2565+2197*fk3_4/4104-fk3_5/5;
fm4_1 = m2 + 25*fk4_1/216+1408*fk4_3/2565+2197*fk4_4/4104-fk4_5/5;
fm5_1 = m3 + 25*fk5_1/216+1408*fk5_3/2565+2197*fk5_4/4104-fk5_5/5;
fm3_2 = m1 + 16*fk3_1/135+6656*fk3_3/12825+28561*fk3_4/56430-9*fk3_5/50+2*fk3_6/55;
fm4_2 = m2 + 16*fk4_1/135+6656*fk4_3/12825+28561*fk4_4/56430-9*fk4_5/50+2*fk4_6/55;
fm5_2 = m3 + 16*fk5_1/135+6656*fk5_3/12825+28561*fk5_4/56430-9*fk5_5/50+2*fk5_6/55;
R3 = abs(fm3_1-fm3_2)/deltat;
R4 = abs(fm4_1-fm4_2)/deltat;
R5 = abs(fm5_1-fm5_2)/deltat;
err3 = 0.84*(epsilon/R3)^(1/4);
err4 = 0.84*(epsilon/R4)^(1/4);
err5 = 0.84*(epsilon/R5)^(1/4);
if R3<= epsilon
time = time+deltat;
fm3 = fm3_1;
i = i+1;
deltat = err3*deltat;
end
if R4<= epsilon
time = time+deltat;
fm4 = fm4_1;
i = i+1;
deltat = err4*deltat;
end
if R5<= epsilon
time = time+deltat;
fm5 = fm5_1;
i = i+1;
deltat = err5*deltat;
end
e=2*gm^2.0/(2*angmom2);
ecc=(1.0+(2.0*e*angmom2)/(gm^2.0))^0.5;
x(i)=r_1*cos(theta_1)/(1000.0*parsec);
y(i)=r_1*sin(theta_1)/(1000.0*parsec);
time=time+deltat;
r(i)=r_1;
time1(i)=time;
end
figure()
plot(x,y, '-k');
TeXString = title('Plot of Orbit in Gamma Potential Obtained Using RK4')
xlabel('x')
ylabel('y')

You are getting complex values because at some point npts - time < 0. You may want to print out the values of deltat to check the error.
Also, your code doesn't seem to take into account the case when the error estimate is larger than your tolerance. When your error estimate is greater than your tolerance you have to:
Shift back the time AND solution
calculate a new step-size based on a formula, and
recalculate your solution and error estimate.
The fact that you don't know how many iterations you will have to go through makes the use of a for-loop for adaptive runge Kutta a bit awkward. I suggest using a while loop instead.

You are using "i" in your code. "i" returns the basic imaginary unit. "i" is equivalent to sqrt(-1). Try to use another identifier in your loops and only use "i" or "j" in calculations where complex numbers are involved.

Efficient solution to object detection algorithm

I'm trying to implement this paper 'Salient Object detection by composition' here is the link: http://research.microsoft.com/en-us/people/yichenw/iccv11_salientobjectdetection.pdf
I have implemented the algorithm but it takes a long time to execute and display the output. I'm using 4 for loops in the code(Using for loops is the only way I could think of to implement this algorithm.) I have searched online for MATLAB code, but couldn't find anything. So can anyone please suggest any faster way to implement the algorithm. Also in the paper they(the authors) say that they have implemented the code using MATLAB and it runs quickly. So there definitely is a way to write the code more efficiently.
I appreciate any hint or code to execute this algorithm efficiently.
clc
clear all
close all
%%instructions to run segment.cpp
%to run this code
%we need an output image
%segment sigma K min input output
%sigma: used for gaussian smoothing of the image
%K: scale of observation; larger K means larger components in segmentation
%min: minimum component size enforced by post processing
%%
%calculating composition cost for each segment
I_org = imread('segment\1.ppm');
I = imread('segment\output1.ppm');
[rows,cols,dims] = size(I);
pixels = zeros(rows*cols,dims);
red_channel = I(:,:,1);
green_channel = I(:,:,2);
blue_channel = I(:,:,3);
[unique_pixels,count_pixels] = countPixels(I);
no_segments = size(count_pixels,1);
area_segments = count_pixels ./ (rows * cols);
appearance_distance = zeros(no_segments,no_segments);
spatial_distance = zeros(no_segments,no_segments);
thresh = multithresh(I_org,11);
thresh_values = [0 thresh];
for i = 1:no_segments
leave_pixel = unique_pixels(i,:);
mask_image = ((I(:,:,1) == leave_pixel(1)) & (I(:,:,2) == leave_pixel(2)) & (I(:,:,3) == leave_pixel(3)));
I_i(:,:,1) = I_org(:,:,1) .* uint8((mask_image));
I_i(:,:,2) = I_org(:,:,2) .* uint8((mask_image));
I_i(:,:,3) = I_org(:,:,3) .* uint8((mask_image));
LAB_trans = makecform('srgb2lab');
I_i_LAB = applycform(I_i,LAB_trans);
L_i_LAB = imhist(I_i_LAB(:,:,1));
A_i_LAB = imhist(I_i_LAB(:,:,2));
B_i_LAB = imhist(I_i_LAB(:,:,3));
for j = i:no_segments
leave_pixel = unique_pixels(j,:);
mask_image = ((I(:,:,1) == leave_pixel(1)) & (I(:,:,2) == leave_pixel(2)) & (I(:,:,3) == leave_pixel(3)));
I_j(:,:,1) = I_org(:,:,1) .* uint8((mask_image));
I_j(:,:,2) = I_org(:,:,2) .* uint8((mask_image));
I_j(:,:,3) = I_org(:,:,3) .* uint8((mask_image));
I_j_LAB = applycform(I_j,LAB_trans);
L_j_LAB = imhist(I_j_LAB(:,:,1));
A_j_LAB = imhist(I_j_LAB(:,:,2));
B_j_LAB = imhist(I_j_LAB(:,:,3));
appearance_distance(i,j) = sum(min(L_i_LAB,L_j_LAB) + min(A_i_LAB,A_j_LAB) + min(B_i_LAB,B_j_LAB));
spatial_distance(i,j) = ModHausdorffDist(I_i,I_j) / max(rows,cols);
end
end
spatial_distance = spatial_distance ./ max(max(spatial_distance));
max_apperance_distance = max(max(appearance_distance));
composition_cost = ((1 - spatial_distance) .* appearance_distance) + (spatial_distance * max_apperance_distance);
%%
%input parameters for computation
window_size = 9; %rows and colums are considered to be same
window = ones(window_size);
additional_elements = (window_size - 1)/2;
I_temp(:,:,1) = [zeros(additional_elements,cols);I(:,:,1);zeros(additional_elements,cols)];
I_new(:,:,1) = [zeros(rows + (window_size - 1),additional_elements) I_temp(:,:,1) zeros(rows + (window_size - 1),additional_elements)];
I_temp(:,:,2) = [zeros(additional_elements,cols);I(:,:,2);zeros(additional_elements,cols)];
I_new(:,:,2) = [zeros(rows + (window_size - 1),additional_elements) I_temp(:,:,2) zeros(rows + (window_size - 1),additional_elements)];
I_temp(:,:,3) = [zeros(additional_elements,cols);I(:,:,3);zeros(additional_elements,cols)];
I_new(:,:,3) = [zeros(rows + (window_size - 1),additional_elements) I_temp(:,:,3) zeros(rows + (window_size - 1),additional_elements)];
cost = zeros(rows,cols);
for i = additional_elements + 1:rows
for j = additional_elements+1:cols
I_windowed(:,:,1) = I_new(i-additional_elements:i+additional_elements,i-additional_elements:i+additional_elements,1);
I_windowed(:,:,2) = I_new(i-additional_elements:i+additional_elements,i-additional_elements:i+additional_elements,2);
I_windowed(:,:,3) = I_new(i-additional_elements:i+additional_elements,i-additional_elements:i+additional_elements,3);
[unique_pixels_w,count_pixels_w] = countPixels(I_windowed);
unique_pixels_w = setdiff(unique_pixels_w,[0 0 0],'rows');
inside_segment = setdiff(unique_pixels,unique_pixels_w);
outside_segments = setdiff(unique_pixels,inside_segment);
area_segment = count_pixels_w;
for k = 1:size(inside_pixels,1)
current_segment = inside_segment(k,:);
cost_curr_seg = sort(composition_cost(ismember(unique_pixels,current_segment,'rows'),:));
for l = 1:size(cost_curr_seg,2)
if(ismember(unique_pixels(l,:),outside_segments,'rows') && count_pixels(l) > 0)
composed_area = min(area_segment(k),count_pixels(l));
cost(i,j) = cost(i,j) + cost_curr_seg(l) * composed_area;
area_segment(k) = area_segment(k) - composed_area;
count_pixels(l) = count_pixels(l) - composed_area;
if area_segment(k) == 0
break
end
end
end
if area(k) > 0
cost(i,j) = cost(i,j) + max_apperance_distance * area_segment(k);
end
end
end
end
cost = cost / window_size;
The code for the countPixels function:
function [unique_rows,counts] = countPixels(I)
[rows,cols,dims] = size(I);
pixels_I = zeros(rows*cols,dims);
count = 1;
for i = 1:rows
for j = 1:cols
pixels_I(count,:) = reshape(I(i,j,:),[1,3]);
count = count + 1;
end
end
[unique_rows,~,ind] = unique(pixels_I,'rows');
counts = histc(ind,unique(ind));
end

find optimum values of model iteratively

Given that I have a model that can be expressed as:
y = a + b*st + c*d2
where st is a smoothed version of some data, and a, b and c are model coffieicients that are unknown. An iterative process should be used to find the best values for a, b, and c and also an additional value alpha, shown below.
Here, I show an example using some data that I have. I'll only show a small fraction of the data here to get an idea of what I have:
17.1003710350253 16.7250000000000 681.521316544969
17.0325989276234 18.0540000000000 676.656460644882
17.0113862864815 16.2460000000000 671.738125420192
16.8744356336601 15.1580000000000 666.767363772145
16.5537077980594 12.8830000000000 661.739644621949
16.0646524243248 10.4710000000000 656.656219934146
15.5904357723302 9.35000000000000 651.523986525985
15.2894427136087 12.4580000000000 646.344231349275
15.1181450512182 9.68700000000000 641.118300709434
15.0074128442766 10.4080000000000 635.847600747838
14.9330905954828 11.5330000000000 630.533597865332
14.8201069920058 10.6830000000000 625.177819082427
16.3126863409751 15.9610000000000 619.781852331734
16.2700386755872 16.3580000000000 614.347346678083
15.8072873786912 10.8300000000000 608.876012461843
15.3788908036751 7.55000000000000 603.369621360944
15.0694302370038 13.1960000000000 597.830006367160
14.6313314652840 8.36200000000000 592.259061672302
14.2479738025295 9.03000000000000 586.658742460043
13.8147156115234 5.29100000000000 581.031064599264
13.5384821473624 7.22100000000000 575.378104234926
13.3603543306796 8.22900000000000 569.701997272687
13.2469020140965 9.07300000000000 564.004938753678
13.2064193251406 12.0920000000000 558.289182116093
13.1513460035983 12.2040000000000 552.557038340513
12.8747853506079 4.46200000000000 546.810874976187
12.5948999131388 4.61200000000000 541.053115045791
12.3969691298003 6.83300000000000 535.286235826545
12.1145822760120 2.43800000000000 529.512767505944
11.9541188991626 2.46700000000000 523.735291710730
11.7457790927936 4.15000000000000 517.956439908176
11.5202981254529 4.47000000000000 512.178891679167
11.2824263926694 2.62100000000000 506.405372863054
11.0981930749608 2.50000000000000 500.638653574697
10.8686514170776 1.66300000000000 494.881546094641
10.7122053911554 1.68800000000000 489.136902633882
10.6255883267131 2.48800000000000 483.407612975178
10.4979083986908 4.65800000000000 477.696601993434
10.3598092538338 4.81700000000000 472.006827058220
10.1929490084608 2.46700000000000 466.341275322034
10.1367069580204 2.36700000000000 460.702960898512
10.0194072271384 4.87800000000000 455.094921935306
9.88627023967911 3.53700000000000 449.520217586971
9.69091601129389 0.417000000000000 443.981924893704
9.48684595125235 -0.567000000000000 438.483135572389
9.30742664359900 0.892000000000000 433.026952726910
9.18283037670750 1.50000000000000 427.616487485241
9.02385722622626 1.75800000000000 422.254855571341
8.90355705229410 2.46700000000000 416.945173820367
8.76138912769045 1.99200000000000 411.690556646207
8.61299614111510 0.463000000000000 406.494112470755
8.56293606861698 6.55000000000000 401.358940124780
8.47831879772002 4.65000000000000 396.288125230599
8.42736865902327 6.45000000000000 391.284736577104
8.26325535934842 -1.37900000000000 386.351822497948
8.14547793724500 1.37900000000000 381.492407263967
8.00075641792910 -1.03700000000000 376.709487501030
7.83932517791044 -1.66700000000000 372.006028644665
7.68389447250257 -4.12900000000000 367.384961442799
7.63402151555169 -2.57900000000000 362.849178517935
The results that follow probably won't be meaningful as the full data would be needed (but this is an example). Using this data I have tried to solve iteratively by
y = d(:,1);
d1 = d(:,2);
d2 = d(:,3);
alpha_o = linspace(0.01,1,10);
a = linspace(0.01,1,10);
b = linspace(0.01,1,10);
c = linspace(0.01,1,10);
defining different values for a, b, and c as well as another term alpha, which is used in the model, and am now going to find every possible combination of these parameters and see which combination provides the best fit to the data:
% every possible combination of values
xx = combvec(alpha_o,a,b,c);
% loop through each possible combination of values
for j = 1:size(xx,2);
alpha_o = xx(1,j);
a_o = xx(2,j);
b_o = xx(3,j);
c_o = xx(4,j);
st = d1(1);
for i = 2:length(d1);
st(i) = alpha_o.*d1(i) + (1-alpha_o).*st(i-1);
end
st = st(:);
y_pred = a_o + (b_o*st) + (c_o*d2);
mae(j) = nanmean(abs(y - y_pred));
end
I can then re-run the model using these optimum values:
[id1,id2] = min(mae);
alpha_opt = xx(:,id2);
st = d1(1);
for i = 2:length(d1);
st(i) = alpha_opt(1).*d1(i) + (1-alpha_opt(1)).*st(i-1);
end
st = st(:);
y_pred = alpha_opt(2) + (alpha_opt(3)*st) + (alpha_opt(4)*d2);
mae_final = nanmean(abs(y - y_pred));
However, to reach a final answer I would need to increase the number of initial guesses to more than 10 for each variable. This will take a long time to run. Thereofre, I am wondering if there is a better method for what I am trying to do here? Any advice is appreciated.

Here's some thoughts: If you could decrease the amount of computation within each for loop, you could possibly speed it up. One possible way is to look for common factors between each loop and move it outside for loop:
If you look at the iteration, you'll see
st(1) = d1(1)
st(2) = a * d1(2) + (1-a) * st(1) = a *d1(2) + (1-a)*d1(1)
st(3) = a * d1(3) + (1-a) * st(2) = a * d1(3) + a *(1-a)*d1(2) +(1-a)^2 * d1(1)
st(n) = a * d1(n) + a *(1-a)*d1(n-1) + a *(1-a)^2 * d1(n-2) + ... +(1-a)^(n-1)*d1(1)
Which means st can be calculated by multiplying these two matrices (here I use n=4 for example to illustrate the concept) and sum along the first dimension:
temp1 = [ 0 0 0 a ;
0 0 a a(1-a) ;
0 a a(1-a) a(1-a)^2 ;
1 (1-a) (1-a)^2 (1-a)^3 ;]
temp2 = [ 0 0 0 d1(4) ;
0 0 d1(3) d1(3) ;
0 d1(2) d1(2) d1(2) ;
d1(1) d1(1) d1(1) d1(1) ;]
st = sum(temp1.*temp2,1)
Here's codes that utilize this concept: Computation has been moved out of the inner for loop and only assignment is left.
alpha_o = linspace(0.01,1,10);
xx = nchoosek(alpha_o, 4);
n = size(d1,1);
matrix_d1 = zeros(n, n);
d2 = d2'; % To make the dimension of d2 and st the same.
for ii = 1:n
matrix_d1(n-ii+1:n, ii) = d1(1:ii);
end
st = zeros(size(d1)'); % Pre-allocation of matrix will improve speed.
mae = zeros(1,size(xx,1));
matrix_alpha = zeros(n, n);
for j = 1 : size(xx,1)
alpha_o = xx(j,1);
temp = (power(1-alpha_o, [0:n-1])*alpha_o)';
matrix_alpha(n,:) = power(1-alpha_o, [0:n-1]);
for ii = 2:n
matrix_alpha(n-ii+1:n-1, ii) = temp(1:ii-1);
end
st = sum(matrix_d1.*matrix_alpha, 1);
y_pred = xx(j,2) + xx(j,3)*st + xx(j,4)*d2;
mae(j) = nanmean(abs(y - y_pred));
end
Then :
idx = find(min(mae));
alpha_opt = xx(idx,:);
st = zeros(size(d1)');
temp = (power(1-alpha_opt(1), [0:n-1])*alpha_opt(1))';
matrix_alpha = zeros(n, n);
matrix_alpha(n,:) = power(1-alpha_opt(1), [0:n-1]);;
for ii = 2:n
matrix_alpha(n-ii+1:n-1, ii) = temp(1:ii-1);
end
st = sum(matrix_d1.*matrix_alpha, 1);
y_pred = alpha_opt(2) + (alpha_opt(3)*st) + (alpha_opt(4)*d2);
mae_final = nanmean(abs(y - y_pred));
Let me know if this helps !

How to Run a Script for 30 - 40 Values with Increasing Magnitude

I would like to run the following for 30 to 40 values of the current (I0) with increasing magnitude. Can someone please tell me how to make this happen?
GNa = 400;
GK = 200;
GL = 2;
ENa = 99;
EK = -85;
VL = -65;
C = 2;
dt = .01;
t = 0:dt:200;
I0 = 200;
It= I0 + 0*t;
It(1:40/dt)=0;
m = zeros(1,length(t));
n = zeros(1,length(t));
h = zeros(1,length(t));
V = zeros(1,length(t));
V(1) = -65;
am = zeros(1,length(t));
bm = zeros(1,length(t));
an = zeros(1,length(t));
bn = zeros(1,length(t));
ah = zeros(1,length(t));
bh = zeros(1,length(t));
for i=1:length(t)-1
I=It(i);
am(i) = (0.1*(V(i) + 40))/(1 - exp(-0.1*(V(i)+40)));
bm(i) = 4*exp(-0.0556*(V(i)+65));
ah(i) = 0.07*exp(-0.05*(V(i)+65));
bh(i) = 1./(1+exp(-0.1*(V(i)+35)));
an(i) = (0.01*(V(i)+55))/(1 - exp(-0.1*(V(i)+55)));
bn(i) = 0.125*exp(-0.0125*(V(i)+65));
m(i+1) = m(i) + dt*(am(i)*(1 - m(i)) - bm(i)*m(i));
h(i+1) = h(i) + dt*(ah(i)*(1 - h(i)) - bh(i)*h(i));
n(i+1) = n(i) + dt*(an(i)*(1 - n(i)) - bn(i)*n(i));
V(i+1) = V(i) + dt*(((-GL*(V(i) - VL) - GNa*(m(i).^3)*h(i)*(V(i) - ENa) - GK* (n(i).^4)*(V(i) - EK)+ I))/C);
end

You can always wrap this in another for loop, no?
I0list = 200:2.5:300 % 41 values including endpoints
for j = 1:length(IOlist)
I0 = IOlist(j);
% Rest of your code, minus setting I0, goes here
end
If you need to store all of the values, make m etc. be m = zeros(length(IOlist),length(t)); instead, and then use both indices when assigning, e.g.
am(j,i) = ...

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

speed up code - vectorization - matlab

Related

How do I implement six initial conditions for a system of two coupled second order differential equations?

MATLAB - Adaptive Step Size Runge-Kutta

Efficient solution to object detection algorithm

find optimum values of model iteratively

How to Run a Script for 30 - 40 Values with Increasing Magnitude

Categories

Resources