ODE45 with very large numbers as constraints - matlab

2nd-order ODE to solve in MATLAB:
( (a + f(t))·d²x/dt² + (b/2 + k(t))·dx/dt ) · dx/dt - g(t) = 0
Boundary condition:
dx/dt(0) = v0
where
t is the time,
x is the position
dx/dt is the velocity
d2x/dt2 is the acceleration
a, b, v0 are constants
f(t), k(t) and g(t) are KNOWN functions of t
(I do not write them out because they are quite long)
As an example, using symbolic variables:
syms t y
%% --- Initial conditions ---
phi = 12.5e-3;
v0 = 300;
e = 3e-3;
ro = 1580;
E = 43e9;
e_r = 0.01466;
B = 0.28e-3;
%% --- Intermediate calculations ---
v_T = sqrt(((1 + e_r) * 620e6) /E) - sqrt(E/ro) * e_r;
R_T = v_T * t;
m_acc = pi * e * ro *(R_T^2);
v_L = sqrt (E/ro);
R_L = v_L * t;
z = 2 * R_L;
E_4 = B * ((e_r^2)* B * (0.9^(z/B)-1)) /(log(0.9));
E_1 = E * e * pi * e_r^2 * (-phi* (phi - 2*v_T*t)) /16;
E_2 = pi * R_T^2 * 10e9;
E_3 = pi * R_T^2 * 1e6 * e;
%% Resolution of the problem
g_t = -diff(E_1 + E_2 + E_3, t);
f(t,y) = (g_t - (pi*v_T*e*ro/2 + E_4) * y^2) / (y * (8.33e-3 + m_acc));
fun=matlabFunction(f);
[T,Y] = ode45(fun, [0 1], v0);
How can I rewrite this to get x, given that y = dx/dt? I'm new to MATLAB and any help is very welcome!

First, you could use subs to evaluate a symbolic function. Another approach is to use matlabFunction to convert all symbolic expressions to anonymous functions, as suggested by Horchler.
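For instance, a minimal sketch of the matlabFunction route, assuming g_t is the symbolic expression built above:
g_fun = matlabFunction(g_t, 'Vars', {t});  % convert the symbolic expression to a plain numeric handle
g_fun(0.5)                                 % evaluate at t = 0.5 with no symbolic overhead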
Second, you're integrating the ODE as if it is 1st order in dx/dt. If you're interested in x(t) as well as dx/dt(t), then you'll have to modify the function like so:
fun = @(t,y) [y(2);
( subs(g) - (b/2 + subs(k))*y(2)*y(2) ) / ( y(2) * (a + subs(f))) ];
and of course, provide an initial value for x0 = x(0) as well as v0 = dx/dt(0).
Third, the absolute value of the parameters is hardly ever a real concern. IEEE 754 double-precision floating point format can effortlessly represent numbers between 2.225073858507201e-308 and 1.797693134862316e+308 (realmin and realmax, respectively). So for the coefficients you gave (O(10^14)), this is absolutely not a problem. You might lose a few digits of precision if you don't take precautions (rescale to [-1 +1], reformulate the problem in different units, ...), but the relative error due to this is more than likely to be tiny and insignificant compared to the algorithmic error made by ode45.
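For instance (a quick sketch you can run in the command window to see those limits and the effect of rescaling units):
realmin    % 2.225073858507201e-308, smallest positive normalized double
realmax    % 1.797693134862316e+308, largest finite double
% If you want the extra safety margin, rescale to friendlier units, e.g.
E_Pa  = 43e9;        % Young's modulus in Pa
E_GPa = E_Pa / 1e9;  % the same quantity in GPa, much closer to 1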
<RANDOM_OPINIONATED_RANT>
Fourth, WHY do you use symbolic math for this purpose?! You are doing a numerical integration, meaning, there is no analytic solution anyway. Why bother with symbolics then? Doing the integration with symbolics (through vpa even) is going to be dozens, hundreds, yes, often even thousands of times slower than keeping (or re-implementing) everything numerical (which some would argue is already slow in MATLAB compared to a bare-metal approach).
Yes, of course, for this specific, individual, isolated use case it may not matter much, but for the future I'd strongly advise you to learn to:
use symbolics for derivations, proving theorems, simplifying expressions, ...
use numerics to implement any algorithm or function from which actual numbers are expected.
In other words, symbolics for drafting, numerics for crunching. And exactly zero symbolics should appear in any good implementation of any algorithm.
Although it's possible to mix them to some extent, that does not mean it is a good idea to do so. In fact, that's almost never. And the few isolated cases where it is the only viable option are not a vindication of the approach.
They are rare, isolated cases after all, far from the abundant norm.
For me it bears a resemblance to the evil eval, with similar reasons for why it Should. Be. Avoided.
</RANDOM_OPINIONATED_RANT>

With the full code, it's easy to come up with a complete solution:
% Initial conditions
phi = 12.5e-3;
v0 = 300;
x0 = 0; % (my assumption)
e = 3e-3;
ro = 1580;
E = 43e9;
e_r = 0.01466;
B = 0.28e-3;
% Intermediate calculations
v_T = sqrt(((1 + e_r) * 620e6) /E) - sqrt(E/ro) * e_r;
R_T = @(t) v_T * t;
m_acc = @(t) pi * e * ro *(R_T(t)^2);
v_L = sqrt (E/ro);
R_L = @(t) v_L * t;
z = @(t) 2 * R_L(t);
E_4 = @(t) B * ((e_r^2)* B * (0.9^(z(t)/B)-1)) /(log(0.9));
% UNUSED
%{
E_1 = @(t) -phi * E * e * pi * e_r^2 * (phi - 2*v_T*t) /16;
E_2 = @(t) pi * R_T(t)^2 * 10e9;
E_3 = @(t) pi * R_T(t)^2 * 1e6 * e;
%}
% Resolution of the problem
g_t = @(t) -( phi * E * e * pi * e_r^2 * v_T / 8 + ... % dE_1/dt
pi * 10e9 * 2 * R_T(t) * v_T + ... % dE_2/dt
pi * 1e6 * e * 2 * R_T(t) * v_T ); % dE_3/dt
% The derivative of Z = [x(t); x'(t)] equals Z' = [x'(t); x''(t)]
f = @(t,y) [y(2);
(g_t(t) - (0.5*pi*v_T*e*ro + E_4(t)) * y(2)^2) /(y(2) * (8.33e-3 + m_acc(t)))];
% Which is readily integrated
[T,Y] = ode45(f, [0 1], [x0 v0]);
% Plot solutions
figure(1)
plot(T, Y(:,1))
xlabel('t [s]'), ylabel('position [m]')
figure(2)
plot(T, Y(:,2))
xlabel('t [s]'), ylabel('velocity [m/s]')
Results:
Note that I've not used symbolics anywhere, except to double-check my hand-derived derivatives.
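That double-check is a one-off, for instance something along these lines (a sketch, reusing the numeric constants defined above):
syms ts
E1s = E * e * pi * e_r^2 * (-phi*(phi - 2*v_T*ts)) /16;
E2s = pi * (v_T*ts)^2 * 10e9;
E3s = pi * (v_T*ts)^2 * 1e6 * e;
simplify(-diff(E1s + E2s + E3s, ts))   % should agree with the hand-derived g_t(t)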

Related

Regularized logistic regression with vectorization

I'm trying to implement a vectorized version of the regularised logistic regression. I have found a post that explains the regularised version but I don't understand it.
To make it easy I will copy the code below:
hx = sigmoid(X * theta);
m = length(X);
J = (sum(-y' * log(hx) - (1 - y') * log(1 - hx)) / m) + lambda * sum(theta(2:end).^2) / (2*m);
grad =((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m ;
I understand the first part of the cost equation; if I'm correct it could be represented as:
J = ((-y' * log(hx)) - ((1-y)' * log(1-hx)))/m;
The problem is the regularization term. Let's look at it in more detail:
Dimensions:
X = (m x (n+1))
theta = ((n+1) x 1)
I don't understand why he leaves the first element of theta (theta_0) out of the equation, when in theory the regularization term has to take into account all the thetas.
For the gradient descent, I think that this equation is equivalent:
L = eye(length(theta));
L(1,1) = 0;
grad = (1/m) * X' * (hx - y) + (lambda/m) * (L * theta);
In MATLAB indices begin at 1, while in the mathematical notation indices begin at 0 (the indices in the formula you mentioned also begin at 0).
So, in theory, the first element of theta also needs to be left out of the equation.
And as for your second question, you're right! It is an equivalent, clean equation!
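A quick numerical check of that equivalence (a sketch with small random data; the dimensions are arbitrary):
m = 5; n = 3; lambda = 0.1;
X = [ones(m,1) rand(m,n)];                   % m x (n+1)
y = randi([0 1], m, 1);
theta = randn(n+1, 1);
hx = 1 ./ (1 + exp(-X * theta));             % sigmoid(X * theta)
grad1 = ((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m;
L = eye(length(theta)); L(1,1) = 0;
grad2 = (1/m) * X' * (hx - y) + (lambda/m) * (L * theta);
max(abs(grad1 - grad2))                      % ~0, up to round-off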

Speedy alternatives to fmincon for log-Likelihood

I am currently using fmincon for minimizing a log-likelihood function with respect to an 18*18 matrix. While on smaller problems the algorithm is very fast, it takes about 2h to converge in the current setup - as I am iterating over this minimisation problem, running through the code may take up to 2 weeks.
Is there a matlab-based, free alternative to fmincon that improves speed on such specific problems? (Costly solutions are discussed here, non-matlab solutions here.) Or would I need to call e.g. a python script from matlab?
The function I want to minimize:
function [L] = logL(A, U, Sigma_e, T, lags)
% A - parameters to optimize w.r.t
logL = 0;
for t = 1 : T - lags
logL(t, 1) = 0.5*(log(det(A * diag(Sigma_e(t,:)) * A' ) ) + ...
U(t,:) * (A * diag(Sigma_e(t,:)) * A' )^(-1) * U(t,:)' );
end
L = sum(logL);
and calling it by:
Options = optimset('Algorithm', 'active-set', 'Display', 'off', 'Hessian','bfgs', ...
'DerivativeCheck','on','Diagnostics','off','GradObj','off','LargeScale','off');
A = fmincon( @(A0)logL(A0, U, Sigma_e, T, lags), A0 , [], [] , [] , [] , [] , [] , [], Options);
(I have tried the different fmincon algorithms without much improvement). Note, T is quite large ~3000.
A and A0 are 18*18 matrices,
Sigma_e is T*18,
U is T*18
I'm not aware of any speedy alternative to fmincon, but you can vectorize the logL function to speed up the algorithm. Here is a vectorized version:
function [L] = logL(A, U, Sigma_e, T, lags)
ia = inv(A);
iat = ia.';
N = T - lags;
UU = zeros(N, 1);
for t = 1:N
    UU(t) = U(t,:) * (iat .* 1./Sigma_e(t,:) * ia) * U(t,:)';
end
L = 0.5 * sum(log(det(A)^2 .* prod(Sigma_e(1:N,:), 2)) + UU);
end
In some tests in Octave it is nearly 10X faster than your solution.
Note that if some elements of Sigma_e are equal to zero you need to compute UU as:
UU(t) = U(t,:) * (A * diag(Sigma_e(t,:)) * A')^(-1) * U(t,:)';
These relations are used to convert the loop solution to the vectorized one:
det(a * b * c) == det(a) * det(b) * det(c)
det(a) == det(a.')
det(diag(a)) == prod(a)
(a * b * c)^-1 == c^-1 * b^-1 * a^-1
a * diag(b) == a .* b
inv(diag(a)) == diag(1./a)
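These can be sanity-checked numerically on a small random case (a sketch; the elementwise forms rely on automatic broadcasting, so use bsxfun on older MATLAB versions):
a = rand(4); b = rand(1,4);
norm(a * diag(b) - a .* b)            % ~0  (a * diag(b) == a .* b)
norm(inv(diag(b)) - diag(1./b))       % ~0  (inv(diag(a)) == diag(1./a))
abs(det(diag(b)) - prod(b))           % ~0  (det(diag(a)) == prod(a))
abs(det(a) - det(a.'))                % ~0  (det(a) == det(a.'))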

How to implement Newton-Raphson to calculate the k(i) coefficients of an implicit Runge-Kutta?

I'm trying to implement an implicit 2nd-order Runge-Kutta scheme for the 1D convection-diffusion (Burgers) equation, u_t = -u*u_x + nu*u_xx, using fdm_2nd (2nd-order finite differences) and the Gauss Butcher coefficients.
My goal is to compare the explicit versus the implicit scheme. The explicit RK works well with a small amount of viscosity; the curve of the explicit scheme shows a very nice shock wave.
I need your help to implement the solver for the k(i) coefficients correctly. I don't see how to implement Newton's method for all the k(i).
Do I need to implement it for all time-space steps, or just in time? The Jacobian is maybe wrong, but I don't see where. Or maybe I use the Jacobian in the wrong direction...
Actually, my code runs, but I think it is wrong somewhere... also the implicit curve does not move from the initial values.
Here is my function:
function [t,u] = burgers(t0,U,N,dx)
nu=0.01; % viscosity coefficient
A=(diag(zeros(1,N))-diag(ones(1,N-1),1)+diag(ones(1,N-1),-1)) / (2*dx);
B=(-2*diag(ones(1,N))+diag(ones(1,N-1),1)+diag(ones(1,N-1),-1)) / (dx).^2;
t=t0;
u = - A * U.^2 + nu .* B * U;
The Jacobian:
function Jb = burJK(U,dx,i)
% Operators
a(1,1) = 1/4;
a(1,2) = 1/4 - (3).^(1/2) / 6;
a(2,1) = 1/4 + (3).^(1/2) / 6;
a(2,2) = 1/4;
Jb(1,1) = a(1,1) .* (U(i+1,1) - U(i-1,1))/ (2*dx) - 1;
Jb(1,2) = a(1,2) .* (U(i+1,1) - U(i-1,1))/ (2*dx);
Jb(2,1) = a(2,1) .* (U(i+1,2) - U(i-1,2))/ (2*dx);
Jb(2,2) = a(2,2) .* (U(i+1,2) - U(i-1,2))/ (2*dx) - 1;
Here is my Newton code:
iter = 1;
iter_max = 100;
k=zeros(2,N);
k(:,1)=[0.4;0.6];
[w_1,f1] =burgers(n + c(1) * dt,uu + dt * (a(1,:) * k(:,iter)),iter,dx);
[w_2,f2] =burgers(n + c(2) * dt,uu + dt * (a(2,:) * k(:,iter)),iter,dx);
f1 = -k(1,iter) + f1;
f2 = -k(1,iter) + f2;
f(:,1)=f1;
f(:,2)=f2;
df = burJK(f,dx,iter+1);
while iter<iter_max-1 % K_newton
delta = df\f(iter,:)';
k(:,iter+1) = k(:,iter) - delta;
iter = iter+1;
[w_1,f1] =burgers(n + c(1) * dt,uu + dt * (a(1,:) * k(:,iter+1)),N,dx);
[w_2,f2] =burgers(n + c(2) * dt,uu + dt * (a(2,:) * k(:,iter+1)),N,dx);
f1 = -k(1,iter+1) + f1;
f2 = -k(1,iter+1) + f2;
f(:,1)=f1;
f(:,2)=f2;
df = burJK(f,dx,iter);
if iter>iter_max
disp('#');
else
disp('ok');
end
end
I'm a little rusty on exactly how to implement this in MATLAB, but I can walk you through the general steps and hopefully that will help. First we can consider the equation you are solving to fit the general class of problems that can be posed as
du/dt = F(u), where F is a linear or nonlinear function
For a Runge Kutta scheme you typically recast the problem something like this
k(i) = F(u + dt*(a(i,i)*k(i) + a(i,j)*k(j)))
for a given stage. Now comes the tricky part: you need to make a 1-D vector constructed by stacking k(1) onto k(2). So the first half of the elements of the vector are k(1) and the second half are k(2). With this new combined vector you can then change F so that it operates on the two k's separately. This results in
K = FF(u+dt*a*K) where FF is F for the new double k vector, K
Ok, now we can implement Newton's method. You will do this for each time step, iterating until you have converged on the right answer, and use it across all spatial points at the same time. What you do is guess a K and compute the Jacobian of G(K,U) = K - FF(u + dt*a*K). G(K,U) should only be zero when K is the right solution. So in other words, do your Newton's method on K, and when looking for convergence you need to see that it is converging at all spots. I would run the Newton's method until max(abs(G(K,U))) < SolverTolerance.
Sorry I can't be of more help on the MATLAB implementation, but hopefully this helps explain how to implement Newton's method.
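For concreteness, here is a minimal sketch of that stacked-K Newton iteration for the 2-stage Gauss method. It assumes F is a function handle returning the spatial right-hand side (e.g. F = @(U) -A*U.^2 + nu*B*U with A and B built as in the burgers function above), u is the current solution vector of length N, and dt is the time step; the Jacobian is approximated by finite differences purely for illustration, which is only practical for small N:
aG = [1/4,             1/4 - sqrt(3)/6;      % Gauss Butcher matrix (2 stages)
      1/4 + sqrt(3)/6, 1/4            ];
N  = numel(u);
FF = @(K) [F(u + dt*(aG(1,1)*K(1:N) + aG(1,2)*K(N+1:end)));
           F(u + dt*(aG(2,1)*K(1:N) + aG(2,2)*K(N+1:end)))];
G  = @(K) K - FF(K);                          % stage equations: G(K) = 0
K  = [F(u); F(u)];                            % explicit-Euler-like initial guess
for it = 1:50                                 % capped Newton iteration
    g0 = G(K);
    if max(abs(g0)) < 1e-10, break, end       % converged at every point
    J = zeros(2*N);                           % finite-difference Jacobian of G
    h = 1e-7;
    for j = 1:2*N
        dK = zeros(2*N, 1); dK(j) = h;
        J(:, j) = (G(K + dK) - g0) / h;
    end
    K = K - J \ g0;                           % Newton update on the stacked vector
end
u_next = u + dt*(0.5*K(1:N) + 0.5*K(N+1:end));  % b = [1/2 1/2] for Gauss-2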

Neural Networks: Sigmoid Activation Function for continuous output variable

Okay, so I am in the middle of Andrew Ng's machine learning course on coursera and would like to adapt the neural network which was completed as part of assignment 4.
In particular, the neural network which I had completed correctly as part of the assignment was as follows:
Sigmoid activation function: g(z) = 1/(1+e^(-z))
10 output units, each which could take 0 or 1
1 hidden layer
Back-propagation method used to minimize cost function
Cost function:
J(Theta) = -(1/m) * sum_{i=1..m} sum_{k=1..K} [ y_k^(i) * log((h_Theta(x^(i)))_k) + (1 - y_k^(i)) * log(1 - (h_Theta(x^(i)))_k) ] + (lambda/(2m)) * sum_{l=1..L-1} sum_{i=1..s_l} sum_{j=1..s_(l+1)} (Theta_ji^(l))^2
where L = number of layers, s_l = number of units in layer l, m = number of training examples, K = number of output units
Now I want to adjust the exercise so that there is one continuous output unit that takes any value in [0,1], and I am trying to work out what needs to change. So far I have:
Replaced the data with my own, i.e. such that the output is a continuous variable between 0 and 1
Updated references to the number of output units
Updated the cost function in the back-propagation algorithm to:
J = (1/(2m)) * sum_{i=1..m} (a_3^(i) - y^(i))^2
where a_3 is the value of the output unit determined from forward propagation.
I am certain that something else must change, as the gradient checking method shows that the gradient determined by back-propagation and the one from the numerical approximation no longer match up. I did not change the sigmoid gradient; it is left at f(z)*(1-f(z)) where f(z) is the sigmoid function 1/(1+e^(-z)); nor did I update the numerical approximation of the derivative formula; simply (J(theta+e) - J(theta-e))/(2e).
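For reference, that central-difference check in code (a sketch; costFun stands for any handle that returns the scalar cost J for an unrolled parameter vector theta):
e = 1e-4;
numgrad = zeros(size(theta));
for p = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(p) = e;
    numgrad(p) = (costFun(theta + perturb) - costFun(theta - perturb)) / (2*e);
end
% numgrad should closely match the gradient returned by back-propagation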
Can anyone advise of what other steps would be required?
Coded in Matlab as follows:
% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);
% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;
% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );
% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
I have since realised that this question is similar to that asked by @Mikhail Erofeev on StackOverflow; however, in this case I wish the continuous variable to be between 0 and 1 and therefore use a sigmoid function.
First, your cost function should be:
J = 1/m * sum( (a3-y).^2 );
I think your Theta2_grad = (delta3'*a2)/m; is expected to match the numerical approximation once delta3 is changed to delta3 = 1/2 * (a3 - y);.
Check this slide for more details.
EDIT:
In case there is some minor discrepancy between our codes, I pasted my code below for your reference. The code has already been compared against the numerical approximation function checkNNGradients(lambda); the relative difference is less than 1e-4 (it does not meet the 1e-11 requirement by Dr. Andrew Ng, though).
function [J grad] = nnCostFunctionRegression(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
X = [ones(m, 1) X];
z1 = sigmoid(X * Theta1');
zs = z1;
z1 = [ones(m, 1) z1];
z2 = z1 * Theta2';
ht = sigmoid(z2);
y_recode = zeros(length(y),num_labels);
for i=1:length(y)
y_recode(i,y(i))=1;
end
y = y_recode;
regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');
delta_cap2 = delta_3' * z1;
delta_cap1 = delta_2' * X;
Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));
Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
If you want to have continuous output, try not to use sigmoid activation when computing the target value.
a1 = [ones(m, 1) X];
a2 = sigmoid(a1 * Theta1');
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
ht = a3;
Normalize the input before using it in nnCostFunction. Everything else remains the same.
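A minimal normalization sketch (assuming "normalize" here means zero mean and unit variance per input column; use bsxfun instead of the elementwise operators on MATLAB versions without implicit expansion):
mu = mean(X);
sigma = std(X);
sigma(sigma == 0) = 1;            % guard against constant columns
X = (X - mu) ./ sigma;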

Is there a cleaner way to generate a sum of sinusoids?

I have written a simple matlab / octave function to create the sum of sinusoids with independent amplitude, frequency and phase for each component. Is there a cleaner way to write this?
## Create a sum of cosines with independent amplitude, frequency and
## phase for each component:
## samples(t) = SUM(A[i] * cos(2 * pi * F[i] * t + Phi[i]))
## Return samples as a column vector.
##
function signal = sum_of_cosines(A = [1.0],
F = [440],
Phi = [0.0],
duration = 1.0,
sampling_rate = 44100)
t = (0:1/sampling_rate:(duration-1/sampling_rate));
n = length(t);
signal = sum(repmat(A, n, 1) .* cos(2*pi*t' * F + repmat(Phi, n, 1)), 2);
endfunction
In particular, the calls to repmat() seem a bit clunky -- is there some nifty vectorization technique waiting for me to learn?
Is this the same?
signal = cos(2*pi*t' * F + repmat(Phi, n, 1)) * A';
And then maybe
signal = real(exp(j*2*pi*t'*F) * (A .* exp(j*Phi)).');   % .' (not ') so the phases are not negated
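A quick way to convince yourself this matches the repmat version (a sketch with small example inputs):
A = [1 0.5]; F = [440 880]; Phi = [0 pi/4];
sampling_rate = 8000; duration = 0.01;
t = (0:1/sampling_rate:(duration - 1/sampling_rate));
n = length(t);
ref = sum(repmat(A, n, 1) .* cos(2*pi*t' * F + repmat(Phi, n, 1)), 2);
alt = real(exp(j*2*pi*t'*F) * (A .* exp(j*Phi)).');
max(abs(ref - alt))               % ~0, up to round-off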
If you are memory constrained, this should work nicely:
e_jtheta = exp(j * 2 * pi * F / sampling_rate);
phasor = A .* exp(j*Phi);
samples = zeros(duration,1);      % here `duration` must be the number of samples, i.e. seconds * sampling_rate
for k = 1:duration
    samples(k) = real((e_jtheta .^ k) * phasor.');   % .' avoids negating the phases
end
For row vectors A, F, and Phi, you can use bsxfun to get rid of the repmat, but it is arguably uglier looking:
signal = cos(bsxfun(@plus, 2*pi*t' * F, Phi)) * A';
Heh. When I call any of the above vectorized versions with length(A) = 10000, Octave fills up VM and grinds to a halt (or at least, a slow crawl -- I haven't had the patience to wait for it to complete).
As a result, I've fallen back to the straightforward iterative version:
function signal = sum_of_cosines(A = [1.0],
F = [440],
Phi = [0.0],
duration = 1.0,
sampling_rate = 44100)
t = (0:1/sampling_rate:(duration-1/sampling_rate));
n = length(t);
signal = zeros(n, 1);
for i=1:length(A)
signal += A(i) * cos(2*pi*t'*F(i) + Phi(i));
endfor
endfunction
This version works plenty fast and teaches me a lesson about trying to be 'elegant'.
P.S.: This doesn't diminish my appreciation for the answers given by @BenVoigt and @chappjc -- I've learned something useful from both!