Regularized logistic regression with vectorization - MATLAB

I'm trying to implement a vectorized version of regularized logistic regression. I have found a post that explains the regularized version, but I don't understand it.
To make it easy, I will copy the code below:
hx = sigmoid(X * theta);
m = length(X);
J = (sum(-y' * log(hx) - (1 - y') * log(1 - hx)) / m) + lambda * sum(theta(2:end).^2) / (2*m);
grad =((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m ;
I understand the first part of the cost equation; if I'm correct, it could be represented as:
J = ((-y' * log(hx)) - ((1-y)' * log(1-hx)))/m;
The problem is the regularization term. Let's look at it in more detail:
Dimensions:
X = (m x (n+1))
theta = ((n+1) x 1)
I don't understand why he leaves the first element of theta (theta_0) out of the equation, when in theory the regularization term is lambda/(2*m) * sum(theta_j^2), and it has to take into account all the thetas.
For the gradient descent, I think that this equation is equivalent:
L = eye(length(theta));
L(1,1) = 0;
grad = (1/m) * X' * (hx - y) + (lambda/m) * (L * theta);

In MATLAB indices begin at 1, while in the mathematical formula they begin at 0 (the indices in the formula you mentioned also start at 0). So theta(1) in the code corresponds to theta_0 in the formula, and in theory the first element of theta does indeed need to be left out of the regularization sum, which is exactly what theta(2:end) does.
And as for your second question, you're right! It is an equivalent, cleaner equation.
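To see that the two forms agree, here is a minimal sketch with made-up data (the sigmoid is defined inline so the snippet is self-contained; the variable names mirror the ones above):
% Minimal check (made-up data) that excluding theta(1) via theta(2:end)
% and via a mask/selector matrix gives the same regularization and gradient.
sigmoid = @(z) 1 ./ (1 + exp(-z));
m = 5; n = 3; lambda = 1;
X = [ones(m,1) rand(m,n)];              % m x (n+1), first column is the bias
y = double(rand(m,1) > 0.5);
theta = rand(n+1,1);
hx = sigmoid(X * theta);

% Regularization term, skipping theta(1), i.e. theta_0
reg1 = lambda * sum(theta(2:end).^2) / (2*m);
mask = [0; ones(n,1)];
reg2 = lambda * sum((mask .* theta).^2) / (2*m);     % same value as reg1

% Gradient, both forms
L = eye(n+1); L(1,1) = 0;
grad1 = ((hx - y)' * X / m)' + lambda .* theta .* mask ./ m;
grad2 = (1/m) * X' * (hx - y) + (lambda/m) * (L * theta);
max(abs(grad1 - grad2))                              % 0 up to round-off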


ODE45 with very large numbers as constraints

2nd-order ODE to solve in MATLAB:
( (a + f(t))·d²x/dt² + (b/2 + k(t))·dx/dt ) · dx/dt - g(t) = 0
Boundary condition:
dx/dt(0) = v0
where
t is the time,
x is the position,
dx/dt is the velocity,
d²x/dt² is the acceleration,
a, b, v0 are constants,
f(t), k(t) and h(t) are KNOWN functions dependent on t
(I do not write them because they are quite big)
As an example, using symbolic variables:
syms t y
%% --- Initial conditions ---
phi = 12.5e-3;
v0 = 300;
e = 3e-3;
ro = 1580;
E = 43e9;
e_r = 0.01466;
B = 0.28e-3;
%% --- Intermediate calculations ---
v_T = sqrt(((1 + e_r) * 620e6) /E) - sqrt(E/ro) * e_r;
R_T = v_T * t;
m_acc = pi * e * ro *(R_T^2);
v_L = sqrt (E/ro);
R_L = v_L * t;
z = 2 * R_L;
E_4 = B * ((e_r^2)* B * (0.9^(z/B)-1)) /(log(0.9));
E_1 = E * e * pi * e_r^2 * (-phi* (phi - 2*v_T*t)) /16;
E_2 = pi * R_T^2 * 10e9;
E_3 = pi * R_T^2 * 1e6 * e;
%% Resolution of the problem
g_t = -diff(E_1 + E_2 + E_3, t);
f(t,y) = (g_t - (pi*v_T*e*ro/2 + E_4) * y^2 /(y* (8.33e-3 + m_acc)));
fun=matlabFunction(f);
[T,Y] = ode45(fun, [0 1], v0);
How can I rewrite this to get x, where y = dx/dt? I'm new to MATLAB and any help is very welcome!
First, you could use subs to evaluate a symbolic function. Another approach is to use matlabFunction to convert all symbolic expressions to anonymous functions, as suggested by Horchler.
Second, you're integrating the ODE as if it were 1st order in dx/dt. If you're interested in x(t) as well as dx/dt(t), then you'll have to modify the function like so:
fun = @(t,y) [y(2);
( subs(g) - (b/2 + subs(k))*y(2)*y(2) ) / ( y(2) * (a + subs(f))) ];
and of course, provide an initial value for x0 = x(0) as well as v0 = dx/dt(0).
Third, the absolute value of the parameters is hardly ever a real concern. IEEE 754 double-precision floating point format can effortlessly represent numbers between 2.225073858507201e-308 and 1.797693134862316e+308 (realmin and realmax, respectively). So for the coefficients you gave (O(10^14)), this is absolutely not a problem. You might lose a few digits of precision if you don't take precautions (rescale to [-1 +1], reformulate the problem in different units, ...), but the relative error due to this is more than likely to be tiny and insignificant compared to the algorithmic error made by ode45.
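A quick way to see those numbers for yourself in MATLAB:
realmin            % ~2.2251e-308, smallest normalized positive double
realmax            % ~1.7977e+308, largest finite double
eps(1e14)          % 0.015625, spacing between adjacent doubles near 1e14
eps(1e14) / 1e14   % ~1.6e-16, i.e. roughly 16 significant digits remain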
<RANDOM_OPINIONATED_RANT>
Fourth, WHY do you use symbolic math for this purpose?! You are doing a numerical integration, meaning, there is no analytic solution anyway. Why bother with symbolics then? Doing the integration with symbolics (through vpa even) is going to be dozens, hundreds, yes, often even thousands of times slower than keeping (or re-implementing) everything numerical (which some would argue is already slow in MATLAB compared to a bare-metal approach).
Yes, of course, for this specific, individual, isolated use case it may not matter much, but for the future I'd strongly advise you to learn to:
use symbolics for derivations, proving theorems, simplifying expressions, ...
use numerics to implement any algorithm or function from which actual numbers are expected.
In other words, symbolics for drafting, numerics for crunching. And exactly zero symbolics should appear in any good implementation of any algorithm.
Although it's possible to mix them to some extent, that does not mean it is a good idea to do so. In fact, it almost never is. And the few isolated cases where it is the only viable option are not a vindication of the approach.
They are rare, isolated cases after all, far from the abundant norm.
For me it bears a resemblance to the evil eval, with similar reasons for why it Should. Be. Avoided.
</RANDOM_OPINIONATED_RANT>
With the full code, it's easy to come up with a complete solution:
% Initial conditions
phi = 12.5e-3;
v0 = 300;
x0 = 0; % (my assumption)
e = 3e-3;
ro = 1580;
E = 43e9;
e_r = 0.01466;
B = 0.28e-3;
% Intermediate calculations
v_T = sqrt(((1 + e_r) * 620e6) /E) - sqrt(E/ro) * e_r;
R_T = @(t) v_T * t;
m_acc = @(t) pi * e * ro *(R_T(t)^2);
v_L = sqrt(E/ro);
R_L = @(t) v_L * t;
z = @(t) 2 * R_L(t);
E_4 = @(t) B * ((e_r^2)* B * (0.9^(z(t)/B)-1)) /(log(0.9));
% UNUSED
%{
E_1 = @(t) -phi * E * e * pi * e_r^2 * (phi - 2*v_T*t) /16;
E_2 = @(t) pi * R_T(t)^2 * 10e9;
E_3 = @(t) pi * R_T(t)^2 * 1e6 * e;
%}
% Resolution of the problem
g_t = @(t) -( phi * E * e * pi * e_r^2 * v_T / 8 + ... % dE_1/dt
pi * 10e9 * 2 * R_T(t) * v_T + ... % dE_2/dt
pi * 1e6 * e * 2 * R_T(t) * v_T ); % dE_3/dt
% The derivative of Z = [x(t); x'(t)] equals Z' = [x'(t); x''(t)]
f = @(t,y) [y(2);
(g_t(t) - (0.5*pi*v_T*e*ro + E_4(t)) * y(2)^2) /(y(2) * (8.33e-3 + m_acc(t)))];
% Which is readily integrated
[T,Y] = ode45(f, [0 1], [x0 v0]);
% Plot solutions
figure(1)
plot(T, Y(:,1))
xlabel('t [s]'), ylabel('position [m]')
figure(2)
plot(T, Y(:,2))
xlabel('t [s]'), ylabel('velocity [m/s]')
Results:
Note that I've not used symbolics anywhere, except to double-check my hand-derived derivatives.
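For completeness, a cross-check of that kind could look roughly like this (symbolics used only offline to verify dE_1/dt, never inside the integration; the constants are the ones defined above):
% Cross-check a hand-derived derivative with the symbolic toolbox (offline only)
syms t
E_1_sym  = -phi * E * e * pi * e_r^2 * (phi - 2*v_T*t) / 16;
dE1_hand =  phi * E * e * pi * e_r^2 * v_T / 8;   % hand-derived derivative
simplify(diff(E_1_sym, t) - dE1_hand)             % prints 0 if they match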

how to implement Newton-Raphson to calculate the k(i) coefficients of an implicit Runge-Kutta scheme?

I'm trying to implement an implicit 2nd-order Runge-Kutta scheme for the 1D convection-diffusion (Burgers) equation, u_t = -u*u_x + nu*u_xx, using a 2nd-order finite difference discretization (fdm_2nd) and the Gauss Butcher coefficients.
My goal is to compare the explicit versus the implicit scheme. The explicit RK works well with a small viscosity; the curve of the explicit scheme shows a very nice shock wave.
I need your help to implement the solver for the k(i) coefficients correctly. I don't see how to implement the Newton method for all the k(i).
Do I need to implement it for all time and space steps, or just in time? The Jacobian is maybe wrong, but I don't see where. Or maybe I use the Jacobian in the wrong direction...
Actually, my code runs, but I think it is wrong somewhere: the implicit curve does not move from the initial values.
Here is my function:
function [t,u] = burgers(t0,U,N,dx)
nu=0.01; % viscosity coefficient
A=(diag(zeros(1,N))-diag(ones(1,N-1),1)+diag(ones(1,N-1),-1)) / (2*dx);
B=(-2*diag(ones(1,N))+diag(ones(1,N-1),1)+diag(ones(1,N-1),-1)) / (dx).^2;
t=t0;
u = - A * U.^2 + nu .* B * U;
The Jacobian:
function Jb = burJK(U,dx,i)
% Butcher coefficients a(i,j) of the 2-stage Gauss method
a(1,1) = 1/4;
a(1,2) = 1/4 - (3).^(1/2) / 6;
a(2,1) = 1/4 + (3).^(1/2) / 6;
a(2,2) = 1/4;
Jb(1,1) = a(1,1) .* (U(i+1,1) - U(i-1,1))/ (2*dx) - 1;
Jb(1,2) = a(1,2) .* (U(i+1,1) - U(i-1,1))/ (2*dx);
Jb(2,1) = a(2,1) .* (U(i+1,2) - U(i-1,2))/ (2*dx);
Jb(2,2) = a(2,2) .* (U(i+1,2) - U(i-1,2))/ (2*dx) - 1;
Here is my Newton code:
iter = 1;
iter_max = 100;
k=zeros(2,N);
k(:,1)=[0.4;0.6];
[w_1,f1] =burgers(n + c(1) * dt,uu + dt * (a(1,:) * k(:,iter)),iter,dx);
[w_2,f2] =burgers(n + c(2) * dt,uu + dt * (a(2,:) * k(:,iter)),iter,dx);
f1 = -k(1,iter) + f1;
f2 = -k(1,iter) + f2;
f(:,1)=f1;
f(:,2)=f2;
df = burJK(f,dx,iter+1);
while iter<iter_max-1 % K_newton
delta = df\f(iter,:)';
k(:,iter+1) = k(:,iter) - delta;
iter = iter+1;
[w_1,f1] =burgers(n + c(1) * dt,uu + dt * (a(1,:) * k(:,iter+1)),N,dx);
[w_2,f2] =burgers(n + c(2) * dt,uu + dt * (a(2,:) * k(:,iter+1)),N,dx);
f1 = -k(1,iter+1) + f1;
f2 = -k(1,iter+1) + f2;
f(:,1)=f1;
f(:,2)=f2;
df = burJK(f,dx,iter);
if iter>iter_max
disp('#');
else
disp('ok');
end
end
I'm a little rusty on exactly how to implement this in MATLAB, but I can walk you through the general steps and hopefully that will help. First, we can consider the equation you are solving to fit the general class of problems that can be posed as
du/dt = F(u), where F is a linear or nonlinear function
For a Runge Kutta scheme you typically recast the problem something like this
k(i) = F(u + dt*(a(i,i)*k(i) + a(i,j)*k(j)))
for a given stage. Now comes the tricky part: you need to make a 1-D vector constructed by stacking k(1) on top of k(2), so the first half of the elements of the vector are k(1) and the second half are k(2). With this new combined vector you can then change F so that it operates on the two k's separately. This results in
K = FF(u+dt*a*K) where FF is F for the new double k vector, K
OK, now we can implement Newton's method. You will do this for each time step, iterating until you have converged on the right answer, and you use it across all spatial points at the same time. What you do is guess a K and compute the Jacobian of G(K,U) = K - FF(u + dt*a*K). G(K,U) should only be zero when K is the right solution. So, in other words, do your Newton's method on K, and when checking for convergence you need to see that it is converging at all spatial points. I would run the Newton's method until max(abs(G(K,U))) < SolverTolerance.
Sorry I can't be more help on the MATLAB implementation, but hopefully this helps with explaining how to implement the Newton's method.
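To make that a bit more concrete, here is a rough MATLAB sketch of one time step with the stacked vector K = [k1; k2]. The names (rhs, GaussA, FF, G) are mine: rhs stands in for your spatial right-hand side (e.g. a wrapper around your burgers function, returning an N x 1 vector), u is the current solution, dt the time step, and the Jacobian is approximated by finite differences just to keep the sketch short:
% One implicit RK step via Newton's method on the stacked vector K = [k1; k2]
GaussA = [1/4,             1/4 - sqrt(3)/6;
          1/4 + sqrt(3)/6, 1/4            ];   % 2-stage Gauss Butcher matrix

FF = @(K) [rhs(u + dt*(GaussA(1,1)*K(1:N) + GaussA(1,2)*K(N+1:2*N)));
           rhs(u + dt*(GaussA(2,1)*K(1:N) + GaussA(2,2)*K(N+1:2*N)))];
G  = @(K) K - FF(K);                           % residual, zero at the solution

K = [rhs(u); rhs(u)];                          % initial guess: explicit slopes
for it = 1:50
    R = G(K);
    if max(abs(R)) < 1e-10, break; end         % converged at all spatial points
    J = zeros(2*N);                            % finite-difference Jacobian of G
    h = 1e-7;
    for j = 1:2*N
        Kp = K; Kp(j) = Kp(j) + h;
        J(:,j) = (G(Kp) - R) / h;
    end
    K = K - J \ R;                             % Newton update
end
b = [1/2; 1/2];                                % Gauss weights
u = u + dt*(b(1)*K(1:N) + b(2)*K(N+1:2*N));    % advance to the next time level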

Cost function for linear regression with multiple variables in Matlab

The multivariate linear regression cost function is J(theta) = (1/(2m)) * sum over i of (h_theta(x^(i)) - y^(i))^2, with h_theta(x) = theta' * x.
Is the following code in Matlab correct?
function J = computeCostMulti(X, y, theta)
m = length(y);
J = 0;
J=(1/(2*m)*(X*theta-y)'*(X*theta-y);
end
There are two ways I tried, which are essentially the same code.
J = (X * theta - y)'*(X * theta - y)/(2*m);
or you can try:
J = (1/(2*m))*(X * theta - y)'*(X * theta - y)
You are missing a ) at the end:
J=(1/(2*m))*(X*theta-y)'*(X*theta-y);
^
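A quick sanity check of the corrected function on a tiny made-up data set (where y = 1 + 2x exactly, so the cost at theta = [1; 2] must be zero):
X = [1 0; 1 1; 1 2];                 % m x 2 design matrix, first column of ones
y = [1; 3; 5];
J0 = computeCostMulti(X, y, [1; 2])  % 0, the line fits the data exactly
J1 = computeCostMulti(X, y, [0; 0])  % (1 + 9 + 25)/(2*3) = 5.8333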

how to solve a system of Ordinary Differential Equations (ODEs) in MATLAB

I have to solve a system of ordinary differential equations of the form:
dx/ds = 1/x * [y* (g + s/y) - a*x*f(x^2,y^2)]
dy/ds = 1/x * [-y * (b + y) * f()] - y/s - c
where x, and y are the variables I need to find out, and s is the independent variable; the rest are constants. I've tried to solve this with ode45 with no success so far:
y = ode45(@yprime, s, [1 1]);
function dyds = yprime(s,y)
global g a v0 d
dyds_1 = 1./y(1) .*(y(2) .* (g + s ./ y(2)) - a .* y(1) .* sqrt(y(1).^2 + (v0 + y(2)).^2));
dyds_2 = - (y(2) .* (v0 + y(2)) .* sqrt(y(1).^2 + (v0 + y(2)).^2))./y(1) - y(2)./s - d;
dyds = [dyds_1; dyds_2];
return
where @yprime has the system of equations. I get the following error message:
YPRIME returns a vector of length 0, but the length of initial
conditions vector is 2. The vector returned by YPRIME and the initial
conditions vector must have the same number of elements.
Any ideas?
thanks
Certainly, you should have a look at your function yprime. Using a simple model that shares the number of differential state variables with your problem, consider this example.
function dyds = yprime(s, y)
dyds = zeros(2, 1);
dyds(1) = y(1) + y(2);
dyds(2) = 0.5 * y(1);
end
yprime must return a column vector that holds the values of the two right-hand sides. The input argument s can be ignored in this example because the model is time-independent. The system you show is somewhat more difficult in that it is not directly of the form dy/dt = f(t, y). You will have to rearrange your equations as a first step. It will help to rename x into y(1) and y into y(2).
Also, are you sure that your global g a v0 d are not empty? If any one of those variables remains uninitialized, you will be multiplying state variables with an empty matrix, eventually resulting in an empty vector dyds being returned. This can be tested with
assert(~isempty(v0), 'v0 not initialized');
in yprime, or you could employ a debugging breakpoint.
The syntax for ODE solvers is [s y] = ode45(@yprime, [1 10], [2 2]),
and you don't need to do elementwise operations in your case, i.e. instead of .* just use *.
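As a side note, a common way to avoid the empty-globals problem entirely is to pass the constants in through an anonymous function. A rough sketch under that assumption (the numeric values below are placeholders, not your actual constants):
% Pass the constants explicitly instead of relying on globals
g = 9.81; a = 1; v0 = 2; d = 0.5;                  % placeholder values
[s, y] = ode45(@(s, y) yprime(s, y, g, a, v0, d), [1 10], [1 1]);

% yprime now takes the constants as extra inputs
function dyds = yprime(s, y, g, a, v0, d)
    f = sqrt(y(1).^2 + (v0 + y(2)).^2);            % common factor in both equations
    dyds = [ (y(2).*(g + s./y(2)) - a.*y(1).*f) ./ y(1);
             -(y(2).*(v0 + y(2)).*f) ./ y(1) - y(2)./s - d ];
end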

constant term in Matlab principal component regression (pcr) analysis

I am trying to learn principal component regression (pcr) with Matlab. I use this guide here: http://www.mathworks.fr/help/stats/examples/partial-least-squares-regression-and-principal-components-regression.html
it's really good, but I just cannot understand one step:
we do the PCA and the regression, nice and clear:
[PCALoadings,PCAScores,PCAVar] = princomp(X);
betaPCR = regress(y-mean(y), PCAScores(:,1:2));
And then we adjust the first coefficient:
betaPCR = PCALoadings(:,1:2)*betaPCR;
betaPCR = [mean(y) - mean(X)*betaPCR; betaPCR];
yfitPCR = [ones(n,1) X]*betaPCR;
Why does the coefficient for the constant term need to be 'mean(y) - mean(X)*betaPCR'? Can you explain that to me?
Thanks in advance!
This is really a math question, not a coding question. Your PCA extracts a set of features and puts them in a matrix, which gives you PCALoadings and PCAScores. Pull out the first two principal components and their loadings, and put them in their own matrices:
W = PCALoadings(:, 1:2)
Z = PCAScores(:, 1:2)
The relationship between X and Z is that X can be approximated by:
Z = (X - mean(X)) * W <=> X ~ mean(X) + Z * W' (1)
The intuition is that Z captures most of the "important information" in X, and the matrix W tells you how to transform between the two representations.
Now you can do a regression of y on Z. First you have to subtract the mean from y, so that both the left and right hand sides have mean zero:
y - mean(y) = Z * beta + errors (2)
Now you want to use that regression to make predictions for y from X. Substituting from equation (1) into equation (2) gives you
y - mean(y) = (X - mean(X)) * W * beta
= (X - mean(X)) * beta1
where we have defined beta1 = W * beta (you do this in your third line of code). Rearranging:
y = mean(y) - mean(X) * beta1 + X * beta1
= [ones(n,1) X] * [mean(y) - mean(X) * beta1; beta1]
= [ones(n,1) X] * betaPCR
which works out if we define
betaPCR = [mean(y) - mean(X) * beta1; beta1]
as in your fourth line of code.
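If you want to check this numerically, here is a small sketch with random data (variable names match the example; in newer MATLAB releases princomp is replaced by pca):
% Numerical check that [ones(n,1) X]*betaPCR equals mean(y) + (X - mean(X))*beta1
rng(0);
n = 50; p = 6;
X = randn(n, p);
y = X * randn(p, 1) + 0.1*randn(n, 1);

[PCALoadings, PCAScores] = princomp(X);            % or: [PCALoadings, PCAScores] = pca(X);
beta  = regress(y - mean(y), PCAScores(:, 1:2));
beta1 = PCALoadings(:, 1:2) * beta;
betaPCR = [mean(y) - mean(X)*beta1; beta1];

fit1 = [ones(n,1) X] * betaPCR;                        % fit written with the constant term
fit2 = mean(y) + (X - repmat(mean(X), n, 1)) * beta1;  % fit from the centered form
max(abs(fit1 - fit2))                                  % ~1e-15, the two agree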