Regularized logistic regression code in MATLAB

I'm trying my hand at regularized LR, starting from these formulas, in MATLAB:
The cost function:
J(theta) = (1/m) * sum_i[ -y_i*log(h(x_i)) - (1 - y_i)*log(1 - h(x_i)) ] + (lambda/(2*m)) * sum_{j>=1}(theta_j^2)
The gradient:
∂J(theta)/∂theta_j = (1/m) * sum_i[ (h(x_i) - y_i)*x_i,j ]                        if j = 0
∂J(theta)/∂theta_j = (1/m) * sum_i[ (h(x_i) - y_i)*x_i,j ] + (lambda/m)*theta_j   if j >= 1
(This is not MATLAB code, just the formulas; theta_0 is not regularized.)
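The code below calls a sigmoid helper that is not shown in the post. A minimal element-wise version (an assumption, not taken from the original post) would be:

function g = sigmoid(z)
    % Element-wise logistic function, h(z) = 1 / (1 + e^(-z))
    % (assumed helper; not shown in the original post)
    g = 1 ./ (1 + exp(-z));
end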
So far I've done this:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));
temp_theta = [];

%cost function
%get the regularization term
for jj = 2:length(theta)
    temp_theta(jj) = theta(jj)^2;
end
theta_reg = lambda/(2*m)*sum(temp_theta);

temp_sum = [];
%for the sum in the cost function
for ii = 1:m
    temp_sum(ii) = -y(ii)*log(sigmoid(theta'*X(ii,:)')) - (1-y(ii))*log(1-sigmoid(theta'*X(ii,:)'));
end
tempo = sum(temp_sum);
J = (1/m)*tempo + theta_reg;

%regularization
%theta 0
reg_theta0 = 0;
for jj = 1:m
    reg_theta0(jj) = (sigmoid(theta'*X(m,:)') - y(jj))*X(jj,1)
end
reg_theta0 = (1/m)*sum(reg_theta0)
grad_temp(1) = reg_theta0

%for the rest of the thetas
reg_theta = [];
thetas_sum = 0;
for ii = 2:size(theta)
    for kk = 1:m
        reg_theta(kk) = (sigmoid(theta'*X(m,:)') - y(kk))*X(kk,ii)
    end
    thetas_sum(ii) = (1/m)*sum(reg_theta) + (lambda/m)*theta(ii)
    reg_theta = []
end

for i = 1:size(theta)
    if i == 1
        grad(i) = grad_temp(i)
    else
        grad(i) = thetas_sum(i)
    end
end
end
The cost function is giving correct results, but I have no idea why the gradient (one step) is not: the cost comes out as J = 0.6931, which is correct, but the gradient is grad = 0.3603 -0.1476 0.0320, which is not. (The regularization sum starts from 2 because the parameter theta(1) does not have to be regularized.) Any help? I guess there is something wrong with the code, but after 4 days I can't see it. Thanks.

Vectorized:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
    hx = sigmoid(X * theta);
    m = length(y); % number of training examples
    J = (sum(-y' * log(hx) - (1 - y')*log(1 - hx)) / m) + lambda * sum(theta(2:end).^2) / (2*m);
    grad = ((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m;
end
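As a usage sketch (X, y, and lambda are placeholders here, not data from the post), this cost function can be handed straight to fminunc once the gradient is returned as the second output:

% Minimize the regularized cost with fminunc (sketch; data not shown).
initial_theta = zeros(size(X, 2), 1);
lambda = 1;
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);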

I used more variables, so you can see clearly what comes from the regular formula and what comes from the regularization cost added. Additionally, it is good practice to use vectorization instead of loops in MATLAB/Octave; the result is usually much faster code.
function [J, grad] = costFunctionReg(theta, X, y, lambda)
    m = length(y);   % number of training examples
    % Hypothesis
    hx = sigmoid(X * theta);
    %% The cost without regularization
    J_partial = (-y' * log(hx) - (1 - y)' * log(1 - hx)) ./ m;
    %% Regularization cost added
    J_regularization = (lambda/(2*m)) * sum(theta(2:end).^2);
    %% Cost when we add regularization
    J = J_partial + J_regularization;
    % Gradient without regularization
    grad_partial = (1/m) * (X' * (hx - y));
    %% Gradient regularization added (theta(1) is not regularized)
    grad_regularization = (lambda/m) .* theta(2:end);
    grad_regularization = [0; grad_regularization];
    grad = grad_partial + grad_regularization;
end
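Since the original problem was a wrong gradient, a cheap way to debug it (a sketch with made-up random data, using central finite differences) is to compare the analytic gradient against a numerical one:

% Numerical gradient check for costFunctionReg (sketch, random data).
m = 10; n = 3; lambda = 1;
X = [ones(m,1) randn(m, n-1)];      % design matrix with intercept column
y = double(rand(m,1) > 0.5);        % made-up binary labels
theta = randn(n,1);

[~, grad] = costFunctionReg(theta, X, y, lambda);
numgrad = zeros(size(theta));
h = 1e-4;                           % finite-difference step
for j = 1:numel(theta)
    e = zeros(size(theta)); e(j) = h;
    numgrad(j) = (costFunctionReg(theta + e, X, y, lambda) - ...
                  costFunctionReg(theta - e, X, y, lambda)) / (2*h);
end
disp([grad numgrad])   % the two columns should agree to several decimals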

Finally got it, after rewriting it again like for the 4th time, this is the correct code:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

%regularization term of the cost
temp_theta = [];
for jj = 2:length(theta)
    temp_theta(jj) = theta(jj)^2;
end
theta_reg = lambda/(2*m)*sum(temp_theta);

%unregularized cost
temp_sum = [];
for ii = 1:m
    temp_sum(ii) = -y(ii)*log(sigmoid(theta'*X(ii,:)')) - (1-y(ii))*log(1-sigmoid(theta'*X(ii,:)'));
end
tempo = sum(temp_sum);
J = (1/m)*tempo + theta_reg;

%gradient
%theta 0 (not regularized)
reg_theta0 = 0;
for i = 1:m
    reg_theta0(i) = (sigmoid(theta'*X(i,:)') - y(i))*X(i,1);
end
theta_temp(1) = (1/m)*sum(reg_theta0);
grad(1) = theta_temp(1);

%the remaining thetas (regularized)
sum_thetas = [];
thetas_sum = [];
for j = 2:length(theta)
    for i = 1:m
        sum_thetas(i) = (sigmoid(theta'*X(i,:)') - y(i))*X(i,j);
    end
    thetas_sum(j) = (1/m)*sum(sum_thetas) + (lambda/m)*theta(j);
    sum_thetas = [];
end

for z = 2:length(theta)
    grad(z) = thetas_sum(z);
end
% =============================================================
end
Posting it in case it helps anyone, and in case anyone has comments on how I can do it better. :)

Here is an answer that eliminates the loops
m = length(y); % number of training examples
predictions = sigmoid(X*theta);
reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
calcErrors = -y.*log(predictions) - (1 -y).*log(1-predictions);
J = (1/m)*sum(calcErrors)+reg_term;
% prepend a 0 column to our reg_term matrix so we can use simple matrix addition
reg_term = [0 (lambda*theta(2:end)/m)'];
grad = sum(X.*(predictions - y)) / m + reg_term;
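One thing to note (an aside, not part of the original answer): sum(X.*(predictions - y)) yields a row vector, so grad comes out as a row here. If a column gradient is needed (for example, when passing it to fminunc), an equivalent column form is:

% Column-vector equivalent of the loop-free gradient (sketch).
grad = (X' * (predictions - y)) / m + reg_term';   % reg_term' turns the regularization row into a column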

Related

Steepest Descent using Armijo rule

I want to determine the steepest descent of the Rosenbrock function using the Armijo step length, with x = [-1.2, 1]' as the initial column vector.
The problem is that the code runs for a very long time; I think an infinite loop is being created somewhere, but I cannot see where the problem is.
Could anyone help me?
n = input('enter the number of variables n ');
% Armijo stepsize rule parameters
x = [-1.2 1]';
s = 10;
m = 0;
sigma = .1;
beta = .5;
obj = func(x);
g = grad(x);
k_max = 10^5;
k = 0;  % k = # iterations
nf = 1; % nf = # function eval.
x_new = zeros([],1); % empty vector which can be filled if length is not known
[X,Y] = meshgrid(-2:0.5:2);
fx = 100*(X.^2 - Y).^2 + (X-1).^2;
contour(X, Y, fx, 20)
while (norm(g) > 10^(-3)) && (k < k_max)
    d = -g./abs(g); % steepest descent direction
    s = 1;
    newobj = func(x + beta.^m*s*d);
    m = m + 1;
    if obj > newobj - (sigma*beta.^m*s*g'*d)
        t = beta^m*s;
        x = x + t*d;
        m_new = m;
        newobj = func(x + t*d);
        nf = nf + 1;
    else
        m = m + 1;
    end
    obj = newobj;
    g = grad(x);
    k = k + 1;
    x_new = [x_new, x];
end
% Output x and k
x_new, k, nf
fprintf('Optimal Solution x = [%f, %f]\n', x(1), x(2))
plot(x_new)

function y = func(x)
    y = 100*(x(1)^2 - x(2))^2 + (x(1)-1)^2;
end

function y = grad(x)
    y(1) = 100*(2*(x(1)^2-x(2))*2*x(1)) + 2*(x(1)-1);
end
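For comparison, here is a minimal, self-contained backtracking (Armijo) line-search sketch for the same Rosenbrock problem. It is not the poster's exact update scheme: it simply shrinks the step by beta until the sufficient-decrease condition holds, and it uses the full two-component gradient.

% Minimal steepest descent with Armijo backtracking (sketch, Rosenbrock function)
x = [-1.2; 1];
sigma = 0.1; beta = 0.5; s = 1;        % Armijo parameters and initial step
f  = @(x) 100*(x(1)^2 - x(2))^2 + (x(1) - 1)^2;
gf = @(x) [400*x(1)*(x(1)^2 - x(2)) + 2*(x(1) - 1);   % df/dx1
           -200*(x(1)^2 - x(2))];                     % df/dx2
for k = 1:1e5
    g = gf(x);
    if norm(g) <= 1e-3, break; end
    d = -g;                             % steepest descent direction
    t = s;
    % backtrack until the Armijo (sufficient decrease) condition holds
    while f(x + t*d) > f(x) + sigma*t*(g'*d)
        t = beta*t;
    end
    x = x + t*d;
end
fprintf('x = [%f, %f] after %d iterations\n', x(1), x(2), k);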

Finite Element assembly

I'm having serious problems with a simple example of FEM assembly.
I just want to assemble the mass matrix, without any coefficient. The geometry is simple:
conn=[1, 2, 3];
p = [0 0; 1 0; 0 1];
I made it like this so that the physical element is equal to the reference one.
My basis functions:
phi_1 = @(eta) 1 - eta(1) - eta(2);
phi_2 = @(eta) eta(1);
phi_3 = @(eta) eta(2);
phi = {phi_1, phi_2, phi_3};
Jacobian matrix:
J = @(x,y) [x(2) - x(1), x(3) - x(1);
            y(2) - y(1), y(3) - y(1)];
The rest of the code:
np = size(p,1);          % number of nodes
M = zeros(np,np);
for K = 1:size(conn,1)
    l2g = conn(K,:);     % local to global mapping
    x = p(l2g,1);        % node x-coordinates
    y = p(l2g,2);        % node y-coordinates
    jac = J(x,y);
    loc_M = localM(jac, phi);
    M(l2g, l2g) = M(l2g, l2g) + loc_M;  % add element masses to M
end
localM:
function loc_M = localM(J, phi)
    d_J = det(J);
    loc_M = zeros(3,3);
    for i = 1:3
        for j = 1:3
            loc_M(i,j) = d_J * quadrature(phi{i}, phi{j});
        end
    end
end
quadrature:
function value = quadrature(phi_i, phi_j)
    p = [1/3, 1/3;
         0.6, 0.2;
         0.2, 0.6;
         0.2, 0.2];
    w = [-27/96, 25/96, 25/96, 25/96];
    res = 0;
    for i = 1:size(p,1)
        res = res + phi_i(p(i,:)) * phi_j(p(i,:)) * w(i);
    end
    value = res;
end
For the simple entry (1,1) I obtain 0.0833, while computing the integral by hand or on Wolfram Alpha I get 0.166 (twice the result of the quadrature).
I tried different points and weights for the quadrature, but I really do not know what I am doing wrong.
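One way to sanity-check a triangle rule like this (a sketch, not part of the original post) is to integrate monomials whose exact values on the reference triangle {x >= 0, y >= 0, x + y <= 1} are known: the integral of 1 is 1/2, of x is 1/6, of x^2 is 1/12, and of x*y is 1/24. The 4-point rule above should reproduce all of them, since it is exact for polynomials up to degree 3.

% Sanity check of the 4-point triangle rule against known exact integrals
% on the reference triangle (sketch; reuses the points and weights above).
p = [1/3, 1/3; 0.6, 0.2; 0.2, 0.6; 0.2, 0.2];
w = [-27/96, 25/96, 25/96, 25/96];
tests = { @(e) 1,          1/2;    % integral of 1 over the reference triangle
          @(e) e(1),       1/6;    % integral of x
          @(e) e(1)^2,     1/12;   % integral of x^2
          @(e) e(1)*e(2),  1/24 }; % integral of x*y
for t = 1:size(tests,1)
    f = tests{t,1};
    approx = 0;
    for i = 1:size(p,1)
        approx = approx + f(p(i,:)) * w(i);
    end
    fprintf('quadrature = %.6f, exact = %.6f\n', approx, tests{t,2});
end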

Vectorization of logistic regression cost

I have this code for the cost in logistic regression, in matlab:
function [J, grad] = costFunction(theta, X, y)
m = length(y); % number of training examples
thetas = size(theta,1);
features = size(X,2);
steps = 100;
alpha = 0.1;
J = 0;
grad = zeros(size(theta));
sums = [];
result = 0;
for i = 1:m
    % sums = [sums; (y(i))*log10(sigmoid(X(i,:)*theta))+(1-y(i))*log10(1-sigmoid(X(i,:)*theta))]
    sums = [sums; -y(i)*log(sigmoid(theta'*X(i,:)'))-(1-y(i))*log(1-sigmoid(theta'*X(i,:)'))];
    % use log, not log10 -- that was a mistake
end
result = sum(sums);
J = (1/m)*result;
% gradient, one step
tempo = [];
thetas_update = 0;
temp_thetas = [];
grad = temp_thetas;
for i = 1:size(theta)
    for j = 1:m
        tempo(j) = (sigmoid(theta'*X(j,:)') - y(j))*X(j,i);
    end
    temp_thetas(i) = sum(tempo);
    tempo = [];
end
grad = (1/m).*temp_thetas;
% =============================================================
end
I need to vectorize it, but I do not know how to do it, or why it matters. I'm a programmer, so I like the for loops, but when it comes to vectorizing I'm blank. Any help? Thanks.
function [J, grad] = costFunction(theta, X, y)
    hx = sigmoid(X * theta);
    m = length(y); % number of training examples
    J = (-y' * log(hx) - (1 - y')*log(1 - hx)) / m;
    grad = X' * (hx - y) / m;
end
Or should the code be
function [J, grad] = costFunction(theta, X, y)
hx = sigmoid(X' * theta);
m = length(X);
J = sum(-y * log(hx) - (1 - y)*log(1 - hx)) / m;
grad = X' * (hx - y) / m;
end
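A quick way to convince yourself that the vectorized gradient X' * (hx - y) / m matches the double loop (a sketch with made-up data; sigmoid is assumed to be defined as 1./(1 + exp(-z))) is to evaluate both on a small random problem and compare:

% Compare the loop-based gradient with the vectorized one on random data (sketch).
m = 20; n = 3;
X = [ones(m,1) randn(m, n-1)];      % design matrix with intercept column
y = double(rand(m,1) > 0.5);        % made-up binary labels
theta = randn(n,1);

% loop version
grad_loop = zeros(n,1);
for j = 1:n
    for i = 1:m
        grad_loop(j) = grad_loop(j) + (sigmoid(theta'*X(i,:)') - y(i))*X(i,j);
    end
end
grad_loop = grad_loop / m;

% vectorized version
grad_vec = X' * (sigmoid(X*theta) - y) / m;

fprintf('max gradient difference: %g\n', max(abs(grad_loop - grad_vec)));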

Recomposing vector input to algorithm from matrix output

I've written some code to implement an algorithm that takes as input a vector q of real numbers, and returns as an output a complex matrix R. The Matlab code below produces a plot showing the input vector q and the output matrix R.
Given only the complex matrix output R, I would like to obtain the input vector q. Can I do this using least-squares optimization? Since there is a recursive running sum in the code (rs_r and rs_i), the calculation for a column of the output matrix is dependent on the calculation of the previous column.
Perhaps a non-linear optimization can be set up to recompose the input vector q from the output matrix R?
Looking at this in another way, I've used an algorithm to compute a matrix R. I want to run the algorithm "in reverse," to get the input vector q from the output matrix R.
If there is no way to recompose the starting values from the output, thereby treating the problem as a "black box," then perhaps the mathematics of the model itself can be used in the optimization? The program evaluates the following equation:
The Utilde(tau, omega) is the output matrix R. The tau (time) variable comprises the columns of the response matrix R, whereas the omega (frequency) variable comprises the rows of the response matrix R. The integration is performed as a recursive running sum from tau = 0 up to the current tau timestep.
Here are the plots created by the program posted below:
Here is the full program code:
N = 1001;
q = zeros(N, 1); % here is the input
q(1:200) = 55;
q(201:300) = 120;
q(301:400) = 70;
q(401:600) = 40;
q(601:800) = 100;
q(801:1001) = 70;
dt = 0.0042;
fs = 1 / dt;
wSize = 101;
Glim = 20;
ginv = 0;
R = get_response(N, q, dt, wSize, Glim, ginv); % R is output matrix
rows = wSize;
cols = N;
figure; plot(q); title('q value input as vector');
ylim([0 200]); xlim([0 1001])
figure; imagesc(abs(R)); title('Matrix output of algorithm')
colorbar
Here is the function that performs the calculation:
function response = get_response(N, Q, dt, wSize, Glim, ginv)
fs = 1 / dt;
Npad = wSize - 1;
N1 = wSize + Npad;
N2 = floor(N1 / 2 + 1);
f = (fs/2)*linspace(0,1,N2);
omega = 2 * pi .* f';
omegah = 2 * pi * f(end);
sigma2 = exp(-(0.23*Glim + 1.63));
sign = 1;
if (ginv == 1)
    sign = -1;
end
ratio = omega ./ omegah;
rs_r = zeros(N2, 1);
rs_i = zeros(N2, 1);
termr = zeros(N2, 1);
termi = zeros(N2, 1);
termr_sub1 = zeros(N2, 1);
termi_sub1 = zeros(N2, 1);
response = zeros(N2, N);
% cycle over cols of matrix
for ti = 1:N
    term0 = omega ./ (2 .* Q(ti));
    gamma = 1 / (pi * Q(ti));
    % calculate for the real part
    if (ti == 1)
        Lambda = ones(N2, 1);
        termr_sub1(1) = 0;
        termr_sub1(2:end) = term0(2:end) .* (ratio(2:end).^-gamma);
    else
        termr(1) = 0;
        termr(2:end) = term0(2:end) .* (ratio(2:end).^-gamma);
        rs_r = rs_r - dt.*(termr + termr_sub1);
        termr_sub1 = termr;
        Beta = exp( -1 .* -0.5 .* rs_r );
        Lambda = (Beta + sigma2) ./ (Beta.^2 + sigma2); % vector
    end
    % calculate for the complex part
    if (ginv == 1)
        termi(1) = 0;
        termi(2:end) = (ratio(2:end).^(sign .* gamma) - 1) .* omega(2:end);
    else
        termi = (ratio.^(sign .* gamma) - 1) .* omega;
    end
    rs_i = rs_i - dt.*(termi + termi_sub1);
    termi_sub1 = termi;
    integrand = exp( 1i .* -0.5 .* rs_i );
    if (ginv == 1)
        response(:,ti) = Lambda .* integrand;
    else
        response(:,ti) = (1 ./ Lambda) .* integrand;
    end
end % ti loop
No, you cannot do so unless you know the "model" itself for this process. If you intend to treat the process as a complete black box, then it is impossible in general, although in any specific instance, anything can happen.
Even if you DO know the underlying process, then it may still not work, as any least squares estimator is dependent on the starting values, so if you do not have a good guess there, it may converge to the wrong set of parameters.
It turns out that by using the mathematics of the model, the input can be estimated. This is not true in general, but for my problem it seems to work. The cumulative integral is eliminated by a partial derivative.
N = 1001;
q = zeros(N, 1);
q(1:200) = 55;
q(201:300) = 120;
q(301:400) = 70;
q(401:600) = 40;
q(601:800) = 100;
q(801:1001) = 70;
dt = 0.0042;
fs = 1 / dt;
wSize = 101;
Glim = 20;
ginv = 0;
R = get_response(N, q, dt, wSize, Glim, ginv);
rows = wSize;
cols = N;
cut_val = 200;
imagLogR = imag(log(R));
Mderiv = zeros(rows, cols-2);
for k = 1:rows
    val = deriv_3pt(imagLogR(k,:), dt);
    val(val > cut_val) = 0;
    Mderiv(k,:) = val(1:end-1);
end
disp('Running iteration');
q0 = 10;
q1 = 500;
NN = cols - 2;
qout = zeros(NN, 1);
for k = 1:NN
    data = Mderiv(:,k);
    qout(k) = fminbnd(@(q) curve_fit_to_get_q(q, dt, rows, data), q0, q1);
end
figure; plot(q); title('q value input as vector');
ylim([0 200]); xlim([0 1001])
figure;
plot(qout); title('Reconstructed q')
ylim([0 200]); xlim([0 1001])
Here are the supporting functions:
function output = deriv_3pt(x, dt)
% Function to compute dx/dt using the 3-pt symmetrical rule
% dt is the timestep
N = length(x);
N0 = N - 1;
output = zeros(N0, 1);
denom = 2 * dt;
for k = 2:N0
    output(k - 1) = (x(k+1) - x(k-1)) / denom;
end
function sse = curve_fit_to_get_q(q, dt, rows, data)
fs = 1 / dt;
N2 = rows;
f = (fs/2)*linspace(0,1,N2); % vector for frequency along cols
omega = 2 * pi .* f';
omegah = 2 * pi * f(end);
ratio = omega ./ omegah;
gamma = 1 / (pi * q);
termi = ((ratio.^(gamma)) - 1) .* omega;
Error_Vector = termi - data;
sse = sum(Error_Vector.^2);
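As a quick sanity check of deriv_3pt (a sketch, not part of the original post), the three-point rule should recover a known derivative such as d/dt sin(t) = cos(t) at the interior samples:

% Check deriv_3pt against an analytic derivative (sketch).
dt = 0.01;
t = (0:dt:2*pi)';
x = sin(t);
d = deriv_3pt(x, dt);              % approximates dx/dt at t(2), t(3), ...
exact = cos(t(2:end-1));           % analytic derivative at the interior points
fprintf('max abs error: %g\n', max(abs(d(1:numel(exact)) - exact)));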