Gradient descent and normal equation not giving the same results, why? - matlab

I am working on a simple script that tries to find values for my hypothesis, using gradient descent for one attempt and the normal equation for the other. The normal equation gives me the proper results, but my gradient descent does not, and I can't figure out why it fails on such a simple case. I am using MATLAB to implement both. Here's what I tried:
So I created a dummy training set as such:
x = [1 2 3], y = [2 3 4]
so my hypothesis should converge to theta = [1 1], giving the simple
h(x) = 1 + x;
Here's the test code comparing normal equation and gradient descent:
clear;
disp("gradient descend");
X = [1; 2; 3];
y = [2; 3; 4];
theta = [0 0];
num_iters = 10;
alpha = 0.3;
thetaOut = gradientDescent(X, y, theta, 0.3, 10); % GD -> does not work, why?
disp(thetaOut);
clear;
disp("normal equation");
X = [1 1; 1 2; 1 3];
y = [2;3;4];
Xt = transpose(X);
theta = pinv(Xt*X)*Xt*y; % normal equation -> works!
disp(theta);
And here is the inner loop of the gradient descent:
samples = length(y);
for epoch = 1:iterations
    hipoth = X * theta;
    factor = alpha * (1/samples);
    theta = theta - factor * ((hipoth - y)' * X)';
    %disp(epoch);
end
and the output after 10 iterations:
gradient descend = 1.4284 1.4284 -> wrong
normal equation = 1.0000 1.0000 -> correct
This does not make sense; it should converge to [1 1].
Any ideas? Do I have a MATLAB syntax problem?
Thank you!

Gradient descent can solve a lot of different problems. You want to do a linear regression, i.e. find a linear function h(x) = theta_1 * x + theta_2 that best fits your data:
h(X) = Y + error
What counts as the "best" fit is debatable; the most common definition is to minimize the square of the errors between the fit and the actual data. Assuming that is what you want ...
Replace the function with
function [theta] = gradientDescent(X, Y, theta, alpha, num_iters)
    n = length(Y);
    for epoch = 1:num_iters
        Y_pred = theta(1)*X + theta(2);
        D_t1 = (-2/n) * X' * (Y - Y_pred);
        D_t2 = (-2/n) * sum(Y - Y_pred);
        theta(1) = theta(1) - alpha * D_t1;
        theta(2) = theta(2) - alpha * D_t2;
    end
end
and change your parameters a bit, e.g.
num_iters = 10000;
alpha = 0.05;
and you get the correct answer. I took the code snippet from here, which might also be a nice starting point for reading up on what is actually happening.
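A minimal driver for the toy data (a sketch; the [0 0] starting point is my assumption, the other parameters match the ones above):
% Sketch: run the replacement function on the toy training set
X = [1; 2; 3];
Y = [2; 3; 4];
theta = gradientDescent(X, Y, [0 0], 0.05, 10000);
disp(theta); % should approach [1 1], i.e. h(x) = 1*x + 1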

Your gradient descent is solving a different problem than the normal equation: you are not feeding it the same data. On top of that you seem to overcomplicate the theta update a bit, but that is not a problem. Minor changes in your code result in the proper output:
function theta = gradientDescent(X, y, theta, alpha, iterations)
    samples = length(y);
    for epoch = 1:iterations
        hipoth = X * theta;
        factor = alpha * (1/samples);
        theta = theta - factor * X'*(hipoth - y);
        %disp(epoch);
    end
end
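For reference, the fixed update is just the standard least-squares gradient step (my notation; samples in the code plays the role of m):
$$\theta \leftarrow \theta - \frac{\alpha}{m}\,X^\top\left(X\theta - y\right)$$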
and the main code:
clear;
X = [1 1; 1 2; 1 3];
y = [2;3;4];
theta = [0 0];
num_iters = 10;
alpha = 0.3;
thetaOut = gradientDescent(X, y, theta', 0.3, 600); % Iterate a bit more, you impatient person!
theta = pinv(X.'*X)*X.'*y; % normal equation -> works!
disp("gradient descend");
disp(thetaOut);
disp("normal equation");
disp(theta);
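As a quick sanity check (a sketch, relying only on the two variables printed above):
% Sketch: both solutions should now agree on this toy set
fprintf('max difference between solutions: %g\n', max(abs(thetaOut - theta)));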

Related

How to compute CDF from a given PMF in Matlab

For a given PMF p = f(\theta) for \theta between 0 and 2\pi, I computed the CDF in MATLAB as
theta = 0:2*pi/n:2*pi;
for i = 1:n
    cdf(i) = trapz(theta(1:i), p(1:i));
end
and the result is verified.
I tried to do the same with cumsum as cdf = cumsum(p)*(2*pi)/n, but the result is wrong. Why?
How can I compute the CDF if the given PMF is 2D, as p = f(\theta, \phi)? Can I do it without going into detail as explained here?
In the 1D case you can use cumsum to get a vectorized version of the loop (assuming that both theta and p are column vectors):
n = 10;
theta = linspace(0, 2*pi, n).';
p = rand(n,1);
cdf = [0; 0.5 * cumsum((p(1:n-1) + p(2:n)) .* diff(theta(1:n)))];
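A quick check (a sketch comparing against the original trapz loop on the same data):
% Sketch: the vectorized form should match the trapz loop up to rounding
cdf_loop = zeros(n, 1);
for i = 1:n
    cdf_loop(i) = trapz(theta(1:i), p(1:i));
end
disp(max(abs(cdf - cdf_loop))); % expect ~0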
In the 2D case the function cumsum is applied twice, once in the vertical and once in the horizontal direction:
nthet = 10;
nphi = 10;
theta = linspace(0, 2*pi, nthet).'; % as column vector
phi = linspace(0, pi, nphi); % as row vector
p = rand(nthet, nphi);
cdf1 = 0.5 * cumsum((p(1:end-1, :) + p(2:end, :)) .* diff(theta), 1);
cdf2 = 0.5 * cumsum((cdf1(:, 1:end-1) + cdf1(:, 2:end)) .* diff(phi), 2);
cdf = zeros(nthet, nphi);
cdf(2:end, 2:end) = cdf2;
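For reference, the built-in cumtrapz performs the same cumulative trapezoidal accumulation and, like the construction above, returns zeros along the first row and column (a sketch using the same theta, phi and p):
% Sketch: equivalent result via cumtrapz
cdf_alt = cumtrapz(phi, cumtrapz(theta, p, 1), 2);
disp(max(abs(cdf_alt(:) - cdf(:)))); % expect ~0 up to rounding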

Global minimum value using Gradient Descent for logistic regression using MATLAB

I am trying to calculate the global minimum using the gradient descent algorithm in MATLAB.
The derived update rule is:
$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
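In vectorized MATLAB form (my rendering, consistent with the grad computed in CostFunction below), this update reads:
% Vectorized form of the update rule (sketch)
theta = theta - (alpha/m) * (X' * (Sigmoid(X*theta) - y));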
My code is:
%Load and split the features and label
data = load('Data.txt');
X = data(:,1:2);
y = data(:,3);
m = length(y);
%Add a new feature (the intercept column)
X = [ones(m,1), X];
%Define the dimensions and the initial theta value
[m,n] = size(X);
alpha = 0.01;
iteration = 400;
theta = zeros(n,1);
[theta,hist] = GradientDescent(X, y, m, alpha, theta, iteration);
J = CostFunction(X, y, theta, m);
fprintf('The cost for initial value zero is: %f\n', J);
fprintf('Minimum Gradient Descent is:\n')
fprintf('%f\n', theta)
My GradientDescent function is:
function [theta,hist] = GradientDescent(X, y, m, alpha, theta, iteration)
    hist = zeros(iteration, 1);
    for i = 1:iteration
        z = X * theta;
        sigmoid = Sigmoid(z);
        theta = theta - sum((alpha/m) * (X' * (sigmoid - y)));
        hist(i) = CostFunction(X, y, theta, m);
    end
end
My cost function is:
function [J,grad] = CostFunction(X, y, theta, m)
    z = X * theta;
    hpyX = Sigmoid(z);
    J = (1/m) .* ((-y' * log(hpyX)) - ((1-y)' * log(1-hpyX)));
    grad = (1/m) * (X' * (hpyX - y));
end
and the Sigmoid function is:
function sigmoid = Sigmoid(z)
    sigmoid = 1 ./ (1 + exp(-z));
end
I am getting this output:
The cost for initial value zero is: NaN
Minimum Gradient Descent is:
0.413048
0.413048
0.413048
On the other hand, I get a different result for 400 iterations with fminunc:
option = optimset('GradObj','on','MaxIter',400);
[theta, cost] = fminunc(@(t)(CostFunction(X,y,t,m)), theta, option);
fprintf('The cost function by fminunc is: %f\n',J)
fprintf('Theta is:\n')
fprintf('%f\n',theta)
Output
The cost function by fminunc is: 0.693147
Theta is:
-25.161343
0.206232
0.201472
I can't understand why I am getting different results. Did I make a mistake in the gradient descent function? Also, why am I getting a NaN value for the cost function when I calculate gradient descent?

Plotting function with a summation produces a wrong result

I have an equation that needs to be plotted, and the plot is coming out incorrectly.
The equation is as follows (the original image is missing; this is my reading of the intended formula, reconstructed from the code below):
$$J_z(\theta) = \frac{2}{\pi \eta k a} \sum_{n=0}^{N} \kappa_n \, (-1)^n \, j^n \, \frac{\cos(n\theta)}{H_n^{(2)}(ka)}, \qquad \kappa_0 = 1,\ \kappa_n = 2 \ \text{for } n > 0$$
And the plot should look like this: [expected plot image not shown]
But my code:
clear; clc; close all;
eta = 376.7303134617706554679; % 120pi
ka = 4;
N = 24;
coeff = (2)/(pi*eta*ka);
Jz = 0;
theta = [0;0.0351015938948580;0.0702031877897160;0.105304781684574;0.140406375579432;0.175507969474290;0.210609563369148;0.245711157264006;0.280812751158864;0.315914345053722;0.351015938948580;0.386117532843438;0.421219126738296;0.456320720633154;0.491422314528012;0.526523908422870;0.561625502317728;0.596727096212586;0.631828690107444;0.666930284002302;0.702031877897160;0.737133471792019;0.772235065686877;0.807336659581734;0.842438253476592;0.877539847371451;0.912641441266309;0.947743035161167;0.982844629056025;1.01794622295088;1.05304781684574;1.08814941074060;1.12325100463546;1.15835259853031;1.19345419242517;1.22855578632003;1.26365738021489;1.29875897410975;1.33386056800460;1.36896216189946;1.40406375579432;1.43916534968918;1.47426694358404;1.50936853747890;1.54447013137375;1.57957172526861;1.61467331916347;1.64977491305833;1.68487650695319;1.71997810084804;1.75507969474290;1.79018128863776;1.82528288253262;1.86038447642748;1.89548607032233;1.93058766421719;1.96568925811205;2.00079085200691;2.03589244590177;2.07099403979662;2.10609563369148;2.14119722758634;2.17629882148120;2.21140041537606;2.24650200927091;2.28160360316577;2.31670519706063;2.35180679095549;2.38690838485035;2.42200997874520;2.45711157264006;2.49221316653492;2.52731476042978;2.56241635432464;2.59751794821949;2.63261954211435;2.66772113600921;2.70282272990407;2.73792432379893;2.77302591769378;2.80812751158864;2.84322910548350;2.87833069937836;2.91343229327322;2.94853388716807;2.98363548106293;3.01873707495779;3.05383866885265;3.08894026274751;3.12404185664236;-3.12404185664236;-3.08894026274751;-3.05383866885265;-3.01873707495779;-2.98363548106293;-2.94853388716807;-2.91343229327322;-2.87833069937836;-2.84322910548350;-2.80812751158864;-2.77302591769378;-2.73792432379893;-2.70282272990407;-2.66772113600921;-2.63261954211435;-2.59751794821949;-2.56241635432464;-2.52731476042978;-2.49221316653492;-2.45711157264006;-2.42200997874520;-2.38690838485035;-2.35180679095549;-2.31670519706063;-2.28160360316577;-2.24650200927091;-2.21140041537605;-2.17629882148120;-2.14119722758634;-2.10609563369148;-2.07099403979662;-2.03589244590177;-2.00079085200691;-1.96568925811205;-1.93058766421719;-1.89548607032233;-1.86038447642748;-1.82528288253262;-1.79018128863776;-1.75507969474290;-1.71997810084804;-1.68487650695319;-1.64977491305833;-1.61467331916347;-1.57957172526861;-1.54447013137375;-1.50936853747890;-1.47426694358404;-1.43916534968918;-1.40406375579432;-1.36896216189946;-1.33386056800461;-1.29875897410975;-1.26365738021489;-1.22855578632003;-1.19345419242517;-1.15835259853032;-1.12325100463546;-1.08814941074060;-1.05304781684574;-1.01794622295088;-0.982844629056025;-0.947743035161167;-0.912641441266309;-0.877539847371451;-0.842438253476592;-0.807336659581735;-0.772235065686877;-0.737133471792019;-0.702031877897161;-0.666930284002303;-0.631828690107445;-0.596727096212586;-0.561625502317728;-0.526523908422871;-0.491422314528013;-0.456320720633154;-0.421219126738296;-0.386117532843439;-0.351015938948581;-0.315914345053722;-0.280812751158864;-0.245711157264007;-0.210609563369149;-0.175507969474290;-0.140406375579432;-0.105304781684575;-0.0702031877897167;-0.0351015938948580;-2.44929359829471e-16];
for n = 0:N
    if n == 0
        kappa = 1;
    else
        kappa = 2;
    end
    num = (-1.^(n)).*(1i.^(n)).*(cos(n.*theta)).*(kappa);
    Hankel = besselh(n,2,ka);
    Jz = Jz + (num./Hankel);
end
Jz = Jz.*coeff;
x = linspace(0,2*pi,length(theta));
plot(x,abs(Jz));
Produces the following incorrect plot: [resulting plot image not shown]
Note that the values of theta are discrete angles around a circular cylinder.
The equation is the analytical solution to the current density for a TMz polarized cylinder in 2D.
I think that your result is actually correct, and this is a simple problem with plotting or with how you specify theta. Since this is a periodic function, let's draw a few more periods:
function q52693512
    eta = 376.7303134617706554679; % 120*pi
    ka = 4;
    N = 24;
    coeff = 2/(pi*eta*ka);
    Jz = 0;
    theta = linspace(-3*pi, 3*pi, 180);
    for n = 0:N
        kappa = 1 + (n>0);
        num = (-1.^(n)).*(1i.^(n)).*(cos(n.*theta)).*(kappa);
        Hankel = besselh(n,2,ka);
        Jz = Jz + (num./Hankel);
    end
    Jz = Jz.*coeff;
    figure(); plot(theta, abs(Jz));
end
You might already be able to see that the desired result is in there, but shifted by half a period with respect to our result. This is clearer if we look again at the center: it's exactly the shape you want, if you ignore the horizontal axis values.
Try looking for some justification for ϕ being equal to theta ± π/2 (or something like that).
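As a possible root cause (my observation, not part of the original answer): in MATLAB, -1.^n parses as -(1.^n), i.e. as a constant -1, so the code above silently drops the (-1)^n factor. Since cos(n(θ + π)) = (-1)^n cos(nθ), losing that factor is exactly a half-period shift. Restoring the parentheses on the num line inside the loop should recenter the plot:
% Sketch: restore the (-1)^n factor lost to operator precedence
num = ((-1).^n) .* (1i.^n) .* cos(n.*theta) .* kappa;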

I don't know what's wrong with my linear regression code [duplicate]

This question already has an answer here:
Machine learning - Linear regression using batch gradient descent (1 answer)
Closed 6 years ago.
I tried normal equation, and the result was correct.
However, when I used gradient descent, the figure turned out to be wrong. I referred to online resources, but I failed to find out what's wrong. I don't think there's anything special in the following code.
clear;
clc;
m = 100; % generate 100 points
noise = randn(m,1); % 100 noise of normal distribution
x = rand(m, 1) * 10; % generate 100 x's ranging from 0 to 10
y = 10 + 2 * x + noise;
plot (x, y, '.');
hold on;
X = [ones(m, 1) x];
theta = [0; 0];
plot (x, X * theta, 'y');
hold on;
% Method 1: gradient descent
alpha = 0.02; % too large an alpha will diverge away from the result
num_iters = 5;
[theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Method 2 normal equation
% theta = (pinv(X' * X )) * X' * y
plot (x, X * theta, 'r');
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta - alpha * (1/m) * (X' * (X * theta - y));
        % plot (X(:, 2), X * theta, 'g');
        % hold on;
        J_history(iter) = costFunction(X, y, theta);
    end
end
function J = costFunction(X, y, theta)
    m = length(y);
    predictions = X * theta; % predictions on all m examples
    sqrErrors = (predictions - y).^2; % squared errors
    J = 1/(2*m) * sum(sqrErrors);
end
Your code is correct. The problem is the small number of iterations.
Take num_iters = 5000; and you will see that theta converges to the right value ([10; 2]).
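A quick way to see this (a sketch using the question's own functions; the semilogy plot is my choice):
% Sketch: raise the iteration count and watch the cost decay
num_iters = 5000;
[theta, J_history] = gradientDescent(X, y, [0; 0], alpha, num_iters);
semilogy(1:num_iters, J_history); % the cost levels off once theta is close to [10; 2]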

Gradient Descent with multiple variable without Matrix

I'm new to MATLAB and machine learning, and I tried to write a gradient descent function without using matrix operations.
m is the number of examples in my training set
n is the number of features for each example
The function gradientDescentMulti takes 5 arguments:
X: an m-by-n matrix
y: an m-dimensional vector
theta: an n-dimensional vector
alpha: a real number
num_iters: the number of iterations
I already have a solution using matrix multiplication:
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        gradJ = 1/m * (X'*X*theta - X'*y);
        theta = theta - alpha * gradJ;
    end
end
The result after iterations:
theta =
1.0e+05 *
3.3430
1.0009
0.0367
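For reference, the elementwise form of this update, which the loop version below has to reproduce for each feature j, is (my notation):
$$\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad h_\theta(x^{(i)}) = \sum_{k=1}^{n} \theta_k x_k^{(i)}$$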
But now I tried to do the same without matrix multiplication. This is the function:
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    n = size(X, 2); % number of features
    for iter = 1:num_iters
        new_theta = zeros(1, n);
        %// for each feature, find the new theta
        for t = 1:n
            S = 0;
            for example = 1:m
                h = 0;
                for example_feature = 1:n
                    h = h + (theta(example_feature) * X(example, example_feature));
                end
                S = S + ((h - y(example)) * X(example, n)); %// Sum over the examples for this feature
            end
            new_theta(t) = theta(t) - alpha * (1/m) * S; %// Calculate the new theta for this feature
        end
        %// only at the end of the iteration, update all theta simultaneously
        theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
    end
end
The result: all the thetas are the same :/
theta =
1.0e+04 *
3.5374
3.5374
3.5374
If you look at the gradient update rule, it may be more efficient to first compute the hypothesis for all of your training examples, then subtract the ground truth value of each training example from it, and store these differences in an array or vector. Once you do this, you can compute the update rule very easily. It doesn't appear that you're doing this in your code.
As such, I rewrote the code so that a separate array stores the difference between the hypothesis of each training example and the ground truth value. Once I have this, I compute the update rule for each feature separately:
for iter = 1 : num_iters
    %// Compute hypothesis differences with the ground truth first
    h = zeros(1, m);
    for t = 1 : m
        %// Compute hypothesis
        for tt = 1 : n
            h(t) = h(t) + theta(tt)*X(t,tt);
        end
        %// Compute difference between hypothesis and ground truth
        h(t) = h(t) - y(t);
    end
    %// Now update parameters
    new_theta = zeros(1, n);
    %// for each feature, find the new theta
    for tt = 1 : n
        S = 0;
        %// For each sample, compute the product of the hypothesis difference
        %// and the right feature of the sample, and accumulate
        for t = 1 : m
            S = S + h(t)*X(t,tt);
        end
        %// Compute gradient descent step
        new_theta(tt) = theta(tt) - (alpha/m)*S;
    end
    theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
end
When I do this, I get the same answers as with the matrix formulation. (Incidentally, the crucial fix is indexing X(t,tt), the feature being updated; the original loop used X(example, n), i.e. the last feature, for every t, which is why all of the thetas came out identical.)
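For a quick check (a sketch; it assumes X, y, alpha, and num_iters are already in scope and compares the loop result against the matrix step from the question):
% Sketch: the loop version and the matrix formulation should agree
m = length(y);
theta_mat = zeros(size(X, 2), 1);
for iter = 1:num_iters
    theta_mat = theta_mat - alpha * (1/m) * (X'*X*theta_mat - X'*y);
end
disp(max(abs(theta - theta_mat))); % expect ~0 when theta comes from the loop above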