Gradient descent search implemented in MATLAB: theta1 incorrect

I am studying the Machine Learning course taught by Prof. Andrew Ng (this is the link).
I am trying to implement the first assignment of the course, Exercise 2: Linear Regression, a supervised learning problem:
1. Implement gradient descent using a learning rate of alpha = 0.07. Since Matlab/Octave indexes vectors starting from 1 rather than 0, you'll probably use theta(1) and theta(2) in Matlab/Octave to represent theta0 and theta1.
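For reference, the batch gradient descent update being implemented here is theta_j := theta_j - alpha*(1/m)*sum over i of (h_theta(x(i)) - y(i))*x(i,j), applied simultaneously for j = 1, 2, where h_theta(x(i)) = theta(1)*x(i,1) + theta(2)*x(i,2) is the hypothesis.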
I wrote the following MATLAB code to solve the problem:
clc
clear
close all
x = load('ex2x.dat');
y = load('ex2y.dat');
figure % open a new figure window
plot(x, y, '*');
ylabel('Height in meters')
xlabel('Age in years')
m = length(y); % store the number of training examples
x = [ones(m, 1), x]; % Add a column of ones to x
theta = [0 0];
temp = 0; temp2 = 0;
h = [];
alpha = 0.07; n = 2; % alpha = learning rate
for i = 1:m
    temp1 = 0;
    for j = 1:n
        h(j) = theta(j)*x(i,j);
        temp1 = temp1 + h(j);
    end
    temp = temp + (temp1 - y(i));
    temp2 = temp2 + ((temp1 - y(i))*(x(i,1) + x(i,2)));
end
theta(1) = theta(1) - (alpha*(1/m)*temp);
theta(2) = theta(2) - (alpha*(1/m)*temp2);
I get the answer:
>> theta
theta =
0.0745 0.4545
Here, 0.0745 is the exact answer, but the second value is not accurate.
The actual answer is:
theta =
0.0745 0.3800
The data set is provided in the link. Can anyone help me fix the problem?

You get wrong results because you wrote long, unnecessary code that is easily prone to bugs; avoiding that is exactly why we have MATLAB's vectorized operations:
clear
x = load('d:/ex2x.dat');
y = load('d:/ex2y.dat');
figure(1), clf, plot(x, y, '*'), xlabel('Age in years'), ylabel('Height in meters')
m = length(y); % store the number of training examples
x = [ones(m, 1), x]; % Add a column of ones to x
theta=[0,0]; alpha=0.07;
residuals = x*theta' - y; % same as: sum(x.*theta,2) - y
theta = theta - alpha*mean(residuals.*x);
disp(theta)
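For what it's worth, the specific bug in the loop version above is in the accumulator for theta(2): it multiplies the residual by (x(i,1) + x(i,2)), when the gradient for each parameter should use only its own feature. A minimal fix to the two accumulation lines, keeping everything else the same:
temp = temp + (temp1 - y(i))*x(i,1); % gradient for theta(1); x(i,1) is 1
temp2 = temp2 + (temp1 - y(i))*x(i,2); % gradient for theta(2)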


ode45 converges to correct curve shape, but with wrong solution

Thanks in advance for your help. I'm not looking for an explicit solution to my problem, but rather to have my probably obvious errors pointed out.
I have been plugging away at solving a system of non-linear, first order ODEs in MATLAB. The system was solved numerically in this study: http://web.math.ku.dk/~moller/e04/bio/ludwig78.pdf
I have been following the documentation for ode45, and have code that runs.
I have done all of the work to understand and recreate the model from scratch. I presented the qualitative part for a class project. What I am doing now is taking that project a step further by solving the system in MATLAB with Runge-Kutta (or any method that works). Finally, I want to dive into the theory behind the numerical analysis to find out why the chosen method converges.
Here is a plot of the numerically solved system, which I am trying to re-create:
I have found that I can create a plot with roughly the same shape, but there are several problems:
The time-scale over which the change occurs is three times that of the above plot.
The range of function values is vastly wrong.
The desired shapes only occur if I tweak the initial conditions to be significantly different from what is shown near t=0 above.
So what I'm looking for is a reason for these discrepancies. I've checked my system of ODEs and parameter values so many times my eyes are blurry. Perhaps I am missing something conceptually?
Code:
% System Parameters:
r_b = 1.52;
k_b = 355;
alph = 1.11;
bet = 43200;
r_e = 0.92;
k_e = 1;
p = 0.00195;
r_s = 0.095;
k_s = 25440;
tspan = [0 200];
init = [1 1 1];
[t, Y] = ode45(@(t,y) odefcn(t, y, r_b, k_b, alph, bet, r_e, k_e, p, r_s, k_s), tspan, init);
subplot(3,1,1);
plot(t,Y(:,1),'b');
title('Budworm Density');
subplot(3,1,2)
plot(t,Y(:,2),'g');
title('Branch Density');
subplot(3,1,3);
plot(t,Y(:,3),'r');
title('Foliage Condition');
function dydt = odefcn(t, y, r_b, k_b, alph, bet, r_e, k_e, p, r_s, k_s)
    dydt = [r_b*y(1)*(1 - y(1)/(k_b*y(2))) - bet*(y(1)^2/((alph*y(2))^2 + y(1)^2));
            r_s*y(2)*(1 - (y(2)*k_e)/(k_s*y(3)));
            r_e*y(3)*(1 - (y(3)/k_e)) - p*y(1)/y(2)];
end
I don't see anything wrong with your code as such. But I think there are some subtleties involved in producing the figure which are not well explained in the paper.
1) The S axis is scaled (it says 'relative' in the label). I believe they've scaled S by k_s. I think you also need to scale the parameter p (set p = p*k_s) else the final term in the equation for E will be tiny and the E population won't decrease over the required timescales.
2) I think they must have enforced some lower limit on E, to avoid dividing by 0. You can see in the figure that E->0 first, but in your equation for S, if this happened then you would be dividing by 0 and the solver wouldn't converge.
Putting these together, the following slight modification of your code produces a result more similar to that in the paper:
% System Parameters:
r_b = 1.52;
k_b = 355;
alph = 1.11;
bet = 43200;
r_e = 0.92;
k_e = 1;
p = 0.00195;
r_s = 0.095;
k_s = 25440;
% Scale p with k_s
p = p*k_s;
tspan = [0 50]; % [0 200];
init = [1e-16 0.075*k_s 1]; % [1 1 1];
[t, Y] = ode45(@(t,y) odefcn(t, y, r_b, k_b, alph, bet, r_e, k_e, p, r_s, k_s), tspan, init);
% To scale before plotting, so everything fits on a 0->1 y axis.
maxB = 500;
S_scale = k_s;
figure('Position', [200 200 1000 600]);
hold on;
plot(t,Y(:,1)/maxB,'b');
plot(t,Y(:,2)/(S_scale),'g');
plot(t,Y(:,3),'r');
ylim([0, 1]);
hold off;
box on;
legend({['Budworm Density, B / ', num2str(maxB)], 'Branch Density, S / 0.75', 'Foliage Condition, E'}, ...
'Location', 'eastoutside')
function dydt = odefcn(t, y, r_b, k_b, alph, bet, r_e, k_e, p, r_s, k_s)
    % Place lower limit on E
    E = max(y(3), 1e-5);
    dydt = [r_b*y(1)*(1 - y(1)/(k_b*y(2))) - bet*(y(1)^2/((alph*y(2))^2 + y(1)^2));
            r_s*y(2)*(1 - (y(2)*k_e)/(k_s*E));
            r_e*E*(1 - (E/k_e)) - p*y(1)/y(2)];
end
There is a lot of sensitivity to the initial conditions.
A further tweak gets you closer still to the original figure, but I'm not sure if this is just a hack: in the first equation, replace k_b*y(2) with just k_b. Without this, the Budworm density becomes too big before decreasing. The new plot is below.
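As a code fragment, the tweak described above (my paraphrase, not part of the original code) would change the first row of dydt in odefcn to:
r_b*y(1)*(1 - y(1)/k_b) - bet*(y(1)^2/((alph*y(2))^2 + y(1)^2));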

How to plot decision boundary from linear SVM after PCA in Matlab?

I have run a linear SVM on a large dataset; however, in order to reduce the number of dimensions I first performed a PCA, then ran the SVM on a subset of the component scores (the first 650 components, which explained 99.5% of the variance). Now I want to plot the decision boundary in the original variable space using the beta weights and bias from the SVM created in PCA space, but I can't figure out how to project the bias term from the SVM into the original variable space. I've written a demo using the fisher iris data to illustrate:
clear; clc; close all
% load data
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
Y = species(inds);
mu = mean(X)
% perform the PCA
[eigenvectors, scores] = pca(X);
% train the svm
SVMModel = fitcsvm(scores,Y);
% plot the result
figure(1)
gscatter(scores(:,1),scores(:,2),Y,'rgb','osd')
title('PCA space')
% now plot the decision boundary
betas = SVMModel.Beta;
m = -betas(1)/betas(2); % my gradient
b = -SVMModel.Bias; % my y-intercept
f = @(x) m.*x + b; % my linear equation
hold on
fplot(f,'k')
hold off
axis equal
xlim([-1.5 2.5])
ylim([-2 2])
% inverse transform the PCA
Xhat = scores * eigenvectors';
Xhat = bsxfun(@plus, Xhat, mu);
% plot the result
figure(2)
hold on
gscatter(Xhat(:,1),Xhat(:,2),Y,'rgb','osd')
% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
bHat = b * eigenvectors';
bHat = bHat + mu; % I know I have to add mu somewhere...
bHat = bHat/betaHat(2);
bHat = sum(sum(bHat)); % sum to reduce the matrix to a single value
% the correct value of bHat should be 6.3962
f = @(x) mHat.*x + bHat;
fplot(f,'k')
hold off
axis equal
title('Recovered feature space')
xlim([3 7])
ylim([0 4])
Any guidance on how I'm calculating bHat incorrectly would be much appreciated.
Just in case anyone else comes across this problem: the bias term can be used to find the y-intercept of the boundary in PCA space, yint = -SVMModel.Bias/betas(2). That y-intercept is just another point in space, [0 yint], which can be recovered/unrotated by inverse transforming it through the PCA. The recovered point can then be used to solve the linear equation y = mx + b (i.e., b = y - mx). So the code should be:
% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
yint = b/betas(2); % y-intercept in PCA space
yintHat = [0 yint] * eigenvectors'; % recover in original space
yintHat = yintHat + mu;
bHat = yintHat(2) - mHat*yintHat(1); % solve the linear equation
% the correct value of bHat is now 6.3962

Finding the best monotonic curve fit

Edit: Some time after I asked this question, an R package called MonoPoly (available here) came out that does exactly what I want. I highly recommend it.
I have a set of points I want to fit a curve to. The curve must be monotonic (never decreasing in value) i.e. the curve can only go upward or stay flat.
I originally had been polyfitting my results and this had been working great until I found a particular dataset. The polyfit for data in this dataset was non-monotonic.
I did some research and found a possible solution in this post:
Use lsqlin. Constrain the first derivative to be non-negative at both
ends of the domain of interest.
I'm coming from a programming rather than math background so this is a little beyond me. I don't know how to constrain the first derivative to be non-negative as he said. Also, I think in my case I need a curve so I should use lsqcurvefit but I don't know how to constrain it to produce monotonic curves.
Further research turned up this post recommending lsqcurvefit but I can't figure out how to use the important part:
Try this non-linear function F(x) also. You use it together with
lsqcurvefit but it require a start guess on the parameters. But it is
a nice analytic expression to give as a semi-empirical formula in a
paper or a report.
%Monotone function F(x), with c0,c1,c2,c3 variational constants:
F(x) = c3 + exp(c0 - c1^2/(4*c2))*sqrt(pi)*Erfi((c1 + 2*c2*x)/(2*sqrt(c2)))/(2*sqrt(c2))
%Erfi(x) = erf(i*x) (see Mathematica), but the function looks much like x^3
%its derivative f(x) is a probability density, f(x) >= 0:
f(x) = dF/dx = exp(c0 + c1*x + c2*x.^2)
I must have a monotonic curve, but I'm not sure how to produce one, even with all of this information. Would a random number be enough for a "start guess"? Is lsqcurvefit best? How can I use it to produce a best-fitting monotonic curve?
Thanks
Here is a simple solution using lsqlin. The derivative constraint is enforced at each data point; this could be easily modified if needed.
Two coefficient matrices are needed: one (C) for the least-squares error calculation and one (A) for the derivatives at the data points.
% Following lsqlin's notations
%--------------------------------------------------------------------------
% PRE-PROCESSING
%--------------------------------------------------------------------------
% for reproducibility
rng(125)
degree = 3;
n_data = 10;
% dummy data
x = rand(n_data,1);
d = rand(n_data,1) + linspace(0,1,n_data).';
% limit on derivative - in each data point
b = zeros(n_data,1);
% coefficient matrix
C = nan(n_data, degree+1);
% derivative coefficient matrix
A = nan(n_data, degree+1);
% loop over polynomial terms
for ii = 1:degree+1
    C(:,ii) = x.^(ii-1);
    A(:,ii) = (ii-1)*x.^(ii-2);
end
%--------------------------------------------------------------------------
% FIT - LSQ
%--------------------------------------------------------------------------
% Unconstrained
% p1 = pinv(C)*d
p1 = fliplr((C\d).')
p2 = polyfit(x,d,degree)
% Constrained
p3 = fliplr(lsqlin(C,d,-A,b).')
%--------------------------------------------------------------------------
% PLOT
%--------------------------------------------------------------------------
xx = linspace(0,1,100);
plot(x, d, 'x')
hold on
plot(xx, polyval(p1, xx))
plot(xx, polyval(p2, xx),'--')
plot(xx, polyval(p3, xx))
legend('data', 'lsq-pseudo-inv', 'lsq-polyfit', 'lsq-constrained', 'Location', 'southoutside')
xlabel('X')
ylabel('Y')
For the specified input the fitted curves:
Actually this code is more general than what you requested, since the degree of polynomial can be changed as well.
EDIT: enforce the derivative constraint at additional points
The issue pointed out in the comments arises because the derivative checks are enforced only at the data points; between those, no checks are performed. Below is a solution to alleviate this problem. The idea: convert the problem to an unconstrained optimization by using a penalty term.
Note that it uses a term pen to penalize violation of the derivative check, so the result is not a true least-squares error solution. Additionally, the result depends on the penalty function.
function lsqfit_constr
% Following lsqlin's notations
%--------------------------------------------------------------------------
% PRE-PROCESSING
%--------------------------------------------------------------------------
% for reproducibility
rng(125)
degree = 3;
% data from comment
x = [0.2096 -3.5761 -0.6252 -3.7951 -3.3525 -3.7001 -3.7086 -3.5907].';
d = [95.7750 94.9917 90.8417 62.6917 95.4250 89.2417 89.4333 82.0250].';
n_data = length(d);
% number of equally spaced points to enforce the derivative
n_deriv = 20;
xd = linspace(min(x), max(x), n_deriv);
% limit on derivative - in each data point
b = zeros(n_deriv,1);
% coefficient matrix
C = nan(n_data, degree+1);
% derivative coefficient matrix
A = nan(n_deriv, degree+1);
% loop over polynomial terms
for ii = 1:degree+1
    C(:,ii) = x.^(ii-1);
    A(:,ii) = (ii-1)*xd.^(ii-2);
end
%--------------------------------------------------------------------------
% FIT - LSQ
%--------------------------------------------------------------------------
% Unconstrained
% p1 = pinv(C)*d
p1 = (C\d);
lsqe = sum((C*p1 - d).^2);
p2 = polyfit(x,d,degree);
% Constrained
[p3, fval] = fminunc(@error_fun, p1);
% correct format for polyval
p1 = fliplr(p1.')
p2
p3 = fliplr(p3.')
fval
%--------------------------------------------------------------------------
% PLOT
%--------------------------------------------------------------------------
xx = linspace(-4,1,100);
plot(x, d, 'x')
hold on
plot(xx, polyval(p1, xx))
plot(xx, polyval(p2, xx),'--')
plot(xx, polyval(p3, xx))
% legend('data', 'lsq-pseudo-inv', 'lsq-polyfit', 'lsq-constrained', 'Location', 'southoutside')
xlabel('X')
ylabel('Y')
%--------------------------------------------------------------------------
% NESTED FUNCTION
%--------------------------------------------------------------------------
function e = error_fun(p)
% squared error
sqe = sum((C*p - d).^2);
der = A*p;
% penalty term - it is crucial to fine tune it
pen = -sum(der(der<0))*10*lsqe;
e = sqe + pen;
end
end
Gradient-free methods might be used to solve the problem by exactly enforcing the derivative constraint, for example:
[p3, fval] = fminsearch(@error_fun, p_ini);
%--------------------------------------------------------------------------
% NESTED FUNCTION
%--------------------------------------------------------------------------
function e = error_fun(p)
% squared error
sqe = sum((C*p - d).^2);
der = A*p;
if any(der<0)
pen = Inf;
else
pen = 0;
end
e = sqe + pen;
end
fmincon with non-linear constraint might be a better choice.
I leave it to you to work out the details and tune the algorithms. I hope this is sufficient.
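As a pointer in that direction, here is a minimal sketch (my addition, reusing C, A, b, and d from the function above): since the derivative constraint at the points xd is linear in the coefficients p, fmincon can enforce it exactly through its linear inequality arguments, with no penalty term needed.
p0 = C\d; % unconstrained fit as the start guess
obj = @(p) sum((C*p - d).^2); % least-squares objective
p4 = fmincon(obj, p0, -A, b); % enforces -A*p <= 0, i.e. derivative >= 0 at xd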

Finding solution to Cauchy prob. in Matlab

I need some help with finding the solution to a Cauchy problem in Matlab.
The problem:
y'' + 10xy = 0, y(0) = 7, y'(0) = 3
Also I need to plot the graph.
I wrote some code, but I'm not sure whether it's correct or not, particularly in the function section.
Can somebody check it? If it's not correct, where did I make a mistake?
Here is the separate function in another .m file:
function dydx = funpr12(x,y)
dydx = y(2)+10*x*y
end
Main:
%% Cauchy problem
clear all, clc
xint = [0,5]; % interval
y0 = [7;3]; % initial conditions
% numerical solution using ode45
sol = ode45(@funpr12,xint,y0);
xx = [0:0.01:5]; % vector of x values
y = deval(sol,xx); % vector of y values
plot(xx,y(1,:),'r', 'LineWidth',3)
legend('y1(x)')
xlabel('x')
ylabel('y(x)')
I get this graph:
ode45 and its related ilk are only designed to solve first-order differential equations of the form y' = .... You need to do a bit of work if you want to solve second-order differential equations.
Specifically, you'll have to represent your problem as a system of first-order differential equations. You currently have the following ODE:
y'' + 10xy = 0, y(0) = 7, y'(0) = 3
If we rearrange this to solve for y'', we get:
y'' = -10xy, y(0) = 7, y'(0) = 3
Next, you'll want to use two variables... call it y1 and y2, such that:
y1 = y
y2 = y'
The way you have built your code for ode45, the initial conditions that you specified are exactly this: the initial value of y and the initial value of its first derivative y'.
Taking the derivative of each side gives:
y1' = y'
y2' = y''
Now, doing some final substitutions we get this final system of first-order differential equations:
y1' = y2
y2' = -10*x*y1
If you're having trouble seeing this, simply remember that y1 = y, y2 = y' and finally y2' = y'' = -10*x*y = -10*x*y1. Therefore, you now need to build your function so that it looks like this:
function dydx = funpr12(x,y)
    y1 = y(2);
    y2 = -10*x*y(1);
    dydx = [y1; y2]; % ode45 expects a column vector
end
Remember that the vector y is a two element vector which represents the value of y and the value of y' respectively at each time point specified at x. I would also argue that making this an anonymous function is cleaner. It requires less code:
funpr12 = #(x,y) [y(2); -10*x*y(1)];
Now go ahead and solve it (using your code):
%%// Cauchy problem
clear all, clc
funpr12 = @(x,y) [y(2); -10*x*y(1)]; %// Change
xint = [0,5]; % interval
y0 = [7;3]; % initial conditions
% numerical solution using ode45
sol = ode45(funpr12,xint,y0); %// Change - already a handle
xx = [0:0.01:5]; % vector of x values
y = deval(sol,xx); % vector of y values
plot(xx,y(1,:),'r', 'LineWidth',3)
legend('y1(x)')
xlabel('x')
ylabel('y(x)')
Take note that the output when evaluating the solution to the differential equation with deval will be a matrix with two rows. The first row is the solution to the system while the second row is the derivative of the solution. As such, you'll want to plot the first row, which is what the plot syntax is doing.
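If you also want to inspect the derivative y'(x), a quick sketch (my addition) is to plot the second row of deval's output as well:
hold on
plot(xx, y(2,:), 'b--') % second row is y'(x)
hold off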
I get this plot now:

How to plot two 1-dimensional Gaussian distributions together with the classification boundary [Matlab]?

I have two classes (normally distributed), C1 and C2, each defined by their mean and standard deviation. I want to be able to visualize the pdf plot of the normal distributions and the classification boundary between the two. Currently I have the code to plot the distributions, but I'm not sure how to go about plotting the decision boundary. Any ideas would be appreciated. I have included a sample of what I want to plot.
Many thanks!
This is what I came up with:
% Generate some example data
mu1 = -0.5; sigma1 = 0.7; mu2 = 0.8; sigma2 = 0.5;
x = linspace(-8, 8, 500);
y1 = normpdf(x, mu1, sigma1);
y2 = normpdf(x, mu2, sigma2);
% Plot it
figure; plot(x, [y1; y2])
hold on
% Detect intersection between curves; choose threshold so you get the whole
% intersection (0.0001 should do unless your sigmas are very large)
ind = y1 .* y2 > 0.0001;
% Find the minimum values in range
minVals = min([y1(ind); y2(ind)]);
if ~isempty(minVals)
area(x(ind), minVals)
end
I don't know if this is the best way to do what you want, but it seems to work.
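If you want the exact crossing points rather than a threshold-based detection, note that setting normpdf(x, mu1, sigma1) equal to normpdf(x, mu2, sigma2) and taking logs yields a quadratic in x, so the boundary can be computed directly. A minimal sketch (my addition, reusing mu1, sigma1, mu2, sigma2 from above; xline needs R2018b+):
a = 1/(2*sigma2^2) - 1/(2*sigma1^2);
b = mu1/sigma1^2 - mu2/sigma2^2;
c = mu2^2/(2*sigma2^2) - mu1^2/(2*sigma1^2) + log(sigma2/sigma1);
xb = roots([a b c]); % up to two boundary points
for k = 1:numel(xb)
    xline(xb(k), 'k--'); % vertical line at each crossing
end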