Why "theta" in this code is NaN? [duplicate] - matlab

This question already has an answer here:
Machine learning - Linear regression using batch gradient descent
(1 answer)
Closed 6 years ago.
I'm learning neural networks (linear regression) in MATLAB for my research project, and this is part of the code I use.
The problem is that the value of "theta" is NaN and I don't know why.
Could you tell me where the error is?
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
theta = zeros(2, 1); % initialize fitting parameters
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - ((alpha/m)*((X*theta)-y)' * X)';
end
end
% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

The function you have is fine. But the sizes of X and theta are incompatible. In general, if size(X) is [N, M], then size(theta) should be [M, 1].
So I would suggest replacing the line
theta = zeros(2, 1);
with
theta = zeros(size(X, 2), 1);
Alternatively, X should have as many columns as theta has elements; so in this example, size(X) should be [133, 2].
Also, you should move that initialization out of the function, to before you call it.
For example, the following code does not return NaN if you remove the initialization of theta from the function.
X = rand(133, 1); % or rand(133, 2)
y = rand(133, 1);
theta = zeros(size(X, 2), 1); % initialize fitting parameters
% run gradient descent
theta = gradientDescent(X, y, theta, 0.1, 1500)
EDIT: This is in response to comments below.
Your problem is due to the gradient descent algorithm not converging. To see it yourself, plot J_history, which should never increase if the algorithm is stable. You can compute J_history by inserting the following line inside the for-loop in the function gradientDescent:
J_history(iter) = mean((X * theta - y).^2);
In your case (i.e. with the given data file and alpha = 0.01), J_history increases exponentially. This is shown in the plot below. Note that the y-axis is in logarithmic scale.
This is a clear sign of instability in gradient descent.
There are two ways to eliminate this problem.
Option 1. Use smaller alpha. alpha controls the rate of gradient descent. If it is too large, the algorithm is unstable. If it is too small, the algorithm takes a long time to reach the optimal solution. Try something like alpha = 1e-8 and go from there. For example, alpha = 1e-8 results in the following cost function:
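To reproduce such a cost-history plot yourself, here is a minimal sketch (assuming the in-function initialization of theta has been removed as suggested above, and that J_history is recorded inside the loop as shown):
theta0 = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescent(X, y, theta0, 1e-8, 1500);
figure;
semilogy(J_history); % on a log scale, instability shows up as a rising line
xlabel('iteration'); ylabel('J');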
Option 2. Use feature scaling to reduce the magnitude of the inputs. One way of doing this is called standardization. The following is an example of using standardization and the resulting cost function:
data = xlsread('v & t.xlsx');
% standardize the first column: subtract its mean, divide by its standard deviation
data(:,1) = (data(:,1)-mean(data(:,1)))/std(data(:,1));
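If more than one column of the file holds features, a minimal sketch that standardizes every column at once (an assumed generalization of the line above):
% Give every column of data zero mean and unit variance.
data = bsxfun(@rdivide, bsxfun(@minus, data, mean(data)), std(data));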

Related

Interpolation using chebyshev points

Interpolate the Runge function of Example 10.6 at Chebyshev points for n from 10 to 170
in increments of 10. Calculate the maximum interpolation error on the uniform evaluation
mesh x = -1:.001:1 and plot the error vs. polynomial degree as in Figure 10.8 using
semilogy. Observe spectral accuracy.
The Runge function is given by: f(x) = 1 / (1 + 25x^2)
My code so far:
x = -1:0.001:1;
n = 170;
i = 10:10:170;
cx = cos(((2*i + 1)/(2*(n+1)))*pi); %chebyshev pts
y = 1 ./ (1 + 25*x.^2); %true fct
%chebyshev polynomial, don't know how to construct using matlab
yc = polyval(c, x); %graph of approx polynomial fct
plot(x, yc);
mErr = (1 / ((2.^n).*(n+1)!))*%n+1 derivative of f evaluated at max x in [-1,1], not sure how to do this
%plotting stuff
I know very little MATLAB, so I am struggling to construct the interpolating polynomial. I did some Google work, but I was confused by the available functions, as I didn't find one that simply takes in the points and returns the interpolating polynomial. I am also a bit confused about whether, in this case, I should be doing i = 0:1:n with n = 10:10:170, or if n is fixed here. Any help is appreciated, thank you
Since you know very little about MATLAB, I will try to explain everything step by step:
First, to visualize the Runge function, you can type:
f = @(x) 1./(1+25*x.^2); % Runge function
% plot Runge function over [-1,1]
x = -1:1e-3:1;
y = f(x);
figure;
plot(x,y); title('Runge function'); xlabel('x'); ylabel('y');
The @(x) part of the code is a function handle, a very useful feature of MATLAB. Notice the function is properly vectorized, so it can receive either a scalar or an array as an argument. The plot function is straightforward.
To understand the Runge phenomenon, consider a vector of 10 linearly spaced points on [-1,1] and use these points to obtain the interpolating (Lagrange) polynomial. You get the following:
% 10 linearly spaced points
xc = linspace(-1,1,10);
yc = f(xc);
p = polyfit(xc,yc,9); % gives the coefficients of the interpolating polynomial of degree 9
hold on; plot(xc,yc,'o',x,polyval(p,x));
The polyfit function does a polynomial curve fit: it obtains the coefficients of the interpolating polynomial, given the points x, y and the degree of the polynomial n. You can easily evaluate the polynomial at other points with the polyval function.
Observe that, close to the ends of the domain, you get an oscillating polynomial and the interpolation is not a good approximation of the function. As a matter of fact, you can plot the absolute error, comparing the value of the function f(x) and the interpolating polynomial p(x):
plot(x,abs(y-polyval(p,x))); xlabel('x');ylabel('|f(x)-p(x)|');title('Error');
This error can be reduced if, instead of using a linearly spaced vector, you use other points for the interpolation. A good choice is the Chebyshev nodes, which reduce the error. As a matter of fact, notice that:
% find 10 Chebyshev nodes and mark them on the plot
n = 10;
k = 1:10; % iterator
xc = cos((2*k-1)/2/n*pi); % Chebyshev nodes
yc = f(xc); % function evaluated at Chebyshev nodes
hold on;
plot(xc,yc,'o')
% find polynomial to interpolate data using the Chebyshev nodes
p = polyfit(xc,yc,n-1); % gives the coefficients of the polynomial of degree n-1 = 9
plot(x,polyval(p,x),'--'); % plot polynomial
legend('Runge function','Chebyshev nodes','interpolating polynomial','location','best')
Notice how the error is reduced near the ends of the domain: you no longer get that highly oscillatory behaviour of the interpolating polynomial. If you plot the error, you will observe:
plot(x,abs(y-polyval(p,x))); xlabel('x');ylabel('|f(x)-p(x)|');title('Error');
If you now change the number of Chebyshev nodes, you will get an even better approximation. A small modification of the code lets you run it again for different numbers of nodes. You can store the maximum error and plot it as a function of the number of nodes:
n=1:20; % number of nodes
% pre-allocation for speed
e_ln = zeros(1,length(n)); % error for the linearly spaced interpolation
e_cn = zeros(1,length(n)); % error for the chebyshev nodes interpolation
for ii=1:length(n)
    % linearly spaced vector
    x_ln = linspace(-1,1,n(ii)); y_ln = f(x_ln);
    p_ln = polyfit(x_ln,y_ln,n(ii)-1);
    e_ln(ii) = max( abs( y-polyval(p_ln,x) ) );
    % Chebyshev nodes
    k = 1:n(ii); x_cn = cos((2*k-1)/2/n(ii)*pi); y_cn = f(x_cn);
    p_cn = polyfit(x_cn,y_cn,n(ii)-1);
    e_cn(ii) = max( abs( y-polyval(p_cn,x) ) );
end
figure
plot(n,e_ln,n,e_cn);
xlabel('no of points'); ylabel('maximum absolute error');
legend('linearly spaced','Chebyshev nodes','location','best')
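Finally, the exercise asks for the error versus polynomial degree on a semilogy plot, to observe spectral accuracy. A minimal sketch: set n = 10:10:170 before the loop above, as in the assignment (polyfit will warn about ill-conditioning at the higher degrees, but the trend remains visible), rerun the loop, and then:
figure;
semilogy(n, e_cn, 'o-'); % a steep, roughly straight descent indicates spectral accuracy
xlabel('polynomial degree'); ylabel('maximum absolute error');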

Gradient Descent Overshooting and Cost Blowing Up when used for Regularized Logistic Regression

I'm using MATLAB to code Regularized Logistic Regression and am using Gradient Descent to discover the parameters. All is based on Andrew Ng's Coursera Machine Learning course. I am trying to code the cost function from Andrew's notes/videos. I am not entirely sure if I'm doing it right.
The main problem is that if the number of iterations gets too large, my cost seems to blow up. This happens regardless of whether I normalize (converting all the data to be between 0 and 1). This problem also causes the produced decision boundary to shrink (underfit?). Below are three sample results, where the decision boundaries of GD are compared against that of MATLAB's fminunc.
As can be seen, the cost shoots up when the number of iterations increases. Could it be that I incorrectly coded the cost? Or is there indeed a possibility that Gradient Descent can overshoot? If it helps, I am providing my code. The code I used to calculate the cost history is:
costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );, based on the equation below:
J(\theta) = -\frac{1}{m}\left[ y^T \log(h_\theta(x)) + (1-y)^T \log(1 - h_\theta(x)) \right] + \frac{\lambda}{2m} \sum_{j=2}^{n} \theta_j^2
The full code is given below. Note that I have called other functions as well in this code. Would appreciate any pointers! :) Thank you in advance!
% REGULARIZED Logistic Regression with Gradient Descent
clc; clear all; close all;
dataset = load('ex2data2.txt');
x = dataset(:,1:end-1); y = dataset(:,end); m = length(y);
% Mapping the features (includes adding the intercept term)
x = mapFeature(x(:,1), x(:,2)); % Change to polynomial of the 6th degree
% Define the initial thetas. Same as the number of features, including
% the newly added intercept term (1s)
theta = zeros(size(x,2),1) + 0.05;
initial_theta = theta; % will be used later...
% Set lambda equals to 1
lambda = 1;
% calculate theta transpose x and also the hypothesis h_x
alpha = 0.005;
itr = 120000; % number of iterations set to 120K
for i = 1:itr
    ttrx = x * theta; % theta transpose x
    h_x = 1 ./ (1 + exp(-ttrx)); % sigmoid hypothesis
    error = h_x - y;
    % the gradient a.k.a. the derivative of J(\theta)
    for j = 1:length(theta)
        if j == 1
            gradientA(j,1) = 1/m * (error)' * x(:,j);
            theta(j) = theta(j) - alpha * gradientA(j,1);
        else
            gradientA(j,1) = (1/m * (error)' * x(:,j)) - (lambda/m)*theta(j);
            theta(j) = theta(j) - alpha * gradientA(j,1);
        end
    end
    costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
end
[cost, grad] = costFunctionReg(initial_theta, x, y, lambda);
% Using MATLAB's built-in function fminunc to minimize the cost function
% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 500);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[thetafm, cost] = fminunc(@(t)(costFunctionReg(t, x, y, lambda)), initial_theta, options);
close all;
plotDecisionBoundary_git(theta, x, y); % based on GD
plotDecisionBoundary_git(thetafm, x, y); % based on fminunc
figure;
plot(1:itr, costHistory(:), '--r');
title('The cost history based on GD');

Gradient descent search implemented in MATLAB: theta1 incorrect

I studied the Machine learning course taught by Prof. Andrew Ng. This is the link
I am trying to implement the 1st assignment of this course, Exercise 2: Linear Regression, based upon a supervised learning problem:
1. Implement gradient descent using a learning rate of alpha = 0.07. Since Matlab/Octave indexes vectors starting from 1 rather than 0, you'll probably use theta(1) and theta(2) in Matlab/Octave to represent theta0 and theta1.
I write down a matlab code to solve this problem:
clc
clear
close all
x = load('ex2x.dat');
y = load('ex2y.dat');
figure % open a new figure window
plot(x, y, '*');
ylabel('Height in meters')
xlabel('Age in years')
m = length(y); % store the number of training examples
x = [ones(m, 1), x]; % Add a column of ones to x
theta = [0 0];
temp=0,temp2=0;
h=[];
alpha=0.07;n=2; %alpha=learning rate
for i=1:m
    temp1=0;
    for j=1:n
        h(j)=theta(j)*x(i,j);
        temp1=temp1+h(j);
    end
    temp=temp+(temp1-y(i));
    temp2=temp2+((temp1-y(i))*(x(i,1)+x(i,2)));
end
theta(1)=theta(1)-(alpha*(1/m)*temp);
theta(2)=theta(2)-(alpha*(1/m)*temp2);
I get the answer :
>> theta
theta =
0.0745 0.4545
Here, 0.0745 is the correct answer, but the 2nd value is not accurate.
The actual answer is:
theta =
0.0745 0.3800
The data set is provided in the link. Can anyone help me fix the problem?
You get wrong results because you write long, unnecessary code that is easily prone to bugs; that is exactly why we have MATLAB's vectorized operations. (The specific bug: when accumulating temp2 you multiply the residual by (x(i,1)+x(i,2)), whereas the gradient for theta(2) should use x(i,2) alone.)
clear
x = load('d:/ex2x.dat');
y = load('d:/ex2y.dat');
figure(1), clf, plot(x, y, '*'), xlabel('Age in years'), ylabel('Height in meters')
m = length(y); % store the number of training examples
x = [ones(m, 1), x]; % Add a column of ones to x
theta=[0,0]; alpha=0.07;
residuals = x*theta' - y ; %same as: sum(x.*theta,2)-y
theta = theta - alpha*mean(residuals.*x);
disp(theta)
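The snippet above performs the single gradient step that part 1 of the exercise asks for. A minimal sketch of extending it to many iterations (the count of 1500 is an assumed choice, not part of the exercise):
for iter = 1:1500
    residuals = x*theta' - y;                 % prediction error per example
    theta = theta - alpha*mean(residuals.*x); % simultaneous update of both thetas
end
disp(theta)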

[Octave] Using fminunc is not always giving a consistent solution

I am trying to find the coefficients in an equation to model the step response of a motor, which is of the form 1-e^(-x). The equation I'm using to model is of the form
a(1)*t^2 + a(2)*t^3 + a(3)*t^4 + ...
(It is derived in a research paper used to solve for motor parameters)
Sometimes using fminunc to find the coefficients works out okay and I get a good result that matches the training data fairly well. Other times the returned coefficients are horrible (the fitted output goes extremely higher than it should, orders of magnitude off). This especially happens once I start using higher-order terms: any model that uses x^8 or higher (x^9, x^10, x^11, etc.) always produces bad results.
Since it works sometimes, I can't think why my implementation would be wrong. I have tried fminunc both with and without providing the gradients, and there is no difference. I've looked into using other functions to solve for the coefficients, like polyfit, but polyfit requires terms raised from power 1 up to the highest order, whereas the model I'm using has its lowest power at 2.
Here is the main code:
clear;
%Overall Constants
max_power = 7;
%Loads in data
%data = load('TestData.txt');
load testdata.mat
%Sets data into variables
indep_x = data(:,1); Y = data(:,2);
%number of data points
m = length(Y);
%X is a matrix with the independent variable
exps = [2:max_power];
X_prime = repmat(indep_x, 1, max_power-1); %Repeats columns of the indep var
X = bsxfun(@power, X_prime, exps);
%Initializes theta to rand vals
init_theta = rand(max_power-1,1);
%Sets up options for fminunc
options = optimset( 'MaxIter', 400, 'Algorithm', 'quasi-newton');
%fminunc minimizes the output of the cost function by changing the theta parameters
[theta, cost] = fminunc(@(t)(costFunction(t, X, Y)), init_theta, options)
%
Y_line = X * theta;
figure;
hold on; plot(indep_x, Y, 'or');
hold on; plot(indep_x, Y_line, 'bx');
And here is costFunction:
function [J, Grad] = costFunction (theta, X, Y)
%# of training examples
m = length(Y);
%Initialize Cost and Grad-Vector
J = 0;
Grad = zeros(size(theta));
%Produces an output based off the current values of theta
model_output = X * theta;
%Computes the squared error for each example then adds them to get the total error
squared_error = (model_output - Y).^2;
J = (1/(2*m)) * sum(squared_error);
%Computes the gradients for each theta t
for t = 1:size(theta, 1)
    Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t));
end
endfunction
Any help or advice would be appreciated.
Try adding regularization to your costFunction:
function [J, Grad] = costFunction (theta, X, Y, lambda)
m = length(Y);
%Initialize Cost and Grad-Vector
J = 0;
Grad = zeros(size(theta));
%Produces an output based off the current values of theta
model_output = X * theta;
%Computes the squared error for each example then adds them to get the total error
squared_error = (model_output - Y).^2;
J = (1/(2*m)) * sum(squared_error);
% Regularization
J = J + lambda*sum(theta(2:end).^2)/(2*m);
%Computes the gradients for each theta t
regularizator = lambda*theta/m;
% overwrite 1st element i.e the one corresponding to theta zero
regularizator(1) = 0;
for t = 1:size(theta, 1)
    Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t)) + regularizator(t);
end
endfunction
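Since the signature now takes lambda, the fminunc call in the main script has to pass it through the anonymous function; a minimal sketch (lambda = 1 is just the suggested starting value):
lambda = 1; % regularization strength, tuned as described below
[theta, cost] = fminunc(@(t)(costFunction(t, X, Y, lambda)), init_theta, options)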
The regularization parameter lambda controls the strength of the penalty on large coefficients rather than the learning rate itself. Start with lambda = 1. The greater the value of lambda, the more strongly large (high-order) coefficients are damped. Increase lambda if the behavior you describe persists; you may need to increase the number of iterations if lambda gets high.
You may also consider normalizing your data, and some heuristic for initializing theta: setting all theta to 0.1 may be better than random values. If nothing else, it will provide better reproducibility from training to training.
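A minimal sketch of such a normalization, standardizing each column of X before the fit (an assumed approach, not from the original post; bsxfun keeps it compatible with older MATLAB/Octave):
% Zero-mean, unit-variance columns keep t^2 ... t^7 on comparable scales.
X = bsxfun(@rdivide, bsxfun(@minus, X, mean(X)), std(X));
init_theta = 0.1 * ones(max_power-1, 1); % deterministic initialization for reproducibility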

How to fit a curve to a damped sine wave in matlab

I have some measurements that should form a damped sine wave, but I can't find any information on how to make (if possible) a good damped sine wave fit with MATLAB's curve fitting tool.
Here's what I get using a "Smoothing spline":
Image http://s21.postimg.org/yznumla1h/damped.png.
Edit 1:
Here's what I got using the "custom equation" option:
Edit 2:
I've uploaded the data to pastebin in csv format where the first column is the amplitude and the second is the time.
The damped sine function can be created using the following code (A, f, phi and a are the amplitude, frequency, phase and damping constant; the values below are assumed examples so the snippet runs on its own):
A = 1; f = 5; phi = 0; a = 3; % example parameters (assumed values)
f = f*2*pi; % convert to angular frequency
t = 0:.001:1;
y = A*sin(f*t + phi).*exp(-a*t);
plot(t,y);
axis([0 1 -2.2 2.2]);
Now you can use "cftool" from MATLAB, load your data, then set the equation type to custom and enter the formula of the damped sine function. Here you can see what I found so far...
I think the distribution of the data makes it hard for the fitting tool to do a good fit. It may need more data, or more evenly distributed data. In addition, there are other tools and methods as well; for instance, see docstoc.com/docs/74524947/Mathcad-Damped-sine-fit-mcd for now.
For all these methods, which actually search for a local optimum over the fitting parameters, the most important thing is the initial condition. I believe that if you choose a good initial condition (initial guess), the fitting tool works well.
Good Luck
I wouldn't use the curve fitting toolbox for this; I'd use a curve-fitting function, e.g. lsqcurvefit. Here is an example taken from something I did a while back:
% Define curve model functions
expsin = @(a, omega, phi, tau, t) a * sin(omega * t + phi) .* exp(-tau * t);
lsqexpsin = @(p, t) expsin(p(1), p(2), p(3), p(4), t);
% Setup data params
a = 1; % gain
f = 10; % frequency
phi = pi/2; % phase angle
tau = 0.9252523;% time constant
fs = 100; % sample rate
N = fs; % length
SNR = 10; % signal to noise ratio
% Generate time vector
dt = 1/fs;
t = (0:N-1)*dt;
omega = 2 * pi * f; % angular freq
noiseGain = 10^(-SNR/20); % gain for given SNR
% Generate dummy data: decaying sinusoid plus noise
x = expsin(a, omega, phi, tau, t);
noise = noiseGain * rand(size(x));
noise = noise - mean(noise);
x = x + noise;
close all; figure; hold on;
plot(t, x, 'k-', 'LineWidth', 2);
% Count zero crossings to find frequency
zCross = find(x(1:end-1) .* x(2:end) < 0);
T = mean(diff(zCross) * dt) * 2;
fEstimate = 1 / T;
omegaEstimate = 2 * pi * fEstimate;
% Fit model to data
init = [0.5, omegaEstimate, 0, 0.5];
[newparams, err] = lsqcurvefit(lsqexpsin, init, t, x);
plot(t, lsqexpsin(newparams, t))
Here some data with known parameters is generated and some random noise added; the data is plotted. The frequency is first estimated from the zero crossings, then the parameters [a, omega, phi, tau] are fitted to the data with lsqcurvefit, and a curve with the fitted parameters is plotted on top.
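To apply the same fit to the posted measurements, a minimal sketch (the file name is assumed; per the post, column 1 is the amplitude and column 2 the time):
% Load the CSV data and fit the same four-parameter model.
data = csvread('damped.csv'); % assumed file name for the pastebin data
xd = data(:,1)'; td = data(:,2)';
zc = find(xd(1:end-1) .* xd(2:end) < 0); % zero crossings
T = mean(diff(zc)) * mean(diff(td)) * 2; % estimated period
init = [max(abs(xd)), 2*pi/T, 0, 1]; % [a, omega, phi, tau] starting guesses
params = lsqcurvefit(lsqexpsin, init, td, xd);
figure; plot(td, xd, 'k.', td, lsqexpsin(params, td), 'r-');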