[Octave] Using fminunc does not always give a consistent solution - matlab

I am trying to find the coefficients of an equation that models the step response of a motor, which is of the form 1-e^x. The equation I'm using for the model is of the form
a(1)*t^2 + a(2)*t^3 + a(3)*t^4 + ...
(It is derived in a research paper used to solve for motor parameters)
Sometimes using fminunc to find the coefficients works out okay and I get a good result that matches the training data fairly well. Other times the returned coefficients are horrible (the predicted output is orders of magnitude higher than it should be). This happens especially once I start using higher-order terms: any model that uses t^8 or higher (t^9, t^10, t^11, etc.) always produces bad results.
Since it works sometimes, I can't see why my implementation would be wrong. I have tried fminunc both with and without supplying the gradients, and there is no difference. I've looked into other functions to solve for the coefficients, like polyfit, but polyfit fits a full polynomial with every power from 0 up to the chosen degree, while my model's lowest power is 2.
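For reference, since the model is linear in the coefficients a, an ordinary least-squares solve gives them directly and is a useful baseline to compare against fminunc; a minimal sketch, assuming indep_x and Y hold the data columns as in the code below:
% Direct least squares: the model is linear in the coefficients, so
% a = argmin ||X*a - Y||^2 has a closed-form solution via backslash.
exps = 2:7;                          % powers t^2 .. t^7
X = bsxfun(@power, indep_x, exps);   % m x 6 design matrix
a_ls = X \ Y;                        % QR-based least-squares solve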
Here is the main code:
clear;
%Overall Constants
max_power = 7;
%Loads in data
%data = load('TestData.txt');
load testdata.mat
%Sets data into variables
indep_x = data(:,1); Y = data(:,2);
%number of data points
m = length(Y);
%X is a matrix with the independent variable
exps = [2:max_power];
X_prime = repmat(indep_x, 1, max_power-1); %Repeats columns of the indep var
X = bsxfun(@power, X_prime, exps);
%Initializes theta to rand vals
init_theta = rand(max_power-1,1);
%Sets up options for fminunc
options = optimset( 'MaxIter', 400, 'Algorithm', 'quasi-newton');
%fminunc minimizes the output of the cost function by changing the theta parameters
[theta, cost] = fminunc(@(t)(costFunction(t, X, Y)), init_theta, options)
%
Y_line = X * theta;
figure;
hold on; plot(indep_x, Y, 'or');
hold on; plot(indep_x, Y_line, 'bx');
And here is costFunction:
function [J, Grad] = costFunction (theta, X, Y)
%# of training examples
m = length(Y);
%Initialize Cost and Grad-Vector
J = 0;
Grad = zeros(size(theta));
%Produces the model output for the current values of theta
model_output = X * theta;
%Computes the squared error for each example then adds them to get the total error
squared_error = (model_output - Y).^2;
J = (1/(2*m)) * sum(squared_error);
%Computes the gradients for each theta t
for t = 1:size(theta, 1)
Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t));
end
endfunction
Any help or advice would be appreciated.

Try adding regularization to your costFunction:
function [J, Grad] = costFunction (theta, X, Y, lambda)
m = length(Y);
%Initialize Cost and Grad-Vector
J = 0;
Grad = zeros(size(theta));
%Produces the model output for the current values of theta
model_output = X * theta;
%Computes the squared error for each example then adds them to get the total error
squared_error = (model_output - Y).^2;
J = (1/(2*m)) * sum(squared_error);
% Regularization
J = J + lambda*sum(theta(2:end).^2)/(2*m);
%Computes the gradients for each theta t
regularizator = lambda*theta/m;
% overwrite the 1st element, i.e. the one corresponding to theta zero
regularizator(1) = 0;
for t = 1:size(theta, 1)
Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t)) + regularizator(t);
end
endfunction
The regularization parameter lambda controls how strongly large coefficients are penalized. Start with lambda=1. The greater the value of lambda, the more the coefficients are damped and the slower the cost will improve. Increase lambda if the behavior you describe persists. You may need to increase the number of iterations if lambda gets high.
You may also consider normalizing your data, and some heuristic for initializing theta - setting all theta to 0.1 may be better than random. If nothing else, it'll give better reproducibility from training to training, as sketched below.
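A minimal sketch of both suggestions, assuming the X, Y, and options from the question and the costFunction above (lambda is passed through an anonymous function):
% Standardize each polynomial column so the high powers of t do not dominate.
mu_X = mean(X, 1);
sigma_X = std(X, 0, 1);
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu_X), sigma_X);
% Fixed initialization for reproducibility, as suggested above.
init_theta = 0.1 * ones(size(X_norm, 2), 1);
% Pass lambda into the regularized costFunction via an anonymous function.
lambda = 1;
[theta, cost] = fminunc(@(t) costFunction(t, X_norm, Y, lambda), init_theta, options);
Note that the resulting theta applies to the normalized columns, not to the raw powers of t.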

Related

Using fminsearch for parameter estimation

I am trying to compute the log maximum-likelihood estimate for a Gaussian distribution in order to estimate its parameters.
I know that Matlab has a built-in function that does this by fitting a Gaussian distribution, but I need to do this with logMLE in order to expand this method later for other distributions.
So here is the log-likelihood function for the Gaussian distribution:
log L(mu, sigma) = sum_i [ -0.5*log(2*pi*sigma^2) - (r_i - mu)^2 / (2*sigma^2) ]
And I used the code below to estimate the parameters for a set of variables (r) with fminsearch, but the search does not converge and I don't fully understand where the problem is:
clear
clc
close all
%make random numbers with gaussian dist
r=[2.39587291079469
1.57478022109723
-0.442284350603745
4.39661178526569
7.94034385633171
7.52208574723178
5.80673144943155
-3.11338531920164
6.64267230284774
-2.02996003947964];
% mu=2 sigma=3
%introduce f
f=@(x,r)-(sum((-0.5.*log(2*3.14.*(x(2))))-(((r-(x(2))).^2)./(2.*(x(1))))))
fun = @(x)f(x,r);
% starting point
x0 = [0,0];
[y,fval,exitflag,output] = fminsearch(fun,x0)
f =
@(x,r)-(sum((-0.5.*log(2*3.14.*(x(2))))-(((r-(x(2))).^2)./(2.*(x(1))))))
Exiting: Maximum number of function evaluations has been exceeded
- increase MaxFunEvals option.
Current function value: 477814.233176
y = 1×2
1.0e-03 *
0.2501 -0.0000
fval = 4.7781e+05 + 1.5708e+01i
exitflag = 0
output =
iterations: 183
funcCount: 400
algorithm: 'Nelder-Mead simplex direct search'
message: 'Exiting: Maximum number of function evaluations has been exceeded↵ - increase MaxFunEvals option.↵ Current function value: 477814.233176 ↵'
Rewrite f as follows:
function y = g(x, r)
% Negative Gaussian log-likelihood (up to a constant): x(1) = mu, x(2) = sigma
n = length(r);
log_part = 0.5.*n.*log(x(2).^2);
sum_part = sum((r - x(1)).^2)./(2.*x(2).^2);
y = log_part + sum_part;
end
Use fmincon instead of fminsearch because the standard deviation is
always a positive number.
Set the lower bound of the standard deviation to zero.
The entire code is as follows:
%make random numbers with gaussian dist
r=[2.39587291079469
1.57478022109723
-0.442284350603745
4.39661178526569
7.94034385633171
7.52208574723178
5.80673144943155
-3.11338531920164
6.64267230284774
-2.02996003947964];
% mu=2 sigma=3
fun = @(x)g(x, r);
% starting point (x(2) = sigma must be positive so that g is defined)
x0 = [0, 1];
% bounds
lb = [-inf, 0];
ub = [inf, inf];
[y, fval] = fmincon(fun,x0,[],[],[],[],lb,ub, []);
function y = g(x, r)
% Negative Gaussian log-likelihood (up to a constant): x(1) = mu, x(2) = sigma
n = length(r);
log_part = 0.5.*n.*log(x(2).^2);
sum_part = sum((r - x(1)).^2)./(2.*x(2).^2);
y = log_part + sum_part;
end
Solution
y = [3.0693 3.8056]
To verify, you can use mle() directly.
The code is quite simple:
y = mle(r,'distribution','normal')
Solution
y = [3.0693 3.8056]
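As a cross-check, the Gaussian MLE has a closed form, so no optimizer is needed at all; a two-line sketch:
mu_hat = mean(r);                          % MLE of the mean
sigma_hat = sqrt(mean((r - mu_hat).^2));   % MLE of sigma (1/n, not 1/(n-1))
These should reproduce the [3.0693 3.8056] result above.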

Gradient Descent Overshooting and Cost Blowing Up when used for Regularized Logistic Regression

I'm using MATLAB to code Regularized Logistic Regression and am using Gradient Descent to discover the parameters. All is based on Andrew Ng's Coursera Machine Learning course. I am trying to code the cost function from Andrew's notes/videos. I am not entirely sure if I'm doing it right.
The main problem is that if the number of iterations gets too large, my cost seems to blow up. This happens regardless of whether I normalize the data or not (converting all the data to be between 0 and 1). The problem also causes the decision boundary to shrink (underfit?). Below are three sample results, where the decision boundaries from GD are compared against those of Matlab's fminunc.
As can be seen, the cost shoots up when the number of iterations increases. Could it be that I incorrectly coded the cost? Or is there indeed a possibility that Gradient Descent can overshoot? If it helps, I am providing my code. The code I used to calculate the cost history is:
costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
which is based on the regularized cost
J(theta) = -(1/m) * sum( y.*log(h_x) + (1-y).*log(1-h_x) ) + (lambda/(2*m)) * sum(theta(2:end).^2)
The full code is given below. Note that I have called other functions as well in this code. Would appreciate any pointers! :) Thank you in advance!
% REGULARIZED Logistic Regression with Gradient Descent
clc; clear all; close all;
dataset = load('ex2data2.txt');
x = dataset(:,1:end-1); y = dataset(:,end); m = length(y);
% Mapping the features (includes adding the intercept term)
x = mapFeature(x(:,1), x(:,2)); % Change to polynomial of the 6th degree
% Define the initial thetas. Same as the number of features, including
% the newly added intercept term (1s)
theta = zeros(size(x,2),1) + 0.05;
initial_theta = theta; % will be used later...
% Set lambda equals to 1
lambda = 1;
% calculate theta transpose x and also the hypothesis h_x
alpha = 0.005;
itr = 120000; % number of iterations set to 120K
for i = 1:itr
ttrx = x * theta; % theta transpose x
h_x = 1 ./ (1 + exp(-ttrx)); % sigmoid hypothesis
error = h_x - y;
% the gradient a.k.a. the derivative of J(\theta)
for j = 1:length(theta)
if j == 1
gradientA(j,1) = 1/m * (error)' * x(:,j);
theta(j) = theta(j) - alpha * gradientA(j,1);
else
gradientA(j,1) = (1/m * (error)' * x(:,j)) - (lambda/m)*theta(j);
theta(j) = theta(j) - alpha * gradientA(j,1);
end
end
costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
end
[cost, grad] = costFunctionReg(initial_theta, x, y, lambda);
% Using MATLAB's built-in function fminunc to minimize the cost function
% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 500);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[thetafm, cost] = fminunc(@(t)(costFunctionReg(t, x, y, lambda)), initial_theta, options);
close all;
plotDecisionBoundary_git(theta, x, y); % based on GD
plotDecisionBoundary_git(thetafm, x, y); % based on fminunc
figure;
plot(1:itr, costHistory(:), '--r');
title('The cost history based on GD');
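For reference, here is the standard vectorized update from the course's formulation, as a sketch using the same variable names as above; note that the penalty term is added to the gradient for j >= 2, whereas the loop above subtracts it:
% One vectorized gradient-descent step with L2 regularization;
% theta(1) (the intercept) is deliberately left unpenalized.
h_x = 1 ./ (1 + exp(-(x * theta)));             % sigmoid hypothesis
grad = (1/m) * x' * (h_x - y);                  % unregularized gradient
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);
theta = theta - alpha * grad;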

Optimize a definite integral

In the integral
I want to optimize the function Dt, as I know the end result of the integral. I have expressions for k1 and k0 in terms of k2 and N, and it is k2 and N that I would like to optimize. They are constrained to lie between certain values. I have it all set up in my code, but I don't know how to tell the genetic algorithm to optimize an integral function. Is there something I'm missing here? The integral is usually evaluated numerically, but I am trying to go backwards: assuming I know the answer, find the input parameters.
EDIT:
All right, so here's my code. I know the integral MUST add up to a known value, and I know that value, so I need to optimize the variables under that constraint. I have created an objective function y = integral - DT. I kept theta as a symbolic variable because it is the variable being integrated to give DT.
function y = objective(k)
% Define constants
AU = astroConstants(2);
mu = astroConstants(4);
% Define start and finish parameters for the exponential sinusoid.
r1 = AU; % Initial radius
psi = pi/2; % Final polar angle of Mars/finish transfer
phi = pi/2;
r2 = 1.5*AU;
global k1
k1 = sqrt( ( (log(r1/r2) + sin(k(1)*(psi + 2*pi*k(2)))*tan(0)/k(1)) / (1-cos(k(1)*(psi+2*pi*k(2)))) )^2 + tan(0)^2/k(1)^2 );
k0 = r1/exp(k1*sin(phi));
syms theta
R = k0*exp(k1*sin(k(1)*theta + phi));
theta_dot = sqrt((mu/(R^3))*1/((tan(0))^2 + k1*(k(1))^2*sin(k(1)*theta + phi) + 1));
z = 1/theta_dot;
y = int(z, theta, 0,(psi+2*pi*k(2))) - 1.3069e08;
global x
x=y;
end
my k's are constrained, and the following is the constraint function. I'm hoping what I have done here is tell it that the function MUST = 0.
function [c,c_eq] = myconstraints(k)
global k1 x
c = [norm(k1*(k(1)^2))-1 -norm(k1*(k(1)^2))];
c_eq =[x];
end
And finally, my ga code looks like this. Honestly, I've been playing with it all night and getting error message after error message - ranging from "constraint function must return real value" to "error in fcnvectorizer" and "unable to convert expression into double array", with the last two coming after I've removed the constraints.
clc; clear;
ObjFcn = @objective;
nvars = 2
LB = [0 2];
UB = [1 7];
ConsFcn = @myconstraints;
[k,fval] = ga(ObjFcn,nvars,[],[],[],[],LB,UB,ConsFcn);
I've been stuck on this problem for weeks and have gotten nowhere, even with searching through literature.
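A possible direction (a sketch only, reusing mu, k0, k1, phi, psi, and k from objective above): ga needs an objective that returns a plain double, so the symbolic int can be replaced with integral over vectorized function handles inside objective:
% Numeric version of the integrand: function handles instead of syms.
R = @(theta) k0 .* exp(k1 .* sin(k(1).*theta + phi));
theta_dot = @(theta) sqrt( (mu ./ R(theta).^3) ./ (tan(0)^2 + k1 .* k(1)^2 .* sin(k(1).*theta + phi) + 1) );
z = @(theta) 1 ./ theta_dot(theta);
y = integral(z, 0, psi + 2*pi*k(2)) - 1.3069e08;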

Gradient descent in linear regression goes wrong

I actually want to use a linear model to fit a set of 'sin' data, but the loss grows larger with each iteration. Is there any problem with my code below? (gradient descent method)
Here is my code in Matlab
m=20;
rate = 0.1;
x = linspace(0,2*pi,20);
y = sin(x);
x = [ones(1,length(x));x];
w = rand(1,2);
total_loss = [];
for i=1:500
h = w*x;
loss = sum((h-y).^2)/m/2
total_loss = [total_loss loss];
gradient = (h-y)*x'./m ;
w = w - rate.*gradient;
end
Here is the data I want to fit
There isn't a problem with your code. With your current framework, if you can define data in the form of y = m*x + b, then this code is more than adequate. I actually ran it through a few tests where I defined an equation of a line and added some Gaussian random noise to it (amplitude = 0.1, mean = 0, std. dev = 1).
However, one problem I will mention is that if you take a look at your sinusoidal data, you define a domain between [0,2*pi]. Over that interval the sine rises and then falls, so distant x values get mapped to the same y value: for example, y = 0 at both x = 0 and x = 2*pi. This high variability will not bode well with linear regression, so one suggestion I have is to restrict your domain to something like [0, pi]. Another reason why it probably doesn't converge is that the learning rate you chose is too high. I'd set it to something low like 0.01. As you mentioned in your comments, you already figured that out!
However, if you want to fit non-linear data using linear regression, you're going to have to include higher order terms to account for the variability. As such, try including second order and/or third order terms. This can simply be done by modifying your x matrix like so:
x = [ones(1,length(x)); x; x.^2; x.^3];
If you recall, the hypothesis function can be represented as a summation of linear terms:
h(x) = theta0 + theta1*x1 + theta2*x2 + ... + thetan*xn
In our case, each theta term would build a higher order term of our polynomial. x2 would be x^2 and x3 would be x^3. Therefore, we can still use the definition of gradient descent for linear regression here.
I'm also going to control the random generation seed (via rng) so that you can produce the same results I have gotten:
clear all;
close all;
rng(123123);
total_loss = [];
m = 20;
x = linspace(0,pi,m); %// Change
y = sin(x);
w = rand(1,4); %// Change
rate = 0.01; %// Change
x = [ones(1,length(x)); x; x.^2; x.^3]; %// Change - Second and third order terms
for i=1:500
h = w*x;
loss = sum((h-y).^2)/m/2;
total_loss = [total_loss loss];
% gradient is now in a different expression
gradient = (h-y)*x'./m ; % sum all in each iteration, it's a batch gradient
w = w - rate.*gradient;
end
If we try this, we get for w (your parameters):
>> format long g;
>> w
w =
Columns 1 through 3
0.128369521905694 0.819533906064327 -0.0944622478526915
Column 4
-0.0596638117151464
My final loss after this point is:
loss =
0.00154350916582836
This means that our equation of the line is:
y = 0.12 + 0.819x - 0.094x^2 - 0.059x^3
If we plot this equation of the line with your sinusoidal data, this is what we get:
xval = x(2,:);
plot(xval, y, xval, polyval(fliplr(w), xval))
legend('Original', 'Fitted');
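As a quick sanity check, polyfit solves the same cubic least-squares problem directly (assuming xval and y as above):
p = polyfit(xval, y, 3);   % coefficients, highest power first
w_check = fliplr(p);       % reorder to compare against w = [c0 c1 c2 c3]
With enough iterations, the gradient-descent weights should approach this direct solution.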

Matlab Finding Damped Sine wave decay factor for a given frequency

How do I find the decay constant of a damped sine wave for a given frequency in MATLAB?
t=0:1e-6:0.1;
f= 500000;
y=sin(2*pi*f*t).*exp(-d*t);
I want to solve the above equation for "d".
You first need to find the envelope of your oscillating function (the function that amplitude-modulates your sine). This can be done e.g. by rectifying the signal and then low-pass filtering, but I chose to do a quick-and-dirty running maximum. After you have found the envelope, there are various ways to fit an exponential function. I again chose a quick-and-dirty trick: a first-order polyfit to the log of the envelope. The code below works for the simple example you gave, but it might not work if you have an offset, if you choose n wrong, etc. It also won't give the best possible result in the case of a noisy measurement.
fsamp = 1e5;
tmax = 0.1;
t=0:1/fsamp:tmax;
f = 12e3; %should be smaller than fsamp/2!
tau = 0.0765;
y=sin(2 * pi * f * t) .* exp(-t / tau);
%calculate running maximum
n = 20; %number of points to take max over
nblocks = floor(length(t) / n);
trun = mean(reshape(t(1:n*nblocks), n, nblocks), 1); %n-point mean
envelope = max(reshape(y(1:n*nblocks), n, nblocks), [], 1); %n-point max
%quick and dirty exponential fit, not the proper way in case of noise
p = polyfit(trun, log(envelope), 1);
tau_fit = -1/p(1);
k_fit = exp(p(2));
plot(t, y, trun, envelope, 'or', t, k_fit * exp(-t / tau_fit), '-k')
title(sprintf('true tau = %g, fitted tau = %g', tau, tau_fit))
Note that with exponential decay, it is more common to define the time-constant tau = 1 / d.
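If you have the Signal Processing Toolbox (or Octave's signal package), the analytic-signal envelope is an alternative to the running maximum; a minimal sketch on the same y and t:
% Envelope as the magnitude of the analytic signal.
env = abs(hilbert(y));
idx = 50:length(t)-50;                    % skip the edge artifacts of hilbert()
p2 = polyfit(t(idx), log(env(idx)), 1);   % same log-linear fit as above
tau_hilbert = -1 / p2(1);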