I have to write this piece of code for the lrCostFunction assignment in the Machine Learning course on Coursera, but I still don't understand why
theta1 = [0 ; theta(2:end, :)];
is written. What does theta1 mean?
h = sigmoid(X * theta)
theta1 = [0 ; theta(2:end, :)];
p = lambda * (theta1' * theta1)/(2 * m);
J = ((-y)'*log(h)-(1-y)'*log(1-h))/m + p;
grad = (X' * (h - y) + lambda * theta1)/ m;
In logistic regression, theta (θ) is a vector representing the parameters (or weights) of the linear function of x.
Now, given a training set, one way to learn the parameters theta (θ) is to make h(x) close to y, at least for the training examples we have. Closeness is measured by a cost function (or error function) J(θ), defined for each value of θ, which we want to minimize.
theta1 is not an initial value for gradient descent. It is a copy of theta whose first element is replaced by zero, and it is used only in the regularization term, so that the intercept parameter theta(1) is not penalized. In gradient descent, each parameter is updated using the partial derivative of J(θ), since we want to minimize it: θ_j := θ_j - \alpha * ∂J(θ)/∂θ_j.
Here \alpha is the learning rate with which the gradient descent algorithm runs. It starts from an initial theta (typically all zeros) and repeatedly applies the update above to every theta parameter. In the code, grad holds these partial derivatives.
EDIT:
Explaining the code:
theta1 = [0 ; theta(2:end, :)];
The above code is MATLAB code. Here theta1 is an array (a column vector). It is created by vertical concatenation (the ; operator inside the brackets) of two parts:
1) 0
2) theta(2:end, :)
The first is the scalar value 0.
The second takes all rows of the array theta except the first, unchanged. (Note that theta is the input array to LRCOSTFUNCTION(theta, X, y, lambda).) The result has the same size as theta, with its first entry set to 0.
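To see what theta1 does in practice, here is a small sketch of the same cost computation on made-up numbers. The toy X, y, theta and lambda values are assumptions for illustration, and the anonymous sigmoid only stands in for the course's sigmoid function.

```matlab
% Sketch: regularized logistic-regression cost and gradient on toy data.
% All numbers below are made up for illustration.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the course's sigmoid.m

X = [ones(4, 1), [0.5; -1.2; 2.0; 0.3]];   % first column is the intercept term
y = [1; 0; 1; 0];
theta = [0.7; -0.4];
lambda = 1;
m = length(y);

h = sigmoid(X * theta);

% Copy of theta with the intercept weight replaced by 0, so that
% theta(1) adds nothing to the penalty or to the penalty's gradient.
theta1 = [0; theta(2:end, :)];

p = lambda * (theta1' * theta1) / (2 * m);            % penalty term
J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m + p;   % regularized cost
grad = (X' * (h - y) + lambda * theta1) / m;          % regularized gradient

% The same penalty written without theta1, for comparison
penalty_direct = lambda * sum(theta(2:end) .^ 2) / (2 * m);
unreg_grad = X' * (h - y) / m;                        % gradient without penalty
```

Comparing p with penalty_direct, and grad(1) with unreg_grad(1), shows that the zero in theta1 is what keeps the intercept out of the regularization.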
I am having some difficulties implementing logistic regression, in terms of how I should proceed stepwise. This is how I am implementing it so far:
First, I take theta to have as many elements as there are features and make it an n*1 vector of zeros. Then I use this theta to compute the following
htheta = sigmoid(theta' * X');
theta = theta - (alpha/m) * sum (htheta' - y)'*X
Now using the theta computed in the first step to compute the cost function
J= 1/m *((sum(-y*log(htheta))) - (sum((1-y) * log(1 - htheta)))) + lambda/(2*m) * sum(theta).^2
In the end computing the gradient
grad = (1/m) * sum ((sigmoid(X*theta) - y')*X);
As I am taking theta to be zero, I am getting the same value of J throughout the vector. Is this the right output?
You are computing the gradient in the last step, while it has been computed before in the computation of the new theta. Moreover, your definition of the cost function contains a regularization parameter, but this is not incorporated in the gradient computation. A working version without the regularization:
% generate dummy data for testing
y=randi(2,[10,1])-1;
X=[ones(10,1) randn([10,1])];
% initialize
alpha = 0.1;
theta = zeros(1,size(X,2));
J = NaN(100,1);
% loop a fixed number of times => can improve this by stopping when the
% cost function no longer decreases
htheta = sigmoid(X*theta');
for n=1:100
grad = X' * (htheta-y); % gradient
theta = theta - alpha*grad'; % update theta
htheta = sigmoid(X*theta');
J(n) = sum(-y'*log(htheta)) - sum((1-y)' * log(1 - htheta)); % cost function
end
If you now plot the cost function, you will see (except for randomness) that it converges after about 15 iterations.
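If you do want the regularization from the question in the loop, a minimal sketch could look like the following. The toy data, alpha, lambda and num_iters are assumed values, sigmoid is defined inline, and unlike the loop above this version divides by m and treats theta as a column vector.

```matlab
% Sketch: batch gradient descent for regularized logistic regression.
% Data and hyperparameters are made up for illustration.
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [ones(6, 1), [-2; -1; -0.5; 0.5; 1; 2]];   % intercept column + one feature
y = [0; 0; 1; 0; 1; 1];
m = length(y);

alpha = 0.1;        % learning rate
lambda = 1;         % regularization strength
num_iters = 200;

theta = zeros(size(X, 2), 1);
J = zeros(num_iters, 1);

for n = 1:num_iters
    h = sigmoid(X * theta);
    theta1 = [0; theta(2:end)];    % do not penalize the intercept
    % cost at the current theta, including the penalty
    J(n) = (-y' * log(h) - (1 - y)' * log(1 - h)) / m ...
           + lambda * (theta1' * theta1) / (2 * m);
    % the gradient includes the penalty's contribution as well
    grad = (X' * (h - y) + lambda * theta1) / m;
    theta = theta - alpha * grad;
end
```

The point is that the cost and the gradient use the same penalty, so each step follows the slope of the function that is actually recorded in J.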
I am trying to find the coefficients in an equation to model the step response of a motor which is of the form 1-e^x. The equation I'm using to model is of the form
a(1)*t^2 + a(2)*t^3 + a(3)*t^4 + ...
(It is derived in a research paper used to solve for motor parameters)
Sometimes using fminunc to find the coefficients works out okay: I get a good result that matches the training data fairly well. Other times the returned coefficients are horrible, and the model output is orders of magnitude larger than it should be. This happens especially once I start using higher order terms: any model that uses x^8 or higher (x^9, x^10, x^11, etc.) always produces bad results.
Since it works sometimes, I can't think why my implementation would be wrong. I have tried fminunc while providing the gradients and while also not providing the gradients yet there is no difference. I've looked into using other functions to solve for the coefficients, like polyfit, but in that instance it has to have terms that are raised from 1 to the highest order term, but the model I'm using has its lowest power at 2.
Here is the main code:
clear;
%Overall Constants
max_power = 7;
%Loads in data
%data = load('TestData.txt');
load testdata.mat
%Sets data into variables
indep_x = data(:,1); Y = data(:,2);
%number of data points
m = length(Y);
%X is a matrix with the independent variable
exps = [2:max_power];
X_prime = repmat(indep_x, 1, max_power-1); %Repeats columns of the indep var
X = bsxfun(@power, X_prime, exps);
%Initializes theta to rand vals
init_theta = rand(max_power-1,1);
%Sets up options for fminunc
options = optimset( 'MaxIter', 400, 'Algorithm', 'quasi-newton');
%fminunc minimizes the output of the cost function by changing the theta parameters
[theta, cost] = fminunc(@(t)(costFunction(t, X, Y)), init_theta, options)
%
Y_line = X * theta;
figure;
hold on; plot(indep_x, Y, 'or');
hold on; plot(indep_x, Y_line, 'bx');
And here is costFunction:
function [J, Grad] = costFunction (theta, X, Y)
%# of training examples
m = length(Y);
%Initialize Cost and Grad-Vector
J = 0;
Grad = zeros(size(theta));
%Produces an output based on the current values of theta
model_output = X * theta;
%Computes the squared error for each example then adds them to get the total error
squared_error = (model_output - Y).^2;
J = (1/(2*m)) * sum(squared_error);
%Computes the gradients for each theta t
for t = 1:size(theta, 1)
Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t));
end
endfunction
Any help or advice would be appreciated.
Try adding regularization to your costFunction:
function [J, Grad] = costFunction (theta, X, Y, lambda)
m = length(Y);
%Initialize Cost and Grad-Vector
J = 0;
Grad = zeros(size(theta));
%Produces an output based on the current values of theta
model_output = X * theta;
%Computes the squared error for each example then adds them to get the total error
squared_error = (model_output - Y).^2;
J = (1/(2*m)) * sum(squared_error);
% Regularization
J = J + lambda*sum(theta(2:end).^2)/(2*m);
%Computes the gradients for each theta t
regularizator = lambda*theta/m;
% overwrite 1st element i.e the one corresponding to theta zero
regularizator(1) = 0;
for t = 1:size(theta, 1)
Grad(t) = (1/m) * sum((model_output-Y) .* X(:, t)) + regularizator(t);
end
endfunction
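The regularization parameter lambda controls how strongly large coefficients are penalized; it is not a learning rate. Start with lambda=1. The greater the value of lambda, the more the coefficients are pulled toward zero, which tames the wild swings of the high-order terms. Increase lambda if the behavior you describe persists. You may need to increase the number of iterations if lambda gets high. Remember to pass lambda through the anonymous function in the fminunc call as well.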
You may also consider normalization of your data, and some heuristic for initializing theta - setting all theta to 0.1 may be better than random. If nothing else it'll provide better reproducibility from training to training.
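As a rough sketch of the normalization idea, each column of X can be divided by its largest absolute value before the fit, and the fitted coefficients mapped back afterwards. The time samples, the scale and theta_s names, and the placeholder for the fminunc result are all assumptions for illustration; scaling by the column maximum is only one of several possible choices.

```matlab
% Sketch: scale the polynomial columns so that they have comparable size.
t = linspace(0, 5, 20)';              % made-up time samples
exps = 2:7;
X = bsxfun(@power, t, exps);          % columns t.^2 ... t.^7

scale = max(abs(X), [], 1);           % one scale factor per column
Xs = bsxfun(@rdivide, X, scale);      % every column now peaks at 1

% Fixed starting point instead of rand, as suggested above
init_theta = 0.1 * ones(numel(exps), 1);

% ... run fminunc with Xs instead of X; here the result is simply
% stood in for by the starting point ...
theta_s = init_theta;

% Map the coefficients back so that X * theta equals Xs * theta_s
theta = theta_s ./ scale';
```

Because only a per-column factor is applied (no mean subtraction), the model keeps its form without a constant term, and the predictions X * theta and Xs * theta_s are the same.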
I am implementing a batch gradient descent on Matlab. I have a problem with the update step of theta.
theta is a vector of two components (two rows).
X is a matrix containing m rows (number of training samples) and n=2 columns (number of features).
y is a vector with m rows.
During the update step, I need to set each theta(i) to
theta(i) = theta(i) - (alpha/m)*sum((X*theta-y).*X(:,i))
This can be done with a for loop, but I can't figure out how to vectorize it (because of the X(:,i) term).
Any suggestion?
Looks like you are trying to do a simple matrix multiplication, the thing MATLAB is supposedly best at.
theta = theta - (alpha/m) * (X' * (X*theta-y));
In addition to the answer given by Mad Physicist, the following can also be applied.
theta = theta - (alpha/m) * sum( (X * theta - y).* X )';
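A quick way to convince yourself that these forms match the loop is to run all of them on a small example. The numbers below are made up, and the sum version is written with bsxfun so that it also runs on MATLAB releases without implicit expansion (the one-liner above relies on it, which as far as I know means R2016b or later, or Octave).

```matlab
% Sketch: loop vs. vectorized update on made-up data.
X = [1 2; 1 3; 1 5; 1 7];
y = [3; 4; 6; 9];
theta = [0.5; -0.5];
alpha = 0.01;
m = size(X, 1);

% Loop version: compute the error once so all theta(i) are updated
% simultaneously from the old theta.
err = X * theta - y;
theta_loop = theta;
for i = 1:numel(theta)
    theta_loop(i) = theta(i) - (alpha / m) * sum(err .* X(:, i));
end

% Matrix-product version
theta_mat = theta - (alpha / m) * (X' * (X * theta - y));

% Sum version with explicit expansion of the error column
theta_sum = theta - (alpha / m) * sum(bsxfun(@times, X * theta - y, X), 1)';
```

All three give the same theta, because X' * err is exactly the column of sums that the loop builds one entry at a time.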
I have to write a MATLAB function that takes a first order ordinary differential equation of the form y’(t) = a*y(t) + b with an initial point y(t0)=y0 as inputs and calculates the first 15 points of the solution. It should also draw the solution curve for those 15 points.
The equation that we want to solve is y’(t) = 4*y(t)+1 with the initial point y(0)=0.
For this function I wrote the following code, but it gives me an error about y. How should I implement the Euler function correctly? I also could not work out how to draw the solution curve.
function E=euler(f,y)
%Input - f is the function entered as a string 'f'
% - a and b are the left and right endpoints
% - ya is the initial condition y(a)
% - M is the number of steps
%Output - E=[T' Y'] where T is the vector of abscissas and
% Y is the vector of ordinates
h=0.1;
y(0)=0;
for j=0:15
Y(j+1)=Y(j)+h*feval(4*(y(t)+1));
end
Patch:
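h = 0.1;
Y(1) = 0; % indices start at 1, so Y(1) holds y(0) = 0
for j = 1:14 % 14 Euler steps give 15 points including the initial one
Y(j + 1) = Y(j) + h * (4 * Y(j) + 1); % right-hand side of y' = 4*y + 1
end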
Well, I am not sure about the mathematical part, but the indices need to start at 1. Unlike in C, for example, you cannot use 0 as an index in MATLAB.
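For what it's worth, here is a sketch of the whole forward Euler computation written as a script. The names f, t0, y0 and N are my own choices, h = 0.1 is taken from the question, and the plot line is left commented out so the script also runs without a display.

```matlab
% Sketch: forward Euler for y'(t) = 4*y(t) + 1, y(0) = 0, first 15 points.
f = @(t, y) 4 * y + 1;    % right-hand side as a function handle
t0 = 0;                   % initial time
y0 = 0;                   % initial value y(t0)
h = 0.1;                  % step size
N = 15;                   % number of points, including the initial one

T = t0 + h * (0:N-1);     % abscissas t0, t0+h, ...
Y = zeros(1, N);
Y(1) = y0;                % index 1 corresponds to t0
for j = 1:N-1
    % Euler step: follow the slope at the current point for one step h
    Y(j + 1) = Y(j) + h * f(T(j), Y(j));
end

E = [T' Y'];              % one row per point: [t, y]

% To draw the solution curve:
% plot(T, Y, '-o'); xlabel('t'); ylabel('y');
```

Passing the right-hand side as a function handle avoids feval on a string, and the same loop can be wrapped in a function file taking f, t0, y0, h and N as arguments.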
Given a difference equation:
y[n] - 0.9y[n-1] + 0.81y[n-2] = x[n] - x[n-2]
a. Find the impulse response for h[n], n=0,1,2 using recursion.
b. Find the impulse response using MATLAB command filter.
I understand that this is homework, so I will try to give you guidelines without actually giving away the answer completely:
Using recursion
This is actually quite simple, because the difference equation contains the body of the recursive function almost entirely: y[n] = 0.9y[n-1] - 0.81y[n-2] + x[n] - x[n-2]
The y[n-1] and y[n-2] terms are actually the recursive calls! What you need to do is to build a function (let's call it func) that receives x and n, and calculates y[n]:
function y = func(x, n)
if (n < 0)
%# Handling for edge case n<0
return 0
else if (n == 0)
%# Handling for edge case n=0
return x(0)
else
%# The recursive loop
return 0.9 * func(x, n-1) - 0.81 * func(x, n-2) + x(n) - x(n-2)
end
Note that it's pseudo-code, so you still have to check the edge cases and deal with the indexation (indices in MATLAB start with 1 and not 0!).
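One way to handle the indexing is to unroll the recursion into a loop that stores earlier outputs, shifting every index by one. This is only a sketch: it assumes zero initial conditions (y[n] = 0 and x[n] = 0 for n < 0), and the variable names are mine.

```matlab
% Sketch: evaluate y[n] = 0.9y[n-1] - 0.81y[n-2] + x[n] - x[n-2]
% for n = 0, 1, 2 with a unit impulse as input.
x = [1 0 0];              % x[0], x[1], x[2] stored at indices 1, 2, 3
N = numel(x);
y = zeros(1, N);

for k = 1:N               % k = n + 1 because MATLAB indices start at 1
    % terms with a negative time index are treated as zero
    y1 = 0; y2 = 0; x2 = 0;
    if k > 1
        y1 = y(k - 1);    % y[n-1]
    end
    if k > 2
        y2 = y(k - 2);    % y[n-2]
        x2 = x(k - 2);    % x[n-2]
    end
    y(k) = 0.9 * y1 - 0.81 * y2 + x(k) - x2;
end
```

With an impulse as x, the vector y holds the impulse response samples h[0], h[1], h[2].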
Using filters
The response of a digital filter is actually the y[n] that you're looking for. As you probably know from class, the coefficients of that filter are the coefficients specified in the difference equation, with all y terms kept on the left-hand side. MATLAB has a built-in function filter that emulates just that, so if you write:
B = [1, 0, -1]; %# Coefficients for x: x[n] - x[n-2]
A = [1, -0.9, 0.81]; %# Coefficients for y: y[n] - 0.9y[n-1] + 0.81y[n-2]
y = filter(B, A, x);
You'd get an output vector which holds all the values of y[n].
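To get the impulse response specifically, x has to be a unit impulse. A minimal sketch, assuming three samples are enough (n = 0, 1, 2) and reading the coefficient signs directly off y[n] - 0.9y[n-1] + 0.81y[n-2] = x[n] - x[n-2]:

```matlab
% Sketch: impulse response of the difference equation via filter.
x = [1 0 0];              % unit impulse for n = 0, 1, 2
B = [1 0 -1];             % x[n] + 0*x[n-1] - x[n-2]
A = [1 -0.9 0.81];        % y[n] - 0.9*y[n-1] + 0.81*y[n-2]
h = filter(B, A, x);      % h(1), h(2), h(3) are h[0], h[1], h[2]
```

Note the zero in B: the equation has no x[n-1] term, so the second coefficient must be written out explicitly.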
a=[1 -0.9 0.81]
b=[1 0 -1]
impz(b,a,50)