I am in week 5 of Andrew Ng's Machine Learning course on Coursera. I am working through this week's programming assignment in MATLAB, and I chose to use a for-loop implementation to compute the cost J. Here is my function:
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
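% (The unrolled vector is typically built as nn_params = [Theta1(:); Theta2(:)];
% the reshapes above recover the 25x401 and 10x26 weight matrices.)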
% Setup some useful variables
m = size(X, 1);
% add bias to X to create 5000x401 matrix
X = [ones(m, 1) X];
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% initialize summing terms used in cost expression
sum_i = 0.0;
% loop through each sample to calculate the cost
for i = 1:m
    % logical vector output for 1 example
    y_i = zeros(num_labels, 1);
    class = y(m);
    y_i(class) = 1;
    % first layer just equals features in one example, 1x401
    a1 = X(i, :);
    % compute z2, a 25x1 vector
    z2 = Theta1*a1';
    % compute activation of z2
    a2 = sigmoid(z2);
    % add bias to a2 to create a 26x1 vector
    a2 = [1; a2];
    % compute z3, a 10x1 vector
    z3 = Theta2*a2;
    % compute activation of z3; returns output vector of size 10x1
    a3 = sigmoid(z3);
    h = a3;
    % loop through each class k to sum cost over each class
    for k = 1:num_labels
        % sum_i returns cost summed over each class
        sum_i = sum_i + ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k))));
    end
end
J = sum_i/m;
I understand that a vectorized implementation of this would be easier, but I do not understand why this implementation is wrong. When num_labels = 10, this function outputs J = 8.47, but the expected cost is 0.287629. I computed J from this formula:

J = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y^{(i)}_{k} \log(h_{\theta}(x^{(i)})_{k}) - (1 - y^{(i)}_{k}) \log(1 - h_{\theta}(x^{(i)})_{k}) \right]

Am I misunderstanding the computation? My understanding is that each training example's cost is computed for each of the 10 classes, and then these costs are summed over all 10 classes and all examples. Is that incorrect, or did I not implement this properly in my code? Thanks in advance.
The problem is in the formula you are implementing.

The expression ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k)))) represents the loss in binary classification, because there you simply have 2 classes, so either:

y_i is 0, so (1 - y_i) = 1, or
y_i is 1, so (1 - y_i) = 0,

so you basically take into account only the target class probability.

However, in the case of 10 labels, as you mention, it is not necessarily true that one of (y_i) and (1 - y_i) is 0 while the other is 1. You should correct the loss function implementation so that you take into account only the probability of the target class, not all the other classes.
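To illustrate the binary case described above, here is a small sketch with made-up numbers showing how one of the two terms always vanishes:

% For a binary label y in {0, 1}, one loss term always drops out:
y = 1; h = 0.9;                        % example label and predicted probability
loss = -y*log(h) - (1 - y)*log(1 - h)  % = -log(0.9): only the target term remains
y = 0;
loss = -y*log(h) - (1 - y)*log(1 - h)  % = -log(0.1): only the (1 - y) term remains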
My problem is with indexing. Rather than saying class = y(m), it should be class = y(i), since i is the loop index and m is 5000, the number of rows in the training data.
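For reference, the corrected one-hot construction inside the loop would be (a minimal sketch):

% index the labels with the loop variable i, not with m
y_i = zeros(num_labels, 1);
y_i(y(i)) = 1;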
I have to write this piece of code for the lrCostFunction assignment in the Machine Learning course on Coursera, but I still don't understand why

theta1 = [0 ; theta(2:end, :)];

is written. What does theta1 mean?
h = sigmoid(X * theta)
theta1 = [0 ; theta(2:end, :)];
p = lambda * (theta1' * theta1)/(2 * m);
J = ((-y)'*log(h)-(1-y)'*log(1-h))/m + p;
grad = (X' * (h - y) + lambda * theta1)/ m;
In logistic regression, theta (θ) is a vector representing the parameters (or weights) of the linear function of x.
Now, given a training set, one method to learn the parameters theta (θ) is to make h(x) close to y, at least for the training examples we have. This is formalized using a cost (or error) function J(θ), defined for each value of θ, which we want to minimize.

In gradient descent, the parameters are initialized (for example, to zero) and then repeatedly updated using the partial derivatives of J(θ), since we want to minimize it:

θ_j := θ_j − α * ∂J(θ)/∂θ_j

Here α is the learning rate with which the gradient descent algorithm runs. Each parameter starts at its initial value (zero in this case), and the next value is computed using the above equation, and so on for the other theta parameters.
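For concreteness, a single update step in MATLAB might look like the following sketch (alpha is an assumed learning-rate variable; h, theta1, and grad are as in the question's code):

h = sigmoid(X * theta);                       % current predictions, m x 1
grad = (X' * (h - y) + lambda * theta1) / m;  % regularized gradient; bias excluded via theta1
theta = theta - alpha * grad;                 % simultaneous update of all parameters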
EDIT:
Explaining the code:
theta1 = [0 ; theta(2:end, :)];
The above code is MATLAB code. Here theta1 is an array (a column vector). It is created using vertical concatenation (the semicolon operator) of two pieces:
1) 0
2) theta(2:end, :)
The first is the scalar value 0. The second means: take all the values of theta as they are, except the first row. (Note that theta is the input array to LRCOSTFUNCTION(theta, X, y, lambda).) The point is that theta(1) is the bias (intercept) parameter, which by convention is not regularized: with its slot set to 0, it contributes nothing to the penalty p and receives no lambda term in the gradient.
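A small sketch with made-up numbers illustrates the effect:

theta  = [5; 2; -3];            % theta(1) is the bias term
theta1 = [0; theta(2:end, :)];  % becomes [0; 2; -3]
% theta1' * theta1 = 2^2 + (-3)^2 = 13; the bias value 5 contributes nothing,
% so the regularization term p penalizes only the non-bias weights.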
I am having trouble with my code that is meant to compute the cost function for my neural network. The cost function J is defined as

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y^{(i)}_{k} \log(h_{\theta}(x^{(i)})_{k}) - (1 - y^{(i)}_{k}) \log(1 - h_{\theta}(x^{(i)})_{k}) \right] + \frac{\lambda}{2m} (\text{sum of squared non-bias weights})

Given sample inputs, the cost function returns a negative value that is about 5 times smaller than the expected value. I have worked on this issue for a few hours but still cannot get the desired value.
Thanks in advance.
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
% following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
one = ones(10, 1);  % column vector of ones used in the cost terms below
sizeTheta1=size(Theta1);
sizeTheta2=size(Theta2);
for i=1:m
    if y(i)==1
        yVec=[1,0,0,0,0,0,0,0,0,0];
    end
    if y(i)==2
        yVec=[0,1,0,0,0,0,0,0,0,0];
    end
    if y(i)==3
        yVec=[0,0,1,0,0,0,0,0,0,0];
    end
    if y(i)==4
        yVec=[0,0,0,1,0,0,0,0,0,0];
    end
    if y(i)==5
        yVec=[0,0,0,0,1,0,0,0,0,0];
    end
    if y(i)==6
        yVec=[0,0,0,0,0,1,0,0,0,0];
    end
    if y(i)==7
        yVec=[0,0,0,0,0,0,1,0,0,0];
    end
    if y(i)==8
        yVec=[0,0,0,0,0,0,0,1,0,0];
    end
    if y(i)==9
        yVec=[0,0,0,0,0,0,0,0,1,0];
    end
    if y(i)==10
        yVec=[0,0,0,0,0,0,0,0,0,1];
    end
    xVec=transpose(X(i,:));
    term1=transpose(-yVec).*(log(sigmoid(Theta2(1:10,1:sizeTheta2(2)-1))*(sigmoid(Theta1(1:25,1:sizeTheta1(2)-1)*xVec))));
    term2=(one-transpose(yVec)).*(log(one-(sigmoid(Theta2(1:10,1:sizeTheta2(2)-1)*(sigmoid(Theta1(1:25,1:sizeTheta1(2)-1)*xVec))))));
    J=J+(term1-term2);
end
regTheta1=0;
regTheta2=0;
J=sum(sum(J))*(1/m);
regTheta1=(sum(sum(Theta1.*Theta1)));
regTheta2=(sum(sum(Theta2.*Theta2)));
J=J+((lambda)*(regTheta1+regTheta2))/(2*m);
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
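As an aside, the ten if-statements that build yVec can be collapsed into two lines (a sketch, assuming y holds integer labels 1 through num_labels):

yVec = zeros(1, num_labels);  % one-hot row vector for example i
yVec(y(i)) = 1;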
I am trying to write code for classification of data. I try to implement a sigmoid function and then use that function in calculating the cost. I keep getting errors, and I have a feeling that it is because of the sigmoid function. I would like the sigmoid function to return a vector, but it keeps returning a scalar.
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% J = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g=zeros(size(z));
m=ones(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
% vector or scalar).
g=1/(m+exp(-z));
This is my cost function:
m = length(y); % number of training examples
% You need to return the following variables correctly
grad=(1/m)*((X*(sigmoid(X*theta)-y))); % this is the derivative in gradient descent
J=(1/m)*(-(transpose(y)*log(sigmoid((X*theta))))-(transpose(1-y)*log(sigmoid((X*theta))))); % this is the cost function
The dimensions of X are 100x4; theta is 4x1; y is 100x1. Thank you.
Errors:
Program paused. Press enter to continue.
sigmoid answer: 0.500000Error using -
Matrix dimensions must agree.
Error in costFunction (line 11)
grad=(1/m)*((X*(sigmoid(X*theta)-y)));
Error in ex2 (line 69)
[cost, grad] = costFunction(initial_theta, X, y);
Please replace g=1/(m+exp(-z)); with g=1./(m+exp(-z)); in your sigmoid function. The ./ operator divides element-wise, so g ends up the same size as z, whereas plain / is matrix division.
z = [2,3,4;5,6,7] ;
%SIGMOID Compute sigmoid function
% J = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g=zeros(size(z));
m=ones(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
% vector or scalar).
g=1./(m+exp(-z));
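With the element-wise operator, g has the same shape as z; for the z above, g is approximately [0.8808 0.9526 0.9820; 0.9933 0.9975 0.9991]. Note that even with a vector-valued sigmoid, the grad line will still error: X is 100x4 and (sigmoid(X*theta) - y) is 100x1, so the product needs the transpose, X' * (sigmoid(X*theta) - y), to give a 4x1 gradient. The second log term in J should also be log(1 - sigmoid(X*theta)) rather than log(sigmoid(X*theta)).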
Cost function:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y^{(i)}_{k} \log(h_{\theta}(x^{(i)})_{k}) - (1 - y^{(i)}_{k}) \log(1 - h_{\theta}(x^{(i)})_{k}) \right]

I am trying to code the above expression in MATLAB. Unfortunately I seem to be getting a cost of 10.441460 instead of 0.287629, so I'm out by a factor of over 36!
As for each of the symbols:
m is the number of training examples. [a scalar number]
K is the number of output nodes. [a scalar number]
y is the vector of training outputs [an m by 1 vector]
y^{(i)}_{k} is the ith training output (target) for the kth output node. [a scalar number]
x^{(i)} is the ith training input. [a column vector for all the input nodes]
h_{\theta}(x^{(i)})_{k} is the value of the hypothesis at output k, with weights theta, and training input i. [a scalar number]
note: h_{\theta}(x^{(i)}) will be a column vector with K rows.
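For concreteness, h_{\theta}(x^{(i)}) is produced by the feedforward pass. A minimal sketch, assuming the usual ex4 sizes, i.e. X is the raw 5000x400 input and Theta1 (25x401) and Theta2 (10x26) each include a bias column:

a1 = [1; X(i, :)'];              % input activation with bias term, 401x1
a2 = [1; sigmoid(Theta1 * a1)];  % hidden activation with bias term, 26x1
h  = sigmoid(Theta2 * a2);       % h_theta(x^(i)), a K x 1 (here 10x1) column vector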
My attempt for the cost function:
Theta1 = [ones(1,size(Theta1,2)); Theta1];
X = [ones(m,1), X]; % Add a column of 1's to X
R = zeros(m,1);
for i = 1:m
    a = y(i) == [10 1:9];
    R(i) = -(a*(log(sigmoid(Theta2*(sigmoid(Theta1*X(i,:)'))))) + (1-a)*(log(1-sigmoid(Theta2*(sigmoid(Theta1*X(i,:)'))))))/m;
end
J = sum(R);
This will probably be useful for reference:
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
% following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
% variable J. After implementing Part 1, you can verify that your
% cost function computation is correct by verifying the cost
% computed in ex4.m
%
% =========================================================================
end