Implementing a neural network to figure out its cost - matlab

Cost function:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y^{(i)}_{k} \log\big(h_{\theta}(x^{(i)})_{k}\big) + \big(1 - y^{(i)}_{k}\big) \log\big(1 - h_{\theta}(x^{(i)})_{k}\big) \right]
I am trying to code the above expression in Matlab. Unfortunately I seem to be getting a cost of 10.441460 instead of 0.287629, so I'm out by a factor of more than 36!
The symbols are defined as follows:
m is the number of training examples. [a scalar number]
K is the number of output nodes. [a scalar number]
y is the vector of training outputs [an m by 1 vector]
y^{(i)}_{k} is the ith training output (target) for the kth output node.
[a scalar number]
x^{(i)} is the ith training input. [a column vector for all the input
nodes]
h_{\theta}(x^{(i)})_{k} is the value of the hypothesis at output k, with
weights theta, and training input i. [a scalar number]
note: h_{\theta}(x^{(i)}) will be a column vector with K rows.
My attempt for the cost function:
Theta1 = [ones(1,size(Theta1,2));Theta1];
X = [ones(m,1) , X]; %Add a column of 1's to X
R=zeros(m,1);
for i = 1:m
a = y(i) == [10 1:9];
R(i) = -(a*(log(sigmoid(Theta2*(sigmoid(Theta1*X(i,:)'))))) + (1-a)*(log(1-sigmoid(Theta2*(sigmoid(Theta1*X(i,:)'))))))/m;
end
J = sum(R);
This will probably be useful for reference:
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% =========================================================================
end
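For reference, here is a minimal sketch of one way the Part 1 feedforward cost could be computed inside nnCostFunction. This is an assumption-laden sketch rather than the assignment's reference solution: it assumes sigmoid.m is on the path, that labels run 1..num_labels (with label 10 standing for the digit 0, as in the course data), and MATLAB R2016b+ for the implicit expansion in the one-hot line (use bsxfun(@eq, y, 1:num_labels) on older releases). Note that the bias unit is added to the input and hidden activations rather than to Theta1:
X  = [ones(m, 1) X];                 % m x (n+1): add bias column to the inputs
A2 = sigmoid(X * Theta1');           % m x hidden_layer_size: hidden activations
A2 = [ones(m, 1) A2];                % m x (hidden+1): add bias unit to hidden layer
H  = sigmoid(A2 * Theta2');          % m x num_labels: hypothesis for every example
Y  = (y == 1:num_labels);            % m x num_labels one-hot target matrix
J  = -sum(sum(Y .* log(H) + (1 - Y) .* log(1 - H))) / m;   % unregularized cost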

Related

Cost function computation for neural network

I am in week 5 of Andrew Ng's Machine Learning Course on Coursera. I am working through the programming assignment in Matlab for this week, and I chose to use a for loop implementation to compute the cost J. Here is my function.
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% add bias to X to create 5000x401 matrix
X = [ones(m, 1) X];
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% initialize summing terms used in cost expression
sum_i = 0.0;
% loop through each sample to calculate the cost
for i = 1:m
    % logical vector output for 1 example
    y_i = zeros(num_labels, 1);
    class = y(m);
    y_i(class) = 1;
    % first layer just equals features in one example 1x401
    a1 = X(i, :);
    % compute z2, a 25x1 vector
    z2 = Theta1*a1';
    % compute activation of z2
    a2 = sigmoid(z2);
    % add bias to a2 to create a 26x1 vector
    a2 = [1; a2];
    % compute z3, a 10x1 vector
    z3 = Theta2*a2;
    % compute activation of z3. returns output vector of size 10x1
    a3 = sigmoid(z3);
    h = a3;
    % loop through each class k to sum cost over each class
    for k = 1:num_labels
        % sum_i returns cost summed over each class
        sum_i = sum_i + ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k))));
    end
end
J = sum_i/m;
I understand that a vectorized implementation of this would be easier, but I do not understand why this implementation is wrong. When num_labels = 10, this function outputs J = 8.47, but the expected cost is 0.287629. I computed J from the formula above. Am I misunderstanding the computation? My understanding is that each training example's cost is computed for each of the 10 classes, and then the costs over all 10 classes are summed for every example. Is that incorrect? Or did I not implement this properly in my code? Thanks in advance.
The problem is in the formula you are implementing.
The expression ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k)))) represents the loss for binary classification, because there you simply have 2 classes, so either
y_i is 0, so (1 - y_i) = 1, or
y_i is 1, so (1 - y_i) = 0,
so you basically take only the target class's probability into account.
However, in the case of 10 labels, as you mention, it is not necessarily true that one of (y_i) and (1 - y_i) is 0 and the other is 1.
You should correct the loss function implementation so that you only take into account the probability of the target class, not all the other classes.
My problem was with the indexing. Rather than class = y(m), it should be class = y(i), since i is the loop index and m is 5000, the number of rows in the training data.
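With that one-character change the per-example one-hot vector is built from the correct label; the corrected lines inside the loop read:
y_i = zeros(num_labels, 1);
class = y(i);            % index with the loop counter i, not with m
y_i(class) = 1;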

Neural network Cost function in Andrew Ng's Lecture

I am having trouble with my code that is meant to compute the cost function for my neural network. The cost function J is the regularized multi-class cost from the lectures (the same expression as above, plus the regularization term). Given sample inputs, my cost function returns a negative value that is about 5 times smaller than the expected value. I have worked on this issue for a few hours but still cannot get the desired value.
Thanks in advance.
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
% following parts.
%
% Part 1: Feedforward the neural network and return the cost in the variable J.
one=ones(1,10);
temp=one
one=transpose(temp);
sizeTheta1=size(Theta1);
sizeTheta2=size(Theta2);
for i=1:m
if y(i)==1
yVec=[1,0,0,0,0,0,0,0,0,0];
end
if y(i)==2
yVec=[0,1,0,0,0,0,0,0,0,0];
end
if y(i)==3
yVec=[0,0,1,0,0,0,0,0,0,0];
end
if y(i)==4
yVec=[0,0,0,1,0,0,0,0,0,0];
end
if y(i)==5
yVec=[0,0,0,0,1,0,0,0,0,0];
end
if y(i)==6
yVec=[0,0,0,0,0,1,0,0,0,0];
end
if y(i)==7
yVec=[1,0,0,0,0,0,1,0,0,0];
end
if y(i)==8
yVec=[1,0,0,0,0,0,0,1,0,0];
end
if y(i)==9
yVec=[0,0,0,0,0,0,0,0,1,0];
end
if y(i)==10
yVec=[0,0,0,0,0,0,0,0,0,1];
end
xVec=transpose(X(i,:));
term1=transpose(-yVec).*(log(sigmoid(Theta2(1:10,1:sizeTheta2(2)-1))*(sigmoid(Theta1(1:25,1:sizeTheta1(2)-1)*xVec))));
term2=(one-transpose(yVec)).*(log(one-(sigmoid(Theta2(1:10,1:sizeTheta2(2)-1)*(sigmoid(Theta1(1:25,1:sizeTheta1(2)-1)*xVec))))));
J=J+(term1-term2);
end
regTheta1=0;
regTheta2=0;
J=sum(sum(J))*(1/m);
regTheta1=(sum(sum(Theta1.*Theta1)));
regTheta2=(sum(sum(Theta2.*Theta2)));
J=J+((lambda)*(regTheta1+regTheta2))/(2*m);
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
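As an aside, the long chain of if statements that builds yVec can be collapsed to a couple of lines, which also avoids typos in hand-typed vectors (note that the vectors for y(i)==7 and y(i)==8 above carry a stray leading 1). A minimal sketch, assuming labels run 1 through num_labels:
yVec = zeros(1, num_labels);
yVec(y(i)) = 1;                    % one-hot row vector for the i-th example
% or equivalently, as a logical row vector:
% yVec = ((1:num_labels) == y(i));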

Why does my Markov cluster algorithm (MCL) produce NaN as a result in Matlab?

I have tried the Markov cluster algorithm (MCL) in Matlab, but sadly I get a 67x67 matrix in which every element is NaN.
Could anyone tell me what's wrong?
function adjacency_matrixmod2= mcl(adjacency_matrixmod2)
% test the explanations in stijn van dongens thesis.
%
% #author gregor :: arbylon . net
if nargin < 1
    % m contains T(G3 + I) as stochastic matrix
    load -ascii adjacency_matrixmod2.txt
end
p = 2;
minval = 0.001;
e = 1.;
emax = 0.001;
while e > emax
    fprintf('iteration %i before expansion:\n', i);
    adjacency_matrixmod2
    fprintf('iteration %i after expansion/before inflation:\n', i);
    m2 = expand(adjacency_matrixmod2)
    fprintf('inflation:\n')
    [adjacency_matrixmod2, e] = inflate(m2, p, minval);
    fprintf('residual energy: %f\n', e);
end % while e
end % mcl
% expand by multiplying m * m
% this preserves column (or row) normalisation
function m2 = expand(adjacency_matrixmod2)
m2 = adjacency_matrixmod2 *adjacency_matrixmod2;
end
% inflate by Hadamard potentiation
% and column re-normalisation
% prune elements of m that are below minval
function [m2, energy] = inflate(adjacency_matrixmod2, p, minval)
% inflation
m2 = adjacency_matrixmod2 .^ p;
% pruning
m2(find(m2 < minval)) = 0;
% normalisation
dinv = diag(1./sum(m2));
m2 = m2 * dinv;
% calculate residual energy
maxs = max(m2);
sqsums = sum(m2 .^ 2);
energy = max(maxs - sqsums);
end
This is the code I have used, and its input is an adjacency matrix.
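One frequent cause of an all-NaN result in this kind of column re-normalisation is a column of m2 that sums to zero after pruning: 1./0 gives Inf, the subsequent multiplication produces 0*Inf = NaN, and the NaNs then spread through every later iteration. I cannot be sure that is what happens with your particular matrix, but a guard along these lines in inflate would avoid it:
colsums = sum(m2);
colsums(colsums == 0) = 1;     % leave all-zero columns untouched instead of dividing by zero
dinv = diag(1 ./ colsums);
m2 = m2 * dinv;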

Matlab Regularized Logistic Regression - how to compute gradient

I am currently taking the Machine Learning course on the Coursera platform and I am trying to implement logistic regression. I am using gradient descent to minimize the cost function, and I have to write a function called costFunctionReg.m that returns both the cost and the gradient of each parameter, evaluated at the current set of parameters.
The problem is better described below:
My cost function is working, but the gradient function is not. Please note that I would prefer to implement this using looping, rather than element-by-element operations.
I am computing theta[0] (in MATLAB, theta(1)) separately, as it is not being regularized, i.e. the regularization term (with lambda) is not applied to it.
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta); %number of parameters (features)
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
% ----------------------1. Compute the cost-------------------
%hypothesis
h = sigmoid(X * theta);
for i = 1 : m
    % The cost for the ith term before regularization
    J = J - ( y(i) * log(h(i)) ) - ( (1 - y(i)) * log(1 - h(i)) );
    % Adding regularization term
    for j = 2 : n
        J = J + (lambda / (2*m) ) * ( theta(j) )^2;
    end
end
J = J/m;
% ----------------------2. Compute the gradients-------------------
% not regularizing theta[0] i.e. theta(1) in matlab
j = 1;
for i = 1 : m
    grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j);
end
for j = 2 : n
    for i = 1 : m
        grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j) + lambda * theta(j);
    end
end
grad = (1/m) * grad;
% =============================================================
end
What am I doing wrong?
The way you are applying regularization is incorrect. Regularization should be added after you sum over all training examples, but you are adding it after each example instead. If you left your code as it was, you would inadvertently make each gradient step larger and eventually overshoot the solution. This overshooting accumulates and will inevitably give you a gradient vector of Inf or -Inf for all components (except for the bias term).
Simply put, place the lambda*theta(j) statement after the inner for loop (over i) terminates:
for j = 2 : n
    for i = 1 : m
        grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j); % Change
    end
    grad(j) = grad(j) + lambda * theta(j); % Change
end
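For comparison, the same regularized gradient can also be written without loops. A minimal vectorized sketch, assuming h = sigmoid(X * theta) has already been computed as above:
grad = (X' * (h - y)) / m;                                % unregularized gradient for every parameter
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % regularize everything except theta(1)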

Simulink simulation behaving massively differently than the same simulation using rk4

I am having difficulties simulating an object described by the following state-space equations in Simulink:
The right-hand side of the state-space equation is given by the function below.
function dxdt = RHS( t, x, F)
% parameters
b = 1.22; % cart friction coefficient
c = 0.0027; % pendulum friction coefficient
g = 9.81; % gravity
M = 0.548+0.022*2; % cart weight
m = 0.031*2; %pendulum masses
I = 0.046;%0.02*0.025/12+0.02*0.12^2+0.011*0.42^2; % moment of inertia
l = 0.1313;
% x(1) = theta
% x(2) = theta_dot
% x(3) = x
% x(4) = x_dot
dxdt = [x(2);
(-(M+m)*c*x(2)-(M+m)*g*l*sin(x(1))-m^2*l^2*x(2)^2*sin(x(1))*cos(x(1))+m*l*b*x(4)*cos(x(1))-m*l*cos(x(1))*F)/(I*(m+M)+m*M*l^2+m^2*l^2*sin(x(1))^2);
x(4);
(F - b*x(4) + l*m*x(2)^2*sin(x(1)) + (l*m*cos(x(1))*(c*x(2)*(M + m) + g*l*sin(x(1))*(M + m) + F*l*m*cos(x(1)) + l^2*m^2*x(2)^2*cos(x(1))*sin(x(1)) - b*l*m*x(4)*cos(x(1))))/(I*(M + m) + l^2*m^2*sin(x(1))^2 + M*l^2*m))/(M + m)];
end
The coresponding rk4 function with a simple visualisation is shown below.
function [wi, ti] = rk4 ( RHS, t0, x0, tf, N )
%RK4 approximate the solution of the initial value problem
%
% x'(t) = RHS( t, x ), x(t0) = x0
%
% using the classical fourth-order Runge-Kutta method - this
% routine will work for a system of first-order equations as
% well as for a single equation
%
% calling sequences:
% [wi, ti] = rk4 ( RHS, t0, x0, tf, N )
% rk4 ( RHS, t0, x0, tf, N )
%
% inputs:
% RHS string containing name of m-file defining the
% right-hand side of the differential equation; the
% m-file must take two inputs - first, the value of
% the independent variable; second, the value of the
% dependent variable
% t0 initial value of the independent variable
% x0 initial value of the dependent variable(s)
% if solving a system of equations, this should be a
% row vector containing all initial values
% tf final value of the independent variable
% N number of uniformly sized time steps to be taken to
% advance the solution from t = t0 to t = tf
%
% output:
% wi vector / matrix containing values of the approximate
% solution to the differential equation
% ti vector containing the values of the independent
% variable at which an approximate solution has been
% obtained
%
% x(1) = theta
% x(2) = theta_dot
% x(3) = x
% x(4) = x_dot
t0 = 0; tf = 5; x0 = [pi/2; 0; 0; 0]; N = 400;
neqn = length ( x0 );
ti = linspace ( t0, tf, N+1 );
wi = [ zeros( neqn, N+1 ) ];
wi(1:neqn, 1) = x0';
h = ( tf - t0 ) / N;
% force
u = 0.0;
%init visualisation
h_cart = plot(NaN, NaN, 'Marker', 'square', 'color', 'red', 'LineWidth', 6);
hold on
h_pend = plot(NaN, NaN, 'bo', 'LineWidth', 3);
axis([-5 5 -5 5]);
axis manual;
xlim([-5 5]);
ylim([-5 5]);
for i = 1:N
    k1 = h * feval ( 'RHS', t0, x0, u );
    k2 = h * feval ( 'RHS', t0 + (h/2), x0 + (k1/2), u);
    k3 = h * feval ( 'RHS', t0 + h/2, x0 + k2/2, u);
    k4 = h * feval ( 'RHS', t0 + h, x0 + k3, u);
    x0 = x0 + ( k1 + 2*k2 + 2*k3 + k4 ) / 6;
    t0 = t0 + h;
    % model output
    wi(1:neqn,i+1) = x0';
    % model visualisation
    % plotting cart
    l = 2;
    set(h_cart, 'XData', x0(3), 'YData', 0, 'LineWidth', 5);
    % plotting pendulum
    %hold on;
    set(h_pend, 'XData', sin(x0(1))*l+x0(3), 'YData', -cos(x0(1))*l, 'LineWidth', 2);
    %hold off;
    % regulator
    pause(0.02);
end
figure;
plot(ti, wi);
legend('theta', 'theta`', 'x', 'x`');
This gives realistic-looking results for a pendulum on a cart.
Now to the problem.
I wanted to recreate exactly the same equations in Simulink. I thought it would be as easy as creating the following Simulink model,
where I fill the Fcn blocks with the second and fourth equations from the RHS file, like this:
(-(M+m)*c*u(2)-(M+m)*g*l*sin(u(1))-m^2*l^2*u(2)^2*sin(u(1))*cos(u(1))+m*l*b*u(3)*cos(u(1))-m*l*cos(u(1))*u(4))/(I*(m+M)+m*M*l^2+m^2*l^2*sin(u(1))^2)
(u(5) - b*u(4) + l*m*u(2)^2*sin(u(1)) + (l*m*cos(u(1))*(c*u(2)*(M + m) + g*l*sin(u(1))*(M + m) + u(5)*l*m*cos(u(1)) + l^2*m^2*u(2)^2*cos(u(1))*sin(u(1)) - b*l*m*u(4)*cos(u(1))))/(I*(M + m) + l^2*m^2*sin(u(1))^2 + M*l^2*m))/(M + m)
The problem is that this doesn't give the correct results shown above, but rather the ones below.
Does anybody know what I do incorrectly?
Edit: After @am304's comment I decided to add the following information. I changed the Simulink solver settings to use the fixed-step rk4 solver, so as to get the same results. The second integrator (integrator3) in the model above has been initialized to pi/2.
Edit2: If somebody wants to check out the simulink model for themselves click on the link to download the file.
Edit3: As you can read in the answer below the problem was trivial. You can download the correct model here
I looked through your Simulink model and it seems you may have mixed up the two functions you were using: you used the theta_dd function where you meant to put x_dd, and vice versa.
In your model you also force x_d to a constant value of 0. I assume you actually meant to set the initial condition to 0, which is done via the Integrator block. x_d (as an input to f) should come from your state vector, which is also the output of your integrators. This is just a consequence of what you define x_d to be: the integral of x_dd. This is how RK4 works as well; you use an initial state vector first and then use the predicted state vector to drive the next RK4 step.
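In the rk4 listing above, that feedback is exactly the pair of lines
x0 = x0 + ( k1 + 2*k2 + 2*k3 + k4 ) / 6;   % updated state vector
t0 = t0 + h;
where the updated x0 is what drives the next call to RHS, just as the integrator outputs (rather than constants) should drive the Fcn blocks in the Simulink model.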
The resulting output from the scope (I've output your whole state vector here) is as follows and looks like what you would expect:
I don't think I should link externally to the Simulink file, so if you would like a copy you can open a chat and ask for it. Otherwise the picture above should be sufficient to help you reproduce the same results.