I'm having trouble with an easy exercise about an artificial neural network with 2 features, a hidden layer of 5 neurons, and two possible outputs (0 or 1).
My X matrix is 51x2 and y is a 51x1 vector.
I know I'm not supposed to use while E > 1, but I wanted to see whether my error would eventually drop below 1.
I'd like to know what I am doing wrong: my error doesn't seem to decrease (it stays around 1.5 no matter how many iterations I run). Can you see where in the code I'm making a mistake? I'm supposed to use gradient descent.
function [E, v, w] = costFunction(X, y, alpha1, alpha2)
    [m, n] = size(X);
    v = 2*rand(5,3) - 1;    % input -> hidden weights (5 neurons, 2 features + bias)
    w = 2*rand(2,6) - 1;    % hidden -> output weights (2 outputs, 5 neurons + bias)
    grad_v = zeros(size(v));
    grad_w = zeros(size(w));
    K = 2;
    E = 2;
    while E > 1
        % Vectorised forward pass to compute the cost
        a1 = [ones(m,1) X];
        z2 = a1 * v';
        a2 = sigmoid(z2);
        a2 = [ones(size(a2,1),1), a2];
        z3 = a2 * w';
        h = sigmoid(z3);
        cost = sum((-y.*log(h)) - ((1-y).*log(1-h)), 2);
        E = (1/m)*sum(cost);
        % Backpropagation, one example at a time
        Delta1 = 0;
        Delta2 = 0;
        for t = 1:m
            a1 = [1; X(t,:)'];
            z2 = v * a1;
            a2 = sigmoid(z2);
            a2 = [1; a2];
            z3 = w * a2;
            a3 = sigmoid(z3);
            d3 = a3 - y(t,:)';
            d2 = (w(:,2:end)'*d3) .* sigmoidGradient(z2);
            Delta2 = Delta2 + d3*a2';
            Delta1 = Delta1 + d2*a1';
        end
        grad_v = (1/m) * Delta1;
        grad_w = (1/m) * Delta2;
        % Gradient descent step
        v = v - alpha1 * grad_v;
        w = w - alpha2 * grad_w;
    end
end
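One way to localise a bug like this is a finite-difference gradient check run right after grad_v and grad_w are computed. Below is a minimal sketch; computeCost is a hypothetical helper that performs the vectorised forward pass above and returns E for the given weights:
% Compare the analytic gradient from backpropagation against a
% numerical estimate; a large mismatch points at the backprop step.
epsilon = 1e-4;
numgrad_v = zeros(size(v));
for i = 1:numel(v)
    vPlus  = v; vPlus(i)  = vPlus(i)  + epsilon;
    vMinus = v; vMinus(i) = vMinus(i) - epsilon;
    numgrad_v(i) = (computeCost(X, y, vPlus, w) - computeCost(X, y, vMinus, w)) / (2*epsilon);
end
% numgrad_v should agree with grad_v to several decimal places.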
Here is my code. I think it is wrong, because the difference between the gradient it computes and my numerical estimate is too large. It doesn't seem to be caused by wrongly inverted matrices or the like.
For context, Y is the output layer, X is the input layer, and there is only 1 hidden layer. Theta1 holds the weights for the input layer and Theta2 the weights for the hidden layer.
for t = 1:m
    % forward propagation for example t
    a1 = [1 X(t,:)];
    a2 = [1 sigmoid(a1 * Theta1')];
    a3 = sigmoid(a2 * Theta2');
    % back propagation
    delta_3 = a3' - Y(:, t);
    delta_2 = Theta2' * delta_3 .* a2' .* (1 - a2)';
    delta_2 = delta_2(2:end,:);               % drop the bias term
    Theta1_grad = Theta1_grad + delta_2 * a1; % a1 == [1 X(t,:)]
    Theta2_grad = Theta2_grad + delta_3 * a2; % a2 already computed above
end
grad = [Theta1_grad(:) ; Theta2_grad(:)];
I am trying to use ODE45 to find the solution for 2 bars rotating in a vertical plane, connected by a torsional spring that exerts a moment on the bars only when the angle between them differs from 90 degrees. a1-b4 are just the constants in the differential equations; I put their values into a vector before passing it to the function. I keep getting back an error saying that I am sending in 6 initial conditions but only getting 5 back from the ODE45 function. Any ideas on how to fix this?
%system1.m
function [dx] = system1(t,x,parameters)
    dx = zeros(4,1);
    a1 = parameters(1);
    a2 = parameters(2);
    a3 = parameters(3);
    a4 = parameters(4);
    b1 = parameters(5);
    b2 = parameters(6);
    b3 = parameters(7);
    b4 = parameters(8);
    dx(1) = x(2); %dtheta1 = angular velocity1
    dx(2) = x(3); %d(angular velocity1) = angular acceleration1
    dx(4) = x(5); %dtheta2 = angular velocity2
    dx(5) = x(6); %d(angular velocity2) = angular acceleration2
    dx(2) = a1*x(1)+a2*x(4)+a3*x(2)+a4*x(5); %motion equation 1
    dx(5) = b1*x(1)+b2*x(4)+b3*x(2)+b4*x(5); %motion equation 2
%CA2Lou.m
%set parameters
clear;
a1 = -12;
a2 = 12;
a3 = 0;
a4 = 0;
b1 = 4;
b2 = -4;
b3 = 0;
b4 = 0;
parameters = [a1 a2 a3 a4 b1 b2 b3 b4];
%set final time
tf = .5;
options = odeset('MaxStep',.05);
%setting initial conditions
InitialConditions = [90 0 0 0 0 0];
[t_sol,x_sol] = ode45(@system1,[0 tf],InitialConditions,[],parameters);
The size and indexing of dx don't match x. You initialize dx with 4 elements even though x has 6. Then you assign values to 4 indices of dx (specifically [1 2 4 5]), which grows dx to 5 elements, still one short of the 6 that ode45 expects.
You probably need to initialize dx like so:
dx = zeros(6, 1);
Then, your first and second motion equations should probably (I'm guessing) be placed in indices 3 and 6:
dx(3) = a1*x(1)+a2*x(4)+a3*x(2)+a4*x(5); %motion equation 1
dx(6) = b1*x(1)+b2*x(4)+b3*x(2)+b4*x(5); %motion equation 2
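Putting both fixes together, system1 might look like this (a sketch; the state ordering [theta1; omega1; alpha1; theta2; omega2; alpha2] is inferred from the original comments):
function dx = system1(t, x, parameters)
    % state (inferred): x = [theta1; omega1; alpha1; theta2; omega2; alpha2]
    dx = zeros(6,1);    % one derivative per state element
    a1 = parameters(1); a2 = parameters(2); a3 = parameters(3); a4 = parameters(4);
    b1 = parameters(5); b2 = parameters(6); b3 = parameters(7); b4 = parameters(8);
    dx(1) = x(2);       % dtheta1 = angular velocity 1
    dx(2) = x(3);       % domega1 = angular acceleration 1
    dx(4) = x(5);       % dtheta2 = angular velocity 2
    dx(5) = x(6);       % domega2 = angular acceleration 2
    dx(3) = a1*x(1) + a2*x(4) + a3*x(2) + a4*x(5);   % motion equation 1
    dx(6) = b1*x(1) + b2*x(4) + b3*x(2) + b4*x(5);   % motion equation 2
end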
I am attempting to create an Abeles matrix formalism model to analyse some experimental data; I have attached a wiki link for reference so that you can see what I am attempting to achieve.
The crux of my issue is that I am unable to multiply four sets of matrices against each other, as in A[1]*B[1]*C[1]*D[1], A[2]*B[2]*C[2]*D[2], ..., A[n]*B[n]*C[n]*D[n]. I then need to store the results as individual matrices of their own; each matrix corresponds to one momentum transfer value in Qmin:Qstep:Qmax.
Also, when I attempt to carry out the final step, R = abs((ABCD(2,1)./ABCD(1,1)).^2), I end up with a single value rather than a value of R for each Q value.
Due to the size of the code, a simple for loop isn't a realistic option.
My 'test' code is:
%import data
fid = fopen('run_22208_09.dat');
A = textscan(fid,'%f%f%f',270,'headerlines',0,'delimiter',',');
NQ = size(A{1,1});
NQ = NQ(1);
Qmin = A{1,1}(1);
Qmax = A{1,1}(NQ);
Qstep = A{1,1}(2) - A{1,1}(1);
fclose('all');
s0 = 2e-6;
s1 = 10e-6;
s2 = 6e-6;
s3 = 4e-6;
s4 = 8e-6;
sn = 12e-6;
r1 = 2;
r2 = 10;
r3 = 3;
r4 = 7;
t1 = 10;
t2 = 45;
t3 = 5;
t4 = 20;
Q=Qmin:Qstep:Qmax;
k = 2.*Q;
k1 = (((k).^2) - 4.*pi.*(s1 - s0)).^0.5;
k2 = (((k).^2) - 4.*pi.*(s2 - s0)).^0.5;
k3 = (((k).^2) - 4.*pi.*(s3 - s0)).^0.5;
k4 = (((k).^2) - 4.*pi.*(s4 - s0)).^0.5;
kn = (((k).^2) - 4.*pi.*(sn - s0)).^0.5;
layer1 = ((k1 - k2)./(k1 + k2)).*(exp(-2.*k1.*k2.*(r1.^2)));
beta1 = (sqrt(-1)).*k1.*t1;
layer2 = ((k2 - k3)./(k2 + k3)).*(exp(-2.*k2.*k3.*(r2.^2)));
beta2 = (sqrt(-1)).*k2.*t2;
layer3 = ((k3 - k4)./(k3 + k4)).*(exp(-2.*k3.*k4.*(r3.^2)));
beta3 = (sqrt(-1)).*k3.*t3;
layer4 = ((k4 - kn)./(k4 + kn)).*(exp(-2.*k4.*kn.*(r4.^2)));
beta4 = (sqrt(-1)).*k4.*t4;
%general matrix
C1 = [exp(beta1),layer1.*(exp(beta1));layer1.*exp(-beta1),exp(-beta1)];
C2 = [exp(beta2),layer2.*(exp(beta2));layer2.*exp(-beta2),exp(-beta2)];
C3 = [exp(beta3),layer3.*(exp(beta3));layer3.*exp(-beta3),exp(-beta3)];
C4 = [exp(beta4),layer4.*(exp(beta4));layer4.*exp(-beta4),exp(-beta4)];
% CA = bsxfun(@times,C1,C2)
% CB = bsxfun(@times,CA,C3);
% C = bsxfun(@times,CB,C4)
% R = abs((C(2,1)./C(1,1)).^2)
For element-wise multiplication of arrays, you would write:
M = C1 .* C2 .* C3 .* C4;
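If the goal really is per-Q matrix products A[n]*B[n]*C[n]*D[n], as described above, one option is to store each layer matrix as a 2-by-2-by-NQ array and loop only over the Q dimension, which stays cheap regardless of how large the rest of the code is. A sketch (it assumes C1..C4 are rebuilt in that 3-D layout, e.g. C1(1,1,:) = exp(beta1), C1(1,2,:) = layer1.*exp(beta1), and so on):
NQ = numel(Q);
R = zeros(1, NQ);
for q = 1:NQ
    % true 2x2 matrix product for the q-th momentum transfer value
    M = C1(:,:,q) * C2(:,:,q) * C3(:,:,q) * C4(:,:,q);
    R(q) = abs(M(2,1) / M(1,1))^2;
end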
I asked a question a few days ago, but I guess it was a little too complicated and I don't expect to get an answer.
My problem is that I need to use an ANN for classification. I've read that a much better cost function (or loss function, as some books call it) is the cross-entropy, that is J(w) = -(1/m) * sum_i( y_i*ln(h_w(x_i)) + (1-y_i)*ln(1 - h_w(x_i)) ), where i indexes the training examples in the matrix X. I tried to implement it in MATLAB but I'm finding it really difficult. There are a couple of things I don't know:
should I sum the outputs over all training data (i = 1, ..., N, where N is the number of training examples)
is the gradient calculated correctly
is the numerical gradient (gradApprox) calculated correctly
I have the following MATLAB code. I realise I may be asking about something trivial, but I hope someone can give me some clues on how to find the problem. I suspect the problem lies in calculating the gradients.
Many thanks.
Main script:
close all
clear all
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
% theta = [10 -30 -30];
x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
theta0 = 2*rand(9,1)-1;
options = optimset('gradObj','on','Display','iter');
thetaVec = fminunc(@costFunction,theta0,options,x,y);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
NN(x,theta)'
Cost function:
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
    persistent index;
    %     1 x x
    %     1 x x
    %     1 x x
    % x = 1 x x
    %     1 x x
    %     1 x x
    %     1 x x
    m = size(x,1);
    if isempty(index) || index > size(x,1)
        index = 1;
    end
    L = @(x) (1 + exp(-x)).^(-1);
    NN = @(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
    theta = cell(2,1);
    theta{1} = reshape(thetaVec(1:6),[2 3]);
    theta{2} = reshape(thetaVec(7:9),[1 3]);
    Dew = cell(2,1);
    DewApprox = cell(2,1);
    % Forward propagation
    a0 = x(index,:)';
    z1 = theta{1}*[1;a0];
    a1 = L(z1);
    z2 = theta{2}*[1;a1];
    a2 = L(z2);
    % Back propagation
    d2 = 1/m*(a2 - y(index))*L(z2)*(1-L(z2));
    Dew{2} = [1;a1]*d2;
    d1 = [1;a1].*(1 - [1;a1]).*theta{2}'*d2;
    Dew{1} = [1;a0]*d1(2:end)';
    % NNRes = NN(x,theta)';
    % jVal = -1/m*sum(NNRes-y)*NNRes*(1-NNRes);
    jVal = -1/m*(a2 - y(index))*a2*(1-a2);
    gradVal = [Dew{1}(:);Dew{2}(:)];
    gradApprox = CalcGradApprox(0.0001);
    index = index + 1;

    function output = CalcGradApprox(epsilon)
        output = zeros(size(gradVal));
        for n=1:length(thetaVec)
            thetaVecMin = thetaVec;
            thetaVecMax = thetaVec;
            thetaVecMin(n) = thetaVec(n) - epsilon;
            thetaVecMax(n) = thetaVec(n) + epsilon;
            thetaMin = cell(2,1);
            thetaMax = cell(2,1);
            thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
            thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
            thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
            thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
            a2min = NN(x(index,:),thetaMin)';
            a2max = NN(x(index,:),thetaMax)';
            jValMin = -1/m*(a2min-y(index))*a2min*(1-a2min);
            jValMax = -1/m*(a2max-y(index))*a2max*(1-a2max);
            output(n) = (jValMax - jValMin)/2/epsilon;
        end
    end
end
EDIT:
Below I present the correct version of my costFunction for those who may be interested.
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
    m = size(x,1);
    L = @(x) (1 + exp(-x)).^(-1);
    NN = @(x,theta) L(theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')]);
    theta = cell(2,1);
    theta{1} = reshape(thetaVec(1:6),[2 3]);
    theta{2} = reshape(thetaVec(7:9),[1 3]);
    Delta = cell(2,1);
    Delta{1} = zeros(size(theta{1}));
    Delta{2} = zeros(size(theta{2}));
    D = cell(2,1);
    D{1} = zeros(size(theta{1}));
    D{2} = zeros(size(theta{2}));
    jVal = 0;
    for in = 1:size(x,1)
        % Forward propagation
        a1 = [1;x(in,:)'];  % added bias to a0
        z2 = theta{1}*a1;
        a2 = [1;L(z2)];     % added bias to a1
        z3 = theta{2}*a2;
        a3 = L(z3);
        % Back propagation
        d3 = a3 - y(in);
        d2 = theta{2}'*d3.*a2.*(1 - a2);
        Delta{2} = Delta{2} + d3*a2';
        Delta{1} = Delta{1} + d2(2:end)*a1';
        jVal = jVal + sum( y(in)*log(a3) + (1-y(in))*log(1-a3) );
    end
    D{1} = 1/m*Delta{1};
    D{2} = 1/m*Delta{2};
    jVal = -1/m*jVal;
    gradVal = [D{1}(:);D{2}(:)];
    gradApprox = CalcGradApprox(x(in,:),0.0001);

    % Nested function to calculate gradApprox
    function output = CalcGradApprox(x,epsilon)
        output = zeros(size(thetaVec));
        for n=1:length(thetaVec)
            thetaVecMin = thetaVec;
            thetaVecMax = thetaVec;
            thetaVecMin(n) = thetaVec(n) - epsilon;
            thetaVecMax(n) = thetaVec(n) + epsilon;
            thetaMin = cell(2,1);
            thetaMax = cell(2,1);
            thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
            thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
            thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
            thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
            a3min = NN(x,thetaMin)';
            a3max = NN(x,thetaMax)';
            jValMin = 0;
            jValMax = 0;
            for inn=1:size(x,1)
                jValMin = jValMin + sum( y(inn)*log(a3min) + (1-y(inn))*log(1-a3min) );
                jValMax = jValMax + sum( y(inn)*log(a3max) + (1-y(inn))*log(1-a3max) );
            end
            jValMin = 1/m*jValMin;
            jValMax = 1/m*jValMax;
            output(n) = (jValMax - jValMin)/2/epsilon;
        end
    end
end
I've only had a quick eyeball over your code. Here are some pointers.
Q1
should I sum the outputs over all training data (i = 1, ..., N, where N is the number of training examples)
If you are talking about the cost function, it is normal to sum over all training examples and normalise by their number, so that costs are comparable across training sets of different sizes.
I can't tell from the code whether you have a vectorised implementation, which would change the answer. Note that the sum function only sums along a single dimension at a time: if you have an M-by-N array, sum will produce a 1-by-N array.
The cost function should have a scalar output.
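For illustration, a fully vectorised cross-entropy cost that reduces to a scalar might look like this (a sketch; h is the m-by-1 vector of network outputs and y the m-by-1 vector of labels):
% Sum over all m examples, then normalise: the result is a single scalar.
J = -(1/m) * sum( y .* log(h) + (1 - y) .* log(1 - h) );
% For an m-by-K output matrix, reduce both dimensions, e.g. sum(...(:)).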
Q2
is the gradient calculated correctly
The gradient is not calculated correctly; specifically, the deltas look wrong. Try following Andrew Ng's notes [PDF], they are very good.
Q3
is the numerical gradient (gradApprox) calculated correctly.
This line looks a bit suspect. Does this make more sense?
output(n) = (jValMax - jValMin)/(2*epsilon);
EDIT: I actually can't make heads or tails of your gradient approximation. You should only need forward propagation and small tweaks to the parameters to approximate the gradient. Good luck!
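In code, that approximation could look like this (a sketch; costFn stands for any function that returns the scalar cost for a full parameter vector):
% Central differences: two forward passes per parameter, no backprop.
epsilon = 1e-4;
gradApprox = zeros(size(thetaVec));
for n = 1:length(thetaVec)
    e = zeros(size(thetaVec));
    e(n) = epsilon;
    gradApprox(n) = (costFn(thetaVec + e) - costFn(thetaVec - e)) / (2*epsilon);
end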
I'm trying to solve a very large system of coupled nonlinear equations. Following this thread and the related MATLAB help (first example), I tried to write the following code:
%% FSOLVE TEST #2
clc; clear; close all
%%
global a0 a1 a2 a3 a4 h0 TM JA JB
a0 = 2.0377638272727268;
a1 = -7.105521894545453;
a2 = 9.234000147272726;
a3 = -5.302489919999999;
a4 = 1.1362478399999998;
h0 = 45.5;
TM = 0.00592256;
JA = 1.0253896074561006;
JB = 1.3079437258774012;
%%
global N
N = 5;
XA = 0;
XB = 15;
dX = (XB-XA)/(N-1);
XX = XA:dX:XB;
y0 = JA:(JB-JA)/(N-1):JB;
plot(XX,y0,'o')
[x,fval] = fsolve(@nlsys,y0);
where the function nlsys is as follows:
function S = nlsys(x)
    global a1 a2 a3 a4 N TM h0 dX JA JB
    H = h0^2/12;
    e = cell(N,1);
    for i = 2:N-1
        D1 = (x(i+1) - x(i-1))./2./dX;
        D2 = (x(i+1) + x(i-1) - 2.*x(i))./(dX^2);
        f = a1 + 2*a2.*x(i) + 3*a3.*x(i).^2 + 4*a4.*x(i).^3;
        g = - H.* (a1 + 2*a2.*x(i) + 3*a3.*x(i).^2 + 4*a4.*x(i).^3)./(x(i).^5);
        b = (H/2) .* (5*a1 + 8*a2.*x(i) + 9*a3.*x(i).^2 + 8*a4.*x(i).^3)./(x(i).^6);
        e{i} = @(x) f + b.*(D1.^2) + g.*D2 - TM;
    end
    e{1} = @(x) x(1) - JA;
    e{N} = @(x) x(N) - JB;
    S = @(x) cellfun(@(E) E(x), e);
When I run the program, MATLAB gives the following errors:
Error using fsolve (line 280)
FSOLVE requires all values returned by user functions to be of data type double.
Error in fsolve_test2 (line 32)
[x,fval] = fsolve(#nlsys,y0);
Where are my mistakes?
Thanks in advance.
Petrus
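The error itself points at the return type: nlsys builds S as a function handle over a cell of handles, but fsolve needs the function to return a numeric residual vector of type double. A version that evaluates the residuals directly might look like this (a sketch; it also assumes dX is declared global in the main script, which the posted code does not do):
function S = nlsys(x)
    global a1 a2 a3 a4 N TM h0 dX JA JB
    H = h0^2/12;
    S = zeros(N,1);                      % numeric residuals, not handles
    for i = 2:N-1
        D1 = (x(i+1) - x(i-1))/(2*dX);
        D2 = (x(i+1) + x(i-1) - 2*x(i))/dX^2;
        f = a1 + 2*a2*x(i) + 3*a3*x(i)^2 + 4*a4*x(i)^3;
        g = -H*(a1 + 2*a2*x(i) + 3*a3*x(i)^2 + 4*a4*x(i)^3)/x(i)^5;
        b = (H/2)*(5*a1 + 8*a2*x(i) + 9*a3*x(i)^2 + 8*a4*x(i)^3)/x(i)^6;
        S(i) = f + b*D1^2 + g*D2 - TM;   % interior finite-difference equation
    end
    S(1) = x(1) - JA;                    % boundary condition at XA
    S(N) = x(N) - JB;                    % boundary condition at XB
end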