I'm trying to solve the following system of differential equations with Matlab.
The parameters i use are found here
.
The problem i have is that when i run my code i get the same results regardless of the values i set for d1 and d2. Precisely, i get the same results as in the simplified version of the system, where d1,d2 = 0. Since i'm trying to replicate the results of someone else, i know that this should not be the case. Does anyone have any idea why this is happening?
% Define u_1(t,j), u_2(t,i) as:
% u_1(t,1) = x(1)
% u_2(t,1) = x(2)
% u_1(t,2) = x(3)
% u_2(t,2) = x(4)
% So
% x(1)' = x(1)*(epsilon - (epsilon/k1)*x(1) - alpha*x(2)) + d1*(rho1(x(4))*x(3) - rho1(x(2))*x(1))
% x(2)' = x(2)*(gama + beta*x(1) - (gama/k2)*x(2)) + d2*(rho2(x(3))*x(4) - rho2(x(1))*x(2))
% x(3)' = x(3)*(epsilon - (epsilon/k1)*x(3) - alpha*x(4)) + d1*(rho1(x(2))*x(1) - rho1(x(4))*x(3))
% x(4)' = x(4)*(gama + beta*x(3) - (gama/k2)*x(4)) + d2*(rho2(x(1))*x(2) - rho2(x(3))*x(4))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
epsilon = 3;
alpha = 0.2;
gama = 0.4;
beta = 0.4;
k1 = 2;
k2 = 1;
m1 = 1;
m2 = 1;
d2 = 0.1;
d1 = 17;
% rho1(x(2)) = m1*exp(-x(2)/m1)
% rho1(x(4)) = m1*exp(-x(4)/m1)
% rho2(x(1)) = m2*exp(-x(1)/m2)
% rho2(x(3)) = m2*exp(-x(3)/m2)
g = #(t,x)[
x(1)*(epsilon - (epsilon/k1)*x(1) - alpha*x(2)) + d1*(m1*exp(-x(4)/m1)*x(3) - m1*exp(-x(2)/m1)*x(1));
x(2)*(gama + beta*x(1) - (gama/k2)*x(2)) + d2*(m2*exp(-x(3)/m2)*x(4) - m2*exp(-x(1)/m2)*x(2));
x(3)*(epsilon - (epsilon/k1)*x(3) - alpha*x(4)) + d1*(m1*exp(-x(2)/m1)*x(1) - m1*exp(-x(4)/m1)*x(3));
x(4)*(gama + beta*x(3) - (gama/k2)*x(4)) + d2*(m2*exp(-x(1)/m2)*x(2) - m2*exp(-x(3)/m2)*x(4))];
[t1,xa1] = ode45(#(t,x) g(t,x),[0 20],[4 4 4 4]);
[t2,xa2] = ode45(#(t,x) g(t,x),[0 20],[3 3 3 3]);
[t3,xa3] = ode45(#(t,x) g(t,x),[0 20],[2 2 2 2]);
[t4,xa4] = ode45(#(t,x) g(t,x),[0 20],[0.5 0.5 0.5 0.5]);
figure;
subplot(2,2,1)
plot(t1,xa1(:,1),"r",t1,xa1(:,2),"b")
title('y0 = 4')
xlabel('t')
legend({'Free Jobs','Labour Force'},'Location','southwest')
subplot(2,2,2)
plot(t2,xa2(:,1),"r",t2,xa2(:,2),"b")
title('y0 = 3')
xlabel('t')
legend({'Free Jobs','Labour Force'},'Location','southwest')
subplot(2,2,3)
plot(t3,xa3(:,1),"r",t3,xa3(:,2),"b")
title('y0 = 2')
xlabel('t')
legend({'Free Jobs','Labour Force'},'Location','southwest')
subplot(2,2,4)
plot(t4,xa4(:,1),"r",t4,xa4(:,2),"b")
title('y0 = 0.5')
xlabel('t')
legend({'Free Jobs','Labour Force'},'Location','southwest')
sgtitle('Labour Force and Free Jobs')
figure;
hold on
plot(t1,xa1(:,1),"r",t1,xa1(:,2),"b")
plot(t2,xa2(:,1),"r",t2,xa2(:,2),"b")
plot(t3,xa3(:,1),"r",t3,xa3(:,2),"b")
plot(t4,xa4(:,1),"r",t4,xa4(:,2),"b")
title('Labour Force and Free Jobs')
xlabel('t')
legend({'Free Jobs','Labour Force'},'Location','northeast')
hold off
The issue here is that you are only testing this when all your x values are equal. Look at the first term in g that contains d1
d1*(m1*exp(-x(4)/m1)*x(3) - m1*exp(-x(2)/m1)*x(1))
If all the x values are equal then that equation simplifies to just d1*0. The same is true for all other terms with a d1 or d2 in the function g. As a result, changing these values have no effect on the results. If your values of x vary then the d1 and d2 values will affect the results.
Related
Consider the following problem:
I am now in the third part of this question. I wrote the vectorial loop equations (q=teta2, x=teta3 and y=teta4):
fval(1,1) = r2*cos(q)+r3*cos(x)-r4*cos(y)-r1;
fval(2,1) = r2*sin(q)+r3*sin(x)-r4*sin(y);
I have these 2 functions, and all variables except x and y are given. I found the roots with help of this video.
Now I need to plot graphs of q versus x and q versus y when q is at [0,2pi] with delta q of 2.5 degree. What should I do to plot the graphs?
Below is my attempt so far:
function [fval,jac] = lorenzSystem(X)
%Define variables
x = X(1);
y = X(2);
q = pi/2;
r2 = 15
r3 = 50
r4 = 45
r1 = 40
%Define f(x)
fval(1,1)=r2*cos(q)+r3*cos(x)-r4*cos(y)-r1;
fval(2,1)=r2*sin(q)+r3*sin(x)-r4*sin(y);
%Define Jacobian
jac = [-r3*sin(X(1)), r4*sin(X(2));
r3*cos(X(1)), -r4*cos(X(2))];
%% Multivariate NR
%Initial conditions:
X0 = [0.5;1];
maxIter = 50;
tolX = 1e-6;
X = X0;
Xold = X0;
for i = 1:maxIter
[f,j] = lorenzSystem(X);
X = X - inv(j)*f;
err(:,i) = abs(X-Xold);
Xold = X;
if (err(:,i)<tolX)
break;
end
end
Please take a look at my solution below, and study how it differs from your own.
function [th2,th3,th4] = q65270276()
[th2,th3,th4] = lorenzSystem();
hF = figure(); hAx = axes(hF);
plot(hAx, deg2rad(th2), deg2rad(th3), deg2rad(th2), deg2rad(th4));
xlabel(hAx, '\theta_2')
xticks(hAx, 0:pi/3:2*pi);
xticklabels(hAx, {'$0$','$\frac{\pi}{3}$','$\frac{2\pi}{3}$','$\pi$','$\frac{4\pi}{3}$','$\frac{5\pi}{3}$','$2\pi$'});
hAx.TickLabelInterpreter = 'latex';
yticks(hAx, 0:pi/6:pi);
yticklabels(hAx, {'$0$','$\frac{\pi}{6}$','$\frac{\pi}{3}$','$\frac{\pi}{2}$','$\frac{2\pi}{3}$','$\frac{5\pi}{6}$','$\pi$'});
set(hAx, 'XLim', [0 2*pi], 'YLim', [0 pi], 'FontSize', 16);
grid(hAx, 'on');
legend(hAx, '\theta_3', '\theta_4')
end
function [th2,th3,th4] = lorenzSystem()
th2 = (0:2.5:360).';
[th3,th4] = deal(zeros(size(th2)));
% Define geometry:
r1 = 40;
r2 = 15;
r3 = 50;
r4 = 45;
% Define the residual:
res = #(q,X)[r2*cosd(q)+r3*cosd(X(1))-r4*cosd(X(2))-r1; ... Δx=0
r2*sind(q)+r3*sind(X(1))-r4*sind(X(2))]; % Δy=0
% Define the Jacobian:
J = #(X)[-r3*sind(X(1)), r4*sind(X(2));
r3*cosd(X(1)), -r4*cosd(X(2))];
X0 = [acosd((45^2-25^2-50^2)/(-2*25*50)); 180-acosd((50^2-25^2-45^2)/(-2*25*45))]; % Accurate guess
maxIter = 500;
tolX = 1e-6;
for idx = 1:numel(th2)
X = X0;
Xold = X0;
err = zeros(maxIter, 1); % Preallocation
for it = 1:maxIter
% Update the guess
f = res( th2(idx), Xold );
X = Xold - J(Xold) \ f;
% X = X - pinv(J(X)) * res( q(idx), X ); % May help when J(X) is close to singular
% Determine convergence
err(it) = (X-Xold).' * (X-Xold);
if err(it) < tolX
break
end
% Update history
Xold = X;
end
% Unpack and store θ₃, θ₄
th3(idx) = X(1);
th4(idx) = X(2);
% Update X0 for faster convergence of the next case:
X0 = X;
end
end
Several notes:
All computations are performed in degrees.
The specific plotting code I used is less interesting, what matters is that I defined all θ₂ in advance, then looped over them to find θ₃ and θ₄ (without recursion, as was done in your own implementation).
The initial guess (actually, analytical solution) for the very first case (θ₂=0) can be found by solving the problem manually (i.e. "on paper") using the law of cosines. The solver also works for other guesses, but you might need to increase maxIter. Also, for certain guesses (e.g. X(1)==X(2)), the Jacobian is ill-conditioned, in which case you can use pinv.
If my computation is correct, this is the result:
I am trying to plot a state-space diagram, as well as a time-history diagram, of a dynamical system. There's a catch, though. The state-space is divided into two halves by a plane located at x1 = 0. The state-space axes are x1, x2, x3. The x1 = 0 plane is parallel to the x2/x3 plane. The state-space above the x1 = 0 plane is described by the ODEs in eqx3, whereas the state-space below the x1 = 0 plane is described by the ODEs in eqx4.
So, there is a discontinuity on the plane x1 = 0. I have a vague understanding that an event function (function [value,isterminal,direction] = myEventsFcn(t,y)) should be used, but I do not know what values to give to "value", "isterminal", and "direction".
In my code below, I have an initial condition for eqx3, and another initial condition for eqx4. The initial condition for eqx3 is in the upper half of the state-space (x1 > 0). The orbit then hits the x1 = 0 plane and there is a discontinuity, and the eqx4 trajectory begins on this plane, but from a different point from where eqx3 ended.
Can this be done? How do I put it into a code? Do I stop the integration when the orbit reaches the plane x1 = 0?
eta = 0.05;
omega = 25;
tspan = [0,50];
initcond = [2, 3, 4]
[t,x] = ode45(#(t,x) eqx3(t,x,eta, omega), tspan, initcond);
initcond1 = [0, 1, 1]
[t,y] = ode45(#(t,y) eqx4(t,y,eta, omega), tspan, initcond1);
plot3(x(:,1), x(:,2), x(:,3),y(:,1), y(:,2), y(:,3))
xlabel('x1')
ylabel('x2')
zlabel('x3')
%subplot(222)
%plot(t, x(:,1), t,x(:,2),t,x(:,3),'--');
%xlabel('t')
function xdot = eqx3(t,x,eta,omega)
xdot = zeros(3,1);
xdot(1) = -(2*eta*omega + 1)*x(1) + x(2) - 1;
xdot(2) = -(2*eta*omega + (omega^2))*x(1) + x(3) + 2;
xdot(3) = -(omega^2)*x(1) + x(2) - 1;
end
function ydot = eqx4(t,y,eta,omega)
ydot = zeros(3,1);
ydot(1) = -(2*eta*omega + 1)*y(1) + y(2) + 1;
ydot(2) = -(2*eta*omega + (omega^2))*y(1) + y(3) - 2;
ydot(3) = -(omega^2)*y(1) + y(2) + 1;
end
function [value,isterminal,direction] = myEventsFcn(t,y)
value = 0
isterminal = 1
direction = 1
end
Without events, using a close-by smooth system
First, as the difference between the systems is the addition or subtraction of a constant vector it is easy to find an approximate smooth version of the system using some sigmoid function like tanh.
eta = 0.05;
omega = 25;
t0=0; tf=4;
initcond = [2, 3, 4];
opt = odeset('AbsTol',1e-11,'RelTol',1e-13);
opt = odeset(opt,'MaxStep',0.005); % in Matlab: opt = odeset('Refine',10);
[t,x] = ode45(#(t,x) eqx34(t,x,eta, omega), [t0, tf], initcond, opt);
clf;
subplot(121);
plot3(x(:,1), x(:,2), x(:,3));
xlabel('x1');
ylabel('x2');
zlabel('x3');
subplot(122);
plot(t, 10.*x(:,1), t,x(:,2),':',t,x(:,3),'--');
xlabel('t');
function xdot = eqx34(t,x,eta,omega)
S = tanh(1e6*x(1));
xdot = zeros(3,1);
xdot(1) = -(2*eta*omega + 1)*x(1) + x(2) - S;
xdot(2) = -(2*eta*omega + (omega^2))*x(1) + x(3) + 2*S;
xdot(3) = -(omega^2)*x(1) + x(2) - S;
end
resulting in the plots
As is visible, after t=1.2 the solution is essentially constant with x(1)=0 and the other coordinates close to zero.
With events
If you want to use the event mechanism, make ODE and event function dependent on a sign parameter S denoting the phase and the direction of the zero crossing.
eta = 0.05;
omega = 25;
t0=0; tf=4;
initcond = [2, 3, 4];
opt = odeset('AbsTol',1e-10,'RelTol',1e-12,'InitialStep',1e-6);
opt = odeset(opt,'MaxStep',0.005); % in Matlab: opt = odeset('Refine',10);
T = [t0]; TE = [];
X = [initcond]; XE = [];
S = 1; % sign of x(1)
while t0<tf
opt = odeset(opt,'Events', #(t,x)myEventsFcn(t,x,S));
[t,x,te,xe,ie] = ode45(#(t,x) eqx34(t,x,eta, omega, S), [t0, tf], initcond, opt);
T = [ T; t(2:end) ]; TE = [ TE; te ];
X = [ X; x(2:end,:)]; XE = [ XE; xe ];
t0 = t(end);
initcond = x(end,:);
S = -S % alternatively = 1-2*(initcond(1)<0);
end
disp(TE); disp(XE);
subplot(121);
hold on;
plot3(X(:,1), X(:,2), X(:,3),'b-');
plot3(XE(:,1), XE(:,2), XE(:,3),'or');
hold off;
xlabel('x1');
ylabel('x2');
zlabel('x3');
subplot(122);
plot(T, 10.*X(:,1), T,X(:,2),':',T,X(:,3),'--');
xlabel('t');
function xdot = eqx34(t,x,eta,omega,S)
xdot = zeros(3,1);
xdot(1) = -(2*eta*omega + 1)*x(1) + x(2) - S;
xdot(2) = -(2*eta*omega + (omega^2))*x(1) + x(3) + 2*S;
xdot(3) = -(omega^2)*x(1) + x(2) - S;
end
function [value,isterminal,direction] = myEventsFcn(t,x,S)
value = x(1)+2e-4*S;
isterminal = 1;
direction = -S;
end
The solution enters a sliding mode for 1.36 < t < 1.43 where (theoretically) x(1)=0 and the vector field points to the other phase from both sides. The theoretical solution is to take a linear combination that sets the first component to zero, so that the resulting direction is tangential to the separating surface. In the first variant above the sigmoid achieves something like this automatically. Using events one could add additional event functions that test the vector field for these conditions, and when they cease to persist.
A quick solution is to thicken the boundary surface, that is, test for x(1)+epsilon*S==0 so that the solution has to cross the boundary surface before triggering the event. In sliding mode, it will be immediately pushed back, giving a ping-pong or zigzag motion. epsilon has to be small to not perturb the solution too much, however not too small to avoid the triggering of too many events. With epsilon=2e-4 octave gives the following solution in a close-up to the sliding interval
The octave solver, and in some way also Matlab, will not trigger a terminal event if it happens in the first integration step. For that reason the InitialStep option was set to a rather small value, it should be about 0.01*epsilon. The full solution looks now similar to the one obtained in the first variant
I want to speed up my code. I always use vectorization. But in this code I have no idea how to avoid the for-loop. I would really appreciate a hint how to proceed.
thank u so much for your time.
close all
clear
clc
% generating sample data
x = linspace(10,130,33);
y = linspace(20,100,22);
[xx, yy] = ndgrid(x,y);
k = 2*pi/50;
s = [sin(k*xx+k*yy)];
% generating query points
xi = 10:5:130;
yi = 20:5:100;
[xxi, yyi] = ndgrid(xi,yi);
P = [xxi(:), yyi(:)];
% interpolation algorithm
dx = x(2) - x(1);
dy = y(2) - y(1);
x_ = [x(1)-dx x x(end)+dx x(end)+2*dx];
y_ = [y(1)-dy y y(end)+dy y(end)+2*dy];
s_ = [s(1) s(1,:) s(1,end) s(1,end)
s(:,1) s s(:,end) s(:,end)
s(end,1) s(end,:) s(end,end) s(end,end)
s(end,1) s(end,:) s(end,end) s(end,end)];
si = P(:,1)*0;
M = 1/6*[-1 3 -3 1
3 -6 3 0
-3 0 3 0
1 4 1 0];
tic
for nn = 1:numel(P(:,1))
u = mod(P(nn,1)- x_(1), dx)/dx;
jj = floor((P(nn,1) - x_(1))/dx) + 1;
v = mod(P(nn,2)- y_(1), dy)/dy;
ii = floor((P(nn,2) - y_(1))/dy) + 1;
D = [s_(jj-1,ii-1) s_(jj-1,ii) s_(jj-1,ii+1) s_(jj-1,ii+2)
s_(jj,ii-1) s_(jj,ii) s_(jj,ii+1) s_(jj,ii+2)
s_(jj+1,ii-1) s_(jj+1,ii) s_(jj+1,ii+1) s_(jj+1,ii+2)
s_(jj+2,ii-1) s_(jj+2,ii) s_(jj+2,ii+1) s_(jj+2,ii+2)];
U = [u.^3 u.^2 u 1];
V = [v.^3 v.^2 v 1];
si(nn) = U*M*D*M'*V';
end
toc
scatter3(P(:,1), P(:,2), si)
hold on
mesh(xx,yy,s)
This is the full example and is a cubic B-spline surface interpolation algorithm in 2D space.
In my matlab program:
% Orto-xylene oxidation in an adiabatic PFR.
% ethylbenzene -> Anhydrated-xylene + H2
% There is water as an inert in the system
global y0 F0 deltaHR0 P0 ni n
y0(1) = 0.0769; % ethylbenzene
y0(2) = 0.; % styrene
y0(3) = 0.; % hydrogen
y0(4) = 0.923; % water
Feed = 1105.; % F01 + FH2O Feed rate [mol/s]
F0 = Feed .* y0; % Inlet feed rate of components [mol/s]
Tin = 925.;
n = 4.; % Number of reacting compound
deltaHR0 = 124850.; % heat of reaction at standard conditions[J/mol]
P0 = 240000.; % pressure [Pa]
W = 25400.; % reactor weight [kg]
ni = [-1 1 1 0]; % Stoichiometric matrix
% Initial conditions (feed and T0)
u0 = F0;
u0(n+1) = Tin;
wspan = [0 W];
[w_adiab,u] = ode15s(#dfuns,wspan,u0);
conv_adiab = 1 - u(:,1)/F0(1);
T = u(:,n+1);
subplot(2,1,1)
plot(w_adiab,conv_adiab,'-')
title('Conversion profile')
grid on
xlabel('W')
ylabel('X(A1)')
subplot(2,1,2)
plot(w_adiab,T,'-')
title('Temperature profile')
grid on
xlabel('W')
ylabel('T')
%function file dfuns
function f = dfuns (w,U)
global y0 F0 deltaHR0 P0 ni n
% U(1) = F1
% U(2) = F2
% U(3) = F3
% U(4) = F4
% y is a vector of n components
y = U(1:n)/sum(U(1:n)); % molar fractions
T = U(n+1);
k = 3.46* 10e8.* exp(-10980./T); %Needs to be in Pascal
K = 8.2.* 10e11.* exp(-15200./T); %Needs to be in Pascal
rm = k.* (y(1)* P0- (y(2).* y(3).*(P0^2))./ K);
f(1:n) = -ni(1:n).* rm;
Cp1 = 37.778 + 87.940e-3.* T + 113.436e-5.* T^2; % ethylbenzene
%Cp2 = 71.201 + 54.706e-3.* T + 64.791e-5.* T^2; % styrene
%Cp3 = 23.969 + 30.603e-3.*T - 6.418e-5.* T^2; % hydrogen
Cp4 = 36.540 - 34.827e-3.*T + 11.681e-5.*T^2; % water
deltaCp = 0.;
% deltaCp considered to be different from zero
% deltaCp = -Cp1+Cp2+Cp3;
deltaHR = deltaHR0;
% non-const deltaHR
% T0=273.5;
% deltaHR = - deltaHR0 + (37.778.* T + 87.940e-3.* (T^2)./2 +...
% 113.436e-5.* (T^3)./3) + (71.201.* T + 54.706e-3.* (T^2)./2 + ...
% 64.791e-5.* (T^3)/3) + (23.969.*T + 30.603e-3.* (T^2)./2 -...
% 6.418e-5.* (T^3)/3) - (37.778*T0 - ...
% 87.940e-3.*(T0^2)./2 + 113.436e-5.* (T0^3)./3 )+...
% (71.201 * T0 + 54.706e-3.* (T0^2)./2 + 64.791e-5.* ...
% (T0^3)./3) + (23.969 * T0 + 30.603e-3.* (T0^2)./2 -...
% 6.418e-5.* (T0^3)./3);
f(n+1) = (1./((F0(1)+F0(4)).* (y0(1).* Cp1 + y0(4).* Cp4 +y(1).* deltaCp)*...
(1 - U(1)/F0(1)))).*-deltaHR.* rm;
f = f'; % transformation to column vector
I have a problem because of the forrowing warning:
In catPFR (line 21)
Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN.
> In ode15s (line 589)
In catPFR (line 21)
Warning: Failure at t=0.000000e+00. Unable to meet integration tolerances without reducing the step size below the
smallest value allowed (7.905050e-323) at time t.
> In ode15s (line 668)
In catPFR (line 21)
Can please someone help me? This problem could be a error in the function but I'm really not into matlab enough to understand it. Of course, I don't get the graph I was expecting.
Thank you very much,
Serena
If you inspect return values of function dfuns (w,U) during the integration, you will see that in most cases it contains NaN's or Inf's. Of course ode15s cannot proceed in this case. Check dfuns and assure that it returns correct values.
I asked a question a few days before but I guess it was a little too complicated and I don't expect to get any answer.
My problem is that I need to use ANN for classification. I've read that much better cost function (or loss function as some books specify) is the cross-entropy, that is J(w) = -1/m * sum_i( yi*ln(hw(xi)) + (1-yi)*ln(1 - hw(xi)) ); i indicates the no. data from training matrix X. I tried to apply it in MATLAB but I find it really difficult. There are couple things I don't know:
should I sum each outputs given all training data (i = 1, ... N, where N is number of inputs for training)
is the gradient calculated correctly
is the numerical gradient (gradAapprox) calculated correctly.
I have following MATLAB codes. I realise I may ask for trivial thing but anyway I hope someone can give me some clues how to find the problem. I suspect the problem is to calculate gradients.
Many thanks.
Main script:
close all
clear all
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
% theta = [10 -30 -30];
x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
theta0 = 2*rand(9,1)-1;
options = optimset('gradObj','on','Display','iter');
thetaVec = fminunc(#costFunction,theta0,options,x,y);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
NN(x,theta)'
Cost function:
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
persistent index;
% 1 x x
% 1 x x
% 1 x x
% x = 1 x x
% 1 x x
% 1 x x
% 1 x x
m = size(x,1);
if isempty(index) || index > size(x,1)
index = 1;
end
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Dew = cell(2,1);
DewApprox = cell(2,1);
% Forward propagation
a0 = x(index,:)';
z1 = theta{1}*[1;a0];
a1 = L(z1);
z2 = theta{2}*[1;a1];
a2 = L(z2);
% Back propagation
d2 = 1/m*(a2 - y(index))*L(z2)*(1-L(z2));
Dew{2} = [1;a1]*d2;
d1 = [1;a1].*(1 - [1;a1]).*theta{2}'*d2;
Dew{1} = [1;a0]*d1(2:end)';
% NNRes = NN(x,theta)';
% jVal = -1/m*sum(NNRes-y)*NNRes*(1-NNRes);
jVal = -1/m*(a2 - y(index))*a2*(1-a2);
gradVal = [Dew{1}(:);Dew{2}(:)];
gradApprox = CalcGradApprox(0.0001);
index = index + 1;
function output = CalcGradApprox(epsilon)
output = zeros(size(gradVal));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a2min = NN(x(index,:),thetaMin)';
a2max = NN(x(index,:),thetaMax)';
jValMin = -1/m*(a2min-y(index))*a2min*(1-a2min);
jValMax = -1/m*(a2max-y(index))*a2max*(1-a2max);
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
EDIT:
Below I present the correct version of my costFunction for those who may be interested.
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
m = size(x,1);
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) L(theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')]);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Delta = cell(2,1);
Delta{1} = zeros(size(theta{1}));
Delta{2} = zeros(size(theta{2}));
D = cell(2,1);
D{1} = zeros(size(theta{1}));
D{2} = zeros(size(theta{2}));
jVal = 0;
for in = 1:size(x,1)
% Forward propagation
a1 = [1;x(in,:)']; % added bias to a0
z2 = theta{1}*a1;
a2 = [1;L(z2)]; % added bias to a1
z3 = theta{2}*a2;
a3 = L(z3);
% Back propagation
d3 = a3 - y(in);
d2 = theta{2}'*d3.*a2.*(1 - a2);
Delta{2} = Delta{2} + d3*a2';
Delta{1} = Delta{1} + d2(2:end)*a1';
jVal = jVal + sum( y(in)*log(a3) + (1-y(in))*log(1-a3) );
end
D{1} = 1/m*Delta{1};
D{2} = 1/m*Delta{2};
jVal = -1/m*jVal;
gradVal = [D{1}(:);D{2}(:)];
gradApprox = CalcGradApprox(x(in,:),0.0001);
% Nested function to calculate gradApprox
function output = CalcGradApprox(x,epsilon)
output = zeros(size(thetaVec));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a3min = NN(x,thetaMin)';
a3max = NN(x,thetaMax)';
jValMin = 0;
jValMax = 0;
for inn=1:size(x,1)
jValMin = jValMin + sum( y(inn)*log(a3min) + (1-y(inn))*log(1-a3min) );
jValMax = jValMax + sum( y(inn)*log(a3max) + (1-y(inn))*log(1-a3max) );
end
jValMin = 1/m*jValMin;
jValMax = 1/m*jValMax;
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
I've only had a quick eyeball over your code. Here are some pointers.
Q1
should I sum each outputs given all training data (i = 1, ... N, where
N is number of inputs for training)
If you are talking in relation to the cost function, it is normal to sum and normalise by the number of training examples in order to provide comparison between.
I can't tell from the code whether you have a vectorised implementation which will change the answer. Note that the sum function will only sum up a single dimension at a time - meaning if you have a (M by N) array, sum will result in a 1 by N array.
The cost function should have a scalar output.
Q2
is the gradient calculated correctly
The gradient is not calculated correctly - specifically the deltas look wrong. Try following Andrew Ng's notes [PDF] they are very good.
Q3
is the numerical gradient (gradAapprox) calculated correctly.
This line looks a bit suspect. Does this make more sense?
output(n) = (jValMax - jValMin)/(2*epsilon);
EDIT: I actually can't make heads or tails of your gradient approximation. You should only use forward propagation and small tweaks in the parameters to compute the gradient. Good luck!