I'm trying to solve the following question :
maximize x^2-5x+y^2-3y
x+y <= 8
x<=2
x,y>= 0
By using Frank Wolf algorithm ( according to http://web.mit.edu/15.053/www/AMP-Chapter-13.pdf ).
But after running of the following program:
syms x y t;
f = x^2-5*x+y^2-3*y;
fdx = diff(f,1,x); % f'x
fdy = diff(f,1,y); % y'x
x0 = [0 0]; %initial point
A = [1 1;1 0]; %constrains matrix
b = [8;2];
lb = zeros(1,2);
eps = 0.00001;
i = 1;
X = [inf inf];
Z = zeros(2,200); %result for end points (x1,x2)
rr = zeros(1,200);
options = optimset('Display','none');
while( all(abs(X-x0)>[eps,eps]) && i < 200)
%f'x(x0)
c1 = subs(fdx,x,x0(1));
c1 = subs(c1,y,x0(2));
%f'y(x0)
c2 = subs(fdy,x,x0(1));
c2 = subs(c2,y,x0(2));
%optimization point of linear taylor function
ys = linprog((-[c1;c2]),A,b,[],[],lb,[],[],options);
%parametric representation of line
xt = (1-t)*x0(1)+t*ys(1,1);
yt = (1-t)*x0(2)+t*ys(2,1);
%f(x=xt,y=yt)
ft = subs(f,x,xt);
ft = subs(ft,y,yt);
%f't(x=xt,y=yt)
ftd = diff(ft,t,1);
%f't(x=xt,y=yt)=0 -> for max point
[t1] = solve(ftd); % (t==theta)
X = double(x0);%%%%%%%%%%%%%%%%%
% [ xt(t=t1) yt(t=t1)]
xnext(1) = subs(xt,t,t1) ;
xnext(2) = subs(yt,t,t1) ;
x0 = double(xnext);
Z(1,i) = x0(1);
Z(2,i) = x0(2);
i = i + 1;
end
x_point = Z(1,:);
y_point = Z(2,:);
% Draw result
scatter(x_point,y_point);
hold on;
% Print results
fprintf('The answer is:\n');
fprintf('x = %.3f \n',x0(1));
fprintf('y = %.3f \n',x0(2));
res = x0(1)^2 - 5*x0(1) + x0(2)^2 - 3*x0(2);
fprintf('f(x0) = %.3f\n',res);
I get the following result:
x = 3.020
y = 0.571
f(x0) = -7.367
And this no matter how many iterations I running this program (1,50 or 200).
Even if I choose a different starting point (For example, x0=(1,6) ), I get a negative answer to most.
I know that is an approximation, but the result should be positive (for x0 final, in this case).
My question is : what's wrong with my implementation?
Thanks in advance.
i changed a few things, it still doesn't look right but hopefully this is getting you in the right direction. It looks like the intial x0 points make a difference to how the algorithm converges.
Also make sure to check what i is after running the program, to determine if it ran to completion or exceeded the maximum iterations
lb = zeros(1,2);
ub = [2,8]; %if x+y<=8 and x,y>0 than both x,y < 8
eps = 0.00001;
i_max = 100;
i = 1;
X = [inf inf];
Z = zeros(2,i_max); %result for end points (x1,x2)
rr = zeros(1,200);
options = optimset('Display','none');
while( all(abs(X-x0)>[eps,eps]) && i < i_max)
%f'x(x0)
c1 = subs(fdx,x,x0(1));
c1 = subs(c1,y,x0(2));
%f'y(x0)
c2 = subs(fdy,x,x0(1));
c2 = subs(c2,y,x0(2));
%optimization point of linear taylor function
[ys, ~ , exit_flag] = linprog((-[c1;c2]),A,b,[],[],lb,ub,x0,options);
so here is the explanation of the changes
ub, uses our upper bound. After i added a ub, the result immediately changed
x0, start this iteration from the previous point
exit_flag this allows you to check exit_flag after execution (it always seems to be 1 indicating it solved the problem correctly)
Related
I'm trying to implement Divide and Conquer SVD of an upper bidiagonal matrix B, but my code is not working. The error is:
"Unable to perform assignment because the size of the left side is
3-by-3 and the size of the right side is 2-by-2.
V_bar(1:k,1:k) = V1;"
Can somebody help me fix it? Thanks.
function [U,S,V] = DivideConquer_SVD(B)
[m,n] = size(B);
k = floor(m/2);
if k == 0
U = 1;
V = 1;
S = B;
return;
else
% Divide the input matrix
alpha = B(k,k);
beta = B(k,k+1);
e1 = zeros(m,1);
e2 = zeros(m,1);
e1(k) = 1;
e2(k+1) = 1;
B1 = B(1:k-1,1:k);
B2 = B(k+1:m,k+1:m);
%recursive computations
[U1,S1,V1] = DivideConquer_SVD(B1);
[U2,S2,V2] = DivideConquer_SVD(B2);
U_bar = zeros(m);
U_bar(1:k-1,1:k-1) = U1;
U_bar(k,k) = 1;
U_bar((k+1):m,(k+1):m) = U2;
D = zeros(m);
D(1:k-1,1:k) = S1;
D((k+1):m,(k+1):m) = S2;
V_bar = zeros(m);
V_bar(1:k,1:k) = V1;
V_bar((k+1):m,(k+1):m) = V2;
u = alpha*e1'*V_bar + beta*e2'*V_bar;
u = u';
D_tilde = D*D + u*u';
% compute eigenvalues and eigenvectors of D^2+uu'
[L1,Q1] = eig(D_tilde);
eigs = diag(L1);
S = zeros(m,n)
S(1:(m+1):end) = eigs
U_tilde = Q1;
V_tilde = Q1;
%Compute eigenvectors of the original input matrix T
U = U_bar*U_tilde;
V = V_bar*V_tilde;
return;
end
With limited mathematical knowledge, you need to help me a bit more -- as I cannot judge if the approach is correct in a mathematical way (with no theory given;) ). Anyway, I couldn't even reproduce the error e.g with this matrix, which The MathWorks use to illustrate their LU matrix factorization
A = [10 -7 0
-3 2 6
5 -1 5];
So I tried to structure your code a bit and gave some hints. Extend this to make your code clearer for those people (like me) who are not too familiar with matrix decomposition.
function [U,S,V] = DivideConquer_SVD(B)
% m x n matrix
[m,n] = size(B);
k = floor(m/2);
if k == 0
disp('if') % for debugging
U = 1;
V = 1;
S = B;
% return; % net necessary as you don't do anything afterwards anyway
else
disp('else') % for debugging
% Divide the input matrix
alpha = B(k,k); % element on diagonal
beta = B(k,k+1); % element on off-diagonal
e1 = zeros(m,1);
e2 = zeros(m,1);
e1(k) = 1;
e2(k+1) = 1;
% divide matrix
B1 = B(1:k-1,1:k); % upper left quadrant
B2 = B(k+1:m,k+1:m); % lower right quadrant
% recusrsive function call
[U1,S1,V1] = DivideConquer_SVD(B1);
[U2,S2,V2] = DivideConquer_SVD(B2);
U_bar = zeros(m);
U_bar(1:k-1,1:k-1) = U1;
U_bar(k,k) = 1;
U_bar((k+1):m,(k+1):m) = U2;
D = zeros(m);
D(1:k-1,1:k) = S1;
D((k+1):m,(k+1):m) = S2;
V_bar = zeros(m);
V_bar(1:k,1:k) = V1;
V_bar((k+1):m,(k+1):m) = V2;
u = (alpha*e1.'*V_bar + beta*e2.'*V_bar).'; % (little show-off tip: '
% is the complex transpose operator; .' is the "normal" transpose
% operator. It's good practice to distinguish between them but there
% is no difference for real matrices anyway)
D_tilde = D*D + u*u.';
% compute eigenvalues and eigenvectors of D^2+uu'
[L1,Q1] = eig(D_tilde);
eigs = diag(L1);
S = zeros(m,n);
S(1:(m+1):end) = eigs;
U_tilde = Q1;
V_tilde = Q1;
% Compute eigenvectors of the original input matrix T
U = U_bar*U_tilde;
V = V_bar*V_tilde;
% return; % net necessary as you don't do anything afterwards anyway
end % for
end % function
This is my Approximate entropy Calculator in MATLAB. https://en.wikipedia.org/wiki/Approximate_entropy
I'm not sure why it isn't working. It's returning a negative value.Can anyone help me with this? R1 being the data.
FindSize = size(R1);
N = FindSize(1);
% N = input ('insert number of data values');
%if you want to put your own N in, take away the % from the line above
and
%insert the % before the N = FindSize(1)
%m = input ('insert m: integer representing length of data, embedding
dimension ');
m = 2;
%r = input ('insert r: positive real number for filtering, threshold
');
r = 0.2*std(R1);
for x1= R1(1:N-m+1,1)
D1 = pdist2(x1,x1);
C11 = (D1 <= r)/(N-m+1);
c1 = C11(1);
end
for i1 = 1:N-m+1
s1 = sum(log(c1));
end
phi1 = (s1/(N-m+1));
for x2= R1(1:N-m+2,1)
D2 = pdist2(x2,x2);
C21 = (D2 <= r)/(N-m+2);
c2 = C21(1);
end
for i2 = 1:N-m+2
s2 = sum(log(c2));
end
phi2 = (s2/(N-m+2));
Ap = phi1 - phi2;
Apen = Ap(1)
Following the documentation provided by the Wikipedia article, I developed this small function that calculates the approximate entropy:
function res = approximate_entropy(U,m,r)
N = numel(U);
res = zeros(1,2);
for i = [1 2]
off = m + i - 1;
off_N = N - off;
off_N1 = off_N + 1;
x = zeros(off_N1,off);
for j = 1:off
x(:,j) = U(j:off_N+j);
end
C = zeros(off_N1,1);
for j = 1:off_N1
dist = abs(x - repmat(x(j,:),off_N1,1));
C(j) = sum(~any((dist > r),2)) / off_N1;
end
res(i) = sum(log(C)) / off_N1;
end
res = res(1) - res(2);
end
I first tried to replicate the computation shown the article, and the result I obtain matches the result shown in the example:
U = repmat([85 80 89],1,17);
approximate_entropy(U,2,3)
ans =
-1.09965411068114e-05
Then I created another example that shows a case in which approximate entropy produces a meaningful result (the entropy of the first sample is always less than the entropy of the second one):
% starting variables...
s1 = repmat([10 20],1,10);
s1_m = mean(s1);
s1_s = std(s1);
s2_m = 0;
s2_s = 0;
% datasample will not always return a perfect M and S match
% so let's repeat this until equality is achieved...
while ((s1_m ~= s2_m) && (s1_s ~= s2_s))
s2 = datasample([10 20],20,'Replace',true,'Weights',[0.5 0.5]);
s2_m = mean(s2);
s2_s = std(s2);
end
m = 2;
r = 3;
ae1 = approximate_entropy(s1,m,r)
ae2 = approximate_entropy(s2,m,r)
ae1 =
0.00138568170752751
ae2 =
0.680090884817465
Finally, I tried with your sample data:
fid = fopen('O1.txt','r');
U = cell2mat(textscan(fid,'%f'));
fclose(fid);
m = 2;
r = 0.2 * std(U);
approximate_entropy(U,m,r)
ans =
1.08567461184858
I have a 3D volume and a 2D image and an approximate mapping (affine transformation with no skwewing, known scaling, rotation and translation approximately known and need fitting) between the two. Because there is an error in this mapping and I would like to further register the 2D image to the 3D volume. I have not written code for registration purposes before, but because I can't find any programs or code to solve this I would like to try and do this. I believe the standard for registration is to optimize mutual information. I think this would also be suitable here, because the intensities are not equal between the two images. So I think I should make a function for the transformation, a function for the mutual information and a function for optimization.
I did find some Matlab code on a mathworks thread from two years ago, based on an article. The OP reports that she managed to get the code to work, but I'm not getting how she did that exactly. Also in the IP package for matlab there is an implementation, but I dont have that package and there does not seem to be an equivalent for octave. SPM is a program that uses matlab and has registration implemented, but does not cope with 2d to 3d registration. On the file exchange there is a brute force method that registers two 2D images using mutual information.
What she does is pass a multi planar reconstruction function and an similarity/error function into a minimization algorithm. But the details I don't quite understand. Maybe it would be better to start fresh:
load mri; volume = squeeze(D);
phi = 3; theta = 2; psi = 5; %some small angles
tx = 1; ty = 1; tz = 1; % some small translation
dx = 0.25, dy = 0.25, dz = 2; %different scales
t = [tx; ty; tz];
r = [phi, theta, psi]; r = r*(pi/180);
dims = size(volume);
p0 = [round(dims(1)/2);round(dims(2)/2);round(dims(3)/2)]; %image center
S = eye(4); S(1,1) = dx; S(2,2) = dy; S(3,3) = dz;
Rx=[1 0 0 0;
0 cos(r(1)) sin(r(1)) 0;
0 -sin(r(1)) cos(r(1)) 0;
0 0 0 1];
Ry=[cos(r(2)) 0 -sin(r(2)) 0;
0 1 0 0;
sin(r(2)) 0 cos(r(2)) 0;
0 0 0 1];
Rz=[cos(r(3)) sin(r(3)) 0 0;
-sin(r(3)) cos(r(3)) 0 0;
0 0 1 0;
0 0 0 1];
R = S*Rz*Ry*Rx;
%make affine matrix to rotate about center of image
T1 = ( eye(3)-R(1:3,1:3) ) * p0(1:3);
T = T1 + t; %add translation
A = R;
A(1:3,4) = T;
Rold2new = A;
Rnew2old = inv(Rold2new);
%the transformation
[xx yy zz] = meshgrid(1:dims(1),1:dims(2),1:1);
coordinates_axes_new = [xx(:)';yy(:)';zz(:)'; ones(size(zz(:)))'];
coordinates_axes_old = Rnew2old*coordinates_axes_new;
Xcoordinates = reshape(coordinates_axes_old(1,:), dims(1), dims(2), dims(3));
Ycoordinates = reshape(coordinates_axes_old(2,:), dims(1), dims(2), dims(3));
Zcoordinates = reshape(coordinates_axes_old(3,:), dims(1), dims(2), dims(3));
%interpolation/reslicing
method = 'cubic';
slice= interp3(volume, Xcoordinates, Ycoordinates, Zcoordinates, method);
%so now I have my slice for which I would like to find the correct position
% first guess for A
A0 = eye(4); A0(1:3,4) = T1; A0(1,1) = dx; A0(2,2) = dy; A0(3,3) = dz;
% this is pretty close to A
% now how would I fit the slice to the volume by changing A0 and examining some similarity measure?
% probably maximize mutual information?
% http://www.mathworks.com/matlabcentral/fileexchange/14888-mutual-information-computation/content//mi/mutualinfo.m
Ok I was hoping for someone else's approach, that would probably have been better than mine as I have never done any optimization or registration before. So I waited for Knedlsepps bounty to almost finish. But I do have some code thats working now. It will find a local optimum so the initial guess must be good. I wrote some functions myself, took some functions from the file exchange as is and I extensively edited some other functions from the file exchange. Now that I put all the code together to work as an example here, the rotations are off, will try and correct that. Im not sure where the difference in code is between the example and my original code, must have made a typo in replacing some variables and data loading scheme.
What I do is I take the starting affine transformation matrix, decompose it to an orthogonal matrix and an upper triangular matrix. I then assume the orthogonal matrix is my rotation matrix so I calculate the euler angles from that. I directly take the translation from the affine matrix and as stated in the problem I assume I know the scaling matrix and there is no shearing. So then I have all degrees of freedom for the affine transformation, which my optimisation function changes and constructs a new affine matrix from, applies it to the volume and calculates the mutual information. The matlab optimisation function patternsearch then minimises 1-MI/MI_max.
What I noticed when using it on my real data which are multimodal brain images is that it works much better on brain extracted images, so with the skull and tissue outside of the skull removed.
%data
load mri; volume = double(squeeze(D));
%transformation parameters
phi = 3; theta = 1; psi = 5; %some small angles
tx = 1; ty = 1; tz = 3; % some small translation
dx = 0.25; dy = 0.25; dz = 2; %different scales
t = [tx; ty; tz];
r = [phi, theta, psi]; r = r*(pi/180);
%image center and size
dims = size(volume);
p0 = [round(dims(1)/2);round(dims(2)/2);round(dims(3)/2)];
%slice coordinate ranges
range_x = 1:dims(1)/dx;
range_y = 1:dims(2)/dy;
range_z = 1;
%rotation
R = dofaffine([0;0;0], r, [1,1,1]);
T1 = ( eye(3)-R(1:3,1:3) ) * p0(1:3); %rotate about p0
%scaling
S = eye(4); S(1,1) = dx; S(2,2) = dy; S(3,3) = dz;
%translation
T = [[eye(3), T1 + t]; [0 0 0 1]];
%affine
A = T*R*S;
% first guess for A
r00 = [1,1,1]*pi/180;
R00 = dofaffine([0;0;0], r00, [1 1 1]);
t00 = T1 + t + ( eye(3) - R00(1:3,1:3) ) * p0;
A0 = dofaffine( t00, r00, [dx, dy, dz] );
[ t0, r0, s0 ] = dofaffine( A0 );
x0 = [ t0.', r0, s0 ];
%the transformation
slice = affine3d(volume, A, range_x, range_y, range_z, 'cubic');
guess = affine3d(volume, A0, range_x, range_y, range_z, 'cubic');
%initialisation
Dt = [1; 1; 1];
Dr = [2 2 2].*pi/180;
Ds = [0 0 0];
Dx = [Dt', Dr, Ds];
%limits
LB = x0-Dx;
UB = x0+Dx;
%other inputs
ref_levels = length(unique(slice));
Qref = imquantize(slice, ref_levels);
MI_max = MI_GG(Qref, Qref);
%patternsearch options
options = psoptimset('InitialMeshSize',0.03,'MaxIter',20,'TolCon',1e-5,'TolMesh',5e-5,'TolX',1e-6,'PlotFcns',{#psplotbestf,#psplotbestx});
%optimise
[x2, MI_norm_neg, exitflag_len] = patternsearch(#(x) AffRegOptFunc(x, slice, volume, MI_max, x0), x0,[],[],[],[],LB(:),UB(:),options);
%check
p0 = [round(size(volume)/2).'];
R0 = dofaffine([0;0;0], x2(4:6)-x0(4:6), [1 1 1]);
t1 = ( eye(3) - R0(1:3,1:3) ) * p0;
A2 = dofaffine( x2(1:3).'+t1, x2(4:6), x2(7:9) ) ;
fitted = affine3d(volume, A2, range_x, range_y, range_z, 'cubic');
overlay1 = imfuse(slice, guess);
overlay2 = imfuse(slice, fitted);
figure(101);
ax(1) = subplot(1,2,1); imshow(overlay1, []); title('pre-reg')
ax(2) = subplot(1,2,2); imshow(overlay2, []); title('post-reg');
linkaxes(ax);
function normed_score = AffRegOptFunc( x, ref_im, reg_im, MI_max, x0 )
t = x(1:3).';
r = x(4:6);
s = x(7:9);
rangx = 1:size(ref_im,1);
rangy = 1:size(ref_im,2);
rangz = 1:size(ref_im,3);
ref_levels = length(unique(ref_im));
reg_levels = length(unique(reg_im));
t0 = x0(1:3).';
r0 = x0(4:6);
s0 = x0(7:9);
p0 = [round(size(reg_im)/2).'];
R = dofaffine([0;0;0], r-r0, [1 1 1]);
t1 = ( eye(3) - R(1:3,1:3) ) * p0;
t = t + t1;
Ap = dofaffine( t, r, s );
reg_im_t = affine3d(reg_im, A, rangx, rangy, rangz, 'cubic');
Qref = imquantize(ref_im, ref_levels);
Qreg = imquantize(reg_im_t, reg_levels);
MI = MI_GG(Qref, Qreg);
normed_score = 1-MI/MI_max;
end
function [ varargout ] = dofaffine( varargin )
% [ t, r, s ] = dofaffine( A )
% [ A ] = dofaffine( t, r, s )
if nargin == 1
%affine to degrees of freedom (no shear)
A = varargin{1};
[T, R, S] = decompaffine(A);
r = GetEulerAngles(R(1:3,1:3));
s = [S(1,1), S(2,2), S(3,3)];
t = T(1:3,4);
varargout{1} = t;
varargout{2} = r;
varargout{3} = s;
elseif nargin == 3
%degrees of freedom to affine (no shear)
t = varargin{1};
r = varargin{2};
s = varargin{3};
R = GetEulerAngles(r); R(4,4) = 1;
S(1,1) = s(1); S(2,2) = s(2); S(3,3) = s(3); S(4,4) = 1;
T = eye(4); T(1,4) = t(1); T(2,4) = t(2); T(3,4) = t(3);
A = T*R*S;
varargout{1} = A;
else
error('incorrect number of input arguments');
end
end
function [ T, R, S ] = decompaffine( A )
%I assume A = T * R * S
T = eye(4);
R = eye(4);
S = eye(4);
%decompose in orthogonal matrix q and upper triangular matrix r
%I assume q is a rotation matrix and r is a scale and shear matrix
%matlab 2014 can force real solution
[q r] = qr(A(1:3,1:3));
R(1:3,1:3) = q;
S(1:3,1:3) = r;
% A*S^-1*R^-1 = T*R*S*S^-1*R^-1 = T*R*I*R^-1 = T*R*R^-1 = T*I = T
T = A*inv(S)*inv(R);
t = T(1:3,4);
T = [eye(4) + [[0 0 0;0 0 0;0 0 0;0 0 0],[t;0]]];
end
function [varargout]= GetEulerAngles(R)
assert(length(R)==3)
dims = size(R);
if min(dims)==1
rx = R(1); ry = R(2); rz = R(3);
R = [[ cos(ry)*cos(rz), -cos(ry)*sin(rz), sin(ry)];...
[ cos(rx)*sin(rz) + cos(rz)*sin(rx)*sin(ry), cos(rx)*cos(rz) - sin(rx)*sin(ry)*sin(rz), -cos(ry)*sin(rx)];...
[ sin(rx)*sin(rz) - cos(rx)*cos(rz)*sin(ry), cos(rz)*sin(rx) + cos(rx)*sin(ry)*sin(rz), cos(rx)*cos(ry)]];
varargout{1} = R;
else
ry=asin(R(1,3));
rz=acos(R(1,1)/cos(ry));
rx=acos(R(3,3)/cos(ry));
if nargout > 1 && nargout < 4
varargout{1} = rx;
varargout{2} = ry;
varargout{3} = rz;
elseif nargout == 1
varargout{1} = [rx ry rz];
else
error('wrong number of output arguments');
end
end
end
I asked a question a few days before but I guess it was a little too complicated and I don't expect to get any answer.
My problem is that I need to use ANN for classification. I've read that much better cost function (or loss function as some books specify) is the cross-entropy, that is J(w) = -1/m * sum_i( yi*ln(hw(xi)) + (1-yi)*ln(1 - hw(xi)) ); i indicates the no. data from training matrix X. I tried to apply it in MATLAB but I find it really difficult. There are couple things I don't know:
should I sum each outputs given all training data (i = 1, ... N, where N is number of inputs for training)
is the gradient calculated correctly
is the numerical gradient (gradAapprox) calculated correctly.
I have following MATLAB codes. I realise I may ask for trivial thing but anyway I hope someone can give me some clues how to find the problem. I suspect the problem is to calculate gradients.
Many thanks.
Main script:
close all
clear all
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
% theta = [10 -30 -30];
x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
theta0 = 2*rand(9,1)-1;
options = optimset('gradObj','on','Display','iter');
thetaVec = fminunc(#costFunction,theta0,options,x,y);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
NN(x,theta)'
Cost function:
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
persistent index;
% 1 x x
% 1 x x
% 1 x x
% x = 1 x x
% 1 x x
% 1 x x
% 1 x x
m = size(x,1);
if isempty(index) || index > size(x,1)
index = 1;
end
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Dew = cell(2,1);
DewApprox = cell(2,1);
% Forward propagation
a0 = x(index,:)';
z1 = theta{1}*[1;a0];
a1 = L(z1);
z2 = theta{2}*[1;a1];
a2 = L(z2);
% Back propagation
d2 = 1/m*(a2 - y(index))*L(z2)*(1-L(z2));
Dew{2} = [1;a1]*d2;
d1 = [1;a1].*(1 - [1;a1]).*theta{2}'*d2;
Dew{1} = [1;a0]*d1(2:end)';
% NNRes = NN(x,theta)';
% jVal = -1/m*sum(NNRes-y)*NNRes*(1-NNRes);
jVal = -1/m*(a2 - y(index))*a2*(1-a2);
gradVal = [Dew{1}(:);Dew{2}(:)];
gradApprox = CalcGradApprox(0.0001);
index = index + 1;
function output = CalcGradApprox(epsilon)
output = zeros(size(gradVal));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a2min = NN(x(index,:),thetaMin)';
a2max = NN(x(index,:),thetaMax)';
jValMin = -1/m*(a2min-y(index))*a2min*(1-a2min);
jValMax = -1/m*(a2max-y(index))*a2max*(1-a2max);
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
EDIT:
Below I present the correct version of my costFunction for those who may be interested.
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
m = size(x,1);
L = #(x) (1 + exp(-x)).^(-1);
NN = #(x,theta) L(theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')]);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Delta = cell(2,1);
Delta{1} = zeros(size(theta{1}));
Delta{2} = zeros(size(theta{2}));
D = cell(2,1);
D{1} = zeros(size(theta{1}));
D{2} = zeros(size(theta{2}));
jVal = 0;
for in = 1:size(x,1)
% Forward propagation
a1 = [1;x(in,:)']; % added bias to a0
z2 = theta{1}*a1;
a2 = [1;L(z2)]; % added bias to a1
z3 = theta{2}*a2;
a3 = L(z3);
% Back propagation
d3 = a3 - y(in);
d2 = theta{2}'*d3.*a2.*(1 - a2);
Delta{2} = Delta{2} + d3*a2';
Delta{1} = Delta{1} + d2(2:end)*a1';
jVal = jVal + sum( y(in)*log(a3) + (1-y(in))*log(1-a3) );
end
D{1} = 1/m*Delta{1};
D{2} = 1/m*Delta{2};
jVal = -1/m*jVal;
gradVal = [D{1}(:);D{2}(:)];
gradApprox = CalcGradApprox(x(in,:),0.0001);
% Nested function to calculate gradApprox
function output = CalcGradApprox(x,epsilon)
output = zeros(size(thetaVec));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a3min = NN(x,thetaMin)';
a3max = NN(x,thetaMax)';
jValMin = 0;
jValMax = 0;
for inn=1:size(x,1)
jValMin = jValMin + sum( y(inn)*log(a3min) + (1-y(inn))*log(1-a3min) );
jValMax = jValMax + sum( y(inn)*log(a3max) + (1-y(inn))*log(1-a3max) );
end
jValMin = 1/m*jValMin;
jValMax = 1/m*jValMax;
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
I've only had a quick eyeball over your code. Here are some pointers.
Q1
should I sum each outputs given all training data (i = 1, ... N, where
N is number of inputs for training)
If you are talking in relation to the cost function, it is normal to sum and normalise by the number of training examples in order to provide comparison between.
I can't tell from the code whether you have a vectorised implementation which will change the answer. Note that the sum function will only sum up a single dimension at a time - meaning if you have a (M by N) array, sum will result in a 1 by N array.
The cost function should have a scalar output.
Q2
is the gradient calculated correctly
The gradient is not calculated correctly - specifically the deltas look wrong. Try following Andrew Ng's notes [PDF] they are very good.
Q3
is the numerical gradient (gradAapprox) calculated correctly.
This line looks a bit suspect. Does this make more sense?
output(n) = (jValMax - jValMin)/(2*epsilon);
EDIT: I actually can't make heads or tails of your gradient approximation. You should only use forward propagation and small tweaks in the parameters to compute the gradient. Good luck!
I'm trying to write a cubic spline interpolation program. I have written the program but, the graph is not coming out correctly. The spline uses natural boundary conditions(second dervative at start/end node are 0). The code is in Matlab and is shown below,
clear all
%Function to Interpolate
k = 10; %Number of Support Nodes-1
xs(1) = -1;
for j = 1:k
xs(j+1) = -1 +2*j/k; %Support Nodes(Equidistant)
end;
fs = 1./(25.*xs.^2+1); %Support Ordinates
x = [-0.99:2/(2*k):0.99]; %Places to Evaluate Function
fx = 1./(25.*x.^2+1); %Function Evaluated at x
%Cubic Spline Code(Coefficients to Calculate 2nd Derivatives)
f(1) = 2*(xs(3)-xs(1));
g(1) = xs(3)-xs(2);
r(1) = (6/(xs(3)-xs(2)))*(fs(3)-fs(2)) + (6/(xs(2)-xs(1)))*(fs(1)-fs(2));
e(1) = 0;
for i = 2:k-2
e(i) = xs(i+1)-xs(i);
f(i) = 2*(xs(i+2)-xs(i));
g(i) = xs(i+2)-xs(i+1);
r(i) = (6/(xs(i+2)-xs(i+1)))*(fs(i+2)-fs(i+1)) + ...
(6/(xs(i+1)-xs(i)))*(fs(i)-fs(i+1));
end
e(k-1) = xs(k)-xs(k-1);
f(k-1) = 2*(xs(k+1)-xs(k-1));
r(k-1) = (6/(xs(k+1)-xs(k)))*(fs(k+1)-fs(k)) + ...
(6/(xs(k)-xs(k-1)))*(fs(k-1)-fs(k));
%Tridiagonal System
i = 1;
A = zeros(k-1,k-1);
while i < size(A)+1;
A(i,i) = f(i);
if i < size(A);
A(i,i+1) = g(i);
A(i+1,i) = e(i);
end
i = i+1;
end
for i = 2:k-1 %Decomposition
e(i) = e(i)/f(i-1);
f(i) = f(i)-e(i)*g(i-1);
end
for i = 2:k-1 %Forward Substitution
r(i) = r(i)-e(i)*r(i-1);
end
xn(k-1)= r(k-1)/f(k-1);
for i = k-2:-1:1 %Back Substitution
xn(i) = (r(i)-g(i)*xn(i+1))/f(i);
end
%Interpolation
if (max(xs) <= max(x))
error('Outside Range');
end
if (min(xs) >= min(x))
error('Outside Range');
end
P = zeros(size(length(x),length(x)));
i = 1;
for Counter = 1:length(x)
for j = 1:k-1
a(j) = x(Counter)- xs(j);
end
i = find(a == min(a(a>=0)));
if i == 1
c1 = 0;
c2 = xn(1)/6/(xs(2)-xs(1));
c3 = fs(1)/(xs(2)-xs(1));
c4 = fs(2)/(xs(2)-xs(1))-xn(1)*(xs(2)-xs(1))/6;
t1 = c1*(xs(2)-x(Counter))^3;
t2 = c2*(x(Counter)-xs(1))^3;
t3 = c3*(xs(2)-x(Counter));
t4 = c4*(x(Counter)-xs(1));
P(Counter) = t1 +t2 +t3 +t4;
else
if i < k-1
c1 = xn(i-1+1)/6/(xs(i+1)-xs(i-1+1));
c2 = xn(i+1)/6/(xs(i+1)-xs(i-1+1));
c3 = fs(i-1+1)/(xs(i+1)-xs(i-1+1))-xn(i-1+1)*(xs(i+1)-xs(i-1+1))/6;
c4 = fs(i+1)/(xs(i+1)-xs(i-1+1))-xn(i+1)*(xs(i+1)-xs(i-1+1))/6;
t1 = c1*(xs(i+1)-x(Counter))^3;
t2 = c2*(x(Counter)-xs(i-1+1))^3;
t3 = c3*(xs(i+1)-x(Counter));
t4 = c4*(x(Counter)-xs(i-1+1));
P(Counter) = t1 +t2 +t3 +t4;
else
c1 = xn(i-1+1)/6/(xs(i+1)-xs(i-1+1));
c2 = 0;
c3 = fs(i-1+1)/(xs(i+1)-xs(i-1+1))-xn(i-1+1)*(xs(i+1)-xs(i-1+1))/6;
c4 = fs(i+1)/(xs(i+1)-xs(i-1+1));
t1 = c1*(xs(i+1)-x(Counter))^3;
t2 = c2*(x(Counter)-xs(i-1+1))^3;
t3 = c3*(xs(i+1)-x(Counter));
t4 = c4*(x(Counter)-xs(i-1+1));
P(Counter) = t1 +t2 +t3 +t4;
end
end
end
P = P';
P(length(x)) = NaN;
plot(x,P,x,fx)
When I run the code, the interpolation function is not symmetric and, it doesn't converge correctly. Can anyone offer any suggestions about problems in my code? Thanks.
I wrote a cubic spline package in Mathematica a long time ago. Here is my translation of that package into Matlab. Note I haven't looked at cubic splines in about 7 years, so I'm basing this off my own documentation. You should check everything I say.
The basic problem is we are given n data points (x(1), y(1)) , ... , (x(n), y(n)) and we wish to calculate a piecewise cubic interpolant. The interpolant is defined as
S(x) = { Sk(x) when x(k) <= x <= x(k+1)
{ 0 otherwise
Here Sk(x) is a cubic polynomial of the form
Sk(x) = sk0 + sk1*(x-x(k)) + sk2*(x-x(k))^2 + sk3*(x-x(k))^3
The properties of the spline are:
The spline pass through the data point Sk(x(k)) = y(k)
The spline is continuous at the end-points and thus continuous everywhere in the interpolation interval Sk(x(k+1)) = Sk+1(x(k+1))
The spline has continuous first derivative Sk'(x(k+1)) = Sk+1'(x(k+1))
The spline has continuous second derivative Sk''(x(k+1)) = Sk+1''(x(k+1))
To construct a cubic spline from a set of data point we need to solve for the coefficients
sk0, sk1, sk2 and sk3 for each of the n-1 cubic polynomials. That is a total of 4*(n-1) = 4*n - 4 unknowns. Property 1 supplies n constraints, and properties 2,3,4 each supply an additional n-2 constraints. Thus we have n + 3*(n-2) = 4*n - 6 constraints and 4*n - 4 unknowns. This leaves two degrees of freedom. We fix these degrees of freedom by setting the second derivative equal to zero at the start and end nodes.
Let m(k) = Sk''(x(k)) , h(k) = x(k+1) - x(k) and d(k) = (y(k+1) - y(k))/h(k). The following
three-term recurrence relation holds
h(k-1)*m(k-1) + 2*(h(k-1) + h(k))*m(k) + h(k)*m(k+1) = 6*(d(k) - d(k-1))
The m(k) are unknowns we wish to solve for. The h(k) and d(k) are defined by the input data.
This three-term recurrence relation defines a tridiagonal linear system. Once the m(k) are determined the coefficients for Sk are given by
sk0 = y(k)
sk1 = d(k) - h(k)*(2*m(k) + m(k-1))/6
sk2 = m(k)/2
sk3 = m(k+1) - m(k)/(6*h(k))
Okay that is all the math you need to know to completely define the algorithm to compute a cubic spline. Here it is in Matlab:
function [s0,s1,s2,s3]=cubic_spline(x,y)
if any(size(x) ~= size(y)) || size(x,2) ~= 1
error('inputs x and y must be column vectors of equal length');
end
n = length(x)
h = x(2:n) - x(1:n-1);
d = (y(2:n) - y(1:n-1))./h;
lower = h(1:end-1);
main = 2*(h(1:end-1) + h(2:end));
upper = h(2:end);
T = spdiags([lower main upper], [-1 0 1], n-2, n-2);
rhs = 6*(d(2:end)-d(1:end-1));
m = T\rhs;
% Use natural boundary conditions where second derivative
% is zero at the endpoints
m = [ 0; m; 0];
s0 = y;
s1 = d - h.*(2*m(1:end-1) + m(2:end))/6;
s2 = m/2;
s3 =(m(2:end)-m(1:end-1))./(6*h);
Here is some code to plot a cubic spline:
function plot_cubic_spline(x,s0,s1,s2,s3)
n = length(x);
inner_points = 20;
for i=1:n-1
xx = linspace(x(i),x(i+1),inner_points);
xi = repmat(x(i),1,inner_points);
yy = s0(i) + s1(i)*(xx-xi) + ...
s2(i)*(xx-xi).^2 + s3(i)*(xx - xi).^3;
plot(xx,yy,'b')
plot(x(i),0,'r');
end
Here is a function that constructs a cubic spline and plots in on the famous Runge function:
function cubic_driver(num_points)
runge = #(x) 1./(1+ 25*x.^2);
x = linspace(-1,1,num_points);
y = runge(x);
[s0,s1,s2,s3] = cubic_spline(x',y');
plot_points = 1000;
xx = linspace(-1,1,plot_points);
yy = runge(xx);
plot(xx,yy,'g');
hold on;
plot_cubic_spline(x,s0,s1,s2,s3);
You can see it in action by running the following at the Matlab prompt
>> cubic_driver(5)
>> clf
>> cubic_driver(10)
>> clf
>> cubic_driver(20)
By the time you have twenty nodes your interpolant is visually indistinguishable from the Runge function.
Some comments on the Matlab code: I don't use any for or while loops. I am able to vectorize all operations. I quickly form the sparse tridiagonal matrix with spdiags. I solve it using the backslash operator. I counting on Tim Davis's UMFPACK to handle the decomposition and forward and backward solves.
Hope that helps. The code is available as a gist on github https://gist.github.com/1269709
There was a bug in spline function, generated (n-2) by (n-2) should be symmetric:
lower = h(2:end);
main = 2*(h(1:end-1) + h(2:end));
upper = h(1:end-1);
http://www.mpi-hd.mpg.de/astrophysik/HEA/internal/Numerical_Recipes/f3-3.pdf