Rigidly register a 2D image to a 3D volume with good initial guess for affine transformation - matlab

I have a 3D volume, a 2D image, and an approximate mapping between the two (an affine transformation with no skewing: the scaling is known, and the rotation and translation are approximately known and need fitting). Because there is an error in this mapping, I would like to further register the 2D image to the 3D volume. I have not written registration code before, but since I can't find any program or code that solves this, I would like to try to do it myself. I believe the standard approach for registration is to optimize mutual information, and I think that would also be suitable here, because the intensities are not equal between the two images. So I think I should write a function for the transformation, a function for the mutual information, and a function for the optimization.
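For reference, I understand mutual information can be computed from a joint histogram of the two images; a minimal sketch (my own, with the simplest possible linear binning, not from any toolbox) could look like this:
function MI = mi_sketch(im1, im2, nbins)
im1 = double(im1(:)); im2 = double(im2(:));
% map intensities linearly onto bin indices 1..nbins
b1 = 1 + round((nbins-1) * (im1 - min(im1)) / (max(im1) - min(im1) + eps));
b2 = 1 + round((nbins-1) * (im2 - min(im2)) / (max(im2) - min(im2) + eps));
jh = accumarray([b1 b2], 1, [nbins nbins]); % joint histogram
pxy = jh / sum(jh(:)); % joint probabilities
px = sum(pxy,2); py = sum(pxy,1); % marginal probabilities
nz = pxy > 0; % skip empty bins to avoid log(0)
pp = px * py; % product of the marginals
MI = sum(pxy(nz) .* log(pxy(nz) ./ pp(nz)));
end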
I did find some MATLAB code on a MathWorks thread from two years ago, based on an article. The OP reports that she managed to get the code to work, but I don't see how exactly she did that. There is also an implementation in the Image Processing Toolbox for MATLAB, but I don't have that toolbox and there does not seem to be an equivalent for Octave. SPM is a MATLAB-based program that implements registration, but it does not cope with 2D-to-3D registration. On the File Exchange there is a brute-force method that registers two 2D images using mutual information.
What she does is pass a multi-planar reconstruction function and a similarity/error function into a minimization algorithm, but I don't quite understand the details. Maybe it would be better to start fresh:
load mri; volume = squeeze(D);
phi = 3; theta = 2; psi = 5; %some small angles
tx = 1; ty = 1; tz = 1; % some small translation
dx = 0.25; dy = 0.25; dz = 2; %different scales
t = [tx; ty; tz];
r = [phi, theta, psi]; r = r*(pi/180);
dims = size(volume);
p0 = [round(dims(1)/2);round(dims(2)/2);round(dims(3)/2)]; %image center
S = eye(4); S(1,1) = dx; S(2,2) = dy; S(3,3) = dz;
Rx=[1 0 0 0;
0 cos(r(1)) sin(r(1)) 0;
0 -sin(r(1)) cos(r(1)) 0;
0 0 0 1];
Ry=[cos(r(2)) 0 -sin(r(2)) 0;
0 1 0 0;
sin(r(2)) 0 cos(r(2)) 0;
0 0 0 1];
Rz=[cos(r(3)) sin(r(3)) 0 0;
-sin(r(3)) cos(r(3)) 0 0;
0 0 1 0;
0 0 0 1];
R = S*Rz*Ry*Rx;
%make affine matrix to rotate about center of image
T1 = ( eye(3)-R(1:3,1:3) ) * p0(1:3);
T = T1 + t; %add translation
A = R;
A(1:3,4) = T;
Rold2new = A;
Rnew2old = inv(Rold2new);
%the transformation
[xx, yy, zz] = meshgrid(1:dims(1), 1:dims(2), 1); % coordinates of a single slice at z = 1
coordinates_axes_new = [xx(:)'; yy(:)'; zz(:)'; ones(1, numel(zz))];
coordinates_axes_old = Rnew2old*coordinates_axes_new;
Xcoordinates = reshape(coordinates_axes_old(1,:), dims(1), dims(2)); % one slice, not dims(3) of them
Ycoordinates = reshape(coordinates_axes_old(2,:), dims(1), dims(2));
Zcoordinates = reshape(coordinates_axes_old(3,:), dims(1), dims(2));
%interpolation/reslicing
method = 'cubic';
slice= interp3(volume, Xcoordinates, Ycoordinates, Zcoordinates, method);
%so now I have my slice for which I would like to find the correct position
% first guess for A
A0 = eye(4); A0(1:3,4) = T1; A0(1,1) = dx; A0(2,2) = dy; A0(3,3) = dz;
% this is pretty close to A
% now how would I fit the slice to the volume by changing A0 and examining some similarity measure?
% probably maximize mutual information?
% http://www.mathworks.com/matlabcentral/fileexchange/14888-mutual-information-computation/content//mi/mutualinfo.m
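% one possible wiring (my sketch, appended for clarity): pack the free
% parameters into a vector p = [tx ty tz phi theta psi], rebuild the affine
% matrix, reslice, and feed the negative mutual information to a
% derivative-free optimiser; reslice_p and mi are hypothetical helpers
% standing in for the interp3 code above and e.g. the linked mutualinfo
cost = @(p) -mi(slice, reslice_p(volume, p)); % mi, reslice_p: placeholders
p_fit = fminsearch(cost, p_start);            % p_start: parameters read off A0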

OK, I was hoping for someone else's approach, which would probably have been better than mine since I have never done any optimization or registration before, so I waited for Knedlsepp's bounty to almost finish. But I do have some code that's working now. It will find a local optimum, so the initial guess must be good. I wrote some functions myself, took some functions from the File Exchange as-is, and extensively edited some other functions from the File Exchange. Now that I've put all the code together to work as an example here, the rotations are off; I will try to correct that. I'm not sure where the difference between the example and my original code is; I must have made a typo when replacing some variables and the data-loading scheme.
What I do is take the starting affine transformation matrix and decompose it into an orthogonal matrix and an upper triangular matrix. I then assume the orthogonal matrix is my rotation matrix, so I calculate the Euler angles from it. I take the translation directly from the affine matrix and, as stated in the problem, I assume I know the scaling matrix and that there is no shearing. That gives me all the degrees of freedom of the affine transformation; my optimisation function changes them, constructs a new affine matrix from them, applies it to the volume, and calculates the mutual information. The MATLAB optimisation function patternsearch then minimises 1 - MI/MI_max.
What I noticed when using it on my real data, which are multimodal brain images, is that it works much better on brain-extracted images, i.e. with the skull and the tissue outside of the skull removed.
%data
load mri; volume = double(squeeze(D));
%transformation parameters
phi = 3; theta = 1; psi = 5; %some small angles
tx = 1; ty = 1; tz = 3; % some small translation
dx = 0.25; dy = 0.25; dz = 2; %different scales
t = [tx; ty; tz];
r = [phi, theta, psi]; r = r*(pi/180);
%image center and size
dims = size(volume);
p0 = [round(dims(1)/2);round(dims(2)/2);round(dims(3)/2)];
%slice coordinate ranges
range_x = 1:dims(1)/dx;
range_y = 1:dims(2)/dy;
range_z = 1;
%rotation
R = dofaffine([0;0;0], r, [1,1,1]);
T1 = ( eye(3)-R(1:3,1:3) ) * p0(1:3); %rotate about p0
%scaling
S = eye(4); S(1,1) = dx; S(2,2) = dy; S(3,3) = dz;
%translation
T = [[eye(3), T1 + t]; [0 0 0 1]];
%affine
A = T*R*S;
% first guess for A
r00 = [1,1,1]*pi/180;
R00 = dofaffine([0;0;0], r00, [1 1 1]);
t00 = T1 + t + ( eye(3) - R00(1:3,1:3) ) * p0;
A0 = dofaffine( t00, r00, [dx, dy, dz] );
[ t0, r0, s0 ] = dofaffine( A0 );
x0 = [ t0.', r0, s0 ];
%the transformation
slice = affine3d(volume, A, range_x, range_y, range_z, 'cubic'); % affine3d here is one of the (edited) File Exchange functions mentioned above, not the Image Processing Toolbox object of the same name
guess = affine3d(volume, A0, range_x, range_y, range_z, 'cubic');
%initialisation
Dt = [1; 1; 1];
Dr = [2 2 2].*pi/180;
Ds = [0 0 0];
Dx = [Dt', Dr, Ds];
%limits
LB = x0-Dx;
UB = x0+Dx;
%other inputs
ref_levels = length(unique(slice));
Qref = imquantize(slice, ref_levels);
MI_max = MI_GG(Qref, Qref);
%patternsearch options
options = psoptimset('InitialMeshSize',0.03,'MaxIter',20,'TolCon',1e-5,'TolMesh',5e-5,'TolX',1e-6,'PlotFcns',{@psplotbestf,@psplotbestx});
%optimise
[x2, MI_norm_neg, exitflag_len] = patternsearch(@(x) AffRegOptFunc(x, slice, volume, MI_max, x0), x0,[],[],[],[],LB(:),UB(:),options);
%check
p0 = [round(size(volume)/2).'];
R0 = dofaffine([0;0;0], x2(4:6)-x0(4:6), [1 1 1]);
t1 = ( eye(3) - R0(1:3,1:3) ) * p0;
A2 = dofaffine( x2(1:3).'+t1, x2(4:6), x2(7:9) ) ;
fitted = affine3d(volume, A2, range_x, range_y, range_z, 'cubic');
overlay1 = imfuse(slice, guess);
overlay2 = imfuse(slice, fitted);
figure(101);
ax(1) = subplot(1,2,1); imshow(overlay1, []); title('pre-reg')
ax(2) = subplot(1,2,2); imshow(overlay2, []); title('post-reg');
linkaxes(ax);
function normed_score = AffRegOptFunc( x, ref_im, reg_im, MI_max, x0 )
t = x(1:3).';
r = x(4:6);
s = x(7:9);
rangx = 1:size(ref_im,1);
rangy = 1:size(ref_im,2);
rangz = 1:size(ref_im,3);
ref_levels = length(unique(ref_im));
reg_levels = length(unique(reg_im));
t0 = x0(1:3).';
r0 = x0(4:6);
s0 = x0(7:9);
p0 = [round(size(reg_im)/2).'];
R = dofaffine([0;0;0], r-r0, [1 1 1]);
t1 = ( eye(3) - R(1:3,1:3) ) * p0;
t = t + t1;
Ap = dofaffine( t, r, s );
reg_im_t = affine3d(reg_im, Ap, rangx, rangy, rangz, 'cubic');
Qref = imquantize(ref_im, ref_levels);
Qreg = imquantize(reg_im_t, reg_levels);
MI = MI_GG(Qref, Qreg);
normed_score = 1-MI/MI_max;
end
function [ varargout ] = dofaffine( varargin )
% [ t, r, s ] = dofaffine( A )
% [ A ] = dofaffine( t, r, s )
if nargin == 1
%affine to degrees of freedom (no shear)
A = varargin{1};
[T, R, S] = decompaffine(A);
r = GetEulerAngles(R(1:3,1:3));
s = [S(1,1), S(2,2), S(3,3)];
t = T(1:3,4);
varargout{1} = t;
varargout{2} = r;
varargout{3} = s;
elseif nargin == 3
%degrees of freedom to affine (no shear)
t = varargin{1};
r = varargin{2};
s = varargin{3};
R = GetEulerAngles(r); R(4,4) = 1;
S(1,1) = s(1); S(2,2) = s(2); S(3,3) = s(3); S(4,4) = 1;
T = eye(4); T(1,4) = t(1); T(2,4) = t(2); T(3,4) = t(3);
A = T*R*S;
varargout{1} = A;
else
error('incorrect number of input arguments');
end
end
function [ T, R, S ] = decompaffine( A )
%I assume A = T * R * S
T = eye(4);
R = eye(4);
S = eye(4);
%decompose in orthogonal matrix q and upper triangular matrix r
%I assume q is a rotation matrix and r is a scale and shear matrix
%matlab 2014 can force real solution
[q, r] = qr(A(1:3,1:3));
R(1:3,1:3) = q;
S(1:3,1:3) = r;
% A*S^-1*R^-1 = T*R*S*S^-1*R^-1 = T*R*I*R^-1 = T*R*R^-1 = T*I = T
T = A*inv(S)*inv(R);
t = T(1:3,4);
T = eye(4); T(1:3,4) = t; % keep only the translation part
end
function [varargout]= GetEulerAngles(R)
assert(length(R)==3)
dims = size(R);
if min(dims)==1
rx = R(1); ry = R(2); rz = R(3);
R = [[ cos(ry)*cos(rz), -cos(ry)*sin(rz), sin(ry)];...
[ cos(rx)*sin(rz) + cos(rz)*sin(rx)*sin(ry), cos(rx)*cos(rz) - sin(rx)*sin(ry)*sin(rz), -cos(ry)*sin(rx)];...
[ sin(rx)*sin(rz) - cos(rx)*cos(rz)*sin(ry), cos(rz)*sin(rx) + cos(rx)*sin(ry)*sin(rz), cos(rx)*cos(ry)]];
varargout{1} = R;
else
ry=asin(R(1,3));
rz=acos(R(1,1)/cos(ry));
rx=acos(R(3,3)/cos(ry));
if nargout > 1 && nargout < 4
varargout{1} = rx;
varargout{2} = ry;
varargout{3} = rz;
elseif nargout == 1
varargout{1} = [rx ry rz];
else
error('wrong number of output arguments');
end
end
end
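Since the rotations in the example are reportedly off, a quick round-trip probe of the decomposition may help localise the problem (my addition; note that qr does not guarantee a positive diagonal in its triangular factor, so sign flips between the recovered rotation and scale are one candidate):
A_chk = dofaffine([1;2;3], [3 1 5]*pi/180, [0.25 0.25 2]);
[t_chk, r_chk, s_chk] = dofaffine(A_chk);
disp(norm(A_chk - dofaffine(t_chk, r_chk, s_chk))) % ~0 only if decompose/rebuild agree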

Related

Finite Element assembly

I'm having serious problems with a simple example of FEM assembly.
I just want to assemble the mass matrix without any coefficient. The geometry is simple:
conn=[1, 2, 3];
p = [0 0; 1 0; 0 1];
I set it up like this so that the physical element is equal to the reference element.
my basis functions:
phi_1 = @(eta) 1 - eta(1) - eta(2);
phi_2 = @(eta) eta(1);
phi_3 = @(eta) eta(2);
phi = {phi_1, phi_2, phi_3};
Jacobian matrix:
J = @(x,y) [x(2) - x(1), x(3) - x(1);
y(2) - y(1), y(3) - y(1)];
The rest of the code:
np = size(p,1); %number of nodes
M = zeros(np,np);
for K = 1:size(conn,1)
l2g = conn(K,:); %local to global mapping
x = p(l2g,1); %node x-coordinate
y = p(l2g,2); %node y-coordinate
jac = J(x,y);
loc_M = localM(jac, phi);
M(l2g, l2g) = M(l2g, l2g) + loc_M; %add element masses to M
end
localM:
function loc_M = localM(J,phi)
d_J = det(J);
loc_M = zeros(3,3);
for i = 1:3
for j = 1:3
loc_M(i,j) = d_J * quadrature(phi{i}, phi{j});
end
end
end
quadrature:
function value = quadrature(phi_i, phi_j)
p = [1/3, 1/3;
0.6, 0.2;
0.2, 0.6;
0.2, 0.2];
w = [-27/96, 25/96, 25/96, 25/96];
res = 0;
for i = 1:size(p,1)
res = res + phi_i(p(i,:)) * phi_j(p(i,:)) * w(i);
end
value = res;
end
For the simple entry (1,1) I obtain 0.0833, while computing the integral by hand or on Wolfram Alpha I get 0.166 (2 times the result of the quadrature).
I tried different points and weights for the quadrature, but I really do not know what I am doing wrong.
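For what it's worth, the reference value can be checked independently of the quadrature rule with MATLAB's adaptive integrator (my addition; integral2 accepts a function handle as the upper y-limit, which is what describes the triangle):
phi_sq = @(ex,ey) (1 - ex - ey).^2;
exact = integral2(phi_sq, 0, 1, 0, @(ex) 1 - ex) % integral of phi_1^2 over the reference triangle, 1/12 ~ 0.0833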

Divide and Conquer SVD in MATLAB

I'm trying to implement a divide-and-conquer SVD of an upper bidiagonal matrix B, but my code is not working. The error is:
"Unable to perform assignment because the size of the left side is
3-by-3 and the size of the right side is 2-by-2.
V_bar(1:k,1:k) = V1;"
Can somebody help me fix it? Thanks.
function [U,S,V] = DivideConquer_SVD(B)
[m,n] = size(B);
k = floor(m/2);
if k == 0
U = 1;
V = 1;
S = B;
return;
else
% Divide the input matrix
alpha = B(k,k);
beta = B(k,k+1);
e1 = zeros(m,1);
e2 = zeros(m,1);
e1(k) = 1;
e2(k+1) = 1;
B1 = B(1:k-1,1:k);
B2 = B(k+1:m,k+1:m);
%recursive computations
[U1,S1,V1] = DivideConquer_SVD(B1);
[U2,S2,V2] = DivideConquer_SVD(B2);
U_bar = zeros(m);
U_bar(1:k-1,1:k-1) = U1;
U_bar(k,k) = 1;
U_bar((k+1):m,(k+1):m) = U2;
D = zeros(m);
D(1:k-1,1:k) = S1;
D((k+1):m,(k+1):m) = S2;
V_bar = zeros(m);
V_bar(1:k,1:k) = V1;
V_bar((k+1):m,(k+1):m) = V2;
u = alpha*e1'*V_bar + beta*e2'*V_bar;
u = u';
D_tilde = D*D + u*u';
% compute eigenvalues and eigenvectors of D^2+uu'
[L1,Q1] = eig(D_tilde);
eigs = diag(L1);
S = zeros(m,n);
S(1:(m+1):end) = eigs;
U_tilde = Q1;
V_tilde = Q1;
%Compute eigenvectors of the original input matrix T
U = U_bar*U_tilde;
V = V_bar*V_tilde;
return;
end
With limited mathematical knowledge, you need to help me a bit more, as I cannot judge whether the approach is mathematically correct (with no theory given ;) ). Anyway, I couldn't even reproduce the error, e.g. with this matrix, which The MathWorks uses to illustrate their LU matrix factorization:
A = [10 -7 0
-3 2 6
5 -1 5];
So I tried to structure your code a bit and added some hints. Extend this to make your code clearer for those people (like me) who are not too familiar with matrix decompositions.
function [U,S,V] = DivideConquer_SVD(B)
% m x n matrix
[m,n] = size(B);
k = floor(m/2);
if k == 0
disp('if') % for debugging
U = 1;
V = 1;
S = B;
% return; % not necessary as you don't do anything afterwards anyway
else
disp('else') % for debugging
% Divide the input matrix
alpha = B(k,k); % element on diagonal
beta = B(k,k+1); % element on off-diagonal
e1 = zeros(m,1);
e2 = zeros(m,1);
e1(k) = 1;
e2(k+1) = 1;
% divide matrix
B1 = B(1:k-1,1:k); % upper left quadrant
B2 = B(k+1:m,k+1:m); % lower right quadrant
% recursive function call
[U1,S1,V1] = DivideConquer_SVD(B1);
[U2,S2,V2] = DivideConquer_SVD(B2);
U_bar = zeros(m);
U_bar(1:k-1,1:k-1) = U1;
U_bar(k,k) = 1;
U_bar((k+1):m,(k+1):m) = U2;
D = zeros(m);
D(1:k-1,1:k) = S1;
D((k+1):m,(k+1):m) = S2;
V_bar = zeros(m);
V_bar(1:k,1:k) = V1;
V_bar((k+1):m,(k+1):m) = V2;
u = (alpha*e1.'*V_bar + beta*e2.'*V_bar).'; % (little show-off tip: '
% is the complex conjugate transpose operator; .' is the "normal"
% transpose operator. It's good practice to distinguish between them,
% but there is no difference for real matrices anyway)
D_tilde = D*D + u*u.';
% compute eigenvalues and eigenvectors of D^2+uu'
[L1,Q1] = eig(D_tilde);
eigs = diag(L1);
S = zeros(m,n);
S(1:(m+1):end) = eigs;
U_tilde = Q1;
V_tilde = Q1;
% Compute eigenvectors of the original input matrix T
U = U_bar*U_tilde;
V = V_bar*V_tilde;
% return; % not necessary as you don't do anything afterwards anyway
end % if
end % function
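A minimal harness to exercise the recursion might look like this (my sketch; note that, as written, S collects the eigenvalues of D^2 + uu', so the singular values of B would be their square roots once the assembly bugs are fixed):
B = diag([4 3 2 1]) + diag([1 1 1],1); % 4x4 upper bidiagonal test matrix
[U,S,V] = DivideConquer_SVD(B);
disp(svd(B).') % reference singular values from the built-in svd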

Frank - Wolfe Algorithm in matlab

I'm trying to solve the following problem:
maximize x^2 - 5x + y^2 - 3y
subject to:
x + y <= 8
x <= 2
x, y >= 0
using the Frank-Wolfe algorithm (following http://web.mit.edu/15.053/www/AMP-Chapter-13.pdf).
But after running the following program:
syms x y t;
f = x^2-5*x+y^2-3*y;
fdx = diff(f,1,x); % f'x
fdy = diff(f,1,y); % f'y
x0 = [0 0]; %initial point
A = [1 1;1 0]; %constraints matrix
b = [8;2];
lb = zeros(1,2);
eps = 0.00001;
i = 1;
X = [inf inf];
Z = zeros(2,200); %result for end points (x1,x2)
rr = zeros(1,200);
options = optimset('Display','none');
while( all(abs(X-x0)>[eps,eps]) && i < 200)
%f'x(x0)
c1 = subs(fdx,x,x0(1));
c1 = subs(c1,y,x0(2));
%f'y(x0)
c2 = subs(fdy,x,x0(1));
c2 = subs(c2,y,x0(2));
%optimization point of linear taylor function
ys = linprog((-[c1;c2]),A,b,[],[],lb,[],[],options);
%parametric representation of line
xt = (1-t)*x0(1)+t*ys(1,1);
yt = (1-t)*x0(2)+t*ys(2,1);
%f(x=xt,y=yt)
ft = subs(f,x,xt);
ft = subs(ft,y,yt);
%f't(x=xt,y=yt)
ftd = diff(ft,t,1);
%f't(x=xt,y=yt)=0 -> for max point
[t1] = solve(ftd); % (t==theta)
X = double(x0);%%%%%%%%%%%%%%%%%
% [ xt(t=t1) yt(t=t1)]
xnext(1) = subs(xt,t,t1) ;
xnext(2) = subs(yt,t,t1) ;
x0 = double(xnext);
Z(1,i) = x0(1);
Z(2,i) = x0(2);
i = i + 1;
end
x_point = Z(1,:);
y_point = Z(2,:);
% Draw result
scatter(x_point,y_point);
hold on;
% Print results
fprintf('The answer is:\n');
fprintf('x = %.3f \n',x0(1));
fprintf('y = %.3f \n',x0(2));
res = x0(1)^2 - 5*x0(1) + x0(2)^2 - 3*x0(2);
fprintf('f(x0) = %.3f\n',res);
I get the following result:
x = 3.020
y = 0.571
f(x0) = -7.367
And this happens no matter how many iterations I run the program for (1, 50 or 200).
Even if I choose a different starting point (for example, x0 = (1,6)), I mostly get a negative answer.
I know that this is an approximation, but the result should be positive (for the final x0, in this case).
My question is: what's wrong with my implementation?
Thanks in advance.
I changed a few things; it still doesn't look right, but hopefully this gets you going in the right direction. It looks like the initial x0 points make a difference to how the algorithm converges.
Also make sure to check what i is after running the program, to determine whether it ran to completion or exceeded the maximum number of iterations.
lb = zeros(1,2);
ub = [2,8]; %x<=2 is given; and if x+y<=8 with x,y>=0 then y<=8
eps = 0.00001;
i_max = 100;
i = 1;
X = [inf inf];
Z = zeros(2,i_max); %result for end points (x1,x2)
rr = zeros(1,200);
options = optimset('Display','none');
while( all(abs(X-x0)>[eps,eps]) && i < i_max)
%f'x(x0)
c1 = subs(fdx,x,x0(1));
c1 = subs(c1,y,x0(2));
%f'y(x0)
c2 = subs(fdy,x,x0(1));
c2 = subs(c2,y,x0(2));
%optimization point of linear taylor function
[ys, ~ , exit_flag] = linprog((-[c1;c2]),A,b,[],[],lb,ub,x0,options);
So here is an explanation of the changes:
ub uses our upper bound. After I added ub, the result immediately changed.
x0 starts this iteration from the previous point.
exit_flag allows you to check exit_flag after execution (it always seems to be 1, indicating it solved the problem correctly).
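As a quick sanity check (my addition): the objective is convex, so its maximum over the polytope is attained at a vertex, and for a feasible region this small you can enumerate the vertices and evaluate the objective directly as a reference for the algorithm's output:
f = @(x,y) x.^2 - 5*x + y.^2 - 3*y;
V = [0 0; 2 0; 2 6; 0 8]; % vertices of {x+y<=8, x<=2, x,y>=0}
f(V(:,1), V(:,2)) % objective at each vertex; the maximum is 40 at (0,8)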

data fitting an ellipse in 3D space

I've got a set of data that apparently forms an ellipse in 3D space (not an ellipsoid, but a curve in 3D).
Inspired by the following thread http://au.mathworks.com/matlabcentral/newsreader/view_thread/65773 and with help from someone, I managed to get the optimization code running; it outputs a set of best parameters x (a vector). However, when I try to use this x to replicate the ellipse, the outcome is a strange straight line in space. I have been stuck on this for days and still don't know what went wrong, pretty much devastated... I hope someone can shed some light on this. The mathematical formulation for the ellipse follows the same form as in the above thread, where
The 3D ellipse is given by: (x;y;z) = (z1;z2;z3) + R(alpha,beta,gamma)*(a*cos(phi); b*sin(phi); 0)
where:
* z is the translation vector.
* R is the rotation matrix (using Euler angles: we first rotate alpha rad around the z-axis, then beta rad around the x-axis, and finally gamma rad around the z-axis again, matching the code below).
* a is the long axis of the ellipse
* b is the short axis of the ellipse.
Here is my target function for optimization (ellipsefit.m)
function [merit]= ellipsefit(x, vmatrix) % x is the initial parameters, vmatrix stores the datapoints
load vmatrix.txt % In vmatrix, the data are stored: N rows x 3 columns
a = x(1);
b = x(2);
alpha = x(3);
beta = x(4);
gamma = x(5);
z = [x(6),x(7),x(8)];
t = z';
[dim1, dim2]=size(vmatrix);
% construction of rotation matrix R(alpha,beta,gamma)
R1 = [cos(alpha), sin(alpha), 0; -sin(alpha), cos(alpha), 0; 0, 0, 1];
R2 = [1, 0, 0; 0, cos(beta), sin(beta); 0, -sin(beta), cos(beta)];
R3 = [cos(gamma), sin(gamma), 0; -sin(gamma), cos(gamma), 0; 0, 0, 1];
R = R3*R2*R1;
% first compute vector phi (in the length of the data) by minimizing for every
% point in the data set the distance of this point to the ellipse
% (with its initial parameters a,b,alpha,beta,gamma, z1,z2,z3 held fixed) with respect to phi.
for i=1:dim1
point=vmatrix(i,:)';
dist = @(phi) sum((R*[a*cos(phi); b*sin(phi); 0]+t-point)).^2;
phi(i)=fminbnd(dist,0,2*pi);
end
v = [a*cos(phi); b*sin(phi); zeros(size(phi))];
P = R*v;
%The target function is: g = (xi1;xi2;xi3) - (z1;z2;z3) - R(alpha,beta,gamma)*(a*cos(phi); b*sin(phi); 0)
% Construction of distance function
merit = [vmatrix(:,1)-z(1)-P(1),vmatrix(:,2)-z(2)-P(2),vmatrix(:,3)-z(3)-P(3)];
merit = sqrt(sum(sum(merit.^2)))
end
Here is the main function for parameter initialization and the call to the optimizer (xfit.m):
function [y] = xfit(x)
x = [1 1 1 1 1 1 1 1]; % initial parameters (the input is overwritten)
[x] = fminsearch(@ellipsefit,x); % set the distance minimum as the target function
y = x;
end
Code to reconstruct the ellipse as scatter points (ellipsescatter.txt):
x = [0.655,0.876,1.449,2.248,1.024,0.201,-0.11,0.002]; % values obtained from the above routines
a = x(1);
b = x(2);
alpha = x(3);
beta = x(4);
gamma = x(5);
z = [x(6),x(7),x(8)];
R1 = [cos(alpha), sin(alpha), 0; -sin(alpha), cos(alpha), 0; 0, 0, 1];
R2 = [1, 0, 0; 0, cos(beta), sin(beta); 0, -sin(beta), cos(beta)];
R3 = [cos(gamma), sin(gamma), 0; -sin(gamma), cos(gamma), 0; 0, 0, 1];
R = R3*R2*R1;
phi = linspace(0,2*pi,100);
v = [a*cos(phi); b*sin(phi); zeros(size(phi))];
P = R*v;
u = P';
and finally the data points (vmatrix):
0.002037965 0.004225765 0.002020202
0.005766671 0.007269193 0.004040404
0.010004802 0.00995638 0.006060606
0.014444336 0.012502725 0.008080808
0.019083408 0.014909533 0.01010101
0.023967745 0.017144827 0.012121212
0.03019849 0.01969697 0.014591289
0.038857283 0.022727273 0.017839321
0.045443501 0.024730475 0.02020202
0.051213405 0.026346492 0.022222222
0.061038174 0.028787879 0.02555121
0.069408829 0.030575164 0.028282828
0.075785861 0.031818182 0.030321465
0.088818543 0.033954681 0.034343434
0.095538223 0.03490652 0.036363636
0.109421234 0.036499949 0.04040404
0.123800737 0.037746182 0.044444444
0.131206601 0.038218171 0.046464646
0.146438211 0.038868525 0.050505051
0.162243245 0.039117883 0.054545455
0.178662839 0.03893748 0.058585859
0.195740664 0.038296774 0.062626263
0.204545539 0.037790433 0.064646465
0.222781268 0.036340005 0.068686869
0.23715887 0.034848485 0.071748051
0.251787024 0.033009003 0.074747475
0.26196429 0.031542949 0.076767677
0.278510276 0.028787879 0.079919236
0.294365342 0.025757576 0.082799669
0.306221108 0.023197784 0.084848485
0.31843759 0.020305704 0.086868687
0.331291367 0.016967964 0.088888889
0.342989936 0.013636364 0.090622484
0.352806191 0.010606061 0.091993214
0.36201461 0.007575758 0.093211986
0.376385537 0.002386324 0.094949495
0.386214665 -0.001515152 0.096012
0.396173756 -0.005800677 0.096969697
0.406365393 -0.010606061 0.097799682
0.417897899 -0.016666667 0.098516141
0.428059375 -0.022727273 0.098889844
0.436894505 -0.028787879 0.09893196
0.444444123 -0.034848485 0.098652697
0.45074522 -0.040909091 0.098061305
0.455830971 -0.046969697 0.097166076
0.457867157 -0.05 0.096591789
0.46096663 -0.056060606 0.095199991
0.461974832 -0.059090909 0.094368708
0.462821268 -0.063709158 0.092929293
0.46279206 -0.068181818 0.091323015
0.462224312 -0.071212121 0.090097745
0.461247257 -0.074242424 0.088770148
0.459194871 -0.07812596 0.086868687
0.456406121 -0.0818267 0.084848485
0.45309565 -0.085162601 0.082828283
0.449335762 -0.088184223 0.080808081
0.445185841 -0.090933095 0.078787879
0.440695103 -0.093443633 0.076767677
0.435904796 -0.095744683 0.074747475
0.429042582 -0.098484848 0.072052312
0.419877272 -0.101489369 0.068686869
0.41402731 -0.103049401 0.066666667
0.407719192 -0.104545455 0.064554798
0.395265308 -0.106881864 0.060606061
0.388611992 -0.107880111 0.058585859
0.374697979 -0.10945186 0.054545455
0.360058411 -0.11051623 0.050505051
0.352443612 -0.11084211 0.048484848
0.336646801 -0.111097219 0.044444444
0.320085063 -0.110817414 0.04040404
0.31150078 -0.110465333 0.038383838
0.293673303 -0.109300395 0.034343434
0.275417637 -0.107575758 0.030396076
0.265228963 -0.106361993 0.028282828
0.251914589 -0.104545455 0.025603647
0.234385536 -0.101745907 0.022222222
0.223443994 -0.099745394 0.02020202
0.212154519 -0.097501571 0.018181818
0.20046153 -0.09497557 0.016161616
0.188298809 -0.092121085 0.014141414
0.17558878 -0.088883868 0.012121212
0.162241674 -0.085201142 0.01010101
0.148154337 -0.081000773 0.008080808
0.136529019 -0.077272727 0.006507255
0.127611912 -0.074242424 0.005361311
0.116762238 -0.070350086 0.004040404
0.103195122 -0.065151515 0.002507114
0.095734279 -0.062121212 0.001725236
0.081719652 -0.056060606 0.000388246
0 0 0
This answer is not a direct fit in 3D; instead it first rotates the data so that the plane of the points coincides with the xy plane, then fits the data in 2D.
% input: data, a N x 3 array with one set of Cartesian coords per row
% remove the center of mass
CM = mean(data);
datap = data - ones(size(data,1),1)*CM;
% now rotate all points into the xy plane ...
% start by finding the plane:
[u, s, v] = svd(datap);
% rotate the data into the principal axes frame:
datap = datap*v;
% fit the equation for an ellipse to the rotated points
x = [0.25 0.07 0.037 0 0]'; % initial parameters
options = 1;
xopt = fmins(@fellipse,x,options,[],datap) % set the distance minimum as the target function (fmins is the obsolete predecessor of fminsearch)
This is the function fellipse, based on the function provided:
function [merit]= fellipse(x,data) % x is the initial parameters, data stores the datapoints
a = x(1);
b = x(2);
alpha = x(3);
z = x(4:5);
R = [cos(alpha), sin(alpha), 0; -sin(alpha), cos(alpha), 0; 0, 0, 1];
data = data*R;
merit = 0;
[dim1, dim2]=size(data);
for i=1:dim1
dist = @(phi) sum( ( [a*cos(phi);b*sin(phi)] + z - data(i,1:2)').^2 );
phi=fminbnd(dist,0,2*pi);
merit = merit+dist(phi);
end
end
Also, note again that this can be turned into a direct fit in 3D, but this answer is just as good if you can assume the data points lie approximately in a 2D plane. The current solution is likely much more efficient than a solution in 3D with additional parameters.
Hopefully the code is self-explanatory. I recommend looking at the link included in the OP; it explains the purpose of the loop over phi.
And this is how you can inspect the result of the fit:
a = xopt(1);
b = xopt(2);
alpha = xopt(3);
z = [xopt(4:5) ; 0]';
phi = linspace(0,2*pi,100)';
simdat = [a*cos(phi) b*sin(phi) zeros(size(phi))];
R = [cos(alpha), -sin(alpha), 0; sin(alpha), cos(alpha), 0; 0, 0, 1];
simdat = simdat*R + ones(size(simdat,1), 1)*z ;
figure, hold on
plot3(datap(:,1),datap(:,2),datap(:,3),'o')
plot3(simdat(:,1),simdat(:,2),zeros(size(simdat,1),1),'r-')
Edit
The following is a 3D approach. It does not appear to be very robust as it is quite sensitive to the choice of starting parameters. Some improvements may be necessary.
CM = mean(data);
datap = data - ones(size(data,1),1)*CM;
xopt = [ 0.07 0.25 1 -0.408976 0.610120 0 0 0]';
options=1;
xopt = fmins(@fellipse3d,xopt,options,[],datap) % set the distance minimum as the target function
The function fellipse3d is
function [merit]= fellipse3d(x,data) % x is the initial parameters, data stores the datapoints
a = abs(x(1));
b = abs(x(2));
alpha = x(3);
beta = x(4);
gamma = x(5);
z = x(6:8)';
[dim1, dim2]=size(data);
R1 = [cos(alpha), sin(alpha), 0; -sin(alpha), cos(alpha), 0; 0, 0, 1];
R2 = [1, 0, 0; 0, cos(beta), sin(beta); 0, -sin(beta), cos(beta)];
R3 = [cos(gamma), sin(gamma), 0; -sin(gamma), cos(gamma), 0; 0, 0, 1];
R = R3*R2*R1;
data = (data - z(ones(dim1,1),:))*R;
merit = 0;
for i=1:dim1
dist = @(phi) sum( ([a*cos(phi);b*sin(phi);0] - data(i,:)').^2 );
phi=fminbnd(dist,0,2*pi);
merit = merit+dist(phi);
end
end
You can visualize the results with
a = xopt(1);
b = xopt(2);
alpha = -xopt(3);
beta = -xopt(4);
gamma = -xopt(5);
z = xopt(6:8)' + CM;
dim1 = 100;
phi = linspace(0,2*pi,dim1)';
simdat = [a*cos(phi) b*sin(phi) zeros(size(phi))];
R1 = [cos(alpha), sin(alpha), 0; ...
-sin(alpha), cos(alpha), 0; ...
0, 0, 1];
R2 = [1, 0, 0; ...
0, cos(beta), sin(beta); ...
0, -sin(beta), cos(beta)];
R3 = [cos(gamma), sin(gamma), 0; ...
-sin(gamma), cos(gamma), 0; ...
0, 0, 1];
R = R1*R2*R3;
simdat = simdat*R + z(ones(dim1,1),:);
figure, hold on
plot3(data(:,1),data(:,2),data(:,3),'o')
plot3(simdat(:,1),simdat(:,2),simdat(:,3),'r-')
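One more number that may be worth printing (my addition): the RMS distance of the data to the fitted curve, reusing the fact that fellipse3d returns the sum of squared point-to-curve distances:
rms_dist = sqrt( fellipse3d(xopt, datap) / size(datap,1) )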

Application of Neural Network in MATLAB

I asked a question a few days ago, but I guess it was a little too complicated and I don't expect to get any answer.
My problem is that I need to use an ANN for classification. I've read that a much better cost function (or loss function, as some books call it) is the cross-entropy, that is J(w) = -1/m * sum_i( y_i*ln(h_w(x_i)) + (1-y_i)*ln(1 - h_w(x_i)) ), where i indexes the examples in the training matrix X. I tried to implement it in MATLAB but I find it really difficult (see the vectorised sketch right after the list below). There are a couple of things I don't know:
should I sum the outputs over all training data (i = 1, ..., N, where N is the number of training inputs)?
is the gradient calculated correctly?
is the numerical gradient (gradApprox) calculated correctly?
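For reference, with h an m-by-1 vector of network outputs and y the m-by-1 label vector, the cross-entropy cost above vectorises as (a sketch of the cost formula only, not of the whole network):
J = -1/m * sum( y.*log(h) + (1-y).*log(1-h) );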
I have the following MATLAB code. I realise I may be asking about something trivial, but I hope someone can give me some clues on how to find the problem. I suspect the problem is in the gradient calculation.
Many thanks.
Main script:
close all
clear all
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
% theta = [10 -30 -30];
x = [0 0; 0 1; 1 0; 1 1];
y = [0.9 0.1 0.1 0.1]';
theta0 = 2*rand(9,1)-1;
options = optimset('gradObj','on','Display','iter');
thetaVec = fminunc(@costFunction,theta0,options,x,y);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
NN(x,theta)'
Cost function:
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
persistent index;
% 1 x x
% 1 x x
% 1 x x
% x = 1 x x
% 1 x x
% 1 x x
% 1 x x
m = size(x,1);
if isempty(index) || index > size(x,1)
index = 1;
end
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')];
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Dew = cell(2,1);
DewApprox = cell(2,1);
% Forward propagation
a0 = x(index,:)';
z1 = theta{1}*[1;a0];
a1 = L(z1);
z2 = theta{2}*[1;a1];
a2 = L(z2);
% Back propagation
d2 = 1/m*(a2 - y(index))*L(z2)*(1-L(z2));
Dew{2} = [1;a1]*d2;
d1 = [1;a1].*(1 - [1;a1]).*theta{2}'*d2;
Dew{1} = [1;a0]*d1(2:end)';
% NNRes = NN(x,theta)';
% jVal = -1/m*sum(NNRes-y)*NNRes*(1-NNRes);
jVal = -1/m*(a2 - y(index))*a2*(1-a2);
gradVal = [Dew{1}(:);Dew{2}(:)];
gradApprox = CalcGradApprox(0.0001);
index = index + 1;
function output = CalcGradApprox(epsilon)
output = zeros(size(gradVal));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a2min = NN(x(index,:),thetaMin)';
a2max = NN(x(index,:),thetaMax)';
jValMin = -1/m*(a2min-y(index))*a2min*(1-a2min);
jValMax = -1/m*(a2max-y(index))*a2max*(1-a2max);
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
EDIT:
Below I present the correct version of my costFunction for those who may be interested.
function [jVal,gradVal,gradApprox] = costFunction(thetaVec,x,y)
m = size(x,1);
L = @(x) (1 + exp(-x)).^(-1);
NN = @(x,theta) L(theta{2}*[ones(1,size(x,1));L(theta{1}*[ones(size(x,1),1) x]')]);
theta = cell(2,1);
theta{1} = reshape(thetaVec(1:6),[2 3]);
theta{2} = reshape(thetaVec(7:9),[1 3]);
Delta = cell(2,1);
Delta{1} = zeros(size(theta{1}));
Delta{2} = zeros(size(theta{2}));
D = cell(2,1);
D{1} = zeros(size(theta{1}));
D{2} = zeros(size(theta{2}));
jVal = 0;
for in = 1:size(x,1)
% Forward propagation
a1 = [1;x(in,:)']; % added bias to a0
z2 = theta{1}*a1;
a2 = [1;L(z2)]; % added bias to a1
z3 = theta{2}*a2;
a3 = L(z3);
% Back propagation
d3 = a3 - y(in);
d2 = theta{2}'*d3.*a2.*(1 - a2);
Delta{2} = Delta{2} + d3*a2';
Delta{1} = Delta{1} + d2(2:end)*a1';
jVal = jVal + sum( y(in)*log(a3) + (1-y(in))*log(1-a3) );
end
D{1} = 1/m*Delta{1};
D{2} = 1/m*Delta{2};
jVal = -1/m*jVal;
gradVal = [D{1}(:);D{2}(:)];
gradApprox = CalcGradApprox(x(in,:),0.0001);
% Nested function to calculate gradApprox
function output = CalcGradApprox(x,epsilon)
output = zeros(size(thetaVec));
for n=1:length(thetaVec)
thetaVecMin = thetaVec;
thetaVecMax = thetaVec;
thetaVecMin(n) = thetaVec(n) - epsilon;
thetaVecMax(n) = thetaVec(n) + epsilon;
thetaMin = cell(2,1);
thetaMax = cell(2,1);
thetaMin{1} = reshape(thetaVecMin(1:6),[2 3]);
thetaMin{2} = reshape(thetaVecMin(7:9),[1 3]);
thetaMax{1} = reshape(thetaVecMax(1:6),[2 3]);
thetaMax{2} = reshape(thetaVecMax(7:9),[1 3]);
a3min = NN(x,thetaMin)';
a3max = NN(x,thetaMax)';
jValMin = 0;
jValMax = 0;
for inn=1:size(x,1)
jValMin = jValMin + sum( y(inn)*log(a3min) + (1-y(inn))*log(1-a3min) );
jValMax = jValMax + sum( y(inn)*log(a3max) + (1-y(inn))*log(1-a3max) );
end
jValMin = 1/m*jValMin;
jValMax = 1/m*jValMax;
output(n) = (jValMax - jValMin)/2/epsilon;
end
end
end
I've only had a quick eyeball over your code. Here are some pointers.
Q1
should I sum each outputs given all training data (i = 1, ... N, where
N is number of inputs for training)
If you are talking about the cost function, it is normal to sum over the training examples and normalise by their number, in order to allow comparison between training sets of different sizes.
I can't tell from the code whether you have a vectorised implementation, which would change the answer. Note that the sum function only sums along a single dimension at a time, meaning that if you have an M-by-N array, sum will produce a 1-by-N array.
The cost function should have a scalar output.
Q2
is the gradient calculated correctly
The gradient is not calculated correctly; specifically, the deltas look wrong. Try following Andrew Ng's notes [PDF]; they are very good.
Q3
is the numerical gradient (gradApprox) calculated correctly?
This line looks a bit suspect. Does this make more sense?
output(n) = (jValMax - jValMin)/(2*epsilon);
EDIT: I actually can't make heads or tails of your gradient approximation. You should only use forward propagation and small tweaks in the parameters to compute the gradient. Good luck!
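Along those lines, here is a self-contained central-difference checker (a sketch; costFun is any handle that maps a parameter vector to a scalar cost):
function g = numgrad(costFun, theta, epsilon)
g = zeros(size(theta));
for n = 1:numel(theta)
    tp = theta; tp(n) = tp(n) + epsilon; % one parameter perturbed up
    tm = theta; tm(n) = tm(n) - epsilon; % and down
    g(n) = (costFun(tp) - costFun(tm)) / (2*epsilon);
end
end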