Is this the correct way to calculate the covariance matrix for 2D feature map? - neural-network

Within a neural network, I have some 2D feature maps with values between 0 and 1. For these maps, I want to calculate the covariance matrix based on the values at each coordination. Unfortunately, pytorch has no .cov() function like in numpy. So I wrote the following function instead:
def get_covariance(tensor):
bn, nk, w, h = tensor.shape
tensor_reshape = tensor.reshape(bn, nk, 2, -1)
x = tensor_reshape[:, :, 0, :]
y = tensor_reshape[:, :, 1, :]
mean_x = torch.mean(x, dim=2).unsqueeze(-1)
mean_y = torch.mean(y, dim=2).unsqueeze(-1)
xx = torch.sum((x - mean_x) * (x - mean_x), dim=2).unsqueeze(-1) / (h * w - 1)
xy = torch.sum((x - mean_x) * (y - mean_y), dim=2).unsqueeze(-1) / (h * w - 1)
yx = xy
yy = torch.sum((y - mean_y) * (y - mean_y), dim=2).unsqueeze(-1) / (h * w - 1)
cov = torch.cat((xx, xy, yx, yy), dim=2)
cov = cov.reshape(bn, nk, 2, 2)
return cov
Is that the correct way to do it?
Edit:
Here is a comparison with the numpy function:
a = torch.randn(1, 1, 64, 64)
a_numpy = a.reshape(1, 1, 2, -1).numpy()
torch_cov = get_covariance(a)
numpy_cov = np.cov(a_numpy[0][0])
torch_cov
tensor([[[[ 0.4964, -0.0053],
[-0.0053, 0.4926]]]])
numpy_cov
array([[ 0.99295635, -0.01069122],
[-0.01069122, 0.98539236]])
Apparently, my values are too small by a factor of 2. Why could that be?
Edit2: Ahhh I figured it out. It has to be divided by (h*w/2 - 1) :) Then the values match.

Related

What type of (probably syntactic) mistake am I making when using a generic function on an array for plotting?

I am trying to plot a geodesic on a 3D surface (tractrix) with Matlab. This worked for me in the past when I didn't need to parametrize the surface (see here). However, the tractrix called for parameterization, chain rule differentiation, and collection of u,v,x,y and f(x,y) values.
After many mistakes I think that I'm getting the right values for x = f1(u,v) and y = f2(u,v) describing a spiral right at the base of the surface:
What I can't understand is why the z value or height of the 3D plot of the curve is consistently zero, when I'm applying the same mathematical formula that allowed me to plot the surface in the first place, ie. f = #(x,y) a.* (y - tanh(y)) .
Here is the code, which runs without any errors on Octave. I'm typing a special note in upper case on the crucial calls. Also note that I have restricted the number of geodesic lines to 1 to decrease the execution time.
a = 0.3;
u = 0:0.1:(2 * pi);
v = 0:0.1:5;
[X,Y] = meshgrid(u,v);
% NOTE THAT THESE FORMULAS RESULT IN A SUCCESSFUL PLOT OF THE SURFACE:
x = a.* cos(X) ./ cosh(Y);
y = a.* sin(X) ./ cosh(Y);
z = a.* (Y - tanh(Y));
h = surf(x,y,z);
zlim([0, 1.2]);
set(h,'edgecolor','none')
colormap summer
hold on
% THESE ARE THE GENERIC FUNCTIONS (f) WHICH DON'T SEEM TO WORK AT THE END:
f = #(x,y) a.* (y - tanh(y));
f1 = #(u,v) a.* cos(u) ./ cosh(v);
f2 = #(u,v) a.* sin(u) ./ cosh(v);
dfdu = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u+eps,v)-f1(u-eps,v))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u+eps,v)-f2(u-eps,v))/(2*eps));
dfdv = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u,v+eps)-f1(u,v-eps))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u,v+eps)-f2(u,v-eps))/(2*eps));
% Normal vector to the surface:
N = #(u,v) [- dfdu(u,v), - dfdv(u,v), 1]; % Normal vec to surface # any pt.
% Some colors to draw the lines:
C = {'k','r','g','y','m','c'};
for s = 1:1 % No. of lines to be plotted.
% Starting points:
u0 = [0, u(length(u))];
v0 = [0, v(length(v))];
du0 = 0.001;
dv0 = 0.001;
step_size = 0.00005; % Will determine the progression rate from pt to pt.
eta = step_size / sqrt(du0^2 + dv0^2); % Normalization.
eps = 0.0001; % Epsilon
max_num_iter = 100000; % Number of dots in each line.
% Semi-empty vectors to collect results:
U = [[u0(s), u0(s) + eta*du0], zeros(1,max_num_iter - 2)];
V = [[v0(s), v0(s) + eta*dv0], zeros(1,max_num_iter - 2)];
for i = 2:(max_num_iter - 1) % Creating the geodesic:
ut = U(i);
vt = V(i);
xt = f1(ut,vt);
yt = f2(ut,vt);
ft = f(xt,yt);
utm1 = U(i - 1);
vtm1 = V(i - 1);
xtm1 = f1(utm1,vtm1);
ytm1 = f2(utm1,vtm1);
ftm1 = f(xtm1,ytm1);
usymp = ut + (ut - utm1);
vsymp = vt + (vt - vtm1);
xsymp = f1(usymp,vsymp);
ysymp = f2(usymp,vsymp);
fsymp = ft + (ft - ftm1);
df = fsymp - f(xsymp,ysymp); % Is the surface changing? How much?
n = N(ut,vt); % Normal vector at point t
gamma = df * n(3); % Scalar x change f x z value of N
xtp1 = xsymp - gamma * n(1); % Gamma to modulate incre. x & y.
ytp1 = ysymp - gamma * n(2);
U(i + 1) = usymp - gamma * n(1);;
V(i + 1) = vsymp - gamma * n(2);;
end
% THE PROBLEM! f(f1(U,V),f2(U,V)) below YIELDS ALL ZEROS!!! The expected values are between 0 and 1.2.
P = [f1(U,V); f2(U,V); f(f1(U,V),f2(U,V))]; % Compiling results into a matrix.
units = 35; % Determines speed (smaller, faster)
packet = floor(size(P,2)/units);
P = P(:,1: packet * units);
for k = 1:packet:(packet * units)
hold on
plot3(P(1, k:(k+packet-1)), P(2,(k:(k+packet-1))), P(3,(k:(k+packet-1))),
'.', 'MarkerSize', 5,'color',C{s})
drawnow
end
end
The answer is to Cris Luengo's credit, who noticed that the upper-case assigned to the variable Y, used for the calculation of the height of the curve z, was indeed in the parametrization space u,v as intended, and not in the manifold x,y! I don't use Matlab/Octave other than for occasional simulations, and I was trying every other syntactical permutation I could think of without realizing that f fed directly from v (as intended). I changed now the names of the different variables to make it cleaner.
Here is the revised code:
a = 0.3;
u = 0:0.1:(3 * pi);
v = 0:0.1:5;
[U,V] = meshgrid(u,v);
x = a.* cos(U) ./ cosh(V);
y = a.* sin(U) ./ cosh(V);
z = a.* (V - tanh(V));
h = surf(x,y,z);
zlim([0, 1.2]);
set(h,'edgecolor','none')
colormap summer
hold on
f = #(x,y) a.* (y - tanh(y));
f1 = #(u,v) a.* cos(u) ./ cosh(v);
f2 = #(u,v) a.* sin(u) ./ cosh(v);
dfdu = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u+eps,v)-f1(u-eps,v))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u+eps,v)-f2(u-eps,v))/(2*eps));
dfdv = #(u,v) ((f(f1(u,v)+eps, f2(u,v)) - f(f1(u,v) - eps, f2(u,v)))/(2 * eps) .*
(f1(u,v+eps)-f1(u,v-eps))/(2*eps) +
(f(f1(u,v), f2(u,v)+eps) - f(f1(u,v), f2(u,v)-eps))/(2 * eps) .*
(f2(u,v+eps)-f2(u,v-eps))/(2*eps));
% Normal vector to the surface:
N = #(u,v) [- dfdu(u,v), - dfdv(u,v), 1]; % Normal vec to surface # any pt.
% Some colors to draw the lines:
C = {'y','r','k','m','w',[0.8 0.8 1]}; % Color scheme
for s = 1:6 % No. of lines to be plotted.
% Starting points:
u0 = [0, -pi/2, 2*pi, 4*pi/3, pi/4, pi];
v0 = [0, 0, 0, 0, 0, 0];
du0 = [0, -0.0001, 0.001, - 0.001, 0.001, -0.01];
dv0 = [0.1, 0.01, 0.001, 0.001, 0.0005, 0.01];
step_size = 0.00005; % Will determine the progression rate from pt to pt.
eta = step_size / sqrt(du0(s)^2 + dv0(s)^2); % Normalization.
eps = 0.0001; % Epsilon
max_num_iter = 180000; % Number of dots in each line.
% Semi-empty vectors to collect results:
Uc = [[u0(s), u0(s) + eta*du0(s)], zeros(1,max_num_iter - 2)];
Vc = [[v0(s), v0(s) + eta*dv0(s)], zeros(1,max_num_iter - 2)];
for i = 2:(max_num_iter - 1) % Creating the geodesic:
ut = Uc(i);
vt = Vc(i);
xt = f1(ut,vt);
yt = f2(ut,vt);
ft = f(xt,yt);
utm1 = Uc(i - 1);
vtm1 = Vc(i - 1);
xtm1 = f1(utm1,vtm1);
ytm1 = f2(utm1,vtm1);
ftm1 = f(xtm1,ytm1);
usymp = ut + (ut - utm1);
vsymp = vt + (vt - vtm1);
xsymp = f1(usymp,vsymp);
ysymp = f2(usymp,vsymp);
fsymp = ft + (ft - ftm1);
df = fsymp - f(xsymp,ysymp); % Is the surface changing? How much?
n = N(ut,vt); % Normal vector at point t
gamma = df * n(3); % Scalar x change f x z value of N
xtp1 = xsymp - gamma * n(1); % Gamma to modulate incre. x & y.
ytp1 = ysymp - gamma * n(2);
Uc(i + 1) = usymp - gamma * n(1);;
Vc(i + 1) = vsymp - gamma * n(2);;
end
x = f1(Uc,Vc);
y = f2(Uc,Vc);
P = [x; y; f(Uc,Vc)]; % Compiling results into a matrix.
units = 35; % Determines speed (smaller, faster)
packet = floor(size(P,2)/units);
P = P(:,1: packet * units);
for k = 1:packet:(packet * units)
hold on
plot3(P(1, k:(k+packet-1)), P(2,(k:(k+packet-1))), P(3,(k:(k+packet-1))),
'.', 'MarkerSize', 5,'color',C{s})
drawnow
end
end

Code Horner’s Method for Polynomial Evaluation

I am trying to code Horner’s Method for Polynomial Evaluation but for some reason its not working for me and I'm not sure where I am getting it wrong.
These are the data I have:
nodes = [-2, -1, 1]
x = 2
c (coefficients) = [-3, 3, -1]
The code I have so far is:
function y = horner(x, nodes, c)
n = length(c);
y = c(1);
for i = 2:n
y = y * ((x - nodes(i - 1)) + c(i));
end
end
I am supposed to end up with a polynomial such as (−1)·(x+2)(x+1)+3·(x+2)−3·1 and if x =2 then I am supposed to get -3. But for some reason I don't know where I am going wrong.
Edit:
So I changed my code. I think it works but I am not sure:
function y = horner(x, nodes, c)
n = length(c);
y = c(n);
for k = n-1:-1:1
y = c(k) + y * (x - nodes((n - k) + 1));
end
end
This works:
function y = horner(x, nodes, c)
n = length(c);
y = 0;
for i = 1:n % We iterate over `c`
tmp = c(i);
for j = 1:i-1 % We iterate over the relevant elements of `nodes`
tmp *= x - nodes(j); % We multiply `c(i) * (x - nodes(1)) * (x -nodes(2)) * (x- nodes(3)) * ... * (x - nodes(i -1))
end
y += tmp; % We added each product to y
end
% Here `y` is as following:
% c(1) + c(2) * (x - nodes(1)) + c(3) * (x - nodes(1)) * (x - nodes(2)) + ... + c(n) * (x - nodes(1)) * ... * (x - nodes(n - 1))
end
(I'm sorry this isn't python but I don't know python)
In the case where we didn't have nodes, horner's method works like this:
p = c[n]
for i=n-1 .. 1
p = x*p + c[i]
for example for a quadratic (with coeffs a,b,c) this is
p = x*(x*a+b)+c
Note that if your language supports fma
fma(x,y,x) = x*y+z
then horner's method can be written
p = c[n]
for i=n-1 .. 1
p = fma( x, p, c[i])
When you do have nodes, the change is simple:
p = c[n]
for i=n-1 .. 1
p = (x-nodes[i])*p + c[i]
Or, using fma
p = c[n]
for i=n-1 .. 1
p = fma( (x-nodes[i]), p, c[i])
For the quadratic above this leads to
p = (x-nodes[1]*((x-nodes[2])*a+b)+c

How to compute log of multvariate gaussian in matlab

I am trying to compute log(N(x | mu, sigma)) in MATLAB where
x is the data vector(Dimensions D x 1) , mu(Dimensions D x 1) is mean and sigma(Dimensions D x D) is covariance.
My present implementation is
function [loggaussian] = logmvnpdf(x,mu,Sigma)
[D,~] = size(x);
const = -0.5 * D * log(2*pi);
term1 = -0.5 * ((x - mu)' * (inv(Sigma) * (x - mu)));
term2 = - 0.5 * logdet(Sigma);
loggaussian = const + term1 + term2;
end
function y = logdet(A)
y = log(det(A));
end
For some cases I get an error
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND =
NaN
I know you will point out that my data is not consistent, but I need to implement the function so that I can get the best approximate instead of throwing an warning. . How do I ensure that I always get a value.
I think the warning comes from using inv(Sigma). According to the documentation, you should avoid using inv where its use can be replaced by \ (mldivide). This will give you both better speed and accuracy.
For your code, instead of inv(Sigma) * (x - mu) use Sigma \ (x - mu).
The following approach should be (a little) less sensitive to ill-conditioning of the covariance matrix:
function logpdf = logmvnpdf (x, mu, K)
n = length (x);
R = chol (K);
const = 0.5 * n * log (2 * pi);
term1 = 0.5 * sum (((R') \ (x - mu)) .^ 2);
term2 = sum (log (diag (R)));
logpdf = - (const + term1 + term2);
end
If K is singular or near-singular, you can still have warnings (or errors) when calling chol.

Given Three Points Compute Affine Transformation

I have two images and found three similar 2D points using a sift. I need to compute the affine transformation between the images. Unfortunately, I missed lecture and the information out there is a little dense for me. What would the general method be for computing this 2x3 matrix?
I have the matrix of points in a 2x3 matrix [x1 y1;x2 y2;x3 y3] but I am lost from there.
Thanks for any help.
Usually, an affine transormation of 2D points is experssed as
x' = A*x
Where x is a three-vector [x; y; 1] of original 2D location and x' is the transformed point. The affine matrix A is
A = [a11 a12 a13;
a21 a22 a23;
0 0 1]
This form is useful when x and A are known and you wish to recover x'.
However, you can express this relation in a different way.
Let
X = [xi yi 1 0 0 0;
0 0 0 xi yi 1 ]
and a is a column vector
a = [a11; a12; a13; a21; a22; a23]
Then
X*a = [xi'; yi']
Holds for all pairs of corresponding points x_i, x_i'.
This alternative form is very useful when you know the correspondence between pairs of points and you wish to recover the paramters of A.
Stacking all your points in a large matrix X (two rows for each point) you'll have 2*n-by-6 matrix X multiplyied by 6-vector of unknowns a equals a 2*n-by-1 column vector of the stacked corresponding points (denoted by x_prime):
X*a = x_prime
Solving for a:
a = X \ x_prime
Recovers the parameters of a in a least-squares sense.
Good luck and stop skipping class!
Sorry for not using Matlab, but I only worked with Python a little bit. I think this code may help you (sorry for bad codestyle -- I'm mathematician, not programmer)
import numpy as np
# input data
ins = [[1, 1], [2, 3], [3, 2]] # <- points
out = [[0, 2], [1, 2], [-2, -1]] # <- mapped to
# calculations
l = len(ins)
B = np.vstack([np.transpose(ins), np.ones(l)])
D = 1.0 / np.linalg.det(B)
entry = lambda r,d: np.linalg.det(np.delete(np.vstack([r, B]), (d+1), axis=0))
M = [[(-1)**i * D * entry(R, i) for i in range(l)] for R in np.transpose(out)]
A, t = np.hsplit(np.array(M), [l-1])
t = np.transpose(t)[0]
# output
print("Affine transformation matrix:\n", A)
print("Affine transformation translation vector:\n", t)
# unittests
print("TESTING:")
for p, P in zip(np.array(ins), np.array(out)):
image_p = np.dot(A, p) + t
result = "[OK]" if np.allclose(image_p, P) else "[ERROR]"
print(p, " mapped to: ", image_p, " ; expected: ", P, result)
This code demonstrates how to recover affine transformation as matrix and vector and tests that initial points are mapped to where they should. You can test this code with Google colab, so you don't have to install anything. Probably, you can translate it to Matlab.
Regarding theory behind this code: it is based on equation presented in "Beginner's guide to mapping simplexes affinely", matrix recovery is described in section "Recovery of canonical notation". The same authors published "Workbook on mapping simplexes affinely" that contains many practical examples of this kind.
for implementation purposes in OpenCV you can use cv2.getAffineTransform
getAffineTransform() [1/2]
Mat cv::getAffineTransform ( const Point2f src[],
const Point2f dst[]
)
Python:
cv.getAffineTransform( src, dst ) -> retval
This is for anyone who wants to do it in C. The algorithm is based on this MathStackExchange post. The author shows how to use Gauss-Jordan elimination to find the formula for the Affine transformation coefficients,
/*
*Function: Affine Solver
*Role: Finds Affine Transforming mapping (X,Y) to (X',Y')
*Input: double array[A,B,C,D,E,F],
* int array[X-Coordinates], int array[Y-Coordinates],
* int array[X'-Coordinates],int array[Y'-Coordinates]
*Output:void - Fills double array[A,B,C,D,E,F]
*/
void AffineSolver(double *AtoF, int *X, int *Y, int *XP, int *YP)
{
AtoF[0] = (double)(XP[1]*Y[0] - XP[2]*Y[0] - XP[0]*Y[1] + XP[2]*Y[1] + XP[0]*Y[2] -XP[1]*Y[2]) /
(double)(X[1]*Y[0] - X[2]*Y[0] - X[0]*Y[1] + X[2]*Y[1] + X[0]*Y[2] - X[1]*Y[2]);
AtoF[1] = (double)(XP[1]*X[0] - XP[2]*X[0] - XP[0]*X[1] + XP[2]*X[1] + XP[0]*X[2] -XP[1]*X[2]) /
(double)(-X[1]*Y[0] + X[2]*Y[0] + X[0]*Y[1] - X[2]*Y[1] - X[0]*Y[2] +X[1]*Y[2]);
AtoF[2] = (double)(YP[1]*Y[0] - YP[2]*Y[0] - YP[0]*Y[1] + YP[2]*Y[1] + YP[0]*Y[2] -YP[1]*Y[2]) /
(double)(X[1]*Y[0] - X[2]*Y[0] - X[0]*Y[1] + X[2]*Y[1] + X[0]*Y[2] - X[1]*Y[2]);
AtoF[3] = (double)(YP[1]*X[0] - YP[2]*X[0] - YP[0]*X[1] + YP[2]*X[1] + YP[0]*X[2] -YP[1]*X[2]) /
(double)(-X[1]*Y[0] + X[2]*Y[0] + X[0]*Y[1] - X[2]*Y[1] - X[0]*Y[2] +X[1]*Y[2]);
AtoF[4] = (double)(XP[2]*X[1]*Y[0] - XP[1]*X[2]*Y[0]-XP[2]*X[0]*Y[1] + XP[0]*X[2]*Y[1]+
XP[1]*X[0]*Y[2] - XP[0]*X[1]*Y[2]) /
(double)(X[1]*Y[0] - X[2]*Y[0] - X[0]*Y[1] + X[2]*Y[1] + X[0]*Y[2] - X[1]*Y[2]);
AtoF[5] = (double)(YP[2]*X[1]*Y[0] - YP[1]*X[2]*Y[0]-YP[2]*X[0]*Y[1] + YP[0]*X[2]*Y[1] + YP[1]*X[0]*Y[2] - YP[0]*X[1]*Y[2]) /
(double)(X[1]*Y[0] - X[2]*Y[0] - X[0]*Y[1] + X[2]*Y[1] + X[0]*Y[2] - X[1]*Y[2]);
}
/*
*Function: PrintMatrix
*Role: Prints 2*3 matrix as //a b e
//c d f
*Input: double array[ABCDEF]
*Output: voids
*/
void PrintMatrix(double *AtoF)
{
printf("a = %f ",AtoF[0]);
printf("b = %f ",AtoF[1]);
printf("e = %f\n",AtoF[4]);
printf("c = %f ",AtoF[2]);
printf("d = %f ",AtoF[3]);
printf("f = %f ",AtoF[5]);
}
int main()
{
/*Test*/
/*Find transform mapping (0,10),(0,0),(10,0) to (0,5)(0,0)(5,0)*/
/*Expected Output*/
//a = 0.500000 b = 0.000000 e = -0.000000
//c = -0.000000 d = 0.500000 f = -0.000000
/*Test*/
double *AtoF = calloc(6, sizeof(double));
int X[] = { 0, 0,10};
int Y[] = {10, 0, 0};
int XP[] = { 0, 0, 5};
int YP[] = { 5, 0, 0};
AffineSolver(AtoF,X,Y,XP,YP);
PrintMatrix(AtoF);
free(AtoF);
return 0;
}

Octave backpropagation implementation issues

I wrote a code to implement steepest descent backpropagation with which I am having issues. I am using the Machine CPU dataset and have scaled the inputs and outputs into range [0 1]
The codes in matlab/octave is as follows:
steepest descent backpropagation
%SGD = Steepest Gradient Decent
function weights = nnSGDTrain (X, y, nhid_units, gamma, max_epoch, X_test, y_test)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W2 = rand (nhid_units + 1, oput_units);
W1 = rand (iput_units + 1, nhid_units);
train_rmse = zeros (1, max_epoch);
test_rmse = zeros (1, max_epoch);
for (epoch = 1:max_epoch)
delW2 = zeros (nhid_units + 1, oput_units)';
delW1 = zeros (iput_units + 1, nhid_units)';
for (i = 1:rows(X))
o1 = sigmoid ([X(i,:), 1] * W1); %1xn+1 * n+1xk = 1xk
o2 = sigmoid ([o1, 1] * W2); %1xk+1 * k+1xm = 1xm
D2 = o2 .* (1 - o2);
D1 = o1 .* (1 - o1);
e = (y_test(i,:) - o2)';
delta2 = diag (D2) * e; %mxm * mx1 = mx1
delta1 = diag (D1) * W2(1:(end-1),:) * delta2; %kxm * mx1 = kx1
delW2 = delW2 + (delta2 * [o1 1]); %mx1 * 1xk+1 = mxk+1 %already transposed
delW1 = delW1 + (delta1 * [X(i, :) 1]); %kx1 * 1xn+1 = k*n+1 %already transposed
end
delW2 = gamma .* delW2 ./ n;
delW1 = gamma .* delW1 ./ n;
W2 = W2 + delW2';
W1 = W1 + delW1';
[dummy train_rmse(epoch)] = nnPredict (X, y, nhid_units, [W1(:);W2(:)]);
[dummy test_rmse(epoch)] = nnPredict (X_test, y_test, nhid_units, [W1(:);W2(:)]);
printf ('Epoch: %d\tTrain Error: %f\tTest Error: %f\n', epoch, train_rmse(epoch), test_rmse(epoch));
fflush (stdout);
end
weights = [W1(:);W2(:)];
% plot (1:max_epoch, test_rmse, 1);
% hold on;
plot (1:max_epoch, train_rmse(1:end), 2);
% hold off;
end
predict
%Now SFNN Only
function [o1 rmse] = nnPredict (X, y, nhid_units, weights)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W1 = reshape (weights(1:((iput_units + 1) * nhid_units),1), iput_units + 1, nhid_units);
W2 = reshape (weights((((iput_units + 1) * nhid_units) + 1):end,1), nhid_units + 1, oput_units);
o1 = sigmoid ([X ones(n,1)] * W1); %nxiput_units+1 * iput_units+1xnhid_units = nxnhid_units
o2 = sigmoid ([o1 ones(n,1)] * W2); %nxnhid_units+1 * nhid_units+1xoput_units = nxoput_units
rmse = RMSE (y, o2);
end
RMSE function
function rmse = RMSE (a1, a2)
rmse = sqrt (sum (sum ((a1 - a2).^2))/rows(a1));
end
I have also trained the same dataset using the R RSNNS package mlp and the RMSE for train set (first 100 examples) are around 0.03 . But in my implementation I cannot achieve lower RMSE than 0.14 . And sometimes the errors grow for some higher learning rates, and no learning rate gets me lower RMSE than 0.14. Also a paper i referred report the RMSE in for the train set is around 0.03
I wanted to know where is the problem i the code. I have followed Raul Rojas book and confirmed that things are okay.
In backprobagation code the line
e = (y_test(i,:) - o2)';
is not correct, because the o2 is the output from the train set and i am finding the difference from one example from the test set y_test. The line should have been as below:
e = (y(i,:) - o2)';
which correctly finds the difference between the predicted output by the current model and the target output of the corresponding example.
This took me 3 days to find this one, I am fortunate enough to find this freaking bug which stopped me from going into further modifications.