Matlab neural network handwritten digit recognition, output going to indifference


Using MATLAB, I am trying to construct a neural network that can classify handwritten digits of 30x30 pixels. I use backpropagation to find the correct weights and biases. The network starts with 900 inputs, then has 2 hidden layers of 16 neurons each, and ends with 10 outputs. Each output neuron has a value between 0 and 1 that represents the belief that the input should be classified as a certain digit. The problem is that after training, the output becomes almost indifferent to the input and tends towards a uniform belief of 0.1 for each output.
My approach is to take each 30x30-pixel image and reshape it into a 900x1 vector (note that 'Images_vector' is already in vector format when it is loaded). The weights and biases are initialized with random values between 0 and 1. I am using stochastic gradient descent to update the weights and biases, with 10 randomly selected samples per batch. The equations are as described by Nielsen.
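For reference, Nielsen's backpropagation equations for the quadratic cost C = ½‖a^L − y‖² (the cost computed in the script) are listed below, with L = 4 the output layer, ⊙ the elementwise product, α the learning rate and m the batch size. Nielsen's w^l maps layer l − 1 to layer l, so it corresponds to w{l-1} in the script (and b^l to b{l}).

```latex
\begin{aligned}
\delta^{L} &= (a^{L} - y) \odot \sigma'(z^{L}) \\
\delta^{l} &= \big((w^{l+1})^{\mathsf T}\,\delta^{l+1}\big) \odot \sigma'(z^{l}) \\
\frac{\partial C}{\partial b^{l}} &= \delta^{l} \\
\frac{\partial C}{\partial w^{l}} &= \delta^{l}\,(a^{l-1})^{\mathsf T} \\
w^{l} &\leftarrow w^{l} - \frac{\alpha}{m}\sum_{x \in \text{batch}} \delta^{x,l}\,(a^{x,l-1})^{\mathsf T},
\qquad
b^{l} \leftarrow b^{l} - \frac{\alpha}{m}\sum_{x \in \text{batch}} \delta^{x,l}
\end{aligned}
```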
The script is as follows.
%% Inputs
numberofbatches = 1000;
batchsize = 10;
alpha = 1;
cutoff = 8000;
layers = [900 16 16 10];
%% Initialization
rng(0);
load('Images_vector')
Images_vector = reshape(Images_vector', 1, 10000);
labels = [ones(1,1000) 2*ones(1,1000) 3*ones(1,1000) 4*ones(1,1000) 5*ones(1,1000) 6*ones(1,1000) 7*ones(1,1000) 8*ones(1,1000) 9*ones(1,1000) 10*ones(1,1000)];
newOrder = randperm(10000);
Images_vector = Images_vector(newOrder);
labels = labels(newOrder);
images_training = Images_vector(1:cutoff);
images_testing = Images_vector(cutoff + 1:10000);
w = cell(1,length(layers) - 1);
b = cell(1,length(layers));
dCdw = cell(1,length(layers) - 1);
dCdb = cell(1,length(layers));
for i = 1:length(layers) - 1
w{i} = rand(layers(i+1),layers(i));
b{i+1} = rand(layers(i+1),1);
end
%% Learning process
batches = randi([1 cutoff - batchsize],1,numberofbatches);
cost = zeros(numberofbatches,1);
c = 1;
for batch = batches
for i = 1:length(layers) - 1
dCdw{i} = zeros(layers(i+1),layers(i));
dCdb{i+1} = zeros(layers(i+1),1);
end
for n = batch:batch+batchsize-1
y = zeros(10,1);
disp(labels(n))
y(labels(n)) = 1;
% Network
a{1} = images_training{n};
z{2} = w{1} * a{1} + b{2};
a{2} = sigmoid(0, z{2});
z{3} = w{2} * a{2} + b{3};
a{3} = sigmoid(0, z{3});
z{4} = w{3} * a{3} + b{4};
a{4} = sigmoid(0, z{4});
% Cost
cost(c) = sum((a{4} - y).^2) / 2;
% Gradient
d{4} = (a{4} - y) .* sigmoid(1, z{4});
d{3} = (w{3}' * d{4}) .* sigmoid(1, z{3});
d{2} = (w{2}' * d{3}) .* sigmoid(1, z{2});
dCdb{4} = dCdb{4} + d{4} / 10;
dCdb{3} = dCdb{3} + d{3} / 10;
dCdb{2} = dCdb{2} + d{2} / 10;
dCdw{3} = dCdw{3} + (a{3} * d{4}')' / 10;
dCdw{2} = dCdw{2} + (a{2} * d{3}')' / 10;
dCdw{1} = dCdw{1} + (a{1} * d{2}')' / 10;
c = c + 1;
end
% Adjustment
b{4} = b{4} - dCdb{4} * alpha;
b{3} = b{3} - dCdb{3} * alpha;
b{2} = b{2} - dCdb{2} * alpha;
w{3} = w{3} - dCdw{3} * alpha;
w{2} = w{2} - dCdw{2} * alpha;
w{1} = w{1} - dCdw{1} * alpha;
end
figure
plot(cost)
ylabel 'Cost'
xlabel 'Batches trained on'
With the sigmoid function being the following.
function y = sigmoid(derivative, x)
if derivative == 0
y = 1 ./ (1 + exp(-x));
else
y = sigmoid(0, x) .* (1 - sigmoid(0, x));
end
end
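A scalar Python sketch of the same two-mode helper (the MATLAB version works elementwise on vectors). It computes the logistic value once and reuses it for the derivative via σ'(x) = σ(x)(1 - σ(x)); the branch that avoids overflow in math.exp for large negative x is an addition that is not in the original.

```python
import math

def sigmoid(derivative, x):
    """Logistic function if derivative == 0, otherwise its derivative, for a scalar x."""
    # Evaluate 1 / (1 + exp(-x)) in a form that cannot overflow math.exp.
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    if derivative == 0:
        return s
    # sigma'(x) = sigma(x) * (1 - sigma(x)), reusing s instead of recomputing it
    return s * (1.0 - s)

print(sigmoid(0, 0.0), sigmoid(1, 0.0))  # 0.5 0.25
```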
Other than this, I have also tried having one of each digit in each batch, but this gave the same result. I have also tried varying the batch size, the number of batches and alpha, without success.
Does anyone know what I am doing wrong?

Correct me if I'm wrong: you have 10000 samples in your data, which you divide into 1000 batches of 10 samples. Your training process consists of running over these 10000 samples once.
This might be too little, normally your training process consists of several epochs (one epoch = iterating over every sample once). You can try going over your batches multiple times.
Also for 900 inputs your network seems small. Try it with more neurons in the second layer. Hope it helps!
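A minimal Python sketch of the epoch structure suggested above, assuming the 8000 training samples from the question (cutoff) and a batch size of 10: each epoch shuffles the sample indices and then walks through all of them in consecutive mini-batches, so no sample is skipped or repeated within an epoch. The name epoch_batches and num_epochs = 30 are illustrative choices, and indices are 0-based as usual in Python.

```python
import random

def epoch_batches(num_samples, batch_size, rng):
    """Yield lists of sample indices so that one pass visits every sample exactly once."""
    order = list(range(num_samples))
    rng.shuffle(order)  # a fresh random order for every epoch
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

rng = random.Random(0)
num_epochs = 30  # several passes over the data instead of a single one
for epoch in range(num_epochs):
    for batch in epoch_batches(8000, 10, rng):
        pass  # compute and average the gradients over `batch`, then update once
```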

Related

Learning XOR with deep neural network

I am a novice to deep learning, so I began with the simplest test case: learning XOR.
In the new edition of Digital Image Processing by G & W, the authors give an example of XOR learning by a deep net with 3 layers (input, hidden and output, each with 2 neurons) and a sigmoid as the network activation function.
For network initialization they say: "We used alpha = 1.0, an initial set of Gaussian random weights of zero mean and standard deviation of 0.02" (alpha is the gradient descent learning rate).
Training was made with 4 labeled examples:
X = [1 -1 -1 1;1 -1 1 -1];%MATLAB syntax
R = [1 1 0 0;0 0 1 1];%Labels
I have written the following MATLAB code to implement the network learning process:
function output = neuralNet4e(input,specs)
NumPat = size(input.X,2);%Number of patterns
NumLayers = length(specs.W);
for kEpoch = 1:specs.NumEpochs
% forward pass
A = cell(NumLayers,1);%Output of each neuron in each layer
derZ = cell(NumLayers,1);%Activation function derivative on each neuron dot product
A{1} = input.X;
for kLayer = 2:NumLayers
B = repmat(specs.b{kLayer},1,NumPat);
Z = specs.W{kLayer} * A{kLayer - 1} + B;
derZ{kLayer} = specs.activationFuncDerive(Z);
A{kLayer} = specs.activationFunc(Z);
end
% backprop
D = cell(NumLayers,1);
D{NumLayers} = (A{NumLayers} - input.R).* derZ{NumLayers};
for kLayer = (NumLayers-1):-1:2
D{kLayer} = (specs.W{kLayer + 1}' * D{kLayer + 1}).*derZ{kLayer};
end
%Update weights and biases
for kLayer = 2:NumLayers
specs.W{kLayer} = specs.W{kLayer} - specs.alpha * D{kLayer} * A{kLayer - 1}' ;
specs.b{kLayer} = specs.b{kLayer} - specs.alpha * sum(D{kLayer},2);
end
end
output.A = A;
end
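To experiment with the same procedure outside MATLAB, here is a minimal pure-Python sketch of the full-batch training in neuralNet4e for the 2-2-2 sigmoid network: gradients are summed over the four patterns and applied once per epoch, as in the MATLAB update loop. The names train_xor and layer and the seed parameter are assumptions for illustration. One small difference: this sketch returns the outputs recomputed after the last update, whereas the MATLAB function returns A from the last forward pass.

```python
import math
import random

def sigmoid(x):
    """Logistic function written to avoid overflow in math.exp for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def layer(W, b, a):
    """Activations sigmoid(W a + b) for one layer; W is a list of rows."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + bi) for row, bi in zip(W, b)]

def train_xor(X, R, s, alpha=1.0, num_epochs=10000, seed=0):
    """Full-batch backprop for a 2-2-2 sigmoid network.

    X and R are 2 x 4 lists (one pattern per column, as in the MATLAB code).
    Weights and biases start as Gaussian(0, s). Returns the 2 x 4 output matrix
    (rows = output neurons, columns = patterns)."""
    rng = random.Random(seed)
    W2 = [[rng.gauss(0.0, s) for _ in range(2)] for _ in range(2)]
    W3 = [[rng.gauss(0.0, s) for _ in range(2)] for _ in range(2)]
    b2 = [rng.gauss(0.0, s) for _ in range(2)]
    b3 = [rng.gauss(0.0, s) for _ in range(2)]
    patterns = [list(col) for col in zip(*X)]
    targets = [list(col) for col in zip(*R)]
    for _ in range(num_epochs):
        gW2 = [[0.0, 0.0], [0.0, 0.0]]
        gW3 = [[0.0, 0.0], [0.0, 0.0]]
        gb2 = [0.0, 0.0]
        gb3 = [0.0, 0.0]
        for x, r in zip(patterns, targets):
            a2 = layer(W2, b2, x)  # forward pass
            a3 = layer(W3, b3, a2)
            # output sensitivities: (a - r) .* sigma'(z), with sigma'(z) = a * (1 - a)
            d3 = [(a - t) * a * (1.0 - a) for a, t in zip(a3, r)]
            # hidden sensitivities: (W3' * d3) .* sigma'(z2)
            d2 = [sum(W3[k][j] * d3[k] for k in range(2)) * a2[j] * (1.0 - a2[j])
                  for j in range(2)]
            for j in range(2):  # sum gradients over all patterns
                gb3[j] += d3[j]
                gb2[j] += d2[j]
                for i in range(2):
                    gW3[j][i] += d3[j] * a2[i]
                    gW2[j][i] += d2[j] * x[i]
        for j in range(2):  # one update per epoch
            b3[j] -= alpha * gb3[j]
            b2[j] -= alpha * gb2[j]
            for i in range(2):
                W3[j][i] -= alpha * gW3[j][i]
                W2[j][i] -= alpha * gW2[j][i]
    outputs = [layer(W3, b3, layer(W2, b2, x)) for x in patterns]
    return [list(row) for row in zip(*outputs)]

X = [[1, -1, -1, 1], [1, -1, 1, -1]]
R = [[1, 1, 0, 0], [0, 0, 1, 1]]
for s in (0.02, 1.0):
    print(s, [[round(v, 3) for v in row] for row in train_xor(X, R, s)])
```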
Now, when I use their setup (i.e., weight initialization with std = 0.02)
clearvars
s = 0.02;
input.X = [1 -1 -1 1;1 -1 1 -1];
input.R = [1 1 0 0;0 0 1 1];
specs.W = {[];s * randn(2,2);s * randn(2,2)};
specs.b = {[];s * randn(2,1);s * randn(2,1)};
specs.activationFunc = @(x) 1./(1 + exp(-x));
specs.activationFuncDerive = @(x) exp(-x)./(1 + exp(-x)).^2;
specs.NumEpochs = 1e4;
specs.alpha = 1;
output = neuralNet4e(input,specs);
After 10000 epochs, I get the following final output of the net:
output.A{3} = [0.5 0.5 0.5 0.5;0.5 0.5 0.5 0.5]
but when I changed s = 0.02; to s = 1; I got output.A{3} = [0.989 0.987 0.010 0.010;0.010 0.012 0.98 0.98], as it should be.
Is it possible to get these results with s = 0.02, meaning I am doing something wrong in my code? Or is the standard deviation of 0.02 just a typo?

Based on your code, I don't see any errors. To my knowledge, the result that you got,
[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
That is a typical result of overfitting. There are many reasons for this to happen, such as too many epochs, too large learning rate, too small sample data, and others.
In your example, s=0.02 limits the values of the randomized weights and biases. Changing that to s=1 leaves the randomized values unscaled.
To make the s=0.02 one work, you can try minimizing the number of epochs or maybe lowering the alpha.
Hope this helps.

Neural Network Backpropagation Algorithm Implementation

I implemented a neural network backpropagation algorithm in MATLAB; however, it is not training correctly. The training data is a matrix X = [x1, x2] of dimension 2 x 200, and I have a target matrix T = [target1, target2] of dimension 2 x 200. The first 100 columns in T are [1; -1] for class 1, and the second 100 columns are [-1; 1] for class 2.
theta = 0.1; % criterion to stop
eta = 0.1; % step size
Nh = 10; % number of hidden nodes
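For some reason the total training error is always 1.000; it never gets close to theta, so the loop runs forever.
I used the formulas given in the code comments below: the sensitivities delta_k and delta_j (steps 11 and 12), and the mean square error as the total training error.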
The code is well documented below. I would appreciate any help.
clear;
close all;
clc;
%%('---------------------')
%%('Generating dummy data')
%%('---------------------')
d11 = [2;2]*ones(1,70)+2.*randn(2,70);
d12 = [-2;-2]*ones(1,30)+randn(2,30);
d1 = [d11,d12];
d21 = [3;-3]*ones(1,50)+randn([2,50]);
d22 = [-3;3]*ones(1,50)+randn([2,50]);
d2 = [d21,d22];
hw5_1 = d1;
hw5_2 = d2;
save hw5.mat hw5_1 hw5_2
x1 = hw5_1;
x2 = hw5_2;
% step 1: Construct training data matrix X=[x1,x2], dimension 2x200
training_data = [x1, x2];
% step 2: Construct target matrix T=[target1, target2], dimension 2x200
target1 = repmat([1; -1], 1, 100); % class 1
target2 = repmat([-1; 1], 1, 100); % class 2
T = [target1, target2];
% step 3: normalize training data
training_data = training_data - mean(training_data(:));
training_data = training_data / std(training_data(:));
% step 4: specify parameters
theta = 0.1; % criterion to stop
eta = 0.1; % step size
Nh = 10; % number of hidden nodes, actual hidden nodes should be 11 (including a bias)
Ni = 2; % dimension of input vector = number of input nodes, actual input nodes should be 3 (including a bias)
No = 2; % number of class = number of out nodes
% step 5: Initialize the weights
a = -1/sqrt(No);
b = +1/sqrt(No);
inputLayerToHiddenLayerWeight = (b-a).*rand(Ni, Nh) + a
hiddenLayerToOutputLayerWeight = (b-a).*rand(Nh, No) + a
J = inf;
p = 1;
% activation function
% f(net) = a*tanh(b*net),
% f'(net) = a*b*sech2(b*net)
a = 1.716;
b = 2/3;
while J > theta
% step 6: randomly choose one training sample vector from X,
% together with its target vector
k = randi([1, size(training_data, 2)]);
input_X = training_data(:,k);
input_T = T(:,k);
% step 7: Calculate net_j values for hidden nodes in layer 1
% hidden layer output before activation function applied
netj = inputLayerToHiddenLayerWeight' * input_X;
% step 8: Calculate hidden node output Y using activation function
% apply activation function to hidden layer neurons
Y = a*tanh(b*netj);
% step 9: Calculate net_k values for output nodes in layer 2
% output layer output before activation function applied
netk = hiddenLayerToOutputLayerWeight' * Y;
% step 10: Calculate output node output Z using the activation function
% apply activation function to the output layer neurons
Z = a*tanh(b*netk);
% step 11: Calculate sensitivity delta_k = (target - Z) * f'(Z)
% find the error between the expected_output and the neuron output
% we got using the weights
% delta_k = (expected - output) * activation(output)
delta_k = [];
for i=1:size(Z)
yi = Z(i,:);
expected_output = input_T(i,:);
delta_k = [delta_k; (expected_output - yi) ...
* a*b*(sech(b*yi)).^2];
end
% step 12: Calculate sensitivity
% delta_j = Sum_k(delta_k * hidden-to-out weights) * f'(net_j)
% error = (weight_k * error_j) * activation(output)
delta_j = [];
for j=1:size(Y)
yi = Y(j,:);
error = 0;
for k=1:size(delta_k)
error = error + delta_k(k,:)*hiddenLayerToOutputLayerWeight(j, k);
end
delta_j = [delta_j; error * (a*b*(sech(b*yi)).^2)];
end
% step 13: update weights
%2x10
inputLayerToHiddenLayerWeight = [];
for i=1:size(input_X)
xi = input_X(i,:);
wji = [];
for j=1:size(delta_j)
wji = [wji, eta * xi * delta_j(j,:)];
end
inputLayerToHiddenLayerWeight = [inputLayerToHiddenLayerWeight; wji];
end
inputLayerToHiddenLayerWeight
%10x2
hiddenLayerToOutputLayerWeight = [];
for j=1:size(Y)
yi = Y(j,:);
wjk = [];
for k=1:size(delta_k)
wjk = [wjk, eta * delta_k(k,:) * yi];
end
hiddenLayerToOutputLayerWeight = [hiddenLayerToOutputLayerWeight; wjk];
end
hiddenLayerToOutputLayerWeight
% Mean Square Error
J = 0;
for j=1:size(training_data, 2)
X = training_data(:,j);
t = T(:,j);
netj = inputLayerToHiddenLayerWeight' * X;
Y = a*tanh(b*netj);
netk = hiddenLayerToOutputLayerWeight' * Y;
Z = a*tanh(b*netk);
J = J + immse(t, Z);
end
J = J/size(training_data, 2)
p = p + 1;
if p == 4
break;
end
end
% testing neural network using the inputs
test_data = [[2; -2], [-3; -3], [-2; 5], [3; -4]];
for i=1:size(test_data, 2)
end
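For comparison with steps 7 to 13, a minimal Python sketch of one stochastic backpropagation update for this bias-free two-layer network with f(net) = a·tanh(b·net). The names forward, sgd_step, f and f_prime are hypothetical. In this sketch the derivative is evaluated at the pre-activation values, δ_k = (t_k - z_k) f'(net_k) and δ_j = f'(net_j) Σ_k w_jk δ_k, and each change η·δ·input is added to the current weight rather than replacing it. sech² is computed as 1 - tanh² to avoid overflow.

```python
import math

A, B = 1.716, 2.0 / 3.0  # f(net) = A * tanh(B * net), constants from the question

def f(net):
    return A * math.tanh(B * net)

def f_prime(net):
    # f'(net) = A * B * sech^2(B * net), using sech^2 = 1 - tanh^2
    return A * B * (1.0 - math.tanh(B * net) ** 2)

def forward(W_ih, W_ho, x):
    """W_ih is Ni x Nh and W_ho is Nh x No (lists of rows), as in the MATLAB code."""
    n_hid, n_out = len(W_ho), len(W_ho[0])
    net_j = [sum(W_ih[i][j] * x[i] for i in range(len(x))) for j in range(n_hid)]
    y = [f(n) for n in net_j]
    net_k = [sum(W_ho[j][k] * y[j] for j in range(n_hid)) for k in range(n_out)]
    z = [f(n) for n in net_k]
    return net_j, y, net_k, z

def sgd_step(W_ih, W_ho, x, t, eta):
    """One online backprop update; modifies W_ih and W_ho in place."""
    net_j, y, net_k, z = forward(W_ih, W_ho, x)
    delta_k = [(t[k] - z[k]) * f_prime(net_k[k]) for k in range(len(z))]
    # hidden sensitivities use the hidden-to-output weights before they change
    delta_j = [f_prime(net_j[j]) * sum(W_ho[j][k] * delta_k[k] for k in range(len(z)))
               for j in range(len(y))]
    for j in range(len(y)):
        for k in range(len(z)):
            W_ho[j][k] += eta * delta_k[k] * y[j]  # add to the existing weight
    for i in range(len(x)):
        for j in range(len(y)):
            W_ih[i][j] += eta * delta_j[j] * x[i]
```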

Weight decay isn't essential for Neural Network training.
What I did notice was that your feature normalization wasn't correct.
The correct formula for scaling data to the range 0 to 1 is
(x - min) / (max - min)
Note: you apply this to every element within the array (or vector). Data inputs for a NN need to be within the range [0,1]. (Technically they can be a little outside of that, ~[-3,3], but values further from 0 make training difficult.)
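A minimal Python sketch of min-max scaling for one feature (min_max_scale is a hypothetical helper); with several features it would be applied to each column separately. It uses the common form (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1, and returns zeros for a constant feature to avoid dividing by zero.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```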
Edit: I am unaware of this activation function:
a = 1.716;
b = 2/3;
% f(net) = a*tanh(b*net),
% f'(net) = a*b*sech2(b*net)
It seems like a variation on tanh.
Could you elaborate on what it is?
If your net still doesn't work, give me an update and I'll look at your code more closely.

How to use Neural network for non binary input and output

I tried to use a modified version of the NN backpropagation code by Phil Brierley
(www.philbrierley.com). When I try to solve the XOR problem it works perfectly, but when I try to solve a problem of the form output = x1^2 + x2^2 (output = sum of squares of the inputs), the results are not accurate. I have scaled the input and output between -1 and 1. I get different results every time I run the same program (I understand it's due to the random weight initialization), but the results are very different. I tried changing the learning rate, but the results still converge.
I have given the code below.
%---------------------------------------------------------
% MATLAB neural network backprop code
% by Phil Brierley
%--------------------------------------------------------
clear; clc; close all;
%user specified values
hidden_neurons = 4;
epochs = 20000;
input = [];
for i =-10:2.5:10
for j = -10:2.5:10
input = [input;i j];
end
end
output = (input(:,1).^2 + input(:,2).^2);
output1 = output;
% Maximum input and output limit and scaling factors
m1 = -10; m2 = 10;
m3 = 0; m4 = 250;
c = -1; d = 1;
%Scale input and output
for i =1:size(input,2)
I = input(:,i);
scaledI = ((d-c)*(I-m1) ./ (m2-m1)) + c;
input(:,i) = scaledI;
end
for i =1:size(output,2)
I = output(:,i);
scaledI = ((d-c)*(I-m3) ./ (m4-m3)) + c;
output(:,i) = scaledI;
end
train_inp = input;
train_out = output;
%read how many patterns and add bias
patterns = size(train_inp,1);
train_inp = [train_inp ones(patterns,1)];
%read how many inputs and initialize learning rate
inputs = size(train_inp,2);
hlr = 0.1;
%set initial random weights
weight_input_hidden = (randn(inputs,hidden_neurons) - 0.5)/10;
weight_hidden_output = (randn(1,hidden_neurons) - 0.5)/10;
%Training
err = zeros(1,epochs);
for iter = 1:epochs
alr = hlr;
blr = alr / 10;
%loop through the patterns, selecting randomly
for j = 1:patterns
%select a random pattern
patnum = round((rand * patterns) + 0.5);
if patnum > patterns
patnum = patterns;
elseif patnum < 1
patnum = 1;
end
%set the current pattern
this_pat = train_inp(patnum,:);
act = train_out(patnum,1);
%calculate the current error for this pattern
hval = (tanh(this_pat*weight_input_hidden))';
pred = hval'*weight_hidden_output';
error = pred - act;
% adjust weight hidden - output
delta_HO = error.*blr .*hval;
weight_hidden_output = weight_hidden_output - delta_HO';
% adjust the weights input - hidden
delta_IH= alr.*error.*weight_hidden_output'.*(1-(hval.^2))*this_pat;
weight_input_hidden = weight_input_hidden - delta_IH';
end
% -- another epoch finished
%compute overall network error at end of each epoch
pred = weight_hidden_output*tanh(train_inp*weight_input_hidden)';
error = pred' - train_out;
err(iter) = ((sum(error.^2))^0.5);
%stop if error is small
if err(iter) < 0.001
fprintf('converged at epoch: %d\n',iter);
break
end
end
%Output after training
pred = weight_hidden_output*tanh(train_inp*weight_input_hidden)';
Y = m3 + (m4-m3)*(pred-c)./(d-c);
% Testing for a new set of input
input_test = [6 -3.1; 0.5 1; -2 3; 3 -2; -4 5; 0.5 4; 6 1.5];
output_test = (input_test(:,1).^2 + input_test(:,2).^2);
input1 = input_test;
%Scale input
for i =1:size(input1,2)
I = input1(:,i);
scaledI = ((d-c)*(I-m1) ./ (m2-m1)) + c;
input1(:,i) = scaledI;
end
%Predict output
train_inp1 = input1;
patterns = size(train_inp1,1);
bias = ones(patterns,1);
train_inp1 = [train_inp1 bias];
pred1 = weight_hidden_output*tanh(train_inp1*weight_input_hidden)';
%Rescale
Y1 = m3 + (m4-m3)*(pred1-c)./(d-c);
analy_numer = [output_test Y1']
plot(err)
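The linear scaling used above (inputs from [m1, m2] = [-10, 10] and outputs from [m3, m4] = [0, 250] to [c, d] = [-1, 1]) and its inverse can be sketched in Python as follows; scale and unscale are hypothetical names. Note that in this code the output unit is linear (pred = hval'*weight_hidden_output'), so nothing confines a prediction to [c, d]; unscaling a prediction below c = -1 gives a value below m3 = 0, which is how a negative result can appear.

```python
def scale(x, lo, hi, c=-1.0, d=1.0):
    """Map x from [lo, hi] to [c, d], same formula as the MATLAB scaling loops."""
    return (d - c) * (x - lo) / (hi - lo) + c

def unscale(y, lo, hi, c=-1.0, d=1.0):
    """Inverse mapping from [c, d] back to [lo, hi], as used for Y and Y1."""
    return lo + (hi - lo) * (y - c) / (d - c)

print(scale(125.0, 0.0, 250.0))    # 0.0
print(unscale(0.0, 0.0, 250.0))    # 125.0
```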
This is the sample output I get for the problem after 20000 epochs (first column: analytical value, second column: network prediction):
analy_numer =
45.6100 46.3174
1.2500 -2.9457
13.0000 11.9958
13.0000 9.7097
41.0000 44.9447
16.2500 17.1100
38.2500 43.9815
If I run it once more, I get different results. As can be observed, for small input values I get totally wrong answers (a negative answer is not possible), and for other values the accuracy is still poor.
Can someone tell me what I am doing wrong and how to correct it?
thanks
raman

Continuous RBM: Poor performance only for negative valued input data?

I tried to port this Python implementation of a continuous RBM to MATLAB:
http://imonad.com/rbm/restricted-boltzmann-machine/
I generated 2-dimensional training data in the shape of a (noisy) circle and trained the RBM with 2 visible and 8 hidden units. To test the implementation, I fed uniformly distributed random data to the RBM and plotted the reconstructed data (the same procedure as used in the link above).
Now the confusing part: with training data in the range (0,1)x(0,1) I get very satisfying results; however, with training data in the range (-0.5,0.5)x(-0.5,0.5) or (-1,0)x(-1,0), the RBM reconstructs only data in the very top right of the circle. I don't understand what causes this. Is it just a bug in my implementation that I don't see?
In my plots, the blue dots are the training data and the red dots are the reconstructions.
Here is my implementation of the RBM:
Training:
maxepoch = 300;
ksteps = 10;
sigma = 0.2; % cd standard deviation
learnW = 0.5; % learning rate W
learnA = 0.5; % learning rate A
nVis = 2; % number of visible units
nHid = 8; % number of hidden units
nDat = size(dat, 1);% number of training data points
cost = 0.00001; % cost
moment = 0.9; % momentum
W = randn(nVis+1, nHid+1) / 10; % weights
dW = randn(nVis+1, nHid+1) / 1000; % change of weights
sVis = zeros(1, nVis+1); % state of visible neurons
sVis(1, end) = 1.0; % bias
sVis0 = zeros(1, nVis+1); % initial state of visible neurons
sVis0(1, end) = 1.0; % bias
sHid = zeros(1, nHid+1); % state of hidden neurons
sHid(1, end) = 1.0; % bias
aVis = 0.1*ones(1, nVis+1);% A visible
aHid = ones(1, nHid+1); % A hidden
err = zeros(1, maxepoch);
e = zeros(1, maxepoch);
for epoch = 1:maxepoch
wPos = zeros(nVis+1, nHid+1);
wNeg = zeros(nVis+1, nHid+1);
aPos = zeros(1, nHid+1);
aNeg = zeros(1, nHid+1);
for point = 1:nDat
sVis(1:nVis) = dat(point, :);
sVis0(1:nVis) = sVis(1:nVis); % initial sVis
% positive phase
activHid;
wPos = wPos + sVis' * sHid;
aPos = aPos + sHid .* sHid;
% negative phase
activVis;
activHid;
for k = 1:ksteps
activVis;
activHid;
end
tmp = sVis' * sHid;
wNeg = wNeg + tmp;
aNeg = aNeg + sHid .* sHid;
delta = sVis0(1:nVis) - sVis(1:nVis);
err(epoch) = err(epoch) + sum(delta .* delta);
e(epoch) = e(epoch) - sum(sum(W' * tmp));
end
dW = dW*moment + learnW * ((wPos - wNeg) / numel(dat)) - cost * W;
W = W + dW;
aHid = aHid + learnA * (aPos - aNeg) / (numel(dat) * (aHid .* aHid));
% error
err(epoch) = err(epoch) / (nVis * numel(dat));
e(epoch) = e(epoch) / numel(dat);
disp(['epoch: ' num2str(epoch) ' err: ' num2str(err(epoch)) ...
' ksteps: ' num2str(ksteps)]);
end
save(['rbm_' filename '.mat'], 'W', 'err', 'aVis', 'aHid');
activHid.m:
sHid = (sVis * W) + randn(1, nHid+1);
sHid = sigFun(aHid .* sHid, datRange);
sHid(end) = 1.; % bias
activVis.m:
sVis = (W * sHid')' + randn(1, nVis+1);
sVis = sigFun(aVis .* sVis, datRange);
sVis(end) = 1.; % bias
sigFun.m:
function [sig] = sigFun(X, datRange)
a = ones(size(X)) * datRange(1);
b = ones(size(X)) * (datRange(2) - datRange(1));
c = ones(size(X)) + exp(-X);
sig = a + (b ./ c);
end
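sigFun is a logistic curve stretched from (0, 1) to the interval given by datRange, so every unit's state lies between datRange(1) and datRange(2). A minimal scalar Python sketch (sig_fun is a hypothetical name; the overflow-safe branch is an addition not in the original):

```python
import math

def sig_fun(x, dat_range):
    """Logistic function rescaled to the interval (dat_range[0], dat_range[1])."""
    lo, hi = dat_range
    # logistic value in (0, 1), computed without overflowing math.exp
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    return lo + (hi - lo) * s

print(sig_fun(0.0, (-1.0, 0.0)))  # -0.5
```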
Reconstruction:
nSamples = 2000;
ksteps = 10;
nVis = 2;
nHid = 8;
sVis = zeros(1, nVis+1); % state of visible neurons
sVis(1, end) = 1.0; % bias
sHid = zeros(1, nHid+1); % state of hidden neurons
sHid(1, end) = 1.0; % bias
input = rand(nSamples, 2);
output = zeros(nSamples, 2);
for sample = 1:nSamples
sVis(1:nVis) = input(sample, :);
for k = 1:ksteps
activHid;
activVis;
end
output(sample, :) = sVis(1:nVis);
end

RBMs were originally designed to work only with binary data, but they also work with data between 0 and 1; this is part of the algorithm.

Because the input is in the range [0 1] for both x and y, the reconstructions stay in that area. Changing the input to input = (rand(nSamples, 2)*2) -1; samples it from the range [-1 1] instead, so the red dots will be more spread out around the circle.
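In Python, the same rescaling of uniform samples from [0, 1) to [-1, 1) can be sketched as follows; the seed and the name test_input are assumptions for illustration, and n_samples matches nSamples = 2000 above.

```python
import random

rng = random.Random(0)
n_samples = 2000
# MATLAB: input = (rand(nSamples, 2)*2) -1;  maps uniform [0, 1) to uniform [-1, 1)
test_input = [[rng.random() * 2 - 1 for _ in range(2)] for _ in range(n_samples)]
```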

Octave backpropagation implementation issues

I wrote code to implement steepest-descent backpropagation, and I am having issues with it. I am using the Machine CPU dataset and have scaled the inputs and outputs into the range [0 1].
The code in MATLAB/Octave is as follows:
steepest descent backpropagation
%SGD = Steepest Gradient Descent
function weights = nnSGDTrain (X, y, nhid_units, gamma, max_epoch, X_test, y_test)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W2 = rand (nhid_units + 1, oput_units);
W1 = rand (iput_units + 1, nhid_units);
train_rmse = zeros (1, max_epoch);
test_rmse = zeros (1, max_epoch);
for (epoch = 1:max_epoch)
delW2 = zeros (nhid_units + 1, oput_units)';
delW1 = zeros (iput_units + 1, nhid_units)';
for (i = 1:rows(X))
o1 = sigmoid ([X(i,:), 1] * W1); %1xn+1 * n+1xk = 1xk
o2 = sigmoid ([o1, 1] * W2); %1xk+1 * k+1xm = 1xm
D2 = o2 .* (1 - o2);
D1 = o1 .* (1 - o1);
e = (y_test(i,:) - o2)';
delta2 = diag (D2) * e; %mxm * mx1 = mx1
delta1 = diag (D1) * W2(1:(end-1),:) * delta2; %kxm * mx1 = kx1
delW2 = delW2 + (delta2 * [o1 1]); %mx1 * 1xk+1 = mxk+1 %already transposed
delW1 = delW1 + (delta1 * [X(i, :) 1]); %kx1 * 1xn+1 = k*n+1 %already transposed
end
delW2 = gamma .* delW2 ./ n;
delW1 = gamma .* delW1 ./ n;
W2 = W2 + delW2';
W1 = W1 + delW1';
[dummy train_rmse(epoch)] = nnPredict (X, y, nhid_units, [W1(:);W2(:)]);
[dummy test_rmse(epoch)] = nnPredict (X_test, y_test, nhid_units, [W1(:);W2(:)]);
printf ('Epoch: %d\tTrain Error: %f\tTest Error: %f\n', epoch, train_rmse(epoch), test_rmse(epoch));
fflush (stdout);
end
weights = [W1(:);W2(:)];
% plot (1:max_epoch, test_rmse, 1);
% hold on;
plot (1:max_epoch, train_rmse(1:end), 2);
% hold off;
end
predict
%Now SFNN Only
function [o1 rmse] = nnPredict (X, y, nhid_units, weights)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W1 = reshape (weights(1:((iput_units + 1) * nhid_units),1), iput_units + 1, nhid_units);
W2 = reshape (weights((((iput_units + 1) * nhid_units) + 1):end,1), nhid_units + 1, oput_units);
o1 = sigmoid ([X ones(n,1)] * W1); %nxiput_units+1 * iput_units+1xnhid_units = nxnhid_units
o2 = sigmoid ([o1 ones(n,1)] * W2); %nxnhid_units+1 * nhid_units+1xoput_units = nxoput_units
rmse = RMSE (y, o2);
end
RMSE function
function rmse = RMSE (a1, a2)
rmse = sqrt (sum (sum ((a1 - a2).^2))/rows(a1));
end
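A minimal Python sketch of the RMSE helper above for lists of rows (rmse is a hypothetical name). Like the Octave version, it divides the summed squared error by the number of rows only, so with several output columns the result is not divided by the number of outputs; with a single output column it reduces to the usual root-mean-square error.

```python
import math

def rmse(a1, a2):
    """sqrt(sum of squared differences over all entries / number of rows)."""
    total = sum((x - y) ** 2
                for row1, row2 in zip(a1, a2)
                for x, y in zip(row1, row2))
    return math.sqrt(total / len(a1))

print(rmse([[0.0], [1.0]], [[0.0], [0.0]]))  # sqrt(0.5)
```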
I have also trained the same dataset using the mlp function of the R RSNNS package, and the RMSE for the training set (first 100 examples) is around 0.03. But with my implementation I cannot achieve an RMSE lower than 0.14. Sometimes the error grows for higher learning rates, and no learning rate gets me an RMSE lower than 0.14. Also, a paper I referred to reports an RMSE of around 0.03 for the training set.
I want to know where the problem in the code is. I have followed Raul Rojas's book and confirmed that things are okay.

In the backpropagation code, the line
e = (y_test(i,:) - o2)';
is not correct, because o2 is the network output for training example i, while I was taking the difference with the target of an example from the test set, y_test. The line should have been:
e = (y(i,:) - o2)';
which correctly finds the difference between the predicted output by the current model and the target output of the corresponding example.
This bug took me 3 days to find; I was fortunate enough to find this freaking bug before going into further modifications.
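A minimal pure-Python sketch of one epoch of the batch steepest-descent update from nnSGDTrain with the corrected error term, i.e. using the training target y(i,:) rather than y_test(i,:). forward and sgd_epoch are hypothetical names; weights are lists of rows laid out like W1 and W2 above, with the bias weights in the last row.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, W2):
    """W1: (n_in+1) x n_hid, W2: (n_hid+1) x n_out; the last row of each is the bias."""
    xb = list(x) + [1.0]
    o1 = [sigmoid(sum(xb[i] * W1[i][j] for i in range(len(xb)))) for j in range(len(W1[0]))]
    o1b = o1 + [1.0]
    o2 = [sigmoid(sum(o1b[j] * W2[j][k] for j in range(len(o1b)))) for k in range(len(W2[0]))]
    return xb, o1, o1b, o2

def sgd_epoch(X, Y, W1, W2, gamma):
    """One epoch of batch steepest descent; modifies W1 and W2 in place."""
    n = len(X)
    dW1 = [[0.0] * len(W1[0]) for _ in W1]
    dW2 = [[0.0] * len(W2[0]) for _ in W2]
    for x, y in zip(X, Y):
        xb, o1, o1b, o2 = forward(x, W1, W2)
        e = [y[k] - o2[k] for k in range(len(o2))]  # e = (y(i,:) - o2)', training target
        delta2 = [o2[k] * (1 - o2[k]) * e[k] for k in range(len(o2))]
        # bias row of W2 excluded, like W2(1:(end-1),:)
        delta1 = [o1[j] * (1 - o1[j]) * sum(W2[j][k] * delta2[k] for k in range(len(o2)))
                  for j in range(len(o1))]
        for j in range(len(o1b)):
            for k in range(len(o2)):
                dW2[j][k] += delta2[k] * o1b[j]
        for i in range(len(xb)):
            for j in range(len(o1)):
                dW1[i][j] += delta1[j] * xb[i]
    for W, dW in ((W1, dW1), (W2, dW2)):  # W = W + gamma * dW / n
        for r in range(len(W)):
            for c in range(len(W[0])):
                W[r][c] += gamma * dW[r][c] / n
```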