I am attempting to write a Multi-Layer Perceptron Network inside MATLAB to help me better understand the calculus required for backpropagation.
The aim is to provide the network with XOR data (where upper-right and lower-left quadrant data is class 1 and the remaining quadrants are class 0), train the network on this data, and then test it on new data.
My problem is that my loss curve looks very strange:
It appears to bounce between very low and very high error, converging in the middle to a fairly poor error.
I was wondering if someone could check that I have correctly implemented the chain rule in MATLAB syntax.
The MLP network is structured as follows: Input-layer has 2 neurons, 1 hidden-layer with 2 neurons, and 1 output neuron.
Here is the MATLAB code:
%Create XOR Dataset
x1pos = rand(500,1);
x1neg = -rand(500,1);
x1 = [x1pos; x1neg];
p = randperm(length(x1));
x1 = x1(p);
x2pos = rand(500,1);
x2neg = -rand(500,1);
x2 = [x2pos; x2neg];
p = randperm(length(x2));
x2 = x2(p);
Data = [x1 x2];
TrainingData = Data(1:800,:);
TestData = Data(801:length(Data),:);
T = gt((Data(:,1).*Data(:,2)),0); %Create class label for data and assign to matrix T
%Neural Net
%Training
W1 = rand(2,2); %Initialize random weights
W2 = rand(1,2); %Initialize random weights
B1 = rand(2,1); %Initialize random biases
B2 = rand(1,1); %Initialize random biases
n = 0.05; %Set Learning Rate
for i = 1:800
%Fwd Pass
x1 = Data(i,1);
x2 = Data(i,2);
X = [x1; x2];
A1 = W1*X + B1;
H1 = sigmoid(A1);
A2 = W2*H1 + B2;
Y = sigmoid(A2);
%Loss
Loss = (Y-T(i))*(Y-T(i));
scatter(i, Loss)
hold on;
%Backpropagation
dEdY = 2*(Y-T(i)); %The partial derivative of the loss with respect to the output
dYdA2 = Y*(1-Y); %The partial derivative of the output with respect to the output pre-activation A2
dA2dH1 = W2.'; %The partial derivative of the output pre-activation A2 with respect to the hidden layer output H1
dH1dA1 = H1.*(1-H1); %The partial derivative of the hidden layer output H1 with respect to the hidden pre-activation A1
%Chain Rule
dEdW2 = dEdY.*dYdA2.*W2.';
dEdW1 = dEdY.*dYdA2.*dA2dH1.*dH1dA1.*W1.';
dEdB2 = dEdY.*dYdA2;
dEdB1 = dEdY.*dYdA2.*dA2dH1.*dH1dA1;
%Update Weights
W2 = (W2.' - n.*dEdW2).';
W1 = (W1.' - n.*dEdW1).';
%Update Biases
B2 = B2 - n.*dEdB2;
B1 = B1 - n.*dEdB1;
%Next training loop
end
%Testing
for i = 801:1000
x1 = Data(i,1);
x2 = Data(i,2);
X = [x1; x2];
A1 = W1*X + B1;
H1 = sigmoid(A1);
A2 = W2*H1 + B2;
Y = sigmoid(A2);
end
function o = sigmoid(input)
%Elementwise logistic sigmoid
o = 1./(1 + exp(-input));
end
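For reference, here is a sketch of the chain-rule gradients for this 2-2-1 sigmoid network with squared-error loss, written with the same variable names as the code above (a reference derivation, not the original poster's code):
%Chain-rule gradients for A1 = W1*X + B1, H1 = sigmoid(A1),
%A2 = W2*H1 + B2, Y = sigmoid(A2), E = (Y - T(i))^2
delta2 = 2*(Y - T(i)) * Y*(1 - Y);        %dE/dA2, scalar
delta1 = (W2.' * delta2) .* H1.*(1 - H1); %dE/dA1, 2-by-1
dEdW2 = delta2 * H1.';   %1-by-2, same shape as W2
dEdB2 = delta2;          %scalar
dEdW1 = delta1 * X.';    %2-by-2, same shape as W1
dEdB1 = delta1;          %2-by-1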
Related
I have been working on a FastICA algorithm implementation in MATLAB. Currently the code does not separate the signals as well as I'd like. I was wondering if anyone here could give me some advice on what I could do to fix this problem?
disp('*****Importing Signals*****');
s = [1,30000];
[m1,Fs1] = audioread('OSR_us_000_0034_8k.wav', s);
[f1,Fs2] = audioread('OSR_us_000_0017_8k.wav', s);
ss = size(f1,1);
n = 2;
disp('*****Mixing Signals*****');
A = randn(n,n); %developing mixing matrix
x = A*[m1';f1']; %A*x
m_x = sum(x, n)/ss; %mean of x
xx = x - repmat(m_x, 1, ss); %centering the matrix
c = cov(x');
sq = inv(sqrtm(c)); %whitening the data
x = c*xx;
D = diff(tanh(x)); %setting up newtons method
SD = diff(D);
disp('*****Generating Weighted Matrix*****');
w = randn(n,1); %Random weight vector
w = w/norm(w,2); %unit vector
w0 = randn(n,1);
w0 = w0/norm(w0,2); %unit vector
disp('*****Unmixing Signals*****');
while abs(abs(w0'*w)-1) > size(w,1)
w0 = w;
w = x*D(w'*x) - sum(SD'*(w'*x))*w; %perform ICA
w = w/norm(w, 2);
end
disp('*****Output After ICA*****');
sound(w'*x); % Supposed to be one of the original signals
subplot(4,1,1);plot(m1); title('Original Male Voice');
subplot(4,1,2);plot(f1); title('Original Female Voice');
subplot(4,1,4);plot(w'*x); title('Post ICA: Estimated Signal');
%figure;
%plot(z); title('Random Mixed Signal');
%figure;
%plot(100*(w'*x)); title('Post ICA: Estimated Signal');
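For reference, a standard whitening step multiplies the centered data by the inverse square root of its covariance. The following is a minimal sketch of that preprocessing, assuming xx is the centered 2-by-N mixture matrix from above (not the poster's exact code):
c  = cov(xx');       %2-by-2 sample covariance of the centered data
sq = inv(sqrtm(c));  %inverse matrix square root
xw = sq * xx;        %whitened data: cov(xw') is approximately the identity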
Your covariance matrix c is 2-by-2; you cannot work with that. You have to mix your signals multiple times with random numbers to get anywhere, because you need some signal (m1) common to several channels. I was unable to follow your FastICA code all the way through, but here is a PCA example:
url = {'https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0034_8k.wav';...
'https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0017_8k.wav'};
%fs = 8000;
m1 = webread(url{1});
m1 = m1(1:30000);
f1 = webread(url{2});
f1 = f1(1:30000);
ss = size(f1,1);
n = 2;
disp('*****Mixing Signals*****');
A = randn(50,n); %developing mixing matrix
x = A*[m1';f1']; %A*x
[www,comp] = pca(x');
sound(comp(:,1)',8000)
I implemented a neural network backpropagation algorithm in MATLAB; however, it is not training correctly. The training data is a matrix X = [x1, x2] of dimension 2 x 200, and I have a target matrix T = [target1, target2] of dimension 2 x 200. The first 100 columns of T are [1; -1] for class 1, and the second 100 columns are [-1; 1] for class 2.
theta = 0.1; % criterion to stop
eta = 0.1; % step size
Nh = 10; % number of hidden nodes
For some reason the total training error is always 1.000; it never gets close to theta, so the loop runs forever.
I used the standard backpropagation formulas (the sensitivities delta_k and delta_j are spelled out in the code comments below). The total training error is the mean squared error over all N training samples:
J = (1/N) * sum_k ||t_k - Z_k||^2
The code is well documented below. I would appreciate any help.
clear;
close all;
clc;
%%('---------------------')
%%('Generating dummy data')
%%('---------------------')
d11 = [2;2]*ones(1,70)+2.*randn(2,70);
d12 = [-2;-2]*ones(1,30)+randn(2,30);
d1 = [d11,d12];
d21 = [3;-3]*ones(1,50)+randn([2,50]);
d22 = [-3;3]*ones(1,50)+randn([2,50]);
d2 = [d21,d22];
hw5_1 = d1;
hw5_2 = d2;
save hw5.mat hw5_1 hw5_2
x1 = hw5_1;
x2 = hw5_2;
% step 1: Construct training data matrix X=[x1,x2], dimension 2x200
training_data = [x1, x2];
% step 2: Construct target matrix T=[target1, target2], dimension 2x200
target1 = repmat([1; -1], 1, 100); % class 1
target2 = repmat([-1; 1], 1, 100); % class 2
T = [target1, target2];
% step 3: normalize training data
training_data = training_data - mean(training_data(:));
training_data = training_data / std(training_data(:));
% step 4: specify parameters
theta = 0.1; % criterion to stop
eta = 0.1; % step size
Nh = 10; % number of hidden nodes; the actual number of hidden nodes should be 11 (including a bias)
Ni = 2; % dimension of input vector = number of input nodes; the actual number of input nodes should be 3 (including a bias)
No = 2; % number of class = number of out nodes
% step 5: Initialize the weights
a = -1/sqrt(No);
b = +1/sqrt(No);
inputLayerToHiddenLayerWeight = (b-a).*rand(Ni, Nh) + a
hiddenLayerToOutputLayerWeight = (b-a).*rand(Nh, No) + a
J = inf;
p = 1;
% activation function
% f(net) = a*tanh(b*net),
% f'(net) = a*b*sech2(b*net)
a = 1.716;
b = 2/3;
while J > theta
% step 6: randomly choose one training sample vector from X,
% together with its target vector
k = randi([1, size(training_data, 2)]);
input_X = training_data(:,k);
input_T = T(:,k);
% step 7: Calculate net_j values for hidden nodes in layer 1
% hidden layer output before activation function applied
netj = inputLayerToHiddenLayerWeight' * input_X;
% step 8: Calculate hidden node output Y using activation function
% apply activation function to hidden layer neurons
Y = a*tanh(b*netj);
% step 9: Calculate net_k values for output nodes in layer 2
% output layer output before the activation function is applied
netk = hiddenLayerToOutputLayerWeight' * Y;
% step 10: Calculate output node output Z using the activation function
% apply activation function to the output layer neurons
Z = a*tanh(b*netk);
% step 11: Calculate sensitivity delta_k = (target - Z) * f'(Z)
% find the error between the expected_output and the neuron output
% we got using the weights
% delta_k = (expected - output) * activation(output)
delta_k = [];
for i=1:size(Z)
yi = Z(i,:);
expected_output = input_T(i,:);
delta_k = [delta_k; (expected_output - yi) ...
* a*b*(sech(b*yi)).^2];
end
% step 12: Calculate sensitivity
% delta_j = Sum_k(delta_k * hidden-to-out weights) * f'(net_j)
% error = (weight_k * error_j) * activation(output)
delta_j = [];
for j=1:size(Y)
yi = Y(j,:);
error = 0;
for k=1:size(delta_k)
error = error + delta_k(k,:)*hiddenLayerToOutputLayerWeight(j, k);
end
delta_j = [delta_j; error * (a*b*(sech(b*yi)).^2)];
end
% step 13: update weights
%2x10
inputLayerToHiddenLayerWeight = [];
for i=1:size(input_X)
xi = input_X(i,:);
wji = [];
for j=1:size(delta_j)
wji = [wji, eta * xi * delta_j(j,:)];
end
inputLayerToHiddenLayerWeight = [inputLayerToHiddenLayerWeight; wji];
end
inputLayerToHiddenLayerWeight
%10x2
hiddenLayerToOutputLayerWeight = [];
for j=1:size(Y)
yi = Y(j,:);
wjk = [];
for k=1:size(delta_k)
wjk = [wjk, eta * delta_k(k,:) * yi];
end
hiddenLayerToOutputLayerWeight = [hiddenLayerToOutputLayerWeight; wjk];
end
hiddenLayerToOutputLayerWeight
% Mean Square Error
J = 0;
for j=1:size(training_data, 2)
X = training_data(:,j);
t = T(:,j);
netj = inputLayerToHiddenLayerWeight' * X;
Y = a*tanh(b*netj);
netk = hiddenLayerToOutputLayerWeight' * Y;
Z = a*tanh(b*netk);
J = J + immse(t, Z);
end
J = J/size(training_data, 2)
p = p + 1;
if p == 4
break;
end
end
% testing neural network using the inputs
test_data = [[2; -2], [-3; -3], [-2; 5], [3; -4]];
for i=1:size(test_data, 2)
end
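The testing loop above is left empty. A forward pass mirroring the training code might look like the following sketch (it reuses the trained weights and the a, b constants from above; predicted_class is an illustrative name, not from the original post):
for i = 1:size(test_data, 2)
    X = test_data(:,i);
    netj = inputLayerToHiddenLayerWeight' * X;   % hidden pre-activations
    Y = a*tanh(b*netj);                          % hidden layer outputs
    netk = hiddenLayerToOutputLayerWeight' * Y;  % output pre-activations
    Z = a*tanh(b*netk);                          % network outputs
    [~, predicted_class(i)] = max(Z);            % 1 -> class 1, 2 -> class 2
end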
Weight decay isn't essential for Neural Network training.
What I did notice was that your feature normalization wasn't correct.
The correct algorithm for scaling data to the range 0 to 1 is
(x - min) / (max - min)
Note: you apply this to every element within the array (or vector). Data inputs for a NN need to be within the range [0,1]. (Technically they can be a little outside of that, roughly [-3,3], but values further from 0 make training difficult.)
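For example, a minimal MATLAB sketch of per-feature min-max scaling (assuming training_data holds one feature per row, as in the question; implicit expansion requires R2016b or later, otherwise use bsxfun):
mn = min(training_data, [], 2);                            % per-feature minimum
mx = max(training_data, [], 2);                            % per-feature maximum
training_data_scaled = (training_data - mn) ./ (mx - mn);  % each feature now in [0,1]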
edit*
I am unaware of this activation function
a = 1.716;
b = 2/3;
% f(net) = a*tanh(b*net),
% f'(net) = a*b*sech2(b*net)
It seems like a variation of tanh.
Could you elaborate on what it is?
If your net still doesn't work, give me an update and I'll look at your code more closely.
I am trying to approximate a function (the right-hand side of a differential equation) of the form ddx = F(x,dx,u) (where x, dx, u are scalars and u is constant) with an RBF neural network. I have the function F as a black box (I can feed it initial x, dx and u and obtain x and dx over a timespan I want), and during training (using sigma-modification) I get the following response when plotting the real dx vs. the approximated dx.
Then I save the parameters of the NN (the centers and the stds of the gaussians, and the final weights) and perform a simulation using the same initial x,dx and u as before and keeping, of course, the weights stable this time. But I get the following plot.
Is that logical? Am I missing something?
The training code is as follows:
%load the results I got from the real function
load sim_data t p pd dp %p is x,dp is dx and pd is u
real_states = [p,dp];
%down and upper limits of the variables
p_dl = 0;
p_ul = 2;
v_dl = -1;
v_ul = 4;
pd_dl = 0;%pd is constant each time,but the function should work for different pds
pd_ul = 2;
%number of gaussians
nc = 15;
x = p_dl:(p_ul-p_dl)/(nc-1):p_ul;
dx = v_dl:(v_ul-v_dl)/(nc-1):v_ul;
pdx = pd_dl:(pd_ul-pd_dl)/(nc-1):pd_ul;
%centers of gaussians
Cx = combvec(x,dx,pdx);
%stds of the gaussians
B = ones(1,3)./[2.5*(p_ul-p_dl)/(nc-1),2.5*(v_ul-v_dl)/(nc-1),2.5*(pd_ul-pd_dl)/(nc-1)];
nw = size(Cx,2);
wdx = zeros(nw,1);
state = real_states(1,[1,4]);%there are also y,dy,dz and z in real_states (ignored here)
states = zeros(length(t),2);
timestep = 0.005;
for step=1:length(t)
states(step,:) = state;
%compute the values of the Gaussian basis functions
Sx = exp(-1/2 * sum(((([real_states(step,1);real_states(step,4);pd(1)]*ones(1,nw))'-Cx').*(ones(nw,1)*B)).^2,2));
ddx = -530*state(2) + wdx'*Sx;
edx = state(2) - real_states(step,4);
dwdx = -1200*edx * Sx - 4 * wdx;
wdx = wdx + dwdx*timestep;
state = [state(1)+state(2)*timestep,state(2)+ddx*timestep];
end
save weights wdx Cx B
figure
plot(t,[dp(:,1),states(:,2)])
legend('x_d_o_t','x_d_o_t_h_a_t')
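For readability, the dense Sx line inside the loop is equivalent to the following sketch (same math and variables as above; the implicit expansion requires MATLAB R2016b or later):
z  = [real_states(step,1); real_states(step,4); pd(1)];  %current input [x; dx; u]
d  = (z - Cx) .* B(:);            %scaled distance to each of the nw centers, 3-by-nw
Sx = exp(-0.5 * sum(d.^2, 1)).';  %nw-by-1 vector of Gaussian activations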
The code used to verify the approximation is the following:
load sim_data t p pd dp
real_states = [p,dp];
load weights wdx Cx B
nw = size(Cx,2);
state = real_states(1,[1,4]);
states = zeros(length(t),2);
timestep = 0.005;
for step=1:length(t)
states(step,:) = state;
Sx = exp(-1/2 * sum(((([real_states(step,1);real_states(step,4);pd(1)]*ones(1,nw))'-Cx').*(ones(nw,1)*B)).^2,2));
ddx = -530*state(2) + wdx'*Sx;
state = [state(1)+state(2)*timestep,state(2)+ddx*timestep];
end
figure
plot(t,[dp(:,1),states(:,2)])
legend('x_d_o_t','x_d_o_t_h_a_t')
I was trying to build a 5-layer neural network to classify a dataset with 3 classes, 178 instances, and 13 features. Basically I was following the guideline given here. I have written my own code in MATLAB and it runs successfully. However, the training result turns out to be very bad: the model keeps predicting the same class as the output. I could not find what is wrong with my code, or whether the model simply doesn't fit the data. Could someone help me find where the problem is? Thank you very much.
My MATLAB training code is shown below:
%% Initialization
numclass = 3; % number of classes
c = 13; % number of features
r = size(X0,1); % number of training instances (X0 is the training data matrix)
% for each layer, initialize each parameter w and each b to a small random value near zero
w1 = normrnd(0,0.01,[c,10]); % Input layer -> layer 2 (10 nodes)
b1 = normrnd(0,0.01,[1,10]);
w2 = normrnd(0,0.01,[10,6]); % layer 2 -> layer 3 (6 nodes)
b2 = normrnd(0,0.01,[1,6]);
w3 = normrnd(0,0.01,[6,4]); % layer 3 -> layer 4 (4 nodes)
b3 = normrnd(0,0.01,[1,4]);
w4 = normrnd(0,0.01,[4,numclass]); % layer 4 -> Output layer (3 nodes/class label)
b4 = normrnd(0,0.01,[1,numclass]);
Iter = 0;
lambda = 0.5; % regularization coefficient
%% Batch Training
while Iter < 200
Iter = Iter+1
d_w1 = 0; d_w2 = 0; d_w3 = 0; d_b1 = 0; d_b2 = 0; d_b3 = 0;
d_w4 = 0; d_b4 = 0;
for i = 1:r
% Forward propagation
a1 = X0(i,:); % X0 is training data, each row represents a instance with 13 features
% Input layer -> Layer 2
z2 = a1*w1+b1;
a2 = sigmoid(z2);
% Layer 2 -> Layer 3
z3 = a2*w2+b2;
a3 = sigmoid(z3);
% Layer 3 -> Layer 4
z4 = a3*w3+b3;
a4 = sigmoid(z4);
% Layer 4 -> Output Layer
z5 = a4*w4+b4;
a5 = sigmoid(z5);
% Backward propagation
y = zeros(1,numclass);
y(Y0(i)) = 1; % Y0 is the training label ({1,2,3} in this case), each element indicates which class the instance belongs to
% Output Layer -> Layer 4
delta5 = (-(y-a5).*d_sigmoid(z5))';
% Output Layer -> Layer 3
delta4 = (w4*delta5).*d_sigmoid(z4');
% Layer 3 -> Layer 2
delta3 = (w3*delta4).*d_sigmoid(z3');
% Layer 2 -> Layer 1 (input layer)
delta2 = (w2*delta3).*d_sigmoid(z2');
% Compute the desired partial derivatives
d_w1 = d_w1 + (delta2*a1)';
d_b1 = d_b1 + delta2';
d_w2 = d_w2 + (delta3*a2)';
d_b2 = d_b2 + delta3';
d_w3 = d_w3 + (delta4*a3)';
d_b3 = d_b3 + delta4';
d_w4 = d_w4 + (delta5*a4)';
d_b4 = d_b4 + delta5';
end
eta = 0.8; % learning rate
% weights and bias updating
w1 = w1 - eta*((1/r*d_w1)+ lambda*w1);
b1 = b1 - eta*1/r*d_b1;
w2 = w2 - eta*((1/r*d_w2)+ lambda*w2);
b2 = b2 - eta*1/r*d_b2;
w3 = w3 - eta*((1/r*d_w3)+ lambda*w3);
b3 = b3 - eta*1/r*d_b3;
w4 = w4 - eta*((1/r*d_w4)+ lambda*w4);
b4 = b4 - eta*1/r*d_b4;
end
The sigmoid and d_sigmoid functions are shown below:
function y = sigmoid(x);
L=1;
k=10;
x0=0;
y = L./(1+exp(-k*(x-x0)));
end
function y = d_sigmoid(x)
tmp = sigmoid(x);
y = tmp.*(1-tmp);
end
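Note that with L = 1, k = 10 and x0 = 0 this is a logistic function steepened by a factor of 10. For comparison only (not the poster's choice), the standard logistic sigmoid corresponds to k = 1:
standard_sigmoid = @(x) 1./(1 + exp(-x));  %standard logistic, k = 1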
The prediction code is shown below:
%% Prediction: X1 is testing data, and Y1 is a vector of testing label
[r,c] = size(X1);
for i = 1:r
A1 = X1(i,:);
% Input layer -> Layer 2
Z2 = A1*w1+b1;
A2 = sigmoid(Z2);
% Layer 2 -> Layer 3
Z3 = A2*w2+b2;
A3 = sigmoid(Z3);
% Layer 3 -> Layer 4
Z4 = A3*w3+b3;
A4 = sigmoid(Z4);
% Layer 4 -> Output Layer
Z5 = A4*w4+b4;
A5 = sigmoid(Z5);
pred(i) = find(A5==max(A5))
end
error = length(find((pred'-Y1)~=0))
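As an aside, find(A5==max(A5)) can return more than one index when there is a tie; a tie-safe alternative (a sketch, not the poster's code) is:
[~, pred(i)] = max(A5);  %index of the first maximum = predicted class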
The weights that I get from training, when applied directly to the input, return different results!
I'll show it on a very simple example
Let's say we have an input vector x = 0:0.01:1;
and a target vector t = x.^2 (I know it is better to use a nonlinear network).
After training a 2-layer linear network with one neuron in each layer, we get:
sim(net,0.95) = 0.7850 (some error from training, which is fine and expected)
weights from net.IW,net.LW,net.b:
IW =
0.4547
LW =
2.1993
b =
0.3328 -1.0620
If I use the weights directly: Out = purelin(purelin(0.95*IW+b(1))*LW+b(2)) = 0.6200, I get a different result from what sim gives!
How can that be? What is wrong?
the code:
%Main_TestWeights
close all
clear all
clc
t1 = 0:0.01:1;
x = t1.^2;
hiddenSizes = 1;
net = feedforwardnet(hiddenSizes);
[Xs,Xi,Ai,Ts,EWs,shift] = preparets(net,con2seq(t1),con2seq(x));
net.layers{1,1}.transferFcn = 'purelin';
[net,tr,Y,E,Pf,Af] = train(net,Xs,Ts,Xi,Ai);
view(net);
IW = cat(2,net.IW{1});
LW = cat(2,net.LW{2,1});
b = cat(2,[net.b{1,1},net.b{2,1}]);
%Result from Sim
t2=0.95;
Yk = sim(net,t2)
%Result from Weights
x1 = IW*t2'+b(1)
x1out = purelin(x1)
x2 = purelin(x1out*(LW)+b(2))
The Neural Network Toolbox rescales inputs and outputs to the [-1,1] range. You must therefore rescale the input and unscale the output so that your manual computation matches sim()'s output:
%Result from Weights
x1 = 2*t2 - 1; % rescale
x1 = IW*x1+b(1);
x1out = purelin(x1);
x2 = purelin(x1out*(LW)+b(2));
x2 = (x2+1)/2 % unscale
then
>> x2 == Yk
ans =
1
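Rather than hard-coding the scaling, the network's stored preprocessing settings can be reused with mapminmax. The following is a sketch that assumes the default mapminmax processing (the property layout may differ between toolbox versions); Yk2 should then match sim(net,t2):
iIn   = find(strcmp(net.inputs{1}.processFcns, 'mapminmax'));   %locate mapminmax among the input process functions
psIn  = net.inputs{1}.processSettings{iIn};
iOut  = find(strcmp(net.outputs{2}.processFcns, 'mapminmax'));  %same for the output of layer 2
psOut = net.outputs{2}.processSettings{iOut};
xs  = mapminmax('apply', t2, psIn);              %scale the raw input
ys  = purelin(purelin(IW*xs + b(1))*LW + b(2));  %manual forward pass
Yk2 = mapminmax('reverse', ys, psOut)            %unscale back to the original range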