How to calculate weights of a perceptron with vector output? - neural-network

I have a perceptron that takes two vector inputs, each with its own scalar weight:
i1 = [i1_0, i1_1]
w1 = w1
i2 = [i2_0, i2_1]
w2 = w2
The output of this perceptron is p = [p0, p1].
How do I calculate new weights during backpropagation?
Until now I have only done this with i1 and i2 being single numbers, not vectors, and in that case I would use
delta = (CorrectResult - MyCurrentResult) * Derivative
newWeight = oldWeight + (learningRate * delta * input)
Of course, if I computed it that way here, each weight would become a vector too, e.g. w1 = [w1_0, w1_1], but I need it to be a single number.
Should I calculate w1_0 and w1_1 and then take some mean value, or is there a different way?
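One way to look at it, as a minimal sketch: assume the output is computed element-wise as p = f(w1*i1 + w2*i2) with a logistic sigmoid f and a squared-error loss (the question states neither, and every name and value below is illustrative). Because the scalar w1 is shared by every component of the output, its gradient is the sum of the per-component contributions, so no ad hoc averaging is needed; taking a mean instead of a sum would only rescale the learning rate.

```matlab
% Assumed setup: p = f(w1*i1 + w2*i2) element-wise, f = logistic sigmoid,
% loss = 0.5 * sum((target - p).^2). All values are made up for illustration.
f = @(z) 1 ./ (1 + exp(-z));
i1 = [0.5, -1.0];
i2 = [1.5, 0.25];
target = [1, 0];
w1 = 0.1;
w2 = -0.2;
learningRate = 0.5;

z = w1 * i1 + w2 * i2;                  % 1x2 pre-activation
p = f(z);                               % 1x2 output
delta = (target - p) .* p .* (1 - p);   % per-component error term, 1x2

% Each shared scalar weight accumulates the contributions of all components.
w1_new = w1 + learningRate * sum(delta .* i1);
w2_new = w2 + learningRate * sum(delta .* i2);
```

Both updated weights stay scalars, and sum(delta .* i1) is simply the dot product of the error term with the corresponding input vector.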

Related

Bias derivative in backpropagation

I am building a 4-5-2 neural network with m data points, where each data point has 4 independent variables. For forward propagation, the shapes are:
L1: mx4 * 4x5 + 1x5 = Z1 -> A1 = ReLU(Z1).
L2: mx5 * 5x2 + 1x2 = Z2 -> A2 = softmax(Z2)
This gives A1 of size mx5 and A2 of size mx2. When computing the weight gradients, I get two matrices of size 4x5 and 5x2 for dweight1 and dweight2, so I can do the matrix subtraction for gradient descent.
For the biases, however, I am stuck. dL/db2 = dL/dZ2 * dZ2/db2 = (A2 - Y) * 1 gives me a 2xm matrix, which does not match the 1x2 bias. Are my matrix shapes wrong somewhere?
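A sketch of the usual resolution, under stated assumptions: samples are stored one per row (so A2 and Y are m x 2), the loss is cross-entropy averaged over the batch (which is what makes dL/dZ2 proportional to A2 - Y), and the batch size and random data are placeholders. Since the same 1x2 bias is added to every row, its gradient is dL/dZ2 summed over the m samples, which gives the 1x2 shape needed for the update.

```matlab
m = 6;                                   % illustrative batch size
X  = rand(m, 4);
W1 = rand(4, 5);  b1 = rand(1, 5);
W2 = rand(5, 2);  b2 = rand(1, 2);
Y  = zeros(m, 2);
Y(:, 1) = 1;                             % dummy one-hot labels

% Forward pass
Z1 = bsxfun(@plus, X * W1, b1);          % m x 5
A1 = max(Z1, 0);                         % ReLU
Z2 = bsxfun(@plus, A1 * W2, b2);         % m x 2
E  = exp(bsxfun(@minus, Z2, max(Z2, [], 2)));
A2 = bsxfun(@rdivide, E, sum(E, 2));     % row-wise softmax

% Backward pass
dZ2 = (A2 - Y) / m;                      % m x 2
dW2 = A1' * dZ2;                         % 5 x 2
db2 = sum(dZ2, 1);                       % 1 x 2: summed over the batch
dZ1 = (dZ2 * W2') .* (Z1 > 0);           % m x 5
dW1 = X' * dZ1;                          % 4 x 5
db1 = sum(dZ1, 1);                       % 1 x 5
```

If the samples are stored one per column instead, the same idea applies with sum(dZ2, 2).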

Why do I get a large MSE when I try to verify the convolution theorem in MATLAB?

I want to verify the convolution theorem in MATLAB.
First, I compute a 2D discrete convolution of a 2D Gaussian with an image graymap(x, y).
Second, I compute the Fourier transforms of the same 2D Gaussian and of the original image, multiply the two transforms element-wise, and take the inverse Fourier transform of the product.
Finally, I calculate the MSE between the two results. However, the error is over 800.
This is my code:
[row, col] = size(graymap);
[row_2, col_2] = size(z);
result = zeros(row, col);
% Direct 2D convolution with the 9x9 kernel z, zero-padded at the borders
for i = 1:row
    for j = 1:col
        accumulation_value = 0;
        for k = -4:4
            for h = -4:4
                if ((i+k > 0 && i+k < row + 1) && (j+h > 0 && j+h < col + 1))
                    value_image = double(graymap(i+k, j+h));
                else
                    value_image = 0;
                end
                accumulation_value = accumulation_value + value_image * double(z(5 + k, 5 + h));
            end
        end
        result(i,j) = accumulation_value;
    end
end
result_blur_1 = uint8(255*mat2gray(result));
% FFT-based version (the part in question)
M = size(graymap,1);
N = size(graymap,2);
resIFFT = ifft2(fft2(double(graymap), M, N) .* fft2(double(z), M, N));
result_blur_2 = uint8(255*mat2gray(resIFFT));
err = immse(result_blur_1, result_blur_2);
z is the 9x9 Gaussian kernel. I don't flip it because it is symmetric.
I think my implementation of the convolution is correct because the result is the same as conv2(graymap, z, 'same').
Therefore, I believe there is something wrong with the second part. In fact, I am confused about how padding works. Maybe that is the cause of the large MSE.
There are indeed problems with your implementation of the second part. The most important rule to remember when implementing convolution via the FFT is that you are actually calculating a circular convolution, not a linear convolution. Fortunately, there is a condition under which the two become equivalent: both arrays must be zero-padded to a size of at least the sum of their sizes minus 1 (in every dimension). So if you are working with an image X of size MxN and a mask Z of size PxQ, you should pad both arrays with zeros so that they have dimensions of at least (M+P-1)x(N+Q-1). Any additional zeros won't hurt, so it's convenient to round up to an FFT-friendly size if possible (using nextpow2, for example). You then just take the first (M+P-1)x(N+Q-1) values of the result.
That would be all if you wanted the full result of the convolution. But because you want the central part of the convolution (the option 'same'), you need to select the correct indices. The first index is ceil(([P Q] - 1)/2) + 1, and from there you take as many consecutive indices as the image size.
Here is an example putting it all together:
M = randperm(1024,1);
N = randperm(1024,1);
X = rand(M,N);
P = randperm(64,1);
Q = randperm(64,1);
Z = rand(P,Q);
% 'standard' convolution with option 'same'
C1 = conv2(X,Z,'same');
R = 2^nextpow2(M+P-1);
S = 2^nextpow2(N+Q-1);
% convolution with fft. Notice the zero-padding to R,S
C2 = real(ifft2(fft2(X,R,S) .* fft2(Z,R,S)));
n = ceil(([P Q] - 1)/2);
ind{1} = n(1) + (1:M);
ind{2} = n(2) + (1:N);
C2 = C2(ind{:});
err = immse(C1,C2)
I get errors on the order of 1e-26.
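The circular-versus-linear distinction is easier to see in one dimension. The following is a small illustrative sketch (the vectors x and h are made up, not taken from the question): without padding, the tail of the linear convolution wraps around and is added to the first sample, while padding both signals to length(x) + length(h) - 1 reproduces conv exactly.

```matlab
x = [1 2 3 4];
h = [1 1];

% Reference: linear convolution, length 4 + 2 - 1 = 5
lin = conv(x, h);                                 % [1 3 5 7 4]

% FFT without enough padding: circular convolution of length 4.
% The last linear sample (4) wraps around onto the first one (1 + 4 = 5).
circ = real(ifft(fft(x) .* fft(h, 4)));           % [5 3 5 7]

% FFT with zero-padding to length 5: identical to the linear result
padded = real(ifft(fft(x, 5) .* fft(h, 5)));

wrap_error = max(abs(circ - lin(1:4)));           % 4, caused by the wrap-around
pad_error  = max(abs(padded - lin));              % numerically zero
```

The same wrap-around happens along both axes in the 2D case when fft2 is called with the image size M, N instead of the padded size.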

Getting NaN values in neural network weight matrices

I am trying to develop a feedforward NN in MATLAB. I have a dataset of 12 inputs and 1 output with 46998 samples. There are some NaN values in the last rows of the matrix, because some inputs are accelerations and velocities, which have 1 and 2 fewer steps, respectively, than the displacements.
With the current dataset I am getting w1_grad and w2_grad as NaN matrices. I tried to remove the NaN values using Heave_dataset(isnan(Heave_dataset))=[]; but that converts my dataset into a single row vector (1x610964).
Can anyone help me with this?
%% Clear Variables, Close Current Figures, and Create Results Directory
clc;
clear all;
close all;
mkdir('Results//'); %Directory for Storing Results
%% Configurations/Parameters
load 'Heave_dataset'
% Heave_dataset(isnan(Heave_dataset))=[];
nbrOfNeuronsInEachHiddenLayer = 24;
nbrOfOutUnits = 1;
unipolarBipolarSelector = -1; %0 for Unipolar, -1 for Bipolar
learningRate = 0.08;
nbrOfEpochs_max = 50000;
%% Read Data
Input = Heave_dataset(:, 1:length(Heave_dataset(1,:))-1);
TargetClasses = Heave_dataset(:, length(Heave_dataset(1,:)));
%% Calculate Number of Input and Output Nodes
nbrOfInputNodes = length(Input(1,:)); %=Dimension of Any Input Sample
nbrOfLayers = 2 + length(nbrOfNeuronsInEachHiddenLayer);
nbrOfNodesPerLayer = [nbrOfInputNodes nbrOfNeuronsInEachHiddenLayer nbrOfOutUnits];
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Forward Pass %%%%%%%%%%%
%% Adding the Bias to Input layer
Input = [ones(length(Input(:,1)),1) Input];
%% Weights leading from input layer to hidden layer is w1
w1 = rand(nbrOfNeuronsInEachHiddenLayer,(nbrOfInputNodes+1));
%% Input & output of hidden layer
hiddenlayer_input = Input*w1';
hiddenlayer_output = -1 + 2./(1 + exp(-(hiddenlayer_input)));
%% Adding the Bias to hidden layer
hiddenlayer_output = [ones(length(hiddenlayer_output(:,1)),1) hiddenlayer_output];
%% Weights leading from hidden layer to output layer is w2
w2 = rand(nbrOfOutUnits,(nbrOfNeuronsInEachHiddenLayer+1));
%% Input & output of output layer
outerlayer_input = hiddenlayer_output*w2';
outerlayer_output = outerlayer_input;
%% Error Calculation
TotalError = 0.5*(TargetClasses-outerlayer_output).^2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Backward Pass %%%%%%%%%%%
d3 = outerlayer_output - TargetClasses;
d2 = (d3*w2).*hiddenlayer_output.*(1-hiddenlayer_output);
d2 = d2(:,2:end);
D1 = d2' * Input;
D2 = d3' * hiddenlayer_output;
w1_grad = D1/46998 + learningRate*[zeros(size(w1,1),1) w1(:,2:end)]/46998;
w2_grad = D2/46998 + learningRate*[zeros(size(w2,1),1) w2(:,2:end)]/46998;
You should try to vectorize your algorithm. First arrange your data in a 46998x12 matrix X. Add the bias to X like X = [ones(46998,1) X]. Then the weights leading from the input layer to the first hidden layer must be arranged in a matrix W1 with dimensions numberofneuronsinfirsthiddenlayer (24) x (inputs + 1). Then X*W1' is what you feed into your neuron function (whether it is a sigmoid or something else). The result (e.g. sigmoid(X*W1')) is the output of the neurons in hidden layer 1. You add the bias as before and multiply by the weight matrix W2 (the weights that lead from hidden layer 1 to hidden layer 2), and so on. I hope this helps you get started vectorizing your code, at least for the feedforward part. The backpropagation part is a little trickier but luckily involves the same matrices.
I will briefly recap the feedforward process so that we use the same language when talking about backpropagation.
There is the data called X (dimensions 46998x12).
A1 = [ones(46998,1) X] is the input including bias. (46998x13)
Z2 = A1*W1' (W1 is the weight matrix that leads from input to hidden layer 1)
A2 = sigmoid(Z2);
A2 = [ones(m,1) A2]; (adding the bias again; m is 46998)
Z3 = A2 * W2';
A3 = sigmoid(Z3);
Supposing you only have one hidden layer, the feedforward pass stops here. I'll go backwards now, and you can generalize as appropriate.
d3 = A3 - Y; (Y must be part of your data: the actual target values with which you train your NN)
d2 = (d3 * W2) .* A2 .* (1-A2); (The sigmoid function has the nice property that d(sigmoid(z))/dz = sigmoid(z)*(1-sigmoid(z)).)
d2 = d2(:,2:end); (You don't need the first column, which corresponds to the bias.)
D1 = d2' * A1;
D2 = d3' * A2;
W1_grad = D1/m + lambda*[zeros(size(W1,1),1) W1(:,2:end)]/m; (lambda is the regularization parameter, m is 46998)
W2_grad = D2/m + lambda*[zeros(size(W2,1),1) W2(:,2:end)]/m;
Everything should be in place now except for the vectorized cost function, which has to be minimized. I hope this helps a bit.
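On the NaN gradients themselves: any NaN in the data propagates through the matrix products, so a likely cause is simply the NaN entries in the last rows of the dataset. The sketch below uses a small made-up matrix as a stand-in for Heave_dataset. It shows why indexing with isnan alone flattens the matrix, and one common way to drop whole rows instead.

```matlab
% Small stand-in for Heave_dataset: 5 samples, 3 columns, NaN in the last rows.
data = [1 2 3; 4 5 6; 7 8 9; 10 NaN 12; NaN NaN 15];

% Deleting individual elements cannot keep a rectangular shape,
% so the result collapses into a row vector.
flat = data;
flat(isnan(flat)) = [];                 % 1x12 row vector

% Deleting every row that contains at least one NaN keeps the columns intact.
clean = data;
clean(any(isnan(clean), 2), :) = [];    % 3x3, rows 4 and 5 dropped
```

Applied to the real data, the same indexing would be Heave_dataset(any(isnan(Heave_dataset), 2), :) = []; after which the hard-coded 46998 in the gradient lines should be replaced by the new number of rows, e.g. size(Input, 1).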

expectation maximization algorithm matlab out of memory error

I am implementing the Expectation Maximization algorithm in MATLAB. The algorithm operates on a 214096 x 2 data matrix, and while computing the probabilities there is a multiplication of (214096 x 2) * (2 x 2) * (2 x 214096) matrices, which results in an out-of-memory error in MATLAB. Is there a way to fix this problem?
Equation: (image not included)
Matlab Code:
D = size(X,2); % dimension
N = size(X,1); % number of samples
K = 4; % number of Gaussian Mixture components ( Also number of clusters )
% Initialization
p = [0.2, 0.3, 0.2, 0.3]; % arbitrary pi, probabilities of clusters, apriori probability of cluster
[idx,mu] = kmeans(X,K); % initial means of the components, theta is mu and variance
% compute the covariance of the components
sigma = zeros(D,D,K);
for k = 1:K
    tempmat = X(idx==k,:);
    sigma(:,:,k) = cov(tempmat); % Sigma j
    sigma_det(k) = det(sigma(:,:,k));
end
% calculate x-mu
for k=1: K
    check=length( X(idx == k,1))
    for lidx = 1: length( X(idx == k,1))
        cidx = find( idx == k) ;
        Xmu(cidx(lidx),:) = X(cidx(lidx),:) - mu(k,:); %( x-mu ) calculation on cluster level
    end
end
% compute P(Cj|x; theta(t)), and take log to simplified calculation
%Eq 14.14 denominator
denom = 0;
for k=1:K
    calc_sigma_1_2 = sigma_det(k)^(-1/2);
    calc_x_mu = Xmu(idx == k,:);
    calc_sigma_inv = inv(sigma(:,:,k));
    calc_x_mu_tran = calc_x_mu.';
    factor = calc_sigma_1_2 * exp (-1/2 * calc_x_mu * calc_sigma_inv * calc_x_mu_tran ) * p(k);
    denom = denom + factor;
end
for k =1:K
    calc_sigma_1_2 = sigma_det(k)^(-1/2);
    calc_x_mu = Xmu(idx == k,:);
    calc_sigma_inv = inv(sigma(:,:,k));
    calc_x_mu_tran = calc_x_mu.';
    factor = calc_sigma_1_2 * exp (-1/2 * calc_x_mu_tran * calc_sigma_inv * calc_x_mu ) * p(k);
    pdf(k) = factor/denom;
end
%%%% Equation 14.14 ends
It seems that you tried to apply a vector-based equation by simply substituting a matrix for the vector; this is not how it works.
(x - mu).' * inv(sigma) * (x - mu)
is supposed to be the squared Mahalanobis norm of (x - mu), and you want to obtain this value for each row of the matrix X, thus
(X - mu) * inv(sigma) =: A <- this is OK; it results in an N x d matrix
and now you have to do a point-wise multiplication of A with (X - mu), not a matrix product, and finally sum over the second axis (columns). This way you end up with an N-element vector, each element containing the squared Mahalanobis norm of the corresponding row of X.
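The row-wise computation described above can be sketched as follows. The data here is random and the sizes are placeholders; only the pattern matters. The full product would be an N x N matrix, which for N = 214096 in double precision needs hundreds of gigabytes, whereas the row-wise version never allocates anything larger than N x d.

```matlab
N = 1000;                          % illustrative sizes, not the real data
d = 2;
X = randn(N, d);
mu = mean(X, 1);                   % 1 x d
Sigma = cov(X);                    % d x d

Xc = bsxfun(@minus, X, mu);        % N x d, centered rows
A  = Xc / Sigma;                   % N x d, same as Xc * inv(Sigma)
m2 = sum(A .* Xc, 2);              % N x 1, squared Mahalanobis norm per row

% Gaussian factor per row, up to the constant (2*pi)^(-d/2)
dens = det(Sigma)^(-1/2) * exp(-0.5 * m2);
```

In the EM code, the same three lines would be applied once per component k, with mu(k,:) and sigma(:,:,k) in place of mu and Sigma, over all rows of X rather than only those assigned to cluster k.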

Octave backpropagation implementation issues

I wrote code to implement steepest descent backpropagation, and I am having issues with it. I am using the Machine CPU dataset and have scaled the inputs and outputs into the range [0, 1].
The code in MATLAB/Octave is as follows:
steepest descent backpropagation
%SGD = Steepest Gradient Descent
function weights = nnSGDTrain (X, y, nhid_units, gamma, max_epoch, X_test, y_test)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W2 = rand (nhid_units + 1, oput_units);
W1 = rand (iput_units + 1, nhid_units);
train_rmse = zeros (1, max_epoch);
test_rmse = zeros (1, max_epoch);
for (epoch = 1:max_epoch)
delW2 = zeros (nhid_units + 1, oput_units)';
delW1 = zeros (iput_units + 1, nhid_units)';
for (i = 1:rows(X))
o1 = sigmoid ([X(i,:), 1] * W1); %1xn+1 * n+1xk = 1xk
o2 = sigmoid ([o1, 1] * W2); %1xk+1 * k+1xm = 1xm
D2 = o2 .* (1 - o2);
D1 = o1 .* (1 - o1);
e = (y_test(i,:) - o2)';
delta2 = diag (D2) * e; %mxm * mx1 = mx1
delta1 = diag (D1) * W2(1:(end-1),:) * delta2; %kxm * mx1 = kx1
delW2 = delW2 + (delta2 * [o1 1]); %mx1 * 1xk+1 = mxk+1 %already transposed
delW1 = delW1 + (delta1 * [X(i, :) 1]); %kx1 * 1xn+1 = k*n+1 %already transposed
end
delW2 = gamma .* delW2 ./ n;
delW1 = gamma .* delW1 ./ n;
W2 = W2 + delW2';
W1 = W1 + delW1';
[dummy train_rmse(epoch)] = nnPredict (X, y, nhid_units, [W1(:);W2(:)]);
[dummy test_rmse(epoch)] = nnPredict (X_test, y_test, nhid_units, [W1(:);W2(:)]);
printf ('Epoch: %d\tTrain Error: %f\tTest Error: %f\n', epoch, train_rmse(epoch), test_rmse(epoch));
fflush (stdout);
end
weights = [W1(:);W2(:)];
% plot (1:max_epoch, test_rmse, 1);
% hold on;
plot (1:max_epoch, train_rmse(1:end), 2);
% hold off;
end
predict
%Now SFNN Only
function [o1 rmse] = nnPredict (X, y, nhid_units, weights)
iput_units = columns (X);
oput_units = columns (y);
n = rows (X);
W1 = reshape (weights(1:((iput_units + 1) * nhid_units),1), iput_units + 1, nhid_units);
W2 = reshape (weights((((iput_units + 1) * nhid_units) + 1):end,1), nhid_units + 1, oput_units);
o1 = sigmoid ([X ones(n,1)] * W1); %nxiput_units+1 * iput_units+1xnhid_units = nxnhid_units
o2 = sigmoid ([o1 ones(n,1)] * W2); %nxnhid_units+1 * nhid_units+1xoput_units = nxoput_units
rmse = RMSE (y, o2);
end
RMSE function
function rmse = RMSE (a1, a2)
rmse = sqrt (sum (sum ((a1 - a2).^2))/rows(a1));
end
I have also trained on the same dataset using mlp from the R RSNNS package, and the RMSE for the train set (first 100 examples) is around 0.03. But with my implementation I cannot achieve an RMSE lower than 0.14. For some higher learning rates the errors sometimes grow, and no learning rate gets me an RMSE lower than 0.14. A paper I referred to also reports an RMSE of around 0.03 for the train set.
I would like to know where the problem is in the code. I have followed Raul Rojas' book and confirmed that things are okay.
In the backpropagation code, the line
e = (y_test(i,:) - o2)';
is not correct, because o2 is the output for an example from the train set, while I am taking the difference from an example in the test set, y_test. The line should have been:
e = (y(i,:) - o2)';
which correctly finds the difference between the output predicted by the current model and the target output of the corresponding example.
It took me 3 days to find this one. I am fortunate to have found this freaking bug, which stopped me from making further modifications.