I've been reading up on a few topics regarding machine learning, neural networks and deep learning; one resource I've been using is this (in my opinion) excellent online book: http://neuralnetworksanddeeplearning.com/chap1.html
For the most part I've come to understand the workings of a neural network but there is one question which still bugs me (which is based on the example on the website):
I consider a three-layer neural network with an input layer, a hidden layer, and an output layer. Say these layers have 2, 3, and 1 neurons respectively (although the exact numbers don't really matter).
Now an input is given: x1 and x2. Because the network is [2, 3, 1], the weights are randomly generated the first time: a list containing a 2x3 and a 3x1 matrix. The biases are a list containing a 3x1 and a 1x1 matrix.
Now the part I don't get:
The formula calculated in the hidden layer:
weights x input - biases = 0
On every iteration the weights and biases are changed slightly, based on the derivative, in order to find a global optimum. If this is the case, why don't the biases and weights for every neuron converge to the same values?
I think I found the answer by doing some tests as well as finding some information on the internet. The answer lies in having random initial weights and biases. If all "neurons" were equal, they would all come to the same result, since the weights, biases and inputs would be equal. Having random weights allows for different answers:
x1 = 1
x2 = 2
x3 = 3
w1 = [0, 0, 1], giving w dot x = 3
w2 = [3, 0, 0], giving w dot x = 3
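Here is a minimal MATLAB sketch of the symmetry problem (my own toy numbers, not from the book): if every hidden unit starts with the same weights, one backpropagation step gives every hidden unit exactly the same gradient, so they can never drift apart.
x  = [1; 2];                                % input (2x1)
t  = 1;                                     % target
W1 = 0.5 * ones(3, 2);  b1 = zeros(3, 1);   % all three hidden units start identical
W2 = 0.5 * ones(1, 3);  b2 = 0;
a1 = tanh(W1*x + b1);                       % hidden activations -- all three equal
y  = W2*a1 + b2;                            % linear output
dy = y - t;                                 % dL/dy for squared error
dW2 = dy * a1';                             % gradient w.r.t. output weights
da1 = (W2' * dy) .* (1 - a1.^2);            % backprop through tanh
dW1 = da1 * x';                             % gradient w.r.t. hidden weights
disp(dW1)                                   % all three rows are identical
With random initial values in W1 the rows of dW1 differ, so the hidden neurons can move apart and learn different features.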
If anyone can confirm, please do so.
I consider the following recurrent neural network (RNN):
[Figure: the RNN under consideration, with its update equations for h and y in terms of W, U, and V]
where x is the input (a vector of reals), h the hidden state vector and y is the output vector. I trained the network in MATLAB using some data x and obtained W, V, and U.
However, in MATLAB, after changing the matrix W to W' and keeping U and V the same, the output (y) of the RNN that uses W is the same as the output (y') of the RNN that uses W' when both predict on the same data x. Judging from the above equation, those two outputs should be different, but I can't reproduce that in MATLAB (when I modify V or U, the outputs do change). How could I fix the code so that the outputs (y) and (y') are different, as they should be?
The relevant code is shown below:
[x,t] = simplefit_dataset; % x: input data ; t: targets
net = newelm(x,t,5); % Recurrent neural net with 1 hidden layer (5 nodes) and 1 output layer (1 node)
net.layers{1}.transferFcn = 'tansig'; % 'tansig': equivalent to tanh and also is the activation function used for hidden layer
net.biasConnect = [0;0]; % biases set to zero for easier experimenting
net.derivFcn ='defaultderiv'; % defaultderiv: tells Matlab to pick whatever derivative scheme works best for this net
view(net) % displays the network topology
net = train(net,x,t); % trains the network
W = net.LW{1,1}; U = net.IW{1,1}; V = net.LW{2,1}; % network matrices
Y = net(x); % Y: output when predicting on data x using W
net.LW{1,1} = rand(5,5); % This is the modified matrix W, W'
Y_prime = net(x); % Y_prime: output when predicting on data x using W'
max(abs(Y-Y_prime)) % The difference between the two outputs is 0 when it probably shouldn't be.
This is the recursion in your first layer: (from the docs)
The weight matrix for the weight going to the ith layer from the jth
layer (or a null matrix [ ]) is located at net.LW{i,j} if
net.layerConnect(i,j) is 1 (or 0).
So net.LW{1,1} are the weights to the first layer from the first layer (i.e. the recursion), whereas net.LW{2,1} stores the weights to the second layer from the first layer. Now, what does it mean when you can change the weights of the recursion randomly without any effect? (In fact, you can set them to zero, net.LW{1,1} = zeros(size(W));, without any effect.) Note that this is essentially the same as dropping the recursion and creating a simple feed-forward network:
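A sketch of that comparison (my own test code, not from the original post; it reuses x, t, U, V, and Y from the snippet above): build a plain feed-forward net with the same layer sizes, copy over the non-recurrent weights, and compare the outputs.
ff = newff(x, t, 5);                 % same sizes: 5 hidden nodes, 1 output node
ff.layers{1}.transferFcn = 'tansig'; % match the Elman net's hidden activation
ff.biasConnect = [0; 0];             % match the zeroed biases
ff.IW{1,1} = U;                      % input -> hidden weights from the trained Elman net
ff.LW{2,1} = V;                      % hidden -> output weights
Y_ff = ff(x);
max(abs(Y - Y_ff))                   % if the recursion really has no effect, this should be (numerically) zero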
Hypothesis: The recursion has no effect.
You will note that if you change the weights to the second layer (1 neuron) from the first layer (5 neurons) net.LW{2,1} = zeros(size(V));, it will affect your prediction (the same is of course true if you change the input weights net.IW).
Why does the recursion have no effect?
Well, that beats me. I have no idea where this special glitch is or what the theory is behind the newelm network.
I'm trying to use a neural network for binary classification. It consists of three layers. The first layer has three input neurons, the hidden layer has two neurons, and the output layer has three neurons that output a binary value of 1 or 0. Actually, the output is usually a floating-point number, but it typically rounds to a whole number.
If the network only outputs vectors of 3, then shouldn't my input vectors be the same size? Otherwise, for classification, how else do you map the output to the input?
I wrote the neural network in Excel using VBA based on the following article: https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
So far it works exactly as described in the article. I don’t have access to a machine learning library at the moment so I’ve chosen to give this a try.
For example:
If the output of the network is [n, n ,n], does that mean that my input data has to be [n, n, n] also?
From what I read in here: Neural net input/output
It seems that's the way it should be. I'm not entirely sure though.
To put it simply:
for a regression task, your output usually has dimension [1] (if you predict a single value).
For a classification task, your output should have as many dimensions as you have classes (the outputs are probabilities, and they sum to 1).
So there is no need for the input and output to have equal dimensions; a NN is just a projection from one dimension to another.
For example,
regression, we predict house prices: input is [1, 10] (ten features of the property), the output is [1] - the price
classification, we predict a class (will be sold or not): input is [1, 11] (same features + listed price), output is [1, 2] (probabilities of class 0 (will not be sold) and class 1 (will be sold); for example, [1; 0], [0; 1] or [0.5; 0.5] and so on; it is binary classification)
Additionally, equality of input and output dimensions exists in more specific tasks, for example autoencoder models (when you need to represent your data in another dimension and then map it back to the original dimension).
Again, the output dimension is the size of the output for one sample (or one batch), not for the whole dataset.
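A quick shape check in MATLAB (illustrative numbers only, not from the question):
X = rand(10, 500);                        % 10 features x 500 samples
t_reg = rand(1, 500);                     % regression target: one value per sample
labels = randi([0 1], 500, 1);            % binary class labels, one per sample
T_cls = dummyvar(categorical(labels))';   % 2 x 500 one-hot targets (each column sums to 1)
size(T_cls)                               % ans = 2   500  ->  two outputs, still 500 samples
A network configured on X and t_reg has 10 inputs and 1 output; configured on X and T_cls it has 10 inputs and 2 outputs. The input and output sizes never have to match.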
I want to use the RBM pre-training weights from the Hinton paper code as the weights of MATLAB's native feedforwardnet.
Can anyone help me with how to set or arrange the pre-trained weights for feedforwardnet?
For instance, I used the Hinton code from http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
and want to use the pre-trained weights for a MATLAB feedforwardnet.
W = hintonRBMpretrained;  % weight vector obtained from Hinton's RBM pre-training code
net = feedforwardnet([700 300 200 30 200 300 700]);
net = setwb(net, W);      % setwb returns the updated network
How do I set up or arrange W so that it matches the feedforwardnet structure? I know how to use a single weight vector, but I am afraid the ordering of the weights will be incorrect.
The MATLAB feedforwardnet function returns a Neural Network object with the properties as described in the documentation. The workflow for creating a neural network with pre-trained weights is as follows:
Load data
Create the network
Configure the network
Initialize the weights and biases
Train the network
The steps 1, 2, 3, and 5 are exactly as they would be when creating a neural network from scratch. Let's look at a simple example:
% 1. Load data
load fisheriris
meas = meas.';
species = species.';
targets = dummyvar(categorical(species));
% 2. Create network
net = feedforwardnet([16, 16]);
% 3. Configure the network
net = configure(net, meas, targets);
Now, we have a neural network net with 4 inputs (sepal and petal length and width), and 3 outputs ('setosa', 'versicolor', and 'virginica'). We have two hidden layers with 16 nodes each. The weights are stored in the two fields net.IW and net.LW, where IW are the input weights, and LW are the layer weights:
>> net.IW
ans =
3×1 cell array
[16×4 double]
[]
[]
>> net.LW
ans =
3×3 cell array
[] [] []
[16×16 double] [] []
[] [3×16 double] []
This is confusing at first, but makes sense: each row in both these cell arrays corresponds to one of the layers we have.
In the IW array, we have the weights between the input and each of the layers. Obviously, we only have weights between the input and the first layer. The shape of this weight matrix is 16x4, as we have 4 inputs and 16 hidden units.
In the LW array, we have the weights from each layer (the rows) to each layer (the columns). In our case, we have a 16x16 weight matrix from the first to the second layer, and a 3x16 weight matrix from the second to the third layer. Makes perfect sense, right?
With that, we know how to initialize the weights we have got from the RBM code:
net.IW{1,1} = weights_input;   % input -> first hidden layer (16x4 here)
net.LW{2,1} = weights_hidden;  % first hidden -> second hidden layer (16x16 here)
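If your RBM code also provides weights for the deeper layers and for the biases, the remaining fields follow the same pattern (the variable names below are placeholders):
net.LW{3,2} = weights_output;   % second hidden layer -> output layer (3x16 here)
net.b{1} = bias_hidden1;        % biases are column vectors, one cell per layer (16x1, 16x1, 3x1 here)
net.b{2} = bias_hidden2;
net.b{3} = bias_output;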
With that, you can continue with step 5, i.e. training the network in a supervised fashion.
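For completeness, step 5 is the same training call as for a randomly initialized network:
% 5. Train (fine-tune) the network, starting from the pre-trained weights
net = train(net, meas, targets);
Y = net(meas);   % predictions of the fine-tuned network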
I've seen the following instructions in one of my AI courses:
net = newp([-2 2;-2 2],2)
net.IW {1,1} = [-1 1; 3 4]
net.b{1} = [-2,3]
What does the neural network look like? Does the perceptron have 2 neurons?
The easiest way to take a look at it is via:
view(net)
There you can see the number of inputs, outputs, and layers. You can also check with
help newp
the documentation of the command, and in there it says:
NET = newp(P,T,TF,LF) takes these inputs,
P - RxQ matrix of Q1 representative input vectors.
T - SxQ matrix of Q2 representative target vectors.
TF - Transfer function, default = 'hardlim'.
LF - Learning function, default = 'learnp'.
net.IW{1,1} sets the input weights to the chosen numbers,
and net.b{1} sets the biases of the network to the vector [-2,3].
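A quick check of the sizes (my own sketch; note that I write the bias as a column vector here):
net = newp([-2 2; -2 2], 2);       % two inputs (one min/max row per input), two perceptron neurons
net.IW{1,1} = [-1 1; 3 4];         % 2x2: one row of weights per neuron
net.b{1} = [-2; 3];                % 2x1: one bias per neuron
p = [1; -1];                       % an arbitrary input point
a = hardlim(net.IW{1,1}*p + net.b{1})   % same as net(p): two outputs, so yes, two neurons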
Did this clarify things for you?
I need to classify a dataset using Matlab MLP and show classification.
The dataset looks like this:
[Figure: the dataset to be classified]
What I have done so far is:
I have created a neural network containing one hidden layer (two neurons; maybe someone could give me some suggestions on how many neurons are suitable for my example) and an output layer (one neuron).
I have used several different learning methods, such as Delta-Bar-Delta and backpropagation (both of these methods with and without momentum), as well as Levenberg-Marquardt.
This is the code I used in Matlab (Levenberg-Marquardt example):
net = newff(minmax(Input),[2 1],{'logsig' 'logsig'},'trainlm');
net.trainParam.epochs = 10000;
net.trainParam.goal = 0;
net.trainParam.lr = 0.1;
[net tr outputs] = train(net,Input,Target);
The following shows the hidden-neuron classification boundaries generated by Matlab on the data. I am a little bit confused, because the network should produce a nonlinear result, but the result below seems to show two linear boundary lines...
[Figure: hidden-neuron classification boundaries plotted over the data]
The code for generating above plot is:
figure(1)
plotpv(Input,Target);
hold on
plotpc(net.IW{1},net.b{1});
hold off
I also need to plot the output function of the output neuron, but I am stuck on this step. Can anyone give me some suggestions?
Thanks in advance.
Regarding the number of neurons in the hidden layer, for such a small example two are more than enough. The only way to know the optimum for sure is to test with different numbers. In this FAQ you can find a rule of thumb that may be useful: http://www.faqs.org/faqs/ai-faq/neural-nets/
For the output function, it is often useful to divide the computation into steps:
First, given the input vector x, the weighted input of the neurons in the hidden layer is y = f(x) = x^T w + b, where w is the weight matrix from the input neurons to the hidden layer and b is the bias vector.
Second, you apply the activation function g of the network to the result of the previous step: z = g(y).
Finally, the output is the dot product h(z) = z . v + n, where v is the weight vector from the hidden layer to the output neuron and n is its bias. In the case of more than one output neuron, you repeat this for each one.
I've never used the MATLAB MLP functions, so I don't know how to get the weights in this case, but I'm sure the network stores them somewhere. Edit: searching the documentation, I found these properties:
net.IW numLayers-by-numInputs cell array of input weight values
net.LW numLayers-by-numLayers cell array of layer weight values
net.b numLayers-by-1 cell array of bias values
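Putting the steps together for the network above (a sketch only: it assumes a 2-D input and uses the 'logsig' activations from the newff call; the grid limits are placeholders to adapt to your data range):
w = net.IW{1,1};  b = net.b{1};            % input -> hidden layer (2x2 and 2x1 here)
v = net.LW{2,1};  n = net.b{2};            % hidden layer -> output neuron (1x2 and 1x1)
[X1, X2] = meshgrid(linspace(-1, 1, 100)); % grid over the 2-D input space
out = zeros(size(X1));
for k = 1:numel(X1)
    x = [X1(k); X2(k)];
    z = logsig(w*x + b);                   % steps 1 and 2: weighted input, then activation
    out(k) = logsig(v*z + n);              % output neuron ('logsig' was also used for it in the newff call)
end
figure(2)
surf(X1, X2, out)
shading interp
xlabel('x_1'), ylabel('x_2'), zlabel('output neuron')
Drawing contour(X1, X2, out, [0.5 0.5]) on top of the plotpv figure shows the network's actual (nonlinear) decision boundary; plotpc only draws the linear boundary of each hidden neuron, which is why the lines in your plot look linear.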