I'm trying to train a neural network to output the XOR of its 8 input bits. I'm using the ffnet library (http://ffnet.sourceforge.net/). For a low number of input bits (up to 4), backpropagation produces the expected results. For 8 bits, the NN seems to 'converge', meaning that it outputs the same value for any input. I'm using a multilayer NN: inputs, hidden layer, output, plus a bias node.
Am I doing something wrong? Does this NN need to be of a certain shape to be able to learn XOR?
Edit:
This is the code I'm using:
from ffnet import ffnet, mlgraph

def experiment(bits, input, solution, iters):
    # fully connected net: `bits` inputs, one hidden layer of `bits` nodes, 1 output
    conec = mlgraph((bits, bits, 1))
    net = ffnet(conec)
    net.randomweights()
    net.train_momentum(input, solution, eta=0.5, momentum=0.0, maxiter=iters)
    net.test(input, solution, iprint=2)
I'm using momentum=0.0 to get pure back-propagation.
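For completeness, the training data can be generated like this (a minimal sketch; parity_dataset is just a helper name, not part of ffnet, and the iteration count is an arbitrary example):

from itertools import product

def parity_dataset(bits):
    # all 2**bits input vectors and their parities (XOR of all input bits)
    inputs = [list(combo) for combo in product([0, 1], repeat=bits)]
    targets = [[sum(vec) % 2] for vec in inputs]
    return inputs, targets

inputs, targets = parity_dataset(8)     # 256 vectors for the 8-bit case
experiment(8, inputs, targets, 5000)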
This is a part of the results I get:
Testing results for 256 testing cases:
OUTPUT 1 (node nr 17):
Targets vs. outputs:
1 1.000000 0.041238
2 1.000000 0.041125
3 1.000000 0.041124
4 1.000000 0.041129
5 1.000000 0.041076
6 1.000000 0.041198
7 0.000000 0.041121
8 1.000000 0.041198
It goes on like this for every input vector (256 values in total).
In VW, the format for feature namespaces is shown below:
Label [Tag]|Namespace Features |Namespace Features ... |Namespace Features
where:
Namespace=String[:Value]
and an example is:
1 1.0 |MetricFeatures:3.28 height:1.5 length:2.0 |Says black with white stripes |OtherFeatures NumberOfLegs:4.0 HasStripes
Notice that the |MetricFeatures namespace has a weight higher than 1 (3.28). Based on the above example, if I create some feature interactions, say between the M and the S namespaces with -q MS, does the new feature namespace that is the cross product of the two original ones have an importance weighting of 1 by default? Or does it inherit the product of the two importance values (in this case 1 * 3.28 = 3.28)?
And is there a way to modify the weight of the feature interactions manually? E.g. say MetricFeatures has an importance weight of 1; can I give the features generated by the quadratic interaction MetricFeatures x Says an importance weighting of x?
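For context, such a line can be assembled with plain string formatting (a rough sketch; vw_line is just a helper name, not part of the VW API):

def vw_line(label, namespaces):
    # namespaces: list of (name, weight_or_None, list of (feature, value_or_None))
    parts = [str(label)]
    for name, weight, feats in namespaces:
        ns = name if weight is None else "%s:%s" % (name, weight)
        feat_strs = [f if v is None else "%s:%s" % (f, v) for f, v in feats]
        parts.append("|%s %s" % (ns, " ".join(feat_strs)))
    return " ".join(parts)

print(vw_line("1 1.0", [
    ("MetricFeatures", 3.28, [("height", 1.5), ("length", 2.0)]),
    ("Says", None, [("black", None), ("with", None), ("white", None), ("stripes", None)]),
    ("OtherFeatures", None, [("NumberOfLegs", 4.0), ("HasStripes", None)]),
]))
# prints the example line shown above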
Currently there is no way to individually weight interactions.
The namespace weight is applied at parse time: when the features of that namespace are read in, their values are multiplied by the weight.
This can be verified by using --audit:
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = data.txt
num sources = 1
average since example example current current current
loss last counter weight label predict features
0
MetricFeatures^height:146807:4.92:0#0 MetricFeatures^length:38580:6.56:0#0 Says^black:100768:1:0#0 Says^with:163314:1:0#0 Says^white:106708:1:0#0 Says^stripes:112832:1:0#0 OtherFeatures^NumberOfLegs:146847:4:0#0 OtherFeatures^HasStripes:229154:1:0#0 Constant:116060:1:0#0
1.000000 1.000000 1 1.0 1.0000 0.0000 9
finished run
number of examples = 1
weighted example sum = 1.000000
weighted label sum = 1.000000
average loss = 1.000000
best constant = 1.000000
best constant's loss = 0.000000
total feature number = 9
MetricFeatures^height:146807:4.92:0#0 -> 3.28 * 1.5 = 4.92
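A toy illustration of that multiplication (plain Python, not VW's actual parser), using the MetricFeatures values from the example:

def effective_values(namespace_weight, features):
    # raw feature values are scaled by the namespace weight at parse time
    return {name: namespace_weight * value for name, value in features.items()}

vals = effective_values(3.28, {"height": 1.5, "length": 2.0})
print(vals["height"], vals["length"])   # ~4.92 and ~6.56, matching the audit line above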
Consider the hypothetical neural network here
$o_1$ is the output of neuron 1.
$o_2$ is the output of neuron 2.
$w_1$ is the weight of connection between 1 and 3.
$w_2$ is the weight of connection between 2 and 3.
So the input to neuron 3 is $i =o_1w_1 +o_2w_2$
Let the activation function of neuron 3 be sigmoid function.
$f(x) = \dfrac{1}{1+e^{-x}}$ and the threshold value of neuron 3 be $\theta$.
Therefore, output of neuron 3 will be $f(i)$ if $i\geq\theta$ and $0$ if $i\lt\theta$.
Am I correct?
Thresholds are used for binary neurons (I forget the technical name), whereas biases are used for sigmoid (and pretty much all modern) neurons. Your understanding of the threshold is correct, but again this is used in neurons where the output is either 1 or 0, which is not very useful for learning (optimization). With a sigmoid neuron, you would simply add the bias (previously the threshold, but moved to the other side of the equation), so your output would be f(weight * input + bias). All the sigmoid function is doing (for the most part) is limiting your output to a value between 0 and 1.
I do not think this is the place to ask this sort of question; you will find lots of NN resources online. For your simple case, each link has a weight, so basically the input to neuron 3 is:
Neuron3Input = Neuron1Output * WeightOfLinkNeuron1To3 + Neuron2Output * WeightOfLinkNeuron2To3 + Bias
Then, to get the output, just use the activation function. Neuron3Output = F_Activation(Neuron3Input)
O3 = F(O1 * W1 + O2 * W2 + Bias)
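A minimal numeric sketch of that formula, assuming a sigmoid activation (the outputs, weights and bias below are made-up example values):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

o1, o2 = 0.7, 0.2        # outputs of neurons 1 and 2 (example values)
w1, w2 = 0.5, -1.0       # weights of the links into neuron 3 (example values)
bias = 0.1

neuron3_input = o1 * w1 + o2 * w2 + bias     # O1*W1 + O2*W2 + Bias
neuron3_output = sigmoid(neuron3_input)      # F(...)
print(neuron3_output)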
I was wondering how LSTMs work under Keras.
Let's take an example.
I have a maximum sentence length of 3 words.
Example: 'how are you'
I vectorize each word into a vector of length 4, so I have a shape of (3, 4).
Now, I want to use an LSTM to do translation stuff. (Just an example.)
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, input_shape=(3, 4), return_sequences=True))
model.summary()
I'm going to have an output shape of (3,1) according to Keras.
Layer (type) Output Shape Param #
=================================================================
lstm_16 (LSTM) (None, 3, 1) 24
=================================================================
Total params: 24
Trainable params: 24
Non-trainable params: 0
_________________________________________________________________
And this is what I don't understand.
Each unit of an LSTM (with return_sequences=True, to get the output at every timestep) should give me a vector of shape (timesteps, x),
where timesteps is 3 in this case, and x is the size of my word vectors (4 in this case).
So why do I get an output shape of (3, 1)?
I've searched everywhere but can't figure it out.
Your interpretation of what the LSTM should return is not right. The output dimensionality doesn't need to match the input dimensionality. Concretely, the first argument of keras.layers.LSTM corresponds to the dimensionality of the output space, and you're setting it to 1.
In other words, setting:
model.add(LSTM(k, input_shape=(3,4), return_sequences=True))
will result in a (None, 3, k) output shape.
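For example (a quick sketch with the same (3, 4) input shape), using k = 4 units gives back the per-timestep width you were expecting, but only because you chose it:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(4, input_shape=(3, 4), return_sequences=True))
model.summary()   # Output Shape: (None, 3, 4)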
I constantly get the error "Total number of RBF neurons must be some integer to the power of 'dimensions'" when using the method SetRBFCentersAndWidthsEqualSpacing in C#.
Can someone who is familiar with RBF networks in Encog check line 232 in RBFNetwork.cs? I think there may be a bug, or I am missing something:
var expectedSideLength = (int) Math.Pow(totalNumHiddenNeurons, 1.0d/dimensions);
double cmp = Math.Pow(totalNumHiddenNeurons, 1.0d/dimensions);
if (expectedSideLength != cmp) -> error
These two variables can't be equal in general, because the (int) cast truncates the number. It's a coincidence that it works for the XOR example; it won't work with a different dimension, like 19 for example.
This is how I create the RBF network (dataSet is a VersatileMLDataSet):
RBFNetwork n = new RBFNetwork(dataSet.CalculatedInputSize, dataSet.Count, 1, RBFEnum.Gaussian);
n.SetRBFCentersAndWidthsEqualSpacing(0, 1, RBFEnum.Gaussian, 2.0/(dataSet.CalculatedInputSize * dataSet.CalculatedInputSize), true);
My dataset has 19 attributes (dimensions) and 731 records.
The number of hidden neurons must be an integer raised to the power of the number of input neurons. So if you have 3 input attributes and a window size of 2, the number of hidden neurons would have to be some integer (say 3) raised to the power of 6 (3 x 2), i.e. 729. This limits the number of input attributes and the window size, as the number of hidden neurons gets very large very quickly.
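A small sketch of that constraint in plain Python (not Encog's actual code; valid_hidden_count is just a helper name):

def valid_hidden_count(total_hidden, dimensions):
    # the hidden-layer size must equal side ** dimensions for some integer side
    side = round(total_hidden ** (1.0 / dimensions))
    return side ** dimensions == total_hidden

print(valid_hidden_count(729, 6))    # True: 3 ** 6 == 729
print(valid_hidden_count(731, 19))   # False: 731 is not n ** 19 for any integer n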
I have approximately 5000 integer vectors (=SIZE) that look like:
[1 0 4 2 0 1 3 ...]
They all have the same length N=32, and their values range from 0 to 4, but let's say [0, MAX].
I created an NN that takes these vectors as inputs and outputs a binary array corresponding to one of the desired outputs (number of possible outputs = M):
for instance, [0 1 0 0 ... 0] => 2nd output, with array_length = M.
I used a Multi Layer Perceptron in Neuroph with those integer values, but it did not converge.
So I am guessing the problem is either using integer values or using an MLP with 3 layers: input, hidden and output.
Can you advise me on the network structure? Which type of NN is suitable? Should I remodel the input and output to simplify the learning process? I have been thinking about Gray encoding for the integer inputs.
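For concreteness, here is one possible re-encoding (a rough sketch assuming NumPy, scaling inputs to [0, 1] and using one-hot targets; M = 10 is just a placeholder):

import numpy as np

MAX = 4      # maximum input value
M = 10       # number of possible outputs (placeholder value)

def encode_input(vec):
    # scale integer inputs from [0, MAX] into [0, 1]
    return np.asarray(vec, dtype=float) / MAX

def encode_output(class_index):
    # one-hot target, e.g. class_index = 1 -> [0, 1, 0, ..., 0]
    target = np.zeros(M)
    target[class_index] = 1.0
    return target

print(encode_input([1, 0, 4, 2, 0, 1, 3]))
print(encode_output(1))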