How to train a multiplier with an MLP?

I'm new to neural networks. I'm trying to understand what kinds of solutions a multilayer perceptron can learn to achieve.
Is it possible to train an MLP to do multiplication from just a discrete set of examples?
I can teach it to multiply certain numbers (those from the training dataset, of course), but it cannot estimate other multiplications correctly.
I used 1 hidden layer (TanH, 10 units) and 1 output layer (Identity); both layers had biases and were trained with a Momentum optimizer.
Dataset:
0, 5 = 0
1, 1 = 1
2, 3 = 6
3, 7 = 21
4, 3 = 12
5, 9 = 45
7, 7 = 49
13, 13 = 169
It gives correct results for this dataset, but calculating anything outside it, for example 5 * 5, gives a wrong number like 32.
Am I expecting too much from an MLP? What dataset (or layer setup) should I give the network for it to be able to multiply any given numbers?

Yes, you are expecting too much. An MLP is not "smart" enough to abstract a method from a handful of specific examples. It computes weighted combinations of its inputs passed through simple activation functions; extrapolating a quadratic relationship like multiplication from these examples is a deeper concept than such a network can express.
In general, if your research hasn't already turned up a standard solution class for a given problem, you're stuck with a wide range of experimentation. My first thought would be to try an RNN, hoping to catch the multiplication abstraction as a side effect of the feedback loop.
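
For reference, the behaviour described in the question is easy to reproduce. Below is a minimal sketch of that setup using scikit-learn's MLPRegressor as a stand-in (an assumption; the original post does not name a library): one TanH hidden layer with 10 units, an identity output, and SGD with momentum.

# Sketch of the question's setup, assuming scikit-learn (not named in the post).
import numpy as np
from sklearn.neural_network import MLPRegressor

# The training pairs from the question: inputs (a, b), target a * b.
X = np.array([[0, 5], [1, 1], [2, 3], [3, 7], [4, 3],
              [5, 9], [7, 7], [13, 13]], dtype=float)
y = np.array([0, 1, 6, 21, 12, 45, 49, 169], dtype=float)

# One tanh hidden layer with 10 units, identity output, SGD with momentum,
# mirroring the configuration described in the question.
mlp = MLPRegressor(hidden_layer_sizes=(10,), activation='tanh',
                   solver='sgd', momentum=0.9, learning_rate_init=0.001,
                   max_iter=20000, tol=1e-7, random_state=0)
mlp.fit(X, y)

print(mlp.predict(X))            # predictions on the training pairs
print(mlp.predict([[5.0, 5.0]])) # an unseen product (5 * 5 = 25) will typically be off

On such a tiny dataset the network can memorize the training pairs, but it has no reason to generalize the product function to unseen inputs.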

Related

Neural Network XOR not converging

I have tried to implement a neural network in Java by myself to act as an XOR gate, and it has kind of worked. About 20% of the time when I train it, the weights converge to produce a good enough output (RMS < 0.05), but the other 80% of the time they don't.
The neural network (can be seen here) is composed of 2 inputs (+ 1 bias), 2 hidden units (+ 1 bias) and 1 output unit. The activation function I used was the sigmoid
1 / (1 + e^(-x))
which maps input values to between 0 and 1. The learning algorithm is stochastic gradient descent with RMS as the cost function. The bias neurons have a constant output of 1. I have tried changing the learning rate between 0.1 and 0.01, and that doesn't seem to fix the problem.
I had the network track the weights and the RMS error and plotted them on a graph. There are basically three different behaviours the weights can show, and I can only post one of the three:
two (or more) weights diverging in different directions
The other two behaviours are the weights converging to good values, and a random wiggle of one weight.
I don't know if this is just something that happens or if there is some way to fix it, so please tell me if you know anything.
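
For anyone who wants to reproduce this, here is a rough numpy re-implementation of the setup described above (the original code is in Java, so treat this as an approximation): a 2-2-1 sigmoid network trained online with SGD, where the outcome depends strongly on the random initialization.

# Approximate numpy version of the 2-2-1 sigmoid XOR network described above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def train_xor(seed, lr=0.5, epochs=20000):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (2, 2)); b1 = rng.normal(0, 1, 2)
    W2 = rng.normal(0, 1, (2, 1)); b2 = rng.normal(0, 1, 1)
    for _ in range(epochs):
        for x, t in zip(X, T):           # online (per-sample) updates
            h = sigmoid(x @ W1 + b1)     # hidden activations
            y = sigmoid(h @ W2 + b2)     # output activation
            d2 = (y - t) * y * (1 - y)           # output delta
            d1 = (d2 @ W2.T) * h * (1 - h)       # hidden deltas
            W2 -= lr * np.outer(h, d2); b2 -= lr * d2
            W1 -= lr * np.outer(x, d1); b1 -= lr * d1
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return np.sqrt(np.mean((out - T) ** 2))      # RMS error

# Some seeds reach a low RMS, others stall in a local minimum:
for seed in range(5):
    print(seed, train_xor(seed))

Running this over several seeds typically shows a mix of good fits and stalled runs, which matches the behaviour described in the question.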

Neural net fitting in matlab

I am trying to find the optimal number of neurons to use with the Neural Net Fitting tool in the Matlab Neural Networks app.
I am currently using 62000 samples of 64 elements as input and 62000 samples of 1 element as target. I tried to reproduce results obtained through other means, but when running the tool with 1-12 neurons the results are not even close. With 64 neurons the results were closer to what was expected.
Is there any way to know how many neurons to use based on the number of elements/samples?
Any suggestions on how to select the number of neurons when running the tests?
Thanks.
Even for simple datasets like MNIST I would use at least 128 neurons. Reasonable values to check are 128, 256, 512, and maybe 1024. These numbers are just easy to remember; they are not magical, nor the consequence of a known formula. Alternatively, pick a few random values from [100, 500] and see which number of neurons works best. Harder tasks tend to require more neurons, and once you have many neurons you should consider regularizing your network with L2 regularization or dropout.
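
That advice amounts to a small hyperparameter sweep against a held-out validation split. A minimal sketch in Python, using scikit-learn and synthetic stand-in data (the original task is in Matlab, so this only illustrates the procedure):

# Sweep a few hidden-layer sizes and keep the best validation score.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))           # stand-ins for the 64-element samples
y = X @ rng.normal(size=64) + rng.normal(scale=0.1, size=2000)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for n_hidden in [128, 256, 512, 1024]:
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), alpha=1e-4,  # alpha = L2 penalty
                       max_iter=300, random_state=0)
    mlp.fit(X_tr, y_tr)
    print(n_hidden, mlp.score(X_val, y_val))   # R^2 on the validation split

The same loop translates directly to the Matlab tool: train once per candidate size and compare validation error.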

MNIST - Training stuck

I'm reading Neural Networks and Deep Learning (first two chapters), and I'm trying to follow along and build my own ANN to classify digits from the MNIST data set.
I've been scratching my head for several days now, since my implementation peaks at ~57% accuracy when classifying digits from the test set (some 5734/10000) after 10 epochs (accuracy on the training set stagnates after the tenth epoch, and accuracy on the test set deteriorates, presumably because of over-fitting).
I'm using nearly the same configuration as in the book: a 2-layer feedforward ANN (784-30-10) with all layers fully connected; standard sigmoid activation functions; quadratic cost function; weights initialized the same way (drawn from a Gaussian distribution with mean 0 and standard deviation 1).
The only differences are that I'm using online training instead of batch/mini-batch training, and a learning rate of 1.0 instead of 3.0 (though I have tried mini-batch training with a learning rate of 3.0).
And yet my implementation doesn't pass 60% accuracy after a bunch of epochs, whereas in the book the ANN goes above 90% just after the first epoch with pretty much the exact same configuration.
At first I messed up implementing the backpropagation algorithm, but after reimplementing backpropagation differently three times, with exactly the same results each time, I'm stumped...
An example of the results the backpropagation algorithm is producing:
It comes from a simpler feedforward network with the same configuration mentioned above (online training + learning rate of 1.0): 3 input neurons, 2 hidden neurons and 1 output neuron.
The initial weights are initialized as follows:
Layer #0 (3 neurons)
Layer #1 (2 neurons)
- Neuron #1: weights=[0.1, 0.15, 0.2] bias=0.25
- Neuron #2: weights=[0.3, 0.35, 0.4] bias=0.45
Layer #2 (1 neuron)
- Neuron #1: weights=[0.5, 0.55] bias=0.6
Given an input of [0.0, 0.5, 1.0], the output is 0.78900331.
Backpropagating for the same input and with the desired output of 1.0 gives the following partial derivatives (dw = derivative wrt weight, db = derivative wrt bias):
Layer #0 (3 neurons)
Layer #1 (2 neurons)
- Neuron #1: dw=[0, 0.0066968054, 0.013393611] db=0.013393611
- Neuron #2: dw=[0, 0.0061298212, 0.012259642] db=0.012259642
Layer #2 (1 neuron)
- Neuron #1: dw=[0.072069918, 0.084415339] db=0.11470326
Updating the network with those partial derivatives yields a corrected output value of 0.74862305.
If anyone would be kind enough to confirm the above results, it would help me tremendously as I've pretty much ruled out backpropagation being faulty as the reason for the problem.
Did anyone tackling the MNIST problem ever come across this problem?
Even suggestions for things I should check would help since I'm really lost here.
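
For anyone who wants to check these numbers, here is a minimal numpy sketch of the same tiny network (my own reconstruction, assuming the standard quadratic-cost conventions from the book; the sign and scale conventions for the deltas vary between implementations, so only the forward value is asserted):

# Forward and backward pass for the 3-2-1 network quoted above.
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

x  = np.array([0.0, 0.5, 1.0])
W1 = np.array([[0.1, 0.15, 0.2], [0.3, 0.35, 0.4]]); b1 = np.array([0.25, 0.45])
W2 = np.array([[0.5, 0.55]]);                        b2 = np.array([0.6])
t  = 1.0

h = sigmoid(W1 @ x + b1)        # hidden activations
y = sigmoid(W2 @ h + b2)        # network output
print("output:", y)             # should print the quoted ~0.78900331

d2 = (y - t) * y * (1 - y)       # output delta, dC/dz for C = (y - t)^2 / 2
d1 = (W2.T @ d2) * h * (1 - h)   # hidden deltas
print("dW2:", np.outer(d2, h), "db2:", d2)
print("dW1:", np.outer(d1, x), "db1:", d1)

The forward pass reproduces the quoted 0.78900331; the gradients follow the textbook gradient-descent convention and may differ in sign from the poster's own numbers.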
Doh..
Turns out nothing was wrong with my backpropagation implementation...
The problem was that I read the images into a signed char array (in C++), and the pixel values overflowed, so that when I divided by 255.0 to normalize the input vectors into the range 0.0-1.0, I actually got negative values... ;-;
So basically I spent some four days debugging and reimplementing the same thing when the problem was somewhere else entirely.
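
The same pitfall is easy to demonstrate in Python with numpy (a reconstruction of the bug for illustration, not the original C++ code): any pixel byte above 127 wraps to a negative number when read as a signed type.

# Reading raw pixel bytes as a signed type overflows values above 127.
import numpy as np

raw = bytes([0, 127, 128, 200, 255])               # example pixel bytes
print(np.frombuffer(raw, dtype=np.int8))           # [  0  127 -128  -56   -1]  <- overflow
print(np.frombuffer(raw, dtype=np.uint8) / 255.0)  # correct values in [0, 1]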

Bad regression output of neural network - an unwanted upper bound?

I am having a problem in a project which uses pybrain (a Python library for neural networks) to build an ANN and do regression as prediction.
I am using a 3-layer ANN with 14 inputs, 10 hidden neurons in the hidden layer, and 2 outputs. A typical training or test example looks like this:
Inputs (space-separated):
1534334.489 1554790.856 1566060.675 20 20 20 50 45000 -11.399025 13 1.05E-03 1.775475116 20 0
Outputs (space-separated):
1571172.296 20
I am using pybrain's BackpropTrainer, so it trains with backpropagation, and I trained until convergence.
The weird thing about the result is that the prediction for the first output (i.e. the first output of the trained ANN on test inputs) tracks the real value well in the lower parts of the curve, but seems to hit an unwanted upper bound when the real value rises.
I changed the number of hidden neurons to 10, but it still behaves like this. Even if I test the trained ANN on the original training samples, it still shows an upper bound like this.
Does anyone have an intuition or advice on what's wrong here? Thanks!
Try normalizing the values (inputs and outputs) to the range (-1, +1).
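
A sketch of that suggestion in numpy (toy data loosely based on the numbers above): rescale every input and output column to (-1, +1), and keep the inverse map so predictions can be converted back to the original target scale. With raw feature values around 1.5e6, unnormalized inputs will saturate sigmoid hidden units, which is one plausible source of the observed ceiling.

# Min-max scale columns to [-1, 1] and keep the inverse for predictions.
import numpy as np

def fit_scaler(data):
    lo, hi = data.min(axis=0), data.max(axis=0)
    # (columns with constant values would need a guard against division by zero)
    scale   = lambda v: 2.0 * (v - lo) / (hi - lo) - 1.0    # -> [-1, 1]
    unscale = lambda v: (v + 1.0) / 2.0 * (hi - lo) + lo    # inverse map
    return scale, unscale

# Toy stand-in for two of the 14 input columns from the question:
X = np.array([[1534334.489, 20.0], [1554790.856, 50.0], [1566060.675, 45.0]])
scale_X, _ = fit_scaler(X)
print(scale_X(X))   # every column now spans [-1, 1]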

How to use trained Neural Network in Matlab for classification in a real system

I have trained a feed-forward NN using the Matlab Neural Network Toolbox on a dataset containing speech features and accelerometer measurements. The target set contains two target classes for the dataset: 0 and 1. The training, validation and performance are all fine, and I have generated code for this network.
Now I need to use this neural network in real time to recognize patterns when they occur, and to output 0 or 1 when I test a new dataset against the previously trained NN. But when I issue the command:
c = sim(net, j)
Where "j" is a new dataset[24x11]; instead 0 or 1 i get this as an output (I assume I get percent of correct classification but there is no classification result itself):
c =
Columns 1 through 9
0.6274 0.6248 0.9993 0.9991 0.9994 0.9999 0.9998 0.9934 0.9996
Columns 10 through 11
0.9966 0.9963
So is there any command or way that I can actually see the classification results? Any help highly appreciated! Thanks
I'm no Matlab user, but from a logical point of view you are missing an important point:
The input to a neural network is a single vector; you are passing a matrix. Matlab therefore assumes you want to classify a bunch of vectors (11 in your case), so the vector you get back is the output activation for each of these 11 input vectors.
The output activation is a value between 0 and 1 (I guess you are using a sigmoid), so this is perfectly normal. Your job is to find a threshold that fits your data best. You can obtain this threshold by cross-validation on your training/test data, or by just choosing one (0.5?), seeing if the results are good, and adjusting if needed.
NNs normally squash their output to a value within (0, 1) using, for example, the logistic function. It is not a percentage or a probability, just a relative measure of certainty. In any case, this means you have to apply a threshold (such as 0.5) manually to discriminate between the two classes. Which threshold is best is hard to say, because you must select the optimal trade-off between precision and recall.
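
Applied to the activations quoted above, thresholding is a one-liner in Matlab (c > 0.5); the same idea in Python with numpy:

# Threshold the quoted output activations at 0.5 to get class labels.
import numpy as np

c = np.array([0.6274, 0.6248, 0.9993, 0.9991, 0.9994, 0.9999,
              0.9998, 0.9934, 0.9996, 0.9966, 0.9963])
labels = (c > 0.5).astype(int)
print(labels)   # all ones for this particular output vector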