Bad regression output of neural network - an unwanted upper bound? - neural-network

I am having a problem in a project which uses pybrain(a python library for neural network)
to build an ANN and do regression as prediction.
I am using 3-layer ANN, with 14 inputs, 10 hidden neurons in the hidden layer, and 2 outputs. A typical training or test example would be like this,
Inputs(divided by space):
1534334.489 1554790.856 1566060.675 20 20 20 50 45000 -11.399025 13 1.05E-03 1.775475116 20 0
Outputs(divided by space):
1571172.296 20
And I am using pybrain's BackpropTrainer so it is training using Backpropagation, and I trained until convergence.
The weird thing of the result is that the prediction of the first output(e.g. the first output of the trained ANN using test inputs) tracks the real value well in lower parts of the curve but seems to have an unwanted upperbound when real value rises.
I changed the number of hidden neurons to 10 but it still behaves like this. Even if I tested the trained ANN using the original training samples, it would still have an upperbound like this.
Does anyone have an intuition or advice on what's wrong here? Thanks!

Try to normalize the values(input and output) between (-1, +1).

Related

Why I get high loss in a very simple 3 layers neural network

For testing something I have implemented this.
I considered a function say: f(x1,x2)=x1^2+x2^2 with 30 input. using this randomly generated data I want to train the network.
Obviously I have 2 neurons at the input, and 1 neuron at the output. The middle layer has 6 neurons. Also, we have bias in the input and hidden layers.
I have also implemented backpropagation. With the below code I got 0.1 loss which I think is a bit large with the data I have.
Which part I am doing wrong which I get this large loss.
Thanks for your help.

How to train a multiplier with MLP?

I'm new to neural networks. Im trying to understand what kind of solutions a multilayer perceptron can learn to achieve.
Is it possible to train an MLP to do multiplication by just giving discrete amount of examples?
I could teach it how to do multiplication on certain numbers (those from training dataset of course) but it can not estimate other multiplications correctly.
I used 1 hidden layer (TanH, 10 units) and 1 output layer (Identity), both hidden layer and output layer were biased and were trained using Momentum optimizer.
dataset
0, 5 = 0
1, 1 = 1
2, 3 = 6
3, 7 = 21
4, 3 = 12
5, 9 = 45
7,7 = 49
13,13 = 169
It gives correct results for this dataset but for example calculating 5 * 5 gives wrong number like 32.
Am I expecting too much from MLP? what dataset (or layer setup) I should give to the network to be able to multiply any given number?
Yes, you are expecting too much. An MLP is not "smart" enough to abstract methods from a handful of specific examples. It's a linear combination of weights based on the inputs; extrapolating a quadratic relationship from these examples is a deeper concept than one can express in MLP terms.
In general, if your research hasn't already turned up a standard solution class for a given problem, you're stuck with a wide range of experimentation. My first thought is to try doing this with a RNN, hoping to catch the multiplication abstraction as a side-effect of the feedback loop.

Neural net fitting in matlab

I am trying to find the optimum number of neurons to use to run the Neural Net Fitting tool in Neural Networks Matlab app.
I am currently using 62000 samples of 64 elements as input and 62000 samples of 1 element as target. I tried to obtain similar results as in data obtained through other means, but the results are not even similar when trying to run the tool with 1-12 neurons. I tried running it with 64 neurons and the results were closer to what it was expected.
Is there any kind of way to know how many neurons to use based on the number of elements/samples?
Any suggestions on how to select the number of neurons when running the tests?
Thanks.
Even for simple datasets like MNIST I will at minimum use 128 neurons. Possible values to check are 128, 256, 512, and maybe 1024. These numbers are just easy to remember and are not magical nor the consequence of a known formula. Alternatively, choose a few random samples from [100, 500] and see which number of neurons worked best. Harder tasks tend to require more neurons, and when you have many neurons you need to consider regularizing your network with L_2 regularization or dropout.

MNIST - Training stuck

I'm reading Neural Networks and Deep Learning (first two chapters), and I'm trying to follow along and build my own ANN to classify digits from the MNIST data set.
I've been scratching my head for several days now, since my implementation peaks out at ~57% accuracy at classifying digits from the test set (some 5734/10000) after 10 epochs (accuracy for the training set stagnates after the tenth epoch, and accuracy for the test set deteriorates presumably because of over-fitting).
I'm using nearly the same configuration as in the book: 2-layer feedforward ANN (784-30-10) with all layers fully connected; standard sigmoid activation functions; quadratic cost function; weights are initialized the same way (taken from a Gaussian distribution with mean 0 and standard deviation 1)
The only differences being that I'm using online training instead of batch/mini-batch training and a learning rate of 1.0 instead of 3.0 (I have tried mini-batch training + learning rate of 3.0 though)
And yet, my implementation doesn't pass the 60% percentile after a bunch of epochs where as in the book the ANN goes above %90 just after the first epoch with pretty much the exact same configuration.
At first I messed up implementing the backpropagation algorithm, but after reimplementing backpropagation differently three times, with the exactly the same results in each reimplementation, I'm stumped...
An example of the results the backpropagation algorithm is producing:
With a simpler feedforward network with the same configuration mentioned above (online training + learning rate of 1.0): 3 input neurons, 2 hidden neurons and 1 output neuron.
The initial weights are initialized as follows:
Layer #0 (3 neurons)
Layer #1 (2 neurons)
- Neuron #1: weights=[0.1, 0.15, 0.2] bias=0.25
- Neuron #2: weights=[0.3, 0.35, 0.4] bias=0.45
Layer #2 (1 neuron)
- Neuron #1: weights=[0.5, 0.55] bias=0.6
Given an input of [0.0, 0.5, 1.0], the output is 0.78900331.
Backpropagating for the same input and with the desired output of 1.0 gives the following partial derivatives (dw = derivative wrt weight, db = derivative wrt bias):
Layer #0 (3 neurons)
Layer #1 (2 neurons)
- Neuron #1: dw=[0, 0.0066968054, 0.013393611] db=0.013393611
- Neuron #2: dw=[0, 0.0061298212, 0.012259642] db=0.012259642
Layer #2 (1 neuron)
- Neuron #1: dw=[0.072069918, 0.084415339] db=0.11470326
Updating the network with those partial derivatives yields a corrected output value of 0.74862305.
If anyone would be kind enough to confirm the above results, it would help me tremendously as I've pretty much ruled out backpropagation being faulty as the reason for the problem.
Did anyone tackling the MNIST problem ever come across this problem?
Even suggestions for things I should check would help since I'm really lost here.
Doh..
Turns out nothing was wrong with my backpropagation implementation...
The problem was that I read the images into a signed char (in C++) array, and the pixel values overflowed, so that when I divided by 255.0 to normalize the input vectors into the range of 0.0-1.0, I actually got negative values... ;-;
So basically I spent some four days debugging and reimplementing the same thing when the problem was somewhere else entirely.

Backpropagation learning fails to converge

I use a neural network with 3 layers for categorization problem: 1) ~2k neurons 2) ~2k neurons 3) 20 neurons. My training set consists of 2 examples, most of the inputs in each example are zeros. For some reason after the backpropagation training the network gives virtually the same output for both examples (which is either valid for only 1 of examples or have 1.0 for outputs where one of example has 1s). It comes to this state after the first epoch and doesn't change much afterwards, even if learning rate is minimal double vale. I use sigmoid as activation function.
I thought it could be something wrong with my code so I've used AForge open source library, and seems like it suffers from the same issue.
What might be the problem here?
Solution: I've removed one layer and decreased the number of neurons in hidden layer to 800
2000 by 2000 by 20 is huge. That's approximately 4 million weights to determine, meaning the algorithm has to search a 4-million-dimensional space. Any optimization algorithm will be totally at a loss in this case. I'm assuming you're using gradient descent, which is not even that powerful, so likely the algorithm is stuck in a local optimum somewhere in this gigantic search space.
Simplify your model!
Added:
And please also describe in more detail what you're trying to do. Do you really have only 2 training examples? That's like trying to categorize 2 points using a 4-million-dimensional plane. It doesn't make sense to me.
You mentioned that most of the inputs are zero. To your reduce the size of your search space, try removing redundancy in your training examples. For instance if
trainingExample[0].inputValue[i] == trainingExample[1].inputValue[i]
then x.inputValue[i] has no information bearing data for the NN.
Also, perhaps it's not clear, but it seems that two training examples seem small.