I've programmed a fully connected recurrent network (based on Williams and Zipser) in Octave, and I successfully trained it using BPTT to compute an XOR as a toy example. The learning process was relatively uneventful:
XOR training
So, I thought I'd try training the network to compute the XOR of the first two inputs, and the OR of the last two inputs. However, this failed to converge the first time I ran it; instead, the error just oscillated continually. I tried decreasing the learning rate, and turning off momentum entirely, but it didn't help. When I ran it again this morning, it did end up converging, but not without more oscillations in the process:
XOR/OR training
So, my question: could this indicate a problem with my gradient computation, or is this just something that happens when training recurrent networks?
Related
I trained a Neural Network with a GA and with Backpropagation. The GA finds suitable weights for the training data but performs poorly on the test data. If I train the NN with BackPropagation, it performs much better on the test data even though the training error isn't much smaller than for the GA trained version. Even when I use the weights obtained by the GA as initial weights for Backpropagation, the NN performs worse on the test data than using only Backpropagation for training. Can anyone tell me where I could have made a mistake?
I suggest you read something about overfitting. In short you will be excelent at training set but poor at testing set(because NN follows anomaly and uncertainity and datas). Task of NN is generalize, but GA only perfect minimize error in training set(to be fair, this is GA task).
There are some methods how to deal with overfitting. I suggest you use validation set. First step is division your data into the three sets. Training testing and validation. Method is simple, you will train your NN with GA to minimalize error on training set, but you also run your NN on validation set, only run, not train. Error of network decrease on training set, but error should also decrease at validation set. So if error decrease at training set, but start increase at validation set, you must stop with learning(please don't stop at first iterations).
Hope it will be helpful.
I have encountered a similar problem, and the choice of the initial values of the neural network does not seem to affect the final classification accuracy. I used the feedforwardnet() function in matlab to compare the two cases. One is direct training, and the program gives random initial weights and bias values. One is to find the appropriate initial weights values and bias values through the GA algorithm, and then assign them to the neural network, and then start training. However, the latter approach does not improve the accuracy of neural network classification.
I am working on a regression model with a set of 158 inputs and 4 outputs of glass manufacturing project which is a continuous process of inputs and outputs. Is the usage of Neural Net a suitable solution for such kind of regression models? If yes, I have understood that Recurrent Neural Nets can be used for time series data, which Recurrent Neural Net shall I use? If usage of NN is not suitable, what are the other types of solutions available other than Linear Regression and Regression Trees?
Neural Networks are indeed suitable for continuous data. In fact, it is continous by default I would say. It is possible to have discrete I/O for sure, it all depend on your functions.
Secondly, it is true that RNN are suitable for time series, in a way. RNN are in fact suitable for timesteps more than timestamps. RNN are working by iterations. Typically, each iteration can be seen as a fixed step forward in time. This said, if you data is more like (date, value) (what I call timestamp), it may not be so good. It would not be absolutely impossible, but that's not the idea.
Hope it helps, start with simple RNN, try to understand how it works, then, if you need more, read about more complex cells.
I've built a relatively simple artificial neural network in attempts to model the Value function in a Q-Learning problem, but to verify my implementation of the network was correct I am trying to solve the XOR problem.
My network architecture uses two layers both with tanh activation, a learning factor of .001, bias units, and momentum set to .9. After each training iteration, I print the error term (using squared error), and run until my error converges to ~.001. This works about 75% of the time, but the other 25% my network will converge to ~.5 error which is rather large for this problem.
Here is a sample printout of the error term:
0.542649
0.530637
0.521523
0.509143
0.504623
0.501864
0.501657
0.500268
0.500057
0.500709
0.501979
0.501456
0.50275
0.507215
0.517656
0.530988
0.535493
0.539808
0.543903
The error oscillates as such until infinity.
So the question is: is my implementation broken, or is it possible that I am running into a local minimum?
I have a neural network with 2 entry variables, 1 hidden layer with 2 neurons and the output layer with one output neuron. When I start with some randomly (from 0 to 1) generated weights, the network learns the XOR function very fast and good, but in other cases, the network NEVER learns the XOR function! Do you know why this happens and how can I overcome this problem? Could some chaotic behaviour be involved? Thanks!
This is quite normal situation, because error function for multilayer NN is not convex, and optimization converges to local minimum.
You can just keep initial weights that resulted in successful optimization, or run optimizer multiple times starting from different weights, and keep the best solution. Optimization algorithm and learning rate also plays certain role, for example backpropagation with momentum and/or stochastic gradient descent sometimes work better. Also, if you add more neurons, beyond the minimum needed to learn XOR, this also helps.
There exist methodologies designed to find global minimum, such as simulated annealing, but, in practice they are not commonly used for NN optimization, except for some specific cases
I would like to ask for ideas what options there is for training a MATLAB ANN (artificial neural network) continuously, i.e. not having a pre-prepared training set? The idea is to have an "online" data stream thus, when first creating the network it's completely untrained but as samples flow in the ANN is trained and converges.
The ANN will be used to classify a set of values and the implementation would visualize how the training of the ANN gets improved as samples flows through the system. I.e. each sample is used for training and then also evaluated by the ANN and the response is visualized.
The effect that I expect is that for the very first samples the response of the ANN will be more or less random but as the training progress the accuracy improves.
Any ideas are most welcome.
Regards, Ola
In MATLAB you can use the adapt function instead of train. You can do this incrementally (change weights every time you get a new piece of information) or you can do it every N-samples, batch-style.
This document gives an in-depth run-down on the different styles of training from the perspective of a time-series problem.
I'd really think about what you're trying to do here, because adaptive learning strategies can be difficult. I found that they like to flail all over compared to their batch counterparts. This was especially true in my case where I work with very noisy signals.
Are you sure that you need adaptive learning? You can't periodically re-train your NN? Or build one that generalizes well enough?