When to start using the selection set in a Back Propagation Neural Network? - neural-network

Beginner on ANNs:
I am implementing a back propagation neural network to predict the price of gold. I know that I have to split my data into training data, selection data and test data.
However, I'm unsure how to go about using these sets of data. At first I was training the network on my training set; then, after it's trained, I feed it a number of inputs from the test set and compare the outputs.
I'm not sure if I'm doing this right, and where does the selection set come in?
Thanks in advance!

The general idea is:
1. Train the network for a little while on the training set.
2. Evaluate the network on a second set, often called the validation set; this is probably what you're calling the selection set.
3. Train the network a little more on the training set.
4. Evaluate the new network on the selection set again.
5. Which did better, the old network or the new network? If the new network is better, we're still getting some use out of training, so go back to step 3. If the new network is worse, more training will probably only hurt. Use the previous version of the network, since it did better.
In this way, you can tell when to stop training.
One easy modification is to always keep track of the best network seen so far, and to stop training only after some number (say, three) of training rounds in a row have done worse.
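As a rough illustration of that loop, here is a minimal Python sketch; train_one_round and evaluate are hypothetical stand-ins for your own training and validation-set evaluation code, and a higher score is assumed to mean a better network.

```python
# A minimal sketch of the stop-when-validation-worsens loop described above.
# train_one_round(net, train_set) and evaluate(net, val_set) are hypothetical
# helpers standing in for your own training/evaluation code.
import copy

def train_with_early_stopping(net, train_set, val_set, patience=3):
    best_net = copy.deepcopy(net)
    best_score = evaluate(net, val_set)          # higher = better here
    rounds_without_improvement = 0

    while rounds_without_improvement < patience:
        train_one_round(net, train_set)          # train a little more
        score = evaluate(net, val_set)           # check the selection/validation set
        if score > best_score:
            best_score = score
            best_net = copy.deepcopy(net)        # remember the best network so far
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1      # stop after `patience` bad rounds in a row

    return best_net                              # the best network seen, not the last one
```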
The third set, the test set, is necessary because the selection set is involved, if only indirectly, in the training process. The final evaluation must be done on data that was not used at all during training.
This sort of thing is sufficient for simple experiments, but in general you'll want to use cross-validation to get a better idea of your system's performance.
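For illustration, here is a small cross-validation sketch with scikit-learn; the data shapes, model, and scoring choice are placeholders, and any estimator with fit/predict could stand in for the MLPRegressor.

```python
# A rough sketch of k-fold cross-validation (illustrative placeholder data).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X = np.random.rand(200, 5)      # placeholder features (e.g. lagged gold prices)
y = np.random.rand(200)         # placeholder target (e.g. next-day price)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
print("MSE per fold:", -scores)
print("Mean MSE:", -scores.mean())
```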

I wanted to leave a comment just to say that validation sets are a good place for model-dependent hyper-parameter tuning, but I'm new here and hence lack the reputation points to do so. To make this more worthy of a separate posting, I've included an outline of my own train-validate-test process. In practice, my workflow is as follows:
Identify, collect, and clean data. Try to limit complaining during the data munging process.
Split data into three sets: training, validation, test.
Establish two "base" models for evaluating the more complex models built later in the process. The first of these is typically a basic linear/logistic regression using all possible features. The second uses only the most obviously informative features (the initial identification of informative features depends on the use case, and typically involves a combination of domain knowledge, basic clustering, and simple correlation).
Begin more empirical feature selection (e.g. an unsupervised NN, but usually a random forest) and prototype a broad range of models using the training set.
Eliminate poorly performing models as well as uninformative features.
Compare performance of remaining models against each other and the "base" models, using a modified version of the training set (same data, but sans uninformative features). Toss under-performing models.
Using the validation set, tune the appropriate hyper-parameters for each of the models (either by hand or via grid search; a minimal sketch follows this list). Further reduce the number of models under consideration, ideally to just 2-3 (excluding the base models).
Finally, evaluate model performance (with optimized hyper-parameters) on the test set. Again, compare models among themselves and against the base models. Make the final model choice based on an appropriate, problem-specific combination of computational complexity/cost, ease of interpretation/transparency/"explainability", and improvement over and/or performance versus the base models.
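As promised in the tuning step above, here is a minimal hand-rolled grid-search sketch over a validation set; the splits, parameter names, and ranges are illustrative placeholders rather than recommendations.

```python
# A minimal sketch of hyper-parameter tuning on the validation set via a small
# hand-rolled grid search (placeholder data and arbitrary parameter ranges).
from itertools import product
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Placeholder splits; in practice these come from the train/validation/test split.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((300, 8)), rng.integers(0, 2, 300)
X_val,   y_val   = rng.random((100, 8)), rng.integers(0, 2, 100)

best_score, best_params = -np.inf, None
for hidden, alpha in product([(16,), (32,), (64, 32)], [1e-4, 1e-3, 1e-2]):
    model = MLPClassifier(hidden_layer_sizes=hidden, alpha=alpha,
                          max_iter=500, random_state=0)
    model.fit(X_train, y_train)                           # fit on training data only
    score = accuracy_score(y_val, model.predict(X_val))   # score on validation data
    if score > best_score:
        best_score, best_params = score, (hidden, alpha)

print("best validation accuracy:", best_score, "with", best_params)
```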

Related

Is Cross Validation enough to ensure that there is no Overfitting in a classification algorithm?

I have a data set with 45 observations for one class and 55 observations for another class. Moreover, I am using 4 different features which were previously chosen by a feature selection filter, though the results of that procedure were somewhat strange.
On the other hand, I am using cross-validation and getting good accuracy results (75% to 85%) from different classifiers, since I'm using the classificationLearner in Matlab. Does this ensure that there is no overfitting? Or might there still be a chance of it? How can I make sure that there is no overfitting?
That really depends on the training data set you have available. If the data available to you isn't representative enough, you will not get a good model regardless of the methods you use for training and validation.
With that in mind, if you are sure your data is representative (it has the same distribution of values for any subset of "important" attributes as the global set of all data), then cross-validation is good enough to rely on.
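One simple sanity check, sketched below in Python for illustration (the same idea applies to whatever model you pick in Matlab's classificationLearner), is to compare resubstitution accuracy with stratified cross-validated accuracy; a large gap is a warning sign of overfitting. The classifier and data here are placeholders.

```python
# Compare accuracy on the training data itself with cross-validated accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.random((100, 4))                         # 4 features, as in the question
y = np.array([0] * 45 + [1] * 55)                # 45 vs 55 observations

clf = RandomForestClassifier(random_state=0).fit(X, y)
train_acc = clf.score(X, y)                      # accuracy on the data it was fit on

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv).mean()

# On random data like this, train_acc will be near 1.0 while cv_acc hovers near
# chance level: exactly the kind of gap that signals overfitting.
print(f"training accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```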

How to interpret the discriminator's loss and the generator's loss in Generative Adversarial Nets?

I am reading people's implementation of DCGAN, especially this one in tensorflow.
In that implementation, the author plots the losses of the discriminator and of the generator, shown below (the images come from https://github.com/carpedm20/DCGAN-tensorflow):
Neither the discriminator's loss nor the generator's loss seems to follow any pattern, unlike an ordinary neural network, whose loss decreases as training iterations increase. How should I interpret the losses when training GANs?
Unfortunately, as you've said, for GANs the losses are very non-intuitive. Mostly it comes down to the fact that the generator and the discriminator are competing against each other, so an improvement in one means a higher loss for the other, until that other learns better from the loss it receives, which in turn hurts its competitor, and so on.
Now, one thing that should happen often enough (depending on your data and initialisation) is that both the discriminator and generator losses converge to some roughly constant values. (It's OK for the loss to bounce around a bit; that's just evidence of the model trying to improve itself.)
This loss convergence would normally signify that the GAN model has found some optimum where it can't improve any more, which should also mean that it has learned well enough. (Also note that the numbers themselves usually aren't very informative.)
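As a rough illustration of watching for such a plateau, here is a small Python sketch that keeps moving averages of the two losses; d_loss and g_loss are whatever per-step values your training loop produces, and the window size and tolerance are arbitrary choices.

```python
# Monitor GAN losses for rough convergence: keep a moving window of each loss
# and check whether its mean has stopped drifting.
from collections import deque
import numpy as np

d_hist, g_hist = deque(maxlen=500), deque(maxlen=500)

def record(d_loss, g_loss):
    d_hist.append(d_loss)
    g_hist.append(g_loss)

def looks_converged(hist, tol=0.05):
    # Compare the mean of the older half of the window with the recent half;
    # if they are close, the loss is bouncing around a plateau, not drifting.
    if len(hist) < hist.maxlen:
        return False
    half = hist.maxlen // 2
    older, recent = np.mean(list(hist)[:half]), np.mean(list(hist)[half:])
    return abs(recent - older) < tol * (abs(older) + 1e-8)

# Inside the training loop one might do:
#   record(d_loss, g_loss)
#   if looks_converged(d_hist) and looks_converged(g_hist):
#       print("losses have plateaued; inspect the generated samples")
```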
Here are a few side notes that I hope will be of help:
if the losses haven't converged very well, it doesn't necessarily mean that the model hasn't learned anything - check the generated examples; sometimes they come out good enough. Alternatively, you can try changing the learning rate and other parameters.
if the model converged well, still check the generated examples - sometimes the generator finds one or a few examples that the discriminator can't distinguish from the genuine data. The trouble is that it then always outputs these few examples and never creates anything new; this is called mode collapse. Usually introducing some diversity into your data helps.
as vanilla GANs are rather unstable, I'd suggest using some version of the DCGAN models, as they contain features like convolutional layers and batch normalisation that are supposed to help with the stability of convergence. (The losses described above come from a DCGAN rather than a vanilla GAN.)
This is common sense, but still: as with most neural network architectures, tweaking the model, i.e. changing its parameters and/or architecture to fit your particular needs/data, can either improve the model or break it.

Neural Network Retraining

I am coding a simple Neural Network, but I have thought of one issue that is bothering me.
This NN is for finding categories in the input. To better understand this, say the categories are "the numbers" (0,1,2...9).
To implement this, the output layer has 10 nodes. Say I train this NN with several input-output pairs and save the learned weights somewhere. As the learning process takes quite a lot of time, I then take a break, come back fresh the next day, and restart learning with new input-output pairs. So far so good.
But what happens if, at that point, I decide that I want to recognize hexadecimals (0, 1, ..., 9, A, B, ..., E, F)... ergo the number of categories increases.
I suspect that would imply changing the structure of the NN and therefore I should retrain the NN from scratch.
Is this so?
Any comment, advice, or shared experience will be greatly appreciated.
EDIT: This question has been marked as a duplicate. I read the other question and, although it is similar, my question is more concrete. While the other question speaks in generalities and its answer is also quite general, mine is very concrete because I use an example:
If I train a NN to recognize decimal digits and later on decide to add data to make it recognize hexadecimals, is this possible? How? Do I have to retrain the whole NN? In other words, does the structure of the NN need to stay fixed, at either 10 or 16 outputs, from the beginning?
I would very much appreciate a concrete answer to this. Thanks
A few considerations
Your training set and testing set should have the same distribution
Unless you have some way of specifying sample weights, as some algorithms allow, you should avoid training on biased data at all costs. This is true for machine learning in general, not only neural networks.
Resuming training from a previous session is equivalent to using good initial values
Technically, you're just using the previous network as the initial value instead of a random value. You should keep training on the whole dataset as always, to avoid a biased network.
Short Answer
Yes, you should always retrain your network, if by retrain you mean running a training routine on the full dataset.
If by retrain you just mean running another really long training iteration, it isn't your choice anyway. You must always train the network until the training error and testing error (or cross-validated error) converge. If you reuse the previously trained network, that will probably happen faster.
This is true no matter what kind of change you make: to the network architecture, to the dataset, to both (your example), or to some other parameter.
Of course, if you change the network architecture, reusing the previous network takes a bit of extra work. You can reuse the learned parameters of the nodes that were kept and randomly initialize the parameters for the new nodes.
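A minimal sketch of that idea, using Keras purely for illustration (the question itself is library-agnostic): the hidden-layer weights and the first 10 output units are copied from a hypothetical trained 10-class network, while the 6 new output units keep their random initialization.

```python
# Expand a trained 10-class network to 16 classes by copying the old weights.
from tensorflow import keras

# Hypothetical 10-class network trained on decimal digits.
old_model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
# ... assume old_model has been trained and its weights saved/loaded ...

# New network with the same hidden layer but 16 outputs.
new_model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    keras.layers.Dense(16, activation="softmax"),
])

# Copy the hidden-layer weights unchanged.
new_model.layers[0].set_weights(old_model.layers[0].get_weights())

# Copy the old output weights into the first 10 output units;
# the remaining 6 units keep their random initialization.
old_W, old_b = old_model.layers[1].get_weights()
new_W, new_b = new_model.layers[1].get_weights()
new_W[:, :10] = old_W
new_b[:10] = old_b
new_model.layers[1].set_weights([new_W, new_b])

new_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

After the copy, you would continue training on the full hexadecimal dataset, as recommended above.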

Neural Networks and correlation between input and output

I am trying to fit some inputs to predict an output in Matlab using fitnet neural networks, but I am interested in finding which input candidate vector correlates the most with the output, as a preprocessing step prior to my neural network training.
In the figure below, the output (in yellow) has five input candidates that I need to choose from. What command should I use in Matlab, and how should I prepare the data (repeated around 1000 times) so I can get a clear correlation between each input candidate and the output?
To find the correlation between a given feature and the target variable you can use R = corrcoef(A,B), but... do not do it!
This process makes no sense and will probably be harmful to the whole pipeline. You would be removing part of the information from your data, so that only features with an independent, linear relation to the target variable remain, and then applying a highly non-linear model that exploits co-occurrences and feature correlations. These two steps are completely incompatible. The only valid relation is this: if your data is very simple and can pretty much be modeled linearly, then a neural net will work as well, but then there is no point in using a neural net in the first place; just apply linear regression. Consequently: do not perform feature selection unless you have to. Try to build a good model without it, and if you do have to remove some features (maybe obtaining them is expensive?), use post-hoc analysis of the trained model to remove features the model does not use. Do not split your problem into multiple independent processes if you do not have to (unless you can show that the decomposition does not harm the process, which in the case of feature selection + regressor is not true, as you cannot construct valid feature-selection supervision without a trained regressor).
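A hedged sketch of the post-hoc analysis suggested above, in Python for illustration (a fitnet workflow in Matlab would be analogous): train the model on all candidate inputs first, then measure how much each one actually contributes, for example with permutation importance. The data here is a random placeholder.

```python
# Post-hoc feature analysis instead of correlation pre-filtering: fit a model on
# all five candidate inputs, then see which ones it actually relies on.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 5))                     # the five candidate inputs
y = 2 * X[:, 0] + np.sin(6 * X[:, 2]) + 0.1 * rng.standard_normal(1000)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for i, imp in enumerate(result.importances_mean):
    print(f"input {i}: importance {imp:.3f}")  # drop inputs the model barely uses
```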

Fitness evaluation and training set in realtime simulation neuro-evolution

I am attempting to train a neural network to control a simple entity in a simulated 2D environment, currently by using a genetic algorithm.
Perhaps due to lack of familiarity with the correct terms, my searches have not yielded much information on how to treat fitness and training in cases where all the following conditions hold:
There is no data available on correct outputs for given inputs.
A performance evaluation can only be made after an extended period of interaction with the environment (with continuous controller input/output invocation).
There is randomness inherent in the system.
Currently my approach is as follows:
The NN inputs are instantaneous sensor readings of the entity and environment state.
The outputs are instantaneous activation levels of its effectors, for example, a level of thrust for an actuator.
I generate a performance value by running the simulation for a given NN controller, either for a preset period of simulation time, or until some system state is reached. The performance value is then assigned as appropriate based on observations of behaviour/final state.
To prevent over-fitting, I repeat the above a number of times with different random generator seeds for the system, and assign a fitness using some metric such as average/lowest performance value.
This is done for every individual at every generation. Within a given generation, for fairness each individual will use the same set of random seeds.
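A condensed sketch of this evaluation scheme, for concreteness; run_simulation is a hypothetical stand-in for the simulator, and the mean fitness is used here, though the lowest per-seed score would work the same way.

```python
# Evaluate every individual in a generation on the same shared set of seeds,
# aggregating the per-seed performance values into one fitness.
import random

def evaluate_generation(population, n_seeds=5):
    seeds = [random.randrange(2**32) for _ in range(n_seeds)]  # shared within the generation
    fitnesses = []
    for controller in population:
        scores = [run_simulation(controller, seed) for seed in seeds]
        fitnesses.append(sum(scores) / len(scores))   # or min(scores) for a pessimistic metric
    return fitnesses
```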
I have a couple of questions.
Is this a reasonable, standard approach to take for such a problem? Unsurprisingly it all adds up to a very computationally expensive process. I'm wondering if there are any methods to avoid having to rerun a simulation from scratch every time I produce a fitness value.
As stated, the same set of random seeds is used for the simulations for each individual in a generation. From one generation to the next, should this set remain static, or should it be different? My instinct was to use different seeds each generation to further avoid over-fitting, and that doing so would not have an adverse effect on the selective force. However, from my results, I'm unsure about this.
It is a reasonable approach, but genetic algorithms are not known for being very fast/efficient. Try hillclimbing and see if that is any faster. There are numerous other optimization methods, but nothing is great if you assume the function is a black box that you can only sample from. Reinforcement learning might work.
Using fresh random seeds should prevent overfitting, but it may not be necessary, depending on how representative a static test is of the average case and on how easy it is to overfit.
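For completeness, here is a rough sketch of the hill-climbing alternative mentioned above: repeatedly perturb the controller's weight vector and keep a perturbation only if the (seed-averaged) fitness improves. The evaluate argument is a hypothetical stand-in for the simulation-based fitness described earlier.

```python
# Simple hill climbing over a flat weight vector, as a cheaper baseline than a GA.
import numpy as np

def hill_climb(weights, evaluate, steps=1000, sigma=0.1, rng=None):
    rng = rng or np.random.default_rng()
    best, best_fit = weights.copy(), evaluate(weights)
    for _ in range(steps):
        candidate = best + sigma * rng.standard_normal(best.shape)  # small random perturbation
        fit = evaluate(candidate)
        if fit > best_fit:                     # keep only improvements
            best, best_fit = candidate, fit
    return best, best_fit
```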