Just some general questions about training. I used a convolutional neural network for binary classification of text on a dataset of about 10,000 samples. The dataset was quite unbalanced, with about 80% of the samples in class 1. The very last image shows a model trained on a balanced dataset of a few million samples doing a 14-way classification task. All models use nn.ClassNLLCriterion, momentum of 0.9, dropout, and a weight decay of 0.00001:
Here's the code for more details
For the loss, I got values over 1 on validation. How bad is that? Is a loss over 1 large, or is it reasonable?
Is the error y-axis usually in percent? So here, would the error range from 0% to 0.16%, or from 0% to 16%?
For the graphs below, the loss and error curves have roughly the same shape. In general, should the loss and error curves always have the same shape?
Are the error and loss usually on different scales?
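As a quick point of reference (my own arithmetic, not from the original post): a classifier that spreads probability uniformly over $K$ classes has a negative log-likelihood of
$$-\log\tfrac{1}{K} = \log K,$$
so chance level is $\log 14 \approx 2.64$ for the 14-way task and $\log 2 \approx 0.69$ for the binary task, meaning a loss above 1 is not automatically bad. Error, by contrast, is a misclassification fraction in $[0,1]$, usually plotted as a percentage, so loss and error do naturally sit on different scales.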
I'm training a CNN with TensorFlow for image classification on the Food-101 dataset, and I reach a test accuracy of about 80% (using model.evaluate()).
The issue I have is that when I plot the confusion matrix for the 3 classes involved, it looks very different, with at most 40% on the main diagonal.
I could understand it if at least 1 of the 3 classes were around 100%, because then I would expect the averaged accuracy to rise even with bad results on the other predictions. But in this case none of them is anywhere near what I achieve during evaluation.
I also tried plotting the confusion matrix on the training data, where I reached more than 90% accuracy during the learning process, and that is not correct either.
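A quick sanity check (my arithmetic, not part of the original post): overall accuracy is each class's recall weighted by its share of the data,
$$\text{accuracy} = \sum_{k} \frac{n_k}{N}\,\text{recall}_k,$$
so with 3 roughly equal classes and at most 40% on every diagonal cell, the overall accuracy could not exceed about 40%, nowhere near the 80% reported by model.evaluate(). A gap that large usually means the predictions and labels being compared are misaligned, e.g. the dataset was reshuffled between generating the predictions and collecting the labels.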
I am looking at (two-layer) feed-forward Neural Networks in Matlab. I am investigating parameters that can minimise the classification error.
A Google search reveals that these are some of them:
Number of neurons in the hidden layer
Learning Rate
Momentum
Training type
Epoch
Minimum Error
Any other suggestions?
I've varied the number of hidden neurons in Matlab from 1 to 10. I found that the classification error is close to 0% with 1 hidden neuron and then grows very slightly as the number of neurons increases. My question is: shouldn't a larger number of hidden neurons guarantee an equal or better answer, i.e. why might the classification error go up with more hidden neurons?
Also, how might I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
Many thanks
Since you are considering a simple two-layer feed-forward network and have already pointed out 6 different things to consider for reducing classification error, I just want to add one more thing: the amount of training data. If you train a neural network with more data, it will generally work better. Note that training with a large amount of data is key to getting good results from neural networks, especially from deep neural networks.
Why does the classification error go up with more hidden neurons?
The answer is simple: your model has over-fitted the training data, which results in poor test performance. Note that if you increase the number of neurons in the hidden layer, the training error will decrease, but the testing error will increase.
In the following figure, see what happens with increased hidden layer size!
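If you want to reproduce that curve yourself, here is a minimal Matlab sketch (assuming your inputs x and targets t are already loaded) that sweeps the hidden layer size and compares performance on the training and test splits, using the network's performance function (MSE by default) as a proxy for classification error:
% Sweep the hidden layer size and compare training vs. test performance.
for h = 1:10
    net = feedforwardnet(h);
    net.trainParam.showWindow = false;              % suppress the training GUI
    [net, tr] = train(net, x, t);                   % tr records the data split
    y = net(x);
    trainErr(h) = perform(net, t(:,tr.trainInd), y(:,tr.trainInd));
    testErr(h)  = perform(net, t(:,tr.testInd),  y(:,tr.testInd));
end
plot(1:10, trainErr, 1:10, testErr);
xlabel('hidden neurons'); legend('training', 'test');
Training error should keep shrinking as the hidden layer grows, while test error eventually turns upward; that is the over-fitting pattern in the figure.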
How may I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
I expect you have already seen feed-forward nets in Matlab. You just need to set the second parameter of the function feedforwardnet(hiddenSizes,trainFcn), which is trainFcn, a training function.
For example, if you want to use gradient descent with momentum and adaptive learning rate backpropagation, then use traingdx as the training function. You can also use traingda if you want to use gradient descent with adaptive learning rate backpropagation.
You can change all the required parameters of the function as you want. For example, if you want to use traingda, you just need to follow these two steps.
Set net.trainFcn to traingda. This sets net.trainParam to traingda's default parameters.
Set net.trainParam properties to desired values.
Example
net = feedforwardnet(3,'traingda');
net.trainParam.lr = 0.05;     % set the learning rate to 0.05
net.trainParam.epochs = 2000; % set the number of epochs
Please see the documentation for traingda (gradient descent with adaptive learning rate backpropagation) and traingdx (gradient descent with momentum and adaptive learning rate backpropagation).
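If you want momentum as well, here is a minimal sketch with traingdx (the parameter names are the toolbox defaults; x and t stand in for your own inputs and targets):
net = feedforwardnet(3,'traingdx');
net.trainParam.lr = 0.05;     % initial learning rate
net.trainParam.mc = 0.9;      % momentum constant
net.trainParam.epochs = 2000; % maximum number of epochs
[net, tr] = train(net, x, t); % train on your inputs x and targets t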
I've built a digit recognizer (56x56 digits) using neural networks, but I'm getting 89.5% accuracy on the test set and 100% on the training set. I know that it's possible to get >95% on the test set using this training set. Is there any way to improve my training so I can get better predictions? Changing the number of iterations from 300 to 1000 gave me +0.12% accuracy. I'm also limited by file size, so increasing the number of nodes may not be possible, but if that's the case maybe I could cut some pixels/nodes from the input layer.
To train I'm using:
input layer: 3136 nodes
hidden layer: 220 nodes
labels: 36
regularized cost function with lambda=0.1
fmincg to calculate weights (1000 iterations)
As mentioned in the comments, the easiest and most promising way is to switch to a convolutional neural network. But with your current model you can:
Add more layers with fewer neurons each, which increases learning capacity and should increase accuracy a bit. The problem is that you might start overfitting; use regularization to counter this.
Use batch normalization (BN). While you are already using regularization, BN accelerates training and also acts as a regularizer, and as an NN-specific technique it might work better.
Make an ensemble. Train several NNs on the same dataset, but with different initializations. This will produce slightly different classifiers, and you can combine their outputs to get a small increase in accuracy (see the sketch after this list).
Use cross-entropy loss. You don't mention which loss function you are using; if it's not cross-entropy, you should start using it. Most high-accuracy classifiers are trained with cross-entropy loss.
Switch to stochastic gradient descent. I do not know the effect in your case, but mini-batch SGD with backpropagation might outperform the batch optimizer (fmincg) you are currently using, and you could combine it with variants such as Adagrad or Adam.
Other small changes that might increase accuracy are changing the activation function (e.g. to ReLU), shuffling the training samples after every epoch, and doing data augmentation.
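As an illustration of the ensemble idea, a minimal Matlab sketch (patternnet and the matrices x and t are stand-ins for whatever setup you actually use; 220 matches your hidden layer size):
% Train a small ensemble of identical nets with different random
% initializations and average their class scores.
scores = 0;
for k = 1:5
    net = patternnet(220);     % same architecture, fresh random weights
    net = train(net, x, t);    % x: inputs, t: one-hot targets
    scores = scores + net(x);  % accumulate the class scores
end
scores = scores / 5;               % ensemble average
[~, predictedClass] = max(scores); % final label = highest averaged score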
I trained my artificial neural network (ANN) in MATLAB with 652,500 data points, and in another blind test (652,100 data points, completely new input data sets) the output is excellent (as I want). But the problem occurs when I feed in a very small amount of data (for example, below 50 data points). The output is quite unexpected, and I have checked it many times.
To be more precise, the training phase uses 10% of the data for training, 45% for validation and 45% for testing. The training is quite successful, and for large amounts of new input data it works very well. The problem is that when very limited data (compared to the number of training points) are fed into the neural network, it produces quite unrealistic output, beyond the range it was trained on.
Why is this so? Could anyone shed some light on this, please?
Also, please mention: are there any strict (hard and fast) rules on training and final testing data points? For example, what percentage of the training data volume should/must be present in the new input data sets? I guess the problem is that my network overestimates or underestimates the output because it receives a very small percentage of data compared to the training phase.
Your problem is over-fitting of the dataset during training. Data division is a very important part of training a neural network. As a rule of thumb, the training set should be around 70-80% of the data, and the validation and test sets should each be around 10-15%. For instance:
net.divideParam.trainRatio = 70/100; % 70% of the samples for training
net.divideParam.valRatio = 15/100;   % 15% for validation
net.divideParam.testRatio = 15/100;  % 15% for final testing
Imagine a student in a class. TrainRatio is the material/lectures that the student should learn. ValRatio is the percentage of the material used as a mid-term examination, and TestRatio is the percentage used as the final examination. So, if the student does not have enough material for studying, they cannot succeed in the mid-term and final examinations. Is it clear? A neural network learns just like such a student. So your network is facing over-fitting problems.
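Put together, a minimal sketch (x and t stand in for your own inputs and targets):
net = feedforwardnet(10);
net.divideFcn = 'dividerand';        % split the samples at random (the default)
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
[net, tr] = train(net, x, t);        % tr records which samples went into which split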
I am currently doing a project on vehicle classification, and it is almost finished now, but I have several points of confusion about the plots I get from my neural network.
I used 230 images [90 hatchbacks, 90 sedans, 50 SUVs] for classification on 80 feature points.
Thus my vInput was an [80x230] matrix and my vTarget was a [3x230] matrix.
The classifier works well, but I don't understand these plots or whether they are abnormal.
My neural network
Then I clicked these 4 plots in the Plots section and got these, in sequence:
Performance Plot
Training State
Confusion Plot
Receiver Operating Characteristic Plot
I know there are a lot of images, but I know nothing about them.
In the Matlab documentation they just train the network and plot the graphs.
So could someone please briefly explain them to me, or point me to some good links for learning about them?
The first two plots show training statistics.
The Performance Plot shows the mean squared error dynamics for all of your datasets on a logarithmic scale. The training MSE is always decreasing, so it is the validation and test MSE you should be interested in. Your plot shows a perfect training run.
The Training State plot shows some other training statistics.
Gradient is the magnitude of the backpropagation gradient on each iteration, on a logarithmic scale. A value of 5e-7 means that you have reached the bottom of a local minimum of your goal function.
Validation fails are iterations on which the validation MSE increased. A lot of fails means overtraining, but in your case it's OK: Matlab automatically stops training after 6 fails in a row (the max_fail parameter).
The other two plots show the results of simulating your network after training.
Confusion Plot: in your case it's 100% accurate. Green cells represent correct answers and red cells represent all types of incorrect answers.
For example, you may read the first one (the training set) as: "59 samples from class 1 were correctly classified as class 1, 13 samples from class 2 were correctly classified as class 2, and 6 samples from class 3 were correctly classified as class 3".
The Receiver Operating Characteristic Plot shows the same thing, but in a different way, using ROC curves:
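If you want to regenerate these last two plots outside the training GUI, a minimal sketch using the matrices from your question (assuming the trained network is stored in net):
outputs = net(vInput);           % simulate the trained network on all 230 samples
plotconfusion(vTarget, outputs); % confusion matrix, targets first
plotroc(vTarget, outputs);       % one ROC curve per class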