Understanding Matlab Pattern Recognition Neural Network Plots

I am currently doing a project on vehicle classification. It is almost finished, but I have some confusion about the plots I get from my neural network.
I used 230 images [90 Hatchbacks, 90 Sedans, 50 SUVs] for classification on 80 feature points.
Thus my vInput was an [80x230] matrix and my vTarget was a [3x230] matrix.
The classifier works well, but I don't understand these plots or whether they indicate anything abnormal.
My neural network:
Then I clicked these 4 plots in the PLOTS section and got the following, in order:
Performance Plot
Training State
Confusion Plot
Receiver Operating Characteristic Plot
I know there are a lot of images here, but I know nothing about them.
The Matlab documentation just trains the system and plots the graphs.
So could someone briefly explain these plots to me, or point me to some good resources for learning about them?

The first two plots show training statistics.
The Performance Plot shows the mean squared error (MSE) dynamics for all of your datasets on a logarithmic scale. Training MSE always decreases, so it is the validation and test MSE you should be interested in. Your plot shows a perfect training run.
Training State shows you some other training statistics.
Gradient is the value of the backpropagation gradient at each iteration, on a logarithmic scale. A value of 5e-7 means that you have reached the bottom of a local minimum of your goal function.
Validation fails are iterations where the validation MSE increased. A lot of fails indicates overtraining, but in your case it's OK. Matlab automatically stops training after 6 fails in a row.
The other two plots show the results of simulating your network after training.
Confusion Plot. In your case it's 100% accurate. Green cells represent correct answers and red cells represent all types of incorrect answers.
For example, you may read the first one (training set) as: "59 samples from class 1 were correctly classified as class 1, 13 samples from class 2 were correctly classified as class 2, and 6 samples from class 3 were correctly classified as class 3".
The Receiver Operating Characteristic Plot shows the same thing in a different way, using ROC curves:
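If you want to regenerate these plots from the command line after training, here is a minimal sketch using the toolbox's standard plotting functions (the hidden layer size of 10 is an assumption; vInput and vTarget are the matrices from the question):

% Train a pattern recognition network and reproduce the four plots.
net = patternnet(10);                 % hidden layer size is an assumption
[net, tr] = train(net, vInput, vTarget);
vOutput = net(vInput);

plotperform(tr);                      % Performance Plot (MSE per epoch)
plottrainstate(tr);                   % Training State (gradient, validation fails)
plotconfusion(vTarget, vOutput);      % Confusion Plot
plotroc(vTarget, vOutput);            % ROC Plot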

Related

Confusion matrix doesn't match evaluation

I'm training a CNN with TensorFlow for image classification on the Food-101 dataset, and I reach a test accuracy of about 80% (I use model.evaluate()).
The issue I have is that when I plot the confusion matrix of the 3 classes involved, it looks very different, with at most 40% on the main diagonal.
I could understand it if at least 1 of the 3 classes were around 100%, because I would expect the averaged accuracy to rise even with bad results in the other predictions. But in this case none of them is anywhere near what I achieve during evaluation.
I tried to plot the confusion matrix with the training data, with which I reached more than 90% accuracy during the learning process, and this is not correct either.

Neural network parameter selection

I am looking at (two-layer) feed-forward Neural Networks in Matlab. I am investigating parameters that can minimise the classification error.
A google search reveals that these are some of them:
Number of neurons in the hidden layer
Learning Rate
Momentum
Training type
Epoch
Minimum Error
Any other suggestions?
I've varied the number of hidden neurons in Matlab from 1 to 10. I found that the classification error is close to 0% with 1 hidden neuron and then grows very slightly as the number of neurons increases. My question is: shouldn't a larger number of hidden neurons guarantee an equal or better answer? In other words, why might the classification error go up with more hidden neurons?
Also, how might I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
Many thanks
Since you are considering a simple two-layer feed-forward network and have already pointed out 6 different things you need to consider to reduce classification error, I just want to add one thing: the amount of training data. If you train a neural network with more data, it will work better. Note that training with a large amount of data is key to getting good results from neural networks, especially from deep neural networks.
Why does the classification error go up with more hidden neurons?
The answer is simple: your model has over-fitted the training data, resulting in poor generalization. Note that if you increase the number of neurons in the hidden layers, training error will decrease but testing error will increase.
In the following figure, see what happens with increased hidden layer size!
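To reproduce this effect on your own data, you can sweep the hidden layer size and compare training vs. test classification error. A minimal sketch, using the toolbox's built-in iris_dataset as a stand-in for your data:

% Sweep hidden layer sizes and compare train vs. test classification error.
[x, t] = iris_dataset;                        % toy stand-in for your data
sizes = 1:10;
trainErr = zeros(size(sizes));
testErr  = zeros(size(sizes));
for k = sizes
    net = patternnet(k);                      % k hidden neurons
    net.trainParam.showWindow = false;        % suppress the training GUI
    [net, tr] = train(net, x, t);
    y = net(x);
    trainErr(k) = confusion(t(:,tr.trainInd), y(:,tr.trainInd));
    testErr(k)  = confusion(t(:,tr.testInd),  y(:,tr.testInd));
end
plot(sizes, trainErr, '-o', sizes, testErr, '-s');
xlabel('Hidden neurons'); legend('Train error', 'Test error');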
How may I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
I expect you have already seen the feed-forward neural net in Matlab. You just need to set the second parameter of the function feedforwardnet(hiddenSizes,trainFcn), which is trainFcn, the training function.
For example, if you want to use gradient descent with momentum and adaptive learning rate backpropagation, then use traingdx as the training function. You can also use traingda if you want to use gradient descent with adaptive learning rate backpropagation.
You can change all the required parameters of the function as you want. For example, if you want to use traingda, you just need to follow these two steps.
Set net.trainFcn to 'traingda'. This sets net.trainParam to traingda's default parameters.
Set net.trainParam properties to desired values.
Example
net = feedforwardnet(3,'traingda');   % 3 hidden neurons, adaptive learning rate
net.trainParam.lr = 0.05;             % set the learning rate to 0.05
net.trainParam.epochs = 2000;         % set the maximum number of epochs
Please see the documentation for gradient descent with adaptive learning rate backpropagation (traingda) and gradient descent with momentum and adaptive learning rate backpropagation (traingdx).
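Putting it together, here is a minimal sketch of setting the remaining parameters (learning rate, momentum, epochs, minimum error) with traingdx, using the toolbox's built-in iris_dataset as example data; the specific parameter values are assumptions:

% Two-layer feed-forward net trained with gradient descent
% with momentum and adaptive learning rate (traingdx).
[x, t] = iris_dataset;             % 4 inputs, 3 target classes
net = feedforwardnet(10, 'traingdx');

net.trainParam.lr       = 0.01;    % learning rate
net.trainParam.mc       = 0.9;     % momentum constant
net.trainParam.epochs   = 1000;    % maximum number of epochs
net.trainParam.goal     = 1e-5;    % minimum error: stop when MSE falls below this
net.trainParam.max_fail = 6;       % validation failures allowed before stopping

[net, tr] = train(net, x, t);      % train and collect the training record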

ANN: Perceptron performance in determining point positions

So I'm starting out with machine learning and Artificial Neural Networks, and I found the article The Nature of Code, which introduces Artificial Neural Networks and the idea of a Perceptron.
In the article they show how to create a Perceptron that is able to discriminate between points positioned above or below a line based on the function:
f(x) = 2x + 1
I developed my own Perceptron in Swift and used Xcode Playgrounds to illustrate my Perceptron's performance.
The perceptron takes 3 inputs: x, y and bias (always 1). The weights of the 3 inputs are generated at random, and after some training they are adjusted.
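For reference, here is a minimal sketch of the classic perceptron update rule being described, written in Matlab rather than Swift (the learning rate, point range, and iteration count are assumptions):

% Perceptron learning to classify points above/below f(x) = 2x + 1.
f = @(x) 2*x + 1;                        % the target line
w = 2*rand(3,1) - 1;                     % random weights for [x; y; bias]
lr = 0.01;                               % learning rate (assumed)
for i = 1:10000
    p = 20*rand(2,1) - 10;               % random point in [-10,10]^2
    target = sign(p(2) - f(p(1)));       % +1 above the line, -1 below
    if target == 0, target = 1; end      % points on the line count as +1
    input = [p; 1];                      % bias input is always 1
    guess = sign(w' * input);
    if guess == 0, guess = 1; end
    w = w + lr*(target - guess)*input;   % perceptron learning rule
end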
The first graphic shows the value of the first weight over the training iterations. As you can see, the value stabilizes at the end, which suggests that the Perceptron learned to discriminate points:
The second graphic shows the function line and all the training points (selected at random). The green points are the ones the Perceptron predicted correctly, whereas the red ones are wrong predictions:
As you can see, almost all of the red dots lie along the negated line:
f(x) = -2x - 1
My question is why this line of error dots appears. I thought that at a certain point all the weights would stabilize and the Perceptron's performance would reach 100%, but it never does. Is this because of a bug in my code, or do ANNs always have this small margin of error?
Any explanation is welcome, although keep in mind that I'm a newbie at ML and ANNs.
Thank you very much.

Artificial neural network presented with unclassified inputs

I am trying to classify portions of time series data using a feed-forward neural network with 20 neurons in a single hidden layer and 3 outputs corresponding to the 3 events I would like to recognize. There are many other things I could classify in the data (obviously), but I don't really care about them for the time being. Neural network creation and training were performed using Matlab's neural network toolbox for pattern recognition, as this is a classification problem.
To do this I am sequentially populating a moving window and then feeding the window into the neural network. The issue is that I obviously cannot classify and train on every possible shape the time series takes. Because of this, I often get windows filled with data that look very different from the windows I used to train the neural network, but I still get outputs near 1.
Essentially, the 3 things I trained the ANN on are windows from 20 different data sets corresponding to three shapes: steady state, a curve that starts with a negative slope and levels off to 0 slope (essentially the left half of a parabola that opens upwards), and a curve that starts at 0 slope and quickly declines (the right half of a parabola that opens downwards).
Am I incorrect in thinking that if I input data that doesn't correspond to any of the items I trained the ANN with, it should output values near 0 for all outputs?
Or is it likely due to the fact that these basically cover all the bases of steady state, increasing and decreasing, despite large differences in slope, and therefore something is always classified?
I guess I just need a nudge in the right direction.
Neural network output values
A neural network can only guarantee specific output values if those input values / expected output values were presented during the training period.
A neural network will not consistently output 0 for untrained input values.
A solution is simply to present the network, during training, with an additional set of input values that should result in the network outputting 0.
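A minimal sketch of that idea in Matlab (Xtrain and Ttrain stand for your existing 20xN windows and 3xN targets; the random "none of the above" windows are placeholders you would replace with real non-event data):

% Augment the training data with windows that match none of the events,
% with all-zero targets, so unfamiliar inputs map toward 0 on every output.
Xnone = randn(20, 200);                  % placeholder non-event windows (assumed)
Tnone = zeros(3, 200);                   % all three outputs should be 0 for these

net = feedforwardnet(20);                % 20 hidden neurons, as in the question
net.layers{2}.transferFcn = 'logsig';    % sigmoid outputs in [0,1], so 0 is reachable
net = train(net, [Xtrain Xnone], [Ttrain Tnone]);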

Conceptual issues on training neural network wih particle swarm optimization

I have a neural network with 4 inputs and 3 outputs, trained by particle swarm optimization (PSO) with mean squared error (MSE) as the fitness function, using the IRIS database provided by MATLAB. The fitness function is evaluated 50 times. The experiment is to classify features. I have a few doubts.
(1) Does the number of PSO iterations/generations equal the number of times the fitness function is evaluated?
(2) In many papers I have seen a training curve of MSE vs. generations being plotted. In the picture, graph (a) on the left side is a model similar to a NN: a 4-input, 0-hidden-layer, 3-output cognitive map. Graph (b) is a NN trained by the same PSO. The purpose of that paper was to show the effectiveness of the new model in (a) over the NN.
But they mention that the experiment is conducted with, say, Cycles = 100 and Generations = 300. In that case, shouldn't the training curves for (a) and (b) be MSE vs. cycles and not MSE vs. PSO generations? For example, Cycle 1: PSO iterations 1-50 --> Result (Weights_1, Bias_1, MSE_1, Classification Rate_1). Cycle 2: PSO iterations 1-50 --> Result (Weights_2, Bias_2, MSE_2, Classification Rate_2), and so on for 100 cycles. Why is the X axis in (a) and (b) different, and what do the axes mean?
(3) Lastly, for every independent run of the program (running the m-file several times from the console), I never get the same classification rate (CR) or the same set of weights. Concretely, when I first run the program I get weight values W and CR = 100%. When I run the Matlab code again, I may get CR = 50% and another set of weights, as shown below for an example:
%Run1 (PSO generations 1-50)
>>PSO_NN.m
Correlation =
0
Classification rate = 25
FinalWeightsBias =
-0.1156 0.2487 2.2868 0.4460 0.3013 2.5761
%Run2 (PSO generations 1-50)
>>PSO_NN.m
Correlation =
1
Classification rate = 100
%Run3 (PSO generations 1-50)
>>PSO_NN.m
Correlation =
-0.1260
Classification rate = 37.5
FinalWeightsBias =
-0.1726 0.3468 0.6298 -0.0373 0.2954 -0.3254
What is the correct method? Which weight set should I finally take, and how do I say that the network has been trained? I am aware that evolutionary algorithms, due to their randomness, will never give the same answer, but then how do I ensure that the network has been trained?
I would be obliged for any clarification.
As in most machine learning methods, the number of iterations in PSO is the number of times the solution is updated. In the case of PSO, this is the number of update rounds over all particles. The cost function is evaluated after every particle update, so the number of evaluations is greater than the number of iterations. Approximately, (# cost function calls) = (# iterations) * (# particles); for example, 50 iterations with 20 particles gives roughly 1000 fitness evaluations.
The graphs here are comparing different classifiers, fuzzy cognitive maps for graph (a) and a neural network for graph (b). So the X-axis displays the relevant measures of learning iterations for each.
Every time you run your NN you initialize it with different random values, so the results are never the same. The fact that the results vary greatly from one run to the next means you have a convergence problem. The first thing to do in this case is to try running more iterations. In general convergence is a rather complicated issue and the solutions vary greatly with applications (and read carefully through the answer and comments Isaac gave you on your other question). If your problem persists after increasing the number of iterations, you can post it as a new question, providing a sample of your data and the actual code you use to construct and train the network.
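If you want reproducible experiments while debugging, one option is to fix the random seed and keep the best result over several runs. A sketch, assuming PSO_NN.m is refactored into a function that returns the trained weights and the achieved MSE (that function form is hypothetical):

rng(42);                          % fix the seed so the experiment is reproducible
bestMSE = inf;
for run = 1:10
    [W, runMSE] = PSO_NN();       % hypothetical function form of PSO_NN.m
    if runMSE < bestMSE
        bestMSE = runMSE;         % keep the weights with the lowest MSE
        bestW = W;
    end
end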