Poor performance with multi-class classification by ANN (neural network)

I'm implementing a 7-class classification task with normalised features and one-hot encoded labels. However, the training and validation accuracies have been extremely poor.
As shown, I normalised the features with the StandardScaler() method, and each feature vector ends up as a 54-dim numpy array. I also one-hot encoded the labels in the following manner.
As illustrated below, the labels are (num_y, 7) numpy arrays.
My network architecture:
It is shown here how I designed my model. I wonder whether the poor result has something to do with the choice of loss function (I've been using categorical cross-entropy).
I appreciate any response from you. Thanks a lot!

The way you are computing accuracy is most likely wrong. The code I refer to is not provided in your question, but I can speculate that you are comparing the true labels directly with your model outputs. Your model probably returns a vector of dimensionality 7 which constitutes a probability distribution over the classes (due to the softmax activation in your final layer), like this:
model returns: (0.7 0 0.02 0.02 0.02 0.04 0.2) -- they sum to 1 because they represent probabilities
and then you are comparing these numbers with: (1 0 0 0 0 0 0)
What you have to do is translate the model output to the corresponding predicted label: (0.7 0 0.02 0.02 0.02 0.04 0.2) corresponds to (1 0 0 0 0 0 0) because the first output neuron has the largest value (0.7). You can do that by applying an argmax to the model outputs.
To make sure this is what is wrong with your setup, print the vectors you are comparing against the true labels when computing accuracy, and check whether they are 7 numbers that sum up to 1.
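As a minimal sketch of that translation step (shown here in MATLAB for concreteness; in Keras/NumPy the equivalent is numpy.argmax applied to the prediction and to the one-hot label):
output     = [0.7 0 0.02 0.02 0.02 0.04 0.2];  % what the model returns for one sample
trueOneHot = [1 0 0 0 0 0 0];                  % one-hot encoded ground truth
[~, predictedClass] = max(output);       % index of the largest probability
[~, trueClass]      = max(trueOneHot);   % index of the 1 in the one-hot vector
correct = (predictedClass == trueClass); % compare class indices, not the raw vectors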


Thresholding for neural networks

I implemented a neural network for AND gate with 2 input units, 2 hidden units and 1 output unit.
I trained the neural network using 40 inputs for 200 epochs with a learning rate of 0.03.
When I try to test the trained neural network on the AND inputs, it gives me outputs such as:
0,0 = 0.295 (0 expected)
0,1 = 0.355 (0 expected)
1,0 = 0.329 (0 expected)
1,1 = 0.379 (1 expected)
This is not the output which is expected from the network. But if I set the threshold as 0.36 and set all values above 0.36 as 1 and rest as 0, the neural network output is just as expected every time.
My question is: is applying a threshold to the output of the network necessary in order to generate the expected outputs, as in my case?
A threshold is not strictly necessary, but it can help produce better classifications.
In your case, you could for example set a threshold of 0.1 for 0 and 0.9 for 1: when the output is below 0.1, consider it a 0, and when the output is higher than 0.9, consider it a 1.
Setting a threshold of 0.36 just because it happens to work on that test example, however, is a really bad idea: 0.36 is far from the output of 1 that we want, and it may (and likely will) not work on all of your test data.
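A minimal sketch of that kind of double threshold (the 0.1 and 0.9 cut-offs are just the values suggested above):
y = 0.379;          % raw network output for one example
if y < 0.1
    label = 0;      % confidently a 0
elseif y > 0.9
    label = 1;      % confidently a 1
else
    label = NaN;    % ambiguous region: the network has not really learned this example
end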
You should instead look for a problem in your code or training. This goes beyond the initial question, but here are some ideas:
1. Look at your training accuracy at each epoch. If it learns slowly, increase your learning rate, and maybe decrease it again after a few epochs.
2. If the accuracy doesn't change at all, check your backpropagation algorithm.
3. Look at your dataset and make sure the inputs and outputs are correct.
4. Make sure your weights are initialised randomly.
5. An AND gate can be solved with a linear NN, without a hidden layer. Maybe try removing your hidden layer (see the sketch below)?
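A minimal sketch of such a linear model, here a single-layer perceptron trained with the classic perceptron rule (the learning rate and epoch count are arbitrary):
X = [0 0; 0 1; 1 0; 1 1];   % AND gate inputs
t = [0; 0; 0; 1];           % AND gate targets
w = zeros(1, 2); b = 0;     % weights and bias, no hidden layer
lr = 0.1;                   % learning rate
for epoch = 1:100
    for i = 1:size(X, 1)
        y = double(X(i, :) * w' + b > 0);  % hard-threshold activation
        err = t(i) - y;
        w = w + lr * err * X(i, :);        % perceptron update rule
        b = b + lr * err;
    end
end
disp(double(X * w' + b > 0))  % should print 0 0 0 1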

How can I efficiently find the accuracy of a classifier

Even with a simple classifier like the nearest neighbour I cannot seem to judge its accuracy and thus cannot improve it.
For example with the code below:
IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = cell([size(test_image_feats, 1), 1]);
for i = 1:size(IDX, 1)
    predicted_categories{i} = train_labels(IDX(i));
end
Here train_image_feats is a 300 by 256 matrix where each row represents an image; test_image_feats has the same structure. train_labels contains the label corresponding to each row of the training matrix.
The book I am following simply said that the above method achieves an accuracy of 19%.
How did the author come to this conclusion? Is there any way to judge the accuracy of my results be it with this classifier or other?
The author then uses another method of feature extraction and says it improved accuracy by 30%.
How can I find the accuracy? Be it graphically or just via a simple percentage.
Accuracy when doing machine learning and classification is usually calculated by comparing your predicted outputs from your classifier in comparison to the ground truth. When you're evaluating the classification accuracy of your classifier, you will have already created a predictive model using a training set with known inputs and outputs. At this point, you will have a test set with inputs and outputs that were not used to train the classifier. For the purposes of this post, let's call this the ground truth data set. This ground truth data set helps assess the accuracy of your classifier when you are providing inputs to this classifier that it has not seen before. You take your inputs from your test set, and run them through your classifier. You get outputs for each input and we call the collection of these outputs the predicted values.
For each predicted value, you compare to the associated ground truth value and see if it is the same. You add up all of the instances where the outputs match up between the predicted and the ground truth. Adding all of these values up, and dividing by the total number of points in your test set yields the fraction of instances where your model accurately predicted the result in comparison to the ground truth.
In MATLAB, this is really simple to calculate. Suppose your categories are enumerated from 1 to N, where N is the total number of labels you are classifying with. Let groundTruth be your vector of ground-truth labels and predictedLabels be the labels generated by your classifier. The accuracy is then simply calculated by:
accuracy = sum(groundTruth == predictedLabels) / numel(groundTruth);
accuracyPercentage = 100*accuracy;
The first line of code calculates the accuracy of your model as a fraction. The second line calculates this as a percentage, where you simply multiply the first line of code by 100. You can use either one when you want to assess accuracy: one is normalised between [0,1] while the other is a percentage from 0% to 100%. What groundTruth == predictedLabels does is compare each element between groundTruth and predictedLabels. If the ith value in groundTruth matches the ith value in predictedLabels, we output a 1; if not, we output a 0. This gives a vector of 0s and 1s, so we simply sum up all of the values that are 1, which is neatly encapsulated in the sum operation. We then divide by the total number of points in our test set to obtain the final accuracy of the classifier.
With a toy example, supposing I had 4 labels, and my groundTruth and predictedLabels vectors were this:
groundTruth = [1 2 3 2 3 4 1 1 2 3 3 4 1 2 3];
predictedLabels = [1 2 2 4 4 4 1 2 3 3 4 1 2 3 3];
The accuracy using the above vectors gives us:
>> accuracy
accuracy =
0.4000
>> accuracyPercentage
accuracyPercentage =
40
This means that we have a 40% accuracy or an accuracy of 0.40. Using this example, the predictive model was only able to accurately classify 40% of the test set when you put each test set input through the classifier. This makes sense, because between our predicted outputs and ground truth, only 40%, or 6 outputs match up. These are the 1st, 2nd, 6th, 7th, 10th and 15th elements. There are other metrics to calculating accuracy, like ROC curves, but when calculating accuracy in machine learning, this is what is usually done.

Is my implementation of confusion matrix correct? Or is something else at fault here?

I have trained a multi class svm classifier with 5 classes, i.e. svm(1)...svm(5).
I then used 5 images that were not used during the training of these classifiers for testing.
These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class.
predict = svmclassify(svm(i_t),test_features);
The predict produces a 5 by 1 vector showing the result.
-1
1
1
1
-1
I sum these and then insert the sum into a diagonal matrix.
Ideally it should be a diagonal matrix with 5 on each diagonal entry when all images are correctly classified. But the result is very poor; in some cases I am even getting a negative result. I just want to verify whether this poor result is because my confusion matrix is not accurate, or whether I should use some other feature extractor.
Here is the code I wrote
svm_table = [];
for i_t = 1:numel(svm)
    test_folder = [Path_training folders(i_t).name '\']; % select writer
    feature_count = 1; % initialize count for feature vector accumulation
    for j_t = 6:10 % these 5 images were not used for training
        [img, map] = imread([test_folder imlist(j_t).name]);
        test_img = imresize(img, [100 100]);
        test_img = imcomplement(test_img);
        % Features extracted here for each image.
        % The feature vector for each image is a 1 x 16 vector.
        test_features(feature_count, :) = Features_extracted;
        % The feature vectors are accumulated in a single matrix. Each row is an image.
        feature_count = feature_count + 1; % increment the count
    end
    test_features(isnan(test_features)) = 0; % locate NaN and replace with 0
    % I was getting NaN in some images, which was causing problems with svm, so just replaced with 0
    predict = svmclassify(svm(i_t), test_features); % produce column vector of predictions
    svm_table(end+1, end+1) = sum(predict); % sum them and add to the matrix diagonally
end
This is what I am getting. It looks like a confusion matrix, but the result is very poor.
-1 0 0 0 0
0 -1 0 0 0
0 0 3 0 0
0 0 0 1 0
0 0 0 0 1
So I just want to know what is at fault here: my implementation of the confusion matrix, my way of testing the SVMs, or my selection of features?
I would like to point out a few issues:
You mention that: << These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class. >>
You are never supposed to know the class (category) of test images. Of course, you need to know the test category labels for calculating various metrics such as accuracy, precision, confusion matrix etc. Apart from that, when you are using SVM to determine which class the example belongs to, you have to try all the SVMs.
There are two popular ways of training and testing multi-class SVMs, namely one-vs-all and one-vs-one approach. Read this answer and its corresponding question to understand them in detail.
I don't know if MATLAB's SVM is capable of multiclass classification, but if you use LIBSVM then it uses the one-vs-one approach. It will also do the testing for you correctly. However, if you want to design your own one-vs-one classifier, this is how you should proceed:
Say you have 5 classes; then you train all possible pairs of classes, C(5,2) = 10 pairs ({1,2}, {1,3}, ..., {4,5}). While testing, you have to apply all 10 models and count the votes to decide the final result. For example, suppose we train models for 4 pairs ({1 vs 2}, {1 vs 3}, {1 vs 4}, {2 vs 3}) and their outputs are {1, 1, 0, 1} respectively, where 1 means the first class of the pair wins and 0 means the second one wins. The predicted classes are then {1, 1, 4, 2}, so class 1 gets the most votes and is the final result.
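A minimal sketch of that voting step (assuming, for illustration, a cell array models{k} of pairwise SVMs and a matrix pairs where pairs(k,:) holds the two class numbers model k separates, each model trained with the original class numbers as its group labels):
votes = zeros(1, 5);                                 % one vote counter per class
for k = 1:size(pairs, 1)
    winner = svmclassify(models{k}, test_feature);   % returns one of the two classes in pairs(k,:)
    votes(winner) = votes(winner) + 1;               % that class gets a vote
end
[~, finalClass] = max(votes);                        % the class with the most votes wins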
Once you have all the predicted labels, you can use the command confusionmat to get the confusion matrix. If you want to build it yourself, make a 5x5 matrix of zeros and add 1 at the position (actual label, predicted label), i.e. if the actual class was 2 and you predicted it as 3, then add 1 at position (row 2, column 3) in the matrix.
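For example, with vectors actualLabels and predictedLabels (names assumed here) holding one entry per test image:
confMat = zeros(5, 5);   % rows = actual class, columns = predicted class
for i = 1:numel(actualLabels)
    r = actualLabels(i);
    c = predictedLabels(i);
    confMat(r, c) = confMat(r, c) + 1;   % one more image of class r predicted as class c
end
% Equivalent one-liner from the Statistics Toolbox:
% confMat = confusionmat(actualLabels, predictedLabels);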
Several issues that I can see...
1) What you're using is not really a multi-class SVM. You're taking several different SVM models and applying them to the same test data (not really the same thing). You need to look at the documentation for svmtrain. When you use it you give it two kinds of data: the training data (parameter vectors for each training image) and the Group data (a vector of classes for the images associated with those vectors). What you get will be one SVM model which will decide between the options. (I usually use libsvm, so I'm not that familiar with MATLAB's SVM implementation, but that should be the gist of it.)
2) Your confusion matrix is derived incorrectly (see: http://en.wikipedia.org/wiki/Confusion_matrix). Start by making a 5x5 matrix of zeros to hold the confusion matrix. Loop through each of your test images and let the SVM model classify the image (it should pick one of the five possibilities). Add 1 at the proper position of the confusion matrix: if the image should classify as a 3 and the SVM classifies it as a 4, you should add 1 to the (3,4) position.

Why does MATLAB neural network classification return decimal values

I have an input dataset (matrix 25x1575) which is normalized to values between 0 and 1.
I also have a binary formatted output matrix (9x1575) like 0 0 0 0 0 0 0 0 1, 1 0 0 1 1 1 0 0 1 ...
I imported both files in matlab nntool and it automatically created a network with 25 input and 9 output nodes as I wanted.
After I trained this network using feed-forward backprop, I tested the model on its training data, and each output node returns a decimal value like (-0.1978 0.45913 0.12748 0.25072 0.45199 0.59368 0.38359 0.31435 1.0604).
Why doesn't it return discrete values like 1 0 0 1 1 1 0 0 1?
Is there anything I must set in nntool to get such values?
Depending on the nature of the neurons, the output can be anything. The most popular neuron types are linear, sigmoid (range [0, 1]) and hyperbolic tangent (range [-1, 1]). The first can output any value. The latter two can approximate a step function (i.e. binary behaviour), but it is up to the end user (you) to define the cut-off value for that translation.
You didn't say which neurons you use, but you should definitely read more on how neural networks are implemented and how they work. You may start with this video and then read Artificial Neural Networks for Beginners by C Gershenson.
UPDATE: You say that you use tanh-sigmoid neurons and wonder why you don't get values either very close to -1 or very close to 1.
The output of a tanh neuron is the hyperbolic tangent of the sum of all its inputs, so every value between -1 and 1 is possible. What determines the "steepness" of the output (in other words, the proportion of intermediate values) is the output values of the preceding neurons and their weights, which in turn depend on the outputs and weights of their preceding neurons, and so on. It is up to the learning algorithm to find the set of weights that minimises a predefined scoring function, given a certain input. In a typical setup, a scoring function compares the neural network output to a set of desired results and returns a single number that indicates how different the actual and desired outputs are.
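For instance, a common choice of scoring function is the mean squared error between the desired and actual outputs (the numbers below are made up for illustration):
desired = [1 0 0 1 1 1 0 0 1];                        % target pattern for one example
actual  = [0.9 0.1 0.2 0.7 0.8 0.95 0.05 0.3 0.85];   % what the network produced
score = mean((desired - actual).^2);                  % single number; smaller is better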
Before using a NN you have to do some homework. At a minimum, you have to decide what your goal is, how you interpret the NN output, how you measure NN performance, and how you update the weights.

backpropagation neural network with discrete output

I am working through the xor example with a three layer back propagation network. When the output layer has a sigmoid activation, an input of (1,0) might give 0.99 for a desired output of 1 and an input of (1,1) might give 0.01 for a desired output of 0.
But what if I want the output to be discrete, either 0 or 1? Do I simply set a threshold in between, at 0.5? Would this threshold need to be trained like any other weight?
Well, you can of course put a threshold after the output neuron which maps all values above 0.5 to 1 and, vice versa, all outputs below 0.5 to 0. I suggest not hiding the continuous output behind a discretisation threshold, though, because an output of 0.4 is less "zero" than a value of 0.001, and this difference can give you useful information about your data.
Do the training without the threshold, i.e. compute the error on an example using what the neural network outputs, without thresholding it.
Another small detail: do you use a transfer function such as the sigmoid? The sigmoid returns values in (0, 1), but 0 and 1 are asymptotes, i.e. the sigmoid can come arbitrarily close to those values but never reach them. A consequence of this is that your neural network can never output exactly 0 or 1! Scaling the sigmoid by a factor a little above 1 can correct this. This and some other practical aspects of backpropagation are discussed in http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
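A minimal sketch of that separation between training and reporting, assuming a single sigmoid output neuron (net_input and target are placeholder names for the neuron's weighted input sum and the desired output):
y = 1 / (1 + exp(-net_input));  % continuous sigmoid output of the output neuron
err = target - y;               % error used by backpropagation: computed on the raw output
label = double(y >= 0.5);       % discrete 0/1 label, used only when reporting the prediction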