Multi-class multi-label classification using single label training data - neural-network

I am trying to do multi-class, multi-label image classification using a convolutional neural network.
For the training process, I plan to use one-hot labelling to prepare my labels. For example, there are 8 classes in total, and a sample image can be classified as classes 2, 4, and 6. Hence the label would look like
[0 1 0 1 0 1 0 0]
However, the input pipeline of the model I'm currently piggybacking on does not accept training data with multiple labels. Instead of modifying the input pipeline, my colleague suggested duplicating the training data. Using the previous example, instead of feeding one training sample with 3 labels, three copies of the same sample with one label each would be fed. The three labels would look like
[0 1 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0]
[0 0 0 0 0 1 0 0]
Given sufficient training data, would the model be able to learn to place more importance on the true values (ones) on the one-hot arrays instead of the false values (zeros)? Would the model be able to output proper multi-label data?

You can train the network with multinomial logistic regression or a sigmoid cross-entropy loss instead of the usual softmax. That way there is no need to duplicate the data, and training does not take longer. Here is a nice tutorial on multilabel image classification.
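A minimal sketch of the sigmoid cross-entropy idea, in plain Python rather than any particular framework. The function names and the example logits are assumptions made up for illustration; only the target vector comes from the question. Each class gets its own independent sigmoid, so several classes can be "on" at once, which a softmax does not allow.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_cross_entropy(logits, targets):
    """Mean of independent binary cross-entropies, one per class."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(logits)

# Multi-hot target for classes 2, 4 and 6 out of 8 (from the question).
target = [0, 1, 0, 1, 0, 1, 0, 0]

# Hypothetical raw network outputs (logits), invented for this example.
good_logits = [-4, 4, -4, 4, -4, 4, -4, -4]
bad_logits = [4, -4, 4, -4, 4, -4, 4, 4]

loss_good = sigmoid_cross_entropy(good_logits, target)
loss_bad = sigmoid_cross_entropy(bad_logits, target)
print(loss_good, loss_bad)

# Thresholding each sigmoid at 0.5 (an assumed cut-off) gives a multi-label prediction.
predicted = [1 if sigmoid(z) > 0.5 else 0 for z in good_logits]
print(predicted)
```

The loss is lower when the logits agree with the multi-hot target, so a single training sample carries all three labels at once.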

Related

Poor performance help - multi-class classification by ANN

I'm implementing a 7-class classification task with normalised features and one-hot encoded labels. However, the training and validation accuracies have been extremely poor.
As shown, I normalised the features with the StandardScaler() method, and each feature vector turns out to be a 54-dim numpy array. Also, I one-hot encoded the labels in the following manner.
As illustrated below, the labels are (num_y, 7) numpy arrays.
My network architecture:
It is shown here how I designed my model. I wonder if the poor result has something to do with the choice of loss function (I've been using categorical cross-entropy).
I appreciate any response from you. Thanks a lot!
The use of accuracy is obviously wrong. The code I refer to is not provided in your question, but I can speculate that you are comparing the true labels directly with your model outputs. Your model probably returns a vector of dimensionality 7 which constitutes a probability distribution over the classes (due to the softmax activation in your final layer), like this:
model returns: (0.7 0 0.02 0.02 0.02 0.04 0.2) -- they sum to 1 because they represent probabilities
and then you are comparing these numbers with: (1 0 0 0 0 0 0)
What you have to do is translate the model output to the corresponding predicted label: (0.7 0 0.02 0.02 0.02 0.04 0.2) corresponds to (1 0 0 0 0 0 0) because the first output neuron has the largest value (0.7). You may do that by applying an argmax to your model outputs.
To make sure that this is what's wrong with your problem formulation, print the vector you are comparing with the true labels to get your accuracy, and check whether it consists of 7 numbers that sum to 1.
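The translation step can be sketched in plain Python as follows. The helper names are hypothetical, and the second output row and both labels are made up for illustration; the first output row is the one quoted above.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(probabilities, one_hot_labels):
    """Fraction of samples whose most probable class matches the true class."""
    correct = 0
    for probs, label in zip(probabilities, one_hot_labels):
        if argmax(probs) == argmax(label):
            correct += 1
    return correct / len(one_hot_labels)

outputs = [
    [0.7, 0.0, 0.02, 0.02, 0.02, 0.04, 0.2],   # softmax output from the answer
    [0.1, 0.1, 0.5, 0.1, 0.1, 0.05, 0.05],     # hypothetical second sample
]
labels = [
    [1, 0, 0, 0, 0, 0, 0],   # true class is the first one
    [0, 1, 0, 0, 0, 0, 0],   # true class is the second one
]

acc = accuracy(outputs, labels)
print(acc)
```

Here the first sample counts as correct and the second does not, even though no raw output vector is exactly equal to its one-hot label.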

Multiclass classification and the sigmoid function

Say we have a training set Y:
1,0,1,0
0,1,1,0
0,0,1,1
0,0,1,0
And the sigmoid function is defined as: $\sigma(x) = 1 / (1 + e^{-x})$
As the sigmoid function outputs a value between 0 and 1, does this mean that the training data and the values we are trying to predict should also fall between 0 and 1?
Is it also correct to use the sigmoid function for making predictions when the training set values are not between 0 and 1? For example:
1,4,3,0
2,1,1,0
7,2,6,1
3,0,5,0
Yes, it is perfectly valid to have non-binary features.
The output falls between 0 and 1 because of the nature of the sigmoid function, there is nothing that stops you from having non binary feature set.
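As a small illustration, the sketch below feeds the non-binary rows from the question through a single logistic unit. The weights and bias are hypothetical values chosen only for the example, not a trained model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Non-binary feature rows taken from the question.
rows = [
    [1, 4, 3, 0],
    [2, 1, 1, 0],
    [7, 2, 6, 1],
    [3, 0, 5, 0],
]

# Hypothetical weights and bias, invented for illustration.
weights = [0.5, -1.0, 0.25, 2.0]
bias = -0.5

outputs = [
    sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
    for row in rows
]
print(outputs)
```

Whatever the scale of the inputs, every output stays strictly between 0 and 1.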
Do the predictions have to be binary?
No, you can have multiclass logistic classification as well.
The simplest way of doing that is solving a one-vs-all classification problem, wherein you train one binary logistic classifier for each of the labels.
For example, if your prediction space spans (1, 2, 3, 4), you can have 4 logistic classifiers.
Given any point in the test set, you can give it the label corresponding to the classifier which is most confident (i.e. has the highest score for that test point).
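The one-vs-all prediction step can be sketched like this. The four classifiers below are hypothetical (hand-picked weights rather than trained ones), and `predict_one_vs_all` is an illustrative name, not a library function.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(classifiers, x):
    """Score x with every binary classifier and return the most confident label."""
    scores = {}
    for label, (weights, bias) in classifiers.items():
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        scores[label] = sigmoid(z)
    best_label = max(scores, key=scores.get)
    return best_label, scores

# One hypothetical binary logistic classifier per label: label -> (weights, bias).
classifiers = {
    1: ([1.0, 0.0], 0.0),
    2: ([0.0, 1.0], 0.0),
    3: ([-1.0, 0.0], 0.0),
    4: ([0.0, -1.0], 0.0),
}

label, scores = predict_one_vs_all(classifiers, [0.2, 3.0])
print(label, scores)
```

For the test point `[0.2, 3.0]`, the classifier for label 2 has the highest score, so that label is returned.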

Is my implementation of confusion matrix correct? Or is something else at fault here?

I have trained a multi class svm classifier with 5 classes, i.e. svm(1)...svm(5).
I then used 5 images per class that were not used during the training of these classifiers for testing.
These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class.
predict = svmclassify(svm(i_t),test_features);
The predict produces a 5 by 1 vector showing the result.
-1
1
1
1
-1
I sum these and then insert it into a diagonal matrix.
Ideally it should be a diagonal matrix with 5 on every diagonal entry when all images are correctly classified. But the result is very poor; in some cases I even get a negative value. I just want to verify whether this poor result is because my confusion matrix is not accurate or whether I should use some other feature extractor.
Here is the code I wrote
svm_table = [];
for i_t = 1:numel(svm)
    test_folder = [Path_training folders(i_t).name '\']; %select writer
    feature_count = 1; %Initialize count for feature vector accumulation
    for j_t = 6:10 %these 5 images that were not used for training
        [img,map] = imread([test_folder imlist(j_t).name]);
        test_img = imresize(img, [100 100]);
        test_img = imcomplement(test_img);
        %Features extracted here for each image.
        %The feature vector for each image is a 1 x 16 vector.
        test_features(feature_count,:) = Features_extracted;
        %The feature vectors are accumulated in a single matrix. Each row is an image
        feature_count = feature_count + 1; % increment the count
    end
    test_features(isnan(test_features)) = 0; %locate NaN and replace with 0
    %I was getting NaN in some images, which was causing problems with svm, so just replaced with 0
    predict = svmclassify(svm(i_t),test_features); %produce column vector of predictions
    svm_table(end+1,end+1) = sum(predict); %sum them and add to matrix diagonally
end
This is what I am getting. It looks like a confusion matrix, but the result is very poor.
-1 0 0 0 0
0 -1 0 0 0
0 0 3 0 0
0 0 0 1 0
0 0 0 0 1
So I just want to know what is at fault here: my implementation of the confusion matrix, my way of testing the SVM, or my selection of features.
I would like to add some issues:
You mention that: << These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class. >>
You are never supposed to know the class (category) of test images. Of course, you need to know the test category labels for calculating various metrics such as accuracy, precision, confusion matrix etc. Apart from that, when you are using SVM to determine which class the example belongs to, you have to try all the SVMs.
There are two popular ways of training and testing multi-class SVMs, namely one-vs-all and one-vs-one approach. Read this answer and its corresponding question to understand them in detail.
I don't know if MATLAB's SVM is capable of doing multiclass classification, but if you use LIBSVM then it uses the one-vs-one approach. It will also do the testing for you correctly. However, if you want to design your own one-vs-one classifier, this is how you should proceed:
Say you have 5 classes; then train one model for every unordered pair of classes, i.e. 5C2 = 10 pairs ({1,2}, ..., {1,5}, {2,3}, ..., {2,5}, ..., {4,5}). While testing, you have to apply all 10 models and count the votes to decide the final result. For example, say we have models for 4 pairs ({1 vs 2}, {1 vs 3}, {2 vs 1}, {2 vs 3}) and their outputs are {1,1,0,1} respectively, where 1 means the first class of the pair wins and 0 means the second. Then the 4 predicted classes are {1,1,1,2}, so the final class is 1, the one with the most votes.
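The voting scheme can be sketched in Python as below. The pairwise "models" are replaced by a hypothetical stand-in rule (`toy_pairwise_winner`) so that the example runs on its own; in practice each pair would be a trained binary SVM.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_vote(pairs, pairwise_winner):
    """Apply every pairwise model and return the class with the most votes."""
    votes = Counter(pairwise_winner(a, b) for a, b in pairs)
    winner = votes.most_common(1)[0][0]
    return winner, votes

def toy_pairwise_winner(a, b):
    """Hypothetical stand-in for a trained pairwise SVM.

    Pretends the test sample truly belongs to class 3: class 3 wins every
    pair it takes part in, otherwise the lower label wins arbitrarily.
    """
    if 3 in (a, b):
        return 3
    return min(a, b)

# All unordered pairs of the 5 classes: 5C2 = 10 models.
pairs = list(combinations(range(1, 6), 2))

winner, votes = one_vs_one_vote(pairs, toy_pairwise_winner)
print(len(pairs), winner, dict(votes))
```

Every one of the 10 models votes, and the class that collects the most votes becomes the prediction for the test sample.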
Once you get all the predicted labels, then you can actually use the command confusionmat to get the confusion matrix. If you want to make your own, then make a 5x5 matrix of zeros. Add a 1 to the position (actual label, predicted label) i.e. if the actual class was 2 and you predicted it as 3, then add 1 at the position (2nd row, 3rd col) in the matrix.
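Building the matrix by hand can be sketched as follows, in Python for brevity. The label lists are made-up example data, and classes are assumed to be numbered from 1 as in the text.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes (1-based labels)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a - 1][p - 1] += 1
    return matrix

# Hypothetical test labels and predictions for 5 classes.
actual = [1, 1, 2, 2, 3, 3, 4, 5]
predicted = [1, 1, 2, 3, 3, 4, 4, 5]

cm = confusion_matrix(actual, predicted, 5)
for row in cm:
    print(row)
```

Correct predictions land on the diagonal; the sample of class 2 that was predicted as 3 shows up in the 2nd row, 3rd column. Entries are counts, so they can never be negative.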
Several issues that I can see...
1) What you're using is not really a multi-class SVM. You're taking several different SVM models and applying them to the same test data (not really the same thing). You need to look at the documentation for svmtrain. When you use it you give it two kinds of data: the training data (parameter vectors for each training image) and the Group data (vector of classes for the images associated with the vectors). What you get will be one SVM model which will decide between the options. (I usually use libsvm, so I'm not that familiar with MATLAB's SVM implementation, but that should be the gist of it.)
2) Your confusion matrix is derived incorrectly (see: http://en.wikipedia.org/wiki/Confusion_matrix). Start by making a 5x5 zeros matrix to hold the confusion matrix. Loop through each of your test images and let the SVM model classify the image (it should pick 1 of the five possibilities). Add 1 at the proper position of the confusion matrix. So if the image should classify as a 3 and the SVM classifies it as a 4 you should add 1 to the 3,4 position...

Why does MATLAB neural network classification return decimal values

I have an input dataset (matrix 25x1575) which is normalized to values between 0 and 1.
I also have a binary formatted output matrix (9x1575) like 0 0 0 0 0 0 0 0 1, 1 0 0 1 1 1 0 0 1 ...
I imported both files in matlab nntool and it automatically created a network with 25 input and 9 output nodes as I wanted.
After I trained this network using feed-forward backProp, I tested the model in its training data and each output nodes returns a decimal value like (-0.1978 0.45913 0.12748 0.25072 0.45199 0.59368 0.38359 0.31435 1.0604).
Why doesn't it return discrete values like 1 0 0 1 1 1 0 0 1?
Is there anything that I must set in nntool to get such values?
Depending on the nature of the neurons, the output can be anything. The most popular neuron types are linear, sigmoid (range [0, 1]) and hyperbolic tangent (range [-1, 1]). The first one can output any value. The latter two can approximate a step function (i.e. binary behavior), but it is up to the end user (you) to define the cut-off value for that translation.
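Applying a cut-off can be sketched in a few lines of Python. The raw values are the ones quoted in the question; the cut-off of 0.5 is an assumption and should be chosen to suit the problem.

```python
def to_binary(outputs, cutoff=0.5):
    """Map each continuous output to 1 if it reaches the cut-off, else 0."""
    return [1 if value >= cutoff else 0 for value in outputs]

# Raw network outputs quoted in the question.
raw = [-0.1978, 0.45913, 0.12748, 0.25072, 0.45199,
       0.59368, 0.38359, 0.31435, 1.0604]

binary = to_binary(raw)
print(binary)
```

With this cut-off, only the sixth and ninth outputs are translated to 1.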
You didn't say which neurons you use, but you should definitely read more on how neural networks are implemented and how they work. You may start with this video and then read Artificial Neural Networks for Beginners by C Gershenson.
UPDATE You say that you use tanh-sigmoid neurons and wonder how come you don't get values either very close to -1 or to 1.
The output of tanh neuron is hyperbolic tangent of the sum of all its inputs. Every value between -1 and 1 is possible. What determines the "steepness" of the output (in other words: the proportion of interim values) is the output values of the preceding neurons and their weights. These depend on the output of their preceding neurons and their weights etc etc etc. It is up to the learning algorithm to find the set of weights that minimizes a predefined scoring function, given a certain input. In a typical setup, a scoring function is a function that compares neural network output to a set of desired results and returns a single number that indicates how different the actual and the desired outputs are.
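To make this concrete, here is a small Python sketch of a single tanh neuron and one possible scoring function. Mean squared error is used only as an assumed example of a scoring function, and the inputs and weights are hypothetical.

```python
import math

def tanh_neuron(inputs, weights, bias=0.0):
    """Hyperbolic tangent of the weighted sum of the inputs."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)

def mean_squared_error(actual, desired):
    """One possible scoring function: a single number, smaller is better."""
    return sum((a - d) ** 2 for a, d in zip(actual, desired)) / len(actual)

# Hypothetical outputs of the preceding neurons.
inputs = [0.5, -0.2, 0.8]

# Small weights give an interim value; large weights push the output towards 1.
soft = tanh_neuron(inputs, [0.1, 0.1, 0.1])
sharp = tanh_neuron(inputs, [10.0, 10.0, 10.0])
print(soft, sharp)

# Compared against a desired output of 1, the sharper neuron scores better.
print(mean_squared_error([soft], [1.0]), mean_squared_error([sharp], [1.0]))
```

The same inputs give an interim value or a near-binary value depending only on the weights, which is why the training algorithm, not the neuron type, determines how close to -1 or 1 the outputs end up.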
Before using NN you have to do some homework. At the minimum you have to decide what your goal is, how you interpret NN output and how you measure NN performance and how you update the weights.

Does MATLAB optimize linear SVM parameters automatically

I am using SVM in Matlab for classification. I directly gave the training data set and class labels for training but without any parameters. My code looks like this:
traningData = myData;
label = [1 1 0 0 0 0 0 0 1 1 1 1 0 1 1 0 1 1 1 0];
SVMStructure = svmtrain (myData, label);
... %further prediction part
I found that the default SVM kernel is 'linear', so in my code it should be 'linear'. But what about the parameter C? The MATLAB documentation says:
The resulting structure, SVMstruct, contains the optimized parameters from the SVM algorithm, enabling you to classify new data
So does that mean MATLAB automatically optimizes the parameter C for the linear SVM here?
Those built-in routines are pretty awful. They allow no flexibility and use dinosaur solvers. It is quite unclear what is going on behind the scenes.
I strongly recommend using another library for SVM training. Many popular packages are free and have MATLAB interfaces.