Which algorithm will be best for Multi-label image classification - tf.keras

I'm training a model to detect damage on a car in order to estimate the damage amount. I have a dataset of about 1000 images and 5 labels.
I found that a CNN built with Keras is suitable for this use case. But in the results it returns every label with its confidence score; it does not eliminate the labels that are not present.
I can filter the results using a threshold. But is there an algorithm available that will eliminate the labels that are not present and return only the labels that actually appear in the image?
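For reference, the thresholding I have in mind looks roughly like this — a minimal sketch assuming a sigmoid multi-label head; the label names are made up:

import numpy as np
import tensorflow as tf

# Minimal multi-label setup: 5 independent sigmoid outputs, one per damage label.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="sigmoid"),      # per-label confidences
])
model.compile(optimizer="adam", loss="binary_crossentropy")

labels = ["scratch", "dent", "glass", "lamp", "bumper"]  # hypothetical label names
probs = model.predict(np.zeros((1, 128, 128, 3)))[0]     # dummy image; all 5 scores come back
present = [l for l, p in zip(labels, probs) if p > 0.5]  # keep only labels above the threshold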

Related

Multiclass classification or regression?

I am trying to train a CNN model to classify images based on their aesthetic score. There are 200,000 images and every image is rated by more than 100 subjects. The mean score is calculated and the scores are normalized.
The distribution of the scores is approximately gaussian. So I have decided to build a 10 class classification model after assigning appropriate weight for each class as the data is imbalanced.
My question:
For this problem the scores are continuous, i.e. 0 < 0.2 < 0.3 < 0.4 < 0.5 < ... < 1.
Does that mean this is a regression problem? If so, how do I balance the data for a regression problem, since most of the data points lie between 0.4 and 0.6?
Thanks!
Since your labels are continuous, you could divide them into 10 equal quantiles using a technique like pandas.qcut() and assign a label to each bin. This turns the regression problem into a classification problem.
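A minimal sketch of that binning, assuming the scores are already normalized to [0, 1] (the score array here is a random stand-in):

import numpy as np
import pandas as pd

scores = np.random.beta(5, 5, size=200_000)    # stand-in for the normalized mean scores
classes = pd.qcut(scores, q=10, labels=False)  # 10 quantile bins -> class ids 0..9
print(pd.Series(classes).value_counts())       # roughly 20,000 samples per class

Because qcut bins by quantile rather than by value, each class ends up with about the same number of samples, which also softens the imbalance mentioned below.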
And as far as the imbalance is concerned, you may want to try oversampling the minority classes. This will ensure your model is not biased towards the majority classes.
Hope this helps.
I would recommend that you first do a histogram equalization over ALL the ratings from your participants, so that the ratings are distributed equally.
Then for each image in your training set calculate the expected value (and, if you also want to, the variance). The expected value is just the mean of the votes. For the variance there are standard functions in (almost) every programming language that take an array of votes and output the variance.
Now take the expected value (and, if you want, also the variance) as the ground truth for your network.
EDIT: Histogram Equalization:
Histogram equalization is a method to use a given numerical range as efficiently as possible.
In the context of images, this changes the pixel values so that the darkest pixel becomes 0 and the lightest becomes 255. Furthermore, every grayscale value gets redistributed so that (on average) each value occurs as often as every other. For your dataset you want the same, even though your values run from 0 to 10 rather than 0 to 255. Also, you don't need to (and shouldn't) round the resulting values to integers. This way, frequently occurring votes are spread out and rarely occurring votes are contracted.
Maybe you should first calculate the expected value and then do the histogram equalization over the expected values of all images.
This should let the CNN better differentiate those small differences.
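A minimal sketch of what I mean, using a rank transform to equalize the scores (assuming Python; the example values are made up):

import numpy as np

def equalize(values):
    # Rank-based histogram equalization: spread values uniformly over [0, 10].
    ranks = np.argsort(np.argsort(values))   # rank of each value, 0..n-1
    return 10.0 * ranks / (len(values) - 1)  # no rounding, as noted above

expected = np.array([4.2, 5.1, 4.8, 6.0, 5.5])  # hypothetical per-image mean votes
targets = equalize(expected)                    # ground truth for the network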

Convolution Neural Network for image detection/classification

So here is the setup: I have a set of images (labeled, split into train and test) and I want to train a conv net that tells me whether or not a specific object is within an image.
To do this, I followed the TensorFlow tutorial on MNIST, and I trained a simple conv net on crops reduced to the area of interest (the object), using images of size 128x128. The architecture is as follows: three successive blocks, each consisting of 2 conv layers and 1 max-pool down-sampling layer, followed by one fully connected softmax layer (with two classes, 0 and 1, for whether the object is present or not); a rough sketch of the net follows the list below.
I implemented it using TensorFlow, and this works quite well, but since I have enough computing power I was wondering how I could increase the complexity of the classification:
- adding more layers ?
- adding more channels at each layer ? (currently 32, 64, 128 and 1024 for the fully connected)
- anything else ?
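For concreteness, the net looks roughly like this in Keras-style code (kernel size and padding are placeholder choices, since I didn't list them above):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(128, 128, 1)))
for ch in (32, 64, 128):                     # channel counts per block, as above
    model.add(tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu"))
    model.add(tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu"))
    model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation="relu"))
model.add(tf.keras.layers.Dense(2, activation="softmax"))  # object present / absent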
But the most important part is that now I want to detect this same object in larger images (roughly 600x600, whereas the size of the object should be around 100x100).
I was wondering how I could use the previously trained "small" network for small images in order to pretrain a larger network on the large images. One option would be to classify the image using a sliding window of size 128x128 and scan the whole image, but if possible I would like to try training a whole network on it.
Any suggestion on how to proceed? Or an article / resource tackling this kind of problem? (I am really new to deep learning, so sorry if this is a stupid question...)
Thanks !
I suggest that you continue reading on the field overall. Your search keys include CNN, image classification, neural net, AlexNet, GoogLeNet, and ResNet. These will return many articles, on-line classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above were built for the ImageNet image classification contest.
Since you are working with images, I would suggest you use a pretrained image classification network (like VGG, AlexNet, etc.) and fine-tune this network with your 128x128 image data. In my experience, unless you have a very large dataset, a fine-tuned network will give better accuracy and also save training time.
After building a good image classifier on your dataset, you can use any popular algorithm to generate region proposals from the image. Then take each region proposal and pass it to the classification network, and check whether the network classifies it as positive or negative. If it classifies the proposal as positive, your object is most probably present in that region; otherwise it is not. If there are a lot of region proposals in which the object is present according to the classifier, you can use a non-maximum suppression algorithm to reduce the number of positive proposals.
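A minimal sketch of the fine-tuning step and the proposal-checking loop, assuming tf.keras with VGG16 and a simple sliding-window proposal generator (preprocessing details are omitted):

import numpy as np
import tensorflow as tf

# Fine-tune a pretrained VGG16 on the 128x128 binary task (object / no object).
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(128, 128, 3))
base.trainable = False                   # freeze the pretrained filters at first
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(train_crops, train_labels, epochs=5)   # your labelled 128x128 crops

def positive_regions(image, win=128, stride=32, threshold=0.5):
    # Sliding-window proposals: return the windows the classifier marks positive.
    hits = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win][None].astype("float32")
            score = float(model.predict(patch, verbose=0)[0, 0])
            if score > threshold:
                hits.append((x, y, win, win, score))
    return hits   # feed these to non-maximum suppression to merge overlaps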

Convolutional autoencoder not learning meaningful filters

I am playing with TensorFlow to understand convolutional autoencoders. I have implemented a simple single-layer autoencoder which does this:
Input (Dimension: 95x95x1) ---> Encoding (convolution with 32 5x5 filters) ---> Latent representation (Dimension: 95x95x1x32) ---> Decoding (using tied weights) ---> Reconstructed input (Dimension: 95x95x1)
The inputs are black-and-white edge images i.e. the results of edge detection on RGB images.
I initialised the filters randomly and then trained the model to minimise loss, where loss is defined as the mean-squared-error of the input and the reconstructed input.
loss = 0.5 * tf.reduce_mean(tf.square(tf.subtract(x, x_reconstructed)))  # tf.sub in older TF versions
After training with 1000 steps, my loss converges and the network is able to reconstruct the images well. However, when I visualise the learned filters, they do not look very different from the randomly-initialised filters! But the values of the filters change from training step to training step.
Example of learned filters
I would have expected at least horizontal and vertical edge filters. Or if my network was learning "identity filters" I would have expected the filters to all be white or something?
Does anyone have any idea about this? Or are there any suggestions as to what I can do to analyse what is happening? Should I include pooling and depooling layers before decoding?
Thank you!
P.S.: I tried the same model on RGB images and again the filters look random (like random blotches of colour).
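For concreteness, a minimal sketch of the tied-weight setup described above, in current TensorFlow syntax (my real code may differ in details; the decoder reuses the encoder's filter bank via conv2d_transpose):

import tensorflow as tf

W = tf.Variable(tf.random.normal([5, 5, 1, 32], stddev=0.1))  # 32 shared 5x5 filters
b_enc = tf.Variable(tf.zeros([32]))
b_dec = tf.Variable(tf.zeros([1]))

def reconstruct(x):                       # x: [batch, 95, 95, 1]
    z = tf.nn.relu(tf.nn.conv2d(x, W, strides=1, padding="SAME") + b_enc)
    return tf.nn.conv2d_transpose(z, W, output_shape=tf.shape(x),
                                  strides=1, padding="SAME") + b_dec

def loss_fn(x):
    return 0.5 * tf.reduce_mean(tf.square(x - reconstruct(x)))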

KNN Classifier using cross validation

I am trying to implement a KNN classifier using the cross-validation approach, where I have several images of a certain character for training (e.g. 5 images) and another two for testing. I get the idea of cross-validation: simply choose the K with the least error value during training, and then use it with the test data to find how accurate my results are.
My question is: how do I train on images in MATLAB to get my K value? Do I compare them and try to find mismatches, or what?!
Any help would be really appreciated.
First of all, you need to define your task precisely. For example: given an image I in R^(MxN), we wish to classify I as an image containing faces or an image without faces.
I often work with pixel classifiers, where the task is something like: for an image I, decide whether each pixel is a face pixel or a non-face pixel.
An important part of defining the task is to form a hypothesis that can be used as the basis for training a classifier. For example: we believe that the distribution of pixel intensities can be used to discriminate images of faces from images not containing faces.
Then you need to select some features that define your image. This can be done in many ways and you should search for what other people do when they analyse the same type of images you are working with.
One widely used method in pixel classification is to use pixel intensity values and do a multi-scale analysis of the image. The idea in multi-scale analysis is that different structures are most evident at different levels of blurring, called scales. As an illustration, consider an image of a tree. Without blurring we notice the fine structure, such as small branches and leaves. When we blur the image we notice the trunk and major branches. This is often used as part of segmentation methods.
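A minimal sketch of such a multi-scale analysis, assuming Python with SciPy (the image and the sigmas are arbitrary stand-ins):

import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(256, 256)   # stand-in for a grayscale image
scales = [gaussian_filter(image, sigma=s) for s in (1, 2, 4, 8)]
# Fine structure (leaves) survives at sigma=1; only coarse structure (trunk) at sigma=8.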
When you know your task and the features, you can train a classifier. If you use kNN and cross-validation to find the best k, you should split your dataset into training/test sets and then split the training set into train/validation sets. You then train on the reduced training set and use the validation set to decide which k is best. In the case of binary classification, e.g. face vs non-face, the error rate is often used as a measure of performance.
Finally, you use the chosen parameters to train the classifier on the full training set and estimate its performance on the test set.
A classification example: With or without milk?
As a full example, consider images of a cup of coffee taken from above, so each image shows the rim of the cup surrounding a brown colored disk. Further assume that all images are scaled and cropped so that the diameter of the disk and the dimensions of the image are the same. To simplify the task, we convert the color image to grayscale and scale the pixel intensities to the range [0,1].
We want to train a classifier so it can distinguish coffee with milk from coffee without milk. From inspection of histograms of some of the coffee images, we see that each image has two "bumps" in the histogram that are clearly separated. We believe that these bumps correspond to foreground (coffee) and background. Now we make the hypothesis that the average intensity of the foreground can be used to distinguish between coffee+milk/coffee.
To find the foreground pixels, we observe that because the foreground/background ratio is the same (by design), we can just find the intensity threshold that gives us that ratio for each image. Then we calculate the average intensity of the foreground pixels and use this value as a feature for each image.
If we have N images that we have manually labeled, we split this into training and test set. We then calculate the average foreground intensity for each image in the training set, giving us a set of (average foreground intensity, label) values. We want to use kNN where an image is assigned the same class as the majority class of the k closest images. We measure the distance as the absolute value of the difference in average foreground pixel intensity.
We search for the optimal k with cross-validation. We use 2-fold cross-validation (a.k.a. holdout) to find the best k. We test k = {1, 3, 5} and select the k that gives the lowest prediction error on the validation set.
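The question is about MATLAB, but as an illustration, here is a minimal sketch of this procedure in Python with scikit-learn (the feature values and labels are made up):

import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(100, 1)             # average foreground intensity per image
y = np.random.randint(0, 2, size=100)  # 1 = with milk, 0 = without (made-up labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 2-fold cross-validation over k = 1, 3, 5; picks the k with the lowest validation error.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5]}, cv=2)
search.fit(X_train, y_train)           # refits the best k on the full training set
print("best k:", search.best_params_["n_neighbors"])
print("test accuracy:", search.best_estimator_.score(X_test, y_test))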

HOG Feature Implementation with SVM in MATLAB

I would like to do classification based on HOG Features using SVM.
I understand that a HOG feature vector is the concatenation of all the histograms from every cell (i.e. it becomes one aggregate histogram).
I extract HOG features using MATLAB code in this page for the Dalal-Triggs variant.
For example, I have a grayscale image of size 384 x 512, and I extracted the HOG features at 9 orientations with a cell size of 8. By doing this, I get 48 x 64 x 36 features.
How can I make this a histogram and use it with an SVM classifier?
For example, I'll have 7 classes of images and I want to do training (700 images in total for training) and then classify new data based on the model generated in the training phase.
I read that for multiclass we can train our SVM with ONE vs ALL, which means that I have to train 7 classifiers for my 7 classes.
So for the 1st training, I'll label the 1st class with +1 and the rest of the classes with 0.
And for the 2nd training, I'll label the 2nd class with +1 and the rest with 0. And so on..
For example, I have these color classes:
red, green, blue, yellow, white, black and pink.
So for the 1st training, I make it binary: red and not red..
For the 2nd training, I label green and not green..
Is it like that??
The syntax to train SVM is:
SVMStruct = svmtrain(Training,Group)
But in this case, I'll have 7 SVMStructs..
The syntax to classify / test is:
Group = svmclassify(SVMStruct,Sample)
How do I declare 7 SVMStructs here??
Is that right??
Or is there another concept or syntax that I have to know??
And for training, I'll have 48 x 64 x 36 features; how can I train these features in an SVM??
Because from what I've read, they just use a 1xN matrix of features..
Please help me...
HOG with an SVM is one of the most successful algorithms for object detection. To apply this method, yes indeed you must have two different training datasets before they are fed into the SVM classifier. For instance, if you want to detect an apple, you must have two training datasets: positive images, which contain an apple, and negative images, which contain no apples. Then you extract the features from both training datasets (positive and negative) into HOG descriptors separately and also label them separately (e.g. 1 for positive, 0 for negative). Afterwards, combine the feature vectors from positive and negative and feed them to the SVM classifier.
You can use SVM Light or LibSVM, which are easier and more user-friendly for a beginner.
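As an illustration of that pipeline, here is a minimal sketch in Python with scikit-image and scikit-learn rather than SVM Light/LibSVM (the image stacks are random stand-ins):

import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

# Hypothetical positive/negative grayscale image stacks, each (n, 128, 128).
pos = np.random.rand(50, 128, 128)
neg = np.random.rand(50, 128, 128)

# Extract a HOG descriptor per image, label 1 for positive and 0 for negative.
feats = [hog(im, orientations=9, pixels_per_cell=(8, 8))
         for im in np.concatenate([pos, neg])]
labels = np.array([1] * len(pos) + [0] * len(neg))

clf = SVC(kernel="linear").fit(np.array(feats), labels)  # combined features to the SVM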
The Computer Vision System Toolbox for MATLAB includes the extractHOGFeatures function, and the Statistics Toolbox includes SVM support. Here's an example of how to classify images using HOG and SVM.
1. How can I make this a histogram and use it with an SVM classifier?
One distinction I want to make is that you already have a 'histogram' of oriented gradient features. You now need to give these features as input to the SVM. It would be odd to assign labels to each of these features individually, because the same HoG feature might turn up in another image labelled differently.
In practice, what is done is to build another histogram, called a bag of words, from these HoG features and give that to the SVM as input. The intuition is that if two features are very similar, you would want one representation for both of them; this reduces the variance in the input data. We then build this new histogram for each image.
A bag of words is created in the following way:
1. Cluster all the HoG features into "words". Say you have 1000 of these words.
2. Go through all HoG features and assign each feature to the word it is closest to (Euclidean distance) among all words in the bag.
3. Count how many features are assigned to each word in the bag. This gives the 1xN (N = number of words in the bag) feature histogram which will be given as input to the SVM after labeling.
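A minimal sketch of that construction, assuming Python with scikit-learn (the HoG arrays here are random stand-ins):

import numpy as np
from sklearn.cluster import KMeans

# Stand-in: one (num_cells, 36) HoG array per image, e.g. 48*64 cells of 36 values each.
hog_per_image = [np.random.rand(48 * 64, 36) for _ in range(10)]

all_feats = np.vstack(hog_per_image)
kmeans = KMeans(n_clusters=1000, n_init=10).fit(all_feats)  # the 1000 "words"

histograms = []
for feats in hog_per_image:
    words = kmeans.predict(feats)                        # nearest word per HoG feature
    hist, _ = np.histogram(words, bins=np.arange(1001))  # counts per word
    histograms.append(hist)
X = np.array(histograms)                                 # one 1x1000 row per image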
2. How to carry out multi-class classification using an SVM?
Each time you retrain the SVM you get another model. There are two ways you might do multiclass SVM using svmtrain:
1) One vs One SVM
Train a model for each pair of label classes, with input in the following way:
The input for model 1 (red vs green) would be:
bag-of-words features for Image 1, RED
bag-of-words features for Image 2, GREEN
The input for model 2 (yellow vs green) would be:
bag-of-words features for Image 3, YELLOW
bag-of-words features for Image 2, GREEN
The above is done for each pair of label classes, so you will have N(N-1)/2 models. Now you can count the number of votes for each class from the N(N-1)/2 models to find which label to assign.
2) One vs All SVM
Train a model for each label class, with input in the following way:
The input for model 1 would be:
bag-of-words features for Image 1, RED
bag-of-words features for Image 2, NOT RED
The input for model 2 would be:
bag-of-words features for Image 2, GREEN
bag-of-words features for Image 1, NOT GREEN
The above is done for each label class, so you will have N models. Now you can count the number of votes for each class from the N models to find which label to assign.
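A minimal sketch of both schemes, assuming Python with scikit-learn rather than MATLAB's svmtrain (the data is made up):

import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X = np.random.rand(700, 1000)          # bag-of-words histograms for 700 images
y = np.random.randint(0, 7, size=700)  # the 7 color classes as ids 0..6

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # N(N-1)/2 = 21 models
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # N = 7 models
print(ovr.predict(X[:5]))              # the voting is handled internally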
Read more on category-level classification here: http://www.di.ens.fr/willow/events/cvml2013/materials/slides/tuesday/Tue_bof_summer_school_paris_2013.pdf