Would the division of classes into subclasses increase the predictive accuracy of image classification?

If the images within a class differ considerably, should the class be further subdivided into subclasses in which the images are more similar, or is that unnecessary because an image classifier can learn several different features for one class?
For example, suppose images are to be classified into 3 classes: flowers, cars, and chairs. Is it okay to put all the different flowers into one flower class, or would it be better to further subdivide the class into daisies, tulips, dahlias, asters, and so on? If an aster is found, then I also know that it is a flower.

It depends on your problem.
What do you need to predict? If you only need to know whether the input is a flower, a car, or a chair, then there is no need to subdivide.
However, it is always a good idea to do experiments and see the results.
Train different models, some using the subdivided classes and some with the target as-is, and compare their performance.
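A minimal sketch of such an experiment in Keras, assuming fine-grained labels are available and can be mapped to the coarse classes (the mapping, the tiny architecture, and all sizes here are illustrative placeholders):

```python
from tensorflow import keras

# Hypothetical mapping from fine subclasses to the coarse classes.
FINE_TO_COARSE = {"daisy": "flower", "tulip": "flower", "aster": "flower",
                  "sedan": "car", "armchair": "chair"}

def build_model(num_classes):
    # Identical architecture for both runs, so only the labeling differs.
    return keras.Sequential([
        keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

coarse_model = build_model(num_classes=3)                  # flower / car / chair
fine_model = build_model(num_classes=len(FINE_TO_COARSE))  # one per subclass
for m in (coarse_model, fine_model):
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Train both on the same images (coarse vs. fine labels), then map the fine
# model's predictions through FINE_TO_COARSE before comparing accuracies.
```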

Related

Is it important to pre-train a model by using meaningless classes?

If I want to train an object-detection model that can detect 5 classes in a picture, is it important to pre-train this model on a large dataset like COCO (80 categories of objects), or is it enough to take just 5 categories of COCO to pre-train this model (assuming these 5 categories can be found in COCO)?
If the 5 classes that you want to detect are already in the MS-COCO dataset, there are two possible options:
Use the existing object-detection model that was pretrained on the MS-COCO dataset. If the detections are satisfactory, great, and you can continue using it.
If not, you can fine-tune the model on data containing your classes of interest: basically, use the pretrained MS-COCO weights as a starting point for training the network on your data consisting of those 5 classes (the more data, the better).
Now, if the classes that you wish to detect are not in the original MS-COCO dataset, you will be much better off using the pretrained MS-COCO weights (trained on all 80 classes, even if they are not relevant to yours) in the early convolutional layers and then training the detection head and deeper layers of the network on your dataset. This is because the low-level features (like edges, blobs, etc.) that the network has learned are mostly common to all classes and will greatly speed up training.
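To illustrate the second case, here is a hedged Keras sketch of reusing pretrained weights in the early layers and training the later layers on your own classes. A classification backbone with ImageNet weights stands in for a full MS-COCO detector, and the layer cut-off is an arbitrary choice:

```python
from tensorflow import keras

# Backbone pretrained on a large dataset; its early layers hold the generic
# low-level features (edges, blobs) mentioned above.
backbone = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3), pooling="avg")

# Freeze the early layers so their generic features are kept as-is.
for layer in backbone.layers[:100]:
    layer.trainable = False

# New head for the 5 classes of interest.
outputs = keras.layers.Dense(5, activation="softmax")(backbone.output)
model = keras.Model(backbone.input, outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(your_5_class_dataset, ...)  # placeholder: your training data
```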

Tensorflow Resnet with unbalanced classes

I use ResNet with TensorFlow to train a model with 20 classes.
My problem is that I have 6-7 classes with A LOT of samples, about the same number of classes with a medium number of samples, and the rest of the classes with few samples. With this distribution, my model had too strong a tendency to predict the larger classes over the smaller ones. I've tried to balance my classes by reducing the number of samples in my large classes, and it helped give the smaller classes a place during prediction, but now I've reached a point where I can't improve my model beyond an accuracy of 90%, and I feel like I'm losing a lot of valuable information by cutting samples from my large classes.
So, before I go buy more samples, I'm wondering if there is a way to work with unbalanced classes, with the logic that the model becomes very good at recognizing whether the larger classes are present or not (because it has so many samples of them that it is extremely capable of recognizing their presence) and then, if they are absent, goes on to check which other classes are present. The idea is to use all the samples I have instead of reducing them.
I've already tried the class-weight option in Keras/TensorFlow; it didn't help.
Besides the undersampling technique you used so far, there are two other ways to deal with imbalanced data:
class weighting
oversampling
Oversampling is the opposite of what you did, i.e. you train every sample of the underrepresented classes multiple times. Class weighting means you tell the model how much it should weight each class's samples during training (the weight updates, in the case of neural network training). Both approaches are supported by TensorFlow, and you can find them in the official tutorials.
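A minimal sketch of both options in TensorFlow/Keras; the class counts and datasets are placeholders, and note that `sample_from_datasets` lives under `tf.data.experimental` in older TensorFlow versions:

```python
import numpy as np
import tensorflow as tf

# Class weighting: weight each class inversely to its frequency.
counts = np.array([5000, 1200, 300, 50])         # hypothetical per-class counts
weights = counts.sum() / (len(counts) * counts)  # rare classes get large weights
class_weight = {i: float(w) for i, w in enumerate(weights)}
# model.fit(train_ds, class_weight=class_weight, ...)

# Oversampling: repeat rare-class samples so every class is seen equally often.
# per_class_datasets is a hypothetical list with one tf.data.Dataset per class.
# balanced_ds = tf.data.Dataset.sample_from_datasets(
#     [ds.repeat() for ds in per_class_datasets],
#     weights=[1.0 / len(per_class_datasets)] * len(per_class_datasets))
```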

Keras: using VGG16 to detect specific, non-generic item?

I'm learning about using neural networks and object detection, using Python and Keras. My goal is to detect something very specific in an image, let's say a very specific brand / type of car carburetor (part of a car engine).
The tutorials I found so far use the detection of cats and dogs as example, and many of those use a pre-trained VGG16 network to improve performance.
If I want to detect only my specific carburetor, and don't care about anything else in the image, does it make sense to use VGG16? Is VGG16 only useful when you want to detect many generic items, rather than one specific item?
Edit: I only want to know if there is a specific object (carburetor) in the image. No need to locate or put a box around it. I have about 1000 images of this specific carburetor for the network to train on.
VGG16, like other pretrained neural networks, is primarily used for classification. That means you can use it to distinguish which category an image belongs to.
As I understand it, what you need is to detect where in an image a carburetor is located. For something like that you need a different, more complicated approach.
You could use:
NVIDIA DetectNet,
YOLO,
segmentation networks such as U-Net or SegNet, etc.
VGG16 can be used for that. (Is it the best? That is an open question without a clear answer.)
But you must replace its ending to fit your needs.
While the regular VGG model has about a thousand classes at its end, a cats vs. dogs VGG has its ending changed to have two classes. In your case, you should change its ending to have only one class.
In Keras, you'd have to load the VGG model with the option include_top = False.
And you should then add your own final Dense layers (two or three dense layers at the end), making sure that the last layer has only one unit: Dense(1, activation='sigmoid').
This will work for "detecting" (yes / no results).
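Putting those pieces together, a sketch of what that looks like in Keras (the input size and the width of the intermediate Dense layer are illustrative choices):

```python
from tensorflow import keras

# VGG16 without its original 1000-class top, reused as a feature extractor.
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained convolutional features

model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # one unit: carburetor yes/no
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(carburetor_vs_background_dataset, ...)  # your ~1000 positives + negatives
```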
But if your goal is "locating/segmentation", then you should create your own version of a U-net or a SegNet, for instance.

How to combine two classification models in MATLAB?

I am trying to detect faces using MATLAB's built-in Viola-Jones face detection. Is there any way to combine two classification models, like "FrontalFaceCART" and "ProfileFace", into one in order to get a better result?
Thank you.
You can't combine the models themselves. That makes no sense in any classification task, since every classifier is different (it works differently, i.e. there is a different algorithm behind it, and it may also be trained differently).
According to the classification model(s) help (which can be found here), your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers, based on classification and regression tree analysis
ProfileFace is composed of weak classifiers, based on a decision stump
More information can be found in the link provided, but you can easily see that their inner behaviour is rather different, so you can't mix or combine them.
It's like mixing (in machine learning) a Support Vector Machine with a K-Nearest Neighbours classifier: the first uses separating hyperplanes, whereas the latter is simply based on distances.
You can, however, train several models in parallel (i.e. independently) and choose the one that best suits you (e.g. smaller error rate/higher accuracy): you basically create as many different classifiers as you like, give them the same training set, evaluate each one's accuracy (and/or other metrics), and choose the best model.
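The same select-the-best-model loop, sketched in Python with scikit-learn classifiers as stand-ins (the idea is language-agnostic, and the MATLAB workflow is analogous):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# A toy dataset in place of your face data.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train each candidate on the same training set and score it on held-out data.
models = {"svm": SVC(), "knn": KNeighborsClassifier()}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
best = max(scores, key=scores.get)  # keep the model with the highest accuracy
```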
One option is to make a hierarchical classifier: in a first step you use the frontal-face classifier (assuming that most pictures are frontal faces), and if that classifier fails, you try the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it: in my case the faces were usually in the middle upper part of the picture.
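A rough Python/OpenCV analogue of that hierarchical scheme, for illustration (OpenCV ships both frontal and profile Haar cascades; MATLAB's vision.CascadeObjectDetector plays the same role there):

```python
import cv2

# Standard Haar cascades bundled with the opencv-python package.
frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_faces(gray_image):
    # Try the frontal detector first; fall back to the profile detector.
    faces = frontal.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        faces = profile.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    return faces  # array of (x, y, w, h) boxes
```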
To improve your performance further, beyond the two MATLAB classifiers you are using, you would need to change your technique (and probably your programming language). This is the best method so far: Facenet.

Training image classifier - Neural Network

I would like to train a convolutional neural network to detect the presence of hands in images.
The difficulty is that:
1/ the images will contain objects other than hands, e.g. a picture of a group of people where the hands are just a small part of the image
2/ hands can have many orientations/shapes, etc. (whether they are open or not, depending on the angle, etc.)
I was thinking of training the convnet on a big set of cropped hand images (+ random images without hands) and then applying the classifier to all the subsquares of my images. Is this a good approach?
Are there other examples of complex 2-class convnets / RNNs I could use for inspiration?
Thank you!
"I was thinking of training the convnet on a big set of cropped hand images (+ random images without hands) and then applying the classifier to all the subsquares of my images. Is this a good approach?"
Yes, I believe this would be a good approach. However, note that when you say random, you should perhaps sample from images where hands are most likely to appear. It really depends on your use case, and you have to tune the data set to fit what you're doing.
You should build your data set something like this:
Crop images of hands from a big image.
Sample X number of images from that same image, but not anywhere near the hand/hands.
If, however, you choose to do something like this:
Crop images of hands from a big image.
Download 1 million images (an exaggeration) that definitely don't have hands (for example deserts, oceans, skies, caves, mountains, basically lots of scenery) and then use these as your "random images without hands", you might get bad results.
The reason is that your data already has an underlying distribution. I assume that most of your images are pictures of groups of friends having a party at a house, or perhaps with buildings in the background. Introducing scenery images could corrupt this distribution, under that assumption.
Therefore, be really careful when using "random images"!
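A hedged sketch of the recommended recipe: crop negatives from the same photo while rejecting patches that overlap the annotated hand box (the (x, y, w, h) box format and the patch size are assumptions):

```python
import random

def sample_negatives(image, hand_box, n=10, size=64):
    """Crop n size-by-size patches that avoid the (x, y, w, h) hand box.
    Assumes the image is larger than the patch and the hand does not fill it."""
    h_img, w_img = image.shape[:2]
    x, y, w, h = hand_box
    negatives = []
    while len(negatives) < n:
        nx = random.randint(0, w_img - size)
        ny = random.randint(0, h_img - size)
        # Reject the candidate patch if it intersects the hand's bounding box.
        overlaps = nx < x + w and nx + size > x and ny < y + h and ny + size > y
        if not overlaps:
            negatives.append(image[ny:ny + size, nx:nx + size])
    return negatives
```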
"applying the classifier to all the subsquares of my images"
As to this part of your question: you are essentially running a sliding window over the entire image. Practically, yes, it would work. But if you care about performance, this may not be a good idea; you might want to run some segmentation algorithms first to narrow down the search space.
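For completeness, a minimal sliding-window sketch over those subsquares; `model` is assumed to be a trained Keras binary classifier, and the window size and stride are placeholders:

```python
import numpy as np

def sliding_window_scores(image, model, size=64, stride=32):
    """Yield (x, y, score) for each window, where score = P(hand)."""
    h, w = image.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patch = image[y:y + size, x:x + size]
            score = model.predict(patch[np.newaxis, ...], verbose=0)[0, 0]
            yield x, y, score
```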
"Are there other examples of complex 2-class convnets / RNNs I could use for inspiration?"
I'm not sure what you mean by complex 2-class convnets, and I'm not familiar with RNNs, so let me focus on convnets. You can basically define the convolutional net yourself: for example, the sizes of the convolutional layers, how many layers there are, your max-pooling method, how big the fully connected layer is going to be, and so on. The last layer is basically a softmax layer, where the net decides which class the input belongs to. If you have 2 classes, your last layer has 2 nodes; if you have 3, then 3; and so on, possibly up to 1000. I've not heard of convnets with more than 1000 classes, but I could be ill-informed. I hope this helps!
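As an illustration, a small 2-class convnet along those lines in Keras (the layer sizes are arbitrary choices):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),  # two nodes: hand / no hand
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```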
This seems more a matter of finding good labeled training data than of choosing a network. A neural network can learn the difference between "pictures of hands" and "pictures which incidentally include hands", but it needs some labeled examples to figure out which category an image belongs to.
You might want to take a look at this: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks