Is it important to pre-train a model by using meaningless classes? - neural-network

If I want to train an object-detection model that can detect 5 classes in a picture, is it important to pre-train this model on a large dataset like COCO (80 categories of objects), or is it enough to take just the 5 relevant categories of COCO to pre-train the model (assuming these 5 categories can be found in COCO)?

If the 5 classes that you want to detect are already in the MS-COCO dataset, there are two possible options:
Use an existing object detection model that was pretrained on the MS-COCO dataset. If the detections are satisfactory, great, and you can continue using it.
If not, you can fine-tune the model on data containing your classes of interest; that is, use the pretrained MS-COCO weights as a starting point for training the network on your own data consisting of those 5 classes (the more data, the better).
Now, if the classes that you wish to detect are not in the original MS-COCO dataset, you will be much better off using the pretrained MS-COCO weights (trained on all 80 classes, even if they are not relevant to yours) in the early convolutional layers, and then training the detection and deeper layers of the network on your dataset. This is because the low-level features (edges, blobs, etc.) that the network has learned are mostly common to all classes, and reusing them greatly speeds up training.
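Keras does not ship a COCO-pretrained detector out of the box, so here is a minimal sketch of the same freeze-early-layers idea with an ImageNet-pretrained ResNet50 backbone and a classification head standing in for a detection head (the cut-off of 100 frozen layers and the layer sizes are arbitrary illustrative choices):

```python
# A minimal sketch of the "reuse pretrained weights, retrain the head" idea.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # the 5 classes of interest

# Pretrained backbone without its original classification top.
backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the early layers: their low-level features (edges, blobs,
# textures) transfer across tasks, so only the deeper layers and the
# new head are trained. The cut-off of 100 layers is an arbitrary choice.
for layer in backbone.layers[:100]:
    layer.trainable = False

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # new task-specific head
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # small LR for fine-tuning
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```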

Related

Would the division of classes into subclasses increase the predictive accuracy of image classification?

If the images within a class vary a lot, should the class be further subdivided into subclasses in which the images are more similar, or is that unnecessary because the classifier can learn the different features anyway?
For example, say images should be classified into 3 classes: flowers, cars, and chairs. Is it okay to put all the different flowers into one flower class, or would it be better to subdivide that class further into daisies, tulips, dahlias, asters, and so on? If an aster is found, then I also know that it is a flower.
It depends on your problem.
What do you need to predict? If you only need to know whether the input is a flower, a car, or a chair, then there is no need to subdivide.
However, it is always a good idea to run experiments and look at the results.
Train some models using the subdivided classes and some with the target as-is, and compare their performance.
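As a hedged sketch of such an experiment (the label mapping and the `build_model` helper are hypothetical placeholders for your own setup), you could map fine-grained labels to coarse ones and train the same architecture both ways:

```python
# Sketch: fine-grained vs. coarse training, scored on the same coarse task.
import numpy as np

FINE_TO_COARSE = {
    0: 0, 1: 0, 2: 0, 3: 0,  # daisy/tulip/dahlia/aster -> flower
    4: 1,                    # car -> car
    5: 2,                    # chair -> chair
}

def to_coarse(fine_labels):
    return np.array([FINE_TO_COARSE[y] for y in fine_labels])

# Experiment A: train on the 6 fine labels, then map predictions back
# to the 3 coarse classes before scoring.
# model_fine = build_model(num_classes=6)
# model_fine.fit(x_train, y_fine)
# coarse_preds = to_coarse(model_fine.predict(x_test).argmax(axis=1))

# Experiment B: train directly on the 3 coarse classes.
# model_coarse = build_model(num_classes=3)
# model_coarse.fit(x_train, to_coarse(y_fine))

# Compare the coarse-level accuracy of A vs. B on the same test split.
```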

TensorFlow ResNet with unbalanced classes

I use ResNet with TensorFlow to train a model with 20 classes.
My problem is that I have 6-7 classes with a LOT of samples, about the same number of classes with a medium number of samples, and the remaining classes with few samples. With this distribution, my model has too strong a tendency to predict the larger classes over the smaller ones. I've tried to balance my classes by reducing the number of samples in the large classes, and it helped give the smaller classes a place during prediction, but now I've reached a point where I can't improve my model beyond an accuracy of 90%, and I feel like I'm losing a lot of valuable information by cutting samples from my large classes.
So, before I go buy more samples, I'm wondering if there is a way to work with unbalanced classes such that the model becomes very good at recognizing whether the larger classes are present (because it has so many samples of them that it is extremely capable of recognizing their presence), and then, if they are absent, goes on to check which of the other classes are present. The idea is to use all the samples I have instead of reducing them.
I've already tried the weighted-class option in Keras/TensorFlow; it didn't help.
Besides the undersampling technique you have used so far, there are two other ways to deal with imbalanced data:
class weighting
oversampling
Oversampling is the opposite of what you did, i.e. you train on every sample of the under-represented classes multiple times. Class weighting means telling the model how much weight each class's samples should carry in the training procedure (in the weight updates, in the case of a neural network). Both are supported by TensorFlow, and you can find them in the official tutorials.
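For concreteness, here is a minimal sketch of both techniques on toy data; the arrays and class counts are made-up stand-ins for your own, and `sample_from_datasets` lives under `tf.data.experimental` in older TensorFlow versions:

```python
import numpy as np
import tensorflow as tf

# Toy imbalanced data standing in for your own arrays.
x_train = np.random.rand(1000, 32).astype("float32")
y_train = np.random.choice([0, 1, 2], size=1000, p=[0.8, 0.15, 0.05])

counts = np.bincount(y_train)

# --- Class weighting: rare classes get larger weights, so each
# misclassified rare sample moves the weights more during training.
class_weight = {c: len(y_train) / (len(counts) * n)
                for c, n in enumerate(counts)}
# model.fit(x_train, y_train, class_weight=class_weight)

# --- Oversampling: draw each class with equal probability, so samples
# from rare classes are simply seen multiple times per epoch.
per_class = [
    tf.data.Dataset.from_tensor_slices(
        (x_train[y_train == c], y_train[y_train == c])
    ).repeat()
    for c in range(len(counts))
]
balanced = tf.data.Dataset.sample_from_datasets(
    per_class, weights=[1 / len(counts)] * len(counts)
)
# model.fit(balanced.batch(32), steps_per_epoch=len(x_train) // 32)
```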

What is the difference between a base network and detection network in Deep learning?

I recently started working on object-detection algorithms, and I usually encounter models having a base network such as LeNet or PVA-Net and then a different architecture or model for detection. But I never understood how these base networks and detection networks help, and how to choose a particular model as the base or detection network.
Assume that you are building a model for object detection.
A CNN object detection model (for simplicity, let's choose SSD) may consist of a base network that serves as the feature extractor, while the detection modules take the features extracted by the base network and generate the outputs, which contain the object classes and the coordinates of the detected objects (the center (x, y), height (h), and width (w) of each predicted box).
For the base network, we usually take a pre-trained network such as ResNet, VGG, etc. that was already trained on a large dataset like ImageNet, in the hope that the base network will produce a good set of features for the detection layers (or at least that we won't need to tune the base network's parameters much during training, which helps the model converge sooner).
For the detection modules, it depends on which kind of method you want to use, for instance one-stage methods (SSD, RetinaNet, YOLO, and so on) or two-stage methods (Faster R-CNN, Mask R-CNN, etc.). There is a trade-off between accuracy and speed among those methods, which is an important factor in deciding which detection module you should pick.
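To make the base/head split concrete, here is a drastically simplified sketch in Keras: a pretrained base network plus a toy head that predicts a single box per image. Real one-stage detectors predict many boxes over anchor grids, so treat this only as an illustration of the structure:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 20  # hypothetical number of object classes

# Base network: pretrained on ImageNet, reused as a feature extractor.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # optionally freeze while the head warms up

features = layers.GlobalAveragePooling2D()(base.output)

# Detection head: one branch for class scores, one for box coordinates.
class_scores = layers.Dense(NUM_CLASSES, activation="softmax",
                            name="class")(features)
box_coords = layers.Dense(4, name="box")(features)  # (x, y, w, h)

detector = tf.keras.Model(inputs=base.input,
                          outputs=[class_scores, box_coords])
detector.compile(
    optimizer="adam",
    loss={"class": "sparse_categorical_crossentropy", "box": "mse"})
```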

Keras: using VGG16 to detect specific, non-generic item?

I'm learning about using neural networks and object detection, using Python and Keras. My goal is to detect something very specific in an image, let's say a very specific brand / type of car carburetor (part of a car engine).
The tutorials I found so far use the detection of cats and dogs as an example, and many of those use a pre-trained VGG16 network to improve performance.
If I want to detect only my specific carburetor, and don't care about anything else in the image, does it make sense to use VGG16? Is VGG16 only useful when you want to detect many generic items, rather than one specific item?
Edit: I only want to know if there is a specific object (carburetor) in the image. No need to locate or put a box around it. I have about 1000 images of this specific carburetor for the network to train on.
VGG16, or some other pretrained neural network, is primarily used for classification. That means you can use it to determine which category an image belongs to.
As I understand it, what you need is to detect where in an image a carburetor is located. For something like that you need a different, more complicated approach. You could use:
NVIDIA DetectNet,
YOLO, or
segmentation networks such as U-Net or SegNet.
VGG16 can be used for that. (Is it the best? That is an open question without a clear answer.)
But you must replace its ending to fit your needs.
While a regular VGG model has about a thousand classes at its end, a cats-vs-dogs VGG has its end changed to have two classes. In your case, you should change its ending to have only one class.
In Keras, you'd load the VGG model with the option include_top=False.
You should then add your own final Dense layers (two or three dense layers at the end), making sure that the last layer has only one unit: Dense(1, activation='sigmoid').
This will work for "detecting" (yes / no results).
But if your goal is locating/segmentation, then you should create your own version of a U-Net or a SegNet, for instance.
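Putting the yes/no recipe above into code, a minimal Keras sketch might look like this (the 224x224 input size, the frozen base, and the 256-unit hidden layer are illustrative choices):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # with ~1000 images, keep the pretrained features fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # one unit: carburetor present / absent
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, labels, validation_split=0.2, epochs=10)
```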

How many images should there be in the training and testing phases? LibSVM

I am doing face recognition using PCA and SVM, with LIBSVM as the SVM implementation in MATLAB. I am trying to implement one-vs-all classification. I have a threefold question.
First :
If I have 10 images in class 1 (of face 1), should class 2 then have 60 images (10 images of each of the other 6 faces)?
Second:
Will the accuracy depend on the number of images I take in the two classes? If yes, then:
Can the accuracy become 100% (unreasonably high) due to the large number of images in class two?
Third:
Can a single image be used for testing?
Any help will be deeply appreciated.
You are asking three questions:
(1) EDIT: Yes, exactly as you explained it in the comments. If you have 7 classes, you would train 7 classifiers. For each classifier i, you would use the images of individual i as the positive class and the images of all other individuals as the negative class.
What you describe is called one-vs-all classification, and it is a commonly used method for doing multi-class classification with a base binary classifier (such as an SVM). Let me also add that there are other methods for extending binary classifiers to multi-class classification, such as one-vs-one and error-correcting tournaments.
EDIT #2:
Let me add that one-vs-one classification is already implemented in LIBSVM, so you really don't have to do anything special. All you need to do is assign a distinct label to each of the classes in the training data (e.g. classes 1, 2, ..., 7).
If you really want to do one-vs-all (also called one-vs-the-rest), you can use it too. It is not directly implemented in LIBSVM, but since it seems you're using MATLAB, note that the LIBSVM authors make code available to implement it; see the LIBSVM FAQ.
(2) The accuracy will depend on the number of images. In ideal conditions you will have many images of all individuals to train with. But you can run into situations such as imbalanced datasets: if, for example, you train with a million images of class x and only 2 images of class y and 2 images of class z, you will have problems, because your classifier gets a much more detailed view of class x than of the other two classes. To evaluate properly, you will need a full confusion matrix (i.e. how many real objects of class x are classified as class y, how many real objects of class y are classified as class x, and so on, for every pair of classes).
(3) Yes, it can.
EDIT #3:
It seems, from the comments of the LIBSVM authors, that the accuracy of one-vs-one is similar to that of one-vs-all, with the difference that one-vs-one is faster to train, which is why they implemented one-vs-one in their system.
To train a multi-class model using LIBSVM, you would use svmtrain and invoke it only once: class 1 contains the images of individual 1, class 2 the images of individual 2, ..., and class 7 the images of individual 7.
To predict, after training your model, you would use svmpredict.
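For reference, the same workflow can be sketched in Python with scikit-learn's SVC, which wraps LIBSVM internally (the MATLAB svmtrain/svmpredict calls follow the same pattern); the random vectors below are toy stand-ins for your PCA-projected face images:

```python
import numpy as np
from sklearn.svm import SVC

X_train = np.random.rand(70, 50)           # 7 people x 10 images, 50 PCA components
y_train = np.repeat(np.arange(1, 8), 10)   # labels 1..7, one per individual

# One call trains the full multi-class model; LIBSVM builds the
# one-vs-one binary classifiers internally.
clf = SVC(kernel="linear").fit(X_train, y_train)

# Prediction works on a single test image too (question 3).
X_test = np.random.rand(1, 50)
print(clf.predict(X_test))  # e.g. array([3])

# For evaluation (question 2), inspect a full confusion matrix rather
# than accuracy alone, e.g. sklearn.metrics.confusion_matrix(y_true, y_pred).
```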