So I found the cascade object detector in MATLAB that uses the Viola-Jones algorithm to detect faces. It is very easy to use and works great!
But I have a few questions.
The Viola-Jones method has four stages:
Haar Feature Selection
Creating an Integral Image
Adaboost Training
Cascading Classifiers
In MATLAB I can use FrontalFaceCART and FrontalFaceLBP. These are trained cascade classification models, so they would be part of stage 4, right?
But what is the difference between stage 1 and stage 4 if I use FrontalFaceCART? Both use Haar features, it says.
Can we say that FrontalFaceCART and FrontalFaceLBP are two different ways of detecting faces? Can I compare the two against each other to see which one is better?
Or should I find another method to compare against Viola-Jones?
Are there other face detection methods that are easy to implement in MATLAB?
I found some on the internet (using skin color, etc.), but MATLAB is quite new to me, so those codes were a bit too complicated for me.
The main difference is that FrontalFaceCART and FrontalFaceLBP have been trained on different data sets. Also, as the name suggests, FrontalFaceLBP uses LBP features instead of Haar.
The original Viola-Jones algorithm used Haar features. However, it has since been extended to use other types of features; vision.CascadeObjectDetector supports Haar, LBP, and HOG features.
To compare which one is better, you would need ground-truth images, i.e. images with the faces labeled by hand. I am sure you can find a benchmark data set on the web. Alternatively, you can label your own images using the Training Image Labeler app (trainingImageLabeler).
Also, if you are not happy with the accuracy of the classifiers that come with vision.CascadeObjectDetector, you can train your own using the trainCascadeObjectDetector function.
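For example, a minimal comparison sketch could look like this (the image file name and the ground-truth box are hypothetical; bboxOverlapRatio needs a reasonably recent Computer Vision System Toolbox release):

    % Create one detector per bundled model.
    detectorCART = vision.CascadeObjectDetector('FrontalFaceCART');
    detectorLBP  = vision.CascadeObjectDetector('FrontalFaceLBP');

    img = imread('testFace.jpg');        % hypothetical test image

    bboxCART = step(detectorCART, img);  % M-by-4 [x y width height] boxes
    bboxLBP  = step(detectorLBP,  img);

    % With hand-labeled ground truth you can score each detector's output,
    % e.g. via the overlap ratio of detected and true boxes
    % (assuming each detector returned at least one box).
    gtBox       = [120 80 64 64];        % hypothetical hand-labeled box
    overlapCART = bboxOverlapRatio(bboxCART, gtBox);
    overlapLBP  = bboxOverlapRatio(bboxLBP,  gtBox);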
I am trying to detect faces using the MATLAB built-in Viola-Jones face detection. Is there any way I can combine two classification models like "FrontalFaceCART" and "ProfileFace" into one in order to get a better result?
Thank you.
You can't combine models. That makes no sense in any classification task, since every classifier is different: it works differently (there is a different algorithm behind it) and may also be trained differently.
According to the documentation for the classification models (which can be found here), your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers based on classification and regression tree (CART) analysis.
ProfileFace is composed of weak classifiers based on decision stumps.
More information can be found in the link provided, but you can easily see that their inner workings are rather different, so you cannot mix or combine them.
It's like (in machine learning) mixing a Support Vector Machine with a K-Nearest Neighbour classifier: the first uses separating hyperplanes, whereas the latter is simply based on distances.
You can, however, train several models in parallel (i.e. independently) and choose the one that suits you best (e.g. smallest error rate / highest accuracy): you basically create as many different classifiers as you like, give them the same training set, evaluate each one's accuracy (and/or other metrics), and choose the best model.
One option is to build a hierarchical classifier: in a first step you use the frontal-face classifier (assuming that most pictures are frontal faces), and if that classifier finds nothing, you try the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it; in my case the faces were usually in the upper middle part of the picture.
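A minimal sketch of that fallback with the MATLAB detectors (the file name is hypothetical):

    frontalDetector = vision.CascadeObjectDetector('FrontalFaceCART');
    profileDetector = vision.CascadeObjectDetector('ProfileFace');

    img  = imread('face.jpg');
    bbox = step(frontalDetector, img);   % try frontal faces first

    if isempty(bbox)                     % fall back to profile faces
        bbox = step(profileDetector, img);
    end

    if ~isempty(bbox)
        imshow(insertShape(img, 'Rectangle', bbox));
    end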
To further improve performance beyond the two classifiers you are using in MATLAB, you would need to change your technique (and probably your programming language). This is the best method so far: FaceNet.
Good morning, colleagues!
I am very interested in training a new model from my own data set of faces!
I have found little information about this topic, so I hope these notes can help other people and that I can get some answers as well.
I will try to explain the steps I needed to take to train my own model, and then ask some questions...
I downloaded the latent SVM code from: http://cs.brown.edu/~pff/latent-release4/
I downloaded the PASCAL VOC 2008 code (devkit) from: http://host.robots.ox.ac.uk/pascal/VOC/voc2008/index.html
I emulated the file/folder structure of PASCAL VOC with my own data set:
Annotations, where I created one .xml per image in which I defined a single object, face (each image contains only one face). I did not define difficulties or poses... (a minimal example annotation follows this list).
JPEGImages, where I stored all the images.
ImageSets, where I defined three files:
test.txt, where I wrote the file names of my positive samples
train.txt, where I wrote the file names of my negative samples
trainval.txt, where I wrote the file names of my positive samples (exactly the same file as test.txt).
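In case it helps others, this is the kind of minimal annotation file I mean (all field values are hypothetical; I only fill in the object name and the bounding box):

    <annotation>
      <folder>myFaces</folder>
      <filename>img_0001.jpg</filename>
      <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
      </size>
      <object>
        <name>face</name>
        <bndbox>
          <xmin>210</xmin>
          <ymin>95</ymin>
          <xmax>330</xmax>
          <ymax>240</ymax>
        </bndbox>
      </object>
    </annotation>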
I changed some things in globals.m and VOCinit.m (to tell the algorithm the paths and locations of some files...).
Then I ran the training with the command: pascal('face', 1);
Following these steps, the training runs to completion without failing and I get my own model, BUT I have some doubts...
Can you see anything weird in my explanation? Could it work?
Must the files test.txt and trainval.txt be equal? Why, and what does that mean?
Do I have to choose the number of parts I want in the model inside the function?
Please imagine I have two kinds of samples (frontal faces and side faces) and I want to detect both... How can I address this? I thought I would have to train a model with two components... but how can I tell the training code which samples are frontal and which are side views? In the annotations, with the pose label? (I don't think so...) Is there another way to handle this?
Thank you for your time!!
I hope you can solve my doubts :)
I think test.txt should contain the samples (images) that will be used to estimate how good the system is after learning the faces, whereas trainval.txt is used during the learning stage (training) to fine-tune the parameters of the model; this split is an essential part of supervised learning.
Also, it is very hard to have one single SVM classify faces that are both frontal and sideways. Here is my suggestion (a rough MATLAB sketch follows the steps):
Train one SVM to detect whether the input image is a frontal face or a sideways face. Call this something like SVM-0.
Train another SVM for frontal faces. This SVM will classify all your individuals. Note, however, that an SVM is usually a binary classifier, so make sure you choose one with a multiclass architecture. Call this SVM-F.
Train a final SVM for sideways faces. Again, use a multiclass SVM. Call it SVM-S.
Present the input image to SVM-0; if it detects a frontal face, present the input to SVM-F; otherwise, give it to SVM-S.
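A rough MATLAB sketch of that routing, assuming you already have feature matrices and categorical label vectors and have the Statistics and Machine Learning Toolbox (all variable names and the feature extractor below are hypothetical placeholders):

    % SVM-0: binary frontal-vs-sideways classifier.
    svm0 = fitcsvm(Xpose, poseLabels);       % poseLabels: categorical 'frontal'/'sideways'

    % SVM-F and SVM-S: multiclass SVMs (via fitcecoc).
    svmF = fitcecoc(Xfrontal, frontalIdentities);
    svmS = fitcecoc(Xside,    sideIdentities);

    % At test time, route the input through SVM-0 first.
    x = extractMyFeatures(testImage);        % hypothetical feature extraction
    if predict(svm0, x) == "frontal"
        identity = predict(svmF, x);
    else
        identity = predict(svmS, x);
    end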
In my experience, you should expect very low performance from SVM-S; it is a hard problem to solve. Frontal faces, however, are not a big deal, unless you are working with faces that vary in pose, illumination, and expression (PIE). Face recognition is greatly affected by PIE variations in the images.
I also recommend this website; it contains very good information and tutorials for beginners, with or without experience.
I am trying to implement feature matching across multiple images. The idea is to track some features through an image data set. I am using mexopencv in MATLAB, and the basics of the algorithm are (a rough sketch with the equivalent toolbox calls follows the list):
1. Feature Detection using SIFT or SURF
2. Feature Description using SIFT or SURF
3. Feature matching using Flann matcher or Brute Force
4. Filtering matches using RANSAC
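For reference, a rough sketch of the same pipeline using Computer Vision Toolbox functions (the file names are hypothetical and the mexopencv calls I actually use are analogous; older releases use estimateGeometricTransform instead of estimateGeometricTransform2d):

    I1 = rgb2gray(imread('frame1.jpg'));
    I2 = rgb2gray(imread('frame2.jpg'));

    % 1-2. Detect and describe features (SURF here).
    pts1 = detectSURFFeatures(I1);  [f1, vpts1] = extractFeatures(I1, pts1);
    pts2 = detectSURFFeatures(I2);  [f2, vpts2] = extractFeatures(I2, pts2);

    % 3. Match descriptors.
    idxPairs = matchFeatures(f1, f2);
    matched1 = vpts1(idxPairs(:, 1));
    matched2 = vpts2(idxPairs(:, 2));

    % 4. Filter the matches with RANSAC by fitting a geometric transform.
    [tform, inlierIdx] = estimateGeometricTransform2d(matched1, matched2, 'projective');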
My problem is the following:
With a single object in the scene, all of the tracked features lie on that object. However, when I add another object to the scene, the tracked features appear only on the new object and there are none on the first object. Is there an explanation for why this happens?
Image 1
Image 2
P.S.: The features shown on each image are the ones that are tracked across the whole data set (8 images).
I think I found the reason for the features being found on only one object. As I mentioned in a comment, RANSAC tries to find the single best model when filtering the matches. Since there is a change in depth between the two objects, there are really two models to be fitted. I searched for multi-model fitting and found that Sequential RANSAC and Multi-RANSAC solve this. I tried sequential RANSAC, setting the number of models to 2, and got a nice result.
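In case it is useful, a rough sketch of the sequential idea on an existing set of putative matches (matchedPts1/matchedPts2 are hypothetical M-by-2 arrays of corresponding point locations, e.g. the .Location fields of matched feature points):

    remaining1 = matchedPts1;
    remaining2 = matchedPts2;
    numModels  = 2;                          % one model per object / depth plane
    models     = cell(numModels, 1);

    for k = 1:numModels
        % Fit one model with RANSAC and keep its inliers.
        [tform, inlierIdx] = estimateGeometricTransform2d( ...
            remaining1, remaining2, 'projective');
        models{k} = tform;

        % Remove this model's inliers and fit the next model on what is left.
        remaining1 = remaining1(~inlierIdx, :);
        remaining2 = remaining2(~inlierIdx, :);
    end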
I have created a cascade.xml for detecting faces using the opencv_traincascade utility. I am using LBP or HOG feature-based cascades since they are much faster, and I do all my testing in MATLAB using vision.CascadeObjectDetector. But I am unsure whether MATLAB is capable of understanding and computing LBP/HOG features for a given cascade.xml file.
Is this the correct approach for testing a cascade detector? If not, what platform should I use for testing?
Yes, vision.CascadeObjectDetector supports LBP and HOG, as well as Haar features, as of release R2013a.
Furthermore, you can now train your own detector using the trainCascadeObjectDetector function, which is easier to use than opencv_traincascade. There is also the Training Image Labeler app (trainingImageLabeler), which gives you a nice GUI for labeling objects of interest in your images.
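For example, a minimal sketch (file, folder, and variable names are hypothetical):

    % Use a cascade trained with opencv_traincascade (or with MATLAB itself)
    % directly in the detector.
    detector = vision.CascadeObjectDetector('myCascade.xml');
    img      = imread('testImage.jpg');
    bbox     = step(detector, img);

    % Or train entirely in MATLAB; positiveInstances holds labeled boxes
    % (e.g. exported from the Training Image Labeler app) and negativeFolder
    % points to images that contain no faces.
    negativeFolder = fullfile(pwd, 'nonFaces');
    trainCascadeObjectDetector('myFaceDetector.xml', positiveInstances, ...
        negativeFolder, 'FeatureType', 'LBP');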
I would appreciate it if you could help me create a feature vector for a simple object using keypoints. For now, I am using the ETH-80 dataset; the objects have an almost plain blue background and the pictures are taken from different views. Like this:
After creating a feature vector, I want to train a neural network with it and use that network to recognize an input image of an object. I don't want to make it complex; the input images will be as simple as the training images.
I asked similar questions before, and someone suggested using the average value of a 20x20 neighborhood around each keypoint. I tried it, but it does not seem to work with the ETH-80 images because of the different views. That is why I am asking another question.
SURF or SIFT. Look for interest point detectors. A MATLAB SIFT implementation is freely available.
Update: Object Recognition from Local Scale-Invariant Features
SIFT and SURF features consist of two parts, the detector and the descriptor. The detector finds points in some n-dimensional space (4D for SIFT); the descriptor is used to robustly describe the surroundings of those points. The latter is increasingly used for image categorization and identification in what is commonly known as the "bag of words" or "visual words" approach. In its simplest form, one collects all descriptors from all images and clusters them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters; the centroids of these clusters, i.e. the visual words, can be used as a new descriptor for the image. The VLFeat website contains a nice demo of this approach, classifying the Caltech 101 dataset:
http://www.vlfeat.org/applications/apps.html#apps.caltech-101
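A very small sketch of that pipeline with MATLAB toolbox functions (imageFiles is a hypothetical cell array of RGB training image file names; kmeans needs the Statistics Toolbox):

    % Collect SURF descriptors from all training images.
    allDescriptors = [];
    for i = 1:numel(imageFiles)
        img   = rgb2gray(imread(imageFiles{i}));
        pts   = detectSURFFeatures(img);
        feats = extractFeatures(img, pts);        % one 64-D descriptor per point
        allDescriptors = [allDescriptors; feats]; %#ok<AGROW>
    end

    % Cluster the descriptors; the k centroids are the visual words.
    k = 200;
    [~, vocabulary] = kmeans(double(allDescriptors), k);

    % Each image is then re-described as a k-bin histogram counting how many
    % of its descriptors fall closest to each word, e.g.
    % counts = histcounts(knnsearch(vocabulary, double(feats)), 0.5:1:k+0.5);
    % These histograms are fixed-length feature vectors for your network.

Newer releases of the Computer Vision Toolbox also provide a bagOfFeatures function that wraps these steps, if you prefer not to code it by hand.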