Head detection using HOG and SVM - matlab

I am currently trying to detect heads in a sequence of real-footage images, using the HOG feature descriptor and an SVM as the classifier.
For HOG I am using Dalal's implementation in MATLAB, found at this link:
http://www.mathworks.com/matlabcentral/fileexchange/46408-histogram-of-oriented-gradients--hog--code-using-matlab
For the classifier I am using the MATLAB version of libSVM, found at this link:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
I prepared 350 positive training images and 1243 negative training images.
The HOG feature vectors extracted from the training images are converted to libsvm format and passed to the libsvm training routine to obtain a model. Each HOG vector has a length of 1764.
Regarding libSVM, I chose these parameters:
-s 0 (i.e. C-SVC)
-c 1 (i.e. cost = 1)
-t 2 (i.e. kernel = RBF)
-g 3 (i.e. gamma = 3, for the RBF kernel)
Regarding the HOG parameters, I left the cell, bin and block settings as they were in the implementation linked above.
I scan through the whole image with windows of size 128x128 and 256x256 to detect possible heads. At each window, the HOG feature vector is extracted and passed to libsvm's predict function to test whether the window should be classified as a head or not.
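For reference, here is a minimal sketch of the training and per-window prediction calls with the libsvm MATLAB interface (svmtrain and svmpredict are the actual libsvm functions; extract_hog is a hypothetical stand-in for the File Exchange HOG code, and the window coordinates are illustrative):
% labels: N x 1 (+1 = head, -1 = non-head); features: N x 1764 double
model = svmtrain(labels, features, '-s 0 -t 2 -c 1 -g 3');
% for each 128x128 scanning window at position (r, c):
window = img(r:r+127, c:c+127);
hogVec = extract_hog(window);         % hypothetical wrapper, returns 1 x 1764 double
pred = svmpredict(0, hogVec, model);  % dummy label, single test window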
However, after doing all the above, I get a large number of false negatives and can't figure out what I am doing wrong.
Can someone experienced please offer some advice on what is possibly wrong? I really need to figure this out. Much appreciated!

Related

Support Vector Machine in Matlab

I need some help. I have a *.mat MATLAB file obtained after extracting features from a 2D static image. The extraction was done with a 2D Haar wavelet in the MATLAB Apps.
The problem is: 1. How do I use this *.mat file as input to an SVM program in MATLAB?
Additional information: i. The image is an iris image.
ii. Link to a screen capture of the example output after the extraction process
Based on the image, what data is suitable/relevant to use in the SVM in order to classify the image into two classes, such as class 1 = healthy iris or class 2 = unhealthy iris? Or, if somebody already has sample MATLAB code similar to this case study, I hope you are willing to share it.
Thanks in advance for the help.
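In case it helps, here is only a minimal sketch of the usual libsvm workflow; the file name and the variable names inside the .mat file (features, labels) are hypothetical, since the actual contents of the file are not shown:
S = load('iris_features.mat');          % hypothetical file name
X = double(S.features);                 % assumed layout: one row of features per image
y = double(S.labels(:));                % +1 = healthy iris, -1 = unhealthy iris, labelled manually
model = svmtrain(y, X, '-s 0 -t 2');    % C-SVC with an RBF kernel
% sanity check on the training data itself (expect optimistic accuracy):
[pred, acc, ~] = svmpredict(y, X, model);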

How to ensure consistency in SIFT features?

I am working with a classification algorithm that requires the size of the feature vector of all samples in training and testing to be the same.
I am also using the SIFT feature extractor. This is causing problems, as the feature vector of every image comes out as a differently sized matrix. I know that SIFT detects a variable number of keypoints in each image, but is there a way to ensure that the size of the SIFT features is consistent so that I do not get a dimension mismatch error?
I have tried rootSIFT as a workaround:
[~, features] = vl_sift(single(images{i}));
double_features = double(features);
root_it = sqrt( double_features/sum(double_features) ); %root-sift
feats{i} = root_it;
This gives me a consistent 128 x 1 vector for every image, but it is not working for me, as the values in each vector are now very small and I am getting a lot of NaN in my classification results.
Is there any way to solve this?
Using SIFT, there are 2 steps you need to perform in general.
1. Extract SIFT features. These keypoints (the first output argument of your function; for vl_sift, one column per point holding its x, y, scale and orientation) are scale invariant and should in theory be present in each different image of the same object. This is not completely true: often points are unique to each frame (image). Each point is described by a 128-dimensional descriptor (the second output argument of your function).
2. Match points. Each time you compute features on a different image, the number of points computed is different! Many of them should be the same points as in the previous image, but many WON'T. You will have new points, and old points may no longer be present. This is why you should perform a feature matching step to link the points across images. Usually this is done with kNN matching or RANSAC. You can Google how to perform this task and you'll find tons of examples.
After the second step, you should have a fixed number of points for the whole set of images (assuming they are images of the same object). The number of points will be significantly smaller than in each single image (sometimes ~30 times fewer). Then do whatever you want with them!
Hint for matching: http://www.vlfeat.org/matlab/vl_ubcmatch.html
UPDATE:
You seem to be trying to train some kind of OCR system. You would probably need to match SIFT features independently for each character.
How to use vl_ubcmatch:
% vl_sift expects a single-precision grayscale image
[~, features1] = vl_sift(single(I1));
[~, features2] = vl_sift(single(I2));
matches = vl_ubcmatch(features1, features2);
You can apply dense SIFT to the image. This way you have more control over where the feature descriptors come from. I haven't used vlfeat, but looking at the documentation I see there's a function to extract dense SIFT features called vl_dsift. With vl_sift, there's also a way to bypass the detector and extract descriptors from points of your choice using the 'frames' option. Either way, it seems you can get a fixed number of descriptors.
If you are using images of the same size, dense SIFT or the frames option is fine (see the sketch below). There's another approach you can take, called the bag-of-features model (similar to the bag-of-words model), in which you cluster the features extracted from images to generate codewords and feed those into a classifier.
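As a hedged illustration of the dense approach (vl_dsift is the actual vlfeat function; the step and bin sizes, as well as the file name, are arbitrary choices for this sketch):
% dense SIFT: descriptors on a regular grid, so equally sized images
% always yield the same number of 128-D descriptors
im = im2single(rgb2gray(imread('example.jpg')));    % placeholder image name
[frames, descrs] = vl_dsift(im, 'Step', 8, 'Size', 4);
feats = double(descrs(:));    % one fixed-length vector per (same-sized) image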

PCA on SIFT descriptors and Fisher Vectors

I was reading this particular paper http://www.robots.ox.ac.uk/~vgg/publications/2011/Chatfield11/chatfield11.pdf and I find the Fisher Vector with GMM vocabulary approach very interesting and I would like to test it myself.
However, it is totally unclear (to me) how they apply PCA dimensionality reduction to the data. I mean, do they compute the feature space and, once it is computed, perform PCA on it? Or do they perform PCA on every image right after SIFT is computed and then create the feature space?
Is this supposed to be done for both the training and the test set? To me the answer is an 'obvious yes', but it is not stated clearly.
I was thinking of creating the feature space from the training set and then running PCA on it. Then I could use the PCA coefficients from the training set to reduce each image's SIFT descriptors before they are encoded into a Fisher Vector for later classification, whether it is a test or a train image.
EDIT 1:
Simplistic example:
[coef, reduced_feat_space] = pca(Feat_Space', 'NumComponents', 80);
and then (for both test and train images)
reduced_test_img = test_img * coef; % and then choose the first 80 dimensions of reduced_test_img
What do you think? Cheers
It looks to me like they do SIFT first and then do PCA. The article states in section 2.1: "The local descriptors are fixed in all experiments to be SIFT descriptors..."
Also, in the introduction section: "the following three steps: (i) extraction of local image features (e.g., SIFT descriptors), (ii) encoding of the local features in an image descriptor (e.g., a histogram of the quantized local features), and (iii) classification ... Recently several authors have focused on improving the second component." So it looks to me that the dimensionality reduction occurs after SIFT, and the paper is simply talking about a few different methods of doing this and the performance of each.
I would also guess (as you did) that you have to run it on both sets of images. Otherwise you would be using two different metrics to classify the images; it really is like comparing apples to oranges. Comparing a reduced-dimensional representation to the full one (even for the same exact image) will show some variation. In fact, that is the whole premise of PCA: you are giving up some smaller features (usually) in exchange for computational efficiency. The real question with PCA, or any dimensionality reduction algorithm, is how much information you can give up and still reliably classify/segment different data sets.
And as a last point, you have to treat both sets of images the same way, because your end goal is to use the Fisher Vector for classification, whether an image is test or training. Now imagine you decided that training images don't get PCA and test images do. Now I give you some image X; what would you do with it? How could you treat one set of images differently from another BEFORE you've classified them? Using the same technique on both sets means you'd process my image X the same way and then decide where to put it.
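To make the 'same treatment' point concrete, here is a small sketch, assuming (as in the question) that Feat_Space holds the training SIFT descriptors as columns and test_desc holds the 128 x M descriptors of one test image. Note that MATLAB's pca centres the data, so the training mean has to be reused for the test projection (this step is missing from the EDIT above):
% fit PCA on the TRAINING descriptors only (pca expects observations in rows)
[coef, train_reduced] = pca(Feat_Space', 'NumComponents', 80);
mu = mean(Feat_Space', 1);                          % training mean used internally by pca
% project TEST descriptors with the SAME coefficients and the SAME mean
test_reduced = (double(test_desc') - mu) * coef;    % M x 80, comparable to train_reduced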
Anyway, I hope that helped and wasn't too rant-like. Good luck :-)

Histogram of Oriented Gradients as feature vectors with vlfeat/libsvm

I am quite a newbie with the vlfeat library for computer vision and I am having problems with it. What I am trying to do is use Histograms of Oriented Gradients (HOG) as feature vectors for classifying images of different dimensions in LIBSVM.
The first issue I am dealing with is the fact that vl_hog returns a HOG matrix, not a vector. This is not a real issue, because I can vectorize this matrix as follows:
hog = vl_hog(image,cellSize);
features=hog(:);
The second problem is what's freaking me out. Because the images have different dimensions, the feature vectors also have different dimensions, so it's impossible to feed them to libsvm. Or am I wrong? Can I solve this in an easier way? Did I miss something?
You need to create a global representation of your local features so that you can feed your data to SVMs. One of the most popular approaches for this task is bag-of-words (bag-of-features). vlfeat has an excellent demo/example for this; you can check the code on the vlfeat website.
For your particular case, you need to arrange your training/testing data in Caltech-101-like data directories:
Letter 1
Image 1
Image 2
Image 3
Image 4
...
Letter 2
...
Letter 3
...
Then you need to adjust the following configuration settings for your case:
conf.numTrain = 15 ;
conf.numTest = 15 ;
conf.numClasses = 102 ;
This demo uses SIFT as the local feature, but you can change it to HOG afterwards; a rough sketch of the idea with HOG follows below.
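Here is a hedged sketch of the bag-of-features pipeline with HOG (vl_hog, vl_kmeans and vl_alldist2 are actual vlfeat functions; the cell size, vocabulary size and the trainImages variable are arbitrary assumptions for this sketch):
cellSize = 8; numWords = 300;
% 1) pool HOG cell descriptors from all training images
%    (trainImages: assumed cell array of image file names)
descrs = [];
for i = 1:numel(trainImages)
    hog = vl_hog(im2single(imread(trainImages{i})), cellSize);  % H x W x 31 with the default variant
    descrs = [descrs, single(reshape(hog, [], size(hog,3))')];  % 31 x numCells per image
end
% 2) cluster the pooled descriptors into a vocabulary of codewords
vocab = vl_kmeans(descrs, numWords);
% 3) represent any image, whatever its size, as a numWords-long histogram
hog = vl_hog(im2single(imread(trainImages{1})), cellSize);
cells = single(reshape(hog, [], size(hog,3))');
[~, words] = min(vl_alldist2(vocab, cells), [], 1);             % nearest codeword per cell
featVec = histcounts(words, 1:numWords+1);                      % fixed-length vector for libsvm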

How to train SVM in matlab for character recognition?

I'm a final-year student working on my major project. My project is basically to extract text from a natural scene, recognize it, and then display it in a notepad etc.
I have already extracted the text from the images and have also obtained 85 features for each extracted character.
However, for the recognition part, I have no clue as to how to train or use an SVM (support vector machine) in MATLAB so I can get a match.
Please help me out, as this is turning out to be painstakingly difficult.
If you're happy with using an existing SVM implementation, then you should either use the Bioinformatics Toolbox svmtrain or download the MATLAB version of libsvm. If you want to implement an SVM yourself, then you should understand SVM theory, and you can use quadprog to solve the appropriate optimisation problem.
With your data, you will need an N-by-85 feature matrix, where N is the number of characters, and an N-by-1 array of 'true labels' which you provide manually. Depending on which tool you use to train an SVM, the parameters to svmtrain are slightly different - check the documentation.
If you want to evaluate your SVM to show that it works, you may need to organise your data so that you can estimate the generalization error of the classifier - see cross-validation. A sketch with libsvm follows below.
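For concreteness, here is a hedged sketch with the libsvm MATLAB interface, where features is the N-by-85 double matrix and labels the N-by-1 array described above (testFeatures and testLabels stand for a held-out set; the -v flag is libsvm's built-in k-fold cross-validation):
% train a C-SVC with an RBF kernel on the N x 85 feature matrix
model = svmtrain(labels, features, '-s 0 -t 2 -c 1');
% estimate generalization error with 5-fold cross-validation instead:
cvAccuracy = svmtrain(labels, features, '-s 0 -t 2 -c 1 -v 5');
% classify new characters:
[pred, acc, ~] = svmpredict(testLabels, testFeatures, model);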