I have a problem with the `bagOfFeatures` function implemented in the MATLAB Computer Vision System Toolbox.
I'm studying the classification of different types of images. As a first step, I'm trying to use `bagOfFeatures` with different custom extractors. I want to divide my work into two branches: first detect SURF points, then extract three different types of descriptors from them, for example SURF, BRISK, and FREAK. When I use the following line in my custom extractor:
features = extractFeatures(grayImage, multiscaleGridPoints, 'Upright', true, 'Method', 'SURF');
it only works with the SURF method, but I need to be able to use different types of descriptors.
Can I use the `bagOfFeatures` function from the Computer Vision System Toolbox to do this, or does it only support SURF feature extraction?
Unfortunately, you cannot use BRISK or FREAK with MATLAB's implementation of bag-of-features, because the bag-of-features algorithm uses K-means clustering to create the "visual words". The problem is that BRISK and FREAK descriptors are binary bit strings, and you cannot cluster them with K-means, which only works on real-valued vectors.
You can certainly use different kinds of interest point detectors with MATLAB's framework. However, you are limited to descriptors which are real-valued vectors. So SURF and SIFT will work, but BRISK and FREAK will not. If you absolutely must use BRISK or FREAK, you will have to implement your own bag of features. There are several methods for clustering binary descriptors, but I do not know how well any of them work in the context of bag-of-features.
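As a minimal sketch of what staying within the real-valued constraint looks like, here is a custom extractor function in the shape `bagOfFeatures` expects (the function name and the choice to use the detection metric as the feature metric are illustrative choices, not the only option):

```matlab
function [features, featureMetrics] = exampleSurfExtractor(I)
% Sketch of a custom extractor for bagOfFeatures that keeps
% descriptors real-valued so k-means clustering can work.

    % Convert to grayscale if needed
    if size(I, 3) == 3
        I = rgb2gray(I);
    end

    % Any interest point detector is fine at this stage
    points = detectSURFFeatures(I);

    % Extract real-valued SURF descriptors. Switching 'Method' to
    % 'BRISK' or 'FREAK' here would produce binary descriptors,
    % which the k-means step inside bagOfFeatures cannot cluster.
    [features, validPoints] = extractFeatures(I, points, 'Upright', true);

    % Use the detection strength as the per-feature metric
    featureMetrics = validPoints.Metric;
end
```

You would then pass it in as `bag = bagOfFeatures(imds, 'CustomExtractor', @exampleSurfExtractor);` where `imds` is your image datastore.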
Related
First, is it valid to extract SURF features and use them for clustering? I want to cluster similar objects in images (each image contains one object).
If so, how can it be done?
I extract features like this:
I = imread('cameraman.tif');
points = detectSURFFeatures(I);
[features, valid_points] = extractFeatures(I, points);
`features` is not a vector but a matrix. Also, the number of points returned by `detectSURFFeatures` differs between images.
How should the features be used?
First, you are detecting SURF features, not SIFT features (although they serve the same basic purpose). The reason you get multiple SURF features is that SURF is a local image feature, i.e. it describes only a small portion of the image. In general, multiple features will be detected in a single image. You probably want to find a way to combine these features into a single image descriptor before clustering.
A common method for combining these features is Bag-of-words.
Since it seems you are doing unsupervised learning, you will first need to learn a codebook. A popular method is to use k-means clustering on all the SURF features you extracted in all of your images.
Use these clusters to generate a k-dimensional image descriptor by creating a histogram of "codeword appearances" for each image. In this case there is one "codeword" per cluster. A codeword is said to "appear" each time a SURF feature belongs to its associated cluster.
The histogram of codewords will act as an image descriptor for each image. You can then apply clustering on the image descriptors to find similar images. I recommend either normalizing the image descriptors to have constant norm or using the cosine similarity metric (kmeans(X,k,'Distance','cosine') in MATLAB if you use k-means clustering).
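A rough sketch of that pipeline, assuming the Statistics Toolbox functions `kmeans` and `knnsearch` are available; the image files and the codebook size `k` are illustrative:

```matlab
% Bag-of-words sketch: learn a codebook from all SURF descriptors,
% then turn each image into a k-dimensional codeword histogram.
k = 100;                                   % codebook size (illustrative)
imgs = {imread('cameraman.tif'), imread('coins.png')};

allFeatures = [];
perImage = cell(numel(imgs), 1);
for i = 1:numel(imgs)
    pts = detectSURFFeatures(imgs{i});
    f = extractFeatures(imgs{i}, pts);
    perImage{i} = f;
    allFeatures = [allFeatures; f];        %#ok<AGROW>
end

% Learn the codebook with k-means over descriptors from all images
[~, centers] = kmeans(double(allFeatures), k);

% Build one k-dimensional normalized histogram per image
descriptors = zeros(numel(imgs), k);
for i = 1:numel(imgs)
    idx = knnsearch(centers, double(perImage{i}));  % nearest codeword
    h = histcounts(idx, 1:k+1);
    descriptors(i, :) = h / norm(h);       % constant-norm descriptor
end
```

You can then feed `descriptors` into a second clustering step (or a cosine-distance k-means) to group similar images.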
That said, a solution which is likely to work better is to extract a deep feature using a convolutional neural network that was trained on a very large, diverse dataset (like ImageNet) then use those as image descriptors.
So I found the cascade object detector in MATLAB that uses the Viola-Jones algorithm to detect faces. It is very easy to use, and it works great!
But I have a few questions.
The Viola-Jones method has four stages:
Haar Feature Selection
Creating an Integral Image
Adaboost Training
Cascading Classifiers
In MATLAB I can use FrontalFace(CART) and FrontalFace(LBP). These are trained cascade classification models, so they are part of stage 4, right?
But then what is the difference between stage 1 and stage 4 if I use FrontalFace(CART)? Both use Haar features, it says.
Can we say that FrontalFace(CART) and FrontalFace(LBP) are two different ways of detecting faces? Can I compare those two against each other to see which one is better?
Or should I find another method to compare against Viola-Jones?
Are there other face detection methods that are easy to implement in MATLAB?
I found some on the internet (using skin color etc.), but MATLAB is quite new to me, so those examples felt a bit too complicated.
The main difference is that FrontalFace(CART) and FrontalFace(LBP) have been trained on different data sets. Also, from the name, I am guessing that FrontalFace(LBP) uses LBP features instead of Haar.
The original Viola-Jones algorithm used the Haar features. However, it has later been extended to use other types of features. vision.CascadeObjectDetector supports Haar, LBP, and HOG features.
To compare which one is better, you would need some ground truth images, which are images with faces labeled by hand. I am sure you can find a benchmark data set on the web. Alternatively, you can label your own images using the trainingImageLabeler app.
Also, if you are not happy with the accuracy of the classifiers that come with vision.CascadeObjectDetector, you can train your own using the trainCascadeObjectDetector function.
I am studying Support Vector Machines (SVM) by reading a lot of material. However, it seems that most of it focuses on how to classify the input 2D data by mapping it using several kernels such as linear, polynomial, RBF / Gaussian, etc.
My first question is, can SVM handle high-dimensional (n-D) input data?
According to what I found, the answer is YES!
If my understanding is correct, n-D input data will be:
1. constructed in a Hilbert (feature) space, then
2. simplified by some approach (such as PCA?) that combines it together / projects it back onto a 2-D plane, so that
3. the kernel methods can map it into a shape where a line or curve can separate it into distinct groups.
This means most of the guides/tutorials focus on step 3. But some toolboxes I've checked cannot plot input data with more than two dimensions. How can the data be projected to 2-D afterwards?
And if there is no projection of the data, how can they classify it?
My second question is: is my understanding correct?
My first question is: can SVM handle high-dimensional (n-D) input data?
Yes. I have dealt with data where n > 2500 when using LIBSVM software: http://www.csie.ntu.edu.tw/~cjlin/libsvm/. I used linear and RBF kernels.
My second question is: is my understanding correct?
I'm not entirely sure on what you mean here, so I'll try to comment on what you said most recently. I believe your intuition is generally correct. Data is "constructed" in some n-dimensional space, and a hyperplane of dimension n-1 is used to classify the data into two groups. However, by using kernel methods, it's possible to generate this information using linear methods and not consume all the memory of your computer.
I'm not sure if you've seen this already, but if you haven't, you may be interested in some of the information in this paper: http://pyml.sourceforge.net/doc/howto.pdf. I've copied and pasted a part of the text that may appeal to your thoughts:
A kernel method is an algorithm that depends on the data only through dot-products. When this is the case, the dot product can be replaced by a kernel function which computes a dot product in some possibly high dimensional feature space. This has two advantages: First, the ability to generate non-linear decision boundaries using methods designed for linear classifiers. Second, the use of kernel functions allows the user to apply a classifier to data that have no obvious fixed-dimensional vector space representation. The prime example of such data in bioinformatics are sequence, either DNA or protein, and protein structure.
It would also help if you could explain what "guides" you are referring to. I don't think I've ever had to project data on a 2-D plane before, and it doesn't make sense to do so anyway for data with a ridiculous amount of dimensions (or "features" as it is called in LIBSVM). Using selected kernel methods should be enough to classify such data.
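To make the "no 2-D projection needed" point concrete, here is a minimal sketch of training an SVM on 100-dimensional data using MATLAB's `fitcsvm` (the synthetic data and kernel settings are illustrative; LIBSVM would work equally well):

```matlab
% Two classes of 100-dimensional points; the classifier operates
% directly in the high-dimensional space, with no projection to 2-D.
rng(0);
n = 200; d = 100;
X = [randn(n, d) + 1;                  % class +1: cloud shifted up
     randn(n, d) - 1];                 % class -1: cloud shifted down
Y = [ones(n, 1); -ones(n, 1)];

% RBF kernel with an automatically chosen kernel scale
model = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', 'auto');

% Evaluate on the training data (use a held-out set in practice)
trainAcc = mean(predict(model, X) == Y);
```

Plotting is only ever a visualization aid for toy 2-D examples; the optimization itself never requires reducing the dimensionality.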
I'm trying to assess the correctness of my SURF descriptor implementation with the de facto standard framework by Mikolajczyk et al. I'm using OpenCV to detect and describe SURF features, and I use the same feature positions as input to my descriptor implementation.
To evaluate descriptor performance, the framework requires evaluating detector repeatability first. Unfortunately, the repeatability test expects a list of feature positions along with ellipse parameters defining the size and orientation of an image region around each feature. However, OpenCV's SURF detector only provides feature position, scale, and orientation.
The related paper proposes to compute those ellipse parameters iteratively from the eigenvalues of the second moment matrix. Is this the only way? As far as I can see, this would require some fiddling with OpenCV. Is there no way to compute those ellipse parameters afterwards (e.g. in Matlab) from the feature list and the input image?
Has anyone ever worked with this framework and could assist me with some insights or pointers?
You can use the file evaluation.cpp from OpenCV. It is in the directory OpenCV/modules/features2d/src. In this file you can use the class "EllipticKeyPoint", which has a function to convert a "KeyPoint" to an "EllipticKeyPoint".
Honestly, I have never worked with this framework, but I think you should see this paper about a performance evaluation of local descriptors.
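If you only have position and scale, one pragmatic approach is to treat each SURF region as a circle, since a circle of radius r corresponds to the ellipse a*x^2 + 2*b*x*y + c*y^2 = 1 with a = c = 1/r^2 and b = 0. The sketch below writes such features in the plain-text layout used by the repeatability framework; the header convention, the scale-to-radius mapping, and the keypoint values are assumptions you should check against the framework's own sample files:

```matlab
% Sketch: export (x, y, scale) keypoints as circular ellipses in the
% "u v a b c" text format expected by the repeatability test.
keypoints = [100 120 8;                % [x y scale], illustrative values
             200  80 12];

fid = fopen('features.txt', 'w');
fprintf(fid, '1.0\n%d\n', size(keypoints, 1));   % assumed header lines
for i = 1:size(keypoints, 1)
    r = keypoints(i, 3);               % assumed: scale used as radius
    a = 1 / r^2;  b = 0;  c = 1 / r^2; % circle as a degenerate ellipse
    fprintf(fid, '%f %f %f %f %f\n', keypoints(i, 1), keypoints(i, 2), a, b, c);
end
fclose(fid);
```

This sidesteps the iterative second-moment-matrix estimation entirely, at the cost of losing affine shape information the framework's ellipses could otherwise encode.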
I would appreciate it if you could help me create a feature vector for a simple object using keypoints. For now, I use the ETH-80 dataset; the objects have an almost-blue background, and the pictures are taken from different views. Like this:
After creating a feature vector, I want to train a neural network with this vector and use that network to recognize an input image of an object. I don't want to make it complex; the input images will be as simple as the training images.
I asked similar questions before, and someone suggested using the average value of a 20x20 neighborhood around the keypoints. I tried it, but it does not seem to work with the ETH-80 images because of the different views. That is why I am asking another question.
SURF or SIFT. Look for interest point detectors. A MATLAB SIFT implementation is freely available.
Update: Object Recognition from Local Scale-Invariant Features
SIFT and SURF features consist of two parts, the detector and the descriptor. The detector finds the points in some n-dimensional space (4D for SIFT); the descriptor is used to robustly describe the surroundings of said points. The latter is increasingly used for image categorization and identification in what is commonly known as the "bag of words" or "visual words" approach. In the most simple form, one can collect all data from all descriptors from all images and cluster them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters. The centroids of these clusters, i.e. the visual words, can be used as a new descriptor for the image. The VLFeat website contains a nice demo of this approach, classifying the Caltech 101 dataset:
http://www.vlfeat.org/applications/apps.html#apps.caltech-101