HOG Feature Implementation with SVM in MATLAB

I would like to do classification based on HOG Features using SVM.
I understand that the HOG feature vector is the combination of all the histograms from every cell (i.e. it becomes one aggregate histogram).
I extract HOG features using the MATLAB code on this page for the Dalal-Triggs variant.
For example, I have a grayscale image of size 384 x 512, and I extract the HOG features with 9 orientations and a cell size of 8. By doing this, I get 48 x 64 x 36 features.
How can I make this a histogram and use it with an SVM classifier?
For example, I'll have 7 classes of images and I want to do training (700 images in total for training) and then classify new data based on the model generated in the training phase.
I read that for multiclass classification we can train our SVM ONE vs ALL, which means I have to train 7 classifiers for my 7 classes.
So for the 1st training, I'll label the 1st class +1 and the rest of the classes 0.
For the 2nd training, I'll label the 2nd class +1 and the rest 0. And so on.
For example, I have these classes of colors:
red, green, blue, yellow, white, black and pink.
So for the 1st training, I make only two labels: red and not red.
For the 2nd training, the labels are green and not green.
Is that right?
The syntax to train SVM is:
SVMStruct = svmtrain(Training,Group)
But in this case, I'll have 7 SVMStructs.
The syntax to classify / test is:
Group = svmclassify(SVMStruct,Sample)
How do I handle 7 SVMStructs here?
Is that right, or is there another concept or syntax that I need to know?
And for training, I'll have 48 x 64 x 36 features per image; how can I train an SVM on these features?
Because from what I read, the examples just have a 1xN matrix of features.
Please help me...

HOG with an SVM is one of the most successful methods for object detection. To apply it, yes indeed you must have two different training datasets before anything is fed to the SVM classifier. For instance, if you want to detect apples, you must have two training datasets: positive images, which contain an apple, and negative images, which contain no apples. Then you extract HOG descriptors from both training datasets (positive and negative) separately and also label them separately (e.g. 1 for positive, 0 for negative). Afterwards, you combine the feature vectors from the positive and negative sets and feed them to the SVM classifier.
You can use SVM Light or LibSVM, which are easy and user-friendly for beginners.
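For example, a minimal sketch with the LIBSVM MATLAB interface (note that LIBSVM's svmtrain takes (labels, features), the opposite argument order of MATLAB's own svmtrain, so keep only one of them on the path; posFeatures and negFeatures are placeholder matrices with one HOG row vector per image):

% Stack positive and negative HOG features and label them 1 / 0.
X = [posFeatures; negFeatures];
y = [ones(size(posFeatures, 1), 1); zeros(size(negFeatures, 1), 1)];
model = svmtrain(y, X, '-t 0');                    % train a linear SVM (LIBSVM)
[pred, acc, ~] = svmpredict(yTest, XTest, model);  % classify test features
% (pass dummy yTest labels if the true test labels are unknown)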

The Computer Vision System Toolbox for MATLAB includes the extractHOGFeatures function, and the Statistics Toolbox includes SVM classifiers. Here's an example of how you might classify images using HOG and SVM.
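A hedged sketch of that pipeline (file names and labels are placeholders; fitcecoc, available in newer Statistics Toolbox releases, trains the underlying binary SVMs for the multiclass problem for you, while older releases only have the binary svmtrain):

% Assumes all training images have the same size, so the HOG vectors
% have the same length. 'files' and 'trainingLabels' are placeholders.
X = [];
for i = 1:numel(files)
    img = imread(files{i});
    if size(img, 3) == 3, img = rgb2gray(img); end
    X(i, :) = extractHOGFeatures(img, 'CellSize', [8 8]); %#ok<AGROW>
end
classifier = fitcecoc(X, trainingLabels);   % multiclass SVM (one-vs-one inside)
predicted  = predict(classifier, X(1, :));  % classify one sample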

1. How can I make this a histogram and use it with an SVM classifier?
One distinction I want to make is that you already have histograms of oriented gradient features. You now need to give these features as input to the SVM. It makes little sense to assign a label to each individual cell histogram, because the same HoG feature might turn up in an image labelled differently.
In practice, what is done is to build another histogram, called a bag of words, from these HoG features and give that to the SVM as input. The intuition is that if two features are very similar, you want one representation for both of them; this reduces the variance in the input data. We make this new histogram for each image.
A bag of words is created in the following way:
1) Cluster all the HoG features into 'words'. Say you have 1000 of these words.
2) Go through all HoG features and assign each feature to the word it is closest to (by Euclidean distance) among all the words in the bag.
3) Count how many features are assigned to each word in the bag. This is the 1xN (N = number of words in the bag) feature histogram which will be given as input to the SVM after labeling.
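A minimal MATLAB sketch of those three steps, assuming allDescriptors is an MxD matrix stacking one D-dimensional HoG cell descriptor per row over all training images (kmeans and knnsearch are in the Statistics Toolbox; 1000 words is an arbitrary choice):

numWords = 1000;                                 % vocabulary size
[~, vocab] = kmeans(allDescriptors, numWords);   % step 1: cluster into words
% For one image, with its descriptors as rows of hogFeatures:
idx = knnsearch(vocab, hogFeatures);             % step 2: nearest word per descriptor
bow = histcounts(idx, 1:numWords+1);             % step 3: 1xN word-count histogram
bow = bow / sum(bow);                            % (optional) normalize per image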
2. How do I carry out multi-class classification using an SVM?
Every time you retrain the SVM you get another model, so yes, you will have several models. There are two ways you might do multiclass SVM with svmtrain:
1) One vs One SVM
Train one model for each pair of label classes. For example, the input for model 1 (red vs green) would be:
bag of words features for Image 1, RED
bag of words features for Image 2, GREEN
and the input for model 2 (yellow vs green) would be:
bag of words features for Image 3, YELLOW
bag of words features for Image 2, GREEN
The above is done for each pair of label classes, so you will have N(N-1)/2 models. At test time you count the number of votes for each class from the N(N-1)/2 models to find which label to assign.
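A hedged sketch of one-vs-one training and voting with the svmtrain/svmclassify syntax from the question (X holds one bag-of-words row per image and Y is a matching cell array of label strings; both are placeholders):

classes = unique(Y);
pairs = nchoosek(1:numel(classes), 2);          % every pair of classes
models = cell(size(pairs, 1), 1);
for p = 1:size(pairs, 1)
    sel = ismember(Y, classes(pairs(p, :)));    % keep only the two classes
    models{p} = svmtrain(X(sel, :), Y(sel));
end
votes = zeros(numel(classes), 1);
for p = 1:size(pairs, 1)
    winner = svmclassify(models{p}, sample);    % pairwise winner's label
    votes = votes + strcmp(winner, classes);    % one vote for that class
end
[~, k] = max(votes);
predictedLabel = classes{k};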
2) One vs All SVM
Train one model for each label class against all the others. For example, the input for model 1 (red vs not red) would be:
bag of words features for Image 1, RED
bag of words features for Image 2, NOT RED
and the input for model 2 (green vs not green) would be:
bag of words features for Image 2, GREEN
bag of words features for Image 1, NOT GREEN
The above is done for each label class, so you will have N models. At test time you count the number of votes for each class from the N models to find which label to assign.
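And a hedged one-vs-all sketch with the same placeholder X and Y (caveat: svmclassify returns only hard 0/1 labels, so ties between the N models are possible; with a library that exposes decision values you would pick the class with the largest margin instead):

classes = unique(Y);
N = numel(classes);
models = cell(N, 1);
for k = 1:N
    yk = double(strcmp(Y, classes{k}));         % 1 = class k, 0 = the rest
    models{k} = svmtrain(X, yk);
end
claims = zeros(N, 1);
for k = 1:N
    claims(k) = svmclassify(models{k}, sample); % 1 if model k claims the sample
end
[~, k] = max(claims);
predictedLabel = classes{k};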
Read more on category-level classification here: http://www.di.ens.fr/willow/events/cvml2013/materials/slides/tuesday/Tue_bof_summer_school_paris_2013.pdf

Related

naïve Bayes classifier

I am working on a naïve Bayes classifier and would like to classify some data using MATLAB. In the example of Fisher's Iris Data as given in MATLAB (see here for details), they consider only the first 2 variables (Sepal Length & Width). I would like to proceed with classification using more features, such as Petal Length and Petal Width.
The documentation of this Fisher Iris example mentions that "You can use the two columns containing sepal measurements." I want to take 3 or 4 columns, meaning 4 properties, with 2 classes. I want to plot the classes on the x-axis and y-axis. How can I do this?
You can plot things in 3D and use color as your fourth dimension. However, this will not be readable at all, especially with large datasets.
I recommend you plot combinations of 2 features at a time, because you will normally need the color encoding for your class type.
The MATLAB machine learning app (Classification Learner) can be very helpful to you.
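As a hedged sketch (fitcnb needs a newer Statistics Toolbox release; older releases use NaiveBayes.fit), training on all four iris features and plotting every 2-feature combination colored by class:

load fisheriris                      % meas: 150x4 features, species: class labels
nb = fitcnb(meas, species);          % naive Bayes on all 4 columns
gplotmatrix(meas, [], species);      % pairwise 2D scatter plots, one color per class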

How to extract memnet heat maps with the caffe model?

I want to extract both memorability score and memorability heat maps by using the available memnet caffemodel by Khosla et al. at link
Looking at the prototxt model, I can understand that the final inner-product output should be the memorability score, but how should I obtain the memorability map for a given input image? Here are some examples.
Thanks in advance.
As described in their paper [1], the CNN (MemNet) outputs a single, real-valued memorability score. So the network they made publicly available calculates this single score for a given input image, not a heatmap.
In section 5 of the paper, they describe how to use this trained CNN to predict a memorability heatmap:
To generate memorability maps, we simply scale up the image and apply MemNet to overlapping regions of the image. We do this for multiple scales of the image and average the resulting memorability maps.
Let's consider the two important steps here:
Problem 1: Make the CNN work with any input size.
To make the CNN work on images of any arbitrary size, they use the method presented in [2].
While convolutional layers can be applied to images of arbitrary size - resulting in smaller or larger outputs - the inner product layers have a fixed input and output size.
To make an inner product layer work with any input size, you apply it just like a convolutional kernel: you reinterpret its weight matrix as a convolution kernel that spans the layer's original input region. In AlexNet, fc6 becomes a 6x6 convolution over pool5, and the later FC layers with 4096 outputs become 1x1 convolutions with 4096 feature maps.
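As a hedged numeric check of this equivalence (framework-agnostic, shown in MATLAB like the rest of this page; all sizes are made up for illustration):

H = 5; W = 5; C = 3; K = 4;          % input size and number of FC outputs
x   = rand(H, W, C);
Wfc = rand(K, H*W*C);                % the FC layer's weight matrix
yFC = Wfc * x(:);                    % ordinary fully connected forward pass
yConv = zeros(K, 1);
for k = 1:K
    kern = reshape(Wfc(k, :), [H W C]);  % row k viewed as an HxWxC kernel
    yConv(k) = sum(kern(:) .* x(:));     % convolution at its single valid position
end
max(abs(yFC - yConv))                % ~0: the two layers compute the same thing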
To do that in Caffe, you can directly follow the Net Surgery tutorial. You create a new .prototxt file in which you replace the InnerProduct layers with Convolution layers. Caffe won't recognize the weights in the .caffemodel anymore, as the layer types no longer match. So you load the old net and its parameters into Python, load the new net, assign the old parameters to the new net, and save it as a new .caffemodel file.
Now, we can run images of any dimensions (larger than or equal to 227x227) through the network.
Problem 2: Generate the heat map
As explained in the paper [1], you apply this fully-convolutional network from Problem 1 to the same image at different scales. MemNet is a re-trained AlexNet, so the default input dimension is 227x227. They mention that a 451x451 input gives an 8x8 output, which implies a total stride of 32 when applying the layers ((451 - 227) / (8 - 1) = 32, matching AlexNet's overall downsampling). So a simple example could be:
Scale 1: 227x227 → 1x1. (I guess they definitely use this scale.)
Scale 2: 259x259 → 2x2. (Wild guess)
Scale 3: 323x323 → 4x4. (Wild guess)
Scale 4: 451x451 → 8x8. (This scale is mentioned in the paper.)
So you'll just average these outputs to get your final 8x8 heatmap. To average the different-scale outputs, you'll have to upsample the low-resolution ones to 8x8 and then average.
From the paper, I assume that they use very high-resolution scales, so their heatmap ends up around the same size as the original image. They write that it takes 1s per image on a "normal" GPU. That is quite a long time, which also indicates that they probably upsample the input images to quite high dimensions.
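A hedged sketch of just the averaging step (maps is a placeholder cell array holding e.g. the 1x1, 2x2, 4x4 and 8x8 network outputs; imresize is in the Image Processing Toolbox):

target = [8 8];                      % resolution of the finest map
acc = zeros(target);
for s = 1:numel(maps)
    acc = acc + imresize(maps{s}, target, 'bilinear');  % upsample coarse maps
end
heatmap = acc / numel(maps);         % averaged memorability map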
Bibliography:
[1]: A. Khosla, A. S. Raju, A. Torralba, and A. Oliva, "Understanding and Predicting Image Memorability at a Large Scale", in: ICCV, 2015. [PDF]
[2]: J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation", in: CVPR, 2015. [PDF]

Regarding Assignment of Input and Target in Neural Network

I am designing an algorithm for OCR using a neural network. I have 100 images (each a [40x20] matrix) of each character, so my input should be 2600x800. I have some questions regarding the inputs and targets.
1) Is my input correct? And can all 2600 images be used in random order?
2) What should the target be? Do I have to define the target for all 2600 inputs?
3) As the target for the same character is a single class, what is the final target vector:
(26x800) or (2600x800)?
Your input should be correct. You have (I am guessing) 26 characters and 100 images of size 800 for each, therefore the matrix looks good. As a side note, that is a pretty big input size; you may want to consider doing PCA and training on the principal components, or just reducing the size of the images. I have been able to train NNs with 10x10 images, but bigger == more difficult. Try it, and if it doesn't work, try PCA.
(2 and 3) Of course, if you want to train a NN you need to give it inputs with outputs; how else are you going to train it? Your output should be of size 26x1 for each image, therefore the target for training should be 2600x26. In each row you should have a 1 at the index of the character the image belongs to and zeros in the rest.
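A minimal sketch of building that target matrix, assuming labels is a 2600x1 vector of class indices in 1..26:

numSamples = 2600; numClasses = 26;
targets = zeros(numSamples, numClasses);
targets(sub2ind(size(targets), (1:numSamples)', labels)) = 1;  % one-hot rows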

How to create feature vector for neural network (MATLAB)

I'm trying to use a neural network (a multilayer NN) to help me classify an input image into its respective class (3 classes).
I have done the following:
(1) read the input image (image),
(2) apply a Canny edge detector (image => edgeimage),
(3) label the regions in the edge image (length(B) = 20, no. of regions = 20),
(4) compute features.
As for computing the image features: in each region I have computed circularity and convexity, which results in 20 feature values for circularity and 20 feature values for convexity. The number of labeled regions can also vary from image to image.
In my current understanding, the number of input neurons should be equal to the number of features used. So should the number of input neurons be 2 (one for circularity and one for convexity), or should it be 40 (one for each feature value)?
I also want to know how to create the right feature-vector input and target for my problem.
thanks.

KNN Classifier using cross validation

I am trying to implement a KNN classifier using the cross-validation approach, where I have several images of a certain character for training (e.g. 5 images) and another two for testing. I get the idea of cross validation: simply choose the K with the least error value during training and then use it with the test data to find how accurate my results are.
My question is: how do I train on the images in MATLAB to get my K value? Do I compare them and try to find mismatches, or what?
Any help would be really appreciated.
First of all, you need to define your task precisely. E.g.: given an image I in R^(MxN), we wish to classify I as an image containing faces or an image without faces.
I often work with pixel classifiers, where the task is something like: for an image I, decide whether each pixel is a face pixel or a non-face pixel.
An important part of defining the task is to make a hypothesis that can be used as the basis for training a classifier. E.g.: we believe that the distribution of pixel intensities can be used to discriminate images of faces from images not containing faces.
Then you need to select some features that define your image. This can be done in many ways and you should search for what other people do when they analyse the same type of images you are working with.
One widely used method in pixel classification is to use pixel intensity values and do a multi-scale analysis of the image. The idea in multi-scale analysis is that different structures are most evident at different level of blurring called scales. As an illustration consider an image of a tree. Without blurring we notice the fine structure, such as small branches and leafs. When we blur the image we notice the trunk and major branches. This is often used as part of segmentation methods.
When you know your task and the features, you can train a classifier. If you use kNN and cross-validation to find the best k, you should split your dataset into training/test sets and then split the training set into train/validation sets. You then train using the reduced training set and use the validation set to decide which k is best. In the case of binary classification, e.g. face vs non-face, the error rate is often used as a measure of performance.
Finally you use the parameters to train the classifier on the full dataset and estimate its performance on the test set.
A classification example: With or without milk?
As a full example, consider images of a cup of coffee taken from above, so each shows the rim of the cup surrounding a brown-colored disk. Further assume that all images are scaled and cropped so the diameter of the disk is the same and the dimensions of the images are the same. To simplify the task, we convert the color images to grayscale and scale the pixel intensities to the range [0,1].
We want to train a classifier so it can distinguish coffee with milk from coffee without milk. From inspection of histograms of some of the coffee images, we see that each image has two "bumps" in the histogram that are clearly separated. We believe that these bumps correspond to foreground (coffee) and background. Now we make the hypothesis that the average intensity of the foreground can be used to distinguish between coffee+milk/coffee.
To find the foreground pixels we observe that because the foreground/background ratio is the same (by design) we can just find the intensity value that gives us that ratio for each image. Then we calculate the average intensity of the foreground pixels and use this value as a feature for each image.
If we have N images that we have manually labeled, we split this into training and test set. We then calculate the average foreground intensity for each image in the training set, giving us a set of (average foreground intensity, label) values. We want to use kNN where an image is assigned the same class as the majority class of the k closest images. We measure the distance as the absolute value of the difference in average foreground pixel intensity.
We search for the optimal k with 2-fold cross validation (aka holdout): we test k = {1,3,5} and select the k that gives the least prediction error on the validation set.
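A hedged sketch of that search, assuming features is an Nx1 vector of average foreground intensities, labels a matching cell array of class names, and a newer Statistics Toolbox (fitcknn, cvpartition):

cv = cvpartition(labels, 'HoldOut', 0.5);        % train/validation split
ks = [1 3 5];
err = zeros(size(ks));
for i = 1:numel(ks)
    mdl = fitcknn(features(training(cv)), labels(training(cv)), ...
                  'NumNeighbors', ks(i));
    pred = predict(mdl, features(test(cv)));
    err(i) = mean(~strcmp(pred, labels(test(cv))));
end
[~, best] = min(err);
bestK = ks(best);                    % retrain on the full training set with this k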