Naïve Bayes classifier - MATLAB

I am working on a naïve Bayes classifier and would like to classify some data using MATLAB. The Fisher Iris example given in MATLAB (see here for details) considers only the first 2 variables (sepal length and width). I would like to proceed with the classification using more features, such as petal length and petal width.
The documentation for this Fisher Iris example mentions that "You can use the two columns containing sepal measurements." I want to use 3 or 4 columns, i.e. 4 properties, with 2 classes, and I still want to plot the classes on an x-axis and y-axis. How can I do this?

You can plot in 3D and use color as your fourth dimension. However, this will not be readable at all, especially with large datasets.
I recommend plotting 2D combinations of the features instead, because you will normally want to reserve color for encoding the class labels.
The MATLAB machine learning app can be very helpful to you.
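For concreteness, here is a minimal sketch of this idea, assuming a recent Statistics and Machine Learning Toolbox (fitcnb; older releases used NaiveBayes.fit instead): train naive Bayes on all four features, and use gplotmatrix to view every pairwise 2-D combination color-coded by class.

    load fisheriris                    % meas: 150x4 features, species: class labels
    nb = fitcnb(meas, species);        % naive Bayes on all 4 predictors
    err = kfoldLoss(crossval(nb));     % 10-fold cross-validated error

    % Visualize every pairwise 2-D feature combination, color-coded by class
    gplotmatrix(meas, [], species);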

Related

How to assign labels to clusters from GMM iterations?

I learned about GMMs from Understanding concept of Gaussian Mixture Models, which was very helpful. I have also implemented a GMM for fisheriris, but I couldn't use the fitgmdist function because I don't have it, so I used the code from http://chrisjmccormick.wordpress.com/2014/08/04/gaussian-mixture-models-tutorial-and-matlab-code/ instead.
In Understanding concept of Gaussian Mixture Models, Amro was able to plot the result with its labels, i.e. setosa, versicolor, and virginica. How did he do that? After some iterations, I only get mu, Sigma, and weight; there are no labels at all. I want to attach the labels (setosa, versicolor, and virginica) to the mixture components from the GMM iterations.
There are two sets of "labels" in that plot:
one is the "true" labels of the Fisher Iris dataset (the species variable which contains the class of each instance: setoas, versicolor, or virginica). Normally you wouldn't have those in a real dataset (after all the goal of clustering is to discover those groups within the data, which you don't know beforehand). I just used them here to get an idea of how well the EM clustering performed against the actual truth (the scatter points are color-coded according to the class).
the other set of labels are the clusters we found using GMM. Basically I built a 50x50 grid of 2D points to cover the entire data domain, I then assign a cluster to each of those points by computing the posterior probability and choosing the component with highest likelihood. I showed those clusters in the background color. As a nice consequence, we get to see the discriminant decision boundaries between the clusters.
You can see that the cluster of points on the left got separated quite nicely (and perfectly matched the setosa class), while the points on the right side of the plot got separated into two clusters matching the other two classes, although there were instances "misclassified", if you will (some green points on the wrong side of the boundary).
Typically, in a real setting you wouldn't have those actual classes to compare against, so there would be no way to tell how "accurate" your clustering was (there exist other metrics for clustering performance evaluation)...
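A minimal sketch of the grid trick described above, assuming fitgmdist is available (R2014b and later; earlier releases used gmdistribution.fit):

    load fisheriris
    X = meas(:,1:2);                         % two features for a 2-D plot
    gmm = fitgmdist(X, 3, 'Replicates', 5);  % 3-component GMM

    % Build a 50x50 grid covering the data domain and assign each grid
    % point to the component with the highest posterior probability
    [x1, x2] = meshgrid(linspace(min(X(:,1)), max(X(:,1)), 50), ...
                        linspace(min(X(:,2)), max(X(:,2)), 50));
    idx = cluster(gmm, [x1(:) x2(:)]);       % hard cluster assignment

    % Cluster regions shown as colored grid points, data drawn on top in black
    gscatter(x1(:), x2(:), idx); hold on
    plot(X(:,1), X(:,2), 'k.'); hold off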

The visualization of high-dimensional input for two-class classification in SVM

I am trying to find a way to visualize data with high-dimensional input for two-class classification with an SVM, before the analysis, in order to decide which kernel to use. In the documents I found online, visualization of the data is given only for two-dimensional inputs (I mean two attributes).
Another question arises: what if I have multiple classes and more than two attributes?
To be visualized, the data should be represented in 3 or fewer dimensions.
Simple PCA can be applied to reduce the dimensionality.
Alternatively, use the pre-image via MDS;
see the paper The pre-image problem in kernel methods and its MATLAB code at http://www.cse.ust.hk/~jamesk/publication.html
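As a rough sketch of the PCA route (X is your n-by-d data matrix and y the class labels; both names are assumptions here):

    [~, score] = pca(zscore(X));          % standardize, then project with PCA
    gscatter(score(:,1), score(:,2), y);  % first two principal components
    xlabel('PC 1'); ylabel('PC 2');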

HOG Feature Implementation with SVM in MATLAB

I would like to do classification based on HOG Features using SVM.
I understand that HOG features are the combination of all the histograms in every cell (i.e. they become one aggregate histogram).
I extract HOG features using the MATLAB code on this page for the Dalal-Triggs variant.
For example, I have a grayscale image of size 384 x 512; extracting HOG features at 9 orientations with a cell size of 8 gives me 48 x 64 x 36 features.
How can I make this a histogram and use it toward a SVM classifier?
Because, for example, I'll have 7 classes of images and I want to do training (700 images in total for training) and then classify new data based on the model generated in the training phase.
I read that for multiclass problems we can train the SVM one-vs-all, which means I have to train 7 classifiers for my 7 classes.
So for the 1st round of training, I'll label the 1st class +1 and the rest of the classes 0.
For the 2nd round, I'll label the 2nd class +1 and the rest 0. And so on..
For example, I have classes of colors :
Red, green, blue, yellow, white, black and pink.
So for the 1st training, I only make 2 binary labels: red and not red.
For the 2nd training, I label green and not green.
Is that right?
The syntax to train an SVM is:
SVMStruct = svmtrain(Training,Group)
But in this case, I'll have 7 SVMStructs..
The syntax to classify / test is:
Group = svmclassify(SVMStruct,Sample)
How do I handle 7 SVMStructs here?
Is that right? Or is there another concept or syntax that I have to know?
Also, for training I'll have 48 x 64 x 36 features per image; how can I train on these features in an SVM?
Because, from what I've read, SVMs expect a 1xN matrix of features..
Please help me...
HOG with an SVM is one of the most successful approaches for object detection. To apply this method, you indeed need two different training datasets before they are fed to the SVM classifier. For instance, if you want to detect apples, you need two training datasets: positive images, which contain an apple, and negative images, which contain no apples. Then you extract the features from both training datasets (positive and negative) into HOG descriptors separately and also label them separately (e.g. 1 for positive, 0 for negative). Afterwards, combine the feature vectors from positive and negative and feed them to the SVM classifier.
You can use SVM Light or LibSVM, which are easier and more user-friendly for beginners.
The Computer Vision System Toolbox for MATLAB includes the extractHOGFeatures function, and the Statistics Toolbox includes SVMs. Here's an example of how to classify images using HOG and SVM.
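As a minimal sketch of that pipeline (imgs, labels, and testImg are assumed names here: a cell array of same-sized training images, their class labels, and a test image; fitcecoc handles the multiclass reduction internally with SVM learners):

    cellSize = [8 8];
    feats = [];
    for i = 1:numel(imgs)
        % each image yields one 1-by-N HOG feature vector
        feats(i,:) = extractHOGFeatures(imgs{i}, 'CellSize', cellSize);
    end
    model = fitcecoc(feats, labels);     % multiclass SVM (one-vs-one inside)
    pred  = predict(model, extractHOGFeatures(testImg, 'CellSize', cellSize));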
1. How can I make this a histogram and use it toward a SVM classifier?
One distinction I want to make is that you already have a 'histogram' of oriented gradient features. You now need to give these features as input to the SVM. It would be odd to assign labels to each of these features individually, because the same HoG feature might turn up in another image labelled differently.
In practice what is done is to make another histogram called a bag of words from these HoG features and give them to the SVM as input. The intuition is if two features are very similar you would want one representation for both these HoG features. This reduces the variance in the input data. Now we make this new histogram for each image.
A bag of words is created in the following way:
Cluster all the HoG features into 'words'. Say you have 1000 of these words.
Go through all the HoG features and assign each HoG feature to the word that is closest (in Euclidean distance) among all the words in the bag.
Count how many features are assigned to each word in the bag. This gives the 1xN (N = number of words in the bag) feature histogram which will be given as input to the SVM after labeling.
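A minimal sketch of those three steps, assuming allDescriptors stacks the 36-dimensional cell descriptors of all training images (each 48x64x36 HOG array reshaped to 3072x36), and hog is the array of the image being encoded:

    k = 1000;                                % vocabulary size ("words")
    [~, words] = kmeans(allDescriptors, k);  % cluster centers are the words

    % Encode one image: assign each descriptor to its nearest word,
    % then count the assignments to get a 1-by-k histogram
    desc = reshape(hog, [], 36);             % 48x64x36 -> 3072x36
    idx  = knnsearch(words, desc);           % nearest word per descriptor
    bow  = histcounts(idx, 0.5:1:k+0.5);     % word counts
    bow  = bow / sum(bow);                   % optional normalization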
2. How to carry out multi-class classification using a SVM?
If you retrain the SVM, you will get another model. There are two ways you might do multiclass SVM using svmtrain:
1) One vs One SVM
Train for each label class with input in the following way:
For example, the input for model 1 will be:
bag of words features for Image 1, RED
bag of words features for Image 2, GREEN
The input for model 2 will be:
bag of words features for Image 3, YELLOW
bag of words features for Image 2, GREEN
The above is done for each pair of label classes. You will have N(N-1)/2 models. Now you can count the number of votes for each class from the N(N-1)/2 models to find which label to assign.
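A minimal sketch of one-vs-one voting using the legacy svmtrain/svmclassify syntax from the question (feats, labels, and sample are assumed names; newer releases replace these functions with fitcsvm):

    classes = unique(labels);
    N = numel(classes);
    models = {};
    for a = 1:N-1
        for b = a+1:N
            sel = ismember(labels, classes([a b]));  % keep only this pair
            models{end+1} = svmtrain(feats(sel,:), labels(sel));
        end
    end

    % Each pairwise model casts one vote; the class with most votes wins
    votes = zeros(N, 1);
    for m = 1:numel(models)
        g = char(svmclassify(models{m}, sample));
        votes = votes + strcmp(classes, g);
    end
    [~, best] = max(votes);
    predicted = classes{best};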
2) One vs All SVM
Train for each label class with input in the following way:
For example, the input for model 1 will be:
bag of words features for Image 1, RED
bag of words features for Image 2, NOT RED
The input for model 2 will be:
bag of words features for Image 2, GREEN
bag of words features for Image 1, NOT GREEN
The above is done for each label class. You will have N models. Now you can count the number of votes for each class from the N models to find which label to assign.
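And a matching sketch for one-vs-all with the same legacy syntax; note that svmclassify only gives hard 0/1 decisions, so ties between models are possible (fitcsvm's margin scores resolve this more cleanly):

    classes = unique(labels);
    N = numel(classes);
    ovaModels = cell(N, 1);
    for a = 1:N
        binLabels = double(strcmp(labels, classes{a}));  % 1 = this class, 0 = rest
        ovaModels{a} = svmtrain(feats, binLabels);
    end

    % Classify one sample: pick the model that claims it as positive
    scores = zeros(N, 1);
    for a = 1:N
        scores(a) = svmclassify(ovaModels{a}, sample);   % 1 or 0
    end
    [~, best] = max(scores);
    predicted = classes{best};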
Read more on category-level classification here: http://www.di.ens.fr/willow/events/cvml2013/materials/slides/tuesday/Tue_bof_summer_school_paris_2013.pdf

How to extract useful features from a graph?

Things are like this:
I have some graphs like the pictures above, and I am trying to classify them into different kinds so that the shape of a character can be recognized. Here is what I've done:
I apply a 2-D FFT to the graphs, so I can get a spectral analysis of them. Here are some results:
S after 2-D FFT
T after 2-D FFT
I have found that the same letter shares the same pattern in the magnitude graph after the FFT, and I want to use this feature to cluster the letters. But there is a problem: I want the features of interest to be representable in a 2-D plane, i.e. in the form (x, y), but the feature here is actually a whole graph with about 600*400 elements, and the only thing I am interested in is its shape (S is a dot in the middle, T is like a cross). So what can I do to reduce the dimensionality of the magnitude graph?
I am not sure I am clear about my question here, but thanks in advance.
You can use dimensionality reduction and related methods such as:
k-means clustering
SVM
PCA
MDS
Each of these methods can take 2-dimensional arrays and work out a coordinate frame that best distinguishes or represents your letters.
One way to start would be to reduce your 240000-dimensional space to a 26-dimensional space using any of these methods.
This would give you an 'amplitude' for each of the possible letters.
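A minimal PCA sketch of that reduction (mags is an assumed cell array holding one 600x400 magnitude image per letter sample; you need more samples than target dimensions):

    n = numel(mags);
    X = zeros(n, 600*400);
    for i = 1:n
        X(i,:) = mags{i}(:)';                  % flatten to a 240000-dim row
    end
    [~, score] = pca(X, 'NumComponents', 26);  % 26-D representation
    scatter(score(:,1), score(:,2));           % first two components as (x,y)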
But as @jucestain says, neural network classifiers are great for letter recognition.

Bayes classification in matlab

I have 50 images and created a database of the green channel of each image by separating them into two classes (skin and wound) and storing their respective green-channel values.
Also, I have 1600 wound pixel values and 3000 skin pixel values.
Now I have to use Bayes classification in MATLAB to classify the skin and wound pixels in a new (test) image using the database that I have. I have tried the built-in 'diaglinear' option, but the results are poor, with a lot of misclassification.
Also, I don't know whether the data follows a normal distribution, so I can't simply use a Gaussian estimate for the class-conditional probability density function.
Is there any way to perform pixel wise classification?
If there is any part of the question that is unclear, please ask.
I'm looking for help. Thanks in advance.
If you really want to use pixel-wise classification (quite simple, but why not?), try exploring the pixel value distributions with hist()/imhist(). It might give you a clue about their Gaussianity...
Second, you might fit your values to some appropriate curves (Gaussians?) with fit() if you have the Curve Fitting Toolbox (or do it manually). Then multiply the curves by the prior probabilities of wound/skin if you would like it to be a MAP classifier, and finally find their intersection. Voilà! You have your decision value V:
if Xi < V -> skin
else -> wound
(or with the comparison reversed, depending on which side of the intersection the skin values lie)
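A minimal sketch of this recipe, assuming skinG and woundG are vectors of green-channel training values from your database and testImg is the new RGB image:

    pdSkin  = fitdist(skinG(:),  'Normal');    % class-conditional densities
    pdWound = fitdist(woundG(:), 'Normal');
    pSkin   = numel(skinG) / (numel(skinG) + numel(woundG));  % class prior
    pWound  = 1 - pSkin;

    % Pixel-wise MAP classification of the test image's green channel
    G = double(testImg(:,:,2));
    woundMask = pWound * pdf(pdWound, G) > pSkin * pdf(pdSkin, G);
    imshow(woundMask)                          % true where wound is more likely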