How to use weighted voting for classification in Weka

I know we can use the vote classifier to combine different classifiers.
May I know if there is any way to combine the classifiers with different weights for each classifier? How would I be able to do that with Weka?
From searching online, I found that we can add weights to attributes or instances, but I would like to know how to add weights to classifiers.
If weighted vote is not possible, is there any other way I can do that? Thanks.

It does not appear possible to achieve weighted voting with Weka without modifying the Java classes yourself.
Source 1
Source 2
That being said, I believe you can achieve rudimentary weighting by giving the voting meta-classifier multiple copies of the same base classifier. This appears to be backed up by the source code.
For example:
Classifier 1: J48 decision tree
Classifier 2: J48 decision tree
Classifier 3: naive Bayes
This would allow the decision tree to vote twice and therefore have a higher weight than naive Bayes.
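To see why duplicating a classifier acts as a weight, here is a minimal Python sketch (not Weka code) of two combination rules: a weighted majority vote and a weighted average of class-probability distributions. As far as I know, Weka's Vote offers both majority voting and probability averaging among its combination rules, with averaging as the default. The function names and weight values here are illustrative assumptions. Under either rule, listing J48 twice with weight 1 gives the same result as listing it once with weight 2.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Majority vote where each classifier's vote counts `weight` times."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Label with the largest accumulated weight (ties go to the label seen first).
    return max(totals, key=totals.get)


def weighted_average(distributions, weights):
    """Weighted mean of per-classifier class-probability distributions."""
    norm = sum(weights)
    n_classes = len(distributions[0])
    return [
        sum(w * dist[i] for dist, w in zip(distributions, weights)) / norm
        for i in range(n_classes)
    ]


# J48 listed twice plus naive Bayes once ...
print(weighted_vote(["yes", "yes", "no"], [1, 1, 1]))  # yes
# ... behaves like J48 listed once with weight 2.
print(weighted_vote(["yes", "no"], [2, 1]))            # yes
```

With integer weights, the duplication trick reproduces exactly this behaviour; non-integer weights would still require modifying the Java classes.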


How Adaboost and decision tree features importances differ?

I have a multiclass classification problem and I extracted feature importances based on impurity decrease. I compared decision tree and AdaBoost classifiers, and I observed that a feature ranked at the top by the decision tree has a much lower importance according to AdaBoost.
Is that a normal behavior?
Yes, it is normal behavior. Feature importance assigns a score to every input feature of a model, but each model computes that score with a (slightly) different technique. For example, a linear regression looks at linear relationships: if a feature has a perfect linear relationship with your target, it will get a high feature importance, whereas a feature with a non-linear relationship may not improve the accuracy and therefore gets a lower importance score.
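To make "impurity decrease" concrete, here is a minimal Python sketch assuming Gini impurity and scikit-learn-style conventions (an assumption; the question does not name a library). A single tree credits a feature with the sample-weighted impurity decrease of every split that uses it, while scikit-learn's AdaBoost, as I understand it, averages its base trees' importances weighted by each tree's boosting weight. A feature that dominates one deep tree's splits can therefore end up diluted across many shallow boosted trees. The function names and numbers are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def split_importance(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the fraction of all
    training samples that reach the node (scikit-learn-style convention)."""
    n_t = sum(parent)
    children = sum(left) / n_t * gini(left) + sum(right) / n_t * gini(right)
    return n_t / n_total * (gini(parent) - children)


def boosted_importance(per_tree_importances, tree_weights):
    """Weighted average of each base tree's (normalized) importance vector."""
    norm = sum(tree_weights)
    n_features = len(per_tree_importances[0])
    return [
        sum(w * imp[j] for imp, w in zip(per_tree_importances, tree_weights)) / norm
        for j in range(n_features)
    ]


# A perfect split of a balanced 10-sample root node removes all impurity.
root_gain = split_importance([5, 5], [5, 0], [0, 5], n_total=10)  # 0.5
# Two hypothetical stumps, each using a different feature.
ensemble = boosted_importance([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])  # [0.75, 0.25]
```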
There is some research on how feature importance measures differ. An example is:

How to Combine two classification model in matlab?

I am trying to detect faces using MATLAB's built-in Viola-Jones face detection. Is there any way I can combine two classification models like "FrontalFaceCART" and "ProfileFace" into one in order to get a better result?
Thank you.
You can't combine models. That's nonsense in any classification task, since every classifier is different (it works differently, i.e. has a different algorithm behind it, and may also be trained differently).
According to the classification model(s) help (which can be found here), your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers, based on classification and regression tree analysis
ProfileFace is composed of weak classifiers, based on a decision stump
More information can be found at the link provided, but you can easily see that their inner behaviour is rather different, so you can't mix or combine them.
It's like (in Machine Learning) mixing a Support Vector Machine with a K-Nearest Neighbour: the first one uses separating hyperplanes whereas the latter is simply based on distance(s).
You can, however, train several models in parallel (i.e. independently) and choose the one that suits you best (e.g. lower error rate or higher accuracy): create as many different classifiers as you like, give them the same training set, evaluate each one's accuracy (and/or other metrics), and choose the best model.
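The selection step can be sketched in Python as follows. This is a minimal sketch: each "model" is assumed to be any callable that maps a feature row to a label, and the toy models and data are hypothetical. Every candidate is scored on the same held-out validation set rather than on the training set; that is an assumption on my part, but it avoids rewarding overfitting.

```python
def accuracy(model, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def select_best(models, X_val, y_val):
    """Score every candidate on the same validation data; return the winner."""
    scores = {name: accuracy(model, X_val, y_val) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Two hypothetical candidate classifiers.
candidates = {
    "always_a": lambda row: "a",
    "threshold": lambda row: "a" if row[0] < 5 else "b",
}
best, scores = select_best(candidates, [[1], [2], [8], [9]], ["a", "a", "b", "b"])
# best == "threshold", scores == {"always_a": 0.5, "threshold": 1.0}
```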
One option is to make a hierarchical classifier. In a first step, you use the frontal face classifier (assuming that most pictures show frontal faces). If that classifier fails, you try the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it. In my case the faces were usually in the upper-middle part of the picture.
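The cascade described above, plus the position prior, can be sketched in Python (this is not MATLAB code). The two detectors are stand-ins for anything that returns bounding boxes, for example MATLAB's frontal and profile Viola-Jones models; `cascade_detect`, `in_upper_middle` and the thresholds inside it are hypothetical names and values chosen for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Detector = Callable[[object], List[Box]]


def in_upper_middle(box: Box, img_w: int, img_h: int) -> bool:
    """Hypothetical prior: the box centre lies in the middle half horizontally
    and in the top 60% of the image vertically (thresholds are illustrative)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    return img_w / 4 <= cx <= 3 * img_w / 4 and cy <= 0.6 * img_h


def cascade_detect(image, img_w: int, img_h: int,
                   frontal: Detector, profile: Detector) -> Tuple[List[Box], str]:
    """Try the frontal detector first; fall back to the profile detector
    only if no plausible frontal face was found."""
    for name, detector in (("frontal", frontal), ("profile", profile)):
        boxes = [b for b in detector(image) if in_upper_middle(b, img_w, img_h)]
        if boxes:
            return boxes, name
    return [], "none"
```

Ordering the detectors by how common each pose is in your data keeps the cheaper, more likely check first.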
To further improve your performance beyond the two MATLAB classifiers you are using, you would need to change your technique (and probably your programming language). This is the best method so far: FaceNet.

Gaussian Naive Bayes classification

I have found the following Matlab implementation of a Naive Bayes classifier:
What is the difference between Gaussian Naive Bayes and Naive Bayes? How could I extend the above implementation to become Gaussian Naive Bayes?
How can I extend the implementation to work with 4 classes? Just by doing one-vs-all?
Thank you very much for the help.
In Naive Bayes classification we take a set of features (x0, x1, ..., xn) and try to assign them to one of a known set Y of classes (y0, y1, ..., yk). We do that by using the training data to estimate the conditional probabilities p(x|C), which tell us how often a particular class had a certain feature value in the training set, and then multiplying them together (along with the class prior p(C)).
The result is a score for each class in the set Y. We then take the highest scoring member of Y as the class that our feature set should be assigned to.
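Here is a minimal Python sketch of that scoring for categorical features. The function names, the toy data and the add-one (Laplace) smoothing are my assumptions, not part of the Wikipedia formulation. It sums logarithms instead of multiplying probabilities directly, which gives the same ranking but avoids numerical underflow when there are many features.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    # feature_counts[c][j][v]: number of class-c rows whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature j
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, row, alpha=1.0):
    """Score each class by log p(C) + sum_j log p(x_j | C); return the argmax."""
    class_counts, feature_counts, values = model
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # prior p(C)
        for j, v in enumerate(row):
            # add-one (Laplace) smoothed estimate of p(x_j = v | C)
            count = feature_counts[c][j][v]
            score += math.log((count + alpha) / (n_c + alpha * len(values[j])))
        scores[c] = score
    return max(scores, key=scores.get), scores


X = [["sunny", "hot"], ["sunny", "mild"], ["rainy", "mild"], ["rainy", "cool"]]
y = ["no", "no", "yes", "yes"]
model = train_nb(X, y)
label, _ = predict_nb(model, ["rainy", "mild"])  # label == "yes"
```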
Up until this point we haven't made any assumptions about what the p(x|C) distributions look like.
In Gaussian Naive Bayes we assume that each of those p(x|C) distributions is normal. That is the only "difference", and it really isn't much of one: GNB is just a special case of Naive Bayes.
This can be useful if you don't have a lot of training data, and are willing to make the assumption that the population data is normally distributed about the mean of the sample (training) data you do have.
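A minimal Python sketch of the Gaussian variant: for each class it estimates a prior and a per-feature mean and variance, then scores a point with the normal log-density. Because a score is computed for every class in Y and the highest one wins, the same code handles 4 (or any number of) classes directly, with no one-vs-all wrapping. The function names, the toy data and the small variance floor (which keeps a zero-variance feature from causing division by zero) are illustrative assumptions.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, var_floor=1e-9):
    """Per class: prior, per-feature mean, per-feature variance."""
    rows_by_class = defaultdict(list)
    for row, c in zip(X, y):
        rows_by_class[c].append(row)
    model = {}
    for c, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # maximum-likelihood variance plus a small floor (an assumption)
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + var_floor
            for col, m in zip(columns, means)
        ]
        model[c] = (len(rows) / len(y), means, variances)
    return model


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Argmax over all classes of log p(C) + sum_j log N(x_j; mean_Cj, var_Cj)."""
    scores = {
        c: math.log(prior)
        + sum(log_normal_pdf(x, m, v) for x, m, v in zip(row, means, variances))
        for c, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)


# Four hypothetical classes, one per corner of a square.
X = [[0, 0], [1, 1], [10, 0], [11, 1], [0, 10], [1, 11], [10, 10], [11, 11]]
y = ["a", "a", "b", "b", "c", "c", "d", "d"]
model = fit_gnb(X, y)
```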
Full disclosure: the TeX comes from Wikipedia.

How to see which Attribute (Feature) contributes most to the performance of the classification with PCA in Matlab?

I would like to perform classification on a small data set 65x9 using some of the Machine Learning Classification Methods (SVM, Decision Trees or any other).
So, before starting with the classification, I would like to do attribute analysis with PCA in MATLAB or Weka (preferably MATLAB). I would like to find out which attributes contribute most to the performance of the classifier, so that I can reduce the number of attributes and/or include more in the future. Is there any example of PCA for this in MATLAB or Weka?
PCA is an unsupervised feature extraction method.
If your question is about selecting attributes to feed into PCA, I don't know what your purpose is, but it is unnecessary to do something like that to improve classification performance. Just use all the attributes. PCA will give you new attributes (the principal components) in decreasing order of the variance they explain, and every instance can be projected onto them.
If your question is about selecting attributes after PCA, you can choose a threshold (for example 0.95) and count how many components, starting from the first, are needed to reach it. You can use the eigenvalues of the covariance matrix for this: each eigenvalue divided by the sum of all eigenvalues is the fraction of the variance its component explains.
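After running PCA, we know that the first component explains the most variance, the second explains the most of what remains, etc. Note that this ranking is by variance, not by usefulness to the classifier, since PCA never looks at the class labels.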
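The threshold rule can be sketched in a few lines of Python (the function name and eigenvalues are hypothetical): the fraction of variance explained by the first k components is the sum of the k largest eigenvalues divided by the sum of all of them. In MATLAB, `pca` returns these eigenvalues as its third output (`latent`), and its fifth output (`explained`) gives the same information as percentages.

```python
def n_components_for(eigenvalues, threshold=0.95):
    """Smallest k such that the k largest eigenvalues explain at least
    `threshold` of the total variance."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    cumulative = 0.0
    for k, lam in enumerate(vals, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(vals)


# Hypothetical eigenvalues of a 4-attribute covariance matrix.
k = n_components_for([1.0, 4.0, 2.0, 3.0], threshold=0.85)  # -> 3
```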

SVM LibSVM Ignore Feature 1,3,5 when Predicting

This question is about LibSVM, or SVMs in general.
I wonder if it is possible to categorize Feature-Vectors of different length with the same SVM Model.
Let's say we train the SVM with about 1000 Instances of the following Feature Vector:
[feature1 feature2 feature3 feature4 feature5]
Now I want to predict a test-vector which has the same length of 5.
If the probability I receive is too poor, I then want to check the subset of my test vector containing columns 2-5, i.e. I want to drop feature 1.
My question now is: Is it possible to tell the SVM only to check the features 2-5 for prediction (e.g. with weights), or do I have to train different SVM Models. One for 5 features, another for 4 features and so on...?
Thanks in advance...
You can always remove features from your test points by fiddling with the file, but I highly recommend against such an approach. An SVM model is only valid when all features are present. If you are using the linear kernel, simply setting a given feature to 0 will implicitly cause it to be ignored (though you should not do this). When using other kernels, this is very much a no-no.
Using a different set of features for predictions than the set you used for training is not a good approach.
I strongly suggest training a new model on the subset of features you wish to use for prediction.
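To illustrate the kernel remark above, here is a small Python sketch of the two decision functions. The parameter names and numbers are hypothetical, and this is not LibSVM's API. With a linear kernel, f(x) = w·x + b, so setting x_i = 0 simply drops the term w_i x_i. With an RBF kernel, f(x) = Σ a_s exp(-γ‖s − x‖²) + b, the zeroed coordinate still enters through (s_i − 0)² for every support vector s, so the model is effectively told the feature equals 0 rather than told to ignore it. This is meant to show why the trick misbehaves, not to endorse it.

```python
import math


def linear_decision(w, b, x):
    """Linear-kernel SVM decision value: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rbf_decision(support_vectors, dual_coefs, b, gamma, x):
    """RBF-kernel SVM decision value: sum_s a_s * exp(-gamma * ||s - x||^2) + b."""
    total = 0.0
    for sv, a in zip(support_vectors, dual_coefs):
        sq_dist = sum((s - xi) ** 2 for s, xi in zip(sv, x))
        total += a * math.exp(-gamma * sq_dist)
    return total + b


# Linear: zeroing feature 1 is the same as dropping its term from the sum.
print(linear_decision([2.0, 3.0], 1.0, [0.0, 5.0]))  # 16.0
print(linear_decision([3.0], 1.0, [5.0]))            # 16.0

# RBF: zeroing feature 1 is NOT the same as leaving it out.
print(rbf_decision([[1.0, 2.0]], [1.0], 0.0, 1.0, [0.0, 2.0]))  # exp(-1), about 0.368
print(rbf_decision([[2.0]], [1.0], 0.0, 1.0, [2.0]))             # 1.0
```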