Ensemble classifier with wrapper method - matlab

I'm trying to combine multiple classifiers (ANN, SVM, kNN, etc.) using ensemble learning (voting, stacking, etc.).
To build the classifiers, I have more than 20 explanatory variables available.
However, each classifier has its own best subset of explanatory variables. So I would like to first find the best combination of explanatory variables for each classifier with a wrapper method,
and then combine the classifiers using ensemble learning.
Using meta-learning in Weka, I should be able to build the ensemble itself.
But I cannot obtain the best combination of explanatory variables for each classifier, since the wrapper method summarizes the predictions of the individual classifiers.
I am not tied to Weka if this can be solved more easily in, say, MATLAB or R.

With ensemble approaches, the best results have been achieved with very simple classifiers, which in turn can be fast enough to make up for the cost of the ensemble.
This may seem counterintuitive at first: one would expect a better input classifier to produce a better output. However, there are a few reasons why this does not work.
First of all, simple classifiers can usually be tweaked more to obtain a diverse set of input classifiers. A full-dimensional method plus feature bagging gives you a diverse set of classifiers, whereas a classifier that internally does feature selection or reduction makes feature bagging largely ineffective as a source of variety. Secondly, a complex method such as SVM is more likely to optimize or converge towards the very same result. After all, complex methods are supposed to go through a much larger search space and find the best result in it, which also means you are more likely to get the same result again.
Last but not least, when using very primitive classifiers, the errors are better behaved and more likely to even out when the ensemble is combined.
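
As a rough illustration of the "simple classifier plus feature bagging" idea, here is a minimal Python sketch. It assumes scikit-learn, a synthetic dataset with 20 features standing in for the real data, and kNN as the simple base classifier; the ensemble size and the 50% feature fraction are arbitrary choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a dataset with 20 explanatory variables (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature bagging: every member sees all training rows,
# but only a random half of the features (drawn without replacement).
ensemble = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=5),
    n_estimators=25,
    max_samples=1.0,
    bootstrap=False,
    max_features=0.5,
    bootstrap_features=False,
    random_state=0,
)
ensemble.fit(X_train, y_train)

accuracy = ensemble.score(X_test, y_test)
feature_subsets = ensemble.estimators_features_  # feature indices per member
print(f"ensemble accuracy: {accuracy:.3f}")
print("features used by the first member:", sorted(feature_subsets[0]))
```

The diversity here comes only from the random feature subsets, which is why a base classifier that does its own feature selection would undo most of it.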

Related

Feature selection for one class classification

I am trying to apply a One-Class SVM, but my dataset contains too many features and I believe feature selection would improve my metrics. Are there any feature selection methods that do not need the class label?
If so, and you are aware of an existing implementation, please let me know.
You'd probably get better answers asking this on Cross Validated rather than here, but since you ask for implementations I will answer your question.
Unsupervised methods exist that allow you to eliminate features without looking at the target variable. This is called unsupervised data (dimensionality) reduction. They work by looking for features that convey similar information and then either eliminate some of those features or reduce them to fewer features whilst retaining as much information as possible.
Some examples of data reduction techniques include PCA, redundancy analysis, variable clustering, and random projections, amongst others.
You don't mention which program you're working in but I am going to presume it's Python. sklearn has implementations for PCA and SparseRandomProjection. I know there is a module designed for variable clustering in Python but I have not used it and don't know how convenient it is. I don't know if there's an unsupervised implementation of redundancy analysis in Python but you could consider making your own. Depending on what you decide to do it might not be too tricky (especially if you just do correlation based).
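
A minimal sketch of what this could look like in Python, assuming scikit-learn and NumPy and using synthetic unlabeled data. `PCA` and `SparseRandomProjection` are the sklearn classes mentioned above; `drop_correlated` is a hypothetical helper written here for illustration (a simple correlation-based redundancy filter), and the 0.95 thresholds are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import SparseRandomProjection

rng = np.random.default_rng(0)
# Synthetic unlabeled data: 200 samples, 50 features (assumption).
X = rng.normal(size=(200, 50))
# Make feature 1 a near-duplicate of feature 0 so there is redundancy to find.
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)

# PCA: keep as many components as needed to explain 95% of the variance.
X_pca = PCA(n_components=0.95, svd_solver="full").fit_transform(X)

# Sparse random projection down to a fixed number of dimensions.
X_rp = SparseRandomProjection(n_components=20, random_state=0).fit_transform(X)


def drop_correlated(X, threshold=0.95):
    """Hypothetical helper: greedily keep a feature only if its absolute
    correlation with every already-kept feature is at most `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


kept = drop_correlated(X)
print("PCA output shape:", X_pca.shape)
print("random projection output shape:", X_rp.shape)
print("features kept by the correlation filter:", len(kept))
```

None of these steps looks at a label, so the reduced data can be fed to a One-Class SVM afterwards. Note that PCA and random projection produce new combined features, while the correlation filter keeps a subset of the original ones.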
In case you're working in R, finding versions of data reduction using PCA will be no problem. For variable clustering and redundancy analysis, great packages like Hmisc and ClustOfVar exist.
You can also read about other unsupervised data reduction techniques; you might find other methods more suitable.

What is the Difference between evolutionary computing and classification?

I am looking for a comprehensive description. I couldn't find one by browsing, as the information is scattered across the web and it's outside my current scope.
Comparing classification and evolutionary computing is like comparing apples to oranges. Let me explain:
Classification is a type of problem, where the goal is to determine a label given some input. (A typical example: given pixel values, determine the image label.)
Evolutionary computing is a family of algorithms to solve different types of problems. They work with a "population" of candidates (imagine a set of different neural networks trying to solve a given problem). Somehow you evaluate how good each candidate is in the given task (typically using a "fitness function", but there are other methods). Then a new generation of candidates is produced, taking the best candidates from the previous generation as a model, and including mutations and cross-over (that is, introducing changes). Repeat until happy.
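
The loop described above can be sketched in a few lines of Python. This is only a toy illustration under assumed choices: candidates are lists of floats, the fitness function `toy_fitness` is made up for the example, and the population size, mutation scale and number of generations are arbitrary.

```python
import random


def evolve(fitness, n_genes=5, pop_size=30, n_generations=60,
           n_parents=10, mutation_scale=0.3, seed=0):
    """Toy evolutionary loop: selection, crossover, mutation, repeat."""
    rng = random.Random(seed)
    # Initial population of random candidates.
    population = [[rng.uniform(-5, 5) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(n_generations):
        # Selection: evaluate every candidate and keep the fittest as parents.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            mother, father = rng.sample(parents, 2)
            # Crossover: each gene is copied from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: small random change to every gene.
            child = [g + rng.gauss(0, mutation_scale) for g in child]
            children.append(child)
        # The parents survive unchanged next to their children.
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(candidate):
    # Made-up fitness: highest (zero) when every gene equals 1.
    return -sum((g - 1.0) ** 2 for g in candidate)


best = evolve(toy_fitness)
print("best candidate:", [round(g, 2) for g in best])
print("fitness:", round(toy_fitness(best), 4))
```

Nothing in `evolve` knows what the candidates represent; swapping in a fitness function that scores a classifier's accuracy is what turns this into "evolutionary computing used for classification".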
Evolutionary computing can absolutely be used for classification! But there are examples where it is used in different ways. You may use evolutionary computing to create an artificial neural network controlling a robot (in this case, inputs are sensor values, outputs are commands for actuators). Or to create original content free of a given goal, as in Picbreeder.
Classification may be solved using evolutionary computation (maybe this is why you were confused in the first place), but other techniques are also common. You can use decision trees or, notably, deep learning (based on backpropagation).
Deep-learning based on backpropagation may sound similar to evolutionary computation, but it is quite different. Here you have only one artificial neural network, and a clear rule (backpropagation) telling you which changes to introduce every iteration.
Hope this helps to complement other answers!
Classification algorithms and evolutionary computing are different approaches. However, they are related in some ways.
Classification algorithms aim to identify the class label of new instances. They are trained on labeled instances. For example, recognizing digits is a classification problem.
Evolutionary algorithms are used to find the minimum or maximum of an optimization problem. They randomly explore the solution space of the given problem. They can find a good solution in a reasonable time, but they are not guaranteed to find the global optimum.
In some classification approaches, evolutionary algorithms are used to find the optimal values of the parameters.

Difference between Libsvm and vl_feat SVM

I am working on an image classification project. I used the LIBSVM and VLFeat SVM implementations to train a classifier with a linear kernel. The two classifiers return different results. Can someone explain the difference between the two libraries?
From a quick glance at the websites for the two implementations, they use different algorithms to solve the SVM problem. That is, both are SVMs, but one uses one trick to find the weights and the other uses a different trick. The results should be similar, but not exactly the same.
Another possible difference is the parameters you are passing in to the implementations. The different libraries may have different default settings for certain parameters you are not explicitly setting.
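
To see the effect in a small, self-contained setting, here is a Python sketch using scikit-learn as an analogy, not the two libraries from the question. `SVC` is built on LIBSVM; `SGDClassifier` with hinge loss is used here as an assumed stand-in for a stochastic-gradient linear SVM solver. The data is synthetic, and the mapping `alpha = 1 / (C * n_samples)` is only a rough correspondence between the two regularization parameters.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (assumption for illustration).
X, y = make_blobs(
    n_samples=300,
    centers=[[-3.0, -3.0], [3.0, 3.0]],
    cluster_std=1.0,
    random_state=0,
)

C = 1.0
# Linear SVM solved by LIBSVM.
svc = SVC(kernel="linear", C=C).fit(X, y)
# Linear SVM (hinge loss) solved by stochastic gradient descent.
sgd = SGDClassifier(
    loss="hinge",
    alpha=1.0 / (C * len(X)),  # rough equivalent of C, not an exact match
    max_iter=1000,
    tol=1e-3,
    random_state=0,
).fit(X, y)

agreement = np.mean(svc.predict(X) == sgd.predict(X))
print("LIBSVM weights:", svc.coef_.ravel(), svc.intercept_)
print("SGD weights:   ", sgd.coef_.ravel(), sgd.intercept_)
print("fraction of identical predictions:", agreement)
```

Both models are linear SVMs on the same data, yet the learned weights differ because the solver, stopping rule and parameterization differ. On harder data those differences can also show up in the predictions.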

How to Combine two classification model in matlab?

I am trying to detect faces using MATLAB's built-in Viola-Jones face detection. Is there any way I can combine two classification models, such as "FrontalFaceCART" and "ProfileFace", into one in order to get a better result?
Thank you.
You can't combine models. That makes no sense in a classification task, since every classifier is different (it works differently, i.e. there is a different algorithm behind it, and it may also have been trained differently).
According to the classification model(s) help (which can be found here), your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers, based on classification and regression tree analysis
ProfileFace is composed of weak classifiers, based on a decision stump
More information can be found in the link provided, but you can easily see that their inner behaviour is rather different, so you can't mix them or combine them.
It's like (in Machine Learning) mixing a Support Vector Machine with a K-Nearest Neighbour: the first one uses separating hyperplanes whereas the latter is simply based on distance(s).
You can, however, train several models in parallel (i.e. independently) and choose the one that suits you best (e.g. the smallest error rate or highest accuracy). So you basically create as many different classifiers as you like, give them the same training set, evaluate the accuracy (and/or other metrics) of each, and choose the best model.
One option is to make a hierarchical classifier. So in a first step you use the frontal face classifier (assuming that most pictures are frontal faces). If the classifier fails, you try with the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it. In my case the faces were usually in the upper-middle part of the picture.
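
The frontal-first, profile-as-fallback logic can be sketched independently of any particular detector. The Python below is a minimal sketch: `detect_with_fallback` and the two stub detectors are hypothetical names invented for the example, and a real setup would plug in actual detector calls that return bounding boxes.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Detector = Callable[[object], List[Box]]


def detect_with_fallback(
    image: object, detectors: Sequence[Tuple[str, Detector]]
) -> Tuple[Optional[str], List[Box]]:
    """Try each detector in order and return the first non-empty result,
    together with the name of the detector that produced it."""
    for name, detector in detectors:
        boxes = detector(image)
        if boxes:
            return name, boxes
    return None, []


# Stub detectors standing in for real frontal and profile face models.
# Here an "image" is just a dict that already lists what each would find.
def frontal_stub(image) -> List[Box]:
    return image.get("frontal", [])


def profile_stub(image) -> List[Box]:
    return image.get("profile", [])


cascade = [("frontal", frontal_stub), ("profile", profile_stub)]

frontal_image = {"frontal": [(40, 30, 64, 64)]}
profile_image = {"profile": [(10, 20, 48, 48)]}
empty_image = {}

print(detect_with_fallback(frontal_image, cascade))
print(detect_with_fallback(profile_image, cascade))
print(detect_with_fallback(empty_image, cascade))
```

Putting the detector that matches most of your images first keeps the average cost low, since the second detector only runs when the first one finds nothing.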
To improve performance beyond what the two MATLAB classifiers you are using can offer, you would need to change your technique (and probably your programming language). The best method so far is FaceNet.

different results by SMO, NaiveBayes, and BayesNet classifiers in weka

I am trying different Weka classifiers on my data set. I have a small dataset and I am classifying my data into five classes.
My problem is that when I apply cross validation or percentage split classification by different classifiers, I get very different results.
For example, when I use the NaiveBayes or BayesNet classifiers, I get an F-score of around 40 for all classes, but using SMO I get an F-score of 20. The worst result is obtained with the LibLinear classifier, which gives me F-scores of around 15.
Maybe I should mention that since LibLinear classifier doesn't accept nominals, I assign a code to each of the possible nominal values and use them as Numeric values in my dataset.
Can anybody tell me why I get such different results? I expected all classifiers to have roughly similar results.
In addition, when I use LibLinear on my test set, all the data is classified under one of the classes and there are no instances in the other four classes.
Thanks in advance,
Why would you expect similar results? Especially for a small data set, I think different methods could easily lead to different predictions. Also, linear models have a tolerance threshold that can cause training to terminate before convergence. It's something you can play with in LibLINEAR or SMO, for instance.