I am performing multiclass classification and am investigating the impact of different types of features on performance. I am using an SVM one-vs-one classifier for each set of features separately, and now I want to try training a combined model that makes use of all the feature sets I have. What are the ways of creating such a combined model without simply dumping all the features together? My understanding is that this is similar to the idea of an ensemble model; however, I couldn't find examples of ensembles that operate on multiple feature sets.
I should also mention that I am looking for out-of-the-box implementations or libraries, rather than implementing the models myself.
If you only have a one-to-one mapping between your abstract objects and the features in each of your sets, then this is actually a classical ensemble model; there is no difference at all. You can think of your model as using multiple different feature extractors for the objects, thus
    ABSTRACT OBJECTS ------ FEATURES ------ MODELS
    \______________________________/           |
        your definition of data     your definition of model
while the typical ML perspective on your approach would be
    ABSTRACT OBJECTS ------ FEATURES ------ MODELS
            |               \____________________/
          data                       model
In other words, each pair (feature_set, model) defines an actual model, and as you can see, from this perspective you can simply use any ensemble technique. The fact that you somehow "hand crafted" your various feature sets does not change the fact that they are just part of modelling a function from your abstract objects (whatever they are) to actual decisions.
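Concretely, and only as an illustrative sketch (the toy data, column slices and SVM settings below are assumptions, not something taken from the question), this perspective maps onto scikit-learn as one pipeline per feature set, combined with an off-the-shelf voting ensemble:

    # One SVM per feature set, combined by soft voting. The feature sets are
    # assumed to be disjoint column ranges of a single matrix X; everything
    # here is a placeholder to show the structure.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer, StandardScaler
    from sklearn.ensemble import VotingClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 25))          # toy data: two feature sets side by side
    y = rng.integers(0, 3, size=200)        # three classes

    feature_sets = {"set_a": slice(0, 10), "set_b": slice(10, 25)}

    def member(cols):
        # Each ensemble member sees only its own feature set.
        return make_pipeline(
            FunctionTransformer(lambda X, c=cols: X[:, c]),
            StandardScaler(),
            SVC(probability=True),          # probabilities enable soft voting
        )

    ensemble = VotingClassifier(
        [(name, member(cols)) for name, cols in feature_sets.items()],
        voting="soft",
    )
    ensemble.fit(X, y)
    print(ensemble.predict(X[:5]))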
I am trying to apply a one-class SVM, but my dataset contains too many features and I believe feature selection would improve my metrics. Are there any methods for feature selection that do not need the class label?
If so, and you are aware of an existing implementation, please let me know.
You'd probably get better answers asking this on Cross Validated rather than here, although since you ask for implementations I will answer your question.
Unsupervised methods exist that allow you to eliminate features without looking at the target variable. This is called unsupervised data (dimensionality) reduction. They work by looking for features that convey similar information and then either eliminate some of those features or reduce them to fewer features whilst retaining as much information as possible.
Some examples of data reduction techniques include PCA, redundancy analysis, variable clustering, and random projections, amongst others.
You don't mention which language you're working in, but I am going to presume it's Python. sklearn has implementations of PCA and SparseRandomProjection. I know there is a module designed for variable clustering in Python, but I have not used it and don't know how convenient it is. I don't know whether there is an unsupervised implementation of redundancy analysis in Python, but you could consider writing your own. Depending on what you decide to do, it might not be too tricky (especially if you just make it correlation-based).
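For example, a rough sketch of the two scikit-learn classes mentioned above (the data shape and the number of components are arbitrary placeholders):

    # Unsupervised reduction: no label y is used anywhere.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.random_projection import SparseRandomProjection

    X = np.random.rand(500, 200)                 # 500 unlabeled samples, 200 features

    pca = PCA(n_components=20)                   # keep 20 linear combinations of features
    X_pca = pca.fit_transform(X)
    print(pca.explained_variance_ratio_.sum())   # variance retained by the 20 components

    srp = SparseRandomProjection(n_components=20, random_state=0)
    X_srp = srp.fit_transform(X)                 # label-free random projection to 20 dims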
In case you're working in R, finding versions of data reduction using PCA will be no problem. For variable clustering and redundancy analysis, great packages like Hmisc and ClustOfVar exist.
You can also read about other unsupervised data reduction techniques; you might find other methods more suitable.
I am trying to detect faces using MATLAB's built-in Viola-Jones face detection. Is there any way to combine two classification models, like "FrontalFaceCART" and "ProfileFace", into one in order to get a better result?
Thank you.
You can't combine the models themselves. That makes no sense in any classification task, since every classifier is different (it works differently, i.e. a different algorithm is behind it, and it may also be trained differently).
According to the classification model(s) help (which can be found here), your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers, based on classification and regression tree analysis
ProfileFace is composed of weak classifiers, based on a decision stump
More information can be found in the link provided, but you can easily see that their inner behaviour is rather different, so you can't mix or combine them.
It's like mixing (in machine learning) a Support Vector Machine with a k-Nearest Neighbours classifier: the former uses separating hyperplanes, whereas the latter is simply based on distances.
You can, however, train several models in parallel (i.e. independently) and choose the one that suits you best (e.g. smallest error rate / highest accuracy): you basically create as many different classifiers as you like, give them the same training set, evaluate each one's accuracy (and/or other metrics) and choose the best model.
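Sketched in Python/scikit-learn rather than MATLAB (the candidate models and the stand-in dataset are placeholders), that selection procedure is essentially:

    # Train several candidates on the same data and keep the one with the best
    # cross-validated accuracy.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)            # stand-in dataset

    candidates = {
        "svm": SVC(),
        "knn": KNeighborsClassifier(),
        "tree": DecisionTreeClassifier(),
    }

    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    print(scores, "-> keeping", max(scores, key=scores.get))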
One option is to build a hierarchical classifier: in a first step you use the frontal-face classifier (assuming that most pictures are frontal faces), and if that classifier fails, you try the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it. In my case the faces were usually in the upper middle part of the picture.
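Conceptually, the fallback scheme is just this (detect_frontal_face and detect_profile_face are hypothetical stand-ins for the two detectors, each assumed to return a possibly empty list of bounding boxes):

    # Hierarchical detection: use the frontal detector first, fall back to the
    # profile detector only when it finds nothing.
    def detect_faces(image, detect_frontal_face, detect_profile_face):
        boxes = detect_frontal_face(image)
        if not boxes:
            boxes = detect_profile_face(image)
        return boxes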
To improve performance further, beyond the two MATLAB classifiers you are using, you would need to change your technique (and probably your programming language). The best method so far is FaceNet.
I have a dataset with 20 features: 10 for age and 10 for weight. I want to classify the data with each group of features separately, then use the results from these two classifiers as input to a third classifier for the final result.
Is this possible with Weka?
Fusion of decisions is possible in WEKA (or with any two models), but not using the approach you describe.
Seeing as you're using classifiers, each model will only output a class label. You could use the two labels produced as features for a third model, but the lack of diversity in your inputs would most likely prevent the third model from giving you anything interesting.
At the most basic level, you could implement a voting scheme. Give each model a "vote" and then assume that the correct class is the majority-voted class. While this gives a rudimentary form of fusion, if you're familiar with voting theory you know that majority rule somewhat falls apart when you have more than two classes.
I recommend that you use combinatorial fusion to fuse the output of the two classifiers. A good paper on the technique is available as a free PDF here. In essence, you use the Classifier::distributionForInstance() method provided by WEKA's classifiers and then use the sum of the distributions (called "scores") to rank the classes, choosing the class with the highest rank. The paper demonstrates that this method is superior to voting alone.
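Outside WEKA, the same score-sum fusion can be sketched with any two models that expose class distributions; here predict_proba plays the role of distributionForInstance, and the models and synthetic data are placeholders:

    # Sum the class distributions of two classifiers and pick the top-ranked class.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X_age, X_weight = rng.normal(size=(100, 10)), rng.normal(size=(100, 10))
    y = rng.integers(0, 3, size=100)                     # synthetic 3-class labels

    model_a = LogisticRegression(max_iter=1000).fit(X_age, y)
    model_b = RandomForestClassifier().fit(X_weight, y)

    scores = model_a.predict_proba(X_age) + model_b.predict_proba(X_weight)
    fused = model_a.classes_[np.argmax(scores, axis=1)]  # class with highest summed score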
I'm trying to train a classifier (specifically, a decision forest) using the Matlab 'TreeBagger' class.
I notice from the online documentation for TreeBagger that there are a couple of methods/properties that could be used to see how important each feature of a data point is for distinguishing between classes of data points.
The two I found were the ComputeOOBVarImp property and the ClassificationTree.predictorImportance method. Using the latter on a decision forest/bagged ensemble of trees that I'd built, I found that many data point features had zero importance for the classifier.
Is there anything I can do with the TreeBagger class, or in conjunction with it, so that my trees use weak learners/splitting criteria that aren't just bounds on single input features, but linear combinations of these features, in order to improve the 'information gain' at each node split?
I suppose this comes down to dimensionality reduction of the data, which I have no experience dealing with in MATLAB.
Thanks.
I'm trying to combine multiple classifiers (ANN, SVM, kNN, etc.) using ensemble learning (voting, stacking, etc.).
To build a classifier, I'm using more than 20 types of explanatory variables.
However, each classifier has its own best subset of explanatory variables. So I would like to find the best combination of explanatory variables for each classifier with a wrapper method, and then combine the multiple classifiers (ANN, SVM, kNN, etc.) using ensemble learning (voting, stacking, etc.).
By using meta-learning in Weka, I should be able to build the ensemble itself.
But I cannot obtain the best combination of explanatory variables for each classifier, since the wrapper method summarizes the prediction of each classifier.
I am not tied to Weka if this can be solved more easily in, say, MATLAB or R.
With ensemble approaches, the best results have been achieved with very simple classifiers, which, on the other hand, can be pretty fast, making up for the cost of the ensemble.
This may seem counterintuitive at first: one would expect a better input classifier to produce a better output. However, there are two reasons why this does not work.
First of all, with simple classifiers you can usually tweak them more to get a diverse set of input classifiers. A full-dimensional method plus feature bagging gives you a diverse set of classifiers, whereas a classifier that internally does feature selection or reduction makes feature bagging largely ineffective for producing variety. Secondly, a complex method such as an SVM is more likely to optimize/converge towards the very same result. After all, complex methods are supposed to search a much larger space and find the best result in that space, but that also means you are more likely to get the same result again.
Last but not least, when using very primitive classifiers, the errors are better behaved and more likely to even out when the ensemble combines them.
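As a sketch of that point in scikit-learn (the thread mentions Weka/MATLAB/R, so this is only an illustration and all settings are arbitrary), feature bagging over very simple base learners looks like:

    # Many decision stumps, each trained on a random subset of the features.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)                    # stand-in dataset

    ensemble = BaggingClassifier(
        DecisionTreeClassifier(max_depth=1),             # very simple base learner
        n_estimators=100,
        max_features=0.5,                                # each stump sees half the features
        bootstrap_features=True,
        random_state=0,
    )
    print(ensemble.fit(X, y).score(X, y))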