How to quantify similarity of tree models? (XGB, Random Forest, Gradient Boosting, etc.) - classification

Are there any algorithms that quantify the similarity of tree-based models such as XGBoost? For example, suppose I train two XGBoost models on different datasets (e.g., on the folds of a cross-validation) and want to estimate the robustness or consistency of their predictions, and perhaps of how the features are used.
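For concreteness, here is the kind of comparison I have in mind (a rough sketch using xgboost's scikit-learn API on synthetic data; the agreement and rank-correlation measures are just first ideas, not an established method):

```python
# Sketch: compare two XGBoost models trained on different data by
# (a) agreement of their predictions on a shared test set and
# (b) rank correlation of their feature importances.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
# simulate "different datasets": two disjoint halves plus a common test set
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(X_pool, y_pool, test_size=0.5, random_state=1)

model_a = XGBClassifier(n_estimators=100).fit(X_a, y_a)
model_b = XGBClassifier(n_estimators=100).fit(X_b, y_b)

# (a) consistency of predictions: fraction of identical labels and
#     correlation of predicted probabilities on the shared test set
agreement = np.mean(model_a.predict(X_test) == model_b.predict(X_test))
proba_corr = np.corrcoef(model_a.predict_proba(X_test)[:, 1],
                         model_b.predict_proba(X_test)[:, 1])[0, 1]

# (b) consistency of feature usage: Spearman rank correlation of importances
rho, _ = spearmanr(model_a.feature_importances_, model_b.feature_importances_)

print(f"label agreement: {agreement:.3f}, proba corr: {proba_corr:.3f}, "
      f"importance rank corr: {rho:.3f}")
```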

Related

How do AdaBoost and decision tree feature importances differ?

I have a multiclass classification problem and I extracted feature importances based on impurity decrease. I compared a decision tree and an AdaBoost classifier and observed that a feature ranked at the top by the decision tree has a much lower importance according to AdaBoost.
Is that normal behavior?
Thanks
Yes, it is normal behavior. Feature importance assigns a score to every input feature of a model, but each model type computes that score in a (slightly) different way. For example: a linear regression looks at linear relationships. If a feature has a perfect linear relationship with your target, it will have a high feature importance. Features with a non-linear relationship may not improve the accuracy, resulting in a lower feature importance score.
There is some research related to the difference in feature importance measures. An example is:
https://link.springer.com/article/10.1007/s42452-021-04148-9
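As a quick illustration of the effect, here is a minimal sketch with scikit-learn on synthetic data (the exact importance values will vary from run to run):

```python
# The same dataset can yield quite different impurity-based importance
# rankings for a single decision tree and for AdaBoost (which averages
# importances over many weak learners, decision stumps by default).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
ada = AdaBoostClassifier(random_state=0).fit(X, y)

for i, (t_imp, a_imp) in enumerate(zip(tree.feature_importances_,
                                       ada.feature_importances_)):
    print(f"feature {i}: tree={t_imp:.3f}  adaboost={a_imp:.3f}")
```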

WEKA classifier evaluation

I'm trying to evaluate the performance of a classifier using 10-fold CV in WEKA. I have 32,000 records split across three different classes, "po", "ng", "ne".
po: ~950
ng: ~1200
ne: ~30000
How should I split the dataset for performing CV? Am I right in assuming that for CV I should have a roughly equal number of records for each class, so as to prevent unfair weighting towards the "ne" class?
Firstly, no, you need not have an equal number of cases in each class; not all datasets are balanced. That said, a severe imbalance can give misleading results. Class imbalance is a common phenomenon, but there are a few tactics to handle it:
1) Resampling the dataset
Undersampling: deleting records from the majority class.
Oversampling: adding records to the minority class.
You can use the SMOTE algorithm to do the oversampling for you (see the sketch after this list).
2) Performance Metrics
Some metrics, like Cohen's kappa, work well here: classification accuracy is normalized by the class imbalance in the data.
3) Cost Sensitive Classifier
WEKA has a CostSensitiveClassifier that can wrap any classifier and apply a custom penalty matrix for misclassification.
The challenge here is how you determine the cost, because the cost should be domain-dependent, not data-dependent.
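The question concerns WEKA, but the same tactics can be sketched in Python, assuming the imbalanced-learn package stands in for WEKA's SMOTE filter. Note that the resampling happens inside each CV fold via a pipeline, never before the split, and scoring uses Cohen's kappa:

```python
# Oversample minority classes with SMOTE inside each CV fold and score
# with Cohen's kappa, which accounts for the class imbalance.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# roughly the class proportions from the question: ~950 / ~1200 / ~30000
X, y = make_classification(n_samples=32000, n_classes=3, n_informative=5,
                           weights=[0.03, 0.04, 0.93], random_state=0)

pipe = Pipeline([("smote", SMOTE(random_state=0)),
                 ("clf", DecisionTreeClassifier(random_state=0))])

scores = cross_val_score(pipe, X, y,
                         cv=StratifiedKFold(n_splits=10),
                         scoring=make_scorer(cohen_kappa_score))
print(f"10-fold kappa: {scores.mean():.3f} +/- {scores.std():.3f}")
```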
For handling the imbalance under cross-validation, I found this link useful.
http://www.marcoaltini.com/blog/dealing-with-imbalanced-data-undersampling-oversampling-and-proper-cross-validation
Hope it helps.

unified estimation of discrete Markov Model

Background
I have a multivariate dataset, say M x N, where M is the number of variables and N is the number of samples. Now, the pattern of dependencies between the M variables changes across the N samples i.e. the pattern of dependencies is non-stationary.
Problem
I want to use a 'discrete' Markov model to characterize this change in the pattern of dependencies across samples. One way of doing this is to estimate the pattern of dependencies for each sample and then group these patterns into a small number of symbols using, e.g., k-means clustering. Then I could use Markov modelling to estimate transition probabilities between the symbols.
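For concreteness, here is a rough sketch of that sequential pipeline, where I assume sliding-window correlation matrices as the per-sample pattern of dependencies (the window size and number of symbols are arbitrary illustrative choices):

```python
# Sequential approach: windowed dependency patterns -> k-means symbols
# -> Markov transition matrix between consecutive symbols.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
M, N, window, n_symbols = 5, 1000, 50, 4
data = rng.standard_normal((M, N))  # M variables x N samples

# 1) estimate a dependency pattern per window (vectorized upper triangle
#    of the correlation matrix)
iu = np.triu_indices(M, k=1)
patterns = np.array([np.corrcoef(data[:, t:t + window])[iu]
                     for t in range(N - window)])

# 2) cluster the patterns into a small alphabet of symbols
symbols = KMeans(n_clusters=n_symbols, random_state=0).fit_predict(patterns)

# 3) estimate Markov transition probabilities between consecutive symbols
counts = np.zeros((n_symbols, n_symbols))
for s, s_next in zip(symbols[:-1], symbols[1:]):
    counts[s, s_next] += 1
row_sums = counts.sum(axis=1, keepdims=True)
transitions = np.divide(counts, row_sums,
                        out=np.zeros_like(counts), where=row_sums > 0)
print(np.round(transitions, 2))
```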
My question is: can I do all of the above in a single unified model, i.e. one that combines the Markov modelling with estimating the patterns of dependencies and clustering them into a small number of symbols? If so, wouldn't this be preferable to the sequential approach outlined above?
Any suggestions/thoughts/ideas welcome!

Which predictive modelling technique will be most helpful?

I have a training dataset that gives me the rankings of various cricket players (2008) on the basis of their performance in past years (2005-2007).
I have to develop a model using this data and then apply it to another dataset to predict the rankings of players (2012) from the data already given to me (2009-2011).
Which predictive modelling will be best for this? What are the pros and cons of using the different forms of regression or neural networks?
The type of model to use depends on different factors:
Amount of data: if you have very little data, you are better off with a simple prediction model like linear regression. If you use a prediction model that is too powerful, you run the risk of overfitting, with the effect that it generalizes badly to new data. Now you might ask, what is little data? That depends on the number of input dimensions and on the underlying distributions of your data.
Your experience with the model. Neural networks can be quite tricky to handle if you have little experience with them. There are quite a few parameters to be optimized, such as the network layer structure, the number of iterations, the learning rate, and the momentum term, just to mention a few. Linear prediction is a lot easier to handle with respect to this "meta-optimization".
A pragmatic approach, if you still cannot decide on one of the methods, would be to evaluate a couple of different prediction methods. Take some of your data where you already have target values (the 2008 data), split it into training and test data (e.g., keep some 10% as test data), train and test using cross-validation, and compute the error rate by comparing the predicted values with the target values you already have.
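A minimal sketch of that evaluation loop, with synthetic data standing in for the player statistics and two placeholder models:

```python
# Compare a simple and a more powerful model by cross-validated error.
# X and y are synthetic placeholders for the player feature matrix and
# the 2008 rankings, which aren't shown in the question.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 6))          # e.g., 120 players x 6 statistics
y = X @ rng.standard_normal(6) + 0.1 * rng.standard_normal(120)

for name, model in [("linear regression", LinearRegression()),
                    ("neural network", MLPRegressor(hidden_layer_sizes=(16,),
                                                    max_iter=2000,
                                                    random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(f"{name}: MSE = {-scores.mean():.3f}")
```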
One great book, which is also available on the web, is Pattern Recognition and Machine Learning by C. Bishop. It has a great introductory section on prediction models.
Which predictive modelling will be best for this? What are the pros and cons of using the different forms of regression or neural networks?
"What is best" depends on the resources you have. Full Bayesian Networks (or k-Dependency Bayesian Networks) with information theoretically learned graphs, are the ultimate 'assumptionless' models, and often perform extremely well. Sophisticated Neural Networks can perform impressively well too. The problem with such models is that they can be very computationally expensive, so models that employ methods of approximation may be more appropriate. There are mathematical similarities connecting regression, neural networks and bayesian networks.
Regression is actually a simple form of neural network with some additional assumptions about the data. Neural networks can be constructed to make fewer assumptions about the data, but, as Thomas789 points out, at the cost of being considerably more difficult to understand (and sometimes monumentally difficult to debug).
As a rule of thumb: the more assumptions and approximations in a model, the easier it is to (a) understand and (b) find the computational power necessary, but potentially at the cost of performance or "overfitting" (when a model fits the training data well but doesn't extrapolate to the general case).
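A small numeric illustration of the regression/neural-network connection mentioned above: a "network" consisting of a single linear layer with no hidden units, trained by gradient descent on the squared loss, recovers the ordinary least-squares solution (plain NumPy, synthetic data):

```python
# Linear regression is a neural network with one linear layer and no
# hidden units: gradient descent on the mean squared error converges
# to the same weights as the closed-form least-squares solution.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(200)

# closed-form ordinary least squares
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# "one-neuron network": same model, trained by gradient descent
w = np.zeros(3)
lr = 0.05
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE
    w -= lr * grad

print("OLS:", np.round(w_ols, 3))
print("GD :", np.round(w, 3))   # should match OLS closely
```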
Free online books:
http://www.inference.phy.cam.ac.uk/mackay/itila/
http://ciml.info/dl/v0_8/ciml-v0_8-all.pdf

Why does classification accuracy decrease as the number of features increases when using SVM?

I am using libsvm for image classification. Why does my prediction accuracy decrease when I use more features for classification? Shouldn't it increase? My dataset size is fixed at 1600 records for training and 400 for testing.
Because the additional features may not be at all useful for separating the classes in the feature space. Accuracy is not necessarily tied to the number of features.
Including lots of poor features may cause your SVM to learn the noise in the data, damaging the accuracy.
For example, picture a 2D scatter plot of an extra feature on which the (in this case) two classes overlap completely: such a feature will not be very good for separating them, and if the SVM trains mainly on this pattern, it will not be very good at predicting the class of a future point. However, there might be a feature in your dataset on which the two classes form well-separated clusters; a feature like that would be very useful for separating them.
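To see the effect concretely, here is a sketch that appends pure-noise features to a fixed-size dataset and watches an SVM's test accuracy fall. scikit-learn's SVC is used as a stand-in for raw libsvm (it wraps the same library), and the train/test sizes mirror the question:

```python
# Adding uninformative features to a fixed-size training set degrades
# SVM test accuracy: the model starts fitting noise.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 10 informative features; 1600 train / 400 test as in the question
X, y = make_classification(n_samples=2000, n_features=10, n_informative=10,
                           n_redundant=0, random_state=0)

for n_noise in [0, 50, 200, 500]:
    # append n_noise features that carry no class information
    noise = rng.standard_normal((2000, n_noise))
    X_aug = np.hstack([X, noise])
    X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, train_size=1600,
                                              test_size=400, random_state=0)
    acc = SVC().fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{10 + n_noise:3d} features -> test accuracy {acc:.3f}")
```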