Can feature importance be used to explain which features contributed to a model's predictions, and why? (classification)

I have an XGBoost (classification) model with ~75% accuracy using N variables, plus the model's feature importance list. My question is: for a given row and its prediction score, can I explain which features led to a 1 or a 0 prediction?

I have used xgboostExplainer for this purpose. This R package creates a plot for each sample showing the score and the cumulative contribution of each variable to the final prediction, e.g.:
[Plot: a sample that was predicted to be positive]
[Plot: a sample that was predicted to be negative]
The height of the bars depends on both the importance of the variable and the score of that variable for that sample, i.e. for a given sample you can see which variables contributed to the final prediction and why.
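If you are working in Python rather than R, a similar per-row breakdown can be obtained directly from xgboost with `pred_contribs=True`, which returns one additive contribution per feature plus a bias term that together sum to the raw (log-odds) prediction for that row. This is a minimal sketch; the toy data and feature names are made up for illustration:

```python
import numpy as np
import xgboost as xgb

# Toy data standing in for your N variables; replace with your own matrix.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=["f1", "f2", "f3", "f4"])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)

# One contribution per feature plus a final bias column, per row.
contribs = booster.predict(dtrain, pred_contribs=True)  # shape: (n_rows, n_features + 1)

row = 0
for name, c in zip(["f1", "f2", "f3", "f4", "bias"], contribs[row]):
    print(f"{name}: {c:+.3f}")   # positive values push this row towards class 1

# The contributions sum to the raw margin (log-odds) of the prediction for this row.
print("log-odds:", contribs[row].sum())
```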

Related

Combining probabilities from different CNN models

Let's say I have two images of a car, but one comes from the camera and the other is a depth image generated from a LiDAR point-cloud transformation.
I used the same CNN model on both images to predict the class (the output is a softmax, as there are other classes in my dataset: pedestrian, van, truck, cyclist, etc.).
How can I combine the two probability vectors in order to predict the class while taking both predictions into account?
I have tried methods such as the average, maximum, minimum, and naive product applied to the score of each class, but I don't know whether they work.
Thank you in advance.
EDIT:
Following this article: https://www.researchgate.net/publication/327744903_Multimodal_CNN_Pedestrian_Classification_a_Study_on_Combining_LIDAR_and_Camera_Data
We can see that they use the maximum or minimum rule to combine the outputs of the classifiers. So does this work for a multiclass problem?
As per MSalter's comment, the softmax output isn't a true probability vector. But if we choose to regard it as such, we can simply take the average of the two predictions. This is equivalent to having two people each classify a random sample of objects from a big pool and, assuming they both counted an equal number, estimating the distribution of objects in the big pool by combining their observations. The sum of the 'probabilities' of the classes will still equal 1.
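A minimal numeric sketch of this averaging (the class list and the two softmax vectors below are hypothetical, just to show the mechanics):

```python
import numpy as np

# Hypothetical softmax outputs for one sample from the camera CNN and the
# LiDAR-depth CNN, over the classes [car, pedestrian, van, truck, cyclist].
p_camera = np.array([0.70, 0.05, 0.10, 0.10, 0.05])
p_lidar = np.array([0.55, 0.10, 0.20, 0.10, 0.05])

p_avg = (p_camera + p_lidar) / 2.0      # still sums to 1
predicted_class = int(np.argmax(p_avg))

# The naive product rule also works for multiclass, but needs re-normalisation
# if you want the combined scores to sum to 1 again.
p_prod = p_camera * p_lidar
p_prod /= p_prod.sum()
```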

sample weights in pyspark decision trees

Do you know if there is some way to put sample weights on the DecisionTreeClassifier algorithm in PySpark (2.0+)?
Thanks in advance!
There is currently no hyperparameter in the PySpark DecisionTree or DecisionTreeClassifier class to specify class weights (usually required for an imbalanced dataset, or when correctly predicting one class matters more than the other).
It may be added in an upcoming release; you can track the progress in the JIRA here.
There is a Git branch that has already implemented this. It is not available officially yet, but you can use this pull request for now:
https://github.com/apache/spark/pull/16722
You have not specified your exact scenario or why you want to use weights, but the suggested workarounds for now are:
1. Undersampling the dataset
If your dataset is heavily imbalanced, you can randomly undersample the class that occurs with very high frequency.
2. Force-fitting the weights
Not a nice approach, but it works. You can repeat the rows of each class according to its weight.
E.g., for binary classification, if you need a weight of 1:2 for the (0/1) classes, you can repeat all the rows with label 1 twice (see the sketch below).
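A minimal PySpark sketch of that duplication trick; the column names and toy rows are assumptions, not part of the original question:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy data: (label, feat1, feat2)
df = spark.createDataFrame(
    [(0.0, 1.0, 2.0), (0.0, 2.0, 1.0), (1.0, 5.0, 6.0)],
    ["label", "feat1", "feat2"],
)

# Give label 1 an effective weight of 2 by appending its rows once more.
positives = df.filter(col("label") == 1.0)
df_weighted = df.union(positives)

# df_weighted can then go through VectorAssembler and DecisionTreeClassifier
# as usual, approximating a 1:2 class weighting for labels 0:1.
```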

h2o random forest calculating MSE for multinomial classification

Why is h2o.randomForest calculating MSE on the out-of-bag sample and during training for a multinomial classification problem?
I have also done binary classification with h2o.randomForest; there it calculated AUC on the out-of-bag sample and during training, but for multiclass classification the random forest is calculating MSE, which seems suspicious. Please see this screenshot.
My target variable was a factor with 4 levels: model1, model2, model3 and model4. In the screenshot you will also see a confusion matrix for these factors.
Can someone please explain this behaviour?
Both binomial and multinomial classification display MSE, so you will see it in the Scoring History table for both models (highlighted training_MSE column).
H2O does not evaluate a multinomial AUC. A few evaluation methods exist, but there is not yet a single widely adopted one. The pROC package discusses the method of Hand and Till, but mentions that it cannot be plotted and that its results are rarely tested. Log loss and classification error are still available, specific to classification, as each has a standard method of evaluation in a multinomial context.
There is a confusion matrix comparing your 4 factor levels, as you highlighted. Can you clarify what more you are expecting? If you were looking for four individual confusion matrices, the four-column table contains enough information that they could be computed.
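For example, one-vs-rest counts for each level can be read off the 4x4 confusion matrix; the numbers below are invented just to show the computation:

```python
import numpy as np

# Hypothetical 4x4 confusion matrix (rows = actual, columns = predicted)
# for the levels model1..model4.
cm = np.array([
    [50,  3,  2,  1],
    [ 4, 40,  5,  2],
    [ 1,  6, 45,  3],
    [ 2,  2,  4, 47],
])

levels = ["model1", "model2", "model3", "model4"]
for i, name in enumerate(levels):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp        # actual i, predicted something else
    fp = cm[:, i].sum() - tp        # predicted i, actually something else
    tn = cm.sum() - tp - fn - fp
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```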

Matlab Neural Network correctly classified results

I have trained an NN with the backpropagation algorithm and calculated the MSE. Now I want to find the percentage of correctly classified results (I am facing a classification problem). Any help?
It depends on your dataset: whether you generate the data yourself or are given a dataset of samples.
In the first case, you feed your NN a generated sample and check whether the NN predicts the correct class. You repeat this, say, 100 times, and for each correctly classified sample you increment a counter CorrectlyClassified by one. The percentage of correctly classified results is then equal to CorrectlyClassified. For higher accuracy you may generate not 100 but X samples (where X is bigger than 100). Then the percentage of correctly classified results is:
CorrectlyClassified / X * 100.
If you are given a dataset you should use cross-validation. See MATLAB documentation for an example.

Rapidminer - neural net operator - output confidence

I have a feed-forward neural network with six inputs, one hidden layer and two output nodes (1; 0). This NN is trained on 0/1 values.
When applying the model, the variables confidence(0) and confidence(1) are created, and the sum of these two numbers is 1 for each row.
My question is: what exactly do these two numbers (confidence(0) and confidence(1)) mean? Are they probabilities?
Thanks for any answers.
In general
The confidence values (or scores, as they are called in other programs) represent a measure of how, well, confident the model is that the presented example belongs to a certain class. They are highly dependent on the general strategy and the properties of the algorithm.
Examples
The easiest example to illustrate is the majority classifier, which just assigns the same score to every observation, based on the class proportions in the original training set.
Another example is the k-nearest-neighbour classifier, where the score for a class i is calculated by averaging the distances to those examples that both belong to the k nearest neighbours and have class i. The scores are then sum-normalized across all classes.
For the specific case of a NN, I do not know how the values are calculated without checking the code. I guess it is just the value of each output node, sum-normalized across both classes.
Do the confidences represent probabilities?
In general, no. To illustrate what probabilities mean in this context: if an example has probability 0.3 for class "1", then 30% of all examples with similar feature/variable values should belong to class "1" and 70% should not.
As far as I know, this task is called "calibration". Some general methods exist for this purpose (e.g. binning the scores and mapping each bin to the class fraction of that bin, see the sketch below) and some classifier-dependent ones (e.g. Platt scaling, which was invented for SVMs). A good starting point is:
Bianca Zadrozny, Charles Elkan: Transforming Classifier Scores into Accurate Multiclass Probability Estimates
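A minimal sketch of the binning idea mentioned above, assuming you have raw confidence scores and 0/1 labels from a held-out set (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)                              # raw confidences in [0, 1)
labels = (rng.random(1000) < scores ** 2).astype(int)  # toy, miscalibrated ground truth

n_bins = 10
edges = np.linspace(0.0, 1.0, n_bins + 1)


def bin_of(s):
    """Map score(s) in [0, 1] to a bin index in [0, n_bins - 1]."""
    return np.clip(np.digitize(s, edges) - 1, 0, n_bins - 1)


# Calibration map: for each bin, the observed fraction of positives.
bin_idx = bin_of(scores)
positive_fraction = np.array([
    labels[bin_idx == b].mean() if np.any(bin_idx == b) else 0.5
    for b in range(n_bins)
])

# A new score is mapped to the positive fraction of its bin.
new_score = 0.83
print("calibrated probability:", positive_fraction[bin_of(new_score)])
```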
The confidence measures correspond to the proportions of outputs 0 and 1 that are activated in the initial training dataset.
E.g. if 30% of your training set has outputs (1; 0) and the remaining 70% has outputs (0; 1), then confidence(0) = 30% and confidence(1) = 70%.