Is it possible to calculate the posterior probability for any type of classifier? - classification

As far as I know, some classifiers such as Naive Bayes calculate the posterior probability of the data and produce their result based on it.
My question is: can any classifier produce a posterior probability?
For example, how can a decision tree generate one?

Some classification models, such as logistic regression and neural networks, compute posterior class probabilities directly. Models based on generative models, such as quadratic discriminant analysis and models derived from mixture densities, also compute posterior class probabilities. Decision trees can easily be adapted to output a class probability: for a given input, return the proportion of training examples of each class in the leaf that the input falls into.
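To make the decision tree case concrete, here is a minimal sketch in Python (standard library only). The one-split tree, the threshold and the toy training data are all made up for illustration; a real tree would learn its splits, but the leaf-proportion step would look much the same.

```python
from collections import Counter

def leaf_class_probabilities(leaf_labels):
    """Return {class: proportion} for the training labels that fell in one leaf."""
    counts = Counter(leaf_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical one-split tree (a stump) on a single numeric feature.
THRESHOLD = 2.5
train = [(1.0, "neg"), (2.0, "neg"), (2.2, "pos"),
         (3.0, "pos"), (3.5, "pos"), (4.0, "neg"), (4.2, "pos")]

# Training labels that ended up in each leaf.
left_leaf = [y for x, y in train if x <= THRESHOLD]
right_leaf = [y for x, y in train if x > THRESHOLD]

def predict_proba(x):
    """Route x to its leaf and return that leaf's class proportions."""
    leaf = left_leaf if x <= THRESHOLD else right_leaf
    return leaf_class_probabilities(leaf)

print(predict_proba(3.2))
print(predict_proba(1.5))
```

With few examples per leaf these proportions are coarse estimates, which is why they are often smoothed in practice.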
A prominent exception is the support vector machine, which doesn't return a probability by itself. It has been modified to return one, for example with Platt scaling, which fits a sigmoid to the SVM's decision values; how well that works depends on the data.
See Hastie, Tibshirani, and Friedman, "Elements of Statistical Learning" (or any of many texts) for more about this stuff. Further questions of this kind should probably go to stats.stackexchange.com.

Related

Bagging with knn as learners

I am struggling to understand why the MATLAB function fitcensemble doesn't allow creating an ensemble model using knn learners with bagging, but only with the random subspace method, which is more similar to random forests.
I would like to use bagging in order to compare the bagging method using different types of learners (e.g., knn and trees).
I hope you will help me, thank you in advance,
Marta
Bagging is rarely used in conjunction with k-NN classifiers, as the decision surfaces are typically too stable, and duplicated datapoints in a bootstrap sample do not shift the 'weight' the way they do in many other models. Paraphrasing (1):
The probability that any single datapoint appears at least once in a bootstrap sample is ~0.632. Consider a simple 2-class 1-NN classifier bagged with N bootstrap samples. A test datapoint can change classification only if its nearest neighbour in the learning set is missing from at least half of the N bootstrap samples. The probability of this happening is the same as the probability of flipping a weighted coin, with a 0.632 probability of heads, N times and getting fewer than 0.5N heads. As N gets larger this probability gets smaller and smaller. Similar logic holds for multiclass problems and k-NN.
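The coin-flip argument above can be checked numerically. The following is a small Python sketch (standard library only); the function name and the chosen values of N are mine, and it treats the N bootstrap samples as independent, as the argument does.

```python
from math import comb

def prob_neighbour_in_minority(n_bags, p_present=0.632):
    """P(X < n_bags / 2) for X ~ Binomial(n_bags, p_present).

    X counts the bootstrap samples that contain the nearest neighbour.
    """
    return sum(
        comb(n_bags, k) * p_present**k * (1 - p_present) ** (n_bags - k)
        for k in range(n_bags + 1)
        if 2 * k < n_bags  # strictly fewer than half of the samples
    )

# The probability shrinks as the number of bootstrap samples grows.
for n in (1, 11, 51, 101):
    print(n, prob_neighbour_in_minority(n))
```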
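If you want to create your own bagging models, you can do it with bootstrp. bootstrp() can be called without a function by passing an empty function argument:
[~, BootIndices] = bootstrp(N, [], Data);  % column i of BootIndices holds the indices of the i-th bootstrap sample
BootSample = Data(BootIndices);            % for vector Data: column i is the i-th bootstrap sample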
(1) Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140. Section 6.4.

Random component on fitcsvm/predict

I have a training dataset and a test dataset, and I train an SVM with fitcsvm in MATLAB. I then test the trained model with predict. I always use the same datasets, but I keep getting different AUCs for the same model, which makes me wonder where in the process there is a random component. Note that
I'm aware that formally there is no such thing as a ROC curve or AUC, and
I'm not asking for the statistical background of the SVM problem. The question is about the MATLAB implementation of the training/test algorithm. I expected the same results every time because the training algorithm is, afaik, a deterministic process.

Discriminant analysis method to classify data

My aim is to classify the data into two sections, upper and lower, by finding the midline between the peaks.
I would like to apply a machine learning method, e.g. discriminant analysis.
Could you let me know how to do that in MATLAB?
It seems that what you are looking for is a GMM (Gaussian mixture model). With K=2 (the number of mixture components) and dimension equal to 1, this is a simple, fast method that gives you a direct solution. Given the components, it is easy to find the local minimum between them analytically (it is just a weighted average of the means, with weights proportional to the std's).

h2o random forest calculating MSE for multinomial classification

Why is h2o.randomforest calculating MSE on the out-of-bag sample and during training for a multinomial classification problem?
I have also done binary classification using h2o.randomforest. There it calculated AUC on the out-of-bag sample and during training, but for multiclass classification random forest calculates MSE, which seems suspicious. Please see this screenshot.
My target variable was a factor with 4 levels: model1, model2, model3 and model4. In the screenshot you will also see a confusion matrix for these factor levels.
Can someone please explain this behaviour?
Both binomial and multinomial classification display MSE, so you will see it in the Scoring History table for both models (highlighted training_MSE column).
H2O does not evaluate a multinomial AUC. A few evaluation methods exist, but there is not yet a single widely adopted one. The pROC package discusses the method of Hand and Till, but mentions that it cannot be plotted and that its results are rarely tested. Log loss and classification error are still available and are specific to classification, as each has a standard method of evaluation in a multinomial context.
There is a confusion matrix comparing your 4 factor levels, as you highlighted. Can you clarify what more you are expecting? If you were looking for four individual confusion matrices, the four-column table contains enough information that they could be computed.

Gaussian Naive Bayes classification

I have found the following Matlab implementation of a Naive Bayes classifier:
https://github.com/jjedele/Naive-Bayes-Classifier-Octave-Matlab
What is the difference between Gaussian Naive Bayes and Naive Bayes? How could I extend the above implementation to become Gaussian Naive Bayes?
How can I extend the implementation to use it with 4 classes? Just by doing one-vs-all?
Thank you very much for the help.
In Naive Bayes classification we take a set of features (x0, x1, ..., xn) and try to assign it to one of a known set Y of classes (y0, y1, ..., yk). We do that by using training data to estimate the conditional probabilities that tell us how often a particular class had a certain feature in the training set, and then multiplying them together.
The result is a score for each class in the set Y. We then take the highest-scoring member of Y as the class that our feature set should be assigned to.
Up until this point we haven't made any assumptions about what the p(x|C) distributions look like.
In Gaussian Naive Bayes we assume that all those p(x|C) distributions are normal. That's the only "difference", and it really isn't a difference: GNB is just a special case of Naive Bayes.
This can be useful if you don't have a lot of training data and are willing to assume that the population data is normally distributed about the mean of the sample (training) data you do have.
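As a rough illustration of the description above, here is a small Gaussian Naive Bayes sketch in Python (standard library only), not an extension of the linked MATLAB code. The function names, the variance floor and the four-class toy data are my own assumptions. Scores are summed in log space, which is equivalent to multiplying the probabilities but avoids underflow, and the class prior is included in the score.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate a prior plus per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The 1e-9 floor is an assumption that guards against zero variance.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_score(model, label, x):
    """log p(C) + sum over features of log N(x_i | mean_i, var_i)."""
    prior, means, variances = model[label]
    score = math.log(prior)
    for v, m, var in zip(x, means, variances):
        score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    return score

def predict(model, x):
    """Return the highest-scoring class."""
    return max(model, key=lambda label: log_score(model, label, x))

def posterior(model, x):
    """Normalise the per-class scores into posterior probabilities."""
    scores = {label: log_score(model, label, x) for label in model}
    top = max(scores.values())
    unnorm = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(unnorm.values())
    return {label: u / total for label, u in unnorm.items()}

# Toy data: four classes, two features each.
X = [[0.1, 0.2], [-0.2, 0.1], [0.0, -0.3],
     [5.1, 4.9], [4.8, 5.2], [5.0, 5.0],
     [0.2, 5.1], [-0.1, 4.8], [0.0, 5.2],
     [5.2, 0.1], [4.9, -0.2], [5.0, 0.2]]
y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3 + ["d"] * 3

model = fit_gnb(X, y)
print(predict(model, [4.7, 5.3]))
print(posterior(model, [4.7, 5.3]))
```

Because every class gets its own score, four classes need no one-vs-all scheme in this sketch; the prediction is simply the class with the highest score.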
Full disclosure: the TeX comes from Wikipedia.