Does a Naive Bayes classifier perform text annotation? - text-processing

Does a Naive Bayes classifier perform text annotation (sequence tagging)?
If yes, I need a tutorial please.
What do you think about MALLET for Naive Bayes?

Naive Bayes is a form of classifier that makes a prediction about one variable, for example a label for a document. In a sequence tagging problem, you are making predictions about a sequence of variables: one for each token.
You can do this by treating each token as its own independent classification problem, or you can use a model that makes predictions for the whole sequence jointly, with the decision for one token affecting the decision for neighboring tokens.
The sequence equivalent for Naive Bayes is a Hidden Markov Model. An equivalent classifier/sequence-tagger pair is logistic regression and conditional random fields (CRFs). Mallet implements all of these, as do many other systems.
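As a concrete illustration of the first option (treating each token as its own independent classification problem), here is a minimal Python sketch using scikit-learn rather than MALLET; the feature set and the toy tagged sentences are made up for the example.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

def token_features(tokens, i):
    # Features for the i-th token: the word itself plus its immediate neighbours.
    return {
        "word": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        "capitalized": float(tokens[i][0].isupper()),
    }

# Tiny made-up training set: sentences with one tag per token (NER-style).
train_sents = [
    (["Alice", "lives", "in", "Paris"], ["PER", "O", "O", "LOC"]),
    (["Bob", "visited", "London", "yesterday"], ["PER", "O", "LOC", "O"]),
]

X, y = [], []
for tokens, tags in train_sents:
    for i, tag in enumerate(tags):
        X.append(token_features(tokens, i))
        y.append(tag)

vec = DictVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(X), y)

# Each token of a new sentence is tagged independently of its neighbours' tags.
test_tokens = ["Carol", "works", "in", "Berlin"]
test_X = vec.transform([token_features(test_tokens, i) for i in range(len(test_tokens))])
print(list(zip(test_tokens, clf.predict(test_X))))
```

A model that predicts the whole sequence jointly (an HMM or a CRF) would instead let the tag chosen for one token influence the tags of its neighbours.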

Related

Can a linear model give higher prediction accuracy than random forest, decision tree, or neural network?

I have calculated the following parameters after applying the following algorithms to a dataset from Kaggle:
[results table was posted as an image in the original question]
In the above case, the linear model is giving the best results.
Are the above results correct, and can a linear model actually give better results than the other three in any case?
Or am I missing something?
According to the AUC criterion this classification is perfect (1 is the theoretical maximum), which means there is a clear separation in the data. In this case it makes little sense to talk about differences between the methods' results. Another point is that you can play with each method's parameters (you will likely get slightly different results) and other methods may then come out on top, but the real result will be indistinguishable. Sophisticated methods are invented for sophisticated data, and this is not such a case.
"All models are wrong, but some are useful." - George Box
In terms of classification, a model is effective as long as it can fit the classification boundaries well.
In the binary case, supposing your data is perfectly linearly separable, a linear model will do the job - actually the "best" job, since more complicated models cannot perform any better.
If your +'s and -'s are scattered so that they cannot be separated by a line (actually a hyperplane), then the linear model can be beaten by a decision tree, simply because decision trees can produce classification boundaries of more complex shape (axis-aligned boxes).
A random forest may in turn beat a single decision tree, as its classification boundary is more flexible still.
However, as mentioned earlier, the linear model still has its place.
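Here is a small illustration of that point using scikit-learn (not the original Kaggle data): on (nearly) linearly separable data the linear model matches the more flexible models, while on data with a curved boundary the tree-based models pull ahead.

```python
from sklearn.datasets import make_classification, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Nearly linearly separable data vs. a curved ("moons") class boundary.
X_lin, y_lin = make_classification(n_samples=500, n_features=5, n_informative=2,
                                   n_redundant=0, class_sep=2.0, random_state=0)
X_moon, y_moon = make_moons(n_samples=500, noise=0.2, random_state=0)

for name, model in models.items():
    acc_lin = cross_val_score(model, X_lin, y_lin, cv=5).mean()
    acc_moon = cross_val_score(model, X_moon, y_moon, cv=5).mean()
    print(f"{name:20s}  linear data: {acc_lin:.3f}   moons data: {acc_moon:.3f}")
```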

Is it possible to calculate the posterior probability for any type of classifier?

As far as I know, some classifiers such as Naive Bayes calculate the posterior probability of the data and produce their result based on it.
My question is: can any classifier produce a posterior probability?
For example, how can a decision tree generate it?
Some classification models such as logistic regression and neural networks compute posterior class probabilities directly. Models based on generative assumptions, such as the quadratic discriminant and models derived from mixture densities, also compute posterior class probabilities. Decision trees can easily be adapted to output a class probability by returning the proportion of positive examples at the leaves of the tree.
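As a quick illustration of that last point, scikit-learn's decision tree exposes exactly this leaf-proportion estimate through predict_proba (the dataset here is just an example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# For each sample, predict_proba returns the fraction of training examples of
# each class in the leaf that the sample ends up in.
print(tree.predict_proba(X[:5]))
```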
A prominent exception is the support vector machine, which doesn't return a probability. Its outputs can be post-processed into probability estimates (Platt scaling is the usual approach), but that is a calibration step rather than something the SVM models itself.
See Hastie, Tibshirani, and Friedman, "Elements of Statistical Learning" (or any of many texts) for more about this stuff. Further questions of this kind should probably go to stats.stackexchange.com.

Self-Organizing Maps and Learning Vector Quantization

Self-organizing maps are more suited to clustering (dimension reduction) than to classification, yet SOMs are used in Learning Vector Quantization (LVQ) for fine-tuning. LVQ, however, is a supervised learning method, so to use SOMs in LVQ, LVQ must be provided with a labelled training data set. Since SOMs only do clustering, not classification, and therefore cannot have labelled data, how can a SOM be used as an input for LVQ?
Does LVQ fine-tune the clusters of the SOM?
Before being used in LVQ, should the SOM output be put through another classification algorithm so that the inputs are labelled and these labelled inputs may be used in LVQ?
It must be clear that supervised learning differs from unsupervised learning in that, in the former, the target values are known.
Therefore, the output of supervised models is a prediction.
The output of unsupervised models, instead, is a label whose meaning we do not know yet. For this reason, after clustering, it is necessary to profile each of those new labels.
Having said that, you could label the dataset using an unsupervised technique such as a SOM. Then you should profile each cluster to make sure you understand what it means.
At this point, you can pursue two different paths depending on your final objective:
1. use this new variable as a form of dimensionality reduction
2. use this new dataset, featured with the additional variable representing the class, as labelled data that you will try to predict using the LVQ (a rough sketch of this path follows below)
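A rough sketch of that second path in Python, assuming the third-party MiniSom package for the SOM step; since scikit-learn has no LVQ implementation, a nearest-centroid classifier stands in here for the prototype-based supervised step:

```python
import numpy as np
from minisom import MiniSom              # assumed third-party package (pip install minisom)
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestCentroid

X, _ = load_iris(return_X_y=True)        # pretend the data comes without labels

# 1) Unsupervised step: train a small SOM and use the winning node as a cluster label.
som = MiniSom(3, 3, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 500)
cluster_labels = np.array([som.winner(x)[0] * 3 + som.winner(x)[1] for x in X])

# 2) Profiling step: inspect each cluster (feature means, sizes, ...) to decide
#    what each label actually means before trusting it.

# 3) Supervised step: treat the profiled cluster labels as targets and fit a
#    prototype-based classifier, standing in for LVQ.
clf = NearestCentroid().fit(X, cluster_labels)
print(clf.predict(X[:5]))
```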
Hope this can be useful!

Parameter selection of SVM

I have a dataset which I use for classification with libSVM in Matlab. The dataset consists of 4 classes.
For parameter selection of the SVM I can do nested cross-validation. The problem is that I also need the values of the best parameters in the end.
After having done the nested cross-validation and having the final accuracy, I want the values of the best parameters. Then I will train an SVM for each class (one-vs-all) with the best parameters for selecting the most important features (according to highest weight), i.e. a feature-importance map.
How can I do this? Should I skip nested cross-validation and just loop over all parameters, doing cross-validation for each?
Second, if I use a linear SVM, then using the weight vector w to assign importance to features works, but does it also work for a non-linear SVM (e.g. an RBF kernel)?
To find the "best" parameters for your kernel of choice, you have to loop through all parameters to perform a so called "grid search". LIBSVM does not support a build-in grid-search mechanismn.
Regarding your second question, I would suggest to perform a feature selection (e.g. Information Gain, Mutual Information, ...) as a pre-processing step before the actual work with the SVM and in a second step take the weight vector
s into consideration (but I am not sure, if this will work with RBF or Gaußian Kernels...).
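The question is about libSVM in Matlab, but as an illustration of the same grid-search idea, here is a scikit-learn sketch (dataset and parameter grid are arbitrary) that selects C and gamma by cross-validation and then refits a linear SVM once to read off per-class weight vectors:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)

# Grid search over C and gamma for an RBF SVM (multi-class is handled internally).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print("best parameters:", search.best_params_)

# For the feature-importance part: weights are only directly interpretable for a
# *linear* SVM; coef_ holds one weight vector per class (one-vs-rest).
linear = LinearSVC(C=search.best_params_["C"], max_iter=10000).fit(X, y)
print("per-class weight vectors:\n", linear.coef_)
```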

Gaussian Naive Bayes classification

I have found the following Matlab implementation of a Naive Bayes classifier:
https://github.com/jjedele/Naive-Bayes-Classifier-Octave-Matlab
What is the difference between Gaussian Naive Bayes and Naive Bayes? How could I extend the above implementation to become Gaussian Naive Bayes?
How can I extend the implementation to use it with 4 classes? Just by doing one-vs-all?
Thank you very much for the help.
In Naive Bayes classification we take a set of features (x0, x1, ..., xn) and try to assign them to one class from a known set Y of classes (y0, y1, ..., yk). We do that by using the training data to calculate the conditional probabilities that tell us how often a particular class had a certain feature in the training set, and then multiplying them together (along with the class prior).
The result is a score for each class in the set Y, roughly p(y) * p(x0|y) * ... * p(xn|y). We then take the highest-scoring member of Y as the class that our feature set should be assigned to.
Up until this point we haven't made any assumptions about what the p(x|C) distributions look like.
In Gaussian Naive Bayes we assume that all those p(x|C) values are normally distributed. That is the only "difference", and it really isn't a difference: GNB is just a special case of Naive Bayes.
This can be useful if you don't have a lot of training data, and are willing to assume that the population data is normally distributed about the mean of the sample (training) data you do have.
Full disclosure: the notation comes from Wikipedia.
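On the 4-class part of the question: Naive Bayes is naturally multi-class, so no one-vs-all wrapper is needed. A scikit-learn sketch (the linked repository is Matlab/Octave; this is only to show the idea):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)        # 3 classes here; 4 would work the same way
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB fits a per-class mean and variance for every feature, i.e. it
# models each p(x|C) as a normal distribution.
gnb = GaussianNB().fit(X_train, y_train)
print("accuracy:", gnb.score(X_test, y_test))
print("class-conditional means:\n", gnb.theta_)
```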