ROC-AUC plot for multiple classes - classification

My softmax classifier outputs 312 probabilities, each corresponding to one class (312 classes in total).
How can I show the results as a ROC curve?
Do I have to plot a ROC curve for each class? (Doing this would create 312 curves.)
Is there any way to show a ROC curve for such a multi-class classification problem?
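A common workaround is to compute one one-vs-rest curve per class and report an averaged AUC instead of overlaying all 312 curves. Below is a minimal MATLAB sketch of that idea using perfcurve (Statistics Toolbox); the variable names scores (an N-by-312 matrix of softmax outputs) and labels (an N-by-1 vector of true class indices) are assumptions, not from the question.
% Sketch: one-vs-rest ROC per class with perfcurve, plus a macro-averaged AUC.
% scores: N-by-312 softmax outputs; labels: N-by-1 true class indices.
nclasses = size(scores, 2);
auc = zeros(nclasses, 1);
hold on
for k = 1:nclasses
    [fpr, tpr, ~, auc(k)] = perfcurve(labels == k, scores(:,k), true);
    plot(fpr, tpr)               % one curve per class; omit if 312 is too many
end
xlabel('False positive rate'); ylabel('True positive rate')
fprintf('Macro-averaged AUC: %.3f\n', mean(auc))
With 312 classes it is usually more readable to report the macro-averaged AUC (or the curves of a few classes of interest) than to plot every curve.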

Related

ROC curve from SVM classifier is visualised with limited thresholds in Python

I am trying to plot a ROC curve to evaluate my classifier, but my ROC plot is not "smooth". Is this a problem with the thresholds? I am quite new to classification in Python, so there is probably something wrong with my code; see the image below. Where should I look for a solution?
I used drop_intermediate=False, but it does not help.
This is because you are passing 0 and 1 values (predicted labels) to the plotting function. The ROC curve can only be computed when you provide floats in the range 0.0 to 1.0 (predicted label probabilities), so that the ROC computation can consider multiple cutoff values and the curve appears more "smooth" as a result.
Whatever classifier you are using, make sure y_train_pred contains float values in the range [0.0, 1.0]. If you have a scoring classifier with values in the range [-∞, +∞], you can apply a sigmoid function to remap the values to this range.
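As a sketch of that remapping (shown in MATLAB; raw_scores and y_true are assumed variable names, not from the question):
% Sketch: squash raw decision scores in (-inf, +inf) into (0, 1) with a
% logistic sigmoid, then compute the ROC curve from the graded scores.
probs = 1 ./ (1 + exp(-raw_scores));        % raw_scores: vector of classifier scores
[fpr, tpr] = perfcurve(y_true, probs, 1);   % y_true: 0/1 ground-truth labels
plot(fpr, tpr)
Since the sigmoid is monotonic it does not change the shape of the ROC curve itself; the point is simply that the plotting function needs graded scores rather than hard 0/1 labels.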

Plot Loss Function in Weka using Neural Perceptron

I want to plot my training and validation loss curves to visualize the model's performance. How can I plot the two curves in Weka (using Java)?
I tried using SGD:
`SGD p = new SGD();
p.setDontNormalize(true);
p.setDontReplaceMissing(true);
p.setEpochs(1);
p.setLearningRate(0.001);`
I tried using:
`System.out.println(eval.errorRate());`
But I could not find out how to get the values needed to plot the curves.

Gaussian Mixture Model 1D data

I have modeled my 1D data (1000*1 matrix) into 3 Gaussians, using
gmdistribution.fit(X,3)
How can I plot something like this?
It shows the probability of a given point belonging to each class.
Write a function plotGaussian that takes the mean, the variance, and a range of values. The function should generate the points to plot and call the plot function. Then do hold on and call plotGaussian three times, once per component.
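A minimal sketch of that idea (the implementation below is illustrative, not from the original answer):
% plotGaussian.m -- plot one Gaussian pdf given mean, variance, and x range
function plotGaussian(mu, sigma2, xrange)
    plot(xrange, normpdf(xrange, mu, sqrt(sigma2)))
end
Then, after fitting gm = gmdistribution.fit(X,3):
x = linspace(min(X), max(X), 500);
hold on
for k = 1:3
    plotGaussian(gm.mu(k), gm.Sigma(1,1,k), x)   % gm.Sigma is 1-by-1-by-3 for 1D data
end
If you want the probability of each point belonging to each class rather than the component pdfs, posterior(gm, x(:)) returns exactly those membership probabilities.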

How to implement Priors for KNN in matlab

I have a question concerning KNN training. I want to implement KNN as in the book Pattern Recognition and Machine Learning by C. M. Bishop. We need the conditional density p(x|C_k) = K_k/(N_k V), the unconditional density p(x) = K/(N V), and the class priors p(C_k) = N_k/N. Substituting these into Bayes' theorem gives the posterior probabilities.
My problem lies in the programming. Let's assume I have 5 classes with different numbers of feature vectors: class 1 has 100, class 2 has 344, class 3 has 541, ..., class 5 has n. Together they form one matrix of all feature vectors, with a label vector whose length equals the number of feature vectors.
Now my question: when I want to compute the priors, I can't simply iterate over the classes with a for loop, because all the classes (of different sizes) are stored in one matrix, one feature vector after another:
%% Code fragment from "ML-KNN: A lazy learning approach to multi-label
%% learning", Min-Ling Zhang and Zhi-Hua Zhou
for i = 1:nclass
    temp_Ci = sum(train_target(i,:) == ones(1,num_training)); % count of class i
    Prior(i,1) = (Smooth + temp_Ci) / (Smooth*2 + num_training);
    PriorN(i,1) = 1 - Prior(i,1);
end
So, how can I implement the priors in Matlab?
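If the labels live in a single vector alongside the stacked feature matrix, the priors can be computed directly from it, without splitting the matrix by class. A sketch (the variable name labels is an assumption):
% Sketch: class priors p(C_k) = N_k / N from a label vector.
% labels: N-by-1 vector of class indices in 1..nclass.
N = numel(labels);
nclass = max(labels);
Nk = accumarray(labels(:), 1, [nclass 1]);   % N_k: number of samples per class
Prior = Nk / N;                              % p(C_k) = N_k / N
Smoothing as in the ML-KNN fragment above can be layered on top of these counts if desired.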

How to calculate ROC curves?

I wrote a classifier (a Gaussian mixture model) to classify five human actions. For every observation the classifier computes the posterior probability of belonging to a cluster.
I want to evaluate the performance of my system parameterized by a threshold, with values from 0 to 100. For every threshold value and every observation, if the probability of belonging to one of the clusters is greater than the threshold, I accept the result of the classifier; otherwise I discard it.
For every threshold value I compute the number of true positives, true negatives, false positives, and false negatives.
Then I compute the two functions sensitivity and specificity as
sensitivity = TP/(TP+FN);
specificity = TN/(TN+FP);
In MATLAB:
plot(1-specificity, sensitivity);
to get the ROC curve. But the result isn't what I expect.
This is the plot of the discards, errors, corrects, sensitivity, and specificity as functions of the threshold for one action.
This is the plot of the ROC curve for one action.
This is the stem plot of the ROC curve for the same action.
I am doing something wrong, but I don't know where. Perhaps I am calculating FP, FN, TP, TN incorrectly, especially when the result of the classifier is below the threshold and I get a discard. What should I increment when there is a discard?
Background
I am answering this because I need to work through the content, and a question like this is a great excuse. Thank you for the good opportunity.
I use the built-in Fisher iris data:
http://archive.ics.uci.edu/ml/datasets/Iris
I also use code snippets from the MathWorks tutorial on classification and from the documentation for plotroc:
http://www.mathworks.com/products/demos/statistics/classdemo.html
http://www.mathworks.com/help/nnet/ref/plotroc.html?searchHighlight=plotroc
Problem Description
There is a clearer boundary within the domain to classify "setosa", but there is overlap for "versicolor" vs. "virginica". This is a two-dimensional plot, and some of the other information has been discarded to produce it. The ambiguity in the classification boundaries is a useful thing in this case.
%load data
load fisheriris
%show raw data
figure(1); clf
gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');
axis equal
axis tight
title('Raw Data')
Analysis
Let's say that we want to determine the bounds for a linear classifier that defines "virginica" versus "non-virginica". We could look at "self vs. not-self" for the other classes, but each would have its own boundary and curve.
So now we make some linear discriminants and plot the ROC for them:
%load data
load fisheriris                    % meas: 150-by-4 measurements
load iris_dataset                  % irisTargets: 3-by-150 one-hot targets (nnet format)
irisTargets = irisTargets(3,:);    % row 3 = "virginica", so targets are 1/0
%resubstitution predictions from five discriminant types on sepal length/width
ldaClass1 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'linear')';
ldaClass2 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'diaglinear')';
ldaClass3 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'quadratic')';
ldaClass4 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'diagquadratic')';
ldaClass5 = classify(meas(:,1:2),meas(:,1:2),irisTargets,'mahalanobis')';
%stack targets and predictions so plotroc draws one curve per classifier
myinput = repmat(irisTargets,5,1);
myoutput = [ldaClass1;ldaClass2;ldaClass3;ldaClass4;ldaClass5];
whos                               % optional: inspect variable sizes
plotroc(myinput,myoutput)
The result is shown below, though it took deleting repeated copies of the diagonal:
You can see in the code that I stack "myinput" and "myoutput" and feed them into the "plotroc" function. You should take your target values as the ideals and the results of your classifier as the actuals, and you can get similar results. plotroc compares the actual output of your classifier against the ideal output from your target values.
So this will give you a "built-in" ROC, which is useful for quick work, but does not make you learn every step in detail.
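If you do want every step in detail, here is a sketch of the manual threshold sweep the question describes. It assumes scores is a vector of positive-class probabilities and y is the 0/1 ground truth (both names are mine, not from the question):
% Sketch: manual ROC by sweeping a threshold and counting TP/FP/TN/FN
thresholds = linspace(0, 1, 101);
sens = zeros(size(thresholds));
spec = zeros(size(thresholds));
for t = 1:numel(thresholds)
    pred = scores >= thresholds(t);       % accepted as positive at this threshold
    TP = sum( pred &  y);  FP = sum( pred & ~y);
    FN = sum(~pred &  y);  TN = sum(~pred & ~y);
    sens(t) = TP/(TP+FN);
    spec(t) = TN/(TN+FP);
end
plot(1-spec, sens)                        % ROC: 1 - specificity vs sensitivity
Note that every observation is counted at every threshold: a "discard" is just a predicted negative, so it increments FN when the true class is positive and TN otherwise. Dropping discards from the counts altogether is one way to end up with the odd curves shown in the question.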
Questions you can ask at this point include:
Which classifier is best? How do I determine what "best" means in this case?
What is the convex hull of the classifiers? Is there some mixture of classifiers that is more informative than any pure method? Bagging perhaps?
You are trying to draw curves of precision vs. recall as a function of the classifier's threshold parameter. The definitions of precision and recall are:
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
You can check the definition of these parameters in:
http://en.wikipedia.org/wiki/Precision_and_recall
There are some curves here:
http://www.cs.cornell.edu/courses/cs578/2003fa/performance_measures.pdf
Are you dividing your dataset into a training set, a cross-validation set, and a test set? (If you do not split the data, it is normal for your precision-recall curve to look weird.)
EDITED: I think there are two possible sources for your problem:
When you train a classifier for 5 classes, you usually train 5 distinct one-vs-rest classifiers: one classifier for (class A = class 1, class B = class 2, 3, 4 or 5), then a second classifier for (class A = class 2, class B = class 1, 3, 4 or 5), ..., and a fifth for (class A = class 5, class B = class 1, 2, 3 or 4).
As you said, to select the output of your "compound" classifier you pass each new (test) data point through the five classifiers and choose the one with the highest probability.
Then you have 5 thresholds defining weighting values that may prioritize selecting one classifier over the others. You should check how the MATLAB implementation uses these thresholds, but their effect is that you don't choose the class with the highest raw probability, but the class with the best weighted probability.
As you say, maybe you are not calculating TP, TN, FP, FN correctly. Your test data should contain data points belonging to all the classes; testdata(i,:) and classtestdata(i) are the feature vector and ground-truth class of data point i. When you evaluate the classifier you obtain classifierOutput(i) = 1, 2, 3, 4 or 5. Then you should compute the confusion matrix, which is the way to calculate TP, TN, FP, FN when you have multiple classes (> 2):
http://en.wikipedia.org/wiki/Confusion_matrix
http://www.mathworks.com/help/stats/confusionmat.html
(Note the relation between the TP, TN, FP, FN you are calculating and the multiclass problem.)
I think you can obtain the TP, TN, FP, FN of each subclassifier (remember that you are effectively training 5 separate classifiers, even if you do not realize it) from the confusion matrix. I am not sure, but you should then be able to draw the precision-recall curve for each subclassifier, as in the sketch below.
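As a sketch of that extraction (using the confusionmat function linked above and the answer's own variable names; the loop itself is illustrative):
% Sketch: per-class TP/FP/FN/TN from a multiclass confusion matrix.
% Rows of C are true classes, columns are predicted classes.
C = confusionmat(classtestdata, classifierOutput);
for k = 1:size(C,1)
    TP = C(k,k);
    FP = sum(C(:,k)) - TP;            % predicted k, actually another class
    FN = sum(C(k,:)) - TP;            % actually k, predicted another class
    TN = sum(C(:)) - TP - FP - FN;
    fprintf('class %d: precision %.3f, recall %.3f\n', k, TP/(TP+FP), TP/(TP+FN));
end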
Also check these slides: http://www.slideserve.com/MikeCarlo/multi-class-and-structured-classification
I don't know what the ROC curve is; I will check it, because machine learning is a really interesting subject for me.
Hope this helps,