ROC curve from the result of a classification or clustering - cluster-analysis

Say that I've clustered a training dataset of 1000 instances from 5 classes into 5 clusters (centers) using, for example, k-means. I then constructed a confusion matrix by validating on a test dataset. I now want to plot a ROC curve from this. How can I do that?

ROC curves show the trade-off between the true positive rate and the false positive rate. In other words:
ROC graphs are two-dimensional graphs in which TP rate is plotted on
the Y axis and FP rate is plotted on the X axis
ROC Graphs: Notes and Practical Considerations for Researchers
A discrete classifier produces only a single point in ROC space. To get a curve you normally need a classifier that outputs probabilities or scores. You then vary a parameter of the classifier, typically the decision threshold, so that the TP and FP rates change, and you use the resulting points to draw the ROC curve.
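As a minimal sketch of the difference, the helper below (the name `roc_points` and the toy data are made up for illustration) sweeps a threshold over a list of scores and collects one (FPR, TPR) pair per threshold. Fed with probabilities it yields several points; fed with hard 0/1 predictions it yields only one point besides the trivial corners.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over scores.

    y_true: list of 0/1 labels; scores: higher means "more likely positive".
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Start above every score (nothing predicted positive), then lower the
    # threshold to each distinct score value in turn.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Toy data, invented for this example.
y_true = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # probabilistic outputs
hard = [1, 1, 1, 0, 0, 0]               # discrete predictions

curve = roc_points(y_true, probs)   # several points: a usable curve
single = roc_points(y_true, hard)   # (0,0), one real point, (1,1)
```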
Let's say you use k-means. K-means assigns cluster membership discretely: a point belongs to ClusterA, ..., or ClusterE. Therefore, producing a ROC curve from k-means is not straightforward. Lee and Fujita describe an algorithm for this, and you should look at their paper, but the algorithm is roughly as follows:
1. Apply k-means.
2. Calculate TP and FP using the test data.
3. Change the membership of data points from one cluster to another.
4. Calculate TP and FP using the test data again.
Repeating this yields more points in ROC space, and these points are used to draw the ROC curve.
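The following is only a loose sketch of that idea for a two-cluster, one-dimensional case, not the procedure from the paper. It assumes the centroids `c_pos` and `c_neg` already come from a k-means run, and it assumes a reassignment order (by how much closer a point is to the positive centroid) that is chosen here purely for illustration. All names and data are hypothetical.

```python
def roc_by_reassignment(x_test, y_test, c_pos, c_neg):
    """ROC points obtained by moving test points, one at a time, into the
    cluster that is mapped to the positive class.

    c_pos / c_neg: 1-D centroids assumed to come from a previous k-means run.
    """
    pos = sum(y_test)
    neg = len(y_test) - pos
    # margin > 0 means the point is closer to the positive centroid.
    margin = [abs(x - c_neg) - abs(x - c_pos) for x in x_test]
    # Assumed order: most clearly positive first, most clearly negative last.
    order = sorted(range(len(x_test)), key=lambda i: margin[i], reverse=True)

    points = [(0.0, 0.0)]  # no point assigned to the positive cluster yet
    tp = fp = 0
    for i in order:        # move one more point into the positive cluster
        if y_test[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))  # recompute the rates
    return points


# Toy test set, invented for this example.
x_test = [1.0, 1.5, 2.5, 3.5, 4.0, 5.0]
y_test = [1, 1, 0, 1, 0, 0]
points = roc_by_reassignment(x_test, y_test, c_pos=1.5, c_neg=4.5)
```

The plain k-means assignment (every point goes to its nearest centroid) corresponds to just one of these points; the others come from the modified memberships.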

Related

How to create ROC curve for different classifier in Weka or excel

I have an array of sensitivity and specificity values for the positive class for different classifiers. I want to create one ROC curve for each classifier.
For example
Classifier  Sensitivity  Specificity  ROC
NB          0.613        0.778        0.791
LR          0.865        0.842        0.88
MLP         0.976        0.903        0.959
Those are not real values; I made them up for demonstration purposes. I mention sensitivity and specificity here because a ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity).
I want a plot like that
I also went through the Weka Tutorial 30: Multiple ROC Curves (Model Evaluation). The knowledge flow diagram presented there had two drawbacks:
1. If I have a training and a test dataset and want to see the ROC for the test dataset, that case was not covered.
2. If I am using 5-fold cross-validation on the training set, how to represent that was not covered either.
I tried to build a knowledge flow myself, but I could not find the "load model" option on the ARFF loader.
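For reference, a single (sensitivity, specificity) pair maps to exactly one point in ROC space, with FPR = 1 - specificity and TPR = sensitivity. The short sketch below applies that conversion to the illustrative values from the table above; the function name `roc_point` is made up. Drawing a full curve per classifier would additionally need the per-instance scores, which a single pair does not contain.

```python
# Values copied from the illustrative table in the question.
classifiers = {
    "NB": (0.613, 0.778),
    "LR": (0.865, 0.842),
    "MLP": (0.976, 0.903),
}


def roc_point(sensitivity, specificity):
    """Map one (sensitivity, specificity) pair to an (FPR, TPR) ROC point."""
    return (1.0 - specificity, sensitivity)


roc = {name: roc_point(se, sp) for name, (se, sp) in classifiers.items()}
for name, (fpr, tpr) in roc.items():
    print(f"{name}: FPR={fpr:.3f}, TPR={tpr:.3f}")
```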

What is the threshold in AUC (Area under curve)

Assume a binary classifier (say a random forest) rfc for which I want to calculate the AUC. I struggle to understand how the thresholds are used in the calculation. I understand that you plot TPR against FPR for different thresholds. I also understand that the threshold is the cut-off for predicting class 1 (else class 0), but how does the AUC algorithm predict classes?
Say that, using sklearn.metrics.roc_auc_score, you pass y_true and y_rfc (the true values and the predicted values). I do not see how the thresholds come into play in the AUC score/plot.
I have read different guides/tutorials on AUC, but their explanations of the threshold and how it is used are all rather vague.
I have also had a look at How does sklearn actually calculate AUROC?.
The ROC curve is generated from the TPR/FPR at different thresholds, and AUC is the area under it. The idea is to sweep the threshold over (0, 1) and get one point on the curve per threshold, which is why the classifier's scores or probabilities are needed rather than its hard class predictions. Notice that if your classifier is perfect, some threshold gives the point (0,1); lowering the threshold further cannot reduce TPR, so the curve runs from (0,0) through (0,1) to (1,1), which gives AUC = 1.
AUC tells you not only about classification quality but also about how well the classifier's confidence scores rank positives above negatives.
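To make the role of the thresholds concrete, here is a small pure-Python sketch (function names and data are invented; this is not sklearn's implementation). It computes the area by the trapezoidal rule over the points produced by every distinct score used as a threshold, and compares it with the equivalent ranking view: the fraction of (positive, negative) pairs in which the positive gets the higher score. In sklearn, `roc_curve(y_true, y_score)` returns the thresholds it used alongside `fpr` and `tpr`.

```python
def auc_trapezoid(y_true, scores):
    """Area under the ROC curve built by sweeping a threshold over scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0  # curve starts at (0, 0)
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fpr, tpr = fp / neg, tp / pos
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # one trapezoid
        prev_fpr, prev_tpr = fpr, tpr
    return area


def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example.
y_true = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # e.g. rfc.predict_proba(X)[:, 1]
labels = [1, 1, 1, 0, 0, 0]             # e.g. rfc.predict(X)

auc_scores = auc_trapezoid(y_true, probs)   # uses six distinct thresholds
auc_labels = auc_trapezoid(y_true, labels)  # only two thresholds: 1 and 0
```

Passing hard labels still returns a number, but the curve then has a single interior point, so the result reflects one operating point rather than the ranking quality of the scores.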

Computing (adjusted) R^2 from a given curve and data points

I would like to assess the fit of a given curve to some data points by computing the (ideally, adjusted) R^2, using Matlab.
All the tutorials I could find online explain how to do this when the curve has been obtained directly from the data set, but in my case the curve has been determined independently and I need to compute R^2 as a measure of the fit to the new data set, which I'm using as a test.
Is there any routine in Matlab to do this?
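For reference, the quantity itself needs no fitting routine: R^2 = 1 - SS_res / SS_tot, with the residuals taken between the data and the curve evaluated at the same x values. The sketch below is in Python rather than Matlab, with a made-up curve and data; the arithmetic carries over directly to element-wise Matlab operations. The adjusted form uses 1 - (1 - R^2)(n - 1)/(n - p - 1), and which p to use for a curve that was not fitted to these data is an assumption the reader has to make. Note that R^2 computed this way can be negative when the curve fits worse than the mean of the data.

```python
def r_squared(y_obs, y_curve):
    """R^2 of observed values against curve values at the same x positions."""
    mean_y = sum(y_obs) / len(y_obs)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_obs, y_curve))
    ss_tot = sum((y - mean_y) ** 2 for y in y_obs)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n data points and p predictors (p is an assumption
    when the curve was determined independently of the data)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def curve(x):
    """Hypothetical pre-determined curve, standing in for the real one."""
    return 2.0 * x + 1.0


# Toy test data, invented for this example.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

r2 = r_squared(y, [curve(xi) for xi in x])
r2_adj = adjusted_r_squared(r2, n=len(y), p=1)
```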

MATLAB - Wavelet coefficient based QRS complex classifier

I am new to the field of wavelets and would like to ask for help with an idea.
I am supposed to create a wavelet-based morphology classifier for QRS complexes (a certain part of the ECG waveform). In other words, the classifier should sort waves of similar shape into categories, like bins in statistics, but based on the signal's wavelet coefficients.
I tried MATLAB's mdwtdec and used the wavelet coefficients at a certain level as input to a classifier that calculates the distance from each QRS complex and separates the waves into classes according to a threshold.
This approach is rather naive, and I guess I need some other idea or hint in order to improve it.

How to find the correlation between scatter data and theoretical formula MATLAB?

I'm working on my thesis project on financial mathematics. One problem I'm having is that I want to find out if there is some correlation between a theoretical curve and scatter point data.
Here is the scatter data and the theoretical curve that I have.
Is there some easy way of doing this?
Bivariate correlation (usually Pearson correlation) is a statistic that measures linear dependence between two sets of data. The theoretical curve in your link does not seem to consist of discrete data points, so it is not possible to calculate a correlation between it and some set of data.
Depending on the model and the research question you have, you might be interested in analyzing the fit of your data to the model, using multivariate regression analysis or general[ized] linear model. These MATLAB commands could be useful: regress (multiple linear regression), regstats (regression diagnostics), glmfit (generalized linear model regression) and glmval (generalized linear model values).