How to get AUROC at 5% false positives - classification

How can I get the AUROC at a 5% false positive rate? I am not quite sure how to do that. Is it possible to get it from the full area under the curve, or does it have to be calculated from the validation set?

You cannot.
A ROC curve displays the trade-off between true and false positive rates (or sensitivity/specificity) by varying the decision threshold over all possible thresholds. Therefore the curve integrates over all possible false positive rates between 0% and 100%. You cannot have a curve for a single false positive rate, only a single point at best.
I suggest reading Understanding the ROC curve and the answers to that question on Cross Validated, as well as An introduction to ROC analysis by Fawcett.
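As the answer says, a fixed false positive rate corresponds to a single point on the ROC curve, and it has to be computed from validation-set scores rather than recovered from the full AUC. A minimal Python sketch (the labels and scores are simulated stand-ins for a real validation set): reading off the TPR at FPR = 5%, plus scikit-learn's standardized partial AUC via the max_fpr parameter, in case that is what "AUROC at 5% false positives" was meant to be.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Simulated validation labels and scores (assumed data, illustration only).
rng = np.random.default_rng(0)
y_true = np.r_[np.zeros(500), np.ones(500)]
scores = np.r_[rng.normal(0.0, 1.0, 500), rng.normal(1.5, 1.0, 500)]

fpr, tpr, thresholds = roc_curve(y_true, scores)

# A single operating point: the TPR achieved at FPR = 5%,
# interpolated along the empirical ROC curve.
tpr_at_5pct = np.interp(0.05, fpr, tpr)

# Alternatively, the (standardized) partial AUC over FPR in [0, 0.05].
partial_auc = roc_auc_score(y_true, scores, max_fpr=0.05)
```

Either way, the raw scores on the validation set are required; the scalar full-curve AUC alone cannot give you either quantity.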

Related

What is the threshold in AUC (Area under curve)

Assume a binary classifier (say a random forest) rfc for which I want to calculate the AUC. I struggle to understand how the thresholds are used in the calculation. I understand that you make a plot of TPR vs. FPR for different thresholds, and that the threshold is the cutoff for predicting class 1 (else class 0), but how does the AUC algorithm predict classes?
Say, using sklearn.metrics.roc_auc_score, you pass y_true and y_rfc (the true values and the predicted values), but I do not see how the thresholds come into play in the AUC score/plot.
I have read different guides/tutorials on AUC, but their explanations of the threshold and how it is used are all somewhat vague.
I have also had a look at How does sklearn actually calculate AUROC?
The ROC curve is generated from the TPR/FPR at different thresholds. The main idea is to sweep the threshold over (0, 1) and obtain one point of the curve per threshold. Notice that if your classifier is perfect you get the point (0, 1), and every smaller threshold can do no worse, so it also lands on (0, 1), which leads to AUC = 1.
AUC gives you information not only about classification quality but also about how well your classifier's confidence scores rank the samples.
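A small sketch of the mechanics (the four-sample arrays are assumed toy data): roc_auc_score never picks a threshold itself. You pass it scores (e.g. predict_proba output), the candidate thresholds are taken from the scores, each threshold yields one (FPR, TPR) point, and the AUC integrates over all of them.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1])
# Pass scores/probabilities (e.g. rfc.predict_proba(X)[:, 1]),
# NOT hard 0/1 labels from rfc.predict(X).
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each candidate threshold converts scores into labels and yields one
# (FPR, TPR) point; the thresholds are drawn from the scores themselves.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# roc_auc_score integrates over all of these points at once -- no single
# threshold is ever chosen or used to "predict classes".
auc = roc_auc_score(y_true, y_score)  # 0.75 for this toy example
```

This is also why passing hard predictions (rfc.predict) instead of scores is a mistake: the "curve" then collapses to a single interior point and the AUC is no longer meaningful.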

A region of my ROC curve is below the random line

I have some targets in a hyperspectral image and I want to detect them. I proposed a detector and then analysed its performance via Receiver Operating Characteristic (ROC) curves.
When the targets have a very low signal-to-noise ratio (that is, the targets are very weak in the image, so their detection is very challenging, especially at very small probability-of-false-alarm (Pfa) values), I always obtain a ROC curve like the following.
This is not my figure, but I am obtaining a ROC curve similar to this one. My curve is below the random line for Pfa <= 0.1.
I am wondering: is this normal? Is it acceptable to have a region of the ROC curve below the random line? And if so, how can this be justified?
The ROC shows the true and false positive ratios as the decision threshold varies. With the threshold at one extreme, everything is classified negative, so you have 0% true positives and 0% false positives. With the threshold at the other extreme you have 100% true positives and 100% false positives. In between the two extremes, anything can happen. In this particular case, as you move the threshold away from the first extreme, you start classifying negative samples as positive, and so you increase the false positive rate without increasing the true positive rate.
In principle there is nothing wrong with this. What matters is that you can find a point (a threshold) where the compromise between true and false positive ratios is satisfactory. That is the point at which you will operate your system. And because you want the choice of threshold to be robust, you want the ROC to change slowly around that point. But what it does far away from your operating point does not influence your system. (This is why I think the "area under the curve" measure of performance is not useful.)
However, what your ROC does show you is that the samples that your system thinks are most obviously positive are actually negative. Maybe you didn't model your samples properly?
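The mechanism described in this answer is easy to reproduce with a small simulation (all numbers below are assumed, for illustration): if the handful of samples with the very highest scores are actually negatives, the ROC curve starts out below the diagonal near the origin, exactly as in the figure discussed.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
pos = rng.normal(2.0, 1.0, 200)       # genuine targets
neg = rng.normal(0.0, 1.0, 195)       # background clutter
neg = np.r_[neg, np.full(5, 6.0)]     # a few negatives outscore everything

y_true = np.r_[np.ones(200), np.zeros(200)]
scores = np.r_[pos, neg]
fpr, tpr, _ = roc_curve(y_true, scores)

# Near the origin (high thresholds / low Pfa) the false positive rate
# rises before the true positive rate does, so the curve dips below
# the tpr = fpr diagonal.
below = np.any(tpr < fpr)
```

Which is precisely the diagnostic in the last paragraph: the detector's most confident "targets" are in fact clutter, suggesting a modelling problem rather than a computational one.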

How to generate an ROC curve given a specific hit rate and false alarm rate in MATLAB?

So I have a certain hit rate and false alarm rate, and I'm looking to see if there is a function that will allow me to type in these two values and have MATLAB generate an ROC curve based on this point. Note that the hit/false-alarm rates aren't extreme values, so curve estimation should be doable with one point, right?
Thanks in advance for any help!
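I'm not aware of a built-in MATLAB function that fits a full ROC through a single point, but classical signal detection theory does exactly this under an equal-variance Gaussian (binormal) assumption: recover d' = z(H) - z(FA) from the one operating point, then sweep the decision criterion. A hedged sketch in Python (the MATLAB translation is direct via norminv/normcdf); the hit and false-alarm values are made-up examples.

```python
import numpy as np
from scipy.stats import norm

hit_rate, fa_rate = 0.80, 0.20   # example operating point (assumed values)

# Equal-variance binormal model: sensitivity d' = z(H) - z(FA).
d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Sweep the decision criterion to trace the implied ROC curve.
fa = np.linspace(1e-4, 1.0 - 1e-4, 200)
hr = norm.cdf(norm.ppf(fa) + d_prime)
# (fa, hr) passes through the observed point and can now be plotted.
```

The caveat: one point only determines a unique curve under the equal-variance assumption; with unequal signal/noise variances, a single (hit, false-alarm) pair cannot pin the curve down.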

MATLAB's Neural Network classifier with False Positives only

I have been trying to design a neural network but have run into some issues. I am rather a newbie. If that's OK, I would like some comments on the results I've got from training. This is a simple feed-forward NN with two hidden layers, two outputs and 13 inputs. Both hidden layers contain 7 neurons.
I have provided a few graphs to show my results from training a classifier. The graphs look slightly different from what I'd expect from a successful training process.
The first graph shows how the training progressed. It seems all right to me, but the NN stops due to the validation error, not the minimum gradient. I don't know whether that is good news or not. On the second graph, the gradient seems to oscillate up and down. Does that mean the NN struggled to find an optimum structure? The ROC graph shows, IMO, very good results in terms of the true positive and false positive ratios; I believe the higher this value, the better the results. But the confusion matrix is what worries me the most. It states zero positive detections and all false positives.
What do you think?
My opinion on your question, part by part.
Regarding:
The first graphs is how the training progressed. It seems all right to me but NN stops due to validation error not minimum gradient. I don't know whether it is good news or not. On the second graph, the gradient seems to be oscillating up and down. Does it mean, NN struggled with finding an optimum structure?
Not necessarily. It could mean that the NN found appropriate weights and then danced around them, jumping away from the minimum and coming back closer or further away. I would suggest saving snapshots of intermediate stages during training, so that you can later check which ROC and confusion-matrix results the intermediate models give. In my experience, the intermediate results were sometimes better and sometimes worse.
The ROC graph shows IMO very good results in terms of True Positive ratio and False Positive ratio. I believe higher this value, the better results.
Hard to disagree with you on this point.
Regarding:
But confusion matrix is what I am worrying about the most. It states zero positive detections and all of false positives.
That, IMHO, can also be considered a good result, or even a very good one, because your overall classification accuracy is quite high in percentage terms (for the majority of the cases I have worked on, that was the desired result), but it is also confusing.
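One common cause of an "all one class" confusion matrix is class imbalance: overall accuracy can look fine while the positive class is never predicted at all. A small illustration with made-up numbers (95 negatives, 5 positives):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Assumed toy labels: 95 negatives, 5 positives; the classifier
# predicts class 0 for every single sample.
y_true = np.r_[np.zeros(95, dtype=int), np.ones(5, dtype=int)]
y_pred = np.zeros(100, dtype=int)

cm  = confusion_matrix(y_true, y_pred)   # [[95, 0], [5, 0]]
acc = accuracy_score(y_true, y_pred)     # 0.95, yet zero true positives
```

This is why per-class recall (or the full confusion matrix) is more informative than a single accuracy percentage when the classes are unbalanced.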

How to plot miss rate vs false positive rate?

I have a problem plotting a curve of miss rate vs. false positive rate to analyze the performance of my proposed system (as in the sample picture below). I have two sample datasets, one of positive and one of negative samples. I want to plot, with this curve, how well my system classifies people vs. non-people.
As far as I know, I need to get the true positive and false positive counts after classification, but I am not sure yet how to plot the curve. Can anyone help, please?
Since MATLAB R2017a, you can use the evaluateDetectionMissRate function.
[logAverageMissRate,fppi,missRate] = evaluateDetectionMissRate(detectionResults,groundTruthData)
This function returns data points for plotting the log MR-FPPI curve.
(MR: Miss-Rate, FPPI: False Positive Per Image).
For an example of its usage, type the doc evaluateDetectionMissRate command in MATLAB or go here.
There are two types of bounding boxes in object detection: the boxes the dataset labels as objects (ground truth), and the boxes your algorithm detects.
If your bbox has a large intersection with a dataset bbox, it is a correct detection.
If your bbox has NO intersection with any dataset bbox, it is a false positive.
All dataset bboxes in an image that have no intersection with any of your bboxes count as misses (the miss rate). After calculating these numbers, plotting the values is straightforward.
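The matching logic above can be sketched directly (the box coordinates and the IoU cutoff of 0.5 are assumptions for illustration): detections are greedily matched to ground-truth boxes by IoU; unmatched detections are false positives, unmatched ground truths are misses. Sweeping a score threshold over the detections then yields the MR vs. FPPI curve point by point.

```python
def iou(a, b):
    # Boxes as [x1, y1, x2, y2]; returns intersection-over-union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mr_fppi_point(detections, ground_truth, iou_thr=0.5):
    """Miss rate and false-positive count for one image at one threshold."""
    matched = set()
    fp = 0
    for det in detections:
        best, best_iou = None, iou_thr
        for gi, gt in enumerate(ground_truth):
            if gi not in matched and iou(det, gt) >= best_iou:
                best, best_iou = gi, iou(det, gt)
        if best is None:
            fp += 1                      # overlaps no ground-truth box
        else:
            matched.add(best)            # greedy match to best ground truth
    misses = len(ground_truth) - len(matched)
    return misses / len(ground_truth), fp

# Hypothetical example: one good detection, one stray detection,
# one ground-truth box left unmatched.
dets = [[0, 0, 10, 10], [50, 50, 60, 60]]
gts  = [[1, 1, 11, 11], [30, 30, 40, 40]]
print(mr_fppi_point(dets, gts))
```

Accumulating (miss rate, FP count / number of images) over all images for each score threshold, then plotting on log-log axes, gives the standard log MR-FPPI curve.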
You can use the following GitHub repo for plotting MR vs. FPPI. It may seem like the code only calculates mAP, but it does much more than that: it also calculates the miss rate, false positives per image and the log-average miss rate. All of these are computed in the main.py file in the repo (line 81) but are not plotted. All you have to do is plot MR vs. FPPI using matplotlib (or any other module). Just follow the ReadMe file to get started.
Hope this helps!