I am using Euclidean distance for speaker recognition and want to plot the ROC curve using perfcurve in MATLAB. The scores are the resulting Euclidean distances. Am I doing this right? Thanks.
Labels=[1 1 1 1 1 1 1 0 0 1];
scores=[18.5573 15.3364 16.8427 19.6381 16.4195 17.3226 18.9520 21.6811 21.4013 22.3880];
[x,y]=perfcurve(Labels,scores,1);
plot(x,y);
xlabel('False positive rate'); ylabel( 'True positive rate')
You did it right.
The only delicate point is that you have to understand the meaning of your scores: is higher better, or is lower better?
If lower is better, as is usually the case for a distance, then I would use [x,y]=perfcurve(Labels,-scores,1); instead.
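As a rough check on the data from the question, the AUC can be compared with and without the sign flip. This is only a sketch: it assumes that label 1 marks the matching speaker and that a smaller distance means a better match, and it needs the Statistics and Machine Learning Toolbox for perfcurve.

```matlab
% Data from the question (assumed: 1 = matching speaker, 0 = non-matching).
Labels = [1 1 1 1 1 1 1 0 0 1];
scores = [18.5573 15.3364 16.8427 19.6381 16.4195 17.3226 18.9520 21.6811 21.4013 22.3880];

% Raw distances: perfcurve treats larger scores as "more positive".
[~, ~, ~, aucRaw] = perfcurve(Labels, scores, 1);

% Negated distances: a smaller distance now ranks higher.
[x, y, ~, aucNeg] = perfcurve(Labels, -scores, 1);

fprintf('AUC raw = %.3f, AUC negated = %.3f\n', aucRaw, aucNeg);

plot(x, y);
xlabel('False positive rate');
ylabel('True positive rate');
```

With no tied scores, the two AUC values add up to 1, so an AUC well below 0.5 on raw distances is a hint that the score direction is reversed.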
Orange3 reports that the cosine between vector No. 1 [1, 0] and vector No. 2 [0, 1] is 1.000, and between No. 1 and vector No. 7 [-1, 0] is 2.000, in the Distance Matrix shown in the capture below. I believe these should be 0.000 and -1.000, because it is supposed to be a cosine. Or, if it were an angle in radians, they should be 1.5708 (pi/2) and 3.1416 (pi).
It sounds like the range of cosine is 0.0 to 2.0 in Orange3, but I have never heard of this before.
Does anyone have any idea about these cosine results?
Thank you.
What you describe is cosine similarity. Orange computes cosine distance.
The code is here: https://github.com/biolab/orange3/blob/master/Orange/distance/distance.py#L455.
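The relationship can be checked by hand. The sketch below is written in MATLAB rather than Orange's Python, does not call Orange at all, and assumes the usual definition of cosine distance as 1 minus cosine similarity; cosSim and cosDist are illustrative helper names.

```matlab
% Cosine similarity and the corresponding distance (1 - similarity).
cosSim  = @(a, b) dot(a, b) / (norm(a) * norm(b));
cosDist = @(a, b) 1 - cosSim(a, b);

v1 = [1 0];    % vector No. 1 from the question
v2 = [0 1];    % vector No. 2
v7 = [-1 0];   % vector No. 7

d12 = cosDist(v1, v2);   % orthogonal vectors: similarity 0, distance 1
d17 = cosDist(v1, v7);   % opposite vectors: similarity -1, distance 2
fprintf('d(1,2) = %.3f, d(1,7) = %.3f\n', d12, d17);
```

Under this definition the distance ranges from 0 (same direction) to 2 (opposite direction), which matches the 1.000 and 2.000 reported in the question.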
I am using MATLAB's perfcurve:
[X,Y,T,AUC] = perfcurve(labels,scores,posclass)
I am confused about the following. First a basic example, and then I'll follow up with my question.
a) [X,Y,T,AUC] = perfcurve([1 1 1 0 0 0],[.9 .9 .9 .1 .1 .1],1) produces AUC = 1
b) [X,Y,T,AUC] = perfcurve([0 0 0 1 1 1],[.9 .9 .9 .1 .1 .1],1) produces AUC = 0
When I specify the positive class (label=1), does it always have to have the higher scores?
If I make the positive class (label=1) have the lower scores, as in b) above, would the ROC curve be flipped (the mirror image of the normal ROC curve)?
The curves I generate with my data look like the ones below.
Plot 1 is the distribution of the scores. The classes are shown in red and blue. Notice that the label=1 (red) class has low scores.
red -> label=1
blue-> label=0
The next image is the generated ROC curve. It is basically a flipped version of what I want to see. Am I doing something wrong, or is this behavior caused by the label=1 class having low scores?
The 1 in the third argument tells perfcurve which class label to treat as positive. perfcurve then computes FPR and TPR by sweeping thresholds over the probabilities/scores in the second argument, and it assumes that higher scores point toward that positive class. Whether a sample counts as a TP or an FP at a given threshold therefore depends on its score. If you swap the scores as you show above without also changing the positive class label, every sample ends up on the opposite side of the thresholds used to build the ROC curve, so the negatives are flagged first (as FPs) and the positives last. That is why the plot is a mirror image of what you expect.
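A minimal sketch of the two usual ways out, using the toy data from case b) in the question (perfcurve needs the Statistics and Machine Learning Toolbox; the variable names are illustrative):

```matlab
labels = [0 0 0 1 1 1];
scores = [.9 .9 .9 .1 .1 .1];   % label 1 has the LOW scores, as in case b)

% As in the question: positive class 1, scores used as-is.
[~, ~, ~, aucAsIs] = perfcurve(labels, scores, 1);

% Option 1: negate the scores so that the positive class ranks highest.
[~, ~, ~, aucNegated] = perfcurve(labels, -scores, 1);

% Option 2: keep the scores and declare label 0 as the positive class.
[~, ~, ~, aucPosZero] = perfcurve(labels, scores, 0);

fprintf('as-is: %g, negated: %g, posclass 0: %g\n', aucAsIs, aucNegated, aucPosZero);
```

Option 1 keeps label 1 as the class of interest and only fixes the score direction. Option 2 answers a slightly different question, because it reports TPR and FPR for label 0.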
I am trying to use the cosine distance in pdist2 and I am confused about its output. As far as I know, it should be between 0 and 1. Since MATLAB uses 1 - cosine, 1 would be the highest variability and 0 the lowest. However, the output seems to range from about 0.5 to 1.5!
Can somebody please advise me on how to interpret this output?
From help pdist2:
'cosine' - One minus the cosine of the included angle
between observations (treated as vectors)
Since the cosine varies between -1 and 1, the result of pdist2(...,'cosine') varies between 0 and 2. If you want the cosine itself, use 1-pdist2(matrix1,matrix2,'cosine').
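A small sketch with three hand-picked directions makes the 0 to 2 range visible. It assumes the Statistics and Machine Learning Toolbox is available for pdist2; X and Y are made-up inputs.

```matlab
X = [1 0];                 % one reference vector (1-by-2)
Y = [1 0; 0 1; -1 0];      % same direction, orthogonal, opposite

D = pdist2(X, Y, 'cosine');   % cosine distance, a 1-by-3 row
C = 1 - D;                    % recovered cosine of the included angle

disp(D);   % approximately 0, 1, 2
disp(C);   % approximately 1, 0, -1
```

Values between 0.5 and 1.5, as in the question, would correspond to cosines between 0.5 and -0.5.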
I am using MATLAB to plot random variables drawn from a normal distribution. I plot the histogram as
w = 0.2;
y = randn(1, 1000)*w;
hist(y);
This shows the values in the histogram ranging from -40 to 40, but what is that? Since the width of the normal distribution is 0.2, I think the range of the variable should be within -1 to 1, right? So why does hist show -40 to 40? How do I find the actual range of the random variable? Thanks.
A normal random variable, also called Gaussian, can in theory take any value from -infinity to +infinity. However, the distribution is bell-shaped, which means values far from the mean have a lower probability of occurring, but there is still a chance that they happen. So if instead of randn(1, 1000) you use randn(1,1000000), with high probability you will see a larger range. Multiplying randn() by 0.2 just scales the signal: the standard deviation becomes 0.2 (and the variance, or power, 0.04).
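To see the actual spread of the samples, it may help to print a few summary statistics next to the histogram. This is a sketch: the fixed seed is an assumption added only for repeatability, and the exact numbers will differ from run to run without it.

```matlab
rng(0);                     % fixed seed, only so the run is repeatable
w = 0.2;
y = randn(1, 1000) * w;     % zero mean, standard deviation w

fprintf('std = %.3f, min = %.3f, max = %.3f\n', std(y), min(y), max(y));

% About 99.7% of normal samples are expected within 3 standard deviations.
fracWithin3 = mean(abs(y) <= 3 * w);
fprintf('fraction within +/- %.1f: %.3f\n', 3 * w, fracWithin3);
```

With w = 0.2, almost all samples should fall inside roughly -0.6 to 0.6, so a horizontal axis running from -40 to 40 would not be expected from this snippet.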
Can you give a bit more information?
When I run your snippet, I get a Gaussian histogram with min and max:
>> [min(y) max(y)]
ans =
-0.6464 0.7157
I have a matrix of 1000 real numbers in the range -3 to 3. I have to plot the numbers on a graph so as to get a continuous curve joining all the points. The matrix is named Points and it is a 1000-by-1 matrix.
Try:
plot(Points(:)); % plot the points
ylim([-3 3]); % set the y limits