How to calculate 95% confidence interval for AUC from confusion matrix?

From a classification model in Weka software I get: sample size, confusion matrix and AUC (area under curve of ROC).
How may I calculate the 95% confidence interval for AUC?

I think you have everything you need, so use the following equation:
Note: N1 and N2 are the sample sizes of the two groups.
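The equation itself is not preserved in this copy of the answer. One widely used formula that needs only the AUC and the two group sizes is the Hanley and McNeil (1982) standard error, so the sketch below assumes that is what was meant; the source does not confirm it. Here N1 is taken to be the number of positive cases and N2 the number of negative cases (both can be read from the row sums of the confusion matrix), and the function name is made up for illustration.

```python
import math

def auc_confidence_interval(auc, n1, n2, z=1.959964):
    """Approximate confidence interval for an AUC (Hanley-McNeil standard error).

    auc : area under the ROC curve
    n1  : number of positive cases (assumed meaning of N1)
    n2  : number of negative cases (assumed meaning of N2)
    z   : normal quantile; 1.959964 gives a 95% interval
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n1 - 1) * (q1 - auc ** 2)
                + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2)
    se = math.sqrt(variance)
    # Clip to [0, 1] because an AUC cannot leave that range.
    lower = max(0.0, auc - z * se)
    upper = min(1.0, auc + z * se)
    return lower, upper, se
```

Calling `auc_confidence_interval(0.8, 50, 50)` returns the lower bound, the upper bound and the standard error. The interval relies on a normal approximation, so it is least trustworthy for small groups or an AUC close to 1.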

Related

Is there a formula to calculate the number of dimensions or principal components that corresponds to a particular variance magnitude?

I am trying to implement PCA in pytorch. Given a covariance matrix, its eigenvalues and eigenvectors, and a proportion of variance to be captured from a data matrix, is there a formula to calculate the number of dimensions or principal components that corresponds to that variance?
Yes, there is. With the eigenvalues sorted in descending order, the proportion of variance explained by the first N principal components is the sum of the first N eigenvalues divided by the sum of all eigenvalues. The number of components you need is the smallest N for which this ratio reaches your target proportion.
Basically, the magnitude of an eigenvalue tells you the importance of the corresponding component.
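As a minimal sketch of that rule, the function below takes a plain list of eigenvalues and a target proportion and returns the smallest number of components whose cumulative share reaches the target. The function name and the sample eigenvalues are made up for illustration; it uses plain Python rather than pytorch, but the same cumulative-sum idea carries over to tensors.

```python
from itertools import accumulate

def components_for_variance(eigenvalues, target):
    """Smallest k such that the k largest eigenvalues explain >= target of the variance."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= target:
            return k
    return len(ordered)  # target not reached (e.g. rounding): keep everything

# Hypothetical eigenvalues of a covariance matrix
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
k_for_90_percent = components_for_variance(example_eigenvalues, 0.90)
```

There is no closed-form expression for the count; it always comes down to scanning the cumulative sums until the target is reached.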

How to calculate the spectral density of a matrix of data using MATLAB

I am not in signal processing, but in my field I need the spectral density of a matrix of data, and I am confused about some of the details.
%matrix H is given.
corr=xcorr2(H); %get the correlation
spec=fft2(corr); % Wiener-Khinchin Theorem
In MATLAB, xcorr2 calculates the correlation function of this matrix. The lag ranges from -N+1 to N-1, so if H is N by N, then corr is 2N-1 by 2N-1. For discretized data, should I use all of corr or only half of it?
Another problem: I think the Wiener-Khinchin theorem is basically for continuous functions. I have always thought of the discrete FT as an approximation to the continuous FT, or as a tool for calculating it. If you use the MATLAB built-in function 'fft', you should divide the final result by \delta x.
Could anyone who knows this area well share some MATLAB code with me?
Basically, approximating a continuous FT by a discrete FT is the same as approximating an integral by a finite sum.
We will first discuss the 1D case, then we'll discuss the 2D case.
Let's look at the Wiener-Khinchin theorem (for example here).
It states that:
"For the discrete-time case, the power spectral density of the function with discrete values x[n] is:
S(f) = \sum_{k=-\infty}^{\infty} r_{xx}[k] e^{-i 2\pi f k}
where
r_{xx}[k] = E[x[n] x^*[n-k]]
is the autocorrelation function of x[n]."
1) You can see already that the sum is taken from -infty to +infty in the calculation of S(f)
2) Now considering the Matlab fft - You can see (command 'edit fft' in Matlab), that it is defined as :
X(k) = sum_{n=1}^N x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
which has the same form as the sum in the theorem, truncated to N samples, so it is exactly what you need to calculate the power spectral density at a frequency f.
Note that for a continuous function, S(f) is a continuous function of f. For a discretized function, the FFT gives S(f) only at discrete frequencies.
Now that we know all that, it can easily be extended to the 2D case. Indeed, the structure of fft2 matches the structure of the right-hand side of the Wiener-Khinchin theorem for the 2D case.
You will, however, need to divide your result by N*M, where N is the number of sample points in x and M is the number of sample points in y.
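The 1D relationship can be checked numerically. The sketch below is in plain Python with a deliberately naive DFT so that it is self-contained; in practice an FFT routine would replace it. It assumes a real-valued signal and uses the circular (periodic) autocorrelation, which is what the finite DFT pairs with exactly, rather than the zero-padded linear correlation that xcorr2 returns. All function names are made up for illustration, and the division by N mirrors the normalization mentioned above.

```python
import cmath

def dft(values):
    """Naive O(N^2) discrete Fourier transform, same convention as MATLAB's fft."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_autocorrelation(x):
    """r[lag] = sum_t x[t] * x[(t - lag) mod N] for a real-valued sequence x."""
    n = len(x)
    return [sum(x[t] * x[(t - lag) % n] for t in range(n)) for lag in range(n)]

def power_spectrum_direct(x):
    """Periodogram: |DFT(x)|^2 divided by the number of samples."""
    n = len(x)
    return [abs(c) ** 2 / n for c in dft(x)]

def power_spectrum_via_autocorrelation(x):
    """Wiener-Khinchin route: DFT of the circular autocorrelation, divided by N."""
    n = len(x)
    return [c.real / n for c in dft(circular_autocorrelation(x))]

signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
direct = power_spectrum_direct(signal)
via_autocorrelation = power_spectrum_via_autocorrelation(signal)
largest_gap = max(abs(a - b) for a, b in zip(direct, via_autocorrelation))
```

The two routes agree up to floating-point rounding, so `largest_gap` is tiny. The 2D case follows the same pattern with a double sum over rows and columns and a division by N*M.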

Calculating the area under curve from classification accuracy

I have an assignment:
Using Naive Bayes we built a model on some data with 2 classes (model returns 2 probabilities - one for positive and one for negative class). We calculated the area under ROC curve AUC = 0.8 and classification accuracy CA = 0.6 with threshold set to 0.5 (if the probability of some example for positive class is higher than 0.5, we predict positive class for that example, else the negative class). Then we discovered that if we set the threshold to 0.3, classification accuracy becomes CA = 0.7. What is the AUC for the second threshold? If the result depends on initial data, present all possibilities.
How can I calculate that?
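Not sure if that qualifies as an answer, but the ROC AUC is the area under the curve of sensitivity plotted against 1 - specificity, traced out over all classification thresholds. It is a property of the model's scores, not of any single threshold, so you cannot compute an AUC for a specific threshold. Moving the threshold from 0.5 to 0.3 changes the classification accuracy but leaves the AUC at 0.8.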
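A small made-up example illustrates the point. The sketch below computes the AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, which is equivalent to the area under the ROC curve, and computes the accuracy at two thresholds. The labels and scores are invented for illustration and are not the assignment's data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def accuracy(labels, scores, threshold):
    """Share of examples classified correctly when score > threshold means positive."""
    predictions = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Invented data: 4 positive and 4 negative examples with predicted probabilities
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.35, 0.45, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)                   # no threshold involved
accuracy_at_05 = accuracy(labels, scores, 0.5)
accuracy_at_03 = accuracy(labels, scores, 0.3)
```

The accuracy differs between the two thresholds, while `roc_auc` never takes a threshold argument at all.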

empirical quantiles in matlab

Does anyone know how to calculate the empirical quantiles of a distribution in MATLAB? Specifically, I have issues working with the empiricalQuantiles() function and need to calculate empirical quantiles of a rolling population (a matrix that is, say, 49x1025 for every 100 points).
If you can also give information on how to calculate the inverse of the empirical distribution (which should give approximately the same answer), that would be great.
% Simulating empirical data
empiricalData=randn(50000,1);
% Quantile evaluation
% For instance: Median
y = quantile(empiricalData,[.50]);
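The question also asks about the inverse of the empirical distribution and about a rolling population. As a rough sketch in plain Python, the inverse empirical CDF at level p can be taken as the smallest observation whose empirical CDF value is at least p, and a rolling version simply repeats this over consecutive windows. The function names and the window layout are assumptions for illustration, not the behavior of empiricalQuantiles(). MATLAB's quantile interpolates between order statistics, so its results can differ slightly from this step-function definition.

```python
import math

def empirical_quantile(sample, p):
    """Inverse empirical CDF: smallest observation x with ECDF(x) >= p, for 0 <= p <= 1."""
    ordered = sorted(sample)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]

def rolling_quantile(series, p, window):
    """Quantile of every block of `window` consecutive points, moving one point at a time."""
    return [empirical_quantile(series[i:i + window], p)
            for i in range(len(series) - window + 1)]

# Hypothetical data
median_of_sample = empirical_quantile([3, 1, 2, 5, 4], 0.5)
rolling_medians = rolling_quantile([1, 2, 3, 4, 5, 6], 0.5, 3)
```

For a matrix, the same function could be applied to each row or column in turn, depending on which dimension holds the 100-point windows.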

How do I compute the Inverse gaussian distribution from given CDF?

I want to compute the parameters mu and lambda for the Inverse Gaussian Distribution given the CDF.
By 'given the CDF' I mean that I have the data and the (estimated) quantile for each data value, i.e.:
Quantile - Value
0.01 - 10
0.5 - 12
0.7 - 13
Now I want to find the inverse Gaussian distribution for this data so that I can, e.g., look up the quantile for the value 11 based on my distribution.
How can I find out the values mu and lambda?
The only solution I can think of is using Gradient descent to find the best mu and lambda using RMSE as an error measure.
Isn't there a better solution?
Comment: Matlab's MLE-Algorithm is not an option, since it does not use the quantile data.
Since all you really want to do is estimate the quantiles of the distribution at unknown values, and you have a lot of data points, you can simply interpolate the values you want to look up.
quantile_estimate = interp1(values, quantiles, value_of_interest);
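For readers without MATLAB, the same linear interpolation can be sketched in plain Python using the three (value, quantile) pairs from the question. The function name is made up for illustration, and with only three points the estimate is coarse; it simply joins neighbouring table entries with straight lines.

```python
def interpolate_quantile(values, quantiles, x):
    """Linearly interpolate the quantile at x; `values` must be sorted ascending."""
    if not values[0] <= x <= values[-1]:
        raise ValueError("x lies outside the tabulated range")
    pairs = list(zip(values, quantiles))
    for (x0, q0), (x1, q1) in zip(pairs, pairs[1:]):
        if x0 <= x <= x1:
            return q0 + (q1 - q0) * (x - x0) / (x1 - x0)

# Table from the question
values = [10, 12, 13]
quantiles = [0.01, 0.5, 0.7]
quantile_at_11 = interpolate_quantile(values, quantiles, 11)
```

Unlike interp1, which returns NaN for points outside the table by default, this sketch raises an error there, since extrapolating a CDF linearly can leave the range 0 to 1.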
Following @mpiktas's suggestion here, I implemented a gradient descent algorithm for estimating my mu and lambda:
1. Make an initial guess using MLE.
2. Learn mu and lambda using gradient descent, with RMSE as the error measure.
The following article explains in detail how to compute quantiles (the inverse CDF) for the inverse Gaussian distribution:
Giner, G, and Smyth, GK (2016). statmod: probability calculations for the inverse Gaussian distribution. R Journal. http://arxiv.org/abs/1603.06687
Code for the R language is contained in the R package statmod available from CRAN. For example:
> library(statmod)
> qinvgauss(0.01, lower.tail=FALSE)
[1] 4.98
computes the 0.01 upper tail quantile of the standard IG distribution.