Matlab function perfcurve falsely asserts ROC AUC = 1

The perfcurve function in Matlab falsely asserts AUC = 1 even though two records are clearly misclassified at reasonable cutoff values.
If I run the same data through a confusion matrix with cutoff 0.5, the accuracy is rightfully below 1.
The MWE below contains data from one of my folds. I noticed the problem because my results showed a perfect AUC together with less-than-perfect accuracy.
I use Matlab R2016a on Ubuntu 16.04 64-bit.
% These are the true classes in one of my test-set folds
classes = transpose([ones(1,9) 2*ones(1,7)])
% These are predictions from my classifier
% Class 1 is very well predicted
% Class 2 has two records predicted as class 1 with threshold 0.5
confidence = transpose([1.0 1.0 1.0 1.0 0.9999 1.0 1.0...
1.0 1.0 0.0 0.7694 0.0 0.9917 0.0 0.0269 0.002])
positiveClass = 1
% Nevertheless, the AUC yields a perfect 1
% I understand why X, Y, T have fewer values than classes and confidence
% Identical records are dealt with by one point on the ROC curve
[X,Y,T,AUC] = perfcurve(classes, confidence, positiveClass)
% The confusion matrix for comparison
threshold = 0.5
confus = confusionmat(classes,(confidence<threshold)+1)
accuracy = trace(confus)/sum(sum(confus))

This simply means that there is another cutoff where the separation is perfect.
Try:
threshold = 0.995
confus = confusionmat(classes,(confidence<threshold)+1)
accuracy = trace(confus)/sum(sum(confus))
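To see why, compare the lowest score among the true class-1 records with the highest score among the true class-2 records (a quick check of my own against the data posted above):
min(confidence(classes == 1))   % 0.9999, lowest score of a true class-1 record
max(confidence(classes == 2))   % 0.9917, highest score of a true class-2 record
Since 0.9999 > 0.9917, every class-1 record is ranked above every class-2 record. The AUC summarises this ranking over all possible thresholds, so it is exactly 1, even though the fixed threshold 0.5 misclassifies two records.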

Assign outcomes to discrete distribution Matlab

I am trying to fit a distribution to a discrete dataset.
The possible outcomes are A = [1 3 4 5 9 10] with the respective probabilities prob:
prob = [0.2 0.15 0.1 0.05 0.35 0.15];
I have used makedist to find the distribution
pd = makedist('Multinomial','probabilities', prob);
I wonder if there is a way to include the outcomes from A (which range from 1 to 10) in the distribution, so that I can calculate the mean and the variance of the possible outcomes with mean(pd) and var(pd). So far the mean is 3.65, but my aim is to have mean(pd) = 5.95, which is the weighted sum of the possible outcomes. Thanks in advance.
The solution is pretty easy. The possible outcomes of a multinomial distribution are the integers from 1 to numel(prob). From the official documentation page:
Create a multinomial distribution object for a distribution with three
possible outcomes. Outcome 1 has a probability of 1/2, outcome 2 has a
probability of 1/3, and outcome 3 has a probability of 1/6.
pd = makedist('Multinomial','probabilities',[1/2 1/3 1/6])
Basically, your vector of possible outcomes includes a few values associated with a null (i.e. zero) probability. Thus, define your distribution as follows in order to obtain the expected result:
p = [0.20 0.00 0.15 0.10 0.05 0.00 0.00 0.00 0.35 0.15];
pd = makedist('Multinomial','probabilities',p);
mean(pd) % 5.95
var(pd) % 12.35
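For a quick cross-check (my own addition, not part of the original answer), the same moments can be computed directly from the outcome vector A and the original probability vector prob:
A    = [1 3 4 5 9 10];
prob = [0.2 0.15 0.1 0.05 0.35 0.15];
mu   = A*prob'               % 5.95, the probability-weighted mean of the outcomes
v    = (A.^2)*prob' - mu^2   % 12.3475, the corresponding variance
This agrees with mean(pd) and var(pd) from the padded distribution above (up to display rounding).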

How do I create band-limited (100-640 Hz) white Gaussian noise?

I would like to create 500 ms of band-limited (100-640 Hz) white Gaussian noise with a (relatively) flat frequency spectrum. The noise should be normally distributed with mean = ~0 and 99.7% of values between ± 2 (i.e. standard deviation = 2/3). My sample rate is 1280 Hz; thus, a new amplitude is generated for each frame.
duration = 500e-3;
rate = 1280;
amplitude = 2;
npoints = duration * rate;
noise = (amplitude/3)* randn( 1, npoints );
% Gaus distributed white noise; mean = ~0; 99.7% of amplitudes between ± 2.
time = (0:npoints-1) / rate
Could somebody please show me how to filter the signal for the desired result (i.e. 100-640 Hz)? In addition, I was hoping somebody could also show me how to generate a graph to illustrate that the frequency spectrum is indeed flat.
I intend on importing the waveform to Signal (CED) to output as a form of transcranial electrical stimulation.
The following is a Matlab implementation of the method alluded to by "Some Guy" in a comment to your question.
% In frequency domain, white noise has constant amplitude but uniformly
% distributed random phase. We generate this here. Only half of the
% samples are generated here, the rest are computed later using the complex
% conjugate symmetry property of the FFT (of real signals).
X = [1; exp(1i*2*pi*rand(npoints/2-1,1)); 1]; % X(1) and X(npoints/2+1) must be real
% Identify the locations of frequency bins. These will be used to zero out
% the elements of X that are not in the desired band
freqbins = (0:npoints/2)'/npoints*rate;
% Zero out the frequency components outside the desired band
X(find((freqbins < 100) | (freqbins > 640))) = 0;
% Use the complex conjugate symmetry property of the FFT (for real signals) to
% generate the other half of the frequency-domain signal
X = [X; conj(flipud(X(2:end-1)))];
% IFFT to convert to time-domain
noise = real(ifft(X));
% Normalize such that 99.7% of the time the signal lies between ±2
noise = 2*noise/prctile(noise, 99.7);
Statistical analysis of around a million samples generated using this method results in the following spectrum and distribution:
Firstly, the spectrum (using the Welch method) is, as expected, flat in the band of interest.
Also, the distribution, estimated using a histogram of the signal, matches the Gaussian PDF quite well.
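If you want to reproduce that check yourself, here is a minimal verification sketch (my own addition; it assumes the variables noise and rate from the code above, plus pwelch from the Signal Processing Toolbox and normpdf from the Statistics Toolbox):
% Estimated power spectral density: should be flat between 100 and 640 Hz
pwelch(noise, [], [], [], rate);
% Amplitude distribution compared with the target N(0, (2/3)^2) density
figure
histogram(noise, 50, 'Normalization', 'pdf');
hold on
xg = linspace(-3, 3, 200);
plot(xg, normpdf(xg, 0, 2/3), 'r', 'LineWidth', 1.5);
hold off
Note that the original 640-sample realisation gives a rough histogram; regenerating the noise with a longer duration (as in the million-sample analysis mentioned above) makes both plots much smoother.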

Calculate bias and variance in ridge regression MATLAB

I can't get my mind around the concept of how to calculate bias and variance from a random set.
I have created the code to generate a random normal set of numbers.
% Generate random w, x, and noise from standard Gaussian
w = randn(10,1);
x = randn(600,10);
noise = randn(600,1);
and then extract the y values
y = x*w + noise;
After that I split my data into a training (100) and test (500) set
% Split data set into a training (100) and a test set (500)
x_train = x([ 1:100],:);
x_test = x([101:600],:);
y_train = y([ 1:100],:);
y_test = y([101:600],:);
train_l = length(y_train);
test_l = length(y_test);
Then I calculated the w for a specific value of lambda (1.2)
lambda = 1.2;
% Calculate the optimal w
A = x_train'*x_train+lambda*train_l*eye(10,10);
B = x_train'*y_train;
w_train = A\B;
Finally, I compute the mean squared error:
% Compute the mean squared error on both the training and the
% test set
sum_train = sum((x_train*w_train - y_train).^2);
MSE_train = sum_train/train_l;
sum_test = sum((x_test*w_train - y_test).^2);
MSE_test = sum_test/test_l;
I know that if I sweep over a vector of lambda values (which I have already done) and average over several iterations, I can plot the average MSE_train and MSE_test as functions of lambda; large differences between MSE_test and MSE_train then indicate high variance, i.e. overfitting.
But, what I want to do extra, is to calculate the variance and the bias^2.
The Ridge Regression Notes (page 7) explain how to calculate the bias and the variance.
My question is: should I follow those steps on the whole random data set (600 samples) or only on the training set? I think the bias^2 and the variance should be calculated on the training set. Also, in Theorem 2 (page 7 again) the bias is calculated as the negative product of lambda, W, and beta; is beta my original w (w = randn(10,1))?
Sorry for the long post, but I really want to understand how the concept works in practice.
UPDATE 1:
OK, so following the previous notes didn't produce any good results. So I took the standard bias-variance decomposition for ridge regression and, based on that, wrote the following (using the test set):
% Bias and Variance
sum_bias=sum((y_test - mean(x_test*w_train)).^2);
Bias = sum_bias/test_l;
sum_var=sum((mean(x_test*w_train)- x_test*w_train).^2);
Variance = sum_var/test_l;
But after 200 iterations and for 10 different lambdas, this is what I get, which is not what I expected.
In fact, I was hoping for something like the usual trade-off, where bias^2 increases and variance decreases as lambda grows.
sum_bias=sum((y_test - mean(x_test*w_train)).^2); Bias = sum_bias/test_l
Why have you squared the difference between y_test and y_predicted = x_test*w_train?
I don't believe your formula for the bias is correct. In your question, the 'bias term' shown above in blue is the bias^2; however, your formula is neither the bias nor the bias^2, since you have squared the individual residuals rather than the entire bias.
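If it helps, here is a minimal Monte Carlo sketch of how bias^2 and variance are usually estimated in this setting (my own addition; it keeps the true weights w and the test inputs x_test from the question fixed, redraws the training set many times, and measures the bias against the noise-free target x_test*w):
nRuns  = 200;
lambda = 1.2;
preds  = zeros(test_l, nRuns);
for r = 1:nRuns
    x_tr  = randn(train_l, 10);
    y_tr  = x_tr*w + randn(train_l, 1);          % fresh noisy training data
    A     = x_tr'*x_tr + lambda*train_l*eye(10);
    w_hat = A \ (x_tr'*y_tr);
    preds(:, r) = x_test*w_hat;                  % predictions on the fixed test inputs
end
meanPred = mean(preds, 2);                       % average prediction per test point
biasSq   = mean((meanPred - x_test*w).^2)        % squared bias vs. noise-free target
variance = mean(var(preds, 0, 2))                % average prediction variance
With this construction, biasSq typically grows and variance shrinks as lambda increases, which is the trade-off you describe hoping to see.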

fft / ifft deconvolution in Matlab

I have a time varying signal (time,amplitude) and a measured frequency sensitivity (frequency,amplitude conversion factor (Mf)).
I know that if I use the center frequency of my time signal to select the amplitude conversion factor (e.g. 0.0312) for my signal I get a max. converted amplitude value of 1.4383.
I have written some code to deconvolve the time-varying signal with the known sensitivity (i.e. across all frequencies).
In it, Pt is the output/converted amplitude, Mf is the amplitude conversion factor data, and fft(a) is the FFT of the time-varying signal a.
I take the real part of the fft(a):
xdft = fft(a);
xdft = xdft(1:length(a)/2+1); % only retaining the positive frequencies
freq = Fs*(0:(L/2))/L;
where Fs is sampling frequency and L is length of signal.
convS = real(xdft).*Mf;
assuming Mf is a magnitude, i.e. real-valued (I don't have phase information). I also interpolate
Mf=interp1(freq_Mf,Mf_in,freq,'cubic');
so that it is sampled at the same interrogation points as freq.
I then reconstruct the signal in time domain using:
fftRespI=complex(real(convS),imag(xdft));
pt = ifft(fftRespI,L,'symmetric')
where I use the imaginary part of the fft(a).
The reconstructed signal shape looks correct but the amplitude of the signal is not.
If I set all values of Mf to 0.0312 for f = 0..N MHz, I expect a maximum converted amplitude of ~1.4383 (as when I use the center frequency), but I get 13.0560.
How do I calibrate the amplitude axis? i.e. how do I correctly multiply fft(a) by Mf?
A better understanding of the y-axis scaling of the abs (magnitude) and real parts of the FFT would help me, I think...
thanks
You need to re-arrange your weights Mf to match MATLAB's ordering of the frequencies. Let's say you have a signal
N=10;
x=randn(1,N);
y=fft(x);
The order of the frequencies in the output y is
[0:1:floor(N/2)-1,floor(-N/2):1:-1] = [0 1 2 3 4 -5 -4 -3 -2 -1]
So if your weights
Mf = randn(1,N)
are defined for the frequencies [0 1 2 3 4 5 6 7 8 9] (in that order), you will have to re-arrange them as
Mfshift = [Mf(1:N/2), fliplr(Mf(2:(N/2+1)))];
and then you can get your filtered output
z = ifft(fft(x).*Mfshift);
which should come out real.
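Applied to the variables in the question, a hypothetical end-to-end version could look like this (a sketch under the assumption that a is a real row vector of even length L, and that freq_Mf/Mf_in are the measured sensitivity points): weight the full complex spectrum rather than only its real part, and mirror the one-sided weights before the inverse FFT.
xdft   = fft(a);                                 % full complex spectrum of the signal
freq   = Fs*(0:(L/2))/L;                         % one-sided frequency axis
Mf     = interp1(freq_Mf, Mf_in, freq, 'pchip'); % sensitivity on the same grid
MfFull = [Mf, fliplr(Mf(2:end-1))];              % mirror onto the negative frequencies
pt     = ifft(xdft .* MfFull, 'symmetric');      % calibrated time-domain signal
Because MfFull is real and symmetric, the product keeps the conjugate symmetry of fft(a) and the result is real. In particular, if all values of Mf are set to 0.0312, pt reduces to 0.0312*a, matching the ~1.4383 maximum you expect from the center-frequency factor.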

Matlab Portfolio Optimization: Calculating the IN-efficient frontier

I have a question regarding portfolio optimization in Matlab. Is there a way to plot and obtain the values of the IN-efficient frontier (the bottom locus of points that envelops the feasible region, as opposed to the efficient frontier, which envelops the top portion)?
if ~exist('quadprog')
msgbox('The Optimization Toolbox(TM) is required to run this example.','Product dependency')
return
end
returns = [0.1 0.15 0.12];
STDs = [0.2 0.25 0.18];
correlations = [ 1 0.3 0.4
0.3 1 0.3
0.4 0.3 1 ];
% Convert the STDs and correlation matrix to a covariance matrix
covariances = corr2cov(STDs , correlations);
% Calculating and Plotting Efficient Frontier
portopt(returns , covariances , 20)
% Random portfolio generation
weights = exprnd(1,1000,3);
total = sum(weights , 2);
total = total(:,ones(3,1));
weights = weights./total;
[portRisk , portReturn] = portstats(returns , covariances , weights);
hold on
plot(portRisk , portReturn , '.r')
title('Mean-Variance Efficient Frontier and Random Portfolios')
hold off
Is there a way/command to obtain the returns/risks/weights of the lower envelope of the feasible region in the same way that the efficient frontier can be calculated?
Thanks in advance!
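One possible approach (not a built-in command, just a sketch that reuses returns and covariances from the code above and needs quadprog from the Optimization Toolbox): the lower envelope consists of the minimum-risk portfolios for target returns at or below the return of the global minimum-variance portfolio, so you can sweep those targets directly.
nAssets = numel(returns);
f       = zeros(nAssets, 1);                     % no linear term in the objective
lb      = zeros(nAssets, 1);                     % long-only, as in portopt's default
% Global minimum-variance portfolio (fully invested)
wGMV    = quadprog(covariances, f, [], [], ones(1, nAssets), 1, lb, []);
% Target returns from the worst single asset up to the GMV return
targets = linspace(min(returns), returns*wGMV, 20);
inRisk  = zeros(size(targets));
for k = 1:numel(targets)
    Aeq = [ones(1, nAssets); returns];           % budget = 1, expected return = target
    beq = [1; targets(k)];
    w   = quadprog(covariances, f, [], [], Aeq, beq, lb, []);
    inRisk(k) = sqrt(w' * covariances * w);      % portfolio standard deviation
end
hold on
plot(inRisk, targets, 'g.-')                     % lower (inefficient) envelope
hold off
The weights w from each iteration give you the corresponding portfolio compositions.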