Calculate 95% Confidence Interval from Diagnostic Test Data

Calculate 95% Confidence Interval from Diagnostic Test Data - matlab

I'm trying to calculate the 95% confidence intervals based off a series of matrices based in Matlab:
I know how to calculate the required sensitivity, specificity, negative predictive value and positive predictive value, however I'm not sure, given these data, how to calculate the 95% confidence interval.
Any help would be greatly appreciated.

You need the whole distribution to calculate your confidence intervals.
Once you have it, call it "data" and run the following code.
The code works with nans in the data, with vectors (Nx1) or matrices (NxM). As long as each distribution is represented by a column
%%% calculate CI
alpha_lvl = [2.5 97.5]; % lower and upper boundaries for the CI
SEM = nanstd(data)./sqrt(sum(~isnan(data ))); % Standard Error
ts = tinv(alpha_lvl/100,length(data)-1); % T-Score
CI = bsxfun(#plus,nanmean(data ),bsxfun(#times,SEM,ts')); % confidence intervals

Related

Fourier Analysis - MATLAB

Good evening guys,
I wanna ask you a question regarding the analysis of a function in the domain of frequencies (Fourier). I have two vectors: one containing 7700 values for pressure, and the other one containing 7700 values (same number) for time.
For example, I call the firt vector "a" and the second one "b". With the command "figure(1),plot(a,b)" I obtain the curve in the domain of time.
How can I do to plot this curve in the domain of frequency, to make Fourier transform?
I've read about the function "fft", but I've not understood very well how it can be used...can anyone help me?
Thanks in advance for your attention!

fft returns spectrum as complex numbers. In order to analyze it you have to use its absolute value or phase. In general, it should look like this (let's assume that t is vector containing time and y is the one with actual signal, N is the number of samples):
fY = fft(y) / (N/2) % scale it to amplitude, typically by N/2
amp_fY = abs(fY)
phs_fY = angle(fY)
Additionally, it would be nice to have FFT with known frequency resolution. For that, you need sampling period/frequency. Let's call that frequency fs:
fs = 1/(t(1) - t(0))
and the vector of frequencies for FFT (F)
should be:
F = (0:fs/N:(N-1)*fs/N)
and finally plots:
plot(F, amp_fY)
% or plot(F, phs_fy) according to what you need
I you can use stem instead of plot to get some other type of chart.
Note that the DC component (the average value) will be doubled on the plot.
Hope it helps

Sampling at exactly Nyquist rate in Matlab

Today I have stumbled upon a strange outcome in matlab. Lets say I have a sine wave such that
f = 1;
Fs = 2*f;
t = linspace(0,1,Fs);
x = sin(2*pi*f*t);
plot(x)
and the outcome is in the figure.
when I set,
f = 100
outcome is in the figure below,
What is the exact reason of this? It is the Nyquist sampling theorem, thus it should have generated the sine properly. Of course when I take Fs >> f I get better results and a very good sine shape. My explenation to myself is that Matlab was having hardtime with floating numbers but I am not so sure if this is true at all. Anyone have any suggestions?

In the first case you only generate 2 samples (the third input of linspace is number of samples), so it's hard to see anything.
In the second case you generate 200 samples from time 0 to 1 (including those two values). So the sampling period is 1/199, and the sampling frequency is 199, which is slightly below the Nyquist rate. So there is aliasing: you see the original signal of frequency 100 plus its alias at frequency 99.
In other words: the following code reproduces your second figure:
t = linspace(0,1,200);
x = .5*sin(2*pi*99*t) -.5*sin(2*pi*100*t);
plot(x)
The .5 and -.5 above stem from the fact that a sine wave can be decomposed as the sum of two spectral deltas at positive and negative frequencies, and the coefficients of those deltas have opposite signs.
The sum of those two sinusoids is equivalent to amplitude modulation, namely a sine of frequency 99.5 modulated by a sine of frequency 1/2. Since time spans from 0 to 1, the modulator signal (whose frequency is 1/2) only completes half a period. That's what you see in your second figure.
To avoid aliasing you need to increase sample rate above the Nyquist rate. Then, to recover the original signal from its samples you can use an ideal low pass filter with cutoff frequency Fs/2. In your case, however, since you are sampling below the Nyquist rate, you would not recover the signal at frequency 100, but rather its alias at frequency 99.
Had you sampled above the Nyquist rate, for example Fs = 201, the orignal signal could ideally be recovered from the samples.† But that would require an almost ideal low pass filter, with a very sharp transition between passband and stopband. Namely, the alias would now be at frequency 101 and should be rejected, whereas the desired signal would be at frequency 100 and should be passed.
To relax the filter requirements you need can sample well above the Nyquist rate. That way the aliases are further appart from the signal and the filter has an easier job separating signal from aliases.
† That doesn't mean the graph looks like your original signal (see SergV's answer); it only means that after ideal lowpass filtering it will.

Your problem is not related to the Nyquist theorem and aliasing. It is simple problem of graphic representation. You can change your code that frequency of sine will be lower Nyquist limit, but graph will be as strange as before:
t = linspace(0,1,Fs+2);
plot(sin(2*pi*f*t));
Result:
To explain problem I modify your code:
Fs=100;
f=12; %f << Fs
t=0:1/Fs:0.5; % step =1/Fs
t1=0:1/(10*Fs):0.5; % step=1/(10*Fs) for precise graphic representation
subplot (2, 1, 1);
plot(t,sin(2*pi*f*t),"-b",t,sin(2*pi*f*t),"*r");
subplot (2, 1, 2);
plot(t1,sin(2*pi*f*t1),"g",t,sin(2*pi*f*t),"r*");
See result:
Red star - values of sin(2*pi*f) with sampling rate of Fs.
Blue line - lines which connect red stars. It is usual data representation of function plot() - line interpolation between data points
Green curve - sin(2*pi*f)
Your eyes and brain can easily understand that these graphs represent the sine
Change frequency to more high:
f=48; % 2*f < Fs !!!
See on blue lines and red stars. Your eyes and brain do not understand now that these graphs represent the same sine. But your "red stars" are actually valid value of sine. See on bottom graph.
Finally, there is the same graphics for sine with frequency f=50 (2*f = Fs):
P.S.
Nyquist-Shannon sampling theorem states for your case that if:
f < 2*Fs
You have infinite number of samples (red stars on our plots)
then you can reproduce values of function in any time (green curve on our plots). You must use sinc interpolation to do it.

copied from Matlab Help:
linspace
Generate linearly spaced vectors
Syntax
y = linspace(a,b)
y = linspace(a,b,n)
Description
The linspace function generates linearly spaced vectors. It is similar to the colon operator ":", but gives direct control over the number of points.
y = linspace(a,b) generates a row vector y of 100 points linearly spaced between and including a and b.
y = linspace(a,b,n) generates a row vector y of n points linearly spaced between and including a and b. For n < 2, linspace returns b.
Examples
Create a vector of 100 linearly spaced numbers from 1 to 500:
A = linspace(1,500);
Create a vector of 12 linearly spaced numbers from 1 to 36:
A = linspace(1,36,12);

linspace is not apparent for Nyquist interval, so you can use the common form:
t = 0:Ts:1;
or
t = 0:1/Fs:1;
and change the Fs values.
The first Figure is due to the approximation of '0': sin(0) and sin(2*pi). We can notice the range is in 10^(-16) level.

I wrote the function reconstruct_FFT that can recover critically sampled data even for short observation intervals if the input sequence of samples is periodic. It performs lowpass filtering in the frequency domain.

95% confidence bands around fit from fminunc

I have performed a MLE fit to some data that I have using the fminunc function in matlab and estimated the parameter confidence intervals from the Hessian output. Does anyone have a method for generating 95% confidence bands around the fitted curve?

A crude way to do this is generate parameters grid on the intervals that you have obtained, and plot the curve for every point of this grid. This works only if the number of parameters is low. You should keep in mind that the resulting confidence interval for curve is lower than that of the parameters, i.e. if you have 2 parameters with 95% confidence, you will obtain 95%*95%=90% confidence interval
If the curve depends on the parameters monotonically at every point, you can generate the interval just by taking boundary values of parameter intervals.

how to use ifft function in MATLAB with experimental data

I am trying to use the ifft function in MATLAB on some experimental data, but I don't get the expected results.
I have frequency data of a logarithmic sine sweep excitation, therefore I know the amplitude [g's], the frequency [Hz] and the phase (which is 0 since the point is a piloting point).
I tried to feed it directly to the ifft function, but I get a complex number as a result (and I expected a real result since it is a time signal). I thought the problem could be that the signal is not symmetric, therefore I computed the symmetric part in this way (in a 'for' loop)
x(i) = conj(x(mod(N-i+1,N)+1))
and I added it at the end of the amplitude vector.
new_amp = [amplitude x];
In this way the new amplitude vector is symmetric, but now I also doubled the dimension of that vector and this means I have to double the dimension of the frequency vector also.
Anyway, I fed the new amplitude vector to the ifft but still I don't get the logarithmic sine sweep, although this time the output is real as expected.
To compute the time [s] for the plot I used the following formula:
t = 60*3.33*log10(f/f(1))/(sweep rate)
What am I doing wrong?
Thank you in advance

If you want to create identical time domain signal from specified frequency values you should take into account lots of details. It seems to me very complicated problem and I think it need very strength background on the mathematics behind it.
But I think you may work on some details to get more acceptable result:
1- Time vector should be equally spaced based on sampling from frequency steps and maximum.
t = 0:1/fs:N/fs;
where: *N* is the length of signal in frequency domain, and *fs* is twice the
highest frequency in frequency domain.
2- You should have some sort of logarithmic phases on the frequency bins I think.
3- Your signal in frequency domain must be even to have real signal in time domain.
I hope this could help, even for someone to improve it.

Scipy periodogram terminology confusion

I am confused about the terminology used in scipy.signal.periodogram, namely:
scaling : { 'density', 'spectrum' }, optional
Selects between computing the power spectral density ('density')
where Pxx has units of V*2/Hz if x is measured in V and computing
the power spectrum ('spectrum') where Pxx has units of V*2 if x is
measured in V. Defaults to 'density'
(see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html)
1) a few tests show that result for option 'density' is dependent on signal and window length and sampling frequency (grows when signal length increases). How come? I would say that it is exactly density that should be not dependent on these things. If I take a longer signal I should just get more accurate estimation, not different result. Not to mention that dependence on window length is also very surprising.
Result diverges in the limit of infinite signal, which could be a feature of energy, but not power. Shouldn't the periodogram converge to real theoretical PSD when length increases? If, so, am I supposed to perform another normalisation outside of the signal.periodogram method?
2) to the contrary I see that alternative option 'spectrum' gives what I would previously call Power Spectrum Density, that is, it gives a resuls independent on window segment and window length and consistent with theoretical calculation. For instance for Asin(2PIft) a two sided solution yields two peaks at -f and f, each of height 0.25*A^2.
There is a lot of literature on this subject, but I get an impression that also there is a lot of incompatibile terminology, so I will be thankful for any clarification. The straightforward question is how to interpret these options and their units. (I am used to seeing V^2/Hz which are labeled "Power Spectrum Density").

Let's take a real array called data, of length N, and with sampling frequency fs. Let's call the time bin dt=1/fs, and T = N * dt. In frequency domain, the frequency bin df = 1/T = fs/N.
The power spectrum PS (scaling='spectrum' in scipy.periodogram) is calculated as follow:
import numpy as np
import scipy.fft as fft
dft = fft.fft(data)
PS = np.abs(dft)**2 / N ** 2
It has the units of V^2. It can be understood as follow. By analogy to the continuous Fourier transform, the energy E of the signal is:
E := np.sum(data**2) * dt = 1/N * np.sum(np.abs(dft)**2) * dt
(by Parseval's theorem). The power P of the signal is the total energy E divided by the duration of the signal T:
P := E/T = 1/N**2 * np.sum(np.abs(dft)**2)
The power P only depends on the Discrete Fourier Transform (DFT) and the number of samples N. Not directly on the sampling frequency fs or signal duration T. And the power per frequency channel, i.e., power spectrum SP, is thus given by the formula above:
PS = np.abs(dft)**2 / N ** 2
For the power spectrum density PSD (scaling='density' in scipy.periodogram), one needs to divide the PS by the frequency bin of the DFT, df:
PSD := PS/df = PS * N * dt = PS * N / fs
and thus:
PSD = np.abs(dft)**2 / N * dt
This has the units of V^2/Hz = V^2 * s, and now depends on the sampling frequency. That way, integrating the PSD over the frequency range gives the same result as summing the individual values of the PS.
This should explain the relations that you see when changing the window, sampling frequency, duration.

scipy.signal.peridogram uses the scipy.signal.welch function with 0 overlap. Therefore, the scaling is similar to the one provided by the welch function, density or spectrum.
In case of the density scaling, the amplitude will vary with window length, as the longer the window the higher the frequency resolution e.g. the \Delta_f is smaller. Since the estimated density is the average one, the smaller the \Delta_f the less zero energy is considered in the averaging.
As you have mentioned spectrum scaling is an integration of the energy density over the spectrum to produce the energy. Therefore, the integration over zero values does not affect the final value.

Fourier transform actually requires finite energy in an infinite duration of time series (like a decay). So, If you just make your time series sample longer by "duplicating", the energy will be infinite with an infinite duration.
My main confusion was on the "spectrum" option for scipy.signal.periodogram, which seems to create a constant energy spectrum even when the time series become longer.
Normally, 0.5*A^2=S(f)*delta_f, where S(f) is the power density spectrum. S(f)*delta_f, representing energy is constant if A is constant. But when using a longer duration of time series, delta_f (i.e. incremental frequency) is reduced accordingly, based on FFT procedure. For example, 100s time series will lead to a delta_f=0.01Hz, while 1000s time series will have a delta_f=0.001Hz. S(f) representing density will accordingly change.