Is there an R function to calculate the surrogate threshold - linear-regression

I have created a meta-regression:
library(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, mods = ~ ablat, data=dat)
How can I estimate what the upper 95% prediction interval is, when yi = 1?
The surrogate threshold is defined as the point where the prediction band of the regression line intersects the effect size RR = 1.
Thank you.
I can predict yi for a given vi, or vi for a given yi as well as construct a plot of the prediction intervals using functions from the metafor package. However, I do not know of a way to calculate the upper 95% prediction band for a given yi.

Calculate precision and recall on WANG database

I have built a CBIR system in MATLAB that uses Euclidean distance as the similarity measure.
Using this for each query image I retrieve top 20 images.
I have used WANG Dataset for testing my system.
It contains 10 classes (such as African people, buses, roses, etc.), each containing 100 images (1000 images in total).
My Method:
1. I am using Correlogram, Co-occurrence Matrix (CCM) and Difference Between Pixel Scan Pattern (DBPSP) to construct my feature vector (64+196+28 = 288 dimensions).
For each of the 1000 database images, the vector is constructed beforehand.
When a query image comes in, I construct its vector too (288 dimensions again).
I use Euclidean distance for similarity and sort the database image vectors in ascending order of their Euclidean distance to the query.
Top 20 results are shown.
In those 20 I can have TP or FP.
For a single query image I can easily calculate Precision and Recall and plot PR-curve using this link.
How can I do the same for whole class?
My Approach: For each image belonging to class A, find the top 20 images and the respective TP (true positives) and FP (false positives).
            TP     FP
Image1      17      3
Image2      15      5
...
Image100    10     10
Total     1500    500
Precision of Class A =1500/(2000) = .75 (Is it right??)
Recall of Class A ---> Stuck ??
PR-Curve ----> Stuck ?? Some links said I need a classifier for that and some not... I am really confused.
So as you noted, you can calculate precision as follows.
P = TP ./ ( TP + FP );
However, you NEED to have either FN or the total number of relevant items to calculate recall. As discussed in chat, you need to find a way to determine your FN and FP data. Then you can use the following formula to calculate recall.
R = TP ./ ( TP + FN )
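For a retrieval setup like the one in the question, one possible way to obtain FN is to count the relevant images that were not retrieved. The sketch below assumes a fixed cut-off of K = 20 results and 100 relevant images per class (whether the query image itself counts as relevant is a choice you have to make). The TP values and all variable names are hypothetical.

```matlab
% Sketch: class-level precision and recall for top-K retrieval.
% Assumption: every class has classSize relevant images in the database.
K = 20;                      % number of retrieved images per query
classSize = 100;             % relevant images per class (assumed)

TP = [17; 15; 10];           % hypothetical true positives for three queries
FP = K - TP;                 % retrieved but not relevant
FN = classSize - TP;         % relevant but not retrieved (assumption)

% Per-query values
P_query = TP ./ (TP + FP);
R_query = TP ./ (TP + FN);

% Class-level values, pooling the counts over all queries of the class
P_class = sum(TP) / sum(TP + FP);
R_class = sum(TP) / sum(TP + FN);
```

Under the same assumption, the totals in the question (1500 TP over 100 queries) would give a recall of 1500/(100*100) = 0.15. Repeating the calculation for several values of K gives one (precision, recall) point per K, which is one way to trace a PR curve without a classifier.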
If you have the confusion matrix or data, you can use my custom confusionmat2f1.m to calculate precision, recall, and f1 score. This assumes that the confusion matrix is formatted as how Matlab defines it. An explanation of each line is inline. Please let me know if you want more clarification.
function [F,P,R] = confusionmat2f1( C )
%% confusionmat2f1( C )
%
% Inputs
% C - Confusion Matrix
%
% Outputs
% F - F1 score column vector
% P - Precision column vector
% R - Recall column vector
%%
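% Sum the confusion matrices over the third dimension (e.g. one per fold)
M = sum( C, 3 );
% Calculate Precision (diagonal / column totals, i.e. per predicted class)
P = diag(M) ./ sum(M,1)';
% Calculate Recall (diagonal / row totals, i.e. per true class)
R = diag(M) ./ sum(M,2);
% Calculate F1 Score as the harmonic mean of precision and recall
F = 2 * (P .* R) ./ (P + R);
end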

How to find the frequency response of the Rosenberg Glottal Model

Is there an easy way to calculate the frequency response of the following function?
I tried using heaviside function but with no luck.
Basically, I want to write a function that returns the frequency response based on the inputs N1 and N2 and the number of points (let's say x) between 0 and pi.
The output would be a vector which returns x values for the frequency response for corresponding frequencies => 0:pi/x:pi
Assuming that N1 + N2 < num_points, where num_points is the length of the sequence, you can simply write the function like so:
function [gr] = rosenburg(N1, N2, num_points)
gr = zeros(num_points,1);
range1 = 0:N1;
range2 = N1+1:N1+N2;
gr(range1+1) = 0.5*(1 - cos(pi*range1/N1));
gr(range2+1) = cos(pi*(range2-N1) / (2*N2));
end
The function rosenburg takes in N1, N2 and the total length of the output sequence, num_points. The code first allocates an array of zeros of size num_points. It then computes two index ranges: one for 0 <= n <= N1 and the other for N1 < n <= N1+N2. Note that the second range starts at N1+1 because we have already computed the value at n = N1. Once we have these ranges, we simply apply the right relationship in the right range. Note that when assigning the relationships to the correct intervals in the array, I need to offset by 1 because MATLAB begins indexing arrays at index 1. The rest of the values are zero due to the initialization at the beginning of the function.
Now, if you want to find the frequency response of this signal, just use fft which is the Fast Fourier Transform. It's the classic method to find the frequency domain version of a discrete input signal on a numerical basis. As such, once you create your signal using the rosenburg function, then throw this into the FFT function. How you call it is like so:
X = fft(gr);
This computes the N point FFT, where N is the length of the signal gr. Alternatively, you can provide the number of points you want to compute the FFT for. Specifically:
X = fft(gr, N);
Basically, the higher N is, the finer the frequency resolution. Note that the frequency axis is normalized to the range 0 to 2*pi, so a larger N gives a smaller spacing between neighbouring points on the axis. Specifically, each point on this axis has the following frequency:
w = i*(2*pi)/x;
i would be the index on the x-axis (0, 1, 2, ..., num_points-1) and x would be the total number of points for the FFT. Normally, people show the spectrum between -pi <= w <= pi, and so some people apply fftshift to shift the spectrum so that the DC component is located at the centre of the spectrum, which is how we naturally perceive the spectrum to be.
When you say "frequency response", I believe you are referring to the magnitude, so use abs to calculate the magnitude of each value, as the FFT is generally complex valued. As an example, let's choose N1 = 4, N2 = 8 and 64 points, compute the FFT with as many points as the length of the signal, and plot the spectrum:
gr = rosenburg(4, 8, 64);
X = fft(gr);
Xshift = fftshift(X);
w = (-32:31)*(2*pi/64); % bin frequencies after fftshift: -pi up to pi - 2*pi/64
plot(w, abs(Xshift));
grid;
The above code will shift the spectrum, then plot its magnitude between -pi to pi. This is what I get:
As an illustration, this is what the spectrum looks like before we apply fftshift:
Here's the code to generate the above figure:
plot((0:63)*(2*pi/64), abs(X)); % bin frequencies: 0 up to 2*pi - 2*pi/64
grid;
You can see that the spectrum is symmetric: it is mirrored about the frequency pi, which makes sense as the range from pi to 2*pi maps precisely to -pi to 0. Because the signal is real, its spectrum is Hermitian (conjugate) symmetric, so the magnitude is symmetric. Obviously, the frequency components are a bit sparsely spaced. It may be better to increase the total number of points to something like 256. This is what I get when I change the number of points to 256:
Pretty smooth! Now, if you want to extract the frequency components from 0 to pi, you need to extract half of the frequency decomposition that is stored in X. Therefore, you would simply do:
f = X(1:numel(X)/2);
numel determines how many elements are in an array or matrix. However, remember that each frequency point was defined as:
w = i*(2*pi)/x
You specifically want:
w = i*pi/x
As such, you'll need to compute the FFT at twice the number of points you want, then extract half of the spectrum in the same way. For example, for 64 points:
gr = rosenburg(4, 8, 64);
X = fft(gr, 128);
f = X(1:numel(X)/2);
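As a sanity check, the zero-padded FFT can be compared against a direct evaluation of the DTFT sum at the same frequencies. This is only a sketch: the pulse is rebuilt inline with the same formulas as rosenburg so the snippet runs on its own, and the names H_fft, H_dtft and max_diff are made up for illustration.

```matlab
% Sketch: frequency response at x points, w = i*pi/x for i = 0..x-1
N1 = 4; N2 = 8; x = 64;                 % example values used above

% Rosenberg pulse on 0 <= n <= N1+N2 (same formulas as rosenburg)
n  = (0:N1+N2)';
gr = zeros(size(n));
rise = (n <= N1);
gr(rise)  = 0.5*(1 - cos(pi*n(rise)/N1));
gr(~rise) = cos(pi*(n(~rise) - N1)/(2*N2));

% FFT with 2*x points, then keep the first half (0 <= w < pi)
X = fft(gr, 2*x);
H_fft = X(1:x);
w = (0:x-1)'*pi/x;

% Direct DTFT: H(w) = sum over n of gr(n)*exp(-j*w*n)
H_dtft = exp(-1j*w*n')*gr;

% Both results should agree up to round-off
max_diff = max(abs(H_fft - H_dtft));
```

Trailing zeros do not change the DTFT, so building the pulse with exactly N1+N2+1 samples gives the same response as the zero-filled vector returned by rosenburg.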
This should hopefully get you started. Good luck!

MATLAB weighted resampling

I'm writing a particle filter localization algorithm as part of an exercise to locate a plane flying over mountains.
From my understanding, the steps to this are:
- make a bunch of random guesses
- filter out unlikely guesses (using Gaussian hypothesis testing and some known information about the problem)
- shift filtered points by how much the plane moved in that step
- resample, weighted by shifted points
What I'm having trouble with is the resampling bit - how could I perform a weighted resampling in MATLAB?
Please let me know if there's anything I should clarify!! Thanks!
Firstly, you should look into the SIR (Sequential Importance Sampling Re-sampling) Particle Filter [PF], also known as Sequential Monte Carlo methods.
I recommend the book by Arnaud Doucet & Neil Gordon called "Sequential Monte Carlo Methods in Practice". It covers practically the state of the art when it comes to Particle Filters and describes the implementations of the various flavors of the PF.
The SIR-PF has the following steps:
Prediction: Based on your state equations and the previous particle population propagate the particles to the next discrete time instance i.e. x(t+1) = f(x(t),w(t)) := where x is a vector of n states and for each state you have N realisations (particles) of the state eg. x ~ [N x n]
Correction: based on your estimation of your measurement equations that should be in the form y(t+1) = g(x(t+1),v(t)), where x(t+1) is your state population of particles. You calculate the error, e(t) = y(t+1) - y_m(t+1) and weight the population according to a likelihood function, which can be, but not necessarily has to be, a Normal distribution. You now will have a set of weights e.g. if you have m "sensors" you will have a weighting matrix W = [N x m] or in the simple case you'll have a [N x 1] vector of weights. (remember to normalise the weights)
Re-sampling (Conditional): This step should be based on a conditional to avoid the pitfall of particle degeneracy (which you should look into), the common conditional is to compute the "effective particle population size", := 1/(sum of the squared weights) i.e. Neff = 1/sum(w1**2, w2**2, ...., wN**2). If Neff < 0.85*N then resample.
Re-sampling: Calculate the CDF of the (normalised) weights vector i.e. P = cumsum(W) and generate random samples from a uniform distribution (r), select the first particle that P(w) >= r, repeat this until you have N realisations of the CDF, this will sample more frequently from the particles that have higher weights and less frequently from those that do not, effectively condensing your particle population. You then create a new set of weights that are uniformly weighted i.e. wN = 1/N
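function [weights,X_update] = Standardised_Resample(P,X)
P = P./sum(P);            % Ensure particle weights are normalised first
N = size(P,1);            % Number of particles (P is an N x 1 column)
Neff = 1/(sum(P.^2));     % Effective particle size (needs normalised weights)
if Neff < 0.85*N
X_update = zeros(N,1);    % Preallocate the resampled population
L = cumsum(P);            % CDF of the weights
for i = 1:N
X_update(i) = X(find(rand <= L,1)); % First particle whose CDF value reaches the draw
end
weights = ones(N,1)./N;   % Reset to uniform weights
else
weights = P;
X_update = X;
end
end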
Estimation: XEst = W(t+1)'*x(t+1) := the weighted sum of the particles produces the estimate for the states at time t+1
Rinse and Repeat for time t+2 etc.
Note: x(0/0) is a population of N samples of a random distribution of ~N(x(0),Q(0)) where x(0) is an estimate of the initial conditions [IC] and Q(0/0) is an estimate of the variance (uncertainty) of your IC guess
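The re-sampling step can also be written without a loop. The following is a minimal sketch of the same CDF-based (multinomial) draw, with made-up particle values and weights. The names cdf, u and idx are illustrative, not part of any toolbox.

```matlab
% Sketch: loop-free weighted resampling via the CDF of the weights
X = [10; 20; 30; 40];            % hypothetical particle states (N x 1)
w = [0.1; 0.2; 0.6; 0.1];        % hypothetical importance weights
w = w./sum(w);                   % normalise
N = numel(w);

cdf = cumsum(w);
cdf(end) = 1;                    % guard against round-off in the last entry

u = rand(N,1);                   % one uniform draw per new particle

% For each draw, count the CDF entries below it; adding 1 gives the index
% of the first particle whose CDF value is >= the draw.
idx = sum(bsxfun(@lt, cdf, u'), 1)' + 1;

X_new = X(idx);                  % particles with large weights appear more often
w_new = ones(N,1)/N;             % weights are uniform after resampling
```

This builds an N x N comparison matrix, so for very large particle counts the loop version (or a systematic resampling scheme) may be preferable.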

Comparing FFT of Function to Analytical FT Solution in Matlab

I am trying to compare the FFT of exp(-t^2) to the function's analytical fourier transform, exp(-(w^2)/4)/sqrt(2), over the frequency range -3 to 3.
I have written the following matlab code and have iterated on it MANY times now with no success.
fs = 100; %sampling frequency
dt = 1/fs;
t = 0:dt:10-dt; %time vector
L = length(t); %number of sample points
%N = 2^nextpow2(L); %necessary?
y = exp(-(t.^2));
Y=dt*ifftshift(abs(fft(y)));
freq = (-L/2:L/2-1)*fs/L; %freq vector
F = (exp(-(freq.^2)/4))/sqrt(2); %analytical solution
%Y_valid_pts = Y(W>=-3 & W<=3); %compare for freq = -3 to 3
%npts = length(Y_valid_pts);
% w = linspace(-3,3,npts);
% Fe = (exp(-(w.^2)/4))/sqrt(2);
error = norm(Y - F) %L2 Norm for error
hold on;
plot(freq,Y,'r');
plot(freq,F,'b');
xlabel('Frequency, w');
legend('numerical','analytic');
hold off;
You can see that right now, I am simply trying to get the two plots to look similar. Eventually, I would like to find a way to do two things:
1) find the minimum sampling rate,
2) find the minimum number of samples,
to reach an error (defined as the L2 norm of the difference between the two solutions) of 10^-4.
I feel that this is pretty simple, but I can't seem to even get the two graphs visually agree.
If someone could let me know where I'm going wrong and how I can tackle the two points above (minimum sampling frequency and minimum number of samples) I would be very appreciative.
Thanks
The first thing to note is that the Fourier transform pair for the function exp(-t^2) over the +/- infinity range, as can be derived from tables of Fourier transforms, is actually:
exp(-t^2) <-> sqrt(pi)*exp(-(w^2)/4)
Also, as you are generating the function exp(-t^2), you are limiting the range of t to positive values (instead of taking the whole +/- infinity range).
For the relationship to hold, you would thus have to generate exp(-t^2) with something such as:
t = 0:dt:10-dt; %time vector
t = t - 0.5*max(t); %center around t=0
y = exp(-(t.^2));
Then, the variable w represents angular frequency in radians per second, which is related to the frequency freq (in Hz) through:
w = 2*pi*freq;
Thus,
F = (exp(-((2*pi*freq).^2)/4))*sqrt(pi); %analytical solution
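Putting these corrections together, a complete comparison could look like the sketch below. It assumes the same sampling frequency as in the question and a time window of 10 seconds centred on t = 0. The error variable is named err here (an arbitrary choice) to avoid shadowing MATLAB's built-in error function.

```matlab
% Sketch: numerical vs analytical Fourier transform of exp(-t^2)
fs = 100;                         % sampling frequency
dt = 1/fs;
L  = 1000;                        % number of samples (10 s window)
t  = (-L/2:L/2-1)*dt;             % time vector centred on t = 0
y  = exp(-t.^2);

% dt scales the DFT sum so that it approximates the continuous integral;
% fftshift moves the zero-frequency bin to the centre
Y = dt*fftshift(abs(fft(y)));

freq = (-L/2:L/2-1)*fs/L;                 % frequency vector in Hz
F = sqrt(pi)*exp(-((2*pi*freq).^2)/4);    % analytical solution, w = 2*pi*freq

err = norm(Y - F);                % L2 norm of the difference
```

With err available, the minimum sampling rate and number of samples can be explored by wrapping this in a loop over fs and L and checking when err drops below 10^-4.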

Matlab Average Two Fourier transforms with different frequency vectors

Suppose I have two power spectrum vectors PS1 and PS2 which were created using fft and then taking only the positive frequency values and squaring the fft values (complex conjugate really).
Suppose also that the corresponding frequency values for PS1 and PS2 are different. E.g. PS1(10) might correspond to 10 Hz and PS2(10) might correspond to 10.5 Hz.
I want to have an average of these two (and more) power spectra. How would I best create such an average? It is fine if the PS_ave is a longer vector than any of the original power spectra, so long as there is a corresponding frequency vector. So, it might be that PS_ave(11) corresponds to 10.25 Hz, and this value should probably be the average of PS1(10) and PS2(10). All ideas are welcome!
Thanks!
You might try using interp1, which can interpolate within one signal to match frequencies for another spectrum. The following example illustrates this:
v1 =1;
t1 = [0:0.1:10];
sig1 = sin(2*pi*t1*v1).*exp(-0.5*t1)/length(t1);
v2 = 0.5;
t2 = [0:0.2:10];
sig2 = cos(2*pi*t2*v2).*exp(-0.5*t2)/length(t2);
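s1 = fft(sig1);
s1 = s1(1:floor(end/2));                  % keep the positive frequencies
f1 = (0:length(s1)-1)/(length(t1)*0.1);   % bin spacing is 1/(N*dt), dt = 0.1
s2 = fft(sig2);
s2 = s2(1:floor(end/2));
f2 = (0:length(s2)-1)/(length(t2)*0.2);   % dt = 0.2
p1 = abs(s1).^2;                          % power = squared magnitude
p2 = abs(s2).^2;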
% Now average using interpolation to find points in the longer vector matching the shorter
p1_interp = interp1(f1,p1,f2);
power_avg = mean([p2; p1_interp],1)
hold on, plot(f2,power_avg,'r')
Here's the result (red = avg power):
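The same idea extends to more than two spectra by interpolating all of them onto one common frequency grid. The sketch below uses made-up spectra and a 200-point grid restricted to the frequency range shared by all inputs. The grid size and all variable names are arbitrary choices for illustration.

```matlab
% Sketch: average several power spectra defined on different frequency grids
f1 = 0:1:50;       PS1 = 1./(1 + f1);    % hypothetical spectrum 1
f2 = 0.5:1:50.5;   PS2 = 1./(1 + f2);    % hypothetical spectrum 2
f3 = 0:0.4:48;     PS3 = 1./(1 + f3);    % hypothetical spectrum 3

freqs = {f1, f2, f3};
specs = {PS1, PS2, PS3};

% Common grid over the range covered by every spectrum, so that interp1
% never has to extrapolate (it returns NaN outside the input range)
f_lo = max(cellfun(@min, freqs));
f_hi = min(cellfun(@max, freqs));
f_common = linspace(f_lo, f_hi, 200);

% Interpolate each spectrum onto the common grid, one row per spectrum
P = zeros(numel(specs), numel(f_common));
for k = 1:numel(specs)
    P(k,:) = interp1(freqs{k}, specs{k}, f_common);
end

PS_ave = mean(P, 1);              % average power at each common frequency
```

Linear interpolation smooths narrow peaks slightly, so the common grid should not be much coarser than the finest input grid.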