I want to get the offset in samples between two datasets in Matlab (getting them synced in time), a quite common issue. Therefore I use the cross correlation function xcorr or the cross covariance function xcov (both provide similar results in most cases for this purpose). With artificial data it works fine, but I struggle with "real" data, even though it should be pretty much the same. Matlab always says the offset would be zero. I'm using this simple piece of code:
[crossCorr] = xcov(b, c);
[~, peakIndex] = max(crossCorr())
offset = peakIndex - length(b)
I've posted a fully runable example m-file with a downsampled data excerpt on pastebin:
Code with data on pastebin
EDIT: The downsampled excerpt seems to be not fully suitable for evaluating the effect. Here's a much larger sample with the original frequency, pease use this one instead. Unfortunately it was too big for pastebin.
As the plot shows it should be no problem at all to get the offset via cross covariance. I also tried to scale the data nicer in order to avoid numerical problems, but that didn't change anything at all.
Would be great if someone could tell me my mistake.
There's nothing wrong with your method in principle, I used exactly the same approach successfully for temporally aligning different audio recordings of the same signal.
However, it appears that for your time series, correlation (or covariance) is simply not the right measure to compare shifted versions – possibly because they contain components of a time scale comparable to the total length. An alternative is to use residual variance, i.e. the variance of the difference between shifted versions. Here is a (not particularly elegant) implementation of this idea:
lags = -1000 : 1000;
v = nan(size(lags));
for i = 1 : numel(lags)
lag = lags(i);
if lag >= 0
v(i) = var(b(1 + lag : end) - c(1 : end - lag));
else
v(i) = var(b(1 : end + lag) - c(1 - lag : end));
end
end
[~, ind] = min(v);
minlag = lags(ind);
For your (longer) data set, this results in minlag = 169. Plotting residual variance over lags gives:
Your data has a minor peak around 5 and a major peak around 101.
If I knew something about my data then I could might window around an acceptable range of offsets as shown below.
Code for initial exploration:
figure; clc;
subplot(2,1,1)
plot(1:numel(b), b);
hold on
plot(1:numel(c), c, 'r');
legend('b','c')
subplot(2,1,2)
plot(crossCorr,'.b-')
hold on
plot(peakIndex,crossCorr(peakIndex),'or')
legend('crossCorr','peak')
Initial Image:
If you zoom into the first peak you can see that it is not only high around 5, but it is polynomial "enough" to allow sub-element offsets. That is convenient.
Image showing :
Here is what the curve-fitting tool gives as the analytic for a cubic:
Linear model Poly3:
f(x) = p1*x^3 + p2*x^2 + p3*x + p4
Coefficients (with 95% confidence bounds):
p1 = 8.515e-013 (8.214e-013, 8.816e-013)
p2 = -3.319e-011 (-3.369e-011, -3.269e-011)
p3 = 2.253e-010 (2.229e-010, 2.277e-010)
p4 = -4.226e-012 (-7.47e-012, -9.82e-013)
Goodness of fit:
SSE: 2.799e-024
R-square: 1
Adjusted R-square: 1
RMSE: 6.831e-013
You can note that the SSE fits to roundoff.
If you compute the root (near n=4) you use the following matlab code:
% Coefficients
p1 = 8.515e-013
p2 = -3.319e-011
p3 = 2.253e-010
p4 = -4.226e-012
% Linear model Poly3:
syms('x')
f = p1*x^3 + p2*x^2 + p3*x + p4
xz1=fzero(#(y) subs(diff(f),'x',y), 4)
and you get the analytic root at 4.01420240431444.
EDIT:
Hmmm. How about fitting a gaussian mixture model to the convolution? You sweep through a good range of component count, you do between 10 and 30 repeats, and you find which component count has the best/lowest BIC. So you fit a gmdistribution to the lower subplot of the first figure, then test the covariance at the means of the components in decreasing order.
I would try the offset at the means, and just look at sum squared error. I would then pick the offset that has the lowest error.
Procedure:
compute cross correlation
fit cross correlation to Gaussian Mixture model
sweep a reasonable range of components (start with 1-10)
use a reasonable number of repeats (10 to 30 depending on run-to-run variation)
compute Bayes Information Criterion (BIC) for each level, pick the lowest because it indicates a reasonable balance of error and parameter count
each component is going to have a mean, evaluate that mean as a candidate offset and compute sum-squared error (sse) when you offset like that.
pick the offset of the component that gives best SSE
Let me know how well that works.
If the two signals misalign by non-integer number of samples, e.g. 3.7 samples, then the xcorr method may find the max value at 4 samples, it won't be able to find the accurate time shift. In this case, you should try a method called "unified change detection". The web-link for the paper is:
[http://www.phmsociety.org/node/1404/]
Good Luck.
Related
I am currently doing a thesis that needs Ultrasonic pulse velocity(UPV). UPV can easily be attained via the machines but the data we acquired didn't have UPV so we are tasked to get it manually.
Essentially in the data we have 2 channels, one for the transmitter transducer and another for a receiver transducer.
We need to get the time from wave from the transmitter is emitted and the time it arrives to the receiver.
Using matlab, I've tried finddelay and xcorr but doesnt quite get the right result.
Here is a picture of the points I would want to get. The plot is of the transmitter(blue) and receiver(red)
So I am trying to find the two points in the picture but with the aid of matlab. The two would determine the time and further the UPV.
I am relatively a new MATLAB user so your help would be of great assistance.
Here is the code I have tried
[cc, lags] = xcorr(signal1,signal2);
d2 = -(lags(cc == max(cc))) / Fs;
#xenoclast hi there! so far the code i used are these.
close all
clc
Fs = input('input Fs = ');
T = 1/Fs;
L = input('input L = ');
t = (0:L-1)*T;
time = transpose(t);
i = input('input number of steploads = ');
% construct test sequences
%dataupv is the signal1 & datathesis is the signal2
for m=1:i
y1 = (dataupv(:,m) - mean(dataupv(:,m))) / std(dataupv(:,m));
y2 = (datathesis(:,m) - mean(datathesis(:,m))) / std(datathesis(:,m));
offset = 166;
tt = time;
% correlate the two sequences
[cc, lags] = xcorr(y2, y1,);
% find the in4dex of the maximum
[maxval, maxI] = max(cc);
[minval, minI] = min(cc);
% use that index to obtain the lag in samples
lagsamples(m,1) = lags(maxI);
lagsamples2(m,1) = lags(minI);
% plot again without timebase to verify visually
end
the resulting value is off by 70 samples compared to when i visually inspect the waves. the lag resulted in 244 but visually it should be 176 here are the data(there are 19 sets of data but i only used the 1st column) https://www.dropbox.com/s/ng5uq8f7oyap0tq/datatrans-dec-2014.xlsx?dl=0 https://www.dropbox.com/s/1x7en0x7elnbg42/datarec-dec-2014.xlsx?dl=0
Your example code doesn't specify Fs so I don't know for sure but I'm guessing that it's an issue of sample rate(s). All the examples of cross correlation start out by constructing test sequences according to a specific sample rate that they usually call Fs, not to be confused with the frequency of the test tone, which you may see called Fc.
If you construct the test signals in terms of Fs and Fc then this works fine but when you get real data from an instrument they often give you the data and the timebase as two vectors, so you have to work out the sample rate from the timebase. This may not be the same as the operating frequency or the components of the signal, so confusion is easy here.
But the sample rate is only required in the second part of the problem, where you work out the offset in time. First you have to get the offset in samples and that's a lot easier to verify.
Your example will give you the offset in samples if you remove the '/ Fs' term and you can verify it by plotting the two signals without a timebase and inspecting the sample positions.
I'm sure you've looked at dozens of examples but here's one that attempts to not confuse the issue by tying it to sample rates - you'll note that nowhere is it specified what the 'sample rate' is, it's just tied to samples (although if you treat the 5 in the y1 definition as a frequency in Hz then you will be able to infer one).
% construct test sequences
offset = 23;
tt = 0:0.01:1;
y1 = sin(2*pi*5*tt);
y2 = 0.8 * [zeros(1, offset) y1];
figure(1); clf; hold on
plot(tt, y1)
plot(tt, y2(1:numel(tt)), 'r')
% correlate the two sequences
[cc, lags] = xcorr(y2, y1);
figure(2); clf;
plot(cc)
% find the index of the maximum
[maxval, maxI] = max(cc);
% use that index to obtain the lag in samples
lagsamples = lags(maxI);
% plot again without timebase to verify visually
figure(3); clf; hold on
plot(y1)
plot(y2, 'r')
plot([offset offset], [-1 1], 'k:')
Once you've got the offset in samples you can probably deduce the required conversion factor, but if you have timebase data from the instrument then the inverse of the diff of any two consecutive entries will give it you.
UPDATE
When you correlate the two signals you can visualise it as overlaying them and summing the product of corresponding elements. This gives you a single value. Then you move one signal by a sample and do it again. Continue until you have done it at every possible arrangement of the two signals.
The value obtained at each step is the correlation, but the 'lag' is computed starting with one signal all the way over to the left and the other overlapping by only one sample. You slide the second signal all the way over until it's only overlapping the other end by a sample. Hence the number of values returned by the correlation is related to the length of both the original signals, and relating any given point in the correlation output, such as the max value, to the arrangement of the two signals that produced it requires you to do a calculation involving those lengths. The xcorr function makes this easier by outputting the lags variable, which tracks the alignment of the two signals. People may also talk about this as an offset so watch out for that.
I have two sensors seperated by some distance which receive a signal from a source. The signal in its pure form is a sine wave at a frequency of 17kHz. I want to estimate the TDOA between the two sensors. I am using crosscorrelation and below is my code
x1; % signal as recieved by sensor1
x2; % signal as recieved by sensor2
len = length(x1);
nfft = 2^nextpow2(2*len-1);
X1 = fft(x1);
X2 = fft(x2);
X = X1.*conj(X2);
m = ifft(X);
r = [m(end-len+1) m(1:len)];
[a,i] = max(r);
td = i - length(r)/2;
I am filtering my signals x1 and x2 by removing all frequencies below 17kHz.
I am having two problems with the above code:
1. With the sensors and source at the same place, I am getting different values of 'td' at each time. I am not sure what is wrong. Is it because of the noise? If so can anyone please provide a solution? I have read many papers and went through other questions on stackoverflow so please answer with code along with theory instead of just stating the theory.
2. The value of 'td' is sometimes not matching with the delay as calculated using xcorr. What am i doing wrong? Below is my code for td using xcorr
[xc,lags] = xcorr(x1,x2);
[m,i] = max(xc);
td = lags(i);
One problem you might have is the fact that you only use a single frequency. At f = 17 kHz, and an estimated speed-of-sound v = 340 m/s (I assume you use ultra-sound), the wavelength is lambda = v / f = 2 cm. This means that your length measurement has an unambiguity range of 2 cm (sorry, cannot find a good link, google yourself). This means that you already need to know your distance to better than 2 cm, before you can use the result of your measurement to refine the distance.
Think of it in another way: when taking the cross-correlation between two perfect sines, the result should be a 'comb' of peaks with spacing equal to the wavelength. If they overlap perfectly, and you displace one signal by one wavelength, they still overlap perfectly. This means that you first have to know which of these peaks is the right one, otherwise a different peak can be the highest every time purely by random noise. Did you make a plot of the calculated cross-correlation before trying to blindly find the maximum?
This problem is the same as in interferometry, where it is easy to measure small distance variations with a resolution smaller than a wavelength by measuring phase differences, but you have no idea about the absolute distance, since you do not know the absolute phase.
The solution to this is actually easy: let your source generate more frequencies. Even using (band-limited) white-noise should work without problems when calculating cross-correlations, and it removes the ambiguity problem. You should see the white noise as a collection of sines. The cross-correlation of each of them will generate a comb, but with different spacing. When adding all those combs together, they will add up significantly only in a single point, at the delay you are looking for!
White Noise, Maximum Length Sequency or other non-periodic signals should be used as the test signal for time delay measurement using cross correleation. This is because non-periodic signals have only one cross correlation peak and there will be no ambiguity to determine the time delay. It is possible to use the burst type of periodic signals to do the job, but with degraded SNR. If you have to use a continuous periodic signal as the test signal, then you can only measure a time delay within one period of the periodic test signal. This should explain why, in your case, using lower frequency sine wave as the test signal works while using higher frequency sine wave does not. This is demonstrated in these videos: https://youtu.be/L6YJqhbsuFY, https://youtu.be/7u1nSD0RlwY .
I want to find the peaks of the raw ecg signal so that I can calculate the beats per minute(bpm).
I Have written a code in matlab which I have attached below.In the code below I am unable to find threshold point correctly which will help me in finding the peaks and hence the bpm.
%input the signal into matlab
[x,fs]=wavread('heartbeat.wav');
subplot(2,1,1)
plot(x(1:10000),'r-')
grid on
%lowpass filter the input signal with cutoff at 100hz
h=fir1(30,0.3126); %normalized cutoff freq=0.3126
y=filter(h,1,x);
subplot(2,1,2)
plot(y(1:10000),'b-')
grid on
% peaks are seen as pulses(heart beats)
beat_count=0;
for p=2:length(y)-1
th(p)=abs(max(y(p)));
if(y(p) >y(p-1) && y(p) >y(p+1) && y(p)>th(p))
beat_count=beat_count+1;
end
end
N = length(y);
duration_seconds=N/fs;
duration_minutes=duration_seconds/60;
BPM=beat_count/duration_minutes;
bpm=ceil(BPM);
Please help me as I am new to matlab
I suggest changing this section of your code
beat_count=0;
for p=2:length(y)-1
th(p)=abs(max(y(p)));
if(y(p) >y(p-1) && y(p) >y(p+1) && y(p)>th(p))
beat_count=beat_count+1;
end
end
This is definitely flawed. I'm not sure of your logic here but what about this. We are looking for peaks, but only the high peaks, so first lets set a threshold value (you'll have to tweak this to a sensible number) and cull everything below that value to get rid of the smaller peaks:
th = max(y) * 0.9; %So here I'm considering anything less than 90% of the max as not a real peak... this bit really depends on your logic of finding peaks though which you haven't explained
Yth = zeros(length(y), 1);
Yth(y > th) = y(y > th);
OK so I suggest you now plot y and Yth to see what that code did. Now to find the peaks my logic is we are looking for local maxima i.e. points at which the first derivative of the function change from being positive to being negative. So I'm going to find a very simple numerical approximation to the first derivative by finding the difference between each consecutive point on the signal:
Ydiff = diff(Yth);
No I want to find where the signal goes from being positive to being negative. So I'm going to make all the positive values equal zero, and all the negative values equal one:
Ydiff_logical = Ydiff < 0;
finally I want to find where this signal changes from a zero to a one (but not the other way around)
Ypeaks = diff(Ydiff_logical) == 1;
Now count the peaks:
sum(Ypeaks)
note that for plotting purpouse because of the use of diff we should pad a false to either side of Ypeaks so
Ypeaks = [false; Ypeaks; false];
OK so there is quite a lot of matlab there, I suggest you run each line, one by one and inspect the variable by both plotting the result of each line and also by double clicking the variable in the matlab workspace to understand what is happening at each step.
Example: (signal PeakSig taken from http://www.mathworks.com/help/signal/ref/findpeaks.html) and plotting with:
plot(x(Ypeaks),PeakSig(Ypeaks),'k^','markerfacecolor',[1 0 0]);
What do you think about the built-in
findpeaks(data,'Name',value)
function? You can choose among different logics for peak detection:
'MINPEAKHEIGHT'
'MINPEAKDISTANCE'
'THRESHOLD'
'NPEAKS'
'SORTSTR'
I hope this helps.
You know, the QRS complex does not always have the maximum amplitude, for pathologic ECG it can be present as several minor oscillations instead of one high-amplitude peak.
Thus, you can try one good algothythm, tested by me: the detection criterion is assumed to be high absolute rate of change in the signal, averaged within the given interval.
Algorithm:
- 50/60 Hz filter (e.g. for 50 Hz sliding window of 20 msec will be fine)
- adaptive hipass filter (for baseline drift)
- find signal's first derivate x'
- fing squared derivate (x')^2
- apply sliding average window with the width of QRS complex - approx 100-150 msec (you will get some signal with 'rectangles', which have width of QRS)
- use simple threshold (e.g. 1/3 of maximum of the first 3 seconds) to determine approximate positions or R
- in the source ECG find local maximum within +-100 msec of that R position.
However, you still have to eliminate artifacts and outliers (e.g. surges, when the electrod connection fails).
Also, you can find a lot of helpful information from this book: "R.M. Rangayyan - Biomedical Signal Analysis"
I would like to convolve a time-series containing two spikes (call it Spike) with an exponential kernel (k) in MATLAB. Call the convolved response "calcium1". I would like to recover the original spike ("reconSpike") data using deconvolution with the kernel. I am using the following code.
k1=zeros(1,5000);
k1(1:1000)=(1.1.^((1:1000)/100)-(1.1^0.01))/((1.1^10)-1.1^0.01);
k1(1001:5000)=exp(-((1001:5000)-1001)/1000);
k1(1)=k1(2);
spike = zeros(100000,1);
spike(1000)=1;
spike(1100)=1;
calcium1=conv(k1, spike);
reconSpike1=deconv(calcium1, k1);
The problem is that at the end of reconSpike, I get a chunk of very large, high amplitude waves that was not in the original data. Anyone know why and how to fix it?
Thanks!
It works for me if you keep the spike vector the same length as the k1 vector. i.e.:
k1=zeros(1,5000);
k1(1:1000)=(1.1.^((1:1000)/100)-(1.1^0.01))/((1.1^10)-1.1^0.01);
k1(1001:5000)=exp(-((1001:5000)-1001)/1000);
k1(1)=k1(2);
spike = zeros(5000, 1);
spike(1000)=1;
spike(1100)=1;
calcium1=conv(k1, spike);
reconSpike1=deconv(calcium1, k1);
Any reason you made them different?
You are running into either a problem with MATLAB's deconvolution algorithm, or floating point precision problems (or maybe both). I suspect it's floating point precision due to all the divisions and subtractions that take place during the deconvolution, but it might be worth contacting MathWorks directly to ask what they think.
Per MATLAB documentation, if [q,r] = deconv(v,u), then v = conv(u,q)+r must also hold (i.e., the output of deconv should always satisfy this). In your case this is violently violated. Put the following at the end of your script:
[reconSpike1 rem]=deconv(calcium1, k1);
max(conv(k1, reconSpike1) + rem - calcium1)
I get 6.75e227, which is not zero ;-) Next try changing the length of spike to 6000; you will get a small number (~1e-15). Gradually increase the length of spike; the error will get larger and larger. Note that if you put only one non-zero element into your spike, this behavior doesn't happen: the error is always zero. It makes sense; all MATLAB needs to do is divide everything by the same number.
Here's a simple demonstration using random vectors:
v = random('uniform', 1,2,100,1);
u = random('uniform', 1,2,100,1);
[q r] = deconv(v,u);
fprintf('maximum error for length(v) = 100 is %f\n', max(conv(u, q) + r - v))
v = random('uniform', 1,2,1000,1);
[q r] = deconv(v,u);
fprintf('maximum error for length(v) = 1000 is %f\n', max(conv(u, q) + r - v))
The output is:
maximum error for length(v) = 100 is 0.000000
maximum error for length(v) = 1000 is 14.910770
I don't know what you are really trying to accomplish, so it's hard to give further advice. But I'll just point out that if you have a problem where pulses are piling up and you want to extract information about each pulse, this can be a tricky problem. I know some people who work on things like this, so if you want some references let me know and I will ask them.
You should never expect that a deconvolution can simply undo a convolution. This is because the deconvolution is an ill-posed problem.
The problem comes from the fact that the convolution is an integral operator (in the continuous case you write down an integral int f(x) g(x-t) dx or something similar). Now, the inverse of computing an integral (the de-convolution) is to apply a differentiation. Unfortunately, the differential amplifies noise in the input. Thus, if your integral only has slight errors on it (and floating-point inaccuarcies might already be enough), you end up with a total different outcome after differentiation.
There are some possibilities how this amplification can be mitigated but these have to be tried on a per-application basis.
First, I should specify that my knowledge of statistics is fairly limited, so please forgive me if my question seems trivial or perhaps doesn't even make sense.
I have data that doesn't appear to be normally distributed. Typically, when I plot confidence intervals, I would use the mean +- 2 standard deviations, but I don't think that is acceptible for a non-uniform distribution. My sample size is currently set to 1000 samples, which would seem like enough to determine if it was a normal distribution or not.
I use Matlab for all my processing, so are there any functions in Matlab that would make it easy to calculate the confidence intervals (say 95%)?
I know there are the 'quantile' and 'prctile' functions, but I'm not sure if that's what I need to use. The function 'mle' also returns confidence intervals for normally distributed data, although you can also supply your own pdf.
Could I use ksdensity to create a pdf for my data, then feed that pdf into the mle function to give me confidence intervals?
Also, how would I go about determining if my data is normally distributed. I mean I can currently tell just by looking at the histogram or pdf from ksdensity, but is there a way to quantitatively measure it?
Thanks!
So there are a couple of questions there. Here are some suggestions
You are right that a mean of 1000 samples should be normally distributed (unless your data is "heavy tailed", which I'm assuming is not the case). to get a 1-alpha-confidence interval for the mean (in your case alpha = 0.05) you can use the 'norminv' function. For example say we wanted a 95% CI for the mean a sample of data X, then we can type
N = 1000; % sample size
X = exprnd(3,N,1); % sample from a non-normal distribution
mu = mean(X); % sample mean (normally distributed)
sig = std(X)/sqrt(N); % sample standard deviation of the mean
alphao2 = .05/2; % alpha over 2
CI = [mu + norminv(alphao2)*sig ,...
mu - norminv(alphao2)*sig ]
CI =
2.9369 3.3126
Testing if a data sample is normally distribution can be done in a lot of ways. One simple method is with a QQ plot. To do this, use 'qqplot(X)' where X is your data sample. If the result is approximately a straight line, the sample is normal. If the result is not a straight line, the sample is not normal.
For example if X = exprnd(3,1000,1) as above, the sample is non-normal and the qqplot is very non-linear:
X = exprnd(3,1000,1);
qqplot(X);
On the other hand if the data is normal the qqplot will give a straight line:
qqplot(randn(1000,1))
You might consider, also, using bootstrapping, with the bootci function.
You may use the method proposed in [1]:
MEDIAN +/- 1.7(1.25R / 1.35SQN)
Where R = Interquartile Range,
SQN = Square Root of N
This is often used in notched box plots, a useful data visualization for non-normal data. If the notches of two medians do not overlap, the medians are, approximately, significantly different at about a 95% confidence level.
[1] McGill, R., J. W. Tukey, and W. A. Larsen. "Variations of Boxplots." The American Statistician. Vol. 32, No. 1, 1978, pp. 12–16.
Are you sure you need confidence intervals or just the 90% range of the random data?
If you need the latter, I suggest you use prctile(). For example, if you have a vector holding independent identically distributed samples of random variables, you can get some useful information by running
y = prcntile(x, [5 50 95])
This will return in [y(1), y(3)] the range where 90% of your samples occur. And in y(2) you get the median of the sample.
Try the following example (using a normally distributed variable):
t = 0:99;
tt = repmat(t, 1000, 1);
x = randn(1000, 100) .* tt + tt; % simple gaussian model with varying mean and variance
y = prctile(x, [5 50 95]);
plot(t, y);
legend('5%','50%','95%')
I have not used Matlab but from my understanding of statistics, if your distribution cannot be assumed to be normal distribution, then you have to take it as Student t distribution and calculate confidence Interval and accuracy.
http://www.stat.yale.edu/Courses/1997-98/101/confint.htm