Why does correlation using DFT give unintuitive results? - matlab

I was trying to compare how similar two signals are using correlation via the DFT (Discrete Fourier Transform) in MATLAB, but the correlation function gives results I cannot really predict. For example, if I compare these two pairs of signals:
correlation 1 and 2
correlation 3 and 4 (autocorrelation)
I would expect the correlation peak in the "corr 3 and 4" case to be higher than in the "corr 1 and 2" case.
I also tried making the signals average to zero, but this did not help.
Is this the expected result or did I miss some preprocessing, etc.?

You need to normalize your data traces - i.e. divide them by their respective integrals before correlating. The following code demonstrates that when you normalize your data traces, the autocorrelation indeed gives you the larger value:
%# producing your data
trace1=(abs(linspace(-64,64,128))<20)*200;
trace2=trace1-(abs(linspace(-64,64,128))<10)*50;
figure;
subplot(321);
plot(trace1);
subplot(322);
plot(trace2);
subplot(323);
plot(xcorr(trace1,trace2))
title('unnormalized cross-correlation');
subplot(324);
plot(xcorr(trace2,trace2))
title('unnormalized autocorrelation');
%
%# what you should be doing:
subplot(325);
plot(xcorr(trace1/sum(trace1(:)),trace2/sum(trace2(:))))
title('normalized cross-correlation');
subplot(326);
plot(xcorr(trace2/sum(trace2(:)),trace2/sum(trace2(:))))
title('normalized autocorrelation');
This produces a figure (zoomed in on the peaks) in which the normalized autocorrelation has a higher peak than the normalized cross-correlation.

@Jonas, I could not find a way to insert an image with decent formatting in a comment on your answer (sorry, novice here), so I am posting this comment as an "answer".
I found that for the following signals your method does not give the expected result:
As you can see, the peak of the autocorrelation is lower than that of the cross-correlation.
The code I used is below:
trace1=(abs(linspace(-64,64,128))<20)*200;
trace2=trace1-(abs(linspace(-64,64,128))<10)*50;
trace1=trace1-(abs(linspace(-64,64,128))<10)*100;
subplot(321);
plot(trace1); grid on;
subplot(322);
plot(trace2); grid on;
subplot(323);
plot(xcorr(trace1,trace2)); grid on;
title('unnormalized cross-correlation');
subplot(324);
plot(xcorr(trace2,trace2)); grid on;
title('unnormalized autocorrelation');
subplot(325);
plot(xcorr(trace1/sum(trace1(:)),trace2/sum(trace2(:)))); grid on;
title('normalized cross-correlation');
subplot(326);
plot(xcorr(trace2/sum(trace2(:)),trace2/sum(trace2(:)))); grid on;
title('normalized autocorrelation');
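One possible explanation for the counterexample above, offered as a sketch rather than a definitive fix: dividing by the sum does not bound the correlation peak, whereas dividing by the product of the L2 norms does. By the Cauchy-Schwarz inequality the peak normalized this way can never exceed 1, and the autocorrelation reaches exactly 1 at zero lag. The helper names xc and ncc are hypothetical, and conv with a flipped second argument is used here as a toolbox-free stand-in for xcorr on real row vectors of equal length.

```matlab
% Traces from the post above
t = linspace(-64, 64, 128);
trace2 = (abs(t) < 20)*200 - (abs(t) < 10)*50;
trace1 = (abs(t) < 20)*200 - (abs(t) < 10)*100;

% Cross-correlation of real row vectors via conv (base MATLAB only)
xc = @(a, b) conv(a, fliplr(b));

% Energy (L2-norm) normalization: the peak is bounded by 1
ncc = @(a, b) xc(a, b) / (norm(a) * norm(b));

peakCross = max(ncc(trace1, trace2));
peakAuto  = max(ncc(trace2, trace2));
fprintf('cross: %.4f, auto: %.4f\n', peakCross, peakAuto);
```

With this normalization the autocorrelation peak should not fall below the cross-correlation peak, whatever the shapes of the two traces.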

FFT Plot of an Audio Signal - MATLAB

I'm using MATLAB to plot a recorded sound using the FFT. I want to take the log of the y-axis, but I don't know whether what I did is correct.
Currently, my FFT plotting code looks like this:
nf=1024; % number of points in the DFT
Y = fft(y,nf);
f = fs/2*linspace(0,1,nf/2+1);
plot(f,abs(Y(1:nf/2+1)));
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('|Y(f)|')
What I did is: plot(f,log(Y(1:nf/2+1)));. I replaced abs with log. Is this correct?
Applying log to the coefficients themselves doesn't make sense, especially since the spectrum is complex-valued. However, people often apply the log to the magnitude of the spectrum (hence the abs call), mostly for visualization purposes, so that large magnitudes don't overwhelm the smaller ones. Applying log in this case compresses the larger values, which makes the spectrum easier to visualize. But applying the log directly is, in my opinion, not correct. The code you have provided plots the magnitude of the single-sided spectrum, so there's no need to change anything.
If you provided more insight as to why you want to use the log, that would be helpful, but right now I would say that taking the log directly is incorrect. If you really must use the log, apply it to the magnitude instead. Also, log(0) evaluates to -Inf in MATLAB, so add 1 to the magnitude before applying the log; zero magnitudes then map to zero rather than to -Inf.
As such, do this instead:
nf=1024; % number of points in the DFT
Y = fft(y,nf);
f = fs/2*linspace(0,1,nf/2+1);
plot(f,log(1 + abs(Y(1:nf/2+1)))); %// Change
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('log(1 + |Y(f)|)') % label matches the plotted quantity
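A common alternative for visualization, sketched here under assumptions, is to plot the magnitude in decibels, 20*log10(|Y|). The test signal, the value of fs, and the small eps offset (used to keep log10 away from zero) are assumptions for illustration and are not part of the original question.

```matlab
% Hypothetical test signal: 440 Hz tone plus a weaker 1200 Hz tone
fs = 8000;
t = (0:fs-1)/fs;
y = sin(2*pi*440*t) + 0.1*sin(2*pi*1200*t);

nf = 1024;                          % FFT length
Y = fft(y, nf);                     % uses the first nf samples of y
f = fs/2*linspace(0, 1, nf/2+1);    % frequency axis, 0 .. fs/2

% Magnitude in dB; eps avoids log10(0) = -Inf
magdB = 20*log10(abs(Y(1:nf/2+1)) + eps);

plot(f, magdB); grid on;
title('Single-Sided Amplitude Spectrum of y(t)');
xlabel('Frequency (Hz)');
ylabel('|Y(f)| (dB)');

% Frequency of the strongest bin
[~, k] = max(magdB);
peakFreq = f(k);
```

The strongest bin should land within one bin width (fs/nf) of the 440 Hz tone.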

Matlab: Peak detection for clusters of peaks

I am working with biological signal data, and am trying to count the number of regions with a high density of high amplitude peaks. As seen in the figure below, the regions of interest (as observed qualitatively) are contained in red boxes and 8 such regions were observed for this particular trial. The goal is to mathematically achieve this same result in near real time without the intervention or observation of the researcher.
The data seen plotted below is the result of raw data from a 24-bit ADC being processed by an FIR filter, with no other processing yet being done.
What I am looking for is a method, or ideally code, to help me detect such regions as identified while subsequently ignoring some of the higher amplitude peaks in between the regions of interest (i.e. between regions 3 and 4, 5 and 6, or 7 and 8 there is a narrow region of high amplitude which is not of concern). It is worth noting that the maximum is not known prior to computation.
Thanks for your help.
Data
https://www.dropbox.com/s/oejyy6tpf5iti3j/FIRData.mat
Can you work with thresholds?
Define:
(1) "amplitude threshold": if the signal is greater than the threshold, it is considered a peak
(2) "window size": a fixed time duration
Algorithm:
If n peaks are detected within a duration of "window size", then consider the signal within that window a cluster of peaks. (I worked with eye-blink EEG data this way before; I am not sure whether it suits your application.)
P.S. If you have data that has already been labelled by a human, you can train a classifier to find your thresholds and window size.
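The threshold-and-window idea above could be sketched as follows. The synthetic signal and the values of ampThresh, win, and minPeaks are assumptions chosen for illustration; real data would need its own tuning.

```matlab
% Hypothetical synthetic signal: two dense bursts and one isolated spike
x = zeros(1, 1000);
x(200:5:250) = 1;      % burst 1
x(600:5:650) = 1;      % burst 2
x(400) = 1;            % isolated spike, should be ignored

ampThresh = 0.5;       % assumed amplitude threshold
win       = 50;        % assumed window size in samples
minPeaks  = 5;         % assumed minimum number of peaks per window

isPeak = x > ampThresh;                                % samples above threshold
count  = conv(double(isPeak), ones(1, win), 'same');   % peaks inside each window
inCluster = count >= minPeaks;

% Rising and falling edges delimit each cluster
d = diff([0, inCluster, 0]);
starts = find(d == 1);
stops  = find(d == -1) - 1;
numClusters = numel(starts);
fprintf('%d clusters found\n', numClusters);
```

The isolated spike never accumulates enough neighbours inside one window, so only the two bursts are reported.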
Does it make sense in your problem to have some sort of "window size"? In other words, given a region of "high" amplitude, if you shrink the duration of the region, at what point will it become meaningless to your analysis?
If you can come up with a window, just apply this window to your data as it comes in and compute the energy within the window. Then, you can define some energy threshold and perform simple peak detection on the energy signal.
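A minimal sketch of the windowed-energy idea, assuming a synthetic signal and hypothetical values for win and energyThresh. It uses a plain threshold crossing on the energy signal as the simplest form of peak detection.

```matlab
% Hypothetical synthetic signal: two dense bursts and one taller isolated spike
x = zeros(1, 1000);
x(200:5:250) = 2;      % burst 1
x(600:5:650) = 2;      % burst 2
x(400) = 3;            % isolated spike with higher amplitude

win = 50;              % assumed window size in samples
energyThresh = 10;     % assumed energy threshold

% Energy inside a sliding window
energy = conv(x.^2, ones(1, win), 'same');

% Regions where the windowed energy exceeds the threshold
active = energy > energyThresh;
d = diff([0, active, 0]);
starts = find(d == 1);
stops  = find(d == -1) - 1;
numRegions = numel(starts);
fprintf('%d high-energy regions found\n', numRegions);
```

Because the energy is accumulated over the window, a single tall spike contributes less than several moderate peaks close together, which is why the spike at sample 400 stays below the threshold here.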
By inspection of your data, the regions with high amplitude peaks are repeated at what appears to be fairly uniform intervals. This suggests that you might fit a sine or cosine wave (or a combination of the two) to your data.
Excuse my crude sketch but what I mean is something like this:
Once you make this identification, you can use the FFT to get the dominant spatial frequencies. Keep in mind that the spatial frequency spectrum of your signal may be fairly complex, due to spurious data, but what you are after is one or two dominant frequencies of your data.
For example, I made up a sinusoid and you can do the calculation like this:
N = 255; % # of samples
x = linspace(-1/2, 1/2, N);
dx = x(2)-x(1);
nu = 8; % frequency in cycles/interval
vx = (1/(dx))*[-(N-1)/2:(N-1)/2]/N; % spatial frequency
y = sin(2*pi*nu*x); % this would be your data
F = fftshift(abs(fft(y))/N);
figure; set(gcf,'Color',[1 1 1]);
subplot(2,1,1);plot(x,y,'-b.'); grid on; xlabel('x'); grid on;
subplot(2,1,2);plot(vx,F,'-k.'); axis([-1.3*nu 1.3*nu 0 0.6]); xlabel('frequency'); grid on;
Which gives:
Note the peaks at ± nu, the dominant spatial frequency. Now once you have the dominant spatial frequencies you can reconstruct the sine wave using the frequencies that you have obtained from the FFT.
Finally, once you have your sine wave you can identify the boxes with centers at the peaks of the sine waves.
This is also a nice approach because it effectively filters out the spurious or less relevant spikes, helping you to properly place the boxes at your intended locations.
Since I don't have your data, I wasn't able to complete all of the code for you, but the idea is sound and you should be able to proceed from this point.
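As a possible continuation of the example above, the dominant frequency can be picked out of the spectrum programmatically instead of by eye. This is only a sketch on the same made-up sinusoid; the names nuEst and spacing are hypothetical, and real data would likely need detrending and a check that the peak is well separated from its neighbours.

```matlab
N  = 255;                                   % number of samples
x  = linspace(-1/2, 1/2, N);
dx = x(2) - x(1);
nu = 8;                                     % true frequency in cycles/interval
y  = sin(2*pi*nu*x);                        % stand-in for the data

vx = (1/dx)*(-(N-1)/2:(N-1)/2)/N;           % frequency axis (odd N)
F  = fftshift(abs(fft(y))/N);               % centered magnitude spectrum

% Search the positive frequencies only for the strongest component
pos   = vx > 0;
vpos  = vx(pos);
Fpos  = F(pos);
[~, k] = max(Fpos);
nuEst = vpos(k);                            % estimated dominant frequency

spacing = 1/nuEst;                          % expected distance between region centers
fprintf('estimated frequency: %.3f, spacing: %.4f\n', nuEst, spacing);
```

The estimate is limited by the frequency resolution of the FFT, so it should fall within about one bin of the true value.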

plotting pwelch with log axis

I'm using pwelch to plot a power spectral density. I want to use the format
pwelch(x,window,noverlap,nfft,fs,'onesided')
but with a log scale on the x axis.
I've also tried
[P,F]=pwelch(x,window,noverlap,nfft,fs);
plot(F,P)
but it doesn't give the same plot as the call above. Therefore,
semilogx(F,P)
isn't a good solution.
OK, so to start, I've never heard of this function or this method. However, I was able to generate the same plot that the function produced using output arguments instead. I ran the example from the help text.
EXAMPLE:
Fs = 1000; t = 0:1/Fs:.296;
x = cos(2*pi*t*200)+randn(size(t)); % A cosine of 200Hz plus noise
pwelch(x,[],[],[],Fs,'twosided'); % Uses default window, overlap & NFFT.
That produces this plot:
After capturing the outputs with [foo,bar] = pwelch(x,[],[],[],Fs,'twosided'); I then did plot(bar,10*log10(foo)); grid on; to produce the linear version (same exact plot, minus labels):
or
semilogx(bar,10*log10(foo)); grid on; for the log scale on the x-axis.
I don't like that the x-axis is sampled linearly but displayed logarithmically (that's a word, right?), but it seems to look OK.
Good enough?
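For the one-sided form asked about in the question, a sketch along the same lines might look like this. When called without output arguments, pwelch plots the estimate in dB, which is why plot(F,P) on the raw values looks different. Dropping the first bin is an assumption made here because the one-sided frequency axis starts at 0 Hz, which cannot be shown on a log axis. pwelch requires the Signal Processing Toolbox.

```matlab
Fs = 1000;
t  = 0:1/Fs:.296;
x  = cos(2*pi*t*200) + randn(size(t));   % a cosine of 200 Hz plus noise

% Capture the PSD estimate and its frequency axis instead of plotting directly
[P, F] = pwelch(x, [], [], [], Fs, 'onesided');

% Convert to dB and skip the 0 Hz bin, which has no place on a log axis
semilogx(F(2:end), 10*log10(P(2:end))); grid on;
xlabel('Frequency (Hz)');
ylabel('Power/frequency (dB/Hz)');
```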

Matlab cdfplot: how to control the spacing of the marker spacing

I have a Matlab figure I want to use in a paper. This figure contains multiple cdfplots.
Now the problem is that I cannot use markers, because they become very dense in the plot.
If I want to make the samples sparse, I have to drop some samples from the cdfplot, which results in a different cdfplot line.
How can I add enough markers while maintaining the actual line?
One method is to get the XData/YData properties from your curves, subsample them following solution (1) from @ephsmith, and set them back. Here is an example for one curve.
y = evrnd(0,3,100,1); %# random data
%# original data
subplot(1,2,1)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
%# reduced data
subplot(1,2,2)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
xdata = get(h,'XData');
ydata = get(h,'YData');
%# set both properties in one call so their lengths never disagree
set(h,'XData',xdata(1:5:end),'YData',ydata(1:5:end));
Another method is to calculate empirical CDF separately using ECDF function, then reduce the results before plotting with PLOT.
y = evrnd(0,3,100,1); %# random data
[f, x] = ecdf(y);
%# original data
subplot(1,2,1)
plot(x,f,'*')
%# reduced data
subplot(1,2,2)
plot(x(1:5:end),f(1:5:end),'r*')
Result
I know this is potentially unnecessary given MATLAB's built-in functions (in the Statistics Toolbox anyway) but it may be of use to other viewers who do not have access to the toolbox.
The empirical CMF (CDF) is essentially the cumulative sum of the empirical PMF. The latter is attainable in MATLAB via the hist function. In order to get a nice approximation to the empirical PMF, the number of bins must be selected appropriately. In the following example, I assume that 64 bins is good enough for your data.
%# compute a histogram with 64 bins for the data points stored in y
[f,x]=hist(y,64);
%# convert the frequency points in f to proportions
f = f./sum(f);
%# compute the cumulative sum of the empirical PMF
cmf = cumsum(f);
Now you can choose how many points you'd like to plot by using the reduced data example given by yuk.
n=20 ; % number of total data markers in the curve graph
M_n = round(linspace(1,numel(y),n)) ; % indices of markers
% x and y hold the full curve; plot the whole line, and markers for selected data points
plot(x,y,'b-',x(M_n),y(M_n),'rs')
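In MATLAB R2016b and later, line objects also have a MarkerIndices property, which places markers on a subset of points while the line itself is drawn through all of them. The sketch below assumes made-up data and builds the empirical CDF by sorting, so it does not need the Statistics Toolbox; the variable names are hypothetical.

```matlab
y  = randn(200, 1);                       % hypothetical data
xs = sort(y);                             % sorted sample values
f  = (1:numel(xs))' / numel(xs);          % empirical CDF at each sorted sample

n   = 20;                                 % number of markers to show
idx = round(linspace(1, numel(xs), n));   % evenly spaced marker indices

% Full-resolution line, markers only at the selected indices
plot(xs, f, 'b-s', 'MarkerIndices', idx);
xlabel('x'); ylabel('F(x)'); grid on;
```

The markers are spaced evenly in sample index, so they bunch up slightly where the data is dense; spacing them evenly in x would need a different choice of idx.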
Very simple: try reducing the marker size.
x = rand(10000,1);
y = x + rand(10000,1);
plot(x,y,'b.','markersize',1);
For publishing purposes I tend to use the plot tools on the figure window. This allows you to tweak all of the plot parameters and immediately see the result.
If the problem is that you have too many data points, you can:
1). Plot using every nth sample of the data. Experiment to find an n that results in the look you want.
2). I typically fit curves to my data and add a few sparsely placed markers to plots of the fits to differentiate the curves.
Honestly, for publishing purposes I have always found that choosing different 'LineStyle' or 'LineWidth' properties for the lines gives much cleaner results than using different markers. This would also be a lot easier than trying to downsample your data, and for plots made with CDFPLOT I find that markers simply occlude the stairstep nature of the lines.

MATLAB : frequency distribution

I have raw observations of 500 numeric values (ranging from 1 to 25000) in a text file, and I wish to make a frequency distribution in MATLAB. I tried the histogram (hist), but I would prefer a frequency distribution curve to blocks and bars.
Any help is appreciated !
If you pass two output parameters to HIST, you will get both the x-axis and y-axis values. Then you can plot the data as you like. For instance,
[counts, bins] = hist(mydata);
plot(bins, counts); %# get a line plot of the histogram
You could try a kernel smoothing density estimate.
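A kernel density estimate could be sketched with ksdensity, which is part of the Statistics and Machine Learning Toolbox. The random data below is only a stand-in for the 500 observations in the text file. Note that the result is a density (area of about 1 under the curve), not raw counts.

```matlab
% Hypothetical stand-in for the 500 observations read from the text file
mydata = 1 + 24999*rand(500, 1);

% Smooth density estimate evaluated on an automatically chosen grid
[f, xi] = ksdensity(mydata);

plot(xi, f); grid on;
xlabel('Value');
ylabel('Estimated density');
```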