Matlab: Peak detection for clusters of peaks - matlab

I am working with biological signal data, and am trying to count the number of regions with a high density of high amplitude peaks. As seen in the figure below, the regions of interest (as observed qualitatively) are contained in red boxes and 8 such regions were observed for this particular trial. The goal is to mathematically achieve this same result in near real time without the intervention or observation of the researcher.
The data seen plotted below is the result of raw data from a 24-bit ADC being processed by an FIR filter, with no other processing yet being done.
What I am looking for is a method, or ideally code, to help me detect such regions as identified while subsequently ignoring some of the higher amplitude peaks in between the regions of interest (i.e. between regions 3 and 4, 5 and 6, or 7 and 8 there is a narrow region of high amplitude which is not of concern). It is worth noting that the maximum is not known prior to computation.
Thanks for your help.
Data
https://www.dropbox.com/s/oejyy6tpf5iti3j/FIRData.mat

can you work with thresholds?
define:
(1) "amplitude threshold": if the signal is greater than the threshold it is considered a peak
(2) "window size" : of a fixed time duration
algorithm:
if n number of peaks was detected in a duration defined in "window size" than consider the signal within "window size" as cluster of peaks.(I worked with eye blink eeg data this way before, not sure if it is suitable for your application)
P.S. if you have data that are already labelled by human, you can train a classifier to find out your thresholds and window size.

Does it make sense in your problem to have some sort of "window size"? In other words, given a region of "high" amplitude, if you shrink the duration of the region, at what point will it become meaningless to your analysis?
If you can come up with a window, just apply this window to your data as it comes in and compute the energy within the window. Then, you can define some energy threshold and perform simple peak detection on the energy signal.

By inspection of your data, the regions with high amplitude peaks are repeated at what appears to be fairly uniform intervals. This suggests that you might fit a sine or cosine wave (or a combination of the two) to your data.
Excuse my crude sketch but what I mean is something like this:
Once you make this identification, you can use the FFT to get the dominant spatial frequencies. Keep in mind that the spatial frequency spectrum of your signal may be fairly complex, due to spurious data, but what you are after is one or two dominant frequencies of your data.
For example, I made up a sinusoid and you can do the calculation like this:
N = 255; % # of samples
x = linspace(-1/2, 1/2, N);
dx = x(2)-x(1);
nu = 8; % frequency in cycles/interval
vx = (1/(dx))*[-(N-1)/2:(N-1)/2]/N; % spatial frequency
y = sin(2*pi*nu*x); % this would be your data
F = fftshift(abs(fft(y))/N);
figure; set(gcf,'Color',[1 1 1]);
subplot(2,1,1);plot(x,y,'-b.'); grid on; xlabel('x'); grid on;
subplot(2,1,2);plot(vx,F,'-k.'); axis([-1.3*nu 1.3*nu 0 0.6]); xlabel('frequency'); grid on;
Which gives:
Note the peaks at ± nu, the dominant spatial frequency. Now once you have the dominant spatial frequencies you can reconstruct the sine wave using the frequencies that you have obtained from the FFT.
Finally, once you have your sine wave you can identify the boxes with centers at the peaks of the sine waves.
This is also a nice approach because it effectively filters out the spurious or less relevant spikes, helping you to properly place the boxes at your intended locations.
Since I don't have your data, I wasn't able to complete all of the code for you, but the idea is sound and you should be able to proceed from this point.

Related

Frequency domain phase shift, amplitude, hope size and non-linearity

I am trying to implement a frequency domain phase shift but there are few points on which I am not sure.
1- I am able to get a perfect reconstruction from a sine or sweep signal using a hanning window with a hop size of 50%. Nevertheless, how should I normalise my result when using a hop size > 50%?
2- When shifting the phase of low frequency signals (f<100, window size<1024, fs=44100) I can clearly see some non-linearity in my result. Is this because of the window size being to short for low frequencies?
Thank you very much for your help.
clear
freq=500;
fs=44100;
endTime=0.02;
t = 1/fs:1/fs:(endTime);
f1=linspace(freq,freq,fs*endTime);
x = sin(2*pi*f1.*t);
targetLength=numel(x);
L=1024;
w=hanning(L);
H=L*.50;% Hopsize of 50%
N=1024;
%match input length with window length
x=[zeros(L,1);x';zeros(L+mod(length(x),H),1)];
pend=length(x)- L ;
pin=0;
count=1;
X=zeros(N,1);
buffer0pad= zeros(N,1);
outBuffer0pad= zeros(L,1);
y=zeros(length(x),1);
delay=-.00001;
df = fs/N;
f= -fs/2:df:fs/2 - df;
while pin<pend
buffer = x(pin+1:pin+L).*w;
%append zero padding in the middle
buffer0pad(1:(L)/2)=buffer((L)/2+1: L);
buffer0pad(N-(L)/2+1:N)=buffer(1:(L)/2);
X = fft(buffer0pad,N);
% Phase modification
X = abs(X).*exp(1i*(angle(X))-(1i*2*pi*f'*delay));
outBuffer=real(ifft(X,N));
% undo zero padding----------------------
outBuffer0pad(1:L/2)=outBuffer(N-(L/2-1): N);
outBuffer0pad(L/2+1:L)=outBuffer(1:(L)/2);
%Overlap-add
y(pin+1:pin+L) = y(pin+1:pin+L) + outBuffer0pad;
pin=pin+H;
count=count+1;
end
%match output length with original input length
output=y(L+1:numel(y)-(L+mod(targetLength,H)));
figure(2)
plot(t,x(L+1:numel(x)-(L+mod(targetLength,H))))
hold on
plot(t,output)
hold off
Anything below 100 Hz has less than two cycles in your FFT window. Note that a DFT or FFT represents any waveform, including a single non-integer-periodic sinusoid, by possibly summing up of a whole bunch of sinusoids of very different frequencies. e.g. a lot more than just one. That's just how the math works.
For a von Hann window containing less than 2 cycles, these are often a bunch of mostly completely different frequencies (possibly very far away in terms of percentage from your low frequency). Shifting the phase of all those completely different frequencies may or may not shift your windowed low frequency sinusoid by the desired amount (depending on how different in frequency your signal is from being integer-periodic).
Also for low frequencies, the complex conjugate mirror needs to be shifted in the opposite direction in phase in order to still represent a completely real result. So you end up mixing 2 overlapped and opposite phase changes, which again is mostly a problem if the low frequency signal is far from being integer periodic in the DFT aperture.
Using a longer window in time and samples allows more cycles of a given frequency to fit inside it (thus possibly needing a lesser power of very different frequency sinusoids to be summed up in order to compose, make up or synthesize your low frequency sinusoid); and the complex conjugate is farther away in terms of FFT result bin index, thus reducing interference.
A sequence using any hop of a von Hann window that in 50% / (some-integer) in length is non-lossy (except for the very first or last window). All other hop sizes modulate or destroy information, and thus can't be normalized by a constant for reconstruction.

3-D Plot in MATLAB Containing: Time, Frequency and Power Spectral Density

I am currently working on a project for my Speech Processing course and have just finished making a time waveform plot as well as both wide/narrow band spectrograms for a spoken word in Spanish (aire).
The next part of the project is as follows:
Make a 3-D plot of each word signal, as a function of time, frequency and power spectral density. The analysis time step should be 20ms, and power density should be computed using a 75%-overlapped Hamming window and the FFT. Choose a viewing angle that best highlights the signal features as they change in time and frequency.
I was hoping that someone can offer me some guidance as to how to begin doing this part. I have started by looking here under the Spectrogram and Instantaneous Frequency heading but was unsure of how to add PSD to the script.
Thanks
I am going to give you an example.
I am going to generate a linear chirp signal.
Fs = 1000;
t = 0:1/Fs:2;
y = chirp(t,100,2,300,'linear');
And then, I am going to define number of fft and hamming window.
nfft=128;
win=hamming(nfft);
And then I am going to define length of overlap, 75% of nfft.
nOvl=nfft*0.75;
And then, I am performing STFT by using spectrogram function.
[s,f,t,pxx] = spectrogram(y,win,nOvl,nfft,Fs,'psd');
'y' is time signal, 'win' is defined hamming window, 'nOvl' is number of overlap, 'nfft' is number of fft, 'Fs' is sampling frequency, and 'psd' makes the result,pxx, as power spectral density.
Finally, I am going to plot the 'pxx' by using waterfall graph.
waterfall(f,t,pxx')
xlabel('frequency(Hz)')
ylabel('time(sec)')
zlabel('PSD')
The length of FFT, corresponding to 20ms, depends on sampling frequency of your signal.
EDIT : In plotting waterfall graph, I transposed pxx to change t and f axis.

How does this logic produce high and low pass filters?

I was studying for a signals & systems project and I have come across this code on high and low pass filters for an audio signal on the internet. Now I have tested this code and it works but I really don't understand how it is doing the low/high pass action.
The logic is that a sound is read into MATLAB by using the audioread or wavread function and the audio is stored as an nx2 matrix. The n depends on the sampling rate and the 2 columns are due to the 2 sterio channels.
Now here is the code for the low pass;
[hootie,fs]=wavread('hootie.wav'); % loads Hootie
out=hootie;
for n=2:length(hootie)
out(n,1)=.9*out(n-1,1)+hootie(n,1); % left
out(n,2)=.9*out(n-1,2)+hootie(n,2); % right
end
And this is for the high pass;
out=hootie;
for n=2:length(hootie)
out(n,1)=hootie(n,1)-hootie(n-1,1); % left
out(n,2)=hootie(n,2)-hootie(n-1,2); % right
end
I would really like to know how this produces the filtering effect since this is making no sense to me yet it works. Also shouldn't there be any cutoff points in these filters ?
The frequency response for a filter can be roughly estimated using a pole-zero plot. How this works can be found on the internet, for example in this link. The filter can be for example be a so called Finite Impulse Response (FIR) filter, or an Infinite Impulse Response (IIR) filter. The FIR-filters properties is determined only from the input signal (no feedback, open loop), while the IIR-filter uses the previous signal output to control the current signal output (feedback loop or closed loop). The general equation can be written like,
a_0*y(n)+a_1*y(n-1)+... = b_0*x(n)+ b_1*x(n-1)+...
Applying the discrete fourier transform you may define a filter H(z) = X(z)/Y(Z) using the fact that it is possible to define a filter H(z) so that Y(Z)=H(Z)*X(Z). Note that I skip a lot of steps here to cut down this text to proper length.
The point of the discussion is that these discrete poles can be mapped in a pole-zero plot. The pole-zero plot for digital filters plots the poles and zeros in a diagram where the normalized frequencies, relative to the sampling frequencies are illustrated by the unit circle, where fs/2 is located at 180 degrees( eg. a frequency fs/8 will be defined as the polar coordinate (r, phi)=(1,pi/4) ). The "zeros" are then the nominator polynom A(z) and the poles are defined by the denominator polynom B(z). A frequency close to a zero will have an attenuation at that frequency. A frequency close to a pole will instead have a high amplifictation at that frequency instead. Further, frequencies far from a pole is attenuated and frequencies far from a zero is amplified.
For your highpass filter you have a polynom,
y(n)=x(n)-x(n-1),
for each channel. This is transformed and it is possble to create a filter,
H(z) = 1 - z^(-1)
For your lowpass filter the equation instead looks like this,
y(n) - y(n-1) = x(n),
which becomes the filter
H(z) = 1/( 1-0.9*z^(-1) ).
Placing these filters in the pole-zero plot you will have the zero in the highpass filter on the positive x-axis. This means that you will have high attenuation for low frequencies and high amplification for high frequencies. The pole in the lowpass filter will also be loccated on the positive x-axis and will thus amplify low frequencies and attenuate high frequencies.
This description is best illustrated with images, which is why I recommend you to follow my links. Good luck and please comment ask if anything is unclear.

Sampling at exactly Nyquist rate in Matlab

Today I have stumbled upon a strange outcome in matlab. Lets say I have a sine wave such that
f = 1;
Fs = 2*f;
t = linspace(0,1,Fs);
x = sin(2*pi*f*t);
plot(x)
and the outcome is in the figure.
when I set,
f = 100
outcome is in the figure below,
What is the exact reason of this? It is the Nyquist sampling theorem, thus it should have generated the sine properly. Of course when I take Fs >> f I get better results and a very good sine shape. My explenation to myself is that Matlab was having hardtime with floating numbers but I am not so sure if this is true at all. Anyone have any suggestions?
In the first case you only generate 2 samples (the third input of linspace is number of samples), so it's hard to see anything.
In the second case you generate 200 samples from time 0 to 1 (including those two values). So the sampling period is 1/199, and the sampling frequency is 199, which is slightly below the Nyquist rate. So there is aliasing: you see the original signal of frequency 100 plus its alias at frequency 99.
In other words: the following code reproduces your second figure:
t = linspace(0,1,200);
x = .5*sin(2*pi*99*t) -.5*sin(2*pi*100*t);
plot(x)
The .5 and -.5 above stem from the fact that a sine wave can be decomposed as the sum of two spectral deltas at positive and negative frequencies, and the coefficients of those deltas have opposite signs.
The sum of those two sinusoids is equivalent to amplitude modulation, namely a sine of frequency 99.5 modulated by a sine of frequency 1/2. Since time spans from 0 to 1, the modulator signal (whose frequency is 1/2) only completes half a period. That's what you see in your second figure.
To avoid aliasing you need to increase sample rate above the Nyquist rate. Then, to recover the original signal from its samples you can use an ideal low pass filter with cutoff frequency Fs/2. In your case, however, since you are sampling below the Nyquist rate, you would not recover the signal at frequency 100, but rather its alias at frequency 99.
Had you sampled above the Nyquist rate, for example Fs = 201, the orignal signal could ideally be recovered from the samples.† But that would require an almost ideal low pass filter, with a very sharp transition between passband and stopband. Namely, the alias would now be at frequency 101 and should be rejected, whereas the desired signal would be at frequency 100 and should be passed.
To relax the filter requirements you need can sample well above the Nyquist rate. That way the aliases are further appart from the signal and the filter has an easier job separating signal from aliases.
† That doesn't mean the graph looks like your original signal (see SergV's answer); it only means that after ideal lowpass filtering it will.
Your problem is not related to the Nyquist theorem and aliasing. It is simple problem of graphic representation. You can change your code that frequency of sine will be lower Nyquist limit, but graph will be as strange as before:
t = linspace(0,1,Fs+2);
plot(sin(2*pi*f*t));
Result:
To explain problem I modify your code:
Fs=100;
f=12; %f << Fs
t=0:1/Fs:0.5; % step =1/Fs
t1=0:1/(10*Fs):0.5; % step=1/(10*Fs) for precise graphic representation
subplot (2, 1, 1);
plot(t,sin(2*pi*f*t),"-b",t,sin(2*pi*f*t),"*r");
subplot (2, 1, 2);
plot(t1,sin(2*pi*f*t1),"g",t,sin(2*pi*f*t),"r*");
See result:
Red star - values of sin(2*pi*f) with sampling rate of Fs.
Blue line - lines which connect red stars. It is usual data representation of function plot() - line interpolation between data points
Green curve - sin(2*pi*f)
Your eyes and brain can easily understand that these graphs represent the sine
Change frequency to more high:
f=48; % 2*f < Fs !!!
See on blue lines and red stars. Your eyes and brain do not understand now that these graphs represent the same sine. But your "red stars" are actually valid value of sine. See on bottom graph.
Finally, there is the same graphics for sine with frequency f=50 (2*f = Fs):
P.S.
Nyquist-Shannon sampling theorem states for your case that if:
f < 2*Fs
You have infinite number of samples (red stars on our plots)
then you can reproduce values of function in any time (green curve on our plots). You must use sinc interpolation to do it.
copied from Matlab Help:
linspace
Generate linearly spaced vectors
Syntax
y = linspace(a,b)
y = linspace(a,b,n)
Description
The linspace function generates linearly spaced vectors. It is similar to the colon operator ":", but gives direct control over the number of points.
y = linspace(a,b) generates a row vector y of 100 points linearly spaced between and including a and b.
y = linspace(a,b,n) generates a row vector y of n points linearly spaced between and including a and b. For n < 2, linspace returns b.
Examples
Create a vector of 100 linearly spaced numbers from 1 to 500:
A = linspace(1,500);
Create a vector of 12 linearly spaced numbers from 1 to 36:
A = linspace(1,36,12);
linspace is not apparent for Nyquist interval, so you can use the common form:
t = 0:Ts:1;
or
t = 0:1/Fs:1;
and change the Fs values.
The first Figure is due to the approximation of '0': sin(0) and sin(2*pi). We can notice the range is in 10^(-16) level.
I wrote the function reconstruct_FFT that can recover critically sampled data even for short observation intervals if the input sequence of samples is periodic. It performs lowpass filtering in the frequency domain.

how can the noise be removed from a recorded sound,using fft in MATLAB?

I want to remove noises from a recorded sound and make the fft of it finding fundamental frequencies of that sound, but I don't know how to remove those noises. I'm recording the sound of falling objects from different heights. I want to find the relation between the height and the maximum frequency of the recorded sound.
[y,fs]=wavread('100cmfreefall.wav');
ch1=y(:,1);
time=(1/44100)*length(ch1);
t=linspace(0,time,length(ch1));
L=length(ch1);
NFFT = 2^nextpow2(L); % Next power of 2 from length of y
Y = fft(y,NFFT)/L;
Y1=log10(Y);
figure(1)
f = fs/2*linspace(0,1,NFFT/2+1);
plot(f,2*abs(Y1(1:NFFT/2+1))) ;
[b,a]=butter(10,3000/(44100/2),'high');
Y1=filtfilt(b,a,Y1);
% freqz(b,a)
figure(2)
plot(f,2*abs(Y1(1:NFFT/2+1))) ;
title('Single-Sided Amplitude Spectrum of y(t)');
xlabel('Frequency (Hz)');
ylabel('|Y(f)|')
xlim([0 50000])
% soundsc(ch1(1:100000),44100)
Saying that there is noise in your signal is very vague and doesn't convey much information at all. Some of the questions are:
Is the noise high frequency or low frequency?
Is it well separated from your signal's frequency band or is it mixed in?
Does the noise follow a statistical model? Can it be described as a stationary process?
Is the noise another deterministic interfering signal?
The approach you take will certainly depend on the answers to the above questions.
However, from the experiment setup that you described, my guess is that your noise is just a background noise, that in most cases, can be approximated to be white in nature. White noise refers to a statistical noise model that has a constant power at all frequencies.
The simplest approach will be to use a low pass filter or a band pass filter to retain only those frequencies that you are interested in (a quick look at the frequency spectrum should reveal this, if you do not know it already). In a previous answer of mine, to a related question on filtering using MATLAB, I provide examples of creating low-pass filters and common pitfalls. You can probably read through that and see if it helps you.
A simple example:
Consider a sinusoid with a frequency of 50 Hz, sampled at 1000 Hz. To that, I add Gaussian white noise such that the SNR is ~ -6dB. The original signal and the noisy signal can be seen in the top row of the figure below (only 50 samples are shown). As you can see, it almost looks as if there is no hope with the noisy signal as all structure seems to have been destroyed. However, taking an FFT, reveals the buried sinusoid (shown in the bottom row)
Filtering the noisy signal with a narrow band filter from 48 to 52 Hz, gives us a "cleaned" signal. There will of course be some loss in amplitude due to the noise. However, the signal has been retrieved from what looked like a lost cause at first.
How you proceed depends on your exact application. But I hope this helped you understand some of the basics of noise filtering.
EDIT
#Shabnam: It's been nearly 50 comments, and I really do not see you making any effort to understand or at the very least, try things on your own. You really should learn to read the documentation and learn the concepts and try it instead of running back for every single error. Anyway, please try the following (modified from your code) and show the output in the comments.
[y,fs]=wavread('100cmfreefall.wav');
ch1=y(:,1);
time=(1/fs)*length(ch1);
t=linspace(0,time,length(ch1));
L=length(ch1);
NFFT = 2^nextpow2(L);
f = fs/2*linspace(0,1,NFFT/2+1);
[b,a]=butter(10,3e3/(fs/2),'high');
y1=filtfilt(b,a,ch1);
figure(1)
subplot(2,1,1)
Y=fft(ch1,NFFT)/L;
plot(f,log10(abs(Y(1:NFFT/2+1))))
title('unfiltered')
subplot(2,1,2)
Y1=fft(y1,NFFT)/L;
plot(f,log10(abs(Y1(1:NFFT/2+1))))
title('filtered')
Answer to your question is highly dependent on the characteristics of what you call "noise" - its spectral distribution, the noise being stationary or not, the source of the noise (does it originate in the environment or the recording chain?).
If the noise is stationary, i.e its statistical characteristics do not change over time, you can try recording a few seconds (10-15 is a good initial guess) of noise only, preform FFT, and then subtract the value of the noise in FFT bin n from your measurement FFT bin n.
You can read some background here: http://en.wikipedia.org/wiki/Noise_reduction