MATLAB filtering a signal results in NaN [duplicate] - matlab

I'm trying to filter theta range (3-8 Hz) from a 10 min long EEG signal with sampling rate of 500Hz. This is my code. Please help me to understand what's wrong. Right now the filtered signal seems to be ruined. Thank you so much!
fs=500;
Wp = [3 8]/(fs/2); Ws = [2.5 8.5]/(fs/2);
Rp = 3; Rs = 40;
[n,Wn] = buttord(Wp,Ws,Rp,Rs);
[b,a] = butter(n,Wn,'bandpass');
fdata = filter(b,a,data);
ts = 1/fs; x = 0:ts:((length(data)/fs)-ts); % time axis in seconds
f=-fs/2:fs/(length(data)-1):fs/2;
subplot(2,2,1)
plot(x,data)
subplot(2,2,2)
z1=abs(fftshift(fft(data)));
plot(f,z1)
xlim([0 150]);
subplot(2,2,3)
plot(x,fdata)
subplot(2,2,4)
z=abs(fftshift(fft(fdata)));
plot(f,z);
xlim([0 150]);

Your code (line 4) gives a filter order, n, equal to 37. I've had issues of numerical precision with Butterworth filters of such large orders; even with orders as low as 8. The problem is that butter gives absurd b and a values for large orders. Check your b and a vectors, and you'll see they contain values of about 1e21 (!)
The solution is to use the zero-pole representation of the filter, instead of the coefficient (b, a) representation. You can read more about this here. In particular,
In general, you should use the [z,p,k] syntax to design IIR filters. To analyze or implement your filter, you can then use the [z,p,k] output with zp2sos. If you design the filter using the [b,a] syntax, you may encounter numerical problems. These problems are due to round-off errors. They may occur for filter orders as low as 4.
In your case, you could proceed along the following lines:
[z, p, k] = butter(n,Wn,'bandpass');
[sos,g] = zp2sos(z,p,k);
filt = dfilt.df2sos(sos,g);
fdata = filter(filt,data);
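A minimal sketch of the same idea without the dfilt object, in case it helps: sosfilt (Signal Processing Toolbox) applies the second-order sections directly. The test signal is a synthetic stand-in and not the EEG data from the question, and the gain g from zp2sos is applied by hand because sosfilt only takes the sos matrix.

```matlab
fs = 500;                                   % sampling rate from the question
Wp = [3 8]/(fs/2);  Ws = [2.5 8.5]/(fs/2);  % same band edges as the question
[n, Wn] = buttord(Wp, Ws, 3, 40);
[z, p, k] = butter(n, Wn, 'bandpass');      % zero-pole-gain design
[sos, g] = zp2sos(z, p, k);                 % second-order sections + overall gain

% Synthetic stand-in for the EEG trace (assumption): 5 Hz in band, 50 Hz out of band
t = (0:60*fs-1)/fs;
data = sin(2*pi*5*t) + sin(2*pi*50*t);

fdata = g * sosfilt(sos, data);             % filter section by section, then apply the gain
hasBadSamples = any(~isfinite(fdata));      % true if any NaN or Inf crept in
```

If hasBadSamples comes back false here but your real data still produces NaN, it is worth checking whether the input itself already contains NaN values.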

Related

finding peaks that have some structure in 1-d

I have a signal with peaks that have some structure, on top of some background, and I'm trying to find a robust way to locate their positions and amplitudes.
For example, assume the peak has this form:
t=linspace(0,10,1e3);
w=0.25;
rw=@(t0) 2/(sqrt(3*w)*pi^0.25)*(1-((t-t0)/w).^2).*exp(-(t-t0).^2/(2*w.^2));
and I have several peaks a bit too close together, so their structure starts to interfere with the others, and one separated peak, so you can see its structure:
pos=[ 2 2.5 3 8]; % positions
total_signal=0.*t;
for i=1:length(pos)
total_signal=total_signal+rw(pos(i));
end
plot(t,total_signal);
So the goal is to find all peaks found in total_signal and check that their positions agree with the original positions that were used to generate them in pos.
Your signal can be thought of as the convolution of a pulse train with the peak shape. Deconvolution can be used to retrieve the pulse train, whose peaks are the locations you are looking for. This assumes that the peak shape is known and constant.
I will start by rewriting your code as follows:
t = linspace(0,10,1e3);
w = 0.25;
rw = @(t) 2/(sqrt(3*w)*pi^0.25)*(1-((t)/w).^2).*exp(-(t).^2/(2*w.^2));
pos = [2, 2.5, 3, 8];
total_signal = zeros(size(t));
for i=1:length(pos)
total_signal = total_signal + rw(t-pos(i));
end
total_signal = total_signal + randn(size(t))*1e-2;
plot(t,total_signal);
I've changed rw to take t-t0, rather than just t0. This will allow me later to create a clean signal with just one peak in the middle. I have also added noise to the signal, for a more realistic problem. Without the noise, the problem is a lot easier to solve.
Wiener deconvolution is the simplest approach to solve this problem. In short, we assume that
total_signal = conv(pulse_train, shape)
In the frequency domain, this is written as
G = F .* H
(with .* the element-wise multiplication, using MATLAB syntax here). The Wiener deconvolution is:
F = (conj(H) .* G) ./ (abs(H).^2 + k)
with k some constant that we can tune to regularize the solution.
We implement this as follows:
shape = rw(linspace(-5,5,1e3));
G = fft(total_signal);
H = fft(ifftshift(shape)); % ifftshift moves the origin to sample #0, as expected by FFT.
k = 1;
F = (conj(H) .* G) ./ (abs(H).^2 + k);
pulse_train = ifft(F);
Now, findpeaks (requires the Signal Processing Toolbox) can be used to find the prominent peaks:
findpeaks(pulse_train, 'MinPeakProminence', 0.02)
Note how the four peaks are approximately the same height. It's not exact, because we're regularizing to deal with the noise. An exact solution is only possible in the noise-free case. Without noise, k=0, and the expression simplifies to F = G ./ H.
Also, the x-axis is off in the plot produced by findpeaks, but this shouldn't affect the results. The locations it returns are indices into the array; those same indices can be used to index into t and find the actual locations of the peaks.
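To make the index-to-time mapping concrete, here is a self-contained sketch that repeats the steps above and then converts the findpeaks indices into positions. The fixed random seed and the use of real() on the ifft output (findpeaks expects real input) are my additions for this sketch.

```matlab
rng(0);                                     % fixed seed, only for repeatability (assumption)
t = linspace(0, 10, 1e3);
w = 0.25;
rw = @(t) 2/(sqrt(3*w)*pi^0.25)*(1-(t/w).^2).*exp(-t.^2/(2*w^2));
pos = [2, 2.5, 3, 8];
total_signal = zeros(size(t));
for i = 1:numel(pos)
    total_signal = total_signal + rw(t - pos(i));
end
total_signal = total_signal + randn(size(t))*1e-2;

% Wiener deconvolution with the known peak shape
shape = rw(linspace(-5, 5, 1e3));
G = fft(total_signal);
H = fft(ifftshift(shape));
k = 1;
pulse_train = real(ifft((conj(H) .* G) ./ (abs(H).^2 + k)));

% locs are sample indices; t(locs) gives the peak positions in the units of t
[pks, locs] = findpeaks(pulse_train, 'MinPeakProminence', 0.02);
found_pos = t(locs);
```

The recovered positions are only approximate, since they are limited by the sample spacing of t and by the regularization.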

Scaling problems with IFFT in Matlab

I'm studying the IFFT in Matlab by applying it to a Gaussian. According to Wikipedia tables, the Fourier transform pair would be
F(w) = sqrt(pi/a) * exp(-w^2/(4a))
in frequency, and
f(t) = exp(-at^2)
in time. I modified the code in a previous question plus Cris Luengo's answer to perform this IFFT.
a = 0.333;
ts = 1e4; % time sampling
L = 1000*ts; % no. sample points
ds = 1/ts;
f = -floor(L/2):floor((L-1)/2); % freq vector
f = f/ts;
w = 2*pi*f; % angular freq
Y = sqrt(pi/a)*exp(-w.^2/(4*a));
y = ts*ifftshift(ifft(fftshift(Y)));
t = (-L/2:L/2-1)*ts/L; % time vector
f = exp(-a*t.^2); % analytical solution
figure; subplot(1,2,1); hold on
plot(t,real(y),'.--')
plot(t,real(f),'-')
xlabel('time, t')
title('real')
legend('numerical','analytic')
xlim([-5,5])
subplot(1,2,2); hold on
plot(t,imag(y),'.--')
plot(t,imag(f),'-')
xlabel('time, t')
title('imag')
legend('numerical','analytic')
xlim([-5,5])
When I compare the result of IFFT with the analytical expression, they don't seem to agree:
I'm not sure where the mistake is. Have I scaled the IFFT properly? Is there an error in how I define the linear/angular frequency?
Edit: For some reason, when I define L=ts^2 the analytical and numerical solutions seem to agree (L = no. sampling points, ts = time sample).
Starting from the analytical solution, let's rephrase things a bit. You have a sampling of the function f(t) = exp(-a*t^2), and the way you've constructed the analytical answer you're collecting L=1000*ts=1e7 samples at a sampling rate of Ts=ts/L=1e-3. This means that your sampling frequency is Fs=1/Ts=1e3.
Since you want to compare against results obtained with fft/ifft, you should be considering digital or discrete frequencies, meaning the values you define for your transform will correspond to the digital frequencies
frd = (-L/2:L/2-1)/L;
Mapping this to angular frequencies, we have:
w = 2*pi*frd;
But when you're trying to compute the values, you also need to keep in mind that these frequencies should represent samples of the continuous time spectrum you're expecting. So you scale these values by your sampling frequency:
Y = sqrt(pi/a)*exp(-(Fs*w).^2/(4*a));
y = Fs*ifftshift(ifft(fftshift(Y)));
When you compare the analytical and computed answers, they now match.
The short answer to your question, given this, is that you are scaling y incorrectly at the end. You're scaling it by ts, which is 1e4, but you need to be scaling it by the sampling frequency which is Fs=1e3. That's why you end up off by a factor of 10.
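As a small check of the scaling argument, here is a sketch with much smaller, assumed values of Fs and L than in the question, so that it runs instantly. It samples the analytical spectrum at the continuous-time angular frequencies and scales the ifft by Fs.

```matlab
a  = 0.333;
Fs = 20;                         % sampling frequency (assumed value for this sketch)
L  = 1024;                       % number of samples (assumed, even)
t  = (-L/2:L/2-1)/Fs;            % time vector, spacing 1/Fs
W  = 2*pi*Fs*(-L/2:L/2-1)/L;     % continuous-time angular frequencies, rad/s

Y = sqrt(pi/a)*exp(-W.^2/(4*a));         % samples of the analytical spectrum
y = Fs*fftshift(ifft(ifftshift(Y)));     % scale by Fs, not by the number of samples

f_exact = exp(-a*t.^2);                  % analytical time-domain Gaussian
max_err = max(abs(real(y) - f_exact));
```

For even L, fftshift and ifftshift do the same thing, so the order of the two shifts only matters when L is odd.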

How to fit an exponential curve to damped harmonic oscillation data in MATLAB?

I'm trying to fit an exponential curve to data sets containing damped harmonic oscillations. The data is a bit complicated in the sense that the sinusoidal oscillations contain many frequencies as seen below:
I need to find the rate of decay in the data. The method I am using can be found here. It takes the log of the y values above the steady-state value and then uses:
lsqlin(A,y1(:),-A,-y1(:),[],[],[],[],[],optimset('algorithm','active-set','display','off'))
To fit it.
However, this results in the following data fits:
I tried using a linear regression fit which obviously didn't work because it took the average. I also tried RANSAC thinking that there is more data near the peaks. It worked a bit better than the linear regression but the method is flawed as there are times when more points exist at the wrong regions.
Does anyone know of a good method to just fit the peaks for this data?
Currently, I'm thinking of dividing the 500 data points into 10 different regions and in each region find the largest value. At the end, I should have 50 points that I can fit using any of the exponential fitting methods mentioned above. What do you think of this method?
Thought I'd give everyone an update of potential solutions that may work. As mentioned earlier, the data is complicated by the varying sinusoidal frequencies, so certain methods may not work because of this. The methods listed below can be good depending on the data and the frequencies involved.
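First off, I assume that the data has the form:
y = average + b*e^(c*x)
where c is negative for a decaying signal (in the code below, c is the slope of log(y - average) against x). In my case, the average is 290 so we have:
y = 290 + b*e^(c*x)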
With that being said, let's dive into the different methods that I tried:
findpeaks() Method
This is the method that Alexander Büse suggested. It's a pretty good method for most data, but for my data, since there are multiple sinusoidal frequencies, it gets the wrong peaks. The red x's show the peaks.
% Find Peaks Method
[max_num,max_ind] = findpeaks(y(ind));
plot(max_ind,max_num,'x','Color','r'); hold on;
x1 = max_ind;
y1 = log(max_num-290);
coeffs = polyfit(x1,y1,1)
b = exp(coeffs(2));
c = coeffs(1);
RANSAC
RANSAC is good if you have most of your data at the peaks. You see that in mine, because of the multiple frequencies, more peaks exist near the top. However, the problem with my data is that not all the data sets are like this. Hence, it occasionally worked.
% RANSAC Method
ind = (y > avg);
x1 = x(ind);
y1 = log(y(ind) - avg);
iterNum = 300;
thDist = 0.5;
thInlrRatio = .1;
[t,r] = ransac([x1;y1'],iterNum,thDist,thInlrRatio);
k1 = -tan(t);
b1 = r/cos(t);
% plot(x1,k1*x1+b1,'r'); hold on;
b = exp(b1);
c = k1;
Lsqlin Method
This method is the one used here. It uses Lsqlin to constrain the system. However, it seems to ignore the data in the middle. Depending on your data set, this could work really well as it did for the person in the original post.
% Lsqlin Method
avg = 290;
ind = (y > avg);
x1 = x(ind);
y1 = log(y(ind) - avg);
A = [ones(numel(x1),1),x1(:)]*1.00;
coeffs = lsqlin(A,y1(:),-A,-y1(:),[],[],[],[],[],optimset('algorithm','active-set','display','off'));
b = exp(coeffs(1)); % A = [ones, x1], so coeffs(1) is the intercept, log(b)
c = coeffs(2);      % and coeffs(2) is the slope
Find Peaks in Period
This is the method I mentioned in my post, where I get the peak in each region. This method works pretty well, and from this I realized that my data may not actually have a perfect exponential fit. We see that it is unable to fit the large peaks at the beginning. I was able to make this a bit better by only using the first 150 data points and ignoring the steady state data points. Here I found the peak every 25 data points.
% Incremental Method 2 Unknowns
x1 = [];
y1 = [];
max_num=[];
max_ind=[];
incr = 25;
for i=1:floor(size(y,1)/incr)
[max_num(end+1),max_ind(end+1)] = max(y(1+incr*(i-1):incr*i));
max_ind(end) = max_ind(end) + incr*(i-1);
if max_num(end) > avg
x1(end+1) = max_ind(end);
y1(end+1) = log(max_num(end)-290);
end
end
plot(max_ind,max_num,'x','Color','r'); hold on;
coeffs = polyfit(x1,y1,1)
b = exp(coeffs(2));
c = coeffs(1);
Using all 500 data points:
Using the first 150 data points:
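For anyone who wants to try the windowed-maximum idea without my data, here is a rough, self-contained sketch on synthetic data. The signal, its two frequencies, and the true parameters are all made up for illustration; only the window length of 25 and the baseline of 290 come from the post.

```matlab
% Synthetic stand-in data (assumption): two-frequency oscillation under an
% exponential envelope, y = avg + b*exp(c*x).*osc with c < 0
avg = 290;  b_true = 100;  c_true = -0.02;
x = (1:500)';
y = avg + b_true*exp(c_true*x).*(0.6*cos(0.9*x) + 0.4*cos(2.3*x));

incr = 25;                                  % window length in samples
nWin = floor(numel(y)/incr);
x1 = zeros(nWin,1);  y1 = zeros(nWin,1);  keep = false(nWin,1);
for i = 1:nWin
    idx = (1:incr) + incr*(i-1);            % sample indices of window i
    [m, j] = max(y(idx));                   % largest value in this window
    if m > avg                              % log is only defined above the baseline
        x1(i) = x(idx(j));  y1(i) = log(m - avg);  keep(i) = true;
    end
end
coeffs = polyfit(x1(keep), y1(keep), 1);    % [slope, intercept] of the log-envelope
c_est = coeffs(1);                          % decay rate (negative for a decay)
b_est = exp(coeffs(2));                     % envelope amplitude estimate
```

The amplitude estimate tends to come out a little low, because the window maxima do not always land exactly on the envelope.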
Find Peaks in Period With b Constrained
Since I want it to start at the first peak, I constrained the b value. I know the system is y = 290 + b*e^(c*x), and I constrain it such that b = y(1) - 290. By doing so, I just need to solve for c, where c = (log(y-290) - log(b))/x. I can then take the average or median of c. This method is quite good as well. It doesn't fit the values near the end as well, but that isn't a big deal since the change there is minimal.
% Incremental Method 1 Unknown (b is constrained y(1)-290 = b)
b = y(1) - 290;
c = [];
max_num=[];
max_ind=[];
incr = 25;
for i=1:floor(size(y,1)/incr)
[max_num(end+1),max_ind(end+1)] = max(y(1+incr*(i-1):incr*i));
max_ind(end) = max_ind(end) + incr*(i-1);
if max_num(end) > avg
c(end+1) = (log(max_num(end)-290)-log(b))/max_ind(end);
end
end
c = mean(c); % Or median(c) works just as well
Here I take the peak for every 25 data points and then take the mean of c
Here I take the peak for every 25 data points and then take the median of c
Here I take the peak for every 10 data points and then take the mean of c
If the main goal is to extract the damping parameter from the fit, maybe you want to consider fitting directly a damped sine curve to your data. Something like this (created with the curve fitting tool):
[xData, yData] = prepareCurveData( x, y );
ft = fittype( 'a + sin(b*x - c).*exp(d*x)', 'independent', 'x', 'dependent', 'y' );
opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
opts.Display = 'Off';
opts.StartPoint = [1 0.285116122712545 0.805911873245316 0.63235924622541];
[fitresult, gof] = fit( xData, yData, ft, opts );
plot( fitresult, xData, yData );
Especially since some of your example data really don't have many data points in the interesting region (above the noise).
If, however, you really need to fit directly to the maxima of the experimental data, you could use the findpeaks function to select only the maxima and then fit to them. You may want to play a bit with the MinPeakProminence parameter to adjust it to your needs.
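A rough sketch of that second route, on a synthetic single-frequency signal. The data, the baseline of 290 and the prominence threshold of 1 are assumptions for illustration, so you would need to tune them for real data.

```matlab
avg = 290;  c_true = -0.02;
x = (1:500)';
y = avg + 100*exp(c_true*x).*sin(0.5*x);    % synthetic stand-in (assumption)

% Keep only maxima that stand out by at least 1 unit (an assumed threshold)
[pks, locs] = findpeaks(y, 'MinPeakProminence', 1);

% Straight-line fit to the log of the peak heights above the baseline
coeffs = polyfit(x(locs), log(pks - avg), 1);
c_est = coeffs(1);                          % decay rate (negative for a decay)
```

Raising MinPeakProminence drops the small late peaks, which are the ones most affected by noise in real measurements.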

How does the sgolay function work in Matlab R2013a?

I have a question about the sgolay function in Matlab R2013a. My database has 165 spectra with 2884 variables and I would like to take the first and second derivatives of them. How might I define the inputs K and F to sgolay?
Below is an example:
sgolay is used to smooth a noisy sinusoid and compare the resulting first and second derivatives to the first and second derivatives computed using diff. Notice how using diff amplifies the noise and generates useless results.
K = 4; % Order of polynomial fit
F = 21; % Window length
[b,g] = sgolay(K,F); % Calculate S-G coefficients
dx = .2;
xLim = 200;
x = 0:dx:xLim-1;
y = 5*sin(0.4*pi*x)+randn(size(x)); % Sinusoid with noise
HalfWin = ((F+1)/2) -1;
for n = (F+1)/2:numel(y)-(F+1)/2
% Zero-th derivative (smoothing only)
SG0(n) = dot(g(:,1), y(n - HalfWin: n + HalfWin));
% 1st differential
SG1(n) = dot(g(:,2), y(n - HalfWin: n + HalfWin));
% 2nd differential
SG2(n) = 2*dot(g(:,3)', y(n - HalfWin: n + HalfWin))';
end
SG1 = SG1/dx; % Turn differential into derivative
SG2 = SG2/(dx*dx); % and into 2nd derivative
% Scale the "diff" results
DiffD1 = (diff(y(1:length(SG0)+1)))/ dx;
DiffD2 = (diff(diff(y(1:length(SG0)+2)))) / (dx*dx);
subplot(3,1,1);
plot([y(1:length(SG0))', SG0'])
legend('Noisy Sinusoid','S-G Smoothed sinusoid')
subplot(3, 1, 2);
plot([DiffD1',SG1'])
legend('Diff-generated 1st-derivative', 'S-G Smoothed 1st-derivative')
subplot(3, 1, 3);
plot([DiffD2',SG2'])
legend('Diff-generated 2nd-derivative', 'S-G Smoothed 2nd-derivative')
Taking derivatives is an inherently noisy process. Thus, if you already have some noise in your data, it will indeed be magnified as you take higher order derivatives. Savitzky-Golay is a very useful way of combining smoothing and differentiation into one operation. It's a general method and it computes derivatives to an arbitrary order. There are trade-offs, though. Other special methods exist for data with a certain structure.
In terms of your application, I don't have any concrete answers. Much depends on the nature of the data (sampling rate, noise ratio, etc.). If you use too much smoothing, you'll smear your data or produce aliasing. Same thing if you over-fit the data by using high order polynomial coefficients, K. In your demo code you should also plot the analytical derivatives of the sin function. Then play with different amounts of input noise and smoothing filters. Such a tool with known exact answers may be helpful if you can approximate aspects of your real data. In practice, I try to use as little smoothing as possible in order to produce derivatives that aren't too noisy. Often this means a third-order polynomial (K = 3) and a window size, F, as small as possible.
So yes, many suggest that you use your eyes to tune these parameters. However, there has also been some very recent research on choosing the coefficients automatically: On the Selection of Optimum Savitzky-Golay Filters (2013). There are also alternatives to Savitzky-Golay, e.g., this paper based on regularization, but you may need to implement them yourself in Matlab.
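As a sketch of the "known exact answers" test described above, applied to data stored one spectrum per row as in the question: the values of K and F, the sample spacing, and the two sine "spectra" are all assumptions chosen for illustration. The (-1)^p factor is there because convolution flips the filter.

```matlab
K = 3;  F = 11;                 % polynomial order and (odd) window length, assumed values
[~, g] = sgolay(K, F);          % columns of g are the differentiation filters

dx = 0.01;
x  = 0:dx:2*pi;
X  = [sin(x); sin(2*x)];        % two synthetic "spectra", one per row (assumption)

% conv2 with a row kernel filters each row; scale by factorial(p)/(-dx)^p
D1 = conv2(X, (-1/dx)  * g(:,2).', 'same');   % first derivative, p = 1
D2 = conv2(X, (2/dx^2) * g(:,3).', 'same');   % second derivative, p = 2

% Compare with the exact derivatives away from the edges
in   = (F+1)/2 : numel(x) - (F-1)/2;
err1 = max(max(abs(D1(:,in) - [cos(x(in)); 2*cos(2*x(in))])));
err2 = max(max(abs(D2(:,in) + [sin(x(in)); 4*sin(2*x(in))])));
```

Adding noise to X and watching how err1 and err2 change with K and F is one way to pick values before moving to real spectra. The first and last (F-1)/2 columns are edge-affected and should not be trusted.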
By the way, a while back I wrote a little replacement for sgolay. Like you, I only needed the second output, the differentiation filters, G, so that's all it calculates. This function is also faster (by about 2–4 times):
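function G=sgolayfilt(k,f)
%SGOLAYFILT Savitzky-Golay differentiation filters
%   k is the polynomial order, f the (odd) window length.
%   Note: the Signal Processing Toolbox also ships a function named sgolayfilt.
s = vander(0.5*(1-f):0.5*(f-1)); % Vandermonde matrix on the window offsets
S = s(:,f:-1:f-k);               % keep the columns for powers 0..k
[~,R] = qr(S,0);                 % economy-size QR
G = S/R/R';                      % G = S*inv(S'*S), one filter per column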
A full version of this function with input validation is available on my GitHub.

Matlab - Signal Noise Removal

I have a vector of data, which contains integers in the range -20 to 20.
Below is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements lie in the interval [-2, 2], as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks and keep the high amplitude peaks, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks and, if possible, eliminate the low amplitude peaks.
Could you please suggest a way of doing this?
I have tried the mapstd function, but the problem is that it also normalizes the high amplitude peak.
I was thinking of using the wavelet transform toolbox, but I don't know exactly how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right (R). If your samples are positive and negative (as yours are), you should take the absolute value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
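A minimal sketch of one pass of this split window filter, using conv. The toy input, the edge handling through a sample count, and the eps guard against division by zero are my assumptions and not part of the description above.

```matlab
x = ones(1, 50);  x(25) = 10;               % toy data (assumption): flat floor, one peak
band = 6;  gap = 3;                         % band and guard widths from the kernel above
k = [ones(1,band), zeros(1,2*gap+1), ones(1,band)];

localSum  = conv(abs(x), k, 'same');        % sum of |x| over the L and R bands
localCnt  = conv(ones(size(x)), k, 'same'); % number of band samples (fewer at the edges)
localMean = localSum ./ localCnt;
y = x ./ max(localMean, eps);               % guard against division by zero
```

In this toy case the peak keeps its height, while samples whose bands contain the peak are pushed down. To do a second pass, run the same lines again with y as the input.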
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scaled by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing is to set all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please do note that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will produce very different behavior: one will be kept, the other will be removed). So for displaying, this might be all you need; for further processing, it depends on what you are trying to do.
To eliminate the low amplitude peaks, you're going to have to treat all of the low amplitude signal as noise and ignore it.
If you have any a priori knowledge, just use it.
If your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy, and find this "on the fly" then, use kmeans of 3. It's in the statistics toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and use the sign back.
Note that these and every other technique I've seen on this thread assume you are doing post-processing. If you are doing this processing in real time, things will have to change.