How does the sgolay function work in Matlab R2013a?

I have a question about the sgolay function in Matlab R2013a. My database has 165 spectra with 2884 variables and I would like to take the first and second derivatives of them. How might I define the inputs K and F to sgolay?
Below is an example:
sgolay is used to smooth a noisy sinusoid and compare the resulting first and second derivatives to the first and second derivatives computed using diff. Notice how using diff amplifies the noise and generates useless results.
K = 4; % Order of polynomial fit
F = 21; % Window length
[b,g] = sgolay(K,F); % Calculate S-G coefficients
dx = .2;
xLim = 200;
x = 0:dx:xLim-1;
y = 5*sin(0.4*pi*x)+randn(size(x)); % Sinusoid with noise
HalfWin = ((F+1)/2) -1;
for n = (F+1)/2:length(x)-(F+1)/2   % x has 996 samples here
    % Zero-th derivative (smoothing only)
    SG0(n) = dot(g(:,1), y(n - HalfWin:n + HalfWin));
    % 1st differential
    SG1(n) = dot(g(:,2), y(n - HalfWin:n + HalfWin));
    % 2nd differential (the factor 2 is 2! from the polynomial fit)
    SG2(n) = 2*dot(g(:,3), y(n - HalfWin:n + HalfWin));
end
SG1 = SG1/dx; % Turn differential into derivative
SG2 = SG2/(dx*dx); % and into 2nd derivative
% Scale the "diff" results
DiffD1 = (diff(y(1:length(SG0)+1)))/ dx;
DiffD2 = (diff(diff(y(1:length(SG0)+2)))) / (dx*dx);
subplot(3,1,1);
plot([y(1:length(SG0))', SG0'])
legend('Noisy Sinusoid','S-G Smoothed sinusoid')
subplot(3, 1, 2);
plot([DiffD1',SG1'])
legend('Diff-generated 1st-derivative', 'S-G Smoothed 1st-derivative')
subplot(3, 1, 3);
plot([DiffD2',SG2'])
legend('Diff-generated 2nd-derivative', 'S-G Smoothed 2nd-derivative')

Taking derivatives is an inherently noisy process. Thus, if you already have some noise in your data, it will indeed be magnified as you take higher-order derivatives. Savitzky-Golay is a very useful way of combining smoothing and differentiation into one operation. It's a general method and it computes derivatives to an arbitrary order. There are trade-offs, though. Other special methods exist for data with a certain structure.
In terms of your application, I don't have any concrete answers. Much depends on the nature of the data (sampling rate, noise ratio, etc.). If you use too much smoothing, you'll smear your data or produce aliasing. The same happens if you over-fit the data by using a high polynomial order, K. In your demo code you should also plot the analytical derivatives of the sine function. Then play with different amounts of input noise and smoothing filters. Such a tool with known exact answers may be helpful if you can approximate aspects of your real data. In practice, I try to use as little smoothing as possible in order to produce derivatives that aren't too noisy. Often this means a third-order polynomial (K = 3) and a window size, F, as small as possible.
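As a rough sketch (not a definitive recipe) of how this might look for your 165-by-2884 matrix: assuming your spectra are stored row-wise in a variable called spectra, that your variable axis is evenly spaced with step dx, and that K = 3 and F = 11 are acceptable starting values, the differentiation filters from sgolay can be applied to each row with conv:
% Sketch only: first/second derivatives of each spectrum (rows of 'spectra')
K = 3;                       % polynomial order -- start low
F = 11;                      % window length -- odd, and as small as the noise allows
[~, g] = sgolay(K, F);       % differentiation filters
dx = 1;                      % spacing of your 2884 variables (replace with your actual axis step)
d1 = zeros(size(spectra));   % first derivatives
d2 = zeros(size(spectra));   % second derivatives
for r = 1:size(spectra,1)
    % conv flips the kernel, hence the (-dx)^p factor; 'same' keeps the length,
    % but the first and last (F-1)/2 samples of each row are edge-affected
    d1(r,:) = conv(spectra(r,:), factorial(1)/(-dx)^1 * g(:,2).', 'same');
    d2(r,:) = conv(spectra(r,:), factorial(2)/(-dx)^2 * g(:,3).', 'same');
end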
So yes, many suggest that you use your eyes to tune these parameters. However, there has also been some very recent research on choosing the coefficients automatically: On the Selection of Optimum Savitzky-Golay Filters (2013). There are also alternatives to Savitzky-Golay, e.g., this paper based on regularization, but you may need to implement them yourself in Matlab.
By the way, a while back I wrote a little replacement for sgolay. Like you, I only needed the second output, the differentiation filters, G, so that's all it calculates. This function is also faster (by about 2–4 times):
function G = sgolayfilt(k,f)
%SGOLAYFILT Savitzky-Golay differentiation filters (the second output of sgolay)
s = vander(0.5*(1-f):0.5*(f-1)); % Vandermonde matrix on a window centred at zero
S = s(:,f:-1:f-k);               % keep powers 0..k, in increasing order
[~,R] = qr(S,0);                 % economy-size QR factorization
G = S/R/R';                      % G = S*inv(S'*S), the differentiation filters
A full version of this function with input validation is available on my GitHub.
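For reference, a quick sanity check against the toolbox function might look like this (a sketch; the order and window length are arbitrary, and it assumes the file above sits on your path ahead of the toolbox, since the name shadows the built-in sgolayfilt):
k = 3; f = 11;                    % arbitrary test values
[~, gBuiltin] = sgolay(k, f);     % second output of the toolbox function
gFast = sgolayfilt(k, f);         % the replacement above
max(abs(gBuiltin(:) - gFast(:)))  % should be on the order of machine precision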

Related

finding peaks that have some structure in 1-d

I have a signal with peaks that have some structure, sitting on top of some background, and I'm trying to find a robust way to locate their positions and amplitudes.
For example, assume the peak has this form:
t=linspace(0,10,1e3);
w=0.25;
rw=@(t0) 2/(sqrt(3*w)*pi^0.25)*(1-((t-t0)/w).^2).*exp(-(t-t0).^2/(2*w.^2));
and I have several peaks that are a bit too close together, so their structures start to interfere with each other, and one separate peak so you can see its structure:
pos=[ 2 2.5 3 8]; % positions
total_signal=0.*t;
for i=1:length(pos)
total_signal=total_signal+rw(pos(i));
end
plot(t,total_signal);
So the goal is to find all peaks found in total_signal and check that their positions agree with the original positions that were used to generate them in pos.
Your signal can be thought of as the convolution of a pulse train with the peak shape. Deconvolution can be used to retrieve the pulse train, whose peaks are the locations you are looking for. This assumes that the peak shape is known and constant.
I will start by rewriting your code as follows:
t = linspace(0,10,1e3);
w = 0.25;
rw = @(t) 2/(sqrt(3*w)*pi^0.25)*(1-((t)/w).^2).*exp(-(t).^2/(2*w.^2));
pos = [2, 2.5, 3, 8];
total_signal = zeros(size(t));
for i=1:length(pos)
total_signal = total_signal + rw(t-pos(i));
end
total_signal = total_signal + randn(size(t))*1e-2;
plot(t,total_signal);
I've changed rw to take t-t0, rather than just t0. This will allow me later to create a clean signal with just one peak in the middle. I have also added noise to the signal, for a more realistic problem. Without the noise, the problem is a lot easier to solve.
Wiener deconvolution is the simplest approach to solve this problem. In short, we assume that
total_signal = conv(pulse_train, shape)
In the frequency domain, this is written as
G = F .* H
(with .* the element-wise multiplication, using MATLAB syntax here). The Wiener deconvolution is:
F = (conj(H) .* G) ./ (abs(H).^2 + k)
with k some constant that we can tune to regularize the solution.
We implement this as follows:
shape = rw(linspace(-5,5,1e3));
G = fft(total_signal);
H = fft(ifftshift(shape)); % ifftshift moves the origin to sample #0, as expected by FFT.
k = 1;
F = (conj(H) .* G) ./ (abs(H).^2 + k);
pulse_train = ifft(F);
Now, findpeaks (requires the Signal Processing Toolbox) can be used to find the prominent peaks:
findpeaks(pulse_train, 'MinPeakProminence', 0.02)
Note how the four peaks are approximately the same height. It's not exact, because we're regularizing to deal with the noise. An exact solution is only possible in the noise-free case. Without noise, k=0, and the expression simplifies to F = G ./ H.
Also, the x-axis is off in the plot produced by findpeaks, but this shouldn't affect the results. The locations it returns are indices into the array; those same indices can be used to index into t and find the actual locations of the peaks.
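For example (a sketch; pks, locs and peak_times are just local names):
% real() guards against the tiny imaginary round-off left by ifft
[pks, locs] = findpeaks(real(pulse_train), 'MinPeakProminence', 0.02);
peak_times = t(locs)   % should come out close to pos = [2 2.5 3 8]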

Analytical Fourier transform vs FFT of functions in Matlab

I have adapted the code in Comparing FFT of Function to Analytical FT Solution in Matlab for this question. I am trying to compute FFTs and compare the results with the analytical expressions in the Wikipedia tables.
My code is:
a = 1.223;
fs = 1e5; %sampling frequency
dt = 1/fs;
t = 0:dt:30-dt; %time vector
L = length(t); % no. sample points
t = t - 0.5*max(t); %center around t=0
y = ; % original function in time
Y = dt*fftshift(abs(fft(y))); %numerical soln
freq = (-L/2:L/2-1)*fs/L; %freq vector
w = 2*pi*freq; % angular freq
F = ; %analytical solution
figure; subplot(1,2,1); hold on
plot(w,real(Y),'.')
plot(w,real(F),'-')
xlabel('Frequency, w')
title('real')
legend('numerical','analytic')
xlim([-5,5])
subplot(1,2,2); hold on;
plot(w,imag(Y),'.')
plot(w,imag(F),'-')
xlabel('Frequency, w')
title('imag')
legend('numerical','analytic')
xlim([-5,5])
If I study the Gaussian function and let
y = exp(-a*t.^2); % original function in time
F = exp(-w.^2/(4*a))*sqrt(pi/a); %analytical solution
in the above code, it looks like there is good agreement when the real and imaginary parts of the function are plotted.
But if I study a decaying exponential multiplied with a Heaviside function:
H = @(x) 1*(x>0); % Heaviside function
y = exp(-a*t).*H(t);
F = 1./(a+1j*w); %analytical solution
then the numerical and analytical results no longer agree.
Why is there a discrepancy? I suspect it's related to the line Y = dt*fftshift(abs(fft(y))); but I'm not sure why or how.
Edit: I changed ifftshift to fftshift in Y = dt*fftshift(abs(fft(y))); and then also removed the abs. The second graph now shows what looks like a 'mirrored' result.
What is the mathematical reason behind the 'mirrored' graph and how can I remove it?
The plots at the bottom of the question are not mirrored. If you plot them using lines instead of dots you'll see that the numerical result contains very high frequencies. The magnitude matches, but the phase doesn't. When this happens, it's almost certainly a case of a shift in the time domain.
And indeed, you define the time domain function with the origin in the middle. The FFT expects the origin to be at the first (leftmost) sample. This is what ifftshift is for:
Y = dt*fftshift(fft(ifftshift(y)));
ifftshift moves the origin to the first sample, in preparation for the fft call, and fftshift moves the origin of the result to the middle, for display.
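Putting that together for the decaying-exponential case, a sketch that reuses the variables from the code in the question:
H = @(x) 1*(x>0);                    % Heaviside (but see the note about the sample at t=0 below)
y = exp(-a*t).*H(t);
Y = dt*fftshift(fft(ifftshift(y)));  % ifftshift before fft, fftshift after, no abs
F = 1./(a + 1j*w);                   % analytical solution, unchanged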
Edit
Your t does not have a 0:
>> t(L/2+(-1:2))
ans =
-1.5000e-05 -5.0000e-06 5.0000e-06 1.5000e-05
The sample at t(floor(L/2)+1) needs to be 0. That is the sample that ifftshift moves to the leftmost position. (I use floor there in case L is odd, which is not the case here.)
To generate a correct t do as follows:
fs = 1e5; % sampling frequency
L = 30 * fs;
t = -floor(L/2):floor((L-1)/2);
t = t / fs;
I first generate an integer t axis of the right length, with 0 at the correct location (t(floor(L/2)+1)==0). Then I convert that to seconds by dividing by the sampling frequency.
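A quick check (a sketch) that the origin sample is where ifftshift expects it:
assert(t(floor(L/2)+1) == 0)   % the origin sample must be exactly zero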
With this t, the Y as I suggest above, and the rest of your code as-is, I see this for the Gaussian example:
>> max(abs(F-Y))
ans = 4.5254e-16
For the other function I see larger differences, in the order of 6e-6. This is due to the inability to sample the Heaviside function. You need t=0 in your sampled function, but H doesn't have a value at 0. I noticed that the real component has an offset of similar magnitude, which is caused by the sample at t=0.
Typically, the sampled Heaviside function is set to 0.5 for t=0. If I do that, the offset is removed completely, and max difference for the real component is reduced by 3 orders of magnitude (largest errors happen for values very close to 0, where I see a zig-zag pattern). For the imaginary component, the max error is reduced to 3e-6, still quite large, and is maximal at high frequencies. I attribute these errors to the difference between the ideal and sampled Heaviside functions.
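A sketch of that convention, reusing the variables from the question:
H = @(x) (x > 0) + 0.5*(x == 0);   % sampled Heaviside with H(0) = 0.5
y = exp(-a*t).*H(t);
Y = dt*fftshift(fft(ifftshift(y)));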
You should probably limit yourself to band-limited functions (or nearly band-limited ones such as the Gaussian). You might want to try replacing the Heaviside function with an error function (the integral of a Gaussian) with a small sigma (a sigma of about 0.8 samples, i.e. 0.8/fs in seconds, is the smallest I would consider for proper sampling). Its Fourier transform is known.
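And a sketch of that replacement (sigma is in seconds here; 0.8 samples corresponds to 0.8/fs):
sigma = 0.8/fs;                              % about 0.8 samples
Hsmooth = 0.5*(1 + erf(t/(sqrt(2)*sigma)));  % error-function step: integral of a Gaussian with std sigma
y = exp(-a*t).*Hsmooth;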

Fourier series coefficients in Matlab not negative using FFT

I am trying to get the Fourier series coefficients using FFT in Matlab. They have the correct absolute value but I would also need the signs of the terms.
t = linspace(-pi,pi,512);
L = length(t);
S = t; % Function
c = fft(S)/L
a = c(2:end)+c(end:-1:2)
b = (c(2:end)-c(end:-1:2))*i
%The first/last b-terms should be 2, -1, 0.66, -0.50...
What am I doing wrong here?
Given how you define t and S, it looks like you are trying to obtain the Fourier series coefficients of the periodic continuous-time function $s(t) = t$ for $-\pi \le t < \pi$, extended with period $2\pi$, whose series can be evaluated analytically (or looked up in Fourier series tables) as $s(t) = \sum_{n \ge 1} b_n \sin(nt)$ with $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$.
The problem is that you do not pass a time variable when you call fft, so it implicitly associates time 0 with the first data sample. If you plot the periodic extension of the function as seen by the fft, you will notice the time shift compared with S plotted against t.
Fortunately you can undo this shift by using ifftshift. You should then also note that in order to have an anti-symmetric function (as with your original continuous time function) for which the cosine coefficients are exactly zero (or at least within the available numerical accuracy), you would need to use an odd number of samples. This should give you the following code:
N = 512;
t = linspace(-pi,pi,N-1); % use odd number of samples to get anti-symmetric signal
L = length(t);
S = ifftshift(t); % ifftshift swaps lower & upper half of t, yielding S(1)=0
c = fft(S)/L;
a = c(2:end)+c(end:-1:2);
b = (c(2:end)-c(end:-1:2))*1i;
and the corresponding first 10 b coefficients now match your expectation and the analytical result quoted above (approximately 2, -1, 0.67, -0.5, ...).
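To inspect them (a sketch):
real(b(1:10))   % approximately 2, -1, 0.6667, -0.5, 0.4, ...; the imaginary parts are zero
                % because the spectrum of the real signal S is conjugate-symmetric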

Weird behavior when performing 2D convolution by the FFT

I am trying to construct the product of the FFT of a 2D box function and the FFT of a 2D Gaussian function. After, I find the inverse FFT and I am expecting a convolution of the two functions as a result. However, I am getting a weird one-sided result as seen below. The result appears on the bottom right of the subplot.
The Octave code I wrote to reproduce the above subplot as well as the calculations I performed to construct the convolution is shown below. Can anyone tell me what I'm doing wrong?
clear all;
clc;
close all;
% domain on each side is 0-9
L = 10;
% num subdivisions
N = 32;
delta=L/N;
sigma = 0.5;
% get the domain ready
[x,y] = meshgrid((0:N-1)*delta);
% since the domain ranges from 0 to (N-1)*delta on both sides
% we need to take the average
xAvg = sum(x(1, :))/length(x(1,:));
yAvg = sum(y(:, 1))/length(x(:,1));
% gaussian
gssn = exp(- ((x - xAvg) .^ 2 + (y - yAvg) .^ 2) ./ (2*sigma^2));
function ret = boxImpulse(a,b)
n = 32;
L=10;
delta = L/n;
nL = ((n-1)/2-3)*delta;
nU = ((n-1)/2+3)*delta;
if ((a >= nL) && (a <= nU) && ( b >= nL) && (b <= nU) )
ret=1;
else
ret=0;
end
ret;
endfunction
boxResponse = arrayfun(@boxImpulse, x, y);
subplot(2,2,1);mesh(x,y,gssn); title("gaussian fun");
subplot(2,2,2);mesh(x,y, abs(fft2(gssn)) .^2); title("fft of gaussian");
subplot(2,2,3);mesh(x,y,boxResponse); title("box fun");
inv_of_product_of_ffts = abs(ifft2(fft2(boxResponse) * fft2(gssn))) .^2 ;
subplot(2,2,4);mesh(x,y,inv_of_product_of_ffts); title("inv of product of fft");
Let's first address your most obvious errors. This is the way you are computing the convolution in the frequency domain:
inv_of_product_of_ffts = abs(ifft2(fft2(boxResponse) * fft2(gssn))) .^2 ;
The first problem is that you are using *, which is matrix multiplication. Element-wise multiplication in the frequency domain is the equivalent of convolution in the spatial domain, so you need to use .* instead. The second problem is that you don't need the abs and .^2 operations either: you're performing a convolution, so there is no reason to take the absolute value and square each term. Remove the abs and .^2 operations.
Therefore, what you need is:
inv_of_product_of_ffts = ifft2(fft2(boxResponse).*fft2(gssn));
However, what you're going to get is this result. Let's place this in a new figure instead of a subplot*:
figure;
mesh(x,y,ifft2(fft2(boxResponse).*fft2(gssn)));
title('Convolution... not right though');
You can see that it's the right result... but it's not centred.... why is that?
This is actually one of the most common problems when computing convolution in the frequency domain. In fact, even the most experienced encounter this problem because they don't understand the internals of how the FFT works.
This is a consequence of how MATLAB and Octave compute the 2D FFT. Specifically, MATLAB and Octave define a meshgrid of coordinates that goes from 0,1,...,M-1 for the horizontal and 0,1,...,N-1 for the vertical, giving an M x N result. This means that the origin / DC component is at the top-left corner of the matrix, not the centre as we usually define things. By contrast, the traditional 2D FFT defines the coordinates from -(M-1)/2,...,(M-1)/2 for the horizontal and -(N-1)/2,...,(N-1)/2 for the vertical.
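A quick way to see this convention (a sketch):
X = fft2(ones(4));   % a constant image has only a DC component
X(1,1)               % = 16; every other entry is 0, so DC sits at the top-left corner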
Given the above, you defined your signal assuming that the origin is at the centre, not the top-left corner. To compensate for this, you need to add an fftshift (MATLAB doc, Octave doc) after the inverse FFT so that the result is re-centred at the middle of the grid, bringing things back to the way they were. Therefore you really need:
figure;
mesh(x,y,fftshift(ifft2(fft2(boxResponse).*fft2(gssn))));
title('Convolution... now it is right');
If you want to double-check that we have the right result, you can perform direct convolution in the spatial domain and compare it with the result of the frequency-domain method.
This is how you'd compute the result directly in spatial domain:
figure;
mesh(x,y,conv2(gssn,boxResponse,'same'));
title('Convolution... spatial domain');
conv2 (MATLAB doc, Octave doc) performs 2D convolution between two signals, and the 'same' flag returns the central part of the full convolution, the same size as the first input (here both inputs are 32 x 32, so the output is 32 x 32 either way). You'll see that it's the same curve, and I won't show it here for brevity.
However, we can compare both of the results and see if they're the same element-wise. A way to do this is to check whether the absolute difference between corresponding elements is less than some threshold... say, 1e-10:
>> out1 = conv2(boxResponse, gssn, 'same');
>> out2 = fftshift(ifft2(fft2(boxResponse).*fft2(gssn)));
>> all(abs(out1(:)-out2(:)) < 1e-10)
ans =
1
This means that indeed both of these are the same.
Hope that makes sense! Remember, since you're defining your signal where the origin is at the centre, once you find the inverse FFT, you must shift things back so that the origin is now at the centre, not the top-left corner.
*: Minor note - All plots produced in this answer were generated with MATLAB R2015a. However, the code fully works in Octave - tested with Octave 4.0.0.

Fourier Transforms in MatLab

So I have had a few posts over the last few days about using MatLab to perform a convolution (see here). But I am having issues and just want to try to use the convolution property of Fourier transforms. I have the code below:
width = 83.66;
x = linspace(-400,400,1000);
a2 = 1.205e+004 ;
al = 1.778e+005 ;
b1 = 94.88 ;
c1 = 224.3 ;
d = 4.077 ;
measured = al*exp(-((abs((x-b1)./c1).^d)))+a2;
%slit
rect = @(x) 0.5*(sign(x+0.5) - sign(x-0.5));
rt = rect(x/width);
subplot(5,1,1);plot(x,measured);title('imported data-super gaussian')
subplot(5,1,2);plot(x,(real(fftshift(fft(rt)))));title('transformed slit')
subplot(5,1,3);plot(x,rt);title('slit')
u = (fftshift(fft(measured)));
l = u./(real(fftshift(fft(rt))));
response = (fftshift(ifft(l)));
subplot(5,1,4);plot(x,real(response));title('response')
%Data Check
check = conv(rt,response,'full');
z = linspace(min(x),max(x),length(check));
subplot(5,1,5);plot(z,real(check));title('check')
My goal is to take my case, which is $measured = rt \ast signal$, and find signal. Once I find my signal, I should be able to convolve it with the rectangle and get back measured, but I do not.
I have very little matlab experience, and pretty much 0 signal processing experience (working with DFTs). So any advice on how to do this would be greatly appreciated!
After considering the problem statement and woodchips' advice, I think we can get closer to a solution.
Input: u(t)
Output: y(t)
If we assume the system is causal and linear we would need to shift the rect function to occur before the response, like so:
rt = rect(((x+270+(83.66/2))/83.66));
figure; plot( x, measured, x, max(measured)*rt )
Next, consider the response to the input. It looks to me to be first order. If we assume as such, we will have a system transfer function in the frequency domain of the form:
H(s) = (b1*s + b0)/(s + a0)
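In the time domain that form corresponds to the first-order ODE $\dot{y}(t) + a_0\,y(t) = b_1\,\dot{u}(t) + b_0\,u(t)$, which is essentially what the ODE-based fitting below integrates numerically (with c(1), c(2) and c(3) in the code playing the roles of b1, b0 and a0).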
You had been trying to use convolution and FFTs to find the impulse response, i.e. the "transfer function" in the time domain. However, the FFT of the rect, being a sinc, has periodic zero crossings. These zero points make using the FFT to identify the system extremely difficult, because:
Y(s)/U(s) = H(s)
So we have U(s) = A*sinc(a*s), which has zeros; the division blows up at those frequencies, which doesn't make sense for a real system.
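As an aside, a sketch of one common workaround, borrowing the Wiener-style regularized division from the deconvolution answer further up (the constant k and its value are assumptions you would have to tune, and the origin/shift bookkeeping still needs the care discussed in the FFT answers above):
rt0 = rect(x/width);                   % the unshifted slit from the question
U = fft(ifftshift(rt0));               % move the slit's centre to sample 1 first
Ym = fft(measured);
k = 1e-3*max(abs(U))^2;                % regularization constant -- an assumption, tune it
Sig = (conj(U).*Ym)./(abs(U).^2 + k);  % regularized division avoids blowing up at the sinc zeros
signal_est = real(ifft(Sig));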
Instead, let's attempt to fit coefficients to the linear transfer function postulated above. Since there are no overshoots or oscillations in the response, first order is a reasonable place to start.
EDIT
I realized my first answer here had an unstable system description, sorry! The solution of the ODE is very stiff due to the rect function, so we need to crank down the maximum time step and use a stiff solver. However, this is still a tough problem to solve this way; a more analytical approach may be more tractable.
We use fminsearch to find the continuous-time transfer function coefficients, like so:
function x = findTf(c0,u,y,t)
% minimize the error for the estimated
% parameters of the transfer function
% use a scaled version without an offset for the response, the
% scalars can be added back later without breaking the solution.
yo = (y - min(y))/max(y);
x = fminsearch(@(c) simSystem(c,u,y,t),c0);
end
% calculate the derivatives of the transfer function
% inputs and outputs using the estimated coefficient
% vector c
function out = simSystem(c,u,y,t)
% estimate the derivative of the input
du = diff([0; u])./diff([0; t]);
% estimate the second derivative of the input
d2u = diff([0; du])./diff([0; t]);
% find the output of the system, corresponds to measured
opt = odeset('MaxStep',mean(diff(t))/100);
[~,yp] = ode15s(@(tt,yy) odeFun(tt,yy,c,du,d2u,t),t,[y(1) u(1) 0],opt);
% find the error between the actual measured output and the output
% from the system with the estimated coefficients
out = sum((yp(:,1) - y).^2);
end
function dy = odeFun(t,y,c,du,d2u,tx)
dy = [c(1)*y(3)+c(2)*y(2)-c(3)*y(1);
interp1(tx,du,t);
interp1(tx,d2u,t)];
end
Something like that anyway should get you going.
x = findTf([1 1 1]',rt',measured',x');