Fast Fourier transform for deseasonalizing data in MATLAB - matlab

I'm very much a novice at signal processing techniques, but I am trying to apply the fast Fourier transform to a daily time series to remove the seasonality present in the data. The example I am working with is from here:
http://www.mathworks.com/help/signal/ug/frequency-domain-linear-regression.html
While I understand how to implement the code as it is written in the example, I am having a hard time adapting it to my specific application. What I am trying to do is create a preprocessing function which deseasonalizes the training data using code similar to the example above. Then, using the same coefficients estimated from the in-sample data, deseasonalize the out-of-sample data so as to preserve its independence from the in-sample data. Basically, once the coefficients are estimated, I will normalize each new data point using those same coefficients. I suspect this is akin to estimating a linear trend, removing it from the in-sample data, and then using the same linear model on unseen data to detrend it in the same manner.
Obviously, when I estimate the Fourier coefficients, the vector I get out is the same length as the in-sample data. The out-of-sample data comprises far fewer observations, so applying the coefficients directly is impossible.
Is this sort of analysis possible using this technique or am I going down a dead end road? How should I approach that using the code in the example above?

What you want to do is certainly possible and you are on the right track, but you seem to misunderstand a few points in the example. First, the example shows that the technique is the equivalent of linear regression in the time domain, exploiting the FFT to perform an operation with the same effect in the frequency domain. Second, the trend that is removed is not linear; it is a sum of sinusoids, which is why the FFT is used to identify the relevant frequency components in a relatively tidy way.
In your case it seems you are interested in the residuals. The initial approach is therefore to proceed as in the example as follows:
(1) Perform a rough "detrending" by removing the DC component (the mean of the time-domain data)
(2) FFT the data, inspect the spectrum, and choose the frequency channels that contain most of the signal.
You can then use those channels to generate a trend in the time domain and subtract that from the original data to obtain the residuals. You need not proceed by using IFFT, however. Instead you can explicitly sum over the cosine and sine components. You do this in a way similar to the last step of the example, which explains how to find the amplitudes via time-domain regression, but substituting the amplitudes obtained from the FFT.
The following code shows how you can do this:
tim = (time - time0)/timestep; % <-- acquisition times for your *new* data, normalized
NFpick = [2 7 13]; % <-- channels you picked to build the detrending baseline
% Compute the trend
mu = mean(ts);
tsdft = fft(ts-mu);
Nchannels = length(ts); % <-- size of time domain data
Mpick = 2*length(NFpick);
X(:,1:2:Mpick) = cos(2*pi*(NFpick-1)'/Nchannels*tim)';
X(:,2:2:Mpick) = sin(-2*pi*(NFpick-1)'/Nchannels*tim)';
% Generate beta vector "bet" containing scaled amplitudes from the spectrum
bet = 2*tsdft(NFpick)/Nchannels;
bet = reshape([real(bet) imag(bet)].', numel(bet)*2,1);
trend = X*bet + mu;
To remove the trend just do
detrended = dat - trend;
where dat is your new data acquired at times tim. Make sure you define the time origin consistently. In addition this assumes the data is real (not complex), as in the example linked to. You'll have to examine the code to make it work for complex data.
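For example, a minimal end-to-end sketch (with assumed variable names: ts is the in-sample series, dat the out-of-sample series, both sampled at the same regular interval, with dat starting right after ts) could keep the time origin consistent by simply continuing the in-sample sample index:
Nin = length(ts);
Nout = length(dat);
tim = Nin:(Nin + Nout - 1);        % out-of-sample times continue the in-sample index 0..Nin-1
mu = mean(ts);
tsdft = fft(ts - mu);
NFpick = [2 7 13];                 % channels picked by inspecting the in-sample spectrum
Mpick = 2*length(NFpick);
X = zeros(Nout, Mpick);
X(:,1:2:Mpick) = cos( 2*pi*(NFpick-1)'/Nin*tim)';
X(:,2:2:Mpick) = sin(-2*pi*(NFpick-1)'/Nin*tim)';
bet = 2*tsdft(NFpick)/Nin;
bet = bet(:);                                  % force column orientation
bet = reshape([real(bet) imag(bet)].', [], 1); % interleave [re; im; re; im; ...]
detrended = dat(:) - (X*bet + mu);             % out-of-sample residuals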


How to generate smooth filtered envelope on EMG data in Matlab

I'm new to analysing EMG data and would appreciate some carefully explained help.
I would like to generate a smooth, linear envelope signal of my EMG data (50 kHz sampling rate) like the one published in this paper: https://openi.nlm.nih.gov/detailedresult.php?img=PMC3480942_1743-0003-9-29-3&req=4
My end goal is to be able to analyze the relationship between EMG activity (output) and action potentials fired from upstream neurons (putative input) recorded at the same time.
Even though this paper lists the filtering methods quite clearly, I do not understand what they mean or how to perform them in MATLAB, which is the analysis tool I have available to me.
In the code I have written so far, I can dc offset as well as rectify my data:
x = EMGtime_data;
y = EMGvoltage_data;
%dc offset
y2 = detrend(y);
% Rectification of the EMG signal
rec_y=abs(y2);
plot(x, rec_y)
But then I am not sure how to proceed.
I have tried the envelope function, but it is not as smooth as I would like:
For instance, if I used the following:
envelope(rec_y,2000,'rms')
I get this (which also doesn't seem to care that the data is rectified):
Even if I were to accept the envelope function, I'm not sure how to access just the processed envelope data to adjust the plot (i.e. change the y-range), or analyse the data further for on-set and off-set of the signal since the results of this function seem to be coupled with the original trace.
I have also come across fastrms.m, which seems promising. Unfortunately, I do not understand how to implement this function since the general explanation is over my head and the example code is lacking any defined variable (so I don't know where to integrate my own data!)
The example code from fastrms.m file exchange is here
Fs = 200; T = 5; N = T*Fs; t = linspace(0,T,N);
noise = randn(N,1);
[a,b] = butter(5, [9 12]/(Fs/2));
x = filtfilt(a,b,noise);
window = gausswin(0.25*Fs);
rms = fastrms(x,window,[],1);
plot(t,x,t,rms*[1 -1],'LineWidth',2);
xlabel('Time (sec)'); ylabel('Signal')
title('Instantaneous amplitude via RMS')
I will be eternally grateful for help in understanding how to filter and smooth EMG data!
To analyse EMG signals in the time domain, researchers use a combination of rectification and low-pass filtering, which is also called finding the "linear envelope" of the signal.
As mentioned both in the sentence above and in the explanation of the figure from your attached article, to plot the overlaid envelope you can simply low-pass filter your rectified signal at a suitable cut-off frequency.
In your attached article the signal was filtered at 8 Hz.
For a better understanding of EMG signal analysis, I think this document could help you a lot (link)
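For example, a minimal sketch of that rectify-then-low-pass approach (assuming y is your raw EMG trace, x the time vector, Fs = 50000 Hz as you describe, and a 4th-order Butterworth, since the article does not state the filter order) could look like:
Fs = 50000;                            % sampling rate of the EMG recording (50 kHz)
y2 = detrend(y);                       % remove the DC offset, as in your code
rec_y = abs(y2);                       % full-wave rectification
[b,a] = butter(4, 8/(Fs/2), 'low');    % low-pass Butterworth, 8 Hz cut-off as in the article
lin_env = filtfilt(b, a, rec_y);       % zero-phase filtering gives a smooth linear envelope
plot(x, rec_y, x, lin_env, 'LineWidth', 1.5)
The variable lin_env then holds just the envelope data, so you can rescale the axes or run on-set/off-set detection on it separately from the original trace.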

Filter performance analysis

I am working on some experimental data which, at some point, needs to be time-integrated and then high-pass filtered (to remove low-frequency disturbances introduced by the integration and the unwanted DC component).
The aim of my work is not related to filtering, but I would still like to analyse the filters I am using in more detail, to give some justification for my choices (for example, to motivate why I chose a 4th-order filter instead of a higher/lower-order one).
This is the filter I am using:
delta_t = 1.53846e-04;
Fs = 1/delta_t;
cut_F = 8;
Wn = cut_F/(Fs/2);
ftype = 'high';
[b,a] = butter(4,Wn,ftype);
filtered_signal = filtfilt(b,a,signal);
I already had a look here: High-pass filtering in MATLAB to learn something about filters (I never had a course on signal processing) and I used
fvtool(b,a)
to see the impulse response, step response etc. of the filter I have used.
The problem is that I do not know how to "read" these plots.
What do I have to look for?
How can I understand if a filter is good or not? (I do not have any specification about filter performances, I just know that the lowest frequency I can admit is 5 Hz)
What features of different filters are useful to be compared to motivate the choice?
I see you are starting your Uni DSP class on filters :)
First thing you need to remember is that Matlab can only simulate using finite, discrete values, so the results you see are technically all discrete. There are 4 things that will influence your filtering results (or tell you whether your filter is good or bad) which you will learn about/have to consider while designing a filter:
1, the type of filter or window (e.g. Butterworth, the one you are using, or window designs such as Hamming, Blackman, Hanning, etc.)
2, the number of filter coefficients (which determines your filter's frequency resolution)
3, the sampling frequency of the original signal (ideally, if you had an infinite sampling frequency, you could have perfect filters; that is not possible in Matlab for the reason above, but you can simulate its effect by setting it really high)
4, the cut-off frequency
You can play around with the 4 parameters so that your filter does what you want it to.
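For example, to see the effect of the filter order alone (a sketch that reuses the sampling rate and 8 Hz cut-off from your own code above), you can overlay the magnitude responses of several Butterworth designs:
delta_t = 1.53846e-04;  Fs = 1/delta_t;  Wn = 8/(Fs/2);
orders = [2 4 8];
figure; hold on;
for k = 1:numel(orders)
    [b,a] = butter(orders(k), Wn, 'high');
    [H,f] = freqz(b, a, 2^16, Fs);       % frequency response on a fine grid
    plot(f, 20*log10(abs(H)));
end
set(gca, 'XScale', 'log'); xlim([0.1 100]);
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');
legend('order 2', 'order 4', 'order 8', 'Location', 'southeast')
A higher order gives a steeper roll-off around the cut-off, at the price of a longer transient and more phase distortion (the latter is cancelled in your case because you use filtfilt).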
So here comes the theory:
There is a trade-off between the width of your main lobe and the spectral leakage of your filter. The idea is that your signal contains various frequencies; you want to filter out the unwanted ones (e.g. your DC noise) and keep the ones you want. But what if the frequency of your desired signal is so low that it is very close to the DC noise? If you have a badly designed filter, you will not be able to separate the DC component from the signal you want to keep. To design a good filter, you need to find the optimal number of filter coefficients, type of filter, and even cut-off frequency to make sure the filter works as you want.
Here is a simple FIR filter that I wrote back in the day (an ideal low-pass sinc kernel subtracted from a unit impulse, which by spectral inversion gives a high-pass); you can play around with filters a lot by filtering different kinds of signals and plotting the response.
N = 21;                          %number of filter coefficients (odd, so the filter is symmetric)
fc = 4000;                       %cut-off frequency in Hz
f_sampling = fs;                 %sampling frequency of your signal (define fs for your data)
Fc = fc/f_sampling;              %normalized cut-off frequency
n = -(N-1)/2:(N-1)/2;
delta = [zeros(1,(N-1)/2) 1 zeros(1,(N-1)/2)];   %unit impulse centred in the window
h = delta - 2*Fc*sinc(2*n*Fc);   %spectral inversion of the ideal low-pass kernel
output = filter(h,1,yoursignal);
To plot the response, you want to look at your output in the frequency domain using the DFT (fft in Matlab) and see how the signal has been distorted by leakage etc.
NFFT = 256;                              % FFT length
psd_est = 1/N*abs(fft(output,NFFT)).^2;  % PSD estimate of the filtered output using the FFT
This gives you what is known as a periodogram; when you plot it, you might want to take 10*log10 of it so it is in dB and looks nicer.
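For instance, continuing from the psd_est computed above (with f_sampling as defined earlier):
f = (0:NFFT-1)/NFFT*f_sampling;                  % frequency axis for the NFFT-point FFT
plot(f(1:NFFT/2), 10*log10(psd_est(1:NFFT/2)));  % one-sided periodogram in dB
xlabel('Frequency (Hz)'); ylabel('PSD estimate (dB)')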
Hope you do well in class.

Naïve Bayes Classifier -- is normalization necessary?

We recently studied the Naïve Bayesian Classifier in our Machine Learning class and now I'm trying to implement it on the Fisher Iris dataset as a self-exercise. The concept is easy and straightforward, with some trickiness involved for continuous attributes. I read up on several literature resources, which recommended using a Gaussian approximation to compute the probabilities of the test data values, so I'm going with that in my code.
Now I'm trying to run it initially for 50% training and 50% test data samples, but something is missing. The current code is always predicting class 1 (I used integers to represent the classes) for all test samples, which is obviously wrong.
My guess is that the problem may be due to normalization being omitted by the code? Though I think adding normalization would still yield proportionate results, and so far my attempts to normalize have produced the same classification results.
Can someone please suggest if there is anything obvious missing here? Or if I'm not approaching this right? Since most of the code is 'mechanics', I have made prominent (****************) the 2 lines that are responsible for the calculations. Any help is appreciated, thanks!
nsamples=75; % 50% samples
% acquire training set and test set
[trainingSample,idx] = datasample(data,nsamples,'Replace',false);
testData = data(setdiff(1:150,idx),:);
% define Gaussian function
%***********************************************************%
Phi=@(mu,sig2,x) (1/sqrt(2*pi*sig2))*exp(-((x-mu)^2)/2*sig2);
%***********************************************************%
for c=1:3 % for 3 classes in training set
    clear y x mu sig2;
    index=1;
    for i=1 : length(trainingSample)
        if trainingSample(i,5)==c
            y(index,:)=trainingSample(i,:); % filter current class samples
            index=index+1;                  % for conditional probabilities
        end
    end
    for j=1:size(testData,1)   % iterate over test samples
        clear pf p;
        for i=1:4              % iterate over columns
            x=testData(j,i);   % representing attributes
            mu=mean(y(:,i));
            sig2=var(y(:,i));
            pf(i) = Phi(mu,sig2,x); % calc conditional probability
        end
        % calc class likelihood; prior * posterior
        %*****************************************************%
        pc(j,c) = size(y,1)/nsamples * pf(1)*pf(2)*pf(3)*pf(4);
        %*****************************************************%
    end
end
% find the predicted class for each test sample
% by taking the max probability calculated
for i=1:size(pc,1)
    [~,q]=max(pc(i,:));
    predicted(i)=q;
    actual(i)=testData(i,5);
end
Normalization shouldn't be necessary, since each feature is only ever compared against a distribution fitted to that same feature.
p(class|thing) ∝ p(class) p(thing|class) = p(class) p(feature_1|class) p(feature_2|class) ... p(feature_N|class)
So if you rescale the data, fitting the parameters of the distribution feature_i|class just rescales those parameters (here mu and sigma2) to the new scale, and the relative probabilities, and hence the predicted class, remain the same.
It's hard to read the Matlab code because of all the indexing and the splitting of training/testing data etc., which is a possible source of problems.
You should try something with a lot less unnecessary stuff around it (I would recommend Python with scikit-learn, for example; it has a lot of helpers for splitting data and such: http://scikit-learn.org/).
It's really important that you separate the training and test data, and only train the model with training data and test the trained model with the test data. (Is this done?)
The next step is to check the parameters, which is most easily done either by printing them out (sanity check) or, for each feature, by rendering the fitted Gaussian bell next to a histogram of the data to see that they match (remember that each histogram bar must have height number_of_samples_within_bin/(total_number_of_samples*bin_width) so that it is comparable to a probability density).
Visualising the data and the model is really important to know what is happening.
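As a sketch of that last check (hypothetical plotting code, reusing y as the samples of the current class from the loop in your code):
figure;
for i = 1:4                                % the four Iris features
    subplot(2,2,i);
    xi = y(:,i);                           % training samples of the current class
    histogram(xi, 'Normalization', 'pdf'); % density-normalized histogram
    hold on;
    mu = mean(xi);  sig2 = var(xi);
    xx = linspace(min(xi)-1, max(xi)+1, 200);
    plot(xx, exp(-((xx-mu).^2)./(2*sig2))./sqrt(2*pi*sig2), 'LineWidth', 1.5);
    title(sprintf('Feature %d', i));
end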

Time of Arrival estimation of a signal in Matlab

I want to estimate the time of arrival of GPR echo signals using the MUSIC algorithm in MATLAB, exploiting the duality property of the Fourier transform.
I first apply the FFT to the acquired signal and then pass the result to the pmusic function, but I still get the result in the frequency domain.
Short Answer: You're using the wrong function here.
As far as I can tell Matlab's pmusic function returns the pseudospectrum of an input signal.
If you click on the pseudospectrum link, you'll see that the pseudospectrum of a signal lives in the frequency domain. In particular, look at the plot:
(from Matlab's documentation: Plotting Pseudospectrum Data)
Notice that the result is in the frequency domain.
Assuming that by GPR you mean ground-penetrating radar, then try a radar or sonar echo detection approach to estimate the two-way transit time.
This can be done and the theory has been published in several papers. See, for example, here:
STAR Channel Estimation in DS-CDMA Systems
That paper describes spatiotemporal estimation (i.e. estimation of both time and direction of arrival), but you can ignore the spatial part and just do temporal estimation if you have a single-antenna receiver.
You probably won't want to use Matlab's pmusic function directly. It's always quicker and easier to write these sorts of functions for yourself, so you know what is actually going on. In the case of MUSIC:
% Get the noise subspace (Rxx is the covariance matrix of the received data, M the number of signals)
[E, D] = eig(Rxx);
[lambda, idx] = sort(diag(D), 'descend');
E = E(:, idx);
En = E(:,M+1:end);
% [Construct matrix S, whose columns are the vectors to search]
% Calculate MUSIC null spectrum and convert to dB
Z = 10*log10(sum(abs(S'*En).^2, 2));
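For the time-of-arrival problem, one way to fill in the bracketed step (a sketch with assumed names: Fs is the sampling rate and Nfft the number of frequency bins used to form Rxx, so that En has Nfft rows) is to search over a grid of candidate delays in the frequency domain:
f = (0:Nfft-1).' * Fs/Nfft;    % frequency of each bin (Hz)
tau = 0:1e-9:200e-9;           % candidate two-way transit times (s); adjust the grid to your setup
S = exp(-1j*2*pi*f*tau);       % Nfft-by-numel(tau) matrix, one delay signature per column
% After computing Z as above, the nulls of Z over the tau grid mark the arrivals:
[~, k] = min(Z);               % deepest null = strongest echo (use findpeaks on -Z for several)
toa_est = tau(k);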
You can use MATLAB's Phased Array System Toolbox if you want to estimate the DOA with different algorithms using a single command, for example phased.RootMUSICEstimator for Root-MUSIC or phased.ESPRITEstimator for ESPRIT.
However, as Harry mentioned, it is easy to write your own function; once you define the signal subspace and the receive vectors, you can apply them directly in the MUSIC function and find its peaks.
This is another good reference.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1143830

Complex FFT then Inverse FFT MATLAB

I am using the FFT function in Matlab in an attempt to analyze the output of a Travelling Wave Laser Model.
The output of the model is in the time domain in the form (real, imaginary), with the idea being to apply the FFT to the complex output to obtain phase and amplitude information in the frequency domain:
%load time_domain field data
data = load('fft_data.asc');
% Calc total energy in the time domain
N = size(data,1);
dt = data(2,1) - data(1,1);
field_td = complex(data(:,4), data(:,5));
wavelength = 1550e-9;
df = 1/N/dt;
frequency = (1:N)*df;
dl = wavelength^2/3e8/N/dt;
lambda = -(1:N)*dl + wavelength + N*dl/2;
%Calc FFT
FT = fft(field_td);
FT = fftshift(FT);
counter=1;
phase=angle(FT);
amptry=abs(FT);
unwraptry=unwrap(phase);
Following the unwrapping, a best fit was applied to the phase in the region of interest, and then subtracted from the phase itself in an attempt to remove wavelength dependence of phase in the region of interest.
for i=1:N % correct phase and produce new IFFT input
bestfit(i)=1.679*(10^10)*lambda(i)-26160;
correctedphase(i)=unwraptry(i)-bestfit(i);
ReverseFFTinput(i)= complex(amptry(i)*cos(correctedphase(i)),amptry(i)*sin(correctedphase(i)));
end
Having performed the best fit manually, I now have the Inverse FFT input as shown above.
pleasework=ifft(ReverseFFTinput);
from which I can now extract the phase and amplitude information in the time domain:
newphasetime=angle(pleasework);
newamplitude=abs(pleasework);
However, although the time-domain phase of the output is greatly different from that of the input, the amplitude of the corrected data seems to have changed little (if at all!), despite the modification of the phase. Physically this does not seem correct, as my understanding is that removing the wavelength dependence of the phase should 'compress' the pulsed input, i.e. shorten the pulse width but increase the peak height.
My main question is whether I have failed to use the inverse FFT correctly, or the forward FFT or both, or is this something like a windowing or normalization issue?
Sorry for the long winded question! And thanks in advance.
You're actually seeing two effects.
First, the expected one. You're talking about "removing wavelength dependence of phase". If you did exactly that (zeroed out the phase completely), you would indeed get a somewhat compressed peak.
What you actually do is add a linear function to the phase. This does not compress anything; it is a well-known transformation that is equivalent to shifting the signal in the time domain, just a textbook property of the Fourier transform.
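You can convince yourself of this with a small standalone experiment (not your data):
N = 256;  n = (0:N-1).';
x = exp(-(n-64).^2/50);                  % a smooth pulse
k = (0:N-1).';
Xs = fft(x) .* exp(-1j*2*pi*k*20/N);     % add a linear phase ramp in the frequency domain
xs = ifft(Xs);
plot(n, x, n, real(xs))                  % same pulse shape, just delayed by 20 samples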
Then the unintended one. You reorder the spectrum obtained with fft using fftshift for better display, so before converting it back with ifft you need to apply ifftshift first. Since you don't, the spectrum is effectively shifted in the frequency domain. This adds a linear function of time to your time-domain phase, so the difference between adjacent points, which used to be near zero, is now about pi.
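Concretely, that means undoing the shift before the inverse transform, e.g.
pleasework = ifft(ifftshift(ReverseFFTinput));   % undo the earlier fftshift before inverting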