Fourier transform and filtering the MD trajectory data instead of PCA for dimensionality reduction? - matlab

I was using PCA for dimensionality reduction of MD (molecular dynamics) trajectory data of some protein simulations. Basically my data is xyz coordinates of protein atoms which change with time (that means I have lot of frames of this xyz coordinates). The dimension of this data is something like 20000 frames of 200x3 (atoms by coordinates). I implemented PCA using princomp command in Matlab.
I was wondering if I can do FFT on my data. I have experience of doing FFT on audio signals (1D signal). Here my data has both time and space in picture. It must be theoretically possible to implement FFT on my data and then filter it using a LPF (low pass filter). But I am unsure.
Can someone give me some direction/code snippets/references towards implementing FFT on my data?
Why people are preferring PCA more often compared to FFT and filtering. Is it because of computational efficiency of algorithm or is it because of the statistical nature of underlying data?

For the first question "Can someone give me some direction/code snippets/references towards implementing FFT on my data?":
I should say fft is implemented in matlab and you do not need to implement it by your own. Also, for your case you should use fftn (fft documentation)to transform and after applying lowpass filtering by dessignfilt (design filter in matalab), the apply ifftn (inverse fft in matlab)to inverse the transform.
For the second question "Why people are preferring PCA more often compared to FFT and filtering ...":
I should say as the filtering in fft is done in signal space, after filtering you can't generalize it in time space. You can more details about this drawback in this article.
But, Fourier analysis has also some other
serious drawbacks. One of them may be that time
information is lost in transforming to the frequency
domain. When looking at a Fourier transform of a
signal, it is impossible to tell when a particular event
has taken place. If it is a stationary signal - this
drawback isn't very important. However, most
interesting signals contain numerous non-stationary or
transitory characteristics: drift, trends, abrupt changes,
and beginnings and ends of events. These
characteristics are often the most important part of the
signal, and Fourier analysis is not suitable in detecting
them.

Related

MATLAB program to estimate the energy spectrum

MATLAB program to estimate the energy spectrum by non-parametric and parametric methods in the case of EEG signals from epilepsy patients with focal and non-focal area recordings
I have to do this and I have no idea where to start. I have this reference link, https://www.upf.edu/web/ntsa/downloads/-/asset_publisher/xvT6E4pczrBw/content/2012-nonrandomness-nonlinear-dependence-and-nonstationarity-of-electroencephalographic-recordings-from-epilepsy-patients?inheritRedirect=false&redirect=https%3A%2F%2Fwww.upf.edu%2Fweb%2Fntsa%2Fdownloads%3Fp_p_id%3D101_INSTANCE_xvT6E4pczrBw%26p_p_lifecycle%3D0%26p_p_state%3Dnormal%26p_p_mode%3Dview%26p_p_col_id%3Dcolumn-1%26p_p_col_count%3D1#.X751wmUzaUk
If someone can give me some advice, please.
I have done both. The Signal Processing Toolbox and System Identification Toolbox documentation is very good on this topic.
Here is a paper I published on this topic. It is about point process rather than continuous signals, but it does have sections on parametric versus non-parametric spectrum analysis.
Non-parametric is the normal way. If your algorithm uses the fast Fourier transform, it is non-parametric.
Parametric means poles-and-zeroes type models. Spectra estimated non-parametrically tend to be a lot smoother than non-parametric. A key question that arises in parametric estimation which does not arise in non-parametric is what model structure to use and what model order. In non-paramtric, those questions do not arise. There are ways to answer those questions. The Akaike information criterion (AIC) is a standard way to answer the question about model order, although there are certainly alternatives to AIC which are better in some situations.
As you read the documentation, you will see numerous references to 'estimation'. This may seem strange at first, but it is a very important concept: wen you determine the spectrum from a signal, whether you do it parametrically or non-parametrically, you are estimating the true underlying spectrum, based on a sample of the process. The data you recorded is that sample.

Kalman Filter : How measurement noise covariance matrix and process noise helps in working of kalman filter , can some one explain intuitively?

How process noise covariance and measurement noise covariance are helping better functioning of Kalman filter ?
Can someone explain intuitively without significant equations and math please.
Well, its difficult to explain mathematical things (like kalman filters) without mathematics, but here's my attempt:
There are two parts to a kalman filter, a time update part and a measurement part. In the time update part we estimate the state at the time of observation; in the measurement part we combine (via least squares) our 'predictions' (ie the estimate from the time update) with the measurements to get a new estimate of the state.
So far, no mention of noise. There are two sources of noise: one in the time update part (sometimes called process noise) and one in the measurement part (observation noise). In each case what we need is a measure of the 'size' of that noise, ie the covariance matrix. These are used when we combine the
predictions with the measurements. When we view our predictions as very uncertain (that is, they have a large covariance matrix) the combination will be closer to the measurements than to the predictions; on the other hand when we view our predictions as very good (small covariance) the combination will be closer to the predictions than to the measurements.
So you could look upon the process and observation noise covariances as saying how much to trust the (parts of) the predictions and observations. Increasing, say, the variance of a particular component of the predictions is to say: trust this prediction less; while increasing the variance of a particular measurement is to say: trust this measurement less. This is mostly an analogy but it can be made more precise. A simple case is when the covariance matrices are diagonal. In that case the cost, ie the contrinution to what we are trying to minimise, of a difference between an measurement and the computed value is te square of that difference, divided by the observations variance. So the higher an observations variance, the lower the cost.
Note that out of the measurement part we also get a new state covariance matrix; this is used (along with the process noise and the dynamics) in the next time update when we compute the predicted state covariance.
I think the question of why the covariance is the appropriate measure of the size of the noise is rather a deep one, as is why least squares is the appropriate way to combine the predictions and the measurements. The shallow answer is that kalman filtering and least squares have been found, over decades (centuries in the case of least squares), to work well in many application areas. In the case of kalman filtering I find the derivation of it from hidden markobv models (From Hidden Markov Models to Linear Dynamical Systems by T.Minka, though this is rather mathematical) convincing. In Hidden markov models we seek to find the (conditional) probability of the states given the measurements so far; Minka shows that if the measurements are linear functions of the states and the dynamics are linear and all probability distributions are Gaussian, then we get the kalman filter.

Deconvolution of data convolved by a Gaussian response

I have a set of experimental data s(t) which consists of a vector (with 81 points as a function of time t).
From the physics, this is the result of the convolution of the system response e(t) with a probe p(t), which is a Gaussian (actually a laser pulse). In terms of vector, its FWHM covers approximately 15 points in time.
I want to deconvolve this data in Matlab using the convolution theorem: FT{e(t)*p(t)}=FT{e(t)}xFT{p(t)} (where * is the convolution, x the product and FT the Fourier transform).
The procedure itself is no problem, if I suppose a Dirac function as my probe, I recover exactly the initial signal (which makes sense, measuring a system with a Dirac gives its impulse response)
However, the Gaussian case as a probe, as far as I understood turns out to be a critical one. When I divide the signal in the Fourier space by the FT of the probe, the wings of the Gaussian highly amplifies those frequencies and I completely loose my initial signal instead of having a deconvolved one.
From your experience, which method could be used here (like Hamming windows or any windowing technique, or...) ? This looks rather pretty simple but I did not find any easy way to follow in signal processing and this is not my field.
You have noise in your experimental data, do you? The problem is ill-posed then (non-uniquely solvable) and you need regularization.
If the noise is Gaussian the keywords are Tikhonov regularization or Wiener filtering.
Basically, add a positive regularization factor that acts as a lowpass filter. In your notation the estimation of the true curve o(t) then becomes:
o(t) = FT^-1(FT(e)*conj(FT(p))/(abs(FT(p))^2+l))
with a suitable l>0.
You're trying to do Deconvolution process by assuming the Filter Model is Gaussian Blur.
Few notes for doing Deconvolution:
Since your data is real (Not synthetic) data it includes some kind of Noise.
Hence it is better to use the Wiener Filter (Even with the assumption of low variance noise). Otherwise, the "Deconvolution Filter" will increase the noise significantly (As it is an High Pass basically).
When doing the division in the Fourier Domain zero pad the signals to the correct size or better yet create the Gaussian Filter in the time domain with the same number of samples as the signal.
Boundaries will create artifact, Windowing might be useful.
There are many more sophisticated methods for Deconvolution by defining a more sophisticated model on the signal and the noise. If you have more prior data about them, you should look for this kind of framework.
You can always set a threshold on the amplification level for certain frequencies, do that if needed.
Use as much samples as you can.
I hope this will assist you.

Power spectrum from autocorrelation function with MATLAB

I have some dynamic light scattering data. The machine pumps out the autocorrelation function, and a count-rate.
I can do a simple fit to the ACF
ACF = exp(-D*q^2*t)
and obtain the diffusion coefficient.
I want to obtain the same D from the power spectrum. I have been able to create a power spectrum in two ways -- from the Fourier transform of the ACF, and from the count rate. Both agree, but the power spectrum does not look like in the one in the books, so I'm not sure how to use it to work out the line width.
Attached is an image from a PDF that shows what you should get, and what I get from MATLAB. Can anyone make sense of whats going on?
I have used the code of answer #3 on this question. The resulting autocorrelation comes out exactly the same as
the machine gives me and
using MATLAB's autocorr command on the photoncount data.
Thank you for your time.
When you compute the Fourier transform from short sequences of data it often looks very noisy. There are a number of reasons for this. One reason is that the statistics of individual Fourier components are not Gaussian, and so averaging the spectra across multiple samples of data will only slowly improve the quality of the estimate.
Another causes of "noisiness" in empirical spectra behavior is that you are applying (to a finite data sample) a transform which involves a pathological sinc function and which assumes an infinite length signal. To diminish this problem, it helps to apply a "windowing-function" to your data before computing the Fourier transform. One of the more complicated but also more powerful windowing approaches is the use of so-called 'Slepian tapers'.
MATLAB conveniently implements well-known windows in functions such as hamming and hann.

Finding Relevant Peaks in Messy FFTs

I have FFT outputs that look like this:
At 523 Hz is the maximum value. However, being a messy FFT, there are lots of little peaks that are right near the large peaks. However, they're irrelevant, whereas the peaks shown aren't. Are the any algorithms I can use to extract the maxima of this FFT that matter; I.E., aren't just random peaks cropping up near "real" peaks? Perhaps there is some sort of filter I can apply to this FFT output?
EDIT: The context of this is that I am trying to take one-hit sound samples (like someone pressing a key on a piano) and extract the loudest partials. In the image below, the peaks above 2000 Hz are important, because they are discrete partials of the given sound (which happens to be a sort of bell). However, the peaks that are scattered about right near 523 seem to be just artifacts, and I want to ignore them.
If the peak is broad, it could indicate that the peak frequency is modulated (AM, FM or both), or is actually a composite of several spectral peaks, themselves each potentially modulated.
For instance, a piano note may be the result of the hammer hitting up to 3 strings that are all tuned just a tiny fraction differently, and they all can modulate as they exchange energy between strings though the piano frame. Guitar strings can change frequency as the pluck shape distortion smooths out and decays. Bells change shape after they are hit, which can modulate their spectrum. Etc.
If the sound itself is "messy" then you need a good definition of what you mean by the "real" peak, before applying any sort of smoothing or side-band rejection filter. e.g. All that "messiness" may be part of what makes a bell sound like a real bell instead of an electronic sinewave generator.
Try convolving your FFT (treating it as a signal) with a rectangular pulse( pulse = ones(1:20)/20; ). This might get rid of some of them. Your maxima will be shifted by 10 frequency bins to teh right, to take that into account. You would basically be integrating your signal. Similar techniques are used in Pan-Tompkins algorithm for heart beat identification.
I worked on a similar problem once, and choosed to use savitsky-golay filters for smoothing the spectrum data. I could get some significant peaks, and it didn't messed too much with the overall spectrum.
But I Had a problem with what hotpaw2 is alerting you, I have lost important characteristics along with the lost of "messiness", so I truly recommend you hear him. But, if you think you won't have a problem with that, I think savitsky-golay can help.
There are non-FFT methods for creating a frequency domain representation of time domain data which are better for noisy data sets, like Max-ent recontruction.
For noisy time-series data, a max-ent reconstruction will be capable of distinguising true peaks from noise very effectively (without adding any artifacts or other modifications to suppress noise).
Max ent works by "guessing" an FFT for a time domain specturm, and then doing an IFT, and comparing the results with the "actual" time-series data, iteratively. The final output of maxent is a frequency domain spectrum (like the one you show above).
There are implementations in java i believe for 1-d spectra, but I have never used one.