Implementing Fixed-Width Sliding Windowing technique on sensor signal - matlab

I am trying to replicate the signal pre processing done on this Data Set: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones#
"The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window)"
I am trying to perform all my functionalities on Matlab and am stuck in trying to create a fixed width sliding window on my acceleration signals.
My main area is AI classification and have no background knowledge on signal preprocessing etc.. I've been trying to research etc for a long time but I cannot understand what I need to do to my signal.
I have signals which are produced at a 51.2 Sampling Frequency. Any help please on how I can generate a sampling window like the one done on the linked data set?

You should calculate window length in number of samples and step=window-overlap and use this:
https://stackoverflow.com/a/44190634/3344428 (you can clear unnecessary code for std, skewness etc.)
You should clarify window length, because you mentioned "128 readings/window", BUT if you use 2.56 sec window on 51.2 Hz signal, then window = 2.56*51.2 = 131 samples.

Related

How do I model the RF propagation of a custom UWB transmission using MATLAB?

I've successfully plotted the signal strength coverage map for a generic narrow band (read single frequency) horn antenna using MATLAB's in-built functions design(), txsite() and coverage().
MATLAB uses the 'Longley-Rice' propagation model when terrain data is present which I downloaded and introduced using addCustomTerrain().
However, I don't want my antenna to be narrow band operating at a single frequency.
I want to model the coverage map I would get on location with a known ultra-wide band (UWB) transient pulsed signal. I have the time domain E-field of this waveform as well as the FFT and energy spectral density.
My plan was to loop over many tx antennas, each having an operating frequency equal to one of the ~1000 frequency bins in the UWB spectral content and an output power equal to the scaled energy spectral density (ESD) multiplied by the frequency step size (df) and divided by the total time period of the measured pulsed signal (to get power). P = ESD * (df/T).
However, when I ran this looped code, I got:
"Error using em.EmStructures/savesolution
The calculated result is invalid; possible cause is a coarse mesh. Please consider refining the mesh
manually."
I assume this means MATLAB can't model 1000 different antennas on the same exact location.. but any idea on this error?
Is what I'm trying to do possible in MATLAB?
Are there alternative methods?
Thank you for any help in advance!

Noisy Signal in Data (FFT) - MATLAB

I am plotting some data I collected at 10000 Hz, I attached a snip of some of the data in the form of an FFT and time. I am getting a repeating frequency around 10Hz that seems to be pretty obviously some sort of noise from the system i am testing. The signal shows up in the time domain and also the frequency domain.
I am looking to use MATLAB to remove these spikes.
Has anyone dealt with a similar issue and can provide any advice.
To filter out specific frequency components of a signal, you would normally use either a notch filter or a comb filter, for which MATLAB already has some commands in the DSP System Toolbox:
https://www.mathworks.com/help/dsp/ref/iirnotch.html
https://www.mathworks.com/help/dsp/ref/fdesign.comb.html
Alternatively, if you have the Signal Processing Toolbox, you can use the band-stop Butterworth filter to remove individual frequency components (ranges) using
https://www.mathworks.com/help/signal/ref/butter.html

Estimating Quasi-stationary part of a signal

I am trying to estimate the Estimating Quasi-stationary part of a signal in Matlab. It is a 1 second long sound signal that belongs to a bird.
I am using MFCC to extract features but would like to have a window size for MFCC that is guaranteed to operate on statistically quasi-stationary part.
My questions are:
Do you think it is a solid approach if I iterate by varying my window size from 1 second to a smaller interval by observing the change of second moment of features and making a decision where the second moment is not changing anymore?
If I use Shannon entropy method by again varying my MFCC window size, how the number of bits I got at the output of the entropy algorithm would help me to identify the Estimating Quasi-stationary part of the signal
Are there any other ideas?

1024 pt fft on a large set of data points

I have a signal that may not be periodic. We take about 15secs worth of samples (# 10kHz sampling rate) and we need to do the FFT on that signal to get the frequency content.
The problem is that we are implementing this FFT on an embedded system (DSP) which provides a library FFT of max. 1024 length. That is, it takes in 1024 data points as input and provides a 1024 point output.
What is the correct way of obtaining an FFT on the full 150000 point input?
You could run the FFT on each 1024 point block and average them to get an average power spectrum on the lower-resolution 1024-point frequency axis (512 samples from 0 to the Nyquist frequency, fs/2, so about 10 Hz resolution for your 10 kHz sampling). You should average the magnitudes of the component FFTs (i.e., sqrt(re^2+im^2)), otherwise the average will be sensitive to the drifting phase within each subwindow, which will depend on the precise frequency of the sinusoi.
If you think the periodic component may be at a low frequency, such that it will show up in a 15 sec sample but not complete any cycles in a 1024/10k ~ 100ms sample (i.e., below 10 Hz or so), you could downsample your input. You could try something as crude as averaging every 100 points to get a somewhat-distorted signal at 100 Hz sampling rate, then pack 10.24 sec worth into your 1024 pt sequence to pass to the FFT.
You could combine these two approaches by using a smaller downsampling factor and then do the magnitude-averaging of successive windows.
I'm confused why the system provides an FFT only up to 1024 points - is there something about the memory that makes it harder to access larger blocks?
Calculating a 128k point FFT using a 1k FFT as a subroutine is possible, but you'd end up recoding a lot of the FFT yourself. Maybe you should forget about the system library and use some other FFT implementation, without the length limitation, that will compile on your target. It may not incorporate all the optimizations of the system-provided one, but you're likely to lose a lot of that advantage when you embed it within the custom code needed to use the partial outputs of the multiple shorter FFTs to produce the long FFT.
Probably the quickest way to do the hybrid FFT (1024 points using the library, then added code to combine them into a 128k point FFT) would be to take an existing full FFT routine (a radix-2, decimation-in-time (DIT) routine for instance), but then modify it to use the system library for what would have been the first 10 stages, which amount to calculating 128 individual 1024-point FFTs on different subsets of the original signal (not, unfortunately, successive windows, but the partial-bit-reversed subsets), then let the remaining 7 stages of butterflies operate on those partial outputs. You'd want to get a pretty solid understanding of how the DIT FFT works to implement this.

MATLAB 'spectrogram' params

I am a beginner in MATLAB and I should perform a spectral analysis of an EEG signal drawing the graphs of power spectral density and spectrogram. My signal is 10 seconds long and a sampling frequency of 160 Hz, a total of 1600 samples and have some questions on how to find the parameters of the functions in MATLAB, including:
pwelch (x, window, noverlap, nfft, fs);
spectrogram (x, window, noverlap, F, fs);
My question then is where to find values ​​for the parameters window and noverlap I do not know what they are for.
To understand window functions & their use, let's first look at what happens when you take the DFT of finite length samples. Implicit in the definition of the discrete Fourier transform, is the assumption that the finite length of signal that you're considering, is periodic.
Consider a sine wave, sampled such that a full period is captured. When the signal is replicated, you can see that it continues periodically as an uninterrupted signal. The resulting DFT has only one non-zero component and that is at the frequency of the sinusoid.
Now consider a cosine wave with a different period, sampled such that only a partial period is captured. Now if you replicate the signal, you see discontinuities in the signal, marked in red. There is no longer a smooth transition and so you'll have leakage coming in at other frequencies, as seen below
This spectral leakage occurs through the side-lobes. To understand more about this, you should also read up on the sinc function and its Fourier transform, the rectangle function. The finite sampled sequence can be viewed as an infinite sequence multiplied by the rectangular function. The leakage that occurs is related to the side lobes of the sinc function (sinc & rectangular belong to self-dual space and are F.Ts of each other). This is explained in more detail in the spectral leakage article I linked to above.
Window functions
Window functions are used in signal processing to minimize the effect of spectral leakages. Basically, what a window function does is that it tapers the finite length sequence at the ends, so that when tiled, it has a periodic structure without discontinuities, and hence less spectral leakage.
Some of the common windows are Hanning, Hamming, Blackman, Blackman-Harris, Kaiser-Bessel, etc. You can read up more on them from the wiki link and the corresponding MATLAB commands are hann, hamming,blackman, blackmanharris and kaiser. Here's a small sample of the different windows:
You might wonder why there are so many different window functions. The reason is because each of these have very different spectral properties and have different main lobe widths and side lobe amplitudes. There is no such thing as a free lunch: if you want good frequency resolution (main lobe is thin), your sidelobes become larger and vice versa. You can't have both. Often, the choice of window function is dependent on the specific needs and always boils down to making a compromise. This is a very good article that talks about using window functions, and you should definitely read through it.
Now, when you use a window function, you have less information at the tapered ends. So, one way to fix that, is to use sliding windows with an overlap as shown below. The idea is that when put together, they approximate the original sequence as best as possible (i.e., the bottom row should be as close to a flat value of 1 as possible). Typical values vary between 33% to 50%, depending on the application.
Using MATLAB's spectrogram
The syntax is spectrogram(x,window,overlap,NFFT,fs)
where
x is your entire data vector
window is your window function. If you enter just a number, say W (must be integer), then MATLAB chops up your data into chunks of W samples each and forms the spectrogram from it. This is equivalent to using a rectangular window of length W samples. If you want to use a different window, provide hann(W) or whatever window you choose.
overlap is the number of samples that you need to overlap. So, if you need 50% overlap, this value should be W/2. Use floor(W/2) or ceil(W/2) if W can take odd values. This is just an integer.
NFFT is the FFT length
fs is the sampling frequency of your data vector. You can leave this empty, and MATLAB plots the figure in terms of normalized frequencies and the time axis as simply the data chunk index. If you enter it, MATLAB scales the axis accordingly.
You can also get optional outputs such as the time vector and frequency vector and the power spectrum computed, for use in other computations or if you need to style your plot differently. Refer to the documentation for more info.
Here's an example with 1 second of a linear chirp signal from 20 Hz to 400 Hz, sampled at 1000 Hz. Two window functions are used, Hanning and Blackman-Harris, with and without overlaps. The window lengths were 50 samples, and overlap of 50%, when used. The plots are scaled to the same 80dB range in each plot.
You can notice the difference in the figures (top-bottom) due to the overlap. You get a cleaner estimate if you use overlap. You can also observe the trade-off between main lobe width and side lobe amplitude that I mentioned earlier. Hanning has a thinner main lobe (prominent line along the skew diagonal), resulting in better frequency resolution, but has leaky sidelobes, seen by the bright colors outside. Blackwell-Harris, on the other hand, has a fatter main lobe (thicker diagonal line), but less spectral leakage, evidenced by the uniformly low (blue) outer region.
Both these methods above are short-time methods of operating on signals. The non-stationarity of the signal (where statistics are a function of time, Say mean, among other statistics, is a function of time) implies that you can only assume that the statistics of the signal are constant over short periods of time. There is no way of arriving at such a period of time (for which the statistics of the signal are constant) exactly and hence it is mostly guess work and fine-tuning.
Say that the signal you mentioned above is non-stationary (which EEG signals are). Also assume that it is stationary only for about 10ms or so. To reliably measure statistics like PSD or energy, you need to measure these statistics 10ms at a time. The window-ing function is what you multiply the signal with to isolate that 10ms of a signal, on which you will be computing PSD etc.. So now you need to traverse the length of the signal. You need a shifting window (to window the entire signal 10ms at a time). Overlapping the windows gives you a more reliable estimate of the statistics.
You can imagine it like this:
1. Take the first 10ms of the signal.
2. Window it with the windowing function.
3. Compute statistic only on this 10ms portion.
4. Move the window by 5ms (assume length of overlap).
5. Window the signal again.
6. Compute statistic again.
7. Move over entire length of signal.
There are many different types of window functions - Blackman, Hanning, Hamming, Rectangular. That and the length of the window and overlap really depend on the application that you have and the frequency characteristics of the signal itself.
As an example, in speech processing (where the signals are non-stationary and windowing gets used a lot), the most popular choices for windowing functions are Hamming/Hanning of length 10ms (320 samples at 16 kHz sampling) with an overlap of 80 samples (25% of window length). This works reasonably well. You can use this as a starting point for your application and then work on fine-tuning it a little more with different values.
You may also want to take a look at the following functions in MATLAB:
1. hamming
2. hanning
I hope you know that you can call up a ton of help in MATLAB using the help command on the command line. MATLAB is one of the best documented softwares out there. Using the help command for pwelch also pulls up definitions for window size and overlap. That should help you out too.
I don't know if all this info. helped you out or not, but looking at the question, I felt you might have needed a little help with understanding what windowing and overlapping was all about.
HTH,
Sriram.
For the last parameter fs, that is the frequency rate of the raw signal, in your case X, when you extract X from audio data using function
[X,fs]=audioread('song.mp3')
You may get fs from it.
Investigate how the following parameters change the performance of the Sinc function:
The Length of the coefficients
The Following window functions:
Blackman Harris
Hanning
Bartlett