matlab block averaging time series data - matlab

Q: MATLAB RELATED: Can someone help me with MATLAB code for block averaging of a time series dataset? Also, how do I determine the optimal number of blocks?
Background: I have a large time series dataset (position versus time), which I break into 20 smaller blocks. I need to find the variance of position for each block. Since the data may be autocorrelated, normal averaging doesn't work for me, and I need to perform block averaging.

Check out the reshape command followed by a mean and/or var (for average or variance)
At the MATLAB terminal, type:
help reshape
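As a rough sketch of that suggestion (the vector x below is made-up stand-in data, not the asker's series): trim the series to a whole number of blocks, reshape it so each column is one block, then take mean and var down the columns.

```matlab
% Stand-in data: replace x with the real position time series (assumption)
x = (1:100).';
nBlocks = 20;                                   % number of blocks, as in the question
L = floor(numel(x)/nBlocks);                    % samples per block
blocks = reshape(x(1:L*nBlocks), L, nBlocks);   % one block per column
blockMeans = mean(blocks, 1);                   % 1 x nBlocks block averages
blockVars = var(blocks, 0, 1);                  % 1 x nBlocks block variances
semBlock = std(blockMeans)/sqrt(nBlocks);       % standard error estimated from block means
```

One common way to choose the number of blocks is to repeat this for several block lengths and keep the length beyond which semBlock stops growing, since by then the blocks are roughly uncorrelated.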

Related

Moving Average block returns wrong values for column vector input

I am using Simulink for real-time classification using a trained Fine-KNN model. The input data for the model is a 50-point moving average vector [6x1]. I am using the DSP moving average block for this purpose with sliding window technique (window size = 50 and simulating using code generator). When I compare the input and the output of this block for real-time values, I get the following plot:
It is clear from the plot that there is something wrong with the output as there is quite a discrepancy between the input and the output. What could possibly be the problem or am I doing something wrong?
Edit (after Cris's comment):
Here are some screenshots to showcase some modeling parameters within Simulink:
Screenshot showing probes for measuring actual input and moving average output along with the Moving Average block parameters
Other block parameters that might be affecting the performance of the model:
a. OPC Config real-time block parameters
b. OPC Read block parameters
PS: One issue that I can think of is that the actual input is fed to the Moving Average in real-time at 10ms time-step and I am not sure if the moving average block has a buffer to store up to the "Window Length" data as it keeps coming in. What I mean by this is, the moving average block might not have access to 50 values of the input signals for quite some time and I am not sure how it deals with that kind of a situation.
I can reproduce this with the following minimal example:
So a constant input of [1; 2; 3] gives a moving average of roughly 2 (the average of the input elements) in all elements, when you would expect an output of [1; 2; 3] since each element is constant.
In your example, the inputs average approximately 0.62, which you are seeing in the output from the moving average.
Using a demux to split your vector up gives the desired output
The docs say that the moving average block should be able to handle this, though:
"The Moving Average block computes the moving average of the input signal along each channel independently over time."
It turns out that a channel in this case is a column of your vector. Since you have a column vector, the columns in each iteration are getting stacked and averaged. Unfortunately the underlying code is sealed so we can't check this theory other than by trying it out.
Reshape your input to be a row array using the reshape block.
Then you get the expected output
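The effect can be imitated in plain MATLAB. This is only an emulation of the theory above, using filter as a stand-in for the sealed block (the step count and the zero initial state are assumptions): a column input is treated as successive samples of one channel, while a row input is treated as one sample for each of three channels.

```matlab
W = 50;                        % window length, as in the question
nSteps = 200;                  % number of simulation steps (assumed)
u = [1; 2; 3];                 % constant column-vector input

% Column input: each step adds 3 successive samples to ONE channel
stream = repmat(u, nSteps, 1);                   % 600x1 single-channel stream
avgStacked = filter(ones(W,1)/W, 1, stream);     % sliding mean over the stacked stream
disp(avgStacked(end-2:end).')                    % all three values are close to 2

% Row input: each step adds ONE sample to each of 3 channels
frames = repmat(u.', nSteps, 1);                 % 200x3, one column per channel
avgPerChannel = filter(ones(W,1)/W, 1, frames);  % filter runs down each column
disp(avgPerChannel(end, :))                      % approximately 1 2 3
```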

Suggestions for defining limits of frequencies in Matlab Spectrogram function?

I am new to Matlab and signal processing. I am having an issue with defining the frequency range in which the spectrogram is processed. When I am plotting the spectrogram of .wav audio data, the y axis, frequency, spans from zero to around 23 kHz. The useful data I am looking for is in the range of 200-400 Hz. My code snippet is:
[samFa, fs] = audioread('samFa.wav'); %convert audio to numerical data
samFa = samFa(:,1); %take only one channel of numerical output
spectrogram(samFa,2205,1200,12800, fs,'yaxis','MinThreshold',-80);
I don't want to be some noobie that runs into a problem and instantly gives up and posts a duplicate question to stackoverflow, so I have done as much digging as I can, but am at my wit's end.
I scoured the documentation for parameters or ways to have Matlab only analyze a subset or range of the data, but found nothing. Additionally, in all of the examples the frequency range seems to automatically adapt to the data set.
I know it is possible to just calculate the spectrogram for the entire range of frequencies, and then remove all of the unnecessary data through truncating or manually changing the limits in the plot itself, but changing plotting limits does not help with the numerical data.
I went searching through many similar questions, and found an answer all the way from 2012 here: Can I adjust spectogram frequency axes?
where the suggested answer was to import a vector of specific frequencies for the spectrogram to analyze. I tried passing a vector of integer values between 200 and 400, and a few other test ranges, but got the error:
Error using welchparse>welch_options (line 297)
The sampling frequency must be a scalar.
I've tried passing the parameter in at different places in the function, to no avail, and don't see anything regarding this parameter in the documentation, leading me to believe that this functionality was possibly removed sometime between 2012 and now.
When plotting spectrogram without providing signal frequency, Matlab provides a normalized spectrogram, which only provides a much smaller data window, which I can visually assess to cover the data from 0:5kHz (an artifact of overtones in the audio), so I know that matlab is not finding any data above this range to make the frequency range go to 20kHz
I've been trying to learn some signal processing for this project, so I believe the Nyquist frequency, which is half the sampling frequency, should be the maximum frequency that a Fourier transform is able to analyze. My recording is sampled at 44,100 Hz, and the spectrogram ranges up to around 22 or 23 kHz, leading me to believe that Matlab is noticing my sampling frequency and assuming that it needs to analyze up to such a high range.
For my work I need to produce thousands of spectrograms, which are then processed through much further analysis, so it is very time consuming for Matlab to process so much unnecessary data, and I would expect there to be some functionality in Matlab to get around this.
Sorry for the very long post, but I wanted to fully explain my problem and show that I have done as much work as I could to solve the problem before turning for help. Thank you very much.
Get the axis handle and set the visual range there:
spectrogram(samFa,2205,1200,12800, fs,'yaxis','MinThreshold',-80);
ax=gca;
ylim(ax, [0.2,0.4]); %kHz
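If you want to compute only a specific frequency range to save time, you are better off using goertzel. Note that this gives the DFT of the whole signal at those frequencies, not a time-frequency map.
N = length(samFa);                  % number of samples in the signal
f = 200:10:400;                     % frequencies of interest, in Hz
freq_indices = round(f/fs*N) + 1;   % corresponding 1-based DFT bin indices
dft_data = goertzel(samFa,freq_indices);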
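spectrogram also has a documented form that takes a vector of frequencies in place of the FFT length, which restricts the computation to those frequencies. A minimal sketch, assuming the frequency vector goes in the fourth position with the scalar fs right after it (a vector landing in the fs position may be what triggered the error in the question). The tone below is synthetic, not the asker's recording.

```matlab
fs = 44100;                          % sampling rate from the question
t = (0:fs-1).'/fs;                   % 1 second of synthetic data (assumed test signal)
x = sin(2*pi*300*t);                 % 300 Hz tone, inside the band of interest
f = 200:5:400;                       % frequencies to evaluate, in Hz

% Fourth argument is the frequency vector, fifth is the scalar sampling rate
[s, fOut, tOut] = spectrogram(x, hann(2205), 1200, f, fs);

[~, k] = max(mean(abs(s), 2));       % strongest row, averaged over time
peakFreq = fOut(k);                  % frequency of that row, in Hz
```

Here s has one row per requested frequency, so the numerical data is already limited to 200-400 Hz and no truncation is needed afterwards.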

Matlab: Comparing two signals with different time values and placed impulses

We are analysing some signals in MATLAB that contain an impulse in the form of a dip in the standard signal.
As you can see in the picture, we need to find the difference between the "Zlotty" and the "Krone". The two graphs beside each other are the graphs that need to be analyzed.
As you can see, the impulse differs both in when it occurs and in how long it lasts. We cannot use time as a measurement because it can vary randomly.
Each graph is made from vectors containing 2.5 million data points.
How would you use matlab to find a difference?
You could split the problem into two parts. Ensuring the same time scale for both signals and finding a possible time shift in the alignment of the resulting signals. The first part could be achieved by using the resample function of Matlab; and the second task by using cross-correlation. Using two nested for loops, you could perform a search for the "best" stretch factor and time shift that result in the maximum correlation coefficient.
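A rough sketch of that search, on synthetic dips rather than the real recordings (the signal shapes, the search grid and the baseline removal are all assumptions). For brevity the shift is read off the peak of xcorr instead of being looped over explicitly, so only the stretch factor needs a for loop.

```matlab
% Synthetic, baseline-removed dips standing in for the two recordings
n = 2000;
t = (1:n).';
ref = -0.8*exp(-((t - 500)/60).^2);     % reference dip: centre 500, width 60
sig = -0.8*exp(-((t - 1200)/90).^2);    % same dip, 1.5x wider and later

q = 30;                                 % denominator of the stretch factor p/q
best.c = -Inf;
for p = 15:45                           % try stretch factors from 0.5 to 1.5
    s = resample(sig, p, q);            % rescale the time axis of sig
    m = max(numel(s), n);               % zero-pad both to a common length
    a = [ref; zeros(m - n, 1)];
    b = [s; zeros(m - numel(s), 1)];
    [c, lags] = xcorr(a, b, 'coeff');   % normalized cross-correlation
    [cmax, k] = max(c);
    if cmax > best.c                    % keep the best stretch and shift so far
        best.c = cmax;
        best.stretch = p/q;
        best.lag = lags(k);
    end
end
```

For 2.5 million points per signal it is probably worth downsampling both signals first and refining the search around the coarse result.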

Noisy signal correlation

I have two (or more) time series that I would like to correlate with one another to look for common changes e.g. both rising or both falling etc.
The problem is that the time series are all fairly noisy, with relatively high standard deviations, meaning it is difficult to see common features. The signals are sampled at a fairly low frequency (one point every 30 s) but cover reasonable time periods (2 hours or more). It is often the case that the two signals are not the same length, for example one of 1 hour and one of 1.5 hours.
Can anyone suggest some good correlation techniques, ideally using built-in or bespoke MATLAB routines? I've tried autocorrelation just to compare lags within a single signal, but all I got back was a triangular shape with the max at 0 lag (I assume this means there is no obvious correlation except with itself?). Cross-correlation isn't much better.
Any thoughts would be greatly appreciated.
Start with a cross-covariance (xcov) instead of the cross-correlation. xcov removes the DC component (subtracts off the mean) of each data set and then does the cross-correlation. When you cross-correlate two square waves, you get a triangle wave. If you have small signals riding on a large offset, you get a triangle wave with small variations in it.
If you think there is a delay between the two signals, then I would use xcorr to calculate the delay. Since xcorr is doing an FFT of the signal, you should remove the means before calling xcorr. You may also want to consider adding a window (e.g. hanning) to reduce leakage if the data is not self-windowing.
If there is no delay between the signals or you have found and removed the delay, you could just average the two (or more) signals. The random noise should tend to average to zero and the common features will approach the true value.
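A minimal sketch of those steps on made-up data (the bump shape, noise level and delay are assumptions, and both series are taken to be the same length; with unequal lengths, truncate to the shorter one first):

```matlab
rng(0);                                    % repeatable noise for the illustration
n = 240;                                   % 2 hours at one sample every 30 s
t = (1:n).';
trueDelay = 12;                            % assumed delay in samples (6 minutes)
feature = @(c) 2*exp(-((t - c)/10).^2);    % common bump centred at sample c
a = 100 + feature(100) + 0.15*randn(n,1);  % small feature on a large offset
b = 100 + feature(100 + trueDelay) + 0.15*randn(n,1);

[c, lags] = xcov(b, a);                    % means are removed before correlating
[~, k] = max(c);
d = lags(k);                               % positive when b lags a

aTrim = a(1:end-d);                        % overlapping part after undoing the delay
bTrim = b(1+d:end);
avg = (aTrim + bTrim)/2;                   % independent noise shrinks, common bump stays
```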

MATLAB 'spectrogram' params

I am a beginner in MATLAB and I need to perform a spectral analysis of an EEG signal, plotting the power spectral density and the spectrogram. My signal is 10 seconds long with a sampling frequency of 160 Hz, for a total of 1600 samples. I have some questions about how to choose the parameters of these MATLAB functions:
pwelch (x, window, noverlap, nfft, fs);
spectrogram (x, window, noverlap, F, fs);
My question, then, is where to find values for the parameters window and noverlap. I do not know what they are for.
To understand window functions & their use, let's first look at what happens when you take the DFT of finite length samples. Implicit in the definition of the discrete Fourier transform, is the assumption that the finite length of signal that you're considering, is periodic.
Consider a sine wave, sampled such that a full period is captured. When the signal is replicated, you can see that it continues periodically as an uninterrupted signal. The resulting DFT has only one non-zero component and that is at the frequency of the sinusoid.
Now consider a cosine wave with a different period, sampled such that only a partial period is captured. Now if you replicate the signal, you see discontinuities in the signal, marked in red. There is no longer a smooth transition and so you'll have leakage coming in at other frequencies, as seen below
This spectral leakage occurs through the side-lobes. To understand more about this, you should also read up on the sinc function and its Fourier transform, the rectangle function. The finite sampled sequence can be viewed as an infinite sequence multiplied by the rectangular function. The leakage that occurs is related to the side lobes of the sinc function (sinc & rectangular belong to self-dual space and are F.Ts of each other). This is explained in more detail in the spectral leakage article I linked to above.
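The two cases can be checked numerically. A small sketch with made-up lengths and frequencies: a sine with a whole number of periods in the record puts energy into a single frequency (one positive and one mirrored negative bin in the two-sided FFT), while a cosine with 4.5 periods spreads energy over every bin.

```matlab
N = 64;                              % record length in samples (assumed)
n = (0:N-1).';
xFull = sin(2*pi*4*n/N);             % exactly 4 periods fit in the record
xPart = cos(2*pi*4.5*n/N);           % 4.5 periods: the tiled signal has a jump

XFull = abs(fft(xFull))/N;           % magnitude spectra, scaled by record length
XPart = abs(fft(xPart))/N;

nFull = sum(XFull > 1e-10);          % bins holding energy for the whole-period sine
nPart = sum(XPart > 1e-3);           % bins holding energy for the partial-period cosine
```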
Window functions
Window functions are used in signal processing to minimize the effect of spectral leakages. Basically, what a window function does is that it tapers the finite length sequence at the ends, so that when tiled, it has a periodic structure without discontinuities, and hence less spectral leakage.
Some of the common windows are Hanning, Hamming, Blackman, Blackman-Harris, Kaiser-Bessel, etc. You can read up more on them from the wiki link and the corresponding MATLAB commands are hann, hamming, blackman, blackmanharris and kaiser. Here's a small sample of the different windows:
You might wonder why there are so many different window functions. The reason is because each of these have very different spectral properties and have different main lobe widths and side lobe amplitudes. There is no such thing as a free lunch: if you want good frequency resolution (main lobe is thin), your sidelobes become larger and vice versa. You can't have both. Often, the choice of window function is dependent on the specific needs and always boils down to making a compromise. This is a very good article that talks about using window functions, and you should definitely read through it.
Now, when you use a window function, you have less information at the tapered ends. So, one way to fix that is to use sliding windows with an overlap as shown below. The idea is that when put together, they approximate the original sequence as best as possible (i.e., the bottom row should be as close to a flat value of 1 as possible). Typical overlap values vary between 33% and 50% of the window length, depending on the application.
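The "flat value of 1" idea can be verified directly. A small sketch, assuming a periodic Hann window and 50% overlap (the window length and count are arbitrary): adding up the shifted windows gives exactly 1 everywhere except the first and last half-window.

```matlab
W = 64;                              % window length (assumed)
hop = W/2;                           % 50% overlap
w = hann(W, 'periodic');             % periodic Hann window
nWin = 10;                           % number of overlapping windows (assumed)

total = zeros(hop*(nWin - 1) + W, 1);
for k = 0:nWin-1
    idx = k*hop + (1:W);             % samples covered by window k
    total(idx) = total(idx) + w;     % accumulate the overlapping windows
end

interior = total(hop+1:end-hop);     % region covered by two windows
flatError = max(abs(interior - 1));  % deviation from a flat value of 1
```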
Using MATLAB's spectrogram
The syntax is spectrogram(x,window,overlap,NFFT,fs)
where
x is your entire data vector
window is your window function. If you enter just a number, say W (must be integer), then MATLAB chops up your data into chunks of W samples each, applies a Hamming window of length W to each chunk, and forms the spectrogram from it. If you want to use a different window, provide hann(W) or whatever window you choose (rectwin(W) gives a plain rectangular window).
overlap is the number of samples that you need to overlap. So, if you need 50% overlap, this value should be W/2. Use floor(W/2) or ceil(W/2) if W can take odd values. This is just an integer.
NFFT is the FFT length
fs is the sampling frequency of your data vector. You can leave this empty, and MATLAB plots the figure in terms of normalized frequencies and the time axis as simply the data chunk index. If you enter it, MATLAB scales the axis accordingly.
You can also get optional outputs such as the time vector and frequency vector and the power spectrum computed, for use in other computations or if you need to style your plot differently. Refer to the documentation for more info.
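For the signal in the question (1600 samples at 160 Hz), a call with outputs might look like the sketch below. The 10 Hz tone is a synthetic stand-in for the EEG data, and the one-second Hann window, 50% overlap and NFFT of 256 are only starting values I picked for illustration.

```matlab
fs = 160;                            % sampling frequency from the question
t = (0:1599).'/fs;                   % 10 seconds, 1600 samples
x = sin(2*pi*10*t);                  % synthetic 10 Hz tone standing in for the EEG
W = 160;                             % 1-second window (assumed)
noverlap = W/2;                      % 50% overlap
nfft = 256;                          % FFT length (assumed)

% Welch power spectral density estimate and its frequency axis
[pxx, f] = pwelch(x, hann(W), noverlap, nfft, fs);
[~, k] = max(pxx);
peakHz = f(k);                       % frequency of the strongest component

% Spectrogram with frequency and time vectors returned instead of plotted
[s, fS, tS] = spectrogram(x, hann(W), noverlap, nfft, fs);
```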
Here's an example with 1 second of a linear chirp signal from 20 Hz to 400 Hz, sampled at 1000 Hz. Two window functions are used, Hanning and Blackman-Harris, with and without overlaps. The window lengths were 50 samples, and overlap of 50%, when used. The plots are scaled to the same 80dB range in each plot.
You can notice the difference in the figures (top-bottom) due to the overlap. You get a cleaner estimate if you use overlap. You can also observe the trade-off between main lobe width and side lobe amplitude that I mentioned earlier. Hanning has a thinner main lobe (prominent line along the skew diagonal), resulting in better frequency resolution, but has leaky sidelobes, seen by the bright colors outside. Blackman-Harris, on the other hand, has a fatter main lobe (thicker diagonal line), but less spectral leakage, evidenced by the uniformly low (blue) outer region.
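A sketch of how the four cases in that example could be computed. The FFT length of 256 is an assumption, since the original value is not stated.

```matlab
fs = 1000;                           % sampling frequency in Hz
t = (0:fs-1)/fs;                     % 1 second of samples
x = chirp(t, 20, 1, 400);            % linear chirp from 20 Hz to 400 Hz
W = 50;                              % window length in samples
nfft = 256;                          % FFT length (assumed)

S_hann_no = spectrogram(x, hann(W), 0, nfft, fs);              % Hann, no overlap
S_hann_ov = spectrogram(x, hann(W), W/2, nfft, fs);            % Hann, 50% overlap
S_bh_no   = spectrogram(x, blackmanharris(W), 0, nfft, fs);    % Blackman-Harris, no overlap
S_bh_ov   = spectrogram(x, blackmanharris(W), W/2, nfft, fs);  % Blackman-Harris, 50% overlap
```

Calling spectrogram with no output arguments, e.g. spectrogram(x, hann(W), W/2, nfft, fs, 'yaxis'), draws the plot instead of returning the matrix.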
Both these methods above are short-time methods of operating on signals. The non-stationarity of the signal (where statistics such as the mean are a function of time) implies that you can only assume that the statistics of the signal are constant over short periods of time. There is no way of arriving at such a period of time (for which the statistics of the signal are constant) exactly, and hence it is mostly guesswork and fine-tuning.
Say that the signal you mentioned above is non-stationary (which EEG signals are). Also assume that it is stationary only for about 10ms or so. To reliably measure statistics like PSD or energy, you need to measure these statistics 10ms at a time. The window-ing function is what you multiply the signal with to isolate that 10ms of a signal, on which you will be computing PSD etc.. So now you need to traverse the length of the signal. You need a shifting window (to window the entire signal 10ms at a time). Overlapping the windows gives you a more reliable estimate of the statistics.
You can imagine it like this:
1. Take the first 10ms of the signal.
2. Window it with the windowing function.
3. Compute statistic only on this 10ms portion.
4. Move the window by 5ms (assume length of overlap).
5. Window the signal again.
6. Compute statistic again.
7. Move over entire length of signal.
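The steps above can be sketched as a simple loop. The 16 kHz sampling rate, the test tone and the choice of energy as the statistic are assumptions made only so that 10 ms and 5 ms come out as whole numbers of samples.

```matlab
fs = 16000;                          % assumed sampling rate
x = sin(2*pi*440*(0:fs-1).'/fs);     % 1 second test tone (stand-in signal)
W = round(0.010*fs);                 % 10 ms window = 160 samples
hop = round(0.005*fs);               % move by 5 ms = 80 samples
w = hamming(W);                      % windowing function

nFrames = floor((numel(x) - W)/hop) + 1;   % windows that fit in the signal
energy = zeros(nFrames, 1);
for k = 1:nFrames
    idx = (k - 1)*hop + (1:W);       % samples in the current 10 ms portion
    seg = x(idx) .* w;               % window it
    energy(k) = sum(seg.^2);         % compute the statistic on this portion only
end
```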
There are many different types of window functions - Blackman, Hanning, Hamming, Rectangular. That and the length of the window and overlap really depend on the application that you have and the frequency characteristics of the signal itself.
As an example, in speech processing (where the signals are non-stationary and windowing gets used a lot), the most popular choices for windowing functions are Hamming/Hanning of length 20 ms (320 samples at 16 kHz sampling) with an overlap of 80 samples (25% of window length). This works reasonably well. You can use this as a starting point for your application and then work on fine-tuning it a little more with different values.
You may also want to take a look at the following functions in MATLAB:
1. hamming
2. hanning
I hope you know that you can call up a ton of help in MATLAB using the help command on the command line. MATLAB is one of the best-documented software packages out there. Using the help command for pwelch also pulls up definitions for window size and overlap. That should help you out too.
I don't know if all this info. helped you out or not, but looking at the question, I felt you might have needed a little help with understanding what windowing and overlapping was all about.
HTH,
Sriram.
The last parameter, fs, is the sampling rate of the raw signal (X in your case). When you extract X from audio data using
[X,fs]=audioread('song.mp3')
you get fs from it.
Investigate how the following parameters change the performance of the Sinc function:
The Length of the coefficients
The Following window functions:
Blackman Harris
Hanning
Bartlett