What is a spectrogram and how do I set its parameters? - matlab

I am trying to plot the spectrogram of my time domain signal given:
N=5000;
phi = (rand(1,N)-0.5)*pi;
a = tan((0.5.*phi));
i = 2.*a./(1-a.^2);
plot(i);
spectrogram(i,100,1,100,1e3);
The problem is that I don't understand the parameters or what values they should be given. I took the values I am using from MATLAB's online documentation for spectrogram. I am new to MATLAB, and I am just not getting the idea. Any help will be greatly appreciated!

Before we actually go into what that MATLAB command does, you probably want to know what a spectrogram is. That way you'll get more meaning into how each parameter works.
A spectrogram is a visual representation of the Short-Time Fourier Transform. Think of this as taking chunks of an input signal and applying a local Fourier Transform on each chunk. Each chunk has a specified width and you apply a Fourier Transform to this chunk. You should take note that each chunk has an associated frequency distribution. For each chunk that is centred at a specific time point in your time signal, you get a bunch of frequency components. The collection of all of these frequency components at each chunk and plotted all together is what is essentially a spectrogram.
The spectrogram is a 2D heat map where the horizontal axis represents time and the vertical axis represents frequency. The colour at each point encodes the magnitude of one frequency component at one time point: the lower the magnitude, the darker the colour, and the higher the magnitude, the lighter the colour.
Here's an example of a spectrogram:
[Figure: example spectrogram. Source: Wikipedia]
Therefore, for each time point, we see a distribution of frequency components. Think of each column as the frequency decomposition of a chunk centred at this time point. For each column, we see a varying spectrum of colours. The darker the colour is, the lower the magnitude component at that frequency is and vice-versa.
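To make the "chunks" idea concrete, here is a minimal hand-rolled sketch of a Short-Time Fourier Transform that uses only base MATLAB. The test signal, the sampling rate, the chunk length and the hop are all assumptions picked for illustration, and this is not how spectrogram is implemented internally.

```matlab
% Minimal hand-rolled STFT (base MATLAB only); every name here is illustrative.
fs  = 1000;                        % assumed sampling rate in Hz
t   = (0:4999)/fs;                 % 5 s of samples
x   = sin(2*pi*(50 + 20*t).*t);    % test tone whose frequency rises with time

L   = 256;                         % chunk (window) length in samples
hop = 128;                         % step between chunk starts (50% overlap)
w   = 0.5 - 0.5*cos(2*pi*(0:L-1)/(L-1));   % Hann taper written out by hand

nFrames = floor((numel(x) - L)/hop) + 1;   % number of chunks that fit entirely
S = zeros(L/2 + 1, nFrames);               % rows: frequency, columns: time
for m = 1:nFrames
    idx    = (m-1)*hop + (1:L);            % sample indices of chunk m
    X      = fft(x(idx).*w);               % local Fourier transform of the chunk
    S(:,m) = abs(X(1:L/2 + 1)).';          % keep the non-negative frequencies
end

f    = (0:L/2)*fs/L;                       % frequency of each row in Hz
tMid = ((0:nFrames-1)*hop + L/2)/fs;       % centre time of each column in s

imagesc(tMid, f, 20*log10(S + eps));       % magnitude in dB as a heat map
axis xy; xlabel('Time (s)'); ylabel('Frequency (Hz)'); colorbar;
```

Each column of S is the spectrum of one chunk, which is exactly the "column per time point" picture described above.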
So!... now you're armed with that, let's go into how MATLAB works in terms of the function and its parameters. The way you are calling spectrogram conforms to this version of the function:
spectrogram(x,window,noverlap,nfft,fs)
Let's go through each parameter one by one so you can get a greater understanding of what each does:
x - This is the input time-domain signal you wish to find the spectrogram of. It can't get much simpler than that. In your case, the signal you want to find the spectrogram of is defined in the following code:
N=5000;
phi = (rand(1,N)-0.5)*pi;
a = tan((0.5.*phi));
i = 2.*a./(1-a.^2);
Here, i is the signal you want to find the spectrogram of.
window - If you recall, we decompose the signal into chunks, and each chunk has a specified width. window defines the width of each chunk in terms of samples. As this is a discrete-time signal, you know that it was sampled with a particular sampling frequency and sampling period. You can determine how large the window is in terms of samples by:
window_samples = window_time/Ts
Ts is the sampling period of your signal. Choosing the window size is largely empirical and takes experimentation. The larger the window, the better the frequency resolution, because each chunk spans more cycles of each frequency, but the poorer the time localization. Conversely, the smaller the window, the better the localization in time, but the coarser the frequency decomposition. I don't have a suggestion for the optimal size... which is one reason wavelets are often preferred for time-frequency decomposition. With wavelets, the chunks get decomposed into smaller chunks of varying width, so you get a mixture of good time and frequency localization.
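The trade-off can be put into numbers with a short sketch. The sampling rate and the candidate chunk durations below are assumptions; the point is only that the time span of a chunk and the spacing of its frequency bins move in opposite directions.

```matlab
% Window length versus time span and frequency spacing (illustrative values).
fs = 1000;                                 % assumed sampling rate in Hz
Ts = 1/fs;                                 % sampling period in seconds
window_time = [0.032 0.128 0.512];         % candidate chunk durations in seconds

window_samples = round(window_time/Ts);    % chunk length in samples
time_span      = window_samples*Ts;        % time covered by one chunk
freq_spacing   = fs./window_samples;       % bin spacing of an unpadded FFT in Hz

for k = 1:numel(window_samples)
    fprintf('%4d samples: %.3f s per chunk, %.2f Hz between bins\n', ...
        window_samples(k), time_span(k), freq_spacing(k));
end
```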
noverlap - Another way to improve time localization is to let the chunks overlap, so that events near the edge of one chunk fall nearer the middle of a neighbouring one. noverlap defines how many samples consecutive chunks share. The default gives 50% overlap between chunks.
nfft - You are essentially taking the FFT of each chunk. nfft tells you how many FFT points are computed per chunk. The default is the larger of 256 and the next power of two at or above the window length, i.e. max(256, 2^nextpow2(length of window)). nfft also sets how finely the frequency axis is sampled. A larger nfft gives a denser frequency grid in the spectrogram, but once nfft exceeds the window length the chunk is simply zero-padded, which interpolates the spectrum rather than separating closely spaced frequencies any better.
fs - The sampling frequency of your signal. If you omit it, the frequency axis is in normalized units (radians per sample), and if you pass an empty value it defaults to 1 Hz. Override this with whatever sampling frequency your signal actually has.
Therefore, what you should probably take out of this is that I can't really tell you how to set the parameters. It all depends on what signal you have, but hopefully the above explanation will give you a better idea of how to set the parameters.
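As a starting point, here is a hedged sketch of a call with all five parameters set explicitly. It needs the Signal Processing Toolbox, and the test signal and every parameter value are assumptions rather than recommendations. For comparison, the call in the question, spectrogram(i,100,1,100,1e3), uses 100-sample chunks that overlap by a single sample, a 100-point FFT and a sampling rate of 1 kHz. (Naming a variable i also shadows MATLAB's imaginary unit, so a different name is used here.)

```matlab
% Requires the Signal Processing Toolbox for spectrogram().
fs = 1000;                          % assumed sampling rate in Hz
t  = (0:4999)/fs;
x  = sin(2*pi*(50 + 20*t).*t);      % hypothetical test signal, rising in frequency

window   = 256;    % an integer here means a Hamming window of that length
noverlap = 128;    % consecutive chunks share 128 samples (50%)
nfft     = 512;    % each chunk is zero-padded to 512 points
[S, F, T] = spectrogram(x, window, noverlap, nfft, fs);

% Column count given in the documentation; rows are nfft/2+1 for real input.
nSeg = fix((numel(x) - noverlap)/(window - noverlap));
fprintf('S is %d-by-%d, expected %d-by-%d\n', ...
    size(S,1), size(S,2), nfft/2 + 1, nSeg);

spectrogram(x, window, noverlap, nfft, fs, 'yaxis');  % no outputs: draws the plot
```

Changing window, noverlap and nfft one at a time on a signal whose content you already know is a practical way to see what each parameter does.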
Good luck!

Related

Why is the number of sample frequencies in `scipy.signal.stft()` tied to the hop size?

This question relates to SciPy's Short-time Fourier Transform function for signal processing.
For some reason I don't understand, the size of the output 'array of sample frequencies' is exactly equal to the hop size. From the documentation:
nperseg : int, optional
Length of each segment. Defaults to 256.
noverlap : int, optional
Number of points to overlap between segments. If None, noverlap = nperseg // 2. Defaults to None. When specified, the COLA constraint must be met (see Notes below).
f : ndarray
Array of sample frequencies.
hop size H = nperseg - noverlap
I'm new to signal processing and Fourier transforms, but as far as I understand a STFT is just chopping an audio file into segments ('time frames') on which you perform a Fourier transform. So if I want to do a STFT on 100 time frames, I'd expect the output to be a matrix of size 100 x F, where F is an array of measured frequencies ('measured' probably isn't the right word here but you know what I mean).
This is kinda what SciPy's implementation does, but the size of f here is what bothers me. It's supposed to be an array describing the different frequencies, like [0Hz 500Hz 1000Hz], and it is, but for some reason its size is exactly the same as the hop size. If the hop size is 700, the number of measured frequencies is 700.
The hop size is the number of samples (i.e. time) between each time frame, and is correctly calculated as H = nperseg - noverlap, but what does this have to do with the frequency array?
Edit: Related to this question
An FFT is a square matrix transform from one orthogonal basis to another of the same dimension. This is because N is the exact number of orthogonal (i.e. that don't interfere with one another) complex sinusoids that fit in a time domain vector of length N.
A longer time vector can contain more frequency information (e.g. it's hard to tell 2 frequencies apart using just 3 sample points, but much easier with 3000 samples, etc.)
You can zero-pad your short time vector of length N to use a longer FFT, but that is identical to interpolating a nice curve between N frequency points, which makes all the FFT results interdependent.
For many purposes (visualization, etc.) an STFT is overlapped, where the adjacent segments share some overlapped data instead of just being end-to-end. This gives better time locality (e.g. the segments can be spaced closer but still be long enough so that each one can provide the frequency resolution required).
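The arithmetic behind the near-coincidence can be sketched in a few lines, written in MATLAB notation to match the rest of this page. It assumes the defaults quoted in the question (noverlap = nperseg // 2), and additionally assumes that the FFT length equals the segment length and that only the one-sided spectrum of a real signal is returned. The segment length of 1400 is hypothetical.

```matlab
% Bin count versus hop size under the defaults quoted from the SciPy docs.
nperseg  = 1400;                  % hypothetical segment length
noverlap = floor(nperseg/2);      % documented default: nperseg // 2
hop      = nperseg - noverlap;    % H = nperseg - noverlap
nfft     = nperseg;               % assumption: FFT length equals segment length
nFreq    = floor(nfft/2) + 1;     % one-sided spectrum of a real signal
fprintf('hop = %d, frequency bins = %d\n', hop, nFreq);

% A different overlap changes the hop but leaves the bin count alone.
noverlap2 = 1050;
hop2      = nperseg - noverlap2;
fprintf('hop = %d, frequency bins = %d\n', hop2, nFreq);
```

With half-overlapping segments the hop is nperseg/2 and the number of bins is nperseg/2 + 1, so the two numbers track each other only because both derive from the segment length.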

Matlab - creating psd with right scaling

I want to analyze an audio file (.wav with PCM, 32 kHz sampling rate) and create its PSD with the axes Sxx (watts/hertz, not dB) and f (hertz).
So I would start by reading out the audiodata with:
[x,fs]=audioread('test.wav');
After this I'm having some problems because I don't really know how to proceed, and MATLAB always tells me that the psd functions won't be supported in the future and that I should use pwelch. (I also tried to compute the autocorrelation and then Fourier transform it to get Sxx, but it didn't work out well.)
So could anybody tell me how I can get from my vector x to a vector with the PSD values in watts/hertz and plot it afterwards?
Very grateful for every kind of help! :)
Update 1: Yes, I did read the documentation of pwelch, but I'm afraid my English is too bad to understand it completely.
So if I use the psd documentation:
nfft = 2^nextpow2(length(x));
Pxx = abs(fft(x,nfft)).^2/length(x)/fs;
Hpsd = dspdata.psd(Pxx(1:length(Pxx)/2),'fs',fs);
plot(Hpsd)
I'm able to get the plot in dB with the peak at the right frequency. (I don't know how dspdata.psd works, though.)
I tried out:
[Pyy,f]=pwelch(x,fs)
plot(Pyy)
This gives me a non-dB scale, but the peak is at the wrong frequency.
Update 2:
First of all, thanks a lot for your detailed answer! At the moment I'm working on my MATLAB skills as well as my English, but all the specific technical terms give me a hard time.
When using your example of pwelch on a wav file with a clear frequency of 1 kHz, the plot shows me the peak at roughly 0.14. Could it maybe still be a specially scaled x-axis?
If I try it this way:
[y,fs]=audioread('test.wav');
N=length(y);
bin_vals=0:N-1;
fax_Hz= bin_vals*fs/N;
N_2=ceil(N/2);
Y=fft(y);
pyy=Y.*conj(Y);
plot(fax_Hz(1:N_2),pyy(1:N_2))
the result seems right (is this way correct?), but I still need some time to find a proper way to display the y-axis in W/Hz, since I don't know how the audio signal was created.
Update 3:
http://s000.tinyupload.com/index.php?file_id=33803229773204653857
This wav file should have a dominant frequency at 1khz with a duration of 3 seconds and a sampling frequency of 44100Hz. (If I plot the data received from audioread the oscillation seems reasonable)
with
[y,fs]=audioread('1khz.wav');
[pyy,f]=pwelch(y,fs);
plot(f,pyy)
I get a peak at 0.14 on the x-axis.
if I use
[y,fs]=audioread('1khz.wav');
[pyy,f]=pwelch(y,[],[],[],fs);
plot(f,pyy)
instead, the peak is at 1000. Is this way right? And how should I interpret the different scaling on the y-axis (pwelch vs. square of abs)?
I also wanted to ask whether it is possible to get a flat PSD of AWGN in MATLAB. (Since you only have a finite number of samples, I don't know how to get there.)
Thanks again for your detailed support!
Update 4
@A.Donda
So I have a new problem, for which I think it is probably necessary to go into a bit more detail. My plan is basically to do the following:
Read an audio file ([y,fs]) and generate white noise with a certain SNR ([n,fs])
Generate a filter H which shapes the PSD(y) to be similar to the PSD(n)
Generate an inverse filter G=H^(-1) which reverts the effect of H.
My problem is that with pwelch, the resulting vector pyy is much shorter than the vector y. Since my filter is determined by P=sqrt(pnn/pyy), I can't multiply fft(y)*H and therefore get no results.
Do you know of any way around this problem?
Or is there a way to go back from a PSD (Welch estimate) to a normal signal (like an inverse function for pwelch)?
In the example you have from the psd documentation, you compute a psd estimate yourself, then put it into a dspdata.psd container and plot it. What dspdata.psd does here for you is basically compute the frequency axis and provide it to the plot command, nothing more. You get a plot of the spectral density estimate, but that's the one you compute yourself using fft, which is the simplest and worst psd estimate you can get, a so-called periodogram.
Your use of pwelch is almost correct, but you have to give the sampling frequency as the 5th argument, and then use the frequency axis information in your plot.
[Pyy,f]=pwelch(y,[],[],[],fs);
plot(f,Pyy)
should give you the peak at the correct frequency.
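The following sketch reproduces the situation with a synthetic tone instead of the wav file, which is an assumption made so that the example is self-contained. It needs the Signal Processing Toolbox. It also suggests where the peak near 0.14 came from: in pwelch(y,fs) the second argument is read as the window, not as the sampling rate, and without a sampling rate the frequency axis is in radians per sample, where a 1 kHz tone sampled at 44100 Hz sits at 2*pi*1000/44100.

```matlab
% Requires the Signal Processing Toolbox for pwelch().
fs = 44100;                          % sampling rate in Hz
t  = (0:3*fs - 1)'/fs;               % 3 seconds of samples
y  = sin(2*pi*1000*t);               % synthetic stand-in for the 1 kHz wav file

[pyy, f] = pwelch(y, [], [], [], fs);   % fs as 5th argument: f is in Hz
[~, k]   = max(pyy);
fPeakHz  = f(k);                        % location of the largest peak

wPeak = 2*pi*1000/fs;                % the same tone expressed in rad/sample
fprintf('peak at %.1f Hz; in rad/sample that is %.4f\n', fPeakHz, wPeak);

plot(f, pyy); xlabel('frequency (Hz)'); ylabel('psd (1/Hz)');
```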
What pwelch gives you is the spectral density of the signal over Hz. Correct axis labels would therefore be
xlabel('frequency (Hz)')
ylabel('psd (1/Hz)')
The signal you give pwelch is a pure sequence of numbers without physical dimensions. By specifying the sampling rate, the time axis gets a physical unit, s, therefore the resulting frequency is in Hz and the density is in 1/Hz. But still your time series values have no physical dimension, and therefore the density cannot be related to something like W. Has your audiosignal been obtained by a calibrated A/D converter? If yes, you should be able to relate your data to a physical dimension and units, but that's a nontrivial step.
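One way to check that a density scaling is consistent is to integrate the PSD over frequency and compare the result with the mean square of the signal. The sketch below does this for a plain one-sided periodogram of white noise using only base MATLAB. The sampling rate, length and noise level are assumptions. For white noise of variance sigma^2 the expected one-sided level is 2*sigma^2/fs, but a single periodogram scatters heavily around that value, and it is the averaging over segments in Welch's method that makes the estimate look flatter.

```matlab
% Scaling check for a one-sided periodogram (base MATLAB only).
fs    = 32000;                      % assumed sampling rate in Hz
N     = 4096;                       % even number of samples
sigma = 0.5;                        % assumed noise standard deviation
x     = sigma*randn(N, 1);          % white Gaussian noise

X  = fft(x);
P2 = abs(X).^2/(N*fs);              % two-sided periodogram, (signal units)^2 per Hz
P1 = P2(1:N/2 + 1);                 % keep the non-negative frequencies
P1(2:end-1) = 2*P1(2:end-1);        % fold in negative frequencies, except DC and Nyquist
f  = (0:N/2)'*fs/N;                 % frequency axis in Hz
df = fs/N;                          % bin spacing in Hz

powerFromPsd  = sum(P1)*df;         % integral of the density over frequency
powerFromTime = mean(x.^2);         % mean square of the samples
fprintf('from PSD: %.6f, from samples: %.6f, white-noise level: %.3g\n', ...
    powerFromPsd, powerFromTime, 2*sigma^2/fs);
```

The two power values agree by Parseval's theorem, which is what makes "per Hz" the right reading of the y-axis.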
On a personal note, I'd really advise you to brush up on your English, because using software, especially programming interfaces, without properly understanding the documentation is a recipe for disaster.

Methodology of FFT for Matlab spectrogram / short time Fourier transform functions

I'm trying to figure out how MATLAB does the short time Fourier transforms for its spectrogram function (and related functions like specgram, or stft in Octave). What is curious to me is that you can apparently specify the length of the window and the FFT length (number of output frequencies) independently, whereas I would have expected that these two should be equal (since the length of an FFT'd signal is the same as the length of the original signal). To illustrate what I mean, here is the function call:
[S,F,T]=spectrogram(signal,winSize,overlapSize,fftSize,rate);
winSize is the length of subintervals which are to be (individually) FFT'd, and fftSize is the number of frequency components given in the output. When these are not equal, does Matlab do interpolation to produce the required number of frequency bins?
Ultimately the reason I want to know is so that I can determine the proper units and scaling for the frequencies.
Cheers
A windowed segment of a signal can be zero-padded to a longer vector to use a longer FFT. The frequency scaling is determined by the length of the FFT (and the signal's sample rate). The window size and window formula determine the effective resolution, in terms of the ability to separate peaks.
Why do this? Some FFT sizes can be computed more efficiently than others (slightly or a lot, depending on the FFT library used). Also, a longer FFT will calculate more points or bins, thus producing a higher density of interpolated points in a potentially smoother spectrum result.
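A small base-MATLAB sketch can illustrate the zero-padding point. The sample rate, segment, window formula and sizes are assumptions. The padded FFT lands on the unpadded FFT at every fftSize/winSize-th bin and fills in interpolated values in between, and the frequency axis follows from the FFT length and the sample rate.

```matlab
% Zero-padding a windowed segment (base MATLAB only; values are illustrative).
fs      = 1000;                     % assumed sample rate in Hz
winSize = 64;                       % length of the windowed segment
fftSize = 256;                      % FFT length after zero-padding

n   = (0:winSize-1)';
seg = cos(2*pi*110*n/fs);                    % one segment of a hypothetical signal
w   = 0.54 - 0.46*cos(2*pi*n/(winSize-1));   % Hamming taper written out by hand

Xshort = fft(seg.*w);               % winSize frequency points
Xlong  = fft(seg.*w, fftSize);      % same data zero-padded to fftSize points

F = (0:fftSize/2)'*fs/fftSize;      % frequency in Hz of bins 0..fftSize/2

step    = fftSize/winSize;          % every 4th padded bin matches an unpadded bin
maxDiff = max(abs(Xlong(1:step:end) - Xshort));
fprintf('bin spacing %.3f Hz, largest mismatch %.2e\n', F(2), maxDiff);
```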

FFT: Match samples to frequency

let us assume,
I have a vector t with the times in seconds of my samples. (These samples are not equally spaced in time.)
I also have a vector data containing the sample values at the times in t.
t and data have the same length.
If I plot the data, I get some sort of periodic signal.
Now I could compute abs(fft(data)) to get my spectrum, which is then plotted against the sample index on the x-axis.
How can I obtain my spectrum with respect to the times in vector t and plot it?
I want to see which frequencies (in 1/s) or which periods (in s) my signal contains.
Thanks for your help.
[Not the OP's intention]: FFT will give you the spectrum (global) for any number of input data points. You cannot have a specific data point (in time) associated with parts (or the full) spectrum.
What you can do instead is use spectrogram to obtain the Short-Time Fourier Transform (STFT). This will give you an N-by-M discrete grid of time-frequency FT values (N: FT frequency bins, M: signal time windows).
By placing the (overlapping) STFT windows on the data samples of interest, you get N frequency magnitude values per window, i.e. a sequence of short-term spectrum estimates that shows how the signal changes in time.
See also the possibly relevant answer here: https://stackoverflow.com/a/12085728/651951
EDIT/UPDATE:
For unevenly spaced data you need to consider the Non-Uniform DFT (and Non-uniform FFT implementations). See the relevant question/answer here https://scicomp.stackexchange.com/q/593
The primary approaches for NFFT or NUFFT, are based on creating a uniform grid through local convolutions/interpolation, running FFT on this and undoing the convolutional effect of the interpolation filter.
You can read more:
A. Dutt and V. Rokhlin, Fast Fourier transforms for nonequispaced data, SIAM J. Sci. Comput., 14, 1993.
L. Greengard and J.-Y. Lee, Accelerating the Nonuniform Fast Fourier Transform, SIAM Review, 46 (3), 2004.
M. Pippig and D. Potts, Particle Simulation Based on Nonequispaced Fast Fourier Transforms, in: Fast Methods for Long-Range Interactions in Complex Systems, 2011.
For an implementation (with an interface to MATLAB) try NFFT and possibly its parallelized version PNFFT. You may find a nice walk-through on how to set-up and use here.
You can resample or interpolate your sample points to get another set of sample points that are equally spaced in t. The chosen spacing or sample rate of the second set of equally spaced sample points will allow you to infer frequencies to the result of an FFT of that second set.
The results may be noisy or include aliasing unless the initial data set is bandlimited to a sufficiently low frequency to allow interpolation. If bandlimited, then you might try something like cubic splines as an interpolation method.
Although it may look like one can get a high FFT bin frequency resolution by resampling to a larger number of data points, the actual useful resolution accuracy will be more related to the original number of samples.
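The resample-then-FFT route can be sketched as follows, using only base MATLAB. The unevenly spaced data are synthetic (a 5 Hz sine sampled on a jittered grid), and the choice of spline interpolation and of the uniform grid size are assumptions. Real data that are not bandlimited would need more care, as noted above.

```matlab
% Interpolate unevenly spaced samples onto a uniform grid, then take the FFT.
N         = 400;
fsNominal = 40;                                   % average sample rate in Hz
n         = 0:N-1;
t    = n/fsNominal + (0.4/fsNominal)*sin(17*n);   % jittered, still strictly increasing
data = sin(2*pi*5*t);                             % synthetic 5 Hz signal

M   = N;                                   % keep about the original sample count
tu  = linspace(t(1), t(end), M);           % equally spaced time grid
fsu = (M - 1)/(t(end) - t(1));             % sample rate of the uniform grid in Hz
du  = interp1(t, data, tu, 'spline');      % cubic-spline interpolation

D    = fft(du - mean(du));                 % remove the mean, then transform
f    = (0:M-1)*fsu/M;                      % frequency axis in Hz (1/s)
half = 1:floor(M/2);                       % bins below the Nyquist frequency
[~, k] = max(abs(D(half)));
fPeak  = f(k);                             % dominant frequency in Hz
fprintf('dominant frequency %.2f Hz, period %.3f s\n', fPeak, 1/fPeak);

plot(f(half), abs(D(half))); xlabel('frequency (1/s)'); ylabel('|FFT|');
```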

How to get coefficients for sine/cosine function from complex FFT in Matlab?

I'm working on a control system that measures the movement of a vibrating robot arm. Because there is some deadtime, I need to look into the future of the somewhat noisy signal.
My idea was to use the frequencies in the sampled signal and produce a fourier function that could be used for extrapolation.
My question: I already have the FFT of the signal vector (containing 60-100 values e.g.) and can see the main frequencies in the amplitude spectrum. Now I want to have a function f(t) which fits to the signal, removes some noise, and can be used to predict the near future of the signal. How do I calculate the coefficients for the sine/cosine functions out of the complex FFT data?
Thank you so much!
AFAIR FFT essentially produces output as a sum of sine functions with different frequencies. The importance of each frequency is the height of each peak. So what you really want to do here is filter out some frequencies (ie. high frequencies for the arm to move gently) and then come back to the time domain.
In matlab this should be like going through the vector of what you got from fft, setting some values to 0 (or doing something more complex to it) and then use ifft to come back to time domain and make the prediction based on what you get.
There's also one thing you should consider while doing this - Nyquist frequency - this means that the highest frequency that you get on your fft is half of the sampling frequency.
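The zero-some-bins-then-ifft idea might look like the sketch below, in base MATLAB. The signal, sampling rate and cutoff are assumptions, and the signal is chosen to be exactly periodic in the record so that the result is clean. The mirrored upper half of the spectrum has to be treated the same way as the lower half, otherwise the result of ifft is no longer real.

```matlab
% Low-pass filtering by zeroing FFT bins (base MATLAB only).
fs = 64;                                   % assumed sampling rate in Hz
N  = 64;                                   % one second of data
t  = (0:N-1)'/fs;
x  = cos(2*pi*3*t) + 0.3*cos(2*pi*20*t);   % slow motion plus a fast ripple

X = fft(x);
f = (0:N-1)'*fs/N;                         % frequency of every FFT bin in Hz
cutoff = 10;                               % assumed cutoff in Hz

keep = (f <= cutoff) | (f >= fs - cutoff); % low bins and their mirror images
X(~keep) = 0;                              % discard everything in between
xSmooth  = real(ifft(X));                  % back to the time domain

maxErr = max(abs(xSmooth - cos(2*pi*3*t)));   % distance from the 3 Hz part alone
fprintf('largest deviation from the 3 Hz component: %.2e\n', maxErr);
```

Note that this only cleans up the samples you already have; it does not by itself extend the signal into the future.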
If you use an FFT for data that isn't periodic within the FFT aperture length, then you may need to use a window to reduce spurious frequencies due to "spectral leakage". Frequency estimation techniques to better estimate "between bin" frequency content may also be appropriate. The phase of each cosine sinusoid, relative to the edge of the window, is usually atan2(imag[i], real[i]). The frequency depends on the sample rate and bin number versus the length of the FFT.
You might also want to look into using a Kalman filter instead of an FFT.
Added: If your signal isn't exactly integer periodic in the FFT length, then you may want to do an fftshift before the FFT to move the resulting phase measurement reference point to the center of your data vector, instead of a possibly discontinuous circular edge.
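For the coefficients themselves, here is a minimal base-MATLAB sketch, assuming a real signal that is exactly periodic in the record (otherwise leakage spreads each component over several bins, as described above). For a real signal of length N, bin k contributes a_k*cos(2*pi*f_k*t) + b_k*sin(2*pi*f_k*t) with a_k = 2*real(X_k)/N, b_k = -2*imag(X_k)/N and f_k = k*fs/N, and the mean is real(X_0)/N. The test signal, the threshold for "main" frequencies and the name model are illustrative.

```matlab
% Cosine/sine coefficients from a complex FFT (base MATLAB only).
fs = 64;                                   % assumed sampling rate in Hz
N  = 64;                                   % record length (signal periodic in it)
t  = (0:N-1)'/fs;
x  = 1.5 + 2*cos(2*pi*3*t) + 0.5*sin(2*pi*7*t);   % hypothetical measured signal

X  = fft(x);
k  = (1:N/2 - 1)';                         % harmonics strictly below Nyquist
a0 = real(X(1))/N;                         % constant offset
a  = 2*real(X(k + 1))/N;                   % cosine coefficients
b  = -2*imag(X(k + 1))/N;                  % sine coefficients
fk = k*fs/N;                               % frequency of each harmonic in Hz

amp    = hypot(a, b);                      % amplitude of each harmonic
strong = amp > 0.1*max(amp);               % keep the main peaks (arbitrary threshold)

% f(t) built from the retained harmonics; works for any column of times.
model = @(tt) a0 + cos(2*pi*tt(:)*fk(strong)')*a(strong) ...
                 + sin(2*pi*tt(:)*fk(strong)')*b(strong);

tFuture   = (N:N + 15)'/fs;                % 16 samples past the end of the record
predicted = model(tFuture);
truth     = 1.5 + 2*cos(2*pi*3*tFuture) + 0.5*sin(2*pi*7*tFuture);
fprintf('largest prediction error: %.2e\n', max(abs(predicted - truth)));
```

The amplitude hypot(a, b) and the phase atan2(imag, real) mentioned above describe the same harmonic in amplitude/phase form. On noisy, non-periodic data the extrapolation degrades quickly, so the prediction horizon should stay short.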