So I know what MFCC (Mel Frequency Cepstral Coefficients) is. But I need to understand what each value represents... Is it some sort of sound frequency value or what?
Let's assume we have this kind of matrix, where each row represents the coefficients of one frame. But what are those numbers? Is each one maybe the highest frequency or something?
The cepstrum is typically derived by computing the Discrete Cosine Transform (DCT) of the (symmetric) log power spectrum of a frame of speech; here, the log power spectrum [curve] is itself treated as a signal (https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). So each cepstral coefficient is a measure of similarity between the sequence/curve that represents the log power spectrum and a cosine wave of a particular 'frequency'. Together, the cepstral coefficients capture how quickly the values of this sequence vary.
The first cepstral coefficient is the dot product of the log power spectrum with the [periodic] cosine wave whose single period begins at the origin (f=0) in the frequency domain and ends at f=2*Pi radians (or, equivalently, the sampling frequency). An illustration: the log power spectrum of a vowel has high energy in the low frequency region (near f=0) and low energy in the high frequency region (near f=Pi). In other words, the log power spectrum curve has a negative slope over the range [0, Pi] in the case of vowels. Since this variation of the log power spectrum is similar to that of the cosine wave mentioned above, the first cepstral coefficient (cepstrum[1]) of a vowel speech frame will have a positive value. In contrast, cepstrum[1] of an unvoiced fricative such as /s/ will have a negative value, because its log power spectrum has a positive slope: low energy at low frequency and high energy at high frequency.
Similarly, cepstrum[2] would be positive if there is a major valley in the log power spectrum around f=Pi/2. The log power spectrum of a voiced fricative (e.g., /z/) comes close to such a description, because there is significant energy at high frequency due to the fricative nature of the sound as well as significant energy at low frequency due to voicing, leaving a dip in between. Of course, cepstrum[0] is an average of the log power spectral values; it captures the volume/loudness of the signal.
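To make that concrete, here is a minimal MATLAB sketch of the idea: the plain cepstrum of one frame as the DCT of its log power spectrum. The frame is just placeholder noise, a real MFCC pipeline would insert a mel filterbank before the log and DCT, and dct/hamming come from the Signal Processing Toolbox:
frame = randn(1, 400);                        % placeholder for one 25 ms speech frame at 16 kHz
NFFT = 512;
spec = fft(frame .* hamming(400)', NFFT);     % windowed DFT of the frame
logP = log(abs(spec(1:NFFT/2+1)).^2 + eps);   % log power spectrum over [0, Pi]
cepstra = dct(logP);                          % DCT of the log power spectrum
c0 = cepstra(1);                              % cepstrum[0]: average level, i.e. loudness
c1 = cepstra(2);                              % cepstrum[1]: the spectral slope term discussed above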
Related
I have two questions that are bugging me:
Does the duration of an audio signal affect the amplitude of the frequency components of that same signal? For example, I am recording the sound of a fan using a microphone. At first, I record only for 10 sec and convert the audio signal into a frequency spectrum. Then, I record the same sound for 20 sec and again convert the audio signal into a frequency spectrum. In both cases, the sound of the fan is the same, but does the duration of the signal affect the amplitude of the frequency components in the spectrum plot?
For example, I have 2 audio signals. The first one is that same fan sound recorded for 10 sec at a sampling frequency of 5 kHz, and the second one is the same audio signal but with the sampling frequency changed to 15 kHz. I used MATLAB to check the power of both signals and it was the same for both, but I want to know why. The formula I used was Power = rms(signal)^2. To my mind the second signal should have more power, because now there are more samples than in the first recording, and since those extra samples also have random amplitudes, the average shouldn't come out the same as for the first one. Am I thinking about this correctly?
Can anyone provide their thoughts? Thank You!
This answer is from: https://dsp.stackexchange.com/questions/75025/does-the-duration-of-a-signal-affect-its-frequency-components-amplitude-also
Power is energy per unit time. If you increase the duration, you increase the energy, but due to the normalization with time the power would be the same.
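A quick sanity check in MATLAB, with stationary random noise standing in for the fan recording:
Fs = 5000;
fan = 0.2*randn(1, 20*Fs);        % 20 s of stationary "fan" noise
P10 = rms(fan(1:10*Fs))^2;        % power of the first 10 s
P20 = rms(fan)^2;                 % power of the full 20 s
% P10 and P20 come out essentially equal: power is already normalized by duration.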
The DFT as given by
X[k] = ∑_{n=0}^{N−1} x[n] e^(−j2πnk/N)
will scale the frequency component by N as given by the summation over N samples. This can be normalized by multiplying the result by 1/N.
The frequency components of the signal will be at the same levels in a normalized DFT (normalized by dividing by the total number of samples) for signal components that occupy one bin (pure tones), but the observed noise floor may be lower after the change in sampling rate: if the noise floor is limited by quantization noise, the total quantization noise (well approximated as white noise, meaning constant across all frequencies) is spread over a wider frequency range, so increasing the sampling rate lowers the contribution of quantization noise on a per-Hz basis (the noise density).
Further, the duration will affect the frequency resolution of each bin in the frequency domain: for an unwindowed DFT, the equivalent noise bandwidth per bin is fs/N, where fs is the sampling rate and N is the number of bins. At a given sampling rate, increasing the number of bins increases the total duration and thus reduces the noise bandwidth per bin, causing the observed DFT noise floor to decrease (same noise power, just less of it measured in each bin).
Windowing, if done in the time domain, will reduce the signal level by the coherent gain of the window but increase the equivalent noise bandwidth, such that pure tones are reduced more than noise that is spread over multiple bins.
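A rough illustration of these points, assuming a pure tone in white noise (the numbers are arbitrary; the tone frequency is chosen to land exactly on a bin for both lengths):
Fs = 10000; f0 = 100*Fs/1024;                 % 976.5625 Hz: an exact bin for N = 1024 and N = 16384
for N = [1024 16384]
    t = (0:N-1)/Fs;
    x = sin(2*pi*f0*t) + 0.01*randn(1, N);
    X = fft(x)/N;                             % normalize by the number of samples
    mag = 2*abs(X(1:N/2+1));                  % approximate single-sided magnitude
    fprintf('N = %5d: tone level %.3f, median noise bin %.2e\n', N, max(mag), median(mag));
end
% The tone level stays near 1 in both cases; the per-bin noise floor drops for the longer record.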
I have an audio signal sampled at a rate of 10 kHz, and I need to find the Fourier coefficients of my signal. I saw one example on the MathWorks website where they use the following code to do the FFT decomposition of a signal y:
NFFT = 2^nextpow2(L);
Y = fft(y,NFFT)/L;
f = Fs/2*linspace(0,1,NFFT/2+1);
where L is the length of the signal. I don't really understand why it defines the variable NFFT the way shown in the code above. Can't I just choose any value for NFFT? Also, why are we taking Fs/2 in the third line of the code above?
NFFT can be any positive value, but FFT computations are typically much more efficient when the number of samples can be factored into small primes. Quoting from the MATLAB documentation:
The execution time for fft depends on the length of the transform. It is fastest for powers of two. It is almost as fast for lengths that have only small prime factors. It is typically several times slower for lengths that are prime or which have large prime factors.
It is thus common to compute the FFT for the power of 2 that is greater than or equal to the number of samples of the signal y. This is what NFFT = 2^nextpow2(L) does (in the example from the MATLAB documentation, y is constructed to have length L).
When NFFT > L, the signal is zero-padded to length NFFT.
As far as Fs/2 goes, it is simply because the frequency spectrum of a real-valued signal has Hermitian symmetry (which means that the spectrum values above Fs/2 can be obtained from the complex conjugates of the values below Fs/2), and as such it is completely specified by the first NFFT/2+1 values (with index NFFT/2+1 corresponding to Fs/2). So, instead of showing the redundant information above Fs/2, the example chose to illustrate only the spectrum up to Fs/2.
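Putting the pieces together with a made-up test signal (the frequencies and amplitudes below are arbitrary placeholders):
Fs = 10000; L = 3000;
t = (0:L-1)/Fs;
y = 0.7*sin(2*pi*500*t) + sin(2*pi*1200*t);
NFFT = 2^nextpow2(L);                  % next power of two >= L (here 4096)
Y = fft(y, NFFT)/L;                    % zero-padded FFT, normalized by the signal length
f = Fs/2*linspace(0, 1, NFFT/2+1);     % frequency axis from 0 to Fs/2
plot(f, 2*abs(Y(1:NFFT/2+1)))          % single-sided amplitude spectrum
xlabel('Frequency (Hz)'), ylabel('|Y(f)|')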
The output of the FFT is complex for real-valued input. That means that for a signal sampled at Fs Hz, the Fourier transform will have frequency components from -Fs/2 to Fs/2 and is symmetric about zero Hz. (The Nyquist criterion states that if you have a signal whose maximum frequency component is at f Hz, you need to sample it at at least 2f Hz.)
You may wonder what negative frequency means here. If you are a mathematician you may care about negative frequencies, but if you are an engineer, you can choose to ignore the notion of negative frequency and focus only on frequencies from 0 to Fs/2 (the maximum frequency component for a signal sampled at Fs Hz is Fs/2).
Using the FFT directly to learn about the frequency components present in your signal is cumbersome. You can use the pwelch function in MATLAB to find the frequencies present in your signal and also their power. MATLAB will automatically compute the NFFT required and return the frequencies present in your signal along with the power at each frequency. Use this syntax:
[p,f] = pwelch(x,[],[],[],Fs)
Look at the documentation of pwelch for more information.
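For instance (the test signal here is only a stand-in for your data):
Fs = 10000;
t = (0:2*Fs-1)/Fs;
x = sin(2*pi*800*t) + 0.1*randn(size(t));   % stand-in signal: 800 Hz tone plus noise
[p, f] = pwelch(x, [], [], [], Fs);         % default window, overlap and NFFT
plot(f, 10*log10(p))
xlabel('Frequency (Hz)'), ylabel('Power/frequency (dB/Hz)')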
Assuming I have two signals (raw data as an Excel file) measured from two different power supplies, I want to compare the noise levels of these signals to find out which one is the noisier one. Both power supplies produce signals with the same frequency, but the number of data points is different. Is there a way to do this in MATLAB?
You could calculate the signal-to-noise ratio for each signal. This is just a ratio of the average signal power and average noise power, usually measured in decibels. An ideal noiseless signal would have SNR = infinity.
Recall that signal power is just the square of the signal amplitude, and to get the value x in decibels, we just take 10*log10(x).
SNR = 10*log10( mean(signal.^2)/mean(noise.^2) );
To separate the signal from the noise, you could run a low-pass filter over the noisy signal.
To get the noise you could just subtract the clean signal from the noisy signal.
noise = noisy_signal - signal;
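A sketch of the whole procedure, assuming the Signal Processing Toolbox lowpass function and a cutoff frequency that you would pick to suit your own signal (the test data below is made up):
Fs = 10000;                                           % sampling rate of the recording (assumed)
t = (0:Fs-1)/Fs;
noisy_signal = sin(2*pi*50*t) + 0.2*randn(size(t));   % stand-in for your measured data
signal = lowpass(noisy_signal, 100, Fs);              % estimate of the clean signal
noise = noisy_signal - signal;                        % everything the filter removed
SNR = 10*log10(mean(signal.^2)/mean(noise.^2))        % SNR in dB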
So I have the samples (hex values) of a sine signal and I know the sampling frequency. Using this I can plot an FFT or periodogram, but then I would like to find the SNR. What would be the most accurate way to calculate the noise and signal power? I would prefer doing it in the frequency domain. Is there also a way to do this in the time domain?
Thanks a lot in advance!!!
So if there is noise on your signal and you know that the underlying signal is a sine wave, you can easily get the signal parameters, i.e. amplitude, frequency and phase, by least squares. If y(t) is your signal, just minimize the L2 norm of y(t) - A*sin(w*t + b) over A, w and b. Then you can get the signal power from the fitted sine and the noise power from the error signal y(t) - A*sin(w*t + b).
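A rough sketch of that fit in MATLAB using fminsearch; y and t are assumed to already exist as your sample vector and its time axis (same size), and the initial frequency guess matters because the cost function has many local minima (1000 Hz below is only a placeholder, better taken from the FFT peak):
model = @(p, t) p(1)*sin(2*pi*p(2)*t + p(3));      % parameters: amplitude, frequency (Hz), phase
cost = @(p) sum((y - model(p, t)).^2);             % squared L2 norm of the residual
p0 = [max(abs(y)), 1000, 0];                       % initial guess (frequency is a placeholder)
p = fminsearch(cost, p0);
fit = model(p, t);
err = y - fit;                                     % residual = noise estimate
SNR_dB = 10*log10(mean(fit.^2)/mean(err.^2))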
I'm working on a control system that measures the movement of a vibrating robot arm. Because there is some deadtime, I need to look into the future of the somewhat noisy signal.
My idea was to use the frequencies in the sampled signal to produce a Fourier-type function that could be used for extrapolation.
My question: I already have the FFT of the signal vector (containing e.g. 60-100 values) and can see the main frequencies in the amplitude spectrum. Now I want a function f(t) that fits the signal, removes some noise, and can be used to predict the near future of the signal. How do I calculate the coefficients for the sine/cosine functions from the complex FFT data?
Thank you so much!
AFAIR, the FFT essentially represents the signal as a sum of sinusoids with different frequencies. The importance of each frequency is given by the height of its peak in the spectrum. So what you really want to do here is filter out some frequencies (i.e. high frequencies, so the arm moves gently) and then come back to the time domain.
In MATLAB this amounts to going through the vector you got from fft, setting some values to 0 (or doing something more sophisticated to them), and then using ifft to come back to the time domain and make the prediction based on what you get.
There's also one thing you should consider while doing this: the Nyquist frequency. It means that the highest frequency you get from your FFT is half of the sampling frequency.
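To answer the coefficient question directly, one way to sketch it (x is the sampled signal and Fs its sampling rate, both assumed to exist; requires a MATLAB recent enough for implicit expansion) is to keep only the strongest bins and turn them into an explicit sum of cosines f(t) that can be evaluated past the end of the data. Note this only extrapolates well if the true frequencies sit close to FFT bins, which is what the following answer's point about spectral leakage addresses.
N = length(x);
X = fft(x);
half = abs(X(2:floor(N/2)));                          % bin magnitudes, ignoring DC and Nyquist
[~, order] = sort(half, 'descend');
k = order(1:3);                                       % keep the 3 dominant bins (k = 1 is the first non-DC bin)
A = 2*abs(X(k+1))/N;                                  % amplitudes
phi = angle(X(k+1));                                  % phases, referenced to the first sample
fk = k*Fs/N;                                          % frequencies in Hz
dc = real(X(1))/N;
f_t = @(t) dc + sum(A(:).*cos(2*pi*fk(:)*t + phi(:)), 1);   % t is a row vector of times in seconds
prediction = f_t((N:N+19)/Fs);                        % e.g. predict 20 samples past the end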
If you use an FFT on data that isn't periodic within the FFT aperture length, then you may need to use a window to reduce spurious frequencies due to "spectral leakage". Frequency estimation techniques to better estimate "between-bin" frequency content may also be appropriate. The phase of each cosine component, relative to the edge of the window, is usually atan2(imag[i], real[i]). The corresponding frequency is the sample rate times the bin number divided by the length of the FFT.
You might also want to look into using a Kalman filter instead of an FFT.
Added: If your signal isn't exactly integer periodic in the FFT length, then you may want to do an fftshift before the FFT to move the resulting phase measurement reference point to the center of your data vector, instead of a possibly discontinuous circular edge.
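A tiny illustration of that note (x is the data vector, assumed to have even length so fftshift rotates it by exactly half):
Xc = fft(fftshift(x));     % rotate the data so the old center sample comes first
phi_center = angle(Xc);    % bin phases are now referenced to the middle of the record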