Here is the context of the problem: I have a DTMF signal in wav format and I have to identify the number sequence it encodes. I must do so using the fast Fourier transform in MATLAB, which implies reading the wav file with wavread and identifying each number, where the numbers are separated by 40 ms of silence or more.
Here is my code so far:
[signal, fs] = wavread( 'C:\Temp\file.wav' ); % here, fs = 8000Hz
N = 512;
T = 1/fs;
L = length( signal )
samples = fs / 1000 * 40      % number of samples in 40 ms of audio
windows = floor(L / samples)  % number of 40 ms windows that fit in the signal
t = (1:L)/fs;
figure(1), plot(t, signal);
Here is what figure 1 looks like; it shows the signal read from the wav file:
How can I effectively split the signal into pieces so that I can then run an FFT on each of the 10 pieces separately to decode the corresponding numbers?
I would recommend the following approach:
1. Find the envelope of the signal in the time domain (see the Hilbert transform).
2. Smooth the envelope a bit.
3. Take the diff and find peaks to get the onsets of the tones.
4. Use the onsets to pick frames and find the spectrum using fft.
5. Find the index of the max in each of the spectra and convert it to a frequency.
The tricky part is getting a robust onset detector in point 3. The peaks you pick in the difference have to be of a certain size in order to qualify as an onset. If your tones are of varying strength this might pose a problem, but from your image of the time signal it doesn't seem to be one. A minimal sketch of the whole chain follows.
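Here is a minimal sketch of that chain, assuming the Signal Processing Toolbox functions hilbert and findpeaks are available; the smoothing length, peak threshold and frame length are illustrative guesses that will need tuning for your recording:

% Sketch: segment the DTMF signal at tone onsets, then FFT each tone
env = abs(hilbert(signal));                        % 1) envelope via the Hilbert transform
env = filter(ones(1,200)/200, 1, env);             % 2) ~25 ms moving-average smoothing at fs = 8000 Hz
d = diff(env);                                     % 3) rising edges mark tone onsets
[~, onsets] = findpeaks(d, 'MinPeakHeight', 0.05, 'MinPeakDistance', round(0.04*fs));
frameLen = 512;                                    % 4) analyse one frame per onset
for k = 1:numel(onsets)
    frame = signal(onsets(k) : min(onsets(k)+frameLen-1, end));
    spec = abs(fft(frame, frameLen));
    [~, idx] = max(spec(1:frameLen/2));            % 5) strongest bin -> frequency
    fprintf('tone %d: strongest component near %.0f Hz\n', k, (idx-1)*fs/frameLen);
end

For actual DTMF decoding you would take the two strongest peaks per frame (one from the low 697-941 Hz group, one from the high 1209-1633 Hz group) rather than a single maximum.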
Regards
This worked for me:
windowSize = 256;
nbWindows = floor(L / windowSize);
for i = 1:nbWindows
    coeffs = fft(signal((i-1)*windowSize+1 : i*windowSize));
    plot(abs(coeffs(1:windowSize/2)));   % single-sided magnitude of this window
    waitforbuttonpress
end
This way it is possible to slide the window along the whole input signal. A sketch of how to map each window's FFT peaks to a DTMF digit follows.
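Building on that loop, here is a hedged sketch (reusing signal, fs and windowSize from the snippet above) of how the two strongest spectral peaks of one window could be matched against the standard DTMF frequency pairs; the band limits are illustrative:

lowFreqs = [697 770 852 941];                      % DTMF row (low-group) frequencies
highFreqs = [1209 1336 1477 1633];                 % DTMF column (high-group) frequencies
keys = ['1' '2' '3' 'A'; '4' '5' '6' 'B'; '7' '8' '9' 'C'; '*' '0' '#' 'D'];

spec = abs(fft(signal(1:windowSize)));             % spectrum of one window (here: the first)
f = (0:windowSize-1) * fs / windowSize;            % frequency of each FFT bin

[~, iLow] = max(spec .* (f(:) >= 650 & f(:) <= 1000));     % strongest bin in the low band
[~, iHigh] = max(spec .* (f(:) >= 1150 & f(:) <= 1700));   % strongest bin in the high band

[~, r] = min(abs(lowFreqs - f(iLow)));             % nearest row frequency
[~, c] = min(abs(highFreqs - f(iHigh)));           % nearest column frequency
decodedKey = keys(r, c)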
I am writing a piece of code that figures out which frequencies (notes) are being played at any given time in a song (currently I am testing it by grabbing only the first second of the song). To do this I break the first second of the audio file into 8 different chunks. Then I perform an FFT on each chunk and plot it with the following code:
% Taking a second of an audio file and breaking it into n many chunks and
% figuring out what frequencies make up each of those chunks
clear all;
% Read Audio
fs = 44100; % sample frequency (Hz)
full = audioread('song.wav');
% Perform fft and get frequencies
chunks = 8; % How many chunks to break wave into
for i = 1:chunks
beginningChunk = (i-1)*fs/chunks+1
endChunk = i*fs/chunks
x = full(beginningChunk:endChunk);
y = fft(x);
n = length(x); % number of samples in chunk
amp = abs(y)/n; % amplitude of the DFT
%%%amp = amp(1:fs/2/chunks); % note this is my attempt that I think is wrong
f = (0:n-1)*(fs/n); % frequency range
%%%f = f(1:fs/2/chunks); % note this is my attempt that I think is wrong
figure(i);
plot(f,amp)
xlabel('Frequency')
ylabel('amplitude')
end
When I do that I get graphs that look like these:
It looks like I am plotting too many points, because the frequencies go up in magnitude at the far right of the graphs, so I think I am plotting the double-sided spectrum. I think I need to use only the samples from 1:fs/2; the problem is I don't have a big enough matrix to grab that many points. I tried going from 1:fs/2/chunks, but I am unconvinced those are the right values, so I commented them out. How can I find the single-sided spectrum when there are fewer than fs/2 samples?
As a side note, when I plot all the graphs I notice the frequencies given are almost exactly the same. This surprises me because I thought I had made the chunks small enough that only the frequencies present at that exact time would be captured, and therefore I would get the note currently being played. If anyone knows how I can better single out which note is being played at each time, that information would be greatly appreciated.
For a single-sided FT, simply take the first half of the output of the FFT algorithm. The other half (the negative frequencies) is redundant, given that your input is real-valued.
1/8 of a second is quite long. Note that the relevant frequencies are around 160-1600 Hz, if I remember correctly (music is not my specialty). Those will be in the left-most region of your FT. The highest frequency you compute (after dropping the right half of the FFT) is half your sampling frequency, 44.1/2 kHz. The lowest frequency, and the distance between samples, is given by the length of your transform (44.1 kHz / number of samples).
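As a rough illustration of that for the loop in the question (reusing n, amp, fs and the figure index i from that code), the single-sided spectrum of each chunk could be taken like this:

half = floor(n/2);                        % keep only the non-negative frequencies
ampSingle = amp(1:half);                  % first half of the scaled magnitude
ampSingle(2:end) = 2*ampSingle(2:end);    % fold the dropped negative-frequency energy back in
fSingle = (0:half-1) * (fs/n);            % bin spacing is fs/n, going up to just below fs/2
figure(i);
plot(fSingle, ampSingle)
xlabel('Frequency (Hz)')
ylabel('Amplitude')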
I'm working in MATLAB; I want to perform an FFT on a wav file I previously recorded, also in MATLAB.
fs = 44100; % Hz
t = 0:1/fs:1; % seconds
f = 600; % Hz
y = sin(2.*pi.*f.*t);
audiowrite('600freq.wav',y,fs)
This is how I write the wav file.
Now to the reading and FFT part:
[y,Fs] = audioread('600freq.wav');
sound(y)
plot(fft(y))
This is the plot of the FFT I get:
Maybe I'm missing something about the FFT, but I expected two vertical lollipops.
Another thing I noticed that's wrong is that when I play the sound after reading it from the file, it's longer and the pitch is significantly lower.
My guess is a sampling rate problem, but I really have no idea of what to do about it.
Thanks for any help in advance.
That's because you're not plotting the magnitude. What you are plotting are the coefficients, but these are complex-valued. Because of that, the horizontal axis is the real component and the vertical axis is the imaginary component. Also, when you use sound by itself, the default sampling frequency is 8 kHz (8192 Hz to be exact), which explains why your sound is of a lower pitch. You need to pass the sampling frequency as a second argument to sound, and that's given to you by the second output of audioread.
So, try placing abs after the fft call and also use Fs into sound:
[y,Fs] = audioread('600freq.wav');
sound(y, Fs);
plot(abs(fft(y)))
Also, the above code doesn't plot the horizontal axis properly. If you want to do that, make sure you fftshift your spectra after you take the Fourier transform, then label your axis properly. If you want to determine what each horizontal value is in terms of frequency, this awesome post by Paul R does the trick: How do I obtain the frequencies of each value in an FFT?
Basically, each horizontal value in your FFT is such that:
F = i * Fs / N
i is the bin number, Fs is the sampling frequency and N is the number of points you're using for the FFT. F is the interpreted frequency of the component you're looking at.
By default, fft assumes that N is the total number of points in your array. For the one-sided FFT, i goes from 0, 1, 2, up to floor((N-1)/2) due to the Nyquist sampling theorem.
Because the code you wrote actually displays both sides of the spectrum, it's nice to centre the spectrum so that the DC component is located in the middle, with the negative spectrum on the left side and the positive spectrum on the right side.
We can incorporate that into your code here:
[y,Fs] = audioread('600freq.wav');
sound(y, Fs);
F = fftshift(abs(fft(y)));
f = linspace(-Fs/2, Fs/2, numel(y)+1);
f(end) = [];
plot(f, F);
The horizontal axis now reflects the correct frequency of each component as well as the vertical axis reflecting the magnitude of each component.
By running your audio generation code which generates a sine tone at 600 Hz, and then the above code to plot the spectra, I get this:
Note that I inserted a tool tip right at the positive side of the spectra... and it's about 600 Hz!
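If you would rather read that peak off programmatically than with a tool tip, a small follow-up to the code above (reusing f and F from it) could be:

pos = f > 0;                      % look only at the positive half of the centred spectrum
fPos = f(pos);
[~, idx] = max(F(pos));
peakFrequency = fPos(idx)         % comes out near 600 Hz for this test tone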
This is my first time using Octave/MATLAB for this sort of project. I am also completely new to signal processing, so please excuse my lack of knowledge. The end goal of this project is to create an m-file that takes a wav file recorded from a microphone, adds a user-specified level of noise distortion to it in increments, and also adds a variable onset delay to either the right or left channel of the audio in the new wav file that is generated.
edit 12:29AM 5/13/14
I have a clearer idea of what needs to happen now after discussing the goals and equipment with my partner, and I need to figure out how to fill in these blanks. The delay will most likely have to be between 10 and 300 ns at most, and the intensity of the noise should range from 0 to 5 on a scale from silent to heavy static.
clear;
delay = input('Delay in ns: ');
noise = input('Level of distortion: ');
[y,Fs,nbits] = wavread(filename);
% generate some noise with the same length and sampling rate as the file
newy = y + y2;   % combine signal and noise - this is the step I need help with
% shift the wave over by x many nanoseconds
wavwrite(newy,Fs,'newwave');
Any help with the current goal of combining signals OR if you could help with generating noise to overlay onto any size of .wav recording I would be extremely grateful.
Here's an example of how it might work. I've simplified the problem by limiting the delay to multiples of the sample period; for a 48 kHz sample rate, the delay resolution is around 20 us. The method is to first convert the delay to a number of samples and prepend that many zero samples to the samples from the wave file. Second, a noise signal of the same length is generated and added element-wise to the first signal.
noiseLevel = input('Level of distortion: '); % between 0 and 1: 0 means all signal, 1 means all noise
delaySeconds = input('Delay in seconds: '); % in seconds
[y,fs,nbits] = wavread(filename);
% figure out how many samples to delay. Resolution is 1/fs.
delaySamples = round(delaySeconds * fs);
% signal length
signalLength = length(y) + delaySamples;
% generate noise samples
noiseSignal = gennoise(signalLength); % call your noise generation function.
% prepend zeros to the signal to delay.
delayedSignal = vertcat(zeros(delaySamples,1), y);
combinedSignal = noiseLevel*noiseSignal + (1-noiseLevel)*delayedSignal;
Couple of points:
Unless I'm doing my math wrong (entirely possible), a delay of 10 to 300 ns is not going to be detectable with typical 44 kHz audio sampling rates: one sample at 44.1 kHz lasts about 22.7 us, so a 300 ns delay is roughly 0.013 samples. You'd need to be in the MHz sampling rate range.
This solution assumes that your signal is one channel (mono). It shouldn't be too difficult to implement more channels using this format.
For adding noise, you can use randn and scale it based on your level of distortion. I'd suggest tinkering with the value that you multiply it by. Alternatively, you can use awgn to add white gaussian noise. I'm sure there are ways to add other kinds of noise, either in Fourier or time domain, but you can look into those.
If you want the noise present during the delay as well, switch the order of those two steps (pad with zeros first, then add the noise).
A reminder that you can use sound(newy,Fs) to see if you like your result.
clear;
delay=input('Delay in ns: ');
noise=input('Level of distortion: ');
[y,Fs,nbits]=wavread(filename);
% Add random noise to signal
newy = y + noise*0.1*randn(length(y),1);
% shift over wave x many nanoseconds
num = round(delay*1e-9*Fs); % Convert to number of zeros to add to front of data
newy = [zeros(num,1); newy]; % Pad with zeros
wavwrite(newy,Fs,'newwave');
I am pretty new to Matlab and I am trying to write a simple frequency-based speech detection algorithm. The end goal is to run the script on a wav file and have it output start/end times for each speech segment. If I use the code:
fr = 128;
[ audio, fs, nbits ] = wavread(audioPath);
spectrogram(audio,fr,120,fr,fs,'yaxis')
I get a useful frequency intensity vs. time graph like this:
By looking at it, it is very easy to see when speech occurs. I could write an algorithm to automate the detection process by looking at each x-axis frame, figuring out which frequencies are dominant (have the highest intensity), testing the dominant frequencies to see if enough of them are above a certain intensity threshold (the difference between yellow and red on the graph), and then labeling that frame as either speech or non-speech. Once the frames are labeled, it would be simple to get start/end times for each speech segment.
My problem is that I don't know how to access that data. I can use the code:
[S,F,T,P] = spectrogram(audio,fr,120,fr,fs);
to get all the features of the spectrogram, but the results of that code don't make any sense to me. The bounds of the S,F,T,P arrays and matrices don't correlate to anything I see on the graph. I've looked through the help files and the API, but I get confused when they start throwing around algorithm names and acronyms - my DSP background is pretty limited.
How could I get an array of the frequency intensity values for each frame of this spectrogram analysis? I can figure the rest out from there, I just need to know how to get the appropriate data.
What you are trying to do is called speech activity detection. There are many approaches to this; the simplest might be a simple band-pass filter that passes the frequencies where speech is strongest, which is between 1 kHz and 8 kHz. You could then compare the total signal energy with the band-pass-limited energy and, if the majority of the energy is in the speech band, classify the frame as speech. That's one option, but there are others too.
To get the frequencies at the peaks you could use an FFT to get the spectrum and then use peakdetect.m. But this is a very naïve approach, as you will get a lot of peaks belonging to harmonic frequencies of the base sine.
Theoretically you should use some sort of cepstrum (also known as the spectrum of the spectrum), which collapses the periodic harmonics in the spectrum down to the base frequency, and then use that with peakdetect. Or you could use existing tools that do that, such as Praat. A rough cepstrum sketch is shown below.
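As a hedged illustration of the cepstrum idea (not a production pitch tracker; the 30 ms frame and the 50-500 Hz pitch search range are assumptions):

frame = audio(1:round(0.03*fs));                % one 30 ms frame of the recording
c = real(ifft(log(abs(fft(frame)) + eps)));     % real cepstrum: "spectrum of the log spectrum"
qMin = round(fs/500);                           % quefrency range for pitches of 50-500 Hz
qMax = round(fs/50);
[~, q] = max(c(qMin:qMax));
pitchEstimate = fs / (qMin + q - 1)             % dominant fundamental of the frame, in Hz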
Be aware that speech analysis is usually done on frames of around 30 ms, stepping by 10 ms. You could further filter out false detections by requiring that a formant be detected in N sequential frames. A frame-based band-energy sketch follows.
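Putting the band-energy idea and the 30 ms / 10 ms framing together, here is a minimal sketch; the 1-8 kHz band comes from the suggestion above, the 0.5 energy-ratio threshold is an assumption, and the sampling rate must be above 16 kHz for that band to make sense:

frameLen = round(0.030 * fs);                   % 30 ms frames
hop = round(0.010 * fs);                        % 10 ms step
nFrames = floor((length(audio) - frameLen) / hop) + 1;
isSpeech = false(nFrames, 1);
for k = 1:nFrames
    frame = audio((k-1)*hop + (1:frameLen));
    spec = abs(fft(frame)).^2;                  % power spectrum of the frame
    f = (0:frameLen-1) * fs / frameLen;         % frequency of each bin
    totalEnergy = sum(spec(f < fs/2));          % energy over the positive frequencies
    bandEnergy = sum(spec(f >= 1000 & f < 8000));   % energy in the assumed speech band
    isSpeech(k) = bandEnergy > 0.5 * totalEnergy;
end
frameTimes = ((0:nFrames-1) * hop) / fs;        % start time of each frame, in seconds

Runs of true values in isSpeech, together with frameTimes, then give the start/end times of each speech segment.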
Why don't you use fft with fftshift?
%% Time specifications:
Fs = 100; % samples per second
dt = 1/Fs; % seconds per sample
StopTime = 1; % seconds
t = (0:dt:StopTime-dt)';
N = size(t,1);
%% Sine wave:
Fc = 12; % hertz
x = cos(2*pi*Fc*t);
%% Fourier Transform:
X = fftshift(fft(x));
%% Frequency specifications:
dF = Fs/N; % hertz
f = -Fs/2:dF:Fs/2-dF; % hertz
%% Plot the spectrum:
figure;
plot(f,abs(X)/N);
xlabel('Frequency (in hertz)');
title('Magnitude Response');
Why do you want to use complex stuff?
A nice and complete solution can be found at https://dsp.stackexchange.com/questions/1522/simplest-way-of-detecting-where-audio-envelopes-start-and-stop
Have a look at the STFT (short-time Fourier transform) or (even better) the DWT (discrete wavelet transform), both of which estimate the frequency content in blocks (windows) of data, which is what you need if you want to detect sudden changes in the amplitude of certain ("speech") frequencies.
Don't use a plain FFT over the whole recording, since it calculates the relative frequency content over the entire duration of the signal, making it impossible to determine when a certain frequency occurred in the signal.
If you still use the built-in STFT function (spectrogram), then to plot the maximum you can use the following command:
plot(T,(floor(abs(max(S,[],1)))))
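To turn that per-frame maximum into the start/end times you are after, a hedged follow-up could be (the threshold of half the global maximum is an assumption and depends on your signal):

[S, F, T] = spectrogram(audio, fr, 120, fr, fs);   % same parameters as in the question
frameMax = abs(max(S, [], 1));                     % strongest component in each frame
speechFrames = frameMax > 0.5 * max(frameMax);     % assumed threshold
edges = diff([0, speechFrames, 0]);                % +1 where a segment starts, -1 just after it ends
startTimes = T(find(edges == 1))
stopTimes = T(find(edges == -1) - 1)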