This is my first time using Octave/MATLAB for this sort of project. I am also completely new to signal processing, so please excuse my lack of knowledge. The end goal of this project is to create an mfile which will be able to take a wav file which will be recorded from a microphone, add a level of noise distortion to it which will be specified by the user in increments, and also to add variable onset delay to either the right or left channel of the audio for the new wav file that will be generated.
edit 12:29AM 5/13/14
I have a more clear idea of what needs to happen now after discussion with partner on goals and equipment and now need to find out how to solve these blanks. The delay will most likely have to be between 10 and 300 ns max and the intensity of noise should be from 0 to 5 on a scale of silent to heavy static.
clear;
delay=input('Delay in ns: ');
noise=input('Level of distortion: );
[y,Fs,nbits]=wavread(filename);
generate some noise same length and sampling as file
[newy,Fs]=[y+y2,Fs];
shift over wave x many nanoseconds
wavwrite(newy,Fs,'newwave');
Any help with the current goal of combining signals OR if you could help with generating noise to overlay onto any size of .wav recording I would be extremely grateful.
Here's an example of how it might work. I've simplified the problem by limiting the delay to multiples of the sample period. For a 48kHz sample rate, the delay resolution is around 20us. This method is to first convert the delay to a number of samples and prepend it to the samples from the wave file. Second, the noise signal is generated of the same length and then it is added element wise to the first signal.
noiseLevel = input('Level of distortion: '); % between 0 and 1 - 0 means all signal
- 1 means all noise
delaySeconds = input('Delay in seconds: '); % in seconds
[y,fs,nbits] = wavread(filename);
% figure out how many samples to delay. Resolution is 1/fs.
delaySamples = round(delaySeconds * fs);
% signal length
signalLength = length(y) + delaySamples;
% generate noise samples
noiseSignal = gennoise(signalLength); % call your noise generation function.
% prepend zeros to the signal to delay.
delayedSignal = vertcat(zeros(delaySamples,1), y);
combinedSignal = noiseLevel*noiseSignal + (1-noiseLevel)*delayedSignal;
Couple of points:
Unless I'm doing my math wrong (entirely possible), a delay of 10 to 300 ns is not going to be detectable with typical 44 kHz audio sampling rates. You'd need to be in the MHz sampling rate range.
This solution assumes that your signal is one channel (mono). It shouldn't be too difficult to implement more channels using this format.
For adding noise, you can use randn and scale it based on your level of distortion. I'd suggest tinkering with the value that you multiply it by. Alternatively, you can use awgn to add white gaussian noise. I'm sure there are ways to add other kinds of noise, either in Fourier or time domain, but you can look into those.
If you want the noise during the delay, switch the order of the two.
A reminder that you can use sound(newy,Fs) to see if you like your result.
clear;
delay=input('Delay in ns: ');
noise=input('Level of distortion: );
[y,Fs,nbits]=wavread(filename);
% Add random noise to signal
newy = y + noise*0.1*randn(length(y),1);
% shift over wave x many nanoseconds
num = round(delay*1e-9*Fs); % Convert to number of zeros to add to front of data
newy = [zeros(num,1); newy]; % Pad with zeros
wavwrite(newy,Fs,'newwave');
Related
I am writing a piece of code that figures out what frequencies(notes) are being played at any given time of a song (note currently I am testing it grabbing only the first second of the song). To do this I break the first second of the audio file into 8 different chunks. Then I perform an FFT on each chunk and plot it with the following code:
% Taking a second of an audio file and breaking it into n many chunks and
% figuring out what frequencies make up each of those chunks
clear all;
% Read Audio
fs = 44100; % sample frequency (Hz)
full = audioread('song.wav');
% Perform fft and get frequencies
chunks = 8; % How many chunks to break wave into
for i = 1:chunks
beginningChunk = (i-1)*fs/chunks+1
endChunk = i*fs/chunks
x = full(beginningChunk:endChunk);
y = fft(x);
n = length(x); % number of samples in chunk
amp = abs(y)/n; % amplitude of the DFT
%%%amp = amp(1:fs/2/chunks); % note this is my attempt that I think is wrong
f = (0:n-1)*(fs/n); % frequency range
%%%f = f(1:fs/2/chunks); % note this is my attempt that I think is wrong
figure(i);
plot(f,amp)
xlabel('Frequency')
ylabel('amplitude')
end
When I do that I get graphs that look like these:
It looks like I am plotting too many points because the frequencies go up in magnitude at the far right of graphs so I think I am using the double sided spectrum. I think I need to only use the samples from 1:fs/2, the problem is I don't have a big enough matrix to grab that many points. I tried going from 1:fs/2/chunks, but I am unconvinced those are the right values so I commented those out. How can I find the single sided spectrum when there are less than fs/2 samples?
As a side note when I plot all the graphs I notice the frequencies given are almost exactly the same. This is surprising to me because I thought I made the chunks small enough that only the frequency that's happening at the exact time should be grabbed -- and therefore I would be getting the current note being played. If anyone knows how I can single out what note is being played at each time better that information would be greatly appreciated.
For a single-sided FT simply take the first half of the output of the FFT algorithm. The other half (the nagative frequencies) is redundant given that your input is real-valued.
1/8 second is quite long. Note that relevant frequencies are around 160-1600 Hz, if I remeber correctly (music is not my specialty). Those will be in the left-most region of your FT. The highest frequency you compute (after dropping the right half of FFT) is half your sampling frequency, 44.1/2 kHz. The lowest frequency, and the distance between samples, is given by the length of your transform (44.1 kHz / number of samples).
Here is the context of the problem: I have a DTMF signal in wav format, I have to identify the number sequence it has encoded. I must do so using fast fourier transform in Matlab, implying that I read the wav file using wavread and to identify each number that is seperated by 40ms silence or more.
Here is my code so far:
[signal, fs] = wavread( 'C:\Temp\file.wav' ); % here, fs = 8000Hz
N = 512;
T = 1/fs;
L = length( signal )
samples = fs / 1000 * 40
windows = floor(L / samples)
t = (1:L)/fs;
figure(1), plot(t, signal);
Here is what the figure 1 looks like, that is the signal read from the wav:
How can I effectively split the signal into pieces so that I can then do an FFT on each of the 10 pieces seperately to decode the corresponding numbers?
I would recommend the following approach:
Find the envelope of the signal in the time domain (see Hilbert transform).
Smooth the envelope a bit.
Take the diff and find peaks to get the onsets of the tones.
Use the onsets to pick frames and find the spectrum using fft.
Find the index of the max in each of the spectrums and convert them to a frequency.
The tricky part in this is to get a robust onset detector in point 3. The peaks in the difference you pick, has to be of a certain size in order to qualify as on onset. If your tones are of varying strength this might pose a problem, but from your image of the time signal it doesn't seem like a problem.
Regards
This worked for me:
windowSize = 256;
nbWindows = floor(L / windowSize);
for i=1:nbWindows
coeffs = fft(signal((i-1)*windowSize+1:i*windowSize));
plot(abs(coeffs(1:N)));
waitforbuttonpress
end;
This way it is possible to shift the window until the end of the input signal
Here is the context of the problem: I have a DTMF signal in wav format, I have to identify the number sequence it has encoded. I must do so using fast fourier transform in Matlab, implying that I read the wav file using wavread and to identify each number that is seperated by 40ms silence or more.
Here is my code so far:
[signal, fs] = wavread( 'C:\Temp\file.wav' ); % here, fs = 8000Hz
N = 512;
T = 1/fs;
L = length( signal )
samples = fs / 1000 * 40
windows = floor(L / samples)
t = (1:L)/fs;
figure(1), plot(t, signal);
Here is what the figure 1 looks like, that is the signal read from the wav:
How can I effectively split the signal into pieces so that I can then do an FFT on each of the 10 pieces seperately to decode the corresponding numbers?
I would recommend the following approach:
Find the envelope of the signal in the time domain (see Hilbert transform).
Smooth the envelope a bit.
Take the diff and find peaks to get the onsets of the tones.
Use the onsets to pick frames and find the spectrum using fft.
Find the index of the max in each of the spectrums and convert them to a frequency.
The tricky part in this is to get a robust onset detector in point 3. The peaks in the difference you pick, has to be of a certain size in order to qualify as on onset. If your tones are of varying strength this might pose a problem, but from your image of the time signal it doesn't seem like a problem.
Regards
This worked for me:
windowSize = 256;
nbWindows = floor(L / windowSize);
for i=1:nbWindows
coeffs = fft(signal((i-1)*windowSize+1:i*windowSize));
plot(abs(coeffs(1:N)));
waitforbuttonpress
end;
This way it is possible to shift the window until the end of the input signal
I am pretty new to Matlab and I am trying to write a simple frequency based speech detection algorithm. The end goal is to run the script on a wav file, and have it output start/end times for each speech segment. If use the code:
fr = 128;
[ audio, fs, nbits ] = wavread(audioPath);
spectrogram(audio,fr,120,fr,fs,'yaxis')
I get a useful frequency intensity vs. time graph like this:
By looking at it, it is very easy to see when speech occurs. I could write an algorithm to automate the detection process by looking at each x-axis frame, figuring out which frequencies are dominant (have the highest intensity), testing the dominant frequencies to see if enough of them are above a certain intensity threshold (the difference between yellow and red on the graph), and then labeling that frame as either speech or non-speech. Once the frames are labeled, it would be simple to get start/end times for each speech segment.
My problem is that I don't know how to access that data. I can use the code:
[S,F,T,P] = spectrogram(audio,fr,120,fr,fs);
to get all the features of the spectrogram, but the results of that code don't make any sense to me. The bounds of the S,F,T,P arrays and matrices don't correlate to anything I see on the graph. I've looked through the help files and the API, but I get confused when they start throwing around algorithm names and acronyms - my DSP background is pretty limited.
How could I get an array of the frequency intensity values for each frame of this spectrogram analysis? I can figure the rest out from there, I just need to know how to get the appropriate data.
What you are trying to do is called speech activity detection. There are many approaches to this, the simplest might be a simple band pass filter, that passes frequencies where speech is strongest, this is between 1kHz and 8kHz. You could then compare total signal energy with bandpass limited and if majority of energy is in the speech band, classify frame as speech. That's one option, but there are others too.
To get frequencies at peaks you could use FFT to get spectrum and then use peakdetect.m. But this is a very naïve approach, as you will get a lot of peaks, belonging to harmonic frequencies of a base sine.
Theoretically you should use some sort of cepstrum (also known as spectrum of spectrum), which reduces harmonics' periodicity in spectrum to base frequency and then use that with peakdetect. Or, you could use existing tools, that do that, such as praat.
Be aware, that speech analysis is usually done on a frames of around 30ms, stepping in 10ms. You could further filter out false detection by ensuring formant is detected in N sequential frames.
Why don't you use fft with `fftshift:
%% Time specifications:
Fs = 100; % samples per second
dt = 1/Fs; % seconds per sample
StopTime = 1; % seconds
t = (0:dt:StopTime-dt)';
N = size(t,1);
%% Sine wave:
Fc = 12; % hertz
x = cos(2*pi*Fc*t);
%% Fourier Transform:
X = fftshift(fft(x));
%% Frequency specifications:
dF = Fs/N; % hertz
f = -Fs/2:dF:Fs/2-dF; % hertz
%% Plot the spectrum:
figure;
plot(f,abs(X)/N);
xlabel('Frequency (in hertz)');
title('Magnitude Response');
Why do you want to use complex stuff?
a nice and full solution may found in https://dsp.stackexchange.com/questions/1522/simplest-way-of-detecting-where-audio-envelopes-start-and-stop
Have a look at the STFT (short-time fourier transform) or (even better) the DWT (discrete wavelet transform) which both will estimate the frequency content in blocks (windows) of data, which is what you need if you want to detect sudden changes in amplitude of certain ("speech") frequencies.
Don't use a FFT since it calculates the relative frequency content over the entire duration of the signal, making it impossible to determine when a certain frequency occured in the signal.
If you still use inbuilt STFT function, then to plot the maximum you can use following command
plot(T,(floor(abs(max(S,[],1)))))
Question is as stated in the title.
After i decimate an audio signal that take every nth point out it in turns speeds up the audio clip at a factor of n. I want the decimated and original clips to have the same length in time.
Heres my code, analyzing and decimated piano .wav
[piano,fs]=wavread('piano.wav'); % loads piano
play=piano(:,1); % Renames the file as "play"
t = linspace(0,time,length(play)); % Time vector
x = play;
y = decimate(x,2);
stem(x(1:30)), axis([0 30 -2 2]) % Original signal
title('Original Signal')
figure
stem(y(1:30)) % Decimated signal
title('Decimated Signal')
%changes the sampling rate
fs1 = fs/2;
fs2 = fs/3;
fs3 = fs/4;
fs4 = fs*2;
fs5 = fs*3;
fs6 = fs*4;
wavwrite(y,fs,'PianoDecimation');
possible solutions: Double each of the remaining points since the new decimated clip is 2x shorter then the original.
I just want to be able to have a side by side comparison of the 2 clips.
here is the audio file: http://www.4shared.com/audio/11xvNmkd/piano.html
Although #sage's answer has a lot of good information, I think the answer to the question is as simple as changing your last line to:
wavplay(y,fs/2,'PianoDecimation')
You have removed half the samples in the file, so in order to play it back over the same time period as the original, you need to set the playback frequency to half as many samples per second.
Are you using wavplay, audioplayer, or something else to play the decimated signals? Are you explicitly specifying the sample frequencies?
The functions take the sample frequency as one of the parameters (the second parameter). You are decreasing the sample frequency as you decimate, so you need to update that parameter accordingly.
Also, when you are plotting the data, you should:
plot N times as many points on the original data (when decimating by N)
provide a corresponding x axis input - I recommend t = (1/Fs:1/Fs:maxT) where maxT is the maximum time you want to plot, which will address #1 if you use the updated Fs, which will result in larger time steps (and make sure to transpose t if it does not match your signal)
I have added an example that plays chirp and decimated chirp (this chirp is part of the standard MATLAB install). I amplified the decimated version. The tic and toc show that the elapsed time is equivalent (within variations in processor loading, etc.) - note that this also works for decim = 3, etc:
load chirp
inWav = y;
inFs = Fs;
decim = 2;
outWav = decimate(inWav,decim);
outFs = inFs/decim;
tic, wavplay(inWav,inFs),toc
pause(0.2)
tic,wavplay(outWav*decim^2,outFs),toc
The function 'decimate' really messes up the chirp sound (the sample rate of which is not very high frequency to begin with), but perhaps you are trying to show something like this...