HTK tool and Sampling Rate - mfcc

Hello, guys. (Is it OK that start this sentence?)
I tried to simulate with HTK tool that voice(word) recognition.
And I have *.wav files.
Some files have 16KHz sampling rate, some files have 44.1KHz sampling rate.
And I make mfcc file about each sampling rate.
But, I make HMM model using all(16KHz's and 44.1KHz's) mfcc files.
voice_16KHz.wav -> voice_1.mfcc
voice_44.1KHz.wav -> voice_2.mfcc
make hmm_model using voice_1.mfcc and voice_2.mfcc
Is it OK that make HMM model with mixing different sampling rate?
I should know surely information, not suggestion.
Thanks for reading.

I will suggest you to go for one sampling rate only. It will work but the Accuracy will change and also you need to change the configuration details in MFCC config file. Downgrade 44.1KHz files to 16KHz it's very easy.

Related

If there any way to get frequency of sound.

I want to know the frequency of data. I had a little bit idea that it can be done using FFT, but I am not sure how to do it. Once I passed the entire data to FFT, then it is giving me 2 peaks, but how can I get the frequency?
Thanks a lot in advance.
Have a look at this page for an explanation on how to calculate it:
FFT Fundamentals
Please also check this answer (it's C# code but I think you can easily understand it)
How do I obtain the frequencies of each value in an FFT?
And finally have a look at this one, it uses DFT instead of FFT:
Determining the magnitude of a certain frequency on the iPhone
I also found this implementation that you can use in Objective-C:
A lib to find the frequency https://github.com/jkells/sc_listener
A example using the above library https://github.com/jkells/sc_listener_sample
Regards
An FFT will give you the frequency of all the sinusoidal components of a signal. If instead you want just the frequency of the periodicity of common waveforms (more interesting sounding and looking that a plain sinewave) such as produced by speech and music, then you may want to use a pitch detection/estimation algorithm instead of just an FFT peak.

audio pattern matching in matlab

Can someone please give me an idea about this problem in matlab ,
I have 4 .wav files that contain the chirping of the birds . Each .wav file represents a different bird. Given an input .wav file , I need to decide which bird it is . I know I have to make frequency spectrum comparison to get to the solution . but don't quite know how i should use spectrogram to help me get there .
P.S. I know how what spectrogram does and have plotted quite a few .wav files with it though
There are several methods for patter recognition problem like the one that you are talking.
You can use a frequency analysis like FFT with the matlab function
S = SPECTROGRAM(X,WINDOW,NOVERLAP)
In SPECTROGRAM you need to define the time window of signal to be analysed in the variable WINDOW. You can use a rectangular window (example WINDOW = [1 1 1 1 1 1 1 ... 1]) with the number of values equal to the desired length. There are a lot of windows to use: hanning, hamming, blackman. You should used the one which is better to your problem. The NOVERLAP is the number of points that your windows moves in one step.
Besides this approach, wavelet transform is also a good technique to solve your problem. Matlab also have a good toolbox to apply discrete and continuous wavelets.
You might try to solve the problem with Deep Belief Networks
Here are some articles that might be helpful:
Audio Feature Extraction with Restricted Boltzmann Machines
Unsupervised feature learning for audio classification
To summarize the idea, instead of manually tune the features, employ RBMs or Autoencoder to extract features (bases) which represent the observed audio samples and then run learning algorithm.
You will need more than 4 audio samples in order to train DBN but it is worth trying as the approach has shown promising results in the past.
This tuturial might be also helpful.
this may prove to be a complicated problem. As a starting point I advise you to divide each record into some fixed length of frames like 20ms with 10ms overlap ,then extract fft of these frames and get some max energy freq. values for each frame. As a last step compare frame frequencies with each other and determine the result by selecting the maximum correlation

Notch filters and harmonic noise in matlab

So basically, my problem is that I have a speech signal in .wav format that is corrupted by a harmonic noise source at some frequency. My goal is to identify the frequency at which this noise occurs, and use a notch filter to remove said noise. So far, I have read the speech signal into matlab using:
[data, Fs] = wavread('signal.wav');
My question is how can I identify the frequency at which the harmonic noise is occurring, and once I've done that, how can I go about implementing a notch filter at that frequency?
NOTE: I do not have access to the iirnotch() command or fdesign.notch() due to the version of MATLAB I am currently using (2010).
The general procedure would be to analyse the spectrum, to identify the frequency in question, then design a filter around that frequency. For most real applications it's all a bit woolly: the frequencies move around and there's no easy way to distinguish noise from signal, so you have to use clever techniques and a bit of guesswork. However if you know you have a monotonic corruption then, yes, an FFT and a notch filter will probably do the trick.
You can analyse the signal with fft and design a filter with, among others, fir1, which I believe is part of the signal processing toolbox. If you don't have the signal processing toolbox you can do it 'by hand', as in transform to the frequency domain, remove the frequency(ies) you don't want (by zeroing the relevant elements of the frequency vector) and transform back to time domain. There's a tutorial on exactly that here.
The fft and fir1 functions are well documented: search the Mathworks site to get code examples to get you up and running.
To add to/ammend xenoclast's answer, filtering in the frequency domain may or may not work for you. There are many thorny issues with filtering in the frequency domain, some of which are covered here: http://blog.bjornroche.com/2012/08/why-eq-is-done-in-time-domain.html
One additional issue is that if you try to process your entire file at once, the "width" or Q of the filters will depend on the length of your file. This might work out for you, or it might not. If you have many files of different lengths, don't expect similar results this way.
To design your own IIR notch filter, you could use the RBJ audio filter cookbook. If you need help, I wrote up a tutorial here:
http://blog.bjornroche.com/2012/08/basic-audio-eqs.html
My tutorial uses bell/peaking filter, but it's easy to follow that and then replace it with a notch filter from RBJ.
One final note: assuming this is actually an audio signal in your .wav file, you can also use your ears to find and fix the problem frequencies:
Open the file in an audio editing program that lets you adjust filter settings in real-time (I am not sure if audacity lets you do this, but probably).
Use a "boost" or "parametric" filter set to a high gain and sweep the frequency setting until you hear the noise accentuated the most.
replace the boost filter with a notch filter of the same frequency. You may need to tweak the width to trade off noise elimination vs. signal preservation.
repete as needed (due to the many harmonics).
save the resulting file.
Of course, some audio editing apps have built-in harmonic noise reduction features that work especially well for 50/60 Hz noise.

Beat extraction in MATLAB

I have no experience in MATLAB and unfortunately my project is in MATLAB.
Basically the objective is to read a music source (preferably in mp3 format but .wav is also OK) into MATLAB and then apply a low pass filter in such a way that it filters everything except the beats. Then it should get the time at which each beat occurs and write the results to a text file.
It's quite a bit easier to work with .wav files I think, although Matlab way well have utilities for such things, in fact it does: Reading .wav
The easiest way to implement a low pass filter is a moving average filter.
The simplest way to do this would be be to loop over the data and take an average of each group of n values. I'm not sure exactly how the cutoff frequency would depend on n, but you could experiment a bit.
Otherwise, I know that there is a signal processing toolkit for Octave and I think that Matlab has a built-in filter function: https://ccrma.stanford.edu/~jos/fp/Matlab_Filter_Implementation.html
A third way which is over the top, would be to perform an FFT and do the filtering in the frequency domain.
Once you have the low-frequency part of the signal you can check for samples that are above an amplitude threshold and output where in the data these were found.
30 seconds on google with the keywords "beat extraction matlab" yield the following two code sources:
Music Audio Tempo Estimation and Beat Tracking
Beat This A Beat Synchronization Project
In Matlab you can use and state of the art Multi Feature beat tracker algorithm, the information of the algorithm is publish here:
J.R. Zapata, M. Davies and E. Gómez, "Multi-feature beat tracker," IEEE/ACM Transactions on Audio, Speech and Language Processing. 22(4), pp. 816-825, 2014. http://dx.doi.org/10.1109/TASLP.2014.2305252
The Matlab implementation of the multifeature beat tracker is:
https://github.com/JoseRZapata/MultiFeatureBeatTracking

Getting frequency of sound on iPhone

I'm looking for an Objective-C class that allows me to get the frequency of a live input sound on the iPhone. Didn't find anything useful.
Before you ask: the frequency will not change for 0.1 seconds.
Thanks for answers,
Christian
Following is the library used to fid the frequency..
A lib to find the frequency https://github.com/jkells/sc_listener
A example using the above library https://github.com/jkells/sc_listener_sample
Hope it helps some once..
Do you mean frequency or pitch ? If it's just a pure sinusoid and you want the frequency then you can just measure the time between zero crossings (crude) or use an FFT to get the power spectrum and then find the peak (more complex, more reliable).