audio pattern matching in matlab - matlab

Can someone please give me an idea about this problem in MATLAB?
I have 4 .wav files that contain the chirping of birds. Each .wav file represents a different bird. Given an input .wav file, I need to decide which bird it is. I know I have to compare frequency spectra to get to the solution, but I don't quite know how I should use spectrogram to help me get there.
P.S. I know what spectrogram does and have plotted quite a few .wav files with it, though.

There are several methods for pattern recognition problems like the one you describe.
You can use a frequency analysis such as the FFT with the MATLAB function
S = SPECTROGRAM(X,WINDOW,NOVERLAP)
In SPECTROGRAM you define the time window applied to each segment of the signal in the variable WINDOW. You can use a rectangular window (for example WINDOW = [1 1 1 1 1 1 1 ... 1]) with the number of values equal to the desired segment length. There are many other windows to choose from: Hanning, Hamming, Blackman. You should use the one that best suits your problem. NOVERLAP is the number of samples shared by adjacent windows, so the window moves by length(WINDOW) - NOVERLAP samples in each step.
Besides this approach, the wavelet transform is also a good technique for your problem. Matlab also has a good toolbox for applying discrete and continuous wavelets.
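As a rough illustration of the SPECTROGRAM call described above, here is a minimal sketch. The synthetic chirp, the sample rate, and the window/overlap sizes are all assumptions chosen for the demo, not values from the question; hamming, chirp and spectrogram require the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for a bird recording: a 1 s chirp sweeping 2 kHz -> 4 kHz.
fs = 8000;                          % sample rate in Hz (assumed)
t  = (0:fs-1)'/fs;                  % 1 second of time samples
x  = chirp(t, 2000, 1, 4000);       % test signal (assumed, not real data)

win      = hamming(256);            % 32 ms analysis window
noverlap = 128;                     % samples shared by adjacent windows (50%)
nfft     = 256;                     % FFT length per segment

[S, F, T] = spectrogram(x, win, noverlap, nfft, fs);

hop     = length(win) - noverlap;   % window advance per step, in samples
numCols = floor((length(x) - noverlap) / hop);   % expected number of time slices
P       = abs(S).^2;                % power per (frequency, time) cell
```

Each column of P is the spectrum of one time slice, so two recordings could be compared slice by slice or after averaging P over time.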

You might try to solve the problem with Deep Belief Networks
Here are some articles that might be helpful:
Audio Feature Extraction with Restricted Boltzmann Machines
Unsupervised feature learning for audio classification
To summarize the idea: instead of manually tuning the features, employ RBMs or an autoencoder to extract features (bases) that represent the observed audio samples, and then run a learning algorithm on them.
You will need more than 4 audio samples to train a DBN, but it is worth trying, as the approach has shown promising results in the past.
This tutorial might also be helpful.

This may prove to be a complicated problem. As a starting point, I advise you to divide each recording into fixed-length frames, e.g. 20 ms with 10 ms overlap, then compute the FFT of each frame and keep the frequencies with the highest energy in each frame. As a last step, compare the per-frame frequencies of the input with those of each reference recording and pick the reference with the maximum correlation.
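The framing step above could be sketched roughly like this, using only base MATLAB. The test tone, the sample rate, and the choice of a Hann window are assumptions for the demo; only the single strongest frequency per frame is kept.

```matlab
fs = 16000;                               % sample rate in Hz (assumed)
t  = (0:fs/2-1)'/fs;                      % 0.5 s of samples
x  = sin(2*pi*3000*t);                    % stand-in "bird": steady 3 kHz tone

frameLen = round(0.020*fs);               % 20 ms -> 320 samples
hop      = round(0.010*fs);               % 10 ms step, i.e. 10 ms overlap
nFrames  = floor((length(x) - frameLen)/hop) + 1;
w        = 0.5 - 0.5*cos(2*pi*(0:frameLen-1)'/frameLen);  % Hann window

peakHz = zeros(nFrames, 1);
for k = 1:nFrames
    seg = x((k-1)*hop + (1:frameLen)) .* w;      % windowed frame
    X   = abs(fft(seg));                         % magnitude spectrum
    [~, idx]  = max(X(1:floor(frameLen/2)+1));   % positive frequencies only
    peakHz(k) = (idx-1)*fs/frameLen;             % bin index -> Hz
end
```

The resulting peakHz track for the input file could then be compared against the track of each reference file, for example with corrcoef after trimming both to the same length.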

Related

Suggestions for defining limits of frequencies in Matlab Spectrogram function?

I am new to Matlab and signal processing. I am having an issue with defining the frequency range in which the spectrogram is processed. When I am plotting the spectrogram of .wav audio data, the y axis, frequency, spans from zero to around 23 kHz. The useful data I am looking for is in the range of 200-400 Hz. My code snippet is:
[samFa, fs] = audioread('samFa.wav'); %convert audio to numerical data
samFa = samFa(:,1); %take only one channel of numerical output
spectrogram(samFa,2205,1200,12800,fs,'yaxis','MinThreshold',-80);
I don't want to be some noobie that runs into a problem and instantly gives up and posts a duplicate question to stackoverflow, so I have done as much digging as I can, but am at my wit's end.
I scoured the documentation for parameters or ways to have Matlab only analyze a subset or range of the data, but found nothing. Additionally, in all of the examples the frequency range seems to automatically adapt to the data set.
I know it is possible to just calculate the spectrogram for the entire range of frequencies, and then remove all of the unnecessary data through truncating or manually changing the limits in the plot itself, but changing plotting limits does not help with the numerical data.
I went searching through many similar questions, and found an answer all the way from 2012 here: Can I adjust spectogram frequency axes?
where the suggested answer was to import a vector of specific frequencies for the spectrogram to analyze. I tried passing a vector of integer values between 200 and 400, and a few other test ranges, but got the error:
Error using welchparse>welch_options (line 297)
The sampling frequency must be a scalar.
I've tried passing the parameter in at different places in the function, to no avail, and don't see anything regarding this parameter in the documentation, leading me to believe that this functionality was possibly removed sometime between 2012 and now.
When I plot the spectrogram without providing the sampling frequency, Matlab produces a normalized spectrogram with a much smaller data window, which I can visually assess to cover 0 to 5 kHz (an artifact of overtones in the audio), so I know that Matlab is not finding any data above this range that would make the frequency range go to 20 kHz.
I've been trying to learn some signal processing for this project, and as I understand it the Nyquist frequency, half the sampling frequency, is the maximum frequency that a Fourier transform can analyze. My recording is sampled at 44,100 Hz, and the spectrogram ranges up to around 22 or 23 kHz, leading me to believe that Matlab is noticing my sampling frequency and assuming that it needs to analyze up to such a high range.
For my work I need to produce thousands of spectrograms that then go through much further analysis, so it is very time-consuming for Matlab to process so much unnecessary data, and I would expect Matlab to offer some way around this.
Sorry for the very long post, but I wanted to fully explain my problem and show that I have done as much work as I could to solve the problem before turning for help. Thank you very much.
Get the axis handle and set the visual range there:
spectrogram(samFa,2205,1200,12800,fs,'yaxis','MinThreshold',-80);
ax=gca;
ylim(ax, [0.2,0.4]); %kHz
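And if you want to calculate only a specific range of frequencies to save time, you are better off using goertzel.
f = 200:10:400;                        % target frequencies in Hz
N = length(data);                      % data is the mono signal vector; N is the DFT length
freq_indices = round(f/fs*N) + 1;      % nearest 1-based DFT bin for each frequency
dft_data = goertzel(data,freq_indices);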
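For what it's worth, the documented spectrogram syntax spectrogram(x,window,noverlap,f,fs) accepts a vector of frequencies f in the position where nfft normally goes, and fs must remain a scalar. The error message quoted in the question would be consistent with the vector having landed in the fs position, though that is a guess. A minimal sketch, with a synthetic 300 Hz tone standing in for the real recording:

```matlab
fs = 44100;                         % sample rate from the question
t  = (0:fs-1)'/fs;                  % 1 s synthetic stand-in for the recording
x  = sin(2*pi*300*t);               % 300 Hz tone inside the band of interest

f  = 200:10:400;                    % frequencies (Hz) to evaluate
[S, F, T] = spectrogram(x, 2205, 1200, f, fs);   % f takes the place of nfft

[~, iMax]   = max(mean(abs(S), 2)); % strongest row, averaged over time
strongestHz = F(iMax);              % frequency of that row in Hz
```

S then has one row per entry of f, so the numerical data only covers 200 to 400 Hz.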

Modifying Sound Input to Determine Frequency

I'm working on a project and I've hit a snag that is past my understanding. My goal is to create an artificial neural network which is fed information from a sound file which is then ported through the system, resulting in a labeling of the chord. I'm hoping to make this to help in music transcription -- not to actually do the transcription itself, but to help in the harmonization aspect. I digress.
I've read as much as I can on the Goertzel and the FFT function, but I'm unsure if these functions are what I'm looking for. I'm not looking for any particular frequency in the sound sample, but rather, I'm hoping to find the higher, middle, and low range frequencies of the sample.
I know the Goertzel algorithm returns a high number if a particular frequency is found, but it seems computationally wasteful to run the algorithm for all possible tones in a given sample. Any ideas on what to use?
Or, if this is impossible, I'd love to know that too before spending too much time on this one project.
Thank you for your time!
Probably better suited to DSP StackExchange.
Suppose you take the FFT of a single 110 Hz note played on a real instrument to get its spectrum; you'll see evenly spaced peaks at 110, 220, 330 Hz, etc.: the harmonics. 110 Hz is the fundamental.
Suppose you have 3 tones. Already it's going to look quite messy in the frequency domain, especially if you have a chord containing e.g. A110 and A220, whose harmonics overlap.
On account of this, I think a neural network is a good approach.
Feed in FFT output.
It would be a good idea to use a neural network that accepts complex-valued inputs, as the FFT outputs a complex number for each frequency bin.
http://www.eagle.tamut.edu/faculty/igor/PRESENTATIONS/IJCNN-0813_Tutorial.pdf
It may seem computationally wasteful to extract so many frequencies with FFT, but FFT algorithms are extremely efficient nowadays. You should probably use an FFT length of 2^10 = 1024 samples, which gives 2^9 = 512 complex bins.
FFT is the right solution. Basically, when you have the FFT of an input signal that consists only of sine waves, you can determine the chord by mapping the frequencies that are present to specific tones in whichever musical temperament you want to use, and then looking up the chord specified by those tones. If you don't have sine waves as input, then using a neural network is a valid way to attack the problem, provided that you have enough samples to train it.
FFT is the right way. Harmonics at 2, 4, 8, ... times the fundamental frequency are just higher octaves of the same note, and to recognize a chord, transpositions of notes over whole octaves don't matter. The other harmonics (3, 5, ... times the fundamental) do land on different notes, but they are usually weaker than the fundamental.
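The mapping from detected peak frequencies to tones mentioned above can be sketched like this. It assumes twelve-tone equal temperament with A4 = 440 Hz, and the peak frequencies are made-up example values rather than the output of a real FFT.

```matlab
peaksHz = [110 138.59 164.81 220];        % example peaks: A2, C#3, E3, A3 (assumed)
names   = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};

midi       = round(69 + 12*log2(peaksHz/440));   % nearest MIDI note (A4 = 69)
pitchClass = mod(midi, 12) + 1;                  % 1..12 index into names
notes      = names(pitchClass);                  % note name of every peak
chordTones = names(unique(pitchClass));          % octave duplicates collapse
```

Here the peaks at 110 Hz and 220 Hz both map to A, so the chord tones come out as C#, E and A (an A major triad), which could then be looked up in a chord table.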

3D SIFT for human activity classification in videos. NOT GETTING GOOD ACCURACY.

I am trying to classify human activities in videos (six classes and almost 100 videos per class, 6*100 = 600 videos). I am using 3D SIFT (both xy and t scale = 1) from UCF.
for f= 1:20
    f
    offset = 0;
    c=strcat('running',num2str(f),'.mat');
    load(c)
    pix=video3Dm;
    % Generate descriptors at locations given by subs matrix
    for i=1:100
        reRun = 1;
        while reRun == 1
            loc = subs(i+offset,:);
            fprintf(1,'Calculating keypoint at location (%d, %d, %d)\n',loc);
            % Create a 3DSIFT descriptor at the given location
            [keys{i} reRun] = Create_Descriptor(pix,1,1,loc(1),loc(2),loc(3));
            if reRun == 1
                offset = offset + 1;
            end
        end
    end
    fprintf(1,'\nFinished...\n%d points thrown out do to poor descriptive ability.\n',offset);
    for t1=1:20
        des(t1+((f-1)*100),:)=keys{1,t1}.ivec;
    end
    f
end
My approach is to first get 50 descriptors (of dimension 640) for each video, and then perform bag of words with all descriptors (50*600 = 30000 descriptors). After performing k-means (with k = 1000)
idx1000=kmeans(double(total_des),1000);
I get an index vector of length 30k. Then I create a histogram signature for each video based on the cluster indices of its descriptors. Then I run svmtrain (SVM in Matlab) on the signatures (dimension 600*1000).
Some potential problems-
1- I generate 300 random points in 3D and calculate 50 descriptors at any 50 of those 300 points.
2- xy, and time scale values, by default they are "1".
3-Cluster numbers, I am not sure that k=1000 is enough for 30000x640 data.
4-svmtrain, I am using this matlab library.
NOTE: Everything is on MATLAB.
Your basic setup seems correct, especially given that you are getting 85-95% accuracy. Now it's just a matter of tuning your procedure. Unfortunately, there is no way to do this other than testing a variety of parameters, examining the results, and repeating. I am going to break this answer into two parts: advice about bag of words features, and advice about SVM classifiers.
Tuning Bag of Words Features
You are using 50 3D SIFT Features per video from randomly selected points with a vocabulary of 1000 visual words. As you've already mentioned, the size of the vocabulary is one parameter you can adjust. So is the number of descriptors per video.
Let's say that each video is 60 frames long, (at 30 fps only 2 sec, but let's assume you are sampling at 1fps for a 1 minute video). That means you are capturing less than one descriptor per frame. That seems very low to me even with 3D descriptors especially if the locations are randomly chosen.
I would manually examine the points for which you are generating features. Do they appear to be well distributed in both space and time? Are you capturing too much background? Ask yourself, would I be able to distinguish between actions given these features?
If you find that many of the selected points are uninformative, increasing the number of points may help. The kmeans clustering can make a few groups for uninformative outliers, and more points means you hopefully capture a few more informative points. You can also try other methods for selecting points. For example, you could use corner points.
You can also manually examine the points that are clustered together. What sorts of structures do the groups have in common? Are the clusters too mixed? That's usually a sign that you need a larger vocabulary.
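When experimenting with the vocabulary size and the number of descriptors per video, it helps to have the histogram step in a compact form. The following is only a sketch with small stand-in sizes; idx plays the role of the kmeans output and videoId is a hypothetical vector recording which video each descriptor came from.

```matlab
nVideos = 6;  perVideo = 50;  k = 10;            % small stand-in sizes (assumed)
rng(0);                                          % reproducible demo data
idx     = randi(k, nVideos*perVideo, 1);         % stand-in for kmeans cluster indices
videoId = repelem((1:nVideos)', perVideo);       % source video of each descriptor

H = accumarray([videoId idx], 1, [nVideos k]);   % counts per (video, visual word)
H = bsxfun(@rdivide, H, sum(H, 2));              % L1-normalise each signature
```

Normalising each row makes signatures comparable when the number of descriptors per video changes.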
Tuning SVMs
Using the Matlab SVM implementation or the Libsvm implementation should not make a difference. They are both the same method and have similar tuning options.
First off, you should really be using cross-validation to tune the SVM to avoid overfitting on your test set.
The most powerful parameter for the SVM is the kernel choice. In Matlab, there are five built in kernel options, and you can also define your own. The kernels also have parameters of their own. For example, the gaussian kernel has a scaling factor, sigma. Typically, you start off with a simple kernel and compare to more complex kernels. For example, start with linear, then test quadratic, cubic and gaussian. To compare, you can simply look at your mean cross-validation accuracy.
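As a rough sketch of that kernel comparison: the example below uses fitcecoc with templateSVM from the Statistics and Machine Learning Toolbox instead of svmtrain, since it handles more than two classes directly. The random data, the class count, and the choice of 5 folds are assumptions for the demo only.

```matlab
rng(1);                                         % reproducible demo data
nPerClass = 30;  nClasses = 3;  nDims = 20;     % stand-in sizes (assumed)
X = [];  y = [];
for c = 1:nClasses
    X = [X; randn(nPerClass, nDims) + 3*c];     % one shifted cloud per class
    y = [y; c*ones(nPerClass, 1)];              % class labels 1..nClasses
end

kernels = {'linear', 'polynomial', 'gaussian'};
cvAcc   = zeros(size(kernels));
for i = 1:numel(kernels)
    t   = templateSVM('KernelFunction', kernels{i}, ...
                      'KernelScale', 'auto', 'Standardize', true);
    mdl = fitcecoc(X, y, 'Learners', t);        % multi-class SVM via binary learners
    cv  = crossval(mdl, 'KFold', 5);            % 5-fold cross-validation
    cvAcc(i) = 1 - kfoldLoss(cv);               % mean cross-validation accuracy
end
```

Replacing X and y with the histogram signatures and activity labels would give one accuracy per kernel to compare.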
At this point, the last option is to look at individual instances that are misclassified and try to identify reasons that they may be more difficult than others. Are there commonalities such as occlusion? Also look directly at the visual words that were selected for these instances. You may find something you overlooked when you were tuning your features.
Good luck!

MATLAB, averaging multiple fft's ,coherent integration

I have an audio recording.
I want to detect a sinusoidal pattern in it.
If I do a regular FFT, the result has a bad SNR.
For example, my signal contains 4 high frequencies.
[Figure: signal]
[Figure: FFT result]
To reduce the noise I want to do coherent integration as described in this article: http://flylib.com/books/en/2.729.1.109/1/
but I can't find any MATLAB examples of how to do it. Sorry for my bad English. Please help.
I look at spectra almost every day, but I never heard of 'coherent integration' as a method to calculate one. As also mentioned by Jason, coherent integration would only work when your signal has a fixed phase during every FFT you average over.
It is more likely that you want to do what the article calls 'incoherent integration'. This is more commonly known as calculating a periodogram (or Welch's method, a slightly better variant), in which you average the squared absolute value of the individual FFTs to obtain a power spectral density. To calculate a PSD in the correct way, you need to pay attention to some details, like applying a suitable window before doing each FFT, doing the proper normalization (so that the result is properly calibrated in e.g. Volt^2/Hz) and using half-overlapping windows to make use of all your data. All of this is implemented in Matlab's pwelch function, which is part of the signal-processing toolbox. See my answer to a similar question about how to use pwelch.
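A minimal sketch of such a pwelch call, assuming a synthetic signal (a weak 1 kHz tone buried in white noise) in place of the real recording; the window length and sample rate are arbitrary demo values.

```matlab
fs = 8000;                                     % sample rate in Hz (assumed)
rng(0);                                        % reproducible noise
t  = (0:4*fs-1)'/fs;                           % 4 s of samples
x  = 0.2*sin(2*pi*1000*t) + randn(size(t));    % weak 1 kHz tone in noise

win      = hann(1024);                         % window applied to each segment
noverlap = 512;                                % half-overlapping segments
nfft     = 1024;                               % FFT length per segment
[pxx, f] = pwelch(x, win, noverlap, nfft, fs); % one-sided PSD in units^2/Hz

[~, iMax] = max(pxx);                          % strongest PSD bin
peakHz    = f(iMax);                           % its frequency in Hz
```

Plotting 10*log10(pxx) against f shows the tone standing out of a much smoother noise floor than a single FFT would give.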
Integration or averaging of FFT frames just amounts to adding the frames up element-wise and dividing by the number of frames. Since MATLAB provides vector operations, you can just add the frames with the + operator.
coh_avg = (frame1 + frame2 + ...) / Nframes
Where frameX are the complex FFT output frames.
If you want to do non-coherent averaging, you just need to take the magnitude of the complex elements before adding the frames together.
noncoh_avg = (abs(frame1) + abs(frame2) + ...) / Nframes
Also note that in order for coherent averaging to work the best, the starting phase of the signal of interest needs to be the same for each FFT frame. Otherwise, the FFT bin with the signal may add in such a way that the amplitudes cancel out. This is usually a tough requirement to ensure without some knowledge of the signal or some external triggering so it is more common to use non-coherent averaging.
Non-coherent integration will not reduce the average noise power, but it will reduce the fluctuation of the noise floor from bin to bin, so the signal peaks stand out more clearly against the noise, which is probably what you really want anyway.
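To make the two kinds of averaging concrete, here is a small base-MATLAB sketch. The frame length, tone frequency and noise level are assumptions, and the tone is deliberately chosen to complete a whole number of cycles per frame so that every frame starts at the same phase, which is the condition coherent averaging depends on.

```matlab
rng(0);                                        % reproducible noise
fs = 1000;  N = 100;  nFrames = 50;            % assumed sizes for the demo
n  = (0:N-1)';
frames = zeros(N, nFrames);
for k = 1:nFrames
    % 100 Hz tone: exactly 10 cycles per frame, so each frame starts at phase 0
    frames(:, k) = 0.5*sin(2*pi*100*n/fs) + randn(N, 1);
end

X = fft(frames);                   % FFT of each column (one frame per column)
cohAvg    = abs(mean(X, 2));       % average the complex values, then take magnitude
noncohAvg = mean(abs(X), 2);       % take magnitudes first, then average

sigBin    = 100/(fs/N) + 1;        % bin 11 holds the 100 Hz tone
noiseBins = setdiff(2:N/2, sigBin);
cohFloor    = mean(cohAvg(noiseBins));     % noise floor after coherent averaging
noncohFloor = mean(noncohAvg(noiseBins));  % noise floor after non-coherent averaging
[~, cohPeak] = max(cohAvg(1:N/2));         % strongest bin in the coherent average
```

In this setup cohFloor comes out well below noncohFloor, while the tone bin keeps roughly the same height in both averages.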
I think what you are looking for is the "spectrogram" function in Matlab, which computes the short time Fourier transform(STFT) of an input signal.
STFT
Spectrogram

Beat extraction in MATLAB

I have no experience in MATLAB and unfortunately my project is in MATLAB.
Basically the objective is to read a music source (preferably in mp3 format but .wav is also OK) into MATLAB and then apply a low pass filter in such a way that it filters everything except the beats. Then it should get the time at which each beat occurs and write the results to a text file.
It's quite a bit easier to work with .wav files, I think, although Matlab may well have utilities for such things; in fact it does: Reading .wav
The easiest way to implement a low pass filter is a moving average filter.
The simplest way to do this would be to loop over the data and take an average of each group of n values. I'm not sure exactly how the cutoff frequency would depend on n, but you could experiment a bit.
Otherwise, I know that there is a signal processing toolkit for Octave and I think that Matlab has a built-in filter function: https://ccrma.stanford.edu/~jos/fp/Matlab_Filter_Implementation.html
A third way which is over the top, would be to perform an FFT and do the filtering in the frequency domain.
Once you have the low-frequency part of the signal you can check for samples that are above an amplitude threshold and output where in the data these were found.
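Putting the moving-average filter and the threshold together, a rough sketch could look like the following. The synthetic track (a quiet 2 kHz background with three 80 Hz bursts standing in for beats), the filter length n and the 30% threshold are all assumptions. As a rule of thumb, the -3 dB cutoff of an n-point moving average is roughly 0.443*fs/n, about 89 Hz here.

```matlab
fs = 8000;                                    % sample rate in Hz (assumed)
t  = (0:2*fs-1)'/fs;                          % 2 s synthetic track
x  = 0.05*sin(2*pi*2000*t);                   % high-frequency background
beatTimes = [0.5 1.0 1.5];                    % where the stand-in beats sit (s)
for b = beatTimes
    idx = round(b*fs) + (1:400);              % 50 ms burst
    x(idx) = x(idx) + sin(2*pi*80*t(1:400));  % 80 Hz "kick"
end

n   = 40;                                     % moving-average length in samples
y   = filter(ones(n,1)/n, 1, x);              % low-pass filtered signal
env = filter(ones(n,1)/n, 1, abs(y));         % smoothed amplitude envelope
above  = env > 0.3*max(env);                  % samples above the threshold
onsets = find(diff([0; above]) == 1);         % rising edges = beat starts
onsetTimes = (onsets - 1)/fs;                 % beat times in seconds
```

The values in onsetTimes could then be written to a text file, for example with fprintf or dlmwrite.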
30 seconds on google with the keywords "beat extraction matlab" yield the following two code sources:
Music Audio Tempo Estimation and Beat Tracking
Beat This A Beat Synchronization Project
In Matlab you can use a state-of-the-art multi-feature beat tracker algorithm; the details of the algorithm are published here:
J.R. Zapata, M. Davies and E. Gómez, "Multi-feature beat tracker," IEEE/ACM Transactions on Audio, Speech and Language Processing. 22(4), pp. 816-825, 2014. http://dx.doi.org/10.1109/TASLP.2014.2305252
The Matlab implementation of the multifeature beat tracker is:
https://github.com/JoseRZapata/MultiFeatureBeatTracking