How to feed MFCCs as input to the training algorithm?

I want to train a GMM using MFCC features.
I have 588 audio files (WAV, if that matters). After extracting the features, I get a set of 588 two-dimensional arrays (13 x ?). Each file has a different number of columns.
How do I feed these MFCCs as input to the algorithm?

You can compute the number of columns of each MFCC array and take the longest one as the reference.
Then traverse the MFCCs.
If an array has fewer columns than the reference, pad it with zeros.
After doing that, all the MFCCs have the same shape, and they can be fed to the model.
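
A minimal sketch of that zero-padding step, assuming the MFCCs are held in a Python list of NumPy arrays of shape (13, n_frames) — the array sizes below are made up for illustration:

    import numpy as np

    # Toy stand-in for the 588 MFCC arrays, each 13 x (varying number of frames).
    mfccs = [np.random.rand(13, n) for n in (120, 95, 143)]

    # Use the longest array as the reference length.
    max_len = max(m.shape[1] for m in mfccs)

    # Zero-pad every array on the right so they all end up with shape (13, max_len),
    # then stack them into a single (num_files, 13, max_len) array for the model.
    padded = np.stack([np.pad(m, ((0, 0), (0, max_len - m.shape[1]))) for m in mfccs])
    print(padded.shape)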

Related

How to train on 10-digit number sequences as input data?

My input data is a set of numbers, and an output number is predicted for each input number. The input numbers are independent of one another, and the output number is predicted based on how the digits of the input number are positioned.
This is a one-to-one mapping.
How do I do this?
Do I need neural networks, or is there a simpler way?

How to normalize FFT values for neural networks

I calculate the FFT for a given sound file and get an array of shape e.g. (100, 257), with 100 rows and 257 frequency bins. I want to use this as an input vector for a neural network, but first I want to normalize it with the librosa library
https://librosa.github.io/librosa/generated/librosa.util.normalize.html#librosa.util.normalize
So should I normalize over axis=0 or axis=1? axis=0 normalizes each column aggregated over the rows, and axis=1 normalizes every row. Or should I normalize over every value independently of rows and columns?
How you normalize the FFT depends on your application and the resulting performance; there isn't a general normalization scheme.
In one of my applications, I didn't normalize at all and fed the raw FFT to the neural network. One common way to normalize is to take the logarithm, which reduces the dynamic range.
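
For example, a rough Python sketch of the two choices mentioned here (log compression vs. librosa.util.normalize along one axis), with a made-up spectrogram standing in for the real FFT output:

    import numpy as np
    import librosa

    # Hypothetical magnitude spectrogram: 100 rows x 257 frequency bins.
    spec = np.abs(np.random.rand(100, 257))

    # Logarithmic compression to reduce the dynamic range (log1p avoids log(0)).
    log_spec = np.log1p(spec)

    # librosa.util.normalize: axis=0 scales each column (frequency bin),
    # axis=1 scales each row; which one helps is application-dependent.
    col_norm = librosa.util.normalize(spec, axis=0)
    row_norm = librosa.util.normalize(spec, axis=1)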

K-means clustering

I want to use K-means clustering on my features, which are of size 286 x 276, so that I can do clustering before using an SVM. The features come from 16 different gestures. I am using the MATLAB function IDX = kmeans(Feat_train, 16). In the IDX variable I get a vector of size 286x1 containing numbers between 1 and 16. I do not understand what those numbers mean, or what I have to do next to give input to the SVM for training.
The way you invoked kmeans in MATLAB with your 286-by-276 feature matrix, kmeans assumes you have 286 1D vectors in a 276-dimensional space. kmeans then tries to find k=16 centers that best represent your 286 high-dimensional points.
Finally, it gives back IDX: an index per point telling you to which of the 16 centers that point belongs.
It is now up to you to decide how to feed this information into the SVM machinery.
In short, the number shows which cluster each 1x276 "point" belongs to.
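
A rough Python/scikit-learn equivalent of that call, just to illustrate what IDX contains (the data here is random, and scikit-learn's labels run 0-15 where MATLAB's run 1-16):

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for the 286-by-276 feature matrix: rows are samples, columns are features.
    feat_train = np.random.rand(286, 276)

    # k-means with 16 clusters: every row is one point in 276-dimensional space.
    idx = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(feat_train)

    print(idx.shape)   # (286,) - one cluster index per sample
    print(idx[:10])    # labels in 0..15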

How should the input matrix be arranged when "The rows of X correspond to observations, and columns correspond to variables"?

I'm not getting the correct results from the MATLAB function, so maybe my data arrangement is wrong. I looked at the help file of the function I am using, and the input "X" that it takes must be in the form:
The rows of X correspond to observations, and columns correspond to
variables.
I am sorry if this is very basic, but how exactly should my input matrix be arranged?
I have 5 writers, each with a feature vector of length 18 (for example, for the sake of simplicity).
So I assumed that by observations it means the different features of the same writer and variables means the writers, so I arranged the input matrix as [18 x 5], where each column is a writer.
This example is simple. What about the case of SIFT features, where each writer produces a feature matrix [128 x num. of keypoints], which usually becomes [128 x 70] for one image? If I concatenate all of them, my input matrix becomes [128 x 350].
Would this just be the input matrix X? Then in the case of SIFT, each variable is 70 columns wide.
Thank you in advance.
If your writers' data have different sizes, I suggest you use cell(), which creates a cell array. http://www.mathworks.com/help/matlab/cell-arrays.html - here is your reference. So, for example, if you need to calculate a covariance you can do it for each matrix separately. Your covariance matrices will then all be the same size (128 x 128), so you can put them together and have your data as a 3D matrix.
Hope it helps.
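
A small Python sketch of the same idea, assuming each writer's descriptors sit in a list (the Python counterpart of a cell array) with a different number of keypoints per writer; the sizes are made up:

    import numpy as np

    # One 128 x n_keypoints matrix per writer, with varying n_keypoints.
    writer_feats = [np.random.rand(128, n) for n in (70, 65, 80, 72, 68)]

    # Covariance per writer: rows (descriptor dimensions) are the variables,
    # columns (keypoints) are the observations, so each result is 128 x 128.
    covs = [np.cov(feats) for feats in writer_feats]

    # All covariances now share a shape, so they stack into a 3D array.
    cov_stack = np.stack(covs)
    print(cov_stack.shape)   # (5, 128, 128)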

Making feature vector from Gabor filters for classification

My aim is to classify types of cars (sedans, SUVs, hatchbacks). Earlier I was using corner features for classification, but that didn't work out very well, so now I am trying Gabor features.
code from here
Now the features are extracted, and suppose I give an image as input; then for 5 scales and 8 orientations I get two [1x40] matrices:
1. 40 columns of squared energy.
2. 40 columns of mean amplitude.
The problem is that I want to use these two matrices for classification, and I have about 230 images of 3 classes (SUV, sedan, hatchback).
I do not know how to create an [N x 230] matrix which can be taken as vInputs by the neural network in MATLAB (where N is the total number of features for one image).
My questions:
How do I create a one-dimensional image vector from the two [1x40] matrices for one image? (Should I append the mean amplitude to the squared-energy matrix to get a [1x80] matrix, or something else?)
Should I be using these Gabor features for my classification purpose in the first place? If not, then what?
Thanks in advance.
In general, there is nothing to think about - a simple neural network requires a one-dimensional feature vector and does not care about the ordering, so you can simply concatenate any number of feature vectors into one (and even do it in random order - it does not matter). In particular, if you have feature matrices of the same shape, you can also concatenate their rows to create a vectorized format.
The only exception is when your data actually has some underlying geometric dependencies, for example when the matrix is actually a matrix of pixels. In such cases, architectures like PyraNet, convolutional neural networks and others, which apply receptive fields based on this 2D structure, should do better. But those implementations simply accept a 2D feature array as input.
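
As a concrete illustration of the concatenation for the Gabor case above, here is a NumPy sketch (random numbers stand in for the real squared-energy and mean-amplitude outputs, and the [80 x 230] layout follows the convention of one column per image):

    import numpy as np

    def image_features():
        # Hypothetical Gabor outputs for one image: 5 scales x 8 orientations = 40 values each.
        squared_energy = np.random.rand(1, 40)
        mean_amplitude = np.random.rand(1, 40)
        # Concatenate the two descriptors into a single 1 x 80 vector.
        return np.hstack([squared_energy, mean_amplitude])

    # Stack all 230 images so that each column is one image: shape (80, 230).
    v_inputs = np.vstack([image_features() for _ in range(230)]).T
    print(v_inputs.shape)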