Speech Recognition using LPC and ANN in Matlab - matlab

I have audio records of 4 phonemes (a, e, o, u) from 11 people. I trained an ANN using the data from 10 people, and used the other set for testing. I used 14 LPC coefficients of the first period (20ms) of records as features.
The training matrix I has 14 rows and 10 columns for each phoneme. So it is 14*40. Since it is a supervised classification problem, I constructed a target matrix T which is 4*40. It contains ones and zeros where a 1 indicates that the corresponding column in I is from that class.
The test data matrix contains four columns and 14 rows as it contains 4 phonemes from only one person. Let us call it S.
Here is the code:
net = newff(I, T, 15);
net = init(net);
net.trainParam.epochs = 10000;
net.trainParam.goal = 0.01;
net = train(net, I, T);
y1 = sim(net, I);
y2 = sim(net, S)
The results are not good even I give the training data as test data (y1).
What is wrong here?

I used 14 LPC coefficients of the first period (20ms) of records as features.
So did you ignore almost all the sound data except first 20ms? It doesn't sound right. You must have calculate an average over all frames at least.
What is wrong here?
You started coding without understanding a theory. Probably you want to read some introduction first. At least this and ideally this
To understand why ANN doesn't work calculate how many parameters are required to map 10 features to 4 classes, then calculate how many training vectors do you have for every parameter. Take into account that for every parameter you need at least 10 samples for initial estimation. That means your training data is not enough.

Related

Distance Calculations for Nearest Mean Classifer

Greetins,
How can I calculate how many distance calculations would need to be performed to classify the IRIS dataset using Nearest Mean Classifier.
I know that IRIS dataset has 4 features and every record is classified according to 3 different labels.
According to some textbooks, the calculation can be carried out as follow:
However, I am lost on these different notations and what does this equation mean. For example, what is s^2 is in the equation?
The notation is standard with most machine learning textbooks. s in this case is the sample standard deviation for the training set. It is quite common to assume that each class has the same standard deviation, which is why every class is assigned the same value.
However you shouldn't be paying attention to that. The most important point is when the priors are equal. This is a fair assumption which means that you expect that the distribution of each class in your dataset are roughly equal. By doing this, the classifier simply boils down to finding the smallest distance from a training sample x to each of the other classes represented by their mean vectors.
How you'd compute this is quite simple. In your training set, you have a set of training examples with each example belonging to a particular class. For the case of the iris dataset, you have three classes. You find the mean feature vector for each class, which would be stored as m1, m2 and m3 respectively. After, to classify a new feature vector, simply find the smallest distance from this vector to each of the mean vectors. Whichever one has the smallest distance is the class you'd assign.
Since you chose MATLAB as the language, allow me to demonstrate with the actual iris dataset.
load fisheriris; % Load iris dataset
[~,~,id] = unique(species); % Assign for each example a unique ID
means = zeros(3, 4); % Store the mean vectors for each class
for i = 1 : 3 % Find the mean vectors per class
means(i,:) = mean(meas(id == i, :), 1); % Find the mean vector for class 1
end
x = meas(10, :); % Choose a random row from the dataset
% Determine which class has the smallest distance and thus figure out the class
[~,c] = min(sum(bsxfun(#minus, x, means).^2, 2));
The code is fairly straight forward. Load in the dataset and since the labels are in a cell array, it's handy to create a new set of labels that are enumerated as 1, 2 and 3 so that it's easy to isolate out the training examples per class and compute their mean vectors. That's what's happening in the for loop. Once that's done, I choose a random data point from the training set then compute the distance from this point to each of the mean vectors. We choose the class that gives us the smallest distance.
If you wanted to do this for the entire dataset, you can but that will require some permutation of the dimensions to do so.
data = permute(meas, [1 3 2]);
means_p = permute(means, [3 1 2]);
P = sum(bsxfun(#minus, data, means_p).^2, 3);
[~,c] = min(P, [], 2);
data and means_p are the transformed features and mean vectors in a way that is a 3D matrix with a singleton dimension. The third line of code computes the distances vectorized so that it finally generates a 2D matrix with each row i calculating the distance from the training example i to each of the mean vectors. We finally find the class with the smallest distance for each example.
To get a sense of the accuracy, we can simply compute the fraction of the total number of times we classified correctly:
>> sum(c == id) / numel(id)
ans =
0.9267
With this simple nearest mean classifier, we have an accuracy of 92.67%... not bad, but you can do better. Finally, to answer your question, you would need K * d distance calculations, with K being the number of examples and d being the number of classes. You can clearly see that this is required by examining the logic and code above.

Number of parameters in GMM-HMM

I want to understand the use of Gaussian Mixture Models in Hidden Markov Models.
Suppose, we have speech data and we are recognizing 5 speech sounds (which are states of HMM). For example 'X' be the speech sample and O = (s,u,h,b,a) (considering characters instead of phones just for simplicity) be HMM states. Now, we use gaussian mixture model of 3 mixtures to estimate gaussian density for every state using the following equation (sorry cannot upload image because of reputation points).
P(X|O) = sum (i=1->3) w(i) * P (X|mu(i), var(i)) (considering univariate distribution)
So, we first learn the GMM parameters from the training data using EM algorithm.
Then use these parameters for learning HMM parameters and once this is done, we use both of them on test data.
In all we are learning 3 * 3 * 5 (weight, mean and variance for 3 mixtures and 5 states) parameters for GMM in this example.
Is my understanding correct?
Your understanding is mostly correct, however, the number of parameters is usually larger. The mean and variance are vectors, not numbers. Variance could be matrix for rare case of full covariance GMM. Each vector usually contains 39 components for 13 cepstrum + 13 deltas + 13 delta-deltas.
So for every phone you learn
39 + 39 + 1 = 79 parameters
Total number of parameters is
79 * 5 = 395
And, usually phone is composed of 3 or so states, not from a single state. So you have 395 * 3 or 1185 parameters just for GMM. Then you need a transition matrix for HMM. Number of parameters is large thats why training requires a lot of data.

Feedforward neural network classification in Matlab

I have two gaussian distribution samples, one guassian contains 10,000 samples and the other gaussian also contains 10,000 samples, I would like to train a feed-forward neural network with these samples but I dont know how many samples I have to take in order to get an optimal decision boundary.
Here is the code but I dont know exactly the solution and the output are weirds.
x1 = -49:1:50;
x2 = -49:1:50;
[X1, X2] = meshgrid(x1, x2);
Gaussian1 = mvnpdf([X1(:) X2(:)], mean1, var1);// for class A
Gaussian2 = mvnpdf([X1(:) X2(:)], mean2, var2);// for Class B
net = feedforwardnet(10);
G1 = reshape(Gaussian1, 10000,1);
G2 = reshape(Gaussian2, 10000,1);
input = [G1, G2];
output = [0, 1];
net = train(net, input, output);
When I ran the code it give me weird results.
If the code is not correct, can someone please suggest me so that I can get a decision boundary for these two distributions.
I'm pretty sure that the input must be the Gaussian distribution (and not the x coordinates). In fact the NN has to understand the relationship between the phenomenons themselves that you are interested (the Gaussian distributions) and the output labels, and not between the space in which are contained the phenomenons and the labels. Moreover, If you choose the x coordinates, the NN will try to understand some relationship between the latter and the output labels, but the x are something of potentially constant (i.e., the input data might be even all the same, because you can have very different Gaussian distribution in the same range of the x coordinates only varying the mean and the variance). Thus the NN will end up being confused, because the same input data might have more output labels (and you don't want that this thing happens!!!).
I hope I was helpful.
P.S.: for doubt's sake I have to tell you that the NN doesn't fit very well the data if you have a small training set. Moreover don't forget to validate your data model using the cross-validation technique (a good rule of thumb is to use a 20% of your training set for the cross-validation set and another 20% of the same training set for the test set and thus to use only the remaining 60% of your training set to train your model).

KNN Classifier for simple digit recognition

Actually , i have an assignment where it is required to recognize individual decimal digits as a part of the text recognition process.I am already given a set of JPEG formatted images of some digits. Each image is of size 160 x 160 pixels.After checking some resources here i managed to write this code but :
1)I am not sure if reading the images and resizing them in matrices for holding them is right or not.
2)Supposing that i have 30 train data images for numbers [0-9] each number has three images and i have 10 images for test each image is of only one digit.How to calculate distance between every test and train in a loop ? Because in my part of code for calculating Euclidean it gives an output zero.
3)How to calculate accuracy by using confusion matrix ?
% number of train data
Train = 30;
%number of test data
Test =10;
% to store my images
tData = uint8(zeros(160,160,30));
tTest = uint8(zeros(160,160,10));
for k=1:Test
s1='im-';
s2=num2str(k);
t = strcat('testy/im-',num2str(k),'.jpg');
im=rgb2gray(imread(t));
I=imresize(im,[160 160]);
tTest(:,:,k)=I;
%case testing if it belongs to zero
for l=1:3
ss1='zero-';
ss2=num2str(l);
t1 = strcat('data/zero-',num2str(l),'.jpg');
im1=rgb2gray(imread(t1));
I1=imresize(im1,[160 160]);
tData(:,:,l)=I1;
% Euclidean distance
distance= sqrt(sum(bsxfun(#minus, tData(:,:,k), tTest(:,:,l)).^2, 2));
[d,index] = sort(distance);
%k=3
% index_close(l) = index(l:3);
%x_close = I(index_close,:);
end
end
First of all i think 10 test data is not enough.
Just use the below function, data_test is your training data() and data_label is their labels. re size your images to smaller sizes!
I think the default distance measure is Euclidean distance but you can choose other ways such as City-block method for example.
Class = knnclassify(data_test, data_train, lab_train, 11);
fprintf('11-NN Accuracy: %f\n', sum(Class == lab_test')/length(lab_test));
Class = knnclassify(data_test, data_train, lab_train, 1, 'cityblock');
fprintf('1-NN Accuracy (cityblock): %f\n', sum(Class == lab_test')/length(lab_test));
Ok now you have the overall accuracy but this is not a good measure, it's better to calculate the accuracy separately for each class and then calculate their mean.
you can calculate a specific class (id) accuracy like this,,
idLocations = (lab_test == id);
NumberOfId = sum(idLocations);
NumberOfCurrect =sum (lab_test (idLocations) == Class(idLocations));
NumberOfCurrect/NumberOfId %Class id accuracy
as your questions are:
1) image re-sizing does affects the accuracy of the whole process.
Ans: As you mentioned in your question your images are already of the size 160 by 160, imresize will not affect it, but if your image is too small in size say 60*60 it will perform interpolation to increase the spatial dimensions of the image, which may affects structure and shape of the digit, to tackle these kind of variability, your training data should have much more samples(at least 50 samples per class), and some pre-processing should be apply on data like de-skewing of the digit image.
2) euclidean distance is good measure but not the best to deal with these kind of problems, as its distribution is a spherical distribution it may give same distance for to different digits. if you are working in MATLAB beware of of variable casting, you are taking difference so both the variable should be double in nature. it may be one of the reason of wrong distance calculation. in this line distance= sqrt(sum(bsxfun(#minus, tData(:,:,k), tTest(:,:,l)).^2, 2)); you are summing matrices column wise so output of this will be a row vector(1 X 160) which have sum along each corner. i think it should be like this:distance= sqrt(sum(sum(bsxfun(#minus, tData(:,:,k), tTest(:,:,l)).^2, 2))); i have just added one more sum there for getting sum of differences for whole matrix try it whether it helps or not.
3) For checking accuracy of your classifier precisely you have to have a large training dataset,by the way, Confusion matrix created during the process of cross-validation, where you split your training data into training samples and testing samples, so you know output classes in both the sample, now perform classification process, prepare a matrix for num_classe X num_classes(in your case 10 X 10), where rows resembles actual classes and columns belongs to prediction. take a sample from test and predict output class, suppose your classifier predict 5 and sample's actual class is also 5 put +1 in the confusion_matrix(5,5); if your classifier have predicted it as 3, you should do +1 at confusion_matrix(5,3). finally add diagonal elements of the confusion_mat and divide it by the total number of the test samples. output will be accuracy of your classifier.
P.S. Try to have atleast 50 samples per class and during cross-validation divide the training data 85:10 ratio where 90% sample should be used for training and rest 10 % should be used for testing the classifier.
Hope it have helps you.
feel free to share your thoughts.
Thank You

sequence prediction using HMM Matlab

I'm currently learning the murphyk's toolbox for Hidden Markov's Model, However I'v a problem of determining my model's coefficients and also the algorithm for the sequence prediction by log likelihood.
My Scenario:
I have the flying bird's trajectory in 3D-space i.e its X,Y and Z which lies in Continuous HMM's category. I'v the 200 observations of flying bird i.e 500 rows data of trajectory, and I want to predict the sequence. I want to sample that in 20 datapoints . i.e after 10 points, so my first question is, Is following parameters are valid for my case?
O = 3; %Number of coefficients in a vector
T = 20; %Number of vectors in a sequence
nex = 50; %Number of sequences
M = 2; %Number of mixtures
Q = 20; %Number of states
And the second question is, What algorithm is appropriate for sequence prediction and is training is compulsory for that?
From what I understand, I'm assuming you're training 200 different classes (HMMs) and each class has 500 training examples (observation sequences).
O is the dimensionality of vectors, seems to be correct.
There is no need to have a fixed T, it depends on the observation sequences you have.
M is the number of multivariate Gaussians (or mixtures) in the GMM of a state. More will fit to your data better and give you better accuracy, but at the cost of performance. Choose a suitable value.
N does not need to be equal to T. For the best number of states N, you'll have to benchmark and see yourself:
Determinig the number of hidden states in a Hidden Markov Model
Yes, you have to train your classes using the Baum-Welch algorithm, optionally preceded by something like the segmental k-means procedure. After that you can easily perform isolated unit recognition using Forward/Backward probability or Viterbi probability by simply selecting the class with the highest probability.