Lip reading classification on the LiLiR dataset in MATLAB

I work on lip reading, but I am a newbie.
After googling, I found that one of the lip-reading datasets is the LiLiR dataset. I have downloaded it and want to classify the samples using a support vector machine (SVM). But each letter has a data matrix with 4800 rows and 21 to 28 columns. I do not know what the columns mean. They are features, but which features?
A1_Faye_lips = load('\data set\avletters\avletters\Lips\A1_Faye-lips.mat')
A1_Faye_lips =
vid: [4800x21 double]
siz: [60 80 21]
How can I train an SVM using this 2D matrix?

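I didn't look into the data source, so I cannot say exactly what the columns are. Judging from siz = [60 80 21], though, 4800 = 60*80, so each column is most likely one 60x80 lip image flattened into a 4800x1 vector, and the 21 to 28 columns would be video frames, whose number varies with the length of each utterance. You should be able to view column j as an image with reshape(vid(:,j), 60, 80). Either way, a standard SVM needs one fixed-length feature vector per sample, so the varying number of columns has to be handled before training.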
For training, LIBSVM is a good SVM toolbox.
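A minimal sketch of one way to get a fixed-length vector per utterance, assuming that each column of vid is one flattened frame (suggested by siz = [60 80 21], but not confirmed by the dataset documentation). It is written in plain Python rather than MATLAB so it runs without any toolbox; the function name to_fixed_length and the nearest-frame resampling scheme are illustrative choices, not part of LIBSVM or the dataset. Each utterance is resampled to the same number of frames and the chosen frames are concatenated, so every sample ends up the same length.

```python
def to_fixed_length(vid, num_frames):
    """Turn one utterance matrix into a fixed-length feature vector.

    vid is given as a list of rows (pixels x frames), like the MATLAB matrix.
    Assumption: each column is one flattened lip image, so the number of
    columns varies with the utterance length.
    """
    n_cols = len(vid[0])
    if num_frames == 1:
        picks = [n_cols // 2]  # just the middle frame
    else:
        # Evenly spaced frame indices from the first frame to the last.
        picks = [round(i * (n_cols - 1) / (num_frames - 1)) for i in range(num_frames)]
    features = []
    for j in picks:
        features.extend(row[j] for row in vid)  # column j = one frame
    return features


# Toy "utterances": 3 pixels each, with 5 and 7 frames respectively.
short = [[c + 10 * r for c in range(5)] for r in range(3)]
long_ = [[c + 10 * r for c in range(7)] for r in range(3)]
print(len(to_fixed_length(short, 4)), len(to_fixed_length(long_, 4)))  # 12 12
```

Resampling keeps the temporal order of the frames; averaging all frames would be a simpler alternative that discards it. Which works better for lip reading is something to check with cross-validation.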


Why is the confusion matrix generated by confusionmat the wrong size?

I am working on traffic sign recognition code in MATLAB using the Belgian Traffic Sign Dataset.
The dataset consists of training data and test data (or evaluation data).
I resized the given images and extracted HOG features using the vl_hog function from the VLFeat library.
Then I trained a multiclass SVM using all of the signs in the training dataset. There are 62 categories (i.e. different types of traffic signs) and 4577 frames in the training set.
I used the fitcecoc function to obtain the classifier.
After training the multiclass SVM, I want to test the classifier's performance on the test data, so I used the predict and confusionmat functions, in that order.
For some reason, the returned confusion matrix is 53 by 53 instead of 62 by 62.
Why is the size of the confusion matrix not the same as the number of categories?
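Some of the folders in the testing dataset are empty. confusionmat builds its rows and columns only from the class labels that actually occur in the true or predicted labels, so categories that have no test images and are never predicted are left out of the matrix. To get the full 62 by 62 matrix, pass the complete list of classes to confusionmat through its 'Order' name-value argument.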
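To illustrate the mechanism, here is a small plain-Python sketch (not MATLAB's implementation) of a confusion matrix whose classes default to the labels that actually occur. The function name confusion_matrix and its order parameter are hypothetical; order plays the role that confusionmat's 'Order' argument plays in MATLAB.

```python
def confusion_matrix(true_labels, predicted_labels, order=None):
    """Count (true, predicted) label pairs.

    If order is None, the classes are the sorted union of the labels that
    actually occur, so a class that never appears gets no row or column.
    Passing the full class list keeps such classes as all-zero rows/columns.
    """
    if order is None:
        order = sorted(set(true_labels) | set(predicted_labels))
    index = {label: i for i, label in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1  # rows: true class, columns: predicted class
    return order, matrix


true_y = [1, 1, 2, 4]
pred_y = [1, 2, 2, 4]
classes, _ = confusion_matrix(true_y, pred_y)
print(classes)  # [1, 2, 4]: class 3 never occurs, so it is dropped
classes, cm = confusion_matrix(true_y, pred_y, order=[1, 2, 3, 4])
print(cm)  # [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
```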

Histogram of Oriented Gradients as feature vectors with VLFeat/LIBSVM

I am quite new to the VLFeat computer vision library and I am having problems with it. What I am trying to do is use Histograms of Oriented Gradients (HOG) as feature vectors to classify images of different dimensions with LIBSVM.
The first issue is that vl_hog returns a HOG array, not a vector. This is not a real problem, because I can vectorize the array as follows:
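hog = vl_hog(im2single(image), cellSize); % vl_hog expects a single-precision image
features = hog(:);                        % stack all cell descriptors into one column vector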
The second problem is what's freaking me out. Because the images have different dimensions, the feature vectors also have different lengths, so it seems impossible to feed them to LIBSVM. Or am I wrong? Is there an easier way to solve this? Did I miss something?
You need to build a global representation of your local features so that you can feed your data to an SVM. One of the most popular approaches is bag of words (bag of features), which turns any number of local descriptors into one fixed-length histogram. VLFeat has an excellent demo of this; you can find the code on the VLFeat website.
For your particular case, you need to arrange your training/testing data in Caltech-101-style directories, one folder per class:
Letter 1
    Image 1
    Image 2
    Image 3
    Image 4
    ...
Letter 2
    ...
Letter 3
    ...
Then adjust the following configuration settings for your case:
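conf.numTrain = 15 ;    % training images per class
conf.numTest = 15 ;     % test images per class
conf.numClasses = 102 ; % number of classes (102 for Caltech-101; set this to your number of letters)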
This demo uses SIFT as the local feature, but you can switch it to HOG afterwards.
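The core of the bag-of-words step is turning a variable number of local descriptors into one fixed-length histogram. A minimal plain-Python sketch, assuming a codebook has already been learned (typically with k-means) and treating each local descriptor, for example each HOG cell's vector, as a list of floats. The function names and toy data are hypothetical, and this is not VLFeat's API.

```python
def nearest_word(descriptor, codebook):
    """Index of the codeword closest to descriptor (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for k, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best, best_dist = k, dist
    return best


def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of codeword assignments; its length is
    len(codebook), no matter how many descriptors the image produced."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


codebook = [[0.0, 0.0], [10.0, 10.0]]                       # toy 2-word vocabulary
small_image = [[1, 0], [9, 11]]                             # 2 local descriptors
large_image = [[0, 1], [1, 1], [10, 9], [11, 10], [0, 0]]   # 5 local descriptors
print(bow_histogram(small_image, codebook))  # [0.5, 0.5]
print(bow_histogram(large_image, codebook))  # [0.6, 0.4]
```

Both images end up as 2-element vectors, which is what lets images of different sizes share one SVM.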

How to use a trained neural network in MATLAB for classification in a real system

I have trained a feed-forward NN with the MATLAB Neural Network Toolbox on a dataset containing speech features and accelerometer measurements. The target set contains two classes: 0 and 1. Training, validation and performance all look fine, and I have generated code for this network.
Now I need to use this neural network in real time to recognize the pattern when it occurs, producing 0 or 1 when I test new data against the trained network. But when I issue the command:
c = sim(net, j)
where j is a new 24x11 dataset, instead of 0 or 1 I get this output (I assume it is the percentage of correct classification, but there is no classification result itself):
c =
Columns 1 through 9
0.6274 0.6248 0.9993 0.9991 0.9994 0.9999 0.9998 0.9934 0.9996
Columns 10 through 11
0.9966 0.9963
So is there a command or some way to actually see the classification results? Any help is highly appreciated. Thanks!
I'm not a MATLAB user, but from a logical point of view you are missing an important point:
The input to a neural network is a single vector, but you are passing a matrix. MATLAB therefore treats each of its 11 columns as a separate input vector and classifies all 11 of them, so the vector you get back holds the output activation for each of these 11 inputs.
The output activation is a value between 0 and 1 (I guess you are using a sigmoid output), so this is perfectly normal. Your job is to find the threshold that fits your data best. You can choose it by cross-validation on your training/test data, or simply pick one (0.5?), check whether the results are "good", and adjust it if needed.
NNs normally squash their output into (0,1), for example with the logistic function. It is not a percentage or probability, just a relative measure of certainty. Either way, you have to apply a threshold (such as 0.5) manually to separate the two classes. The best threshold is hard to find because it sets the trade-off between precision and recall.
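A minimal plain-Python sketch of the thresholding step, and of choosing the threshold on labelled validation data by F1 score (one way to balance precision and recall; other criteria are possible). The function names and the validation numbers are made up for illustration. In MATLAB, the thresholding itself is just a comparison such as labels = c >= 0.5;

```python
def classify(outputs, threshold=0.5):
    """Map network activations in (0, 1) to hard 0/1 labels."""
    return [1 if y >= threshold else 0 for y in outputs]


def f1_score(true_labels, predicted):
    """Harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(outputs, true_labels, candidates):
    """Candidate threshold with the highest F1 on labelled validation data."""
    return max(candidates, key=lambda th: f1_score(true_labels, classify(outputs, th)))


# First three activations from the question, thresholded at 0.5 and 0.7.
print(classify([0.6274, 0.6248, 0.9993]))       # [1, 1, 1]
print(classify([0.6274, 0.6248, 0.9993], 0.7))  # [0, 0, 1]

# Made-up validation outputs and labels, just to show the selection step.
val_out, val_y = [0.2, 0.4, 0.6, 0.9], [0, 0, 1, 1]
print(best_threshold(val_out, val_y, [0.3, 0.5, 0.7]))  # 0.5
```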

Time series classification in MATLAB

My task is to classify time-series data with use of MATLAB and any neural-network framework.
More specifically:
It is a computer vision problem: scene boundary detection.
The source data are 4 arrays of histogram correlations between neighbouring frames of the video stream.
Based on this data, we have to classify the time series into 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. Here is an example of the source data:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that the pattern recognition tools in the MATLAB Neural Network Toolbox (like patternnet) treat the source data as independent entries. But I strongly believe the results will only be precise if the network bases its decision on the history of previous correlations.
However, I also could not get valid responses from the recurrent nets intended for time series analysis (like delaynet and narxnet).
narxnet and delaynet return lousy results, and it looks like these types of networks are not meant to solve classification tasks. I am not posting any code here, since it is almost entirely autogenerated with the MATLAB Neural Network Toolbox GUI.
I would appreciate any help, especially advice on which tool is better suited to this task.
I am not sure how difficult this problem is to classify.
Given your sample, a feed-forward neural network with 4 inputs and 1 output should be sufficient.
If you insist on using historical inputs, simply pre-process your input d so that
your new input D(t) (a vector at time t) is the concatenation [d(t), d(t-1), ..., d(t-k)], where d(t) is the 1x4 vector at time t, d(t-1) is the 1x4 vector at time t-1, ..., and d(t-k) is the 1x4 vector at time t-k.
If an index t-j falls before the start of the series (t-j < 0), treat d(t-j) as zeros.
So you have a 1x(4(k+1)) vector as input, and 1 output.
As Dan mentioned, you need to find a good k.
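The lagged input D(t) described above can be sketched in plain Python as follows (the name lagged_inputs is hypothetical, and indices are 0-based). Each output row holds d(t), d(t-1), ..., d(t-k), with zeros standing in for history before the start of the series; the rows of d are the first three sample rows from the question.

```python
def lagged_inputs(d, k):
    """Build D(t) = [d(t), d(t-1), ..., d(t-k)] for every time step t.

    d is a list of equal-length rows (here, 4 correlations per step).
    History before the start of the series is filled with zeros.
    """
    zeros = [0.0] * len(d[0])
    rows = []
    for t in range(len(d)):
        row = []
        for lag in range(k + 1):
            row.extend(d[t - lag] if t - lag >= 0 else zeros)
        rows.append(row)
    return rows


d = [
    [0.997894, 0.999413, 0.982098, 0.992164],
    [0.998964, 0.999986, 0.999127, 0.982068],
    [0.993807, 0.998823, 0.994008, 0.994299],
]
D = lagged_inputs(d, k=1)
print(len(D), len(D[0]))  # 3 8  (each input is 1x(4(k+1)))
print(D[0][4:])           # [0.0, 0.0, 0.0, 0.0]  (no history before t = 0)
```

Each row of D then goes to an ordinary feed-forward classifier with 4(k+1) inputs.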
Speaking of the weights, I think additional pre-processing such as windowing the input is not necessary, since the neural network is trained to assign a weight to each input dimension.
It sounds a bit messy, since the neural network considers each input dimension independently. That means you lose the information that the four values are neighbouring correlations.
One possible solution is a pre-processing step that extracts neighbourhood features, e.g. the mean and standard deviation of the correlations as two features representing the originals.
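A plain-Python sketch of that pre-processing, replacing each row of four correlations with its mean and standard deviation. The function name is hypothetical; statistics.stdev uses N-1 normalisation, which matches MATLAB's default std.

```python
import statistics


def neighbourhood_features(row):
    """Mean and sample standard deviation of the four neighbouring correlations.

    statistics.stdev normalises by N-1, like MATLAB's default std.
    """
    return [statistics.mean(row), statistics.stdev(row)]


data = [
    [0.998964, 0.999986, 0.999127, 0.982068],
    [0.225917, 0.000000, 0.407494, 0.400424],
]
reduced = [neighbourhood_features(r) for r in data]  # 4 inputs -> 2 inputs per entry
print(reduced)
```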

MATLAB question: principal component analysis

I have a set of 100 observations, each with 45 characteristics. Each observation also has a label, which I want to predict from those 45 characteristics. So the input matrix is 45 x 100 and the target matrix is 1 x 100.
The thing is, I want to know how many of those 45 characteristics are relevant in my data set (basically, a principal component analysis), and I understand that I can do this with the MATLAB function processpca.
Could you please tell me how I can do this? Suppose the input matrix x has 45 rows and 100 columns, and y is a vector with 100 elements.
Assuming that you want to build a model of the 1x100 vector from the 45x100 matrix, I am not convinced that PCA will do what you think. PCA looks only at the variance of the inputs and ignores the labels, so the directions it keeps are not necessarily the ones that predict y. PCA can be used to select variables for model estimation, but it is a somewhat indirect way to gather a set of model features. Anyway, I suggest reading both:
Principal Components Analysis
and...
Putting PCA to Work
...both of which provide MATLAB code that does not require any toolboxes.
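Have you tried COEFF = princomp(x')? Note the transpose: princomp expects observations in rows, while your x has them in columns. (In newer MATLAB releases, princomp has been superseded by pca.)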
COEFF = princomp(X) performs principal components analysis (PCA) on the n-by-p data matrix X, and returns the principal component coefficients, also known as loadings. Rows of X correspond to observations, columns to variables. COEFF is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
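The loadings alone do not say how many components matter; a common heuristic is to look at the share of total variance each component explains (princomp also returns the component variances as its third output, latent). The following plain-Python sketch computes that share using power iteration with deflation; it illustrates the idea and is not how princomp is implemented. It assumes rows are observations, so the 45x100 x from the question would be transposed first, and the iteration count and start vector are arbitrary choices.

```python
def covariance(data):
    """Sample covariance matrix; rows of data are observations, columns variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenvalues(cov, count, iterations=300):
    """Largest `count` eigenvalues of a symmetric covariance matrix,
    found one at a time by power iteration followed by deflation."""
    p = len(cov)
    a = [row[:] for row in cov]
    values = []
    for _ in range(count):
        v = [1.0 / (i + 1) for i in range(p)]  # arbitrary, non-uniform start vector
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        values.append(lam)
        # Deflation: subtract the component just found so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return values


def explained_variance_ratio(data, count):
    """Share of the total variance captured by each of the first `count` components."""
    cov = covariance(data)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam in top_eigenvalues(cov, count)]


toy = [[2, 0], [-2, 0], [0, 1], [0, -1]]   # rows = observations
print(explained_variance_ratio(toy, 2))    # approximately [0.8, 0.2]
```

Keeping the smallest number of components whose ratios add up to most of the variance (the cut-off is a judgment call) gives a rough answer to "how many dimensions matter", but, as noted above, this ignores the labels.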
From your question I deduced that you don't need to do this in MATLAB; you just want to analyze your data set. In my opinion, the key is visualizing the dependencies.
If you're not forced to do the analysis in MATLAB, I'd suggest trying more specialized software such as WEKA (www.cs.waikato.ac.nz/ml/weka/) or RapidMiner (rapid-i.com). Both tools provide PCA and other dimension reduction algorithms, and they include nice visualization tools.
Your use case sounds like a combination of Classification and Feature Selection.
Statistics Toolbox offers a lot of good capabilities in this area. The toolbox provides access to a number of classification algorithms, including:
Naive Bayes classifiers
Bagged decision trees (aka random forests)
Binomial and multinomial logistic regression
Linear discriminant analysis
You also have a variety of options for feature selection, including:
sequentialfs (forward and backward feature selection)
relieff
TreeBagger also supports options for feature selection and for estimating variable importance.
Alternatively, you can use some of Optimization Toolbox's capabilities to write your own custom equations to estimate variable importance.
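sequentialfs-style forward selection is a greedy loop: start with no features and keep adding the one that most improves a criterion until nothing helps. A minimal plain-Python sketch; the criterion callable is a hypothetical, simplified stand-in for the criterion function sequentialfs takes (in practice a cross-validated error), and the toy criterion in the usage is made up.

```python
def forward_selection(num_features, criterion, max_features=None):
    """Greedy sequential forward selection.

    criterion(subset) returns a score to minimise (for example a
    cross-validated error). Features are added one at a time, always the one
    that lowers the score most, until no addition lowers it any further.
    """
    limit = num_features if max_features is None else min(max_features, num_features)
    selected, best_score = [], float("inf")
    while len(selected) < limit:
        candidates = [f for f in range(num_features) if f not in selected]
        scores = {f: criterion(selected + [f]) for f in candidates}
        best_f = min(scores, key=scores.get)
        if scores[best_f] >= best_score:
            break  # no remaining feature improves the criterion
        selected.append(best_f)
        best_score = scores[best_f]
    return selected


# Toy criterion: features 1 and 3 are "useful"; every feature also costs 0.5.
def toy_error(subset):
    return 10 - 4 * (1 in subset) - 3 * (3 in subset) + 0.5 * len(subset)


print(forward_selection(5, toy_error))  # [1, 3]
```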
A couple of months back, I did a webinar for The MathWorks titled "Computational Statistics: Getting Started with Classification using MATLAB". You can watch the webinar at
http://www.mathworks.com/company/events/webinars/wbnr51468.html?id=51468&p1=772996255&p2=772996273
The code and the data set for the examples are available at MATLAB Central
http://www.mathworks.com/matlabcentral/fileexchange/28770
With all that said, many people use principal component analysis as a pre-processing step before applying classification algorithms. PCA gets used a lot:
When you need to extract features from images
When you're worried about multicollinearity
You should compute the correlation matrix. The following example uses MATLAB's corr function to find it:
http://www.mathworks.com/help/stats/feature-transformation.html#f75476
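For reference, a correlation matrix holds the Pearson correlation between every pair of columns, which is also what corr computes by default. A plain-Python sketch, assuming rows are observations and no column is constant (a constant column would cause a division by zero); the function name and toy data are hypothetical.

```python
def correlation_matrix(data):
    """Pearson correlation between every pair of columns.

    Rows of data are observations; assumes no column is constant
    (that would make a norm zero and divide by zero).
    """
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centred = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [sum(x * x for x in c) ** 0.5 for c in centred]
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(p)]
            for i in range(p)]


toy = [[1, 2, 3], [2, 4, 1], [3, 6, 2]]  # column 1 = 2 * column 0
for row in correlation_matrix(toy):
    print([round(r, 3) for r in row])
```

Pairs of characteristics with correlations near +1 or -1 carry largely redundant information, which is one quick way to spot candidates to drop.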