Training a model for Latent-SVM - matlab

GOOD MORNING COLLEAGUES!
I am very into train a new model from my own data set of faces!
I have found no information about this topic, then I hope my information could help people and I can get some answers as well.
I will try to explain the steps I have needed to do to train my own model and later on some questions...
I have download the Latent code from: http://cs.brown.edu/~pff/latent-release4/
I have download the PASCAL VOC 2008 code (devkit) from: http://host.robots.ox.ac.uk/pascal/VOC/voc2008/index.html
I have emulate the structure of files/folders of the VOC PASCAL but in my own data set:
Annotations. I have created a .xml where I have defined a object, face, (in each image I only have one face). I didn't define difficulties or poses...
JPEGImages where I have stored all the images
ImageSets where I have defined three files:
test.txt, where I wrote the file name of my positive samples
train.txt, where I wrote the file name of my negative samples
trainval.txt, where I wrote the file name of my positive samples (exactly the same file than test.txt).
I have change some things in globals.m and VOCinit.m (to tell the algorithm the path and the location of some files...)
Then I run the training with the command: pascal('face', 1);
Following these steps I have achieved that the training run completely and doesn't fail and I get my own model BUT I have some doubts...
Can you see anything weird in my explanation? Could it work?
Must the files test.txt/trainval.txt be equal? Why... What does it mean?
Do I have to choose the number of parts I want in the model INSIDE the function?
Please, you imagine I have two kind of samples (frontal faces and side faces) and I want to detect both... How can I address this issue? I thought I have to train a model with two components... but How can I tell to the training code which are frontal or side samples?? In the annotations with the label pose?? (I don't think so...) Are there other way to handle this purpose?
Thank you for your time!!
I hope you can solve my doubts :)

I think test.txt should contain samples (images) that will be used to estimate how good the system is after learning the faces. However, trainval.txt is used during the learning stage (training) to fine-tune the parameters of the model; it is an essential part of supervised learning.
Also, it is very hard to have one single SVM to classify faces that are both frontal and sideways. Here is my suggestion:
Train one SVM to detect if the input image is a frontal face or a sideways face. Call this something like SVM-0.
Train another SVM for frontal faces. This SVM will classify all your individuals. Note, however, that SVM is usually a binary classifier, so make sure you choose the right SVM, one that as a multiclass architecture. Call this SVM-F.
Tran a final SVM for sideways faces. Again, use a multiclass SVM. Call it SVM-S.
Present the input image to SVM-0 and if it detects it is a frontal face, present the input again to SVM-F; otherwise, give the input to SVM-S.
In my experience, you should expect very low performance in SVM-S. It is a hard problem to solve. But frontal faces is not a big deal, unless you are working with faces that vary in pose, illumination, and expression (PIE). Face recognition is affected greatly with PIE variations in the images.
I recommend you this website, it contains very good information and tutorials for starters, with or without experience.

Related

Predict a number with a given image (0 to 1)

I am a total beginner to ML and Neural networks. I am currently working on a project where I have a lot of pictures stored in a MongoDB database. Each one of those pictures has a number from 0 to 1. For example "picture 1" 0.71.
I want to train my model given the database. The main goal for the project is that after the model is finished and trained, given an image the model will be able to return(predict) a number from 0 to 1. After doing some research and asking a few people I figured out some libraries that would be useful for the project are: Tenserflow and Keras. Some people told me that it is impossible, but I'm not sure therefore I came to ask here.
So my questions are: Is it possible? If so, how can I implement it? Are there any specific tools you recommend? If you specify a way that I should use for my project do I need to export my MongoDB database in a certain form? Since I am a beginner maybe there are some tutorials that you think that can help?
I'm sorry if this question is a bit too general, if there are any misunderstandings please comment and I will try to answer.
Thanks in advance!
What you want to do is totally feasible, this kind of project is called regression, since you are using images data the best type of models are called convolutional neural network (CNN), you'll need some understanding if you want to build your own model. I've done a project where I had to predict a number of bacterial colonies using an image, much like your problem except that I had no boundaries on the predicted values.
What is a CNN ? Here is a link
Basically a CNN will understand the features in the images and will use those features to predict a value.
You won't need to create your own model, most people just use well-designed one in the scientific litterature.
Go for keras, it's the easiest framework out there and work like a charm. Here is how to implement VGG16 (an architecture that is probably the best for your problem) : link
You should follow this tutorial to get going on developing with keras.
Last hint: don't use the same last layer as the one on the VGG16 implementation, use a Dense Layer with one neuron and with a sigmoid/linear/leaky relu activation.
ie:
#model.add(Dense(1000, activation='softmax'))
model.add(Dense(1, activation='sigmoid'))
This means : predict 1 number (sigmoid will bound it between 0 and 1, but maybe lrelu or linear is better)
Also, I guess you could use MongoDB to read the images as arrays, but I would just put the images on a folder.
Edit : When compiling the model, use a mean squared error as in
adam = keras.optimizers.Adam(lr=1e-4)
model.compile(optimizer=adam, loss='mse')
Here you have the "hello world program" in terms of neural networks and digits classification. You can start studying it because I think you will end up with a similar architecture for your NN. What you should focus on is the output of your model, because in this example they are performing classification on 10 classes (digits from 0 to 9) but you are trying to read a real number. You could try to use a single neurone with sigmoid or linear activation at the end of your model.

Face Recognition based on Deep Learning (Siamese Architecture)

I want to use pre-trained model for the face identification. I try to use Siamese architecture which requires a few number of images. Could you give me any trained model which I can change for the Siamese architecture? How can I change the network model which I can put two images to find their similarities (I do not want to create image based on the tutorial here)? I only want to use the system for real time application. Do you have any recommendations?
I suppose you can use this model, described in Xiang Wu, Ran He, Zhenan Sun, Tieniu Tan A Light CNN for Deep Face Representation with Noisy Labels (arXiv 2015) as a a strating point for your experiments.
As for the Siamese network, what you are trying to earn is a mapping from a face image into some high dimensional vector space, in which distances between points reflects (dis)similarity between faces.
To do so, you only need one network that gets a face as an input and produce a high-dim vector as an output.
However, to train this single network using the Siamese approach, you are going to duplicate it: creating two instances of the same net (you need to explicitly link the weights of the two copies). During training you are going to provide pairs of faces to the nets: one to each copy, then the single loss layer on top of the two copies can compare the high-dimensional vectors representing the two faces and compute a loss according to a "same/not same" label associated with this pair.
Hence, you only need the duplication for the training. In test time ('deploy') you are going to have a single net providing you with a semantically meaningful high dimensional representation of faces.
For a more advance Siamese architecture and loss see this thread.
On the other hand, you might want to consider the approach described in Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua Learning a Metric Embedding for Face Recognition using the Multibatch Method (arXiv 2016). This approach is more efficient and easy to implement than pair-wise losses over image pairs.

Face detection (viola-jones) in matlab

So I found the cascade object detector in matlab that use the Viola-Jones algorithm to detect faces. Very easy to use, and works great!
But got a few questions.
The viola-jones method got four stages:
Haar Feature Selection
Creating an Integral Image
Adaboost Training
Cascading Classifiers
In matlab I can use FrontalFace(CART) and FrontalFace(LBP). These are Trained cascade classification model, so they will be part of stage 4 right?
But what is the difference between stage 1 and stage 4 if I use FrontalFace(CART)? Both use Haaar features it says.
Can we say that FrontalFace(CART) and FrontalFace(LBP) are two different ways of detecting faces? Can I compare those two against each other to see which one is better?
Or should I find another method to compare against the viola-jones?
Are there other face detection methods that are easy to implement in matlab?
Found some on the internet (using skin color etc), but Matlab is quite new to me. So I felt that those codes where abit to complicated for me.
The main difference is that FrontalFace(CART) and FrontalFace(LBP) have been trained on different data sets. Also, from the name, I am guessing that FrontalFace(LBP) uses LBP feaures instead of Haar.
The original Viola-Jones algorithm used the Haar features. However, it has later been extended to use other types of features. vision.CascadeObjectDetector supports Haar, LBP, and HOG features.
To compare which one is better, you would need some ground truth images, which are images with faces labeled by hand. I am sure you can find a benchmark data set on the web. Alternatively, you can label you own images using trainingImageLabeler app.
Also, if you are not happy with the accuracy of the classifiers that come with vision.CascadeObjectDetctor, you can train your own using the trainCascadeObjectDetector function.

Mapping Vision Outputs To Neural Network Inputs

I'm fairly new to MATLAB, but have acquainted myself with Simulink and Computer Vision over the past few days. My problem statement involves taking a traffic/highway video input and detecting if an accident has occurred.
I plan to do this by extracting the values of centroid to plot trajectory, velocity difference (between frames) and distance between two vehicles. I can successfully track the centroids, and aim to derive the rest of the features.
What I don't know is how to map these to ANN. I mean, every image has more than one vehicle blobs, which means, there are multiple centroids in a single frame/image. So, how does NN act on multiple inputs (the extracted features per vehicle) simultaneously? I am obviously missing the link. Help me figure it out please.
Also, am I looking at time series data?
I am not exactly sure about your question. The problem can be both time series data and not. You might be able to transform the time series version of the problem, such that it can be solved using ANN, but it is sort of a Maslow's hammer :). Also, Could you rephrase the problem.
As you said, you could give it features from two or three frames and then use the classifier to detect accident or not, but it might be difficult to train such a classifier. The problem is really difficult and the so you might need tons of training samples to get it right, esp really good negative samples (for examples cars travelling close to each other) etc.
There are multiple ways you can try to solve this problem of accident detection. For example : Build a classifier (ANN/SVM etc) to detect accidents without time series data. In which case your input would be accident images and non accident images or some sort of positive and negative samples for training and later images for test. In this specific case, you are not looking at the time series data. But here you might need lots of features to detect the same (this in some sense a single frame version of the problem).
The second method would be to use time series data, in which case you will have to detect the features, track the features (say using Lucas Kanade/Horn and Schunck) and then use the information about velocity and centroid to detect the accident. You might even be able to formulate it for HMMs.

Training for pattern recognition (neural network)

How do you train Neural Network for pattern recognition? For example a face recognition in a picture how would you define the output neurons? (eg. how to detect where is the face exactly, rather than just saying that there is a face in camera). Also, how about detecting multiple faces and different size of faces?
If anyone could give me a pointer it would be really great
Cheers!
Generally speaking I would split the problem into multiple stages e.g.
1 - Is there a face in the picture?
2 - Where is the face in the picture?
3 - Is the face in the picture one that the NN (Neural network) recognises?
In each instance I would suggest you build a separate NN and train it to answer the questions posed.
As for the structure of the NN, that's a bit trickier to answer as it depends on your input data and desired output. For example if you had a 100x100 px image then I suppose its feasible to have 10,000 inputs. You might want to consider doing some preprocessing before hand to say detect ovals that way you could look and see if there are a number of ovals in a predictable outline (1 for the face, 2 for the eyes, and one for the mouth possibly). If you are preprocessing the data then you might have inputs for each oval.
Now for the output... for question one you could just have one output to say how sure the NN is that there is a face in the input data i.e a valuer of 0.0 (defiantly no face) --> 1.0 (defiantly a face). This way you can move onto stages 2 and 3.
I might say at this point that this is a non-trivial problem and you might be better to have a look at some of the frameworks available e.g. OpenCV
Now for the training part, you need to have a stockpile of images available to train the NN. There are a number of ways in which you could train the NN. One potential solution is to use a technique called back propagation 1, 2. In general terms, you use the NN on an image and compare it to a predetermined output. If its wrong tweak the NN to produce the desired output and repeat.
If you want a good book on AI, then I would highly recommend Artificial Intelligence: A Modern Approach by Russell and Norvig. Im sure that there are more appropriate Computer Vision textbooks, but the Russell & Norvig book is an excellent starter.
Dear GantengX, you should prepare your self to the fact that the answer is so large, complex and hard to understand. There is so many approaches to pattern and face recognition. And implementing real-life face recognition system is a huge array of work that one person can never handle. Prepare your self for at least 10 years of life behind books on mathematic and artificial intelligence, I'm not talking about hiring 5 highly payed developers in the end who will understand what you want them to do. And maybe you will end up having your own face recognition system. There are also dozen of other issues that will jump out during the process. So be ready for a life full of stresses and problems.
I'm sorry for telling obvious things, but your question was not specific, complete answer would touch many different scientific spheres and will result as a book with over 1k pages.
Regarding your question (the short answer).
There are several principal parts that each face recognition app consists of:
Artificial intelligence algorithm
Optimization algorithm (for AI optimization)
Different filtration algorithms
Effective data set development
Items 1. and 2. are the central part of each system, they do the actual work. Any other preprocessing just makes the input data less complex, making it easier to do a decision for your AI. Don't start 3. and 4. until you will have your first results.
P.S.
Using existing solutions is more cost-effective, but if you are studying things then don't loose time like I did, and start your dissertation right away.