Facial feature extraction in MATLAB [closed]

I have a project in which I need to build a neural network for face recognition.
The inputs to the network should be features of the face to be recognized.
I searched a lot and found the SURF detector in MATLAB's Computer Vision Toolbox, which looked like the right tool for extracting face features. However, the SURF detector extracts keypoints and assigns each one a descriptor vector of 64 or 128 values. The problem is that the number of keypoints varies from face to face, and I need it to be the same for every face so I can feed the inputs of the neural network.
So I thought of extracting only a few features that can each be represented as a single number, such as the proportions of the nose, mouth, and eyes relative to the face, or the distance between the eyes.
How can I get these features, and will they be good enough to serve as inputs to a neural network that needs to recognize faces? The output layer will have as many neurons as there are people in the database; in the training phase I am going to feed the network face features extracted from a photo, and if the photo shows, say, the third of five people in the database, the target output will be [0,0,1,0,0].
Is this a good approach, and can you give me some code that extracts these facial features in MATLAB?

Proportions of the nose, mouth, and eyes relative to the face, and the distance between the eyes, will give you very poor results. Those measures are neither accurate nor distinctive enough.
If you're looking for features for face recognition, you should consider LBP (Local Binary Patterns):
http://www.scholarpedia.org/article/Local_Binary_Patterns#Face_description_using_LBP
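The question asks for MATLAB, where the Computer Vision Toolbox provides extractLBPFeatures; as a language-neutral illustration of the grid-based LBP face descriptor from the article above, here is a minimal Python sketch using scikit-image. The grid size and LBP parameters are illustrative assumptions; the key point is that the descriptor length is fixed regardless of how many keypoints a face has.

    # Minimal sketch of an LBP-based face descriptor (scikit-image).
    # Assumes grayscale faces already cropped and resized to the same size.
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_face_descriptor(face, grid=(4, 4), n_points=8, radius=1):
        """Return a fixed-length histogram descriptor for a grayscale face."""
        lbp = local_binary_pattern(face, n_points, radius, method="uniform")
        n_bins = n_points + 2                      # uniform codes + one "other" bin
        hists = []
        for row in np.array_split(lbp, grid[0], axis=0):
            for cell in np.array_split(row, grid[1], axis=1):
                h, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))
                hists.append(h / max(h.sum(), 1))  # normalise per cell
        return np.concatenate(hists)               # length = grid[0]*grid[1]*n_bins

    # The descriptor length is fixed, so it can feed a neural network.
    face = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # stand-in image
    x = lbp_face_descriptor(face)   # shape: (4*4*10,) = (160,)
    target = np.eye(5)[2]           # one-hot target for person 3 of 5: [0,0,1,0,0]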


Neural network to check plastic parts [closed]

Neural networks are used to generalize and classify...
I have a little experience with classifying digits:
Using neural nets to recognize handwritten digits
I want to use a network to check plastic parts.
I have a video stream from the production line for these plastic parts.
Should I train the network with many videos of correct plastic parts to get a positive output, and random videos to get a negative output?
If you have any books or links, I would be happy to see them.
EDIT
It looks like I phrased my question badly...
During production, defective plastic parts can be created, and these should be recognized by the network. Many different defects can occur during production, so I think
it only makes sense to train the network with correct plastic parts.
A convolutional neural network (CNN) would be my recommendation.
You should show individual parts against a similar background and under similar lighting.
The training has to be done on both good and bad parts, with a sufficiently random sampling of both. You should also set aside a test set so you can evaluate the CNN once it is trained.
You'll want to generate a confusion matrix from the test data so you know the rates of false positives, false negatives, and correct and incorrect classifications.
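As a minimal sketch of that evaluation step, here is what the confusion matrix computation might look like in Python with scikit-learn; the labels below are randomly generated stand-ins for the CNN's actual test-set predictions.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(0)
    y_test = rng.integers(0, 2, size=200)      # true labels: 0 = good part, 1 = bad part
    y_pred = np.where(rng.random(200) < 0.9,   # stand-in for the CNN's predictions,
                      y_test, 1 - y_test)      # correct ~90% of the time

    # Rows = true class, columns = predicted class, so with this encoding:
    # cm[0, 1] = good parts flagged as bad, cm[1, 0] = bad parts passed as good.
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    print("accuracy:", np.trace(cm) / cm.sum())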

How to choose the number of filters in each Convolutional Layer? [closed]

When building a convolutional neural network, how do you determine the number of filters to use in each convolutional layer? I know that there is no hard rule about the number of filters, but from your experience, papers you have read, etc., is there an intuition or observation about the number of filters to use?
For instance (I'm just making these up as examples):
use more/fewer filters as the network gets deeper;
use larger/smaller filters with large/small kernel sizes;
if the object of interest in the image is large/small, use ...
As you said, there are no hard rules for this.
But you can take inspiration from VGG16, for example.
It doubles the number of filters from one convolutional block to the next (64, 128, 256, 512).
For the kernel size, I usually keep 3x3 or 5x5.
But you can also take a look at Google's Inception architecture.
It uses varying kernel sizes in parallel, then concatenates the results. Very interesting.
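As a minimal sketch of that VGG-style pattern, here is what a small stack with 3x3 kernels and doubling filter counts could look like in Keras; the input shape, layer counts, and class count are illustrative assumptions.

    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(128, 128, 3))            # assumed input size
    x = inputs
    for filters in (64, 128, 256, 512):                  # doubled at each block
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)                    # halve the spatial size
    x = layers.Flatten()(x)
    outputs = layers.Dense(10, activation="softmax")(x)  # assume 10 classes
    model = keras.Model(inputs, outputs)
    model.summary()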
As far as I know, there is no fixed depth for the convolutional layers. Just a few suggestions:
In CS231n they mention that using 3x3 or 5x5 filters with a stride of 1 or 2 is a widely used practice.
How many of them: it depends on the dataset. Also, consider using fine-tuning if the data is suitable.
How will the dataset affect the choice? A matter of experiment.
What are the alternatives? Have a look at the Inception and ResNet papers for approaches that are close to the state of the art.

How to design deep convolutional neural networks? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Improve this question
As I understand it, all CNNs are quite similar. They all have convolutional layers followed by pooling and ReLU layers. Some have specialised layers, like FlowNet and SegNet. My question is how we should decide how many layers to use and how to set the kernel size for each layer in the network. I have searched for an answer to this question but couldn't find a concrete one. Is the network designed by trial and error, or are there specific rules that I am not aware of? If you could clarify this, I would be very grateful.
Short answer: if there are design rules, we haven't discovered them yet.
Note that there are comparable questions in computing. For instance, there is only a handful of basic electronic logic units: the gates that drive your manufacturing technology. All computing devices use the same Boolean logic; some have specialised additions, such as photoelectric input or mechanical output.
How do you decide how to design your computing device?
The design depends on the purpose of the CNN. Input characteristics, accuracy, training speed, scoring speed, adaptation, computing resources, ... all of these affect the design. There is no generalized solution, even for a given problem (yet).
For instance, consider the ImageNet classification problem. Note the structural differences between the winners and contenders so far: AlexNet, GoogleNet, ResNet, VGG, etc. If you change inputs (say, to MNIST), then these are overkill. If you change the paradigm, they may be useless. GoogleNet may be a prince of image processing, but it's horrid for translating spoken French to written English. If you want to track a hockey puck in real time on your video screen, forget these implementations entirely.
So far, we're doing this the empirical way: a lot of people try a lot of different things to see what works. We get feelings for what will improve accuracy, or training time, or whatever factor we want to tune. We find what works well with total CPU time, or what we can do in parallel. We change algorithms to take advantage of vector math in lengths that are powers of 2. We change problems slightly and see how the learning adapts elsewhere. We change domains (say, image processing to written text), and start all over -- but with a vague feeling of what might tune a particular bottleneck, once we get down to considering certain types of layers.
Remember, CNNs really haven't been popular for that long, barely 6 years. For the most part, we're still trying to learn what the important questions might be. Welcome to the research team.

Convolutional Neural Network (CNN) for Audio [closed]

I have been following the tutorials on DeepLearning.net to learn how to implement a convolutional neural network that extracts features from images. The tutorials are well explained and easy to understand and follow.
I want to extend the same CNN to extract multi-modal features from videos (images + audio) at the same time.
I understand that video input is nothing but a sequence of images (pixel intensities) displayed over a period of time (e.g. 30 FPS) together with the associated audio. However, I don't really understand what audio is, how it works, or how it is broken down to be fed into the network.
I have read a couple of papers on the subject (multi-modal feature extraction/representation), but none of them explained how the audio is input to the network.
Moreover, I understand from my studies that multi-modal representation is the way our brains really work, as we don't deliberately filter out our senses to achieve understanding. It all happens simultaneously, without us being aware of it, through joint representation. A simple example: if we hear a lion roar, we instantly compose a mental image of a lion, feel danger, and vice versa. Multiple neural patterns fire in our brains to achieve a comprehensive understanding of what a lion looks like, sounds like, feels like, smells like, etc.
The above is my ultimate goal, but for the time being I'm breaking my problem down for the sake of simplicity.
I would really appreciate it if anyone could shed light on how audio is dissected and then represented in a convolutional neural network. I would also appreciate your thoughts on multi-modal synchronisation, joint representations, and the proper way to train a CNN with multi-modal data.
EDIT:
I have found out that audio can be represented as spectrograms. A spectrogram is a common format for audio: a graph with two geometric dimensions, where the horizontal axis represents time and the vertical axis represents frequency.
Is it possible to use the same techniques on these spectrograms as on images? In other words, can I simply use these spectrograms as input images for my convolutional neural network?
We used deep convolutional networks on spectrograms for a spoken language identification task. We got around 95% accuracy on a dataset provided in this TopCoder contest. The details are here.
Plain convolutional networks do not capture temporal characteristics, so, for example, in this work the output of the convolutional network was fed to a time-delay neural network. But our experiments show that even without additional elements, convolutional networks can perform well, at least on some tasks where the inputs have similar sizes.
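As a minimal sketch of that input pipeline, here is how an audio clip could be turned into a mel spectrogram in Python with librosa and treated as a single-channel image for a CNN; the file name and all parameter values are illustrative assumptions.

    import numpy as np
    import librosa

    # Load a fixed-length clip so every input has the same shape.
    y, sr = librosa.load("clip.wav", sr=16000, duration=3.0)  # hypothetical file
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    S_db = librosa.power_to_db(S, ref=np.max)   # log scale, as usually plotted

    # Shape is (n_mels, time_frames): a 2D array a CNN can consume directly,
    # e.g. after adding a channel axis: S_db[..., np.newaxis]
    print(S_db.shape)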
There are many techniques to extract feature vectors from audio data in order to train classifiers. The most commonly used is MFCC (Mel-frequency cepstral coefficients), which you can think of as an "improved" spectrogram that retains more of the information relevant for discriminating between classes. Another commonly used technique is PLP (Perceptual Linear Prediction), which also gives good results. There are many other, less well-known techniques.
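As a minimal sketch, here is what MFCC extraction looks like in Python with librosa; the file name and the choice of 13 coefficients are illustrative assumptions (13 per frame is a common default).

    import librosa

    y, sr = librosa.load("clip.wav", sr=16000)          # hypothetical audio file
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
    print(mfcc.shape)  # one 13-dimensional feature vector per audio frame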
More recently, deep networks have been used to extract feature vectors by themselves, more like the way we do it in image recognition. This is an active area of research. Not long ago we also used handcrafted feature extractors (SIFT, HOG, etc.) to train classifiers for images, but these have been replaced by deep learning techniques, which take raw images as input and extract feature vectors by themselves (indeed, that is what deep learning is really all about).
It's also very important to note that audio data is sequential. After training a classifier, you need to train a sequential model such as an HMM or CRF, which chooses the most likely sequence of speech units, using the probabilities given by your classifier as input.
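To make that last step concrete, here is a minimal numpy sketch of a Viterbi decode: given per-frame class probabilities from a classifier and a transition matrix between two speech units, it picks the most likely sequence of units. All numbers are made up for illustration.

    import numpy as np

    probs = np.array([[0.9, 0.1],   # per-frame classifier output
                      [0.6, 0.4],   # (rows = frames, cols = speech units)
                      [0.2, 0.8]])
    trans = np.array([[0.8, 0.2],   # transition probabilities between units
                      [0.2, 0.8]])

    log_p, log_t = np.log(probs), np.log(trans)
    score = log_p[0].copy()
    back = []
    for t in range(1, len(log_p)):
        step = score[:, None] + log_t      # score of every possible transition
        back.append(step.argmax(axis=0))   # best predecessor for each state
        score = step.max(axis=0) + log_p[t]

    path = [int(score.argmax())]           # backtrack from the best final state
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    print(path[::-1])                      # most likely sequence of speech units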
A good starting point for learning speech recognition is Jurafsky and Martin: Speech and Language Processing. It explains all these concepts very well.
[EDIT: adding some potentially useful information]
There are many speech recognition toolkits with modules to extract MFCC feature vectors from audio files, but using them for this purpose is not always straightforward. I'm currently using CMU Sphinx4. It has a class named FeatureFileDumper that can be used standalone to generate MFCC vectors from audio files.

What are interesting ideas for experimenting with Artificial Neural Networks? [closed]

I'm after a list of possible neural network applications that could be experimented with, ideally something that would take an hour to a week to write.
What other possibilities are there?
Here's the list so far:
Games
tic-tac-toe
Connect 4
Chess
Go
Sudoku
paper/scissors/rock
horse racing predictor
Visual recognition
Character recognition (typefaces, letters, numbers, etc)
Facial recognition
Audio recognition
Language detection
Male vs female
Word recognition
Language detection (natural, programming)
Pathfinding
"Artificial neural network driven mobile robots learn how to drive on roads in simulation." http://cig.felk.cvut.cz/projects/robo/, http://www.youtube.com/watch?v=lmPJeKRs8gE
Some links to more:
http://www.cs.colostate.edu/~anderson/res/project-ideas.html
You can combine genetic algorithms and neural networks to evolve simple neural configurations, such as neural networks that perform logic operations (including the famous XOR!).
This is a topic I like very much because, if you think about it, it's a bare-bones model of how our brains evolved (I am not saying we have logic gates in our heads).
It is simple enough, and should be good fun!
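As a minimal sketch of that idea, here is a bare-bones evolutionary search (a mutate-and-select loop rather than a full genetic algorithm with crossover) over the weights of a tiny 2-2-1 network learning XOR; the population size, mutation scale, and generation count are arbitrary assumptions.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)

    def forward(w, x):
        """2-2-1 network; w packs both weight matrices and biases (9 values)."""
        W1, b1 = w[:4].reshape(2, 2), w[4:6]
        W2, b2 = w[6:8], w[8]
        h = np.tanh(x @ W1 + b1)
        return 1 / (1 + np.exp(-(h @ W2 + b2)))      # sigmoid output

    def fitness(w):
        return -np.mean((forward(w, X) - y) ** 2)    # higher is better

    rng = np.random.default_rng(0)
    pop = rng.normal(0, 1, size=(50, 9))             # 50 random weight genomes
    for gen in range(500):
        scores = np.array([fitness(w) for w in pop])
        parents = pop[np.argsort(scores)[-10:]]      # keep the 10 fittest
        children = np.repeat(parents, 4, axis=0) + rng.normal(0, 0.3, (40, 9))
        pop = np.vstack([parents, children])         # mutate to refill population

    best = pop[np.argmax([fitness(w) for w in pop])]
    print(np.round(forward(best, X)))                # hopefully [0. 1. 1. 0.]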
More broadly, anything that involves pattern recognition or signal processing could take great advantage of neural networks.
Also, you could use neural networks to develop "pseudo-AI" for games (strategy games, soccer games).
Anyway, since a neural network is a tool rather than a "solution", it can be used in economics, physics, navigation, signal processing, etc.
Also, many types of neural networks exist (perceptrons, Hopfield networks); the trick is to use them wisely according to the problem.
Neural networks are not a panacea, just a (very interesting and powerful) tool.
What about face recognition?
Here are some problems that I think feed-forward neural nets (with multiple hidden layers) might be able to solve:
Given the number of packets sent/received on the network interface, the volume of ambient noise, and the level of ambient light, attempt to predict the time of day.
Given a latitude and longitude, attempt to predict the elevation, or the crime rate.
Given some simple metrics about the keywords in the title of an article, predict how many upvotes it has.
Given the digits of a random phone number, predict where the line terminus is located.
This one is more challenging: visualize (i.e., plot) the decision boundary surface of a 2-layer neural network. (With 1 layer the boundary is linear, so it's easy.)
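As a minimal sketch of that last item, here is one way to do it in Python with scikit-learn and matplotlib: train a 2-layer MLP (one hidden layer plus the output layer) on a 2D toy dataset and plot its decision surface over a mesh grid; the dataset and network sizes are arbitrary choices.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                        random_state=0).fit(X, y)

    # Evaluate the network on a dense grid covering the data.
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
    zz = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

    plt.contourf(xx, yy, zz, levels=20, cmap="RdBu", alpha=0.6)  # boundary surface
    plt.contour(xx, yy, zz, levels=[0.5], colors="k")            # the 0.5 boundary
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolors="k")
    plt.show()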