CNN for recognizing five different faces - neural-network

I have a face recognition project in which I want my CNN to recognize five people, and I was wondering if people could have a look at my model to see whether this is a step in the right direction.
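from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_model(number_of_faces=5):
    model = Sequential()
    # sort out the input layer later; channels-last (height, width, channels) is assumed here
    model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(800, 800, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # flatten the feature maps before the dense classifier
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(number_of_faces, activation='softmax'))
    return model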
So the model will take in pictures (headshots of 5 people found on Google) with 3 channels at 800 by 800 pixels, apply 64 feature maps, pool, and then apply another set of feature maps and pool again.
The result is then fed to an MLP that classifies it into a 5-way output, one output neuron per person. My question is: is this a decent approach for classifying headshots of specific people?
For example, if I were to download one hundred pictures of a certain person and put them through this model, would the feature space created in the convolutions be big enough to capture
the features of that face and four others?
Thanks for the help, guys.

Well, it is not an engineering issue but a scientific one. It is hard to judge whether 100 pictures are enough for your purpose without seeing your current progress (for example, what is the accuracy now? Are you facing overfitting or underfitting?).
But yes, extra face data can help your model, especially when those faces share the same context (background, lighting, angle, skin color, etc.) as your eventual test data.
If you are interested in face recognition, you can start with Deep Learning Face Representation from Predicting 10,000 Classes (unofficial code here); they use 10 thousand faces as an extra dataset for training. You can search for "DeepID" for more information.
If you are more of an engineering person, you can check Facial Expression Recognition with Convolutional Neural Networks; this report focuses more on implementation, and it is also implemented in Keras.
By the way, 800*800 is very large by the standards of the face recognition community. You might want to resize the images to a smaller size. Otherwise your model might be too large to train and will consume a lot of memory.
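To put rough numbers on the size concern, here is a back-of-the-envelope sketch in Python. It assumes the layer stack from the question (two unpadded 3x3 convolutions with 64 filters, each followed by 2x2 max pooling, then a 128-unit dense layer). The helper names are made up for illustration.

```python
def flattened_size(side, filters=64):
    """Number of values entering the dense layer for a square input image."""
    for _ in range(2):      # two conv + pool stages
        side = side - 2     # an unpadded 3x3 convolution trims 2 pixels
        side = side // 2    # 2x2 max pooling halves the side (rounding down)
    return side * side * filters


def dense_params(side, units=128):
    """Weights plus biases of the first dense layer."""
    return flattened_size(side) * units + units


for side in (800, 128):
    print(side, flattened_size(side), dense_params(side))
```

Under these assumptions the first dense layer alone has roughly 321 million parameters at 800x800 (over 1 GB in 32-bit floats), versus about 7.4 million at 128x128.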

Face recognition is not a regular classification task. If you train your model for 5 people, then even if it is successful, you need to retrain it whenever a new person joins the team, and the new model might not be as successful.
Instead, we first train a regular classification model, then drop its final softmax layer and use the earlier layers to represent images. The representations are multi-dimensional vectors. We expect an image pair of the same person to have high similarity, whereas an image pair of different people should have low similarity. We can measure the similarity between vectors with cosine similarity or Euclidean distance.
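As a minimal sketch of comparing two representations, the snippet below computes both measures in plain Python. The 4-dimensional vectors are made up for illustration; real face representations have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Close to 1 for vectors pointing the same way, lower otherwise."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Small for similar vectors, large for dissimilar ones."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical representations of two photos of one person and one of another.
alice_1 = [0.9, 0.1, 0.3, 0.7]
alice_2 = [0.8, 0.2, 0.3, 0.6]
bob_1 = [0.1, 0.9, 0.8, 0.2]

print(cosine_similarity(alice_1, alice_2), cosine_similarity(alice_1, bob_1))
print(euclidean_distance(alice_1, alice_2), euclidean_distance(alice_1, bob_1))
```

A pair is then accepted as the same person when the distance falls below a threshold (or the similarity rises above one). The threshold has to be tuned for the model that produced the vectors.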
To sum up, you no longer need to train a model yourself for a face recognition application. You just use a pre-trained neural network to predict, and the predictions are the representations.
I recommend using deepface. It wraps state-of-the-art face recognition models such as VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID and Dlib. It also handles face detection and alignment in the background. You just need a single line of code to apply face recognition.
#!pip install deepface
from deepface import DeepFace
models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib']
obj = DeepFace.verify("img1.jpg", "img2.jpg", model_name = models[0])
print(obj["verified"], ", ", obj["distance"])
The returned object stores the maximum threshold value and the distance that was found. The verified field is True if the image pair shows the same person and False if it shows different people.

Related

How do self driving cars using vision detection systems handle the n possibilities as inputs

I understand that convolutional neural networks can be used to solve this problem, but if you look at videos of self-driving cars, like Tesla Autopilot, they still use vision detection and labeling systems as input for their neural networks. I am wondering how self-driving cars handle having N possible detected objects, each with a varying amount of information to input about it. As a neural network's structure is very rigid, I would imagine that this would cause a problem. Any explanation would be greatly helpful, and if you have a scientific paper, that would be much appreciated!
These networks do not output a single class label such as car, person or sidewalk, but rather a probability distribution over N object types. The final decision is made afterwards, basically by taking the object type with the highest probability as the prediction. The model is trained on lots of images and, as you said, these images contain varying numbers of objects. But since the model outputs probabilities for all N object types regardless of how many objects are in the input, this is exactly what the model is trained for. So it learns to output probabilities close to 0 for object types that are not present in the image.
Since this is what the networks are trained for, they can also do it at inference time. Of course, problems might occur if a certain object type is very rare in the data, but that is a class imbalance issue.
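As a small illustration of the fixed-size output, here is a sketch in plain Python. It assumes a softmax output layer, and the three labels and the raw scores are made up for the example.

```python
import math

CLASSES = ["car", "person", "sidewalk"]  # illustrative label set


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most probable label together with all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


label, probs = predict([0.2, 3.1, -1.0])  # hypothetical raw network outputs
print(label, probs)
```

The output always has one entry per known object type, whatever the image contains. Only the values change.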

Use a trained neural network to imitate its training data

I'm in the early stages of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun, so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can with a minimal amount of effort.
My first idea is to use my example prose to train a feed-forward neural network classifier, which classifies its input as either part of the training data or not. Then I'd like to somehow invert the neural network, finding new random inputs that the trained network also classifies as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient ascent to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
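The gradient-based search described above could look roughly like the sketch below. A single logistic unit with made-up weights stands in for the trained classifier, so this only illustrates the search itself. It is not a working prose generator.

```python
import math
import random

# Hypothetical stand-in for a trained classifier: one logistic unit.
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = -0.25


def score(x):
    """Probability that x is classified as 'part of the training data'."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def ascend(x, steps=50, lr=0.5):
    """Move x uphill on the classifier score by gradient ascent."""
    x = list(x)
    for _ in range(steps):
        s = score(x)
        # For a logistic unit, d score / d x_i = s * (1 - s) * w_i
        x = [xi + lr * s * (1.0 - s) * w for xi, w in zip(x, WEIGHTS)]
    return x


random.seed(0)
start = [random.uniform(-1.0, 1.0) for _ in WEIGHTS]
found = ascend(start)
print(score(start), score(found))
```

With a real multi-layer network the gradient would come from backpropagation with respect to the input instead of this closed-form expression.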
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal lead by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.

Face Recognition based on Deep Learning (Siamese Architecture)

I want to use a pre-trained model for face identification. I am trying to use a Siamese architecture, which requires only a small number of images. Could you point me to a trained model that I can adapt to the Siamese architecture? How can I change the network so that I can feed in two images and find their similarity (I do not want to create images as in the tutorial here)? I only want to use the system for a real-time application. Do you have any recommendations?
I suppose you can use this model, described in Xiang Wu, Ran He, Zhenan Sun, Tieniu Tan, A Light CNN for Deep Face Representation with Noisy Labels (arXiv 2015), as a starting point for your experiments.
As for the Siamese network, what you are trying to learn is a mapping from a face image into some high-dimensional vector space, in which distances between points reflect the (dis)similarity between faces.
To do so, you only need one network that takes a face as input and produces a high-dimensional vector as output.
However, to train this single network using the Siamese approach, you duplicate it, creating two instances of the same net (you need to explicitly tie the weights of the two copies). During training you provide pairs of faces to the nets, one to each copy. A single loss layer on top of the two copies then compares the high-dimensional vectors representing the two faces and computes a loss according to a "same/not same" label associated with the pair.
Hence, you only need the duplication for training. At test time ('deploy') you have a single net that provides a semantically meaningful high-dimensional representation of faces.
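To make the weight sharing concrete, here is a toy sketch in plain Python. A made-up 2x3 linear layer stands in for the embedding network, and the loss shown is the contrastive loss, which is one common choice for "same/not same" pairs and is my assumption here. The answer above does not name a specific loss.

```python
import math

# Shared weights: both "copies" of the net are literally the same function.
# A hypothetical 2x3 linear layer stands in for the real embedding network.
SHARED_WEIGHTS = [[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.5]]


def embed(face):
    """Map an input vector to its embedding (the only part needed at deploy time)."""
    return [sum(w * x for w, x in zip(row, face)) for row in SHARED_WEIGHTS]


def contrastive_loss(face_a, face_b, same, margin=1.0):
    """Low when 'same' pairs are close and 'not same' pairs are at least margin apart."""
    ea, eb = embed(face_a), embed(face_b)
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))
    if same:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


pair = ([1.0, 0.0, 2.0], [1.0, 0.1, 2.0])  # two made-up, nearly identical inputs
print(contrastive_loss(*pair, same=True))
print(contrastive_loss(*pair, same=False))
```

Training would adjust SHARED_WEIGHTS to reduce this loss over many labeled pairs. After that only embed is called, once per face.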
For a more advanced Siamese architecture and loss, see this thread.
On the other hand, you might want to consider the approach described in Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua, Learning a Metric Embedding for Face Recognition using the Multibatch Method (arXiv 2016). This approach is more efficient and easier to implement than pair-wise losses over image pairs.

Convolution Neural Network for image detection/classification

So here is the setup: I have a set of images (labeled train and test) and I want to train a conv net that tells me whether or not a specific object is in an image.
To do this, I followed the TensorFlow tutorial on MNIST and trained a simple conv net on 128x128 images cropped to the area of interest (the object). The architecture is as follows: 3 successive blocks, each consisting of 2 conv layers and 1 max-pool down-sampling layer, followed by one fully connected softmax layer (with two classes, 0 and 1, for whether the object is present or not).
I implemented it using TensorFlow, and it works quite well, but since I have enough computing power I was wondering how I could increase the complexity of the classifier:
- adding more layers?
- adding more channels at each layer? (currently 32, 64, 128, and 1024 for the fully connected layer)
- anything else?
But the most important part is that I now want to detect this same object in larger images (roughly 600x600, whereas the object should be around 100x100).
I was wondering how I could use the previously trained "small" network for small images to pretrain a larger network on the large images? One option would be to classify the image using a sliding window of size 128x128 and scan the whole image, but I would like to try, if possible, to train a whole network on it.
Any suggestions on how to proceed? Or an article/resource tackling this kind of problem? (I am really new to deep learning, so sorry if this is a stupid question...)
Thanks!
I suggest that you continue reading about the field overall. Useful search terms include CNN, image classification, neural net, AlexNet, GoogLeNet, and ResNet. This will return many articles, online classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above were designed for the ImageNet image classification contest.
Since you are working with images, I would suggest using a pretrained image classification network (like VGG, AlexNet, etc.) and fine-tuning it on your 128x128 image data. In my experience, unless you have a very large dataset, a fine-tuned network gives better accuracy and also saves training time. After building a good image classifier on your dataset, you can use any popular algorithm to generate region proposals from the image. Then pass the region proposals to the classification network one by one and check whether the network classifies each one as positive or negative. If a proposal is classified as positive, then your object is most probably present in that region; otherwise it is not. If the classifier finds the object in many overlapping region proposals, you can use a non-maximum suppression algorithm to reduce the number of positive proposals.
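A rough sketch of the last two steps, in plain Python. Plain sliding windows stand in for a real region proposal algorithm, and the (score, box) pairs are made-up classifier outputs, so treat all names and numbers as assumptions.

```python
def sliding_windows(image_size, window=128, stride=64):
    """Yield (x1, y1, x2, y2) boxes over a square image; windows overrunning the edge are skipped."""
    for y in range(0, image_size - window + 1, stride):
        for x in range(0, image_size - window + 1, stride):
            yield (x, y, x + window, y + window)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(scored_boxes, iou_threshold=0.3):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    remaining = sorted(scored_boxes, key=lambda sb: sb[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [sb for sb in remaining if iou(best[1], sb[1]) <= iou_threshold]
    return kept


boxes = list(sliding_windows(600))
# Hypothetical (score, box) pairs for windows the classifier marked as positive.
positives = [(0.9, (64, 64, 192, 192)),
             (0.8, (128, 64, 256, 192)),
             (0.7, (384, 384, 512, 512))]
kept = non_max_suppression(positives)
print(len(boxes), kept)
```

Here the second box overlaps the first by one third of their union, so it is suppressed, while the distant third box survives.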

Can neural network fail to learn a function? and How to choose better feature descriptors for pattern recognition?

I was working with Webots, an environment used to model, program and simulate mobile robots. Basically I have a small robot with a VGA camera; it looks for simple blue patterns on the white walls of a small Lego maze and moves accordingly.
The method I used here was:
1. Obtain images of the patterns from Webots and save them to a location on the PC.
2. Detect the blue pattern and form a square enclosing it, with at least 2 edges of the pattern being part of the boundary of the square.
3. Resize it to a 7x7 matrix (using the nearest-neighbour interpolation algorithm).
The input to the network is simply the red-channel pixel intensities of the 7x7 image (when I look at a blue pixel through a red filter, it appears black). The intensity of each pixel is extracted and the 7x7 matrix is then converted to a 1D vector, i.e. 1x49, which is my input to the neural network. (I chose this feature as my input because it is 'relatively' easy to access this information using C and Webots.)
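For reference, the resize-and-flatten step can be sketched in a few lines of Python. The 14x14 crop is a made-up example of an "L" pattern, and the index mapping is one simple nearest-neighbour variant. MATLAB or Webots may round differently.

```python
def resize_nearest(image, out_size=7):
    """Nearest-neighbour resize of a 2D list of intensities to out_size x out_size."""
    rows, cols = len(image), len(image[0])
    return [[image[r * rows // out_size][c * cols // out_size]
             for c in range(out_size)]
            for r in range(out_size)]


def to_feature_vector(image):
    """Flatten a 2D matrix row by row into a 1D list."""
    return [pixel for row in image for pixel in row]


# Hypothetical 14x14 red-channel crop: an "L" drawn with 0 (blue looks dark) on 255 (white wall).
crop = [[0 if (c < 4 or r >= 10) else 255 for c in range(14)] for r in range(14)]
features = to_feature_vector(resize_nearest(crop))
print(len(features), features[:7])
```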
I used MATLAB for this offline training and used a small learning rate (0.06) to ensure parameter convergence, and I tested it on large and small datasets (1189 and 346 samples respectively). Every time I have tried, the network fails to classify the pattern (it says the pattern belongs to all 4 classes!). There is nothing wrong with the program, as I tested it on the simpleclass_dataset in MATLAB and it works almost perfectly.
Is it possible that the neural network fails to learn the function because of really poor data? (By poor data I mean that the data points of one class lie very close to samples belonging to a different class, or something of that sort.) Or can the neural network fail because of very poor feature descriptors?
Can anyone suggest a simpler method to extract features from the image? (I am now shifting to MATLAB, as I am only concerned with simulations in Webots and not the real robot.) What sort of features can I choose? The patterns are very simple (an L, an inverted L and their reflected versions are the 4 patterns).
Neural networks CAN fail to learn a function; this is most often caused by a network topology that is too simple to model the required function. A classic example is attempting to learn the XOR function with a perceptron classifier, although it can sometimes happen even in multilayer neural nets, especially for complex tasks like image recognition. See my previous answer for a rough guide on how to select neural network parameters (ignore the convolution stuff if you want, although I would highly recommend looking into convolutional neural networks if you are still having problems).
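The XOR case is easy to reproduce. The sketch below trains a single threshold unit with the classic perceptron rule (learning rate 1, an arbitrary choice of 100 epochs) on AND, which is linearly separable, and on XOR, which is not.

```python
def train_perceptron(samples, epochs=100):
    """Train a single threshold unit with the classic perceptron rule."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            w[0] += err * x1
            w[1] += err * x2
            b += err
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained unit classifies correctly."""
    hits = sum((1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == target
               for (x1, x2), target in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(accuracy(AND, *train_perceptron(AND)))  # linearly separable
print(accuracy(XOR, *train_perceptron(XOR)))  # not linearly separable
```

No amount of extra training fixes the XOR case, because no single line separates its two classes. Adding a hidden layer does.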
It is possible that there is too little separability between classes, although I doubt that this is the case given your current features. Is there a reason that your network needs to allow an image to belong to four classes simultaneously? If not, then perhaps you could classify the input as the class whose output has the highest activation, instead of all those with high activations.
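A tiny illustration of the difference, with made-up activations and pattern names taken from the question:

```python
PATTERNS = ["L", "inverted L", "reflected L", "reflected inverted L"]

# Hypothetical output activations for one image, all fairly high.
outputs = [0.71, 0.93, 0.68, 0.77]

# Thresholding every output can report several classes at once.
above_threshold = [p for p, o in zip(PATTERNS, outputs) if o > 0.5]

# Taking the highest activation always yields exactly one class.
winner = PATTERNS[max(range(len(outputs)), key=outputs.__getitem__)]

print(above_threshold)
print(winner)
```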