How do self-driving cars using vision detection systems handle N possible objects as inputs?

I understand convolutional neural networks can be used to address this problem, but if you look at videos of self-driving cars, like Tesla Autopilot, they still use vision detection and labeling systems as input for their neural networks. I am wondering how self-driving cars solve the problem of having N possible detected objects, where each detection carries a varying amount of information. As a neural network's structure is very rigid, I would imagine that this would cause a problem. Any explanation would be greatly helpful; however, if you do have a scientific paper that would be very appreciated!

These networks do not output a single class label such as car, person, or sidewalk, but rather a probability distribution over N object classes. The final decision is made later, basically by taking the highest-rated class in terms of probability as the prediction. The model is trained on lots of images, and as you said, these images contain varying numbers of objects; but since the model outputs probabilities for all N classes regardless of how many objects appear in the input, this is already something the model is trained for. It learns to output probabilities close to 0 for object types that are not present in the image.
Since this is something they are trained for, they can also do it during inference. Of course, some problems might occur if a certain object type is very rare in the data, but that is a class imbalance issue.
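As a minimal numpy sketch of the decision step described above (the class names and probability values here are made up for illustration):
import numpy as np

class_names = ['car', 'person', 'sidewalk', 'background']  # illustrative classes
probs = np.array([0.07, 0.88, 0.03, 0.02])  # hypothetical softmax output, one entry per class

prediction = class_names[int(np.argmax(probs))]
print(prediction)  # -> 'person'; classes absent from the image sit near probability 0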

Related

Use a trained neural network to imitate its training data

I'm in the early stages of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun, so the mimicked prose doesn't need to make too much sense, but I'd like to make it as good as I can with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient ascent to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks (DCGAN)? There are plenty of proper research articles out there, as well as gentler introductions like this one on DCGAN and this one on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a group of researchers at the University of Montreal led by Ian Goodfellow (now at OpenAI). The main idea behind a GAN is to have two competing neural network models. One takes noise as input and generates samples (and so is called the generator). The other model (called the discriminator) receives samples from both the generator and the training data, and has to be able to distinguish between the two sources. These two networks play a continuous game, where the generator is learning to produce more and more realistic samples, and the discriminator is learning to get better and better at distinguishing generated data from real data. These two networks are trained simultaneously, and the hope is that the competition will drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
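For a concrete picture, here is a rough sketch of that two-network training loop in Keras; the layer sizes, data dimensions, and optimizer settings are placeholders of mine, not taken from the articles above:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

noise_dim, data_dim = 100, 784  # hypothetical dimensions

# generator: noise -> fake sample
generator = Sequential([
    Dense(256, activation='relu', input_dim=noise_dim),
    Dense(data_dim, activation='tanh'),
])

# discriminator: sample -> probability that it came from the real data
discriminator = Sequential([
    Dense(256, activation='relu', input_dim=data_dim),
    Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer=Adam(), loss='binary_crossentropy')

# stacked model trains the generator to fool the (frozen) discriminator
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer=Adam(), loss='binary_crossentropy')

def train_step(real_batch):
    n = real_batch.shape[0]
    noise = np.random.normal(size=(n, noise_dim))
    fake_batch = generator.predict(noise)
    # 1) discriminator learns real (label 1) vs generated (label 0)
    discriminator.train_on_batch(real_batch, np.ones((n, 1)))
    discriminator.train_on_batch(fake_batch, np.zeros((n, 1)))
    # 2) generator learns to make the discriminator say "real"
    gan.train_on_batch(noise, np.ones((n, 1)))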

Face Recognition based on Deep Learning (Siamese Architecture)

I want to use a pre-trained model for face identification. I am trying to use a Siamese architecture, which requires only a small number of images. Could you point me to any trained model that I can adapt for the Siamese architecture? How can I change the network so that I can feed in two images and get their similarity (I do not want to create an image based on the tutorial here)? I only want to use the system for a real-time application. Do you have any recommendations?
I suppose you can use this model, described in Xiang Wu, Ran He, Zhenan Sun, Tieniu Tan, A Light CNN for Deep Face Representation with Noisy Labels (arXiv 2015), as a starting point for your experiments.
As for the Siamese network, what you are trying to learn is a mapping from a face image into some high-dimensional vector space, in which distances between points reflect (dis)similarity between faces.
To do so, you only need one network that takes a face as input and produces a high-dimensional vector as output.
However, to train this single network using the Siamese approach, you are going to duplicate it, creating two instances of the same net (you need to explicitly link the weights of the two copies). During training you provide pairs of faces, one to each copy; a single loss layer on top of the two copies then compares the high-dimensional vectors representing the two faces and computes a loss according to the "same/not same" label associated with the pair.
Hence, you only need the duplication for training. At test time ('deploy') you will have a single net providing you with a semantically meaningful, high-dimensional representation of faces.
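A minimal Keras sketch of this setup; the input size, embedding dimension, base network, and contrastive-loss margin are illustrative assumptions of mine, not from the papers cited here:
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Flatten, Lambda
import keras.backend as K

input_shape = (64, 64, 3)  # hypothetical face crop size

# the single embedding network: face image -> high-dimensional vector
base = Sequential([
    Flatten(input_shape=input_shape),
    Dense(256, activation='relu'),
    Dense(128),  # embedding dimension (assumed)
])

# two inputs, ONE shared network: calling `base` twice ties the weights
face_a, face_b = Input(shape=input_shape), Input(shape=input_shape)
emb_a, emb_b = base(face_a), base(face_b)

# euclidean distance between the two embeddings feeds the loss
distance = Lambda(lambda t: K.sqrt(K.sum(K.square(t[0] - t[1]),
                                         axis=1, keepdims=True)))([emb_a, emb_b])
siamese = Model([face_a, face_b], distance)

def contrastive_loss(y_true, y_pred, margin=1.0):
    # y_true = 1 for "same person" pairs, 0 for "different person" pairs
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

siamese.compile(optimizer='adam', loss=contrastive_loss)
# at deploy time, use `base` alone to embed single faces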
For a more advanced Siamese architecture and loss, see this thread.
On the other hand, you might want to consider the approach described in Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua, Learning a Metric Embedding for Face Recognition using the Multibatch Method (arXiv 2016). This approach is more efficient and easier to implement than pair-wise losses over image pairs.

CNN for recognizing five different faces

I have a project for face recognition of five people that I want my CNN to detect, and I was wondering if people could have a look at my model to see if this is a step in the right direction:
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

number_of_faces = 5

def model():
    model = Sequential()
    # input: 800x800 RGB headshots, channels-first
    model.add(Convolution2D(64, 3, 3, activation='relu', input_shape=(3, 800, 800)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(number_of_faces, activation='softmax'))
    return model
So the model takes in pictures (headshots of 5 people found on Google) with 3 channels, of size 800 by 800, producing 64 feature maps, pooled, then another set of feature maps, and then connects to an MLP that classifies into a binary vector over 5 output neurons. My question is: is this a decent approach to try and classify headshots of certain people? For example, if I were to download one hundred pictures of a certain person and put them through this model, would the feature space created in the convolutions be big enough to capture the features of that face and four others?
thanks for the help guys
Well, it is not an engineering issue but a scientific one. It is hard to judge whether 100 pictures are enough for your purpose without seeing your current progress (for example: what is the accuracy now? Are you facing overfitting or underfitting?).
But, yes, extra face data can help your model, especially when those faces share the same context (background, lighting, angle, skin color, etc.) as your eventual test data.
If you are interested in face recognition, you can start with Deep Learning Face Representation from Predicting 10,000 Classes (unofficial code here); they use 10 thousand faces as an extra training dataset. You can search "DeepID" for more information.
If you are more of an engineer, you can check Facial Expression Recognition with Convolutional Neural Networks; this report focuses more on implementation and is also implemented in Keras.
By the way, 800x800 is extremely large by face recognition standards. You might want to resize the images to something smaller; otherwise your model might be too big to train and consume a huge amount of memory.
Face recognition is not a regular classification problem. If you train your model for 5 people, then even if it is a successful model, you need to re-train it whenever a new person joins the team, and your new model might not be successful anymore.
Instead, we first train a regular classification model, then drop its final softmax layer and use an earlier layer to represent images. Representations are multi-dimensional vectors. Here, we expect an image pair of the same person to have high similarity, whereas an image pair of different persons should have low similarity. We can compute vector similarity with cosine similarity or Euclidean distance.
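A small numpy sketch of that comparison; the embedding vectors and decision threshold here are placeholders, assuming vectors taken from an early layer of a trained model as described:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

emb1 = np.random.rand(128)  # placeholder embeddings of two face images
emb2 = np.random.rand(128)
same_person = cosine_similarity(emb1, emb2) > 0.8  # threshold is illustrative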
To sum up, you should not train a new model for a face recognition application. You just need to use a pre-trained neural network to predict; the predictions will be the representations.
I recommend you use deepface. It wraps state-of-the-art face recognition models such as VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID and Dlib. It also handles face detection and alignment in the background. You just need to call a single line of code to apply face recognition.
#!pip install deepface
from deepface import DeepFace

models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib']
# verify whether the two images show the same person
obj = DeepFace.verify("img1.jpg", "img2.jpg", model_name=models[0])
print(obj["verified"], ", ", obj["distance"])
The returned object stores the model's maximum threshold value and the distance found: the verified key is True if the image pair shows the same person, and False if it shows different persons.

Using a learned Artificial Neural Network to solve for inputs

I've recently been delving into artificial neural networks again, both evolved and trained. I had a question regarding what methods, if any, exist to solve for inputs that would result in a target output set. Is there a name for this? Everything I try to look up leads me to backpropagation, which isn't necessarily what I need. In my search, the closest thing I've come to expressing my question is
Is it possible to run a neural network in reverse?
Which told me that there would indeed be many solutions for networks with varying numbers of nodes per layer, and that they would not be trivial to solve for. I had the idea of just marching toward an ideal set of inputs using the weights established during learning. Does anyone else have experience doing something like this?
In order to elaborate:
Say you have a network with 401 input nodes, representing a 20x20 grayscale image plus a bias, two hidden layers consisting of 100 and 25 nodes, and 6 output nodes representing a classification (symbols, roman numerals, etc.).
After training a neural network so that it can classify with an acceptable error, I would like to run the network backwards. This would mean I input, at the output side, a classification I would like to see, and the network would imagine a set of inputs that would result in that output. So for the roman numeral example, this could mean I request it to run the net in reverse for the symbol 'X', and it would generate an image resembling what the net thinks an 'X' looks like. In this way, I could get a good idea of the features it learned to separate the classifications. I feel it would be very beneficial in understanding how ANNs function and learn in the grand scheme of things.
For a simple feed-forward fully connected NN, it is possible to project a hidden unit's activation into pixel space by taking the inverse of the activation function (for example, the logit for sigmoid units), dividing it by the sum of the incoming weights, and then multiplying that value by the weight of each pixel. That gives a visualization of the average pattern recognized by this hidden unit. Summing up these patterns over all hidden units yields the average pattern corresponding to that particular set of hidden unit activities. The same procedure can in principle be applied to project output activations into hidden unit activity patterns.
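A numpy sketch of that projection for a single sigmoid hidden unit (the function and variable names are mine, for illustration):
import numpy as np

def logit(a):
    # inverse of the sigmoid activation
    return np.log(a / (1.0 - a))

def unit_pattern(w, activation):
    # w: incoming weight vector of the unit (one weight per pixel)
    # spread the recovered pre-activation over pixels in proportion to weight
    return (logit(activation) / np.sum(w)) * w

Summing unit_pattern over all hidden units then gives the average input pattern for a given set of hidden unit activities.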
This is indeed useful for analyzing what features a NN has learned in image recognition. For more complex methods you can take a look at this paper (among other things, it contains examples of patterns that a NN can learn).
You cannot exactly run a NN in reverse, because it does not remember all the information from the source image, only the patterns it learned to detect, so the network cannot "imagine a set of inputs". However, it is possible to sample a probability distribution (taking each weight as the probability of activation of the corresponding pixel) and produce a set of patterns that a particular neuron would recognize.
I know that you can, and I am working on a solution now. I have some code on my github here for imagining the inputs of a neural network that classifies the handwritten digits of the MNIST dataset, but I don't think it is entirely correct. Right now, I simply take a trained network and my desired output and multiply backwards by the learned weights at each layer until I have a value for the inputs. This skips over the activation function and may have some other errors, but I am getting pretty reasonable images out of it. For example, this is the result of the trained network imagining a 3 (image: "number 3").
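For reference, a numpy sketch of that "multiply backwards" idea; as the answer notes, activations (and biases) are ignored, which is why the result is only approximate:
import numpy as np

def imagine(weights, target_output):
    # weights: list of weight matrices of a trained feed-forward net, ordered
    # input -> output, with the forward pass roughly x @ W1 @ W2 @ ...
    v = np.asarray(target_output, dtype=float)
    for W in reversed(weights):
        v = v @ W.T  # project back through each layer, skipping activations
    return v  # candidate input resembling the requested class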
Yes, you can run a probabilistic NN in reverse to get it to 'imagine' inputs that would match an output it's been trained to categorise.
I highly recommend Geoffrey Hinton's Coursera course on NNs here:
https://www.coursera.org/course/neuralnets
He demonstrates in his introductory video a NN imagining various "2"s that it would recognise, having been trained to identify the numerals 0 through 9. It's very impressive!
I think it's basically doing exactly what you're looking to do.
Gruff

Convert SURFPoints object to matrix in MATLAB

Is there any way I can convert the SURFPoints object, generated by MATLAB, into a matrix with x and y positions, for feeding into a neural network?
I am pretty much a complete beginner, and from what I can tell by looking at the documentation, I wasn't sure whether there was a way to get SURFPoints into neural networks.
Many thanks,
Hugh
SURFPoints has a field, Location, that is an n x 2 matrix that has the (x,y) coordinates of each SURF point detected in the image.
Note, however, that SURF points have other attributes besides their location (such as scale and orientation). If you only take into account the (x, y) locations, you are throwing away a lot of data.
Also, it's unclear how you would feed this information into a neural network. A neural network, like many other machine learning models, expects a fixed-length feature vector per entity. If your task is something like image classification, you'll have to come up with some way to convert the variable-length list of SURF points into a feature vector that captures the properties you want your classifier to care about. Depending on your application, a neural network may or may not be the best way to go. In the context of computer vision and image processing, neural networks these days are more commonly used for unsupervised feature discovery (see "deep learning"). For supervised learning tasks, other models like boosted decision trees and SVMs give better theoretical guarantees and have fared much better in practice.
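As one illustrative way (my own example, not from the answer) to turn a variable-length list of keypoints into the fixed-length vector a neural network expects, you could histogram the keypoint locations over the image, here in Python:
import numpy as np

def keypoints_to_feature(locations, image_size, bins=8):
    # locations: n x 2 array of (x, y) SURF point coordinates
    h, _, _ = np.histogram2d(locations[:, 0], locations[:, 1], bins=bins,
                             range=[[0, image_size[0]], [0, image_size[1]]])
    return h.flatten()  # fixed length (bins * bins) regardless of n

This discards scale and orientation too, of course; richer encodings such as a bag-of-visual-words over the SURF descriptors are the more common choice.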