Adapt neural network after training - neural-network

What is the best way to adapt a neural network after its initial training?
I.e. I want to do some image recognition and the network should get better the more new pictures I present it. That could be done with reinforced learning but for a fast progress at the beginning I want to use back propagation. Is it possible to update a network?
And what about creating new categories later on?
Is there another way than retraining it with the complete dataset since that would take a lot of time.
Sorry for my basic questions but I couldn't find much information about this.

Neural networks can be adapted by training them with small learning rates on the new data. Maybe even training the last layers with a larger learning rate than the others (incase you are using a deep neural network).
For the second part of your question, about creating new categories, a deep neural network can be used as a feature extractor on top of any other classifier (maybe another small neural network). When you want to add a new category, you have to re-train the small classifier (or neural network). This would mean that you will retain the training values of the feature detector (the deep neural network) and use it to detect new categories.

Related

Use a trained neural network to imitate its training data

I'm in the overtures of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal lead by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.

Why we need CNN for the Object Detection?

I want to ask one general question that nowadays Deep learning specially Convolutional Neural Network (CNN) has been used in every field. Sometimes it is not necessary to use CNN for the problem but the researchers are using and following the trend.
So for the Object Detection problem, is it a kind of problem where CNN is really needed to solve the detection problem?
That is unhappy question. In title you ask about CNN, but you ask about deep learning in general.
So we don't necessary need deep learning for object recognition. But trained deep networks gets better results. Companies like Google and others are thankful for every % of better results.
About CNN, they gets better results than "traditional" ANN and also have less parameters because of weights sharing. CNN also allow transfer learning(you take a feature detector- convolution and pooling layers and than you connect on feature detector yours full connected layers).
A key concept of CNN's is the idea of translational invariance. In short, using a convolutional kernel on an image allows the machine to learn a set of weights for a specific feature (an edge, or a much more detailed object, depending on the layering of the network) and apply it across the entire image.
Consider detecting a cat in an image. If we designed some set of weights that allowed the learner to recognize a cat, we would like those weights to be the same no matter where the cat is in the image! So we would "assign" a layer in the convolutional kernel to detecting cats, and then convolve over the entire image.
Whatever the reason for the recent successes of CNN's, it should be noted that regular fully-connected ANN's should perform just as well. The problem is that they quickly become computationally infeasible on larger images, whereas CNN's are much more efficient due to parameter sharing.

Where do filters/kernels for a convolutional network come from?

I've seen some tutorial examples, like UFLDL covolutional net, where they use features obtained by unsupervised learning, or some others, where kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings etc). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these, deeper layer's filters come from - should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a singular answer, I'd be happy to just the the general approach (some review article would be fantastic).
The current state of the art suggest to learn all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommend small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
Kernels in deep networks are mostly trained all at the same time in a supervised way (known inputs and outputs of network) using Backpropagation (computes gradients) and some version of Stochastic Gradient Descent (optimization algorithm).
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start with an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way, the architecture of the network would be fixed, as you would have to respect the architecture of the original network. The networks you can donwload are trained on very large problems which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned networks should work very well.
Good place to get more information is Caffe # CVPR2015 tutorial.

How do neural networks handle large images where the area of interest is small?

If I've understood correctly, when training neural networks to recognize objects in images it's common to map single pixel to a single input layer node. However, sometimes we might have a large picture with only a small area of interest. For example, if we're training a neural net to recognize traffic signs, we might have images where the traffic sign covers only a small portion of it, while the rest is taken by the road, trees, sky etc. Creating a neural net which tries to find a traffic sign from every position seems extremely expensive.
My question is, are there any specific strategies to handle these sort of situations with neural networks, apart from preprocessing the image?
Thanks.
Using 1 pixel per input node is usually not done. What enters your network is the feature vector and as such you should input actual features, not raw data. Inputing raw data (with all its noise) will not only lead to bad classification but training will take longer than necessary.
In short: preprocessing is unavoidable. You need a more abstract representation of your data. There are hundreds of ways to deal with the problem you're asking. Let me give you some popular approaches.
1) Image proccessing to find regions of interest. When detecting traffic signs a common strategy is to use edge detection (i.e. convolution with some filter), apply some heuristics, use a threshold filter and isolate regions of interest (blobs, strongly connected components etc) which are taken as input to the network.
2) Applying features without any prior knowledge or image processing. Viola/Jones use a specific image representation, from which they can compute features in a very fast way. Their framework has been shown to work in real-time. (I know their original work doesn't state NNs but I applied their features to Multilayer Perceptrons in my thesis, so you can use it with any classifier, really.)
3) Deep Learning.
Learning better representations of the data can be incorporated into the neural network itself. These approaches are amongst the most popular researched atm. Since this is a very large topic, I can only give you some keywords so that you can research it on your own. Autoencoders are networks that learn efficient representations. It is possible to use them with conventional ANNs. Convolutional Neural Networks seem a bit sophisticated at first sight but they are worth checking out. Before the actual classification of a neural network, they have alternating layers of subwindow convolution (edge detection) and resampling. CNNs are currently able to achieve some of the best results in OCR.
In every scenario you have to ask yourself: Am I 1) giving my ANN a representation that has all the data it needs to do the job (a representation that is not too abstract) and 2) keeping too much noise away (and thus staying abstract enough).
We usually dont use fully connected network to deal with image because the number of units in the input layer will be huge. In neural network, we have specific neural network to deal with image which is Convolutional neural network(CNN).
However, CNN plays a role of feature extractor. The encoded feature will finally feed into a fully connected network which act as a classifier. In your case, I dont know how small your object is compare to the full image. But if the interested object is really small, even use CNN, the performance for image classification wont be very good. Then we probably need to use object detection(which used sliding window) to deal with it.
If you want recognize small objects on large sized image, you should use "scanning window".
For "scanning window" you can to apply dimention reducing methods:
DCT (http://en.wikipedia.org/wiki/Discrete_cosine_transform)
PCA (http://en.wikipedia.org/wiki/Principal_component_analysis)

Artificial Neural Network that creates it's own connections

I've been reading about feed forward Artificial Neural Networks (ANN), and normally they need training to modify their weights in order to achieve the desired output. They will also always produce the same output when receiving the same input once tuned (biological networks don't necessarily).
Then I started reading about evolving neural networks. However, the evolution usually involves recombining two parents genomes into a new genome, there is no "learning" but really recombining and verifying through a fitness test.
I was thinking, the human brain manages it's own connections. It creates connections, strengthens some, and weakens others.
Is there a neural network topology that allows for this? Where the neural network, once having a bad reaction, either adjusts it's weights accordingly, and possibly creates random new connections (I'm not sure how the brain creates new connections, but even if I didn't, a random mutation chance of creating a new connection could alleviate this). A good reaction would strengthen those connections.
I believe this type of topology is known as a Turing Type B Neural Network, but I haven't seen any coded examples or papers on it.
This paper, An Adaptive Spiking Neural Network with Hebbian Learning, specifically addresses the creation of new neurons and synapses. From the introduction:
Traditional rate-based neural networks and the newer spiking neural networks have been shown to be very effective for some tasks, but they have problems with long term learning and "catastrophic forgetting." Once a network is trained to perform some task, it is difficult to adapt it to new applications. To do this properly, one can mimic processes that occur in the human brain: neurogenesis and synaptogenesis, or the birth and death of both neurons and synapses. To be effective, however, this must be accomplished while maintaining the current memories.
If you do some searching on google with the keywords 'neurogenesis artificial neural networks', or similar, you will find more articles. There is also this similar question at cogsci.stackexchange.com.
neat networks as well as cascading add their own connections/neurons to solve problems by building structures to create specific responses to stimuli