How to pre-train a deep neural network (or RNN) with unlabeled data? - neural-network

Recently, I was asked how to pre-train a deep neural network with unlabeled data, meaning that instead of initializing the model weights with small random numbers, we set the initial weights from a model pretrained on unlabeled data.
Well, intuitively, I kinda get it: it probably helps with the vanishing gradient issue and shortens the training time when there is not much labeled data available. But still, I don't really know how it is done; how can you train a neural network with unlabeled data? Is it something like a SOM or a Boltzmann machine?
Has anybody heard about this? If yes, can you provide some links to sources or papers? I am curious. Greatly appreciated!

There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.
More modern unsupervised deep learning methods include Auto-Encoders, Variational Auto-Encoders, and Generative Adversarial Networks. I won't dive into the details of all of them, but the simplest of these, auto-encoders, work by compressing an unlabeled input into a low dimensional real-valued representation, and using this compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of said input. See here for an illustration and more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
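For a concrete picture, here is a minimal sketch of a dense auto-encoder in Keras; the layer sizes and the 784-dimensional input are illustrative assumptions, not something from the answer above:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Encoder: compress a 784-dimensional input to a 32-dimensional code.
    inputs = keras.Input(shape=(784,))
    code = layers.Dense(128, activation="relu")(inputs)
    code = layers.Dense(32, activation="relu")(code)

    # Decoder: reconstruct the original input from the code.
    decoded = layers.Dense(128, activation="relu")(code)
    decoded = layers.Dense(784, activation="sigmoid")(decoded)

    autoencoder = keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")

    # x_unlabeled: any (n_samples, 784) array of unlabeled data scaled to [0, 1].
    # The input is also the target, so no labels are needed.
    # autoencoder.fit(x_unlabeled, x_unlabeled, epochs=10, batch_size=128)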
I guess in some sense any of the listed methods could be used as pre-training, e.g. for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Initialization methods, activation functions, and other optimization tricks are generally advanced enough to do well without more complicated initialization procedures.

Related

How to evaluate the quality of a neural network for object detection in Keras?

I've already trained a neural network in Keras to detect two classes of images (cats and dogs) and got the accuracy on test data. Is that enough for the conclusion in my master's thesis, or should I take other steps to evaluate the quality of the network (for instance, cross-validation)?
Not really; I would expect more than just accuracy from my students in any classification setup. Accuracy only evaluates that particular network on that particular test set, but you would have to justify, to some extent, the design choices you've made in building that network. Here are some things to consider:
Presumably you have some hyper-parameters you've fixed; you can investigate how these affect your results. How many filters? How many layers? And, most importantly, why?
An important aspect of object classification is how your model handles noise. Depending on your dataset, one simple way would be to pre-process the test data (blur it, invert colours, etc.) and you'll see that your performance drops. Why does it do that? What does the confusion matrix look like then? (See the sketch below.)
What is the performance of the network? Is it fast or slow compared to another system, say VGG?
When you evaluate your project in general, not just the network, asking why things worked helps a lot, not just why things didn't.
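As a rough illustration of the noise check above, a sketch with scikit-learn, assuming a trained Keras model `model` with a sigmoid output and test arrays `x_test`, `y_test` (all hypothetical names):

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.metrics import confusion_matrix

    def report(model, x, y_true):
        # Binary cats-vs-dogs model with a sigmoid output; threshold at 0.5.
        y_pred = (model.predict(x) > 0.5).astype(int).ravel()
        print(confusion_matrix(y_true, y_pred))

    report(model, x_test, y_test)            # clean test set

    # Same test set, but blurred: accuracy usually drops, and the confusion
    # matrix shows which class suffers more.
    x_blurred = np.stack([gaussian_filter(img, sigma=2) for img in x_test])
    report(model, x_blurred, y_test)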

neural network for sudoku solver

I recently started learning neural networks, and I thought that creating a sudoku solver would be a nice application for a NN. I started learning them with backpropagation neural networks, but later I figured out that there are tens of kinds of neural networks. At this point, I find it hard to learn all of them and then pick an appropriate one for my purpose. Hence, I am asking what would be a good choice for creating this solver. Can a backpropagation NN work here? If not, can you explain why and tell me which one can work?
Thanks!
Neural networks don't really seem to be the best way to solve sudoku, as others have already pointed out. I think a better (but also not really good/efficient) way would be to use a genetic algorithm. Genetic algorithms don't directly relate to NNs, but it's very useful to know how they work.
Better ideas (by better I mean more likely to be successful, and probably better for you to learn something new) would include:
If you use a library:
Play around with the networks: try to train them on different datasets, maybe random numbers, and see what you get and how you have to tune the parameters to get better results.
Try to write an image generator. I wrote a few of them and they are still my favourite projects. With one of them I used backprop to teach a NN which colour each x/y coordinate of the image has (see the sketch after this list), and the other approach combines randomly generated images with one another (GAN/NEAT).
Try to create a movie (series of images) of the network learning to create a picture. It will show you very well how backprop works, what parameter tuning does to the results, and how it changes the way the network gets to the result.
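A rough sketch of that first idea (teaching a network which colour each x/y coordinate has), here with Keras and a hypothetical `image` array of shape (height, width, 3) scaled to [0, 1]:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    h, w = image.shape[:2]
    # Inputs: normalized (x, y) coordinates; targets: the RGB colour at that pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel() / w, ys.ravel() / h], axis=1)
    colors = image.reshape(-1, 3)

    model = keras.Sequential([
        layers.Dense(64, activation="tanh", input_shape=(2,)),
        layers.Dense(64, activation="tanh"),
        layers.Dense(3, activation="sigmoid"),  # RGB values in [0, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(coords, colors, epochs=50, batch_size=256)

    # Render the image back out of the network to see what it has learned.
    reconstruction = model.predict(coords).reshape(h, w, 3)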
If you are not using a library:
Try to solve easy problems, one after the other. Use backprop or a genetic algorithm for training (whatever you have implemented).
Try to improve your implementation, change some things that nobody else cares about, and see how it changes the results.
List of 'tasks' for your Network:
XOR (basically the hello world of NNs; see the sketch below)
Pole balancing problem
Simple games like pong
More complex games like flappy bird, agar.io etc.
Choose more problems that you find interesting; maybe you are into image recognition, maybe text, audio, who knows. Think of something you can/would like to be able to do and find a way to make your computer do it for you.
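For the XOR item on the list, a minimal from-scratch sketch in plain numpy (one hidden layer trained by backprop; the layer size and learning rate are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # One hidden layer with 4 units, one output unit.
    W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
    lr = 1.0

    for step in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass (gradients of the squared error).
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

    print(out.round(3))  # should approach [[0], [1], [1], [0]]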
It's not advisable to only use your own NN implementation, since it will probably not work properly the first few times and you'll get frustrated. Experiment with libraries as well as your own implementation.
A good way to find almost endless resources:
Use Google search and add 'filetype:pdf' at the end in order to only show PDF files. Search for neural networks, genetic algorithms, evolutionary neural networks.
Neither neural nets nor GAs are close to ideal solutions for Sudoku. I would advise looking into Constraint Programming (e.g. the Choco or Gecode solvers). See https://gist.github.com/marioosh/9188179 for an example. It should solve any 9x9 sudoku in a matter of milliseconds (the daily Sudokus of the "Le Monde" newspaper are created using this type of technology, BTW).
There is also Knuth's famous "Dancing Links" algorithm for this problem, which works very well: https://en.wikipedia.org/wiki/Dancing_Links
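This is not the Choco/Gecode or Dancing Links code itself, but a minimal Python sketch of the same idea: treat each empty cell as a variable constrained by its row, column, and 3x3 box, and backtrack. Even this naive version handles typical 9x9 puzzles quickly:

    def valid(grid, r, c, v):
        # v may not already appear in row r, column c, or the 3x3 box containing (r, c).
        if v in grid[r] or v in (grid[i][c] for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(grid[i][j] != v for i in range(br, br + 3) for j in range(bc, bc + 3))

    def solve(grid):
        # grid is a 9x9 list of lists, 0 for empty cells; filled in place.
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for v in range(1, 10):
                        if valid(grid, r, c, v):
                            grid[r][c] = v
                            if solve(grid):
                                return True
                            grid[r][c] = 0
                    return False
        return True  # no empty cell left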
As was mentioned in the comments, you probably want to take a look at convolutional networks. You basically input the sudoku board as a two-dimensional 'image'. I think using a receptive field of 3x3 would be quite interesting, and I don't really think you need more than one filter.
The harder thing is normalization: the numbers 1-9 don't have an underlying ordering in sudoku; you could easily replace them with A-I, for example. So they are categories, not numbers. However, one-hot encoding every cell would mean a lot of inputs, so I'd stick to numerical normalization (1 = 0.1, 2 = 0.2, etc.).
The output of your network should be a softmax of some kind: if you don't use a softmax, and instead output just an x and y coordinate, then you can't ensure that the predicted square has not been filled in yet.
A numerical value should be passed along with the output, to show what number the network wants to fill in.
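To make the layout concrete, a rough Keras sketch along those lines; the stack of 3x3 convolutions and the per-cell softmax over the 9 digits are my illustrative choices (the answer above suggests a single filter and a coordinate-plus-value output, which differs):

    from tensorflow import keras
    from tensorflow.keras import layers

    # Input: the board as a 9x9 "image", empty cells as 0, digits scaled to 0.1-0.9.
    inputs = keras.Input(shape=(9, 9, 1))
    x = layers.Conv2D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, kernel_size=3, padding="same", activation="relu")(x)
    # Output: for every cell, a softmax over the 9 possible digits.
    x = layers.Conv2D(9, kernel_size=1, padding="same")(x)
    outputs = layers.Softmax(axis=-1)(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # Targets would be a (batch, 9, 9) array of digit indices 0-8.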
As PLEXATIC mentioned, neural nets aren't really well suited for this kind of task. A genetic algorithm sounds good indeed.
However, if you still want to stick with neural nets, you could have a look at https://github.com/Kyubyong/sudoku. As Thomas W answered, 3x3 looks nice.
If you don't want to deal with CNNs, you could find some answers here as well: https://www.kaggle.com/dithyrambe/neural-nets-as-sudoku-solvers

Use a trained neural network to imitate its training data

I'm in the early stages of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun, so the mimicked prose doesn't need to make too much sense, but I'd like to make it as good as I can with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
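The gradient-based idea in the last sentence is sometimes called inverting the network, or activation maximization. A rough PyTorch sketch, assuming a trained differentiable classifier `net` (a hypothetical name) that maps a fixed-size input vector to a single "looks like my training prose" score:

    import torch

    def invert(net, input_dim, steps=200, lr=0.1):
        # Start from a random point and do gradient ascent on the score.
        x = torch.randn(1, input_dim, requires_grad=True)
        optimizer = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            score = net(x).mean()   # how "training-data-like" the input looks
            (-score).backward()     # minimizing the negative maximizes the score
            optimizer.step()
        return x.detach()

    # candidate = invert(net, input_dim=300)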
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a group of researchers at the University of Montreal led by Ian Goodfellow (now at OpenAI). The main idea behind a GAN is to have two competing neural network models. One takes noise as input and generates samples (and so is called the generator). The other model (called the discriminator) receives samples from both the generator and the training data, and has to be able to distinguish between the two sources. These two networks play a continuous game, where the generator is learning to produce more and more realistic samples, and the discriminator is learning to get better and better at distinguishing generated data from real data. These two networks are trained simultaneously, and the hope is that the competition will drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
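A heavily simplified PyTorch sketch of that two-player setup (fully connected nets and a flat data vector, purely illustrative):

    import torch
    from torch import nn

    latent_dim, data_dim = 32, 784
    G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
    D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCELoss()

    def train_step(real):  # real: (batch, data_dim) tensor of training samples
        batch = real.size(0)
        noise = torch.randn(batch, latent_dim)
        fake = G(noise)

        # Discriminator: push real samples towards 1, generated samples towards 0.
        opt_d.zero_grad()
        d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
        d_loss.backward()
        opt_d.step()

        # Generator: try to fool the discriminator into outputting 1 for fakes.
        opt_g.zero_grad()
        g_loss = bce(D(fake), torch.ones(batch, 1))
        g_loss.backward()
        opt_g.step()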

Criteria Behind Structuring a Neural Network

I'm just starting with Torch and neural networks, and just glancing at a lot of sample code and tutorials, I see a lot of variety in how people structure their neural networks. There are layers like Linear(), Tanh(), Sigmoid() as well as criterions like MSE, ClassNLL, MultiMargin, etc.
I'm wondering what kind of factors people keep in mind when creating the structure of their network? For example, I know that in a ClassNLLCriterion, you want to have the last layer of your network be a LogSoftMax() layer so that you can input the right log probabilities.
Are there any other general rules or guidelines when it comes to creating these networks?
Thanks
Here is a good webpage which contains the pros and cons of some of the main activation functions:
http://cs231n.github.io/neural-networks-1/#actfun
It can boil down to the problem at hand and knowing what to do when something goes wrong. As an example, if you have a huge dataset and you can't churn through it terribly quickly, then a ReLU might be better in order to quickly get to a local minimum. However, you could find that some of the ReLU units "die", so you might want to keep track of the proportion of activated neurons in that particular layer to make sure this hasn't happened.
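One way to keep track of that, sketched in PyTorch with a forward hook (assuming some network `model` that contains a ReLU layer; the names are hypothetical):

    from torch import nn

    def activation_fraction(module, inputs, output):
        # Fraction of units in this layer that produced a non-zero activation.
        frac = (output > 0).float().mean().item()
        print(f"{module.__class__.__name__}: {frac:.1%} of units active")

    relu_layer = next(m for m in model.modules() if isinstance(m, nn.ReLU))
    handle = relu_layer.register_forward_hook(activation_fraction)
    # ...run a validation batch through model and watch the printed fraction;
    # a value stuck near 0% over many batches suggests dead ReLU units.
    handle.remove()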
In terms of criterions, they are also problem-specific, but a bit less ambiguous. For example, binary cross-entropy for binary classification, MSE for regression, etc. It really depends on the objective of the whole project.
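The question is about (Lua) Torch, but as a rough sketch the same output-layer/criterion pairings look like this in PyTorch:

    from torch import nn

    # Multi-class classification: LogSoftmax output + NLLLoss
    # (the LogSoftMax + ClassNLLCriterion pairing mentioned in the question).
    classifier = nn.Sequential(nn.Linear(100, 10), nn.LogSoftmax(dim=1))
    multiclass_criterion = nn.NLLLoss()

    # Binary classification: sigmoid output + binary cross-entropy.
    binary_model = nn.Sequential(nn.Linear(100, 1), nn.Sigmoid())
    binary_criterion = nn.BCELoss()

    # Regression: plain linear output + mean squared error.
    regressor = nn.Linear(100, 1)
    regression_criterion = nn.MSELoss()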
For the overall network architecture, I personally find it can be a case of trying out different architectures and seeing which ones work and which don't on your test set. If you think that the problem at hand is terribly complex and you need a complex network to solve it, then you will probably want to try making a very deep network to begin with, then add/remove a few layers at a time to see if you have under/overfitted. As another example, if you are using a convolutional network and the input is relatively small, then you might try a smaller set of convolutional filters to begin with.

Why do we use neural networks in computers?

Why do we use neural networks? They're biological. Aren't there other solutions that are more "suitable" for computers?
In other words: Why do we use the human brain as a model of inspiration for artificial intelligence?
Neural networks aren't really very biological. They resemble, at a very general level, the architecture of neurons, but it's a great exaggeration to say that they work "just like the brain" (an exaggeration that's encouraged by some neural-net advocates, alas).
Neural nets are mostly used for fuzzy, difficult problems that don't yield to traditional algorithmic approaches. IOWs, there are more "suitable" solutions for computers, but sometimes those solutions don't work, and in those cases one approach is a neural network.
Why do we use neural networks?
Because they're simple to construct, and often appear to be a good approach to certain classes of problems, such as pattern recognition.
Aren't there other solutions that are more "suitable" for computers?
Yes, implementations that more closely match a computer's architecture can be more suitable for the computer, but then can be less suitable for an effective solution.
Why do we use the human brain as a model of inspiration for artificial intelligence?
Because our brain is the best example we have of something intelligent.
Neural Networks are still used for two reasons.
They are easy to understand for people who don't want to delve into the math of a more complicated algorithm.
They have a really good name. I mean, when you roll into a CEO's office to sell him your model, which would you rather say, Neural Network or Support Vector Machine? When he asks how it works you can just say "just like the neurons in your brain", which is something most people understand. If you try to explain a support vector machine, Mr. CEO is going to be lost (not because he is dumb, but because SVMs are harder to understand).
Sometimes they are still useful; however, I think that the training time is often just too long.
I don't understand the question. Neural nets are suitable for certain functions, and not others. The same is true for various other sorts of classes of algorithms, regardless of what they might have been inspired by.
If we have a good many inputs to something, and we want some outputs, and we have a set of example inputs with known desired outputs, and we don't want to calculate a function ourselves, neural nets are excellent. We feed in the example inputs, compare the output to the example outputs, and adjust the inner workings of the NN in an automatic fashion, to make the NN output closer to the desired output.
This sort of function derivation is very useful in various forms of pattern recognition and general classification. It isn't a panacea, of course. It has no explanatory power (in that you can't look at the innards to see why it classifies something in a particular way), it doesn't offer guarantees of correctness within certain limits, validating how well it works is difficult, and gathering enough examples for training and validation can be expensive or even impossible. The trick is to know when to use a NN and what sort to use.
There are, of course, people who oversell the things as some sort of super solution or even an explanation of human thought, and you might be reacting to them.
Neural networks are only "inspired" by the neural structure of our brain, but they are not even close to the complexity of the behaviour of a real neuron (to date there is no neuron model that captures the complexity of a SINGLE neuron, don't even think about a neuronal population...).
Although "neural", machine "learning" and other "pseudo-bio" terms (like "genetic algorithms") are very "cool", that does not mean that they are actually based on real biological processes.
It just means that they may very loosely resemble a biological situation.
NB: of course this does not make them useless! They're very very important in many fields!
Neural networks have been around for a while, and were originally developed to model, as closely as our understanding at the time allowed, the way neurons work in the brain. They represent a network of neurons, hence "neural network." Since computers and brains are very different hardware-wise, implementing anything like a brain with a computer is going to be rather clunky. However, as others have stated so far, neural networks can be useful for some tasks that are vague, such as pattern recognition, facial recognition, and other similar uses. They are also still useful as a basic model of how neurons connect and are often used in cognitive science and other fields of artificial intelligence to try to understand how small parts of the complex human brain might make simple decisions. Unfortunately, once a neural network "learns" something, it is very difficult to understand how it actually makes its decisions.
There are, of course, many misuses of neural networks and in most non-research applications, other algorithms have been developed that are much more accurate. If a piece of business software proudly proclaims it uses a neural network, chances are it probably doesn't need it, and might be using it to inefficiently perform a task that could be performed in a much easier way. Unless the software is actually "learning" on the fly, which is very rare, neural networks are pretty much useless. And even when the software is "learning", sometimes neural networks aren't the best way to go.
While I admit that I tinker with Neural Networks because of my hopes of creating high-level AI, you can look at a Neural Network as more than just an artificial representation of a human brain: it is a mathematical construct.
For example, let's say you have a function y = f(x), or more abstractly y = f(x1, x2, ..., xn-1, xn). Neural networks themselves act as functions, or even a set of functions, taking in a large input and producing some output: [y1, y2, ..., yn-1, yn] = f(x1, x2, ..., xn-1, xn).
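In code, that "network as a function" view is literally just a forward pass; a tiny numpy sketch with made-up weights:

    import numpy as np

    # A two-layer network is just a composed function f(x) = W2 * tanh(W1 * x + b1) + b2.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 outputs

    def f(x):
        return W2 @ np.tanh(W1 @ x + b1) + b2

    print(f(np.array([0.5, -1.0, 2.0])))  # [y1, y2] = f(x1, x2, x3)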
Furthermore, they are not static, but instead can continue adapting and learning, and eventually extrapolate (predict) interesting things. Their abstractness can even result in them coming up with unique solutions to problems that haven't been thought up yet. For example, the TD-Gammon program learned to play backgammon and beat the world champion. The world champion stated that the program played a unique end game that he had never seen. (That's pretty awesome if you ask me, considering the complexity of NNs.)
And then, when you look at recurrent neural networks (i.e. networks that can have internal feedback loops, or pipe their output back into their input while consuming new input), they can solve even more interesting problems and map even more complex functions.
In a nutshell, Neural Networks are like a very, very abstract high-dimensional function, capable of mapping/learning very interesting things that would otherwise be impossible to program explicitly. For example, the computation needed to calculate the total net forces of gravity on a large number of objects is intense (you have to calculate it for each object, against each other object), but once a neural network learns how to map it, it can approximate these complex calculations, which would otherwise take exponential or combinatorial time, in polynomial time. Just look at how fast your brain processes physics data, spatial data, images, and sound when you dream. That's the potential computation power of Neural Networks. And the way they store data is very clever as well (in synaptic patterns, i.e. memories).
Artificial intelligence is a branch of computer science devoted to making computers more 'biological.' This is useful when you want a computer to do human (biological) things like playing chess or imitating casual conversation.
Human brains are much more efficient and powerful in some ways than the most powerful computers, so it makes sense to try to imitate a biological way of processing information.
Most neural networks I'm aware of are nothing more than flexible interpolators. Backpropagation of errors is easy and fast. Here are some possible uses:
Classification of data
Some games (modern backgammon AIs beat the best players in the world, the evaluation function is a neural net)
Pattern recognition (OCR ?)
There is nothing particularly related to human intelligence. There are other uses of neural nets; I have seen an implementation of associative memory which allowed for degradation without (much) data loss, pretty much like the brain, which sees some neurons die over time.