Recently, I was asked about how to pre-train a deep neural network with unlabeled data, meaning, instead of initializing the model weight with small random numbers, we set initial weight from a pretrained model (with unlabeled data).
Well, intuitively, I kinda get it, it probably helps with the vanishing gradient issue and shorten the training time when there are not too much labeled data available. But still, I don't really know how it is done, how can you train a neural network with unlabeled data? Is it something like SOM or Boltzmann machine?
Has anybody heard about this? If yes, can you provide some links to sources or papers. I am curious. Greatly appreciate!
There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.
More modern unsupervised deep learning methods include Auto-Encoders, Variational Auto-Encoders, and Generative Adversarial Networks. I won't dive into the details of all of them, but the simplest of these, auto-encoders, work by compressing an unlabeled input into a low dimensional real-valued representation, and using this compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of said input. See here for an illustration and more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
I guess in some sense any of the listed methods could be used as pre-training, e.g for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Initialization methods, activation functions, and other optimization tricks are generally advanced enough to do well without more complicated initialization procedures.
I'm in the overtures of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal lead by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
I'm just starting with Torch and neural networks and just glancing at a lot of sample code and tutorials, I see a lot of variety in the how people structure their neural networks. There are layers like Linear(), Tanh(), Sigmoid() as well as criterions like MSE, ClassNLL, MultiMargin, etc.
I'm wondering what kind of factors people keep in mind when creating the structure of their network? For example, I know that in a ClassNLLCriterion, you want to have the last layer of your network be a LogSoftMax() layer so that you can input the right log probabilities.
Are there any other general rules or guidelines when it comes to creating these networks?
Thanks
Here is a good webpage which contains the pros and cons of some of the main activation functions;
http://cs231n.github.io/neural-networks-1/#actfun
It can boil down to the problem at hand and knowing what to do when something goes wrong. As an example, if you have a huge dataset and you can't churn through it terribly quickly then a ReLU might be better in order to quickly get to a local minimum. However you could find that some of the ReLU units "die" so you might want to keep a track on the proportion of activated neurons in that particular layer to make sure this hasn't happened.
In terms of criterions, they are also problem specific but a bit less ambiguous. For example, binary cross entropy for binary classification, MSE for regression etc. It really depends on the objective of the whole project.
For the overall network architecture, I personally find it can be a case of trying out different architectures and seeing which ones work and which don't on your test set. If you think that the problem at hand is terribly complex and you need a complex network to solve the problem then you will probably want to try making a very deep network to begin with, then add/remove a few layers at a time to see if you have under/overfitted. As another example, if you are using convolutional network and the input is relatively small then you might try and use a smaller set of convolutional filters to begin with.
How do I approach the problem with a neural network and a intrusion detection system where by lets say we have an attack via FTP.
Lets say some one attempts to continuously try different logins via brute force attack on an ftp account.
How would I set the structure of the NN? What things do I have to consider? How would it recognise "similar approaches in the future"?
Any diagrams and input would be much appreciated.
Your question is extremely general and a good answer is a project in itself. I recommend contracting someone with experience in neural network design to help come up with an appropriate model or even tell you whether your problem is amenable to using a neural network. A few ideas, though:
Inputs need to be quantized, so start by making a list of possible numeric inputs that you could measure.
Outputs also need to be quantized and you probably can't generate a simple "Yes/no" response. Most likely you'll want to generate one or more numbers that represent a rough probability of it being an attack, perhaps broken down by category.
You'll need to accumulate a large set of training data that has been analyzed and quantized into the inputs and outputs you've designed. Figuring out the process of doing this quantization is a huge part of the overall problem.
You'll also need a large set of validation data, which should be quantized in the same way as the training data, but that should not take any part in the training, as otherwise you will simply force a correlation network that may well be completely meaningless.
Once you've completed the above, you can think about how you want to structure your network and the specific algorithms you want to use to train it. There is a wide range of literature on this topic, but, honestly, this is the simpler part of the problem. Representing the problem in a way that can be processed coherently is much more difficult.
Why do we use neural networks? It's biologic. Aren't there any more solutions that're more "suitable" for computers?
In other words: Why do we use the human brain as a model for inspiration for artifical intelligence?
Neural networks aren't really very biological. They resemble, at a very general level, the architecture of neurons, but it's a great exaggeration to say that they work "just like the brain" (an exaggeration that's encouraged by some neural-net advocates, alas).
Neural nets are mostly used for fuzzy, difficult problems that don't yield to traditional algorithmic approaches. IOWs, there are more "suitable" solutions for computers, but sometimes those solutions don't work, and in those cases one approach is a neural network.
Why do we use neural networks?
Because they're simple to construct, and often appear to be a good approach to certain classes of problems, such as pattern recognition.
Aren't there any more solutions that're more "suitable" for computers?
Yes, implementations that more closely match a computer's architecture can be more suitable for the computer, but then can be less suitable for an effective solution.
Why do we use the human brain as a model for inspiration for artifical intelligence?
Because our brain is the superior example we have of something intelligent.
Neural Networks are still used for two reasons.
They are easy to understand for people who don't want to delve into the math of a more complicated algorithm.
They have a really good name. I mean when you role into a CEO's office to sell him your model which would you rather say, Neural Network or Support Vector Machine. When he asks how it works you can just say "just like the neurons in your brain", which is something most people understand. If you try and explain a support vector machine Mr. CEO is going to be lost (Not because he is dumb but because SVMs are harder to understand).
Sometimes they are still useful however I think that the training time is often just too long.
I don't understand the question. Neural nets are suitable for certain functions, and not others. The same is true for various other sorts of classes of algorithms, regardless of what they might have been inspired by.
If we have a good many inputs to something, and we want some outputs, and we have a set of example inputs with known desired outputs, and we don't want to calculate a function ourselves, neural nets are excellent. We feed in the example inputs, compare the output to the example outputs, and adjust the inner workings of the NN in an automatic fashion, to make the NN output closer to the desired output.
This sort of function derivation is very useful in various forms of pattern recognition and general classification. It isn't a panacea, of course. It has no explanatory power (in that you can't look at the innards to see why it classifies something in a particular way), it doesn't offer guarantees of correctness within certain limits, validating how well it works is difficult, and gathering enough examples for training and validation can be expensive or even impossible. The trick is to know when to use a NN and what sort to use.
There are, of course, people who oversell the things as some sort of super solution or even an explanation of human thought, and you might be reacting to them.
Neural network are only "inspired" by the neural structure of our brain, but they are not even close to the complexity of the behaviour of a real neuron (to date there is no neuron model that captures the complexity of a SINGLE neuron, don't even think about a neuronal population...)
Although "neural", machine "learning" and other "pseudo-bio" (like "genetic algorithms") terms are very "cool", that does not mean that they are actually based on real biological processes.
Just that they may very approximatively remind of a biological situation.
NB: of course this does not make them useless! They're very very important in many fields!
Neural networks have been around for a while, and originally were developed to model as close an understanding as we had at the time to the way neurons work in the brain. They represent a network of neurons, hence "neural network." Since computers and brains are very different hardware-wise, implementing anything like a brain with a computer is going to be rather clunky. However, as others have stated so far, neural networks can be useful for some things that are vague such as pattern recognition, facial recognition, and other similar uses. They are also still useful as a basic model of how neurons connect and are often used in Cognitive Science and other fields of artificial intelligence to try to understand how small parts of the complex human brain might make simple decisions. Unfortunately, once a neural network "learns" something, it is very difficult to understand how it actually makes its decisions.
There are, of course, many misuses of neural networks and in most non-research applications, other algorithms have been developed that are much more accurate. If a piece of business software proudly proclaims it uses a neural network, chances are it probably doesn't need it, and might be using it to inefficiently perform a task that could be performed in a much easier way. Unless the software is actually "learning" on the fly, which is very rare, neural networks are pretty much useless. And even when the software is "learning", sometimes neural networks aren't the best way to go.
While I admit, I tinker with Neural Networks because of my hopes in creating high level AI, however, you can look at a Neural Network as being more than just just an artificial representation of a human brain, but as a Mathematical construct.
For example Let's say you have a function y = f(x) or more abstractly y = f(x1, x2, ..., xn-1, xn), Neural networks themselves act as functions, or even a set of functions, taking in a large input and producing some output [y1, y2, ..., yn-1, yn] = f(x1, x2, ..., xn-1, xn)
Furthermore, they are not static, but instead can continue adapting and learning and eventually extrapolate(predict) interesting things. Their abstractness can even result in them coming up with unique solutions to problems that haven't haven't been thought up yet. For example the TDGammon program learned to play backgammon and beat the world champion. The world champion stated that the program play a unique end game that he had never seen. (that's pretty awesome if you ask me considering the complexity of NNs)
And then when you look at recurrent neural networks (i.e. can have internal feedback loops, or pipe their output back into their input, while consuming new input) they can solve even more interesting problems, and map even more complex functions.
In a nutshell Neural Networks are like a very very abstract high dimensional function and capable of mapping/learning very interesting things that would be otherwise impossible to program programmatically. For example, the energy needed to calculate the total net Forces of Gravity on a large number of objects is intense (you have to calculate it for each object, and against each object), but once a neural network learns how to map it they can do these complex calculations that would run in exponential or combinatoric? time in polynomial time. Just look at how fast your brain processes physics data, spatial data/ images / sound when you dream. That's the potential computation power of Neural Networks. And to also mention the way they store data is very clever as well (in synaptics patterns, i.e. memories)
Artificial intelligence is a branch of computer science devoted to making computers more 'biologic.' This is useful when you want a computer to do human(biologic) things like play chess, or imitate casual conversation.
Human brains are much more efficient and powerful in some ways than the most powerful computers, so it makes sense to try to imitate a biological way of processing information.
Most neural networks I'm aware of are nothing more than flexible interpolators. Backpropagating of errors is easy and fast, here are some possible uses :
Classification of data
Some games (modern backgammon AIs beat the best players in the world, the evaluation function is a neural net)
Pattern recognition (OCR ?)
There is nothing particularly related to human intelligence. There are other uses of neural nets, I have seen an implementation of associative memory which allowed for degradation without (much) data loss, pretty much like the brain which sees some neurons die with time.