Why do we use neural networks? It's biologic. Aren't there any more solutions that're more "suitable" for computers?
In other words: Why do we use the human brain as a model for inspiration for artifical intelligence?
Neural networks aren't really very biological. They resemble, at a very general level, the architecture of neurons, but it's a great exaggeration to say that they work "just like the brain" (an exaggeration that's encouraged by some neural-net advocates, alas).
Neural nets are mostly used for fuzzy, difficult problems that don't yield to traditional algorithmic approaches. IOWs, there are more "suitable" solutions for computers, but sometimes those solutions don't work, and in those cases one approach is a neural network.
Why do we use neural networks?
Because they're simple to construct, and often appear to be a good approach to certain classes of problems, such as pattern recognition.
Aren't there any more solutions that're more "suitable" for computers?
Yes, implementations that more closely match a computer's architecture can be more suitable for the computer, but then can be less suitable for an effective solution.
Why do we use the human brain as a model for inspiration for artifical intelligence?
Because our brain is the superior example we have of something intelligent.
Neural Networks are still used for two reasons.
They are easy to understand for people who don't want to delve into the math of a more complicated algorithm.
They have a really good name. I mean when you role into a CEO's office to sell him your model which would you rather say, Neural Network or Support Vector Machine. When he asks how it works you can just say "just like the neurons in your brain", which is something most people understand. If you try and explain a support vector machine Mr. CEO is going to be lost (Not because he is dumb but because SVMs are harder to understand).
Sometimes they are still useful however I think that the training time is often just too long.
I don't understand the question. Neural nets are suitable for certain functions, and not others. The same is true for various other sorts of classes of algorithms, regardless of what they might have been inspired by.
If we have a good many inputs to something, and we want some outputs, and we have a set of example inputs with known desired outputs, and we don't want to calculate a function ourselves, neural nets are excellent. We feed in the example inputs, compare the output to the example outputs, and adjust the inner workings of the NN in an automatic fashion, to make the NN output closer to the desired output.
This sort of function derivation is very useful in various forms of pattern recognition and general classification. It isn't a panacea, of course. It has no explanatory power (in that you can't look at the innards to see why it classifies something in a particular way), it doesn't offer guarantees of correctness within certain limits, validating how well it works is difficult, and gathering enough examples for training and validation can be expensive or even impossible. The trick is to know when to use a NN and what sort to use.
There are, of course, people who oversell the things as some sort of super solution or even an explanation of human thought, and you might be reacting to them.
Neural network are only "inspired" by the neural structure of our brain, but they are not even close to the complexity of the behaviour of a real neuron (to date there is no neuron model that captures the complexity of a SINGLE neuron, don't even think about a neuronal population...)
Although "neural", machine "learning" and other "pseudo-bio" (like "genetic algorithms") terms are very "cool", that does not mean that they are actually based on real biological processes.
Just that they may very approximatively remind of a biological situation.
NB: of course this does not make them useless! They're very very important in many fields!
Neural networks have been around for a while, and originally were developed to model as close an understanding as we had at the time to the way neurons work in the brain. They represent a network of neurons, hence "neural network." Since computers and brains are very different hardware-wise, implementing anything like a brain with a computer is going to be rather clunky. However, as others have stated so far, neural networks can be useful for some things that are vague such as pattern recognition, facial recognition, and other similar uses. They are also still useful as a basic model of how neurons connect and are often used in Cognitive Science and other fields of artificial intelligence to try to understand how small parts of the complex human brain might make simple decisions. Unfortunately, once a neural network "learns" something, it is very difficult to understand how it actually makes its decisions.
There are, of course, many misuses of neural networks and in most non-research applications, other algorithms have been developed that are much more accurate. If a piece of business software proudly proclaims it uses a neural network, chances are it probably doesn't need it, and might be using it to inefficiently perform a task that could be performed in a much easier way. Unless the software is actually "learning" on the fly, which is very rare, neural networks are pretty much useless. And even when the software is "learning", sometimes neural networks aren't the best way to go.
While I admit, I tinker with Neural Networks because of my hopes in creating high level AI, however, you can look at a Neural Network as being more than just just an artificial representation of a human brain, but as a Mathematical construct.
For example Let's say you have a function y = f(x) or more abstractly y = f(x1, x2, ..., xn-1, xn), Neural networks themselves act as functions, or even a set of functions, taking in a large input and producing some output [y1, y2, ..., yn-1, yn] = f(x1, x2, ..., xn-1, xn)
Furthermore, they are not static, but instead can continue adapting and learning and eventually extrapolate(predict) interesting things. Their abstractness can even result in them coming up with unique solutions to problems that haven't haven't been thought up yet. For example the TDGammon program learned to play backgammon and beat the world champion. The world champion stated that the program play a unique end game that he had never seen. (that's pretty awesome if you ask me considering the complexity of NNs)
And then when you look at recurrent neural networks (i.e. can have internal feedback loops, or pipe their output back into their input, while consuming new input) they can solve even more interesting problems, and map even more complex functions.
In a nutshell Neural Networks are like a very very abstract high dimensional function and capable of mapping/learning very interesting things that would be otherwise impossible to program programmatically. For example, the energy needed to calculate the total net Forces of Gravity on a large number of objects is intense (you have to calculate it for each object, and against each object), but once a neural network learns how to map it they can do these complex calculations that would run in exponential or combinatoric? time in polynomial time. Just look at how fast your brain processes physics data, spatial data/ images / sound when you dream. That's the potential computation power of Neural Networks. And to also mention the way they store data is very clever as well (in synaptics patterns, i.e. memories)
Artificial intelligence is a branch of computer science devoted to making computers more 'biologic.' This is useful when you want a computer to do human(biologic) things like play chess, or imitate casual conversation.
Human brains are much more efficient and powerful in some ways than the most powerful computers, so it makes sense to try to imitate a biological way of processing information.
Most neural networks I'm aware of are nothing more than flexible interpolators. Backpropagating of errors is easy and fast, here are some possible uses :
Classification of data
Some games (modern backgammon AIs beat the best players in the world, the evaluation function is a neural net)
Pattern recognition (OCR ?)
There is nothing particularly related to human intelligence. There are other uses of neural nets, I have seen an implementation of associative memory which allowed for degradation without (much) data loss, pretty much like the brain which sees some neurons die with time.
Related
Recently, I was asked about how to pre-train a deep neural network with unlabeled data, meaning, instead of initializing the model weight with small random numbers, we set initial weight from a pretrained model (with unlabeled data).
Well, intuitively, I kinda get it, it probably helps with the vanishing gradient issue and shorten the training time when there are not too much labeled data available. But still, I don't really know how it is done, how can you train a neural network with unlabeled data? Is it something like SOM or Boltzmann machine?
Has anybody heard about this? If yes, can you provide some links to sources or papers. I am curious. Greatly appreciate!
There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.
More modern unsupervised deep learning methods include Auto-Encoders, Variational Auto-Encoders, and Generative Adversarial Networks. I won't dive into the details of all of them, but the simplest of these, auto-encoders, work by compressing an unlabeled input into a low dimensional real-valued representation, and using this compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of said input. See here for an illustration and more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
I guess in some sense any of the listed methods could be used as pre-training, e.g for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Initialization methods, activation functions, and other optimization tricks are generally advanced enough to do well without more complicated initialization procedures.
I'm just starting with Torch and neural networks and just glancing at a lot of sample code and tutorials, I see a lot of variety in the how people structure their neural networks. There are layers like Linear(), Tanh(), Sigmoid() as well as criterions like MSE, ClassNLL, MultiMargin, etc.
I'm wondering what kind of factors people keep in mind when creating the structure of their network? For example, I know that in a ClassNLLCriterion, you want to have the last layer of your network be a LogSoftMax() layer so that you can input the right log probabilities.
Are there any other general rules or guidelines when it comes to creating these networks?
Thanks
Here is a good webpage which contains the pros and cons of some of the main activation functions;
http://cs231n.github.io/neural-networks-1/#actfun
It can boil down to the problem at hand and knowing what to do when something goes wrong. As an example, if you have a huge dataset and you can't churn through it terribly quickly then a ReLU might be better in order to quickly get to a local minimum. However you could find that some of the ReLU units "die" so you might want to keep a track on the proportion of activated neurons in that particular layer to make sure this hasn't happened.
In terms of criterions, they are also problem specific but a bit less ambiguous. For example, binary cross entropy for binary classification, MSE for regression etc. It really depends on the objective of the whole project.
For the overall network architecture, I personally find it can be a case of trying out different architectures and seeing which ones work and which don't on your test set. If you think that the problem at hand is terribly complex and you need a complex network to solve the problem then you will probably want to try making a very deep network to begin with, then add/remove a few layers at a time to see if you have under/overfitted. As another example, if you are using convolutional network and the input is relatively small then you might try and use a smaller set of convolutional filters to begin with.
I am planning to use neural networks for approximating a value function in a reinforcement learning algorithm. I want to do that to introduce some generalization and flexibility on how I represent states and actions.
Now, it looks to me that neural networks are the right tool to do that, however I have limited visibility here since I am not an AI expert. In particular, it seems that neural networks are being replaced by other technologies these days, e.g. support vector machines, but I am unsure if this is a fashion matter or if there is some real limitation in neural networks that could doom my approach. Do you have any suggestion?
Thanks,
Tunnuz
It's true that neural networks are no longer in vogue, as they once were, but they're hardly dead. The general reason for them falling from favor was the rise of the Support Vector Machine, because they converge globally and require fewer parameter specifications.
However, SVMs are very burdensome to implement and don't naturally generalize to reinforcement learning like ANNs do (SVMs are primarily used for offline decision problems).
I'd suggest you stick to ANNs if your task seems suitable to one, as within the realm of reinforcement learning, ANNs are still at the forefront in performance.
Here's a great place to start; just check out the section titled "Temporal Difference Learning" as that's the standard way ANNs solve reinforcement learning problems.
One caveat though: the recent trend in machine learning is to use many diverse learning agents together via bagging or boosting. While I haven't seen this as much in reinforcement learning, I'm sure employing this strategy would still be much more powerful than an ANN alone. But unless you really need world class performance (this is what won the netflix competition), I'd steer clear of this extremely complex technique.
It seems to me that neural networks are kind of making a comeback. For example, this year there were a bunch of papers at ICML 2011 on neural networks. I would definitely not consider them abandonware. That being said, I would not use them for reinforcement learning.
Neural networks are a decent general way of approximating complex functions, but they are rarely the best choice for any specific learning task. They are difficult to design, slow to converge, and get stuck in local minima.
If you have no experience with neural networks, then you might be happier to you use a more straightforward method of generalizing RL, such as coarse coding.
Theoretically it has been proved that Neural Networks can approximate any function (given an infinite number of hidden neurons and the necessary inputs), so no I don't think the neural networks will ever be abandonwares.
SVM are great, but they cannot be used for all applications while Neural Networks can be used for any purpose.
Using neural networks in combination with reinforcement learning is standard and well-known, but be careful to plot and debug your neural network's convergence to check that it works correctly as neural networks are notoriously known to be hard to implement and learn correctly.
Be also very careful about the representation of the problem you give to your neural network (ie: the inputs nodes): could you, or could an expert, solve the problem given what you give as inputs to your net? Very often, people implementing neural networks don't give enough informations for the neural net to reason, this is not so uncommon, so be careful with that.
Do you know if anyone has tried to compile high level programming languages (java, c#, etc') into a recurrent neural network and then evolve them?
I mean that the whole process including memory usage is stored in a graph of a neural net, and I'm talking about complex programs (thinking about natural language processing problems).
When I say neural net I mean a directed weighted graphs that spreads activation, and the nodes are functions of their inputs (linear, sigmoid and multiplicative to keep it simple).
Furthermore, is that what people mean in genetic programming or is there a difference?
Neural networks are not particularly well suited for evolving programs; their strength tends to be in classification. If anyone has tried, I haven't heard about it (which considering I barely touch neural networks is not a surprise, but I am active in the general AI field at the moment).
The main reason why neural networks aren't useful for generating programs is that they basically represent a mathematical equation (numeric, rather than functional). Given some numeric input, you get a numeric output. It is difficult to interpret these in the context of a program any more complicated than simple arithmetic.
Genetic Programming traditionally uses Lisp, which is a pure functional language, and often programs are often shown as tree diagrams (which occasionally look similar to some neural network diagrams - is this the source of your confusion?). The programs are evolved by exchanging entire branches of a tree (a function and all its parameters) between programs or regenerating an entire branch randomly.
There are certainly a lot of good (and a lot of bad) references on both of these topics out there - I refrain from listing them because it isn't clear what you are actually interested in. Wikipedia covers each of these techniques, and is a good starting point.
Genetic programming is very different from Neural networks. What you are suggesting is more along the lines of genetic programming - making small random changes to a program, possibly "breeding" successful programs. It is not easy, and I have my doubts that it can be done successfully across a large program.
You may have more luck extracting a small but critical part of your program, one which has a few particular "aspects" (such as parameter values) that you can try to evolve.
Google is your friend.
Some sophisticated anti-virus programs as well as sophisticated malware use formal grammar and genetic operators to evolve against each other using neural networks.
Here is an example paper on the topic: http://nexginrc.org/nexginrcAdmin/PublicationsFiles/raid09-sadia.pdf
Sources: A class on Artificial Intelligence I took a couple years ago.
With regards to your main question, no one has ever tried that on programming languages to the best of my knowledge, but there is some research in the field of evolutionary computation that could be compared to something like that (but it's obviously a far-fetched comparison). As a matter of possible interest, I asked a similar question about sel-improving compilers a while ago.
For a difference between genetic algorithms and genetic programming, have a look at this question.
Neural networks have nothing to do with genetic algorithms or genetic programming, but you can obviously use either to evolve neural nets (as any other thing for that matters).
You could have look at genetic-programming.org where they claim that they have found some near human competitive results produced by genetic programming.
I have not heard of self-evolving and self-imrpvoing programs before. They may exist as special research tools like genetic-programming.org have but nothing solid for generic use. And even if they exist they are very limited to special purpose operations like malware detection as Alain mentioned.
Seeing that as as far as we know, one half of your brain is logical and the other half of your brain is emotional, and that the wants of the emotional side are fed to the logical side in order to fulfill those wants; has there been any research done in connecting two separate neural networks to one another (one trained to be emotional, and one trained to be logical) to see if it would result in almost a free-will sort of "brain"?
I don't really know anything about neural networks except that they were modeled after the biological synapses in the human brain, which is why I ask.
I'm not even sure if this would be possible considering that even a trained neural network sometimes doesn't act logically (a.k.a. do what you thought you trained it to do).
First, most modern neural networks aren't really modeled after biological synapses. They use an Artificial Neuron which allowed Back Propagation to work rather than a Perceptron which is a much more accurate representation.
When you feed the output of one network into the input of another network, you've really just created one larger network, not two separate networks. It just happens that in this case portions of the networks would be trained independently.
That said, all neural networks have to be trained. Which means you need sample input and sample output. You are looking to create a decision engine of sorts I suppose. So you would need to create a dataset where it makes sense that there might be an emotional and rational response, such as purchasing an item. You'd have to train the 'rational' network to accept as a set of inputs the output of an 'emotional' network. Which means you are really just training the rational decision engine to be responsive based on the leve of 'distress' caused by the emotional network.
Just my two cents.
I have also heard of one hemisphere being called "divergent" and one "convergent". This may not make any more sense than emotional vs logical, but it does hint at how you might model it more easily. I don't know how the brain achieves some of the impressive computational feats it does, but I wouldn't be very surprised if all revolved around balance, but maybe that is just one of the baises you have when you are a brain with two hemipheres (or any even number) :D
A balance between convergence and divergence is the crux of the creativity inherent in evolution. Replicating this with neural nets sounds promising to me. Suppose you make one learning system that generalizes and keeps representations of only the typical groups of patterns it is shown. Then you take another and make it generate all the in-betweens and mutants of the patterns it is shown. Then you feed them to eachother in a circle, and poof, you have made something really interesting!
It's even more complex than that, unbelievably. The left hemisphere works on a set of logical rules, it uses these to predict its environment and categorize input. It also infers rules and stores them for future use. The right hemisphere is based, as you said, on emotion, but also on memory of single, unique or emotionally relevant occurrences. A software implementation should also be able to retrieve and store these two data types and exchange "opinions" about them.
While the left hemisphere of the brain may be more involved in making emotional decisions, emotion itself is unlikely to occur exclusively in one side of the brain, and the interplay between emotions and rational thought within the brain is likely to be substantially more complex than having two completely separate circuits. For instance, a study on rhesus macaques found that dopamine and other hormones associated with emotional responses essentially implements temporal difference learning within the brain (I'm still looking for a link to it). This suggests that separating emotional and rational thought into two separate neural networks probably wouldn't be practical, even if we had the resources to build neural networks on the scale of brain hemispheres (which we don't, or at least not within most research budgets).
This idea is supported by Sloman and Croucher's suggestion that emotion will likely be an unavoidable emergent property of a sufficiently advanced intelligent system. Such systems (discussed in detail in the paper) will be much more complex than straight-up neural nets. More importantly, though, the emotions won't be something that you can localize to one part of the system.