I have finished two neural network courses and done loads of reading on the subject. I am comfortable with TensorFlow and Keras and with building advanced neural networks (multiple inputs, large data, special layers...). I also have a fairly deep understanding of the underlying mathematics.
My problem is that I know how to build neural networks but don't know the process by which an "expert" would create one for a specific application.
I can:
Collect loads of data and clean it up.
Train the neural network.
Fine-tune hyperparameters.
Export it for actual applications.
What I am missing is how to come up with the layers in the neural network (how wide, what kind...). I know it is somewhat trial and error and looking at what has worked for others. But there must be a process that people can use to come up with architectures* that actually work very well, for example state-of-the-art neural networks.
I am looking for a free resource that would help me understand this process of creating a very good architecture*.
*by architecture I mean the different layers that make up the network and their properties
I wrote my masters thesis about the topic:
Thoma, Martin. "Analysis and Optimization of Convolutional Neural Network Architectures." arXiv preprint arXiv:1707.09725 (2017).
Long story short: there are a couple of techniques for analysis (chapter 2.5) and algorithms that learn topologies (chapter 3), but in practice it is mostly trial and error / gut feeling.
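To make the trial-and-error part concrete, here is a minimal random-search sketch in Keras; the dataset, search ranges, and tiny training budget are placeholders rather than recommendations. The idea is just to sample a few candidate architectures, train each briefly, and keep whichever does best on held-out data.

```python
# Minimal sketch of the "trial and error" loop: random search over depth/width/dropout.
# Dataset, ranges, and epoch budget are illustrative placeholders.
import random
from tensorflow import keras

(x_train, y_train), (x_val, y_val) = keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def build_model(n_layers, width, dropout):
    model = keras.Sequential([keras.layers.Input(shape=(28, 28)),
                              keras.layers.Flatten()])
    for _ in range(n_layers):
        model.add(keras.layers.Dense(width, activation="relu"))
        model.add(keras.layers.Dropout(dropout))
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

best_acc, best_cfg = 0.0, None
for _ in range(5):  # tiny budget; real searches use far more trials
    cfg = dict(n_layers=random.randint(1, 4),
               width=random.choice([64, 128, 256]),
               dropout=random.uniform(0.0, 0.5))
    model = build_model(**cfg)
    model.fit(x_train, y_train, epochs=2, batch_size=128, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc, best_cfg = acc, cfg

print(best_cfg, best_acc)
```

In practice people start from an architecture that is known to work on a similar problem and only search locally around it, rather than searching blindly.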
Related
I've seen some tutorial examples, like the UFLDL convolutional net, where they use features obtained by unsupervised learning, and others where kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings, etc.). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these deeper layers' filters come from - should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a single answer; I'd be happy with just the general approach (some review article would be fantastic).
The current state of the art is to learn all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommends small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
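As an illustration of that recipe (not taken from the paper itself), a small Keras model with 3x3 convolutions, 2x2 pooling, and an independent set of filters in every layer, all trained end-to-end by backpropagation, could look like this; the filter counts are arbitrary examples.

```python
# Illustrative Keras CNN following the "small kernels" pattern:
# 3x3 convolutions, 2x2 pooling, and a separate set of filters in every layer,
# all learned jointly by backpropagation. Filter counts are arbitrary examples.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),   # its own filters
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same"),  # again its own filters
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.summary()  # every Conv2D layer has an independent, trainable kernel tensor
```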
Kernels in deep networks are mostly trained all at the same time in a supervised way (known inputs and outputs of network) using Backpropagation (computes gradients) and some version of Stochastic Gradient Descent (optimization algorithm).
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start with an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way, the architecture of the network is largely fixed, since you have to respect the architecture of the original network. The networks you can download were trained on very large problems, which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned network should work very well.
A good place to get more information is the Caffe # CVPR2015 tutorial.
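The answer above is framed around Caffe, but the same fine-tuning idea can be sketched in Keras with an ImageNet-pretrained backbone; the backbone choice and num_classes below are just placeholders.

```python
# Hedged sketch of fine-tuning a pre-trained network in Keras (the Caffe Model Zoo
# workflow is analogous). num_classes and the backbone are placeholder choices.
from tensorflow import keras

num_classes = 5  # hypothetical number of labels in your own dataset
base = keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # keep the pretrained features fixed at first

model = keras.Sequential([
    base,
    keras.layers.Dense(num_classes, activation="softmax"),  # new head for your labels
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
# Optionally unfreeze the top of `base` afterwards and continue with a small learning rate.
```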
I've been reading about feed-forward artificial neural networks (ANNs), which normally need training to adjust their weights in order to achieve the desired output. Once tuned, they will also always produce the same output when receiving the same input (biological networks don't necessarily).
Then I started reading about evolving neural networks. However, the evolution usually involves recombining two parents' genomes into a new genome; there is no "learning", really just recombining and verifying through a fitness test.
I was thinking: the human brain manages its own connections. It creates connections, strengthens some, and weakens others.
Is there a neural network topology that allows for this? One where the network, after a bad reaction, adjusts its weights accordingly and possibly creates random new connections (I'm not sure how the brain creates new connections, but even without knowing that, a random mutation chance of creating a new connection could cover it). A good reaction would strengthen those connections.
I believe this type of topology is known as a Turing Type B Neural Network, but I haven't seen any coded examples or papers on it.
This paper, An Adaptive Spiking Neural Network with Hebbian Learning, specifically addresses the creation of new neurons and synapses. From the introduction:
Traditional rate-based neural networks and the newer spiking neural networks have been shown to be very effective for some tasks, but they have problems with long term learning and "catastrophic forgetting." Once a network is trained to perform some task, it is difficult to adapt it to new applications. To do this properly, one can mimic processes that occur in the human brain: neurogenesis and synaptogenesis, or the birth and death of both neurons and synapses. To be effective, however, this must be accomplished while maintaining the current memories.
If you do some searching on Google with keywords like 'neurogenesis artificial neural networks', you will find more articles. There is also this similar question at cogsci.stackexchange.com.
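For intuition, here is a toy, rate-based sketch of the Hebbian idea that paper builds on; the reward gate and all constants are my own illustrative choices, not from the paper. Co-active units strengthen their connection, and everything else slowly decays.

```python
# Toy Hebbian update: "neurons that fire together wire together",
# gated by a reward signal and with slow decay as a stand-in for weakening.
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.normal(scale=0.1, size=(n, n))   # synaptic weights
np.fill_diagonal(W, 0.0)

eta, decay = 0.05, 0.001

def hebbian_step(pre, post, reward=1.0):
    """Strengthen co-active connections (scaled by reward), decay the rest."""
    global W
    W += reward * eta * np.outer(post, pre)   # Hebb: delta_w = eta * post * pre
    W -= decay * W                            # slow forgetting / weakening
    np.fill_diagonal(W, 0.0)

activity = rng.random(n)
hebbian_step(pre=activity, post=activity, reward=1.0)   # "good reaction": strengthen
hebbian_step(pre=activity, post=activity, reward=-1.0)  # "bad reaction": weaken instead
```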
NEAT networks, as well as cascade-correlation networks, add their own connections/neurons to solve problems, building structures that create specific responses to stimuli.
I have decided to use a feed-forward NN with back-propagation training for my OCR application for handwritten text. The input layer will have 32*32 (1024) neurons and there will be at least 8-12 output neurons.
From reading some articles I found Neuroph easy to use, while Encog is reported to be a few times better in performance. Considering the parameters in my scenario, which API is the most suitable? I would also appreciate a comment on the number of input nodes I have chosen: is it too large a value? (Although that is somewhat off topic.)
First, my disclaimer: I am one of the main developers on the Encog project. This means I am more familiar with Encog than Neuroph and perhaps biased towards it. In my opinion, the relative strengths of each are as follows. Encog supports quite a few interchangeable machine learning methods and training methods. Neuroph is VERY focused on neural networks, and you can express a connection between just about anything. So if you are going to create very custom/non-standard (research) neural networks with topologies different from the typical Elman/Jordan, NEAT, HyperNEAT, and feedforward networks, then Neuroph will fit the bill nicely.
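On the input size: 1024 input nodes (32*32 pixels) is perfectly normal (MNIST-style digit recognizers use 784). The question is about Java libraries, but purely to illustrate the layer sizing, a sketch in Keras could look like the following; the hidden width is an arbitrary example, not a recommendation.

```python
# Layer-sizing sketch only: 1024 inputs (32x32 pixels), one hidden layer, ~10 outputs.
# Hidden width and activations are illustrative choices.
from tensorflow import keras

n_classes = 10  # 8-12 output neurons, depending on how many characters you distinguish
model = keras.Sequential([
    keras.layers.Input(shape=(1024,)),              # 32*32 flattened pixels
    keras.layers.Dense(128, activation="sigmoid"),  # hidden layer, width found by trial
    keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
```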
I have a couple slightly modified / non-traditional setups for feedforward neural networks which I'd like to compare for accuracy against the ones used professionally today. Are there specific data sets, or types of data sets, which can be used as a benchmark for this? I.e. "the style of ANN typically used for such-and-such a task is 98% accurate against this data set." It would be great to have a variety of these, a couple for statistical analysis, a couple for image and voice recognition, etc.
Basically, is there a way to compare an ANN I've put together against ANNs used professionally, across a variety of tasks? I could pay for data or software, but would prefer free of course.
CMU has some benchmarks for neural networks: Neural Networks Benchmarks
The Fast Artificial Neural Networks library (FANN) has some benchmarks that are widely used: FANN. Download the source code (version 2.2.0) and look at the directory datasets, the format is very simple. There is always a training set (x.train) and a test set (x.test). At the beginning of the file is the number of instances, the number of inputs and the number of outputs. The next lines are the input of the first instance and the output of the first instance and so on. You can find example programs with FANN in the directory examples. I think they even had detailed comparisons to other libraries in previous versions.
I think most of FANN's benchmarks if not all are from Proben1. Google for it, there is a paper from Lutz Prechelt with detailed descriptions and comparisons.
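For reference, here is a small parser for the file format described above (a header line with instance/input/output counts, then alternating input and output lines); the example path at the bottom is hypothetical.

```python
# Parser for the FANN dataset format: "num_instances num_inputs num_outputs" header,
# then alternating lines of inputs and outputs, one instance after another.
def read_fann_dataset(path):
    with open(path) as f:
        n_instances, n_inputs, n_outputs = map(int, f.readline().split())
        inputs, outputs = [], []
        for _ in range(n_instances):
            inputs.append([float(v) for v in f.readline().split()])
            outputs.append([float(v) for v in f.readline().split()])
    assert all(len(x) == n_inputs for x in inputs)
    assert all(len(y) == n_outputs for y in outputs)
    return inputs, outputs

# X, Y = read_fann_dataset("datasets/some_problem.train")  # hypothetical path in the FANN source tree
```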
I am planning to use neural networks for approximating a value function in a reinforcement learning algorithm. I want to do that to introduce some generalization and flexibility on how I represent states and actions.
Now, it looks to me that neural networks are the right tool to do that, however I have limited visibility here since I am not an AI expert. In particular, it seems that neural networks are being replaced by other technologies these days, e.g. support vector machines, but I am unsure if this is a fashion matter or if there is some real limitation in neural networks that could doom my approach. Do you have any suggestion?
It's true that neural networks are no longer in vogue as they once were, but they're hardly dead. The general reason for their fall from favor was the rise of the support vector machine, which converges to a global optimum and requires fewer parameters to specify.
However, SVMs are very burdensome to implement and don't naturally generalize to reinforcement learning like ANNs do (SVMs are primarily used for offline decision problems).
I'd suggest you stick to ANNs if your task seems suitable to one, as within the realm of reinforcement learning, ANNs are still at the forefront in performance.
Here's a great place to start; just check out the section titled "Temporal Difference Learning" as that's the standard way ANNs solve reinforcement learning problems.
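As a rough illustration of how an ANN plugs into temporal-difference learning, here is a bare-bones semi-gradient TD(0) sketch with a small Keras network as the state-value function; the environment interface and all constants are assumptions, not taken from any particular reference.

```python
# Semi-gradient TD(0) with a neural network value function (sketch).
# env.reset / env.step / policy are hypothetical; constants are illustrative.
import tensorflow as tf
from tensorflow import keras

gamma, state_dim = 0.99, 4
value_net = keras.Sequential([
    keras.layers.Input(shape=(state_dim,)),
    keras.layers.Dense(32, activation="tanh"),
    keras.layers.Dense(1),
])
optimizer = keras.optimizers.SGD(learning_rate=1e-2)

def td0_update(state, reward, next_state, done):
    """One semi-gradient TD(0) step: move V(s) toward r + gamma * V(s')."""
    s = tf.convert_to_tensor(state[None, :], dtype=tf.float32)
    s_next = tf.convert_to_tensor(next_state[None, :], dtype=tf.float32)
    target = reward + (0.0 if done else gamma * float(value_net(s_next)[0, 0]))
    with tf.GradientTape() as tape:
        v = value_net(s)[0, 0]
        loss = 0.5 * (target - v) ** 2   # target treated as a constant (semi-gradient)
    grads = tape.gradient(loss, value_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, value_net.trainable_variables))

# Inside an episode loop (env is a hypothetical environment):
# s = env.reset()
# s_next, r, done = env.step(policy(s))
# td0_update(s, r, s_next, done)
```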
One caveat though: the recent trend in machine learning is to use many diverse learning agents together via bagging or boosting. While I haven't seen this as much in reinforcement learning, I'm sure employing this strategy would still be much more powerful than an ANN alone. But unless you really need world-class performance (this is what won the Netflix competition), I'd steer clear of this extremely complex technique.
It seems to me that neural networks are kind of making a comeback. For example, this year there were a bunch of papers at ICML 2011 on neural networks. I would definitely not consider them abandonware. That being said, I would not use them for reinforcement learning.
Neural networks are a decent general way of approximating complex functions, but they are rarely the best choice for any specific learning task. They are difficult to design, slow to converge, and get stuck in local minima.
If you have no experience with neural networks, then you might be happier using a more straightforward method of generalizing RL, such as coarse coding.
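For example, a minimal coarse-coding sketch for a one-dimensional state could look like this (the field width, spacing, and state range are illustrative choices): overlapping receptive fields give binary features, and the value function is just a linear combination of them.

```python
# Coarse coding sketch: overlapping receptive fields as binary features,
# with a linear value function on top. All constants are illustrative.
import numpy as np

centers = np.linspace(0.0, 1.0, 11)   # receptive-field centers over a [0, 1] state space
width = 0.2                           # fields overlap, which is what gives generalization

def features(state):
    """Binary feature vector: 1 for every receptive field the state falls inside."""
    return (np.abs(centers - state) < width).astype(float)

w = np.zeros_like(centers)            # one weight per feature

def value(state):
    return float(w @ features(state))

# An update toward a target value only touches the active (overlapping) features,
# so nearby states share updates -- much simpler to tune than a neural network.
alpha, target, s = 0.1, 1.0, 0.42
w += alpha * (target - value(s)) * features(s)
```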
Theoretically, it has been proven that neural networks can approximate any function (given enough hidden neurons and the necessary inputs), so no, I don't think neural networks will ever be abandonware.
SVMs are great, but they cannot be used for all applications, while neural networks can be applied to almost any purpose.
Using neural networks in combination with reinforcement learning is standard and well-known, but be careful to plot and debug your neural network's convergence to check that it works correctly, as neural networks are notoriously hard to implement and train correctly.
Also be very careful about the representation of the problem you give to your neural network (i.e., the input nodes): could you, or could an expert, solve the problem given only what you feed to your net as inputs? Very often, people implementing neural networks don't provide enough information for the network to reason with, so be careful with that.