Is it possible to implement, using torch, an architecture that connects the neurons of the same layer?
What you describe is called a recurrent neural network. Note that it needs a rather different type of structure, input data, and training algorithm to work well.
There is the rnn library for Torch to work with recurrent neural networks.
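For illustration, the same idea can be sketched with the built-in nn.RNN module of (Py)Torch rather than the Lua rnn package; the sizes below are arbitrary and this is only a sketch of the recurrent (same-layer) connections, not the original library's API:

```python
import torch
import torch.nn as nn

# A single recurrent layer: the hidden units feed back into themselves
# at the next time step, which is the "connection between neurons of
# the same layer" the question asks about.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)   # (batch, time steps, features) - made-up sizes
output, h_n = rnn(x)        # output: (4, 10, 16), h_n: (1, 4, 16)
print(output.shape, h_n.shape)
```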
Yes, it's possible. Torch has everything that other languages have: logical operations, read/write operations, array operations. That is all you need to implement any kind of neural network. Since Torch supports CUDA, you can even implement a neural network that runs faster than some C# or Java implementations; the performance gain depends, among other things, on how many if/else branches are executed per iteration.
I have finished two neural network courses and done loads of reading on the subject. I am comfortable with Tensorflow and Keras and building advanced neural networks (multiple inputs, large data, special layers...). I also have a fairly deep understanding of the underlying mathematics.
My problem is that I know how to build neural networks but don't know the process by which an "expert" would create one for a specific application.
I can:
Collect loads of data and clean it up.
Train the neural network.
Fine-tune hyperparameters.
Export it for actual applications.
What I am missing is how to come up with the layers in the neural network (how wide, what kind...). I know it is somewhat trial and error and looking at what has worked for others. But there must be a process that people can use to come up with architectures* that actually work very well, for example state-of-the-art neural networks.
I am looking for a free resource that would help me understand this process of creating a very good architecture*.
*by architecture I mean the different layers that make up the network and their properties
I wrote my master's thesis on the topic:
Thoma, Martin. "Analysis and Optimization of Convolutional Neural Network Architectures." arXiv preprint arXiv:1707.09725 (2017).
Long story short: There are a couple of techniques for analysis (chapter 2.5) and algorithms that learn topologies (chapter 3), but in practice it is mostly trial and error / gut feeling.
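To make the trial and error concrete, in practice it often amounts to something like a random search over depth and width. Below is a minimal Keras sketch of such a loop; the synthetic data and the search ranges are placeholders, not something taken from the thesis:

```python
import random
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic placeholder data; substitute your own dataset here.
x_train = np.random.rand(1000, 32).astype("float32")
y_train = np.random.randint(0, 10, 1000)
x_val = np.random.rand(200, 32).astype("float32")
y_val = np.random.randint(0, 10, 200)

def build_model(depth, width, input_dim=32, n_classes=10):
    # Depth and width are the two architecture knobs being searched over.
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    for _ in range(depth):
        model.add(layers.Dense(width, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Naive random search: sample a few (depth, width) pairs, train briefly,
# and keep whichever validates best.
best = None
for _ in range(10):
    depth, width = random.choice([1, 2, 3, 4]), random.choice([32, 64, 128, 256])
    hist = build_model(depth, width).fit(x_train, y_train,
                                         validation_data=(x_val, y_val),
                                         epochs=3, verbose=0)
    score = hist.history["val_accuracy"][-1]
    if best is None or score > best[0]:
        best = (score, depth, width)
print("best (val_accuracy, depth, width):", best)
```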
I'm searching for existing work on neural net architectures that grow based on need or on the complexity/variability of the training data. Some architectures that I've found include self-organizing maps and growing neural gas. Are these the only ones out there?
What I'm searching for is best illustrated by a simple scenario:
if the training data only has a few patterns, then the neural net would be 2-3 layers deep with a small set of nodes in each layer. If the training data was more convoluted, then we would see deeper networks.
Such work seems rare or absent in the AI literature. Is it because the performance is comparatively weak? I'd appreciate any guidance.
An example of this is called neuro-evolution. What you could do is combine backprop with evolution to find the optimal structure for your dataset. Neataptic is one of the NN libraries that offer neuro-evolution; with some simple coding you could turn this into backprop + evolution.
The disadvantage is that it requires much more computation power, as a genetic algorithm has to train and evaluate an entire population. So using neuro-evolution does make the performance comparably weak.
However, I think there are also techniques that disable certain nodes and, if there is no negative effect on the output, remove them permanently. I'm not sure though.
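To illustrate the backprop + evolution idea (this is not Neataptic, which is a JavaScript library, just a rough Keras sketch in which evolution searches the layer structure and backprop learns the weights; the data, mutation operators, and population size are all made up):

```python
import random
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy data standing in for "training data with a few patterns" (placeholder).
x = np.random.rand(500, 16).astype("float32")
y = np.random.randint(0, 2, 500)

def build(genome):
    # A genome is simply a list of hidden-layer widths, e.g. [32, 16].
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(16,)))
    for width in genome:
        model.add(layers.Dense(width, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

def fitness(genome):
    # Backprop learns the weights; evolution only searches the structure.
    hist = build(genome).fit(x, y, epochs=2, verbose=0, validation_split=0.2)
    return hist.history["val_accuracy"][-1]

def mutate(genome):
    # Grow, shrink, or resize one hidden layer at random.
    g = list(genome)
    op = random.choice(["grow", "shrink", "resize"])
    if op == "grow" or not g:
        g.append(random.choice([8, 16, 32]))
    elif op == "shrink" and len(g) > 1:
        g.pop(random.randrange(len(g)))
    else:
        g[random.randrange(len(g))] = random.choice([8, 16, 32, 64])
    return g

population = [[16], [32, 16], [8], [64, 32, 16]]
for generation in range(5):
    population.sort(key=fitness, reverse=True)   # best structures first
    survivors = population[:2]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(2)]
print("best structure found:", population[0])
```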
I've seen some tutorial examples, like the UFLDL convolutional net, where they use features obtained by unsupervised learning, and others where the kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings, etc.). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these deeper layers' filters come from - should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a single answer; I'd be happy with just the general approach (some review article would be fantastic).
The current state of the art is to learn all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommends small kernels (3x3) and 2x2 pooling. You should train a different set of filters for each layer.
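A minimal Keras sketch of that kind of stack, with 3x3 convolutions, 2x2 max pooling, and each layer learning its own filters by backpropagation (the depth, filter counts, and input size here are illustrative, not taken from the paper):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Each Conv2D layer learns its own, independent set of 3x3 kernels by
# backpropagation; nothing is shared between layers or designed by hand.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```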
Kernels in deep networks are mostly trained all at the same time in a supervised way (with known inputs and outputs of the network), using backpropagation (which computes the gradients) and some version of stochastic gradient descent (the optimization algorithm).
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start from an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way the architecture of the network is fixed, as you have to respect the architecture of the original network. The networks you can download were trained on very large problems, which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned network should work very well.
A good place to get more information is the Caffe CVPR2015 tutorial.
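The same fine-tuning workflow can also be sketched in Keras rather than Caffe; the base model (VGG16), input size, and class count below are placeholder choices, not part of the original answer:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Load a network pre-trained on ImageNet and freeze its convolutional
# layers, then train only a new classification head on your own data.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False            # keep the original kernels fixed at first

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(5, activation="softmax"),   # 5 = number of your own classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(your_images, your_labels, epochs=...)   # the fine-tuning step
```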
By serialized I mean that the values of an input arrive at discrete intervals of time and that the size of the vector is not known beforehand.
Conventionally, neural networks employ a fixed number of parallel input neurons and a fixed number of parallel output neurons.
A serialized implementation could be used in speech recognition, where I could feed the network a time series of the waveform and get phonemes at the output end.
It would be great if someone could point out an existing implementation.
A plain neural network, as a structure, has no invariance across time-scale deformation, which is why it is impractical to apply it directly to time-series recognition. To recognize time series, a generative sequence model (an HMM) is usually used. A NN can be used together with an HMM to classify individual frames of speech. In such an HMM-ANN configuration, audio is split into frames, the frame slices are passed into the ANN to compute phoneme probabilities, and the whole probability sequence is then searched for the best match using dynamic search over the HMM.
An HMM-ANN system usually requires initialization from a more robust HMM-GMM system, so there are no standalone HMM-ANN implementations; they are usually part of a whole speech recognition toolkit. Among popular toolkits, Kaldi has an implementation of HMM-ANN and even of HMM-DNN (deep neural networks).
There are also neural networks designed to classify time series directly: recurrent neural networks. They can be used successfully to classify speech, and an example can be created with any toolkit supporting RNNs, for example Keras. If you want to start with recurrent neural networks, try long short-term memory networks (LSTM); their architecture enables more stable training. A Keras setup for speech recognition is discussed in Building Speech Dataset for LSTM binary classification.
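As a starting point, here is a minimal Keras sketch of framewise classification with an LSTM. The feature dimension, phoneme count, and random data are made up for illustration; a real system would use proper acoustic features and frame-level alignments:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy stand-in for framed audio: 100 utterances, 200 frames each,
# 13 MFCC-like features per frame, one of 40 phoneme labels per frame.
x = np.random.rand(100, 200, 13).astype("float32")
y = np.random.randint(0, 40, size=(100, 200))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 13)),                # variable-length sequences
    layers.LSTM(64, return_sequences=True),          # one output per input frame
    layers.TimeDistributed(layers.Dense(40, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=16)
```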
There are several types of neural networks that are intended to model sequence data; I would say most of these models fit into an equivalence class known as a recurrent neural network, which is generally any neural network model whose connection graph contains a cycle. The cycle in the connection graph can typically be exploited to model some aspect of the past "state" of the network, and different strategies -- for example, Elman/Jordan nets, Echo State Networks, etc. -- have been developed to take advantage of this state information in different ways.
Historically, recurrent nets have been extremely difficult to train effectively. Thanks to lots of recent work on second-order optimization tools for neural networks, along with research from the deep neural networks community, several recent examples of recurrent networks have been developed that show promise in modeling real-world tasks. In my opinion, one of the neatest current examples of such a network is Ilya Sutskever's "Generating text with recurrent neural networks" (ICML 2011), in which a recurrent net is used as a very compact, long-range n-gram character model. (Try the RNN demo on the linked homepage; it's fun.)
As far as I know, recurrent nets have not yet been applied successfully to speech -> phoneme modeling directly, but Alex Graves specifically mentions this task in several of his recent papers. (Actually, it looks like he has a 2013 ICASSP paper on this topic.)
I have decided to use a feed-forward NN with back-propagation training for my OCR application for handwritten text. The input layer will have 32*32 (1024) neurons and there will be at least 8-12 output neurons.
From reading some articles I found Neuroph easy to use, while Encog is a few times better in performance. Considering the parameters in my scenario, which API is the most suitable one? I would also appreciate a comment on the number of input nodes I have chosen: is it too large a value? (Although that is somewhat off topic.)
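For concreteness, here is roughly the network I have in mind, sketched in Keras only to show the shapes (the real implementation would use one of the two Java libraries, and the hidden-layer size and class count here are just guesses):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder data: 1000 handwritten characters as flattened 32x32 images,
# labelled with one of 10 classes (the question mentions 8-12 outputs).
x = np.random.rand(1000, 1024).astype("float32")
y = np.random.randint(0, 10, 1000)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1024,)),            # 32*32 input neurons
    layers.Dense(128, activation="relu"),     # hidden-layer size is a guess
    layers.Dense(10, activation="softmax"),   # one output neuron per class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=32)
```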
First, my disclaimer: I am one of the main developers on the Encog project. This means I am more familiar with Encog than with Neuroph, and perhaps biased towards it. In my opinion, the relative strengths of each are as follows. Encog supports quite a few interchangeable machine learning methods and training methods. Neuroph is VERY focused on neural networks, and you can express a connection between just about anything. So if you are going to create very custom/non-standard (research) neural networks with topologies different from the typical Elman/Jordan, NEAT, HyperNEAT, feedforward type networks, then Neuroph will fit the bill nicely.