Fully connected neural network for panel data? - neural-network

I am trying to build a neural network (NN) to forecast the probability of an event (e.g. that a thunderstorm will occur). As a base, I have a panel of weather data per state over 10 years. I saw some posts with similar questions (e.g. Keras Recurrent Neural Networks For Multivariate Time Series) and they all seem to use an RNN for this problem.
I would like to understand why an RNN seems to be the go-to solution rather than, e.g., a simple fully connected NN. Conventionally, I would use a logit model with fixed effects for this problem.
Maybe someone can point me towards a paper or two that discuss this?

Related

Is a neural network suitable for supervised learning where the data (inputs and outputs) are continuous?

I am working on a regression model with 158 inputs and 4 outputs for a glass manufacturing project, which is a continuous process of inputs and outputs. Is a neural net a suitable solution for this kind of regression model? If yes, I have understood that recurrent neural nets can be used for time series data; which recurrent neural net should I use? If an NN is not suitable, what other types of solutions are available besides linear regression and regression trees?
Neural networks are indeed suitable for continuous data. In fact, I would say they are continuous by default. It is certainly possible to have discrete I/O; it all depends on your functions.
Secondly, it is true that RNNs are suitable for time series, in a way. RNNs are in fact suited to timesteps more than timestamps. An RNN works by iterations, and typically each iteration can be seen as a fixed step forward in time. That said, if your data is more like (date, value) pairs (what I call timestamps), an RNN may not work so well. It would not be absolutely impossible, but that's not the idea.
Hope it helps. Start with a simple RNN, try to understand how it works, then, if you need more, read about more complex cells.
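To make the "one iteration per timestep" idea concrete, here is a minimal sketch of a plain (Elman-style) RNN forward pass in NumPy. The sizes and random weights are placeholders of my own, not values from the question, and no training is shown; the point is only that the same cell is applied once per fixed step while carrying a hidden state forward.

```python
import numpy as np

def simple_rnn_forward(inputs, W_in, W_rec, b):
    """Run a plain RNN over a sequence, one iteration per timestep."""
    hidden = np.zeros(W_rec.shape[0])  # state before the first step
    states = []
    for x_t in inputs:  # each row is one fixed step forward in time
        hidden = np.tanh(W_in @ x_t + W_rec @ hidden + b)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
n_steps, n_features, n_hidden = 5, 3, 4  # illustrative sizes only
sequence = rng.normal(size=(n_steps, n_features))
W_in = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

states = simple_rnn_forward(sequence, W_in, W_rec, b)
print(states.shape)  # (5, 4): one hidden state per timestep
```

For a regression task like the one in the question, a linear readout on top of each hidden state would produce the outputs; more complex cells (LSTM, GRU) replace the single `tanh` update but keep the same loop.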

Use a trained neural network to imitate its training data

I'm in the overtures of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient ascent to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal lead by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
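For comparison, the idea from the question (start from a random vector and climb the classifier's output) can be sketched as below. This is not a GAN; it is only a toy illustration of gradient ascent on the input. The "trained" classifier here is a hypothetical single logistic unit with made-up weights `w` and `b`, chosen so the input gradient can be written by hand; a real network would need automatic differentiation, and for prose the resulting continuous vector would still have to be mapped back to words.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ascend_input(x0, w, b, step=0.5, n_iter=200):
    """Move the input uphill on the classifier score sigmoid(w.x + b)."""
    x = x0.copy()
    for _ in range(n_iter):
        s = sigmoid(w @ x + b)
        x += step * s * (1.0 - s) * w  # gradient of the score w.r.t. x
    return x

rng = np.random.default_rng(1)
w = rng.normal(size=8)   # hypothetical trained weights (assumption)
b = -1.0                 # hypothetical trained bias (assumption)
x0 = rng.normal(size=8)  # random starting point

x_opt = ascend_input(x0, w, b)
score_before = sigmoid(w @ x0 + b)
score_after = sigmoid(w @ x_opt + b)
print(score_before, score_after)
```

With a single logistic unit there is only one direction to climb, so this toy has no interesting local maxima; with a deep classifier the same loop would end up in whichever region the random start happens to lead to.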

How to derive a model equation from an artificial neural network?

I have used neural network software to predict continuous data. The predictions were clearly better than the results obtained through regression analysis. Now I would like to derive a model expression from the trained weights that the software produced. Following what many researchers suggest about interpreting trained weights and biases to derive a model equation, I tried to derive one along similar lines.
After deriving the equation, I found that it was not able to replicate the results given by the neural network software, so I am exploring new methods to derive the equation. I want to know where I am going wrong, and if anyone can provide the steps for deriving one, it would be helpful.
I read some time ago about something like what you're describing, but with some differences. It would probably be useful to you. It's called 'knowledge distillation', if I remember correctly, and it is a way of extracting the knowledge inside the black box that a neural network is. Roughly speaking, it consists of training a simpler model that is easier to interpret while preserving as much of the original network's predictive power as possible. I'm speaking from memory, so I'm sorry about the lack of detail. A search on Google will provide the exact references for it.
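As a rough sketch of that idea: treat the trained network as a "teacher", query it on many inputs, and fit a simpler "student" to the teacher's outputs. Everything below is assumed for illustration, since the question gives no weights or data: the teacher is a tiny tanh network with random weights, and the student is a plain linear model fitted by least squares, whose coefficients can be read off as an equation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the trained network (weights are made up for this sketch).
W1 = rng.normal(scale=0.3, size=(6, 3))
b1 = np.zeros(6)
w2 = rng.normal(size=6)
b2 = 0.1

def teacher(X):
    """One hidden tanh layer followed by a linear output."""
    return np.tanh(X @ W1.T + b1) @ w2 + b2

# Query the teacher on inputs covering the range of interest.
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y_teacher = teacher(X)

# Student: y = c0*x0 + c1*x1 + c2*x2 + intercept, fitted to the teacher.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y_teacher, rcond=None)
y_student = A @ coef

# Fidelity: share of the teacher's output variance the student reproduces.
ss_res = np.sum((y_teacher - y_student) ** 2)
ss_tot = np.sum((y_teacher - y_teacher.mean()) ** 2)
fidelity = 1.0 - ss_res / ss_tot
print("coefficients:", coef[:3], "intercept:", coef[3])
print("fidelity (R^2 against the teacher):", fidelity)
```

If the fidelity is low, the student is too simple for the network it imitates, and a richer but still readable student (for example, one with polynomial terms) would be the next thing to try.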
Hope to have helped.

Adapt neural network after training

What is the best way to adapt a neural network after its initial training?
I.e. I want to do some image recognition, and the network should get better the more new pictures I present to it. That could be done with reinforcement learning, but for fast progress at the beginning I want to use backpropagation. Is it possible to update a network?
And what about creating new categories later on?
Is there another way than retraining it with the complete dataset? That would take a lot of time.
Sorry for my basic questions but I couldn't find much information about this.
Neural networks can be adapted by training them on the new data with a small learning rate. You could even train the last layers with a larger learning rate than the others (in case you are using a deep neural network).
For the second part of your question, about creating new categories: a deep neural network can be used as a feature extractor, with any other classifier (maybe another small neural network) on top of it. When you want to add a new category, you only have to re-train the small classifier (or neural network). This means you retain the trained weights of the feature detector (the deep neural network) and reuse it to detect the new categories.
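A toy version of that setup might look like the following. The "deep network" is replaced by a fixed random ReLU projection (`W_feat`, purely a stand-in) and the small classifier by a nearest-centroid head; both are my assumptions for the sake of a short example, as is the synthetic data. The point is that adding a category only refits the head, while the feature extractor is never touched.

```python
import numpy as np

rng = np.random.default_rng(3)
W_feat = rng.normal(size=(16, 10))  # stand-in for frozen pretrained weights

def extract_features(X):
    """Frozen feature extractor: its weights are never updated."""
    return np.maximum(0.0, X @ W_feat.T)

def fit_head(X, labels):
    """Small classifier on top: one centroid per class in feature space."""
    feats = extract_features(X)
    return {int(c): feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(head, X):
    feats = extract_features(X)
    classes = list(head)
    dists = np.stack(
        [np.linalg.norm(feats - head[c], axis=1) for c in classes], axis=1
    )
    return np.array(classes)[dists.argmin(axis=1)]

# Synthetic data: each class is a tight cluster around its own centre.
centers = rng.normal(scale=3.0, size=(3, 10))

def sample(c, n=30):
    return centers[c] + rng.normal(scale=0.05, size=(n, 10))

X_old = np.vstack([sample(0), sample(1)])
y_old = np.repeat([0, 1], 30)
head_old = fit_head(X_old, y_old)  # initial training: two categories

# Later: a new category appears. Only the head is refitted.
X_all = np.vstack([X_old, sample(2)])
y_all = np.concatenate([y_old, np.full(30, 2)])
head_new = fit_head(X_all, y_all)

accuracy = (predict(head_new, X_all) == y_all).mean()
print(sorted(head_old), sorted(head_new), accuracy)
```

Refitting a head like this is cheap compared with retraining the whole network, which is why it avoids going through the full training procedure again.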

How to train on and make a serialized feature vector for a Neural Network?

By serialized I mean that the values for an input arrive at discrete intervals of time, and that the size of the vector is not known beforehand.
Conventionally, neural networks employ a fixed number of parallel input neurons and a fixed number of parallel output neurons.
A serialized implementation could be used in speech recognition, where I could feed the network a time series of the waveform and get the phonemes at the output end.
It would be great if someone could point out some existing implementation.
A simple neural network, as a structure, is not invariant to time-scale deformation, which is why it is impractical to apply it directly to recognizing time series. To recognize time series, a generic communication model (HMM) is usually used. An NN can be used together with an HMM to classify individual frames of speech. In such an HMM-ANN configuration, the audio is split into frames, frame slices are passed into the ANN to calculate phoneme probabilities, and then the whole probability sequence is analyzed for the best match using dynamic search with the HMM.
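The "dynamic search" step can be illustrated with a small Viterbi decoder over per-frame probabilities. This is a simplified sketch, not what a toolkit actually does: the frame probabilities are made-up numbers standing in for ANN outputs, there are only two hypothetical phoneme states, and the posteriors are used directly as emission scores (real hybrid systems typically rescale them first).

```python
import numpy as np

def viterbi(frame_probs, trans, init):
    """Most likely state sequence given per-frame state probabilities."""
    n_frames, n_states = frame_probs.shape
    log_delta = np.log(init) + np.log(frame_probs[0])
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # scores[i, j]: best log-score ending in state i, then moving to j
        scores = log_delta[:, None] + np.log(trans)
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(frame_probs[t])
    path = [int(log_delta.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Made-up ANN outputs for 4 frames and 2 phoneme states.
frame_probs = np.array([[0.90, 0.10],
                        [0.80, 0.20],
                        [0.45, 0.55],   # a slightly ambiguous frame
                        [0.90, 0.10]])
trans = np.array([[0.9, 0.1],           # states tend to persist
                  [0.1, 0.9]])
init = np.array([0.5, 0.5])

framewise = frame_probs.argmax(axis=1).tolist()
decoded = viterbi(frame_probs, trans, init)
print(framewise)  # [0, 0, 1, 0]
print(decoded)    # [0, 0, 0, 0]
```

Classifying each frame on its own flips to state 1 on the ambiguous third frame, while the HMM search keeps the sequence in state 0 because a one-frame switch is unlikely under the transition model. That smoothing over the whole sequence is what the HMM adds on top of the frame classifier.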
An HMM-ANN system usually requires initialization from a more robust HMM-GMM system, so there are no standalone HMM-ANN implementations; they are usually part of a complete speech recognition toolkit. Among popular toolkits, Kaldi has an implementation of HMM-ANN and even of HMM-DNN (deep neural networks).
There are also neural networks that are designed to classify time series: recurrent neural networks, which can be used successfully to classify speech. An example can be built with any toolkit that supports RNNs, for example Keras. If you want to start with recurrent neural networks, try long short-term memory (LSTM) networks; their architecture enables more stable training. A Keras setup for speech recognition is discussed in Building Speech Dataset for LSTM binary classification.
There are several types of neural networks that are intended to model sequence data; I would say most of these models fit into an equivalence class known as a recurrent neural network, which is generally any neural network model whose connection graph contains a cycle. The cycle in the connection graph can typically be exploited to model some aspect of the past "state" of the network, and different strategies -- for example, Elman/Jordan nets, Echo State Networks, etc. -- have been developed to take advantage of this state information in different ways.
Historically, recurrent nets have been extremely difficult to train effectively. Thanks to lots of recent work in second-order optimization tools for neural networks, along with research from the deep neural networks community, several recent examples of recurrent networks have been developed that show promise in modeling real-world tasks. In my opinion, one of the neatest current examples of such a network is Ilya Sutskever's "Generating text with recurrent neural networks" (ICML 2011), in which a recurrent net is used as a very compact, long-range n-gram character model. (Try the RNN demo on the linked homepage, it's fun.)
As far as I know, recurrent nets have not yet been applied successfully to speech -> phoneme modeling directly, but Alex Graves specifically mentions this task in several of his recent papers. (Actually, it looks like he has a 2013 ICASSP paper on this topic.)