what should be my input in ANN - neural-network

I am getting confusing about Input data set . I am studying about Artificial Neural Network , my purpose is that I wanted to use the historical data (I have stock data of last 10 years ) to predict stock value in the future (for example 2015). So, where is my input? For example i have a Excel sheet data as [Column1-Date| Column2-High | Column3-low |Column4-opening|Column5-closing]

By profession I am a quant and I am currently pursuing a masters degree in Computer Science. There are a many considerations when selecting financial input for a neural network including,
Select indicators which which are positively correlated to returns.
Indicators are independent variables which have predictive power on the dependent variable (stock returns). Common popular indicators include technical indicators derived from price and volume data, fundamental indicators about the underlying company or asset, and quantitative indicators such as descriptive statistics or even model parameters. If you have many indicators, you can narrow them down using correlation analysis, best subset, or principal component analysis.
Pre-process the indicators for use in Neural Networks
Neural networks work by connecting perceptrons together. Each perceptron contains an activation function e.g. the sigmoid function or tanh. Most activation functions have an active range. For the sigmoid function this is between -sqrt(3) and +sqrt(3). What this means is that you should normalize your data to within the active range and seriously consider removing outliers.
There are many other potential issues with using Neural Networks. I wrote an article a while back which identified ten issues, including the ones mentioned here. Feel free to check it out.

Related

Choice of Neural Network and Activation Function

I am very new to the field of Neural Network. Apologies, if this question is very amateurish.
I am looking to build a neural network model to predict whether a particular image that I am about to post on a social media platform will get a certain engagement rate.
I have around 120 images with historical data about the engagement rate. The following information is available:
Images of size 501 px x 501 px
Type of image (Exterior photoshoot/Interior photoshoot)
Day of posting the image (Sunday/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday)
Time of posting the image (18:33, 10:13, 19:36 etc)
No. of people who have seen the post (15659, 35754, 25312 etc)
Engagement rate (5.22%, 3.12%, 2.63% etc)
I would like the model to predict if a certain image when posted on a particular day and time will give an engagement rate of 3% or more.
As you may have noticed, the input data is images, text (signifying what type or day), time and numbers.
Could you please help me understand how to build a neural network for this problem?
P.S: I am very new to this field. It would be great if you can give a detailed direction how I should proceed to solve this problem.
A neural network has three kinds of neuronal layers:
Input layer. It stores the inputs this network will receive. The number of neurons must equal the number of inputs you have;
Hidden layer. It uses the inputs that come from the previous layer and it does the necessary calculations so as to obtain a result, which passes to the output layer. More complex problems may require more than one hidden layer. As far as I know, there is not an algorithm to determine the number of neurons in this layer, so I think you determine this number based on trial and error and previous experience;
Output layer. It gets the results from the hidden layer and gives it to the user for his personal use. The number of neurons from the output layer equals the number of outputs you have.
According to what you write here, your training database has 6 inputs and one output (the engagement rate). This means that your artificial neural network (ANN) will have 6 neurons on the input layer and one neuron on the output layer.
I not sure if you can pass images as inputs to a neural network. Also, because in theory there are an infinite types of images, I think you should categorize them a bit, each category receiving a number. An example of categorization would be:
Images with dogs are in category 1;
Images with hospitals are in category 2, etc.
So, your inputs will look like this:
Image category (dogs=1, hospitals=2, etc.);
Type of image (Exterior photoshoot=1, interior photoshoot=2);
Posting day (Sunday=1, Monday=2, etc.);
Time of posting the image;
Number of people who have seen the post;
Engagement rate.
The number of hidden layers and the number of each neuron from each hidden layer depends on your problem's complexity. Having 120 pictures, I think one hidden layer and 10 neurons on this layer is enough.
The ANN will have one hidden layer (the engagement rate).
Once the database containing the information about the 120 pictures is created (known as training database) is created, the next step is to train the ANN using the database. However, there is some discussion here.
Training an ANN means computing some parameters of the hidden neurons by using an optimization algorithm so as the sum of squared errors is minimum. The training process has some degree of randomness to it. To minimize the effect of the randomness factor and to get as precise estimations as possible, your training database must have:
Consistent data;
Many records;
I don't know how consistent your data are, but from my experience, a small training database with consistent data beats a huge database with non-consistent ones.
Judging by the problem, I think you should use the default activation function provided by the software you use for ANN handling.
Once you have trained your database, it is time to see how efficient this training was. The software which you use for ANN should provide you with tools to estimate this, tools which should be documented. If training is satisfactory for you, you may begin using it. If it is not, you may either re-train the ANN or use a larger database.

Siamese networks: Why does the network to be duplicated?

The DeepFace paper from Facebook uses a Siamese network to learn a metric. They say that the DNN that extracts the 4096 dimensional face embedding has to be duplicated in a Siamese network, but both duplicates share weights. But if they share weights, every update to one of them will also change the other. So why do we need to duplicate them?
Why can't we just apply one DNN to two faces and then do backpropagation using the metric loss? Do they maybe mean this and just talk about duplicated networks for "better" understanding?
Quote from the paper:
We have also tested an end-to-end metric learning ap-
proach, known as Siamese network [8]: once learned, the
face recognition network (without the top layer) is repli-
cated twice (one for each input image) and the features are
used to directly predict whether the two input images be-
long to the same person. This is accomplished by: a) taking
the absolute difference between the features, followed by b)
a top fully connected layer that maps into a single logistic
unit (same/not same). The network has roughly the same
number of parameters as the original one, since much of it
is shared between the two replicas, but requires twice the
computation. Notice that in order to prevent overfitting on
the face verification task, we enable training for only the
two topmost layers.
Paper: https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf
The short answer is that yes, I think that looking at the architecture of the network will help you understand what is going on. You have two networks that are "joined at the hip" i.e. sharing weights. That's what makes it a "Siamese network". The trick is that you want the two images you feed into the network to pass through the same embedding function. So to ensure that this happens both branches of the network need to share weights.
Then we combine the two embeddings into a metric loss (called "contrastive loss" in the image below). And we can back-propagate as normal, we just have two input branches available so that we can feed in two images at a time.
I think a picture is worth a thousand words. So check out how a Siamese network is constructed (at least conceptually) below.
The gradients depend on the activation values. So for each branch gradients will be different and final update could be based on some averaging to share the weights

Use a trained neural network to imitate its training data

I'm in the overtures of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal lead by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.

How does a neural network produce an array?

Background
I've been studying Neural Networks, specifically the implmentation provided by this incredible online book. In the example network provided, we're shown how to create a neural network that classifies the MNIST training data to perform Optical Character Recognition (OCR).
The network is configured so that the input stimuli represents a discrete range of thresholded pixel data from a 24x24 image; at the output, we have ten signal paths which represent each of the different solutions for the input images; these are used classify a handwritten digit from zero to nine. In this implementation, a handwritten '3' would drive a strong signal down the third output path.
Now, I've seen that Neural Networks can be applied to far more 'unpredictable' output solutions; for example, take the team who taught a network to recognize the hair on a human:
Question
Surely in the application above, we couldn't use a fixed output array length because the number of points that would qualify within an image would vary just so wildly between different samples. Can anyone recommend what kind of pattern would have been used to accomplish this?
Assumption
In the interest of completeness, I'm going to propose that the team could have employed a kind of 'line following robot' for the classification task. So for an input image, a network could be trained by using a small range of discrete commands (LEFT, RIGHT, UP, DOWN) for a fixed period t and train the network to control the robot like an Etch-a-Sketch.
Alternatively, we could implement a network which would map pixels one-to-one, and define whether individual pixels contributed to hair; but this wouldn't be compatible with different image resolutions.
So, do either of these solutions sound plausable? If so, are these basic implementations of a known generic solution for this kind of problem? What approach would you use?

How to train on and make a serialized feature vector for a Neural Network?

By serialized i mean that the values for an input come in discrete intervals of time and that size of the vector is also not known before hand.
Conventionally the neural networks employ fixed size parallel input neurons and fixed size parallel output neurons.
A serialized implementation could be used in speech recognition where i can feed the network with a time series of the waveform and on the output end get the phonemes.
It would be great if someone can point out some existing implementation.
Simple neural network as a structure doesn't have invariance across time scale deformation that's why it is impractical to apply it to recognize time series. To recognize time series usually a generic communication model is used (HMM). NN could be used together with HMM to classify individual frames of speech. In such HMM-ANN configuration audio is split on frames, frame slices are passed into ANN in order to calculate phoneme probabilities and then the whole probability sequence is analyzed for a best match using dynamic search with HMM.
HMM-ANN system usually requires initialization from more robust HMM-GMM system thus there are no standalone HMM-ANN implementation, usually they are part of a whole speech recognition toolkit. Among popular toolkits Kaldi has implementation for HMM-ANN and even for HMM-DNN (deep neural networks).
There are also neural networks which are designed to classify time series - recurrent neural networks, they can be successfully used to classify speech. The example can be created with any toolkit supporting RNN, for example Keras. If you want to start with recurrent neural networks, try long-short term memory networks (LSTM), their architecture enables more stable training. Keras setup for speech recognition is discussed in Building Speech Dataset for LSTM binary classification
There are several types of neural networks that are intended to model sequence data; I would say most of these models fit into an equivalence class known as a recurrent neural network, which is generally any neural network model whose connection graph contains a cycle. The cycle in the connection graph can typically be exploited to model some aspect of the past "state" of the network, and different strategies -- for example, Elman/Jordan nets, Echo State Networks, etc. -- have been developed to take advantage of this state information in different ways.
Historically, recurrent nets have been extremely difficult to train effectively. Thanks to lots of recent work in second-order optimization tools for neural networks, along with research from the deep neural networks community, several recent examples of recurrent networks have been developed that show promise in modeling real-world tasks. In my opinion, one of the neatest current examples of such a network is Ilya Sutskever's "Generating text with recurrent neural networks" (ICML 2011), in which a recurrent net is used as a very compact, long-range n-gram character model. (Try the RNN demo on the linked homepage, it's fun.)
As far as I know, recurrent nets have not yet been applied successfully to speech -> phoneme modeling directly, but Alex Graves specifically mentions this task in several of his recent papers. (Actually, it looks like he has a 2013 ICASSP paper on this topic.)