How to properly build a multilayer perceptron on interval data

I have a dataset of daily temperatures covering a couple of years. The data are in interval form: each day has a high temperature and a low temperature.
I want to forecast these data, and I recently read several papers claiming that the multilayer perceptron (MLP) is well suited to this. However, after reading them I am still puzzled. I know that to build one I need an input layer, a hidden layer and an output layer. Although I already have the code in Matlab, I still don't know how to run the simulation. What should I use as the input and the output: should I use the interval data for both? And how should I choose the hidden layer?

The input to an MLP network is the feature data from which you want to predict an outcome; the output is the outcome you are trying to predict. The hidden layer determines how well the network can predict, and you want it only as large as it needs to be to achieve reasonable prediction results. If it is too large, the network just memorizes the training data rather than generalizing a pattern from it.
For example, your input layer could be the day of the year (1-365), that day's high, and that day's low. The output would then presumably be the high and low temperatures for the next day.
The more relevant input features you have the better the network will be.
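As a minimal sketch of this setup, assuming the daily records are consecutive with no missing days, the Python function below turns two lists of daily highs and lows into input vectors [day of year, high, low] and next-day targets [high, low]. `make_training_pairs` is a hypothetical helper, not part of Matlab or any toolbox, and the temperatures are made up. If you move the data into Matlab, note that its neural network functions generally expect one sample per column, so these row-oriented lists would need to be transposed.

```python
from datetime import date, timedelta

def make_training_pairs(start, highs, lows):
    """Build (input, target) pairs for next-day interval forecasting.

    Input for day t:  [day_of_year(t), high(t), low(t)]
    Target for day t: [high(t+1), low(t+1)]
    Assumes highs[i] and lows[i] belong to the day start + i (no gaps)."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    inputs, targets = [], []
    for t in range(len(highs) - 1):          # the last day has no next-day target
        day = start + timedelta(days=t)
        day_of_year = day.timetuple().tm_yday  # 1..365 (366 in leap years)
        inputs.append([day_of_year, highs[t], lows[t]])
        targets.append([highs[t + 1], lows[t + 1]])
    return inputs, targets

# Made-up readings for three consecutive days
X, Y = make_training_pairs(date(2020, 1, 1), [5.0, 7.5, 6.0], [-1.0, 0.5, 1.0])
print(X)  # [[1, 5.0, -1.0], [2, 7.5, 0.5]]
print(Y)  # [[7.5, 0.5], [6.0, 1.0]]
```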

Related

Choice of Neural Network and Activation Function

I am very new to the field of neural networks. Apologies if this question is very amateurish.
I am looking to build a neural network model to predict whether a particular image that I am about to post on a social media platform will get a certain engagement rate.
I have around 120 images with historical data about the engagement rate. The following information is available:
Images of size 501 px x 501 px
Type of image (Exterior photoshoot/Interior photoshoot)
Day of posting the image (Sunday/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday)
Time of posting the image (18:33, 10:13, 19:36 etc)
No. of people who have seen the post (15659, 35754, 25312 etc)
Engagement rate (5.22%, 3.12%, 2.63% etc)
I would like the model to predict if a certain image when posted on a particular day and time will give an engagement rate of 3% or more.
As you may have noticed, the input data consist of images, text (indicating the type or day), times and numbers.
Could you please help me understand how to build a neural network for this problem?
P.S.: I am very new to this field. It would be great if you could give detailed directions on how to proceed with this problem.
A neural network has three kinds of neuronal layers:
Input layer. It stores the inputs this network will receive. The number of neurons must equal the number of inputs you have;
Hidden layer. It uses the inputs that come from the previous layer and performs the necessary calculations to obtain a result, which it passes to the output layer. More complex problems may require more than one hidden layer. As far as I know, there is no algorithm that determines the number of neurons in this layer, so I think you determine it by trial and error and previous experience;
Output layer. It receives the results from the hidden layer and returns them to the user. The number of neurons in the output layer equals the number of outputs you have.
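To make the three layers concrete, here is a minimal Python sketch of a forward pass through one hidden layer, assuming tanh hidden neurons and a linear output neuron (common choices, not the only ones). The layer sizes (3 inputs, 10 hidden neurons, 2 outputs) and the helper names are hypothetical; the weights are random and untrained, so only the shape of the output is meaningful.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases, activation):
    """Compute activation(W x + b) for a single input vector x."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden, output):
    h = dense(x, *hidden, math.tanh)       # hidden layer: tanh nonlinearity
    return dense(h, *output, lambda z: z)  # output layer: linear (identity)

rng = random.Random(0)
n_inputs, n_hidden, n_outputs = 3, 10, 2   # one neuron per input / per output
hidden = init_layer(n_inputs, n_hidden, rng)
output = init_layer(n_hidden, n_outputs, rng)
y = forward([0.2, -0.1, 0.7], hidden, output)
print(len(y))  # 2: one value per output neuron
```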
According to what you write here, your training database has five inputs (once the images are turned into categories, see below) and one output (the engagement rate). This means that your artificial neural network (ANN) will have 5 neurons on the input layer and one neuron on the output layer.
I am not sure you can pass raw images as inputs to this kind of network. Also, because in theory there are infinitely many kinds of images, I think you should categorize them, with each category receiving a number. An example of categorization would be:
Images with dogs are in category 1;
Images with hospitals are in category 2, etc.
So, your inputs will look like this:
Image category (dogs=1, hospitals=2, etc.);
Type of image (exterior photoshoot=1, interior photoshoot=2);
Posting day (Sunday=1, Monday=2, etc.);
Time of posting the image;
Number of people who have seen the post.
The output is the engagement rate (or, for your question, whether it is 3% or more).
The number of hidden layers and the number of neurons in each hidden layer depend on your problem's complexity. With 120 pictures, I think one hidden layer with 10 neurons is enough.
The ANN will have one hidden layer and one output neuron (the engagement rate).
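One way to turn a post into a numeric input vector using the integer coding suggested in this answer is sketched below in Python. `encode_record`, `encode_target` and the `threshold` parameter are hypothetical helpers for illustration, the image category is assumed to be assigned by hand, and the time of day is assumed to be converted to minutes after midnight.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
IMAGE_TYPES = {"Exterior photoshoot": 1, "Interior photoshoot": 2}

def encode_record(image_category, image_type, day, time_hhmm, views):
    """Turn one post into a numeric input vector (hypothetical encoding)."""
    hours, minutes = map(int, time_hhmm.split(":"))
    return [
        image_category,           # manual category, e.g. dogs=1, hospitals=2
        IMAGE_TYPES[image_type],  # exterior=1, interior=2
        DAYS.index(day) + 1,      # Sunday=1 ... Saturday=7
        hours * 60 + minutes,     # time of posting as minutes after midnight
        views,                    # number of people who have seen the post
    ]

def encode_target(engagement_percent, threshold=3.0):
    """1 if the engagement rate reaches the threshold (3% in the question), else 0."""
    return 1 if engagement_percent >= threshold else 0

x = encode_record(1, "Exterior photoshoot", "Monday", "18:33", 15659)
print(x)                    # [1, 1, 2, 1113, 15659]
print(encode_target(5.22))  # 1
print(encode_target(2.63))  # 0
```

Before training, you would normally also scale these columns to comparable ranges, since the number of viewers is in the tens of thousands while the day code only runs from 1 to 7.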
Once the database containing the information about the 120 pictures (known as the training database) is created, the next step is to train the ANN on it. However, a few points deserve discussion.
Training an ANN means computing the weights and biases of its neurons with an optimization algorithm so that the sum of squared errors is minimized. The training process has some degree of randomness to it. To minimize the effect of this randomness and to get estimates as precise as possible, your training database must have:
Consistent data;
Many records;
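The training step described above can be sketched in plain Python as gradient descent on the sum of squared errors for a network with one tanh hidden layer and one linear output. This is an illustrative assumption, not what any particular ANN software does internally (toolboxes usually offer faster optimizers). The `seed` stands in for the randomness mentioned above, so different seeds give slightly different trained networks; `train_mlp` is a hypothetical helper and the toy data (y = x²) is made up.

```python
import math
import random

def train_mlp(X, y, n_hidden=10, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network (tanh hidden neurons, one linear output)
    by per-example gradient descent on the sum of squared errors.
    Returns the prediction function and the error after each epoch."""
    rng = random.Random(seed)  # the seed fixes the random initial weights
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def predict(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        return sum(w * hj for w, hj in zip(w2, h)) + b2, h

    def sse():
        return sum((predict(x)[0] - t) ** 2 for x, t in zip(X, y))

    history = [sse()]
    for _ in range(epochs):
        for x, t in zip(X, y):
            out, h = predict(x)
            err = out - t  # derivative of 0.5 * err**2 w.r.t. the output
            for j in range(n_hidden):
                grad_h = err * w2[j] * (1 - h[j] ** 2)  # backpropagate through tanh
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
        history.append(sse())
    return predict, history

X = [[i / 10] for i in range(-10, 11)]  # made-up inputs in [-1, 1]
y = [x[0] ** 2 for x in X]              # made-up targets: y = x^2
predict, history = train_mlp(X, y, seed=1)
print(f"SSE before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Re-running with another `seed` value is a quick way to see the run-to-run variation.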
I don't know how consistent your data are, but from my experience, a small training database with consistent data beats a huge database with non-consistent ones.
Judging by the problem, I think you should use the default activation function provided by the software you use for ANN handling.
Once you have trained the network, it is time to see how effective the training was. The software you use for the ANN should provide documented tools to estimate this. If the training is satisfactory, you may begin using the network. If it is not, you may either re-train the ANN or use a larger database.
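If your software does not provide such tools, a common approach is to hold back part of the data that the network never sees during training and measure the error on it. A minimal Python sketch, assuming the yes/no (3% or more) target from the question; `holdout_split` and `classification_accuracy` are hypothetical helpers, and the 20% test fraction is an arbitrary choice.

```python
import random

def holdout_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split off a test set
    that the network never trains on. Returns (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def classification_accuracy(predicted, actual):
    """Fraction of posts whose predicted 0/1 label matches the real one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

train, test = holdout_split(range(120))  # e.g. indices of the 120 pictures
print(len(train), len(test))             # 96 24
print(classification_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```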

How is input dataset fed into neural network?

If I have 1000 observations in my dataset with 15 features and 1 label, how is the data fed to the input neurons for the forward pass and backpropagation? Is it fed row by row for the 1000 observations (one at a time), with the weights updated after each observation, or is the full data given as an input matrix, with the network learning the corresponding weight values over a number of epochs? Also, if it is fed one at a time, what is an epoch in that case?
Thanks
Assuming that the data is formatted into rows (1000 instances with 16 features each, with the last one being the label), you would feed in the first 15 features row by row and use the last "feature"/label as the target. This is called online learning. Online learning requires you to feed the data in one example at a time and conduct the back propagation and the weight update for every example. As you can imagine this can get quite intensive due to the backpropagation and update for every instance of your data.
The other option you mentioned is feeding the entire dataset into the network at once. This tends to perform poorly in practice because, with only one weight update per pass over the data, convergence is very slow.
In practice, mini-batches are used. This involves sending a small subset of the dataset through the network and then doing the backpropagation and weight update. This gives relatively frequent weight updates, which speeds up learning, while being less intensive than online learning. For more information on mini-batches, see this University of Toronto lecture by Geoffrey Hinton.
Finally, an epoch is always one run through all of your data, regardless of whether you feed it in one example at a time or all at once.
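The three options differ only in how many examples contribute to each weight update. The Python sketch below uses a linear model instead of a full network to keep it short (an assumption for illustration); `train_linear` is a hypothetical helper and the data are made up. `batch_size=1` gives online learning, `batch_size=len(X)` gives full-batch learning, and anything in between gives mini-batches. In every case one epoch is one pass over all examples, so with 1000 observations an epoch contains ceil(1000 / batch_size) weight updates.

```python
import math
import random

def train_linear(X, y, batch_size, epochs=1, lr=0.1, seed=0):
    """Fit y ~ w.x + b by gradient descent on half the mean squared error.

    batch_size=1       -> online learning (update after every example)
    batch_size=len(X)  -> full-batch learning (one update per epoch)
    in between         -> mini-batch learning
    Returns (w, b, number_of_weight_updates)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b, updates = [0.0] * d, 0.0, 0
    order = list(range(n))
    for _ in range(epochs):  # one epoch = one pass over all n examples
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            grad_w, grad_b = [0.0] * d, 0.0
            for i in batch:
                err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
                grad_w = [g + err * xj for g, xj in zip(grad_w, X[i])]
                grad_b += err
            m = len(batch)
            w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
            b -= lr * grad_b / m
            updates += 1
    return w, b, updates

# Made-up data shaped like the question: 1000 observations, 15 features.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(15)] for _ in range(1000)]
y = [sum(0.5 * xi for xi in x) for x in X]  # true weights 0.5, no noise

for bs in (1, 32, 1000):
    _, _, n_updates = train_linear(X, y, batch_size=bs)
    print(bs, n_updates, math.ceil(1000 / bs))
# 1 1000 1000
# 32 32 32
# 1000 1 1
```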
I hope this clarified your questions.

Extracting Patterns using Neural Networks

I am trying to extract common patterns that always appear whenever a certain event occurs.
For example, patients A, B, and C all had a heart attack. Using the readings from their pulse, I want to find the common patterns that preceded the heart attack.
In the next stage I want to do this using multiple dimensions. For example, using the readings of the patients' pulse, temperature, and blood pressure, what are the common patterns that occurred in the three dimensions, taking into account the timing and order across the dimensions?
What is the best way to solve this problem using Neural Networks and which type of network is best?
(Just need some pointing in the right direction)
And thank you all for reading.
The described problem looks like a time-series prediction problem, that is, a basic prediction problem for a continuous or discrete phenomenon generated by some existing process. As raw data for this problem we have a sequence of samples x(t), x(t+1), x(t+2), ..., where x() denotes the output of the process and t some arbitrary time point.
For an artificial neural network solution, we reorganize this raw data into new sequences. As you probably know, X denotes the matrix of input vectors used in ANN learning. For time-series prediction, we construct this collection according to the following schema.
In its most basic form, your input vector x is a sequence of samples (x(t-k), x(t-k+1), ..., x(t-1), x(t)): the sample at some time point t preceded by its predecessors from time points t-k, t-k+1, ..., t-1. You should generate an example like this for every possible time point t.
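A minimal Python sketch of this windowing, assuming the target for each window is the next sample x(t+1) (the answer leaves the output open at this point); `make_windows` is a hypothetical helper and the series is made up.

```python
def make_windows(series, k):
    """Pair every input vector (x(t-k), ..., x(t)) with the next sample x(t+1),
    for every time point t where both exist."""
    inputs, targets = [], []
    for t in range(k, len(series) - 1):
        inputs.append(series[t - k:t + 1])  # k+1 samples ending at x(t)
        targets.append(series[t + 1])       # assumed target: the next sample
    return inputs, targets

X, y = make_windows([10, 11, 12, 13, 14, 15], k=2)
print(X)  # [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
print(y)  # [13, 14, 15]
```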
But the key is to preprocess the data so that we get the best prediction results.
Assuming your data (phenomenon) is continuous, you should consider applying some sampling technique. You could start by experimenting with some naive sampling period Δt, but there are stronger methods. See, for example, the Nyquist–Shannon sampling theorem, whose key idea is that a continuous x(t) can be recovered from its discrete samples x(nΔt). This is relevant because that is presumably what we expect our ANN to do.
Assuming your data is discrete, you should still try sampling, as this will speed up your computations and might provide better generalization. But the key advice is: do experiments! The best architecture depends on the data, and the data will also need to be preprocessed correctly.
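Two small Python helpers illustrate these ideas under stated assumptions: `max_sampling_period` applies the Nyquist–Shannon condition (for a signal with no frequency content above f_max, samples taken with a period shorter than 1/(2 f_max) are enough to recover it), and `downsample` is one naive way to lengthen the sampling period by averaging blocks of consecutive samples. Both names are hypothetical, and the pulse readings are made up.

```python
def max_sampling_period(f_max):
    """Nyquist-Shannon: a signal with no content above f_max (Hz) can be
    recovered from samples taken with a period shorter than this (seconds)."""
    return 1.0 / (2.0 * f_max)

def downsample(samples, step):
    """Make the sampling period `step` times longer by averaging each block of
    `step` consecutive samples (a crude low-pass filter before decimation).
    A trailing partial block is dropped."""
    if step < 1:
        raise ValueError("step must be >= 1")
    n_blocks = len(samples) // step
    return [sum(samples[i * step:(i + 1) * step]) / step for i in range(n_blocks)]

print(max_sampling_period(2.0))          # 0.25
pulse = [70, 72, 71, 75, 80, 78, 77]     # made-up readings, one per second
print(downsample(pulse, 3))              # [71.0, 77.66666666666667]
```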
The next thing is the network's output layer. From your question, it appears that this will be a binary class prediction. But maybe a wider prediction vector is worth considering? How about predicting the future values of the samples, that is x(t+1), x(t+2), ..., and experimenting with different horizons (how far into the future you predict)?
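A wider prediction vector only changes how the targets are built. A minimal Python sketch, assuming each window (x(t-k), ..., x(t)) is paired with the next `horizon` samples; `make_horizon_pairs` is a hypothetical helper and the series is made up.

```python
def make_horizon_pairs(series, k, horizon):
    """Pair each window (x(t-k), ..., x(t)) with the future vector
    (x(t+1), ..., x(t+horizon)) instead of a single next value."""
    pairs = []
    for t in range(k, len(series) - horizon):
        past = series[t - k:t + 1]
        future = series[t + 1:t + 1 + horizon]
        pairs.append((past, future))
    return pairs

for past, future in make_horizon_pairs([1, 2, 3, 4, 5, 6], k=1, horizon=2):
    print(past, future)
# [1, 2] [3, 4]
# [2, 3] [4, 5]
# [3, 4] [5, 6]
```

Increasing `horizon` adds output neurons but leaves fewer complete training pairs from the same series.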
Further reading:
Somebody mentioned Python here. Here are some good tutorials on time-series prediction with Keras: Victor Schmidt, Keras recurrent tutorial, Deep Learning Tutorials
This paper is good if you need a real example: Fessant, Francoise, Samy Bengio, and Daniel Collobert. "On the prediction of solar activity using different neural network models." Annales Geophysicae. Vol. 14. No. 1. 1996.

Increased Error with more Training Data for a Neural Network in Matlab

I have a question regarding the Matlab NN toolbox. As part of a research project, I decided to create a Matlab script that uses the NN toolbox for some fitting problems.
I have a data stream that is being loaded into my system. The data consist of 5 input channels and 1 output channel. I train my network on this configuration for a while and try to fit the output (for a certain period of time) as new data streams in. I retrain my network constantly to keep it updated.
So far everything works fine, but after a certain period of time the results get bad and no longer represent the desired output. I really can't explain why this happens, but I suspect some kind of memory issue, since everything is fine while the data set is still small.
Only when it gets bigger does the quality of the simulation drop. Is there some kind of memory that gets full, or is the bad simulation just a result of the huge data sets? I'm a beginner with this tool and would really appreciate your feedback. Best regards and thanks in advance!
Please elaborate on your method of retraining with new data. Do you run further iterations? What do you consider as "time"? Do you mean epochs?
At first glance, assuming time means epochs, I would say that you're overfitting the data. Neural networks are supposed to be trained for a limited number of epochs, with early stopping. You could try regularization, different gradient descent methods (if you're using a GD method), or GD with momentum. Also, depending on the values in your first few training datasets, you may have normalized the data using a range that no longer fits the newer data. You should check these issues if my assumptions are correct.
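Early stopping usually means monitoring the error on a validation set after each epoch and keeping the weights from the best epoch once the error has stopped improving for a while. A minimal Python sketch of that rule, assuming you already have the per-epoch validation losses; `early_stopping_epoch` and the `patience` value are hypothetical (Matlab's toolbox exposes a similar setting as the `max_fail` training parameter). The loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=6):
    """Return the epoch whose weights should be kept: the one with the lowest
    validation loss, stopping once `patience` epochs pass without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:  # no improvement for `patience` epochs
                break
    return best_epoch

losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.44, 0.47, 0.51]
print(early_stopping_epoch(losses, patience=3))  # 3
```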

How to use created "net" neural network object for prediction?

I used ntstool to create a NAR (nonlinear autoregressive) net object by training on a 1x1247 input vector (daily stock prices for 6 years).
I have finished all the steps and saved the resulting net object to the workspace.
Now I don't know how to use this object to predict y(t) at, say, t = 2000 (I trained the model for t = 1:1247).
In some other threads, people recommended using the sim(net, t) function; however, this gives me the same result for any value of t (the same happens with net(t)).
I am not familiar with the specific neural net commands, but I think you are approaching this problem in the wrong way. Typically you want to model the evolution in time. You do this by specifying a certain window, say 3 months.
What you are training now is a single input vector, which has no information about evolution in time. The reason you always get the same prediction is that you only used a single point for training (even though it is 1247-dimensional, it is still one point).
You probably want to make input vectors of this nature (for simplicity, assume you are working with months):
[month1 month2; month2 month3; month3 month4]
This example contains 2 training points (the columns), each covering the evolution over 3 months. Note that they overlap.
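The overlapping layout above can be generated from a series automatically. A minimal Python sketch, assuming (as in the answer's example) one training point per column; `window_columns` is a hypothetical helper. One possible use is to treat the last row as the target and the rows above it as inputs.

```python
def window_columns(series, window):
    """Build a matrix (list of rows) with one overlapping window per column,
    the layout of the [month1 month2; month2 month3; month3 month4] example."""
    n_cols = len(series) - window + 1
    return [[series[c + r] for c in range(n_cols)] for r in range(window)]

months = ["month1", "month2", "month3", "month4"]
for row in window_columns(months, 3):
    print(row)
# ['month1', 'month2']
# ['month2', 'month3']
# ['month3', 'month4']
```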
Use the Network
After the network is trained and validated, the network object can be used to calculate the network response to any input. For example, if you want to find the network response to the fifth input vector in the housing data set, you can use the following
a = net(houseInputs(:,5))
a =
34.3922
If you try this command, your output might be different, depending on the state of your random number generator when the network was initialized. Below, the network object is called to calculate the outputs for a concurrent set of all the input vectors in the housing data set. This is the batch mode form of simulation, in which all the input vectors are placed in one matrix. This is much more efficient than presenting the vectors one at a time.
a = net(houseInputs);
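The equivalence between the batch form and presenting vectors one at a time can be checked with a small Python sketch that mimics the column-per-sample convention. The weights and the `simulate` helper are hypothetical; this only shows that both forms give identical outputs, since the speed advantage in Matlab comes from vectorized matrix arithmetic that a pure-Python loop does not reproduce.

```python
import math

# Hypothetical fixed weights for a network with 2 inputs, 2 hidden tanh
# neurons and 1 linear output (illustration only, not a trained model).
W1 = [[0.5, -0.3], [0.1, 0.8]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]

def simulate(inputs):
    """Evaluate the network on a matrix whose columns are input vectors and
    return a matrix with one output column per input column."""
    n_cols = len(inputs[0])
    outputs = [[0.0] * n_cols for _ in W2]
    for c in range(n_cols):
        x = [row[c] for row in inputs]  # c-th input vector (a column)
        h = [math.tanh(sum(w * xi for w, xi in zip(r, x)) + b)
             for r, b in zip(W1, b1)]
        for o, (r, b) in enumerate(zip(W2, b2)):
            outputs[o][c] = sum(w * hi for w, hi in zip(r, h)) + b
    return outputs

inputs = [[1.0, 0.0, -1.0],  # feature 1 for three samples
          [0.5, 2.0, 0.3]]   # feature 2 for the same samples
batch = simulate(inputs)            # all columns at once
single = simulate([[0.0], [2.0]])   # only the second column
print(batch[0][1] == single[0][0])  # True
```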
Each time a neural network is trained, it can arrive at a different solution due to different initial weight and bias values and different divisions of the data into training, validation, and test sets. As a result, different neural networks trained on the same problem can give different outputs for the same input. To ensure that a neural network of good accuracy has been found, retrain several times.
There are several other techniques for improving upon initial solutions if higher accuracy is desired. For more information, see Improve Neural Network Generalization and Avoid Overfitting.