univariate time series multi step ahead prediction using multi-layer-perceptron(MLP) - neural-network

I have univariate time series data and want to do multistep prediction.
I came across this question, which explains one-step time series prediction,
but I am interested in multistep-ahead prediction.
E.g. typical univariate time series data looks like
time value
---- ------
t1 a1
t2 a2
..........
..........
t100 a100.
Suppose, I want 3 step ahead prediction.
Can I frame my problem like
TrainX TrainY
[a1,a2,a3,a4,a5,a6] -> [a7,a8,a9]
[a2,a3,a4,a5,a6,a7] -> [a8,a9,a10]
[a3,a4,a5,a6,a7,a8] -> [a9,a10,a11]
.................. ...........
.................. ...........
I am using keras and tensorflow as backend
First layer has 50 neurons and expects 6 inputs.
hidden layer has 30 neurons
output layer has 3 neurons i.e (outputs three time series values)
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers
model = Sequential()
model.add(Dense(50, input_dim=6, activation='relu',kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(30, activation='relu',kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(3))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(TrainX, TrainY, epochs=300, batch_size=16)
My model should then be able to predict a107, a108, a109 when my input is a101, a102, a103, a104, a105, a106.
Is this a valid model? Am I missing something?

That model might do it, but you would probably benefit from using LSTM layers (recurrent networks for sequences).
#TrainX.shape = (total of samples, time steps, features per step)
#TrainX.shape = (total of samples, 6, 1)
model.add(LSTM(50,input_shape=(6,1),return_sequences=True, ....))
model.add(LSTM(30,return_sequences=True, ....))
model.add(LSTM(3,return_sequences=False, ....))
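For reference, here is a minimal sketch of how the windows from the question could be built and then reshaped into the 3-D layout noted in the comments above. `make_windows` is a hypothetical helper (not part of Keras), and the integer series is only a stand-in for the real a1..a100.

```python
import numpy as np

def make_windows(series, n_in=6, n_out=3):
    """Slice a 1-D series into (input window, following targets) pairs."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_in - n_out + 1
    # each row of X holds n_in consecutive values, each row of Y the next n_out
    X = np.stack([series[i:i + n_in] for i in range(n_samples)])
    Y = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, Y

series = np.arange(1, 101)               # stand-in for a1..a100
TrainX, TrainY = make_windows(series)    # 2-D arrays, as the Dense model expects
TrainX_lstm = TrainX.reshape(-1, 6, 1)   # (samples, time steps, features per step)
```

With 100 values, a window of 6 and a horizon of 3, this gives 92 training pairs; the 2-D `TrainX` suits the Dense model and the reshaped copy suits an LSTM input.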
You may be missing an activation function that limits the result to the possible range of the value you want to predict.
Often we work with values from 0 to 1 (activation='sigmoid') or from -1 to 1 (activation='tanh').
This would also require that the input be limited to these values, since the inputs and outputs come from the same series.
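A rough sketch of that scaling step, assuming simple min-max scaling to the 0 to 1 range (the helper names `fit_minmax`, `scale` and `unscale` are made up for illustration, and the numbers are dummy data):

```python
import numpy as np

def fit_minmax(train_values):
    """Return the min and max of the training series."""
    return float(np.min(train_values)), float(np.max(train_values))

def scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Map network outputs in [0, 1] back to the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

series = np.array([10.0, 12.0, 15.0, 11.0, 20.0])  # dummy data
lo, hi = fit_minmax(series)
scaled = scale(series, lo, hi)         # feed this to the network
restored = unscale(scaled, lo, hi)     # apply this to the predictions
```

Note that with a sigmoid output the model can never predict beyond the training minimum or maximum, so this only makes sense when the series stays within a known range.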

Related

Neural Network - Train a MLP with multiple entries

I implemented an MLP with a backpropagation algorithm. It works fine for only one entry; for example, if the input is 1 and 1, the answers on the last layer will be 1 and 0.
Let's suppose that instead of having only one entry (like 1,1) I have four entries (1,1; 1,0; 0,0; 0,1), all of which have different expected answers.
I need to train this MLP so that it answers correctly for all entries.
I can't find a way to do this. Let's suppose that I have 1000 epochs; would I need to train every entry for 250 epochs? Or train one epoch with one entry, then the next epoch with another entry?
How could I properly train an MLP to answer correctly for all entries?
at least for a python implementation, you can simply use multidimensional training data
# training a neural network to behave like an XOR gate
import numpy as np

# helper functions the snippet relies on
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoidPrime(a):
    # derivative of the sigmoid, written in terms of its output a = sigmoid(x)
    return a * (1.0 - a)

X = np.array([[1,0],[0,1],[1,1],[0,0]]) # entries
y = np.array([[1],[1],[0],[0]]) # expected answers
INPUTS = X.shape[1]
HIDDEN = 12
OUTPUTS = y.shape[1]
w1 = np.random.randn(INPUTS, HIDDEN) * np.sqrt(2 / INPUTS)
w2 = np.random.randn(HIDDEN, OUTPUTS) * np.sqrt(2 / HIDDEN)
ALPHA = 0.5
EPOCHS = 1000
for e in range(EPOCHS):
    z1 = sigmoid(X.dot(w1))
    o = sigmoid(z1.dot(w2))
    o_error = o - y
    o_delta = o_error * sigmoidPrime(o)
    w2_error = o_delta.dot(w2.T) # backpropagate through w2 before it is updated
    w2_delta = w2_error * sigmoidPrime(z1)
    w2 -= z1.T.dot(o_delta) * ALPHA
    w1 -= X.T.dot(w2_delta) * ALPHA
    print(np.mean(np.abs(o_error))) # prints the loss of the NN
such an approach might not work with some neural network libraries, but that shouldn't matter, because neural network libraries will usually handle stuff like that themselves
the reason this works is that during the dot product between the input and hidden layer, each training entry gets matrix-multiplied with the entire hidden layer individually, so the result is a matrix containing the result for each sample forwarded through the hidden layer
and this process continues throughout the entire network, so what you are essentially doing is running multiple instances of the same neural network in parallel
the number of training entries doesn't have to be four, it can be any arbitrarily high number, as long as the size of its contents is the same as the input layer for X and the output layer for y and X and y are the same length (and you have enough RAM)
also, nothing about the neural network architecture is fundamentally changed from using single entries, only the data that is fed into it has changed, so you most likely don't have to scrap the code you've written, just make a few small changes
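To illustrate the "multiple instances in parallel" point, here is a small sketch (random weights, no training, names chosen just for this example) that forwards the four entries once as a batch and once one at a time, and checks that the results agree:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)  # four entries
w1 = rng.standard_normal((2, 12))   # input -> hidden weights
w2 = rng.standard_normal((12, 1))   # hidden -> output weights

# all four entries forwarded in one matrix product per layer
batch_out = sigmoid(sigmoid(X @ w1) @ w2)

# the same entries forwarded one by one, then stacked row by row
single_out = np.vstack([sigmoid(sigmoid(x @ w1) @ w2) for x in X])

same = np.allclose(batch_out, single_out)
```

Each row of `batch_out` is the output for one entry, which is why the same weight updates can be computed for all entries at once.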

How to emphasize selected output in neural network

I'm training on a data set with 17 features and 5 outputs using PyTorch. But I'm most interested in two of them, let's say outputs 2 and 3 out of 0-4. What's a good strategy to get as high an accuracy as possible on 2 and 3, while the rest can have lower accuracy?
If you are using nn.CrossEntropyLoss(), you can pass in the weights to emphasize or de-emphasize certain classes. From the PyTorch docs:
torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None, ...)
The weights do not have to sum up to one, since PyTorch will handle that on its own when reduction='mean', which is the default setting. The weights specify which classes to weigh more heavily when calculating the loss. In other words, the higher the weight, the higher the penalty for getting a prediction wrong for the particular set of classes with higher weights.
import torch
import torch.nn as nn
x = torch.randn(10, 5) # dummy data
target = torch.randint(0, 5, (10,)) # dummy targets
weights = torch.tensor([1., 1., 2., 2., 1.]) # emphasize classes 2 and 3
criterion_weighted = nn.CrossEntropyLoss(weight=weights)
loss_weighted = criterion_weighted(x, target)
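To make the normalization concrete, here is a NumPy sketch of what I understand the weighted mean reduction to compute: each sample's loss is multiplied by the weight of its target class, and the sum is divided by the sum of those weights. `weighted_cross_entropy` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
import numpy as np

def weighted_cross_entropy(logits, targets, weights):
    """Class-weighted cross entropy, normalized by the sum of target weights."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(targets)), targets]
    w = weights[targets]                                    # weight of each sample's class
    return float((w * per_sample).sum() / w.sum())

rng = np.random.default_rng(0)
logits = rng.standard_normal((10, 5))                 # dummy data
targets = rng.integers(0, 5, size=10)                 # dummy targets
weights = np.array([1.0, 1.0, 2.0, 2.0, 1.0])         # emphasize classes 2 and 3

loss = weighted_cross_entropy(logits, targets, weights)
loss_rescaled = weighted_cross_entropy(logits, targets, 10 * weights)
```

Because of the division by the summed weights, `loss` and `loss_rescaled` are equal: only the ratio between the class weights matters.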

training neural network for user recognition in MATLAB

I'm working on gait recognition problem, the aim of this study is to be used for user authentication
I have data of 36 users
I've successfully extracted 143 features for each sample (or example) which are (36 rows and 143 columns) for each user
( in other words, I have 36 examples and 143 features are extracted for each example. Thus a matrix called All_Feat of 36*143 has been created for each individual user).
By the way, column represents the number of the extracted features and row represents the number of samples (examples) for each feature.
Then I have divided the data into two parts, training and testing (the training matrix contains 25 rows and 143 columns, while the testing Matrix contains 11 rows and 143 columns).
Then, for each user, I divided the matrix (All_Feat) into two matrixes ( Training matrix, and Test matrix ).
The training matrix contains ( 25 rows (examples) and 143 columns), while the testing matrix has (11 rows and 143 columns).
I'm new to classification and this stuff
I'd like to use machine learning (Neural Network) for classifying these features.
Therefore, as a first step I need to create a reference template for each user (the so-called training phase).
This can be done by training the classifier with the user's features (data) and those of the remaining users as well (the other 35 users are considered impostors).
Based on what I have read, training Neural Network requires two classes, the first class contains all the training data of genuine user (e.g. User1) and labelled with 1 , while the second class has the training data of imposters labelled as 0 (which is binary classification, 1 for the authorised user and 0 for imposters).
**Now my questions are:**
1- I don't know how to create these classes!
2- For example, if I want to train a Neural Network for User1, I have these variables, input and target. What should I assign to these variables?
Should input = the Training matrix of User1 plus the Training matrices of User2, User3, ..., User35?
Target = what should I assign to this matrix?
I really appreciate any help!
Try this: https://es.mathworks.com/help/nnet/gs/classify-patterns-with-a-neural-network.html
A few notes:
You said that, for each user, you have extracted 143 features. It sounds like you have only one repetition for each user (i.e. the user has used the system once). I don't know the source of your data, but I doubt it is free of randomness. You mention gait analysis, and that suggests the recorded data of a given user will be different each time that user uses the system. In other words: the user uses your system, you capture the data, you extract the 143 features (numbers); then the user uses the system again, but the extracted 143 features will be slightly different. Therefore, you should get several examples for each user to train the classifier. In terms of a "matlab matrix", your matrix should have one COLUMN for each example and 143 rows (one for each of your features). Since you should have several repetitions for each user (for example 10), your big matrix should be something like: 143 rows x 360 columns.
You should "create" one new neural network for each user. Given a user (for example User4), you create a dataset (a new matrix) with samples of that user, and samples of several other users (User1, User3, User5...). You do a binary classification (cases: "user4" against "other users"). After training, it would be advisable to test the classifier with data of other users whose data was not present during the training phase (for example User2 and others). Since you are doing a binary classification, your matrices should be something like the following:
Example, you have 10 trials (examples) of each user. You want to create a neural network to detect the user User1. The matrix should be like:
(notation cU1_t1 means: column with features of user 1, trial 1)
input_matrix = [cU1_t1, cU1_t2, ..., cU1_t10, cU2_t1, ..., cU36_t10]
The target matrix should be like:
target = a matrix whose 10 first columns are [ 1, 0], and the other 350 columns are [0, 1]. That means that the first 10 columns are of type A, and the others of type B. In this case "type A" means "User1", and "type B" means "Not User1".
Then, you should segment the data (train data, validation data, test data) to train the neural network and so on. Remember to save some users just for the testing phase; for example, the train matrix should not have any of the columns of five users: user2, user6, user7, user10, user20 (50 columns).
I think you get the idea.
Regards.
************ UPDATE: ******************************
This example assumes that the user selects/indicates their name and then the system uses the neural network to authenticate the user (like a password). I will give you a small example with random numbers.
Let's say you have recorded data from 15 users (but in the future you will have more). You record "gait data" from them when they do something with your recording device. From the recorded signals you extract some features, let's say 5 features (5 numbers). Hence, every time a user uses the machine you get 5 numbers. Even if the user is the same, the 5 numbers will be different each time, because the recorded signals have some randomness. Therefore, to train the neural network you have to have several examples of each user. Let's say that you have 18 repetitions performed by each user.
To sum up this example:
There are 15 users available for the experiment.
Each time the user uses the system you record 5 numbers (features). You get a feature vector. In matlab it will be a COLUMN.
For the experiment each user has performed 18 repetitions.
Now you have to create one neural network for each user. To that end, you have to construct several matrices.
Let's say you want to create the neural network (NN) of user 2 (U2). The NN will classify the feature vectors into 2 classes: U2 and NotU2. Therefore, you have to train and test the NN with examples of both. The group NotU2 represents any other user that is not U2; however, you should NOT train the NN with data of every other user that you have in your experiment. This would be cheating (think that you can't have data from every user in the world). Therefore, to create the train dataset you will exclude all the repetitions of some users, and use them to test the NN during the training (validation dataset) and after the training (test dataset). For this example we will use users {U1,U3,U4} for validation, and users {U5,U6,U7} for testing.
Therefore you construct the following matrices:
Train input matrix
It will have 12 examples of U2 (70% more or less) and every example of users {U8,U9,...,U14,U15}. Each example is a column, hence the train matrix will be a matrix of 5 rows and 156 columns (12+8*18). I will order it as follows: [U2_ex1, U2_ex2, ..., U2_ex12, U8_ex1, U8_ex2, ..., U8_ex18, U9_ex1, ..., U15_ex1,...U15_ex18], where U2_ex1 represents a column vector with the 5 features obtained from User 2 during repetition/example number 1.
-- Target matrix of train matrix. It is a matrix of 2 rows and 156 columns. Each column j represents the correct class of example j. The column is formed by zeros, and it has a 1 at the row that indicates the class. Since we have only 2 classes the matrix has only 2 rows. I will say that class U2 will be the first one (hence the column vector for each example of this class will be [1 0]), and the other class (NotU2) will be the second one (hence the column vector for each example of this class will be [0 1]). Obviously, the columns of this matrix have the same order as those of the train matrix. So, according to the order that I have used, the target matrix will be:
12 columns [1 0] and 144 columns [0 1].
Validation input matrix
It will have 3 examples of U2 (15% more or less) and every example of users [U1,U3,U4]. Hence, this will be a matrix of 5 rows and 57 columns (3+3*18).
-- Target matrix of validation matrix: A matrix of 2 rows and 57 columns: 3 columns [1 0] and 54 columns [0 1].
Test input matrix
It will have the remaining 3 examples of U2 (15%) and every example of users [U5,U6,U7]. Hence, this will be a matrix of 5 rows and 57 columns (3+3*18).
-- Target matrix of test matrix: A matrix of 2 rows and 57 columns: 3 columns [1 0] and 54 columns [0 1].
IMPORTANT. The columns of each matrix should have a random order to improve the training. That is, do not put all the examples of U2 together and then the others. For this example I have put them in order for clarity. Obviously, if you change the order of the input matrix, you have to use the same order in the target matrix.
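As an illustration of the target layout and the joint shuffling, here is a sketch in Python/NumPy rather than MATLAB (in MATLAB, `randperm` would play the role of the permutation). The feature values are random stand-ins and all variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_genuine, n_impostor = 5, 12, 144   # sizes from the train matrix above
n_total = n_genuine + n_impostor                 # 156 columns

# random stand-ins for the real feature columns (one column per example)
train_input = rng.standard_normal((n_features, n_total))

# one-hot targets: first row marks U2, second row marks NotU2
train_target = np.zeros((2, n_total))
train_target[0, :n_genuine] = 1    # columns [1 0] for the U2 examples
train_target[1, n_genuine:] = 1    # columns [0 1] for the NotU2 examples

# apply the SAME random column order to inputs and targets
perm = rng.permutation(n_total)
shuffled_input = train_input[:, perm]
shuffled_target = train_target[:, perm]
```

Using a single permutation for both matrices keeps every example paired with its own target column.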
To use MATLAB you will have to pass two matrices: the inputMatrix and the targetMatrix. The inputMatrix will have the train, validation and test input matrices joined, and the targetMatrix the same with the targets. So, the inputMatrix will be a matrix of 5 rows and 270 columns. The targetMatrix will have 2 rows and 270 columns. For clarity I will say that the first 156 columns are the training ones, then the 57 columns of validation, and finally 57 columns of testing.
The MATLAB commands will be:
% Create a Pattern Recognition Network
hiddenLayerSize = 10; %You can play with this number
net = patternnet(hiddenLayerSize);
%Specify the indices of each matrix
net.divideFcn = 'divideind';
net.divideParam.trainInd = 1:156;   % 156 training columns
net.divideParam.valInd = 157:213;   % 57 validation columns
net.divideParam.testInd = 214:270;  % 57 test columns
% % Train the Network
[net,tr] = train(net, inputMatrix, targetMatrix);
In the window that opens you will be able to see the performance of your neural network. The output object "net" is your trained neural network. You can use it with new data if you want.
Repeat this process for each other user (U1, U3, ...U15) to obtain his/her neural network.

cross validation function crossvalind

I have a question concerning cross-validation. As I understand it, cross-validation is used to find the best parameters,
but I did not understand the role of the function "crossvalind" (Generate cross-validation indices). It just takes a data set without a model, like in this example:
load fisheriris
[g gn] = grp2idx(species);
[trainIdx testIdx] = crossvalind('HoldOut', species, 1/3);
The crossvalind() function splits your data into two groups: the training set and the cross-validation set.
By your example:
[trainIdx testIdx] = crossvalind('HoldOut', size(species,1), 1/3); means split the data in species (2/3 in the training set and 1/3 in the cross-validation set).
Supposing that your data is like:
species=[datarow1;datarow2;datarow3;datarow4;datarow5;datarow6] then
trainIdx would be like [1;1;0;1;1;0] and testIdx would be like [0;0;1;0;0;1] meaning that from the 6 total elements in our set crossvalind function assigned 4 to the train set and 2 to the cross-validation set. Of course this is a random assignment meaning that the zero and ones indices will vary every time you call the function but the proportion between them will be fixed and trainIdx + testIdx will always be ones(size(species,1),1)
crossvalind('LeaveMOut',size(species,1),2) would give a split of the same sizes as crossvalind('HoldOut', size(species,1), 1/3) in this particular case. In the 'HoldOut' format you provide a parameter P which takes values from 0 to 1 (like 1/3 in the example above), while with the option 'LeaveMOut' you provide an integer M, like 2 samples from the 6 total or 2000 samples from the 10000 total samples in your dataset. In the case of 'Resubstitution', crossvalind('Resubstitution', size(species,1), [1/3,2/3]) would again give the same proportions, but here you also have the option of, let's say, [1/3,3/4], meaning that some samples can be in both the train and cross-validation sets, or even [1,1], which means that all the samples are used in both sets (trainIdx=testIdx=[1;1;1;1;1;1] in the above example). I strongly suggest typing help crossvalind and taking a look at the help file, which is always a lot more detailed and helpful than I could ever be.
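To show the idea behind the 'HoldOut' split (not MATLAB's actual implementation, whose rounding and random stream may differ), here is a small Python sketch. `holdout_indices` is a hypothetical function written for this illustration.

```python
import numpy as np

def holdout_indices(n, p, seed=None):
    """Randomly mark about p*n of n samples as test, the rest as train."""
    rng = np.random.default_rng(seed)
    n_test = int(round(n * p))
    test = np.zeros(n, dtype=bool)
    # pick n_test distinct positions at random for the held-out set
    test[rng.choice(n, size=n_test, replace=False)] = True
    return ~test, test

train_idx, test_idx = holdout_indices(6, 1/3, seed=0)
```

For 6 samples and P = 1/3 this always yields 4 training and 2 held-out samples, and every sample belongs to exactly one of the two masks; which positions are held out depends on the seed. No model is involved: the function only produces the masks that you then use to index your data.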

Time series forecasting

I have an input and target series. However, the target series lags 3 steps behind the input. Can I still use narx or some other network?
http://www.mathworks.co.uk/help/toolbox/nnet/ref/narxnet.html
Predict: y(t+1)
Input:
x(t) |?
x(t-1)|?
x(t-2)|?
x(t-3)|y(t-3)
x(t-4)|y(t-4)
x(t-5)|y(t-5)
...
During my training, I have y(t-2), y(t-1), y(t) in advance, but when I do the prediction in real life, those values are only available 3 steps later, because I calculate y from the next 3 inputs.
Here are some options
1) You could have two inputs and one output as
x(t), y(t-3) -> y(t)
x(t-1),y(t-4) -> y(t-1)
x(t-2),y(t-5) -> y(t-2)
...
and predict the single output y(t)
2) You could also use ar or arx with na = 0, nb > 0, and nk = 3.
3) Also, you could have four inputs, where 2 of the inputs are estimated and one output as
x(t), y(t-3), ye(t-2), ye(t-1) -> y(t)
x(t-1),y(t-4), y(t-3), ye(t-2) -> y(t-1)
x(t-2),y(t-5), y(t-4), y(t-3) -> y(t-2)
...
and predict the single output y(t), using line 3 and higher as training data
4) You could set up the input/output as in options 1 or 3 and use n4sid
I have a similar problem, but without any measurable inputs. And I'm trying to see how much error there is as the forecast distance and model complexity are increased. But I've only tried approach 2 and set nb = 5 to 15 by 5 and varied nk from 20 to 150 by 10 and plotted contours of maximum error. In my case, I'm not interested in predictions of less than 20 time steps.
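As a rough illustration of the data layout in option 1, here is a Python sketch that builds the rows x(t), y(t-3) -> y(t) and fits them with ordinary least squares. The linear fit is only a stand-in for a NARX network or an arx model, `lagged_design` is a hypothetical helper, and the synthetic series is invented so that the result can be checked.

```python
import numpy as np

def lagged_design(x, y, lag=3):
    """Rows [x(t), y(t-lag), 1] with target y(t), for t = lag .. end."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x[lag:], y[:-lag], np.ones(len(x) - lag)])
    b = y[lag:]
    return A, b

# synthetic data: y depends on the current input and on y three steps back
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.zeros(200)
for t in range(3, 200):
    y[t] = 0.5 * x[t] + 0.3 * y[t - 3] + 1.0

A, b = lagged_design(x, y)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)   # [weight on x(t), weight on y(t-3), bias]

# one-step use: predict y(t) from the newest input and the newest AVAILABLE target
y_next = coef @ np.array([x[-1], y[-4], 1.0])
```

Only y values that are at least three steps old enter the inputs, so the same rows can be formed at prediction time.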
Define a window of your choice (you need to try different sizes to see which works best). Now treat this as a regression problem. Use values of xt and yt from t=T-2 ... T-x, where x-2 is the size of the window. Then use regress() to fit a regression model and use it for prediction.