What exactly is the alignment of inputs and targets in MATLAB NARX networks? When is one-step-ahead prediction appropriate? - matlab

This is a basic question, but I have managed to confuse myself. I have already searched for prior answers unsuccessfully; while there is likely one among the many hundreds of submissions from the past decade, I have not been able to locate it.
I have a target vector and several input vectors aligned with the target in time. That is, on any given day the input values are known, but the actual target value is not. The idea is to use the input values to estimate the target value.
First I will show the "default" method, as generated as a simple script by MATLAB; then I will show the one-step-ahead prediction variant.
Which is correct for my goal? That is, which gives the correct alignment of the inputs, targets, and estimated targets?
For the "default" method I do the following:
% gather the n inputs for days=1,2,3,... into n columns, one for each input
% put the target for days=1,2,3,... into 1 column
% prepare inputs and targets
X = tonndata(inputs,false,false);
T = tonndata(target,false,false);
% choose the training function
trainFcn = 'trainlm';
% create a nonlinear autoregressive network
inputDelays = 1:4;
feedbackDelays = 1:4;
hiddenLayerSize = 10;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize,'open',trainFcn);
% prepare for training and simulation
[x,xi,ai,t] = preparets(net,X,{},T);
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% train the network
[net,tr] = train(net,x,t,xi,ai);
% test the network
y = net(x,xi,ai);
% Next, I implement the alternative method, the Step-Ahead Prediction Network
nets = removedelay(net);
[xs,xis,ais,ts] = preparets(nets,X,{},T);
ys = nets(xs,xis,ais);
The MATLAB comments in the script state that "The new network returns the same outputs as the original network, but outputs are shifted left one timestep." That is true, except that there is also an additional new value at the end of ys that is not present in y!
My question is: which method puts the estimated target value on the same day as the inputs used to estimate it?
With the one-step-ahead prediction, I expected that if I gave a NaN as the last target value, the last output value would also be a NaN. That is not what happened: a value appeared that was not present in the non-step-ahead method, since all the other values were shifted to the left.
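For reference, a minimal way to compare the two outputs (a sketch, assuming the variables produced by the two scripts above and a target of N days):
% Sketch: compare the two prepared outputs (assumes the scripts above ran).
% With delays 1:4, preparets trims the first 4 steps, so y should have
% N-4 values; after removedelay the maximum delay is 3, so ys should
% have N-3 values.
N   = numel(T);   % total number of days
ny  = numel(y)    % expect N - 4
nys = numel(ys)   % expect N - 3
% Apart from the extra final value, the raw sequences should agree element
% for element; only their time alignment differs by one step:
max(abs(cell2mat(y) - cell2mat(ys(1:end-1))))   % expect ~0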

Related

Neural network y=f(x) regression

Encouraged by some success in MNIST classification I wanted to solve a "real" problem with some neural networks.
The task seems quite easy:
We have:
some x-values (e.g. 1:1:100)
some y-values (e.g. x^2)
I want to train a network with 1 input (for 1 x-value) and one output (for 1 y-value). One hidden layer.
Here is my basic procedure:
Slicing my x-values into different batches (e.g. 10 elements per batch)
In each batch calculating the outputs of the net, then applying backpropagation, calculating weight and bias updates
After each batch averaging the calculated weight and bias updates and actually updating the weights and biases
Repeating step 1. - 3. multiple times
This procedure worked fine for MNIST, but for the regression it totally fails.
I am wondering if I do something fundamentally wrong.
I tried different batch sizes, up to averaging over ALL x-values.
Basically the network does not train well. After manually tweaking the weights and biases (with 2 hidden neurons) I could approximate my y=f(x) quite well, but when the network is supposed to learn the parameters itself, it fails.
When I have just one element for x and one for y and I train the network, it trains well for this one specific pair.
Maybe somebody has a hint for me. Am I misunderstanding regression with neural networks?
So far I assume the code itself is okay, as it worked for MNIST and it works for the "one x/y pair" example. I rather think my overall approach (see above) may not be suitable for regression.
Thanks,
Jim
ps: I will post some code tomorrow...
Here comes the code (MATLAB). As I said, it's one hidden layer, with two hidden neurons:
% init hyper-parameters
hidden_neurons = 2;
input_neurons = 1;
output_neurons = 1;
learning_rate = 0.5;
batchsize = 50;
% load data (d and v_start come from my dataset)
training_data = d(1:100)/100;
training_labels = v_start(1:100)/255;
% init weights
init_randomly = 1;
if init_randomly
    % initialize weights and biases with random numbers between -0.5 and +0.5
    w1 = rand(hidden_neurons,input_neurons)-0.5;
    b1 = rand(hidden_neurons,1)-0.5;
    w2 = rand(output_neurons,hidden_neurons)-0.5;
    b2 = rand(output_neurons,1)-0.5;
else
    % initialize with manually determined values
    w1 = [10;-10];
    b1 = [-3;-0.5];
    w2 = [0.2 0.2];
    b2 = 0;
end
for epochs = 1:2000 % looping over some epochs
    for i = 1:batchsize:length(training_data) % slice training data into batches
        % note: the upper bound must be i+batchsize-1, otherwise consecutive
        % batches overlap by one sample
        batch_data = training_data(i:min(i+batchsize-1,length(training_data)));
        batch_labels = training_labels(i:min(i+batchsize-1,length(training_data)));
        % initialize weight updates for next batch
        w2_update = 0;
        b2_update = 0;
        w1_update = 0;
        b1_update = 0;
        for k = 1:length(batch_data) % looping over one single batch
            % extract training sample
            x = batch_data(k);   % one single training sample
            y = batch_labels(k); % expected output of that sample
            % forward pass
            z1 = w1*x + b1;   % weighted sum of first layer
            a1 = sigmoid(z1); % activation of first layer (sigmoid)
            z2 = w2*a1 + b2;  % weighted sum of second layer
            a2 = z2;          % activation of second layer (linear)
            % backward pass
            delta_2 = (a2-y); % delta of second layer assuming quadratic cost; derivative of the linear unit is 1 everywhere
            delta_1 = (w2'*delta_2) .* (a1.*(1-a1)); % delta of first layer
            % accumulate the weight and bias updates, averaging over one batch
            w2_update = w2_update + (delta_2*a1') * (1/length(batch_data));
            b2_update = b2_update + delta_2 * (1/length(batch_data));
            w1_update = w1_update + (delta_1*x') * (1/length(batch_data));
            b1_update = b1_update + delta_1 * (1/length(batch_data));
        end
        % actually update the weights; updated weights will be used in the next batch
        w2 = w2 - learning_rate * w2_update;
        b2 = b2 - learning_rate * b2_update;
        w1 = w1 - learning_rate * w1_update;
        b1 = b1 - learning_rate * b1_update;
    end
end
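In case it trips anyone up: sigmoid is not a base-MATLAB builtin, so a small helper like this is assumed (in its own file sigmoid.m, or at the end of the script in newer MATLAB versions):
function s = sigmoid(z)
    % elementwise logistic function
    s = 1 ./ (1 + exp(-z));
end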
Here is the outcome with random initialization, showing the expected output, the output before training, and the output after training:
[Figure: training with random init]
One can argue that the blue line is already closer to the expected output than the black one; in that sense the network has already optimized the result. But I am not satisfied.
Here is the result with my manually tweaked values:
[Figure: training with pre-init]
The black line is not bad for just two hidden neurons, but my expectation was that such a black line would be the outcome of training starting from random initialization.
Any suggestions what I am doing wrong?
Thanks!
Ok, after some research I found some interesting points:
The function I tried to learn seems particularly hard to learn (not sure why)
With the same setup I tried to learn some 3rd-degree polynomials, which was successful (cost < 1e-6)
Randomizing the order of the training samples seems to improve learning (for the polynomial and my initial function); see the sketch after this list. I know this is well known in the literature, but I always skipped that part in implementation. So I learned for myself how important it is.
For learning "curvy/wiggly" functions, I found sigmoid works better than ReLU (the output layer is still "linear", as suggested for regression)
A learning rate of 0.1 worked fine for the curve fitting I finally wanted to perform
A larger batchsize smoothes the cost-vs-epochs plot (surprise...)
Initializing weights between -5 and +5 worked better than between -0.5 and +0.5 for my application
In the end I got quite convincing results for what I intended to learn with the network :)
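The per-epoch shuffling can look roughly like this (a sketch; training_data and training_labels are the vectors from the code above):
% Sketch: shuffle the sample order once per epoch before slicing batches.
for epochs = 1:2000
    perm = randperm(length(training_data));   % random permutation of indices
    shuffled_data = training_data(perm);
    shuffled_labels = training_labels(perm);  % keep x/y pairs aligned
    % ... then slice shuffled_data / shuffled_labels into batches as before
end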
Have you tried a much smaller learning rate? Generally, learning rates around 0.001 are a good starting point; 0.5 is in most cases way too large.
Also note that your predefined weights put the hidden units in an extremely flat region of the sigmoid function (sigmoid(10) ≈ 1, sigmoid(-10) ≈ 0), with the derivative at both positions close to 0. That means that backpropagating from such a position (or getting to such a position) is extremely difficult. For exactly that reason, some people prefer to use ReLUs instead of sigmoid, since ReLU only has a "dead" region for negative activations.
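A quick way to see how flat that region is (a minimal check; sigmoid here is the logistic function from the question):
% Sketch: the logistic function and its derivative at z = 10.
s  = 1/(1 + exp(-10));   % sigmoid(10) ~ 0.99995
ds = s*(1 - s);          % derivative ~ 4.5e-5, so gradients all but vanish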
Also, am I correct in seeing that you only have 100 training samples? You could maybe try a smaller batch size, or increase the number of samples you take. Also, don't forget to shuffle your samples after each epoch; plenty of reasons are given in the literature, for example here.

Feedforward neural network classification in Matlab

I have two Gaussian distribution samples, each containing 10,000 samples. I would like to train a feed-forward neural network with these samples, but I don't know how many samples I have to take in order to get an optimal decision boundary.
Here is the code, but I don't know exactly the solution, and the outputs are weird.
x1 = -49:1:50;
x2 = -49:1:50;
[X1, X2] = meshgrid(x1, x2);
Gaussian1 = mvnpdf([X1(:) X2(:)], mean1, var1); % for class A (mean1/var1 defined elsewhere)
Gaussian2 = mvnpdf([X1(:) X2(:)], mean2, var2); % for class B
net = feedforwardnet(10);
G1 = reshape(Gaussian1, 10000,1);
G2 = reshape(Gaussian2, 10000,1);
input = [G1, G2];
output = [0, 1];
net = train(net, input, output);
When I run the code it gives me weird results.
If the code is not correct, can someone please suggest how I can get a decision boundary for these two distributions?
I'm pretty sure that the input must be the Gaussian distribution values (and not the x coordinates). The network has to learn the relationship between the phenomena you are interested in (the Gaussian distributions) and the output labels, not between the space containing the phenomena and the labels. Moreover, if you choose the x coordinates, the network will try to find some relationship between them and the output labels, but the x coordinates are potentially uninformative: the input data might even be identical for both classes, because you can have very different Gaussian distributions over the same range of x coordinates just by varying the mean and the variance. The network would then end up confused, because the same input data would map to more than one output label (and you don't want that to happen!).
I hope I was helpful.
P.S.: for completeness, a network doesn't fit the data very well if you have a small training set. Also, don't forget to validate your model using cross-validation (a good rule of thumb is to hold out 20% of your data for a validation set and another 20% for a test set, and train the model on the remaining 60%).
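In the MATLAB toolbox, that 60/20/20 split can be requested directly on the network object (a sketch; the ratios are just the rule of thumb above, not a requirement):
% Sketch: 60/20/20 train/validation/test split for a feedforward network.
net = feedforwardnet(10);
net.divideFcn = 'dividerand';        % divide the samples at random
net.divideParam.trainRatio = 0.60;
net.divideParam.valRatio   = 0.20;
net.divideParam.testRatio  = 0.20;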

getting the delay/lag of Ultrasonic pulse velocity matlab

I am currently doing a thesis that needs ultrasonic pulse velocity (UPV). UPV can easily be obtained via the machines, but the data we acquired didn't include UPV, so we are tasked with getting it manually.
Essentially, in the data we have 2 channels, one for the transmitter transducer and another for a receiver transducer.
We need to get the time when the wave is emitted from the transmitter and the time it arrives at the receiver.
Using MATLAB, I've tried finddelay and xcorr, but they don't quite give the right result.
Here is a picture of the points I would want to get. The plot shows the transmitter (blue) and receiver (red).
So I am trying to find the two points in the picture, but with the aid of MATLAB. The two points would determine the travel time and, from that, the UPV.
I am relatively new to MATLAB, so your help would be of great assistance.
Here is the code I have tried
[cc, lags] = xcorr(signal1,signal2);
d2 = -(lags(cc == max(cc))) / Fs;
#xenoclast hi there! So far, this is the code I used:
close all
clc
Fs = input('input Fs = ');
T = 1/Fs;
L = input('input L = ');
t = (0:L-1)*T;
time = transpose(t);
i = input('input number of steploads = ');
% construct test sequences
%dataupv is the signal1 & datathesis is the signal2
for m = 1:i
    % normalize both signals (zero mean, unit variance)
    y1 = (dataupv(:,m) - mean(dataupv(:,m))) / std(dataupv(:,m));
    y2 = (datathesis(:,m) - mean(datathesis(:,m))) / std(datathesis(:,m));
    offset = 166;
    tt = time;
    % correlate the two sequences
    [cc, lags] = xcorr(y2, y1);
    % find the index of the maximum
    [maxval, maxI] = max(cc);
    [minval, minI] = min(cc);
    % use that index to obtain the lag in samples
    lagsamples(m,1) = lags(maxI);
    lagsamples2(m,1) = lags(minI);
    % plot again without timebase to verify visually
end
The resulting value is off by 70 samples compared to when I visually inspect the waves: the lag came out as 244, but visually it should be 176. Here are the data (there are 19 sets of data but I only used the 1st column): https://www.dropbox.com/s/ng5uq8f7oyap0tq/datatrans-dec-2014.xlsx?dl=0 https://www.dropbox.com/s/1x7en0x7elnbg42/datarec-dec-2014.xlsx?dl=0
Your example code doesn't specify Fs so I don't know for sure but I'm guessing that it's an issue of sample rate(s). All the examples of cross correlation start out by constructing test sequences according to a specific sample rate that they usually call Fs, not to be confused with the frequency of the test tone, which you may see called Fc.
If you construct the test signals in terms of Fs and Fc then this works fine but when you get real data from an instrument they often give you the data and the timebase as two vectors, so you have to work out the sample rate from the timebase. This may not be the same as the operating frequency or the components of the signal, so confusion is easy here.
But the sample rate is only required in the second part of the problem, where you work out the offset in time. First you have to get the offset in samples and that's a lot easier to verify.
Your example will give you the offset in samples if you remove the '/ Fs' term, and you can verify it by plotting the two signals without a timebase and inspecting the sample positions.
I'm sure you've looked at dozens of examples, but here's one that tries not to confuse the issue by tying it to sample rates: you'll note that nowhere is a 'sample rate' specified; everything is expressed in samples (although if you treat the 5 in the y1 definition as a frequency in Hz then you will be able to infer one).
% construct test sequences
offset = 23;
tt = 0:0.01:1;
y1 = sin(2*pi*5*tt);
y2 = 0.8 * [zeros(1, offset) y1];
figure(1); clf; hold on
plot(tt, y1)
plot(tt, y2(1:numel(tt)), 'r')
% correlate the two sequences
[cc, lags] = xcorr(y2, y1);
figure(2); clf;
plot(cc)
% find the index of the maximum
[maxval, maxI] = max(cc);
% use that index to obtain the lag in samples
lagsamples = lags(maxI);
% plot again without timebase to verify visually
figure(3); clf; hold on
plot(y1)
plot(y2, 'r')
plot([offset offset], [-1 1], 'k:')
Once you've got the offset in samples you can probably deduce the required conversion factor, but if you have timebase data from the instrument then the inverse of the difference between any two consecutive entries will give it to you.
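In code, that last remark is just (a sketch; timebase stands for whatever time vector your instrument provides):
% Sketch: recover the sample rate from an instrument's time vector.
Fs = 1 / (timebase(2) - timebase(1));   % seconds per sample -> Hz
lagtime = lagsamples / Fs;              % convert a lag in samples to seconds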
UPDATE
When you correlate the two signals you can visualise it as overlaying them and summing the product of corresponding elements. This gives you a single value. Then you move one signal by a sample and do it again. Continue until you have done it at every possible arrangement of the two signals.
The value obtained at each step is the correlation, but the 'lag' is computed starting with one signal all the way over to the left and the other overlapping by only one sample. You slide the second signal all the way over until it's only overlapping the other end by a sample. Hence the number of values returned by the correlation is related to the length of both the original signals, and relating any given point in the correlation output, such as the max value, to the arrangement of the two signals that produced it requires you to do a calculation involving those lengths. The xcorr function makes this easier by outputting the lags variable, which tracks the alignment of the two signals. People may also talk about this as an offset so watch out for that.
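To make the length bookkeeping concrete (a small check; MATLAB's xcorr zero-pads the shorter input to the length of the longer one):
% Sketch: xcorr on inputs of lengths M and N returns 2*max(M,N)-1 values,
% with lags running from -(max(M,N)-1) to +(max(M,N)-1).
a = randn(1, 100);
b = randn(1, 80);
[cc, lags] = xcorr(a, b);
numel(cc)    % 199
lags(1)      % -99
lags(end)    %  99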

neural network for handwritten recognition?

I have been following Andrew Ng's course about Machine Learning, and I currently have some doubts about the implementation of a handwritten-digit recognition tool.
First he says that he uses a subset of the MNIST dataset, which contains 5000 training examples, each of which is an image in 20x20 grayscale format. With that he says that we have a vector of length 400 that is the "unrolled" version of the data previously described. Does it mean that the training set has something like the following format?
Training example 1 v[1,2,...,400]
Training example 2 v[1,2,...,400]
...
Training example 5000 v[1,2,...,400]
For the coding part the author gives the following complete code in Matlab:
%% Machine Learning Online Class - Exercise 3 | Part 2: Neural Networks
% Instructions
% ------------
%
% This file contains code that helps you get started on the
% linear exercise. You will need to complete the following functions
% in this exercise:
%
% lrCostFunction.m (logistic regression cost function)
% oneVsAll.m
% predictOneVsAll.m
% predict.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%
%% Initialization
clear ; close all; clc
%% Setup the parameters you will use for this exercise
input_layer_size = 400; % 20x20 Input Images of Digits
hidden_layer_size = 25; % 25 hidden units
num_labels = 10; % 10 labels, from 1 to 10
% (note that we have mapped "0" to label 10)
%% =========== Part 1: Loading and Visualizing Data =============
% We start the exercise by first loading and visualizing the dataset.
% You will be working with a dataset that contains handwritten digits.
%
% Load Training Data
fprintf('Loading and Visualizing Data ...\n')
load('ex3data1.mat');
m = size(X, 1);
% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);
displayData(X(sel, :));
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ================ Part 2: Loading Parameters ================
% In this part of the exercise, we load some pre-initialized
% neural network parameters.
fprintf('\nLoading Saved Neural Network Parameters ...\n')
% Load the weights into variables Theta1 and Theta2
load('ex3weights.mat');
%% ================= Part 3: Implement Predict =================
% After training the neural network, we would like to use it to predict
% the labels. You will now implement the "predict" function to use the
% neural network to predict the labels of the training set. This lets
% you compute the training set accuracy.
pred = predict(Theta1, Theta2, X);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
fprintf('Program paused. Press enter to continue.\n');
pause;
% To give you an idea of the network's output, you can also run
% through the examples one at a time to see what it is predicting.
% Randomly permute examples
rp = randperm(m);
for i = 1:m
% Display
fprintf('\nDisplaying Example Image\n');
displayData(X(rp(i), :));
pred = predict(Theta1, Theta2, X(rp(i),:));
fprintf('\nNeural Network Prediction: %d (digit %d)\n', pred, mod(pred, 10));
% Pause
fprintf('Program paused. Press enter to continue.\n');
pause;
end
and the predict function should be completed by the students; I have done the following:
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
X = [ones(m , 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
a1 = X;                      % input layer activations (bias column already added)
a2 = sigmoid(a1*Theta1');    % hidden layer activations
a2 = [ones(m , 1) a2];       % add bias unit to the hidden layer
a3 = sigmoid(a2*Theta2');    % output layer: one score per class
[M , p] = max(a3 , [] , 2);  % predicted label = index of the highest score
Even though it runs, I am not completely aware of how it really works (I have just followed the step-by-step instructions on the author's website). I have doubts about the following:
The author considers that X (input) is an array of 5000 x 400 elements, or it has 400 neurons as input, with 10 neurons as output and a hidden layer. Does it mean these 5000 x 400 values are the training set?
The author gives us the values of theta 1 and theta 2, which I believe serve as weights for the calculations on the inner layer, but how are those values obtained? Why does he use 25 neurons in the hidden layer and not 24 or 30?
Any help will be appreciated.
Thanks
Let's break your question in parts:
First he says that he uses a subset of the MNIST dataset, which
contains 5000 training examples, each of which is an image in 20x20
grayscale format. With that he says that we have a vector of length
400 that is the "unrolled" version of the data previously described.
Does it mean that the training set has something like the following
format? (...)
You're on the right track. Each training example is a 20x20 image. The simplest neural network model, introduced in the course, treats each image just as a simple 1x400 vector (the "unrolled" means exactly this transformation). The dataset is stored in a matrix because this way you can perform computations faster, exploiting the efficient linear algebra libraries used by Octave/MATLAB. You don't necessarily need to store all training examples as a 5000x400 matrix, but this way your code will run faster.
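The unrolling itself is a one-liner (a sketch; img stands for one 20x20 image, and MATLAB's reshape works column by column):
% Sketch: "unroll" a 20x20 image into a 1x400 row vector.
img = rand(20, 20);        % stand-in for one grayscale digit image
v = reshape(img, 1, 400);  % column-major unrolling, 1x400
% stacking 5000 such rows gives the 5000x400 matrix X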
The author considers that X (input) is an array of 5000 x 400 elements,
or it has 400 neurons as input, with 10 neurons as output and a hidden
layer. Does it mean these 5000 x 400 values are the training set?
The "input layer" is nothing but the very input image. You can think of it as neurons whose output values were already calculated, or as values coming from outside the network (think of your retina: it is like the input layer of your visual system). Thus this network has 400 input units (the "unrolled" 20x20 image). But of course, your training set doesn't consist of a single image, so you put all your 5000 images together in a single 5000x400 matrix to form your training set.
The author gives us the values of theta 1 and theta 2, which I believe
serve as weights for the calculations on the inner layer, but how are
those values obtained?
These theta values were found using an algorithm called backpropagation. If you haven't had to implement it in the course yet, just be patient. It might be in the exercises soon! Btw, yes, they are the weights.
Why does he use 25 neurons in the hidden layer and not 24 or 30?
He probably chose an arbitrary value that neither runs too slowly nor performs too poorly. You can probably find much better values for these hyperparameters. But if you increase the number too much, the training procedure will probably take much longer. Also, since you are using just a small portion of the whole training set (the original MNIST has 60000 training examples and 28x28 images), you need to use a "small" number of hidden units to prevent overfitting. If you use too many units, your neurons will "learn by heart" the training examples and will not be able to generalize to new, unseen data. Finding hyperparameters, such as the number of hidden units, is a kind of art that you will master with experience (and maybe with Bayesian optimization and more advanced methods, but that's another story xD).
I did the same course some time ago.
X is the input data. Therefore X is the matrix consisting of the 5000 vectors of 400 elements each. There is no training here, because the network comes pre-trained.
Normally the values for theta 1 and theta 2 are trained. How this is done is the subject of the next few lectures (the backpropagation algorithm).
I'm not entirely sure why he used 25 neurons for the hidden layer. However, my guess is that this number of neurons simply works well, without making the training step take forever.

Matlab Multilayer Perceptron Question

I need to classify a dataset using a MATLAB MLP and show the classification.
The dataset looks like this:
[Figure: the dataset]
What I have done so far is:
I have created a neural network containing a hidden layer (two neurons; maybe someone could give me some suggestions on how many neurons are suitable for my example) and an output layer (one neuron).
I have used several different learning methods, such as Delta-bar-Delta, backpropagation (both of these with and without momentum), and Levenberg-Marquardt.
This is the code I used in Matlab(Levenberg-Marquardt example)
net = newff(minmax(Input),[2 1],{'logsig' 'logsig'},'trainlm');
net.trainParam.epochs = 10000;
net.trainParam.goal = 0;
net.trainParam.lr = 0.1;
[net tr outputs] = train(net,Input,Target);
The following shows the hidden-neuron classification boundaries generated by MATLAB on the data. I am a little bit confused, because the network should produce a nonlinear result, yet the two boundary lines below seem to be linear...
[Figure: hidden neuron classification boundaries]
The code for generating above plot is:
figure(1)
plotpv(Input,Target);
hold on
plotpc(net.IW{1},net.b{1});
hold off
I also need to plot the output function of the output neuron, but I am stuck at this step. Can anyone give me some suggestions?
Thanks in advance.
Regarding the number of neurons in the hidden layer: for such a small example, two are more than enough. The only way to know the optimum for sure is to test with different numbers. In this FAQ you can find a rule of thumb that may be useful: http://www.faqs.org/faqs/ai-faq/neural-nets/
For the output function, it is often useful to divide the computation into steps:
First, given the input vector x, compute the pre-activation of the neurons in the hidden layer, y = f(x) = Wx + b, where W is the weight matrix from the input neurons to the hidden layer and b is the bias vector.
Second, apply the activation function g of the network to the resulting vector: z = g(y).
Finally, the output is the dot product h(z) = z · v + n, where v is the weight vector from the hidden layer to the output neuron and n is its bias (your network also applies logsig at the output, so apply that as well). In the case of more than one output neuron, you repeat this for each one.
I've never used the MATLAB MLP functions, so I don't know how to get the weights in this case, but I'm sure the network stores them somewhere. Edit: searching the documentation, I found these properties:
net.IW numLayers-by-numInputs cell array of input weight values
net.LW numLayers-by-numLayers cell array of layer weight values
net.b numLayers-by-1 cell array of bias values
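Putting those pieces together for the network in the question (a sketch, untested on your data; it assumes the newff net from above with logsig in both layers and an input column vector x):
% Sketch: evaluate the trained two-layer logsig network by hand.
W1 = net.IW{1,1};   % input -> hidden weights
b1 = net.b{1};      % hidden biases
W2 = net.LW{2,1};   % hidden -> output weights
b2 = net.b{2};      % output bias
z = logsig(W1*x + b1);    % hidden layer activations
h = logsig(W2*z + b2);    % output neuron: this is the function to plot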