I was toying around with the MATLAB Neural Network Toolbox and encountered some things I did not expect. My problem concerns a classification network with no hidden layers, only one input, and the tansig transfer function. So I expect this classifier to divide a 1D dataset at a point defined by the learned input weight and bias.
First of all, I thought that the formula for computing the output y for a given input x is:
y = f(x*w + b)
with w the input weight and b the bias. What is the correct formula for calculating the output of the network?
I also expected that translating the whole dataset by a certain value (+77) would have a big effect on the bias and/or weight, but this doesn't seem to be the case. Why does translating the dataset have so little effect on the bias and weight?
This is my code:
% Generate data
p1 = randn(1, 1000);
t1(1:1000) = 1;
p2 = 3 + randn(1, 1000);
t2(1:1000) = -1;
% view data
figure, hist([p1', p2'], 100)
P = [p1 p2];
T = [t1 t2];
% train network without hidden layer
net1 = newff(P, T, [], {'tansig'}, 'trainlm');
[net1,tr] = train(net1, P, T);
% display weight and bias
w = net1.IW{1,1};
b = net1.b{1,1};
disp(w) % -6.8971
disp(b) % -0.2280
% make label decision
class_correct = 0;
outputs = net1(P);
for i = 1:length(outputs)
% choose between -1 and 1
if outputs(i) > 0
outputs(i) = 1;
else
outputs(i) = -1;
end
% compare
if T(i) == outputs(i)
class_correct = class_correct + 1;
end
end
% Calculate the correct classification rate (CCR)
CCR = (class_correct * 100) / length(outputs);
fprintf('CCR: %f \n', CCR);
% plot the errors
errors = gsubtract(T, outputs);
figure, plot(errors)
% I expect these to be equal and near 1
net1(0) % 0.9521
tansig(0*w + b) % -0.4680
% I expect these to be equal and near -1
net1(4) % -0.9991
tansig(4*w + b) % -1
% translate the dataset by +77
P2 = P + 77;
% train network without hidden layer
net2 = newff(P2, T, [], {'tansig'}, 'trainlm');
[net2,tr] = train(net2, P2, T);
% display weight and bias
w2 = net2.IW{1,1};
b2 = net2.b{1,1};
disp(w2) % -5.1132
disp(b2) % -0.1556
I generated an artificial dataset made of two populations with normal distributions with different means. I plotted these populations in a histogram and trained the network on them.
I calculate the correct classification rate (CCR), which is the percentage of correctly classified instances over the whole dataset. This is around 92%, so I know the classifier works.
But I expected net1(x) and tansig(x*w + b) to give the same output, and this is not the case. What is the correct formula for calculating the output of my trained network?
And I expected net1 and net2 to have different weights and/or biases because net2 is trained on a translated version (+77) of the dataset net1 is trained on. Why does translating the dataset have so little effect on the bias and weight?
First off, your code leaves the default MATLAB input pre-processing intact. You can check this with:
net2.inputs{1}
When I put your code in I got this:
Neural Network Input
name: 'Input'
feedbackOutput: []
processFcns: {'fixunknowns', 'removeconstantrows', 'mapminmax'}
processParams: {1x3 cell array of 2 params}
processSettings: {1x3 cell array of 3 settings}
processedRange: [1x2 double]
processedSize: 1
range: [1x2 double]
size: 1
userdata: (your custom info)
The important part is processFcns including mapminmax. According to the docs, mapminmax will "Process matrices by mapping row minimum and maximum values to [-1 1]". That's why shifting (re-sampling) your inputs arbitrarily had no effect.
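To reproduce net1(x) by hand you therefore have to wrap the same pre- and post-processing around the tansig step. Here is a rough sketch; it assumes the default processing functions listed above, so mapminmax is the third input processing step and the second output processing step (fixunknowns and removeconstantrows are effectively no-ops for this data), and the indices may differ for other configurations:
% apply the stored input scaling, run the single tansig layer,
% then undo the output scaling (indices follow the default processFcns order)
xn = mapminmax('apply', 0, net1.inputs{1}.processSettings{3});
yn = tansig(w*xn + b);
y = mapminmax('reverse', yn, net1.outputs{1}.processSettings{2});
disp(y) % should now agree with net1(0)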
I assume that by "calculating the output" you mean checking the performance of your network. You can do that by first simulating your network on a dataset (see doc nnet/sim) and then checking the "correctness" with the perform function. It will use the same cost function as you did when training:
% get predictions
Y = sim(net2, P2);
% see how well we did
perf = perform(net2, T, Y);
Boomshuckalucka. Hope that helps.
I want to determine how well the estimated model fits future new data. To do this, a prediction error plot is often used. Basically, I want to compare the measured output and the model output. I am using the Least Mean Square (LMS) algorithm as the equalization technique. Can somebody please explain the proper way to plot the comparison between the model output and the measured data? If the estimates are close to the true values, the curves should be very close to each other. Below is the code. u is the input to the equalizer, x is the noisy received signal, y is the output of the equalizer, and w holds the equalizer weights. Should the graph be plotted using x and y*w? But x is noisy. I am confused, since the measured output x is noisy and the model output y*w is noise-free.
%% Channel and noise level
h = [0.9 0.3 -0.1]; % Channel
SNRr = 10; % Noise Level
%% Input/Output data
N = 1000; % Number of samples
Bits = 2; % Number of bits for modulation (2-bit for Binary modulation)
data = randi([0 1],1,N); % Random signal
d = real(pskmod(data,Bits)); % BPSK Modulated signal (desired/output)
r = filter(h,1,d); % Signal after passing through channel
x = awgn(r, SNRr); % Noisy Signal after channel (given/input)
%% LMS parameters
epoch = 10; % Number of epochs (training repetitions)
eta = 1e-3; % Learning rate / step size
order=10; % Order of the equalizer
U = zeros(1,order); % Input frame
W = zeros(1,order); % Initial weights
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y = (W)*U'; % Calculating output of LMS
e = d(n) - y; % Instantaneous error
W = W + eta * e * U ; % Weight update rule of LMS
J(k,n) = e * e'; % Instantaneous square error
end
end
Let's start step by step.
First of all, when using some fitting method it is good practice to use the RMS error. To get this we have to find the error between input and output. As I understand it, x is the input for our model and y is the output. Furthermore, you already calculated the error between them, but you used it in the loop without saving it. Let's modify your code:
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y(n) = (W)*U'; % Calculating output of LMS
e(n) = x(n) - y(n); % Instantaneous error
W = W + eta * e(n) * U ; % Weight update rule of LMS
J(k,n) = e(n) * (e(n))'; % Instantaneous square error
end
end
Now e consists of errors at the last epoch. So we can use something like this:
rms(e)
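If rms is not available on your installation (it ships with the Signal Processing Toolbox), the equivalent computation is a one-liner:
rmsErr = sqrt(mean(e.^2)); % root-mean-square of the last-epoch errors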
Also I'd like to compare results using mean error and standard deviation:
mean(e)
std(e)
And some visualization:
histogram(e)
Second point: we can't use the compare function for plain vectors! It works on dynamic system models, so you would need some workaround to express your model as a dynamic system. But we can use functions such as goodnessOfFit instead. If you want something like the error at each step that considers all previous data points, do a small math workaround: calculate it at each point using [1:currentNumber].
About using the LMS method: there is a built-in function implementing LMS. Let's try to use it on your data:
alg = lms(0.001);
eqobj = lineareq(10,alg);
y1 = equalize(eqobj,x);
And let's look at the result:
plot(x)
hold on
plot(y1)
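It can also help to overlay the desired (noise-free) BPSK symbols d from your script as a reference, for example:
plot(d, 'k--') % desired symbols for reference
legend('noisy input x', 'equalized output y1', 'desired d')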
There are a lot of examples of this function in use; look here, for example.
I hope this was helpful for you!
The difference between the model output and the observed data is known as the residual.
The difference between the observed value of the dependent variable
(y) and the predicted value (ŷ) is called the residual (e). Each data
point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
Both the sum and the mean of the residuals are equal to zero. That is,
Σe = 0 and ē = 0.
A residual plot is a graph that shows the residuals on the vertical
axis and the independent variable on the horizontal axis. If the
points in a residual plot are randomly dispersed around the horizontal
axis, a linear regression model is appropriate for the data;
otherwise, a non-linear model is more appropriate.
Here is an example of residual plots from a model of mine. On the vertical axis is the difference between the output of the model and the measured value. On the horizontal axis is one of the independent variables used in the model.
We can see that most of the residuals are within 0.2 units, which happens to be my tolerance for this model. I can therefore draw a conclusion as to the worth of the model.
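For the LMS example above you can produce such a residual plot directly from the vectors of the modified loop; a rough sketch, where e = x - y plays the role of the residual:
% residuals on the vertical axis, sample index on the horizontal axis
figure
scatter(1:numel(e), e, 8, 'filled')
hold on
plot([1 numel(e)], [0 0], 'k-') % zero reference line
hold off
xlabel('sample index')
ylabel('residual e = x - y')
title('Residual plot')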
See here for a similar question.
Regarding your question about the lack of noise in your model's output: we are creating a linear model. That's the clue.
In many areas I have found that when adding noise, we mention some specification like zero mean and variance. I need to add AWGN, colored noise, and uniform noise of varying SNR in dB. The following code shows how I generated and added noise. I am aware of the function awgn(), but it is a kind of black box: I don't know how the noise is actually added. So, can somebody please explain the correct way to generate and add noise? Thank you.
SNR = [-10:5:30]; % in dB
snr = 10 .^ (0.1 .* SNR);
for I = 1:length(snr)
noise = 1 / sqrt(2) * (randn(1, N) + 1i * randn(1, N));
u = y + noise .* snr(I);
end
I'm adding another answer since it strikes me that Steven's is not quite correct and Horchler's suggestion to look inside function awgn is a good one.
Both MATLAB and Octave (in the communications toolbox) have a function awgn that adds (white Gaussian) noise to attain a desired signal-to-noise power level; the following is the relevant portion of the code (from the Octave function):
if (meas == 1) % <-- if using signal power to determine appropriate noise power
p = sum( abs( x(:)) .^ 2) / length(x(:));
if (strcmp(type,"dB"))
p = 10 * log10(p);
endif
endif
if (strcmp(type,"linear"))
np = p / snr;
else % <-- in dB
np = p - snr;
endif
y = x + wgn (m, n, np, 1, seed, type, out);
As you can see by the way p (the power of the input data) is computed, the answer from Steven does not appear to be quite right.
You can ask the function to compute the total power of your data array and combine that with the desired s/n value you provide to compute the appropriate power level of the added noise. You do this by passing the string "measured" among the optional inputs, like this (see here for the Octave documentation or here for the MATLAB documentation):
y = awgn (x, snr, 'measured')
This leads ultimately to meas=1 and so meas==1 being true in the code above. The function awgn then uses the signal passed to it to compute the signal power, and from this and the desired s/n it then computes the appropriate power level for the added noise.
As the documentation further explains
By default the snr and pwr are assumed to be in dB and dBW
respectively. This default behavior can be chosen with type set to
"dB". In the case where type is set to "linear", pwr is assumed to be
in Watts and snr is a ratio.
This means you can pass a negative or 0 dB snr value. The result will also depend then on other options you pass, such as the string "measured".
For the MATLAB case I suggest reading the documentation; it explains how to use the function awgn in different scenarios. Note that the implementations in Octave and MATLAB are not identical: the computation of the noise power should be the same, but the available options may differ.
And here is the relevant part from wgn (called above by awgn):
if (strcmp(type,"dBW"))
np = 10 ^ (p/10);
elseif (strcmp(type,"dBm"))
np = 10 ^((p - 30)/10);
elseif (strcmp(type,"linear"))
np = p;
endif
if(!isempty(seed))
randn("state",seed);
endif
if (strcmp(out,"complex"))
y = (sqrt(imp*np/2))*(randn(m,n)+1i*randn(m,n)); % imp=1 assuming impedance is 1 Ohm
else
y = (sqrt(imp*np))*randn(m,n);
endif
If you want to check the power of your noise (np), the awgn and wgn functions assume the following relationships hold:
np = var(y,1); % linear scale
np = 10*log10(np); % in dB
where var(...,1) is the population variance for the noise y.
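If you prefer not to treat awgn as a black box at all, the 'measured' behaviour can be reproduced in a few lines. A rough sketch for a real-valued signal x and an SNR given in dB (the variable names here are just for illustration):
snr_dB = 10; % desired SNR in dB
sigPower = sum(abs(x(:)).^2) / numel(x); % measured signal power (linear)
noisePower = sigPower / 10^(snr_dB/10); % required noise power (linear)
noise = sqrt(noisePower) * randn(size(x)); % white Gaussian noise
y = x + noise; % noisy signal
% sanity check: 10*log10(sigPower/var(noise,1)) should be close to snr_dB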
Most answers here forget that SNR is usually specified in decibels. Therefore, you shouldn't encounter a 'division by 0' error, because you should really divide by 10^(targetSNR/10), which is never negative nor zero for real targetSNR.
The 'division by 0' problem can easily be avoided by adding a condition that checks whether targetSNR is 0 and only scales the noise when it is not. When your target SNR is 0, it means the output is pure noise.
function out_signal = addAWGN(signal, targetSNR)
sigLength = length(signal); % length
awgnNoise = randn(size(signal)); % original noise
pwrSig = sqrt(sum(signal.^2))/sigLength; % signal power
pwrNoise = sqrt(sum(awgnNoise.^2))/sigLength; % noise power
if targetSNR ~= 0
scaleFactor = (pwrSig/pwrNoise)/targetSNR; %find scale factor
awgnNoise = scaleFactor*awgnNoise;
out_signal = signal + awgnNoise; % add noise
else
out_signal = awgnNoise; % noise only
end
You can use randn() to generate a noise vector 'awgnNoise' of the length you want. Then, given a specified SNR value, calculate the power of the original signal and the power of the noise vector 'awgnNoise'.
Get the right amplitude scaling factor for the noise vector and just scale it.
The following code is an example of corrupting a signal with white noise, assuming the input signal is 1D and real-valued.
function out_signal = addAWGN(signal, targetSNR)
sigLength = length(signal); % length
awgnNoise = randn(size(signal)); % original noise
pwrSig = sqrt(sum(signal.^2))/sigLength; % signal power
pwrNoise = sqrt(sum(awgnNoise.^2))/sigLength; % noise power
scaleFactor = (pwrSig/pwrNoise)/targetSNR; %find scale factor
awgnNoise = scaleFactor*awgnNoise;
out_signal = signal + awgnNoise; % add noise
Be careful about the sqrt(2) factor when you deal with a complex signal, if you want to generate the real and imaginary parts separately.
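For a complex signal the noise power is usually split equally between the real and imaginary parts, which is where that factor comes from; a sketch, assuming a target noise power noisePower:
noise = sqrt(noisePower/2) * (randn(size(x)) + 1i*randn(size(x))); % total power = noisePower
y = x + noise;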
Using normrnd, I would like to create a normal distribution with mean and sigma values expressed as vectors of size 1x45, varying from 1 to 45, and plot this simulated PDF against the ideal values.
Whenever I create a normrnd like the one expressed below,
Gaussian = normrnd([1 45],[1 45],[1 500],length(c_t));
I am obtaining the following error,
Size information is inconsistent.
The reason for creating this PDF is to compute the chemical kinetics of a tracer with a variable Gaussian noise model. Basically, I have the ideal characteristics of a tracer; now I would like to add Gaussian noise and understand how the chemical kinetics of the tracer vary with changing noise.
There are different computational models for understanding the chemical kinetics of a tracer, one of which is the three-compartment model; others are shape analysis and the constrained shape analysis model.
I currently have the ideal curve for each model; now I would like to add noise to these models and understand how each particular model behaves with varying noise.
This is why I would like to create a variable noise model with normrnd, add this model to the ideal characteristics, and compute noise (sigma) vs. error. This analysis will give me an approximate estimate of how the different models behave with varying noise and which model is suitable for estimating the chemical kinetics of the tracer.
function [c_t,c_t_noise] =Noise_ConstrainedK2(t,a1,a2,a3,b1,b2,b3,td,tmax,k1,k2,k3)
K_1 = (k1*k2)/(k2+k3);
K_2 = (k1*k3)/(k2+k3);
%DV_free= k1/(k2+k3);
c_t = zeros(size(t));
ind = (t > td) & (t < tmax);
c_t(ind)= conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
ind = (t >= tmax);
c_t(ind)=conv((a1 * exp(-b1 * (t(ind) - tmax))+ a2 * exp(-b2 * (t(ind) - tmax))) + a3 * exp(-b3 * (t(ind) - tmax)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
meanAndVar = (rand(45,2)-0.5)*2;
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
idx = mod(ii,size(meanAndVar,1))+1;
randSamples(ii) = normrnd(meanAndVar(idx,1),meanAndVar(idx,2));
c_t_noise = c_t + randSamples(ii);
end
scatter(1:numPoints,randSamples)
dg = [0 0.5 0];
plot(t,c_t,'r');
hold on;
plot(t,c_t_noise,'Color',dg);
hold off;
axis([0 50 0 1900]);
xlabel('Time[mins]');
ylabel('concentration [Mbq]');
title('My signal');
%plot(t,c_tnp);
end
The output characteristics from the above function are as follows. Here I could not visualize any noise.
The only remotely close thing to what you want can be done as follows, but it will involve looping, because you cannot request 500 data points from only 45 different means and variances without assuming that sets can be revisited.
This is my interpretation of what you want, though I am still not entirely sure.
Random Gaussian Function Selection
meanAndVar = rand(45,2);
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
randMeanVarIdx = randi([1,size(meanAndVar,1)]);
randSamples(ii) = normrnd(meanAndVar(randMeanVarIdx,1),meanAndVar(randMeanVarIdx,2));
end
scatter(1:numPoints,randSamples)
The above code generates a random 2-D matrix of mean and variance (1st col = mean, 2nd col = variance). We then preallocate some space.
Inside the loop we choose a random set of mean and variance to use (uniformly), plug that mean and variance into the random Gaussian value function, and store the result.
The matrix randSamples will then contain a list of random values generated by a random set of Gaussian distributions chosen in a uniformly random manner.
Sequential Function Selection
If you do not want to randomly select which function to use and just want to go sequentially, loop using the modulus to get the index of the set of values to use.
meanAndVar = (rand(45,2)-0.5)*2; % zero shift and make bounds [-1,1]
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
idx = mod(ii,size(meanAndVar,1))+1;
randSamples(ii) = normrnd(meanAndVar(idx,1),meanAndVar(idx,2));
end
scatter(1:numPoints,randSamples)
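To actually corrupt your ideal curve with these samples, add the whole vector at once after the loop instead of a scalar inside it. A sketch, assuming numPoints equals numel(c_t) (otherwise generate one sample per point of c_t):
% vectorized: add one noise sample to every point of the ideal curve
c_t_noise = c_t + randSamples; % both 1 x numPoints
figure
plot(t, c_t, 'r'); hold on
plot(t, c_t_noise, 'Color', [0 0.5 0]); hold off
xlabel('Time[mins]'); ylabel('concentration [Mbq]');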
The problem with this statement
Gaussian = normrnd([1 45],[1 45],[1 500],length(c_t));
is that you supply two mu values and two sigma values, and ask for a matrix of size [1 500] x length(c_t). You need to pass the size in a uniform way, so either
Gaussian = normrnd(mu, sigma,[500 length(c_t)]);
or
Gaussian = normrnd(mu, sigma, 500, length(c_t));
Then you should make sure that the size of the mu/sigma vectors matches the size of the matrix you ask for. So if you want a 500 x length(c_t) matrix as output, you need to pass 500 x length(c_t) (mu, sigma) pairs. If you only want to vary one of mu or sigma, you can pass a single value for the other parameter.
To get N values from a normal distribution with fixed mean and steadily increasing sigma you can do
noise = @(mu, s0, s1, n) normrnd(mu, s0:(s1-s0)/(n-1):s1, 1, n)
where s0 is the lowest sigma value and s1 is the largest sigma value. To get 10 values drawn from distributions with mu=0 and sigma increasing from 1 to 5 you can do
noise(0,1,5,10)
If you want to introduce some randomness in the increase of sigma you can do
noise_rand = @(mu, s0, s1, n) normrnd(mu, (s0:(s1-s0)/(n-1):s1) .* rand(1,n), 1, n)
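For example, to corrupt an ideal curve c_t with zero-mean noise whose sigma rises from 1 to 45 across the samples, something along these lines should work (assuming c_t is a row vector):
n = numel(c_t);
c_t_noise = c_t + noise(0, 1, 45, n); % sigma increases linearly from 1 to 45
plot(1:n, c_t, 'r', 1:n, c_t_noise, 'g')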
I got this example of a minimal Echo State Network (ESN), which I am analysing while trying to understand Echo State Networks. Unfortunately I have some problems understanding why it really works. It all breaks down to these questions:
[ What defines | What is] the echo state of an ESN?
What is it that makes an ESN learn complex nonlinear functions like the Mackey-Glass function so easily and quickly?
First here is a little piece of code that shows the important part of initialization:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed', 42);
trainLen = 2000;
testLen = 2000;
initLen = 100;
data = load('MackeyGlass_t17.txt');
% Input neurons
inSize = 1;
% Output neurons
outSize = 1;
% Reservoir size
resSize = 1000;
% Leaking rate
a = 0.3;
% Input weights
Win = ( rand(resSize, (inSize+1) ) - 0.5) .* 1;
% Reservoir weights
W = rand(resSize, resSize) - 0.5;
Running the reservoir:
I understand that every single data-point of the input data set is propagated from the input neuron to the reservoir neurons. After a warm-up of size initLen the states are accepted and stored in matrix X. When this is done every single column of X represents a "vector of reservoir neuron activations". And here comes the point where I am not sure if I got it right:
The comment already says "collected states" or "design matrix" X. Am I getting this right, that all this does is store the state of the whole network in the columns of matrix X?
If we assume that t was just a time parameter then X(:,t) represents the network state of time t , isn't it?
In my example this would mean that there are 1900 time slices, each representing the whole network state at its corresponding time step (X is therefore a 1002x1900 matrix). Another question that occurs to me here is:
Why are a 1 (I guess it is the bias) and the input value u appended to this vector: X(:,t-initLen) = [1;u;x]?
So:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the reservoir with the data and collect X.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Allocated memory for the design (collected states) matrix
X = zeros((1+inSize) + resSize, trainLen - initLen);
% Vector of reservoir neuron activations (used for calculation)
x = zeros(resSize, 1);
% Update of the reservoir neuron activations
xUpd = zeros(resSize, 1);
for t = 1:trainLen
u = data(t);
xUpd = tanh( Win * [1;u] + W * x );
x = (1-a) * x + a * xUpd;
if ( t > initLen )
X(:,t-initLen) = [1;u;x];
end
end
Training part:
The training part is also still a little magical to me. I am familiar with how linear regression works, so that is not the problem here.
What I see is that this part just uses the whole state matrix X and performs a single linear regression step on the input data to generate the output weight vector Wout, and that's it.
So all that has been done so far, if I'm not mistaken, is computing the output weights from the state matrix X, which itself was generated using the input data and randomly generated (input and reservoir) weights.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Train the output
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Set the corresponding target matrix directly
Yt = data(initLen+2:trainLen+1)';
% Regularization coefficient
reg = 1e-8;
% Get X transposed - needed twice therefore it is a little faster
X_T = X';
% Yt * pseudo_inverse(X); (linear regression task)
Wout = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);
Running the ESN in a generative mode:
I can run this in two modes: generative or predictive. But this is the part where I can only say: "Well, it works," without having an exact idea of why.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the trained ESN in a generative mode. no need to initialize here,
% because x is initialized with training data and we continue from there.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Y = zeros(outSize,testLen);
u = data(trainLen+1);
for t = 1:testLen
xUpd = tanh( Win*[1;u] + W*x );
x = (1-a)*x + a*xUpd;
% Generative mode:
u = Wout*[1;u;x];
% This would be a predictive mode:
%u = data(trainLen+t+1);
Y(:,t) = u;
end
It works pretty well as you can see (generative mode):
I know this is quite a huge "question", if it can even be considered one. I feel like I understand the individual parts, but what I'm missing is the big picture of this magical black box called an Echo State Network.
The echo state network (ESN) is basically a clever way to train a Recurrent Neural Network.
The ESN has a "reservoir" of hidden units which are coupled.
The inputs are connected to the reservoir with input (plus a bias) to hidden connections. These connections are not trained. They are randomly initialized, and this is the code snippet that does this initialization (I am using python).
Win = (random.rand(resSize,1+inSize)-0.5) * 1
The units in the reservoir are coupled, meaning basically that there exist hidden to hidden connections. Again the weights in the reservoir are not trained but initialized. However, initialization of the reservoir weights is tricky. Those weights (depicted by W in the code) are first randomly initialized and then they are multiplied by a factor which takes into account the spectral radius of the random matrix. Careful initialization of these connections is very important because it affects the dynamics of the ESN (do not forget it is a recurrent network). I guess if you want to know more details about this you have to be able to understand linear system theory.
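For illustration, in the MATLAB notation of the question this scaling typically looks something like the following; the target spectral radius (a value around 1.25 is common) should be treated as a tunable hyperparameter:
% scale the reservoir matrix by its spectral radius (largest absolute
% eigenvalue) so the network has the echo state / fading-memory property
rhoW = max(abs(eig(W)));
W = W .* (1.25 / rhoW);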
Now, after initializing properly the two weight matrices you start presenting inputs to the reservoir. For each input presented to the reservoir the activations are calculated and these activations are the state of the ESN. Look at the figure below.
This figure shows a plot of 200 activations for 20 inputs.
So, after presenting all inputs to the ESN the states are collected into a matrix X. This is the code snippet that does this in python:
x = zeros((resSize,1))
for t in range(trainLen):
u = data[t]
x = (1-a)*x + a*tanh( dot( Win, vstack((1,u)) ) + dot( W, x ) )
if t >= initLen:
X[:,t-initLen] = vstack((1,u,x))[:,0]
The state of the ESN is therefore a function of the finite history of the inputs presented to the network.
Now, in order to predict the output from the states of the oscillators the only thing that has to be learned is how to couple the outputs to the oscillators, i.e. the hidden to output connections:
# train the output
reg = 1e-8 # regularization coefficient
X_T = X.T
Wout = dot( dot(Yt,X_T), linalg.inv( dot(X,X_T) + \
reg*eye(1+inSize+resSize) ) )
Then after the network has been trained the predictive capability is tested using the test sample of the data.
The generative mode means that you start with a particular value of the time series and then you use that value to predict the next value in the time series but then you use the predicted value to predict the next value and so on. In effect you are generating the time series, hence generative mode. It allows you to predict multiple steps into the future, as opposed to predictive mode where you get one value from the time series and predict the next one.
And this is why the ESN seems to be doing a pretty good job. The target signal is pretty complex and yet in generative mode it does very well.
Finally, as far as "minimal implementation" goes, I guess it refers to the size of the reservoir (1000), which apparently is pretty small.
I have the following stochastic model describing the evolution of a process (Y) in space and time. Ds and Dt are the domains in space (2D, with x and y axes) and time (1D, with t axis). This model is usually known as a mixed-effects model or a components-of-variation model.
I am currently developing Y as follow:
%# Time parameters
T=1:1:20; % input
nT=numel(T);
%# Grid and model parameters
nRow=100;
nCol=100;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:1:nCol,1:1:nRow,T);
xPower=0.1;
tPower=1;
noisePower=1;
detConstant=1;
deterministic_mu = detConstant.*(((Grid.Nt).^tPower)./((Grid.Nx).^xPower));
beta_s = randn(nRow,nCol); % mean-zero random effect representing location specific variability common to all times
gammaTemp = randn(nT,1);
for t = 1:nT
gamma_t(:,:,t) = repmat(gammaTemp(t),nRow,nCol); % mean-zero random effect representing time specific variability common to all locations
end
var=0.1;% noise has variance = 0.1
for t=1:nT
kappa_st(:,:,t) = sqrt(var)*randn(nRow,nCol);
end
for t=1:nT
Y(:,:,t) = deterministic_mu(:,:,t) + beta_s + gamma_t(:,:,t) + kappa_st(:,:,t);
end
My questions are:
How do I produce delta in the expression for Y, and what is the difference between kappa and delta?
Can you help explain, through some illustration using MATLAB, whether I am producing Y correctly?
Please let me know if you need some more information/explanation. Thanks.
First, I rewrote your code to make it a bit more efficient. I see you generate linearly-spaced grids for x,y and t and carry out the computation for all points in this grid. This approach has severe limitations on the maximum attainable grid resolution, since the 3D grid (and all variables defined with it) can consume an awfully large amount of memory if the resolution goes up. If the model you're implementing will grow in complexity and size (it often does), I'd suggest you throw this all into a function accepting matrix/vector inputs for s and t, which will be a bit more flexible in this regard -- processing "blocks" of data that will otherwise not fit in memory will be a lot easier that way.
Then, I generated the delta_st term with rand instead of randn since the noise should be "white". Now I'm very unsure about that last one, and I didn't have time to read through the paper you linked to -- can you tell me on what pages I can find the relevant sections on delta_st?
Now, the code:
%# Time parameters
T = 1:1:20; % input
nT = numel(T);
%# Grid and model parameters
nRow = 100;
nCol = 100;
% noise has variance = 0.1
var = 0.1;
xPower = 0.1;
tPower = 1;
noisePower = 1;
detConstant = 1;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:nCol,1:nRow,T);
% deterministic mean
deterministic_mu = detConstant .* Grid.Nt.^tPower ./ Grid.Nx.^xPower;
% mean-zero random effect representing location specific
% variability common to all times
beta_s = repmat(randn(nRow,nCol), [1 1 nT]);
% mean-zero random effect representing time specific
% variability common to all locations
gamma_t = bsxfun(@times, ones(nRow,nCol,nT), randn(1, 1, nT));
% mean zero random effect capturing the spatio-temporal
% interaction not found in the larger-scale deterministic mu
kappa_st = sqrt(var)*randn(nRow,nCol,nT);
% mean zero random effect representing the micro-scale
% spatio-temporal variability that is modelled by white
% noise (i.i.d. at different time steps) in Ds·Dt
delta_st = noisePower * (rand(nRow,nCol,nT)-0.5);
% Final result:
Y = deterministic_mu + beta_s + gamma_t + kappa_st + delta_st;
Your implementation samples beta, gamma and kappa as if they are white (i.e. their values at each (x,y,t) are independent). The descriptions of the terms suggest that this is not meant to be the case. It looks like delta is supposed to capture the white noise, while the other terms capture the correlations over their respective domains, e.g. there is a non-zero correlation between gamma(t_1) and gamma(t_1+1).
If you wish to model gamma as a stationary Gaussian Markov process with variance var_g and correlation cor_g between gamma(t) and gamma(t+1), you can use something like
gamma_t = nan( nT, 1 );
gamma_t(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt( (1-K_g^2)*var_g );
for t = 2:nT,
gamma_t(t) = K_g*gamma_t(t-1) + K_w*randn();
end
gamma_t = reshape( gamma_t, [ 1 1 nT ] );
The formulas I've used for the gains K_g and K_w in the above code (and for the initialization of gamma_t(1)) produce the desired stationary variance var_g and one-step covariance cor_g.
Note that the implementation above assumes that later you will sum the terms using bsxfun to do the "repmat" for you:
Y = bsxfun( @plus, deterministic_mu + kappa_st + delta_st, beta_s );
Y = bsxfun( @plus, Y, gamma_t );
Note that I haven't tested the above code, so you should confirm with sampling that it does actually produce a zero-mean noise process of the specified variance and covariance between adjacent samples. To sample beta the same procedure can be extended into two dimensions, but the principles are essentially the same. I suspect kappa should be similarly modeled as a Markov Gaussian process, but in all three dimensions and with a lower variance, to represent higher-order effects not captured in mu, beta and gamma.
Delta is supposed to be zero mean stationary white noise. Assuming it to be Gaussian with variance noisePower one would sample it using
delta_st = sqrt(noisePower)*randn( [ nRow nCol nT ] );
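As a quick way to do the sampling check suggested above for gamma_t: generate a long realisation of the AR(1) recursion and compare its sample variance and lag-1 covariance with var_g and cor_g. A rough sketch:
% sanity check for the gamma_t construction: a long AR(1) realisation
% should have variance ~ var_g and lag-1 covariance ~ cor_g
var_g = 1; cor_g = 0.5; nSamp = 1e6;
K_g = cor_g/var_g;
K_w = sqrt((1 - K_g^2)*var_g);
g = zeros(nSamp,1);
g(1) = sqrt(var_g)*randn();
for t = 2:nSamp
g(t) = K_g*g(t-1) + K_w*randn();
end
fprintf('sample variance: %.3f (target %.3f)\n', var(g), var_g);
c = cov(g(1:end-1), g(2:end));
fprintf('lag-1 covariance: %.3f (target %.3f)\n', c(1,2), cor_g);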