Echo State Network learning Mackey-Glass function, but how? - matlab

I got this example of a minimal Echo State Network (ESN) which I analyse while trying to understand Echo State Networks. Unfortunately I have some problems understanding why this really works. It all breaks down to the questions:
What defines the echo state of an ESN?
What makes an ESN learn complex nonlinear functions like the Mackey-Glass function so easily and quickly?
First here is a little piece of code that shows the important part of initialization:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed', 42);
trainLen = 2000;
testLen = 2000;
initLen = 100;
data = load('MackeyGlass_t17.txt');
% Input neurons
inSize = 1;
% Output neurons
outSize = 1;
% Reservoir size
resSize = 1000;
% Leaking rate
a = 0.3;
% Input weights
Win = ( rand(resSize, (inSize+1) ) - 0.5) .* 1;
% Reservoir weights
W = rand(resSize, resSize) - 0.5;
Running the reservoir:
I understand that every single data-point of the input data set is propagated from the input neuron to the reservoir neurons. After a warm-up of size initLen the states are accepted and stored in matrix X. When this is done every single column of X represents a "vector of reservoir neuron activations". And here comes the point where I am not sure if I got it right:
The comment already says "collected states" or "design matrix" X. Am I getting this right, that all this does is store the state of the whole network in the columns of matrix X?
If we assume that t is just a time parameter, then X(:,t) represents the network state at time t, doesn't it?
In my example this would mean that there are 1900 time slices, each representing the whole network state in its corresponding timeframe (X is therefore a 1002x1900 matrix). Another question that occurs to me here is:
why is a 1 (I guess it is the bias) and the input value u appended to this vector: X(:,t-initLen) = [1;u;x];
So:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the reservoir with the data and collect X.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Allocated memory for the design (collected states) matrix
X = zeros((1+inSize) + resSize, trainLen - initLen);
% Vector of reservoir neuron activations (used for calculation)
x = zeros(resSize, 1);
% Update of the reservoir neuron activations
xUpd = zeros(resSize, 1);
for t = 1:trainLen
    u = data(t);
    xUpd = tanh( Win * [1;u] + W * x );
    x = (1-a) * x + a * xUpd;
    if ( t > initLen )
        X(:,t-initLen) = [1;u;x];
    end
end
Training part:
The training part is also a little like magic to me. I am familiar with how linear regression works, so that is not the problem here.
What I see is that this part just uses the whole state matrix X and performs a single linear regression step on the input data to generate the output weight vector Wout, and that's it.
So all that has been done so far - if I'm not mistaken - is initializing the output weights according to the state matrix X, which itself was generated using the input data and the randomly generated (input and reservoir) weights.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Train the output
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Set the corresponding target matrix directly
Yt = data(initLen+2:trainLen+1)';
% Regularization coefficient
reg = 1e-8;
% Get X transposed - needed twice therefore it is a little faster
X_T = X';
% Yt * pseudo_inverse(X); (linear regression task)
Wout = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);
Running the ESN in a generative mode:
I can run this in two modes: generative or predictive. But this is the part where I can only say: "Well, ... it works.", without having an exact idea why it does.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the trained ESN in a generative mode. no need to initialize here,
% because x is initialized with training data and we continue from there.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Y = zeros(outSize,testLen);
u = data(trainLen+1);
for t = 1:testLen
    xUpd = tanh( Win*[1;u] + W*x );
    x = (1-a)*x + a*xUpd;
    % Generative mode:
    u = Wout*[1;u;x];
    % This would be a predictive mode:
    %u = data(trainLen+t+1);
    Y(:,t) = u;
end
It works pretty well as you can see (generative mode):
I know this is quite a huge "question", if it can even be considered one. I feel like I understand the single parts, but what I'm missing is the big picture of this magic black box called an Echo State Network.

The echo state network (ESN) is basically a clever way to train a Recurrent Neural Network.
The ESN has a "reservoir" of hidden units which are coupled.
The inputs are connected to the reservoir through input-to-hidden connections (plus a bias). These connections are not trained; they are randomly initialized, and this is the code snippet that does this initialization (I am using Python).
Win = (random.rand(resSize,1+inSize)-0.5) * 1
The units in the reservoir are coupled, meaning basically that there exist hidden to hidden connections. Again the weights in the reservoir are not trained but initialized. However, initialization of the reservoir weights is tricky. Those weights (depicted by W in the code) are first randomly initialized and then they are multiplied by a factor which takes into account the spectral radius of the random matrix. Careful initialization of these connections is very important because it affects the dynamics of the ESN (do not forget it is a recurrent network). I guess if you want to know more details about this you have to be able to understand linear system theory.
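This scaling step is not shown in the question's MATLAB code; a common minimal sketch of it in MATLAB (the target spectral radius of 1.25 here is just an example value, not something prescribed by the question) would be:
rhoW = max(abs(eig(W)));   % spectral radius of the random reservoir matrix W
W = W .* (1.25 / rhoW);    % rescale W so its spectral radius becomes 1.25 (example value)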
Now, after initializing properly the two weight matrices you start presenting inputs to the reservoir. For each input presented to the reservoir the activations are calculated and these activations are the state of the ESN. Look at the figure below.
This figure shows a plot of 200 activations for 20 inputs.
So, after presenting all inputs to the ESN the states are collected into a matrix X. This is the code snippet that does this in python:
x = zeros((resSize,1))
for t in range(trainLen):
    u = data[t]
    x = (1-a)*x + a*tanh( dot( Win, vstack((1,u)) ) + dot( W, x ) )
    if t >= initLen:
        X[:,t-initLen] = vstack((1,u,x))[:,0]
The state of the ESN is therefore a function of the finite history of the inputs presented to the network.
Now, in order to predict the output from the states of the reservoir units, the only thing that has to be learned is how to couple the outputs to those units, i.e. the hidden-to-output connections:
# train the output
reg = 1e-8 # regularization coefficient
X_T = X.T
Wout = dot( dot(Yt,X_T), linalg.inv( dot(X,X_T) + \
reg*eye(1+inSize+resSize) ) )
Then after the network has been trained the predictive capability is tested using the test sample of the data.
The generative mode means that you start with a particular value of the time series, use that value to predict the next value, then use the predicted value to predict the one after that, and so on. In effect you are generating the time series, hence "generative mode". It allows you to predict multiple steps into the future, as opposed to the predictive mode, where you get one value from the time series and predict the next one.
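As a minimal MATLAB sketch mirroring the question's loop, the only difference between the two modes is where the next input u comes from:
u = data(trainLen+1);
for t = 1:testLen
    x = (1-a)*x + a*tanh( Win*[1;u] + W*x );
    y = Wout*[1;u;x];
    Y(:,t) = y;
    u = y;                      % generative mode: feed the prediction back
    % u = data(trainLen+t+1);   % predictive mode: use the true next value instead
end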
And this is why the ESN seems to be doing a pretty good job. The target signal is pretty complex and yet in generative mode it does very well.
Finally, as far as the minimal implementation goes, I guess it refers to the size of the reservoir (1000), which apparently is pretty small.

Related

Linear regression -- Stuck in model comparison in Matlab after estimation?

I want to determine how well the estimated model fits future new data. To do this, a prediction error plot is often used. Basically, I want to compare the measured output and the model output. I am using the Least Mean Squares (LMS) algorithm as the equalization technique. Can somebody please explain the proper way to plot the comparison between the model and the measured data? If the estimates are close to the true values, then the curves should be very close to each other. Below is the code. u is the input to the equalizer, x is the noisy received signal, y is the output of the equalizer, and w is the equalizer weights. Should the graph be plotted using x and y*w? But x is noisy. I am confused, since the measured output x is noisy and the model output y*w is noise-free.
%% Channel and noise level
h = [0.9 0.3 -0.1]; % Channel
SNRr = 10; % Noise Level
%% Input/Output data
N = 1000; % Number of samples
Bits = 2; % Number of bits for modulation (2-bit for Binary modulation)
data = randi([0 1],1,N); % Random signal
d = real(pskmod(data,Bits)); % BPSK Modulated signal (desired/output)
r = filter(h,1,d); % Signal after passing through channel
x = awgn(r, SNRr); % Noisy Signal after channel (given/input)
%% LMS parameters
epoch = 10; % Number of epochs (training repetitions)
eta = 1e-3; % Learning rate / step size
order=10; % Order of the equalizer
U = zeros(1,order); % Input frame
W = zeros(1,order); % Initial Weights
%% Algorithm
for k = 1 : epoch
    for n = 1 : N
        U(1,2:end) = U(1,1:end-1); % Sliding window
        U(1,1) = x(n); % Present Input
        y = (W)*U'; % Calculating output of LMS
        e = d(n) - y; % Instantaneous error
        W = W + eta * e * U ; % Weight update rule of LMS
        J(k,n) = e * e'; % Instantaneous square error
    end
end
Let's go step by step:
First of all, when using some fitting method it is good practice to use the RMS error. To get this we have to find the error between input and output. As I understand it, x is the input for our model and y is the output. Furthermore, you already calculated the error between them, but you used it in the loop without saving it. Let's modify your code:
%% Algorithm
for k = 1 : epoch
    for n = 1 : N
        U(1,2:end) = U(1,1:end-1); % Sliding window
        U(1,1) = x(n); % Present Input
        y(n) = (W)*U'; % Calculating output of LMS
        e(n) = x(n) - y(n); % Instantaneous error
        W = W + eta * e(n) * U ; % Weight update rule of LMS
        J(k,n) = e(n) * (e(n))'; % Instantaneous square error
    end
end
Now e consists of errors at the last epoch. So we can use something like this:
rms(e)
Also I'd like to compare results using mean error and standard deviation:
mean(e)
std(e)
And some visualization:
histogram(e)
Second point: we can't use the compare function just for vectors - it works on dynamic system models. To use it here you would have to wrap your model as a dynamic system. But we can use functions such as goodnessOfFit instead. If you want something like the error at each step that takes all previous data points into account, you need a small mathematical workaround - calculate it at each point over the range [1:currentNumber].
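As an illustration of that running-error idea (my own sketch, using the error vector e from the modified loop above):
runningRms = zeros(size(e));
for n = 1:numel(e)
    runningRms(n) = sqrt(mean(e(1:n).^2)); % RMS error over all samples 1:n seen so far
end
plot(runningRms); xlabel('n'); ylabel('running RMS error');
% goodnessOfFit(y(:), x(:), 'NRMSE') from the System Identification Toolbox is
% another option for a single fit measure, if that toolbox is available.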
About the LMS method: there are built-in functions implementing LMS. Let's try to use them on your data sets:
alg = lms(0.001);
eqobj = lineareq(10,alg);
y1 = equalize(eqobj,x);
And let's look at the result:
plot(x)
hold on
plot(y1)
There are a lot of examples of this function in use: look here, for example.
I hope this was helpful for you!
The difference between the model output and the observed data is known as the residual.
The difference between the observed value of the dependent variable
(y) and the predicted value (ŷ) is called the residual (e). Each data
point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
Both the sum and the mean of the residuals are equal to zero. That is,
Σe = 0 and ē = 0.
A residual plot is a graph that shows the residuals on the vertical
axis and the independent variable on the horizontal axis. If the
points in a residual plot are randomly dispersed around the horizontal
axis, a linear regression model is appropriate for the data;
otherwise, a non-linear model is more appropriate.
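A small MATLAB sketch of such a plot (yMeasured and yModel are hypothetical vectors standing in for the measured output and the model output from the code above):
residual = yMeasured - yModel;            % observed value - predicted value
plot(residual, '.');
hold on
plot([1 numel(residual)], [0 0], 'k--');  % zero reference line
xlabel('sample'); ylabel('residual');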
Here is an example of residual plots from a model of mine. On the vertical axis is the difference between the output of the model and the measured value. On the horizontal axis is one of the independent variables used in the model.
We can see that most of the residuals are within 0.2 units which happens to be my tolerance for this model. I can therefore make a conclusion as to the worth of the model.
See here for a similar question.
Regarding your question about the lack of noise in your model's output: we are creating a linear model. There's the clue.

MATLAB's newrb for designing radial basis networks does not behave in accordance to the documentation. Why?

I'm trying to approximate various signals using radial basis networks. In particular, I make use of MATLAB's newrb.
My problem is that this function seems to behave incorrectly if I follow the description of newrb. As far as I can tell, it only makes sense if I transpose all arguments, despite what the documentation says.
The following example hopefully illustrates my problem.
I create one period of a sine wave with 100 samples. I would like to approximate this sine wave by means of a radial basis network with maximally two hidden neurons. I have one input vector (t) and one target vector (s). Hence, according to the documentation, I should call newrb with two column vectors. However, the approximation is too good. In fact, the mean squared error is 0 which can't be true using only two neurons. Additionally, the visualization with view(net) shows not only one but 100 inputs if I use column vectors.
In the example, the vectors corresponding to the "correct" (according to the documentation) function call are indicated by _doc, the ones corresponding to the "incorrect" call by _not_doc.
Can anybody explain this behavior?
% one period sine signal with
% carrier frequency = 1, sampling frequency = 100
Ts = 1 / 100;
t = 2 * pi * (0:Ts:1-Ts); % size(t) = 1 100
s = sin(t); % size(s) = 1 100
% design radial basis network
MSE_goal = 0.0; % mean squared error goal, default value
spread = 1.0; % spread of radial basis functions, default value
max_neurons = 2; % maximum number of neurons, custom value
DF = 25; % number of neurons to add between displays, default value
net_not_doc = newrb( t , s , MSE_goal, spread, max_neurons, DF ); % row vectors
net_doc = newrb( t', s', MSE_goal, spread, max_neurons, DF ); % column vectors
% simulate network
approx_not_doc = sim( net_not_doc, t );
approx_doc = sim( net_doc, t' );
% plot
figure;
plot( t, s, 'DisplayName', 'Sine' );
hold on;
plot( t, approx_not_doc, 'r:', 'DisplayName', 'Approximation_{not doc}');
hold on;
plot( t, approx_doc', 'g:', 'DisplayName', 'Approximation_{doc}');
grid on;
legend show;
% view neural networks
view(net_not_doc);
view(net_doc);
Because I had the same problem myself, I'll try to give an answer for anyone who will stumble upon the same post.
As I figured out, the problem is not the transposed vectors. You can use your data as it is, without transposing anything.
The fact that you train your RBF network with vector t and then simulate it with the same vector you trained it on is the reason why you get such a perfect approximation. You test your network with the same values you taught it.
If you really want to test your network you must choose a different vector for testing. In your example I used this:
% simulate network
t_test = 2 * pi * ((1-Ts)/2:Ts:3-Ts);
approx_not_doc = sim( net_not_doc, t_test );
Now when you plot your results, you can observe that the points that have the same value as in your training vector are almost flawless. The rest have an unknown target because of the small number of neurons (as you expected).
Plot of t_test with approx_not_doc.
Now if you add more neurons (in this example I used 100), you can see that the new network can predict, with the same test vector t_test, an unknown part of your function. Plot of t_test with approx_not_doc for 100 neurons. Of course, if you try with a different number of neurons and spread, your results will vary.
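A sketch of that experiment, under the same setup as the question's code (100 is just the neuron count I used for the second plot):
max_neurons_100 = 100;
net_100 = newrb( t, s, MSE_goal, spread, max_neurons_100, DF );
approx_100 = sim( net_100, t_test );
figure;
plot( t_test, sin(t_test), 'DisplayName', 'Sine' );
hold on;
plot( t_test, approx_100, 'r:', 'DisplayName', 'Approximation_{100 neurons}' );
legend show;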
Hope this will help anyone with the same problem.

BER graph MATLAB calibration

I'm doing a 16QAM system (transmitter, channel and receiver), and BER and PER curves of the results. However, I'm having some problems with noise at the receiver.
I'm running the system inside two loops: over all the Eb/No values and over all the packets. I sent 200 symbols and 1000 packets, but the problem still happens. I would like to check whether the result from this code is correct or not:
clear all
clc
numPkts=1000;
N = 200; % number of symbols
M = 16; % constellation size
k = log2(M); % bits per symbol
pv=4; %prefix length
% defining the real and imaginary PAM constellation
% for 16-QAM
alphaRe = [-(2*sqrt(M)/2-1):2:-1 1:2:2*sqrt(M)/2-1];
alphaIm = [-(2*sqrt(M)/2-1):2:-1 1:2:2*sqrt(M)/2-1];
k_16QAM = 1/sqrt(10);
Eb_N0_dB = [0:15]; % multiple Es/N0 values
Es_N0_dB = Eb_N0_dB + 10*log10(k);
erTot=zeros(1,length(Eb_N0_dB));
% Mapping for binary <--> Gray code conversion
ref = [0:k-1];
map = bitxor(ref,floor(ref/2));
[tt ind] = sort(map);
for ii = 1:length(Eb_N0_dB)
    for pktX=1:numPkts
        % symbol generation
        % ------------------
        ipBit = rand(1,N*k,1)>0.5; % random 1's and 0's
        ipBitReshape = reshape(ipBit,k,N).';
        bin2DecMatrix = ones(N,1)*(2.^[(k/2-1):-1:0]) ; % conversion from binary to decimal
        % real
        ipBitRe = ipBitReshape(:,[1:k/2]);
        ipDecRe = sum(ipBitRe.*bin2DecMatrix,2);
        ipGrayDecRe = bitxor(ipDecRe,floor(ipDecRe/2));
        % imaginary
        ipBitIm = ipBitReshape(:,[k/2+1:k]);
        ipDecIm = sum(ipBitIm.*bin2DecMatrix,2);
        ipGrayDecIm = bitxor(ipDecIm,floor(ipDecIm/2));
        % mapping the Gray coded symbols into constellation
        modRe = alphaRe(ipGrayDecRe+1);
        modIm = alphaIm(ipGrayDecIm+1);
        % complex constellation
        mod = modRe + j*modIm;
        s1 = k_16QAM*mod; % normalization of transmit power to one
        s=[s1(length(s1)-pv+1:end) s1]; %add prefix
        % noise
        % -----
        EsNo=10^(Es_N0_dB(ii)/10);
        stanDevNoise=sqrt((1)/(2*EsNo));
        n =stanDevNoise *[randn(1,length(s)) + j*randn(1,length(s))]; % white Gaussian noise, 0dB variance
        h=(1/sqrt(2))*(randn+j*randn);
        y1= conv(s,h) + n; % additive white Gaussian noise
        %removes prefix
        y1(1:pv) = [];
        y=y1/h;
        % demodulation
        % ------------
        y_re = real(y)/k_16QAM; % real part
        y_im = imag(y)/k_16QAM; % imaginary part
        % rounding to the nearest alphabet
        ipHatRe = 2*floor(y_re/2)+1;
        ipHatRe(find(ipHatRe>max(alphaRe))) = max(alphaRe);
        ipHatRe(find(ipHatRe<min(alphaRe))) = min(alphaRe);
        ipHatIm = 2*floor(y_im/2)+1;
        ipHatIm(find(ipHatIm>max(alphaIm))) = max(alphaIm);
        ipHatIm(find(ipHatIm<min(alphaIm))) = min(alphaIm);
        % Constellation to Decimal conversion
        ipDecHatRe = ind(floor((ipHatRe+4)/2+1))-1; % LUT based
        ipDecHatIm = ind(floor((ipHatIm+4)/2+1))-1; % LUT based
        % converting to binary string
        ipBinHatRe = dec2bin(ipDecHatRe,k/2);
        ipBinHatIm = dec2bin(ipDecHatIm,k/2);
        % converting binary string to number
        ipBinHatRe = ipBinHatRe.';
        ipBinHatRe = ipBinHatRe(1:end).';
        ipBinHatRe = reshape(str2num(ipBinHatRe).',k/2,N).' ;
        ipBinHatIm = ipBinHatIm.';
        ipBinHatIm = ipBinHatIm(1:end).';
        ipBinHatIm = reshape(str2num(ipBinHatIm).',k/2,N).' ;
        % counting errors for real and imaginary
        nBitErr(pktX) = size(find([ipBitRe- ipBinHatRe]),1) + size(find([ipBitIm - ipBinHatIm]),1) ;
    end
    erTot(ii)=erTot(ii)+sum(nBitErr); %total errors in all packets
    simBer(ii)=(erTot(ii)/(N*k*numPkts)); %bit error rate
    totPktErRate(ii)=(erTot(ii)/(numPkts));
end
theoryBer = (1/k)*3/2*erfc(sqrt(k*0.1*(10.^(Eb_N0_dB/10))));
close all; figure
semilogy(Eb_N0_dB,theoryBer,'bs-','LineWidth',2);
hold on
semilogy(Eb_N0_dB,simBer,'mx-','LineWidth',2);
axis([0 15 10^-5 1])
grid on
legend('theory', 'simulation');
xlabel('Eb/No, dB')
ylabel('Bit Error Rate')
title('Bit error probability curve for 16-QAM modulation')
Thanks!
The code provided makes the following assumptions:
16-QAM modulation using Gray-coding bit mapping
a flat slow/block Rayleigh fading channel model.
coherent decoding under perfect channel state information estimation
Due to its similarity with the Additive-White-Gaussian-Noise (AWGN) channel, a logical first step in understanding and calibrating the system performance under the assumptions stated above is to evaluate its performance without fading (i.e. substituting the channel model with an AWGN channel by setting h=1 in the provided code).
AWGN channel
You may want to verify the calibration of the Symbol-Error-Rate (SER) performance, as this can have a large impact on the Bit-Error-Rate (BER) performance, and SER curves are readily available for coherent decoding of an uncoded 16-QAM constellation (see e.g. dsplog, these lecture slides, this book, etc.). Those references also include the following approximation to the SER of 16-QAM:
1.5*erfc(sqrt(EsN0/10))
where EsN0 = 10.^(0.1*EsN0_dB).
Note that results may be equivalently provided in terms of either Es/N0 (the average energy per symbol) or Eb/N0 (the average energy per bit). For a k-bit signal constellation (constellation size of 2^k), the relationship between Es/N0 and Eb/N0 is given as
Es/N0 = k*Eb/N0
Thus for 16-QAM, Es/N0 = 4*Eb/N0 (or, in dB, Es/N0 = Eb/N0 + 6dB).
For a Gray coded scheme, a BER approximation for sufficiently high Eb/N0 can then be obtained from the fact that a symbol error translates into 1 bit in error (out of the k-bits in the symbol) most of the time, thus BER ~ SER/k (or again for 16-QAM: BER ~ SER/4).
Es/N0 (dB) Eb/N0 (dB) SER BER approx
15 9 1.8e-2 4.5e-3
16 10 7.0e-3 1.8e-3
18 12 5.5e-4 1.4e-4
20 14 1.2e-5 3.0e-6
25 19 2.7e-15 6.7e-16
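The values in this table can be reproduced directly from the SER approximation quoted above and the BER ~ SER/4 rule for Gray-coded 16-QAM; a short check:
EsN0_dB = [15 16 18 20 25];
EsN0 = 10.^(0.1*EsN0_dB);
SER = 1.5*erfc(sqrt(EsN0/10));   % 16-QAM SER approximation
BER_approx = SER/4;              % roughly one bit error per symbol error with Gray coding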
As a side note, the confidence interval of simulation results using 2,000,000 symbols at SERs below approximately 10^-5 can start to be quite significant. As an illustration, the following graph shows the SER of 16-QAM in blue, with the expected 95% confidence interval of a 2,000,000 symbols simulation in red:
Rayleigh block fading channel
Once performance calibration has been established for the AWGN channel, we can get back to the Rayleigh block fading channel used in the posted code.
Assuming perfect channel state information estimation at the receiver, and if there were no noise, it would be possible to scale the received signal back exactly onto the original transmitted symbols using the transformation:
y = y1/h;
When noise is present, this transformation unfortunately also scales the noise. Fortunately, the noise remains white and Gaussian such that the basic derivation of AWGN channel equations can be reused with some work.
Over independent packets, the statistical distribution of the scaling abs(h) follows a Rayleigh distribution (with parameter sigma^2 = 1/2). Thus, to get the average effect of this scaling on the SER, it is possible to compute the weighted sum (where the weight is the probability density function of the Rayleigh distribution) of the effects over the range of possible scaling values, using the integral:
SER_Rayleigh(Es/N0) = integral from 0 to infinity of 2*x*exp(-x^2) * SER_AWGN(x^2 * Es/N0) dx
This can be done numerically with MATLAB using:
function SER = AwgnSer(EsN0)
    SER = 1.5*erfc(sqrt(0.1*EsN0));
end
function f = WeightedAwgnSer(EsN0, x)
    weight = 2*x.*exp(-x.*conj(x));
    f = weight.*AwgnSer(EsN0*x.*conj(x));
end
function SER = BlockRayleighFadingSer(EsN0)
    for ii=1:length(EsN0)
        SER(ii) = quadgk(@(x) WeightedAwgnSer(EsN0(ii),x), 0, inf);
    end
end
A similar derivation can be obtained for the BER:
function BER = AwgnBer(EsN0)
    x = sqrt(0.1*EsN0);
    q1 = 0.5*erfc(x);
    q3 = 0.5*erfc(3*x);
    q5 = 0.5*erfc(5*x);
    BER = (12*q1+8*q3-4*q5 - q1.*(q1+q3-2*q5)+(q3-q5).*q5)/16;
end
function f = WeightedAwgnBer(EsN0, x)
    weight = 2*x.*exp(-x.*conj(x));
    f = weight.*AwgnBer(EsN0*x.*conj(x));
end
function BER = BlockRayleighFadingBer(EsN0)
    for ii=1:length(EsN0)
        BER(ii) = quadgk(@(x) WeightedAwgnBer(EsN0(ii),x), 0, inf);
    end
end
Note that I've used an exact formula for the BER, since the weighted average tends to be affected by the low signal-to-noise-ratio region where the approximation is not very good. It does not make a huge difference on the curve (~0.3dB at Eb/N0=10dB), but it is not something I want to worry about when calibrating performance curves.
This yields the following performance curves:
Other considerations
Decoding performance can be affected by a number of other factors which are beyond the scope of this answer. The following thus only briefly touches on a few common ones and links to external references which may be used for additional information.
The decoder in the posted code uses explicit knowledge of the fading effect (as evidenced by the line y=y1/h;). This is generally not the case, and the fading must first be estimated. The estimation of the channel effect at the receiver is beyond the scope of this answer, but generally imperfect estimation results in some performance loss. Performance curves with perfect knowledge are often used as a practical benchmark against which to compare performance under imperfect channel estimation.
Channel coding is often done to improve system performance. Common benchmarks used for coded modulation over the AWGN channel are:
Shannon's channel capacity
Union upper bound (e.g. these lecture notes), and multiple other bounds found in the research literature
Uncoded modulation performance (which we derived here)
Similarly for coded modulation over flat block Rayleigh fading channel, the following benchmarks are commonly used:
Outage probability (see section 5.4.1 of this book)
Uncoded modulation performance (which we derived here)

Proper way to add noise to signal

In many areas I have found that while adding noise, we mention some specification like zero mean and variance. I need to add AWGN, colored noise, and uniform noise of varying SNR in dB. The following code shows how I generated and added noise. I am aware of the function awgn() but it is kind of a black box, without knowing how the noise is actually added. So, can somebody please explain the correct way to generate and add noise? Thank you.
SNR = [-10:5:30]; % in dB
snr = 10 .^ (0.1 .* SNR);
for I = 1:length(snr)
    noise = 1 / sqrt(2) * (randn(1, N) + 1i * randn(1, N));
    u = y + noise .* snr(I);
end
I'm adding another answer since it strikes me that Steven's is not quite correct and Horchler's suggestion to look inside function awgn is a good one.
Both MATLAB and Octave (in the communications toolbox) have a function awgn that adds (white Gaussian) noise to attain a desired signal-to-noise power level; the following is the relevant portion of the code (from the Octave function):
if (meas == 1) % <-- if using signal power to determine appropriate noise power
    p = sum( abs( x(:)) .^ 2) / length(x(:));
    if (strcmp(type,"dB"))
        p = 10 * log10(p);
    endif
endif
if (strcmp(type,"linear"))
    np = p / snr;
else % <-- in dB
    np = p - snr;
endif
y = x + wgn (m, n, np, 1, seed, type, out);
As you can see by the way p (the power of the input data) is computed, the answer from Steven does not appear to be quite right.
You can ask the function to compute the total power of your data array and combine that with the desired s/n value you provide to compute the appropriate power level of the added noise. You do this by passing the string "measured" among the optional inputs, like this (see here for the Octave documentation or here for the MATLAB documentation):
y = awgn (x, snr, 'measured')
This leads ultimately to meas=1 and so meas==1 being true in the code above. The function awgn then uses the signal passed to it to compute the signal power, and from this and the desired s/n it then computes the appropriate power level for the added noise.
As the documentation further explains
By default the snr and pwr are assumed to be in dB and dBW
respectively. This default behavior can be chosen with type set to
"dB". In the case where type is set to "linear", pwr is assumed to be
in Watts and snr is a ratio.
This means you can pass a negative or 0 dB snr value. The result will also depend then on other options you pass, such as the string "measured".
For the MATLAB case I suggest reading the documentation; it explains how to use the function awgn in different scenarios. Note that the implementations in Octave and MATLAB are not identical: the computation of noise power should be the same, but there may be different options.
And here is the relevant part from wgn (called above by awgn):
if (strcmp(type,"dBW"))
np = 10 ^ (p/10);
elseif (strcmp(type,"dBm"))
np = 10 ^((p - 30)/10);
elseif (strcmp(type,"linear"))
np = p;
endif
if(!isempty(seed))
randn("state",seed);
endif
if (strcmp(out,"complex"))
y = (sqrt(imp*np/2))*(randn(m,n)+1i*randn(m,n)); % imp=1 assuming impedance is 1 Ohm
else
y = (sqrt(imp*np))*randn(m,n);
endif
If you want to check the power of your noise (np), the awgn and wgn functions assume the following relationships hold:
np = var(y,1); % linear scale
np = 10*log10(np); % in dB
where var(...,1) is the population variance for the noise y.
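A quick empirical check of these relationships (assuming the Communications Toolbox awgn is available):
x = randn(1, 1e5);                       % arbitrary test signal
y = awgn(x, 10, 'measured');             % request a 10 dB signal-to-noise ratio
noise = y - x;
np = var(noise, 1);                      % noise power, linear scale
measuredSnr_dB = 10*log10(var(x,1)/np)   % should come out close to 10 dB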
Most answers here forget that SNR is specified in decibels. Therefore, you shouldn't encounter a 'division by 0' error, because you should really divide by 10^(targetSNR/10), which is never negative nor zero for real targetSNR.
This 'division by 0' problem could also easily be avoided by adding a condition that checks whether targetSNR is 0 and only doing these steps if it is not 0. When your target SNR is 0, it means the output is pure noise.
function out_signal = addAWGN(signal, targetSNR)
    sigLength = length(signal); % length
    awgnNoise = randn(size(signal)); % original noise
    pwrSig = sqrt(sum(signal.^2))/sigLength; % signal power
    pwrNoise = sqrt(sum(awgnNoise.^2))/sigLength; % noise power
    if targetSNR ~= 0
        scaleFactor = (pwrSig/pwrNoise)/targetSNR; % find scale factor
        awgnNoise = scaleFactor*awgnNoise;
        out_signal = signal + awgnNoise; % add noise
    else
        out_signal = awgnNoise; % noise only
    end
You can use randn() to generate a noise vector 'awgnNoise' of the length you want. Then, given a specified SNR value, calculate the power of the original signal and the power of the noise vector 'awgnNoise'.
Get the right amplitude scaling factor for the noise vector and just scale it.
The following code is an example to corrupt signal with white noise, assuming input signal is 1D and real valued.
function out_signal = addAWGN(signal, targetSNR)
    sigLength = length(signal); % length
    awgnNoise = randn(size(signal)); % original noise
    pwrSig = sqrt(sum(signal.^2))/sigLength; % signal power
    pwrNoise = sqrt(sum(awgnNoise.^2))/sigLength; % noise power
    scaleFactor = (pwrSig/pwrNoise)/targetSNR; % find scale factor
    awgnNoise = scaleFactor*awgnNoise;
    out_signal = signal + awgnNoise; % add noise
Be careful about the sqrt(2) factor when you deal with complex signal, if you want to generate the real and imag part separately.
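For example, a hedged sketch of generating complex white noise with unit total variance:
N = 1e5;
noiseC = (randn(1,N) + 1i*randn(1,N)) / sqrt(2); % 1/sqrt(2) splits the unit power between real and imaginary parts
% var(noiseC,1) is ~1, while var(real(noiseC),1) and var(imag(noiseC),1) are each ~0.5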

How can we produce kappa and delta in the following model using Matlab?

I have the following stochastic model describing the evolution of a process (Y) in space and time. Ds and Dt are the domains in space (2D, with x and y axes) and time (1D, with t axis). This model is usually known as a mixed-effects model or a components-of-variation model.
I am currently developing Y as follow:
%# Time parameters
T=1:1:20; % input
nT=numel(T);
%# Grid and model parameters
nRow=100;
nCol=100;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:1:nCol,1:1:nRow,T);
xPower=0.1;
tPower=1;
noisePower=1;
detConstant=1;
deterministic_mu = detConstant.*(((Grid.Nt).^tPower)./((Grid.Nx).^xPower));
beta_s = randn(nRow,nCol); % mean-zero random effect representing location specific variability common to all times
gammaTemp = randn(nT,1);
for t = 1:nT
    gamma_t(:,:,t) = repmat(gammaTemp(t),nRow,nCol); % mean-zero random effect representing time specific variability common to all locations
end
var = 0.1; % noise has variance = 0.1
for t = 1:nT
    kappa_st(:,:,t) = sqrt(var)*randn(nRow,nCol);
end
for t = 1:nT
    Y(:,:,t) = deterministic_mu(:,:,t) + beta_s + gamma_t(:,:,t) + kappa_st(:,:,t);
end
My questions are:
How do I produce delta in the expression for Y, and what is the difference between kappa and delta?
Can you help explain, with some illustration in Matlab, whether I am producing Y correctly?
Please let me know if you need some more information/explanation. Thanks.
First, I rewrote your code to make it a bit more efficient. I see you generate linearly-spaced grids for x,y and t and carry out the computation for all points in this grid. This approach has severe limitations on the maximum attainable grid resolution, since the 3D grid (and all variables defined with it) can consume an awfully large amount of memory if the resolution goes up. If the model you're implementing will grow in complexity and size (it often does), I'd suggest you throw this all into a function accepting matrix/vector inputs for s and t, which will be a bit more flexible in this regard -- processing "blocks" of data that will otherwise not fit in memory will be a lot easier that way.
Then, I generated the delta_st term with rand instead of randn since the noise should be "white". Now I'm very unsure about that last one, and I didn't have time to read through the paper you linked to -- can you tell me on what pages I can find the relevant sections for delta_st?
Now, the code:
%# Time parameters
T = 1:1:20; % input
nT = numel(T);
%# Grid and model parameters
nRow = 100;
nCol = 100;
% noise has variance = 0.1
var = 0.1;
xPower = 0.1;
tPower = 1;
noisePower = 1;
detConstant = 1;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:nCol,1:nRow,T);
% deterministic mean
deterministic_mu = detConstant .* Grid.Nt.^tPower ./ Grid.Nx.^xPower;
% mean-zero random effect representing location specific
% variability common to all times
beta_s = repmat(randn(nRow,nCol), [1 1 nT]);
% mean-zero random effect representing time specific
% variability common to all locations
gamma_t = bsxfun(@times, ones(nRow,nCol,nT), randn(1, 1, nT));
% mean zero random effect capturing the spatio-temporal
% interaction not found in the larger-scale deterministic mu
kappa_st = sqrt(var)*randn(nRow,nCol,nT);
% mean zero random effect representing the micro-scale
% spatio-temporal variability that is modelled by white
% noise (i.i.d. at different time steps) in Ds·Dt
delta_st = noisePower * (rand(nRow,nCol,nT)-0.5);
% Final result:
Y = deterministic_mu + beta_s + gamma_t + kappa_st + delta_st;
Your implementation samples beta, gamma and kappa as if they are white (e.g. their values at each (x,y,t) are independent). The descriptions of the terms suggest that this is not meant to be the case. It looks like delta is supposed to capture the white noise, while the other terms capture the correlations over their respective domains. e.g. there is a non-zero correlation between gamma(t_1) and gamma(t_1+1).
If you wish to model gamma as a stationary Gaussian Markov process with variance var_g and correlation cor_g between gamma(t) and gamma(t+1), you can use something like
gamma_t = nan( nT, 1 );
gamma_t(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt( (1-K_g^2)*var_g );
for t = 2:nT,
    gamma_t(t) = K_g*gamma_t(t-1) + K_w*randn();
end
gamma_t = reshape( gamma_t, [ 1 1 nT ] );
The formulas I've used for the gains K_g and K_w in the above code (and the initialization of gamma_t(1)) produce the desired stationary variance sigma^2_0 = var_g and one-step covariance sigma^2_1 = cor_g, i.e. K_g = sigma^2_1/sigma^2_0 and K_w = sqrt((1 - K_g^2)*sigma^2_0).
Note that the implementation above assumes that later you will sum the terms using bsxfun to do the "repmat" for you:
Y = bsxfun( @plus, deterministic_mu + kappa_st + delta_st, beta_s );
Y = bsxfun( @plus, Y, gamma_t );
Note that I haven't tested the above code, so you should confirm by sampling that it does actually produce a zero-mean noise process of the specified variance and covariance between adjacent samples. To sample beta, the same procedure can be extended into two dimensions (a possible sketch follows below), but the principles are essentially the same. I suspect kappa should similarly be modeled as a Markov Gaussian process, but in all three dimensions and with a lower variance, to represent higher-order effects not captured in mu, beta and gamma.
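One possible sketch of that two-dimensional extension (my own illustration, not from the original model: it builds a separable AR(1) field by filtering white noise along each dimension, so the first rows/columns still carry a start-up transient and the variance is only approximately stationary; var_b and rho are assumed example values):
var_b = 1;       % desired (approximate) stationary variance of beta
rho   = 0.9;     % assumed one-step correlation along both x and y
w = randn(nRow, nCol);                                    % white noise
beta_s = filter(sqrt(1-rho^2), [1 -rho], w, [], 1);       % AR(1) down each column
beta_s = filter(sqrt(1-rho^2), [1 -rho], beta_s, [], 2);  % AR(1) along each row
beta_s = sqrt(var_b) * beta_s;                            % rescale to the desired variance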
Delta is supposed to be zero mean stationary white noise. Assuming it to be Gaussian with variance noisePower one would sample it using
delta_st = sqrt(noisePower)*randn( [ nRows nCols nT ] );