BER graph MATLAB calibration

I'm building a 16-QAM system (transmitter, channel and receiver) and plotting BER and PER curves from the results. However, I'm having some problems with the noise at the receiver.
I run the system inside two loops: over all the Eb/No values and over all the packets. I send 200 symbols per packet and 1000 packets, but the problem persists. I would like to check whether the result from this code is correct:
clear all
clc
numPkts=1000;
N = 200; % number of symbols
M = 16; % constellation size
k = log2(M); % bits per symbol
pv=4; %prefix length
% defining the real and imaginary PAM constellation
% for 16-QAM
alphaRe = [-(2*sqrt(M)/2-1):2:-1 1:2:2*sqrt(M)/2-1];
alphaIm = [-(2*sqrt(M)/2-1):2:-1 1:2:2*sqrt(M)/2-1];
k_16QAM = 1/sqrt(10);
Eb_N0_dB = [0:15]; % multiple Eb/N0 values
Es_N0_dB = Eb_N0_dB + 10*log10(k);
erTot=zeros(1,length(Eb_N0_dB));
% Mapping for binary <--> Gray code conversion
ref = [0:k-1];
map = bitxor(ref,floor(ref/2));
[tt ind] = sort(map);
for ii = 1:length(Eb_N0_dB)
for pktX=1:numPkts
% symbol generation
% ------------------
ipBit = rand(1,N*k,1)>0.5; % random 1's and 0's
ipBitReshape = reshape(ipBit,k,N).';
bin2DecMatrix = ones(N,1)*(2.^[(k/2-1):-1:0]) ; % conversion from binary to decimal
% real
ipBitRe = ipBitReshape(:,[1:k/2]);
ipDecRe = sum(ipBitRe.*bin2DecMatrix,2);
ipGrayDecRe = bitxor(ipDecRe,floor(ipDecRe/2));
% imaginary
ipBitIm = ipBitReshape(:,[k/2+1:k]);
ipDecIm = sum(ipBitIm.*bin2DecMatrix,2);
ipGrayDecIm = bitxor(ipDecIm,floor(ipDecIm/2));
% mapping the Gray coded symbols into constellation
modRe = alphaRe(ipGrayDecRe+1);
modIm = alphaIm(ipGrayDecIm+1);
% complex constellation
mod = modRe + j*modIm;
s1 = k_16QAM*mod; % normalization of transmit power to one
s=[s1(length(s1)-pv+1:end) s1]; %add prefix
% noise
% -----
EsNo=10^(Es_N0_dB(ii)/10);
stanDevNoise=sqrt((1)/(2*EsNo));
n = stanDevNoise*(randn(1,length(s)) + j*randn(1,length(s))); % white Gaussian noise
h=(1/sqrt(2))*(randn+j*randn);
y1 = conv(s,h) + n; % channel plus additive white Gaussian noise
%removes prefix
y1(1:pv) = [];
y=y1/h;
% demodulation
% ------------
y_re = real(y)/k_16QAM; % real part
y_im = imag(y)/k_16QAM; % imaginary part
% rounding to the nearest alphabet
ipHatRe = 2*floor(y_re/2)+1;
ipHatRe(find(ipHatRe>max(alphaRe))) = max(alphaRe);
ipHatRe(find(ipHatRe<min(alphaRe))) = min(alphaRe);
ipHatIm = 2*floor(y_im/2)+1;
ipHatIm(find(ipHatIm>max(alphaIm))) = max(alphaIm);
ipHatIm(find(ipHatIm<min(alphaIm))) = min(alphaIm);
% Constellation to Decimal conversion
ipDecHatRe = ind(floor((ipHatRe+4)/2+1))-1; % LUT based
ipDecHatIm = ind(floor((ipHatIm+4)/2+1))-1; % LUT based
% converting to binary string
ipBinHatRe = dec2bin(ipDecHatRe,k/2);
ipBinHatIm = dec2bin(ipDecHatIm,k/2);
% converting binary string to number
ipBinHatRe = ipBinHatRe.';
ipBinHatRe = ipBinHatRe(1:end).';
ipBinHatRe = reshape(str2num(ipBinHatRe).',k/2,N).' ;
ipBinHatIm = ipBinHatIm.';
ipBinHatIm = ipBinHatIm(1:end).';
ipBinHatIm = reshape(str2num(ipBinHatIm).',k/2,N).' ;
% counting errors for real and imaginary
nBitErr(pktX) = size(find([ipBitRe- ipBinHatRe]),1) + size(find([ipBitIm - ipBinHatIm]),1) ;
end
erTot(ii)=erTot(ii)+sum(nBitErr); %total errors in all packets
simBer(ii)=(erTot(ii)/(N*k*numPkts)); %bit error rate
totPktErRate(ii)=(erTot(ii)/(numPkts));
end
theoryBer = (1/k)*3/2*erfc(sqrt(k*0.1*(10.^(Eb_N0_dB/10))));
close all; figure
semilogy(Eb_N0_dB,theoryBer,'bs-','LineWidth',2);
hold on
semilogy(Eb_N0_dB,simBer,'mx-','LineWidth',2);
axis([0 15 10^-5 1])
grid on
legend('theory', 'simulation');
xlabel('Eb/No, dB')
ylabel('Bit Error Rate')
title('Bit error probability curve for 16-QAM modulation')
Thanks!

The code provided makes the following assumptions:
16-QAM modulation using Gray-coding bit mapping
a flat slow/block Rayleigh fading channel model.
coherent decoding under perfect channel state information estimation
Due to its similarity to the Additive White Gaussian Noise (AWGN) channel, a logical first step in understanding and calibrating the system performance under the assumptions stated above is to evaluate its performance without fading (i.e. substituting the channel model with an AWGN channel by setting h=1 in the provided code).
AWGN channel
You may want to verify the calibration of the Symbol-Error-Rate (SER) performance, as this can have a large impact on the BER performance, and SER curves are readily available for coherent decoding of an uncoded 16-QAM constellation (see e.g. dsplog, these lecture slides, this book, etc.). Those references also include the following approximation to the SER of 16-QAM:
1.5*erfc(sqrt(EsN0/10))
where EsN0 = 10.^(0.1*EsN0_dB).
Note that results may be equivalently provided in terms of either Es/N0 (the average energy per symbol) or Eb/N0 (the average energy per bit). For a k-bit signal constellation (constellation size of 2^k), the relationship between Es/N0 and Eb/N0 is given as
Es/N0 = k*Eb/N0
Thus for 16-QAM, Es/N0 = 4*Eb/N0 (or Es/N0|dB = Eb/N0|dB + 6dB).
For a Gray coded scheme, a BER approximation for sufficiently high Eb/N0 can then be obtained from the fact that a symbol error translates into 1 bit in error (out of the k-bits in the symbol) most of the time, thus BER ~ SER/k (or again for 16-QAM: BER ~ SER/4).
Es/N0 (dB)   Eb/N0 (dB)   SER       BER approx
    15           9        1.8e-2    4.5e-3
    16          10        7.0e-3    1.8e-3
    18          12        5.5e-4    1.4e-4
    20          14        1.2e-5    3.0e-6
    25          19        2.7e-15   6.7e-16
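As a quick sanity check, these values can be reproduced with a few lines of MATLAB (a sketch based on the SER approximation above; variable names are illustrative):
EbN0_dB = [9 10 12 14 19];
EsN0    = 10.^((EbN0_dB + 10*log10(4))/10); % Es/N0 = 4*Eb/N0
SER     = 1.5*erfc(sqrt(EsN0/10));          % SER approximation for 16-QAM
BERapx  = SER/4;                            % ~1 bit error per symbol error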
As a side note, the confidence interval of simulation results using 2,000,000 symbols at SERs below approximately 10^-5 can start to be quite significant. As an illustration, the following graph shows the SER of 16-QAM in blue, with the expected 95% confidence interval of a 2,000,000-symbol simulation in red:
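To get a feel for those numbers, a normal-approximation 95% confidence interval for a simulated SER can be sketched as follows (the 2,000,000-symbol count and the 1e-5 operating point are the ones quoted above):
nSym   = 2e6;                               % simulated symbols
serHat = 1e-5;                              % example SER estimate
ci95   = 1.96*sqrt(serHat*(1-serHat)/nSym); % ~ +/-4.4e-6, nearly half the estimate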
Rayleigh block fading channel
Once performance calibration has been established for the AWGN channel, we can get back to the Rayleigh block fading channel used in the posted code.
Assuming perfect channel state information estimation at the receiver, and if there were no noise, it would be possible to scale the received signal back exactly onto the original transmitted symbols using the transformation:
y = y1/h;
When noise is present, this transformation unfortunately also scales the noise. Fortunately, the noise remains white and Gaussian, such that the basic derivation of the AWGN channel equations can be reused with some work.
Over independent packets, the statistical distribution of the scaling abs(h) follows a Rayleigh distribution (with parameter sigma2=1/2). Thus, to get the average effect of this scaling on SER, it is possible to compute the weighted sum (where the weight is the probability density function of the Rayleigh distribution) of the effects over the range of possible scaling values, using the integral
SER_Rayleigh(Es/N0) = integral from 0 to infinity of 2x*exp(-x^2) * SER_AWGN(x^2*Es/N0) dx
This can be done numerically with MATLAB using:
function SER = AwgnSer(EsN0)
% SER approximation for 16-QAM over AWGN
SER = 1.5*erfc(sqrt(0.1*EsN0));
end
function f = WeightedAwgnSer(EsN0, x)
% integrand: Rayleigh pdf (sigma^2 = 1/2) times the AWGN SER at the scaled SNR
weight = 2*x.*exp(-x.^2);
f = weight.*AwgnSer(EsN0*x.^2);
end
function SER = BlockRayleighFadingSer(EsN0)
SER = zeros(size(EsN0));
for ii = 1:length(EsN0)
SER(ii) = integral(@(x) WeightedAwgnSer(EsN0(ii),x), 0, Inf);
end
end
A similar derivation can be obtained for the BER:
function BER = AwgnBer(EsN0)
% exact BER for Gray-coded 16-QAM over AWGN
x = sqrt(0.1*EsN0);
q1 = 0.5*erfc(x);
q3 = 0.5*erfc(3*x);
q5 = 0.5*erfc(5*x);
BER = (12*q1+8*q3-4*q5 - q1.*(q1+q3-2*q5)+(q3-q5).*q5)/16;
end
function f = WeightedAwgnBer(EsN0, x)
weight = 2*x.*exp(-x.^2);
f = weight.*AwgnBer(EsN0*x.^2);
end
function BER = BlockRayleighFadingBer(EsN0)
BER = zeros(size(EsN0));
for ii = 1:length(EsN0)
BER(ii) = integral(@(x) WeightedAwgnBer(EsN0(ii),x), 0, Inf);
end
end
Note that I've used an exact formula for the BER, since the weighted average tends to be affected by the low signal-to-noise-ratio region where the approximation is not very good. It does not make a huge difference on the curve (~0.3dB at Eb/N0=10dB), but it is not something I want to worry about when calibrating performance curves.
This yields the following performance curves:
Other considerations
Decoding performance can be affected by a number of other factors which are beyond the scope of this answer. The following thus only briefly touches on a few common ones and links to external references which may be used for additional information.
The decoder in the posted code uses explicit knowledge of the fading effect (as evidenced by the line y=y1/h;). This is generally not the case, and the fading must first be estimated. The estimation of the channel effect at the receiver is beyond the scope of this answer, but generally imperfect estimation results in some performance loss. Performance curves under perfect channel knowledge are often used as a practical benchmark against which to compare performance under imperfect channel estimation.
Channel coding is often done to improve system performance. Common benchmarks used for coded modulation over the AWGN channel are:
Shannon's channel capacity
Union upper bound (e.g. these lecture notes), and multiple other bounds found in the research literature
Uncoded modulation performance (which we derived here)
Similarly for coded modulation over flat block Rayleigh fading channel, the following benchmarks are commonly used:
Outage probability (see section 5.4.1 of this book)
Uncoded modulation performance (which we derived here)

Related

BER result in MATLAB

I built a PAM-2 modulation, then applied pulse shaping with a half sine (matched filter).
Then I send it through the AWGN channel.
At the end I do downsampling and demodulation.
But I have a problem with plotting the BER. I don't understand what I'm doing wrong:
clc;
clear;
N=1e4;
N2 = 1e2;
M = 2;
range = 0:10;
error = zeros(1,length(range)); %BER
%
% half sine
Rc = 1e3; % Chip rate
T = 1/Rc; % inverse of chip rate
Tc = 0.5* T;
Fs = 2e3; % sampling frequency
dt = 1/Fs;
over = Fs/Rc; % sampling factor
sps = 10;
time = 0:dt/sps:2*T;
half_Sine = sin(pi*time/(2*T)).^3;
%% BER
for i = 1:length(range)
for n = 1:N2
% Modulation
x=randi([0 M-1],N,1);
h_mod = pammod(x,M);
over_data=upsample(h_mod,over);
txSig = conv(over_data,half_Sine, 'same');
% AWGN
Ps = mean(((txSig)).^2);
Sigma = sqrt(Ps * 10^(-range(i)/10) / 2);
Noise = randn(length(txSig), 1) * Sigma;
rx_SIG = Noise + txSig;
% Downsample
down = rx_SIG(1:over:end);
% Demodulation
hDemod = pamdemod(down,M);
% Errors
error(i) = error(i)+...
sum(hDemod~=x) / length(hDemod);
end
BER = error/n;
end
figure(1);
grid on
semilogy(range,BER);
title('BER');
Update: I need the BER curve to span from 10^-1 down to 10^-6.
1.- Your script up and running :
There are many code lines turned into comments that I used while getting the starting script to run. I have left them in; they should be used only while debugging.
close all;clear all;clc;
N=1024;
M = 2;
% pulse : half sine
Rc = 1e3; % [b/s] chip rate
T = 1/Rc; % [s/b] inverse of chip rate
Tc = 0.5* T;
Fs = 2e3; % [Hz] sampling frequency
dt = 1/Fs; % [s]
ov1 = Fs/Rc; % sampling factor
sps = 10;
dt2=dt/sps
% single pulse time reference
t = [0:dt2:2*T]; % [s]
% signals usually have a known preable
% a heading known sequence that helps receivers
% extract with relative ease when pulse peaks take place
% the sync preamble has to be long enough to acquire
% the sampling interval.
% For simplicity here I just make sure that the first bit of the signal is always 1
nsync=64 % length sync header
x=[ones(1,nsync) randi([0 M-1],1,N-nsync)]; % signal : data
xm=reshape(x([nsync+1:end]),[(N-nsync)/8 8])
L1=sum(repmat(2.^[7:-1:0],size(xm,1),1).*xm,2); % string to check received bytes against, to measure BER
over_data = pammod(x,M); % signal : PAM symbols
% err1 = zeros(1,length(x)); %BER
% single pulse
pulse_half_Sine = sin(pi*t/(2*T)).^3;
figure;plot(t,pulse_half_Sine)
grid on;xlabel('t');title('single pulse')
% A=[.0001:.0001:1];
A=[1e-3 5e-3 1e-2 5e-2 .1 .5 1 10 100];
% A=[5e-2:1e-2:2];
rng1 = [1:numel(A)]; % amount S power levels to check for BER
% using power requires a reference impedance,
% usually assumed 1 in academic literature, but when attempting to correlate
% BER with actual Watts, R0=1 makes the comparison incorrect.
R0=50 % [Ohm]
%% BER measuring loop
% k=1
% Logging BER for different signal power levels,
% Logging signal power levels, it was useful when getting script up and running
BER=[];
SNR_log=[];
figure(1)
ax1=gca
for k = 1:length(rng1)
% generating signal
x2=2*(x-.5); % [0 1] to [-1 1]
S=[]
for k2=1:1:numel(x)
S=[S A(k)*x2(k2)*pulse_half_Sine];
end
Ps = mean(S.^2)/R0; % signal power
% adding AWGN
% you are making the noise proportional to the signal
% this causes BER not to improve when signal power up
% sigma1 = sqrt(Ps * 10^(-rng1(k)/10) / 2); % not used
sigma1=.1
noise1 = randn(length(S), 1) * sigma1;
Pn=mean(noise1.^2)/R0;
rx_S = noise1' + S; % noise + S
% Downsample
% this downsampling is an attempt to sync received signal
% to the time stamps where pulse peaks are expected
% but it does not work
% dwn1 = rx_SIG(1:ov1:end);
% Demodulation
% because the sampling times of the previous line are not
% centered pamdemod doesn't work either
% hDemod = pamdemod(dwn1,M);
% the missing key step : conv on reception with expected pulse shape
rx2_S=conv(pulse_half_Sine,rx_S);
rx2_sync=conv(pulse_half_Sine,rx_S([1:1:nsync*numel(pulse_half_Sine)]));
% removing leading samples that only correspond to the
% pulse used to correlate over received signal
rx2_S([1:numel(pulse_half_Sine)-1])=[];
% syncing
[pks,locs]=findpeaks(abs(rx2_sync),'NPeaks',nsync,'MinPeakHeight',A(k)/2);
% [pks,locs]=findpeaks(abs(rx2_S));
% x3(find(pks<.1))=[]; % do not use sign results close to zero
% locs(find(pks<.1))=[];
% t0=dt2*[0:1:numel(rx2_sync)-1];
% figure;plot(t0,rx2_sync);hold on
% plot(t0(locs),pks,'bo')
% 5 header pulses needed, 1st and last header samples are null, not needed
n01=find(pks<.2*A(k));
if ~isempty(n01), pks(n01)=[]; locs(n01)=[]; end % was 'peaks', an undefined variable
% pks([1 end])=[];locs([1 end])=[];
% plot(t0(locs),pks,'rs')
% since we know there have to be 5 leading pulses to be all ones
% we extract the sampling interval from this header
nT2=round(mean(diff(locs)))
% t3=dt2*[0:1:numel(rx2_S)-1];
% figure;plot(t3,abs(rx2_S));
% hold on
% xlabel('t')
% plot(t3(locs),pks,'ro')
% nt3=[1:nT2:numel(x)*nT2]; % sampling times
% plot(t3(nt3),max(pks)*ones(1,numel(nt3)),'sg')
x3=sign(rx2_S([1:nT2:N*nT2])); % only N bits expected so only sample N times
% x3 [-1 1] back to [0 1]; otherwise, when comparing x3 against x,
% roughly 50% of the bits would always be wrong, which comes
% from the signal statistics
x3=.5*(1+x3);
% sampling
% x3=sign(rx_S(locs));
% making sure x3 and x same length
% x3(find(pks<.1))=[]; % do not use sign results close to zero
SNR=Ps/Pn;
SNR_log=[SNR_log Ps];
x3_8=reshape(x3([nsync+1:end]),[(N-nsync)/8 8])
Lrx=sum(repmat(2.^[7:-1:0],size(x3_8,1),1).*x3_8,2);
err1 = sum(L1~=Lrx) / length(Lrx);
BER = [BER err1];
end
%% BER(S)
figure(1);
plot(rng1,BER);
grid on
title('BER/rng1');
This is not the BIT ERROR RATIO as we know it and as it is used in all sorts of quality measurements.
Note that Bit Error Ratio is not the same as Bit Error Rate, despite both terms commonly being used interchangeably.
A rate implies an amount per second, a velocity, a speed.
BER as commonly used to measure signal quality is a RATIO, not a rate.
BER = erroneous_bits/total_bits, but it's not as simple as this, as I am going to show.
For instance, note that the worst BER obtained with your script with a quick fix doesn't go above 0.5 (!?), while BER certainly reaches 1 when the message is not getting there at all.
I believe the following points are important for you to understand how BER really works.
2.- BER was completely flat for really disparate signal power levels
In an earlier working script (not shown), even using pulse amplitude A=100 and low noise (mean(noise1)=-7.36e-04), about 1/3 of the received symbols were erroneous, while figure;plot(rx_S) showed a rather clean signal: no riding ripple, no sudden changes.
The 1/3 erroneous bits were not corrupted by channel noise; the corruption was already present in the transmitted signal. I have since spaced each pulse enough to avoid overlapping pulses.
Adjacent pulses need at least 2ms to avoid overlapping.
This is without considering Doppler.
Heavily overlapping symbols is what happens when the command conv is used on a train of pulses generated the way you did:
S = conv(over_data,A(k)*pulse_half_Sine, 'same');
3.- You started with 1e4 data bits treated as 1e4 modulation symbols
But your transmitted/received time signal also showed a length of 1e4 time samples; that cannot be, it is way too few time samples.
The time reference of over_data and pulse_half_Sine should not be the same.
Nyquist: the signal is corrupted beyond recovery if there are only 2 samples per cycle of, say, the carrier modulating the pulses.
I tried
h_mod = pammod(x,M);
over_data=upsample(h_mod,ov1);
S = conv(h_mod,A(k)*pulse_half_Sine, 'same'); % modulated signal
h_mod = pammod(x,M);
S = conv(h_mod,A(k)*pulse_half_Sine, 'same'); % modulated signal
S = conv(over_data,A(k)*pulse_half_Sine, 'same'); % modulated signal
and none of these 3 produced the expected BER behaviour reflecting whether the signal is strong or weak.
4.- It turns out the command upsample is for discrete-time models
sys = tf(0.75,[1 10 2],2.25)
L = 14;
sys1 = upsample(sys,L)
not for directly interpolating a signal to, for instance, double the number of samples, as you seem to have attempted.
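For vector inputs, the Signal Processing Toolbox upsample inserts zeros between samples rather than interpolating; resample performs an actual rate conversion. A minimal sketch of the difference (assuming the Signal Processing Toolbox is installed):
x  = sin(2*pi*(0:9)/10);  % ten samples of one sine cycle
x1 = upsample(x, 2);      % inserts a zero after every sample, no interpolation
x2 = resample(x, 2, 1);   % low-pass interpolation to twice the sample count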
5.- This is how the transmitted signal (before noise is added) should look
t2=dt2*[0:1:numel(S)-1];
figure;plot(t2,S);
grid on;xlabel('t');title('transmitted signal before noise')
t3=dt2*[0:1:numel(rx2_S)-1];
[pks,locs]=findpeaks(abs(rx2_S))
figure;plot(t3,rx2_S);
hold on
xlabel('t')
plot(t3(locs),pks,'ro')
6.- The chosen pulse is not particularly strong against AWGN
The main reason is that it's a baseband pulse: not modulated, and on top of that it only has positive values.
Convolution (matched-filter) detection improves greatly when the pulse is modulated; the positive and negative samples found across each pulse increase robustness when deciding whether there is a pulse or just noise.
For instance, chirp pulses are a lot stronger.
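As an illustration of this point, here is a minimal sketch of a modulated version of the same pulse (t, T and Rc as defined in the script above; the carrier fc is an illustrative choice, kept well below the Nyquist frequency):
fc        = 4*Rc;                                 % illustrative carrier frequency
pulse_mod = sin(pi*t/(2*T)).^3 .* cos(2*pi*fc*t); % now has positive and negative samples
figure;plot(t,pulse_mod);grid on;xlabel('t');title('modulated pulse')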
7.- To measure BER: use bytes, constellation points, coded symbols, but not bare bits
Measuring BER with bare bits, or more broadly speaking with a random test signal of fixed statistical moments, constrains the BER to whatever mean and variance are assigned to the signal and/or the noise when the signal is absent or weak.
Rewording: when testing BER by counting bare bits, with a weak or absent signal the BER is actually measuring the noise the signal was trying to overcome.
Roughly 50% of the received bits, regardless of signal or noise, will hit what look like correct bits the way you are attempting the BER measurement: false positives.
To avoid these false positives, in the following I show how to measure BER against expected characters.
N=1024
..
nsync=64 % length sync header
x=[ones(1,nsync) randi([0 M-1],1,N-nsync)]; % signal : data
Now x is 1024 bits long and the initial 64 bits are for syncing only, leaving N-nsync bits for the message.
Let's check BER against let's say L1 the expected sequence of bytes
xm=reshape(x([nsync+1:end]),[(N-nsync)/8 8])
L1=sum(repmat(2.^[7:-1:0],size(xm,1),1).*xm,2);
L1 is checked against Lrx, generated from x3_8, the message part of x3, the demodulated symbols.
8.- The upsampling/downsampling didn't work
this downsampling on reception
dwn1 = rx_SIG(1:ov1:end);
was an attempt to sync the received signal to the time stamps where pulse peaks are expected, but it did not work.
Because the sampling times were not centered, pamdemod didn't work either.
9.- Use the sync header to calculate the sampling interval
I only convolve the nsync (64) initial bits:
rx2_sync=conv(pulse_half_Sine,rx_S([1:1:nsync*numel(pulse_half_Sine)]));
These pulses allow a reliable calculation of nT2, the sampling interval used to check along the rest of the received frame.
I obtain nT2 with
[pks,locs]=findpeaks(abs(rx2_sync),'NPeaks',nsync,'MinPeakHeight',A(k)/2);
There is a need for further conditioning, but basically locs already holds the necessary information to obtain nT2.
10.- This is the graph obtained
when there is no signal, BER = 1; when the signal strength is high enough, the PAM signals show good BER, tending to 0.
When refining the A step (that is, making it smaller), one gets the following:
BER testers are often plugged into base stations upon setup and left recording for a few hours or even days; such testers do not record bare bit errors: bytes, constellation points, and even frames are checked.
11.- BER/SNR or BER/EbN0, not BER against signal only
BER is usually plotted against SNR (analog signals) or Eb/N0 (digital signals), not just against signal amplitude or signal power.
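Using the variables already logged in the script above, a sketch of such a plot (assumption: Pn from the last loop iteration is representative, since sigma1 is fixed):
SNR_dB = 10*log10(SNR_log/Pn); % SNR_log holds Ps per power level
figure;semilogy(SNR_dB,BER);grid on
xlabel('SNR [dB]');ylabel('BER');title('BER/SNR')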
12.- The Communications Toolbox is an add-on
This toolbox adds the following support functions: pammod, pamdemod, genqammod, genqamdemod (yes, pammod and pamdemod use genqammod and genqamdemod respectively).
These functions are not available unless the Communications Toolbox is installed.
For BER simulations, try Simulink; BER examples are already available there.

How to robustly estimate in MATLAB the lower and upper envelopes of a signal with a trend, a few constant step levels, and noise

I am looking for a robust estimation method for the lower and upper envelopes of a signal consisting of a smooth trend component, constant steps between a few fixed levels, and additive noise (plus outliers, of course). This question arose during my current research work on real-life signal processing.
A typical example of the signal is produced by the following script:
%% signal definition
% number of samples
Ns = 10000;
% sampling period [secs]
Ts = 1;
time = (1:Ns)*Ts;
% trend component
a = 2;
b = 0;
T = 1e4;
slope = time/Ts * 0.0005;
trend = a * sin(2*pi*time/T) + b + slope;
% steps component (4 constant levels)
step = [zeros(1,1000),linspace(0,1,6),1*ones(1,500),linspace(1,0,6),zeros(1,500),linspace(0,-2,10), -2*ones(1,1200),linspace(-2,1,15),1*ones(1,3000),linspace(1,0,6), zeros(1,3000),linspace(0,-1,6), -1*ones(1,751)];
% noise component (normal noise)
noise = 0.2*randn(1,Ns);
% noise = 0.5*(rand(1,Ns)-0.5);
%
%% show signals component
close all
figure
plot(time,trend,'r-')
hold on
plot(time,trend+step,'g-')
plot(time,trend - 2, 'b--')
plot(time,trend + 1, 'k--')
plot(time,trend+step+noise,'mo')
legend('trend','trend+steps','lowenvelope', 'upenvelope','trend+steps+noise')
title('smooth trend signal with constant steps between 4 levels and noise')
xlabel('time [sec]')
ylabel('value [-]')
hold off
See the following image
The separate signal components are unknown! The steps are always constant and between a small number of fixed levels (typically < 4 or 5), so the estimated envelopes should be parallel to the trend signal. The noise is approximated by a normal distribution with sigma ~0.1.
Any idea how to solve this surprisingly difficult problem? Any relevant references or MATLAB code links?

Linear regression: stuck in model comparison in MATLAB after estimation?

I want to determine how well the estimated model fits future data. To do this, a prediction-error plot is often used. Basically, I want to compare the measured output and the model output. I am using the Least Mean Squares algorithm as the equalization technique. Can somebody please advise on the proper way to plot the comparison between the model and the measured data? If the estimates are close to the truth, the curves should be very close to each other. Below is the code. u is the input to the equalizer, x is the noisy received signal, y is the output of the equalizer, and w is the equalizer weights. Should the graph be plotted using x and y*w? But x is noisy. I am confused, since the measured output x is noisy while the model output y*w is noise-free.
%% Channel and noise level
h = [0.9 0.3 -0.1]; % Channel
SNRr = 10; % Noise Level
%% Input/Output data
N = 1000; % Number of samples
Bits = 2; % Number of bits for modulation (2-bit for Binary modulation)
data = randi([0 1],1,N); % Random signal
d = real(pskmod(data,Bits)); % BPSK Modulated signal (desired/output)
r = filter(h,1,d); % Signal after passing through channel
x = awgn(r, SNRr); % Noisy Signal after channel (given/input)
%% LMS parameters
epoch = 10; % Number of epochs (training repetitions)
eta = 1e-3; % Learning rate / step size
order=10; % Order of the equalizer
U = zeros(1,order); % Input frame
W = zeros(1,order); % Initial Weigths
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y = (W)*U'; % Calculating output of LMS
e = d(n) - y; % Instantaneous error
W = W + eta * e * U ; % Weight update rule of LMS
J(k,n) = e * e'; % Instantaneous square error
end
end
Let's start step by step:
First of all, when using a fitting method it is good practice to use the RMS error. To get this we have to find the error between input and output. As I understand it, x is the input for our model and y is the output. Furthermore, you already calculated the error between them, but you used it in the loop without saving it. Let's modify your code:
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y(n) = (W)*U'; % Calculating output of LMS
e(n) = x(n) - y(n); % Instantaneous error
W = W + eta * e(n) * U ; % Weight update rule of LMS
J(k,n) = e(n) * (e(n))'; % Instantaneous square error
end
end
Now e contains the errors from the last epoch. So we can use something like this:
rms(e)
I'd also like to compare results using the mean error and the standard deviation:
mean(e)
std(e)
And some visualization:
histogram(e)
Second point: we can't use the compare function for plain vectors; it is meant for dynamic system models. To use it you would have to wrap your model as a dynamic system. But we can use functions such as goodnessOfFit instead. If you want something like the error at each step considering all previous data points, then do a little math workaround: calculate it at each point using the samples [1:currentNumber].
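For instance, a sketch of that workaround (assuming e is the error vector saved in the modified loop above):
runningRms = sqrt(cumsum(e.^2)./(1:numel(e))); % RMS over samples 1..n at each n
figure;plot(runningRms);grid on
xlabel('sample n');ylabel('RMS of e(1:n)')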
About using the LMS method: there are built-in functions implementing LMS. Let's try using them on your data set:
alg = lms(0.001);
eqobj = lineareq(10,alg);
y1 = equalize(eqobj,x);
And let's look at the result:
plot(x)
hold on
plot(y1)
There are a lot of examples of such implementations of this function: look here for an example.
I hope this was helpful for you!
The comparison of the model output vs. the observed data is known as the residual.
The difference between the observed value of the dependent variable
(y) and the predicted value (ŷ) is called the residual (e). Each data
point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
Both the sum and the mean of the residuals are equal to zero. That is,
Σe = 0 and mean(e) = 0.
A residual plot is a graph that shows the residuals on the vertical
axis and the independent variable on the horizontal axis. If the
points in a residual plot are randomly dispersed around the horizontal
axis, a linear regression model is appropriate for the data;
otherwise, a non-linear model is more appropriate.
Here is an example of residual plots from a model of mine. On the vertical axis is the difference between the output of the model and the measured value. On the horizontal axis is one of the independent variables used in the model.
We can see that most of the residuals are within 0.2 units, which happens to be my tolerance for this model. I can therefore make a conclusion as to the worth of the model.
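A sketch of how to produce such a plot for the equalizer in the question (d as the observed/desired signal, y as the per-sample model output saved in the modified loop above):
e = d - y;              % residuals: observed minus predicted
figure;plot(d,e,'.');grid on
xlabel('observed value');ylabel('residual');title('residual plot')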
See here for a similar question.
Regarding your question about the lack of noise in your model's output: we are creating a linear model. There's the clue.

Echo State Network learning Mackey-Glass function, but how?

I came across this example of a minimal Echo State Network (ESN), which I am analysing while trying to understand Echo State Networks. Unfortunately I have some problems understanding why it really works. It all comes down to these questions:
[ What defines | What is ] the echo state of an ESN?
What is it that makes an ESN learn such complex nonlinear functions like the Mackey-Glass function so easily and quickly?
First here is a little piece of code that shows the important part of initialization:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed', 42);
trainLen = 2000;
testLen = 2000;
initLen = 100;
data = load('MackeyGlass_t17.txt');
% Input neurons
inSize = 1;
% Output neurons
outSize = 1;
% Reservoir size
resSize = 1000;
% Leaking rate
a = 0.3;
% Input weights
Win = ( rand(resSize, (inSize+1) ) - 0.5) .* 1;
% Reservoir weights
W = rand(resSize, resSize) - 0.5;
Running the reservoir:
I understand that every single data-point of the input data set is propagated from the input neuron to the reservoir neurons. After a warm-up of size initLen the states are accepted and stored in matrix X. When this is done every single column of X represents a "vector of reservoir neuron activations". And here comes the point where I am not sure if I got it right:
The comment already calls X the "collected states" or "design matrix". Am I getting this right that all this does is store the state of the whole network in the columns of matrix X?
If we assume that t is just a time parameter, then X(:,t) represents the network state at time t, doesn't it?
In my example this would mean that there are 1,900 time slices which represent the whole network state at the corresponding time frame (X therefore is a 1002x1900 matrix). Another question that occurs to me here is:
why are a 1 (I guess it is the bias) and the input value u appended to this vector: X(:,t-initLen) = [1;u;x];
So:
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the reservoir with the data and collect X.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Allocated memory for the design (collected states) matrix
X = zeros((1+inSize) + resSize, trainLen - initLen);
% Vector of reservoir neuron activations (used for calculation)
x = zeros(resSize, 1);
% Update of the reservoir neuron activations
xUpd = zeros(resSize, 1);
for t = 1:trainLen
u = data(t);
xUpd = tanh( Win * [1;u] + W * x );
x = (1-a) * x + a * xUpd;
if ( t > initLen )
X(:,t-initLen) = [1;u;x];
end
end
Training part:
The training part is also a little magic to me yet. I am familiar how linear regression works, so this is not the problem here.
What I see is that this part just uses the whole state matrix X and performs a single linear regression step on the training data to generate the output weight vector Wout, and that's it.
So all that has been done so far, if I'm not mistaken, is initializing the output weights according to the state matrix X, which itself was generated using the input data and the randomly generated (input and reservoir) weights.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Train the output
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Set the corresponding target matrix directly
Yt = data(initLen+2:trainLen+1)';
% Regularization coefficient
reg = 1e-8;
% Get X transposed - needed twice therefore it is a little faster
X_T = X';
% Yt * pseudo_inverse(X); (linear regression task)
Wout = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);
Running the ESN in a generative mode:
I can run this in two modes: generative or predictive. But this is the part where I can just say: "Well, ... it works," without having an exact idea of why it does.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Run the trained ESN in a generative mode. no need to initialize here,
% because x is initialized with training data and we continue from there.
%
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Y = zeros(outSize,testLen);
u = data(trainLen+1);
for t = 1:testLen
xUpd = tanh( Win*[1;u] + W*x );
x = (1-a)*x + a*xUpd;
% Generative mode:
u = Wout*[1;u;x];
% This would be a predictive mode:
%u = data(trainLen+t+1);
Y(:,t) = u;
end
It works pretty well as you can see (generative mode):
I know this is quite a huge "question", if it can even be considered one. I feel like I understand the individual parts, but what I'm missing is the big picture of this magic black box called an Echo State Network.
The echo state network (ESN) is basically a clever way to train a Recurrent Neural Network.
The ESN has a "reservoir" of hidden units which are coupled.
The inputs are connected to the reservoir with input (plus a bias) to hidden connections. These connections are not trained. They are randomly initialized, and this is the code snippet that does this initialization (I am using python).
Win = (random.rand(resSize,1+inSize)-0.5) * 1
The units in the reservoir are coupled, meaning basically that there exist hidden to hidden connections. Again the weights in the reservoir are not trained but initialized. However, initialization of the reservoir weights is tricky. Those weights (depicted by W in the code) are first randomly initialized and then they are multiplied by a factor which takes into account the spectral radius of the random matrix. Careful initialization of these connections is very important because it affects the dynamics of the ESN (do not forget it is a recurrent network). I guess if you want to know more details about this you have to be able to understand linear system theory.
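In the MATLAB version of the same minimal ESN, that scaling step would look something like the following sketch (the target spectral radius of 1.25 is one common choice, not the only one; the posted initialization code omits this step):
W   = rand(resSize,resSize) - 0.5;
rho = max(abs(eig(W)));  % spectral radius of the random matrix
W   = W * (1.25/rho);    % rescale to the target spectral radius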
Now, after properly initializing the two weight matrices, you start presenting inputs to the reservoir. For each input presented to the reservoir, the activations are calculated, and these activations are the state of the ESN. Look at the figure below.
This figure shows a plot of 200 activations for 20 inputs.
So, after presenting all inputs to the ESN the states are collected into a matrix X. This is the code snippet that does this in python:
x = zeros((resSize,1))
for t in range(trainLen):
u = data[t]
x = (1-a)*x + a*tanh( dot( Win, vstack((1,u)) ) + dot( W, x ) )
if t >= initLen:
X[:,t-initLen] = vstack((1,u,x))[:,0]
The state of the ESN is therefore a function of the finite history of the inputs presented to the network.
Now, in order to predict the output from the states of the reservoir units, the only thing that has to be learned is how to couple the outputs to those units, i.e. the hidden-to-output connections:
# train the output
reg = 1e-8 # regularization coefficient
X_T = X.T
Wout = dot( dot(Yt,X_T), linalg.inv( dot(X,X_T) + \
reg*eye(1+inSize+resSize) ) )
Then, after the network has been trained, its predictive capability is tested using the test portion of the data.
The generative mode means that you start with a particular value of the time series, use that value to predict the next value, then use the predicted value to predict the next one, and so on. In effect you are generating the time series, hence "generative mode". It allows you to predict multiple steps into the future, as opposed to the predictive mode, where you get one value from the time series and predict the next one.
And this is why the ESN seems to be doing a pretty good job. The target signal is pretty complex and yet in generative mode it does very well.
Finally, as far as the minimal implementation goes, I guess it refers to the size of the reservoir (1000), which apparently is pretty small.

How can we produce kappa and delta in the following model using MATLAB?

I have the following stochastic model describing the evolution of a process (Y) in space and time. Ds and Dt are the domains in space (2D, with x and y axes) and time (1D, with t axis). This model is usually known as a mixed-effects model or a components-of-variation model.
I am currently generating Y as follows:
%# Time parameters
T=1:1:20; % input
nT=numel(T);
%# Grid and model parameters
nRow=100;
nCol=100;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:1:nCol,1:1:nRow,T);
xPower=0.1;
tPower=1;
noisePower=1;
detConstant=1;
deterministic_mu = detConstant.*(((Grid.Nt).^tPower)./((Grid.Nx).^xPower));
beta_s = randn(nRow,nCol); % mean-zero random effect representing location specific variability common to all times
gammaTemp = randn(nT,1);
for t = 1:nT
gamma_t(:,:,t) = repmat(gammaTemp(t),nRow,nCol); % mean-zero random effect representing time specific variability common to all locations
end
var=0.1;% noise has variance = 0.1
for t=1:nT
kappa_st(:,:,t) = sqrt(var)*randn(nRow,nCol);
end
for t=1:nT
Y(:,:,t) = deterministic_mu(:,:,t) + beta_s + gamma_t(:,:,t) + kappa_st(:,:,t);
end
My questions are:
How do I produce delta in the expression for Y, and what is the difference between kappa and delta?
Could you explain, through some illustration using MATLAB, whether I am producing Y correctly?
Please let me know if you need some more information/explanation. Thanks.
First, I rewrote your code to make it a bit more efficient. I see you generate linearly spaced grids for x, y and t and carry out the computation for all points in this grid. This approach has severe limitations on the maximum attainable grid resolution, since the 3D grid (and all variables defined on it) can consume an awfully large amount of memory as the resolution goes up. If the model you're implementing grows in complexity and size (it often does), I'd suggest you throw this all into a function accepting matrix/vector inputs for s and t, which will be a bit more flexible in this regard; processing "blocks" of data that would otherwise not fit in memory will be a lot easier that way.
Then, I generated the delta_st term with rand instead of randn, since the noise should be "white". Now I'm very unsure about that last one, and I didn't have time to read through the paper you linked to; can you tell me on what pages I can find the relevant sections for delta_st?
Now, the code:
%# Time parameters
T = 1:1:20; % input
nT = numel(T);
%# Grid and model parameters
nRow = 100;
nCol = 100;
% noise has variance = 0.1
var = 0.1;
xPower = 0.1;
tPower = 1;
noisePower = 1;
detConstant = 1;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:nCol,1:nRow,T);
% deterministic mean
deterministic_mu = detConstant .* Grid.Nt.^tPower ./ Grid.Nx.^xPower;
% mean-zero random effect representing location specific
% variability common to all times
beta_s = repmat(randn(nRow,nCol), [1 1 nT]);
% mean-zero random effect representing time specific
% variability common to all locations
gamma_t = bsxfun(@times, ones(nRow,nCol,nT), randn(1, 1, nT));
% mean zero random effect capturing the spatio-temporal
% interaction not found in the larger-scale deterministic mu
kappa_st = sqrt(var)*randn(nRow,nCol,nT);
% mean zero random effect representing the micro-scale
% spatio-temporal variability that is modelled by white
% noise (i.i.d. at different time steps) in Ds·Dt
delta_st = noisePower * (rand(nRow,nCol,nT)-0.5);
% Final result:
Y = deterministic_mu + beta_s + gamma_t + kappa_st + delta_st;
Your implementation samples beta, gamma and kappa as if they were white (i.e. their values at each (x,y,t) are independent). The descriptions of the terms suggest that this is not meant to be the case. It looks like delta is supposed to capture the white noise, while the other terms capture the correlations over their respective domains, e.g. there is a non-zero correlation between gamma(t_1) and gamma(t_1+1).
If you wish to model gamma as a stationary Gaussian Markov process with variance var_g and correlation cor_g between gamma(t) and gamma(t+1), you can use something like
gamma_t = nan( nT, 1 );
gamma_t(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt( (1-K_g^2)*var_g );
for t = 2:nT,
gamma_t(t) = K_g*gamma_t(t-1) + K_w*randn();
end
gamma_t = reshape( gamma_t, [ 1 1 nT ] );
The formulas I've used for the gains K_g and K_w in the above code (and the initialization of gamma_t(1)) produce the desired stationary variance and one-step covariance: for the AR(1) recursion gamma(t) = K_g*gamma(t-1) + K_w*w(t) with unit-variance white w, the stationary variance is K_w^2/(1-K_g^2) = var_g and the one-step covariance is K_g*var_g = cor_g.
Note that the implementation above assumes that later you will sum the terms using bsxfun to do the "repmat" for you:
Y = bsxfun( @plus, deterministic_mu + kappa_st + delta_st, beta_s );
Y = bsxfun( @plus, Y, gamma_t );
Note that I haven't tested the above code, so you should confirm by sampling that it actually produces a zero-mean process of the specified variance and covariance between adjacent samples. To sample beta, the same procedure can be extended into two dimensions; the principles are essentially the same. I suspect kappa should similarly be modeled as a Gaussian Markov process, but in all three dimensions and with a lower variance, to represent higher-order effects not captured in mu, beta and gamma.
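A minimal sketch of that two-dimensional extension for beta_s (an assumption on my part: a separable first-order autoregression along rows and columns, with one-step spatial correlation K and unit marginal variance):
K      = 0.9;                    % one-step spatial correlation in x and y
beta_s = randn(nRow, nCol);      % start from white noise
for r = 2:nRow, beta_s(r,:) = K*beta_s(r-1,:) + sqrt(1-K^2)*beta_s(r,:); end
for c = 2:nCol, beta_s(:,c) = K*beta_s(:,c-1) + sqrt(1-K^2)*beta_s(:,c); end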
Delta is supposed to be zero-mean stationary white noise. Assuming it to be Gaussian with variance noisePower, one would sample it using
delta_st = sqrt(noisePower)*randn( [ nRows nCols nT ] );