MATLAB: vectorised backpropagation (no loop over training examples)

In MATLAB/Octave, how do I implement backpropagation without any loops over the training examples?
This answer talks about the theory of parallelism, but how would this be implemented in actual Octave code?

For me, the final piece of the puzzle was computing the gradient as a sum of outer products over the training examples, which is a single matrix product.
Here is what I came up with:
% X is a {# of training examples} x {# of features} matrix
% Y is a {# of training examples} x {# of output neurons} matrix
% Theta is a cell array containing Theta{1}...Theta{n}
% lambda is the regularisation parameter (must be defined beforehand)
% Number of training examples
m = size(X, 1);
% Get h(X), z (non-activated output of all neurons in network) and
% activation (activated output of each layer, activation{1} = X);
% predict is your own forward-propagation function
[hX, z, activation] = predict(Theta, X);
% Get error of output layer
layers = 1 + length(Theta);
d = cell(1, layers);
d{layers} = hX - Y;
% Propagate errors backwards through hidden layers
for layer = layers-1 : -1 : 2
  d{layer} = d{layer+1} * Theta{layer};
  d{layer} = d{layer}(:, 2:end); % Remove "error" for constant bias term
  d{layer} = d{layer} .* sigmoidGradient(z{layer});
end
% Calculate Theta gradients
Theta_grad = cell(1, layers-1);
for l = 1:layers-1
  % Sum of outer products over all m examples, as one matrix product
  Theta_grad{l} = d{l+1}' * [ones(m,1) activation{l}];
  % Add regularisation term (the bias column is not regularised)
  Theta_grad{l}(:, 2:end) = Theta_grad{l}(:, 2:end) + lambda * Theta{l}(:, 2:end);
  Theta_grad{l} = Theta_grad{l} / m;
end
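To convince yourself that the matrix products really are the per-example sums of outer products, and that the resulting gradient is right, here is a minimal self-contained Octave/MATLAB check. Everything in it is an assumption for illustration: one hidden layer, sigmoid activations, the regularised cross-entropy cost (the cost for which hX - Y is the output-layer error with sigmoid outputs), random data, and an inline forward pass standing in for the answer's predict function. It compares the vectorised gradient with an explicit loop of outer products and with central finite differences.

```octave
% Hypothetical sizes: m examples, n features, h hidden units, k outputs
m = 4; n = 3; h = 5; k = 2; lambda = 0.1;
X = rand(m, n);
Y = double(rand(m, k) > 0.5);
Theta1 = 0.5 * randn(h, n + 1);
Theta2 = 0.5 * randn(k, h + 1);
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Forward pass (a stand-in for predict); X plays the role of activation{1}
z2 = [ones(m, 1) X] * Theta1';
a2 = sigmoid(z2);
hX = sigmoid([ones(m, 1) a2] * Theta2');

% Vectorised backpropagation, as in the answer
d3 = hX - Y;
d2 = d3 * Theta2;
d2 = d2(:, 2:end) .* sigmoidGradient(z2);
G1 = d2' * [ones(m, 1) X];
G2 = d3' * [ones(m, 1) a2];

% The same unregularised G1 as an explicit loop of per-example outer products
G1_loop = zeros(h, n + 1);
for i = 1:m
  G1_loop = G1_loop + d2(i, :)' * [1 X(i, :)];
end
loopDiff = max(abs(G1(:) - G1_loop(:)));

% Add regularisation (bias column excluded) and average over the examples
G1(:, 2:end) = G1(:, 2:end) + lambda * Theta1(:, 2:end);
G2(:, 2:end) = G2(:, 2:end) + lambda * Theta2(:, 2:end);
G1 = G1 / m;
G2 = G2 / m;

% Regularised cross-entropy cost, used for a central-difference check
fwd = @(T1, T2) sigmoid([ones(m, 1) sigmoid([ones(m, 1) X] * T1')] * T2');
J = @(T1, T2) -sum(sum(Y .* log(fwd(T1, T2)) + (1 - Y) .* log(1 - fwd(T1, T2)))) / m ...
    + lambda / (2 * m) * (sum(sum(T1(:, 2:end) .^ 2)) + sum(sum(T2(:, 2:end) .^ 2)));
step = 1e-5;
N1 = zeros(size(Theta1));
for q = 1:numel(Theta1)
  Tp = Theta1; Tp(q) = Tp(q) + step;
  Tm = Theta1; Tm(q) = Tm(q) - step;
  N1(q) = (J(Tp, Theta2) - J(Tm, Theta2)) / (2 * step);
end
N2 = zeros(size(Theta2));
for q = 1:numel(Theta2)
  Tp = Theta2; Tp(q) = Tp(q) + step;
  Tm = Theta2; Tm(q) = Tm(q) - step;
  N2(q) = (J(Theta1, Tp) - J(Theta1, Tm)) / (2 * step);
end
gradDiff = max(abs([G1(:) - N1(:); G2(:) - N2(:)]));
disp(loopDiff)   % rounding-level difference
disp(gradDiff)   % limited by the finite-difference step
```

If gradDiff comes out large (say above 1e-6), the backpropagation step, not the outer-product trick, is the place to look.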

Related

Matlab: 2D Discrete Fourier Transform and Inverse

I'm trying to run a program in MATLAB to obtain the direct and inverse DFT of a greyscale image, but I'm not able to recover the original image after applying the inverse. I'm getting complex numbers as my inverse output; it's as if I'm losing information. Any ideas? Here is my code:
%2D discrete Fourier transform
%Image Dimension
M=3;
N=3;
f=zeros(M,N);
f(2,1:3)=1;
f(3,1:3)=0.5;
f(1,2)=0.5;
f(3,2)=1;
f(2,2)=0;
figure;imshow(f,[0 1],'InitialMagnification','fit')
%Direct transform
for u=0:1:M-1
for v=0:1:N-1
for x=1:1:M
for y=1:1:N
F(u+1,v+1)=f(x,y)*exp(-2*pi*(1i)*((u*(x-1)/M)+(v*(y-1)/N)));
end
end
end
end
Fab=abs(F);
figure;imshow(Fab,[0 1],'InitialMagnification','fit')
%Inverse Transform
for x=0:1:M-1
for y=0:1:N-1
for u=1:1:M
for v=1:1:N
z(x+1,y+1)=(1/M*N)*F(u,v)*exp(2*pi*(1i)*(((u-1)*x/M)+((v-1)*y/N)));
end
end
end
end
figure;imshow(real(z),[0 1],'InitialMagnification','fit')
There are a couple of issues with your code:
You are not applying the definition of the DFT (or IDFT) correctly: you need to sum over the original variable(s) to obtain the transform. See the formula here; notice the sum.
In the IDFT the normalization constant should be 1/(M*N) (not 1/M*N).
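For reference, the standard definitions are given below, with 0-based indices (MATLAB stores f(x, y) at f(x+1, y+1)). The double sum and the 1/(MN) factor in front of the inverse are the two points listed above.

```latex
F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-2\pi i \left( \frac{ux}{M} + \frac{vy}{N} \right)}

f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\, e^{2\pi i \left( \frac{ux}{M} + \frac{vy}{N} \right)}
```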
Note also that the code could be made much more compact by vectorization, avoiding the loops, or by simply using the fft2 and ifft2 functions. I assume you want to compute it manually, "low-level", to verify the results.
The code, with the two corrections, is as follows. The modifications are marked with comments.
M=3;
N=3;
f=zeros(M,N);
f(2,1:3)=1;
f(3,1:3)=0.5;
f(1,2)=0.5;
f(3,2)=1;
f(2,2)=0;
figure;imshow(f,[0 1],'InitialMagnification','fit')
%Direct transform
F = zeros(M,N); % initialize to 0
for u=0:1:M-1
for v=0:1:N-1
for x=1:1:M
for y=1:1:N
F(u+1,v+1) = F(u+1,v+1) + ...
f(x,y)*exp(-2*pi*(1i)*((u*(x-1)/M)+(v*(y-1)/N))); % add term
end
end
end
end
Fab=abs(F);
figure;imshow(Fab,[0 1],'InitialMagnification','fit')
%Inverse Transform
z = zeros(M,N);
for x=0:1:M-1
for y=0:1:N-1
for u=1:1:M
for v=1:1:N
z(x+1,y+1) = z(x+1,y+1) + (1/(M*N)) * ... % corrected scale factor
F(u,v)*exp(2*pi*(1i)*(((u-1)*x/M)+((v-1)*y/N))); % add term
end
end
end
end
figure;imshow(real(z),[0 1],'InitialMagnification','fit')
Now the original and recovered images differ only by very small values, of the order of eps, due to the usual floating-point inaccuracies:
>> f-z
ans =
1.0e-15 *
Columns 1 through 2
0.180411241501588 + 0.666133814775094i -0.111022302462516 - 0.027755575615629i
0.000000000000000 + 0.027755575615629i 0.277555756156289 + 0.212603775716506i
0.000000000000000 - 0.194289029309402i 0.000000000000000 + 0.027755575615629i
Column 3
-0.194289029309402 - 0.027755575615629i
-0.222044604925031 - 0.055511151231258i
0.111022302462516 - 0.111022302462516i
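Following up on the remark about vectorization, here is a sketch that replaces all four loops with matrix products, using the DFT matrices WM(u+1, x+1) = exp(-2*pi*1i*u*x/M) and WN (names introduced here for illustration). It is checked against fft2, which uses the same sign and scaling convention for the forward transform.

```octave
% Test image from the question
f = [0 0.5 0; 1 0 1; 0.5 1 0.5];
[M, N] = size(f);
% DFT matrices: WM(u+1, x+1) = exp(-2*pi*1i*u*x/M), likewise WN for the columns
WM = exp(-2i * pi * (0:M-1)' * (0:M-1) / M);
WN = exp(-2i * pi * (0:N-1)' * (0:N-1) / N);
% Forward 2-D DFT: F(u,v) = sum_x sum_y WM(u,x) * f(x,y) * WN(v,y)
F = WM * f * WN.';
% Inverse 2-D DFT, including the 1/(M*N) normalisation
z = conj(WM) * F * conj(WN).' / (M * N);
errFwd = max(abs(F(:) - reshape(fft2(f), [], 1)));   % compare with fft2
errInv = max(abs(z(:) - f(:)));                      % recovered vs original image
disp([errFwd errInv])
```

Both errors should be of the order of eps, like the f-z output above.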
Firstly, the biggest error is that you are computing the Fourier transform incorrectly. When computing F, you need to be summing over x and y, which you are not doing. Here's how to rectify that:
F = zeros(M, N);
for u=0:1:M-1
for v=0:1:N-1
for x=1:1:M
for y=1:1:N
F(u+1,v+1)=F(u+1,v+1) + f(x,y)*exp(-2*pi*(1i)*((u*(x-1)/M)+(v*(y-1)/N)));
end
end
end
end
Secondly, in the inverse transform, your bracketing is incorrect. It should be 1/(M*N) not (1/M*N).
As an aside, at the cost of a bit more memory, you can speed up the computation by not nesting so many loops. Namely, when computing the DFT, do the following instead:
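x = (1:1:M)'; % x is a column vector
y = (1:1:N) ; % y is a row vector
F2 = zeros(M, N); % preallocate the result
for u = 0:1:M-1
for v = 0:1:N-1
% x and y expand to an M x N grid; sum(..., 'all') needs R2018b or later
F2(u+1,v+1) = sum(f .* exp(-2i * pi * (u*(x-1)/M + v*(y-1)/N)), 'all');
end
end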
To take this method to the extreme, i.e. not using any loops at all, you would do the following. This is not recommended, though: you lose code readability, and the intermediate array has M*N*M*N elements, so the memory cost grows quadratically with the number of pixels.
x = (1:1:M)'; % x is in dimension 1
y = (1:1:N) ; % y is in dimension 2
u = permute(0:1:M-1, [1, 3, 2]); % x-freqs in dimension 3
v = permute(0:1:N-1, [1, 4, 3, 2]); % y-freqs in dimension 4
% sum the exponential terms in x and y, which are in dimensions 1 and 2.
% If you are using R2018a or older, the below summation should be
% sum(sum(..., 1), 2)
% instead of
% sum(..., [1,2])
F3 = sum(f .* exp(-2i * pi * (u.*(x-1)/M + v.*(y-1)/N)), [1, 2]);
% The resulting array F3 is 1 x 1 x M x N, to make it M x N, simply shiftdim or squeeze
F3 = squeeze(F3);

What is the error in the iterative implementation of the gradient descent algorithm below?

I have attempted to implement the iterative version of the gradient descent algorithm, but it is not working correctly, whereas the vectorized implementation of the same algorithm works correctly.
Here is the iterative implementation:
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
% get the number of rows and columns
nrows = size(X, 1);
ncols = size(X, 2);
% initialize the hypothesis vector
h = zeros(nrows, 1);
% initialize the temporary theta vector
theta_temp = zeros(ncols, 1);
% run gradient descent for the specified number of iterations
count = 1;
while count <= iterations
% calculate the hypothesis values and fill into the vector
for i = 1 : nrows
for j = 1 : ncols
term = theta(j) * X(i, j);
h(i) = h(i) + term;
end
end
% calculate the gradient
for j = 1 : ncols
for i = 1 : nrows
term = (h(i) - y(i)) * X(i, j);
theta_temp(j) = theta_temp(j) + term;
end
end
% update the gradient with the factor
fact = alpha / nrows;
for i = 1 : ncols
theta_temp(i) = fact * theta_temp(i);
end
% update the theta
for i = 1 : ncols
theta(i) = theta(i) - theta_temp(i);
end
% update the count
count += 1;
end
end
And below is the vectorized implementation of the same algorithm:
function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)
% set the learning rate
learn_rate = alpha;
% set the number of iterations
n = 1500;
% number of training examples
m = length(y);
% initialize the theta_new vector
l = length(theta);
theta_new = zeros(l,1);
% initialize the cost vector
J_cost = zeros(n,1);
% initialize the vector to store all the calculated theta values
theta_all = zeros(n,2);
% perform gradient descent for the specified number of iterations
for i = 1 : n
% calculate the hypothesis
hypothesis = X * theta;
% calculate the error
err = hypothesis - y;
% calculate the gradient
grad = X' * err;
% calculate the new theta
theta_new = (learn_rate/m) .* grad;
% update the old theta
theta = theta - theta_new;
% update the cost
J_cost(i) = computeCost(X, y, theta);
% store the calculated theta value
if i < n
index = i + 1;
theta_all(index,:) = theta';
end
end
Link to the dataset can be found here
The filename is ex1data1.txt
ISSUES
With an initial theta of [0, 0] (a vector), a learning rate of 0.01 and 1500 iterations, I get the optimal theta as:
theta0 = -3.6303
theta1 = 1.1664
The above is the output for the vectorized implementation which I know I have implemented correctly (it passed all the test cases on Coursera).
However, when I implemented the same algorithm using the iterative method (1st code I mentioned) the theta values I get are (alpha = 0.01, iterations = 1500):
theta0 = -0.20720
theta1 = -0.77392
This implementation fails to pass the test cases and I know therefore that the implementation is incorrect.
However, I am unable to see where I am going wrong: the iterative code performs the same multiplications as the vectorized one, and when I traced one iteration of both on pen and paper the values matched, yet the results differ when I run them in Octave.
Any help would be appreciated, especially if you could point out where I went wrong and what exactly caused the failure.
Points to consider
The implementation of hypothesis is correct as I tested it out and both the codes gave the same results, so no issues here.
I printed the output of the gradient vector in both the codes and realised that the error lies here because the outputs here were very different!
Additionally, here is the code for pre-processing the data :
function[X, y] = fileReader(filename)
% load the dataset
dataset = load(filename);
% get the dimensions of the dataset
nrows = size(dataset, 1);
ncols = size(dataset, 2);
% generate the X matrix from the dataset
X = dataset(:, 1 : ncols - 1);
% generate the y vector
y = dataset(:, ncols);
% append 1's to the X matrix
X = [ones(nrows, 1), X];
end
What is going wrong with the first code is that the theta_temp and h vectors are not re-initialised. On the very first iteration (when count equals 1) your code runs properly, because h and theta_temp have just been initialised to 0. However, these are temporary vectors for each iteration of gradient descent, and they are never reset to zero for the subsequent iterations: from iteration 2 onwards, the new values computed into h(i) and theta_temp(j) are simply added to the old ones. That is why the code does not work. Reset both vectors to zero at the beginning of each iteration and it works correctly. Here is my version of your first function (observe the changes):
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
% get the number of rows and columns
nrows = size(X, 1);
ncols = size(X, 2);
% run gradient descent for the specified number of iterations
count = 1;
while count <= iterations
% initialize the hypothesis vector
h = zeros(nrows, 1);
% initialize the temporary theta vector
theta_temp = zeros(ncols, 1);
% calculate the hypothesis values and fill into the vector
for i = 1 : nrows
for j = 1 : ncols
term = theta(j) * X(i, j);
h(i) = h(i) + term;
end
end
% calculate the gradient
for j = 1 : ncols
for i = 1 : nrows
term = (h(i) - y(i)) * X(i, j);
theta_temp(j) = theta_temp(j) + term;
end
end
% update the gradient with the factor
fact = alpha / nrows;
for i = 1 : ncols
theta_temp(i) = fact * theta_temp(i);
end
% update the theta
for i = 1 : ncols
theta(i) = theta(i) - theta_temp(i);
end
% update the count
count += 1;
end
end
I ran the code and it gave the same theta values you reported for the vectorized implementation. However, I wonder how you concluded that the hypothesis vector was the same in both cases, when it was clearly one of the reasons the first code failed: h was also being accumulated across iterations.
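As a quick check of the fix, here is a minimal Octave/MATLAB script on synthetic data (the data are a hypothetical stand-in, not ex1data1.txt) that runs the corrected loop version and the vectorized update side by side. With h and theta_temp reset on every iteration, the two thetas should agree to rounding error.

```octave
% Hypothetical synthetic data (a stand-in for ex1data1.txt)
m = 20;
x = linspace(0, 10, m)';
y = 1 + 2 * x;
X = [ones(m, 1), x];
ncols = size(X, 2);
alpha = 0.01;
iterations = 1500;
theta_loop = zeros(ncols, 1);
theta_vec = zeros(ncols, 1);
for count = 1:iterations
  % Loop version: temporaries reset to zero at the start of every iteration
  h = zeros(m, 1);
  theta_temp = zeros(ncols, 1);
  for i = 1:m
    for j = 1:ncols
      h(i) = h(i) + theta_loop(j) * X(i, j);
    end
  end
  for j = 1:ncols
    for i = 1:m
      theta_temp(j) = theta_temp(j) + (h(i) - y(i)) * X(i, j);
    end
  end
  theta_loop = theta_loop - (alpha / m) * theta_temp;
  % Vectorized version of the same update
  theta_vec = theta_vec - (alpha / m) * (X' * (X * theta_vec - y));
end
disp([theta_loop theta_vec])
```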

How to plot decision boundary from linear SVM after PCA in Matlab?

I have run a linear SVM on a large dataset. To reduce the number of dimensions I first performed a PCA, then ran the SVM on a subset of the component scores (the first 650 components, which explained 99.5% of the variance). Now I want to plot the decision boundary in the original variable space using the beta weights and bias from the SVM fitted in PCA space, but I can't figure out how to project the bias term from the SVM back into the original variable space. I've written a demo using the Fisher iris data to illustrate:
clear; clc; close all
% load data
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
Y = species(inds);
mu = mean(X)
% perform the PCA
[eigenvectors, scores] = pca(X);
% train the svm
SVMModel = fitcsvm(scores,Y);
% plot the result
figure(1)
gscatter(scores(:,1),scores(:,2),Y,'rgb','osd')
title('PCA space')
% now plot the decision boundary
betas = SVMModel.Beta;
m = -betas(1)/betas(2); % my gradient
b = -SVMModel.Bias; % my y-intercept
f = @(x) m.*x + b; % my linear equation
hold on
fplot(f,'k')
hold off
axis equal
xlim([-1.5 2.5])
ylim([-2 2])
% inverse transform the PCA
Xhat = scores * eigenvectors';
Xhat = bsxfun(@plus, Xhat, mu);
% plot the result
figure(2)
hold on
gscatter(Xhat(:,1),Xhat(:,2),Y,'rgb','osd')
% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
bHat = b * eigenvectors';
bHat = bHat + mu; % I know I have to add mu somewhere...
bHat = bHat/betaHat(2);
bHat = sum(sum(bHat)); % sum to reduce the matrix to a single value
% the correct value of bHat should be 6.3962
f = @(x) mHat.*x + bHat;
fplot(f,'k')
hold off
axis equal
title('Recovered feature space')
xlim([3 7])
ylim([0 4])
Any guidance on how I'm calculating bHat incorrectly would be much appreciated.
In case anyone else comes across this problem: the bias term can be used to find the y-intercept in PCA space, b = -SVMModel.Bias/betas(2). The y-intercept is just another point in space, [0 b], which can be recovered (un-rotated) by inverse-transforming it through the PCA. This new point can then be used to solve the linear equation y = mx + b for the intercept (i.e., b = y - mx). So the code should be:
% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
yint = b/betas(2); % y-intercept in PCA space (b = -SVMModel.Bias from the question)
yintHat = [0 yint] * eigenvectors'; % recover in original space
yintHat = yintHat + mu;
bHat = yintHat(2) - mHat*yintHat(1); % solve the linear equation
% the correct value of bHat is now 6.3962
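To check this reasoning without the Statistics and Machine Learning Toolbox, here is a sketch on made-up 2-D data, with PCA done by eig on the covariance matrix and a hypothetical boundary beta' * s + bias = 0 in PCA space standing in for SVMModel.Beta and SVMModel.Bias. It recovers bHat by inverse-transforming the y-intercept point [0 yint], compares it with the closed form obtained by substituting s = (x - mu) * V, and verifies that points on the PCA-space line land on the recovered line.

```octave
% Hypothetical deterministic 2-D data (not the iris data from the question)
t = (1:50)';
X = [4 + 0.05 * t, 1 + 0.02 * t + 0.3 * sin(t)];
mu = mean(X);
Xc = bsxfun(@minus, X, mu);
% PCA via eigendecomposition of the covariance matrix, largest variance first
[V, L] = eig(cov(Xc));
[~, order] = sort(diag(L), 'descend');
V = V(:, order);
% Hypothetical linear boundary in PCA space: beta' * s + bias = 0
beta = [1; 2];
bias = 0.3;
m = -beta(1) / beta(2);          % slope in PCA space
yint = -bias / beta(2);          % y-intercept in PCA space
% Map the boundary back to the original feature space
betaHat = V * beta;              % normal vector in the original space
mHat = -betaHat(1) / betaHat(2);
p = [0 yint] * V' + mu;          % y-intercept point, un-rotated and un-centred
bHat = p(2) - mHat * p(1);       % solve y = mHat*x + bHat for bHat
% Equivalent closed form: the bias shifts by -mu * betaHat
bHat2 = -(bias - mu * betaHat) / betaHat(2);
% Points on the PCA-space line should land on the recovered line
s1 = linspace(-3, 3, 7)';
pts = bsxfun(@plus, [s1, m * s1 + yint] * V', mu);
resid = pts(:, 2) - (mHat * pts(:, 1) + bHat);
disp([max(abs(resid)) abs(bHat - bHat2)])
```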

Can I use t-SNE when the dimension is larger than the number of data?

I am using t-SNE with the MATLAB code from this website (https://lvdmaaten.github.io/tsne/). However, I get an error whenever I run it on data whose dimension is larger than the number of data points. The error always occurs at this line:
M = M(:,ind(1:initial_dims));
The error is:
Index exceeds matrix dimensions.
Error in tsne (line 62)
M = M(:,ind(1:initial_dims));
I call the tsne function in MATLAB with:
output = tsne(input, [], 2, 640, 30);
The input is 162x640: the dimension is 640 and the number of data points is 162. Below is the code from the website above.
function ydata = tsne(X, labels, no_dims, initial_dims, perplexity)
%TSNE Performs symmetric t-SNE on dataset X
%
% mappedX = tsne(X, labels, no_dims, initial_dims, perplexity)
% mappedX = tsne(X, labels, initial_solution, perplexity)
%
% The function performs symmetric t-SNE on the NxD dataset X to reduce its
% dimensionality to no_dims dimensions (default = 2). The data is
% preprocessed using PCA, reducing the dimensionality to initial_dims
% dimensions (default = 30). Alternatively, an initial solution obtained
% from an other dimensionality reduction technique may be specified in
% initial_solution. The perplexity of the Gaussian kernel that is employed
% can be specified through perplexity (default = 30). The labels of the
% data are not used by t-SNE itself, however, they are used to color
% intermediate plots. Please provide an empty labels matrix [] if you
% don't want to plot results during the optimization.
% The low-dimensional data representation is returned in mappedX.
%
%
% (C) Laurens van der Maaten, 2010
% University of California, San Diego
if ~exist('labels', 'var')
labels = [];
end
if ~exist('no_dims', 'var') || isempty(no_dims)
no_dims = 2;
end
if ~exist('initial_dims', 'var') || isempty(initial_dims)
initial_dims = min(50, size(X, 2));
end
if ~exist('perplexity', 'var') || isempty(perplexity)
perplexity = 30;
end
% First check whether we already have an initial solution
if numel(no_dims) > 1
initial_solution = true;
ydata = no_dims;
no_dims = size(ydata, 2);
perplexity = initial_dims;
else
initial_solution = false;
end
% Normalize input data
X = X - min(X(:));
X = X / max(X(:));
X = bsxfun(@minus, X, mean(X, 1));
% Perform preprocessing using PCA
if ~initial_solution
disp('Preprocessing data using PCA...');
if size(X, 2) < size(X, 1)
C = X' * X;
else
C = (1 / size(X, 1)) * (X * X');
end
[M, lambda] = eig(C);
[lambda, ind] = sort(diag(lambda), 'descend');
M = M(:,ind(1:initial_dims));
lambda = lambda(1:initial_dims);
if ~(size(X, 2) < size(X, 1))
M = bsxfun(@times, X' * M, (1 ./ sqrt(size(X, 1) .* lambda))');
end
X = bsxfun(@minus, X, mean(X, 1)) * M;
clear M lambda ind
end
% Compute pairwise distance matrix
sum_X = sum(X .^ 2, 2);
D = bsxfun(@plus, sum_X, bsxfun(@plus, sum_X', -2 * (X * X')));
% Compute joint probabilities
P = d2p(D, perplexity, 1e-5); % compute affinities using fixed perplexity
clear D
% Run t-SNE
if initial_solution
ydata = tsne_p(P, labels, ydata);
else
ydata = tsne_p(P, labels, no_dims);
end
I am trying to understand this code but I cannot understand the part where the error occurs.
if size(X, 2) < size(X, 1)
C = X' * X;
else
C = (1 / size(X, 1)) * (X * X');
end
Why is this condition needed? Since X is 162x640, the else branch will be executed, and I guess this is the problem: in the else branch, C will be 162x162. However, the later line
M = M(:,ind(1:initial_dims));
uses 'initial_dims', which equals 640. Am I using this code the wrong way, or does it simply not work for my dataset?
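According to the documentation:
"The data is preprocessed using PCA, reducing the dimensionality to initial_dims dimensions (default = 30)." So initial_dims is the number of PCA components to keep, not the dimension of your data, and you should leave it at its default to begin with. When the data has more dimensions than points (640 > 162 here), the else branch builds the 162x162 matrix C, so eig returns only 162 eigenvectors, and ind(1:initial_dims) with initial_dims = 640 indexes past the end of ind: that is exactly the "Index exceeds matrix dimensions" error. initial_dims must therefore not exceed the number of data points. Moreover, because X has been mean-centred, X*X' has rank at most 161, so initial_dims = 162 would also divide by the square root of a (numerically) zero eigenvalue in the bsxfun(@times, ...) line.
The condition if size(X, 2) < size(X, 1) picks the smaller matrix to eigen-decompose, which makes the computation faster: X'*X (D x D) when there are fewer dimensions than points, and otherwise the N x N matrix X*X', whose eigenvectors are mapped back to D dimensions by X' * M and rescaled, the same trick that underlies an economy-size SVD.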
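The following sketch (random data with hypothetical sizes N = 10 and D = 25) mirrors the two branches of the PCA step in tsne.m. It shows that the N x N route returns only N eigenvalues, which is why initial_dims cannot exceed the number of data points, that it gives the same projection as the D x D route up to the sign of each column, and that centring leaves X with rank N - 1.

```octave
% Hypothetical sizes: more dimensions (D) than data points (N), as in the question
N = 10; D = 25; k = 5;              % k plays the role of initial_dims
X = rand(N, D);
X = bsxfun(@minus, X, mean(X, 1));  % centred, as in tsne.m
% Route 1 (the if branch): eigenvectors of the D x D matrix X'*X
A = X' * X;  A = (A + A') / 2;      % symmetrise against round-off
[V, L] = eig(A);
[~, i1] = sort(diag(L), 'descend');
P1 = X * V(:, i1(1:k));
% Route 2 (the else branch): eigenvectors of the N x N matrix X*X'/N
C = (1 / N) * (X * X');  C = (C + C') / 2;
[M, L2] = eig(C);
[lambda, i2] = sort(diag(L2), 'descend');
nEig = numel(lambda);               % only N eigenvalues, so initial_dims <= N
M = M(:, i2(1:k));
lambda = lambda(1:k);
% Map back to D dimensions; each column becomes a unit eigenvector of X'*X
M = bsxfun(@times, X' * M, (1 ./ sqrt(N .* lambda))');
P2 = X * M;
% Both routes give the same projection, up to the sign of each column
projDiff = max(abs(abs(P1(:)) - abs(P2(:))));
disp([nEig projDiff rank(X)])
```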

Discrete-time Kalman filter with augmented state vector

I am trying to implement the discrete-time Kalman filter for a state space model with an augmented state vector. That is, the state equation is given by:
x(t) = 0.80x(t-1) + noise
There is only one latent process in my example, so x(t) has dimension 1x1. The measurement equation is given by:
Y(t) = AX(t) + noise
Where capital Y(t) has dimension 3x1. It contains three observed series, Y(t) = [y1(t),y2(t),y3(t)]'. Two of the observed series are occasionally missing in a systematic way:
y1(t) is observed at all time points.
y2(t) is observed at every fourth time point.
y3(t) is observed at every twelfth time point.
The dimension of the measurement matrix A is 3x12 and the dimension of capital X(t) is 12x1. This is a vector stacked with the last twelve realizations of the latent process:
X(t) = [x(t),x(t-1),...,x(t-11)]'.
The reason for capital X(t) being stacked is that y1(t) is related to x(t) and x(t-1). Moreover, y2(t) is related to x(t), x(t-1), x(t-2) and x(t-3). Finally, y3(t) is related to all of the last twelve realizations of the latent process.
The attached MATLAB code simulates data from this state space model and then runs it through a Kalman filter with an augmented state vector, X(t).
My problem is that the filtered (and the predicted) process differs substantially from the true latent process. This can be seen from the attached figure as well. Clearly I am doing something wrong, but I am unable to see what. My Kalman filter worked very well when the state vector was not stacked. I hope someone can help, thanks!
%-------------------------------------- %
% AUGMENTED KALMAN FILTER WITH MISSINGS
% ------------------------------------- %
clear; close all; clc;
% 1) SET PARAMETER VALUES %
% ----------------------- %
% Number of observations, number of latent processes and observed
% processes.
T = 2000;
NumberOfLatent = 1;
NumberOfObs = 3;
% Measurement Matrix, A.
A = [-0.40, -0.15, zeros(1,10);
0.25, 0.25, 0.25, 0.25, zeros(1,8);
ones(1,12)];
% State transition matrix, Phi.
Phi = zeros(12,12);
Phi(1,1) = 0.80;
Phi(2:end,1:end-1) = eye(11);
% Covariance matrix (for the measurement equation).
R = diag([2.25; 1.00; 1.00]);
% State noise and measurement noise.
MeasNoise = mvnrnd(zeros(1,NumberOfObs),R,T)';
StateNoise = randn(NumberOfLatent,T+11);
% One latent process (X) and three observed processes (Y).
X = zeros(NumberOfLatent,T+11);
Y = zeros(NumberOfObs,T);
% Set initial state.
X0 = 0;
% 2) SIMULATE DATA %
% ---------------- %
% Before Y (the observed processes) can be simulated we need to simulate 11
% realisations of X (the latent process).
for t = 1:T+11
if (t == 1)
X(:,t) = Phi(1,1)*X0 + StateNoise(:,t);
else
X(:,t) = Phi(1,1)*X(1,t-1) + StateNoise(:,t);
end
if (t>=12)
Y(:,t-11) = A*X(1,t:-1:t-11)' + MeasNoise(:,t-11);
end
end
% 3) DELETE DATA SUCH THAT THERE ARE MISSINGS %
% ------------------------------------------- %
% The first row is observed at all time points. The second row is observed at
% every 4th time point and the third row is observed at every 12th time
% point.
for j = 1:3
Y(2,j:4:end) = NaN;
end
for j = 1:11
Y(3,j:12:end) = NaN;
end
% 4) RUN THE KALMAN FILTER WITH AUGMENTED STATE VECTOR %
% ---------------------------------------------------- %
% Initializing matrices.
AugmentedStates = 12;
X_predicted = zeros(AugmentedStates,T); % Predicted states.
X_filtered = zeros(AugmentedStates,T); % Updated states.
P_predicted = zeros(AugmentedStates,AugmentedStates,T); % Predicted variances.
P_filtered = zeros(AugmentedStates,AugmentedStates,T); % Filtered variances.
KG = zeros(AugmentedStates,NumberOfObs,T); % Kalman gains.
Q = eye(AugmentedStates); % State noise for augmented states.
Res = zeros(NumberOfObs,T); % Residuals.
% Initial values:
X_zero = ones(AugmentedStates,1);
P_zero = eye(AugmentedStates);
for t = 1:T
% Initialisation of the Kalman filter.
if (t == 1)
X_predicted(:,t) = Phi * X_zero; % Initial predicted state, dimension (AugmentedStates x 1).
P_predicted(:,:,t) = Phi * P_zero * Phi'+ Q; % Initial variance, dimension (AugmentedStates x AugmentedStates).
else
% #1 Prediction step.
X_predicted(:,t) = Phi * X_filtered(:,t-1); % Predicted state, dimension (AugmentedStates x 1).
P_predicted(:,:,t) = Phi * P_filtered(:,:,t-1) * Phi'+ Q; % Variance of prediction, dimension (AugmentedStates x AugmentedStates).
end
% If there are missing values, the corresponding entries in A and Y are set to
% zero.
if sum(isnan(Y(:,t)))>0
temp = find(isnan(Y(:,t)));
Y(temp,t) = 0;
A(temp,:) = 0;
end
% #2 Calculate the Kalman gains and save them in the matrix KG.
K = (P_predicted(:,:,t) * A')/(A * P_predicted(:,:,t) * A' + R); % Kalman Gains, dimension (AugmentedStates x ObservedProcesses).
KG(:,:,t) = K;
% Residuals.
Res(:,t) = Y(:,t) - A*X_predicted(:,t);
% #3 Update step.
X_filtered(:,t) = X_predicted(:,t) + K * Res(:,t); % Updated state (AugmentedStates x 1).
P_filtered(:,:,t) = (eye(AugmentedStates) - K * A) * P_predicted(:,:,t); % Updated variance, dimension (AugmentedStates x AugmentedStates).
% Measurement Matrix, A, is recreated.
A = [-0.40, -0.15, zeros(1,10);
0.25, 0.25, 0.25, 0.25, zeros(1,8);
ones(1,12)];
end
Aha, I realized what the bug is. The state transition matrix (Phi) in the MATLAB code is not correct. In the current MATLAB code it is:
% State transition matrix, Phi.
Phi = zeros(12,12);
Phi(1,1) = 0.80;
Phi(2:end,1) = 1
And it should actually look like this:
% State transition matrix, Phi.
Phi = zeros(12,12);
Phi(1,1) = 0.80;
Phi(2:end,1:end-1) = eye(11)
Without this adjustment the state equation in the filter does not make any sense! I have implemented this in the code. And just to show that it actually works, the plot now looks like this:
... or at least, the performance of the filter has improved!
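To see why the shift structure matters, here is a small Octave/MATLAB sketch (the stand-in state values are hypothetical) that applies both versions of Phi from this answer to a stacked state [x(t-1); ...; x(t-12)]. It also builds a state-noise covariance with a single nonzero (1,1) entry: under the state equation in the question only x(t) receives new noise, so this is probably more consistent with the model than the Q = eye(AugmentedStates) used in the filter code. Treat that as a suggestion to try, not a verified result.

```octave
p = 12;      % number of stacked lags, as in the question
phi = 0.80;  % AR coefficient from the state equation
% Correct transition: first row is the AR(1) step, the identity block shifts lags down
Phi = zeros(p);
Phi(1,1) = phi;
Phi(2:end,1:end-1) = eye(p-1);
% Incorrect version: every lagged entry is overwritten by the previous first state
Phi_wrong = zeros(p);
Phi_wrong(1,1) = phi;
Phi_wrong(2:end,1) = 1;
% Stand-in values for the stacked state [x(t-1); x(t-2); ...; x(t-12)]
Xprev = (1:p)';
Xnext = Phi * Xprev;              % [0.8*x(t-1); x(t-1); x(t-2); ...; x(t-11)]
Xnext_wrong = Phi_wrong * Xprev;  % [0.8*x(t-1); x(t-1); x(t-1); ...; x(t-1)]
disp([Xnext Xnext_wrong])
% State noise enters only x(t); the lagged copies are deterministic
Q = zeros(p);
Q(1,1) = 1;
```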