Kalman Filter prediction error estimation: why two constants and transposed matrices?

Hi everybody!
I have found a very informative tutorial for understanding the Kalman filter. Ultimately I would like to understand the Extended Kalman Filter covered in the second half of the tutorial, but first I want to clear up a couple of mysteries.
Kalman Filter tutorial Part 6.
I think we use a constant in the prediction error because the new value at a given time step k can differ from the previous one. But why do we use two constants? It says:
we multiply twice by a because the prediction error pk is itself a squared error; hence, it is scaled by the square of the coefficient associated with the state value xk.
I can't see the meaning of this sentence.
And later, in the EKF, he creates a matrix and its transpose (in Part 12). Why the transposed one?
Thanks a lot.

The Kalman filter maintains error estimates as variances, which are squared standard deviations. When you multiply a Gaussian random variable N(x,p) by a constant a, you scale its standard deviation by a factor of |a|, which means its variance scales by a^2. He writes this as a*p*a to keep a parallel structure when he moves from a scalar state to a matrix state. If you have an error covariance matrix P for state x, then the error covariance of Ax is APA^T, as he shows in Part 12. It's a convenient shorthand for that calculation. You can expand the matrix multiplication by hand to see that the coefficients all go in the right place.
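To make the a*p*a versus APA^T parallel concrete, here is a small Python sketch. It is my own illustration, not code from the tutorial: the helper names (matmul, transpose, propagate_covariance) and all the numbers are made up for the example.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def propagate_covariance(A, P):
    """Covariance of A x when x has covariance P, i.e. A P A^T."""
    return matmul(matmul(A, P), transpose(A))


# Scalar case: a variance p scaled by a becomes a * p * a = a^2 * p.
a, p = 3.0, 2.0
scalar_result = a * p * a

# The same calculation written with 1x1 matrices.
matrix_result = propagate_covariance([[a]], [[p]])

# A 2x2 case: constant-velocity transition with time step T (illustrative values).
T = 0.5
A = [[1.0, T],
     [0.0, 1.0]]
P = [[4.0, 0.0],
     [0.0, 1.0]]
P_pred = propagate_covariance(A, P)

print(scalar_result, matrix_result, P_pred)
```

In the 2x2 case the position variance grows from 4.0 to 4.25 because the velocity uncertainty leaks into position through T, and an off-diagonal term appears. That is exactly the bookkeeping the transpose takes care of.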
If any of this is fuzzy to you, I strongly recommend you read a tutorial on Gaussian random variables. Between x and P in a Kalman filter, your success depends a lot more on you understanding P than x, even though most people get started by being interested in improving x.

Related

How does covariance matrix (P) in Kalman filter get updated in relation to measurements and state estimate?

I am in the midst of implementing a Kalman filter based AHRS in C++. There's something rather strange to me in the equations of the filter.
I can't find the part where the P (covariance) matrix is actually updated to represent uncertainty of predictions.
During the "predict" step P estimate is calculated from its previous value, A and Q. From what I understand A (system matrix) and Q (covariance of noise) are constant. Then during "Correct" P is calculated from K, H and predicted P. H (observation matrix) is constant, so the only variable that affects P is K (Kalman gain). But K is calculated from predicted P, H and R (observation noise) that are either constants or the P itself. So where is the part of the equations that makes P relate to x? To me it seems like P is recursively looping here depending only on the constants and initial value of P. This doesn't make any sense. What am I missing?
You are not missing anything.
It can come as a surprise to realise that, indeed, the state error covariance matrix (P) in a linear Kalman filter does not depend on the data (z). One way to lessen the surprise is to note what the covariance is saying: it is how uncertain you should be in the estimated state, given that the models you are using (effectively A, Q and H, R) are accurate. It is not saying: this is the uncertainty. By judicious tweaking of Q and R you could change P arbitrarily. In particular you should not interpret P as a 'quality' figure, but rather look at the observation residuals. You could, for example, make P smaller by reducing R. However, the residuals would then be larger compared with their computed standard deviations.
When the observations come in at a constant rate, and always the same set of observations, P will tend to a steady state that could, in principle, be computed ahead of time.
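A quick way to convince yourself of this is to run the same filter on two completely different measurement sequences and compare the covariance histories. The following is a minimal scalar sketch in Python; the function name run_scalar_kf and all parameter values are assumptions chosen for illustration.

```python
def run_scalar_kf(measurements, a=1.0, q=0.01, h=1.0, r=0.5, x0=0.0, p0=1.0):
    """Run a scalar linear Kalman filter and return the state and variance histories."""
    x, p = x0, p0
    xs, ps = [], []
    for z in measurements:
        # Predict: the variance update uses only a and q, never z.
        x = a * x
        p = a * p * a + q
        # Correct: the gain k depends on p, h and r, never on z.
        k = p * h / (h * p * h + r)
        x = x + k * (z - h * x)
        p = (1.0 - k * h) * p
        xs.append(x)
        ps.append(p)
    return xs, ps


# Two very different data sets of the same length.
z1 = [1.0, 1.2, 0.9, 1.1, 1.0] * 20
z2 = [-5.0, 3.0, 7.5, -2.0, 0.0] * 20

xs1, ps1 = run_scalar_kf(z1)
xs2, ps2 = run_scalar_kf(z2)

print("same P history:", ps1 == ps2)
print("same x history:", xs1 == xs2)
print("last two P values:", ps1[-2], ps1[-1])
```

The state estimates differ, but the two P histories are identical, and the last values barely change from step to step, which is the steady state mentioned above.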
However, there is no difficulty in applying the Kalman filter when you have varying times between observations and varying sets of observations at each time, for example if you have various sensor systems with different sampling periods. In this case you will see more variation in P, though again in principle this could be computed ahead of time.
Further, the Kalman filter can be extended (in various ways, e.g. the extended Kalman filter and the unscented Kalman filter) to handle nonlinear dynamics and nonlinear observations. In this case, because the transition matrix (A) and the observation model matrix (H) have a state dependency, so too will P.

Kalman filter +process noise covariance

I have been working on Kalman filters recently. There are a lot of terms involved, which need to be understood thoroughly to implement the filter and tune it well. I have no clear understanding of how to decide on the error covariance, the process noise covariance and the measurement noise covariance.
Error covariances are still okay to work with, as they basically define the uncertainty between the actual state and the assumed/estimated state, and the correlation between the uncertainties of the state vector components. These covariances are updated on each successive iteration and gradually converge to a minimum value as the estimates become more accurate relative to the actual state over time.
For the process noise and measurement noise covariance matrices, I started with an assumed value of a 3 x 3 identity matrix as Q (trial and error). The results didn't look so promising, so I tried plugging in this matrix:
Q = [(T^5)/20, (T^4)/8, (T^3)/6;
(T^4)/8, (T^3)/3, (T^2)/2;
(T^3)/6, (T^2)/2, T];
(Found this matrix in some paper, T is the sampling time)
This seemed to work fine and provide good results. It worked, but the reasoning behind it isn't clear to me. Also tried various other matrices like:
Q = 0.0001*diag([0.1 0.1 0.1]);
Even this seemed to give good results. I have read in a few places online that choosing an overly large value for Q will result in a misbehaving filter.
Kindly help me with how to choose the 'Q' matrix. Are there any guidelines for this?
Coming to the measurement noise covariance matrix R: after reading up a bit online, I decided to use a calculated noise covariance as the measurement noise covariance. This again gave inaccurate results. So I had to fall back on trial and error and ended up choosing R as [1].
This works fine for now, but again I'm not satisfied with this trial and error method of choosing values.
It would be great if someone could help me with some clarification.
Thanks
If you are still interested in the question, here is the answer.
As long as the real dynamics of the object you are tracking match the dynamics of your filter (as written in matrix A), you don't need the covariance matrix Q at all. In that case the gain coefficients of your filter decrease from step to step. That is correct behaviour, because the filter knows your object better and better with each step and eventually doesn't need measurements at all.
But if the object dynamics differ from matrix A, then the filter's lag error increases from step to step for the same reason. Matrix Q solves this problem. Q stands for the expected difference between the real object dynamics and matrix A.
For instance, matrix A equals
(1 T; 0 1)
for a two-dimensional state (position and velocity). If the object you are tracking accelerates, the expected difference equals
(T*T/2; T)*acceleration
Therefore, the additional prediction error (the outer product of that vector with itself) equals
(T^4/4 T^3/2; T^3/2 T^2)*acceleration^2
That is matrix Q. I hope it helps you.
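The construction above can be sketched in a few lines of Python. This is only an illustration of the outer-product step; the function name process_noise_from_acceleration and the sample values of T and acceleration are hypothetical.

```python
def process_noise_from_acceleration(T, accel):
    """Build the 2x2 Q for a [position, velocity] state from an expected acceleration.

    g is the effect of a constant acceleration of 1 over one step of length T:
    position changes by T^2/2 and velocity by T.  Q is the outer product
    g g^T scaled by accel^2.
    """
    g = [T * T / 2.0, T]
    return [[gi * gj * accel ** 2 for gj in g] for gi in g]


# Illustrative numbers only: a 1 s step and an expected acceleration of 3 units.
Q = process_noise_from_acceleration(T=1.0, accel=3.0)
for row in Q:
    print(row)
```

In practice the acceleration value acts as a tuning knob: a larger value tells the filter to trust the model in A less and the measurements more.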

Numerical Instability Kalman Filter in MatLab

I am trying to run a standard Kalman filter algorithm to calculate likelihoods, but I keep getting the problem of a non-positive-definite variance matrix when calculating normal densities.
I've researched a little and seen that there may in fact be some numerical instability; I tried some numerical ways to avoid a non-positive-definite matrix, using both the Cholesky decomposition and its variant, the LDL' decomposition.
I am using MatLab.
Does anyone have any suggestions?
Thanks.
I have encountered what might be the same problem before, when I needed to run a Kalman filter for long periods and over time my covariance matrix would degenerate. It might just be a problem of losing symmetry due to numerical error. One simple way to force your covariance matrix (let's call it P) to remain symmetric is to do:
P = (P + P')/2; % where P' is transpose(P)
right after estimating P.
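For readers not working in MATLAB, the same symmetrization step looks like this in plain Python. It is a minimal sketch; the function name symmetrize and the slightly asymmetric example matrix are made up for illustration.

```python
def symmetrize(P):
    """Return (P + P^T) / 2 for a square matrix given as a list of rows."""
    n = len(P)
    return [[0.5 * (P[i][j] + P[j][i]) for j in range(n)] for i in range(n)]


# A covariance whose off-diagonal entries have drifted apart through round-off.
P = [[2.0, 0.3000001],
     [0.2999999, 1.0]]
P_sym = symmetrize(P)
print(P_sym)
```

This only repairs symmetry. It does not by itself guarantee that P stays positive definite.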
Post your code.
As a rule of thumb, if the model is not accurate and the regularization (i.e. the model noise matrix Q) is not sufficiently "large", underfitting will occur and the covariance matrix of the estimator will be ill-conditioned. Try fine-tuning your Q matrix.
The Kalman filter implemented using the Joseph form is known to be numerically unstable, as any old-timer who once worked with a single-precision implementation of the filter can tell you. This problem was discovered zillions of years ago and prompted a lot of research into implementing the filter in a stable manner. Probably the best-known implementation is the UD filter, where the covariance matrix is factorized as UDU' and the two factors are updated and propagated using special formulas (see Thornton and Bierman). U is a unit upper triangular matrix (with 1s on its diagonal), and D is a diagonal matrix.
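To show what the UDU' factorization looks like, here is a small Python sketch that factors a symmetric positive definite matrix and rebuilds it. It covers only the factorization itself, not the Thornton and Bierman update and propagation formulas; the function names ud_factorize and ud_reconstruct and the example matrix are my own assumptions.

```python
def ud_factorize(P):
    """Factor a symmetric positive definite P as U * diag(D) * U^T.

    U is unit upper triangular and D holds the diagonal entries.
    Columns are processed from last to first.
    """
    n = len(P)
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n - 1, -1, -1):
        D[j] = P[j][j] - sum(D[k] * U[j][k] ** 2 for k in range(j + 1, n))
        for i in range(j):
            s = sum(D[k] * U[i][k] * U[j][k] for k in range(j + 1, n))
            U[i][j] = (P[i][j] - s) / D[j]
    return U, D


def ud_reconstruct(U, D):
    """Rebuild P = U * diag(D) * U^T from its factors."""
    n = len(D)
    return [[sum(U[i][k] * D[k] * U[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Illustrative covariance matrix (symmetric, positive definite).
P = [[4.0, 2.0, 0.6],
     [2.0, 2.0, 0.5],
     [0.6, 0.5, 1.0]]
U, D = ud_factorize(P)
P_back = ud_reconstruct(U, D)
print("D =", D)
```

Because the filter carries U and D instead of P, the reconstructed covariance is symmetric by construction, and it stays positive definite as long as every entry of D is positive.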

Creating a 1D Second derivative of gaussian Window

In MATLAB I need to generate a second derivative of a Gaussian window to apply to a vector representing the height of a curve. I need the second derivative in order to determine the locations of the inflection points and maxima along the curve. The vector representing the curve may be quite noisy, hence the use of the Gaussian window.
What is the best way to generate this window?
Is it best to use the gausswin function to generate the gaussian window then take the second derivative of that?
Or to generate the window manually using the equation for the second derivative of the gaussian?
Or even is it best to apply the gaussian window to the data, then take the second derivative of it all? (I know these last two are mathematically the same, however with the discrete data points I do not know which will be more accurate)
The maximum length of the height vector is going to be around 100-200 elements.
Thanks
Chris
I would create a linear filter composed of the weights generated by the second derivative of a Gaussian function and convolve this with your vector.
The weights of a second derivative of a Gaussian are given by:
G''(t) = C * ((t - tau)^2 / sigma^4 - 1 / sigma^2) * exp(-(t - tau)^2 / (2 * sigma^2))
Where:
Tau is the time shift for the filter. If you are generating weights for a discrete filter of length T with an odd number of samples, set tau to zero and allow t to vary from [-T/2,T/2]
sigma - varies the scale of your operator. Set sigma to roughly T/6. If you are concerned about a long filter length, you can shorten the filter so that sigma is T/4.
C is the normalising factor. This can be derived algebraically but in practice I always do this numerically after calculating the filter weights. For unity gain when smoothing periodic signals, I will set C = 1 / sum(G'').
In terms of your comment on the equivalence of smoothing first and taking a derivative later, I would say it is more involved than that: which derivative operator would you use in the second step? A simple central difference would not yield the same results.
You can get an equivalent (but approximate) response to a second derivative of a Gaussian by filtering the data with two Gaussians of different scales and then taking the point-wise differences between the two resulting vectors. See Difference of Gaussians for that approach.
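As a rough illustration of the filter-and-convolve approach, here is a Python sketch that builds second-derivative-of-Gaussian weights and applies them to a cubic curve, whose inflection point is at zero. Several things here are my own assumptions rather than part of the answer: the function names, the choice sigma = T/6, and the normalisation. Instead of C = 1 / sum(G''), the sketch forces the weights to sum to zero and scales them so that the response to t^2 is exactly 2.

```python
import math


def gaussian_second_derivative_kernel(length, sigma=None):
    """Weights of a second-derivative-of-Gaussian filter with an odd number of taps."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel has a centre tap")
    if sigma is None:
        sigma = length / 6.0  # assumption: filter spans about +/- 3 sigma
    half = length // 2
    ts = list(range(-half, half + 1))  # tau = 0, t runs over [-T/2, T/2]
    w = [((t * t) / sigma ** 4 - 1.0 / sigma ** 2)
         * math.exp(-(t * t) / (2.0 * sigma ** 2)) for t in ts]
    # Remove the small offset left by truncation so constant signals map to zero.
    mean = sum(w) / length
    w = [wi - mean for wi in w]
    # Scale so that the response to t^2 equals its true second derivative, 2.
    scale = 2.0 / sum(wi * t * t for wi, t in zip(w, ts))
    return [wi * scale for wi in w]


def convolve_valid(signal, kernel):
    """Discrete convolution, keeping only positions where the kernel fully overlaps."""
    n, m = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[m - 1 - j] for j in range(m))
            for i in range(n - m + 1)]


kernel = gaussian_second_derivative_kernel(21)
xs = [0.1 * i for i in range(-50, 51)]
curve = [x ** 3 for x in xs]          # inflection point at x = 0
response = convolve_valid(curve, kernel)

# Output sample i is centred on input sample i + len(kernel) // 2.
centre = len(response) // 2
print(response[centre - 1], response[centre], response[centre + 1])
```

The response changes sign at the centre sample, which is where the inflection point of the cubic lies. On noisy data you would look for the same sign changes in the filtered output.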

Minimizing error of a formula in MATLAB (Least squares?)

I'm not too familiar with MATLAB or computational mathematics so I was wondering how I might solve an equation involving the sum of squares, where each term involves two vectors- one known and one unknown. This formula is supposed to represent the error and I need to minimize the error. I think I'm supposed to use least squares but I don't know too much about it and I'm wondering what function is best for doing that and what arguments would represent my equation. My teacher also mentioned something about taking derivatives and he formed a matrix using derivatives which confused me even more- am I required to take derivatives?
The problem that you must be trying to solve is
Min u'u = min \sum_i u_i^2, u=y-Xbeta, where u is the error, y is the vector of dependent variables you are trying to explain, X is a matrix of independent variables and beta is the vector you want to estimate.
Since sum u_i^2 is differentiable (and convex), you can find the minimum of this expression by calculating its derivative and setting it equal to zero.
If you do that, you find that beta=inv(X'X)X'y. This may be calculated using the MATLAB function regress http://www.mathworks.com/help/stats/regress.html or by writing this formula in MATLAB. However, you should be careful about how you evaluate the inverse of (X'X); see Most efficient matrix inversion in MATLAB
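To see the formula at work outside MATLAB, here is a minimal Python sketch for the simplest case, a straight line y = b0 + b1*x, where X has a column of ones and a column of x values. The function name least_squares_line and the data are made up for illustration. The 2x2 system (X'X) beta = X'y is solved directly rather than by forming an explicit inverse.

```python
def least_squares_line(xs, ys):
    """Fit y = b0 + b1 * x by solving the normal equations (X'X) beta = X'y."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy]; solve the 2x2 system.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Points lying exactly on y = 1 + 2x, so the error u = y - X*beta is zero.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares_line(xs, ys)
sum_sq_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(b0, b1, sum_sq_error)
```

So yes, derivatives are involved, but only in deriving the formula. Once you have beta = inv(X'X)X'y, the computation is plain linear algebra.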