Minimizing error of a formula in MATLAB (Least squares?) - matlab

I'm not too familiar with MATLAB or computational mathematics so I was wondering how I might solve an equation involving the sum of squares, where each term involves two vectors- one known and one unknown. This formula is supposed to represent the error and I need to minimize the error. I think I'm supposed to use least squares but I don't know too much about it and I'm wondering what function is best for doing that and what arguments would represent my equation. My teacher also mentioned something about taking derivatives and he formed a matrix using derivatives which confused me even more- am I required to take derivatives?

The problem that you must be trying to solve is
Min u'u = min \sum_i u_i^2, u=y-Xbeta, where u is the error, y is the vector of dependent variables you are trying to explain, X is a matrix of independent variables and beta is the vector you want to estimate.
Since sum u_i^2 is diferentiable (and convex), you can evaluate the minimal of this expression calculating its derivative and making it equal to zero.
If you do that, you find that beta=inv(X'X)X'y. This maybe calculated using the matlab function regress http://www.mathworks.com/help/stats/regress.html or writing this formula in Matlab. However, you should be careful how to evaluate the inverse (X'X) see Most efficient matrix inversion in MATLAB

Related

How to compute inverse of a matrix accurately?

I'm trying to compute an inverse of a matrix P, but if I multiply inv(P)*P, the MATLAB does not return the identity matrix. It's almost the identity (non diagonal values in the order of 10^(-12)). However, in my application I need more precision.
What can I do in this situation?
Only if you explicitly need the inverse of a matrix you use inv(), otherwise you just use the backslash operator \.
The documentation on inv() explicitly states:
x = A\b is computed differently than x = inv(A)*b and is recommended for solving systems of linear equations.
This is because the backslash operator, or mldivide() uses whatever method is most suited for your specific matrix:
x = A\B solves the system of linear equations A*x = B. The matrices A and B must have the same number of rows. MATLABĀ® displays a warning message if A is badly scaled or nearly singular, but performs the calculation regardless.
Just so you know what algorithm MATLAB chooses depending on your input matrices, here's the full algorithm flowchart as provided in their documentation
The versatility of mldivide in solving linear systems stems from its ability to take advantage of symmetries in the problem by dispatching to an appropriate solver. This approach aims to minimize computation time. The first distinction the function makes is between full (also called "dense") and sparse input arrays.
As a side-note about error of order of magnitude 10^(-12), besides the above mentioned inaccuracy of the inv() function, there's floating point accuracy. This post on MATLAB issues on it is rather insightful, with a more general computer science post on it here. Basically, if you are computing numerics, don't worry (too much at least) about errors 12 orders of magnitude smaller.
You have what's called an ill-conditioned matrix. It's risky to try to take the inverse of such a matrix. In general, taking the inverse of anything but the smallest matrices (such as those you see in an introduction to linear algebra textbook) is risky. If you must, you could try taking the Moore-Penrose pseudoinverse (see Wikipedia), but even that is not foolproof.

Kalman Filter prediction error estimation: why two constants and transposed matrices?

Hy everybody!
I have found a very informative and good tutorial for understanding Kalman Filter. In the end, I would like to understand the Extended Kalman Filter in the second half of the tutorial, but first I want to solve any mystery.
Kalman Filter tutorial Part 6.
I think we use constant for prediction error, because the new value in a certain k time moment can be different, than the previous. But why we use two constants? It says:
we multiply twice by a because the prediction error pk is itself a squared error; hence, it is scaled by the square of the coefficient associated with the state value xk.
I can't see the meaning of this sentence.
And later in the EKF he creates a matrix and a transposed matrix from that (in Part 12). Why the transposed one?
Thanks a lot.
The Kalman filter maintains error estimates as variances, which are squared standard deviations. When you multiply a Gaussian random variable N(x,p) by a constant a, you increase its standard deviation by a factor of a, which means its variance increases as a^2. He's writing this as a*p*a to maintain a parallel structure when he converts from a scalar state to a matrix state. If you have an error coviarance matrix P representing state x, then the error covariance of Ax is APA^T as he shows in part 12. It's a convenient shorthand for doing that calculation. You can expand the matrix multiplication by hand to see that the coefficients all go in the right place.
If any of this is fuzzy to you, I strongly recommend you read a tutorial on Gaussian random variables. Between x and P in a Kalman filter, your success depends a lot more on you understanding P than x, even though most people get started by being interested in improving x.

how to convert a matrix to a diagonally dominant matrix using pivoting in Matlab

Hi I am trying to solve a linear system of the following type:
A*x=b,
where A is the coefficient matrix,
x is the vectors of unknowns and
b is the vector of solution.
The coefficient matrix (A) is a n-by-n sparse matrix, with even zeros in the diagonal. In order to solve this system in an accurate way I am using an iterative method in Matlab called bicgstab (Biconjugate gradients stabilized method).
This coefficient matrix (A) has a
det(A)=-4.1548e-05 and a rcond(A)= 1.1331e-04.
Therefore the matrix is ill-conditioned. I first try to perform a scaling and the results where:
det(A)= -1.2612e+135 but the rcond(A)=5.0808e-07...
Therefore the matrix is still ill-conditioned... I verify and the sum of all absolute value of the non-diagonal elements where 163.60 and the sum of all absolute value of the diagonal elements where 32.49... Therefore the matrix of coefficient is not diagonally dominant and will not converge using my function bicgstab...
I am looking for someone that can help me with performing a pivoting to the coefficient matrix (A) so it can be diagonally dominant. Or any advice to solve this problem....
Thanks for the help.
First there should be quite a few things noted here:
Don't use the determinant to estimate the "amount of singularity" of your matrix. The determinant is the product of all the eigenvalues of your matrix, and therefore its scaling can be wildly misleading compared to a much better measure like the condition number, leading to the next point..
your conditioning (according to rcond) isn't that bad, are you working with single or double precision? Large problems can routinely get condition numbers in this range and still be quite solvable, but of course this depends on a very complicated interaction of many factors, of which the condition number plays only a small part. This leads to another complicated point:
Diagonal dominance may not help you at all here. BiCGStab as far as I know does not require diagonal dominance for its convergence, and also I don't think diagonal dominance is known even to help it. Diagonal dominance is usually an assumption made by other iterative methods such as the Jacobi method or Gauss-Seidel. Actually the convergence behavior of BiCGStab is not very well understood at all, and it is usually only used when memory is a very severe problem but conjugate gradients is not applicable.
If you are really interested in using a Krylov method (such as BiCGStab) to solve your problem, then you generally need to have more understanding of where your matrix is coming from so that you can choose a sensible preconditioner.
So this calls for a bit more information. Do you know more about this matrix? Is it arising from some kind of physical problem? Do you know for example if it is symmetric or positive definite (I will assume not both because you are not using CG).
Let me end with some actionable advice which is very generic, and so not necessarily optimal:
If memory is not an issue, consider using restarted GMRES instead of BiCGStab. My experience is that GMRES has much more robust convergence.
Try an approximate factorization preconditioner such as ILU. MATLAB has a function for this built in.

Numerical Instability Kalman Filter in MatLab

I am trying to run a standard Kalman Filter algorithm to calculate likelihoods, but I keep getting a problema of a non positive definite variance matrix when calculating normal densities.
I've researched a little and seen that there may be in fact some numerical instabitlity; tried some numerical ways to avoid a non-positive definite matrix, using both choleski decomposition and its variant LDL' decomposition.
I am using MatLab.
Does anyone suggest anything?
Thanks.
I have encountered what might be the same problem before when I needed to run a Kalman filter for long periods but over time my covariance matrix would degenerate. It might just be a problem of losing symmetry due to numerical error. One simple way to enforce your covariance matrix (let's call it P) to remain symmetric is to do:
P = (P + P')/2 # where P' is transpose(P)
right after estimating P.
post your code.
As a rule of thumb, if the model is not accurate and the regularization (i.e. the model noise matrix Q) is not sufficiently "large" an underfitting will occur and the covariance matrix of the estimator will be ill-conditioned. Try fine tuning your Q matrix.
The Kalman Filter implemented using the Joseph Form is known to be numerically unstable, as any old timer who once worked with single precision implementation of the filter can tell. This problem was discovered zillions of years ago and prompt a lot of research in implementing the filter in a stable manner. Probably the best well-known implementation is the UD, where the Covariance matrix is factorized as UDU' and the two factors are updated and propagated using special formulas (see Thoronton and Bierman). U is an upper diagonal matrix with "1" in its diagonal, and D is a diagonal matrix.

Find the best linear combination of two vectors resembling a third vector; implementing constraints

I have a vector z that I want to approximate by a linear combination of two other vectors (x,y) such that the residual of a*x+b*y and z is minimized. Also I want to keep one coefficient (a) positive for the fitting.
Any suggestions which command may help?
Thanks!
If you didn't have a bound on one of the coefficients, your problem could have been viewed as multiple regression (solved in matlab by regress). Since one of the coefficients is bounded, you should use lsqlin. This function solves least squares problems with bounds or inequalities on the coefficients. Don't forget to include an all-ones intercept predictor if your signals are not centered.
I think that fminsearch would be an overshoot in this case, since lsqlin does exactly what you want.
You have to define a function, which desribes the cost. The lower the cost is, the better the solution. The output must be a single scalar, e.g. the norm of the difference.
To avoid negative values for x, add something like (x<0)*inf. This rejects every solution with a negative x.
If done so, use fminsearch for a numeric solution.