Why is SVD applied on Linear Regression

Why is SVD applied on Linear Regression - linear-regression

I cannot understand on these slides why is the SVD applied to the Least Square Problem?
And then it follows this:
And here I don't understand why was the Derivative of the Residuals taken, and is it the Idea in that graph to take the Projection of y to minimize the error?

Here is my humble trial to explain this...
The first slide does not explain yet how SVD is related to LS. But it says that whenever X is a "standard" matrix, one can transform the problem with a Singular matrix (only diagonal elements are not null) - which is convenient for computation.
Slide 2 shows the computation to be done using the singular matrix.
Explanation are on slide 3 : minimizing the norm of r is equivalent to minimizing its square which is the RSS (because x -> x*x is an increasing function for x>0). Minimizing RSS: same as minimizing any "good" function, you derivate it, and then equal the derivative to 0.

Related

Very small numerical issues with hessian symmetry and sparse command

I am using IPOPT in MATLAB to run an optimization and I am running into some issues where it says:
Hessian must be an n x n sparse, symmetric and lower triangular matrix
with row indices in increasing order, where n is the number of variables.
After looking at my Hessian Matrix, I found that the non-symmetric elements it is complaining about are very close, here is an example:
H(k,j) = 2.956404205984938
H(j,k) = 2.956404205984939
Obviously these elements are close enough and there are some numerical round-off issues or something of the like. Also, when I call MATLABs issymmetric function with H as an input, I get false. Is there a way to forget about these very small differences in symmetry?
A little more info:
I am using an optimized matlabFunction to actually calculate the entire hessian (H), then I did some postprocessing before passing it to IPOPT:
H = tril(H);
H = sparse(H);
The tril command generates a lower triangular matrix, so these numeral differences should not come into play. So, the issue might be that it is complaining that the sparse command passes back increasing column indices and not increasing row indices. Is there a way to change this so that it passes back the sparse matrix in increasing row indices?

If H is very close to symmetric but not quite, and you need to force it to be exactly symmetric, a standard way to do this would be to say H = (H+H')./2.

What is benefit to use SVD for solving Ax=b

I have a linear equation such as
Ax=b
where A is full rank matrix which its size is 512x512. b is a vector of 512x1. x is unknown vector. I want to find x, hence, I have some options for doing that
1.Using the normal way
inv(A)*b
2.Using SVD ( Singular value decomposition)
[U S V]=svd(A);
x = V*(diag(diag(S).^-1)*(U.'*b))
Both methods give the same result. So, what is benefit of using SVD to solve Ax=b, especially in the case A is a 2D matrix?

Welcome to the world of numerical methods, let me be your guide.
You, as a new person in this world wonders, "Why would I do something this difficult with this SVD stuff instead of the so commonly known inverse?! Im going to try it in Matlab!"
And no answer was found. That is, because you are not looking at the problem itself! The problems arise when you have an ill-conditioned matrix. Then the computing of the inverse is not possible numerically.
example:
A=[1 1 -1;
1 -2 3;
2 -1 2];
try to invert this matrix using inv(A). Youll get infinite.
That is, because the condition number of the matrix is very high (cond(A)).
However, if you try to solve it using SVD method (b=[1;-2;3]) you will get a result. This is still a hot research topic. Solving Ax=b systems with ill condition numbers.
As #Stewie Griffin suggested, the best way to go is mldivide, as it does a couple of things behind it.
(yeah, my example is not very good because the only solution of X is INF, but there is a way better example in this youtube video)

inv(A)*b has several negative sides. The main one is that it explicitly calculates the inverse of A, which is both time demanding, and may result in inaccuracies if values vary by many orders of magnitude.
Although it might be better than inv(A)*b, using svd is not the "correct" approach here. The MATLAB-way to do this is using mldivide, \. Using this, MATLAB chooses the best algorithm to solve the linear system based on its properties (Hermation, upper Hessenberg, real and positive diagonal, symmetric, diagonal, sparse etc.). Often, the solution will be a LU-triangulation with partial permutation, but it varies. You'll have a hard time beating MATLABs implementation of mldivide, but using svd might give you some more insight of the properties of the system if you actually investigates U, S, V. If you don't want to do that, do with mldivide.

Rotate a basis to align to vector

I have a matrix M of size NxP. Every P columns are orthogonal (M is a basis). I also have a vector V of size N.
My objective is to transform the first vector of M into V and to update the others in order to conservate their orthogonality. I know that the origins of V and M are the same, so it is basically a rotation from a certain angle. I assume we can find a matrix T such that T*M = M'. However, I can't figure out the details of how to do it (with MATLAB).
Also, I know there might be an infinite number of transforms doing that, but I'd like to get the simplest one (in which others vectors of M approximately remain the same, i.e no rotation around the first vector).
A small picture to illustrate. In my actual case, N and P can be large integers (not necessarily 3):
Thanks in advance for your help!
[EDIT] Alternative solution to Gram-Schmidt (accepted answer)
I managed to get a correct solution by retrieving a rotation matrix R by solving an optimization problem minimizing the 2-norm between M and R*M, under the constraints:
V is orthogonal to R*M[1] ... R*M[P-1] (i.e V'*(R*M[i]) = 0)
R*M[0] = V
Due to the solver constraints, I couldn't indicate that R*M[0] ... R*M[P-1] are all pairwise orthogonal (i.e (R*M)' * (R*M) = I).
Luckily, it seems that with this problem and with my solver (CVX using SDPT3), the resulting R*M[0] ... R*M[P-1] are also pairwise orthogonal.

I believe you want to use the Gram-Schmidt process here, which finds an orthogonal basis for a set of vectors. If V is not orthogonal to M[0], you can simply change M[0] to V and run Gram-Schmidt, to arrive at an orthogonal basis. If it is orthogonal to M[0], instead change another, non-orthogonal vector such as M[1] to V and swap the columns to make it first.
Mind you, the vector V needs to be in the column space of M, or you will always have a different basis than you had before.
Matlab doesn't have a built-in Gram-Schmidt command, although you can use the qr command to get an orthogonal basis. However, this won't work if you need V to be one of the vectors.

Option # 1 : if you have some vector and after some changes you want to rotate matrix to restore its orthogonality then, I believe, this method should work for you in Matlab
http://www.mathworks.com/help/symbolic/mupad_ref/numeric-rotationmatrix.html
(edit by another user: above link is broken, possible redirect: Matrix Rotations and Transformations)
If it does not, then ...
Option # 2 : I did not do this in Matlab but a part of another task was to find Eigenvalues and Eigenvectors of the matrix. To achieve this I used SVD. Part of SVD algorithm was Jacobi Rotation. It says to rotate the matrix until it is almost diagonalizable with some precision and invertible.
https://math.stackexchange.com/questions/222171/what-is-the-difference-between-diagonalization-and-orthogonal-diagonalization
Approximate algorithm of Jacobi rotation in your case should be similar to this one. I may be wrong at some point so you will need to double check this in relevant docs :
1) change values in existing vector
2) compute angle between actual and new vector
3) create rotation matrix and ...
put Cosine(angle) to diagonal of rotation matrix
put Sin(angle) to the top left corner of the matric
put minus -Sin(angle) to the right bottom corner of the matrix
4) multiple vector or matrix of vectors by rotation matrix in a loop until your vector matrix is invertible and diagonalizable, ability to invert can be calculated by determinant (check for singularity) and orthogonality (matrix is diagonalized) can be tested with this check - if Max value in LU matrix is less then some constant then stop rotation, at this point new matrix should contain only orthogonal vectors.
Unfortunately, I am not able to find exact pseudo code that I was referring to in the past but these links may help you to understand Jacobi Rotation :
http://www.physik.uni-freiburg.de/~severin/fulltext.pdf
http://web.stanford.edu/class/cme335/lecture7.pdf
https://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2003/rapporter03/maleko_mercy_03003.pdf

Numerical Instability Kalman Filter in MatLab

I am trying to run a standard Kalman Filter algorithm to calculate likelihoods, but I keep getting a problema of a non positive definite variance matrix when calculating normal densities.
I've researched a little and seen that there may be in fact some numerical instabitlity; tried some numerical ways to avoid a non-positive definite matrix, using both choleski decomposition and its variant LDL' decomposition.
I am using MatLab.
Does anyone suggest anything?
Thanks.

I have encountered what might be the same problem before when I needed to run a Kalman filter for long periods but over time my covariance matrix would degenerate. It might just be a problem of losing symmetry due to numerical error. One simple way to enforce your covariance matrix (let's call it P) to remain symmetric is to do:
P = (P + P')/2 # where P' is transpose(P)
right after estimating P.

post your code.
As a rule of thumb, if the model is not accurate and the regularization (i.e. the model noise matrix Q) is not sufficiently "large" an underfitting will occur and the covariance matrix of the estimator will be ill-conditioned. Try fine tuning your Q matrix.

The Kalman Filter implemented using the Joseph Form is known to be numerically unstable, as any old timer who once worked with single precision implementation of the filter can tell. This problem was discovered zillions of years ago and prompt a lot of research in implementing the filter in a stable manner. Probably the best well-known implementation is the UD, where the Covariance matrix is factorized as UDU' and the two factors are updated and propagated using special formulas (see Thoronton and Bierman). U is an upper diagonal matrix with "1" in its diagonal, and D is a diagonal matrix.

Determinants of huge matrices in MATLAB

from a simulation problem, I want to calculate complex square matrices on the order of 1000x1000 in MATLAB. Since the values refer to those of Bessel functions, the matrices are not at all sparse.
Since I am interested in the change of the determinant with respect to some parameter (the energy of a searched eigenfunction in my case), I overcome the problem at the moment by first searching a rescaling factor for the studied range and then calculate the determinants,
result(k) = det(pre_factor*Matrix{k});
Now this is a very awkward solution and only works for matrix dimensions of, say, maximum 500x500.
Does anybody know a nice solution to the problem? Interfacing to Mathematica might work in principle but I have my doubts concerning feasibility.
Thank you in advance
Robert
Edit: I did not find a convient solution to the calculation problem since this would require changing to a higher precision. Instead, I used that
ln det M = trace ln M
which is, when I derive it with respect to k
A = trace(inv(M(k))*dM/dk)
So I at least had the change of the logarithm of the determinant with respect to k. From the physical background of the problem I could derive constraints on A which in the end gave me a workaround valid for my problem. Unfortunately I do not know if such a workaround could be generalized.

You should realize that when you multiply a matrix by a constant k, then you scale the determinant of the matrix by k^n, where n is the dimension of the matrix. So for n = 1000, and k = 2, you scale the determinant by
>> 2^1000
ans =
1.07150860718627e+301
This is of course a huge number, so you might expect that it should fail, since in double precision, MATLAB will only represent floating point numbers as large as realmax.
>> realmax
ans =
1.79769313486232e+308
There is no need to do all the work of recomputing that determinant, not that computing the determinant of a huge matrix like that is a terribly well-posed problem anyway.

If speed is not a concern, you may want to use det(e^A) = e^(tr A) and take as A some scaling constant times your matrix (so that A - I has spectral radius less than one).
EDIT: In MatLab, the log of a matrix (logm) is calculated via trigonalization. So it is better for you to compute the eigenvalues of your matrix and multiply them (or better, add their logarithm). You did not specify whether your matrix was symmetric or not: if it is, finding eigenvalues are easier than if it is not.

You said the current value of the determinant is about 10^-300.
Are you trying to get the determinant at a certain value, say 1? If so, rescaling is awkward: the matrix you are considering is ill-conditioned, and, considering the precision of the machine, you should consider the output determinant to be zero. It is impossible to get a reliable inverse in other words.
I would suggest to modify the columns or lines of the matrix rather than rescale it.
I used R to make a small test with a random matrix (random normal values), it seems the determinant should be clearly non-zero.
> n=100
> M=matrix(rnorm(n**2),n,n)
> det(M)
[1] -1.977380e+77
> kappa(M)
[1] 2318.188

This is not strictly a matlab solution, but you might want to consider using Mahout. It's specifically designed for large-scale linear algebra. (1000x1000 is no problem for the scales it's used to.)
You would call into java to pass data to/from Mahout.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Why is SVD applied on Linear Regression - linear-regression

I cannot understand on these slides why is the SVD applied to the Least Square Problem? And then it follows this: And here I don't understand why was the Derivative of the Residuals taken, and is it the Idea in that graph to take the Projection of y to minimize the error?

Related

Very small numerical issues with hessian symmetry and sparse command

What is benefit to use SVD for solving Ax=b

Rotate a basis to align to vector

Numerical Instability Kalman Filter in MatLab

Determinants of huge matrices in MATLAB

Categories

Resources