Calculating an inverse matrix in Matlab - matlab

I'm running an optimization algorithm that requires calculation of the inverse of a matrix. The goal of the algorithm is to eliminate negative values from the matrix A and obtain the new matrix B. Basically, I start with known square matrices B and C of the same size.
I start by calculating the matrix A which is equal to:
A = B^-1 * C
Or in Matlab:
A = B\C;
I use this because Matlab told me B\C is more accurate than inv(B)*C.
The negative values in A are then divided by two, and A is then normalised so that its rows have length 1. Using this new A, I calculate a new B with:
(1/N) * A * C' = B^-1
where N is just a scaling factor (# of columns in A). This new B would then be used again in the first step and these iterations continue until the negatives in A are gone.
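In code, one iteration looks roughly like this (the halving and row-normalisation lines are a sketch of the steps just described, not my exact implementation):
A = B \ C;                                    % A = B^-1 * C without forming inv(B)
A(A < 0) = A(A < 0) / 2;                      % halve the negative entries
A = bsxfun(@rdivide, A, sqrt(sum(A.^2, 2)));  % normalise each row to unit length
N = size(A, 2);                               % scaling factor (# of columns of A)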
My problem is I have to calculate B from the second equation and then normalise it.
invB = (1/N)*A*C';
B = inv(invB);
I've been calculating B using inv(B^-1) but after a few iterations I start getting messages that B^-1 is "close to singular or badly scaled."
This algorithm actually works for smaller matrices (around 70x70) but when it gets up to about 500x500 I start getting these messages.
Are there any better ways to calculate inv(B^-1)?

You should definitely heed warnings about singular matrices. Results in numerical linear algebra tend to break down as you move toward matrices with high condition numbers. The underlying idea is: if
A*b_1 = c
and we're actually solving the problem (because we are using approximate numbers when we use computers)
(A + matrix error)*b_2 = (c + vector error)
how close are b_1 and b_2 as a function of the matrix and vector errors? When A has small condition number b_1 and b_2 are close. When A has large condition number b_1 and b_2 are not close.
There is an informative piece of analysis you could do on your algorithm. At each iteration, after you've found B, use Matlab to find its condition number:
cond(B)
You will likely see the number climb rapidly. This indicates that every time you iterate your algorithm, you should trust your result for B less and less.
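As a concrete, self-contained illustration (using the classic Hilbert matrix rather than your B), a useful rule of thumb is that you lose roughly log10(cond(B)) significant digits when you solve a system with B:
B = hilb(8);   % classic ill-conditioned test matrix
c = cond(B)    % very large condition number (around 1e10)
log10(c)       % roughly how many significant digits are lost when solving with B
rcond(B)       % the cheap reciprocal estimate behind MATLAB's "close to singular" warning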
Problems like this crop up all the time in numerical mathematics. If you'll be working with numerical algorithms frequently, you should take some time to familiarize yourself with the role of condition numbers in the field and with preconditioning techniques. My preferred text for this is "Numerical Linear Algebra" by Lloyd Trefethen, but any text on numerical linear algebra should address some of these issues.
Best of luck,
Andrew

The main issue is that your matrix has a high condition number (i.e. a really small rcond(B) in your case). This is probably due to the iterative structure of your algorithm. With each iteration your small singular values get smaller and smaller, so your condition number grows exponentially. You should look into preconditioning to avoid this kind of behavior.

Related

Rank of matrix contradicts the number of independent columns

I have a 50x49 matrix A that has 49 linearly independent columns. However, my software (Octave) tells me its rank is 44.
Is it due to some computational error? If so, then how to prevent such errors?
If the software was able to correctly calculate rref(A), then why did it fail with rank(A)? Does it mean that calculating rank(A) is more error-prone than calculating rref(A), or vice versa? After all, rref(A) effectively tells you the rank, yet here the two contradict each other.
P.S. I've checked, Python makes the same error.
EDIT 1: Here is the matrix A itself. The first 9 columns were given. The rest was obtained with polynomial features.
EDIT 2: I was able to find a similar issue. Here is a 10x10 matrix B of rank 10 (and Octave calculates its rank correctly). However, Octave says that rank(B * B) = 9, which is impossible.
The distinction between an invertible matrix (i.e. full rank) and a non-invertible one is clear-cut in theory, but not so in practice. A matrix B with large condition number (as in your example) can be inverted, but computing the inverse is numerically unstable. It roughly corresponds to B having a determinant that is "small" (using an appropriate, relative measure of "small"), so the matrix is almost singular. As a result, the inverse matrix will be computed with bad accuracy. In your example B, the condition number (computed with cond) is 2.069e9.
Another way to look at this is: when the condition number is large, it well could be that B is "really" singular, but small numerical errors from previous computations make it look barely non-singular. So you can't be sure.
The rank and rref functions use different algorithms (singular-value decomposition for rank, Gauss-Jordan elimination with partial pivoting for rref). For well-behaved matrices the numerical errors will be small in both cases, and the results will be consistent. But for a badly conditioned matrix the numerical errors will be large and potentially different in each case, giving inconsistent results.
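If you want to see what rank is doing under the hood, you can reproduce it from the singular values yourself; the tolerance below is MATLAB's documented default, and Octave's is essentially the same:
s = svd(A);                        % singular values, largest first
tol = max(size(A)) * eps(max(s));  % default tolerance used by rank()
r = sum(s > tol)                   % numerical rank: singular values above the tolerance
semilogy(s, 'o')                   % a sharp drop in this plot marks the effective rank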
This is a well known issue with numerical algebra. In general, avoid inverting matrices with large condition number.

Computing the SVD of a rectangular matrix

I have a matrix M of size K x N, where K is 49152 (the dimension of the problem) and N is 52 (the number of observations).
I have tried to use [U,S,V] = svd(M), but doing this I run out of memory.
I found other code which uses [U,S,V] = svd(cov(M)), and it works well. My questions are: what is the meaning of using cov(M) inside the SVD call, and what is the meaning of the resulting [U,S,V]?
Finding the SVD of the covariance matrix is a method to perform Principal Components Analysis, or PCA for short. I won't get into the mathematical details here, but PCA performs what is known as dimensionality reduction. If you'd like a more formal treatment of the subject, you can read up on my post about it here: What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?. Simply put, dimensionality reduction projects the data stored in the matrix M onto a lower-dimensional surface with the least amount of projection error. In this matrix, we are assuming that each column is a feature or a dimension and each row is a data point.
I suspect the reason why you run out of memory when applying the SVD to the actual data matrix M itself, rather than to the covariance matrix, is that you have a significant number of data points and only a small number of features. The covariance matrix holds the covariance between every pair of features. If M is an m x n matrix, where m is the total number of data points and n is the total number of features, then cov(M) gives you an n x n matrix, so you are applying the SVD to something far smaller than M.
As for the meaning of U, S and V for dimensionality reduction specifically: the columns of V are what are known as the principal components. The columns of V are ordered so that the first column is the axis of your data that describes the greatest amount of variability possible. As you move from the second column up to the nth column, you introduce more axes into your data and the variability described starts to decrease. Once you reach the nth column, you are essentially describing your data in its entirety without reducing any dimensions. The diagonal values of S denote what is called the variance explained, and they respect the same ordering as V. As you progress through the singular values, they tell you how much of the variability in your data is described by each corresponding principal component.
To perform the dimensionality reduction, you can either take U and multiply by S, or take your mean-subtracted data and multiply by V. In other words, supposing X is the matrix M where each column has had its mean computed and then subtracted from it, the following relationship holds:
US = XV
To actually perform the final dimensionality reduction, you take either US or XV and retain the first k columns, where k is the total number of dimensions you want to keep. The value of k depends on your application, but many people choose k to be the number of principal components that together explain a certain percentage of the variability in your data.
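Putting that together, a rough MATLAB sketch of the whole procedure might look like this (choosing k to keep, say, 95% of the variance is just one common convention, not something taken from your code):
X = bsxfun(@minus, M, mean(M, 1));          % mean-subtract each column (feature) of M
[U, S, V] = svd(cov(M));                    % columns of V are the principal components
varExplained = diag(S) / sum(diag(S));      % fraction of variance per component
k = find(cumsum(varExplained) >= 0.95, 1);  % smallest k explaining 95% of the variance
Z = X * V(:, 1:k);                          % dimensionality-reduced data, one row per data point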
For more information about the link between SVD and PCA, please see this post on Cross Validated: https://stats.stackexchange.com/q/134282/86678
Instead of [U, S, V] = svd(M), which tries to build a matrix U that is 49152 by 49152 (= 18 GB 😱!), do svd(M, 'econ'). That returns the “economy-class” SVD, where U will be 52 by 52, S is 52 by 52, and V is also 52 by 52.
cov(M) will remove each dimension’s mean and evaluate the inner product, giving you a 52 by 52 covariance matrix. You can implement your own version of cov, called mycov, as
function [C] = mycov(M)
M = bsxfun(@minus, M, mean(M, 1)); % subtract each dimension’s mean over all observations
C = M' * M / (size(M, 1) - 1);     % normalise by N-1 to match cov’s default
(You can verify this works by looking at mycov(randn(49152, 52)), which should be close to eye(52), since each element of that array is IID-Gaussian.)
There are a lot of magical linear algebraic properties and relationships between the SVD and EVD (i.e., singular value vs eigenvalue decompositions): because the covariance matrix cov(M) is a Hermitian matrix, its left- and right-singular vectors are the same, and in fact they are also cov(M)’s eigenvectors. Furthermore, cov(M)’s singular values are also its eigenvalues: so svd(cov(M)) is just an expensive way to get eig(cov(M)) 😂, up to ±1 and reordering.
As @rayryeng explains at length, usually people look at svd(M, 'econ') because they want eig(cov(M)) without needing to evaluate cov(M), because you never want to compute cov(M): it’s numerically unstable. I recently wrote an answer that showed, in Python, how to compute eig(cov(M)) using svd(M2, 'econ'), where M2 is the 0-mean version of M, used in the practical application of color-to-grayscale mapping, which might help you get more context.
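If you want to convince yourself of that relationship numerically, here is a small check with a random stand-in for your M (so it runs quickly):
M = randn(1000, 52);                          % stand-in for the real 49152-by-52 data
M2 = bsxfun(@minus, M, mean(M, 1));           % zero-mean version of M
[~, S, V] = svd(M2, 'econ');                  % economy SVD: S and V are 52 by 52
evalsFromSvd = diag(S).^2 / (size(M, 1) - 1); % eigenvalues of cov(M), obtained from the SVD of M2
evalsDirect  = sort(eig(cov(M)), 'descend');
max(abs(evalsFromSvd - evalsDirect))          % should be down at roundoff level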

Very small numerical issues with hessian symmetry and sparse command

I am using IPOPT in MATLAB to run an optimization and I am running into some issues where it says:
Hessian must be an n x n sparse, symmetric and lower triangular matrix
with row indices in increasing order, where n is the number of variables.
After looking at my Hessian Matrix, I found that the non-symmetric elements it is complaining about are very close, here is an example:
H(k,j) = 2.956404205984938
H(j,k) = 2.956404205984939
Obviously these elements are close enough, and there are some numerical round-off issues or something of the like. Also, when I call MATLAB's issymmetric function with H as an input, I get false. Is there a way to ignore these very small differences in symmetry?
A little more info:
I am using an optimized matlabFunction to actually calculate the entire hessian (H), then I did some postprocessing before passing it to IPOPT:
H = tril(H);
H = sparse(H);
The tril command generates a lower triangular matrix, so these numerical differences should not come into play. So the issue might be that the sparse command passes back increasing column indices and not increasing row indices. Is there a way to change this so that it passes back the sparse matrix with increasing row indices?
If H is very close to symmetric but not quite, and you need to force it to be exactly symmetric, a standard way to do this would be to say H = (H+H')./2.
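So a reasonable post-processing order (just a sketch combining the pieces above) would be: symmetrise first, then take the lower triangle, then convert to sparse:
H = (H + H') ./ 2;   % force exact symmetry before anything else
H = tril(H);         % IPOPT wants only the lower triangle
H = sparse(H);       % ...stored as a sparse matrix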

Normalize in Adaboost without numerical error - Matlab

I'm implementing AdaBoost on Matlab. This algorithm requires that in every iteration the weights of each data point in the training set sum up to one.
If I simply use the normalization v = v / sum(v), I get a vector whose 1-norm is 1 up to some numerical error, which later leads to the failure of the algorithm.
Is there a Matlab function for normalizing a vector so that its 1-norm is EXACTLY 1?
Assuming you want identical values to be normalised with the same factor, this is not possible. Simple counterexample:
v=ones(21,1);
v = v / sum(v);
sum(v)-1
One common way to deal with it is to enforce sum(v)>=1 or sum(v)<=1, if your algorithm can handle a deviation to one side:
if sum(v)>1
v=v-eps(v)
end
Alternatively you can try using vpa, but this will drastically increase your computation time.

Determinants of huge matrices in MATLAB

From a simulation problem, I want to calculate determinants of complex square matrices on the order of 1000x1000 in MATLAB. Since the values refer to those of Bessel functions, the matrices are not at all sparse.
Since I am interested in the change of the determinant with respect to some parameter (the energy of a searched eigenfunction in my case), I work around the problem at the moment by first searching for a rescaling factor for the studied range and then calculating the determinants,
result(k) = det(pre_factor*Matrix{k});
Now this is a very awkward solution and only works for matrix dimensions of, say, maximum 500x500.
Does anybody know a nice solution to the problem? Interfacing to Mathematica might work in principle but I have my doubts concerning feasibility.
Thank you in advance
Robert
Edit: I did not find a convenient solution to the calculation problem, since this would require changing to a higher precision. Instead, I used the fact that
ln det M = trace ln M
which, when I differentiate it with respect to k, gives
A = trace(inv(M(k))*dM/dk)
So I at least had the change of the logarithm of the determinant with respect to k. From the physical background of the problem I could derive constraints on A which in the end gave me a workaround valid for my problem. Unfortunately I do not know if such a workaround could be generalized.
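As a toy illustration of that formula (with a made-up M(k) just to have something concrete, and a finite difference standing in for dM/dk):
k    = 2;  dk = 1e-6;
Mfun = @(kk) kk*eye(3) + magic(3)/10;               % made-up example matrix M(k)
dMdk = (Mfun(k + dk) - Mfun(k)) / dk;               % finite-difference approximation of dM/dk
A    = trace( Mfun(k) \ dMdk )                      % d/dk of ln det M(k), without an explicit inverse
(log(det(Mfun(k + dk))) - log(det(Mfun(k)))) / dk   % cross-check by differencing ln det M directly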
You should realize that when you multiply a matrix by a constant k, then you scale the determinant of the matrix by k^n, where n is the dimension of the matrix. So for n = 1000, and k = 2, you scale the determinant by
>> 2^1000
ans =
1.07150860718627e+301
This is of course a huge number, so you might expect that it should fail, since in double precision, MATLAB will only represent floating point numbers as large as realmax.
>> realmax
ans =
1.79769313486232e+308
There is no need to do all the work of recomputing that determinant, not that computing the determinant of a huge matrix like that is a terribly well-posed problem anyway.
If speed is not a concern, you may want to use det(e^A) = e^(tr A) and take as A some scaling constant times your matrix (so that A - I has spectral radius less than one).
EDIT: In MATLAB, the logarithm of a matrix (logm) is calculated via triangularization (a Schur decomposition). So it is better for you to compute the eigenvalues of your matrix and multiply them (or better, add their logarithms). You did not specify whether your matrix is symmetric or not: if it is, finding the eigenvalues is easier than if it is not.
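As a sketch of that idea (with a random stand-in for the actual matrix), here is how you could get log|det(M)| without ever forming a number anywhere near realmax, either from the eigenvalues or, usually cheaper, from an LU factorisation:
M = randn(1000) + 1i*randn(1000);     % stand-in for a dense complex 1000x1000 matrix
logAbsDetEig = sum(log(abs(eig(M)))); % log|det M| = sum of log|lambda_i|
[L, U, P] = lu(M);                    % P*M = L*U with unit-diagonal L, so |det M| = |prod(diag(U))|
logAbsDetLU = sum(log(abs(diag(U))));
[logAbsDetEig, logAbsDetLU]           % the two values should agree to many digits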
You said the current value of the determinant is about 10^-300.
Are you trying to get the determinant at a certain value, say 1? If so, rescaling is awkward: the matrix you are considering is ill-conditioned, and, given the precision of the machine, you should consider the output determinant to be zero. In other words, it is impossible to get a reliable inverse.
I would suggest modifying the columns or rows of the matrix rather than rescaling it.
I used R to make a small test with a random matrix (random normal values); it seems the determinant should be clearly non-zero.
> n=100
> M=matrix(rnorm(n**2),n,n)
> det(M)
[1] -1.977380e+77
> kappa(M)
[1] 2318.188
This is not strictly a matlab solution, but you might want to consider using Mahout. It's specifically designed for large-scale linear algebra. (1000x1000 is no problem for the scales it's used to.)
You would call into java to pass data to/from Mahout.