Efficient way to solve for X in AX=B in MATLAB when both A and B are big matrices - matlab

I have this problem which requires solving for X in AX=B. A is of the order 15000 x 15000 and is sparse and symmetric. B is 15000 X 7500 and is NOT sparse. What is the fastest way to solve for X?
I can think of 2 ways.
Simplest possible way, X = A\B
Using for loop,
invA = A\speye(size(A))
for i = 1:size(B,2)
X(:,i) = invA*B(:,i);
end
Is there a better way than the above two? If not, which one is best between the two I mentioned?

First things first - never, ever compute inverse of A. That is never sparse except when A is a diagonal matrix. Try it for a simple tridiagonal matrix. That line on its own kills your code - memory-wise and performance-wise. And computing the inverse is numerically less accurate than other methods.
Generally, \ should work for you fine. MATLAB does recognize that your matrix is sparse and executes sparse factorization. If you give a matrix B as the right-hand side, the performance is much better than if you only solve one system of equations with a b vector. So you do that correctly. The only other technical thing you could try here is to explicitly call lu, chol, or ldl, depending on the matrix you have, and perform backward/forward substitution yourself. Maybe you save some time there.
The fact is that the methods to solve linear systems of equations, especially sparse systems, strongly depend on the problem. But in almost any (sparse) case I imagine, factorization of a 15k system should only take a fraction of a second. That is not a large system nowadays. If your code is slow, this probably means that your factor is not that sparse sparse anymore. You need to make sure that your matrix is properly reordered to minimize the fill (added non-zero entries) during sparse factorization. That is the crucial step. Have a look at this page for some tests and explanations on how to reorder your system. And have a brief look at example reorderings at this SO thread.

Since you can answer yourself which of the two is faster, I'll try yo suggest the next options.
Solve it using a GPU. Plenty of details can be found online, including this SO post, a matlab benchmarking of A/b, etc.
Additionally, there's the MATLAB add-on of LAMG (Lean Algebraic Multigrid). LAMG is a fast graph Laplacian solver. It can solve Ax=b in O(m) time and storage.

If your matrix A is symmetric positive definite, then here's what you can do to solve the system efficiently and stably:
First, compute the cholesky decomposition, A=L*L'. Since you have a sparse matrix, and you want to exploit it to accelerate the inversion, you should not apply chol directly, which would destroy the sparsity pattern. Instead, use one of the reordering method described here.
Then, solve the system by X = L'\(L\B)
Finally, if are not dealing with potential complex values, then you can replace all the L' by L.', which gives a bit further acceleration because it's just trying to transpose instead of computing the complex conjugate.
Another alternative would be the preconditioned conjugate gradient method, pcg in Matlab. This one is very popular in practice, because you can trade off speed for accuracy, i.e. give it less number of iterations, and it will give you a (usually pretty good) approximate solution. You also never need to store the matrix A explicitly, but just be able to compute matrix-vector product with A, if your matrix doesn't fit into memory.

If this takes forever to solve in your tests, you are probably going into virtual memory for the solve. A 15k square (full) matrix will require 1.8 gigabytes of RAM to store in memory.
>> 15000^2*8
ans =
1.8e+09
You will need some serious RAM to solve this, as well as the 64 bit version of MATLAB. NO factorization will help you unless you have enough RAM to solve the problem.
If your matrix is truly sparse, then are you using MATLAB's sparse form to store it? If not, then MATLAB does NOT know the matrix is sparse, and does not use a sparse factorization.
How sparse is A? Many people think that a matrix that is half full of zeros is "sparse". That would be a waste of time. On a matrix that size, you need something that is well over 99% zeros to truly gain from a sparse factorization of the matrix. This is because of fill-in. The resulting factorized matrix is almost always nearly full otherwise.
If you CANNOT get more RAM (RAM is cheeeeeeeeep you know, certainly once you consider the time you have wasted trying to solve this) then you will need to try an iterative solver. Since these tools do not factorize your matrix, if it is truly sparse, then they will not go into virtual memory. This is a HUGE savings.
Since iterative tools often require a preconditioner to work as well as possible, it can take some study to find the best preconditioner.

Related

MATLAB eig vs eigs vs svd vs svds

I'm running an MCMC scheme, in which I calculate a lot of eigenvalues. The matrices will have between around 10x10 to 200x200, so not massive, and deifnitely not at the size where I would need to consider using sparse matrices.
Each matrix I'm looking at has a 0 eigenvalue, and I just need to find the eigenvector corresponding to that 0 eigenvalue. Which function out of eig, eigs, svd, svds would be fastest for this?
eigs allows you to specify that you only want the smallest eigenvalue (or nth smallest eigenvalues), so instinctively I'd think this would be faster - though don't know anything about the underlying methods. I think similar can be done for svd/ svds.
I've also run into issues with the methods (except for eigs) telling me that my system is almost singular, whihc doesn't occur when I use eig.
Does anyone have any suggestions on what the best method to use would be?

Multiplication of large sparse Matrices without null values in scala

I have two very sparse distributed matrixes of dimension 1,000,000,000 x 1,000,000,000 and I want to compute the matrix multiplication efficiently.
I tried to create a BlockMatrix from a CoordinateMatrix but it's a lot of memory (where in reality the non zero data are around ~500'000'000) and the time of computation is enormous.
So there is another way to create a sparse matrix and compute a multiplication efficiently in a distributed way in Spark? Or i have to compute it manually?
You must obviously use a storage format for sparse matrices that makes use of their sparsity.
Now, without knowing anything about how you handle matrices and which libraries you use, there's no helping you but to ask you to look at the linear algebra libraries of your choice and look for sparse storage formats; the "good old" Fortran-based libraries that underly a lot of modern math libs support them, and so chances are that you really have to do but a little googling with yourlibraryname + "sparse matrix".
second thoughts:
Sparse matrixes really don't lend themselves to distribution very well; think about the operations you'd have to do to coordinate distribution compared to the actual multiplications/additions.
Also, ~5e8 non-zero elements in a 1e18 element matrix are definitely a lot of memory, and since you don't specify how much you consider a lot to be, it's very possible there's nothing wrong with it. Assuming you're using the default double precision, that's 5e8 * 8B = 4GB of pure numbers, not counting the coordinates needed for sparse storage. So, if you've got ~10GB of memory, I wouldn't be surprised at all.
As there is no build-in method in Spark to perform a matrix multiplication with sparse matrixes. I resolved by reduce at best the sparsity of the matrices before perform the matrice multiplication with BlockMatrix (that not support sparse matrix).
Last edit: Even with the sparsity optimization I had a lot of problems with large dataset. Finally, I decided to implement it myself. Now is running very fast. I hope that a matrix implementation with sparse matrix will be implemented in Spark as I think there are a lot of application that can make use of this.

How to efficiently solve linear system with Laplacian + diagonal matrix?

In my implementation of an image processing algorithm, I have to solve a large linear system of the form A*x=b, where:
Matrix A=L+D is the sum of a Laplacian matrix L and a diagonal matrix D
Laplacian matrix L is sparse, with about 25 non-zeros per row
The system is large, with as many unknowns as there are pixels in the input image (typically > 1 million).
The Laplacian matrix L does not change between successive runs of the algorithm; I can construct this matrix in preprocessing, and possibly compute its factorization. The diagonal matrix D and right-side vector b change at each run of the algorithm.
I am trying to find out what would be the fastest method to solve the system at runtime; I do not mind spending time on preprocessing (for computing a factorization of L, for example).
My initial idea was to pre-compute a Cholesky factorization of L, then update the factorization at runtime with values from D (rank-1 update with cholupdate), and solve quickly the problem with back-substitution. Unfortunately, the Cholesky factorization is not as sparse as the original L matrix, and just loading it from disk already takes 5.48s; as a comparison, it takes 8.30s to directly solve the system with backslash.
Given the shape of my matrices, is there any other method that you would recommend to speedup the solving at runtime, no matter how long it takes at preprocessing time?
Assuming that you are working on a grid (since you mention images - although this is not guaranteed), that you are more interested in speed than precision (since 5s seems already too slow for 1 million unknowns), I see several options.
First, forget about exact methods such as Cholesky (+reordering). Even if they allow to store the factorization and reuse it for multiple rhs, you'll likely need to store gigantic matrices that appear to be intractable in your case (I hope you're re-ordering rows/columns with reverse Cuthill McKee or anything else though - that sparsifies the factorization a lot).
Depending on your boundary conditions, I would first try a Matlab poisolv that solves a Poisson problem using an FFT, and possible reprojections if you want Dirichlet boundary conditions instead of periodic ones. It's very fast, but might not be appropriate for your problem (you mention having 25 nnz for a Laplacian matrix+identity : why ? is-it a high order Laplace matrix, in which case you may be more interested in precision than what I assume ? or is-it in fact a different problem than the one you describe ?).
Then, you can try multigrid solvers that are very fast for images and smooth problems. You can use a simple relaxation method for each iteration and each level of the multigrid, or use fancier methods (for instance, a preconditioned conjugate gradient par level).
Alternatively, you can do a simpler preconditioned conjugate gradient (or even SSOR) without multigrid, and if you're only interested in an approximate solution, you can stop the iterations before full convergence.
My arguments for iterative solvers are:
you can stop before convergence if you want an approximate problem
you can still re-use other results to initialize your solution (for instance, if your different runs correspond to different frames of a video, then using the solution of the previous frame as an initialization of the next would make some sense).
Of course, a direct solver for which you can precompute, store and keep the factorization also makes sense (although I don't understand your argument for a rank-1 update if your matrix is constant) since only the backsubstitution remains to be done at runtime. But given this ignores the structure of the problem (a regular grid, a possible interest in limited precision results etc.), I'd opt for methods which have been designed for these cases such as Fourier-like methods or multigrids. Both methods can be implemented on the GPU for faster results (recall that GPUs are rather tailored for dealing with images/textures!).
Finally, you can get interesting answers from scicomp.stackexchange which is more targeted to numerical analysis.

Matlab division of large matrices [duplicate]

I have this problem which requires solving for X in AX=B. A is of the order 15000 x 15000 and is sparse and symmetric. B is 15000 X 7500 and is NOT sparse. What is the fastest way to solve for X?
I can think of 2 ways.
Simplest possible way, X = A\B
Using for loop,
invA = A\speye(size(A))
for i = 1:size(B,2)
X(:,i) = invA*B(:,i);
end
Is there a better way than the above two? If not, which one is best between the two I mentioned?
First things first - never, ever compute inverse of A. That is never sparse except when A is a diagonal matrix. Try it for a simple tridiagonal matrix. That line on its own kills your code - memory-wise and performance-wise. And computing the inverse is numerically less accurate than other methods.
Generally, \ should work for you fine. MATLAB does recognize that your matrix is sparse and executes sparse factorization. If you give a matrix B as the right-hand side, the performance is much better than if you only solve one system of equations with a b vector. So you do that correctly. The only other technical thing you could try here is to explicitly call lu, chol, or ldl, depending on the matrix you have, and perform backward/forward substitution yourself. Maybe you save some time there.
The fact is that the methods to solve linear systems of equations, especially sparse systems, strongly depend on the problem. But in almost any (sparse) case I imagine, factorization of a 15k system should only take a fraction of a second. That is not a large system nowadays. If your code is slow, this probably means that your factor is not that sparse sparse anymore. You need to make sure that your matrix is properly reordered to minimize the fill (added non-zero entries) during sparse factorization. That is the crucial step. Have a look at this page for some tests and explanations on how to reorder your system. And have a brief look at example reorderings at this SO thread.
Since you can answer yourself which of the two is faster, I'll try yo suggest the next options.
Solve it using a GPU. Plenty of details can be found online, including this SO post, a matlab benchmarking of A/b, etc.
Additionally, there's the MATLAB add-on of LAMG (Lean Algebraic Multigrid). LAMG is a fast graph Laplacian solver. It can solve Ax=b in O(m) time and storage.
If your matrix A is symmetric positive definite, then here's what you can do to solve the system efficiently and stably:
First, compute the cholesky decomposition, A=L*L'. Since you have a sparse matrix, and you want to exploit it to accelerate the inversion, you should not apply chol directly, which would destroy the sparsity pattern. Instead, use one of the reordering method described here.
Then, solve the system by X = L'\(L\B)
Finally, if are not dealing with potential complex values, then you can replace all the L' by L.', which gives a bit further acceleration because it's just trying to transpose instead of computing the complex conjugate.
Another alternative would be the preconditioned conjugate gradient method, pcg in Matlab. This one is very popular in practice, because you can trade off speed for accuracy, i.e. give it less number of iterations, and it will give you a (usually pretty good) approximate solution. You also never need to store the matrix A explicitly, but just be able to compute matrix-vector product with A, if your matrix doesn't fit into memory.
If this takes forever to solve in your tests, you are probably going into virtual memory for the solve. A 15k square (full) matrix will require 1.8 gigabytes of RAM to store in memory.
>> 15000^2*8
ans =
1.8e+09
You will need some serious RAM to solve this, as well as the 64 bit version of MATLAB. NO factorization will help you unless you have enough RAM to solve the problem.
If your matrix is truly sparse, then are you using MATLAB's sparse form to store it? If not, then MATLAB does NOT know the matrix is sparse, and does not use a sparse factorization.
How sparse is A? Many people think that a matrix that is half full of zeros is "sparse". That would be a waste of time. On a matrix that size, you need something that is well over 99% zeros to truly gain from a sparse factorization of the matrix. This is because of fill-in. The resulting factorized matrix is almost always nearly full otherwise.
If you CANNOT get more RAM (RAM is cheeeeeeeeep you know, certainly once you consider the time you have wasted trying to solve this) then you will need to try an iterative solver. Since these tools do not factorize your matrix, if it is truly sparse, then they will not go into virtual memory. This is a HUGE savings.
Since iterative tools often require a preconditioner to work as well as possible, it can take some study to find the best preconditioner.

Tool to diagonalize large matrices

I want to compute a diffusion kernel, which involves taking exp(b*A) where A is a large matrix. In order to play with values of b, I'd like to diagonalize A (so that exp(A) runs quickly).
My matrix is about 25k x 25k, but is very sparse - only about 60k values are non-zero. Matlab's "eigs" function runs of out memory, as does octave's "eig" and R's "eigen." Is there a tool to find the decomposition of large, sparse matrices?
Dunno if this is relevant, but A is an adjacency matrix, so it's symmetric, and it is full rank.
Have you tried SVD, svds for sparse matrix in matlab.
EDIT: one more thing, don't do full rank SVD since the dimension is big, use a small rank, say 500, so that your solution fits in the memory. This cuts the small eigenvalues and their vectors out. Thus it does not hurt your accuracy much.
Have you considered the following property:
exp(A*t) = L^(-1) {(sI-A)^(-1)}
where L^(-1) the inverse Laplace transform? - provided that you can invert (sI-A)
If you have access to a 64 bit machine and octave compiled with 64 bit support, you might be able to get around this problem.
Also, I don't know what platform you are running all of this on, but in UNIX based systems you can use ulimit to increase the maximum allowed stack size for user processes.
For example, you can run
ulimit -u unlimited
and this will ensure that there are no memory limits etc on your processes. This is not a good idea in general, since you have have runaway processes that will completely bog down your machine. Try instead
ulimit -s [stacksize]
to increase the stack size limit.
Octave has splu which does lu decomposition for sparse matrices. I am not sure whether it can handle 25k x 25k but worth a shot.
Alternatively if your matrix is structured like so: A = [B zeros;zeros C] then you can diagonalize B and C separately and put them together into one matrix. I am guessing you can do something similar for eig.
In R you could check igraph package and function arpack which is interface to ARPACK library.