Solving multiple linear systems using vectorization - matlab

Sorry if this is obvious but I searched a while and did not find anything (or missed it).
I'm trying to solve linear systems of the form Ax=B with A a 4x4 matrix, and B a 4x1 vector.
I know that for a single system I can use mldivide to obtain x: x=A\B.
However I am trying to solve a great number of systems (possibly > 10000) and I am reluctant to use a for loop because I was told it is notably slower than matrix formulation in many MATLAB problems.
My question is then: is there a way to solve Ax=B using vectorization, with A a 4x4xN array and B a 4xN matrix?
PS: I do not know if it is important but the B vector is the same for all the systems.

You should use a for loop. There might be a benefit in precomputing a factorization and reusing it, if A stays the same and B changes. But for your problem where A changes and B stays the same, there's no alternative to solving N linear systems.
You shouldn't worry too much about the performance cost of loops either: the MATLAB JIT compiler means that loops can often be just as fast on recent versions of MATLAB.
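To illustrate the factor-once-and-reuse idea for the opposite case (A fixed, many right-hand sides), here is a minimal sketch. It assumes a 4x4 A and a 4xM matrix Bs of stacked right-hand sides (Bs is an assumed name, not from the question); the decomposition object needs R2017b or newer, and on older releases you can keep the [L,U,p] outputs of lu instead.
dA = decomposition(A);        % factor A once; mldivide on dA reuses the stored factors
X = zeros(4, size(Bs,2));
for j = 1:size(Bs,2)
    X(:,j) = dA \ Bs(:,j);    % cheap triangular solves, no refactorization
end
% (for this particular case X = A \ Bs would also work in one call,
%  since mldivide accepts a matrix of right-hand sides)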

I don't think you can optimize this further. As explained by @Tom, since A is the one changing, there is no benefit in factoring the various A's beforehand...
Besides, the looped solution is pretty fast given the dimensions you mention:
A = rand(4,4,10000);
B = rand(4,1);                 % same for all linear systems
tic
X = zeros(4, size(A,3));
for i = 1:size(A,3)
    X(:,i) = A(:,:,i) \ B;
end
toc
Elapsed time is 0.168101 seconds.

Here's the problem:
you're trying to perform a 2D operation (mldivide) on a 3D array. No matter how you look at it, you need to reference the matrix by index, which is where the time penalty kicks in... it's not the for loop which is the problem, but how people use them.
If you can structure your problem differently, then perhaps you can find a better option, but right now you have a few options:
1 - mex
2 - parallel processing (write a parfor loop; see the sketch after this list)
3 - CUDA
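As an illustration of option 2, a rough parfor sketch (assuming the Parallel Computing Toolbox is available and A, B are defined as in the question, i.e. 4x4xN and 4x1):
N = size(A,3);
X = zeros(4, N);
parfor i = 1:N
    X(:,i) = A(:,:,i) \ B;    % each slice is solved independently on a worker
end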

Here's a rather esoteric solution that takes advantage of MATLAB's peculiar optimizations. Construct an enormous 4k x 4k sparse matrix with your 4x4 blocks down the diagonal. Then solve all simultaneously.
On my machine this gets the same solution up to single precision accuracy as @Amro/Tom's for-loop solution, but faster.
n = size(A,1);
k = size(A,3);
AS = reshape(permute(A,[1 3 2]), n*k, n);
S = sparse( ...
    repmat(1:n*k, n, 1)', ...
    bsxfun(@plus, reshape(repmat(1:n:n*k, n, 1), [], 1), 0:n-1), ...
    AS, ...
    n*k, n*k);
X = reshape(S \ repmat(B, k, 1), n, k);
For a random example with k = 10000:
For loop: 0.122570 seconds.
Giant sparse system: 0.032287 seconds.
If you know that your 4x4 matrices are positive definite then you can use chol on S to improve the accuracy.
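A possible sketch of that chol variant, assuming every 4x4 block (and therefore S) really is symmetric positive definite:
R = chol(S);                                    % S = R'*R, with R sparse upper triangular
X = reshape(R \ (R' \ repmat(B,k,1)), n, k);    % forward then backward substitution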
This is silly. But so is how slow MATLAB's for loops still are in 2015, even with JIT. This solution seems to find a sweet spot when k is not too large, so everything still fits into memory.

I know this post is years old now, but I'll contribute my two cents anyway. You CAN put all of your A matrices into a bigger block-diagonal matrix, with the 4x4 blocks along its diagonal. The right-hand side will be all of your b vectors stacked on top of each other, over and over. Once you set this up, it is represented as a sparse system, and can be efficiently solved with the algorithms mldivide chooses. The blocks are numerically decoupled, so even if there are singular blocks in there, the answers for the nonsingular blocks should be right when you use mldivide. There is code on MATLAB Central that takes this approach:
http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver
I suggest experimenting to see if the approach is any faster than looping. I suspect it can be more efficient, especially for large numbers of small systems. In particular, if there are nice formulas for the coefficients of A across the N matrices, you can build the full left-hand side using MATLAB vector operations (without looping), which could give you additional cost savings. As others have noted, vectorized operations aren't always faster, but they often are in my experience.
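Here is a minimal sketch of that idea using num2cell/blkdiag (not the linked File Exchange code); A is assumed to be 4x4xN and B 4x1 as in the original question:
N = size(A,3);
blocks = cellfun(@sparse, squeeze(num2cell(A,[1 2])), 'UniformOutput', false);
S = blkdiag(blocks{:});                  % sparse 4N-by-4N block-diagonal matrix
X = reshape(S \ repmat(B,N,1), 4, N);    % stack the right-hand sides, one big sparse solve
For very large N, assembling the triplets directly (as in the sparse() construction in an earlier answer) will be faster than passing thousands of arguments to blkdiag, but the result is the same.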

Related

Matlab: Solve for a single variable in a linear system of equations

I have a linear system of about 2000 sparse equations in Matlab. For my final result, I only really need the value of one of the variables: the other values are irrelevant. While there is no real problem in simply solving the equations and extracting the correct variable, I was wondering whether there was a faster way or Matlab command. For example, as soon as the required variable is calculated, the program could in principle stop running.
Is there anyone who knows whether this is at all possible, or if it would just be easier to keep solving the entire system?
Most of the computation time is spent solving the full system; if we can find a way to avoid computing the complete solution vector, then we may be able to improve the computation time. Let's assume I'm only interested in the solution for the last variable x(N). Using the standard method we compute
x = A\b;
res = x(N);
Assuming A is full rank, we can instead use an LU decomposition of the augmented matrix [A b] to get x(N), which looks like this
[~,U] = lu([A b]);
res = U(end,end)/U(end,end-1);
This is essentially performing Gaussian elimination and then solving for x(N) using back-substitution.
We can extend this to find any value of x by swapping the columns of A before LU decomposition,
x_index = 123; % the index of the solution we are interested in
A(:,[x_index,end]) = A(:,[end,x_index]);
[~,U] = lu([A b]);
res = U(end,end)/U(end,end-1);
Benchmarking performance in MATLAB R2017a with 10,000 random 200-dimensional systems, we get a slight speed-up:
Total time direct method : 4.5401s
Total time LU method : 3.9149s
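For reference, a rough sketch of the kind of benchmark this refers to (an assumed setup, not the original script; timings will vary by machine):
nSys = 10000; n = 200;
tDirect = 0; tLU = 0;
for s = 1:nSys
    A = rand(n); b = rand(n,1);
    t = tic; x = A\b; res1 = x(end);                             tDirect = tDirect + toc(t);
    t = tic; [~,U] = lu([A b]); res2 = U(end,end)/U(end,end-1);  tLU = tLU + toc(t);
end
fprintf('Total time direct method : %.4fs\n', tDirect);
fprintf('Total time LU method     : %.4fs\n', tLU);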
Note that you may experience some precision issues if A isn't well conditioned.
Also, this approach doesn't take advantage of the sparsity of A. In my experiments, even with 2000x2000 sparse matrices everything slowed down significantly, and the LU method was significantly slower. That said, the full matrix representation only requires about 30 MB, which shouldn't be a problem on most computers.
If you have access to theory manuals on NASTRAN, I believe (from memory) there is coverage of partial solutions of linear systems. Also try looking for iterative or tridiagonal solvers for A*x = b. On this page, review the pqr solution answer by Shantachhani. Another reference.

repmat vs simple matrix multiplication in MATLAB

Let v be a row vector of length n. The goal is to create a matrix A with m rows that are all equal to v.
MATLAB has a function for this that is called repmat. Possible code would be
A = repmat(v,[m 1])
There is an alternative equally concise way using simple matrix operations
A = ones(m,1)*v
Is any of the two methods preferable for large m and n?
Let's compare them!
When testing algorithms, two metrics are important: time and memory.
Let's start with time:
Clearly repmat wins!
Memory:
profile -memory on
v = rand(1, 1000);                 % example row vector; v was not defined in the original snippet
ii = 0;
for m = 1000:1000:50000
    f1 = @() repmat(v, [m 1]);
    f2 = @() ones(m,1) * v;
    ii = ii + 1;
    t1(ii) = timeit(f1);
    t2(ii) = timeit(f2);
end
profreport
It seems that both take the same amount of memory. However, the profiler is known for not showing all the memory, so we cannot fully trust it.
Still, it is clear that repmat is better.
You should use repmat().
Matrix multiplication is in general an O(n^3) operation, which is much slower than simply replicating data in memory.
On top of that, the second option allocates extra memory (the ones(m,1) vector) in addition to the output.
In the case above the product is only an outer product with a vector, which is faster than a full matrix multiplication, yet still not as fast as a plain memory operation.
MATLAB doesn't use the knowledge that all the vector's elements are 1, hence it multiplies each element of v by 1, m times.
Both operations will be mainly memory bound, yet the more efficient, fast and direct method is repmat().
The question is, what do you do afterwards?
Because you may not need repmat() at all.
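As an illustration of that last point: with bsxfun, or implicit expansion in R2016b and newer, the replicated matrix often never needs to be built at all.
v = rand(1, 5);
M = rand(4, 5);
C1 = M + repmat(v, size(M,1), 1);    % explicit replication
C2 = bsxfun(@plus, M, v);            % bsxfun, no m-by-n copy of v is materialized
C3 = M + v;                          % implicit expansion (R2016b+), same result
isequal(C1, C2, C3)                  % returns logical 1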

What is benefit to use SVD for solving Ax=b

I have a linear equation such as
Ax=b
where A is a full-rank matrix of size 512x512, b is a vector of size 512x1, and x is the unknown vector. I want to find x, so I have some options for doing that:
1. Using the normal way
inv(A)*b
2. Using SVD (singular value decomposition)
[U S V]=svd(A);
x = V*(diag(diag(S).^-1)*(U.'*b))
Both methods give the same result. So, what is the benefit of using SVD to solve Ax=b, especially in the case where A is a 2D matrix?
Welcome to the world of numerical methods, let me be your guide.
You, as a new person in this world, wonder: "Why would I do something this difficult with this SVD stuff instead of the so commonly known inverse?! I'm going to try it in MATLAB!"
And no answer was found. That is because you are not looking at the problem itself! The problems arise when you have an ill-conditioned matrix. Then computing the inverse is not possible numerically.
example:
A = [1  1 -1;
     1 -2  3;
     2 -1  2];
Try to invert this matrix using inv(A). You'll get Inf values (and a warning that the matrix is singular to working precision).
That is because the condition number of the matrix is very high (check cond(A)).
However, if you try to solve it using the SVD method (with b=[1;-2;3]) you will get a result. This is still a hot research topic: solving Ax=b systems with bad condition numbers.
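For instance, one common SVD-based (pseudoinverse-style) solve looks like the sketch below; the rank tolerance tol is an assumed cutoff, chosen like pinv's default:
b = [1; -2; 3];
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(norm(A));            % drop singular values below this
r = sum(s > tol);                             % numerical rank
x = V(:,1:r) * ((U(:,1:r)' * b) ./ s(1:r));   % minimum-norm least-squares solution
% equivalently: x = pinv(A) * b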
As @Stewie Griffin suggested, the best way to go is mldivide, as it does a couple of things behind the scenes.
(Yeah, my example is not very good because the only solution it gives for x is Inf, but there is a way better example in this YouTube video.)
inv(A)*b has several negative sides. The main one is that it explicitly calculates the inverse of A, which is both time demanding, and may result in inaccuracies if values vary by many orders of magnitude.
Although it might be better than inv(A)*b, using svd is not the "correct" approach here. The MATLAB way to do this is with mldivide, \. Using this, MATLAB chooses the best algorithm to solve the linear system based on its properties (Hermitian, upper Hessenberg, real and positive diagonal, symmetric, diagonal, sparse, etc.). Often the solution will be an LU factorization with partial pivoting, but it varies. You'll have a hard time beating MATLAB's implementation of mldivide, but using svd might give you some more insight into the properties of the system if you actually investigate U, S, V. If you don't want to do that, go with mldivide.
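As a quick, purely illustrative comparison of the three approaches on a random well-conditioned system of the size from the question (numbers will differ from run to run):
A = rand(512); b = rand(512,1);
x1 = A \ b;                                   % mldivide: picks a suitable solver
x2 = inv(A) * b;                              % explicit inverse: slower, typically less accurate
[U, S, V] = svd(A);
x3 = V * (diag(1./diag(S)) * (U' * b));       % SVD-based solve
[norm(A*x1 - b), norm(A*x2 - b), norm(A*x3 - b)]   % compare residuals
On a well-conditioned A all three residuals are tiny; the differences only start to matter when A is ill-conditioned, as discussed above.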

how to convert a matrix to a diagonally dominant matrix using pivoting in Matlab

Hi I am trying to solve a linear system of the following type:
A*x=b,
where A is the coefficient matrix,
x is the vectors of unknowns and
b is the vector of solution.
The coefficient matrix (A) is an n-by-n sparse matrix, with zeros even on the diagonal. In order to solve this system in an accurate way I am using an iterative method in MATLAB called bicgstab (biconjugate gradients stabilized method).
This coefficient matrix (A) has a
det(A)=-4.1548e-05 and a rcond(A)= 1.1331e-04.
Therefore the matrix is ill-conditioned. I first tried to perform a scaling and the results were:
det(A) = -1.2612e+135 but rcond(A) = 5.0808e-07...
Therefore the matrix is still ill-conditioned... I checked, and the sum of the absolute values of the non-diagonal elements was 163.60, while the sum of the absolute values of the diagonal elements was 32.49... Therefore the coefficient matrix is not diagonally dominant and will not converge using my function bicgstab...
I am looking for someone who can help me perform pivoting on the coefficient matrix (A) so that it becomes diagonally dominant, or any other advice to solve this problem...
Thanks for the help.
First there should be quite a few things noted here:
Don't use the determinant to estimate the "amount of singularity" of your matrix. The determinant is the product of all the eigenvalues of your matrix, and therefore its scaling can be wildly misleading compared to a much better measure like the condition number, which leads to the next point:
Your conditioning (according to rcond) isn't that bad; are you working with single or double precision? Large problems can routinely have condition numbers in this range and still be quite solvable, but of course this depends on a very complicated interaction of many factors, of which the condition number plays only a small part. This leads to another complicated point:
Diagonal dominance may not help you at all here. BiCGStab, as far as I know, does not require diagonal dominance for its convergence, and I don't think diagonal dominance is even known to help it. Diagonal dominance is usually an assumption made by other iterative methods such as the Jacobi method or Gauss-Seidel. Actually, the convergence behaviour of BiCGStab is not very well understood at all, and it is usually only used when memory is a severe constraint but conjugate gradients is not applicable.
If you are really interested in using a Krylov method (such as BiCGStab) to solve your problem, then you generally need to have more understanding of where your matrix is coming from so that you can choose a sensible preconditioner.
So this calls for a bit more information. Do you know more about this matrix? Is it arising from some kind of physical problem? Do you know, for example, if it is symmetric or positive definite (I will assume not both, because you are not using CG)?
Let me end with some actionable advice which is very generic, and so not necessarily optimal:
If memory is not an issue, consider using restarted GMRES instead of BiCGStab. My experience is that GMRES has much more robust convergence.
Try an approximate factorization preconditioner such as ILU. MATLAB has a function for this built in.
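A minimal sketch of those two suggestions combined; the option names and tolerances below are illustrative, and A, b are assumed to be your sparse system:
setup.type = 'ilutp';                 % ILU with threshold dropping and pivoting
setup.droptol = 1e-4;                 % drop tolerance, problem dependent
[L, U] = ilu(A, setup);
restart = 30; tol = 1e-8; maxit = 200;
x = gmres(A, b, restart, tol, maxit, L, U);   % restarted GMRES with the ILU preconditioner
% or: x = bicgstab(A, b, tol, maxit, L, U);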

Why eigs('lm') is much faster than eigs('sm')

I use eigs to calculate the eigenvectors of sparse square matrices which are large (tens of thousands).
What I want are the eigenvectors corresponding to the smallest eigenvalues.
But
eigs(A, 10, 'sm') % Note: A is the matrix
runs very slow.
However, using eigs(A, 10, 'lm') gives me the answer relatively faster.
And as I tried, replacing 10 with A_width in eigs(A, 10, 'lm') so that it includes all the eigenvectors doesn't solve this problem, because that makes it as slow as using 'sm'.
So, I want to know why calculating the smallest eigenvectors (using 'sm') is much slower than calculating the largest?
BTW, if you have any idea about how to use eigs with 'sm' as fast as with 'lm', please tell me.
The algorithm used in pretty much any standard eigs function is (some variation of) the Lanczos algorithm. It is iterative and the first iterations give you the largest eigenvalues. This explains pretty much every observation you make:
Largest eigenvalues take the fewest iterations,
Smallest eigenvalues take the most iterations,
All eigenvalues also take the maximum number of iterations.
There are tricks to "fool" eigs into calculating the smallest eigenvalues by actually making them the largest eigenvalues of another problem. This is usually accomplished by a shift parameter. Skimming over the Matlab documentation for eigs, I see that they have a sigma parameter, which might help you. Note the same documentation recommends proper eig if the matrix fits into memory, as eigs has its numerical quirks.
Since eigs is actually an m-file function, we can profile it. I have run a couple of basic tests, and it depends very much on the nature of the data in the matrix. If we run the profiler separately on the following two lines of code:
eigs(eye(1000), 10, 'lm'), and
eigs(eye(1000), 10, 'sm'),
then in the first instance it calls arpackc (the main function that does the work - according to the comments in eigs it's probably from here) a total of 22 times. In the second instance it is called 103 times.
On the other hand, trying it with
eigs(rand(1000), 10, 'lm'), and
eigs(rand(1000), 10, 'sm'),
I get results where the 'lm' option consistently calls arpackc many more times than the 'sm' option.
I'm afraid I don't know the details of the algorithm, and so can't explain it in any deeper mathematical sense, but the page that I linked suggests ARPACK is best for matrices with some structure. Since matrices generated by rand have little structure, it is probably safe to assume the latter behaviour I described is not what you'd expect under normal operating conditions.
In short: it simply takes the algorithm more iterations to converge when you ask it for the smallest eigenvalues of a structured matrix. This being an iterative process, however, it very much depends on the actual data you give it.
Edit: There is a wealth of information and references about this method here, and the key to understanding exactly why this happens is surely contained somewhere therein.
The reason is actually much simpler, and comes down to the basics of solving large sparse eigenvalue problems. These are all based on solving:
(1) A x = lam x
Most solution methods use some form of power iteration (e.g. a Krylov subspace, as spanned in both the Lanczos and Arnoldi methods).
The thing is that such a power sequence converges to the largest eigenvalue of (1). Therefore the largest eigenvalues are found from the subspace spanned by K^k = {A*r0, ..., A^k*r0}, which requires only matrix-vector multiplications (cheap).
To find the smallest, we have to reformulate (1) as follows:
(2) A^(-1) x = (1/lam) x, i.e. A^(-1) x = invlam x with invlam = 1/lam
Now solving for the largest eigenvalue of (2) is equivalent to finding the smallest eigenvalue of (1). In this case the subspace is spanned by K^k = {A^(-1)*r0, ..., A^(-k)*r0}, which requires solving several linear systems (expensive!).
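A tiny illustration of reformulation (2) on an assumed test matrix; note that in practice eigs never forms inv(A) explicitly, it factorizes A and solves linear systems with it, which is exactly the expensive part:
A = gallery('poisson', 20);             % sparse 400x400 SPD test matrix (assumed example)
d_small = eigs(A, 5, 'sm');             % smallest eigenvalues of A
d_inv   = eigs(inv(full(A)), 5, 'lm');  % largest eigenvalues of A^(-1) (explicit inverse, only for illustration)
[sort(d_small), sort(1./d_inv)]         % the two columns agree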