I want to multiply a lower triangular matrix by an upper triangular matrix. The usual matrix multiplication is wasteful because it spends so much time multiplying zeros. Is there a MATLAB-specific way to save computation time? The matrices have sizes on the order of thousands.
You may get some gains by using SPARSE arrays, since they use less memory and don't do multiplication by zero, but they come with a bit of computational overhead.
Otherwise, I sincerely doubt that you can beat Matlab for efficiency in linear algebra manipulation by writing your own Matlab code.
the usual matrix multiplication is a waste because it spends so much time in multiplying zeros
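So half the coefficients of each matrix are zero, which means that a naive matrix multiplication scheme would "waste" roughly 2/3 of its time (for a lower times an upper triangular matrix, only the terms with k <= min(i,j) are nonzero, about n^3/3 of the n^3 products). And you want to try to recover that time by doing something more complicated?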
I'd bet moderate amounts of money that you can't beat MATLAB. Its matrix routines are at the core of its computation engine. Most likely they check for zero coefficients and eliminate that "wasted" time on their own.
I'd echo @Jonas's comments, but would add that the only time you should be using sparse matrices is if the vast majority of coefficients are zero. As in >90%, rather than 50%.
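For what it's worth, the quickest way to settle this is to measure it. The following is only a sketch: the size n and the random test matrices are assumptions, and timings will differ between machines and MATLAB versions. It also counts how many of the n^3 scalar products are structurally nonzero for a lower-times-upper product.

```matlab
% Sketch: compare dense and sparse products of triangular matrices.
n = 500;                          % assumed test size, not from the question
L = tril(rand(n));                % lower triangular test matrix
U = triu(rand(n));                % upper triangular test matrix

tic; Cdense = L*U; tDense = toc;  % ordinary dense product

Ls = sparse(L); Us = sparse(U);   % same data in sparse storage
tic; Csparse = Ls*Us; tSparse = toc;

% L(i,k)*U(k,j) is structurally nonzero only for k <= min(i,j)
[I, J] = ndgrid(1:n);
usefulFraction = sum(min(I(:), J(:))) / n^3;

fprintf('dense %.4fs, sparse %.4fs, useful fraction %.3f\n', ...
        tDense, tSparse, usefulFraction);

% Both products should agree up to rounding
relErr = norm(Cdense - full(Csparse), 'fro') / norm(Cdense, 'fro');
```

If the sparse version is not clearly faster on your machine, the plain L*U is the one to keep.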
I understand that if a number gets closer to zero than realmin, then Matlab converts the double to a denormal. I am noticing that this causes a significant performance cost. In particular, I am using a gradient descent algorithm in which, near convergence, the gradients (in backprop for my bespoke neural network) drop below realmin, so that the algorithm incurs a heavy performance cost (due to, I am assuming, type conversion behind the scenes). I have used the following code to validate my gradient matrices so that no number falls below realmin:
function mat= validateSmallDoubles(obj, mat, threshold)
mat= mat.*(abs(mat)>threshold);
end
Is this usual practice, and what value should threshold take (obviously you want this as close to realmin as possible, but not too close, otherwise any additional division operations will send some elements of mat below realmin after validation)? Also, specifically for neural networks, where are the best places to do gradient validation without ruining the network's ability to learn? I would be grateful to know what solutions people with experience in training neural networks have. I am sure this is a problem for all languages. Tentative threshold values have ruined my network's learning.
I do not know if it is somehow related to your problem, but I had a similar problem with underflows while doing exponentially weighted average of gradients (say while implementing Momentum or Adam).
In particular, at some point you do something like:
v := 0.9*v + 0.1*g, where v is the exponentially weighted average of your gradient g. If the same element of g stays at 0 over many successive iterations, the corresponding element of v quickly becomes very small and you hit denormals.
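To get a feeling for how fast that happens, here is a minimal sketch. The starting value v = 1 and the decay factor 0.9 are assumptions taken from the formula above, not from any particular optimiser. It counts how many zero-gradient updates it takes for one element of v to fall below realmin.

```matlab
% Sketch: one element of the moving average while its gradient stays at zero.
v = 1;          % assumed starting value of the averaged gradient
g = 0;          % the gradient element stays exactly zero
n = 0;          % number of updates performed

while v >= realmin
    v = 0.9*v + 0.1*g;   % exponentially weighted average update
    n = n + 1;
end

fprintf('v dropped below realmin after %d updates (v = %g)\n', n, v);
```

A few thousand updates is not much for a long training run, so a unit that stays silent can reach the denormal range well before training ends.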
So the question is: why all those zeros? In my case the culprits were the ReLU units, which output a lot of zeros (if x<0, relu(x) is zero). When a ReLU outputs zero for a given neuron, the related weight has no effect, so the corresponding partial derivative in g is zero. So it happened that over many successive iterations that particular neuron did not fire.
To avoid having zero activations (and derivatives), I used "leaky ReLU", which has a very small derivative instead.
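As a rough illustration of what I mean by leaky ReLU (the slope 0.01 and the function handle names are assumptions for this sketch, not something from my actual code):

```matlab
% Sketch: leaky ReLU and its derivative as anonymous functions.
alpha = 0.01;                                   % assumed small negative-side slope
leakyRelu     = @(x, a) max(x, a*x);            % equals x for x >= 0, a*x for x < 0 (0 < a < 1)
leakyReluGrad = @(x, a) (x > 0) + a*(x <= 0);   % derivative is never exactly zero

x  = [-2 0 3];
y  = leakyRelu(x, alpha);
dy = leakyReluGrad(x, alpha);
```

Because the derivative is alpha rather than 0 on the negative side, the corresponding entries of g do not sit at exactly zero for long stretches.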
Another solution is to threshold your gradients to a minimum magnitude before applying the weighted average, in the spirit of gradient clipping. This is quite similar to what you did.
I traced the diminishing gradient occurrences to the Adam SGD optimiser: the biased moving average matrix calculations in the Adam optimiser were causing MATLAB to produce denormals. I simply thresholded the matrix elements for each layer to zero after these calculations, with threshold=10*realmin, without any effect on learning. I have yet to investigate why my moving averages were getting so close to zero, as my architecture and weight initialisation priors would normally mitigate this.
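For reference, the thresholding step looks roughly like this. It is a sketch with made-up values in v; only the threshold of 10*realmin is what I actually used.

```matlab
% Sketch: flush tiny moving-average entries to exactly zero.
thr = 10*realmin;                    % threshold just above the denormal range
v   = [1e-3, 5e-310, -2e-320, 0, -4.2];   % hypothetical moving-average values

v(abs(v) < thr) = 0;                 % logical indexing: only tiny entries change
```

Logical indexing leaves every other entry bit-for-bit unchanged, which is probably why it had no visible effect on learning.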
I have a very large, sparse matrix (180GB as text, 30k * 3M) containing only the entries and no additional data. I have to do matrix multiplication, inversion and some similar linear algebra operations on it. I tried Octave and simple single-threaded C code for the multiplication, but my system RAM of 40GB gets used up very fast and then the program starts thrashing. Are there any other options available to me? I am not familiar with MATLAB or any other matrix library that could help me do this.
When I run a simple multiplication of a matrix with 10 rows and 3M columns by its transpose, I get the following error:
memory exhausted or requested size too large for range of Octave's index type
I am not sure whether the same would work in MATLAB or not. Is there another library or code for sparse matrix representation and matrix multiplication?
If there are few enough nonzero entries, I suggest creating a sparse matrix S with appropriate dimensions and max nonzero entries; see matlab create sparse matrix. Then, as @oleg komarov described, load the matrix in blocks and assign the nonzero entries from each block into the correct address in the sparse matrix S. I feel that if your matrix is sparse enough, then loading it is really the only difficulty you face. I had similar issues with large transfer operators.
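A minimal sketch of that loading scheme, assuming each block has already been reduced to (row, column, value) triplets. The cell array blocks and the sizes m and n are hypothetical stand-ins for whatever your file reader produces.

```matlab
% Sketch: assemble a sparse matrix from nonzero triplets collected block by block.
m = 6; n = 8;                                  % assumed matrix dimensions
blocks = { [1 2 3.0; 4 7 1.5], ...             % hypothetical chunk 1: [row col value]
           [2 2 -1.0; 6 8 2.0] };              % hypothetical chunk 2

I = []; J = []; V = [];
for k = 1:numel(blocks)
    t = blocks{k};
    I = [I; t(:,1)];                           % row indices of nonzeros
    J = [J; t(:,2)];                           % column indices of nonzeros
    V = [V; t(:,3)];                           % nonzero values
end

S = sparse(I, J, V, m, n);                     % build the sparse matrix in one call
```

Collecting the triplets first and calling sparse once is usually much cheaper than assigning into S element by element.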
Have you considered performing your processing in blocks? Transposition and multiplications work very well with block matrix processing (see https://en.wikipedia.org/wiki/Block_matrix) and that will get you around any limitations about the indices.
This wouldn't help you with matrix inversion, though, unless you can decompose your matrix into blocks such that the blocks that aren't on the diagonal are completely empty, which isn't stated in your assumptions.
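To make the block idea concrete for the product of a wide matrix with its transpose, here is a small sketch. The sizes and the block width blk are assumptions, and in practice each column block would be read from disk rather than sliced from an in-memory A.

```matlab
% Sketch: compute C = A*A' by accumulating over column blocks of A.
A   = rand(10, 1000);          % stand-in for the 10 x 3M matrix
blk = 128;                     % assumed block width (columns per chunk)
C   = zeros(size(A,1));        % result is only 10 x 10

for s = 1:blk:size(A,2)
    e  = min(s + blk - 1, size(A,2));   % last column of this block
    Ab = A(:, s:e);                     % in practice: load this block from disk
    C  = C + Ab*Ab';                    % each block contributes a 10 x 10 update
end

relErr = norm(C - A*A', 'fro') / norm(A*A', 'fro');
```

Only one block and the small result matrix need to be in memory at any time.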
Octave (in 32-bit builds) is limited both in memory, to about 2GB, and in the maximum number of elements a matrix can hold, about 2^31 with its 32-bit index type. MATLAB doesn't have such a memory limit, since it will use all of your memory resources, swap file included. Thus you could try MATLAB with a huge swap file; you may then be able to compute your operations (but it will take quite a long time anyway...).
If you are interested in other approaches, you may take a look at out-of-core computing, which aims to process huge datasets that cannot all reside in memory by storing them on disk and efficiently loading only the parts that are needed.
For a practical approach, you may take a look into Blaze for Python (notice: still in development!).
In my implementation of an image processing algorithm, I have to solve a large linear system of the form A*x=b, where:
Matrix A=L+D is the sum of a Laplacian matrix L and a diagonal matrix D
Laplacian matrix L is sparse, with about 25 non-zeros per row
The system is large, with as many unknowns as there are pixels in the input image (typically > 1 million).
The Laplacian matrix L does not change between successive runs of the algorithm; I can construct this matrix in preprocessing, and possibly compute its factorization. The diagonal matrix D and right-side vector b change at each run of the algorithm.
I am trying to find out what would be the fastest method to solve the system at runtime; I do not mind spending time on preprocessing (for computing a factorization of L, for example).
My initial idea was to pre-compute a Cholesky factorization of L, then update the factorization at runtime with values from D (rank-1 update with cholupdate), and solve quickly the problem with back-substitution. Unfortunately, the Cholesky factorization is not as sparse as the original L matrix, and just loading it from disk already takes 5.48s; as a comparison, it takes 8.30s to directly solve the system with backslash.
Given the shape of my matrices, is there any other method that you would recommend to speedup the solving at runtime, no matter how long it takes at preprocessing time?
Assuming that you are working on a grid (since you mention images - although this is not guaranteed), that you are more interested in speed than precision (since 5s seems already too slow for 1 million unknowns), I see several options.
First, forget about exact methods such as Cholesky (+reordering). Even if they let you store the factorization and reuse it for multiple rhs, you'll likely need to store gigantic matrices that appear to be intractable in your case (I hope you're re-ordering rows/columns with reverse Cuthill McKee or anything else though - that sparsifies the factorization a lot).
Depending on your boundary conditions, I would first try a Matlab poisolv that solves a Poisson problem using an FFT, with possible reprojections if you want Dirichlet boundary conditions instead of periodic ones. It's very fast, but might not be appropriate for your problem (you mention having 25 nnz for a Laplacian matrix+identity: why? Is it a high order Laplace matrix, in which case you may be more interested in precision than I assume? Or is it in fact a different problem than the one you describe?).
Then, you can try multigrid solvers, which are very fast for images and smooth problems. You can use a simple relaxation method for each iteration and each level of the multigrid, or use fancier methods (for instance, a preconditioned conjugate gradient per level).
Alternatively, you can do a simpler preconditioned conjugate gradient (or even SSOR) without multigrid, and if you're only interested in an approximate solution, you can stop the iterations before full convergence.
My arguments for iterative solvers are:
you can stop before convergence if you want an approximate solution
you can still re-use other results to initialize your solution (for instance, if your different runs correspond to different frames of a video, then using the solution of the previous frame as an initialization of the next would make some sense).
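Both points can be sketched with pcg. Everything about the test problem here is an assumption: a 5-point grid Laplacian stands in for your L, the diagonal entries are random positive numbers, and the tolerance and iteration cap are arbitrary. The incomplete Cholesky factor is computed once and reused as a preconditioner when D changes.

```matlab
% Sketch: preconditioned CG for (Lap + D) x = b with a warm start between runs.
n = 50; e = ones(n,1);
T   = spdiags([-e 2*e -e], -1:1, n, n);
Lap = kron(speye(n), T) + kron(T, speye(n));     % stand-in 2-D grid Laplacian
N   = n^2;

d = 0.1 + rand(N,1);                             % first run: positive diagonal D
A = Lap + spdiags(d, 0, N, N);
b = rand(N,1);

Lc = ichol(A);                                   % zero-fill incomplete Cholesky, computed once
[x, flag, relres, iter] = pcg(A, b, 1e-6, 200, Lc, Lc');

% Second run: D changes slightly; reuse the preconditioner and start from x
d2 = d + 0.01*rand(N,1);
A2 = Lap + spdiags(d2, 0, N, N);
[x2, flag2, relres2, iter2] = pcg(A2, b, 1e-6, 200, Lc, Lc', x);

fprintf('cold start: %d iterations, warm start: %d iterations\n', iter, iter2);
```

Loosening the tolerance (the third argument) is how you trade accuracy for speed.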
Of course, a direct solver for which you can precompute, store and keep the factorization also makes sense (although I don't understand your argument for a rank-1 update if your matrix is constant) since only the backsubstitution remains to be done at runtime. But given this ignores the structure of the problem (a regular grid, a possible interest in limited precision results etc.), I'd opt for methods which have been designed for these cases such as Fourier-like methods or multigrids. Both methods can be implemented on the GPU for faster results (recall that GPUs are rather tailored for dealing with images/textures!).
Finally, you can get interesting answers from scicomp.stackexchange which is more targeted to numerical analysis.
I have this problem which requires solving for X in AX=B. A is of the order 15000 x 15000 and is sparse and symmetric. B is 15000 X 7500 and is NOT sparse. What is the fastest way to solve for X?
I can think of 2 ways.
Simplest possible way, X = A\B
Using for loop,
invA = A\speye(size(A))
for i = 1:size(B,2)
X(:,i) = invA*B(:,i);
end
Is there a better way than the above two? If not, which one is best between the two I mentioned?
First things first - never, ever compute the inverse of A. The inverse of a sparse matrix is generally dense (diagonal and block-diagonal matrices being the obvious exceptions). Try it for a simple tridiagonal matrix. That line on its own kills your code - memory-wise and performance-wise. And computing the inverse is numerically less accurate than other methods.
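The tridiagonal experiment takes a few lines. This is a sketch with an assumed size n = 200 and the standard tridiag(-1, 2, -1) matrix as the test case:

```matlab
% Sketch: the inverse of a sparse tridiagonal matrix is completely full.
n = 200; e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % tridiagonal, 3n-2 nonzeros

invA = A \ speye(n);                    % explicit inverse, as in the question

fprintf('nnz(A) = %d, nnz(inv(A)) = %d of %d entries\n', ...
        nnz(A), nnz(invA), n^2);
```

For a 15000 x 15000 matrix the same effect turns a few hundred thousand stored values into 225 million.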
Generally, \ should work for you fine. MATLAB does recognize that your matrix is sparse and executes sparse factorization. If you give a matrix B as the right-hand side, the performance is much better than if you only solve one system of equations with a b vector. So you do that correctly. The only other technical thing you could try here is to explicitly call lu, chol, or ldl, depending on the matrix you have, and perform backward/forward substitution yourself. Maybe you save some time there.
The fact is that the methods to solve linear systems of equations, especially sparse systems, strongly depend on the problem. But in almost any (sparse) case I can imagine, factorization of a 15k system should only take a fraction of a second. That is not a large system nowadays. If your code is slow, this probably means that your factor is not that sparse anymore. You need to make sure that your matrix is properly reordered to minimize the fill (added non-zero entries) during sparse factorization. That is the crucial step. Have a look at this page for some tests and explanations on how to reorder your system. And have a brief look at example reorderings at this SO thread.
Since you can answer for yourself which of the two is faster, I'll try to suggest the next options.
Solve it using a GPU. Plenty of details can be found online, including this SO post, a MATLAB benchmark of A\b, etc.
Additionally, there's the MATLAB add-on of LAMG (Lean Algebraic Multigrid). LAMG is a fast graph Laplacian solver. It can solve Ax=b in O(m) time and storage.
If your matrix A is symmetric positive definite, then here's what you can do to solve the system efficiently and stably:
First, compute the Cholesky decomposition, A=L*L'. Since you have a sparse matrix, and you want to exploit that to accelerate the solve, you should not apply chol directly, which would destroy the sparsity pattern. Instead, use one of the reordering methods described here.
Then, solve the system by X = L'\(L\B)
Finally, if you are not dealing with potentially complex values, then you can replace all the L' by L.', which gives a bit of further acceleration because it only transposes instead of also computing the complex conjugate.
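Put together, the steps above might look like the following sketch. The test matrix (a grid Laplacian plus identity) and the 5 right-hand sides are assumptions, and amd is just one of the available reorderings. Note that MATLAB's chol returns an upper triangular R with A = R'*R, so R.' plays the role of L here.

```matlab
% Sketch: reordered sparse Cholesky solve of A*X = B for SPD A.
n = 30; e = ones(n,1);
T = spdiags([-e 2*e -e], -1:1, n, n);
A = kron(speye(n), T) + kron(T, speye(n)) + speye(n^2);  % assumed SPD test matrix
B = rand(n^2, 5);                                         % dense right-hand sides

p = amd(A);                       % fill-reducing permutation
R = chol(A(p,p));                 % A(p,p) = R'*R, R upper triangular

X = zeros(size(B));
X(p,:) = R \ (R.' \ B(p,:));      % forward then backward substitution, undo permutation

fprintf('nnz of factor: natural order %d, AMD order %d\n', nnz(chol(A)), nnz(R));
relRes = norm(A*X - B, 'fro') / norm(B, 'fro');
```

The printed nonzero counts show how much fill the reordering saves on your own matrix, which is the number to watch.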
Another alternative would be the preconditioned conjugate gradient method, pcg in Matlab. This one is very popular in practice because you can trade off speed for accuracy, i.e. give it fewer iterations and it will give you a (usually pretty good) approximate solution. You also never need to store the matrix A explicitly; if your matrix doesn't fit into memory, you just need to be able to compute matrix-vector products with A.
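As a sketch of the matrix-free usage, pcg accepts a function handle in place of A. The operator below, tridiag(-1, 2.5, -1), is an assumed toy example chosen because it is symmetric positive definite and easy to apply without forming a matrix.

```matlab
% Sketch: pcg with a function handle, so A is never stored.
n = 1000;
% y = A*x for A = tridiag(-1, 2.5, -1), applied without building A
afun = @(x) 2.5*x - [x(2:end); 0] - [0; x(1:end-1)];

b = ones(n,1);
[x, flag, relres, iter] = pcg(afun, b, 1e-8, 100);   % tol 1e-8, at most 100 iterations

fprintf('flag %d after %d iterations, relres %.2e\n', flag, iter, relres);
```

For B with many columns you would call pcg once per column, so this mainly pays off when A is too large to factorize.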
If this takes forever to solve in your tests, you are probably going into virtual memory for the solve. A 15k square (full) matrix will require 1.8 gigabytes of RAM to store in memory.
>> 15000^2*8
ans =
1.8e+09
You will need some serious RAM to solve this, as well as the 64 bit version of MATLAB. NO factorization will help you unless you have enough RAM to solve the problem.
If your matrix is truly sparse, then are you using MATLAB's sparse form to store it? If not, then MATLAB does NOT know the matrix is sparse, and does not use a sparse factorization.
How sparse is A? Many people think that a matrix that is half full of zeros is "sparse". That would be a waste of time. On a matrix that size, you need something that is well over 99% zeros to truly gain from a sparse factorization of the matrix. This is because of fill-in. The resulting factorized matrix is almost always nearly full otherwise.
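A quick check along these lines tells you whether MATLAB is treating the matrix as sparse and how sparse it really is. This is a sketch: the 1000 x 1000 identity is only a stand-in for your A.

```matlab
% Sketch: check storage class and density of a matrix.
Afull = eye(1000);                 % stand-in for a matrix built in full storage
A     = sparse(Afull);             % convert to MATLAB's sparse storage

density = nnz(A) / numel(A);       % fraction of entries that are nonzero

infoFull   = whos('Afull');
infoSparse = whos('A');
fprintf('issparse: %d, density: %.4f, bytes full/sparse: %d / %d\n', ...
        issparse(A), density, infoFull.bytes, infoSparse.bytes);
```

If issparse returns 0 for your A, backslash is doing a dense solve no matter how many zeros the matrix contains.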
If you CANNOT get more RAM (RAM is cheeeeeeeeep you know, certainly once you consider the time you have wasted trying to solve this) then you will need to try an iterative solver. Since these tools do not factorize your matrix, if it is truly sparse, then they will not go into virtual memory. This is a HUGE savings.
Since iterative tools often require a preconditioner to work as well as possible, it can take some study to find the best preconditioner.