Does Matlab support efficient operations on large sparse tensors?
More specifically:
Is there an elegant way, similar to sparse, of loading and storing a sparse tensor? As far as I can understand, sparse can only load matrices.
Are operations like tensor product implemented efficiently over sparse tensors?
I realize I can always store a tensor as a combination of cell arrays of matrices, but that would require using loops, and I'm hoping to avoid that.
Since the data I'm working with is very large, I cannot consider a non-sparse representation.
Out of the box, I believe MATLAB only handles sparse matrices, as you say.
But you might like to take a look at the Tensor Toolbox and the N-way Toolbox to see if they meet your needs. Both are freely available, and I've heard good things about both (although I've used neither myself). The Tensor Toolbox in particular seems to have at least some support for sparse multidimensional arrays.
You can use the Tensor Toolbox to work with tensors. Its sptensor() function creates a sparse tensor.
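To make the idea of a sparse tensor concrete, here is a minimal sketch of coordinate (index-tuple) storage and a tensor (outer) product that only touches the stored entries. It is written in plain Python rather than MATLAB, and the `SparseTensor` class and its `outer` method are hypothetical names invented for illustration; this is not the Tensor Toolbox API.

```python
class SparseTensor:
    """Hypothetical minimal sparse tensor: stores only nonzero entries,
    keyed by their index tuple (coordinate format)."""

    def __init__(self, shape, entries=None):
        self.shape = tuple(shape)
        self.data = {}
        for idx, val in (entries or {}).items():
            self[idx] = val

    def __setitem__(self, idx, val):
        idx = tuple(idx)
        if len(idx) != len(self.shape) or any(
            not 0 <= i < n for i, n in zip(idx, self.shape)
        ):
            raise IndexError(f"index {idx} out of range for shape {self.shape}")
        if val != 0:
            self.data[idx] = val
        else:
            # Storing a zero means removing the entry.
            self.data.pop(idx, None)

    def __getitem__(self, idx):
        # Anything not stored is implicitly zero.
        return self.data.get(tuple(idx), 0)

    def outer(self, other):
        """Tensor (outer) product: cost is nnz(self) * nnz(other),
        independent of the dense sizes."""
        result = SparseTensor(self.shape + other.shape)
        for ia, va in self.data.items():
            for ib, vb in other.data.items():
                result[ia + ib] = va * vb
        return result


# A 1000 x 1000 x 1000 tensor with two nonzeros, times a 4 x 4 matrix with one.
a = SparseTensor((1000, 1000, 1000), {(1, 2, 3): 2.0, (999, 0, 5): -1.0})
b = SparseTensor((4, 4), {(0, 1): 3.0})
c = a.outer(b)
```

The result `c` has shape `(1000, 1000, 1000, 4, 4)` but stores only two entries, so no loop ever runs over the dense index space.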
If you're looking for a truly scalable solution, take a look at SPLATT: http://glaros.dtc.umn.edu/gkhome/splatt/overview
Related
I have two very sparse distributed matrices of dimension 1,000,000,000 x 1,000,000,000, and I want to compute their product efficiently.
I tried to create a BlockMatrix from a CoordinateMatrix, but it uses a lot of memory (in reality there are only about 500,000,000 non-zero entries) and the computation time is enormous.
Is there another way to create a sparse matrix and compute the multiplication efficiently in a distributed way in Spark? Or do I have to compute it manually?
You must obviously use a storage format for sparse matrices that makes use of their sparsity.
Now, without knowing how you handle matrices or which libraries you use, all I can suggest is that you check the linear algebra library of your choice for sparse storage formats. The "good old" Fortran-based libraries that underlie a lot of modern math libraries support them, so chances are that a little googling for yourlibraryname + "sparse matrix" is all you need.
Second thoughts:
Sparse matrices really don't lend themselves to distribution very well; think about the operations you'd have to do to coordinate the distribution compared to the actual multiplications and additions.
Also, ~5e8 non-zero elements in a 1e18 element matrix are definitely a lot of memory, and since you don't specify how much you consider a lot to be, it's very possible there's nothing wrong with it. Assuming you're using the default double precision, that's 5e8 * 8B = 4GB of pure numbers, not counting the coordinates needed for sparse storage. So, if you've got ~10GB of memory, I wouldn't be surprised at all.
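The estimate above can be extended to include the coordinates. The sketch below assumes plain coordinate (row, column, value) storage with one row index and one column index per non-zero; the index widths are assumptions, and real libraries add their own overhead on top.

```python
def coo_bytes(nnz, value_bytes=8, index_bytes=8, ndim=2):
    """Bytes needed to store nnz entries as (indices..., value) tuples."""
    return nnz * (value_bytes + ndim * index_bytes)


nnz = 500_000_000

values_only = nnz * 8                       # doubles only, as in the estimate above
with_int64 = coo_bytes(nnz, index_bytes=8)  # 64-bit row and column indices
with_int32 = coo_bytes(nnz, index_bytes=4)  # 32-bit indices (1e9 < 2**31, so they fit)

print(values_only / 1e9, "GB values only")
print(with_int64 / 1e9, "GB with 64-bit indices")
print(with_int32 / 1e9, "GB with 32-bit indices")
```

Under these assumptions a single matrix takes 8 to 12 GB before any library or JVM overhead, so a figure around 10 GB is in the expected range.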
There is no built-in method in Spark to perform matrix multiplication with sparse matrices. I worked around this by reducing the sparsity of the matrices as much as possible before performing the multiplication with BlockMatrix (which does not support sparse matrices).
Last edit: Even with the sparsity optimization, I had a lot of problems with large datasets. Finally, I decided to implement it myself, and it now runs very fast. I hope that sparse matrix multiplication will be implemented in Spark, as I think there are a lot of applications that could make use of it.
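The hand-written implementation is not shown in the post. As an illustration only, one common way to multiply two matrices stored as coordinate entries is to group one operand by the shared inner index and then combine matching entries. The single-machine Python sketch below follows that idea; `sparse_matmul` is a hypothetical name, and in a distributed setting the grouping step would correspond to a join on the inner index followed by a reduce on the output coordinate.

```python
from collections import defaultdict


def sparse_matmul(a, b):
    """Multiply two sparse matrices given as {(row, col): value} dicts.

    Work is proportional to the number of matching (i, k), (k, j) pairs,
    not to the dense dimensions.
    """
    # Group the entries of b by row index k, the index shared with a's columns.
    b_rows = defaultdict(list)
    for (k, j), vb in b.items():
        b_rows[k].append((j, vb))

    # For each a[i, k], accumulate a[i, k] * b[k, j] into out[i, j].
    out = defaultdict(float)
    for (i, k), va in a.items():
        for j, vb in b_rows.get(k, ()):
            out[(i, j)] += va * vb

    # Drop entries that cancelled to exactly zero.
    return {ij: v for ij, v in out.items() if v != 0}


# Indices up to 1e9 - 1 are fine because nothing is allocated densely.
a = {(0, 999_999_999): 2.0, (5, 3): 1.0}
b = {(999_999_999, 7): 4.0, (3, 7): -1.0, (3, 0): 0.5}
c = sparse_matmul(a, b)
```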
I have symmetric sparse matrices. Some of the elements form "blocks" or "components".
Please look at the output of spy on an example matrix.
I want to efficiently find those clusters in MATLAB.
This problem is equivalent to finding connected components of a graph, however I have a feeling that relevant functionality should be available as a (combination of) fast MATLAB built-in functions that operate on sparse matrices.
Can you suggest such a combination?
OK, I found the graphconncomp function in the Bioinformatics Toolbox. It uses some MEX routines internally.
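For readers who want to see what the connected-components view looks like, here is a small sketch that treats each non-zero entry (i, j) of the symmetric matrix as a graph edge and labels the blocks with a union-find structure. It is plain Python and is not how graphconncomp is implemented; `connected_components` is a name chosen for illustration.

```python
def connected_components(n, nonzeros):
    """Label the blocks of an n x n symmetric sparse matrix.

    nonzeros: iterable of (row, col) positions of non-zero entries (0-based).
    Returns a list where labels[v] is the component number of row/column v.
    """
    parent = list(range(n))

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each non-zero entry joins its row and its column into one component.
    for i, j in nonzeros:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Renumber the roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = []
    for v in range(n):
        labels.append(roots.setdefault(find(v), len(roots)))
    return labels


# Rows {0, 2, 4} form one block, {1, 5} another, and row 3 is isolated.
nonzeros = [(0, 2), (2, 0), (2, 4), (4, 2), (1, 5), (5, 1)]
labels = connected_components(6, nonzeros)
```

Sorting the rows and columns by label would permute the matrix into block-diagonal form, which makes the blocks visible in a spy plot.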
I have a linear program with order N^4 variables and order N^4 constraints. If I want to solve this in AMPL, I define the constraints one by one without having to bother about the exact coefficient matrices, and no memory issues arise. When using the standard LP solver in MATLAB, however, I need to define the matrices explicitly.
When I have variables with four subscripts, this leads to a massively sparse matrix of dimension order N^4 x N^4. Stored densely, this matrix won't even fit in memory for nontrivial problem sizes.
Is there a way to get around this problem using Matlab, apart from various column generation/cutting plane techniques? Since AMPL manages to solve it, I suppose they're either automating some kind of decomposition, or they somehow solve the LP without explicitly working with this sparse monster matrix.
Apart from sparse, mentioned by m.s., you can also use the AMPL API for MATLAB. It is especially useful if you already have an AMPL model and want to work with it from MATLAB.
Converting my comment into an answer:
MATLAB supports sparse matrices through the sparse command, which lets you build your constraint matrix from its non-zero entries only, without exceeding memory limits.
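The usual way to use this is to generate the constraints one by one, as in AMPL, while collecting (row, column, value) triplets, and to hand the three lists to the sparse constructor at the end (in MATLAB, sparse(i, j, v, m, n), with 1-based indices). The sketch below shows the bookkeeping in Python with 0-based indices. The constraint family, sum over l of x[i,j,k,l] = 1 for every (i, j, k), is a made-up example, and `flat_index` and `build_triplets` are hypothetical helper names.

```python
def flat_index(i, j, k, l, n):
    """Map a four-subscript variable x[i,j,k,l] to a single column number."""
    return ((i * n + j) * n + k) * n + l


def build_triplets(n):
    """Collect the non-zeros of the example constraint matrix as triplets.

    Example constraints (an assumption for illustration):
        sum over l of x[i,j,k,l] = 1   for every (i, j, k)
    """
    rows, cols, vals = [], [], []
    row = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    rows.append(row)
                    cols.append(flat_index(i, j, k, l, n))
                    vals.append(1.0)
                row += 1  # one constraint row per (i, j, k)
    return rows, cols, vals, row, n ** 4


n = 3
rows, cols, vals, n_rows, n_cols = build_triplets(n)
print(len(vals), "non-zeros instead of", n_rows * n_cols, "dense entries")
```

Memory then grows with the number of non-zeros (here n^4) rather than with the product of the matrix dimensions (here n^3 * n^4).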
I am trying to make my control algorithm more efficient, since my matrices are sparse. Currently, I am doing conventional matrix-vector multiplications in Simulink/xPC for a real-time application. I cannot find a way to convert the matrix to a sparse one and perform a sparse multiplication in a way that is compatible with xPC. Does anyone have an idea of how to do this?
It appears, at least as of earlier this year, that it is impossible to do sparse matrices in Simulink: see this Q&A on MathWorks' site. As the answerer is a Simulink software engineer, it seems authoritative. :)
I have a matrix of size 200000 x 200000 and need to find its eigenvalues. I was using MATLAB until now, but because MATLAB could not handle a matrix of this size, I switched to Perl, and now even Perl cannot handle it: it reports that it is out of memory. I would like to know whether I can find the eigenvalues of this matrix using some other programming language that can handle such a large amount of data. Most of the elements are non-zero, so a sparse matrix is not an option. Please help me solve this.
I think you may still have luck with MATLAB. Take a look at their Distributed Computing Toolbox. You'd need some kind of parallel environment, such as a computing cluster.
If you don't have a computational cluster, you might look into distributed eigenvalue/vector calculation methods that could be employed on Amazon EC2 or similar.
There is also a discussion of parallel eigenvalue calculation methods here, which may direct you to better libraries and programming approaches than Perl.
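A quick back-of-the-envelope calculation shows why a single machine runs out of memory regardless of the language. The sketch assumes the matrix is dense and stored in double precision (8 bytes per element), which the question does not state explicitly.

```python
n = 200_000
bytes_per_element = 8  # assumption: double precision

dense_bytes = n * n * bytes_per_element

print(dense_bytes / 1e9, "GB")       # decimal gigabytes
print(dense_bytes / 2**30, "GiB")    # binary gibibytes
```

That is 320 GB for the matrix alone, before any workspace the eigenvalue solver needs, which is why the answers above point toward distributed methods.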