Is MATLAB vector length a constant operation? - matlab

I assume that MATLAB vectors/matrices have some meta data about dim/size/lengths. So length(a) is supposed to be a very fast operation if a is of vector. Since MATLAB doc does not talk about complexity in general, do we have any way to confirm this?

You are correct. "Under the hood" MATLAB stores and maintains a size for all array types, and the length operator merely retrieves this value. It isn't quite a simple variable reference, because length has to look at all size dimensions and pick the largest, so it is O(n) in the number of dimensions.

Related

Matlab optimization toolbox, optimizing hessian

Never used this toolbox before, I have a very large problem (i.e. number of variables) to be optimzed. I'm aware it's possible to optimize the hessian computation, which is my issue given the error:
Error using eye
Requested 254016x254016 (480.7GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may
take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
But according to this quote (from a forum) it must be possible to optimize the hessian computation:
If you are going to use the trust-region algorithm, you will need to
choose some combination of the options 'Hessian', 'HessMult', and
'HessPattern' to avoid full, explicit computation of the Hessian.
I struggle to find examples of this settings, does anyone know?
My problem is a sparse problem, if such information is necessary.
Basically I'm sure there's some extra options to be put in a line like:
option = optimoptions(#fminunc,...
'Display','iter','GradObj','on','MaxIter',30,...
'ObjectiveLimit',10e-10,'Algorithm','quasi-newton');
You probably need to add 'HessPattern',Hstr to optimoptions. An example is given here (In this example, Hstr is defined in brownhstr.mat; you need to calculate your own hessian sparsity pattern matrix Hstr).

Sparse boolean matrix multiplication

Does anybody know the efficient implementation of sparse boolean matrix multiplication? I'm interested in both CPU and GPGPU implementations because it is necessary to multiply matrices of different sizes (from 8x8 to up to 10^8x10^8). Currently, I use cuSPARSE library, but it supports only numerical matrices (float, double etc) and this fact leads to huge overhead (by memory and time) which is critical in my task.
Since a boolean matrix can be viewed as the adjacency matrix of some (bipartite) graph, its product with another matrix can be interpreted as the distance 2 connections between the nodes of two subgraphs linked by a common set of nodes.
To avoid wasting space and exploit some amount of bit parallelism, you could try using some form of succint data structure for graph storage and manipulation.
One such family of data structures which could be useful in your case is the K2-tree (or Kn in general), which uses an approach to store the adjacencies similar to spatial decompositions such as quad- and oct- trers.
Ultimately, the best algorithm and data structure will heavily depend on the dimension and sparsity patterns of your matrices.

Multiplication of large sparse Matrices without null values in scala

I have two very sparse distributed matrixes of dimension 1,000,000,000 x 1,000,000,000 and I want to compute the matrix multiplication efficiently.
I tried to create a BlockMatrix from a CoordinateMatrix but it's a lot of memory (where in reality the non zero data are around ~500'000'000) and the time of computation is enormous.
So there is another way to create a sparse matrix and compute a multiplication efficiently in a distributed way in Spark? Or i have to compute it manually?
You must obviously use a storage format for sparse matrices that makes use of their sparsity.
Now, without knowing anything about how you handle matrices and which libraries you use, there's no helping you but to ask you to look at the linear algebra libraries of your choice and look for sparse storage formats; the "good old" Fortran-based libraries that underly a lot of modern math libs support them, and so chances are that you really have to do but a little googling with yourlibraryname + "sparse matrix".
second thoughts:
Sparse matrixes really don't lend themselves to distribution very well; think about the operations you'd have to do to coordinate distribution compared to the actual multiplications/additions.
Also, ~5e8 non-zero elements in a 1e18 element matrix are definitely a lot of memory, and since you don't specify how much you consider a lot to be, it's very possible there's nothing wrong with it. Assuming you're using the default double precision, that's 5e8 * 8B = 4GB of pure numbers, not counting the coordinates needed for sparse storage. So, if you've got ~10GB of memory, I wouldn't be surprised at all.
As there is no build-in method in Spark to perform a matrix multiplication with sparse matrixes. I resolved by reduce at best the sparsity of the matrices before perform the matrice multiplication with BlockMatrix (that not support sparse matrix).
Last edit: Even with the sparsity optimization I had a lot of problems with large dataset. Finally, I decided to implement it myself. Now is running very fast. I hope that a matrix implementation with sparse matrix will be implemented in Spark as I think there are a lot of application that can make use of this.

Large and Sparse Matrix Multiplcation

I have a very large and sparse matrix of size 180GB(text , 30k * 3M) containing only the entries and no additional data. I have to do matrix multiplication , inversion and some similar linear algebra operations over it. I tried octave and simple single-threaded C code for the multiplication but my system RAM of 40GB gets used up very fast and then I can find the program starts thrashing. Is there any other options available to me. I am not familiar with MathLab or any other matrix operational library that can help me in doing so.
When I run a simple matrix multiplication of two matrices with 10 rows and 3 M cols, and its transpose, it gives the following error :
memory exhausted or requested size too large for range of Octave's index type
I am not sure whether the same would work on Matlab or not. For sparse matrix representation and matrix multiplication, is there another library or code.
if there are few enough nonzero entries, I suggest creating a sparse matrix S with appropriate dimensions and max nonzero entries; see matlab create sparse matrix. Then as #oleg komarov described, load the matrix in blocks and assign the nonzero entries from each block into the correct address in the sparse matrix S. I feel that if your matrix is sparse enough, then loading it is really the only difficulty you face. I had similar issues with large transfer operators.
Have you considered performing your processing in blocks? Transposition and multiplications work very well with block matrix processing (see https://en.wikipedia.org/wiki/Block_matrix) and that will get you around any limitations about the indices.
This wouldn't help you with matrix inversion though unless you can decompose your matrix in blocks when blocks that aren't on the diagonal are completely empty, which isn't stated in your assumptions.
Octave has a limit in both the memory resources of about 2GB and the maximum number of indices a matrix can hold of about 2^32 (for 32 bits Octave). MatLab doesn't have such a memory limit, since it will use all of your memory resources, swapping file included. Thus you could try with MatLab by setting a huge swapfile, you may then compute your operations (but it will anyway take quite along time...).
If you are interested by other approaches, you may take a look into out-of-core computing which aims to promote new methods to process huge datasets that cannot reside all in memory, but rather store it on disk and load efficiently the bits that are necessary.
For a practical approach, you may take a look into Blaze for Python (notice: still in development!).

Multidimensional indexing of images

I would like to know if there is a good way for indexing multidimensional objects (i.e. images). More precisely, I have a large collection of images on which I calculate n-dimensional feature vectors. There is a distance metric (i.e. L2-norm) defined over those feature vectors d(u,v). Given a key (an n-dimensional) k, the index should allow fast retrieval of feature vectors that are "close" to k (that is, their distance is small).
MATLAB code reference would be great...
For distances r-tree's are often used. I think it can apply to n-dimensions, but I'm not sure if it will work with custom distance or dissimilarity functions. I think it's implemented in this library. It might help to convert your data to n-dimensional coordinates.