Matlab large matrix multiplication limit

I'm trying to run the code below, but Matlab freezes when my matrix grows beyond 10,000 rows. What can I do to fix this?
X = load('iris.mtx');
[n,d] = size(X);
%X=14000x128 double
%form RBF over the data:
nms = sum(X'.^2); %nms becomes 1x14000
%here is where the freeze begins; for a smaller data size, like 10000x128, this part won't freeze
K = exp(-nms'*ones(1,n) -ones(n,1)*nms + 2*X*X');
Is this a limitation that I just have to accept? I need to be able to use this for matrices much bigger than the ones I'm currently using.

I would refer to this previously asked question about limitations in the size of Matlab matrices: Matrix size limitation in MATLAB.
The only limitations are the limitations of your hardware.
Not knowing more about what you need to do, I can't suggest much further reading. However, once your matrices grow larger than the memory you have available, this question addresses how to optimize your operations: Efficient multiplication of very large matrices in MATLAB
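If the full n-by-n kernel in the question is what blows up, one memory-conscious workaround (a sketch of my own, not from the linked question, and untested on the original data) is to build K in row blocks so the large temporaries exist only one slice at a time:
% Compute the RBF kernel in row blocks to limit peak memory
blockSize = 2000;                % tune to the available RAM
K = zeros(n, n, 'single');       % single precision halves memory, if acceptable
for s = 1:blockSize:n
    e = min(s + blockSize - 1, n);
    K(s:e,:) = exp(-nms(s:e)'*ones(1,n) - ones(e-s+1,1)*nms + 2*X(s:e,:)*X');
end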

Related

Matlab: Solve for a single variable in a linear system of equations

I have a linear system of about 2000 sparse equations in Matlab. For my final result, I only really need the value of one of the variables: the other values are irrelevant. While there is no real problem in simply solving the equations and extracting the correct variable, I was wondering whether there was a faster way or Matlab command. For example, as soon as the required variable is calculated, the program could in principle stop running.
Is there anyone who knows whether this is at all possible, or if it would just be easier to keep solving the entire system?
Most of the computation time is spent inverting the matrix; if we can find a way to avoid completely inverting it, we may be able to improve the computation time. Let's assume I'm only interested in the solution for the last variable, x(N). Using the standard method we compute
x = A\b;
res = x(N);
Assuming A is full rank, we can instead use LU decomposition of the augmented matrix [A b] to get x(N) which looks like this
[~,U] = lu([A b]);
res = U(end,end)/U(end,end-1);
This is essentially performing Gaussian elimination and then solving for x(N) using back-substitution.
We can extend this to find any value of x by swapping the columns of A before LU decomposition,
x_index = 123; % the index of the solution we are interested in
A(:,[x_index,end]) = A(:,[end,x_index]);
[~,U] = lu([A b]);
res = U(end,end)/U(end,end-1);
Benchmarking performance in MATLAB R2017a with 10,000 random 200-dimensional systems, we get a slight speed-up:
Total time direct method : 4.5401s
Total time LU method : 3.9149s
Note that you may experience some precision issues if A isn't well conditioned.
Also, this approach doesn't take advantage of the sparsity of A. In my experiments, even with 2000x2000 sparse matrices everything slowed down considerably, and the LU method was the slower of the two. That said, the full matrix representation only requires about 30MB, which shouldn't be a problem on most computers.
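For reference, a minimal sketch of that kind of benchmark (my own reconstruction, not the original script) with the sizes quoted above:
% Benchmark: direct solve vs. LU on 10,000 random 200-dimensional systems
N = 200; trials = 10000;
t_direct = 0; t_lu = 0;
for t = 1:trials
    A = randn(N); b = randn(N,1);
    tic; x = A\b; res1 = x(N); t_direct = t_direct + toc;
    tic; [~,U] = lu([A b]); res2 = U(end,end)/U(end,end-1); t_lu = t_lu + toc;
end
fprintf('Total time direct method : %.4fs\n', t_direct);
fprintf('Total time LU method : %.4fs\n', t_lu);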
If you have access to the NASTRAN theory manuals, I believe (from memory) there is coverage of partial solutions of linear systems. Also try looking for iterative or tridiagonal solvers for A*x = b, and review the pqr solution answer by Shantachhani.

Visualizing a large matrix in matlab

I have a huge sparse matrix (1,000 x 1,000,000) that I cannot load into Matlab (not enough RAM).
I want to visualize this matrix to have an idea of its sparsity and of the differences of the values.
Because of the memory constraints, I want to proceed as follows:
1- Divide the matrix into 4 matrices
2- Load each matrix on matlab and visualize it so that the colors give an idea of the values (and of the zeros particularly)
3- "Stick" the 4 images I will get in order to have a global idea for the original matrix
(i) Is it possible to load "part of a matrix" in matlab?
(ii) For the visualization tool, I read about spy (and daspect). However, this function only visualizes the locations of the non-zero values, regardless of their magnitude. Is there a way to add a color code?
(iii) How can I "stick" plots in order to make one?
If your matrix is sparse, then it seems that the current method of storing it (as a full matrix in a text file) is very inefficient, and certainly makes loading it into MATLAB very hard. However, I suspect that as long as it is sparse enough, it can still be loaded into MATLAB as a sparse matrix.
The traditional way of doing this would be to load it all in at once, then convert to sparse representation. In your case, however, it would make sense to read in the text file, one line at a time, and convert to a MATLAB sparse matrix on-the-fly.
You can find out if this is possible by estimating the sparsity of your matrix, and using this to see if the whole thing could be loaded into MATLAB's memory as a sparse matrix.
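As a back-of-envelope check (my own, with an assumed ~1% density), MATLAB stores a sparse double matrix in compressed column form at roughly 16 bytes per nonzero plus 8 bytes per column:
% Rough memory estimate for the 1,000 x 1,000,000 matrix at ~1% density
nnz_est   = 0.01 * 1000 * 1e6;            % ~10 million nonzeros (assumed)
bytes_est = 16*nnz_est + 8*(1e6 + 1);     % roughly 168 MB -- should fit in RAM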
Try something like this (untested code!):
% known dimensions of the data (1,000 x 1,000,000 in this case)
num_rows = 1000;
num_cols = 1000000;
% initialise sparse matrix
sparse_matrix = sparse(num_rows, num_cols);
row_num = 1;
fid = fopen(filename);   % filename: path to your text file
% read each line of the text file in turn and parse its numbers
while ~feof(fid)
    this_line = sscanf(fgetl(fid), '%f');   % fgetl reads one line; fscanf would consume the whole file
    % add row to sparse matrix (note transpose: sscanf returns a column vector)
    sparse_matrix(row_num, :) = this_line';
    row_num = row_num + 1;
end
fclose(fid);
% visualise using spy
spy(sparse_matrix)
Visualisation
With regards to visualisation: visualising a sparse matrix like this via a tool like imagesc is possible, but I believe it may internally create the full matrix – maybe someone can confirm if this is true or not. If it does, then it's going to cause you memory problems.
All spy is really doing is plotting in 2D the locations of the non-zero elements. You can fairly easily write your own spy function, which can have different coloured or sized points depending on the values at each location. See this answer for some examples.
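For instance, a minimal sketch (my own) of a value-coloured spy built from find and scatter:
% Plot non-zero locations, coloured by their values
[r, c, v] = find(sparse_matrix);
scatter(c, r, 4, v, 'filled');   % x = column, y = row, colour = value
axis ij tight;                   % match spy's matrix orientation
colorbar;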
Saving sparse matrices
As I say above, the format your matrix is saved in is pretty inefficient: for a matrix with 10% density, around 95% of your text file will be zeros and spaces. I don't know where this data has come from, but if you have any control over its creation (e.g. it comes from another program you have written) it would make much more sense to save only the non-zero elements in the format row_idx, col_idx, value.
You can then use spconvert to import the sparse matrix directly.
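For reference, spconvert expects an N-by-3 array of [row_idx, col_idx, value] triples, so importing looks something like this (the file name is illustrative):
T = load('matrix_triples.txt');   % each line: row col value
S = spconvert(T);
spy(S)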
One of the simplest methods (if you can actually store the full sparse matrix in RAM) is to use gnuplot to visualize the sparsity pattern.
I was able to spy matrices of size 10-20GB using gnuplot without problems. But make sure you use png or jpeg formats to output the image. Note that you don't need the values of the non-zero entries, only the integer indices (row, col). Then plot them with: plot "row_col.dat" using 1:2 with points
This uses the rows as the x axis and the columns as the y axis and plots the non-zero entries. It is very easy to do. This is the most scalable solution I know: gnuplot works at decent speed even for very large datasets (>10GB of [row, col] pairs), where Matlab just hangs (with due respect)
I use imagesc() to visualise arrays. It scales the values in the array to between 0 and 1, then plots the array like a greyscale bitmap image (of course you can change the colormap to make detail easier to see).

MATLAB - apply multiple convolution masks to a single matrix

I need to convolve a matrix with many other matrices with few calls to convn.
For example: I have size(MyMat) = [fm, fm, 1, bSize] and size(masks) = [s, s, maskNum].
I want res(:,:,k,:) to be the result of convolving masks(:,:,k) with MyMat:
res(:,:,k,:)=convn(MyMat,masks(:,:,k));
Since the convolution takes up over 80% of my script's running time and is called hundreds of thousands of times, I don't want to use a loop.
I'm looking for the fastest way to do this. Basically, you could say I have bSize matrices, and I want to apply the convolution masks in masks to all of them with as few calls to convolution as possible.
The matrices are all small and non-sparse, so FFT-based convolution will probably slow it down (as a commenter here verified :) )
(The reason I have a 1 in the size of MyMat is because I actually have more elements in that dimension, but I compute the convolution for each element in that dimension in a loop)
The main goal is simply to eliminate the need for the following loop, or make it parallel with very little overhead, if possible:
for i = 1:size(convMask,3)
res(:,:,:,i)=convn(MyArray,convMask(:,:,i));
end
Parallelizing on the GPU would be great, if there's a way to do this with less overhead than the usual parfor.
Thank you!
I assume that you are preallocating the array res correctly? Without a simple demo of what you're doing and an idea of the sizes of fm, s, etc., one can only guess at how to help you. If your matrices are large enough, you might look into FFT-based convolution methods (there are some for convn on the MATLAB File Exchange). If the data is sparse (> 50% zeros), you could try converting the convolution to a matrix multiplication and use sparse data types. You could also try gpuArray/convn if you have a decent GPU.
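A minimal sketch of the gpuArray route (my own, with assumed sizes; the loop remains, but each convolution runs on the GPU):
% Assumed sizes for illustration
fm = 32; bSize = 64; s = 5; maskNum = 10;
MyMat = gpuArray(rand(fm, fm, 1, bSize));
masks = rand(s, s, maskNum);
res = zeros(fm+s-1, fm+s-1, maskNum, bSize, 'gpuArray');  % preallocate on the GPU
for k = 1:maskNum
    res(:,:,k,:) = convn(MyMat, gpuArray(masks(:,:,k)));  % full convolution per mask
end
res = gather(res);   % bring the result back to host memory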

Solving multiple linear systems using vectorization

Sorry if this is obvious, but I searched for a while and did not find anything (or missed it).
I'm trying to solve linear systems of the form Ax=B with A a 4x4 matrix, and B a 4x1 vector.
I know that for a single system I can use mldivide to obtain x: x=A\B.
However I am trying to solve a great number of systems (possibly > 10000) and I am reluctant to use a for loop because I was told it is notably slower than matrix formulation in many MATLAB problems.
My question is then: is there a way to solve Ax=B using vectorization, with A a 4x4xN array and B a 4xN matrix?
PS: I do not know if it is important but the B vector is the same for all the systems.
You should use a for loop. There might be a benefit in precomputing a factorization and reusing it, if A stays the same and B changes. But for your problem where A changes and B stays the same, there's no alternative to solving N linear systems.
You shouldn't worry too much about the performance cost of loops either: the MATLAB JIT compiler means that loops can often be just as fast on recent versions of MATLAB.
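For the opposite case mentioned above (A fixed, B varying), a minimal sketch of reusing a single factorization:
% Factor A once, then reuse it for every right-hand side
[L, U, P] = lu(A);       % P*A = L*U
X = U \ (L \ (P * B));   % B can hold many columns at once
% each column of X solves A*x = B(:,j) via two cheap triangular solves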
I don't think you can optimize this further. As explained by @Tom, since A is the one changing, there is no benefit in factoring the various A's beforehand...
Besides, the looped solution is pretty fast given the dimensions you mention:
A = rand(4,4,10000);
B = rand(4,1); %# same for all linear systems
tic
X = zeros(4,size(A,3));
for i=1:size(A,3)
X(:,i) = A(:,:,i)\B;
end
toc
Elapsed time is 0.168101 seconds.
Here's the problem:
you're trying to perform a 2D operation (mldivide) on a 3D array. No matter how you look at it, you need to reference the matrix by index, which is where the time penalty kicks in... it's not the for loop that is the problem, but how people use them.
If you can structure your problem differently, then perhaps you can find a better option, but right now you have a few options:
1 - mex
2 - parallel processing (write a parfor loop)
3 - CUDA
Here's a rather esoteric solution that takes advantage of MATLAB's peculiar optimizations. Construct an enormous 4k x 4k sparse matrix with your 4x4 blocks down the diagonal. Then solve all simultaneously.
On my machine this gets the same solution, up to single precision accuracy, as @Amro/@Tom's for-loop solution, but faster.
n = size(A,1);
k = size(A,3);
AS = reshape(permute(A,[1 3 2]),n*k,n);
S = sparse( ...
repmat(1:n*k,n,1)', ...
bsxfun(@plus,reshape(repmat(1:n:n*k,n,1),[],1),0:n-1), ...
AS, ...
n*k,n*k);
X = reshape(S\repmat(B,k,1),n,k);
for a random example:
For k = 10000
For loop: 0.122570 seconds.
Giant sparse system: 0.032287 seconds.
If you know that your 4x4 matrices are positive definite then you can use chol on S to improve the accuracy.
This is silly. But so is how slow MATLAB's for loops still are in 2015, even with the JIT. This solution seems to find a sweet spot when k is not too large, so everything still fits into memory.
I know this post is years old now, but I'll contribute my two cents anyway. You CAN put all of your A matrices into a big block-diagonal matrix, with the 4x4 blocks along its diagonal. The right-hand side will be all of your b vectors stacked on top of each other. Once you set this up, the system is sparse and can be solved efficiently with the algorithms mldivide chooses. The blocks are numerically decoupled, so even if there are singular blocks, the answers for the nonsingular blocks should be right when you use mldivide. There is a code that took this approach on MATLAB Central:
http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver
I suggest experimenting to see if the approach is any faster than looping. I suspect it can be more efficient, especially for large numbers of small systems. In particular, if there are nice formulas for the coefficients of A across the N matrices, you can build the full left-hand side using MATLAB vector operations (without looping), which could give you additional savings. As others have noted, vectorized operations aren't always faster, but they often are in my experience.

Alternate method to kron

I'm doing CDMA spreading in MATLAB, and I'm getting an Out of Memory error despite upgrading my RAM, preallocating arrays, etc.
Is there an alternate method to kron (Kronecker tensor product) in MATLAB? Here is my code:
tempData = kron( Data, walsh);
Data is an M-by-1 vector and walsh (the spreading code) is an 8-by-1 vector.
My Data comprises real and imaginary parts, e.g. 0.000 + 1.000i or 1.000 + 0.000i, in double format.
This call to kron is not memory intensive in itself; I know, your problem seems trivial. However, you don't tell us what M is. For very large values of M, you are simply trying to create an array too large to fit in memory. It is very easy to forget that your computer is not infinitely large or infinitely fast. We get spoiled when we see "giga" in front of everything.
If you absolutely must do this for that value of M, then you probably need the 64 bit version of MATLAB, AND more memory will always help once you do that.
Another option is to make Data single precision, if you can afford the loss in precision. This will at least give you an extra factor of 2. To provide the best help, we need to know the size of M.
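If even single precision doesn't fit, one workaround (a sketch of my own, not from the answer above; processChunk is a hypothetical stand-in for whatever consumes the spread data) is to spread Data in chunks, using the identity kron(Data, walsh) == reshape(walsh * Data.', [], 1) for column vectors:
% Spread the data in chunks so the full 8*M-by-1 result never exists at once
chunkSize = 1e5;                                  % tune to the available RAM
M = numel(Data);
for s = 1:chunkSize:M
    e = min(s + chunkSize - 1, M);
    chunk = reshape(walsh * Data(s:e).', [], 1);  % == kron(Data(s:e), walsh)
    processChunk(chunk);                          % hypothetical downstream step
end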