MATLAB: Matrix multiplication with very large arrays - matlab

I need to perform a matrix multiplication with very large matrices, something like 5000x13 * 13x2000000. This leads to an error message as I don't have enough memory. I understand this.
Now, what's the best strategy to overcome this memory issue?

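My suggestion is to split the array that you want to generate. Even if you could compute it, you could not store it: a 5000-by-2,000,000 array of doubles has 10^10 elements and needs about 80 GB, which is more than MATLAB will allocate for a single array on most machines. This limit applies to the size of each array, not to the total size of all MATLAB arrays. So the problem does not come from the multiplication itself, but from the size of the result.
Instead, split the output matrix into blocks, for example four blocks of 5000 by 500K each, and compute the product for each block separately. Each of those blocks still needs about 20 GB as doubles, so use more, smaller blocks if that is still too much.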
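A minimal sketch of the block-wise idea, under some assumptions: the sizes are small stand-ins so it runs quickly, blockSize is an arbitrary choice, and the row sum is just a hypothetical example of something you might compute from each block before discarding it (the question does not say what the product is needed for).

```matlab
% Small stand-in sizes (assumed); the question uses 5000x13 and 13x2000000.
A = rand(50, 13);
B = rand(13, 2000);

blockSize = 500;                 % columns of B handled per block (assumed value)
nCols = size(B, 2);
rowSum = zeros(size(A, 1), 1);   % hypothetical reduction: row sums of A*B

for first = 1:blockSize:nCols
    last = min(first + blockSize - 1, nCols);
    Cblock = A * B(:, first:last);        % one block of columns of the product
    rowSum = rowSum + sum(Cblock, 2);     % use the block, then let it be overwritten
end
```

Only one block of the product exists in memory at a time. If the blocks themselves are needed later, each one could be written to disk inside the loop instead of being reduced.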

You could use tall arrays. They have some quirks regarding the order of multiplication and such, so you may want to look up the documentation. You weren't very specific about what exactly you want, so let's say you want to find the mean of your matrix product. Here's how you could do it with tall arrays:
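a = rand(5000,13).';           % in-memory 13-by-5000 (transposed left factor)
b = tall(rand(13,2000000).');  % tall 2000000-by-13 (transposed right factor)
c = b * a;                     % tall 2000000-by-5000, the transpose of the desired product; evaluation is deferred
d = mean(c,1);                 % 1-by-5000 mean over the tall dimension
e = gather(d).';               % evaluate and bring the 5000-by-1 result into memory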
Note that in tall array multiplication, only one matrix is allowed to be tall and if the tall array is multiplied with another matrix, then the tall array must come first. This is why I used transposes quite liberally.
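To see that the transposes give the same answer as the direct product, here is a small check. It is only a sketch: the sizes are assumed stand-ins chosen so that the in-memory reference is feasible, and the variable names are made up for illustration.

```matlab
% Small stand-in sizes (assumed) so the check runs quickly.
A = rand(50, 13);
B = rand(13, 2000);

% Tall-array route: the tall operand must come first, so work with transposes.
tB = tall(B.');                        % tall 2000-by-13
tC = tB * A.';                         % tall 2000-by-50, equal to (A*B).'
rowMeanTall = gather(mean(tC, 1)).';   % 50-by-1

% In-memory reference, feasible only because the sizes are small.
rowMeanRef = mean(A * B, 2);           % 50-by-1

maxDiff = max(abs(rowMeanTall - rowMeanRef));
fprintf('largest difference: %g\n', maxDiff);
```

The two results should agree up to floating-point rounding.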

Related

Preallocate space for array with known maximum size

Is there any way to allocate a fixed chunk of memory for an array that grows and shrinks in a loop? I know the range of sizes it can have, i.e. the min and max dimensions.
I could just allocate a matrix of max size as follows,
A = zeros(max,max);
but here's the problem with that approach. I have matrix multiplication and inverse operations inside the loop. On top of that, I am using slicing operation (complete row/column selections)
A(:,i) = data(i).x;
B = A\P;
C = A*W;
Allocating the maximum size does not work with these operations (size mismatch error).
So, I am trying to allocate a memory chunk corresponding to a max dimension, but want to utilize only a part of it.
I know this can be achieved by writing the matrix operations as loops instead of vectorized operations in MATLAB, but that would be inefficient (in fact, I am not sure how the inverse operation would be implemented in a loop).
Any help is appreciated.

Use a small matrix to generate a larger matrix by concatenating it again and again

I have a 40x43 matrix and I would like to use this matrix as a building block to generate a larger matrix.
I want to generate a structure like the image attached, where the building block is the 40x43 matrix. I tried using [A zeros(20,43); zeros(20,43) A] but, as I had guessed, the horzcat didn't work. I would ideally like to use this block 1000 times to extend the structure of the matrix. Could anyone tell me an efficient way to concatenate the small matrix?
Try using kron. This performs what is known as the Kronecker product: for an m-by-n matrix A and any matrix B, kron(A,B) is the block matrix
[A(1,1)*B  A(1,2)*B  ...  A(1,n)*B
 ...
 A(m,1)*B  A(m,2)*B  ...  A(m,n)*B]
In this case, we can replicate exactly what you want by setting the first argument to the 1000 x 1000 identity matrix and the second to the matrix you want to replicate. However, to save computation and memory, make sure you use the sparse version of the identity matrix, which makes the output sparse as well. Replicating the block 1000 times creates a 40000 x 43000 matrix, which would require 13.76 GB of memory as a full double array, and you probably don't have that much available. Since most of the elements are zero, use the sparse version instead:
N = 1000;
B = kron(speye(N), A);
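A small sketch of the same call, with assumed stand-in sizes (a 3x3 block repeated 4 times) so the block-diagonal structure is easy to inspect:

```matlab
A = magic(3);                 % stand-in for the 40x43 building block (assumed)
N = 4;                        % number of copies along the diagonal (assumed)
B = kron(speye(N), A);        % sparse 12-by-12 block-diagonal matrix

disp(issparse(B));                     % the result stays sparse
disp(isequal(full(B(4:6, 4:6)), A));   % second diagonal block is a copy of A
disp(nnz(B(1:3, 4:6)));                % an off-diagonal block stores no nonzeros
```

Because the identity is sparse, only the entries of the N copies of A are stored; the zero blocks cost no memory.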

Can we create a fixed-sized array w/o initialization

We have zeros(42,42), ones(42,42), inf(42,42), nan(42,42)...
Can we create an array w/o initialization and then fill it w/ numbers later? I know this might not be good code, since code analyzers could not prove it safe. But in case an array is large, this could save some computation.
There is no such thing as an empty array of a fixed size. If you start filling the elements of the array later, and assign a value to element [k1, k2], then you will have an array of size at least [k1, k2], all doubles (by default). The reason is that MATLAB arrays are homogeneous containers, so every element has to be a proper double (or the corresponding type of the array). Sooner or later, your array has to be allocated, with zeros in case of unassigned elements. The most efficient thing to do in case of full matrices is to preallocate, which is what zeros(k1max,k2max) does. Actually, at least in older versions of MATLAB, it is faster to pre-allocate with mymat(k1max,k2max)=0;, i.e. by assigning a single zero to the bottom-right corner of your array (this automatically pre-allocates all the other elements between that and [1,1]). Another upside of pre-allocation is that MATLAB can reserve a contiguous block of memory for the whole array at once, which is the most efficient scenario possible.
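The two pre-allocation styles mentioned above can be compared in a short sketch. The sizes are arbitrary assumptions, and the clear call is there because the corner assignment only allocates a fresh array if the variable does not already exist with a larger size.

```matlab
k1max = 300;                   % assumed maximum number of rows
k2max = 200;                   % assumed maximum number of columns

M1 = zeros(k1max, k2max);      % explicit pre-allocation

clear M2                       % make sure M2 does not exist yet
M2(k1max, k2max) = 0;          % assigning the corner element allocates the whole array

disp(isequal(M1, M2));         % both are 300-by-200 arrays of zeros
disp(size(M2));
```

Either way every element is allocated and zero-filled; which form is faster depends on the MATLAB version, so it is worth timing on your own setup.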
What you might be looking for are sparse arrays. In case of large arrays with a huge number of zero elements, it's inefficient to store all those zeros in memory and to perform computations on them. MATLAB natively supports sparse arrays, where only the nonzero elements are stored (per column, so there's some overhead), which leads to huge gains in memory efficiency and performance for very sparse matrices (where the number of nonzero elements is much smaller than the total number of elements).
An important upside of sparse matrices is that all arithmetic operations and almost all matrix operations are implemented for them, or at least they are automatically cast to full matrices. This makes their use almost identical to full matrices. And in line with your question, you only store the nonzero elements. Obviously this is only efficient if the matrix is sparse enough, otherwise the overhead from the bookkeeping of elements (and not using fully vectorized matrix operations) will make their use inefficient.
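A rough illustration of the storage difference, as a sketch with made-up sizes and values (a 2000x2000 matrix with three nonzeros):

```matlab
n = 2000;                                                % assumed matrix size
S = sparse([1 500 2000], [1 500 2000], [3 5 7], n, n);   % only 3 stored nonzeros
F = full(S);                                             % same matrix, stored densely

infoS = whos('S');
infoF = whos('F');
fprintf('sparse: %d bytes, full: %d bytes\n', infoS.bytes, infoF.bytes);

% Sparse matrices take part in ordinary matrix operations.
y = S * ones(n, 1);            % n-by-1 result
disp(y(500));                  % row 500 holds the single value 5
```

The sparse version stores the three values plus index bookkeeping, while the full version stores all four million doubles.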
As a final remark, I just want to note that you can create empty double arrays as long as one of their dimensions is zero:
>> double.empty(100,0)
ans =
Empty matrix: 100-by-0
>> double.empty(100,100)
Error using double.empty
At least one dimension must be zero.
but this rarely has a place in practical applications.

Sparse Matrix Assignment becomes very slow in Matlab

I am filling a sparse matrix P (230k,290k) with values coming from a text file which I read line by line. Here is the (simplified) code:
while ...
C = textscan(text_line,'%d','delimiter',',','EmptyValue', 0);
line_number = line_number+1;
P(line_number,:)=C{1};
end
The problem I have is that while at the beginning the
P(line_number,:)=C{1};
statement is fast, after a few thousand lines it becomes extremely slow, I guess because MATLAB needs to find memory space to allocate every time. Is there a way to pre-allocate memory with sparse matrices? I don't think so, but maybe I am missing something. Any other advice which can speed up the operation (e.g. can having a lot of free RAM make a difference)?
There's a sixth input argument to sparse that tells the number of nonzero elements in the matrix. That's used by Matlab to preallocate:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an
m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space
allocated for nzmax nonzeros.
So you could initialize with
P = sparse([],[],[],230e3,290e3,nzmax);
You can make a guess about the number of nonzeros (perhaps by checking the file size?) and use that as nzmax. If it turns out you need more nonzero elements in the end, MATLAB will reallocate on the fly (slowly).
By far the fastest way to generate a sparse matrix within MATLAB is to load all the values at once, then generate the sparse matrix in one call to sparse. You have to load the data and arrange it into vectors defining the row indices, column indices and values of the filled cells. You can then call sparse using the S = sparse(i,j,s,m,n) syntax.
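A minimal sketch of the one-call approach. The triplet data here is made up for illustration; in the real case the three vectors would be collected while reading the file, and m and n would be the full matrix dimensions.

```matlab
% Hypothetical data: each row is [rowIndex, colIndex, value].
triplets = [1 2 10;
            3 1 20;
            3 4 30;
            2 2 40];
m = 3;                         % number of rows (assumed)
n = 4;                         % number of columns (assumed)

% Build the whole sparse matrix in a single call.
P = sparse(triplets(:, 1), triplets(:, 2), triplets(:, 3), m, n);

disp(nnz(P));                  % 4 stored nonzeros
disp(full(P));                 % dense view, only sensible for small matrices
```

This avoids repeated insertions into an existing sparse matrix, which is where the slowdown in the loop comes from.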

Matlab - vector divide by vector, use loop

I have two equally sized, very large vectors (columns) A and B. I would like to divide vector A by vector B. This will give me a large matrix AxB filled with zeros, except the last column. This column contains the values I'm interested in. When I simply divide the vectors in a MATLAB script, I run out of memory, probably because the matrix AxB becomes very large. I can probably prevent this from happening by repeating the following:
1. Calculate the first row of matrix AxB.
2. Filter out the last value and put it into another vector C.
3. Delete the used row of matrix AxB.
4. Repeat steps 1-3 for all rows in vector A.
How can I make a loop which does this?
Your question doesn't make it clear what you are trying to do, although it sounds like you want to do an element-wise division.
Try:
C = A./B
"Matrix product AxB" and "dividing vectors" are distinct operations.
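If we understood this correctly, what you want to calculate is "C = last column of AxB". Assuming A is a column vector (N-by-1) and B is a row vector (1-by-N; use B.' if it is stored as a column), so that A*B is the N-by-N matrix, you can select the last column with a vector that is zero everywhere except for a one in its last entry:
lastcolsel = zeros(size(B,2),1);   % N-by-1 selector
lastcolsel(end) = 1;               % picks the last column
C = (A*B)*lastcolsel;
If that code breaks your memory limit, recall that the matrix product is associative: (MxN)xP = Mx(NxP). Simplifying your example, we get:
lastcolsel = zeros(size(B,2),1);
lastcolsel(end) = 1;
simplifier = B*lastcolsel;         % scalar, equal to B(end)
C = A*simplifier;                  % N-by-1; the N-by-N matrix is never formed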
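As a quick sanity check of the associativity argument, here is a small sketch with made-up vectors. It assumes A is a column, B is a row, and the selector has a one in its last entry so that it actually picks out the last column.

```matlab
A = (1:5).';                       % 5-by-1 column (made-up values)
B = [2 4 6 8 10];                  % 1-by-5 row, so A*B is the 5-by-5 product

sel = zeros(size(B, 2), 1);        % selector for the last column
sel(end) = 1;

Cfull  = (A * B) * sel;            % forms the 5-by-5 matrix first
Ccheap = A * (B * sel);            % forms only a scalar, then scales A

disp(isequal(Cfull, Ccheap));      % same result either way
disp(Ccheap.');                    % equals A.' scaled by B(end)
```

The second form needs memory only for vectors of length N, which is the point of regrouping the product.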