Does MATLAB execute basic array operations in constant space? - matlab

I am getting an out of memory error on this line of MATLAB code:
result = (A(1:xmax,1:ymax,1:zmax) .* B(2:xmax+1,2:ymax+1,2:zmax+1) +
A(2:xmax+1,2:ymax+1,2:zmax+1) .* B(1:xmax,1:ymax,1:zmax)) ./ C
where C is another array. This is on 32 bit MATLAB (I can't seem to get the 64 bit version at the moment, which would temporarily fix my problems).
The arrays result, A, B, and C are pre-initialized and never change size. It is then my guess that this computation is not being performed in constant space.
Is this correct? Is there a way to make it run or check if it is running in constant space?
These arrays of are approximate size (250, 250, 250).
If MATLAB does not run this in constant size, does anyone have any experience as to whether Octave or Julia or (insert similar language) does?
edit 1:
I eliminated excess arrays. There are 10 arrays that are 258 x 258 x 338, which corresponds to 1.67 GB. There are a bunch of other variables but they are much smaller. The calculation presented is simplified, the form of the calculation is:
R = (A(3Drange) .* B(3Drange) + A(new_3Drange) .* D(new_3Drange) + . . . ) ./ C
where the ranges generally just differ by a shift of plus or minus 1 or 2.
The output of memory command:
Maximum possible array: 669 MB (7.013e+08 bytes) *
Memory available for all arrays: 1541 MB (1.616e+09 bytes) **
Memory used by MATLAB: 2209 MB (2.316e+09 bytes)
Physical Memory (RAM): 8154 MB (8.550e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
Apparently I should be violating the second line. However, the code runs fine until the first operation that I actually do with the arrays. Perhaps MATLAB is being lazy and not allocating when I type:
A=zeros(xmax+2,ymax+2,zmax+2);
but still telling me in the workspace that the variable is allocated.
This code has worked before with smaller arrays. (edit: but it seems the actual memory size is the problem, not the size of each individual array).
The very curious thing to me is why it does not error during allocation, but instead errors during the first calculation.
edit 2:
I have confirmed that the loop is not running constant in space. There is about a .8 GB of memory being allocated during the calculation. Here is an image of resource usage while the command is being executed in a loop:
However, I tried breaking up the computation into multiple lines. I split the computation at each addition and added on each part in a new command, treating R as a accumulator. The result is that less memory is allocated at one time, but presumably more often. Here is the picture:
I am still curious as to why MATLAB doesn't want to execute this in constant space. I think it perhaps has something to do with the indexing being shifted - I am planning on investigating it more later and then putting this all together in an answer, but someone may beat me to it, which would be great also. Now, though, I can run the array size I was looking for and can finish my project.

I guess that most of the question has already been answered:
Does it operate in constant space?
No as you verified, it does not.
Why doesn't it operate in constant space?
Matlab claims to be fast at vectorized matrix operations, not so much emphasis is placed on memory efficiency.
What to do now?
Here are different options, the first one is preferred if possible, the other two are certainly possible.
Make it fit, for example by upgrading to 64 bit matlab or by not putting other stuf in your workspace
Work on parts of the matrix, so for example cut it in half
Dont use vectorization at all but make a simple for loop
If you don't vectorize, you will have a minimal space solution.

Related

How to 'copy' matrix without creating a temporary matrix in memory that caused memory overflow?

By assigning a matrix into a much bigger allocated memory, matlab somehow will duplicate it while 'copying' it, and if the matrix to be copied is large enough, there will be memory overflow. This is the sample code:
main_mat=zeros(500,500,2000);
n=500;
slice_matrix=zeros(500,500,n);
for k=1:4
parfor i=1:n
slice_matrix(:,:,i)=gather(gpuArray(rand(500,500)));
end
main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurred when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (smaller size), the overflow will not occur, but it will slowed down as allocation is not done before matrix is assigned into it. This will significantly reduce the performance as the range of k increases.
The main issue is that numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM while main_mat = rand(500,500,2000); take a lot, no matter if you use GPU or parfor (in fact, parfor will make you use more RAM). So This is not an unnatural swelling of memory. Following Daniel's link below, it seems that the assignment of zeros only creates pointers to memory, and the physical memory is filled only when you use the matrix for "numbers". This is managed by the operating system. And it is expected for Windows, Mac and Linux, either you do it with Matlab or other languages such as C.
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared memory parallelism (i.e. it doesn't start new threads) but rather distributed memory parallelism (it starts new processes). It is designed to distribute work over a set or worker nodes. And though it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just a sample code and that rand() represents a custom in your MVE. So there are a few hints and tricks for the memory usage in matlab.
There is a snippet from The MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of teh values. This behavior, known as copy-on-write, or lazy-copying, defers the cost of copying large data sets until the code modifies a values. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be futher optimized with (a little) brain power. Here are a few hints regarding memory efficiency
make use of the nativ vectorization of matlab, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that matlab does not have to expand matrices (implicit expanding was changed recently). It might be more efficient to use the bsxfun
use in-place-operations, e.g. x = 2*x+3 rather than x = 2*x+3
...
Be aware that optimum regarding memory usage is not the same as if you would want to reduce computation time. Therefore, you might want to consider reducing the number of workers or refrain from using the parfor-loop. (As parfor cannot use shared memory, there is no copy-on-write feature with using the Parallel Toolbox.
If you want to have a closer look at your memory, what is available and that can be used by Matlab, check out feature('memstats'). What is interesting for you is the Virtual Memory that is
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
or use this command [user,sys] = memory.
Quick side node: Matlab stores matrices consistently in memory. You need to have a large block of free RAM for large matrices. That is also the reason why you want to allocate variables, because changing them dynamically forces Matlab to copy the entire matrix to a larger spot in the RAM every time it outgrows the current spot.
If you really have memory issues, you might just want to dig into the art of data types -- as is required in lower level languages. E.g. you can cut your memory usage in half by using single-precision directly from the start main_mat=zeros(500,500,2000,'single'); -- btw, this also works with rand(...,'single') and more native functions -- although a few of the more sophisticated matlab functions require input of type double, which you can upcast again.
If I understand correctly your main issue is that parfor does not allow to share memory. Think of every parfor worker as almost a separate matlab instance.
There is basically just one workaround for this that I know (that I have never tried), that is 'shared matrix' on Fileexchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions: as others suggested: remove parfor is certainly one solution, get more ram, use tall arrays (that use harddrives when ram runs full, read here), divide operations in smaller chunks, last but not least, consider an alternative other than Matlab.
You may use following code. You actually don't need the slice_matrix
main_mat=zeros(500,500,2000);
n=500;
slice_matrix=zeros(500,500,n);
for k=1:4
parfor i=1:n
main_mat(:,:,1+(k-1)*n + i - 1) = gather(gpuArray(rand(500,500)));
end
%% now you don't need this main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end

Largest Matrix Matlab Linprog can Support

I want to use MATLAB linprog to solve a problem, and I check it by a much smaller, much simpler example.
But I wonder if MATLAB can support my real problem, there may be a 300*300*300*300 matrix...
Maybe I should give the exact problem. There is a directed graph of network nodes, and I want to get the lowest utilization of the edge capacity under some constraints. Let m be the number of edges, and n be the number of nodes. There are mn² variables and nm² constraints. Unfortunately, n may reach 300...
I want to use MATLAB linprog to solve it. As described above, I am afraid MATLAB can not support it...Lastly the matrix must be sparse, can some way simplify it?
First: a 300*300*300*300 array is not called a matrix, but a tensor (or simply array). Therefore you can not use matrix/vector algebra on it, because that is not defined for arrays with dimensionality greater than 2, and you can certainly not use it in linprog without some kind of interpretation step.
Second: if I interpret that 300⁴ to represent the number of elements in the matrix (and not the size), it really depends if MATLAB (or any other software) can support that.
As already answered by ben, if your matrix is full, then the answer is likely to be no. 300^4 doubles would consume almost 65GB of memory, so it's quite unlikely that any software package is going to be capable of handling that all from memory (unless you actually have > 65 GB of RAM). You could use a blockproc-type scheme, where you only load parts of the matrix in memory and leave the rest on harddisk, but that is insanely slow. Moreover, if you have matrices that huge, it's entirely possible you're overlooking some ways in which your problem can be simplified.
If you matrix is sparse (i.e., contains lots of zeros), then maybe. Have a look at MATLAB's sparse command.
So, what exactly is your problem? Where does that enormous matrix come from? Perhaps I or someone else sees a way in which to reduce that matrix to something more manageable.
On my system, with 24GByte RAM installed, running Matlab R2013a, memory gives me:
Maximum possible array: 44031 MB (4.617e+10 bytes) *
Memory available for all arrays: 44031 MB (4.617e+10 bytes) *
Memory used by MATLAB: 1029 MB (1.079e+09 bytes)
Physical Memory (RAM): 24574 MB (2.577e+10 bytes)
* Limited by System Memory (physical + swap file) available.
On a 64-bit version of Matlab, if you have enough RAM, it should be possible to at least create a full matrix as big as the one you suggest, but whether linprog can do anything useful with it in a realistic time is another question entirely.
As well as investigating the use of sparse matrices, you might consider working in single precision: that halves your memory usage for a start.
well you could simply try: X=zeros( 300*300*300*300 )
on my system it gives me a very clear statement:
>> X=zeros( 300*300*300*300 )
Error using zeros
Maximum variable size allowed by the program is exceeded.
since zeros is a build in function, which only fills a array of the given size with zeros you can asume that handling such a array will not be possible
you can also use the memory command
>> memory
Maximum possible array: 21549 MB (2.260e+10 bytes) *
Memory available for all arrays: 21549 MB (2.260e+10 bytes) *
Memory used by MATLAB: 685 MB (7.180e+08 bytes)
Physical Memory (RAM): 12279 MB (1.288e+10 bytes)
* Limited by System Memory (physical + swap file) available.
>> 2.278e+10 /8
%max bytes avail for arrays divided by 8 bytes for double-precision real values
ans =
2.8475e+09
>> 300*300*300*300
ans =
8.1000e+09
which means I dont even have the memory to store such a array.
while this may not answer your question directly it might still give you some insight.

Fast Algorithms for Finding Pairwise Euclidean Distance (Distance Matrix)

I know matlab has a built in pdist function that will calculate pairwise distances. However, my matrix is so large that its 60000 by 300 and matlab runs out of memory.
This question is a follow up on Matlab euclidean pairwise square distance function.
Is there any workaround for this computational inefficiency. I tried manually coding the pairwise distance calculations and it usually takes a full day to run (sometimes 6 to 7 hours).
Any help is greatly appreciated!
Well, I couldn't resist playing around. I created a Matlab mex C file called pdistc that implements pairwise Euclidean distance for single and double precision. On my machine using Matlab R2012b and R2015a it's 20–25% faster than pdist(and the underlying pdistmex helper function) for large inputs (e.g., 60,000-by-300).
As has been pointed out, this problem is fundamentally bounded by memory and you're asking for a lot of it. My mex C code uses minimal memory beyond that needed for the output. In comparing its memory usage to that of pdist, it looks like the two are virtually the same. In other words, pdist is not using lots of extra memory. Your memory problem is likely in the memory used up before calling pdist (can you use clear to remove any large arrays?) or simply because you're trying to solve a big computational problem on tiny hardware.
So, my pdistc function likely won't be able to save you memory overall, but you may be able to use another feature I built in. You can calculate chunks of your overall pairwise distance vector. Something like this:
m = 6e3;
n = 3e2;
X = rand(m,n);
sz = m*(m-1)/2;
for i = 1:m:sz-m
D = pdistc(X', i, i+m); % mex C function, X is transposed relative to pdist
... % Process chunk of pairwise distances
end
This is considerably slower (10 times or so) and this part of my C code is not optimized well, but it will allow much less memory use – assuming that you don't need the entire array at one time. Note that you could do the same thing much more efficiently with pdist (or pdistc) by creating a loop where you passed in subsets of X directly, rather than all of it.
If you have a 64-bit Intel Mac, you won't need to compile as I've included the .mexmaci64 binary, but otherwise you'll need to figure out how to compile the code for your machine. I can't help you with that. It's possible that you may not be able to get it to compile or that there will be compatibility issues that you'll need to solve by editing the code yourself. It's also possible that there are bugs and the code will crash Matlab. Also, note that you may get slightly different outputs relative to pdist with differences between the two in the range of machine epsilon (eps). pdist may or may not do fancy things to avoid overflows for large inputs and other numeric issues, but be aware that my code does not.
Additionally, I created a simple pure Matlab implementation. It is massively slower than the mex code, but still faster than a naïve implementation or the code found in pdist.
All of the files can be found here. The ZIP archive includes all of the files. It's BSD licensed. Feel free to optimize (I tried BLAS calls and OpenMP in the C code to no avail – maybe some pointer magic or GPU/OpenCL could further speed it up). I hope that it can be helpful to you or someone else.
On my system the following is the fastest (Even faster than the C code pdistc by #horchler):
function [ mD ] = CalcDistMtx ( mX )
vSsqX = sum(mX .^ 2);
mD = sqrt(bsxfun(#plus, vSsqX.', vSsqX) - (2 * (mX.' * mX)));
end
You'll need a very well tuned C code to beat this, I think.
Update
Since MATLAB R2016b MATLAB supports implicit broadcasting without the use of bsxfun().
Hence the code can be written:
function [ mD ] = CalcDistMtx ( mX )
vSsqX = sum(mX .^ 2, 1);
mD = sqrt(vSsqX.'+ vSsqX - (2 * (mX.' * mX)));
end
A generalization is given in my Calculate Distance Matrix project.
P. S.
Using MATLAB's pdist for comparison: squareform(pdist(mX.')) is equivalent to CalcDistMtx(mX).
Namely the input should be transposed.
Computers are not infinitely large, or infinitely fast. People think that they have a lot of memory, a fast CPU, so they just create larger and larger problems, and then eventually wonder why their problem runs slowly. The fact is, this is NOT computational inefficiency. It is JUST an overloaded CPU.
As Oli points out in a comment, there are something like 2e9 values to compute, even assuming you only compute the upper or lower half of the distance matrix. (6e4^2/2 is approximately 2e9.) This will require roughly 16 gigabytes of RAM to store, assuming that only ONE copy of the array is created in memory. If your code is sloppy, you might easily double or triple that. As soon as you go into virtual memory, things get much slower.
Wanting a big problem to run fast is not enough. To really help you, we need to know how much RAM is available. Is this a virtual memory issue? Are you using 64 bit MATLAB, on a CPU that can handle all the needed RAM?

Matlab: your opinion about a small memory issue working with matrix

I have a small question regarding MATLAB memory consumption.
My Architecture:
- Linux OpenSuse 12.3 64bit
- 16 GB of RAM
- Matlab 2013a 64 bit
I handle a matrix of double with size: 62 x 11969100 (called y)
When I try the following:
a = bsxfun(#minus,y,-1)
or simply
a = minus(y, -1)
I got a OUT of MEMORY error (in both cases).
I've just computed the ram space allocated for the matrix:
62 x 11969100 x 8 = 5.53 GB
Where am I wrong?!
Thanks a lot!
I'm running on Win64, with 16GB RAM.
Starting with a fresh MATLAB, with only a couple of other inconsequential applications open, my baseline memory usage is about 3.8GB. When I create y, that increases to 9.3GB (9.3-3.8 = 5.5GB, about what you calculate). When I then run a = minus(y, -1), I don't run out of memory, but it goes up to about 14.4GB.
You wouldn't need much extra memory to have been taken away (1.6GB at most) for that to cause an out of memory error.
In addition, when MATLAB stores an array, it requires a contiguous block of memory to do so. If your memory was a little fragmented - perhaps you had a couple of other tiny variables that happened to be stored right in the middle of one of those 5.5GB blocks - you would also get an out of memory error (you can sometimes avoid that issue with the use of pack).
The output of memory on windows platform:
>> memory
Maximum possible array: 2046 MB (2.145e+009 bytes) *
Memory available for all arrays: 3226 MB (3.382e+009 bytes) **
Memory used by MATLAB: 598 MB (6.272e+008 bytes)
Physical Memory (RAM): 3561 MB (3.734e+009 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
The output of computer on linux/mac:
>> [~,maxSize] = computer
maxSize =
2.814749767106550e+14 % Max. number of elements in a single array
with some hacks (found here):
>> java.lang.Runtime.getRuntime.maxMemory
ans =
188416000
>> java.lang.Runtime.getRuntime.totalMemory
ans =
65011712
>> java.lang.Runtime.getRuntime.freeMemory
ans =
57532968
As you can see, aside from memory limitations per variable, there are also limitations on total storage for all variables. This is not different for Windows or Linux.
The important thing to note is that for example on my Windows machine, it is impossible to create two 1.7GB variables, even though I have enough RAM, and neither is limited by maximum variable size.
Since carrying out the minus operation will assign a result of equal size to a new variable (a in your case, or ans when not assigning to anything), there need to be at least two of these humongous things in memory.
My guess is you run into the second limit of total memory space available for all variables.
bsxfun is vectorized for efficiency. Typically vectorized solutions require more than just minimal memory.
You could try using repmat, or if that does not work a simple for loop.
In general I believe the for loop will require the least memory.

2d matrix to a 3d matrix without using a loop

I have a 300x300 matrix. I need to make a 300x300x1024 matrix where each "slice" is the original 300x300 matrix. Is there any way to do this without a loop? I tried the following:
old=G;
for j=2:N;
G(:,:,j)=old;
end
where N is 1024, but I run out of memory.
Know any shorter routes?
use repmat.
B = repmat(A,[m n p...])
produces a multidimensional array B composed of copies of A. The size of B is [size(A,1)*m, size(A,2)*n, size(A,3)*p, ...].
In your case ,
G=repmat(old,[1 1 1024]);
Will yield the result you wanted without the for loop. The memory issue is a completely different subject. A 300x300x1024 double matrix will "cost" you ~740 MB of memory, that's not a lot these days. Check your memory load before you try the repmat and see why you don't have these extra 700 MB. use memory and whos to see what is the available memory and which variables can be cleared.
You are likely running out of memory because you haven't pre-initialized your matrix.
if you do this first,
old = G;
G = zeros(size(old,1), size(old,2), 1024);
and then start the loop from 1 instead of 2, you will probably not run out of memory
Why this works is because you first set aside a block of memory large enough for the entire matrix. If you do not initialize your matrix, matlab first sets aside enough memory for a 300x300x1 matrix. Next when you add the second slice, it moves down the memory, and allocates a new block for a 300x300x2 matrix, and so on, never being able to access the memory allocated for the first matrices.
This occurs often in matlab, so it is important to never grow your matrices within a loop.
Quick answer is no, you will need to loop.
You might be able to do something smart like block-copying your array's memory but you didn't even give us a language to work with.
You will probably want to make sure each entry in your matrix is a minimum size, even at byte matrix size you will require 92mb, if you are storing a 64bit value we're talking nearly a gig. If it's an object your number will leap into the many-gig range in no time. Bit packing may come in handy... but again no idea what your other constraints are.
Edit: I updated your tags.
I'm not sure if I can help much, but doubles are 64bits each so at bare minimum you're talking about 2gb (You're already past impossible if you are on a 32 bit os). This could easily double if each cell involves one or two pointers to different memory locations (I don't know enough about matlab to tell you for sure).
If you're not running on an 8gb 64 bit machine I don't think you have a chance. If you are, allocate all the memory you can to matlab and pray.
Sorry I can't be of more help, maybe someone else knows more tricks.