Not enough RAM for parfor [duplicate] - matlab

This question already has answers here:
Saving time and memory using parfor?
(2 answers)
Closed 6 years ago.
If parfor determines that the computer will not have enough RAM to run the code in parallel, will it automatically serialize it? That seems to be what is happening in my case.
I have two parfor loops that are identical except for the size of the matrices inside them. The first easily reaches 100% CPU and half of my RAM; the second reaches only 12-20% CPU and uses all of my RAM.

I have addressed the same issue in this question here.
In short, since each worker in the Matlab pool is independent of the others, each worker needs its own share of memory.
And no, Matlab does not automatically serialise your for-loop if it goes out of memory. If Matlab throws a proper error (which, as far as I know, it does on Windows PCs) you can use some sort of try-catch statement. The try-catch simply tries to execute the code in the try branch and, if an error occurs, automatically executes the catch block. In your case it would be something like
try
    % parfor version of the loop here
catch
    % standard for-loop fallback here
end
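A minimal concrete sketch of that pattern (doWork and N are placeholders for your own function and loop size):
results = zeros(1, N);
try
    parfor i = 1:N
        results(i) = doWork(i);   % may run out of memory when many workers run at once
    end
catch err
    warning('parfor failed (%s); falling back to a serial loop.', err.message);
    for i = 1:N
        results(i) = doWork(i);
    end
end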

Related

How to 'copy' matrix without creating a temporary matrix in memory that caused memory overflow?

When assigning a matrix into a much larger preallocated array, Matlab somehow duplicates it while 'copying' it, and if the matrix to be copied is large enough, there will be a memory overflow. This is the sample code:
main_mat = zeros(500,500,2000);
n = 500;
slice_matrix = zeros(500,500,n);
for k = 1:4
    parfor i = 1:n
        slice_matrix(:,:,i) = gather(gpuArray(rand(500,500)));
    end
    main_mat(:,:,(k-1)*n+1:k*n) = slice_matrix; % this is where the memory is likely to overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurs when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (a smaller size), the overflow does not occur, but the code slows down because the allocation is not done before the matrix is assigned into it. This significantly reduces performance as the range of k increases.
The main issue is that numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM, while main_mat = rand(500,500,2000); takes a lot, no matter whether you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that assigning zeros only creates pointers to memory, and the physical memory is filled only when you put actual numbers into the matrix. This is managed by the operating system, and it is the expected behaviour on Windows, Mac and Linux alike, whether you do it from Matlab or from another language such as C.
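A rough way to see this on Windows (the memory function is only available there) is to compare the free RAM reported before and after filling a zeros array; the exact numbers below are only illustrative:
[~, sys0] = memory;                        % Windows-only
A = zeros(500, 500, 2000);                 % ~3.7 GiB nominal, but the pages are not committed yet
[~, sys1] = memory;
A(:) = 1;                                  % writing values forces the OS to commit physical pages
[~, sys2] = memory;
fprintf('available RAM: %.2f -> %.2f -> %.2f GiB\n', ...
    sys0.PhysicalMemory.Available/2^30, ...
    sys1.PhysicalMemory.Available/2^30, ...
    sys2.PhysicalMemory.Available/2^30);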
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared-memory parallelism (i.e. it doesn't start new threads) but rather distributed-memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. And although it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within a single node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just sample code and that rand() stands in for a custom function in your MWE. So here are a few hints and tricks regarding memory usage in Matlab.
Here is a snippet from The MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
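A tiny sketch of that behaviour (the sizes are arbitrary):
A = rand(5000);    % about 200 MB of doubles
B = A;             % no data is copied yet; B shares A's memory
B(1) = 0;          % the first write triggers the actual copy (copy-on-write)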
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency (a short sketch follows the list):
make use of the native vectorization of Matlab, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that Matlab does not have to expand matrices implicitly (implicit expansion changed recently); it can be more memory-efficient to use bsxfun explicitly
use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
...
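A small sketch of these hints (X is just a placeholder matrix):
X = rand(1000, 2000);
rowMeans = mean(X, 2);                % native vectorization instead of a loop over rows
Xc = bsxfun(@minus, X, rowMeans);     % explicit expansion instead of repmat(rowMeans, 1, size(X,2))
Xc = 2*Xc + 3;                        % reuse Xc instead of introducing yet another variable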
Be aware that the optimum regarding memory usage is not the same as the optimum for computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor-loop altogether. (Since parfor cannot use shared memory, there is no copy-on-write when using the Parallel Computing Toolbox.)
If you want a closer look at your memory, i.e. what is available and what can be used by Matlab, check out feature('memstats'). What is interesting for you is the Virtual Memory, which is
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
or use the command [user,sys] = memory.
Quick side note: Matlab stores matrices contiguously in memory, so you need a large contiguous block of free RAM for large matrices. That is also why you want to preallocate variables: growing them dynamically forces Matlab to copy the entire matrix to a larger spot in RAM every time it outgrows its current spot.
If you really have memory issues, you might want to dig into the art of data types, as is required in lower-level languages. E.g. you can cut your memory usage in half by using single precision directly from the start: main_mat=zeros(500,500,2000,'single');. Btw, this also works with rand(...,'single') and other native functions, although a few of the more sophisticated Matlab functions require input of type double, to which you can upcast again.
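A quick illustration of the factor of two (with a deliberately small array):
A = rand(1000);        % double precision: 8 bytes per element
B = single(A);         % single precision: 4 bytes per element
whos A B               % the Bytes column shows the halving
C = double(B);         % upcast again for functions that insist on double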
If I understand correctly, your main issue is that parfor does not allow workers to share memory. Think of every parfor worker as almost a separate Matlab instance.
There is basically just one workaround for this that I know of (and have never tried myself): 'shared matrix' on the File Exchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions: as others suggested, removing parfor is certainly one solution; get more RAM; use tall arrays (which use the hard drive when RAM runs full, read here); divide the operation into smaller chunks; and, last but not least, consider an alternative other than Matlab.
You may use the following code. You actually don't need slice_matrix at all:
main_mat = zeros(500,500,2000);
n = 500;
for k = 1:4
    offset = (k-1)*n;   % precomputed so that parfor accepts the sliced index below
    parfor i = 1:n
        main_mat(:,:,offset + i) = gather(gpuArray(rand(500,500)));
    end
    % no slice_matrix and no bulk copy into main_mat needed here any more
end

Matlab: Error using parallel_function: Out of Memory [duplicate]

This question already has answers here:
Saving time and memory using parfor?
(2 answers)
Closed 7 years ago.
I am using Matlab R2011b on Windows 7 64-bit, with a Core i7 CPU and 8 GB RAM. I am running an Approximate Nearest Neighbor algorithm called Locality Sensitive Hashing using matlabpool. Upon starting the Matlab pool, I get the output
Starting matlabpool using the 'local' configuration ... connected to 4 labs.
When the control reaches the for loop, Matlab throws the error
Error using parallel_function (line 598)
Out of memory. Type HELP MEMORY for your options.
Error stack:
remoteParallelFunction.m at 29
Error in Evaluate (line 19)
parfor i=1:query_num
I have no clue how to solve this problem. Please help. Thank you
That is because parfor requires a lot more memory.
All the workers/labs in a parfor loop are independent, so each of them needs its own amount of memory. There is also overhead involved because the pool must distribute data to, and collect results from, the workers.
Try using a regular for or open a pool with 2 workers instead of 4.
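For reference, with the legacy R2011b syntax that could look like this (a sketch, assuming the default 'local' configuration):
matlabpool close force local   % close the existing 4-lab pool, if any
matlabpool('open', 2)          % two labs need roughly half the worker memory of four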
Also, it depends on how optimized your some_function() is: try using as few variables as possible.

aborting the MATLAB code when the RAM is FULL

Is there any MATLAB command that lets us abort the MATLAB code when 90% of the RAM is full due to a huge amount of data?
I am asking because I do not want to restart the computer every time MATLAB gets stuck and the machine hangs.
As far as I know, you can't "automatically" do that, if MATLAB hangs, it hangs.
However, in your code, you can always add somewhere (e.g. inside of a memory heavy iterative function) a memory check.
If you do
maxmem = 2e9;  % about 2 GB of RAM, in bytes
% put this inside the memory-heavy code
mem = memory;
if mem.MemUsedMATLAB > maxmem
    exit;  % or some other thing you may want to do
end
This will exit MATLAB when MATLAB's memory usage goes above about 2 GB (the value is in bytes, so make sure you take that into account when putting in your own value; note also that memory is only available on Windows).
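If you want the check to follow the "90% of RAM" criterion from the question, a possible helper could look like the sketch below (the function name is made up, and memory is only available on Windows):
function checkMemoryBudget()
% Error out (instead of hanging) once MATLAB uses more than 90% of the installed RAM.
[user, sys] = memory;                        % Windows-only
budget = 0.9 * sys.PhysicalMemory.Total;     % 90% of physical RAM, in bytes
if user.MemUsedMATLAB > budget
    error('checkMemoryBudget:overBudget', ...
        'MATLAB uses %.1f GiB, more than 90%% of the %.1f GiB installed.', ...
        user.MemUsedMATLAB/2^30, sys.PhysicalMemory.Total/2^30);
end
end
Calling error rather than exit is often preferable, because it returns you to the prompt with a stack trace instead of closing MATLAB entirely.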
Adding this answer to SO as suggested by @Ander Biguri; the answer is purely based on this link.
Using Matlab try (as an option), you can monitor your memory usage as
tryOptions.watchdog.virtualAddressSpace = 7e9 ;  % 7 GB of memory
tryOptions.watchdog.execTime = 1800 ;            % execution time of 1800 seconds
try tryOptions
    ...
catch  % use the try/catch combo to monitor your memory usage and kill the process if you need to
end
Other useful tools which may help:
T = evalc('feature(''memstats'')');               % capture the memstats report as text
str2mat(regexp(T, '(?<=Use:\s*)\d+', 'match'))    % pull the "In Use" numbers out of the report
feature('memstats') outputs the current stats of your memory; you can add breakpoints in your code (at the beginning of a major operation) to monitor your memory usage and decide whether you want to continue executing.

How to fully use the CPU in Matlab [Improving performance of a repetitive, time-consuming program]

I'm working on an adaptive and fully automatic segmentation algorithm under varying light conditions. The core of this algorithm uses Particle Swarm Optimization (PSO) to tune the fuzzy system, and believe me it's very time consuming: for only 5 particles and 100 iterations I have to wait 2 to 3 hours, and that's just to process one image from my data set of over 100 photos!
I'm using Matlab R2013 with an Intel Core i7-2670QM @ 2.2 GHz // 8.00 GB RAM // 64-bit operating system.
The problem is: when I start the program it uses only 12-16% of my CPU and only one core is working!
I've searched a lot and came across matlabpool, so I added this line to my code:
matlabpool open 8
Now when I start the program the task manager shows 98% CPU usage, but only for a few seconds! After that it goes back to 12-13% CPU usage. :|
Do you have any idea how I can get this code to run faster?
12 percent sounds like Matlab is using only one thread/core, and that one at full load, which is normal.
matlabpool open 8 is not enough; this simply opens workers. You have to use commands like parfor to assign work to them, as in the sketch below.
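A minimal sketch of what that means; numParticles and evaluateParticle are placeholders for your own variables and functions, and the particle evaluations are assumed to be independent:
matlabpool open 4                       % R2013a syntax; use parpool(4) on newer releases
fitness = zeros(1, numParticles);
parfor p = 1:numParticles
    fitness(p) = evaluateParticle(p);   % each worker evaluates one particle
end
matlabpool close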
Further to Daniel's suggestion, ideally to apply PARFOR you'd find a time-consuming FOR loop in your algorithm where the iterations are independent and convert that to PARFOR. Generally, PARFOR works best when applied at the outermost level possible. It's also definitely worth using the MATLAB profiler to help you optimise your serial code before you start adding parallelism.
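For the profiling step, something along these lines is usually enough (mySegmentationStep is a placeholder for one serial run of your algorithm):
profile on
mySegmentationStep();   % run the serial code once under the profiler
profile viewer          % open the report to see where the time actually goes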
With my own simulations I find that I cannot recode them using parfor; the for loops I have are too intertwined to take advantage of multiple cores.
HOWEVER:
You can open a second (and third, and fourth etc) instance of Matlab and tell this additional instance to run another job. Each instance of matlab open will use a different core. So if you have a quadcore, you can have 4 instances open and get 100% efficiency by running code in all 4.
So, I gained efficiency by having multiple instances of matlab open at the same time and running a job. My jobs took 8 to 27 hours at a time, and as one might imagine without liquid cooling I burnt out my cpu fan and had to replace it.
Also, do look into optimizing your Matlab code; I recently optimized mine and now it runs 40% faster.

Determine the Maximum Number of Processors Available for matlabpool (MATLAB Parallel Toolbox)

I'm currently writing some code in MATLAB that uses the parfor loop to speed up some tedious calculations.
My issue is that the code will be run on a remote cluster, and could be run on 4-core, 8-core or 12-core machines (I won't know which one in advance)...
I basically need a code snippet that will allow MATLAB to determine the maximum number of cores that can be used in matlabpool. If we call this variable maxcores, I can then go ahead and use
matlabpool('open',maxcores).
so that I can make sure that I am using all the cores that are available to me.
You can get the number of cores on the machine through feature('numCores'), which is undocumented but seems unlikely to break. (source)
Someone claims there that getNumberOfComputationalThreads also works since R2007a, but it doesn't on my R2012a.
Beyond Dougal's response, I found getenv('NUMBER_OF_PROCESSORS') returns the number of threads on my Windows systems.
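Putting the answers together, the requested snippet might look roughly like this (feature('numCores') is undocumented, so treat it with some caution):
maxcores = feature('numCores');   % number of physical cores on this machine
matlabpool('open', maxcores);     % pre-R2013b syntax; on newer releases use parpool(maxcores)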