MATLAB and clearing the swap space

In debugging mode I stop at some breakpoint and do some matrix manipulation in order to test the program. These manipulations are computationally expensive, so MATLAB uses the swap space on my Linux system. Then, after continuing the program, the swap space is almost full, so MATLAB crashes. Is there a way I could clear the swap while still stopped at the breakpoint? Doing clear all and clear classes only affects RAM, but does not affect the swap.

You can't. Swap isn't special, so just work through this as a regular out-of-memory issue. If you free up memory, you'll indirectly free up the swap that's backing it (or avoid having to use swap to supplement RAM).
Swap space is just an OS-managed backing store for virtual memory. From a normal program's point of view, swap is RAM (just slow RAM), and you don't manage it separately. (Well... you can "wire" pages to prevent them from being swapped out, or use OS APIs to manipulate swap directly, but those are low-level, platform-specific details below the level of malloc, not exposed to you as a Matlab M-code programmer, and not what you want to do here.) If your Matlab program runs out of memory, that means it has used up or fragmented its process's virtual memory, not something in particular about your swap space. (Unless there's a low-level bug somewhere.)
When this happens, you may need to look elsewhere in your Matlab program (e.g. in global variables, figure handle properties, or other levels of the function call stack) to find additional data that hasn't been cleared yet, or just restart the Matlab process to fix memory fragmentation (which can happen if your code fills up the memory with lots of small arrays).
Like @siliconwafer suggests, memory, whos, and feature memstats are good tools for debugging this. And if you're stopped inside the debugger, realize that you can't actually clear everything until you dbquit out of it.
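A minimal sketch of how you might look at where the memory is actually going while stopped in the debugger. Assumptions: you're on Linux, and feature('getpid') (undocumented) returns MATLAB's process id; memory and feature memstats themselves are Windows-only.
whos                                              % per-variable sizes in the current workspace
S = whos; fprintf('workspace total: %.1f MB\n', sum([S.bytes])/2^20);
% memory                                          % Windows-only process/system memory report
% feature memstats                                % Windows-only, undocumented
system('free -m');                                % Linux: overall RAM and swap usage
system(sprintf('grep Vm /proc/%d/status', feature('getpid')));   % this process: VmSize/VmRSS/VmSwap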
Doing large matrix operations inside the debugger is not necessarily a recoverable operation: if you've modified arrays held in local variables in the stack frame(s) you're working on, but there are still copies of them held in other variables or frames, Matlab's copy-on-write mechanism needs to hold on to both copies of the arrays, and you might be out of luck for that run of the program if you hit your RAM limits.
If clear all and clear classes after exiting the debugger are not recovering enough memory for you, that smells like either memory fragmentation or a C-level memory leak (like in a MEX file). In either case, you need to restart Matlab to resolve it. Avoid the use of large cellstr arrays or other arrays-of-small-arrays to reduce fragmentation. And take a good hard look at your C code if you're using any custom MEX functions.
Or you just might not have enough memory to do the operations you're doing.

Related

How to 'copy' matrix without creating a temporary matrix in memory that caused memory overflow?

When assigning a matrix into a much bigger preallocated array, MATLAB somehow duplicates it while 'copying' it, and if the matrix to be copied is large enough, there will be a memory overflow. This is the sample code:
main_mat = zeros(500,500,2000);
n = 500;
slice_matrix = zeros(500,500,n);
for k = 1:4
    parfor i = 1:n
        slice_matrix(:,:,i) = gather(gpuArray(rand(500,500)));
    end
    main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1) = slice_matrix; % This is where the memory will likely overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurred when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (smaller size), the overflow does not occur, but things slow down because allocation is not done before the matrix is assigned into it. This significantly reduces performance as the range of k increases.
The main issue is that numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM, while main_mat = rand(500,500,2000); takes a lot, no matter whether you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that the assignment of zeros only creates pointers to memory, and the physical memory is filled only when you put actual "numbers" into the matrix. This is managed by the operating system, and it is the expected behavior on Windows, Mac and Linux, whether you do it with Matlab or with other languages such as C.
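A minimal sketch illustrating that claim (Linux-only, and feature('getpid') is undocumented): zeros mostly reserves virtual memory, and the pages only become resident once real numbers are written into them.
pid = feature('getpid');                              % MATLAB's process id (undocumented)
system(sprintf('grep VmRSS /proc/%d/status', pid));   % baseline resident memory
A = zeros(500,500,2000);                              % ~4 GB virtual, pages not touched yet
system(sprintf('grep VmRSS /proc/%d/status', pid));   % resident size barely grows
B = rand(500,500,2000);                               % writing numbers touches every page
system(sprintf('grep VmRSS /proc/%d/status', pid));   % now roughly 4 GB more is resident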
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared-memory parallelism (i.e. it doesn't start new threads) but rather distributed-memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. And though it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just sample code and that rand() represents a custom function in your MVE. So here are a few hints and tricks for memory usage in Matlab.
Here is a snippet from The MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy-copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency (see the sketch after this list):
make use of the native vectorization of Matlab, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that Matlab does not have to expand matrices (implicit expansion was introduced recently); it might be more efficient to use bsxfun
use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
...
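A minimal sketch of the three hints above; the matrix X is just an assumed example.
X = rand(1000,500);
rowMeans = mean(X,2);                  % 1) native vectorization instead of a loop over rows
Xc = bsxfun(@minus, X, mean(X,1));     % 2) bsxfun avoids repmat temporaries (on newer releases implicit expansion, X - mean(X,1), does the same)
X = 2*X + 3;                           % 3) reuse the same variable so MATLAB can update it in place where possible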
Be aware that the optimum regarding memory usage is not the same as the optimum for computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor-loop at all. (As parfor cannot use shared memory, there is no copy-on-write feature when using the Parallel Toolbox.)
If you want to have a closer look at your memory, what is available and what can be used by Matlab, check out feature('memstats'). What is interesting for you is the Virtual Memory, described as
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
or use this command [user,sys] = memory.
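A minimal sketch of the two commands above; note that both feature('memstats') and memory are Windows-only in MATLAB.
% feature memstats                      % undocumented report, incl. largest contiguous free block
[user, sys] = memory;
fprintf('largest possible array: %.1f GB\n', user.MaxPossibleArrayBytes/2^30);
fprintf('memory available to MATLAB: %.1f GB\n', user.MemAvailableAllArrays/2^30);
fprintf('free physical memory: %.1f GB\n', sys.PhysicalMemory.Available/2^30);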
Quick side note: Matlab stores matrices contiguously in memory. You need to have a large block of free RAM for large matrices. That is also the reason why you want to preallocate variables: growing them dynamically forces Matlab to copy the entire matrix to a larger spot in RAM every time it outgrows the current spot.
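A minimal sketch of why preallocation matters; the sizes are arbitrary.
n = 1e5;
tic; a = [];
for i = 1:n
    a(end+1) = i;        % growing: MATLAB may repeatedly copy 'a' to a larger block
end
toc
tic; b = zeros(1,n);     % preallocated once as a single contiguous block
for i = 1:n
    b(i) = i;
end
toc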
If you really have memory issues, you might just want to dig into the art of data types -- as is required in lower-level languages. E.g. you can cut your memory usage in half by using single precision directly from the start: main_mat=zeros(500,500,2000,'single'); -- btw, this also works with rand(...,'single') and other native functions -- although a few of the more sophisticated Matlab functions require input of type double, which you can upcast again.
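A minimal sketch of the single-precision suggestion, with the sizes from the question.
A = zeros(500,500,2000);              % double: 8 bytes per element, ~4.0 GB
B = zeros(500,500,2000,'single');     % single: 4 bytes per element, ~2.0 GB
whos A B                              % compare the Bytes column
C = double(B(:,:,1));                 % upcast a slice when a function insists on double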
If I understand correctly, your main issue is that parfor does not allow sharing memory. Think of every parfor worker as almost a separate Matlab instance.
There is basically just one workaround for this that I know of (and that I have never tried), which is 'shared matrix' on FileExchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions, as others suggested: removing parfor is certainly one solution; get more RAM; use tall arrays (they use hard drives when RAM runs full, read here); divide operations into smaller chunks; and last but not least, consider an alternative to Matlab.
You may use the following code. You actually don't need the slice_matrix at all if you compute the slice offset before the parfor:
main_mat = zeros(500,500,2000);
n = 500;
for k = 1:4
    offset = (k-1)*n;   % compute the slice offset outside the parfor so main_mat can stay a sliced variable
    parfor i = 1:n
        main_mat(:,:,offset + i) = gather(gpuArray(rand(500,500)));
    end
    % no separate slice_matrix and no bulk copy into main_mat any more
end

load 160MB data, Matlab memory usage jumps 0.5GB to 1.3GB

When I clear all, the Matlab memory usage drops to 0.5GB. I then load a *.mat file in which the memory requirement is dominated by an object requiring 161MB (from whos). The Matlab memory usage jumps to 1.3GB. I then run a script that processes the data, creating small data objects/structures in the process. From whos, nothing rivals the 161MB object in terms of memory footprint. However, the Matlab's memory footprint creeps up to 2.2GB during processing, then settles down to 1.6GB when done. whos still reveals the overwhelmingly dominant memory user to be the loaded object.
Why does Matlab use so much more memory than the data it is processing? It's about 1000 times more. Is this just to give it space for intermediate results?
I'm using Windows 7, 64-bit. My code is a pretty simple post-processing script to tally up some of the loaded data. It invokes no user-defined functions or 3rd-party tools. I understand that readers can't analyze my code to track down the specific causes, but is a 1000x memory footprint typical? What are typical reasons for this?

Why doesn't Matlab use swap, but instead errors with "Out of memory"?

I was wondering why Matlab doesn't use swap, but instead throws the error "Out of memory"?
Shouldn't Matlab just slow down instead of throwing an "Out of memory"?
Is this Java related?
added:
I know "out of memory" means it's out of contiguous memory. Doesn't swap have contiguous memory, or? I'm confused...
It is not about MATLAB. What happens when you try to allocate more memory than exists in your hardware is OS-specific behavior.
On Linux, by default the OS will 'optimistically' allocate almost anything you want, i.e. swap space is also counted as allocatable memory. You will get what you want - no OOM error, but slow computations with swap-allocated data. This 'feature' is called overcommit. You can change this behavior by modifying the overcommit settings in Linux (have a look e.g. here for a brief summary).
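A minimal sketch for inspecting the Linux overcommit policy from within MATLAB (changing it requires root and is done outside MATLAB):
system('cat /proc/sys/vm/overcommit_memory');   % 0 = heuristic (default), 1 = always overcommit, 2 = strict accounting
system('cat /proc/sys/vm/overcommit_ratio');    % only used in mode 2
% in a root shell, e.g.:  sysctl vm.overcommit_memory=1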
Using overcommit is probably not the best idea for solving larger problems in MATLAB, since the entire OS starts to work really slowly. It can definitely not be compared to optimized 'out-of-core' implementations that consciously use the hard disk in computations.
This is how it is on Linux. I do not know how to change the memory allocation behavior on Windows, but I doubt you really want to do that. You need more RAM.
And you do confuse things - swap has nothing to do with contiguous memory. Memory allocated by the OS is 'virtual memory', whose address range is contiguous regardless of whether the underlying physical pages are contiguous.
Edit: For transparent 'out-of-core' operations on large matrices using disk space as extra memory, you might want to have a look at the VVAR FileExchange project. This class pretends to be a usual MATLAB class, but it operates on an underlying HDD file. Note that the usual array size limitations of MATLAB still apply.

If only segmentation is enabled

Beginner's question:
If paging is disabled and only segmentation is enabled (CR0.PE is set), does that mean that when a program is loaded into memory (RAM), its whole binary image is loaded and no part of it is swapped out, because a program is broken into fixed-size chunks (which can then be swapped out) only when paging is enabled? And if that's true, will this reduce the number of processes that can run in a RAM of a particular size, say 2 GB?
Likely, but not necessarily.
It depends on the operating system...
You could write an operating system that uses a segment to map a part of the program into memory. When the program accesses memory outside the segment, you get a segmentation fault. As the segmentation fault is then passed to the operating system, it could swap in some data from disk and modify segmentation information, before returning control to the program.
However, this is probably difficult and expensive to do, and I do not know of any operating system that acts in this way.
As to the number of processes - you need to split the available memory into contiguous parts, one for each process. This is easy if processes do not grow; if they do, you need padding and may need to copy processes around, which is rather expensive...

Is it possible to build a Large Memory System for matlab?

I am suffering from the out-of-memory problem in MATLAB.
Is it possible to build a large-memory system for MATLAB (e.g. 64GB RAM)?
If yes, what do I need?
@Itamar gives good advice about how MATLAB requires contiguous memory to store arrays, and about good practices in memory management such as chunking your data. In particular, the technical note on memory management that he links to is a great resource. However much memory your machine has, these are always sensible things to do.
Nevertheless, there are many applications of MATLAB that will never be solved by these tips, as the datasets are just too large; and it is also clearly true that having a machine with much more RAM can address these issues.
(By the way, it's also sometimes the case that it's cheaper to just buy a new machine with more RAM than it is to pay the MATLAB developer to make all the memory optimizations they could - but that's for you to decide).
It's not difficult to access large amounts of memory with MATLAB. If you have a Windows or Linux machine with 64GB (or more) - it will obviously need to be running a 64-bit OS - MATLAB will be able to access it. I've come across plenty of MATLAB users who are doing this. If you know what you're doing you can build your own machine, or nowadays you can just buy a machine of that size off the shelf from Dell.
Another option (depending on your application) would be to look into getting a small cluster, and using Parallel Computing Toolbox together with MATLAB Distributed Computing Server.
When you try to allocate an array in Matlab, Matlab must have enough contiguous memory for the size of the array, and if not enough contiguous memory is available, you will get an out-of-memory error, no matter how much RAM you have on your computer.
From my experience, the solution will not come from dealing directly with memory-related properties of your hardware, but from writing your code in a way that prevents allocation of too-large arrays (cutting data into chunks, etc. - see the sketch below). If you can describe your code and the task you try to solve, it might be possible to guide you in that direction.
You can read more here: http://www.mathworks.com/support/tech-notes/1100/1106.html
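A minimal sketch of the chunking idea mentioned above: process a large matrix slice by slice so no single in-memory array needs a huge contiguous block. The file name, the variable name X, and the chunk size are assumptions; partial reads like this need a -v7.3 MAT-file.
m = matfile('bigdata.mat');              % assumed MAT-file containing a large matrix X
[nRows, ~] = size(m, 'X');
chunkSize = 1e5;
total = 0;
for start = 1:chunkSize:nRows
    stop = min(start + chunkSize - 1, nRows);
    block = m.X(start:stop, :);          % only one chunk is in memory at a time
    total = total + sum(block(:));
end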