'Out of memory' in Matlab. A slow but a permanent solution? - matlab

I am wondering whether my idea for working around the 'Out of Memory' problem is feasible. Here it is:
The idea is to save huge matrices (say BIG = rand(10^6)) to the hard drive as a .mat (-v7.3) file whenever they cannot be kept in memory, and to load the required parts transparently whenever they are needed. Then, when you use the matrix like this:
a = BIG(3678,2222);
s = size(BIG);
MATLAB would seamlessly do this behind the scenes:
m = matfile('BIG.mat');
a = m.BIG(3678,2222);
s = size(m,'BIG');
I know that speed is important, but suppose that I have enough time and not enough memory. I also know it is better to write a memory-efficient program, but again suppose that I need to use someone else's function which can't be optimized. I have some related questions: Can this be implemented using objects? Or does it require an infrastructural change in MATLAB?
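A minimal sketch of the matfile access pattern described above. The 1000-by-1000 size, the temporary file name and the written value are illustrative assumptions, not part of the question; a truly huge matrix would be handled the same way, only more slowly:

```matlab
% Illustrative stand-in: a 1000x1000 matrix instead of a truly huge one.
fname = [tempname '.mat'];        % hypothetical temporary file name

% A file that is newly created through matfile is written in v7.3 format,
% which is the format that supports partial reads and writes.
m = matfile(fname, 'Writable', true);
m.BIG(1000, 1000) = 0;            % create the variable on disk

m.BIG(367, 222) = 42;             % write a single element to disk
a = m.BIG(367, 222);              % read back only that element
s = size(m, 'BIG');               % query the size without loading BIG

clear m
delete(fname);                    % remove the temporary file
```

Only the indexed part of BIG is transferred between disk and memory, which is the behaviour the question wants to hide behind ordinary indexing syntax.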

It seems to me that this is certainly possible, as it is essentially what many operating systems do in the form of paging.
Moreover, something similar is provided by MATLAB Distributed Computing Server. This allows you (among other things) to store the data for a single matrix on multiple machines, and access it seamlessly in the way that you propose.
IMHO, allowing for data to be paged to file/swap should be a setting in MATLAB. Unfortunately, that is not how MATLAB's memory model works, and I suspect it is very difficult to implement this on their side. Plus, when this setting is enabled, users will not be protected anymore against making silly mistakes like zeros(1e7) instead of zeros(1e7,1); it will simply seem to hang the system, as MATLAB is busy filling your entire drive with zeros.
Anyway, I think it is possible using MATLAB classes. But I wouldn't recommend it. Note that implementing a proper subsref and subsasgn is *ahum* challenging, and it is likely that you'll have to re-implement many algorithms (like mldivide). This will most likely mean you'll lose a great deal of performance; think factors in the thousands.
Here's an interesting random relevant paper I found while googling around a bit.

A possible solution to your problem is memory-mapped I/O (which MATLAB supports).
With memory mapping, a file is mapped into the address space of the process, and all reads and writes to those addresses are actually reads and writes to the file.
The mapping itself only reserves address space; it does not consume physical memory for the whole file.
However, I would only advise it with 64-bit MATLAB, since with 32-bit MATLAB the address space is simply not large enough to hold the data in RAM, the code for MATLAB and its DLLs, and the memory-mapped region.
Check out the examples for the documentation page of memmapfile(), e.g.,
m = memmapfile('records.dat', ...
'Offset', 1024, ...
'Format', {'uint32' [4 10 18] 'x'});
A = m.Data(1).x;
whos A
Name Size Bytes Class
A 4x10x18 2880 uint32 array
Note that accesses to m.Data(1).x are redirected to file I/O, so the file as a whole is never loaded into memory; only the part you copy out (such as A above) occupies memory. This provides efficient random access to parts of possibly very large data files residing on disk. Also note that more complex data structures, as in the example, can be realized.
It seems to me that this provides the "implementation with objects" you had in mind.
Unfortunately, memmapfile cannot map MAT-files directly, which would be really useful. This is probably difficult because of the compression.
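As a hedged sketch of how this could stand in for a large matrix: the code below writes a raw binary file of doubles and maps it as a single matrix field. The file name, the 200-by-300 size and the field name x are assumptions chosen for illustration only.

```matlab
% Create a small raw binary file of doubles (illustrative size and name).
fname = [tempname '.dat'];
rows = 200;
cols = 300;
fid = fopen(fname, 'w');
fwrite(fid, zeros(rows, cols), 'double');   % written in column-major order
fclose(fid);

% Map the file as one rows-by-cols double matrix called x.
m = memmapfile(fname, ...
    'Format', {'double', [rows cols], 'x'}, ...
    'Writable', true);

m.Data.x(17, 42) = pi;        % this write goes to the mapped file
a = m.Data.x(17, 42);         % read a single element
s = size(m.Data.x);           % size of the mapped matrix

clear m                       % release the mapping before deleting the file
delete(fname);
```

For a file that really exceeds RAM, the initial file would have to be written in pieces rather than with a single fwrite call.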

Write a function:
function a=BIG(x,y)
m = matfile('BIG.mat');
a = m.BIG(x,y);
end
Every time you write BIG(a,b) the function is called.

Related

How to 'copy' matrix without creating a temporary matrix in memory that caused memory overflow?

When a matrix is assigned into a much bigger preallocated matrix, MATLAB somehow duplicates it while 'copying' it, and if the matrix to be copied is large enough, memory overflows. This is the sample code:
main_mat=zeros(500,500,2000);
n=500;
slice_matrix=zeros(500,500,n);
for k=1:4
parfor i=1:n
slice_matrix(:,:,i)=gather(gpuArray(rand(500,500)));
end
main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurs when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (a smaller size), the overflow does not occur, but the code slows down because the memory is not allocated before the matrix is assigned into it. This significantly reduces performance as the range of k increases.
The main issue is that numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM while main_mat = rand(500,500,2000); takes a lot, no matter whether you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that the assignment of zeros only creates pointers to memory, and the physical memory is filled only when you use the matrix for "numbers". This is managed by the operating system, and it is expected on Windows, Mac and Linux, whether you do it with MATLAB or with another language such as C.
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared-memory parallelism (i.e. it doesn't start new threads) but rather distributed-memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. Although it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just sample code and that rand() stands in for a custom function in your MVE. So here are a few hints and tricks regarding memory usage in MATLAB.
There is a snippet from The MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy-copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency:
make use of the native vectorization of MATLAB, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that MATLAB does not have to expand matrices (implicit expansion was introduced recently). It might be more efficient to use bsxfun
use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
...
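A small sketch of the hints above. The matrix size and variable names are assumptions made for illustration, and whether MATLAB truly reuses the buffer for the last statement depends on context, so treat that line as a habit rather than a guarantee:

```matlab
X = rand(4, 1000);                 % illustrative data: 4 rows, 1000 samples

% Vectorized row-wise statistics, no explicit loops.
rowSum  = sum(X, 2);
rowMean = mean(X, 2);
rowStd  = std(X, [], 2);

% Subtract each row's mean without replicating rowMean to the size of X.
Xc = bsxfun(@minus, X, rowMean);

% Overwrite X instead of creating a second full-size variable.
X = 2*X + 3;
```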
Be aware that optimizing for memory usage is not the same as optimizing for computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor loop. (As parfor cannot use shared memory, there is no copy-on-write feature when using the Parallel Toolbox.)
If you want to have a closer look at your memory, i.e. what is available and what can be used by MATLAB, check out feature('memstats'). What is interesting for you is the Virtual Memory, which is described as:
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
or use this command [user,sys] = memory.
Quick side note: MATLAB stores matrices contiguously in memory, so you need a large block of free RAM for large matrices. That is also the reason why you want to preallocate variables, because growing them dynamically forces MATLAB to copy the entire matrix to a larger spot in RAM every time it outgrows the current one.
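To illustrate the preallocation point, here is a small sketch with an assumed vector length of 1e4. Both loops produce the same result, but the first one may force repeated reallocation and copying as the vector grows:

```matlab
n = 1e4;                      % illustrative length

% Growing the vector: MATLAB may have to move it to a larger block repeatedly.
grown = [];
for i = 1:n
    grown(end+1) = i^2;       %#ok<SAGROW> deliberately not preallocated
end

% Preallocating: the block is reserved once and filled in place.
pre = zeros(1, n);
for i = 1:n
    pre(i) = i^2;
end
```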
If you really have memory issues, you might just want to dig into the art of data types -- as is required in lower level languages. E.g. you can cut your memory usage in half by using single-precision directly from the start main_mat=zeros(500,500,2000,'single'); -- btw, this also works with rand(...,'single') and more native functions -- although a few of the more sophisticated matlab functions require input of type double, which you can upcast again.
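A quick way to check the saving, sketched with a reduced third dimension (20 instead of 2000, an assumption to keep the example light) and the whos function:

```matlab
D = zeros(500, 500, 20);              % double precision, 8 bytes per element
S = zeros(500, 500, 20, 'single');    % single precision, 4 bytes per element

infoD = whos('D');
infoS = whos('S');
fprintf('double: %d bytes, single: %d bytes\n', infoD.bytes, infoS.bytes);

% Upcast only where a function insists on double input.
Sd = double(S(:, :, 1));
```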
If I understand correctly, your main issue is that parfor does not allow memory to be shared. Think of every parfor worker as almost a separate MATLAB instance.
There is basically just one workaround for this that I know (that I have never tried), that is 'shared matrix' on Fileexchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions: as others suggested, removing parfor is certainly one solution. You could also get more RAM, use tall arrays (which use hard drives when RAM runs full, read here), divide the operations into smaller chunks or, last but not least, consider an alternative to MATLAB.
You can use the following code; you don't actually need slice_matrix:
main_mat=zeros(500,500,2000);
n=500;
for k=1:4
offset=(k-1)*n; % loop-invariant offset, so main_mat is indexed as offset+i inside parfor
parfor i=1:n
main_mat(:,:,offset+i) = gather(gpuArray(rand(500,500)));
end
% the copy main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; is no longer needed
end

Optimizing compression using HDF5/H5 in Matlab

Using Matlab, I am going to generate several data files and store them in H5 format as 20x1500xN, where N is an integer that can vary, but typically around 2300. Each file will have 4 different data sets with equal structure. Thus, I will quickly achieve a storage problem. My two questions:
Is there any reason not to split the 4 different data sets, and just save them as 4x20x1500xN instead? I would prefer having them split, since they are different signal modalities, but if there is any computational or compression advantage to not separating them, I will join them.
Using MATLAB's built-in compression, I set deflate=9 (and DataType=single). However, I have now realized that using deflate multiplies my computation time by 5. I realize this could have something to do with my ChunkSize, which I just set to 20x1500x5 without any reasoning behind it. Is there a strategic way to optimize the computational load with respect to deflation and compression time?
Thank you.
1- Splitting or merging? It won't make a difference in the compression procedure, since it is performed in blocks.
2- Your choice of chunk shape does indeed seem bad. ChunkSize determines the shape and size of each block that will be compressed independently. The problem is that each chunk is 600 kB (20 x 1500 x 5 single-precision values at 4 bytes each), which is much larger than the L2 cache, so your CPU is likely twiddling its thumbs, waiting for data to come in. Depending on the nature of your data and the usage pattern you will use the most (reading the whole array at once, random reads, sequential reads...) you may want to target the L1 or L2 sizes, or something in between. Here are some experiments done with a Python library that may serve you as a guide.
Once you have selected your chunk size (how many bytes your compression blocks will have), you have to choose a chunk shape. I'd recommend the shape that most closely fits your reading pattern if you are doing partial reads, or filling in fastest-axis-first if you want to read the whole array at once. In your case, this will be something like 1x1500x10, I think (second axis being the fastest, last one the second fastest, and first the slowest; change this if I am mistaken).
Lastly, keep in mind that the details are quite dependent on the specific machine you run it on: the CPU, the quality and load of the hard drive or SSD, speed of RAM... so the fine tuning will always require some experimentation.
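A hedged sketch of how such a chunk shape could be tried out with MATLAB's high-level HDF5 functions. The dataset name /signal, the reduced N = 40 and the deflate level 4 are assumptions for illustration; the chunk shape 1x1500x10 is only the starting point suggested above, not a tuned value:

```matlab
fname = [tempname '.h5'];          % hypothetical temporary file name
dims  = [20 1500 40];              % small N = 40 instead of ~2300
chunk = [1 1500 10];               % candidate chunk shape to experiment with
chunkBytes = prod(chunk) * 4;      % single precision: 4 bytes per element

% Create a chunked, compressed single-precision dataset.
h5create(fname, '/signal', dims, ...
    'Datatype', 'single', ...
    'ChunkSize', chunk, ...
    'Deflate', 4);

data = single(rand(dims));
tic;
h5write(fname, '/signal', data);
writeTime = toc;                   % compare this across chunk shapes and levels

% Partial read: slices 11 to 20 along the third dimension.
part = h5read(fname, '/signal', [1 1 11], [20 1500 10]);

info = h5info(fname, '/signal');   % info.ChunkSize reports the stored chunk shape
delete(fname);
```

Repeating the timing for a few chunk shapes and deflate levels on the target machine is the experimentation the answer refers to.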

matlab cell array size much larger than actual data

I recently discovered a strange behaviour of MATLAB cell arrays that was not happening before.
If I create a cell array with
a=cell(1,4)
its size is 32 bytes.
If then I put something inside, e.g.
a{2}='abcd'
its size becomes 144 bytes. But if I remove this content by putting
a{2}=[]
the size becomes 132 bytes and so on. What is the problem?
Simply put, the Matlab cell array needs some internal data structures to keep track of what is stored within.
As it seems, Matlab allocates memory as needed, and thus extends the storage needed by the cell array as you insert data.
Removing the data doesn't mean that MATLAB can return the now unused memory to the OS or to an internal memory pool. That might be impossible with the internal storage structure, or it might be unwise with respect to performance: cell arrays from which data is removed are, across most use cases, structures that get updated often, so "prematurely" returning memory just to acquire it again a few instructions later would be pretty CPU-intensive.
As a general note: Matlab has pretty terrible storage approaches for nearly everything but matrices and sparse matrices (vectors of course being special cases of matrices). That's because it's not Matlab's job to be e.g. a string parser etc.
If memory becomes a problem, it might be worth considering implementing the math core of your problem in Matlab and doing the rest in other, more generally usable programming languages and somehow interfacing your Matlab code with that -- I haven't tried that myself, but Mathworks has a Matlab engine for python, and I'd take writing python for things like storing arbitrary data over using Matlab every day; with that engine, you can call Matlab to do your dirty math work, and use python to do your everyday scripting/programming work.
Notice that my bottom line here is that Matlab has great Math routines and impressive documentation, but if you want to actually develop software, using a general purpose tool/language is much more likely to be satisfying quickly.
I'd even go as far as saying that it's probably worth your time to learn python, just to be able to circumvent having to deal with things that Matlab wasn't designed for (and cell arrays are a prime example of what Matlab is really complicated about and what's extremely easy in python).
You use
a{2}=[]
to 'kill' the data in that field. In reality you are still storing data: you leave a non-empty cell entry that holds an empty double array. (Thanks to MATLAB for displaying empty cells as empty doubles...)
but if you use (no curly braces, but parentheses):
a(2) = cell(1,1)
then the cell array size is back to "empty" = 32 bytes.
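The difference can be observed with whos. This is a sketch only: the exact byte counts depend on the MATLAB version and platform, so the code prints them rather than assuming the numbers quoted in the question.

```matlab
a = cell(1, 4);
w = whos('a');  b0 = w.bytes;      % freshly created cell array

a{2} = 'abcd';
w = whos('a');  b1 = w.bytes;      % one cell now holds a char array

a{2} = [];
w = whos('a');  b2 = w.bytes;      % cell holds an empty double array

a(2) = cell(1, 1);
w = whos('a');  b3 = w.bytes;      % cell replaced by a truly empty cell

fprintf('new: %d, filled: %d, a{2}=[]: %d, a(2)=cell(1,1): %d bytes\n', ...
    b0, b1, b2, b3);
```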

Multiplication of matrices when cannot loaded into memory at once

I've read some similar posts, but none of them actually tackled my problem.
I need to do a series of multiplication-like operations on A and B, specifically calculating kernel matrices, on the Windows platform. The problem is that both A and B can be really large, say 20000-by-360000, while my server only provides 96 GB of memory. It seems infeasible to hold both in memory at the same time and do the calculation. So is there a good way to handle such a large multiplication efficiently? By the way, the result, which is 20000-by-20000, is much smaller than the operands and fits in memory without problems.
Because I do this on Windows, it may not be feasible to call functions like mmap2.
I wonder whether converting them into sparse matrices is a good option. However, that may depend heavily on the properties of the data.
Another solution I've come up with is to partition the original matrices into blocks and then do the calculation block by block.
Is there any better solution? Any practical suggestions would be really appreciated.
Best regards,
Peiyun
If I were you, I'd look into the block processing function:
B = blockproc(filename,[M N],fun)
and use the Destination parameter to allow saving the results without overflowing your memory.
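The block-by-block idea from the question can also be sketched without blockproc, using matfile to read column blocks from disk. Everything here is an assumption for illustration: the small 50-by-1200 sizes, the block width of 256, the variable names, and the choice of the product A*B.' as the "kernel" (the question does not say which kernel is used):

```matlab
% Small stand-ins for the 20000-by-360000 matrices, stored in a v7.3 file.
fname = [tempname '.mat'];
nRows = 50;
nCols = 1200;
blk   = 256;                         % columns per block; tune to available RAM
A = rand(nRows, nCols);
B = rand(nRows, nCols);
save(fname, 'A', 'B', '-v7.3');

m = matfile(fname);                  % read-only access to the file
K = zeros(nRows, nRows);             % the result fits in memory

% A*B.' equals the sum over column blocks of A(:,blk) * B(:,blk).'
for c0 = 1:blk:nCols
    cols = c0:min(c0 + blk - 1, nCols);
    Ablk = m.A(:, cols);             % only this block is loaded from disk
    Bblk = m.B(:, cols);
    K = K + Ablk * Bblk.';
end

delete(fname);
```

Only two column blocks and the result are in memory at any time, so the block width trades memory for the number of disk reads.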

CUDA: Reasons for using preprocessing variables to specify the problem size

I'm coding CUDA in MATLAB MEX files. When you look at CUDA examples on the internet or even manuals from NVIDIA, you often see the use of preprocessor variables to specify the problem size, e.g. the vector length for a vector addition. I coded my program like this as well, with preprocessor variables specifying the problem size. And I have to admit that I like it, since you can access them everywhere in your code, e.g. as loop limits, without having to pass them explicitly as function arguments.
But I ran into the following problem: I wanted to benchmark the program for several different problem sizes, so I need to recompile the code every time, passing the preprocessor variable to the compiler. That is not a problem; I have already coded the benchmark and it works. But now I wonder why I chose this version and did not simply specify the size through user input at runtime. So I'm looking for reasons one might want to use preprocessor variables instead of simply passing the problem size to the program.
Thanks!
When you compile-in problem-size constants in the kernel, then the compiler can make certain classes of optimizations that it can't if the sizes are only known at runtime. Full loop unrolling is an obvious example.
In other cases, for instance shared memory array sizes, it is a lot clearer if the sizes are compiled-in; otherwise you have to pass in the total shared memory size at kernel launch time and break that memory up into the number of shared arrays you need. That works fine, but the code is much clearer if you can just have static declarations, for which you need the compile-time sizes.
The main reason is that in general the problem size will be intimately linked to the GPU architecture, e.g. number of threads per block, number of blocks, amount of shared memory per thread, number of registers per thread, etc. In general these numbers are all carefully hand tuned to get the maximum usage of available resources and you can't easily change the problem size dynamically while still maintaining optimum performance.