Using HDD memory for the MATLAB - matlab

As in my previous question, I have the following problem. I have an n-by-n cell array P whose elements P{i,j} are themselves n-by-n matrices, so the total number of elements is n^4. For n=100 I get an out-of-memory error. I calculate this array only once and then operate on it. Could you advise me how to store the matrices P{i,j} on the HDD?
I mean, maybe it is possible to store each of them in a file like "data_i_j.dat" and then load it while doing computations in a loop over i and j?

The save function writes data to a file, and the load function reads it back. Call save(filename,varname,varname,...) with the variable names given as strings, then S = load(filename) and refer to S.varname. (There is also a form of load that just dumps everything into your current workspace, but that seems like poor practice.)
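A minimal sketch of the one-file-per-block idea from the question, under some assumptions: the scratch folder outDir, the variable name Pij, and the use of rand(n) as a stand-in for the real computation are all hypothetical, and n is kept small for illustration.

```matlab
n = 4;                                   % small size for illustration; the question uses n = 100
outDir = tempname;                       % hypothetical scratch folder on disk
mkdir(outDir);

% Pass 1: compute each block once and write it straight to disk.
for i = 1:n
    for j = 1:n
        Pij = rand(n);                   % placeholder for the real computation of P{i,j}
        save(fullfile(outDir, sprintf('data_%d_%d.mat', i, j)), 'Pij');
    end
end

% Pass 2: load one block at a time, so only a single n-by-n matrix is in RAM.
total = 0;
for i = 1:n
    for j = 1:n
        S = load(fullfile(outDir, sprintf('data_%d_%d.mat', i, j)));
        total = total + sum(S.Pij(:));   % placeholder for the real work on P{i,j}
    end
end
```

Loading into the struct S keeps the workspace clean, at the cost of one file read per block on every pass.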

Related

How to 'copy' matrix without creating a temporary matrix in memory that caused memory overflow?

When a matrix is assigned into a slice of a much larger preallocated array, MATLAB somehow duplicates it while 'copying' it, and if the matrix to be copied is large enough, memory overflows. This is the sample code:
main_mat=zeros(500,500,2000);
n=500;
slice_matrix=zeros(500,500,n);
for k=1:4
    parfor i=1:n
        slice_matrix(:,:,i)=gather(gpuArray(rand(500,500)));
    end
    main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurs when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (a smaller size), the overflow does not occur, but the code slows down because the array has to grow each time a matrix is assigned into it. This significantly reduces performance as the range of k increases.
The main issue is that numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM while main_mat = rand(500,500,2000); takes a lot (500*500*2000 doubles at 8 bytes each is 4 GB), whether or not you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that the assignment of zeros only creates pointers to memory, and the physical memory is filled only when you use the matrix for "numbers". This is managed by the operating system, and it is expected on Windows, Mac and Linux, whether you do it with MATLAB or another language such as C.
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared-memory parallelism (i.e. it doesn't start new threads) but rather distributed-memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. Although it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just sample code and that rand() stands in for a custom function in your MVE. Here are a few hints and tricks regarding memory usage in MATLAB.
Here is a snippet from The MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy-copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency:
make use of the native vectorization of MATLAB, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that MATLAB does not have to expand matrices (implicit expansion was changed recently). It might be more efficient to use bsxfun
use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
...
Be aware that the optimum for memory usage is not the same as the optimum for computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor loop. (As parfor cannot use shared memory, there is no copy-on-write benefit when using the Parallel Computing Toolbox.)
If you want to have a closer look at your memory, what is available and what can be used by MATLAB, check out feature('memstats'). What is interesting for you is the Virtual Memory, that is:
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
Alternatively, use the command [user,sys] = memory (available on Windows only).
Quick side note: MATLAB stores matrices contiguously in memory, so you need a large block of free RAM for large matrices. That is also the reason why you want to preallocate variables: growing them dynamically forces MATLAB to copy the entire matrix to a larger spot in RAM every time it outgrows the current spot.
If you really have memory issues, you might want to dig into the art of data types, as is required in lower-level languages. For example, you can cut your memory usage in half by using single precision from the start: main_mat=zeros(500,500,2000,'single');. This also works with rand(...,'single') and other native functions, although a few of the more sophisticated MATLAB functions require input of type double, which you can upcast to again.
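To illustrate the single-precision point, here is a small sketch comparing the storage of the same array shape in double and single. The dimensions are scaled down from the thread's 500x500x2000, and the variable names are only for illustration.

```matlab
sz = [50 50 20];                    % small stand-in for [500 500 2000]
A = zeros(sz);                      % double: 8 bytes per element
B = zeros(sz, 'single');            % single: 4 bytes per element

infoA = whos('A');
infoB = whos('B');
ratio = infoA.bytes / infoB.bytes;  % memory ratio between the two arrays

C = double(B);                      % upcast when a function insists on double input
```

The upcast creates a full double copy, so it is best applied to one slice at a time rather than to the whole array.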
If I understand correctly, your main issue is that parfor does not allow workers to share memory. Think of every parfor worker as almost a separate MATLAB instance.
There is basically just one workaround for this that I know of (and I have never tried it): 'sharedmatrix' on the File Exchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions, as others suggested: removing parfor is certainly one; you can also get more RAM, use tall arrays (which use the hard drive when RAM runs full), divide operations into smaller chunks, or, last but not least, consider an alternative to MATLAB.
You may use the following code. You actually don't need slice_matrix at all:
main_mat=zeros(500,500,2000);
n=500;
for k=1:4
    offset=(k-1)*n; % loop-invariant, so the parfor index below has the simple form offset+i
    parfor i=1:n
        main_mat(:,:,offset+i) = gather(gpuArray(rand(500,500)));
    end
    %% now you don't need this: main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end

Efficient way to store single matrices generated in a loop in Matlab?

I would like to know whether there is a way to reduce the amount of memory used by the following piece of code in Matlab:
n=3;
T=100;
r=T*2;
b=80;
BS=1000
bsuppostmp_=cell(1,BS);
bslowpostmp_=cell(1,BS);
bsuppnegtmp_=cell(1,BS);
bslownegtmp_=cell(1,BS);
for w=1:BS
bsuppostmp_{w}= randi([0,1],n*T,2^(n-1),r,b);
bslowpostmp_{w}=randi([0,3],n*T,2^(n-1),r,b);
bsuppnegtmp_{w}=randi([0,4],n*T,2^(n-1),r,b);
bslownegtmp_{w}=randi([0,2],n*T,2^(n-1),r,b);
end
I have decided to use cells of matrices because after this loop I need to call separately each single matrix in another loop.
If I run this code I get the error message "Your system has run out of application memory".
Do you know a more efficient (in terms of memory) way to store each single matrix?
Let's refer to the documentation page on Strategies for Efficient Use of Memory:
Because simple numeric arrays (comprising one mxArray) have the least overhead, you should use them wherever possible. When data is too complex to store in a simple array (or matrix), you can use other data structures.
Cell arrays are comprised of separate mxArrays for each element. As a result, cell arrays with many small elements have a large overhead.
I doubt that the overhead for cell array is really large ...
Let me give a possible explanation. What if MATLAB cannot use the swap file when the 4D arrays are stored in a cell array? When storing large numeric arrays, there is no out-of-memory error because MATLAB uses the swap file to cache each variable when the used memory becomes too big. If instead each 4D array is stored in one big cell array, MATLAB may see it as a single variable and be unable to split it between RAM and the swap file. I don't work at MathWorks, so I don't know whether this is right; it's just an idea, and I would be glad to know the real reason.
So my advice is the same as in the other comments: free matrices as soon as you are done with them. There are not many ways to store many dense arrays: one big array (NOT recommended here, as it will hit out-of-memory sooner because MATLAB makes it contiguous), a cell array, or a struct array (and if I understand the documentation correctly, the overhead of the last two is equivalent). In all cases, the total amount of data over all 4D arrays is really large, so the best thing to do is to keep memory usage consistently low by discarding data once it has been used and keeping only the results of the computation in memory (provided they take less memory).
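To put a number on "really large", the sketch below works out the footprint of the question's arrays from its own parameters, and shows one way to shrink each array: randi accepts a class name, so the small integers need not be stored as doubles. The uint8 choice and the tiny demo dimensions are assumptions for illustration, not something the question specifies.

```matlab
n = 3; T = 100; r = 2*T; b = 80; BS = 1000;     % parameters from the question

elemsPerArray = (n*T) * 2^(n-1) * r * b;        % 300 * 4 * 200 * 80 elements
bytesDouble   = 8 * elemsPerArray;              % size of one array stored as double
totalDoubleGB = 4 * BS * bytesDouble / 1e9;     % four cell arrays of BS arrays each
totalUint8GB  = 4 * BS * elemsPerArray / 1e9;   % same data at 1 byte per element

% Values in [0,4] fit in uint8; tiny dimensions are used here for illustration.
small = randi([0, 4], 6, 4, 5, 2, 'uint8');
info  = whos('small');                          % 1 byte per element
```

With these parameters each array holds 19,200,000 elements (about 154 MB as double), and the four cell arrays together come to roughly 614 GB. Even at one byte per element the total is still about 77 GB, which supports the advice above to process and discard one array at a time.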

error reading a large binary file in MATLAB

I have to read a large binary file whose size is 92,504 KB. When I use the fread command, MATLAB gives me this error:
Error using fread Out of memory. Type HELP MEMORY for your options.
I also tried restarting MATLAB, so that any virtual memory I was using would be cleared, but the problem persists.
How can I solve this problem and read the data?
The problem is the code that you are using to read the data:
[data,count] = fread(fid,'uint8');
The above line tells MATLAB to read in as many uint8 values as it can and put them into a vector.
The trouble is that MATLAB will put them into a vector of doubles. So rather than a vector where each element is one byte, you have a vector where each element is 8 bytes. This ends up making your 92 MB of data take up 92*8 = 736 MB, which is probably bigger than the maximum possible array size shown by the memory command.
The solution here is to tell matlab to put the data you are reading into a vector of uint8 which can be achieved as follows:
[data,count] = fread(fid,'*uint8');
This method for reading in the data tells matlab that the output vector should be the same type as the input data. You can read more about it in the precision section of the fread documentation.
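A small sketch of the difference between the two precision strings, using a hypothetical 256-byte temporary file created only for the demonstration:

```matlab
fname = [tempname '.bin'];               % hypothetical test file
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:255), 'uint8');      % write 256 single-byte values
fclose(fid);

fid = fopen(fname, 'r');
asDouble = fread(fid, 'uint8');          % values are returned as double (8 bytes each)
frewind(fid);                            % go back to the start of the file
asUint8 = fread(fid, '*uint8');          % values are returned as uint8 (1 byte each)
fclose(fid);
delete(fname);

infoD = whos('asDouble');
infoU = whos('asUint8');                 % infoD.bytes is 8 times infoU.bytes
```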
On a 32-bit system, very little memory may be available to MATLAB. The fread command you are using reads the entire file at once. This is probably a bad idea, since your system does not have enough memory. A better approach is to read the file part by part. See
A = fread(fileID, sizeA)
in the link below[1]. You can put this code inside a loop. If you want to read the whole file at once, what I would recommend is a 64-bit system with 3GB RAM.
[1] http://www.mathworks.in/help/matlab/ref/fread.html
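The part-by-part reading suggested above could look roughly like this. The chunk size, the temporary file, and the running sum used as "processing" are all placeholders chosen for the sketch.

```matlab
fname = [tempname '.bin'];                       % hypothetical test file
fid = fopen(fname, 'w');
fwrite(fid, uint8(mod(0:9999, 256)), 'uint8');   % 10,000 bytes of sample data
fclose(fid);

chunkSize = 4096;                                % number of uint8 values per read
total = 0;
nRead = 0;
fid = fopen(fname, 'r');
while true
    chunk = fread(fid, chunkSize, '*uint8');     % at most chunkSize values in memory
    if isempty(chunk)
        break;                                   % end of file reached
    end
    total = total + sum(double(chunk));          % placeholder for the real processing
    nRead = nRead + numel(chunk);
end
fclose(fid);
delete(fname);
```

Only one chunk is held in memory at a time, so peak usage depends on chunkSize rather than on the file size.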

2d matrix to a 3d matrix without using a loop

I have a 300x300 matrix. I need to make a 300x300x1024 matrix where each "slice" is the original 300x300 matrix. Is there any way to do this without a loop? I tried the following:
old=G;
for j=2:N;
G(:,:,j)=old;
end
where N is 1024, but I run out of memory.
Know any shorter routes?
Use repmat.
B = repmat(A,[m n p...])
produces a multidimensional array B composed of copies of A. The size of B is [size(A,1)*m, size(A,2)*n, size(A,3)*p, ...].
In your case,
G=repmat(old,[1 1 1024]);
will yield the result you want without the for loop. The memory issue is a completely different subject. A 300x300x1024 double matrix will "cost" you ~740 MB of memory, which is not a lot these days. Check your memory load before you try the repmat and see why you don't have this extra ~740 MB. Use memory and whos to see how much memory is available and which variables can be cleared.
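A scaled-down sketch of the repmat call, with a 3x3 matrix and 4 slices standing in for the question's 300x300 matrix and 1024 slices, plus the arithmetic behind the memory estimate:

```matlab
A = magic(3);                    % stand-in for the 300x300 matrix
N = 4;                           % the question uses N = 1024
G = repmat(A, [1 1 N]);          % replicate A along the third dimension only

bytesFull = 300 * 300 * 1024 * 8;   % bytes needed for the full-size double array
```

bytesFull comes to 737,280,000 bytes, which is where the figure of roughly 740 MB comes from.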
You are likely running out of memory because you haven't preallocated your matrix.
If you do this first,
old = G;
G = zeros(size(old,1), size(old,2), 1024);
and then start the loop from 1 instead of 2, you will probably not run out of memory.
This works because you first set aside a block of memory large enough for the entire matrix. If you do not initialize your matrix, MATLAB first sets aside enough memory for a 300x300x1 matrix. When you add the second slice, it moves further along in memory and allocates a new block for a 300x300x2 matrix, and so on, never being able to reuse the memory allocated for the earlier matrices.
This occurs often in MATLAB, so it is important never to grow your matrices within a loop.
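Put together, the preallocated version of the loop might look like the sketch below, again with small stand-in sizes rather than the question's 300x300x1024:

```matlab
old = magic(3);                              % stand-in for the original 300x300 matrix
N = 4;                                       % the question uses N = 1024

G = zeros(size(old,1), size(old,2), N);      % reserve the whole block once
for j = 1:N                                  % start at 1: slice 1 must be filled too
    G(:,:,j) = old;                          % writes into existing memory, no regrowth
end
```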
Quick answer is no, you will need to loop.
You might be able to do something smart like block-copying your array's memory but you didn't even give us a language to work with.
You will probably want to make sure each entry in your matrix is of minimum size: even with one byte per entry you will require 92 MB, and if you are storing a 64-bit value we're talking nearly a gig. If it's an object, your number will leap into the many-gig range in no time. Bit packing may come in handy... but again, I have no idea what your other constraints are.
Edit: I updated your tags.
I'm not sure if I can help much, but doubles are 64 bits each, so at bare minimum you're talking about 2 GB (you're already past impossible if you are on a 32-bit OS). This could easily double if each cell involves one or two pointers to different memory locations (I don't know enough about MATLAB to tell you for sure).
If you're not running on an 8gb 64 bit machine I don't think you have a chance. If you are, allocate all the memory you can to matlab and pray.
Sorry I can't be of more help, maybe someone else knows more tricks.

Matlab: Is it possible to save in the workspace a vector that contains 4 millions of values?

I am calculating something, and the result is a vector with 4 million elements. The calculation has not finished yet; I estimate it will take another two and a half hours. Will I be able to save the vector when it finishes?
If that is not possible, what can I do?
Thank you.
On 32-bit Windows, a double array can hold at most about 155-200 million elements. Check the MathWorks support page for other operating systems.
Yes, just use the command save. If you just need it for later Matlab computations, then it is best to save it in .mat format.
save('SavedFile.mat','largeVector')
You can then load your file whenever you need it using the load function.
load('SavedFile.mat')
On my machine it takes 0.01 sec to generate a random vector with 4 million elements, and with whos you can see that it takes (only) 32 MB.
It would take only a few seconds to save this amount of data with MATLAB. If you are working with a release after R2007b, you may want to save with the '-v7.3' option, which uses an HDF5-based format and is needed for variables larger than 2 GB, but it can come with performance or disk-usage penalties.
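A short sketch of the whole round trip for a vector of this size. The temporary file name is an assumption for the demonstration, and the default MAT-file format is used rather than '-v7.3'.

```matlab
largeVector = rand(4e6, 1);          % 4 million doubles
info = whos('largeVector');          % info.bytes reports the in-memory size

fname = [tempname '.mat'];           % hypothetical output file
save(fname, 'largeVector');          % write the vector to disk
S = load(fname);                     % read it back into a struct
same = isequal(S.largeVector, largeVector);
delete(fname);
```

At 8 bytes per element the vector occupies 32,000,000 bytes, which matches the 32 MB reported by whos above.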