I compiled my bottleneck function into a parallelized MEX function with MATLAB Coder, and it works fine so far.
But at certain points the function crashes with an "Unexpected unknown exception from MEX file" error.
I added statements at several points in the function that display letters like 'a', so that I can see where in the MEX function the error happens, since I can't set breakpoints inside it anymore. It never even reached the first letter, so the error must happen while the function is being initialized.
I recognized that the error always happens when the input variable size exceeds a certain limit.
The main input variables, alongside some 1x1 double values, are three nx1 double arrays and one nxn logical matrix. n varies and changes size between iterations, so I declared them as :Inf x 1 and :Inf x :Inf in Coder.
The problem appears when n exceeds a value of about 45000. I don't know the exact value, since in one iteration, when n is below it, everything is fine, and in the next one, when n is above it, it crashes.
Analysing it further, I saw that it seems to happen when the nxn matrix exceeds a size of 2^31 bytes.
When I try to translate my code into MEX with a matrix that is too large to load into the MEX function, I get an error along the lines of "error with intmax()... not supported".
So I guess that Coder writes MEX functions for 32-bit systems, and when a matrix larger than that is loaded, no pointer can address the right element in the matrix because of the 32-bit limit?
How do I bypass that problem?
Since the matrix is a boolean (logical) matrix, I have now found out that MATLAB supports sparse matrices, and that this would save a lot of memory since only 1/10^4 of my elements are 1. But I would still like to know whether the problem above results from the 32-bit MEX functions, and if so, how to solve it.
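For illustration, converting a dense logical matrix to sparse would look like this (small n here just for the example; note that, depending on your Coder version, sparse matrices may not be supported for code generation at all, so this may only help on the pure MATLAB side):
n = 1000;              % small n just for illustration
B = rand(n) < 1e-4;    % dense n-by-n logical matrix, density ~1e-4
S = sparse(B);         % sparse logical: stores only the true entries
whos B S               % compare the memory footprints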
Yes, this issue is caused by the MEX-file being limited to 32-bit indexing. This is a limit in Coder:
For variable-size arrays that use dynamic memory allocation, the maximum number of elements is the smaller of:
intmax('int32').
The largest power of 2 that fits in the C int data type on the target hardware.
These restrictions apply even on a 64-bit platform.
See this documentation page
As far as I can tell, the only workaround is to have Coder generate C or C++ code for you, and then manually edit that code to use size_t instead of int for indexing (and of course remove any checks on array size). I have never worked with code generated by Coder, so I am not sure how hard this is. I have written many MEX-files in C and C++, though, and those MEX-files work perfectly fine with 64-bit indexing and larger-than-2 GB arrays.
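For what it's worth, a minimal sketch of generating editable C source instead of a MEX binary might look like this (myBottleneck is a placeholder for the entry-point function; add the scalar inputs to the -args list as needed):
cfg = coder.config('lib');               % standalone C library instead of a MEX-file
dVec = coder.typeof(0, [Inf 1]);         % :Inf x 1 double
lMat = coder.typeof(false, [Inf Inf]);   % :Inf x :Inf logical
codegen myBottleneck -config cfg -args {dVec, dVec, dVec, lMat}
The generated sources then end up under codegen/lib/myBottleneck/, where they can be edited by hand.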
Related
I am getting an out of memory error on this line of MATLAB code:
result = (A(1:xmax,1:ymax,1:zmax) .* B(2:xmax+1,2:ymax+1,2:zmax+1) + ...
    A(2:xmax+1,2:ymax+1,2:zmax+1) .* B(1:xmax,1:ymax,1:zmax)) ./ C;
where C is another array. This is on 32 bit MATLAB (I can't seem to get the 64 bit version at the moment, which would temporarily fix my problems).
The arrays result, A, B, and C are pre-initialized and never change size. My guess is therefore that this computation is not being performed in constant space.
Is this correct? Is there a way to make it run or check if it is running in constant space?
These arrays are of approximate size (250, 250, 250).
If MATLAB does not run this in constant size, does anyone have any experience as to whether Octave or Julia or (insert similar language) does?
edit 1:
I eliminated the excess arrays. There are 10 arrays that are 258 x 258 x 338, which corresponds to 1.67 GB in total. There are a bunch of other variables, but they are much smaller. The calculation presented is simplified; the form of the calculation is:
R = (A(3Drange) .* B(3Drange) + A(new_3Drange) .* D(new_3Drange) + ...) ./ C
where the ranges generally just differ by a shift of plus or minus 1 or 2.
The output of memory command:
Maximum possible array: 669 MB (7.013e+08 bytes) *
Memory available for all arrays: 1541 MB (1.616e+09 bytes) **
Memory used by MATLAB: 2209 MB (2.316e+09 bytes)
Physical Memory (RAM): 8154 MB (8.550e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
Apparently I should be violating the second line. However, the code runs fine until the first operation that I actually do with the arrays. Perhaps MATLAB is being lazy and not allocating when I type:
A=zeros(xmax+2,ymax+2,zmax+2);
but still telling me in the workspace that the variable is allocated.
This code has worked before with smaller arrays. (edit: but it seems the actual memory size is the problem, not the size of each individual array).
The very curious thing to me is why it does not error during allocation, but instead errors during the first calculation.
edit 2:
I have confirmed that the loop is not running in constant space. About 0.8 GB of memory is allocated during the calculation. (I captured an image of resource usage while the command was being executed in a loop.)
However, I tried breaking up the computation into multiple lines. I split the computation at each addition and accumulated each part in a new statement, treating R as an accumulator. The result is that less memory is allocated at one time, but presumably more often. (Again, I captured a picture of the resource usage.)
I am still curious as to why MATLAB doesn't want to execute this in constant space. I think it perhaps has something to do with the indexing being shifted - I am planning on investigating it more later and then putting this all together in an answer, but someone may beat me to it, which would be great also. Now, though, I can run the array size I was looking for and can finish my project.
I guess that most of the question has already been answered:
Does it operate in constant space?
No as you verified, it does not.
Why doesn't it operate in constant space?
Matlab claims to be fast at vectorized matrix operations; not as much emphasis is placed on memory efficiency.
What to do now?
Here are the options; the first is preferred if possible, but the other two will work as well.
Make it fit, for example by upgrading to 64-bit MATLAB or by not putting other stuff in your workspace
Work on parts of the matrix, for example by cutting it in half
Don't use vectorization at all, but write a simple for loop
If you don't vectorize, you will have a minimal space solution.
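As a sketch (using xmax, ymax, zmax and the pre-allocated result, A, B, C from the question), the non-vectorized version needs no temporaries:
for k = 1:zmax
    for j = 1:ymax
        for i = 1:xmax
            result(i,j,k) = (A(i,j,k) * B(i+1,j+1,k+1) + ...
                A(i+1,j+1,k+1) * B(i,j,k)) / C(i,j,k);
        end
    end
end
It will be noticeably slower than the vectorized expression, but the extra memory use is constant.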
I know MATLAB has a built-in pdist function that will calculate pairwise distances. However, my matrix is 60000 by 300, which is so large that MATLAB runs out of memory.
This question is a follow up on Matlab euclidean pairwise square distance function.
Is there any workaround for this computational inefficiency? I tried manually coding the pairwise distance calculations, and it usually takes a full day to run (sometimes 6 to 7 hours).
Any help is greatly appreciated!
Well, I couldn't resist playing around. I created a Matlab mex C file called pdistc that implements pairwise Euclidean distance for single and double precision. On my machine, using Matlab R2012b and R2015a, it's 20–25% faster than pdist (and the underlying pdistmex helper function) for large inputs (e.g., 60,000-by-300).
As has been pointed out, this problem is fundamentally bounded by memory and you're asking for a lot of it. My mex C code uses minimal memory beyond that needed for the output. In comparing its memory usage to that of pdist, it looks like the two are virtually the same. In other words, pdist is not using lots of extra memory. Your memory problem is likely in the memory used up before calling pdist (can you use clear to remove any large arrays?) or simply because you're trying to solve a big computational problem on tiny hardware.
So, my pdistc function likely won't be able to save you memory overall, but you may be able to use another feature I built in. You can calculate chunks of your overall pairwise distance vector. Something like this:
m = 6e3;
n = 3e2;
X = rand(m,n);
sz = m*(m-1)/2;
for i = 1:m:sz-m
D = pdistc(X', i, i+m); % mex C function, X is transposed relative to pdist
... % Process chunk of pairwise distances
end
This is considerably slower (10 times or so) and this part of my C code is not optimized well, but it will allow much less memory use – assuming that you don't need the entire array at one time. Note that you could do the same thing much more efficiently with pdist (or pdistc) by creating a loop where you passed in subsets of X directly, rather than all of it.
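That subset idea might look roughly like this with the stock pdist2 function (Statistics Toolbox) instead of pdist, so that each block of rows is compared against all of X; the block size is arbitrary and bounds the extra memory:
X = rand(60000, 300);
blk = 1000;                       % rows per block; memory scales with blk
for i = 1:blk:size(X, 1)
    rows = i:min(i+blk-1, size(X, 1));
    D = pdist2(X(rows,:), X);     % distances from this block to all rows
    % process D here; it is overwritten on the next pass
end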
If you have a 64-bit Intel Mac, you won't need to compile as I've included the .mexmaci64 binary, but otherwise you'll need to figure out how to compile the code for your machine. I can't help you with that. It's possible that you may not be able to get it to compile or that there will be compatibility issues that you'll need to solve by editing the code yourself. It's also possible that there are bugs and the code will crash Matlab. Also, note that you may get slightly different outputs relative to pdist with differences between the two in the range of machine epsilon (eps). pdist may or may not do fancy things to avoid overflows for large inputs and other numeric issues, but be aware that my code does not.
Additionally, I created a simple pure Matlab implementation. It is massively slower than the mex code, but still faster than a naïve implementation or the code found in pdist.
All of the files can be found here. The ZIP archive includes all of the files. It's BSD licensed. Feel free to optimize (I tried BLAS calls and OpenMP in the C code to no avail – maybe some pointer magic or GPU/OpenCL could further speed it up). I hope that it can be helpful to you or someone else.
On my system the following is the fastest (even faster than the C code pdistc by @horchler):
function [ mD ] = CalcDistMtx ( mX )
vSsqX = sum(mX .^ 2);
mD = sqrt(bsxfun(@plus, vSsqX.', vSsqX) - (2 * (mX.' * mX)));
end
You'll need a very well tuned C code to beat this, I think.
Update
Since R2016b, MATLAB supports implicit expansion (broadcasting) without the use of bsxfun().
Hence the code can be written:
function [ mD ] = CalcDistMtx ( mX )
vSsqX = sum(mX .^ 2, 1);
mD = sqrt(vSsqX.'+ vSsqX - (2 * (mX.' * mX)));
end
A generalization is given in my Calculate Distance Matrix project.
P. S.
Using MATLAB's pdist for comparison: squareform(pdist(mX.')) is equivalent to CalcDistMtx(mX).
Namely, the input should be transposed.
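A quick usage sketch on a small input, checked against pdist:
mX = rand(3, 5);                  % five 3-D points stored as columns
mD = CalcDistMtx(mX);             % 5-by-5 distance matrix
mDref = squareform(pdist(mX.'));  % pdist expects points as rows
max(abs(mD(:) - mDref(:)))        % differences on the order of eps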
Computers are not infinitely large, or infinitely fast. People think that they have a lot of memory, a fast CPU, so they just create larger and larger problems, and then eventually wonder why their problem runs slowly. The fact is, this is NOT computational inefficiency. It is JUST an overloaded CPU.
As Oli points out in a comment, there are something like 2e9 values to compute, even assuming you only compute the upper or lower half of the distance matrix. (6e4^2/2 is approximately 2e9.) This will require roughly 16 gigabytes of RAM to store, assuming that only ONE copy of the array is created in memory. If your code is sloppy, you might easily double or triple that. As soon as you go into virtual memory, things get much slower.
Wanting a big problem to run fast is not enough. To really help you, we need to know how much RAM is available. Is this a virtual memory issue? Are you using 64 bit MATLAB, on a CPU that can handle all the needed RAM?
I have to read in a large binary file whose size is 92,504 KB. When I use the fread command, MATLAB gives me this error:
Error using fread
Out of memory. Type HELP MEMORY for your options.
I tried restarting MATLAB as well, so that any virtual memory I was using would be cleared, but the problem persists.
How can I solve this problem of reading the data?
The problem is the code that you are using to read the data:
[data,count] = fread(fid,'uint8');
The above line tells MATLAB to read in as many uint8 values as it can and put them into a vector.
The trouble is that MATLAB will put them into a vector of doubles. So rather than a vector where each element is one byte, you have a vector where each element is 8 bytes. This makes your 92 MB of data take up 92*8 = 736 MB, which is probably bigger than the maximum possible array size shown by the memory command.
The solution is to tell MATLAB to put the data you are reading into a vector of uint8, which can be achieved as follows:
[data,count] = fread(fid,'*uint8');
This method for reading in the data tells matlab that the output vector should be the same type as the input data. You can read more about it in the precision section of the fread documentation.
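A quick way to see the difference for yourself (assuming fid is an open file identifier):
[d1, n1] = fread(fid, 'uint8');    % class double: 8 bytes per element
frewind(fid);                      % rewind so the same data is read again
[d2, n2] = fread(fid, '*uint8');   % class uint8: 1 byte per element
whos d1 d2                         % d1 uses ~8x the memory of d2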
On a 32-bit system, you may have very little memory available to MATLAB. The fread command you are using reads the entire file at once. This is probably a bad idea, since your system does not have enough memory. A better way would be to read the file part by part, using
A = fread(fileID, sizeA)
as described in the link below [1]. You can put this code inside a loop. In case you want to read the whole file at once, what I would recommend is to use a 64-bit system with 3 GB of RAM.
[1] http://www.mathworks.in/help/matlab/ref/fread.html
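A minimal sketch of that loop (the file name and chunk size here are placeholders):
fid = fopen('data.bin', 'r');
chunkSize = 1e6;                             % bytes per read
while ~feof(fid)
    chunk = fread(fid, chunkSize, '*uint8'); % keep the data as uint8
    % process chunk here before reading the next one
end
fclose(fid);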
I have a 300x300 matrix. I need to make a 300x300x1024 matrix where each "slice" is the original 300x300 matrix. Is there any way to do this without a loop? I tried the following:
old = G;
for j = 2:N
    G(:,:,j) = old;
end
where N is 1024, but I run out of memory.
Know any shorter routes?
use repmat.
B = repmat(A,[m n p...])
produces a multidimensional array B composed of copies of A. The size of B is [size(A,1)*m, size(A,2)*n, size(A,3)*p, ...].
In your case,
G=repmat(old,[1 1 1024]);
will yield the result you want without the for loop. The memory issue is a completely different subject. A 300x300x1024 double matrix will "cost" you ~740 MB of memory, which is not a lot these days. Check your memory load before you try the repmat and see why you don't have these extra 700 MB. Use memory and whos to see how much memory is available and which variables can be cleared.
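The ~740 MB figure is easy to verify:
bytes = 300 * 300 * 1024 * 8;       % number of elements times 8 bytes per double
fprintf('%.0f MB\n', bytes / 1e6)   % prints 737 MB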
You are likely running out of memory because you haven't pre-initialized your matrix.
if you do this first,
old = G;
G = zeros(size(old,1), size(old,2), 1024);
and then start the loop from 1 instead of 2, you will probably not run out of memory.
This works because you first set aside a block of memory large enough for the entire matrix. If you do not pre-initialize the matrix, MATLAB first sets aside only enough memory for a 300x300x1 matrix. Then, when you add the second slice, it allocates a new, larger block for a 300x300x2 matrix and copies the data over, and so on, so the memory set aside for the earlier, smaller arrays can never be reused for the growing matrix.
This occurs often in matlab, so it is important to never grow your matrices within a loop.
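Putting it together, a sketch of the fixed version:
old = G;
G = zeros(size(old,1), size(old,2), 1024);  % allocate the full block up front
for j = 1:1024
    G(:,:,j) = old;                         % fill each slice in place
end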
Quick answer is no, you will need to loop.
You might be able to do something smart like block-copying your array's memory but you didn't even give us a language to work with.
You will probably want to make sure each entry in your matrix is a minimal size; even as a byte matrix it will require 92 MB, and if you are storing 64-bit values we're talking nearly a gig. If it's an object, your number will leap into the many-gig range in no time. Bit packing may come in handy... but again, I have no idea what your other constraints are.
Edit: I updated your tags.
I'm not sure if I can help much, but doubles are 64 bits each, so at bare minimum you're talking about 2 GB (you're already past impossible if you are on a 32-bit OS). This could easily double if each cell involves one or two pointers to different memory locations (I don't know enough about matlab to tell you for sure).
If you're not running on an 8 GB, 64-bit machine, I don't think you have a chance. If you are, allocate all the memory you can to matlab and pray.
Sorry I can't be of more help, maybe someone else knows more tricks.
As in my previous question, I have the following problem. I have an nxn matrix P whose elements are matrices P{i,j}, which are also nxn. So the total number of elements is n^4. For n=100 I get an error about the lack of memory. I calculate this matrix only once and then operate with it. Could you advise me how to store the matrices P{i,j} on the HDD?
I mean, maybe it is possible to store each of them in a file like "data_i_j.dat" and then load it while doing computations in a loop over i and j?
The save function will write data to a file, and the load function will read it back again: save(filename,varname,varname,varname...), followed by S = load(filename) and referring to S.varname. (There's also a version of load that just dumps everything into your current workspace, but that seems like poor practice.)
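A sketch of that scheme for the P{i,j} blocks (using MAT-files named after the pattern in the question):
for i = 1:n
    for j = 1:n
        block = P{i,j};
        save(sprintf('data_%d_%d.mat', i, j), 'block');
    end
end
% later, inside the computation loop over i and j:
S = load(sprintf('data_%d_%d.mat', i, j));
block = S.block;                  % the stored n-by-n matrix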