Debugging Matlab avoids memory leak

I have a memory intensive Matlab script.
What puzzles me is that when I run this code it leaks memory during the very first iteration (out of the 46 expected). The leak eventually grows so large that I have to force Matlab to quit.
Trying to find the leak point, I set a breakpoint at the first line of the loop, but when I hit "Continue" the execution ran through the first iteration, stopped again at the breakpoint, and produced no leak. Removing the breakpoint and continuing from that point reintroduces the leak.
Using the breakpoint to execute the code one iteration at a time avoids the leak and the code terminates with no issues (fig. 2).
Now, I would like to:
1) understand whether this leak is due to something I introduced or whether it could be a Matlab specific issue,
2) get an idea of how to find the leak (I cannot use the debugger as it removes the problem).
I would love to provide the code but it is quite a big chunk (>100 lines), so my question is more about the general approach than the actual debugging of the specific issue.

Thanks for the suggestions.
My approach was to isolate the portion of code causing the problem by adding a printout above each line, so that just before the leaky crash I could see where execution had stopped.
The culprit was a zeros(100k) line where I tried to pre-allocate a big matrix.
I tried executing the same line on a newer version of Matlab (2015b vs 2014b) and found that while the older version lets you instantiate big matrices (over ~50k by 50k) and freezes once it has consumed all the memory, the newer version returns the following error:
Error using zeros
Requested 50000x50000 (18.6GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive.
See array size limit or preference panel for more information.
In my case the limits for a NxN matrix are:
N > ~60000 on Matlab2014b on 16GB RAM
N >= 46341 on Matlab2015b on 12GB RAM
The difference is that my 2014 version at least lets me try to create them and collapses when they are too big, whereas the 2015 version prevents me from trying at all.
The puzzling bit is that, on the 2014b version, if I debug the code Matlab lets the zeros(100k) line run and everything works just fine.
The problem appears again if I try to visualise the contents of the matrix in the Matlab Variables tab.
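For anyone hitting the same wall, a quick back-of-the-envelope check before calling zeros makes the failure predictable instead of a freeze; this is just a sketch, with N standing in for the 100k size mentioned above:
N = 100000;                  % the size from the zeros(100k) line above
bytesNeeded = N * N * 8;     % a double takes 8 bytes
fprintf('zeros(%d) would need %.1f GB\n', N, bytesNeeded / 2^30);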

Related

Matlab Process Memory Leak Over 16 Days

I'm running a real-time data assimilation program written in Matlab, and there seems to be a slow memory leak. Over the course of about 16 days, the average memory usage has increased by about 40% (see the figure below) from about 1.1GB to 1.5GB. The program loops every 15 minutes, and there is a peak in memory usage for about 30 seconds during the data assimilation step (visible in the figure).
At the end of each 15 minute cycle, I'm saving the names, sizes, and types of all variables in the currently active workspace to a .mat file using the whos function. There are just over 100 variables, and after running the code for about 16 days, there is no clear trend in the amount of memory used by any of the variables.
Some variables are cleared at the end of each cycle, but some of them are not. I'm also calling close all to make sure there are no figures sitting in memory, and I made sure that when I'm writing ASCII files, I always fclose(fileID) the file.
I'm stumped...I'm wondering if anyone here has any suggestions about things I should look for or tools that could help track down the issue. Thanks in advance!
Edit, system info:
RHEL 6.8
Matlab R2014b
I figured out the problem. It turns out that the figure handles were hidden, and close('all') only works on figures whose handles are visible. I assume they're hidden because the figures are created outside the scope of where I was trying to close them. The solution was to replace close('all') with close all hidden, which closes all figures, including those with hidden handles.
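A minimal sketch of the difference, in case it helps someone else: findall sees figures even when their handles are hidden, so it can confirm whether anything survived a plain close('all').
hiddenFigs = findall(0, 'Type', 'figure');   % finds figures even with hidden handles
fprintf('%d figure(s) still in memory\n', numel(hiddenFigs));
close all hidden                             % closes them regardless of handle visibility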
I'll go ahead and restate what @John and @horchler mentioned in their comments, in case their suggestions can help people with similar issues:
Reusing existing figures can increase performance and reduce the potential for memory leaks.
Matlab has an undocumented memory profiler that could help debug performance related issues.
For processes that are running indefinitely, it's good practice to separate data collection/processing and product generation (figures etc). The first reads in and processes the data and saves it to a DB or file. The second allows you to "view/access/query" the data.
If you are calling compiled mex functions in your code, the memory leak could be coming from the Fortran or C/C++ code. Not cleaning up a single variable could cause a leak, and would explain linear memory growth.
The Matlab function whos is great for looking at the size in memory of each variable in the workspace. This can be useful for tracking down which variable is the culprit of a memory leak.
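Building on that last point, here is a small per-cycle snapshot sketch (nothing in it is specific to the original program): summing the bytes reported by whos gives one number whose trend over days is easier to read than 100 individual variables.
info = whos;                          % one struct element per workspace variable
totalMB = sum([info.bytes]) / 2^20;
fprintf('workspace: %.1f MB across %d variables\n', totalMB, numel(info));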
Thanks @John and @horchler!

MATLAB out of memory on linux despite regular "clear all"

I am batch processing a bunch of files (~200) in MATLAB, in essence
for i = 1:n, process(i); end
where process(i) opens a file, reads it and writes out the output to another file. (I am not posting details about process here because it is hundreds of lines long and I readily admit I don't fully understand the code, having obtained it from someone else).
This runs out of memory after every dozen files or so. Of course, on Linux the memory function is not available, so we have to figure it out "by hand". Well, I thought there was some memory leak, so let's issue a clear all after every run, i.e.
for i = 1:n, process(i); clear all; end
No luck, this still runs out of memory. At the point where this happens, who says there are just two small arrays in memory (<100 elements). Note that quitting MATLAB and restarting solves the problem, so the computer certainly has enough memory to process a single item.
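For what it's worth, one way to do the "by hand" check on Linux is to read the process's resident set size from /proc; this is only a sketch, and feature('getpid') is undocumented:
pid = feature('getpid');                          % undocumented way to get MATLAB's PID
txt = fileread(sprintf('/proc/%d/status', pid));
rss = regexp(txt, 'VmRSS:\s+(\d+) kB', 'tokens', 'once');
fprintf('MATLAB resident memory: %.1f MB\n', str2double(rss{1}) / 1024);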
Any ideas to help me detect where the error comes from would be welcome.
This is probably not the solution you are hoping for, but as a workaround you could have a shell script that loops over several calls to Matlab, as sketched below.
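Roughly, assuming process() and the file count come from the original script, each MATLAB invocation would handle one slice of the files and then exit, so anything leaked is released when the process dies; the shell loop would start something like matlab -nodisplay -r "process_chunk(1,50); exit" for each chunk. process_chunk here is a hypothetical wrapper:
function process_chunk(first, last)
    % Process one contiguous slice of the files, then let the caller exit MATLAB.
    for i = first:last
        process(i);    % the user's existing per-file function
    end
end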

MATLAB - codeHints taking 99.9% of runtime (R2013a)

I'm not sure what is going on. I am running my neural network simulations on my laptop, which has MATLAB R2013a on it.
The code runs fast on my desktop (R2012a though), but very, very slowly on the laptop. I profiled it because this seems abnormal; here are the screenshots I took of the functions spending the most time:
This is located in the codeHints.m file, so it isn't something I wrote. Is there any way I can disable this? I googled it but maybe I am not searching for the right things... I couldn't find anything. I can't get any work done because it is so slow :(
Would appreciate some advice!
Update: I have also attempted to run it on my desktop at work (same MATLAB version as laptop, also 8GB of RAM), and I get the same issue. I checked the resource monitor and it seems like the process is triggering a lot of memory faults (~40/sec), even though not even half of my RAM is being used.
I typed in "memory" in MATLAB and got the following information:
Maximum possible array: 11980 MB (1.256e+10 bytes) *
Memory available for all arrays: 11980 MB (1.256e+10 bytes) *
Memory used by MATLAB: 844 MB (8.849e+08 bytes)
Physical Memory (RAM): 8098 MB (8.491e+09 bytes)
So it seems like there should be sufficient room. I will try to put together a sample file.
Update #2: I ran my code on 2012a on the work computer with the following "memory" info:
Maximum possible array: 10872 MB (1.140e+10 bytes) *
Memory available for all arrays: 10872 MB (1.140e+10 bytes) *
Memory used by MATLAB: 846 MB (8.874e+08 bytes)
Physical Memory (RAM): 8098 MB (8.491e+09 bytes)
The run with more iterations than above (15000 as opposed to 10000) completed much faster and there are no extraneous calls for memory allocation:
So it seems to me that it is an issue exclusively with 2013a. For now I will use 2012a (because I need this finished), but if anyone has ideas on what to do with 2013a to stop those calls to codeHints, I would appreciate it.
Though this would scream memory problems at first sight, it seems like your tests have made a lack of memory improbable. In this case the only reasonable explanation I can think of is that the computer is actually trying to do 2 different things, thus taking more time.
Some possibilities:
Actually not using exactly the same inputs
Actually not using exactly the same functions
The first point can be detected by putting some breakpoints in the code whilst running it on the 2 computers and verifying that the inputs are exactly the same. (Consider using visdiff if you have a lot of variables.)
The second one could almost only be caused by an overloaded zeros function. Make sure to stop at this line and see which function is actually being called.
If both these points don't solve the problem, try reducing the code as much as possible till you have only one or a few lines that create the difference. If it turns out that the difference comes down to this one line, try calling the zeros function with the right size input on both computers and time the result with timeit (a File Exchange submission).
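For the timing step, something like the following sketch would do, assuming the suspect line is a large zeros call (timeit is a File Exchange function for R2013a and became builtin in R2013b):
f = @() zeros(5000);   % use the same size as in the real code
t = timeit(f);         % runs f several times and excludes first-call effects
fprintf('zeros(5000): %.4f s\n', t);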
If you find that you are using the builtin function on both computers, with plenty of memory, and there still is a huge performance difference, it is probably time to contact MathWorks support and hear what they have to say about it.

Why does Matlab run faster after a script is "warmed up"?

I have noticed that the first time I run a script, it takes considerably more time than the second and third time [1]. The "warm-up" is mentioned in this question without an explanation.
Why does the code run faster after it is "warmed up"?
I don't clear all between calls [2], but the input parameters change for every function call. Does anyone know why this is?
[1] I have my license locally, so it's not a problem related to license checking.
[2] Actually, the behavior doesn't change if I clear all.
One reason why it would run faster after the first time is that many things are initialized once, and their results are cached and reused the next time. For example, on the M-side, variables can be defined as persistent inside functions, and those functions can be locked in memory. The same can happen on the MEX side of things.
In addition, many dependencies are loaded after the first call and remain in memory to be re-used. This includes M-functions, OOP classes, Java classes, MEX-functions, and so on, both builtin and user-defined ones.
For example, issue the following command before and after running your script for the first time, then compare:
[M,X,C] = inmem('-completenames')
Note that clear all does not necessarily clear all of the above, not to mention locked functions...
Finally, let us not forget the role of the JIT accelerator. Instead of interpreting the M-code every time a function is invoked, it gets compiled into machine code instructions at runtime. This compilation happens on the first invocation, so subsequent runs reuse the compiled code rather than paying the interpretation overhead again.
Matlab is interpreted. If you don't warm up the code, you will be losing a lot of time due to interpretation instead of the actual algorithm. This can skew results of timings significantly.
Running the code at least once will enable Matlab to actually compile appropriate code segments.
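To see the effect for yourself, here is a rough sketch (myScriptOrFunction is just a stand-in name for your own code):
tic; myScriptOrFunction(); tCold = toc;   % first call: loading, parsing, JIT compilation
tic; myScriptOrFunction(); tWarm = toc;   % second call: mostly just the algorithm
fprintf('cold: %.3f s, warm: %.3f s\n', tCold, tWarm);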
Besides Matlab-specific reasons like JIT-compilation, modern CPUs have large caches, branch predictors, and so on. Warming these up is an issue for benchmarking even in assembly language.
Also, more importantly, modern CPUs usually idle at low clock speed, and only jump to full speed after several milliseconds of sustained load.
Intel's Turbo feature gets even more funky: when power and thermal limits allow, the CPU can run faster than its sustainable max frequency. So the first ~20 seconds to 1 minute of your benchmark may run faster than the rest of it, if you aren't careful to control for these factors.
Another issue not mentioned by Amro and Marc is memory (pre)allocation.
If your script does not pre-allocate its memory, its first run will be very slow due to repeated memory allocation. Once the first run has completed, all the memory is allocated, so consecutive invocations of the script will be more efficient.
An illustrative example:
for ii = 1:1000
    vec(ii) = ii; %// vec grows inside the loop only the first time this code is executed
end
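And for contrast, the same loop with the vector preallocated up front, so even the first run pays no reallocation cost:
vec = zeros(1, 1000); %// allocate once before the loop
for ii = 1:1000
    vec(ii) = ii;     %// no reallocation on any run
end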

Matlab Preallocation

I'm running a simulation of a diffusion-reaction equation in MATLAB, and I pre-allocate the memory for all of my vectors beforehand. However, during the loop, in which I solve a system of equations using BICG, the amount of memory that MATLAB uses keeps increasing.
For example:
concentration = zeros(N, iterations+1); % one extra column, since the loop writes column t+1
for t = 1:iterations
    concentration(:,t+1) = bicg(matrix, concentration(:,t));
end
As the program runs, the amount of memory MATLAB is using increases, which seems to suggest that the matrix, concentration, is increasing in size as the program continues, even though I pre-allocated the space. Is this because the elements in the matrix are becoming doubles instead of zeros? Is there a better way to pre-allocate the memory for this matrix, so that all of the memory the program requires will be pre-allocated at the start? It would be easier for me that way, because then I would know from the start how much memory the program will require and if the simulation will crash the computer or not.
Thanks for all your help, guys. I did some searching around and didn't find an answer, so I hope I'm not repeating a question.
EDIT:
Thanks Amro and stardt for your help guys. I tried running 'memory' in MATLAB, but the interpreter said that command is not supported for my system type. I re-ran the simulation though with 'whos concentration' displayed every 10 iterations, and the allocation size of the matrix wasn't changing with time. However, I did notice that the size of the matrix was about 1.5 GB. Even though that was the case, system monitor was only showing MATLAB as using 300 MB (but it increased steadily to reach a little over 1 GB by the end of the simulation). So I'm guessing that MATLAB pre-allocated the memory just fine and there are no memory leaks, but system monitor doesn't count the memory as in use until MATLAB starts writing values to it in the loop. I don't know why that would be, as I would imagine that writing zeros would trigger the system monitor to see that memory as 'in use,' but I guess that's not the case here.
Anyway, I appreciate your help with this. I would vote both of your answers up as I found them both helpful, but I don't have enough reputation points to do that. Thanks guys!
I really doubt it's a memory leak, since most "objects" in MATLAB clean up after themselves once they go out of scope. AFAIK, MATLAB does not use a GC per se, but a deterministic approach to managing memory.
Therefore I suspect the issue is more likely to be caused by memory fragmentation: when MATLAB allocates memory for a matrix, it has to be contiguous. Thus, when the function is repeatedly called, creating and deleting matrices, the fragmentation becomes a noticeable problem over time...
One thing that might help you debug is the undocumented profile on -memory option, which will track allocations in the MATLAB profiler. Check out the monitoring tool by Joe Conti as well. Also this page has some useful information.
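As a hedged sketch of that profiler option (the flag is undocumented, so the exact syntax and report fields may vary between releases; run_my_simulation is a hypothetical entry point):
profile on -memory       % undocumented: also record allocation statistics
run_my_simulation();     % stand-in for the loop in question
profile off
profile viewer           % the report gains allocated/freed/peak memory columns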
I am assuming that you are watching the memory usage of MATLAB in, for example, the Task Manager on Windows. The memory usage is probably increasing due to the execution of bicg() and variables that have not yet been garbage collected after it ends. The memory allocated to the concentration matrix stays the same. You can type
whos concentration
before and after your "for" loop to see how much memory is allocated to that variable.