MATLAB takes a long time after last line of a function

I have a function that's taking a long time to run. When I profile it, I find that over half the time (26 out of 50 seconds) is not accounted for in the line by line timing breakdown, and I can show that the time is spent after the function finishes running but before it returns control by the following method:
ts1 = tic;
disp ('calling function');
functionCall(args);
disp (['control returned to caller - ', num2str(toc(ts1))]);
The first line of the function I call is ts2 = tic, and the last line is
disp (['last line of function - ', num2str(toc(ts2))]);
The result is
calling function
last line of function - 24.0043
control returned to caller - 49.857
Poking around on the interwebs, I think this is a symptom of the way MATLAB manages memory. It deallocates on function returns, and sometimes this takes a long time. The function does allocate some large (~1 million element) arrays. It also works with handles, but does not create any new handle objects or store handles explicitly. My questions are:
Is this definitely a memory management problem?
Is there any systematic way to diagnose what causes a problem in this function, as opposed to others which return quickly?
Are there general tips for reducing the amount of time MATLAB spends cleaning up on a function exit?

You are right, it seems to be time spent on garbage collection. I am afraid it is a fundamental MATLAB flaw; it has been known for years, but MathWorks has not solved it even in the newest MATLAB version, 2010b.
You could try setting variables manually to [] before leaving the function, i.e. doing the garbage collection manually. This technique also helps against memory leaks in previous MATLAB versions. MATLAB will then spend the time not on the final end but on the myVar = []; lines.
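A minimal sketch of that manual clean-up, using placeholder variables inside the question's functionCall:

function out = functionCall(args)
    bigA = zeros(1, 1e6);     % large temporaries created during the work
    bigB = rand(1, 1e6);
    out  = sum(bigA + bigB);  % ... actual computation ...
    bigA = [];                % release the large arrays explicitly, so the cost
    bigB = [];                % shows up on these lines rather than on the
end                           % implicit clean-up at the final end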
You could alleviate the problem by working without any kind of references: anonymous functions, nested functions, handle classes, and by not using cellfun and arrayfun.
If you have arrived at the "performance barrier" of MATLAB, then maybe you should simply change the environment. I do not see much sense in starting a new project in MATLAB today, unless you are using SIMULINK. Python rocks for technical computing, and with C# you can also do many of the things MATLAB does using free libraries. And both are real programming languages, and are free, unlike MATLAB.

I discovered a fix to my specific problem that may be applicable in general.
The function that was taking a long time to exit was called on a basic object that contained a vector of handle objects. When I changed the definition of the basic object to extend handle, I eliminated the lag on the close of the function.
What I believe was happening is this: When I passed the basic object to my function, it created a copy of that object (MATLAB is pass by value by default). This doesn't take a lot of time, but when the function exited, it destroyed the object copy, which caused it to look through the vector of handle objects to make sure there weren't any orphans that needed to be cleaned up. I believe it is this operation that was taking MATLAB a long time.
When I changed the object I was passing to a handle, no copy was made in the function workspace, so no cleanup of the object was required at the end.
This suggests a general rule to me:
If a function is taking a long time to clean up its workspace on exit, and you are passing a lot of data or complex structures by value, try encapsulating the arguments to that function in a handle object.
This avoids the duplication and hence the time-consuming cleanup on exit. The downside is that your function can now unexpectedly change your inputs, because MATLAB does not let you declare an argument const, as in C++.
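As an illustration only (class and property names are hypothetical), a minimal sketch of such a wrapper:

% In ArgBundle.m: because the class derives from handle, passing an
% ArgBundle into a function passes a reference, so no copy is created and
% no copy has to be destroyed when the function returns.
classdef ArgBundle < handle
    properties
        data       % large numeric array
        children   % vector of handle objects
    end
end

The caller builds the object once (bundle = ArgBundle; bundle.data = rand(1, 1e6);) and passes bundle instead of the raw values, accepting the trade-off that the function can now modify it.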

A simple fix could be this: pre-allocate the large arrays and pass them as args to your functionCall(). This moves the deallocation issue back to the caller of functionCall(), but it could be that you are calling functionCall more often than its parent, in which case this will speed up your code.
workArr = zeros(1,1e6); % allocate once
...
functionCall(args,workArr); % call with extra argument
...
functionCall(args,workArr); % call again, no realloc of workArr needed
...
Inside functionCall you can take care of initializing and/or re-setting workArr, for instance
workArr(:) = 0; % reset work array in place
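For example, a minimal sketch of how the reworked functionCall could use the caller-owned buffer (the body is a placeholder):

function result = functionCall(args, workArr)
    workArr(:) = 0;                 % reset the scratch buffer
    workArr(1:numel(args)) = args;  % use it for intermediate results
    result = sum(workArr);          % ... actual computation ...
end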

Related

Adding Multiple Values to a Variable in MATLAB

I have to work with a lot of data and run the same MATLAB program more than once, and every time the program is run it will store the data in the same preset variables. The problem is, every time the program is run the values are overwritten and replaced, most likely because all the variables are type double and are not a matrix. I know how to make a variable that can store multiple values in a program, but only when the program is run once.
This is the code I am able to provide:
volED = reconstructVolume(maskAlignedED1,maskAlignedED2,maskAlignedED3,res)
volMean = (volED1+volED2+volES3)/3
strokeVol = volED-volES
EF = strokeVol/volED*100
The program I am running depends on a ton more MATLAB files that I cannot provide at this moment, however I believe the double variables strokeVol and EF are created at this instant. How do I create a variable that will store multiple values and keep adding the values every time the program is run?
The reason your variables are "overwritten" with each run is that every function (or standalone program) has its own workspace where the local variables are located, and these local variables cease to exist when the function (or standalone program) returns/terminates. In order to preserve the value of a variable, you have to return it from your function. Since MATLAB passes its variables by value (rather than reference), you have to explicitly provide a vector (or more generally, an array) as input and output from your function if you want to have a cumulative set of data in your calling workspace. But it all depends on whether you have a function or a deployed program.
Assuming your program is a function
If your function is now declared as something like
function strokefraction(inputvars)
you can change its definition to
function [EFvec]=strokefraction(inputvars,EFvec)
%... code here ...
%volES initialized somewhere
volED = reconstructVolume(maskAlignedED1,maskAlignedED2,maskAlignedED3,res);
volMean = (volED1+volED2+volES3)/3;
strokeVol = volED-volES;
EF = strokeVol/volED*100;
EFvec = [EFvec; EF]; %add EF to output (column) vector
Note that it's legal to have the same name for an input and an output variable. Now, when you call your function (from MATLAB or from another function) each time, you add the vector to its call, like this:
EFvec = []; % initialize with empty vector
for k = 1:ndata % simulate several calls
    inputvar = inputvarvector(k); % meaning that the input changes
    EFvec = strokefraction(inputvar, EFvec);
end
and you will see that the size of EFvec grows from call to call, saving the output from each run. If you want to save several variables or arrays, do the same (for arrays, you can always introduce an input/output array with one more dimension for this purpose, but you probably have to use explicit indexing instead of just shoving the next EF value to the bottom of your vector).
Note that if your input/output array eventually grows large, it will cost you a lot of time to keep allocating the necessary memory in small chunks. You could then preallocate the EFvec (or equivalent) array instead of initializing it to [], and introduce a counter variable telling you where to write the next data points.
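A minimal sketch of that preallocation pattern, assuming strokefraction is changed to return the scalar EF directly (ndata and inputvarvector are the same placeholders as above):

EFvec = zeros(ndata, 1);           % preallocate the whole output vector once
for k = 1:ndata                    % k doubles as the counter for where to write next
    inputvar = inputvarvector(k);
    EFvec(k) = strokefraction(inputvar);
end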
Disclaimer: what I said about the workspace of functions is only true for local variables. You could also define a global EFvec in your function and in your base workspace, and then you don't have to pass it in and out of the function. As I haven't yet seen a problem which actually needed the use of global variables, I would avoid this option. Then you also have persistent variables, which are basically globals with their scope limited to their own workspace (run help global and help persistent in MATLAB if you'd like to know more; these help pages are surprisingly informative compared to the usual help entries).
Assuming your program is a standalone (deployed) program
While I don't have any experience with standalone MATLAB programs, it seems to me that it would be hard to do what you want for that. A MathWorks Support answer suggests that you can pass variables to standalone programs, but only as you would pass to a shell script. By this I mean that you have to pass filenames or explicit numbers (but this makes sense, as there is no MATLAB workspace in the first place). This implies that in order to keep a cumulative set of output from your program you would probably have to store those in a file. This might not be so painful: opening a file to append the next set of data is straightforward (I don't know about issues such as efficiency, and anyway this all depends on how much data and how many runs of your function we're talking about).
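As a rough sketch of the append-to-a-file idea (results.mat is a placeholder name, and EF stands for the value computed in the current run):

% at the end of the program, after EF has been computed
if exist('results.mat', 'file')
    s = load('results.mat', 'EFvec');   % results accumulated by previous runs
    EFvec = [s.EFvec; EF];
else
    EFvec = EF;                         % first run creates the vector
end
save('results.mat', 'EFvec');           % store the grown vector for the next run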

Loading Variables into Functions from Structs in Matlab

Say I have a project which is comprised of:
A main script that handles all of the running of my simulation
Several smaller functions
A couple of structs containing the data
Within the script I will be accessing the functions many times within for loops (some over a thousand times during the minute-long simulation). Each function also needs data contained in the struct files as part of its calculations; these are usually parameters that are fixed over the course of the simulation, but they need to be varied manually between runs to observe the effects.
As these functions typically form the bulk of the runtime, I'm trying to save time, since my simulation can't quite run in real time as it stands (the ultimate goal), and I lose a lot of time passing variables/parameters around functions. So I've had three ideas to try to do this:
Load the structs in the main simulation, then pass each variable in turn to the function in the form of a large argument (the current solution).
Load the structs every time the function is called.
Define the structs as global variables.
In terms of both the efficiency of the system (most relevant as the project develops) and, since I'm no expert programmer, from a "good practice" perspective, what is the best solution for this? Is there another option that I have not considered?
As mentioned above in the comments, the 1st option is the best one.
Have you used the profiler to find out where your code spends most of its time?
profile on
% run your code
profile viewer
Note: if you are modifying your input structs in your child functions, this will take more time (a copy has to be made); but if you are just referencing them, that should not be a problem.
Matlab does what's known as a "lazy copy" when passing arguments between functions. This means that it passes a pointer to the data to the function, rather than creating a new instance of that data, which is very efficient memory- and speed-wise. However, if you make any alteration to that data inside the subroutine, then it has to make a new instance of that argument so as to not overwrite the argument's value in the main function. Your response to matlabgui indicates you're doing just that. So, the subroutine may be making an entire new struct every time it's called, even though it's only modifying a small part of that struct's values.
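A minimal sketch of that effect, written as a script with local functions (names are placeholders; the copy is made per underlying array, here the large field):

s.bigField = rand(1e7, 1);     % a struct with one ~80 MB field
readOnly(s);                   % no copy: the field is only read inside
modifies(s);                   % the first write inside duplicates the field

function total = readOnly(s)
    total = sum(s.bigField);   % reads only, so it shares memory with the caller
end

function total = modifies(s)
    s.bigField(1) = 0;         % writing forces MATLAB to copy the array first
    total = sum(s.bigField);
end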
If your subroutine is altering a small part of the array, then your best bet is to just pass that small part to it, then assign your outputs. For instance,
modified_array = somesubroutine(my_struct.original_array);
my_struct.original_array = modified_array;
You can also do this in just one line. Conceptually, the less data you pass to the subroutine, the smaller the memory footprint is. I'd also recommend reading up on in-place operations, as it relates to this.
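For instance, the one-line form of the pass-and-reassign above would be:

my_struct.original_array = somesubroutine(my_struct.original_array); % pass and reassign in one step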
Also, as a general rule, don't use global variables in Matlab. I have not personally experienced, nor read of an instance in which they were genuinely faster.

Variable Declaration with the Presence of Nested Functions

Someone on /r/matlab asked me a really interesting question a few days ago related to a Flappy Bird clone submitted to the MATLAB FEX. The poster noticed that if you open the main .m file, stop it in the debugger on the first line, and run a whos(), you see a bunch of variables before they are explicitly defined by the function.
The first thing that I noticed in the editor was the syntax highlighting indicating the presence of nested functions. At a glance, it seems like the variables returned by the whos() are only those that will be defined at some point in the scope of the base function.
You can recreate this with a simpler example:
function testcode
    asdf = 1;
    function testing
        ghfj = 2;
    end
end
If you set a breakpoint on the first line and run a whos(), you get
  Name      Size          Bytes  Class          Attributes
  ans       0x0               0  (unassigned)
  asdf      0x0               0  (unassigned)
I couldn't seem to find anything explaining this behavior in the documentation for nested functions or related topics. I am not a computer scientist and my programming knowledge is limited to MATLAB and a very small sprinkling of Python. Can anybody explain what is going on? Does it have something to do with how MATLAB compiles the code at run time?
The workspace of a function with nested functions is protected. When the function is called, Matlab has to analyze the code to determine which variables are in scope in each part of the function. Remember, variables that are declared in the main function and used in a nested function are shared between the two workspaces, and can be modified within the nested function even if they are not explicitly declared as input or output.
To avoid messing up any of the nested functions, and possibly to help speed things up, Matlab does not allow assigning any additional variables to the workspace of that function. For example, if you stop the execution of the code at line 1, and then try assigning a value to a new variable klmn, Matlab will throw an error. This can be a bit frustrating for debugging, but you can always assign ans, fortunately.
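A short example of that sharing behavior (names are arbitrary):

function outer
    x = 1;          % declared in the parent function's workspace
    bump();         % the nested function modifies x with no inputs or outputs
    disp(x)         % displays 2
    function bump
        x = x + 1;  % this x is the same variable as the one in outer
    end
end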

Matlab: Free memory of class objects

I recently wrote some code using Matlab's OOP. In each class object I save some measurement data as a property and define the methods for evaluating them. With an average data set one single class object uses about 32 MB of memory.
Now I am writing a GUI that should process these objects.
In the first step I load a set of objects from a saved .mat file (about 200 objects, 2 GB on the hard disk) and store them in the handles struct. They fill the RAM and use about 6-7 GB when loaded. This is no problem.
But if I close the GUI, it seems that I can't free the used memory.
I tried different approaches with no success.
Setting the data fields to "empty" in the destructor of the class:
function delete(obj)
obj.timeVector = [];
obj.valueVector = [];
end
Trying to free it in the figure_CloseRequestFcn:
function figure_CloseRequestFcn(hObject, eventdata, handles)
    handles.data = [];
    handles = rmfield(handles, 'data');
    guidata(hObject, handles);
    clear handles;
    pack;  % Matlab issues a warning that pack can only be used from the
           % command line, but that did not work either
    delete(hObject);
end
Any ideas, besides closing Matlab every time after working with the GUI?
I found the answer in the Matlab Bug Report Center. The bug seems to have existed since R2011b.
Summary
Storing objects in MAT-files can cause a memory leak and prevent the object class from being cleared
Description
After storing an instance of a class, 'MyClass', in a MAT-file, calling clear classes may result in the warning:
Warning: Objects of 'MyClass' class exist. Cannot clear this class or any of its superclasses.
This warning persists, even if you have cleared all instances of the class in the workspace.
The warning may occur for one MAT-file format, and not for another.
Workaround
Under some circumstances, switching to a different MAT-file format may eliminate the warning.
http://www.mathworks.ch/support/bugreports/857319
Edit:
I tried older formats for saving, but this does not work either. I get an "Error closing file" (http://www.mathworks.ch/matlabcentral/answers/18098-error-using-save-error-closing-file). So Matlab does not support saving class objects that well. I will have to live with the memory issues then and restart Matlab after every use of the GUI.
Based on your memory screenshots, there is definitely memory that is not being cleared. There is a small chance that you have found a fundamental flaw in Matlab's garbage collection, but it is much more likely that the ~6 GB of memory-resident data is still reachable via some chain of references. Based on personal experience, here are a few ways that memory which you thought was cleared can still be reachable:
Timer objects: If one of the callback functions of a timer references this data (or a copy), then that data is still reachable. You need to call delete(t) on that timer.
Persistent variables in functions: I often cache data in a persistent variable within a function; this clearly allows access to that data in the future, so it will not be cleared. You need to call clear FUNCTIONNAME to clear the associated persistent variables (see the sketch after this list).
In GUI objects, as either data or within callback functions: The figures and any persistents need to be cleared.
Any static methods or constant attributes in classes which can retain data. These can either be cleared individually within the class, or by force using clear CLASSNAME.
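As a rough illustration of the first two items (all names are hypothetical): the timer's callback keeps its captured data alive until the timer is deleted, and the persistent variable keeps its data alive until the function is cleared.

function leakDemo
    bigData = rand(1e6, 1);                              % some large array
    t = timer('TimerFcn', @(~,~) disp(mean(bigData)));   % the anonymous callback captures bigData
    clear bigData            % the local copy is gone, but the callback still holds one
    delete(t)                % releases the timer and the captured copy

    cacheData(rand(1e6, 1)); % stores another copy in a persistent variable
    clear cacheData          % clearing the function releases that copy
end

function cacheData(newData)
    persistent cachedData    % survives between calls until the function is cleared
    cachedData = newData;
end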
Some tips for finding stale links to data (again, based on personal mistakes):
Look at the exact number of bytes being lost after each call, using x = memory; to get an exact count (see the sketch after these tips). Is it consistent? Is it a number that you recognize? Sometimes I can find the leak after realizing that it is exactly 238263232 bytes, and therefore a 29782904-element double array, which must be from function xyz.
See which objects are actually being deleted. Within your delete(obj) function add a detailed display of which objects are being deleted and, by inference, which are not. For a given non-deleted object, where could it still be referenced from? You should not need to clear data in the delete(obj) function like you are doing; Matlab should handle that for you. Use the delete function as a debugging tool instead.
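A minimal sketch of the byte-counting tip above; note that the memory function is only available on Windows, and runMyGui is a placeholder for whatever call appears to leak:

m1 = memory;                                   % snapshot before
runMyGui();                                    % placeholder for the leaking workflow
m2 = memory;                                   % snapshot after closing the GUI
leaked = m2.MemUsedMATLAB - m1.MemUsedMATLAB;  % bytes still held by MATLAB
fprintf('Bytes not released: %d (%d doubles)\n', leaked, round(leaked/8));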
Matlab has a garbage collector so you don't need to manually manage memory. After closing the GUI, all the memory will be freed except for what is in your workspace. You can clear the workspace variables using clear.
One thing I've noticed on Windows (not sure about other platforms) is that Matlab's GUI sometimes retains extra memory (maybe 100 MB, but not multiple GB like you are seeing). Simply minimizing and then restoring the GUI will free this excess memory.

MATLAB Magical Mystery timing behavior

I am experiencing some very odd timing behavior from a function I wrote. If I wrap my function inside another empty container function, it gets a 3x speedup.
>> tic; foo(args); toc
time elapsed: ~140 seconds
>> tic; bar(args); toc
time elapsed: ~35 seconds
Here's the kicker - the definition of bar():
function bar(args)
foo(args)
end
Is there some sort of optimization that gets triggered in MATLAB for nested function calls? Should I be adding a dummy function to every function that I write?
The JIT accelerator does not operate on command line expressions as far as I know. Thus, when you run "tic; foo(args); toc" foo's code runs entirely in the MATLAB interpreter. However, when you run "tic; bar(args); toc", bar is evaluated in the interpreter and the JIT accelerator takes a shot at compiling the call to foo() to native code.
I'm really waving my hands over the details, but that's the gist of it. Details for MATLAB's JIT capabilities are hard to come by; most of what I've found is on Loren's blog at The MathWorks. The closest authoritative statement I can find about the command line being interpreter-only is here:
http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/#comment-207
This is surprising behavior. An intermediate function call should not speed things up like that.
Try profiling it and see where it's spending its time. This is the best first resort to almost any "Why is my Matlab code slow?" question.
clear all
profile on -timer real
foo(args);
profile report
% read the report and save a screencap
clear all
profile clear
profile on -timer real
bar(args);
profile report
There ends the advice. Here starts the speculation.
There are a couple of things that are different in the two calls. There is workspace interaction: calling foo() from the command line may leave the variable "ans" populated in your base workspace. When called from bar(), ans will be set but then immediately cleared when bar() returns. Also, foo() may be using evalin()/assignin() to look into workspaces up the call stack, and it may interact with variables assigned in your base workspace. The bar() function has a clean workspace.
Depending on where bar.m is, it may actually be invoking a different foo(), or maybe resolving it slightly differently. Check your path resolution with "which foo" in both contexts.
Depending on how "args" is defined, different inputname()s may be visible to foo.
Also, foo() may contain pathological code that checks whether it is being called from the base workspace, or even whether it's being called by a function of a particular name, and behaves differently based on that.
That said, these should mostly be minor interactions and shouldn't cause a slowdown of that order. I'd suspect something else was going on, maybe just exposed by slightly different calling contexts. Adding a level of indirection with bar() shouldn't be the answer. See what the profiler has to say and go from there. Exact code to reproduce will help a lot in getting assistance from the community.
I don't know if you have tried running your code multiple times, but one potential explanation I've noticed is that the very first run of a newly updated file is usually slower than subsequent runs (I assume due to compiling). I'm guessing you may see different timing for the third line of the following (called after modifying foo):
tic; foo(args); toc; % First call of foo
tic; bar(args); toc; % Second call of foo inside bar
tic; foo(args); toc; % Third call of foo
Have you tried foo a second time without clearing variables? I'm unable to reproduce this performance increase if I run it repeatedly. Otherwise, it does seem faster, but only because MATLAB precompiles these functions once you have run them.
function barfoo
for i = 1:1e8 % a long, empty loop as the timing workload
end
end
And,
function foobar
barfoo();
end