I am experiencing some very odd timing behavior from a function I wrote. If I wrap my function inside another empty container function, it gets a 3x speedup.
>> tic; foo(args); toc
time elapsed: ~140 seconds
>>tic; bar(args); toc
time elapsed: ~35 seconds
Here's the kicker - the definition of bar():
define bar(args)
foo(args)
end
Is there some sort of optimization that gets triggered in MATLAB for nested function calls? Should I be adding a dummy function to every function that I write?
The JIT accelerator does not operate on command line expressions as far as I know. Thus, when you run "tic; foo(args); toc" foo's code runs entirely in the MATLAB interpreter. However, when you run "tic; bar(args); toc", bar is evaluated in the interpreter and the JIT accelerator takes a shot at compiling the call to foo() to native code.
I'm really waving my hands over the details, but that's the gist of it. Details for MATLAB's JIT capabilities are hard to come by; most of what I've found is on Loren's blog at The MathWorks. The closest authoritative statement I can find about the command line being interpreter-only is here:
http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/#comment-207
This is surprising behavior. An intermediate function call should not speed things up like that.
Try profiling it and see where it's spending its time. This is the best first resort to almost any "Why is my Matlab code slow?" question.
clear all
profile on -timer real
foo(args);
profile report
%read the report and save a screencap
clear all
profile clear
bar(args);
profile report
There ends the advice. Here starts the speculation.
There are a couple things that are different in the two calls. There is workspace interaction. Calling foo() from the command line may leave the variable "ans" populated in your workspace. When called from bar(), ans will be set but then immediately cleared when bar() returns. Also, the foo() may be using evalin()/assignin() to look into workspaces up the call stack, and it may interact with variables assigned in your base workspace. The bar() function has a clean workspace.
Depending on where bar.m is, it may actually be invoking a different foo(), or maybe resolving it slightly differently. Check your path resolution with "which foo" in both contexts.
Depending on how "args" is defined, different inputname()s may be visible to foo.
Also, foo() may contain pathological code that checks whether it is being called from the base workspace, or even whether it's being called by a function of a particular name, and behaves differently based on that.
That said, these should mostly be minor interactions and shouldn't cause a slowdown of that order. I'd suspect something else was going on, maybe just exposed by slightly different calling contexts. Adding a level of indirection with bar() shouldn't be the answer. See what the profiler has to say and go from there. Exact code to reproduce will help a lot in getting assistance from the community.
I don't know if you have tried running your code multiple times, but one potential explanation I've noticed is that the very first run of a newly updated file is usually slower than subsequent runs (I assume due to compiling). I'm guessing you may see different timing for the third line of the following (called after modifying foo):
tic; foo(args); toc; % First call of foo
tic; bar(args); toc; % Second call of foo inside bar
tic; foo(args); toc; % Third call of foo
Have you tried foo a second time without clearing variables? I'm unable to reproduce this performance increase if I run it repeatedly. Else, it does seem faster but that's only because MATLAB does precompile these functions if you run them once.
function barfoo
for i = 1:Inf
end
end
And,
function foobar
barfoo();
end
Related
I am running code which calls lsqnonlin() 1000 times. I profiled the code and found that optimoptions() was taking about 40% of the time. Instead, I set the opt1 = optimotpions() once and passed opt1 as an argument to the function running lsqnonlin() and I saw performance improvement.
What is taking optimoptions() that long?
Ok, let's take a look at what optimoptions function does internally:
open optimoptions
It seems like the core part of the function is represented by the following line:
options = optim.options.createSolverOptions(solverName, varargin{:});
Let's take a look at the createSolverOptions function too:
open optim.options.createSolverOptions
Bingo! A comment in the code reveals the reason why optimoptions, if called multiple times within a loop, may have a huge impact on the overall performance:
We check to see if there is an installation of Global Optimization Toolbox here. Furthermore, we assume that these toolbox files will not be removed between calls to this function.
Note that we do not perform a license check here to see if a user can create a set of Global Optimization toolbox options. To ensure the license check is correct, we have to check for license every time this function is called. This is very expensive if optimoptions is called multiple times in a tight loop.
As such, we rely on the license manager to throw an error in the case where the user has a Global Optimization Toolbox installation, but no license is available.
The process of option objects creation is drastically slowed down by a license check included within the code. On the top of that, the optimoptions function and its subfunctions include a lot of try-catch blocks, regex extractions and transformations within for loops, and other nice pieces of code that are not very performance-friendly.
Someone on /r/matlab asked me a really interesting question a few days ago related to a Flappy Bird clone submitted to the MATLAB FEX. The poster noticed that if you open the main .m file, stop it in the debugger on the first line, and run a whos(), you see a bunch of variables before they are explicitly defined by the function.
The first thing that I noticed in the editor was the syntax highlighting indicating the presence of nested functions. At a glance, it seems like the variables returned by the whos() are only those that will be defined at some point in the scope of the base function.
You can recreate this with a simpler example:
function testcode
asdf = 1;
function testing
ghfj = 2;
end
end
If you set a breakpoint on the first line and run a whos(), you get
Name Size Bytes Class Attributes
ans 0x0 0 (unassigned)
asdf 0x0 0 (unassigned)
I couldn't seem to find anything explaining this behavior in the documentation for nested functions or related topics. I am not a computer scientist and my programming knowledge is limited to MATLAB and a very small sprinkling of Python. Can anybody explain what is going on? Does it have something to do with how MATLAB compiles the code at run time?
The workspace of a function with nested function is protected. When the function is called, Matlab has to analyze the code to determine which variables are in scope at what part of the function. Remember, variables that are declared in the main function and that are used in a nested function are passed by reference, and can be modified within the nested function even if not explicitly declared as input or output.
To avoid messing up any of the nested functions, and possibly to help speed things up, Matlab does not allow assigning any additional variables to the workspace of that function. For example, if you stop the execution of the code at line 1, and then try assigning a value to a new variable klmn, Matlab will throw an error. This can be a bit frustrating for debugging, but you can always assign ans, fortunately.
I have a problem that I'm having a hard time even framing for this question's title.
I have a library that calculates the properties of refrigerants. For example, you give pressure and enthalpy, and it tells you the temperature. That library is coded in Fortran with a mex file to interface with Matlab. Now, I am 100% sure that library is thoroughly debugged (it was coded by people much smarter than me, and has been used for almost a decade). The problem is definitely in how I call it.
And that problem is this. I call the library from a StartFcn callback (a .m script file) in a subsystem of a simulink model. The first time I run this model, it runs perfectly. The values I'm sedning to the function are therefore correct. The second time I run it, however, it crashes. The inputs both times are exactly the same.
Also, if I do a clear all between the two runs, then there is no crash. But if I do only clearvars or clear, I still get a crash. When I debug and look at the variables being passed in the function call, they are valid and the same both times.
Does someone have any experience with this, or can advise me on what I might be doing wrong? Is there something persisting within the function call that only clear all can remove and not clear? Save My Soul!
Yes, persistent variables can be declared by the persistent keyword.
If you want to clear only those, try
clear StartFcn
to clear all variables of the function StartFcn. A quote from the documentation:
If name is a function name, then clear name reinitializes any persistent variables in the function.
A quick thing to try would be clear mex inbetween simulations - this should clear all the mex code from matlab.
Other questions to think about..
Can you call the fortran interface directly from the matlab command line two times in a row?
I believe that using a m-file sfunction to call fortran in simulink is quite inefficient. Possibly consider writing your own fortran or C sfunction to interface to the code and compile in directly?
in case you're using LoadLibrary to load fortran code compiled into a dll, are you calling FreeLibrary in the mdlTerminate function?
Hope some of this helps.
I would try to put a clear all inside the function that you are calling in the StartFcn Callback.
So let's say your function is:
function [out] = nameoffunction(a,b,c)
%do calculation with a,b,c
d = a + b + c;
%output out
out = d;
assignin('base','out',d)
clear all
And you can call the function:
nameoffunction(a,b,c)
Let me know if it changes something. If this works, you could try other clear command but inside the function.
I've recently learned Haskell, and am trying to carry the pure functional style over to my other code when possible. An important aspect of this is treating all variables as immutable, i.e. constants. In order to do so, many computations that would be implemented using loops in an imperative style have to be performed using recursion, which typically incurs a memory penalty due to the allocation a new stack frame for each function call. In the special case of a tail call (where the return value of a called function is immediately returned to the callee's caller), however, this penalty can be bypassed by a process called tail call optimization (in one method, this can be done by essentially replacing a call with a jmp after setting up the stack properly). Does MATLAB perform TCO by default, or is there a way to tell it to?
If I define a simple tail-recursive function:
function tailtest(n)
if n==0; feature memstats; return; end
tailtest(n-1);
end
and call it so that it will recurse quite deeply:
set(0,'RecursionLimit',10000);
tailtest(1000);
then it doesn't look as if stack frames are eating a lot of memory. However, if I make it recurse much deeper:
set(0,'RecursionLimit',10000);
tailtest(5000);
then (on my machine, today) MATLAB simply crashes: the process unceremoniously dies.
I don't think this is consistent with MATLAB doing any TCO; the case where a function tail-calls itself, only in one place, with no local variables other than a single argument, is just about as simple as anyone could hope for.
So: No, it appears that MATLAB does not do TCO at all, at least by default. I haven't (so far) looked for options that might enable it. I'd be surprised if there were any.
In cases where we don't blow out the stack, how much does recursion cost? See my comment to Bill Cheatham's answer: it looks like the time overhead is nontrivial but not insane.
... Except that Bill Cheatham deleted his answer after I left that comment. OK. So, I took a simple iterative implementation of the Fibonacci function and a simple tail-recursive one, doing essentially the same computation in both, and timed them both on fib(60). The recursive implementation took about 2.5 times longer to run than the iterative one. Of course the relative overhead will be smaller for functions that do more work than one addition and one subtraction per iteration.
(I also agree with delnan's sentiment: highly-recursive code of the sort that feels natural in Haskell is typically likely to be unidiomatic in MATLAB.)
There is a simple way to check this. Create this function tail_recursion_check:
function r = tail_recursion_check(n)
if n > 1
r = tail_recursion_check(n - 1);
else
error('error');
end
end
and run tail_recursion_check(10), for example. You are going to see a very long stack trace with 10 items that says error at line 3. If there were tail call optimization, you would only see one.
I have a function that's taking a long time to run. When I profile it, I find that over half the time (26 out of 50 seconds) is not accounted for in the line by line timing breakdown, and I can show that the time is spent after the function finishes running but before it returns control by the following method:
ts1 = tic;
disp ('calling function');
functionCall(args);
disp (['control returned to caller - ', num2str(toc(ts1))]);
The first line of the function I call is ts2 = tic, and the last line is
disp (['last line of function- ', num2str(toc(ts2))]);
The result is
calling function
last line of function - 24.0043
control returned to caller - 49.857
Poking around on the interwebs, I think this is a symptom of the way MATLAB manages memory. It deallocates on function returns, and sometimes this takes a long time. The function does allocate some large (~1 million element) arrays. It also works with handles, but does not create any new handle objects or store handles explicitly. My questions are:
Is this definitely a memory management problem?
Is there any systematic way to diagnose what causes a problem in this function, as opposed to others which return quickly?
Are there general tips for reducing the amount of time MATLAB spends cleaning up on a function exit?
You are right, it seems to be the time spent on garbage collection. I am afraid it is a fundamental MATLAB flaw, it is known since years but MathWorks has not solved it even in the newest MATLAB version 2010b.
You could try setting variables manually to [] before leaving function - i.e. doing garbage collection manually. This technique also helps against memory leaks in previous MATLAB versions. Now MATLAB will spent time not on end but on myVar=[];
You could alleviate problem working without any kind of references - anonymous functions, nested functions, handle classes, not using cellfun and arrayfun.
If you have arrived to the "performance barrier" of MATLAB then maybe you should simply change the environment. I do not see any sense anyway starting today a new project in MATLAB except if you are using SIMULINK. Python rocks for technical computing and with C# you can also do many things MATLAB does using free libraries. And both are real programming languages and are free, unlike MATLAB.
I discovered a fix to my specific problem that may be applicable in general.
The function that was taking a long time to exit was called on a basic object that contained a vector of handle objects. When I changed the definition of the basic object to extend handle, I eliminated the lag on the close of the function.
What I believe was happening is this: When I passed the basic object to my function, it created a copy of that object (MATLAB is pass by value by default). This doesn't take a lot of time, but when the function exited, it destroyed the object copy, which caused it to look through the vector of handle objects to make sure there weren't any orphans that needed to be cleaned up. I believe it is this operation that was taking MATLAB a long time.
When I changed the object I was passing to a handle, no copy was made in the function workspace, so no cleanup of the object was required at the end.
This suggests a general rule to me:
If a function is taking a long time to clean up its workspace on exiting and you are passing a lot of data or complex structures by value, try encapsulating the arguments to that function in a handle object
This will avoid duplication and hence time consuming cleanup on exit. The downside is that your function can now unexpectedly change your inputs, because MATLAB doesn't have the ability to declare an argument const, as in c++.
A simple fix could be this: pre-allocate the large arrays and pass them as args to your functionCall(). This moves the deallocation issue back to the caller of functionCall(), but it could be that you are calling functionCall more often than its parent, in which case this will speed up your code.
workArr = zeros(1,1e6); % allocate once
...
functionCall(args,workArr); % call with extra argument
...
functionCall(args,wokrArr); % call again, no realloc of workArr needed
...
Inside functionCall you can take care of initializing and/or re-setting workArr, for instance
[workArr(:)] = 0; % reset work array