I am running code which calls lsqnonlin() 1000 times. I profiled the code and found that optimoptions() was taking about 40% of the time. Instead, I set opt1 = optimoptions() once and passed opt1 as an argument to the function running lsqnonlin(), and I saw a performance improvement.
What is taking optimoptions() so long?
OK, let's take a look at what the optimoptions function does internally:
open optimoptions
It seems like the core part of the function is represented by the following line:
options = optim.options.createSolverOptions(solverName, varargin{:});
Let's take a look at the createSolverOptions function too:
open optim.options.createSolverOptions
Bingo! A comment in the code reveals the reason why optimoptions, if called multiple times within a loop, may have a huge impact on the overall performance:
We check to see if there is an installation of Global Optimization Toolbox here. Furthermore, we assume that these toolbox files will not be removed between calls to this function.
Note that we do not perform a license check here to see if a user can create a set of Global Optimization toolbox options. To ensure the license check is correct, we have to check for license every time this function is called. This is very expensive if optimoptions is called multiple times in a tight loop.
As such, we rely on the license manager to throw an error in the case where the user has a Global Optimization Toolbox installation, but no license is available.
The creation of the options object is drastically slowed down by a license check included in the code. On top of that, the optimoptions function and its subfunctions contain a lot of try/catch blocks, regex extractions and transformations inside for loops, and other pieces of code that are not exactly performance-friendly.
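In practice, the workaround from the question is the right fix: pay the optimoptions cost once and reuse the object. A minimal sketch of that pattern (residualFcn, x0, lb and ub are hypothetical placeholders for your problem, not taken from the original code):
opts = optimoptions('lsqnonlin', 'Display', 'off');   % one-time cost, outside the loop
x = zeros(1000, numel(x0));
for k = 1:1000
    % residualFcn, x0, lb, ub are stand-ins for your actual problem data
    x(k, :) = lsqnonlin(@(p) residualFcn(p, k), x0, lb, ub, opts);
end
If individual calls need different settings, modify the existing object (e.g. opts.MaxIterations = 50) instead of calling optimoptions again.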
Related
Does the execution of addpath/rmpath/savepath in one MATLAB instance affect other instances?
Motivation: Imagine that you are developing a MATLAB package, which provides a group of functions to the users. You have multiple versions of this package being developed on a single laptop. You would like to test these different versions in multiple instances of MATLAB:
You open one MATLAB window, type run_test(DIRECTORY_OF_PACKAGE_VERSION1), and hit enter;
While the first test is running, you open another MATLAB window, type run_test(DIRECTORY_OF_PACKAGE_VERSION2), and hit enter.
See the pseudo-code below for a better idea about the tests.
No code or data is shared between different tests --- except for those embedded in MATLAB, as the tests are running on the same laptop, using the same installation of MATLAB. Below is a piece of pseudo-code for such a scenario.
% MATLAB instance 1
run_test(DIRECTORY_OF_PACKAGE_VERSION1);
% MATLAB instance 2
run_test(DIRECTORY_OF_PACKAGE_VERSION2);
% Code for the tests
function run_test(package_directory)
setup_package(package_directory);
RUN EXPERIMENTS TO TEST THE FUNCTIONS PROVIDED BY THE PACKAGE;
uninstall_package(package_directory);
end
% This is the setup of the package that you are developing.
% It should be called as a black box in the tests.
function setup_package(package_directory)
addpath(PATH_TO_THE_FUNCTIONS_PROVIDED_BY_THE_PACKAGE);
% Make the package available in subsequent MATLAB sessions
savepath;
end
% The function that uninstalls the package: remove the paths
% added by `setup_package` and delete the files etc.
function uninstall_package(package_directory)
rmpath(PATH_TO_THE_FUNCTIONS_PROVIDED_BY_THE_PACKAGE);
savepath;
end
You want to make sure of the following:
The tests do not interfere with each other;
Each test calls functions from the correct version of the package.
Hence here come our questions.
Questions:
1. Does the execution of addpath, rmpath, and savepath in one MATLAB instance affect the other instance, sooner or later?
2. More generally, what kind of commands executed in one MATLAB instance can affect the other instance?
3. What if I am running only one instance of MATLAB, but invoke a parfor loop with two loops running in parallel? Does the execution of addpath/rmpath/savepath in one loop affect the other loop, sooner or later? In general, what kind of commands executed in one parallel loop can affect the other loop? (As pointed out by @Edric, this can be complicated, so let us not worry about it. Thank you, @Edric.)
Thank you very much for any comments and insights. It would be much appreciated if you could direct me to relevant sections in the official documentation of MATLAB --- I did some searching in the documentation, but have not found an answer to my question.
BTW, in case you find that the test described in the pseudo code is conducted in a wrong/bad manner, I will be very grateful if you could recommend a better way of doing it.
The documentation page for the MATLAB Search Path specifies at the bottom:
When you change the search path, MATLAB uses it in the current session, but does not update pathdef.m. To use the modified search path in the current and future sessions, save the changes using savepath or the Save button in the Set Path dialog box. This updates pathdef.m.
So, standard MATLAB sessions are "isolated" in terms of their MATLAB Search Path unless you use savepath. After a call to savepath, new MATLAB sessions will read the updated pathdef.m on startup.
The situation with a parallel pool is slightly more complex. There are a couple of things that affect this. First is the parameter AutoAddClientPath that you can specify for the parpool command. When true, an attempt is made to reflect the desktop MATLAB's path on the workers. (This might not work if the workers cannot access the same folders).
When a parallel pool is running, any changes to the path on the desktop MATLAB client are sent to the workers, so they can attempt to add or remove path entries. Parallel pool workers calling addpath or rmpath do so in isolation. (I'm afraid I can't find a documentation reference for this).
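Given that, one way to make the tests in the pseudo-code safer is to skip savepath entirely and keep the path change local to the session, restoring it automatically when the test finishes. A minimal sketch under the assumption that the package functions live directly in package_directory (onCleanup and the output form of addpath are standard MATLAB):
function run_test(package_directory)
    % addpath returns the previous search path, so we can restore it exactly
    oldPath = addpath(package_directory);
    restorePath = onCleanup(@() path(oldPath));   % runs even if the experiments error
    % RUN EXPERIMENTS TO TEST THE FUNCTIONS PROVIDED BY THE PACKAGE
end
Because nothing is written to pathdef.m, the two MATLAB instances cannot influence each other through the search path.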
For the use case of being able to abort parallel simulations from a MATLAB GUI, I would like to stop all scheduled simulations after the user presses the Stop button.
All simulations are submitted at once using the parsim command, hence something like a callback to my GUI variables (App Designer) would be the most preferable solution.
Approaches I have considered, but which did not provide a desirable solution:
The Simulation Manager provides the functionality to close simulations using its own interface. If I only had the code its Stop button executes...
parsim uses the Simulink.SimulationInput class as input to run simulations, which allows modifying the preSimFcn at the beginning of each simulation. So far I have not found a way to "skip" a simulation at its initialization phase other than intentionally throwing an error.
Thank you for your help!
Update 1: Using the preSimFcn to set the termination time equal to the start time drastically reduces simulation time. But since the first step is still computed, there has to be a better solution.
simin = simin.setModelParameter('StopTime',get_param(mdl,'StartTime'))
Update 2: Intentionally throwing an error executing the preSimFcn, for example by setting it to
simin = simin.setModelParameter('SimulationCommand','stop')
provides the shortest termination times for me so far. However, it requires catching and identifying the error in the ErrorMessage of the Simulink.SimulationOutput object. As this is exactly the "ugly" implementation I wanted to avoid, the issue is still active.
If you are using R2017b or later, parsim provides a 'RunInBackground' option. It returns an array of Future objects.
F = parsim(in, 'RunInBackground', 'on')
Please note that this is only available for parallel simulations. The Simulink.Simulation.Future object F provides a cancel method which will terminate the simulation. You can use the fetchOutputs method to fetch the output from the simulation.
F.cancel();
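In an App Designer app, the usual pattern would be to keep that future array in an app property when the runs are launched and cancel it from the Stop button callback. A minimal sketch (app.SimFutures and StopButtonPushed are hypothetical names, not part of the parsim API):
% when launching the runs
app.SimFutures = parsim(in, 'RunInBackground', 'on');

% Stop button callback in the App Designer app
function StopButtonPushed(app, event)
    cancel(app.SimFutures);   % terminates the simulations that have not finished
end
cancel is called on the whole array here; if your release only accepts scalar futures, loop over the elements instead.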
I am currently working on a tool written in M-script that executes a set of checks on a given Simulink model. This tool does not compile/execute the model; I'm using find_system and get_param to retrieve all the information I need in order to run the routines of my tool.
I've reached a point where I need to determine whether a certain block has direct-feedthrough or not. I am not entirely sure how to do this. Two possible solutions come to mind:
A property might store this information and might be accessible via get_param. After investigating this, I could not find any such property.
Some block types have direct feedthrough (Sum, Logic, ...), some others do not (Unit Delay, Integrator), so I could use the block type to determine whether a block has direct feedthrough or not. Since I'm not an experienced Simulink modeller, I'm not sure if it's possible to tell whether a block has direct feedthrough by solely looking at its block type. Also, this would require a lookup table including all Simulink block types, an impossible task, since additional block types might get added to Simulink via third-party modules.
Any help or pointers to possible solutions are greatly appreciated.
After some further research...
There is an "official solution" from MathWorks:
Just download the linked m-file.
It shows that my idea was not that bad ;)
And for the record, my idea:
I think it's doable quite easily. I cannot present you with code yet, but I'll see what I can do. My idea is the following:
Programmatically create a new model.
Add a Constant source block and a Terminator.
Add the block whose direct-feedthrough behaviour you want to determine in between.
Connect the blocks with add_line.
Run the simulation and log the states, which will give you the xout variable in the workspace.
If there is direct feedthrough, the vector is empty; otherwise it is not.
You will probably need some try/catch error handling for special cases.
This way you can analyse a block for direct feedthrough by just migrating it to another model, without compiling your actual main model. It's not the fastest solution, but I cannot imagine that performance matters that much for you.
Here we go, this script works fine for my examples:
function feedthrough = hasfeedthrough( input )
% get block path
blockinfo = find_system('simulink','Name',input);
blockpath = blockinfo{1};
% create new system
new_system('feed');
open_system('feed');
% add test model elements
src = add_block('simulink/Sources/Constant','feed/Constant');
src_ports = get_param(src,'PortHandles');
src_out = src_ports.Outport;
dest = add_block('simulink/Sinks/To Workspace','feed/simout');
dest_ports = get_param(dest,'PortHandles');
dest_in = dest_ports.Inport;
test = add_block(blockpath,'feed/test');
test_ports = get_param(test,'PortHandles');
test_in = test_ports.Inport;
test_out = test_ports.Outport;
add_line('feed',src_out,test_in);
add_line('feed',test_out,dest_in);
% setup simulation
set_param('feed','StopTime','0.1');
set_param('feed','Solver','ode3');
set_param('feed','FixedStep','0.05');
set_param('feed','SaveState','on');
% run simulation and get states
sim('feed');
% if condition for blocks like state space
feedthrough = isempty(xout);
if ~feedthrough
a = simout.data;
if ~any(a == xout)
feedthrough = ~feedthrough;
end
end
% delete system
close_system('feed',1)
delete('feed');
end
When you enter, for example, 'Gain', it will return 1; when you enter 'Integrator', it will return 0.
Execution time on my ancient machine is 1.3 sec, not that bad.
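For reference, a call looks like this (the block name is resolved inside the function via find_system on the Simulink library, as in the code above):
feedthrough = hasfeedthrough('Gain')        % returns 1
feedthrough = hasfeedthrough('Integrator')  % returns 0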
Things you probably still have to do:
Add another parameter to define whether the block is continuous or discrete-time, and set the solver accordingly.
Test some "extraordinary" blocks; maybe it's not working for everything. Also, I haven't implemented anything that could deal with logic, but as the constant is 1 it should work as well.
Just try out everything, at least it's a good base for you to work on.
A famous exception is the State-Space block, which can have direct feedthrough AND states. But there are not that many standard blocks with this "behaviour". If you also have to deal with third-party blocks you could get into some trouble, I have to admit that.
A possible solution for the State-Space case: if one compares xout with yout, one finds another indicator for direct feedthrough: if there is direct feedthrough, the vectors are not equal; if there is not, they are equal. This is just one example, but I can imagine that it is possible to find more general ways to test things like that.
Besides the added simout block above, one needs this condition:
% if condition for blocks like state space
feedthrough = isempty(xout);
if ~feedthrough
a = simout.data;
if ~any(a == xout)
feedthrough = ~feedthrough;
end
end
From the documentation:
Tip
To determine if a block has direct feedthrough:
1. Double-click the block. The block parameter dialog box opens.
2. Click the Help button in the block parameter dialog box. The block reference page opens.
3. Scroll to the Characteristics section of the block reference page, which lists whether or not that block has direct feedthrough.
I couldn't find a programmatic equivalent though...
Based on a similar approach to the one by @thewaywewalk, you could set up a temporary model that contains an algebraic loop around the block under test.
(Note that you would replace the State-Space block with any block that you want to test.)
Then set the model diagnostics to error out if there is an algebraic loop.
If an error occurs when the model is compiled
>> modelname([],[],[],'compile');
(and you should check that it is the Algebraic Loop error that has occurred), then the block has direct feedthrough.
If no error occurs, then the block does not have direct feedthrough.
At this point you would need to terminate the model using
>> modelname([],[],[],'term');
If the block has multiple inports or outports, then you'll need to iterate over all combinations of them.
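A rough sketch of how that check could be scripted, assuming a temporary model named 'feedtest' has already been built with such an algebraic loop around the block under test (the model name and the message-text check are assumptions for illustration, not a tested recipe):
set_param('feedtest', 'AlgebraicLoopMsg', 'error');   % make algebraic loops fatal
hasDirectFeedthrough = false;
try
    feedtest([],[],[],'compile');                     % compile only, no simulation
    feedtest([],[],[],'term');                        % release the model again
catch err
    % Matching the message text is fragile; if you know the exact diagnostic
    % identifier in your release, compare err.identifier instead.
    if contains(lower(err.message), 'algebraic loop')
        hasDirectFeedthrough = true;
    else
        rethrow(err);
    end
end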
I have a function that's taking a long time to run. When I profile it, I find that over half the time (26 out of 50 seconds) is not accounted for in the line-by-line timing breakdown, and I can show that the time is spent after the function finishes running but before it returns control, using the following method:
ts1 = tic;
disp ('calling function');
functionCall(args);
disp (['control returned to caller - ', num2str(toc(ts1))]);
The first line of the function I call is ts2 = tic, and the last line is
disp (['last line of function- ', num2str(toc(ts2))]);
The result is
calling function
last line of function - 24.0043
control returned to caller - 49.857
Poking around on the interwebs, I think this is a symptom of the way MATLAB manages memory. It deallocates on function returns, and sometimes this takes a long time. The function does allocate some large (~1 million element) arrays. It also works with handles, but does not create any new handle objects or store handles explicitly. My questions are:
Is this definitely a memory management problem?
Is there any systematic way to diagnose what causes a problem in this function, as opposed to others which return quickly?
Are there general tips for reducing the amount of time MATLAB spends cleaning up on a function exit?
You are right, it seems to be time spent on garbage collection. I am afraid it is a fundamental MATLAB flaw; it has been known for years, but MathWorks has not solved it even in the newest MATLAB version, 2010b.
You could try setting variables manually to [] before leaving the function, i.e. doing the garbage collection manually. This technique also helps against memory leaks in previous MATLAB versions. MATLAB will then spend the time not on end but on myVar = [];
You could alleviate the problem by working without any kind of references: no anonymous functions, no nested functions, no handle classes, and no cellfun or arrayfun.
If you have arrived at the "performance barrier" of MATLAB, then maybe you should simply change the environment. I do not see any sense in starting a new project in MATLAB today, except if you are using Simulink. Python rocks for technical computing, and with C# you can also do many of the things MATLAB does, using free libraries. And both are real programming languages and are free, unlike MATLAB.
I discovered a fix to my specific problem that may be applicable in general.
The function that was taking a long time to exit was called on a basic object that contained a vector of handle objects. When I changed the definition of the basic object to extend handle, I eliminated the lag on the close of the function.
What I believe was happening is this: When I passed the basic object to my function, it created a copy of that object (MATLAB is pass by value by default). This doesn't take a lot of time, but when the function exited, it destroyed the object copy, which caused it to look through the vector of handle objects to make sure there weren't any orphans that needed to be cleaned up. I believe it is this operation that was taking MATLAB a long time.
When I changed the object I was passing to a handle, no copy was made in the function workspace, so no cleanup of the object was required at the end.
This suggests a general rule to me:
If a function is taking a long time to clean up its workspace on exit and you are passing a lot of data or complex structures by value, try encapsulating the arguments to that function in a handle object.
This will avoid the duplication and hence the time-consuming cleanup on exit. The downside is that your function can now unexpectedly change your inputs, because MATLAB doesn't have the ability to declare an argument const, as in C++.
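A minimal sketch of that rule (DataBundle is a hypothetical class name; the classdef must live in its own DataBundle.m file on the path):
% DataBundle.m
classdef DataBundle < handle
    properties
        bigArray       % e.g. the ~1 million element arrays from the question
        handleVector   % the vector of handle objects
    end
end

% caller
bundle = DataBundle();
bundle.bigArray = zeros(1, 1e6);
functionCall(bundle);   % only a handle is copied, so nothing big is destroyed on return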
A simple fix could be this: pre-allocate the large arrays and pass them as arguments to your functionCall(). This moves the deallocation issue back to the caller of functionCall(), but it could be that you are calling functionCall more often than its parent, in which case this will speed up your code.
workArr = zeros(1,1e6); % allocate once
...
functionCall(args,workArr); % call with extra argument
...
functionCall(args,workArr); % call again, no realloc of workArr needed
...
Inside functionCall you can take care of initializing and/or re-setting workArr, for instance
workArr(:) = 0; % reset work array
I am experiencing some very odd timing behavior from a function I wrote. If I wrap my function inside another empty container function, it gets a 3x speedup.
>> tic; foo(args); toc
time elapsed: ~140 seconds
>> tic; bar(args); toc
time elapsed: ~35 seconds
Here's the kicker - the definition of bar():
function bar(args)
foo(args)
end
Is there some sort of optimization that gets triggered in MATLAB for nested function calls? Should I be adding a dummy function to every function that I write?
The JIT accelerator does not operate on command-line expressions as far as I know. Thus, when you run "tic; foo(args); toc", foo's code runs entirely in the MATLAB interpreter. However, when you run "tic; bar(args); toc", bar is evaluated in the interpreter and the JIT accelerator takes a shot at compiling the call to foo() to native code.
I'm really waving my hands over the details, but that's the gist of it. Details for MATLAB's JIT capabilities are hard to come by; most of what I've found is on Loren's blog at The MathWorks. The closest authoritative statement I can find about the command line being interpreter-only is here:
http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/#comment-207
This is surprising behavior. An intermediate function call should not speed things up like that.
Try profiling it and see where it's spending its time. This is the best first resort for almost any "Why is my MATLAB code slow?" question.
clear all
profile on -timer real
foo(args);
profile report
%read the report and save a screencap
clear all
profile clear
profile on -timer real
bar(args);
profile report
There ends the advice. Here starts the speculation.
There are a couple of things that are different in the two calls. There is workspace interaction. Calling foo() from the command line may leave the variable "ans" populated in your workspace. When called from bar(), ans will be set but then immediately cleared when bar() returns. Also, foo() may be using evalin()/assignin() to look into workspaces up the call stack, and it may interact with variables assigned in your base workspace. The bar() function has a clean workspace.
Depending on where bar.m is, it may actually be invoking a different foo(), or maybe resolving it slightly differently. Check your path resolution with "which foo" in both contexts.
Depending on how "args" is defined, different inputname()s may be visible to foo.
Also, foo() may contain pathological code that checks whether it is being called from the base workspace, or even whether it's being called by a function of a particular name, and behaves differently based on that.
That said, these should mostly be minor interactions and shouldn't cause a slowdown of that order. I'd suspect something else was going on, maybe just exposed by slightly different calling contexts. Adding a level of indirection with bar() shouldn't be the answer. See what the profiler has to say and go from there. Exact code to reproduce will help a lot in getting assistance from the community.
I don't know if you have tried running your code multiple times, but one potential explanation I've noticed is that the very first run of a newly updated file is usually slower than subsequent runs (I assume due to compiling). I'm guessing you may see different timing for the third line of the following (called after modifying foo):
tic; foo(args); toc; % First call of foo
tic; bar(args); toc; % Second call of foo inside bar
tic; foo(args); toc; % Third call of foo
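If first-run compilation is the suspect, a hedged way to compare the two calls on an equal footing is timeit, which warms the function up and reports a representative time over several runs (this assumes foo and bar return no outputs and that args already exists in the workspace):
tFoo = timeit(@() foo(args), 0);   % foo called directly, zero output arguments
tBar = timeit(@() bar(args), 0);   % foo called through the bar wrapper
fprintf('foo: %.3f s, bar: %.3f s\n', tFoo, tBar);
If the two numbers come out close, the original 3x gap was most likely a warm-up or workspace effect rather than a genuine optimization of wrapped calls.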
Have you tried foo a second time without clearing variables? I'm unable to reproduce this performance increase if I run it repeatedly. Otherwise, it does seem faster, but that's only because MATLAB precompiles these functions if you run them once.
function barfoo
for i = 1:Inf
end
end
And,
function foobar
barfoo();
end