MATLAB: creating a fast function that closes over a large variable - matlab

I am trying to create a function that stores a very large variable for use each time the function is called. I have a function myfun(x,y) where y is very large, and therefore slow to execute because MATLAB is pass-by-value. However, I only pass the variable y once during execution of a program, creating a closure that is then passed off to another function to call repeatedly:
Y = create_big_matrix();
newfun = #(x) myfun(x,Y);
some_other_fun(newfun); % This calls newfun several times
I assume that each time newfun is called, it copies the stored value of Y to myfun. This seems very inefficient. Is there a better way to implement newfun so that Y is only copied once, when the newfun is created (and maybe when it's passed to some_other_fun)?

MATLAB has copy-on-write mechanisms that prevent a copy of Y when myfun(x,Y) is called, unless it modifies the input. I do not think you need to worry about performance issues in this case, but I would have to see the code for myfun and run tests to verify.
Quoting this post on a MathWorks blog:
Some users think that because MATLAB behaves as if data are passed by value (as opposed to by reference), that MATLAB always makes copies of the inputs when calling a function. This is not necessarily true.
The article goes on to describe that MATLAB has limited abilities to recognize when a variable is modified by a function and avoids copies when possible. See also this SO answer summarizing the "under the hood" operations during copy-on-write as described in an old newsreader post.
To investigate whether a copy is taking place, use format debug to check the data pointer address pr for Y in the calling function, and again inside myfun. If it is the same, no copy is taking place. Set a breakpoint to step through and examine this pointer.

Since Y is not a parameter of the function handle, it is only passed to the function handle once, namely at the time of it's execution - including all fancy copy-on-write optimizations there might be.
So the function_handle gets its own version of Y right at the time of its definition.
During execution of the function_handle Y is not passed to the function_handle at all.
You can visualize that with a easy example:
>> r = rand();
% at this point the function_handle gets lazy copy of r, stored in its own workspace
>> f = #() r;
>> r
r =
0.6423
% what happens to the original r, doesn't matter to the function-handle:
>> clear r
>> f()
ans =
0.6423
An actual copy should only be made if the original Y gets modified / deleted (where it's really copied on delete might be another question).
You can check whether or not Y gets really copied into the function_handle workspace using functions, with the above example (only with a different value for r):
r =
Structure address = c601538
m = 1
n = 1
pr = 4c834a60
pi = 0
0.4464
>> fcn = functions(f);
>> fcn.workspace{1}.r
ans =
Structure address = c6010c0
m = 1
n = 1
pr = 4c834a60
pi = 0
0.4464
You can see that, as discussed in chappjc's answer, the parameter is only lazily copied at this time - so no full copy is created.

Related

deleting a huge matrix after passing its local copy to a function in Matlab

Considering we have a huge matrix called A and we pass it to the function func(A) wherein func I do a set of computations like:
func(A):
B=A;
%% a lot of processes will happen on B here
return B;
end
The fact is that as soon as I pass A to B I would not have anything to do with A anymore in my Matlab session so it takes an unnecessary space in memory. Is it possible to remove its instance in the scope of the script that called func?
Using evalin with the option caller you can evaluate expression clear A:
function A = func(A)
evalin('caller', 'clear A')
A(1) = 5;
end
However we usually don't know the name of the input variable so we can use inputname to get the actual name of the workspace variable:
function A = func(A)
name = inputname(1);
if ~isempty(name)
evalin('caller', ['clear ' name])
end
A(1)=4;
end
1.Here inputname(1) means the actual name of the first argument.
2.Work directly with A because if you copy A into B the function scope will have two copies of A.
If you write your function as
function A = func(A)
% Do lots of processing on A here
and call it as
A = func(A);
Then MATLAB will optimize it such that you work on A in-place, meaning that no copy is made. There is no need to delete A from the workspace.
This behavior is not expressly documented as far as I know, but it is well-known. See for example on Undocumented MATLAB, or on Loren's blog.

MATLAB Using fzero - returns error

I'm trying to use the MATLAB function fzero properly but my program keeps returning an error message. This is my code (made up of two m-files):
friction_zero.m
function fric_zero = friction_zero(reynolds)
fric_zero = 0.25*power(log10(5.74/(power(reynolds,0.9))),-2);
flow.m
function f = flow(fric)
f = 1/(sqrt(fric))-1.873*log10(reynolds*sqrt(fric))-233/((reynolds*sqrt(fric))^0.9)-0.2361;
f_initial = friction_zero(power(10,4));
z = fzero(#flow,f_initial)
The goal is to return z as the root for the equation specified by f when flow.m is run.
I believe I have the correct syntax as I have spent a couple of hours online looking at examples. What happens is that it returns the following error message:
"Undefined function or variable 'fric'."
(Of course it's undefined, it's the variable I'm trying to solve!)
Can someone point out to me what I've done wrong? Thanks
EDIT
Thanks to all who helped! You have assisted me to eventually figure out my problem.
I had to add another file. Here is a full summary of the completed code with output.
friction_zero.m
function fric_zero = friction_zero(re)
fric_zero = 0.25*power(log10(5.74/(power(re,0.9))),-2); %starting value for fric
flow.m
function z = flow(fric)
re = power(10,4);
z = 1/(sqrt(fric))-1.873*log10(re*sqrt(fric))-233/((re*sqrt(fric))^0.9)-0.2361;
flow2.m
f_initial = friction_zero(re); %arbitrary starting value (Reynolds)
x = #flow;
fric_root = fzero(x,f_initial)
This returns an output of:
fric_root = 0.0235
Which seems to be the correct answer (phew!)
I realised that (1) I didn't define reynolds (which is now just re) in the right place, and (2) I was trying to do too much and thus skipped out on the line x = #flow;, for some reason when I added the extra line in, MATLAB stopped complaining. Not sure why it wouldn't have just taken #flow straight into fzero().
Once again, thanks :)
You need to make sure that f is a function in your code. This is simply an expression with reynolds being a constant when it isn't defined. As such, wrap this as an anonymous function with fric as the input variable. Also, you need to make sure the output variable from your function is z, not f. Since you're solving for fric, you don't need to specify this as the input variable into flow. Also, you need to specify f as the input into fzero, not flow. flow is the name of your main function. In addition, reynolds in flow is not defined, so I'm going to assume that it's the same as what you specified to friction_zero. With these edits, try doing this:
function z = flow()
reynolds = power(10,4);
f = #(fric) 1/(sqrt(fric))-1.873*log10(reynolds*sqrt(fric))-233/((reynolds*sqrt(fric))^0.9)-0.2361;
f_initial = friction_zero(reynolds);
z = fzero(#f, f_initial); %// You're solving for `f`, not flow. flow is your function name
The reason that you have a problem is because flow is called without argument I think. You should read a little more about matlab functions. By the way, reynolds is not defined either.
I am afraid I cannot help you completely since I have not been doing fluid mechanics. However, I can tell you about functions.
A matlab function definition looks something like this:
function x0 = f(xGuess)
a = 2;
fcn =#(t) a*t.^3+t; % t must not be an input to f.
disp(fcn);
a = 3;
disp(fcn);
x0 = fsolve(fcn1,xGuess); % x0 is calculated here
The function can then ne called as myX0 = f(myGuess). When you define a matlab function with arguments and return values, you must tell matlab what to do with them. Matlab cannot guess that. In this function you tell matlab to use xGuess as an initial guess to fsolve, when solving the anonymous function fcn. Notice also that matlab does not assume that an undefined variable is an independent variable. You need to tell matlab that now I want to create an anonymous function fcn which have an independent variable t.
Observation 1: I use .^. This is since the function will take an argument an evaluate it and this argument can also be a vector. In this particulat case I want pointwise evaluation. This is not really necessary when using fsolve but it is good practice if f is not a matrix equation, since "vectorization" is often used in matlab.
Observation 2: notice that even if a changes its value the function does not change. This is since matlab passes the value of a variable when defining a function and not the variable itself. A c programmer would say that a variable is passed by its value and not by a pointer. This means that fcn is really defined as fcn = #(x) 2*t.^3+t;. Using the variable a is just a conveniance (constants can may also be complicated to find, but when found they are just a value).
Armed with this knowledge, you should be able to tackle the problem in front of you. Also, the recursive call to flow in your function will eventuallt cause a crash. When you write a function that calls itself like this you must have a stopping criterium, something to tell the program when to stop. As it is now, flow will call ifself in the last row, like z = fzero(#flow,f_initial) for 500 times and then crash. Alos it is possible as well to define functions with zero inputs:
function plancksConstant = h()
plancksConstant = 6.62606957e−34;
Where the call h or h() will return Plancks constant.
Good luck!

MATLAB function passing by reference

I have a class with properties in it (let say the name of the class file is inputvar),
I use it as the input argument for two different functions, which have an exactly the same calculation, but a little bit different code, which I'll explain later.
For the first function (let say the name is myfun1), I wrote the input argument like this:
f = myfun1 (inputvar)
So every time I want to use variables from the class inside the function, I'll have to call inputvar.var1, inputvar.var2, and etc.
For the second function (myfun2), I wrote each variables from the class in the input argument, so it looks like this:
f = myfun2 (inputvar.var1, inputvar.var2, ... etc )
Inside the function, I just have to use var1, var2, and etc, without having to include the name of the class.
After running both functions, I found that myfun2 runs a lot faster than myfun1, about 60% (I used tic-toc).
Can someone explain to me exactly why is that ?
with the reference:
MATLAB uses a system commonly called "copy-on-write" to avoid making a
copy of the input argument inside the function workspace until or
unless you modify the input argument. If you do not modify the input
argument, MATLAB will avoid making a copy. For instance, in this code:
function y = functionOfLargeMatrix(x) y = x(1); MATLAB will not make a
copy of the input in the workspace of functionOfLargeMatrix, as x is
not being changed in that function. If on the other hand, you called
this function:
function y = functionOfLargeMatrix2(x) x(2) = 2; y = x(1);
then x is being modified inside the workspace of functionOfLargeMatrix2, and so a copy must be made.
According to the statement above, when you directly pass a class object and you change any member of this object, a whole copy operation for the class is applied.
On the other side, with giving the class members as separate arguments, copy operation is applied only for the related members modified in the function, resulting in a faster execution.
I found that accessing properties is very slow in Matlab. I have not found a way around it, but some basic ideas are found here: http://blogs.mathworks.com/loren/2012/03/26/considering-performance-in-object-oriented-matlab-code/
But this article talks only about avoiding horrible, abysmal performance. Even with the simplest properties, performance is mediocre at best.
Take the example class from the Mathworks article. I did small test script:
clear all
clc
n = 1e5;
%% OOP way - abysimal
result = zeros(1, n);
tic
for i = 1:n
cyl = SimpleCylinder();
cyl.R = i;
cyl.Height = 10;
result(i) = cyl.volume();
end
toc
%% OOP Vectorized - fair
clear result
tic
cyl = SimpleCylinder();
cyl.R = 1:n;
cyl.Height = 10;
result = cyl.volume();
toc
%% for loop without objects - good
result = zeros(1, n);
tic
for i = 1:n
result(i) = pi .* i.^2 .* 10;
end
toc
%% Vectorized without objects - excellent
clear result
tic
R = 1:n;
result = pi .* R.^2 .* 10;
toc
With these results:
Elapsed time is 6.141445 seconds.
Elapsed time is 0.006245 seconds.
Elapsed time is 0.002116 seconds.
Elapsed time is 0.000478 seconds.
As you can see, every property access is slowing down. Try to vectorize (as always) but even the simple for-loop outperforms the vectorized OOP solution for small n. (On my PC, they break even at 1e7)
Essential message: OOP in Matlab is slow! You pay the price for every property access.
To your question: When you call
myfun2 (inputvar.var1, inputvar.var2, ... etc )
the values are copied. Within the function, you are no longer dealing with classes. Access to variables is fast. However, if you pass the whole class, every access to a property is slow. You can circumvent this by caching all properties in local variables and use these.
If you modify the class to inherit from handle everything gets a bit faster, but the difference is negligible.

Efficient Way to Generate Vector of Points Used in Recursive Scheme

I am implementing the adaptive Simpsons method in Matlab recursively. I wish to store all of the points where function evaluations take place to generate a histogram after integrating. I currently have:
function [S, points] = adsimp(f, a, b, fv, tol, level, points)
...
d = (a+b)*0.25;
e = (a+b)*0.75;
points = [points, d, e];
...
Thus, for every function call, I am increasing the length of points by two. My understanding of Matlab's function input/output scheme is poor. I'd like to know:
1) When the input and output share a variable name, does this use a single variable, or is a local copy made and then returned?
2) If it is a copy, is there a way to pass points by reference and preallocate sufficient memory?
To answer your first question, see here. Most MATLAB variables are passed by value (matrices, etc.) unless it is a handle object (function handle, axis handle etc.) A local copy of an input variable is made only if that variable is altered in the function. ie.
function y = doTheFunc1(x)
x(2) = 17;
y = x;
a copy must be made. As opposed to:
function y = doTheFunc2(x)
y = x(1);
where no copy need be made inside the function. In other words, MATLAB is a "copy on write" language. I am almost certain this is true regardless what your output variable output name is (ie. this holds even if your output and input are both named x).
To answer your second question, look at the first answer here. Consider using a nested function or a handle object.

Matlab function handle workspace shenanigans

In short: is there an elegant way to restrict the scope of anonymous functions, or is Matlab broken in this example?
I have a function that creates a function handle to be used in a pipe network solver. It takes as input a Network state which includes information about the pipes and their connections (or edges and vertices if you must), constructs a large string which will return a large matrix when in function form and "evals" that string to create the handle.
function [Jv,...] = getPipeEquations(Network)
... %// some stuff happens here
Jv_str = ['[listConnected(~endNodes,:)',...
' .* areaPipes(~endNodes,:);\n',...
anotherLongString,']'];
Jv_str = sprintf(Jv_str); %// This makes debugging the string easier
eval(['Jv = #(v,f,rho)', Jv_str, ';']);
This function works as intended, but whenever I need to save later data structures that contain this function handle, it requires a ridiculous amount of memory (150MB) - coincidentally about as much as the entire Matlab workspace at the time of this function's creation (~150MB). The variables that this function handle requires from the getPipeEquations workspace are not particularly large, but what's even crazier is that when I examine the function handle:
>> f = functions(Network.jacobianFun)
f =
function: [1x8323 char]
type: 'anonymous'
file: '...\pkg\+adv\+pipe\getPipeEquations.m'
workspace: {2x1 cell}
...the workspace field contains everything that getPipeEquations had (which, incidentally is not the entire Matlab workspace).
If I instead move the eval statement to a sub-function in an attempt to force the scope, the handle will save much more compactly (~1MB):
function Jv = getJacobianHandle(Jv_str,listConnected,areaPipes,endNodes,D,L,g,dz)
eval(['Jv = #(v,f,rho)', Jv_str, ';']);
Is this expected behavior? Is there a more elegant way to restrict the scope of this anonymous function?
As an addendum, when I run the simulation that includes this function several times, clearing workspaces becomes painfully slow, which may or may not be related to Matlab's handling of the function and its workspace.
I can reproduce: anonymous functions for me are capturing copies of all variables in the enclosing workspace, not just those referenced in the expression of the anonymous function.
Here's a minimal repro.
function fcn = so_many_variables()
a = 1;
b = 2;
c = 3;
fcn = #(x) a+x;
a = 42;
And indeed, it captures a copy of the whole enclosing workspace.
>> f = so_many_variables;
>> f_info = functions(f);
>> f_info.workspace{1}
ans =
a: 1
>> f_info.workspace{2}
ans =
fcn: #(x)a+x
a: 1
b: 2
c: 3
This was a surprise to me at first. But it makes sense when you think about it: because of the presence of feval and eval, Matlab can't actually know at construction time what variables the anonymous function is actually going to end up referencing. So it has to capture everything in scope just in case they get referenced dynamically, like in this contrived example. This uses the value of foo but Matlab won't know that until you invoke the returned function handle.
function fcn = so_many_variables()
a = 1;
b = 2;
foo = 42;
fcn = #(x) x + eval(['f' 'oo']);
The workaround you're doing - isolating the function construction in a separate function with a minimal workspace - sounds like the right fix.
Here's a generalized way to get that restricted workspace to build your anonymous function in.
function eval_with_vars_out = eval_with_vars(eval_with_vars_expr, varargin)
% Assign variables to the local workspace so they can be captured
ewvo__reserved_names = {'varargin','eval_with_vars_out','eval_with_vars_expr','ewvo__reserved_names','ewvo_i'};
for ewvo_i = 2:nargin
if ismember(inputname(ewvo_i), ewvo__reserved_names)
error('variable name collision: %s', inputname(ewvo_i));
end
eval([ inputname(ewvo_i) ' = varargin{ewvo_i-1};']);
end
clear ewvo_i ewvo__reserved_names varargin;
% And eval the expression in that context
eval_with_vars_out = eval(eval_with_vars_expr);
The long variable names here hurt readability, but reduce the likelihood of collision with the caller's variables.
You just call eval_with_vars() instead of eval(), and pass in all the input variables as additional arguments. Then you don't have to type up a static function definition for each of your anonymous function builders. This'll work as long as you know up front what variables are actually going to be referenced, which is the same limitation as the approach with getJacobianHandle.
Jv = eval_with_vars_out(['#(v,f,rho) ' Jv_str],listConnected,areaPipes,endNodes,D,L,g,dz);
Anonymous functions capture everything within their scope and store them in the function workspace. See MATLAB documentation for anonymous functions
In particular:
"Variables specified in the body of the expression. MATLAB captures these variables and holds them constant throughout the lifetime of the function handle.
The latter variables must have a value assigned to them at the time you construct an anonymous function that uses them. Upon construction, MATLAB captures the current value for each variable specified in the body of that function. The function will continue to associate this value with the variable even if the value should change in the workspace or go out of scope."
An alternative workaround to your problem, is to use the fact that the matlab save function can be used to save only the specific variables you need. I have had issues with the save function saving way too much data (very different context from yours), but some judicial naming conventions, and use of wildcards in the variables list made all my problems go away.