Exploit Matlab copy-on-write by ensuring function arguments are read-only? - matlab

Background
I'm planning to create a large number of Matlab table objects once, so that I can quickly refer to their contents repeatedly. My understanding is that each table variable/column is treated in copy-on-write manner. That is, if a table column is not modified by a function, then a new copy is not created.
From what I recall of C++ as of 1.5 decades ago, I could ensure that the code for a function does not modify its argument's data by using constant-correctness formalism.
The specific question
I am not using C++ in these days, but I would like to achieve a similar effect of ensuring that the code for my Matlab function doesn't change the data for selected arguments, either inadvertently or otherwise. Does anyone know of a nonburensome way to do this, or just as importantly, whether this is an unrealistic expectation?
I am using R2015b.
P.S. I've web searched and came across various relevant articles, e.g.:
http://www.mathworks.com/matlabcentral/answers/359410-is-it-possible-to-avoid-copy-on-write-behavior-in-functions-yet
http://blogs.mathworks.com/loren/2007/03/22/in-place-operations-on-data
(which I need clarification on to fully understand, but it isn't my priority just now)
However, I don't believe that I am prematurely optimizing. I know that I don't want to modify the tables. I just need a way to enforce that without having to go through contortions like creating a wrapper class.
I've posted this at:
* Stack Overflow
* Google groups

There is no way of making variables constants in MATLAB, except by creating a class with a constant (and static?) member variable. But even then you can do:
t = const_table_class.table;
t(1,1) = 0; % Created and modified a copy!
The reason that a function does not need to mark its inputs as const is because arguments are always passed by value. So a local modification does not modify data in the caller’s workspace. const is something that just doesn’t exist in the MATLAB language.
On the other hand, you can be certain that your data will not be modified by any of the functions you call. Thus, as long as the function that owns the tables does not modify them, they will remain constant. Any function you pass these tables to, if they attempt to modify them, they will create a local copy to be modified. This is only locally a problem. The memory used up by this copy will be freed upon function exit. It will be a bug in the function, but not affect code outside this function.

You can define a handle class that contains a table as it's preperty. Define a property set listener that triggers and generates error/warning when the value of the property changes.
classdef WarningTable < handle
properties (SetObservable)
t
end
methods
function obj = WarningTable(varargin)
obj.t = table(varargin);
addlistener(obj,'t','PreSet',...
#(a,b)warning('table changed!'));
end
end
end
This should generate warning:
mytable = WarningTable;
mytable.t(1,1) = 0;

Related

Collect internal variables of nested functions in matlab

I have a project consisting of multiple nested functions.
For debugging purpose I want to save all internal variables in one way or another, in order to display figures, replay parts of code etc...
I also want to keep this as transparent as possible regarding calculation time.
My first thought was to create a global variable, and store programmatically at the end of each function the inputs and outputs inside the variable as a structure :
globalVariable.nameOfParentfunction_NameOfFunction.nameInput1 = valueInput1;
globalVariable.nameOfParentfunction_NameOfFunction.nameInput2 = valueInput2;
...
globalVariable.nameOfParentfunction_NameOfFunction.nameOutput1 = valueOutput1;
...
Is it possible to use some kind of reflection to get the name and value of inputs/outputs without necessarily parse the file where the function is written?
I found a good but maybe outdated topic about parsing
How do I collect internal signals?
The simple solution is to use save, which will save all variables in the current workspace (the function’s context) to file.
If you want to keep the values in memory, not in a file, you can use names = who to get a list of all variables defined in the current workspace, then use val = eval(names{i}) to get the value of the variable called name{i}.
I would recommend putting all of that in a separate function, which you can call from any other function to store its variables, to avoid repeating code. This function would use evalin('caller',…) to get names and values of variables in the workspace of the calling function.
Note that using eval or evalin prevents MATLAB from optimizing code using its JIT. They also are dangerous to use, since they can execute arbitrary code, however in this case you control what is being executed so that is no a concern.

Matlab MEX-function side effect

MEX is the framework Matlab uses to run C/C++ functions in Matlab (runs faster). In the documentation it says:
input parameters (found in the prhs array) are read-only; do not modify them in your MEX file. Changing data in an input parameter can produce undesired side effects.
Is this simply a warning about how changing a variable passed as a pointer will change that variable even outside of the function (unlike how Matlab works), or is there a more subtle way that this can mess up the Matlab/MEX interface?
Reason I am asking is I specifically want the MEX function to modify the arguments for real.
MATLAB uses lazy copying, which means that when you do b = a, variable b points to the same data as variable a, even though semantically you made a copy. When you now do a(1) = 0, for example, you modify variable a, and MATLAB first makes a copy, so that variable b is not affected by the assignment. This obviously saves a lot of memory, because many copies are made where the copy is not modified.
For example, when calling a function, a copy of input variables are placed in the function’s workspace. sum(a) causes a (lazy) copy of a to be made available inside the function. If the function doesn’t need to modify the variable, a copy is avoided. If it does modify it, then a copy is made so that a is not changed for the caller.
MEX-files work the same way, except that MATLAB cannot detect if you modify the input variable, so it cannot make the copy before you do. Hence the warning. You need to call mxDuplicateArray() to copy the array and make changes to your new copy.
The side effects that the documentation warns about is that the variable in the caller’s workspace is modified, along with all variables that it shares data with. For example imagine you make a MEX-file function modifyIn that modified the input, then:
a = zeros(500);
b = a;
% much later in the code…
modifyIn(b); % update b the way I want!
will very unexpectedly also modify a!
This blog post on Undocumented MATLAB talks about this issue in more detail, and talks about mxUnshareArray(), an undocumented function that you should only use if you are really comfortable with the possible crashes and other issues that could happen. Undocumented functions have a limited shelf life.

MATLAB: overriding table() methods

SETUP Win7 64b, R2015b, 16 GB of RAM, CPU i7-2700
The table() is a fundamental Matlab class which is also sealed, hence I cannot subclass it.
I want to fix some methods of this class and add new ones.
For instance, table.disp() is fundamentally broken, e.g. try NOT disp(table(rand(1e7,1))), or forget the ; in the command window. The variable takes only 76 MB in RAM but the display is unbuffered and it will stall your system!
Can I override methods like table.disp() without writing into matlabroot\toolbox\matlab\datatypes\#table?
Can I extend the table class with a new method under C:\MATLAB\#table\ismatrixlike.m? Why do I get
ismatrixlike(table)
Undefined function 'ismatrixlike' for input arguments of type 'table'.
Obviously, I did
addpath C:\MATLAB\
rehash toolboxcache
I also tried clear all.
The path has (alphabetic) precedence over matlabroot, but is missing a table.m class definition. If I add the native class defition to C:\MATLAB\#table, then I can run my new method (after a clear all). However:
>> methods(table)
Methods for class table:
classVarNames ismatrixlike table varfun
convertColumn renameVarNames unstack
only lists the methods in the new \#table folder, even though (some of) the old methods still work, e.g.
size(table)
This partly solves the problem, since now, the native \#table\private folder is not accessible anymore and therefore many native methods are broken!
Why am I doing this? Because I do not want to wait another 2 years before the table() is fixed. I already lost entire days because I simply forgot a ; in the command window and I cannot force a restart on my pc if it is running multiday simulations, but I have to wait for the disk-swap to end :(.
APPENDIX
More context about disp(table(rand(1e7,1))). This is what happens when I hit it (and luckily I am fast enough to CTRL-C out of it):
The culprit is line 172 of table.disp() which converts the numeric array into a cellstring (with the padding too!):
[cells, err, isLeft] = sprintfc(f, x, b);
After experimenting with several alternatives, I adopted the solution that intereferes the least with Matlab's native #table implementation and it's easily removed if things go awry.
The solution:
copy the whole #table folder, i.e. fullfile(matlabroot,'toolbox','matlab','datatypes','#table'), into a destination where you have write permissions.
I picked the destination to be fullfile(matlabroot,'toolbox','local','myfiles') since I do not have to bother with OS cross-compatibility, i.e. matlabroot takes care of that for me.
paste into the destination your #table folder with the new, overloaded and overriding methods (partially overwriting the copied original files)
add the destination to the matlab path, before the original #table, e.g. addpath your_destination -begin
Effects, pros and cons:
The native #table class/methods are now shadowed, try e.g. which table -all. However, this effect is quite clear, easily detectable and easily removed (delete destintation and remove path);
No weird conflicts between native #table (now shadowed) and new #table;
All methods, new and old, are visible, try methods(table);
Private table methods are accessible...
... but you are forced to use them.
Exposing the new methods (user-implemented) to the private ones requires more maintenance and direct handling of version conflicts in the table implementations.
You need write permissions on some eligible destination.
For those interested about the details, you can look into, https://github.com/okomarov/tableutils. Specifically the install_tableutils (the readme might not be updated).
The following works for me:
Define a modified disp function, say disp_modified.m, as follows, and put it in your path:
function disp_modified(t)
if istable(t)
%// Do whatever you want to display tables
builtin('disp', '''disp'' function intercepted!')
else
%// For non-tables, call `disp` normally
builtin('disp', t)
end
Define disp as a function handle to the modifed function (you can do that in startup.m to always have it by default):
disp = #disp_modified;
After this, in the command window I get
>> disp(1:5)
1 2 3 4 5
>> disp({1 2 3 'bb'})
[1] [2] [3] 'bb'
>> disp(table(rand(1e3,1)))
'disp' function intercepted!
Depending on the usage of the new class perhaps you could follow a cleaner approach. The proposed approach described in your post has the drawback that perhaps code used in your updated environment would not be easily portable to a new environment, or a program executed in your environment may demonstrate different behavior in a different environment.
Some questions you could consider (and perhaps clarify) would be: How do you intend to use the new class? Do you want to replace all the existing table uses? Do you want to be able to use it instead of a table class argument? Or do you want to alter the table so that each usage of the original table class in your environment uses the new class.
If you just need a new improved table for your usage, you could consider encapsulating the original table class in a new class. E.g MyTable, delegate all the methods you do not need to the original table methods, replace the methods you would like to improve or add new ones.
Update: Just saw the complete solution in Github and understood what you intended to do. Nice work. I will leave the post in case anyone finds it useful.

MATLAB best practice: reading variables from a saved workspace within a method

I have a a workspace called "parameters.mat", which contains many variables (really, constants) used by several methods throughout my simulation. The reason that I want these in one workspace is to have them in a handy place for the user to change.
I want to access these variables within class methods. I've found two ways of doing this, and I'd like to know which one is considered better (or perhaps if there's an even better way):
Load the workspace before anything else, as the base workspace, and whenever I want to use a variable from it within a method, I call evalin('base', 'variable_name') first.
Load the workspace within the method whenever I need it. This works,
but it gives me a warning when I use an undefined variable name in
the rest of the method (because MATLAB doesn't know it will be
loaded from a workspace). Is there a clean way to remove this warning?
Probably the cleanest way to do this is to use a wrapper function. Building on my comment, assuming your parameter constants are in a file parameters.mat:
function value = param(name)
s = load('parameters.mat');
value = getfield(s, name);
Now you can use a syntax like
var = param('name');
wherever you need the value of this variable. This way to do it is easily understandable to humans, and transparent to Matlab's code checker. You can also use param('name') directly in your computations, without assigning the value to a local variable.
If the parameter file contains more than just a few numbers, and loading it time after time slows down things, you can cache the data in a persistent variable:
function value = param(name)
persistent s
if isempty(s)
s = load('parameters.mat');
end
value = getfield(s, name);
Now the mat-file is read only on the first call to param(). The persistent variable s remains until the next clear all (or similar, see clear) or the end of the Matlab session. A drawback of this is that if you changed the mat-file, you have to clear all in order to make param() re-read it.
If on the other hand your mat-file does only consist of a few numbers, maybe a mat-file is not even necessary:
function value = param(name)
s.x0 = 1;
s.epsilon = 1;
s.dt = 0.01;
value = getfield(s, name);
With this approach the function param() is no longer a wrapper, but a central location where you store parameter values instead of a mat-file.

Find indirect calls to specific built-in MATLAB function

What do I want?
I am looking for a way to detect all points in my code where a specific function is called.
Why do I want it?
Some examples:
Some output comes out sorted or randomized, and I want to know where this happens
I am considering to change/overload a function and want to know in which part of my code this could have impact
What have I tried?
I tried placing a breakpoint in the file that was called. This only works for non builtin functions which are called from short running code that always executes everything.
I tried 'find files', this way I can easily find direct calls to sort but it is not so easy to find a call to sort invoked by unique for example.
I have tried depfun, it tells me:
whether something will be called
from where non-builtin functions will be called
I thought of overloading the builtin function, but feels like a last resort for me as I am afraid to make a mess. | Edit: Also it probably won't help due to function precedence.
The question
What is the best way to track all potential (in)direct function calls from a specific function to a specific (built-in)function.
I don't exactly understand your use case, but I guess most of the information you want can be obtained using dbstack, which gives you the call-stack of all the parent functions calling a certain function. I think the easiest way is to overload built-in functions something like this (I tried to overload min):
function varargout = min(varargin)
% print info before function call
disp('Wrapped function called with inputs:')
disp(varargin)
[stack,I] = dbstack();
disp('Call stack:')
for i=1:length(stack)
fprintf('level %i: called from line %i in file %s\n', ...
i, stack(i).line, stack(i).file);
end
% call original function
[varargout{1:nargout}] = builtin('min', varargin{:});
% print info after function call
disp('Result of wrapped function:')
disp(varargout)
I tried to test this, but I could not make it work unfortunately, matlab keeps on using the original function, even after playing a lot with addpath. Not sure what I did wrong there, but I hope this gets you started ...
Built-in functions take precedence over functions in local folder or in path. There are two ways you can overload a built-in for direct calls from your own code. By putting your function in a private folder under the same directory where your other MATLAB functions are. This is easier if you are not already using private folder. You can rename your private folder once you are done investigating.
Another way is to use packages and importing them. You put all your override functions in a folder (e.g. +do_not_use). Then in the function where you suspect built-in calls are made add the line "import do_not_use.*;". This will make calls go to the functions in +do_not_use directory first. Once you are done checking you can use "clear import" to clear all imports. This is not easy to use if you have too many functions and do not know in which function you need to add import.
In addition to this, for each of the function you need to follow Bas Swinckels answer for the function body.
Function precedence order.
Those two methods does not work for indirect calls which are not from your own code. For indirect calls I can only think of one way where you create your own class based on built-in type. For example, if you work only on double precision types, you need to create your own class which inherits from double and override the methods you want to detect. Then pass this class as input to your code. Your code should work fine (assuming you are not using class(x) to decide code paths) since the new class should behave like a double data type. This option will not work if your output data is not created from your input data. See subclassing built-in types.
Did you try depfun?
The doc shows results similar to the ones you request.
doc depfun:
...
[list, builtins, classes, prob_files, prob_sym, eval_strings, called_from, java_classes] = depfun('fun') creates additional cell arrays or structure arrays containing information about any problems with the depfun search and about where the functions in list are invoked. The additional outputs are ...
Looks to me you could just filter the results for your function.
Though need to warn you - usually it takes forever to analyze code.