MATLAB: Replacing multivariate function to avoid redundant calcs? [duplicate]

I have quite a heavy function in MATLAB:
function [out] = f ( in1, in2, in3)
Which is called quite often with the same parameters.
The function is deterministic so for given input parameters its output will always be the same.
What would be the simplest way of storing the results for already-computed inputs inside the function, so that if the function is called again with the same inputs it can answer quickly?
Is a persistent variable which maps (using containers.Map or some other class) input set <in1, in2, in3> to a result the way to go?
Note that any method which requires saving the data to disk is out of the question in my application.

Below is an idea for a CacheableFunction class
It seems all of the answers to your main question are pointing the same direction - a persistent Map is the consensus way to cache results, and I do this too.
If the inputs are arrays, they'll need to be hashed to a string or scalar to be used as a map key. There are a lot of ways to hash your 3 input arrays to a key, I used DataHash in my solution below.
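To illustrate what "hashing the inputs to a key" can look like without any external dependency, here is a minimal sketch. The helper names partKey and makeKey are made up for this example and are not part of DataHash; the sketch assumes real, full arrays of class double or single. It builds a text key from the class, size and full-precision values of each input, which is simple but produces long keys for large arrays (a real hash such as DataHash keeps the key short).

```matlab
% Hypothetical key builder (not DataHash): encodes class, size and
% full-precision values of each input. Assumes real, full double/single arrays.
partKey = @(a) sprintf('%s|%s|%s', class(a), mat2str(size(a)), ...
                       sprintf('%.17g,', double(a(:))));
makeKey = @(varargin) strjoin(cellfun(partKey, varargin, ...
                              'UniformOutput', false), ';');

in1 = [1 2; 3 4];
in2 = 0.1;
in3 = [5 6 7];

keyA = makeKey(in1, in2, in3);        % key for the original inputs
keyB = makeKey(in1, in2, in3);        % same inputs, so the same key
keyC = makeKey(in1, in2 + eps, in3);  % a tiny change gives a different key
```

The resulting character vector can be used directly as a containers.Map key.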
I chose to make it a class rather than a function like memoize so that the input hashing function can be dynamically specified one time, rather than hardcoded.
Depending on the form of your output, it also uses dzip/dunzip to reduce the footprint of the saved outputs.
Potential improvement: a clever way of deciding which elements to remove from the persistent map when its memory footprint approaches some limit.
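One simple way to approach that improvement is to cap the number of entries and evict the oldest one first. The sketch below is only an illustration under that assumption: maxEntries and the order list are hypothetical names, the limit is an entry count rather than a measured memory footprint, and the squared value stands in for a real cached result.

```matlab
maxEntries = 3;                              % hypothetical limit on cached results
cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
order = {};                                  % keys in insertion order, oldest first

for k = 1:5
    key = sprintf('input%d', k);             % stand-in for a hashed input key
    if ~isKey(cache, key)
        if cache.Count >= maxEntries
            remove(cache, order{1});         % evict the oldest entry
            order(1) = [];
        end
        cache(key) = k^2;                    % stand-in for the expensive result
        order{end+1} = key;                  %#ok<SAGROW>
    end
end
```

After the loop only the three most recently added keys remain in the map. A least-recently-used policy would additionally move a key to the end of order on every cache hit.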
Class definition
classdef CacheableFunction < handle
    properties
        exeFun
        hashFun
        cacheMap
        nOutputs
        zipOutput
    end
    methods
        function obj = CacheableFunction(exeFun, hashFun, nOutputs)
            obj.exeFun = exeFun;
            obj.hashFun = hashFun;
            obj.cacheMap = containers.Map;
            obj.nOutputs = nOutputs;
            obj.zipOutput = [];
        end
        function [result] = evaluate(obj, varargin)
            thisKey = obj.hashFun(varargin);
            if isKey(obj.cacheMap, thisKey)
                if obj.zipOutput
                    result = cellfun(@(x) dunzip(x), obj.cacheMap(thisKey), 'UniformOutput', false);
                else
                    result = obj.cacheMap(thisKey);
                end
            else
                [result{1:obj.nOutputs}] = obj.exeFun(varargin);
                if isempty(obj.zipOutput)
                    obj.zipCheck(result);
                end
                if obj.zipOutput
                    obj.cacheMap(thisKey) = cellfun(@(x) dzip(x), result, 'UniformOutput', false);
                else
                    obj.cacheMap(thisKey) = result;
                end
            end
        end
        function [] = zipCheck(obj,C)
            obj.zipOutput = all(cellfun(@(x) isreal(x) & ~issparse(x) & any(strcmpi(class(x), ...
                {'double','single','logical','char','int8','uint8',...
                'int16','uint16','int32','uint32','int64','uint64'})), C));
        end
    end
end
Testing it out...
function [] = test_caching_perf()
    A = CacheableFunction(@(x) long_annoying_function(x{:}), @(x) DataHash(x), 3);
    B = rand(50, 50);
    C = rand(50, 50);
    D = rand(50, 50);
    tic;
    myOutput = A.evaluate(B, C, D);
    toc
    tic;
    myOutput2 = A.evaluate(B, C, D);
    toc
    cellfun(@(x, y) all(x(:) == y(:)), myOutput, myOutput2)
end
function [A, B, C] = long_annoying_function(A, B, C)
    for ii = 1:5000000
        A = A+1;
        B = B+2;
        C = C+3;
    end
end
And results
>> test_caching_perf
Elapsed time is 16.781889 seconds.
Elapsed time is 0.011116 seconds.
ans =
1 1 1

MATLAB now ships with a function just for this purpose.
The technique used is called "memoization" and the function's name is "memoize".
Check out : https://www.mathworks.com/help/matlab/ref/memoize.html
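As a rough illustration of how the built-in memoize (available since R2017a) is typically used, here is a minimal sketch. slowFcn is a hypothetical stand-in for the heavy function f from the question, and the cache size of 50 is an arbitrary choice.

```matlab
% slowFcn is a made-up stand-in for the expensive, deterministic function.
slowFcn = @(a, b, c) sum(a(:)) + sum(b(:)) + sum(c(:));

fastFcn = memoize(slowFcn);   % returns a MemoizedFunction object
fastFcn.CacheSize = 50;       % number of distinct input sets to keep (default is 10)

B = rand(50);
C = rand(50);
D = rand(50);

y1 = fastFcn(B, C, D);        % first call: computed and stored
y2 = fastFcn(B, C, D);        % same inputs: served from the cache

clearCache(fastFcn);          % drop all cached results when no longer needed
```

The cache lives in memory only, so nothing is written to disk, which matches the constraint in the question.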

Persistent map is indeed a nice way to implement cached results. Advantages I can think of:
No need to implement hash function for every data type.
Matlab matrices are copy-on-write, which can offer certain memory efficiency.
If memory usage is an issue, one can control how many results to cache.
There is a File Exchange submission, A multidimensional map class by David Young, which comes with a function memoize() that does exactly this. Its implementation uses a slightly different mechanism (a referenced local variable), but the idea is about the same. Compared with a persistent map inside each function, this memoize() function allows an existing function to be memoized without modification. And as pointed out by Oleg, using DataHash (or equivalent) can further reduce the memory usage.
PS: I have used the MapN class extensively and it is quite reliable. Actually I have submitted a bug report and the author fixed it promptly.
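For reference, the persistent-map idea from the question can be sketched roughly as follows. This is not the MapN-based memoize() mentioned above; cachedF is a hypothetical wrapper, and the key is built with mat2str at 17 significant digits, which assumes the inputs are real 2-D numeric, logical or char arrays.

```matlab
function out = cachedF(fun, in1, in2, in3)
% cachedF  Hypothetical wrapper: cache results of fun(in1, in2, in3) in memory.
    persistent cache
    if isempty(cache)
        % Created on first use; the map survives between calls.
        cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
    end
    % Text key from the function name and full-precision inputs.
    key = [func2str(fun) '|' ...
           sprintf('%s;', mat2str(in1, 17), mat2str(in2, 17), mat2str(in3, 17))];
    if isKey(cache, key)
        out = cache(key);            % cache hit: skip the computation
    else
        out = fun(in1, in2, in3);    % cache miss: compute and store
        cache(key) = out;
    end
end
```

It would be called as out = cachedF(@f, in1, in2, in3). Running clear cachedF resets the persistent cache.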

Related

Defining a function with multiple outputs that can't be organised into a matrix

Is there any natural way to define a MATLAB function with multiple outputs that cannot or are inappropriate to "stack" into a matrix? For example, what if I want a function f that returns a 3x3 matrix A and a 4x4 matrix B?
I'm really surprised that this would even be an issue in MATLAB. Because in Python, all we need to do is return A, B which returns a tuple of the two. However it seems that MATLAB doesn't quite support the idea of containers. As a non-elegant workaround, I can use a struct to put the two pieces of data in, and the function goes something like:
function re = f(x)
%f: returns two dimensional-inconsistent matrices A and B
% function body as follows
....
A = ...;
B = ...;
% put data into the struct 're'
re.A = A;
re.B = B;
end
Apart from possible performance issues, this approach looks very unnatural and clumsy. Is there any better approach?
In MATLAB you can return any number of outputs with this syntax:
function [A,B] = f(x)
A = ...;
B = ...;
end
That is an even more elegant solution than the tuples used in Python.
You can even control the behavior with the number of inputs and outputs (nargin and nargout) and discard outputs with a tilde. More information here.
I cannot think of a more elegant syntax.
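A small sketch of how nargout and the tilde fit together, using the 3x3 and 4x4 matrices from the question. The function name twoMatrices and its body are invented for illustration only.

```matlab
function [A, B] = twoMatrices(x)
% twoMatrices  Hypothetical example returning a 3x3 matrix A and a 4x4 matrix B.
    A = x * eye(3);
    if nargout > 1           % build B only when the caller asks for it
        B = x * ones(4);
    end
end
```

Calling [A, B] = twoMatrices(2) returns both matrices, A = twoMatrices(2) returns only the first and skips the work for B, and [~, B] = twoMatrices(2) discards A while keeping B.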
Usually when having several outputs, one should declare the function as follows:
function [out1, out2, ... , outN] = funcName(in1,...,inM)
...
end
MATLAB also allows you to alter the behavior of your function based on the amount of requested inputs/outputs via the nargin/nargout functions, respectively (you can think of this as a form of overloading).
For example, you can specify as one of the inputs an array indicating which outputs you want the function to give, then populate the varargout cell array accordingly:
function varargout = funcName(in1,...,whichOut)
...
for indO = 1:numel(whichOut)
switch whichOut{indO}
case 'out1'
varargout{indO} = out1;
case 'out2'
... etc
case 'out6'
varargout{indO} = out6;
end
end
then call it using [out6, out1] = funcName(inp, {'out6','out1'});
See also varargin.

How to force MATLAB to return all values in a nested function call?

I find it impossible to write MATLAB code without creating a huge number of superfluous, single-use variables.
For example, suppose function foo returns three column vectors of exactly the same size. Say:
function [a, b, c] = foo(n)
a = rand(n, 1);
b = rand(n, 1);
c = rand(n, 1);
end
Now, suppose that bar is a function that expects as input a cell array of size (1, 3).
function result = bar(triplet)
[x, y, z] = triplet{:};
result = x + y + z;
end
If I want to pass the results of foo(5), I can do it by creating three otherwise-useless variables:
[x, y, z] = foo(5);
result = bar({x, y, z});
Is there some function baz that would allow me to replace the two lines above with
result = bar(baz(foo(5)));
?
NB: the functions foo and bar above are meant only as examples. They're supposed to represent functions over which I have no control. IOW, modifying them is not an option.
You can replace the three variables by a cell array using a comma-separated list:
vars = cell(1,3); % initialize cell array with as many elements as outputs of foo
[vars{:}] = foo(5); % comma-separated list: each cell acts as a variable that
% receives an output of foo
result = bar(vars);
Not possible. baz in baz(foo(5)) will only take the first output of foo, the other two would be ignored. The plain two-line variant is not that awkward. And this is not a common situation. You don't generally work with cell arrays where normal numerical arrays would do.
You could of course just write your own wrapper for foo that returns whatever you need (i.e. containing similar two lines), in case you need to use it frequently.
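Such a wrapper could look roughly like the sketch below. collectOutputs is a hypothetical helper name; it takes the function handle and the number of outputs explicitly (assumed to be at least 1), because MATLAB cannot infer how many outputs the caller of a nested call would have wanted.

```matlab
function outs = collectOutputs(fun, nOut, varargin)
% collectOutputs  Hypothetical helper: call fun(varargin{:}) and return its
% first nOut outputs packed into a 1-by-nOut cell array.
    outs = cell(1, nOut);
    [outs{:}] = fun(varargin{:});   % comma-separated list receives each output
end
```

With this helper the example from the question becomes result = bar(collectOutputs(@foo, 3, 5)); the temporary variables still exist, but only inside the helper.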
As nirvana-msu said, it is not possible to do the task without creating temporary variables. But it is possible to handle it within a function and even with varargin and varargout. Inspired by this answer on my question, you can define and use the following function:
function varargout = redirect(F1, F2, count, list, varargin)
output = cell(1, count);
[output{:}] = F2(varargin{:});
varargout = cell(1, max(nargout, 1));
[varargout{:}] = F1(output(list));
end
For instance, in your example you can write result = redirect(@bar, @foo, 3, [1:3], 5);
The issue is you are converting the cell triplet{} into an array[], so a conversion is the method you want. Although this method will perform the inverse transformation, I know of no method that will perform the transformation you want, likely due to the relative complexity of the cell data structure. You may have some luck further digging into the API.
EDIT: EBH kindly pointed out this method, which does what you are looking for.
EDIT2: The above method will not perform the action OP asked for. Leaving this up because the API often has great solutions that are hidden by bad names.

How to update a uitable after creation from other functions?

I created a matfile in which I store data that are constantly overwritten by user behavior. This occurs in a function "test()".
n=1
while n < 5
myVal = double(Test704(1, 780, -1)) %Returns the user's behavior
if myVal == 1
n = n + 1 %"n" is the overwritten variable in the matfile
end
save('testSave1.mat') %The matfile
m = matfile('testSave1.mat')
end
Then, I want to display these data in another function (it is essential to have two separate functions) called "storageTest()". More specifically, storageTest() is a GUI function in which I developed a uitable "t". So, I first call the function "test()" and pass its output values as the data of "t". Here is the relevant part of "storageTest":
m = test()
d = [m.n]
t = uitable('Data',d, ...
'ColumnWidth',{50}, ...
'Position',[100 100 461 146]);
t.Position(3) = t.Extent(3);
t.Position(4) = t.Extent(4);
drawnow
This code executes only once "m = test()" has finished running, and it shows a table containing only the final value of "n". However, I want the table to be displayed beforehand, so that I can watch the value increment according to the user's behavior.
I have searched on the web to solve my issue, but I cannot find any answer, is it possible to do such a thing?
Assuming I'm interpreting the question correctly, it should be fairly trivial to accomplish this if you initialize your table prior to calling test and then pass the handle to your table for test to update in the while loop:
For example:
function testGUI
% Initialize table
t = uitable('ColumnWidth',{50}, 'Position',[100 100 461 146]);
test(t)
function test(t)
n = 1;
while n < 5
n = n + 1;
t.Data = n;
pause(0.25); % Since we're just incrementing a number, add a slight pause so we can actually see the change
end
When you run the above, you'll notice the data in your table iterating as expected.
excaza was a little faster in writing basically the same answer as mine. As it looks slightly different, I'll post it anyway.
function storagetest()
close all
f = figure;
data = [1];
t = uitable(f,'Data',data,'ColumnWidth',{50});
test()
end
function test()
% handle uitable
t = evalin('caller','t')
n = 1;
while n < 5
newVal = input('Enter a number:');
data = get(t,'Data');
set(t,'Data', [data; newVal]);
n = n + 1;
end
end
I imitated the "user behaviour" with the input function. The basic idea is to update your table from within test(). You can use evalin if you don't want to pass parameters to test(), though passing the handle of the uitable directly is certainly the better option.
If you are working on a serious GUI project, I highly recommend reading this answer.

How does function work regarding to the memory usage?

When you use a function in MATLAB, only the output of the function appears in the workspace; all the other variables that may be created or used in the body of that function are not shown. I am wondering how a function works. Does it clear all other variables from memory and keep just the output?
A function acts like a small, isolated programming environment. At the front end you insert your input (e.g. variables, strings, name-value pairs etc.). After the function has finished, only the output is available; all temporarily created variables are discarded.
function [SUM] = MySum(A)
for ii = 1:length(A)-1
SUM(ii) = A(ii)+A(ii+1);
kk(ii) = ii;
end
end
>> A=1:10
>> MySum(A)
This code just adds two consecutive values for the input array A. Note that the iteration number, stored in kk, is not output and is thus discarded after the function has completed. In MATLAB kk(ii) = ii; will be underlined orange, since it 'might be unused'.
Say you want to also retain kk, just add it to the function outputs:
function [SUM,kk] = MySum(A)
and keep the rest the same.
If you have large variables that you only use up to a certain point and do not want them clogging up your memory while the function is running, use clear for that:
function [SUM] = MySum(A)
for ii = 1:length(A)-1
SUM(ii) = A(ii)+A(ii+1);
kk(ii) = ii;
end
clear kk
end
