I do scientific image processing in MATLAB, which requires solving weird optimization problems with many parameters that are used by an "experiment" function which has many different "helper functions" that use various subsets of the parameters. Passing these parameters around is a pain and I would like an elegant extensible solution.
My preferred solution would be a "relative global" variable - common to a "main" function's workspace, as well as any subfunctions that the main function calls and that specify they want to share that variable. But the relative global variable would not exist outside the function which declares it, and would disappear once the main function returns.
The code would look like this, except there would be many more experiment_helpers, each using the parameters in different ways:
function y = experiment(p1,p2,p3,...)
% declare relative global variables.
% these will not change value within this experiment function,
% but this experiment will be reused several times in the calling function,
% each time with different parameter values.
relative global p1, p2, p3 ...
% load and process some data based on parameters
...
% call the helper + return
y = experiment_helper(y,z);
end
function y = experiment_helper(y,z)
relative global p1, p3
%% do some stuff with y, z, possibly using p1 and p3, but not changing either of them.
...
end
I realize that you can get the desired behavior in several other ways -- you could pass the parameters to the sub-functions called by the experiment, you could put the parameters in a parameter structure and pass them to the sub-functions, and so on. The first is horrible because I have to change all the subfunctions' arguments every time I want to change the use of parameters. The second is okay, but I have to include the structure prefix every time I want to use the variables.
I suppose the "proper" solution to my problem is to use options structures much like matlab does in its own optimization code. I just wonder if there isn't a slicker way that doesn't involve me typing "paramStruct.value" every time I want to use the parameter "value".
I realize that relative global variables could cause all sorts of debugging nightmares, but I don't think they would necessarily be worse than the ones caused by existing global variables. And they would at least have more locality than unqualified globals.
So how does a real programmer handle this problem? Is there an elegant, easy to create and maintain design for this kind of problem? Can it be done in matlab? If not, how would you do it in another language? (I always feel guilty about using matlab as it doesn't exactly encourage good design. I always want to switch to python but can't justify relearning things and migrating the code base -- unless it would make me a much faster and better programmer within a few weeks, and the matlab-specific tools, such as wavelet and optimization toolboxes, could quickly+easily be found or duplicated within python.)
No, I don't think Matlab has exactly what you are looking for.
The best answer depends on if your primary concern is due to the amount of typing required to pass the parameters around, or if you are concerned about memory usage as you pass around large datasets. There are a few approaches that I have taken, and they all have pros and cons.
Parameters in a structure. You've already mentioned this in your question, but I would like to reinforce it as an excellent answer in most circumstances. Instead of calling my parameter structure paramStruct, I usually just call it p. It is always locally scoped to the function I'm using, and then I need to type p.value instead of value. I believe the extra two characters are well worth the ability to easily pass around the complete set of parameters.
inputParser. I use the inputParser class a lot when designing functions that require many inputs. This fits in well with the parameter-structure approach above, in that a structure of parameters can be passed in, or you can use 'param'/value pairs, and you can allow your function to define defaults. Very useful.
Nested functions can help in limited cases, but only when all of your functions can be defined hierarchically. It limits code re-use if one of your inner functions is generic enough to support multiple parent functions.
Pass by reference using handle classes. If your concern is memory, the new form of classes actually allows you to define a parameter structure which is passed by reference. The code looks something like this:
classdef paramStructure < handle
properties (SetAccess = public, GetAccess = public)
param1 = [];
param2 = [];
% ...
param100 = [];
end
end
Then you can create a new pass-by-reference structure like this
p = paramStructure;
And set values
p.param1 = randn(100,1);
As this is passed around, all passing is by reference, with the pros (less memory use, enables some coding styles) and cons (generally easier to make some kinds of mistakes).
Global variables. I try really hard to avoid these, but they are an option.
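For the "how would you do it in another language" part of the question, here is a minimal Python sketch of the parameter-structure approach. The names ExperimentParams, experiment and experiment_helper, and the fields p1 to p3, are made up for illustration. Freezing the dataclass is one assumed way to get the "parameters never change inside the experiment" property, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentParams:
    """Hypothetical parameter bundle; field names are placeholders."""
    p1: float = 1.0
    p2: float = 2.0
    p3: float = 0.5


def experiment_helper(y, p):
    # The helper reads only the fields it needs (here p1 and p3).
    return [p.p1 * v + p.p3 for v in y]


def experiment(data, p):
    # Pre-process with p2, then hand the same object on to the helper.
    y = [v * p.p2 for v in data]
    return experiment_helper(y, p)


params = ExperimentParams(p1=3.0)      # p2 and p3 keep their defaults
result = experiment([1.0, 2.0], params)
```

Adding a fourth parameter later means adding one field to the class; no helper signature changes, and helpers that don't use the new field are untouched.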
What you're describing is exactly available in MATLAB using nested functions. They have their own workspace, but they also have access to the workspace of the parent function in which they are nested. So the parent function can define your parameters, and the nested functions will be able to see them as they have a shared scope.
It's a pretty elegant way of programming, and causes many fewer problems with debugging than globals. Recent versions of MATLAB even highlight shared variables in the editor in light blue, to help with this.
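The same shared-scope idea exists in other languages as closures. A rough Python sketch follows, with function and parameter names invented for illustration:

```python
def experiment(data, p1, p2, p3):
    # p1, p2 and p3 live in experiment's scope. The nested helpers below
    # can read them without declaring them or receiving them as arguments.

    def scale(y):
        return [p1 * v for v in y]

    def shift(y):
        return [v + p3 for v in y]

    y = [v * p2 for v in data]
    return shift(scale(y))


out = experiment([1.0, 2.0], p1=3.0, p2=2.0, p3=0.5)
```

The parameters vanish when experiment returns and are never visible at module level. The limitation is the same as in MATLAB: the helpers have to be defined inside the parent, so they cannot be reused by another parent function.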
I am working with a legacy scala codebase, and as is always the case modifying the code is quite difficult without touching different parts.
One of my new requirements is to make several decisions based on some input parameters. The problem is that these decisions are to be made at various points along the execution. One option is to encapsulate all those parameters in a case class instance and pass it along. But that means I would have to modify multiple method signatures, and I want to avoid this approach as much as possible.
Another approach can be to create a global object containing all those input parameters and accessible from different points in the execution. Is it a good approach in Scala?
No, using global mutable variables to pass “hidden” parameters is not a good idea, not in Scala and not in any other programming language. It makes the code hard to understand and modify, because a function's behaviour will now depend on which functions were invoked earlier. And it's extremely fragile, because you might forget setting one of those global parameters before invoking the function, which means that it will use whatever value was stored there before. This is the kind of thing that can appear to work for years, and then break when you modify a completely unrelated part of the program.
I can't stress this enough: do not use global mutable variables, period. The solution is to man up and change those method signatures. Depending on the details, dependency injection may or may not help in your particular case.
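The fragility described above can be shown in a few lines. The sketch is in Python rather than Scala, and every name in it (_CONFIG, Settings, price_with_global, price_explicit) is invented for illustration.

```python
from dataclasses import dataclass

# Hidden parameter: a module-level mutable dictionary (the pattern to avoid).
_CONFIG = {"discount": 0.0}


def price_with_global(amount):
    # The result depends on whatever was last written to _CONFIG.
    return amount * (1.0 - _CONFIG["discount"])


@dataclass(frozen=True)
class Settings:
    discount: float


def price_explicit(amount, settings):
    # The result depends only on the arguments.
    return amount * (1.0 - settings.discount)


_CONFIG["discount"] = 0.5            # some earlier caller sets the global
first = price_with_global(100.0)
_CONFIG["discount"] = 0.0            # an unrelated caller changes it
second = price_with_global(100.0)    # same call, different answer

explicit_a = price_explicit(100.0, Settings(discount=0.5))
explicit_b = price_explicit(100.0, Settings(discount=0.5))
```

Here first and second differ although the call is identical, while explicit_a and explicit_b always agree. That is the price of the changed signature buying predictability.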
I have often seen the set keyword used in Tcl, but it cannot be used to create a constant. How does one create a constant in Tcl that can then be used by other procedures?
Generally speaking, most of the uses of constants fall into several categories:
enumeration values
magical numbers
looping control factors
scaling factors
In Tcl, for the first case you'd usually just use the name instead of mapping it to an integer, with integer mappings only being applied in the cases that need them. Even bit sets can be handled that way, substituting a list of names for the set of bits (and the name being present in the list is equivalent to the bit being set). Tcl's C API has relevant functions for helping with this, specifically Tcl_GetIndexFromObj().
Magical values are usually best locked away close to the code that handles them. If I was interfacing to hardware, I'd not let the magic values appear at all at the script level (since I'd have the binding code written in C).
Looping control factors are often best represented as default values for procedure arguments, as they are things that you want to sometimes override. But they're often not as needed once custom control structures are available, and they fit a lot more into the Tcl style of working.
Scaling factors are the case where constants might be useful. I tend to simulate those by just using a global or namespace variable and plain old not assigning to it from elsewhere. I'd be quite interested in having code to allow constants (specifically variables that can't be assigned to) as a standard feature, but we don't have that right now.
Once those cases are covered, what remain tend to be unimportant constants. After all, there's almost no need to calculate the sizes of things for allocation and stuff like that, and things like positional binding in SQL statements are discouraged within TDBC in favour of binding by name (an awful lot easier to get right).
A simple way of making a constant is to put a write trace on a variable so that whenever it is written to, it is reset back to its constant value.
set CONSTANT 123
trace add variable CONSTANT write {apply {args {
global CONSTANT
# Reset to the constant value; write traces are after the fact
set CONSTANT 123
# Make the write give an error
error "may not change constant"
}}}
Let's say we have the following model:
Collector:
model Collector
Real collect_here;
annotation(defaultComponentPrefixes="inner");
end Collector;
and the following model potentially multiple times:
model Calculator
outer Collector collector;
Real calculatedVariable = 2*time;
equation
calculatedVariable = collector.collect_here;
end Calculator;
The code above works if Calculator is present only once in the system to be simulated. If the model exists more than once I get a singular system. This is demonstrated by the Example below. Changing the parameter works gives either a working or a failing system.
model Example
parameter Boolean works = true;
inner Collector collector;
Calculator calculator1;
Calculator calculator2 if not works;
end Example;
Using an array inside the collector to pass multiple variables in it doesn't solve it.
Another possible way to solve this is to use connectors, but I only made it work with one Calculator.
Using multiple instances of Calculator does break the model, as the single variable collector.collect_here will have multiple equations trying to compute its value. Therefore Dymola complains that the system is structurally singular, in this case meaning that there are more equations than variables in the resulting system of equations.
To give a bit more insight: actually checking Collector will fail, because since Modelica 3.0 every component has to be balanced (meaning it has to have as many equations as unknowns), which is not the case for Collector as it has one unknown but no equation. This strongly limits the possible applications for the inner/outer construct, as basically every variable has to be computed where it is defined.
In the given example this is compensated in the overall system if exactly one Calculator is used. So this single combination will work. Although this works, it is something that should not be done - for the obvious reason of being very error-prone (and all sub-models should pass the check).
Your question on how to solve this issue actually misses a description of what the issue actually is. There are some cases in my mind that your approach could be useful for:
You want to plot multiple variables from a single point, which would be collector. For this purpose "variable selections" should be the most straight-forward way to go: see Dymola Manual Vol. 1, Section "4.3.11 Matching and variable selections" on how to apply them.
You want to carry out some mathematical operation on those variables. Then it could be useful to have a vectorized input of variable size. This enables an arbitrary number of connections to this input. For an example of this take a look at: Modelica.Blocks.Math.MultiSum
You want to route multiple signals between different models (which is unlikely judging from your description, but still): Then expandable connectors would be a good possibility. To get an impression of what that does take a look at Modelica.Blocks.Examples.BusUsage.
Hope this helps, otherwise please specify more clearly what you actually want to achieve with your code.
I prepared a demonstrative library for such a scenario some days ago. You can access it at https://gist.github.com/beutlich/e630b2bf6cdf3efe96e5e9a637124fe1. If you read the documentation on Example2 you can see the link to an article from H. Elmqvist et al., which is the clue to your problem. That is, you need a connector, and inherited connects from every Calculator to the one Collector.
I'm trying to write a function that detects this relation between the variables I have in the workspace:
v1 - fft(v2) = 0
Where v1, v2 are variables of my workspace.
Sometimes I need to know which variables have a certain numerical relation. If I have thirty variables, I don't want to look for this relation manually, typing a statement for each pair of different variables.
I would like a function into which I enter the statement (for instance, the one I wrote above), or which I modify every time I need it, and which shows me the pairs of variables I am looking for.
Does anyone know how to do it?
You can use who() to programmatically obtain a list of variables that currently exist. You can then use eval() to get their values. At that point, you can use a fairly trivial nested loop to iterate over all possible pairs, looking for that relationship.
Note 1: Using eval() for "normal" programming is considered bad style; it should only really be used for meta-programming tasks like this.
Note 2: If you have N variables in the workspace, there are N^2 ordered pairs. This may take a while to iterate over if N is large.
Note 3: You're essentially looking for equality between variables, which may not be particularly reliable in floating-point.
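As a rough illustration of the nested loop and of Note 3, here is the same idea in Python rather than MATLAB. A dictionary stands in for the workspace (so no eval is needed), dft is a naive stand-in for fft, and the tolerance tol and all names are assumptions made for this sketch.

```python
import cmath
from itertools import permutations


def dft(x):
    """Naive O(n^2) discrete Fourier transform, standing in for fft()."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def find_pairs(workspace, relation, tol=1e-9):
    """Return ordered (name1, name2) pairs for which relation(v1, v2) is
    zero everywhere, within the absolute tolerance tol."""
    hits = []
    for (n1, v1), (n2, v2) in permutations(workspace.items(), 2):
        if len(v1) != len(v2):
            continue  # the relation only makes sense for equal lengths
        if all(abs(r) <= tol for r in relation(v1, v2)):
            hits.append((n1, n2))
    return hits


def minus_dft(v1, v2):
    # Encodes the relation v1 - fft(v2) = 0.
    return [a - b for a, b in zip(v1, dft(v2))]


signal = [1.0, 2.0, 0.0, -1.0]
workspace = {
    "a": signal,
    "b": dft(signal),
    "c": [0.0, 1.0, 0.0, 0.0],
}
pairs = find_pairs(workspace, minus_dft)
```

With this data, pairs contains only ("b", "a"). Self-pairs are skipped here, so the loop visits N(N-1) ordered pairs. Comparing against a tolerance rather than testing for exact zero is what makes the search usable with floating-point values.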
I am new to MATLAB, it wasn't in the job description and I've been forced to take over for the person who wrote and maintained the code my company uses. Life's tough.
The guy from whom I'm taking over told me that he declared all the big data vectors as global, to save memory. More specifically, so that when one function calls another function, he doesn't create a copy of the data when he passes it over.
Is this true? I read Strategies for Efficient Use of Memory, and it says that
When working with large data sets, be aware that MATLAB makes a temporary copy of an input variable if the called function modifies its value. This temporarily doubles the memory required to store the array, which causes MATLAB to generate an error if sufficient memory is not available.
It says something very similar in Memory Allocation for Arrays, under Function Arguments:
When you pass a variable to a function, you are actually passing a reference to the data that the variable represents. As long as the input data is not modified by the function being called, the variable in the calling function and the variable in the called function point to the same location in memory. If the called function modifies the value of the input data, then MATLAB makes a copy of the original array in a new location in memory, updates that copy with the modified value, and points the input variable in the called function to this new array.
So is it true that using global can be better? It seems a little sloppy to blithely declare all the large data as global, instead of making sure that none of the code modifies its input argument. Am I wrong? Does this really improve RAM usage?
In my experience, provided that none of the code modifies the large data, memory usage is the same, regardless of whether you use a global variable or an input argument, just like the Matlab docs say. Further information is in this blog post by a MathWorks employee.
There is quite a bit of folklore on performance issues in Matlab and not all of it is right. The internals of Matlab have changed quite a bit. It may be that in a previous version it was better to use a global variable.
This answer may be somewhat tangential, but an additional topic that bears mention here is the use of nested functions to manage memory.
As has already been established in other answers, there is no need for global variables if the data you are passing to the function is not modified (since it will be passed by reference). If it is modified (and is thus passed by value), using a global variable instead will save you memory. However, global variables can be somewhat "uncouth" for the following reasons:
You have to make a declaration like global varName everywhere you need them.
It can be conceptually a little messy trying to keep track of when and how they are modified, especially if they are spread across multiple m-files.
The user can easily break your code with an ill-placed clear global, which clears all global variables.
An alternative to global variables was mentioned in the first set of documentation you cited: nested functions. Immediately following the quote you cited is a code example (which I've formatted slightly differently here):
function myfun
A = magic(500);
setrowval(400, 0);
disp('The new value of A(399:401,1:10) is')
A(399:401,1:10)
function setrowval(row, value)
A(row,:) = value;
end
end
In this example, the function setrowval is nested inside the function myfun. The variable A in the workspace of myfun is accessible within setrowval (as if it had been declared global in each). The nested function modifies this shared variable, thus avoiding any additional memory allocation. You don't have to worry about the user inadvertently clearing anything and (in my opinion) it's a bit cleaner and easier to follow than declaring global variables.
The solution seems a bit strange to me. As you found out already, it shouldn't have significant impact on the memory usage if the called function does not modify the data array. However, if the called function modifies the data array, there's a functional difference: In one case (making the data array global), the change has an impact on the rest of the code, in the other case (passing it as reference) the modifications are only local and temporary.
I think you pretty much answered your own question, but a couple more references would be good here:
I made a video on this:
http://blogs.mathworks.com/videos/2008/09/16/new-location-and-memory-allocation/
Similar to what Loren spoke of here:
http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/
-Dogu