Do I conserve memory in MATLAB by declaring variables global instead of passing them as arguments? - matlab

I am new to MATLAB, it wasn't in the job description and I've been forced to take over for the person who wrote and maintained the code my company uses. Life's tough.
The guy from which I'm taking over told me that he declared all the big data vectors as global, to save memory. More specifically, so that when one function calls another function, he doesn't create a copy of the data when he passes it over.
Is this true? I read Strategies for Efficient Use of Memory, and it says that
When working with large data sets, be aware that MATLAB makes a temporary copy of an input variable if the called function modifies its value. This temporarily doubles the memory required to store the array, which causes MATLAB to generate an error if sufficient memory is not available.
It says something very similiar in Memory Allocation For Array #Function Arguments:
When you pass a variable to a function, you are actually passing a reference to the data that the variable represents. As long as the input data is not modified by the function being called, the variable in the calling function and the variable in the called function point to the same location in memory. If the called function modifies the value of the input data, then MATLAB makes a copy of the original array in a new location in memory, updates that copy with the modified value, and points the input variable in the called function to this new array.
So is it true that using global can be better? It seems a little sloppy to blithely declare all the large data as global, instead of making sure that none of the code modifies its input argument. Am I wrong? Does this really improve RAM usage?

In my experience, provided that none of the code modifies the large data, memory usage is the same, regardless of whether you use a global variable or an input argument, just like the Matlab docs say. Further information is in this blog post by a MathWorks employee.
There is quite a bit of folklore on performance issues in Matlab and not all of it is right. The internals of Matlab have changed quite a bit. It may be that in a previous version it's better to use a global variable.

This answer may be somewhat tangential, but an additional topic that bears mention here is the use of nested functions to manage memory.
As has already been established in other answers, there is no need for global variables if the data you are passing to the function is not modified (since it will be passed by reference). If it is modified (and is thus passed by value), using a global variable instead will save you memory. However, global variables can be somewhat "uncouth" for the following reasons:
You have to make a declaration like global varName everywhere you need them.
It can be conceptually a little messy trying to keep track of when and how they are modified, especially if they are spread across multiple m-files.
The user can easily break your code with an ill-placed clear global, which clears all global variables.
An alternative to global variables was mentioned in the first set of documentation you cited: nested functions. Immediately following the quote you cited is a code example (which I've formatted slightly differently here):
function myfun
A = magic(500);
setrowval(400, 0);
disp('The new value of A(399:401,1:10) is')
A(399:401,1:10)
function setrowval(row, value)
A(row,:) = value;
end
end
In this example, the function setrowval is nested inside the function myfun. The variable A in the workspace of myfun is accessible within setrowval (as if it had been declared global in each). The nested function modifies this shared variable, thus avoiding any additional memory allocation. You don't have to worry about the user inadvertently clearing anything and (in my opinion) it's a bit cleaner and easier to follow than declaring global variables.

The solution seems a bit strange to me. As you found out already, it shouldn't have significant impact on the memory usage if the called function does not modify the data array. However, if the called function modifies the data array, there's a functional difference: In one case (making the data array global), the change has an impact on the rest of the code, in the other case (passing it as reference) the modifications are only local and temporary.

I think you pretty much answered your own question, but a couple more references would be good here:
I made a video on this:
http://blogs.mathworks.com/videos/2008/09/16/new-location-and-memory-allocation/
Similar to what Loren spoke of here:
http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/
-Dogu

Related

Scala legacy code: how to access input parameters at different points in execution path?

I am working with a legacy scala codebase, and as is always the case modifying the code is quite difficult without touching different parts.
One of my new requirement in to make several decisions based on some input parameters. Problem is that these decisions are to be made at various points along the execution. So either I encapsulate all those parameters in a case class instance and pass it along. But it means I would have to modify multiple methods signatures, and I want to avoid this approach as much as possible.
Another approach can be to create a global object containing all those input parameters and accessible from different points in the execution. Is it a good approach in Scala?
No, using global mutable variables to pass “hidden” parameters is not a good idea, not in Scala and not in any other programming language. It makes the code hard to understand and modify, because a function's behaviour will now depend on which functions were invoked earlier. And it's extremely fragile, because you might forget setting one of those global parameters before invoking the function, which means that it will use whatever value was stored there before. This is the kind of thing that can appear to work for years, and then break when you modify a completely unrelated part of the program.
I can't stress this enough: do not use global mutable variables, period. The solution is to man up and change those method signatures. Depending on the details, dependency injection may or may not help in your particular case.

How does one create constants in Tcl?

I have seen the use of set keyword in tcl often. This cannot be used to create constant. How does one create constant in tcl which can then be used by other procedures?
Generally speaking, most of the uses of constants fall into several categories:
enumeration values
magical numbers
looping control factors
scaling factors
In Tcl, for the first case you'd usually just use the name instead of mapping it to an integer, with integer mappings only being applied in the cases that need them. Even bit sets can be handled that way, substituting a list of names for the set of bits (and the name being present in the list is equivalent to the bit being set). Tcl's C API has relevant functions for helping with this, specifically Tcl_GetIndexFromObj().
Magical values are usually best locked away close to the code that handles them. If I was interfacing to hardware, I'd not let the magic values appear at all at the script level (since I'd have the binding code written in C).
Looping control factors are often best represented as default values for procedure arguments, as they are things that you want to sometimes override. But they're often not as needed once custom control structures are available, and they fit a lot more into the Tcl style of working.
Scaling factors are the case where constants might be useful. I tend to simulate those by just using a global or namespace variable and plain old not assigning to it from elsewhere. I'd be quite interested in having code to allow constants (specifically variables that can't be assigned to) as a standard feature, but we don't have that right now.
Once those cases are covered, what remain tend to be unimportant constants. After all, there's almost no need to calculate the sizes of things for allocation and stuff like that, and things like positional binding in SQL statements are discouraged within TDBC in favour of binding by name (an awful lot easier to get right).
A simple way of making a constant is to put a write trace on a variable so that whenever it is written to, it is reset back to its constant value.
set CONSTANT 123
trace add variable CONSTANT write {apply {args {
global CONSTANT
# Reset to the constant value; write traces are after the fact
set CONSTANT 123
# Make the write give an error
error "may not change constant"
}}}

Method frame - how local variables array is created?

My question is most likely platform/compiler/language specific, but I will try to be as generic as possible.
When we call a method, a frame is created for it and we (usually/always) allocate some space for the local variables in an array, return address, and probably some other stuff depending on the platform we're on. We would also have our PC pointing to the first item of the function's bytecode array. My question arises here...
That bytecode array only includes opcodes and their operands (right?). In this case, when the relevant method has been called, the os/runtime should have an idea about how much space does it need to reserve to create the local variable array. I think that information is probably part of the class file that was already compiled. So, where's that size information stored actually? Is it part of the method's bytecode array (in addition to the opcodes and operands)? Or it's kept somewhere else?
To make the question more clear, perhaps this example might help. For example, when I call a function object, what I'm returned is the address of the first opcode/instruction in the method or the address of something that helps me to initiate the method frame?
Multiple approaches are more than welcome.
Hope my question is clear
It's unclear which language/platform you're talking about, but in the case of Java classfiles, the size of the "local variable" table is stored as a field in the Code attribute of each method that has code.
That being said, modern JVMs operate at a higher level of abstraction. They don't just blindly interpret the bytecode - they may analyze and optimize the bytecode, or even compile it into machine code.

"relative global" variables in matlab or other languages

I do scientific image processing in MATLAB, which requires solving weird optimization problems with many parameters that are used by an "experiment" function which has many different "helper functions" that use various subsets of the parameters. Passing these parameters around is a pain and I would like an elegant extensible solution.
My preferred solution would be a "relative global" variable - common to a "main" function's workspace, as well as any subfunctions that the main function calls and that specify they want to share that variable. But the relative global variable would not exist outside the function which declares it, and would disappear once the main function returns.
The code would look like this, except there would be many more experiment_helpers, each using the parameters in different ways:
function y = experiment(p1,p2,p3,...)
% declare relative global variables.
% these will not change value within this experiment function,
% but this experiment will be reused several times in the calling function,
% each time with different parameter values.
relative global p1, p2, p3 ...
% load and process some data based on parameters
...
% call the helper + return
y = experiment_helper(y,z);
end
function y = experiment_helper(y,z)
relative global p1, p3
%% do some stuff with y, z, possibly using p1 and p3, but not changing either of them.
...
end
I realize that you can get the desired behavior in several other ways -- you could pass the parameters to the sub-functions called by the experiment, you could put the parameters in a parameter structure and pass them to the sub-functions, and so on. The first is horrible because I have to change all the subfunctions' arguments every time I want to change the use of parameters. The second is okay, but I have to include the structure prefix every time I want to use the variables.
I suppose the "proper" solution to my problem is to use options structures much like matlab does in its own optimization code. I just wonder if there isn't a slicker way that doesn't involve me typing "paramStruct.value" every time I want to use the parameter "value".
I realize that relative global variables could cause all sorts of debugging nightmares, but I don't think they would necessarily be worse than the ones caused by existing global variables. And they would at least have more locality than unqualified globals.
So how does a real programmer handle this problem? Is there an elegant, easy to create and maintain design for this kind of problem? Can it be done in matlab? If not, how would you do it in another language? (I always feel guilty about using matlab as it doesn't exactly encourage good design. I always want to switch to python but can't justify relearning things and migrating the code base -- unless it would make me a much faster and better programmer within a few weeks, and the matlab-specific tools, such as wavelet and optimization toolboxes, could quickly+easily be found or duplicated within python.)
No, I don't think Matlab has exactly what you are looking for.
The best answer depends on if your primary concern is due to the amount of typing required to pass the parameters around, or if you are concerned about memory usage as you pass around large datasets. There are a few approaches that I have taken, and they all have pros and cons.
Parameters in a structure. You've already mentioned this, in your question,but I would like to reinforce it as an excellent answer in most circumstances. Instead of calling my parameter structure paramStruct, I usually just call it p. It is always locally scoped to the function I'm using, and then I need to type p.value instead of value. I believe the extra two characters are well worth the ability to easily pass around the complete set of parameters.
InputParser. I use the InputParser class a lot when designing functions which require a lot of inputs. This actually fits in well with (1), in that a sturcture of parameters can be passed in, or you can use 'param'/value pairs, and yuo can allow your function to define defaults. Very useful.
Nested functions can help in limited cases, but only when all of your functions can be defined hierarchically. It limits code re-use if one of your inner function can be generic enough to support multiple parent functions.
Pass by reference using handle classes. If your concern is memory. the new form of
classes actually allows you to define a paraemter structure which is passes by reference. The code looks something like this:
classdef paramStructure < handle
properties (SetAccess = public, GetAccess = public)
param1 = [];
param2 = [];
% ...
param100 = [];
end
end
Then you can create a new pass-by-reference structure like this
p = paramStructure;
And set values
p.param1 = randn(100,1);
As this is passed around, all passing is by reference, with the pros (less memory use, enables some coding styles) and cons (generally easier to make some kinds of mistakes.
global variables. I try really hard to avoid these, but they are an option.
What you're describing is exactly available in MATLAB using nested functions. They have their own workspace, but they also have access to the workspace of the parent function in which they are nested. So the parent function can define your parameters, and the nested functions will be able to see them as they have a shared scope.
It's a pretty elegant way of programming, and causes many fewer problems with debugging than globals. Recent versions of MATLAB even highlight shared variables in the editor in light blue, to help with this.

iPhone -- context parameters vs. global variables

In my iPhone development, I've always used global variables for lots of stuff. The style guide in my new job says we should use context parameters instead. So I need to figure out what that means and how to do that.
Can anyone explain in more detail what this means -- or point me to some code that works this way?
Thanks
It sounds like there may be a clash in nomenclature. From this definition of Context Parameters, they seem to be concerned storing global state for the duration of a session. Perhaps, you could use a 'contextParameters' NSDictionary within NSUserDefaults to store your globals. To the extent that your globals might need to be exported in their entirety (for debugging, for state saving) this might be useful in the long run.
The style guide might just be generically saying to keep your variables scoped based on the context of their usage. For example if you have a variable that you need for the lifetime of a class instance then make it a member variable of that class. If it is something that you need for the lifetime of the app then put it in an application wide object (but not a global variable).
If you use a global object (which could mostly be a big C struct containing all your former global variables) instead of individual naked global variables, you might be able to copy the object, serialize it to save it or create a unified core dump, eventually add setters/listeners, etc.
If you break the global object up, based on the shared scope or the required context of groupings of instance/struct variables, then the fractional objects might end up being good candidates for the M portion of an MVC repartitioning of your code for better reuse, extensibility, etc.