Set function workspace to base in MATLAB - matlab

I have a rather bulky program that I've been running as a script from the MATLAB command line. I decided to clean it up a bit with some nested functions (I need to keep everything in one file), but in order for that to work it required me to also make the program itself a function. As a result, the program no longer runs in the base workspace like it did when it was a script. This means I no longer have access to the dozens of useful variables that used to remain after the program runs, which are important for extra calculations and information about the run.
The suggested workarounds I can find are to use assignin, evalin, define the variables as global, or set the output in the definition of the now function-ized program. None of these solutions appeal to me, however, and I would really like to find a way to force the workspace itself to base. Does any such workaround exist? Or is there any other way to do this that doesn't require me to manually define or label each specific variable I want to get out of the function?

Functions should clearly define their input and output variables. Code organized any other way will be much more difficult to understand and to modify later on. In the end, working in an unorthodox style will most likely cost you more time than investing in some restructuring.
If you have a huge number of output variables, I would suggest organizing them in structure arrays, which might be easy to handle as output variables.
The only untidy workaround I can imagine would use whos, assignin and eval:
function your_function()
x = 'hello' ;
y = 'world' ;
variables = whos ;
for k=1:length(variables)
assignin('base',variables(k).name,eval(variables(k).name))
end
end
But I doubt that this will help with the aim to clean up your program. As mentioned above I suggest ordering things manually in structures:
function out = your_function()
x = 'hello' ;
y = 'world' ;
out.x = x ;
out.y = y ;
end
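As a minimal sketch of how the struct approach looks from the caller's side (assuming R2016b or later, where a script may contain local functions; run_program is a hypothetical stand-in for the function-ized program), the fields can be listed and read back without naming each one by hand:

```matlab
% Script sketch: collect results in a struct and read them back by field.
out = run_program();          % hypothetical name for the function-ized program
names = fieldnames(out);      % cell array of field names, in creation order
for k = 1:numel(names)
    % Dynamic field access: out.(name) reads the field whose name is in a string
    fprintf('%s = %s\n', names{k}, out.(names{k}));
end

function out = run_program()
    % Every value worth keeping is stored as a field of the output struct
    out.x = 'hello';
    out.y = 'world';
end
```

Everything stays inside one variable in the base workspace, so nothing is overwritten there by accident.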

If the functions you would like to define are simple and have a single output, one option is to use anonymous functions.
Another option is to store all the variables you would like to use afterwards in a struct and have your big function return this struct as an output.
function AllVariables = GlobalFunction(varargin);
% bunch of stuff
AllVariables= struct('Variable1', Variable1, 'Variable2', Variable2, …);
end

Related

When is the eval function necessary?

I just read the following article from MathWorks which describes why it is important to avoid the eval function and lists alternatives to many of eval's common uses.
After reading the article, I have the impression that the eval function is neither useful nor necessary. So, my question is this: When is the eval function necessary?
I have found only one useful case for eval, and then only in its evalc variant: calling a function that prints to the command window (e.g. lines without ; or with disp calls) and that you cannot modify. An example is an obfuscated function that dumps heaps of output to your command window. Even then it is best to try to obtain the source code and modify it to your needs, as using evalc will hurt performance. Otherwise, I have not found a case where eval is the best solution.
I wrote an extensive answer detailing why you should try to avoid eval as much as possible here: How to put these images together?
I have already used eval when trying to create multiple arrays with different names. This is not really recommended, but it worked for my specific application. For example, if I wanted to have N matrices with the specific names "matrix1" "matrix2" .. "matrixN" , one solution would be to manually type these in as "matrix1 = something" ... "matrixN = somethingelse". If N is really large, this is not ideal. Using eval , you could set up a for loop that would change the name of the matrix on every loop, and calculate some value based on that same N value.
Of course, ideally saving them in to a cell would be better, but I needed the arrays in the format I described.
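For comparison, here is a minimal sketch of the eval-free alternatives: a cell array indexed by number, and a struct with dynamically generated field names for cases where the names themselves matter. N and the matrix contents are made up for illustration.

```matlab
N = 5;                                 % illustrative count

% Option 1: a cell array, indexed by number instead of by variable name
matrices = cell(1, N);
for k = 1:N
    matrices{k} = k * eye(2);          % placeholder computation depending on k
end

% Option 2: a struct with dynamic field names matrix1 ... matrixN
named = struct();
for k = 1:N
    named.(sprintf('matrix%d', k)) = matrices{k};
end

disp(named.matrix3);                   % same data as matrices{3}
```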

Stata: Edit each element of a global (which contains a list of variables)

This is probably just a short syntax question. I have:
clear all
macro drop _all
global variables var1 var2
and I want
global means m_var1 m_var2
which I have generated elsewhere. The goal is to use both globals in a Mundlak regression (like reg depvar $variables $means) without having to calculate/include the means by hand for different specifications. My idea was something along the lines of:
global means "m_`variables'"
but that simply ignores the variables global. Again, sorry for the R-think...
Edit: My strategy: I am trying to write a program which runs models (Mundlak/Chamberlain random effects logit, see Wooldridge's panel data book, 2nd ed., p. 487) on several distinct lists of variables and returns graphs of the regression results. This should be done such that I only have to change the globals/locals specifying these variables at the beginning. Thus, I need code that creates time averages of the variables in the global and uses these together with the original global in the logit specification.
I'm not convinced your general strategy is a good one, but I don't have information on the issue you face, so I won't comment much more.
I'll state that using locals is a better idea if you can spare the globals, and that you can redefine the contents of a macro using a loop:
clear all
set more off
local variables var1 var2
// original
local means "m_`variables'"
// loop
local means2
foreach v of local variables {
local means2 `means2' m_`v'
}
display "`means'"
display "`means2'"

How to use assignin for multiple variables?

I have a nested function which calls a script basically containing some definitions of constants and strings. I need to pass these variables to the base workspace. I know I could define them as global, but that is supposedly not the best solution, is it?
The conventional way, using the output arguments of the function, seems too complicated in my case. (It's actually just a one-time call, so I don't want to blow up my code.) So I thought about using assignin and who, but that seems to work neither for cell arrays nor for comma-separated lists. Probably I'm just missing some syntax refinements.
function myFunction()
myScriptWithDefinitions;
% who returns a cell array with all variables from my script
temp = who;
% now I try to assign these variables to my base workspace
% these are my attempts, none of them working
assignin('base',who);
assignin('base',temp{:});
assignin('base',{temp{:}});
...
end
I'm aware that I actually need to pass both a list of names and a list of values.
Any further ideas?
Edit: something like
assignin('base',{'A','B'},{2,5})
% or
assignin('base',{'A',2},{'B',5})
does not work, so I guess assignin in general is not an option.
With assignin you can only assign one variable at a time.
With "who" you get a cell-array of strings, that contains the names of the variables. Now if you have this list:
myVarList=who;
you can loop over and assign the variables to the workspace:
myVarList=who;
for indVar = 1:length(myVarList)
assignin('base',myVarList{indVar},eval(myVarList{indVar}))
end
Note: this is an eval-solution... if someone knows a quick replacement for that, please let me know :)
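One possible replacement for the eval call, offered as a sketch rather than a recommendation, is to round-trip the local workspace through a temporary MAT-file: save without a variable list writes every local variable, and load with an output argument returns them as a struct whose fields can be read dynamically. The names exportDemo, A, B and tmpFile are hypothetical, the script form assumes R2016b or later, and the disk round trip is slow and may not suit every variable type.

```matlab
exportDemo();                 % afterwards A and B exist in the base workspace

function exportDemo()
    A = 2;                    % stand-ins for the definitions made by the script
    B = 5;

    tmpFile = [tempname '.mat'];
    save(tmpFile);                              % writes every local variable, tmpFile included
    s = rmfield(load(tmpFile), 'tmpFile');      % struct of variables, minus the helper
    delete(tmpFile);

    names = fieldnames(s);
    for k = 1:numel(names)
        assignin('base', names{k}, s.(names{k}));   % one variable per call
    end
end
```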

Elegant way to set a mode of operation across multiple function files

I've written a piece of code which calls numerous functions, which in turn also call several sub-functions.
I am calling the main file from the command line and supplementing the call with certain arguments to initiate certain modes I have accounted for.
E.g. octave classify_file.m --debug <file> would run in my custom debug mode, which sets a constant debug to 1 and subsequently outputs all plots and relevant variables. Omitting the argument outputs only 1 variable.
Similarly, I have a template and a histogram mode, which essentially all do the same thing, except that they output some more variables, matrices and plots depending on the mode.
As it is now, I have to include the debug, template and histogram constants as arguments to each and every function if I want them to be influenced by the respective modes.
This is cumbersome and confusing, there has to be a better way. I've never worked with global variables, but would this be a good place to use one? What's an elegant solution for this problem?
This is a situation in which global variables will come in handy, although as you may be aware they are sometimes frowned upon, and can also have certain performance implications in MATLAB. Personally I don't think passing the mode all the way down the call stack is too bad, but are you treating all 3 as separate arguments? The least you could do is put them in a struct in your highest-level function so that you only have 1 argument:
mode.debug = [whatever]
mode.histogram = [whatever]
mode.template = [whatever]
myFunction(mode);
OR, if you can only have one mode at a time what about some integer constants?
mode = MODE_DEBUG
or
mode = MODE_NONE
I would define the constants by creating short functions, this is how the pi constant works for example.
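A minimal sketch of such constant functions, assuming the numeric values 0 and 1 (the answer does not specify any) and written as a script with local functions (R2016b or later); in a multi-file project each function would live in its own file on the path:

```matlab
mode = MODE_DEBUG();              % pick the active mode once, at the top level

if mode == MODE_DEBUG()
    disp('running in debug mode');
end

function m = MODE_NONE()
    m = 0;                        % assumed value
end

function m = MODE_DEBUG()
    m = 1;                        % assumed value
end
```

Because each constant is a function, it cannot be reassigned by accident the way a plain variable could.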
Finally, there is an alternative to global variables which I rather like, which is functions that use persistent variables. For example:
function m = debugMode(newValue)
% isModeOn keeps its value between calls to this function
persistent isModeOn;
if nargin > 0
isModeOn = newValue;
end
m = isModeOn;
end
This way you can call debugMode(1) to set it on, and you can call m = debugMode anywhere to get the value.
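A usage sketch of this getter/setter pattern follows. It adds one thing not in the answer above, namely an assumed default of false, since a persistent variable starts out as the empty array [] until it is first assigned. The script form assumes R2016b or later.

```matlab
debugMode(true);                  % switch the mode on once, e.g. after parsing arguments

if debugMode()                    % query it from anywhere, without passing it along
    disp('debug output enabled');
end

function m = debugMode(newValue)
    persistent isModeOn;          % retained between calls
    if isempty(isModeOn)
        isModeOn = false;         % assumed default for the first call
    end
    if nargin > 0
        isModeOn = logical(newValue);
    end
    m = isModeOn;
end
```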

How does scoping in Matlab work?

I just discovered (to my surprise) that calling the following function
function foo()
if false
fprintf = 1;
else
% do nothing
end
fprintf('test')
gives an error: Undefined function or variable "fprintf". My conclusion is that the scope of variables is determined before runtime (given my limited understanding of how interpretation of computer languages, and specifically MATLAB, works). Can anyone give me some background information on this?
Edit
Another interesting thing I forgot to mention above is that
function foo()
if false
fprintf = 1;
else
% do nothing
end
clear('fprintf')
fprintf('test')
produces Reference to a cleared variable fprintf.
MATLAB parses the function before it's ever run. It looks for variable names, for instance, regardless of the branching that activates (or doesn't activate) those variables. That is, scope is not determined at runtime.
ADDENDUM: I wouldn't recommend doing this, but I've seen a lot of people doing things with MATLAB that I wouldn't recommend. But... consider what would happen if someone were to define their own function called "false". The pre-runtime parser couldn't know what would happen if that function were called.
It seems that the first time the MATLAB JIT compiler parses the m-file, it identifies all variables declared in the function. It doesn't seem to care whether said variable is being declared in unreachable code. So your local fprintf variable immediately hides the builtin function fprintf. This means that, as far as this function is concerned, there is no builtin function named fprintf.
Of course, once that happens, every reference within the function to fprintf refers to the local variable, and since the variable never actually gets created, attempting to access it results in errors.
Clearing the variable simply clears the local variable, if it exists; it does not bring the builtin function back into scope.
To call a builtin function explicitly, you can use the builtin function.
builtin( 'fprintf', 'test' );
The line above will always print the text at the MATLAB command line, irrespective of local variables that may shadow the fprintf function.
Interesting situation. I doubt if there is detailed information available about how the MATLAB interpreter works in regard to this strange case, but there are a couple of things to note in the documentation...
The function precedence order used by MATLAB places variables first:
Before assuming that a name matches a function, MATLAB checks for a variable with that name in the current workspace.
Of course, in your example the variable fprintf doesn't actually exist in the workspace, since that branch of the conditional statement is never entered. However, the documentation on variable naming says this:
Avoid creating variables with the same name as a function (such as i, j, mode, char, size, and path). In general, variable names take precedence over function names. If you create a variable that uses the name of a function, you sometimes get unexpected results.
This must be one of those "unexpected results", especially when the variable isn't actually created. The conclusion is that there must be some mechanism in MATLAB that parses a file at runtime to determine what possible variables could exist within a given scope, the net result of which is functions can still get shadowed by variables that appear in the m-file even if they don't ultimately appear in the workspace.
EDIT: Even more baffling is that functions like exist and which aren't even aware of the fact that the function appears to be shadowed. Adding these lines before the call to fprintf:
exist('fprintf')
which('fprintf')
Gives this output before the error occurs:
ans =
5
built-in (C:\Program Files\MATLAB\R2012a\toolbox\matlab\iofun\fprintf)
Indicating that they still see the built-in fprintf.
These may provide insight:
https://www.mathworks.com/help/matlab/matlab_prog/base-and-function-workspaces.html
https://www.mathworks.com/help/matlab/matlab_prog/share-data-between-workspaces.html
This can give you some info about what is shadowed:
which -all
(Below was confirmed as a bug)
One gotcha is that Workspace structs, and classes on the path, have particular scoping and type precedence that (if you are me) may catch you out.
E.g. in 2017b:
% In C.m, saved in the current directory
classdef C
properties (Constant)
x = 100;
end
end
% In Command window
C.x = 1;
C.x % 100
C.x % 1 (Note the space)
C.x*C.x % 1
disp(C.x) % 1