I am trying to make sense out of a statement in the Julia's Metaprogramming documentation on macro hygiene. The documentation claims that
Julia’s macro expander solves these problems in the following way. First, variables within a macro result are classified as either local or global. A variable is considered local if it is assigned to (and not declared global), declared local, or used as a function argument name. Otherwise, it is considered global. Local variables are then renamed to be unique (using the gensym() function, which generates new symbols), and global variables are resolved within the macro definition environment. Therefore both of the above concerns are handled; the macro’s locals will not conflict with any user variables, and time and println will refer to the standard library definitions.
I wrote a small program to see whether global variables were indeed resolved in the macro definition environment. I wrote the following:
f(x) = x + 100
macro g() # According to Julia docs, ...
:(f(x) + 5) # ... f is global, x is local, right?
end # if so, f should refer to the f above?
(function main()
local x = 3
f(x) = x - 100 # f in the call environment subtracts 100
println(#g()) # So why does this do -92?
end)()
If I am to understand the Julia docs correctly, part of macro hygiene is to make sure whichever functions are called in the macro's returned expression don't get hijacked by functions of the same name inside the caller's environment. But this is exactly what happens here, the function f that was used was the one that was defined locally.
I would have thought I would have to use esc in order to use the f in scope at the point of call. But this is not so, why?
And in addition, I noticed that the the variable x inside the macro result is considered local, so a new gensymed variable name should have been generated for it, so as not to clash with the x in the macro call environment. But this did not happen either!
How am I to read the documentation to make any sense out the reason that esc need not be used here?
EDIT (CLARIFICATION)
When I claim f is global and x is local, according to the docs, I do so because I see that x is used as a function argument. I understand x is not being written to nor declared local, and it certainly looks global, but those docs claim function arguments should be global too!
I know the usual part of hygiene where the gensymming of locals ensures that variables of the same name in the macro caller's context are not inadvertently plastered. However, the docs claim that for functions, the ones that are seen in the macro definition's context are protected from the caller using their own. This is the part that makes no sense to me, because my experiment shows otherwise.
This is a bug. It's an old issue of Julia 0.5 and earlier, and has been fixed in Julia 0.6. See https://github.com/JuliaLang/julia/issues/4873 for more information.
Related
Is the distinction between the 3 different let forms (as in Scheme's let, let*, and letrec) useful in practice?
I am current in the midst of developing a lisp-style language that does current support all 3 forms, yet I have found:
regular "let" is the most inefficient form, effectively having to translate to an immediately called lambda form and the instructions generated are nearly identical. Additionally, I haven't found myself needing this form very often.
let* (sequential binding) seems to be the most practically useful and most often used. This form can be translated to a sequence of nested "lets", each environment storing a single variable. But this again is highly inefficient, wasting space and lookup time.
letrec (recursive binding) can be efficiently implemented, given that no initializer expression refers to an unbound variable. Typically the case is that all initializers are lambda expressions and the above is true.
The question is: since letrec can be efficiently implemented and also subsumes the behavior of let*, regular let is not often used and can be converted to a lambda form with no great loss of efficiency, why not make default "let" have the behavior of the current "letrec" and be rid of the original "let"?
This [let*] form can be translated to a sequence of nested "lets", each environment storing a single variable. But this again is highly inefficient, wasting space and lookup time.
While what you are saying here is not incorrect, in fact there is no need for such a transformation. A compiling strategy for the simple let can handle the semantics of let* with just simple modifications (possibly supporting both with just a flag passed to common code).
let* just alters the scoping rules, which are settled at compile time; it's mostly a matter of which compile-time environment object is used when compiling a given variable init form.
A compiler can use a single environment object for the sequential bindings of a let*, and destructively update it as it compiles the variable init forms, so that each successive init form sees a more and more extended version of that environment which contains more and more variables. At the end of that, the complete environment is available with all the variables, for doing the code generation for generating the frame and whatnot.
One issue to watch out for is that a flat environment representation for let* means that lexical closures captured during the variable binding phase can capture future variables which are lexically invisible to them:
(let* ((past 42)
(present (lambda () (do-something-with past)))
(future (construct-huge-cumbersome-object)))
...))
If there is a single run-time environment object here containing the compiled versions of the variables past, present and future, then it means that the lambda must capture that environment. Which means that although ostensibly the lambda "sees" only the past variable, because future is not in scope, it has de facto captured future.
Thus, garbage collection will consider the huge-cumbersome-object to be reachable for as long as the lambda remains reachable.
There are ways to address this, like accompanying the environmental reference emanating from the lambda with some kind of frame index which says, "I'm only referencing part of the environment vector up to index 13". Then when the garbage collector traverses this fenced reference, it will only mark the indicated part of the environment vector: cells 0 to 13.
Anyway, about whether to implement both let and let*. I suspect if Lisp were being "green field" designed from scratch today, many designers would like reach for the sequentially binding version to be called let. The parallel construct would be the one available under the special name let*. The situations when you actually need let to be parallel are fewer. For instance, let allows us to re-bind a pair of variable symbols such that their contents appear exchanged; but this is rarely something that comes up in application programming. In some programming language cultures, variable shadowing is frowned up on entirely; GNU C has a -Wshadow warning against it, for instance.
Note how in ANSI Common Lisp, which has let and let*, the optional parameters of a function behave sequentially, like let*, and this is the only binding strategy supported! So that is to say:
(lambda (required &optional opt1 (opt2 opt1)) ...)
Here the value of opt2 is defaulted from whatever the value of opt1 is at the time of the call. The initialization expression of opt2 has the opt1 parameter in scope.
Also, in the same Lisp dialect, the regular setf is sequential; if you want parallel assignment you must use psetf, which is the longer name of the two.
Common Lisp already shows evidence of design decisions more recent than let tend to favor sequential operation, and designate the parallel as the extraordinary variant.
Think of metaprogramming. If your default let will sequentially create nested scopes, you'll have to make sure that none of the initialiser expressions are referring to the names from the wrong scopes. You have such a guarantee with a regular let. Control over name scoping is very important when you're generating code.
Letrec is even worse, it's introducing a very complicated scope rules that cannot be easily reasoned with.
I am working with Stata.
I have a variable called graduate_secondary.
I generate a global variable called outcome, because eventually I will use another outcome.
Now I want to replace the variable graduate if a condition relative to global is met, but I get an error:
My code is:
global outcome "graduate_secondary"
gen graduate=.
replace graduate=1 if graduate_primary==1 & `outcome'==1
But i receive the symbol ==1 invalid name.
Does anyone know why?
Something along those lines might work (using a reproducible example):
sysuse auto, clear
global outcome "rep78"
gen graduate=.
replace graduate=1 if mpg==22 & $outcome==3
(2 real changes made)
In your example, just use
replace graduate=1 if graduate_primary==1 & $outcome==1
would work.
Another solution is to replace global outcome "graduate_secondary" with local outcome "graduate_secondary".
Stata has two types of macros: global, which are accessed with a $, and local, which are accessed with single quotes `' around the name -- as you did in your original code.
You get an error message because a local by the name of outcome has no value assigned to it in your workspace. By design, this will not itself produce an error but instead will the reference to the macro will evaluate as a blank value. You can see the result of evaluating macro references when you type them by using display as follows. You can also see all of the macros in your workspace with macro dir (the locals start with an underscore):
display `outcome'
display $outcome
Here is a blog post about using macros in Stata. In general, I only use global macros when I have to pass something between multiple routines, but this seems like a good use case for locals.
Suppose I have this macro definition in a module:
module Example
export #example_macro
macro example_macro(a)
quote
local r = RemoteRef()
put!(r, $(esc(a)))
remotecall_fetch(2, (r) -> fetch(r), r)
end
end
end
And here is its expansion:
julia> include("Example.jl")
julia> using Example
julia> macroexpand(quote #example_macro a end)
quote # none, line 1:
begin # /.../Example.jl, line 7:
local #121#r = Example.RemoteRef() # line 8:
Example.put!(#121#r,a) # line 9:
Example.remotecall_fetch(2,(r) -> Example.fetch(r),#121#r)
end
end
Every single one of globally available functions (like put! or fetch) are prefixed with the name of the module. I understand that this is needed for the macro to be hygienic - if, say, fetch was redefined in the module in which #example_macro is called, and fetch was inserted into the expansion as is, it wouldn't work correctly.
However, this also requires Example module to be available not only in the main process, but also on the second worker (since remotecall_fetch needs to execute Example.fetch on it). I don't want it - after all, fetch is a basic function available on all workers by default.
So, is there a way to disable prefixing all identifiers with the name of the current module? I think this would mean turning the macro non-hygienic as it is impossible to decide where some identifier (like fetch) is defined on macro expansion phase, and that's fine for me.
Since this is a pretty profound question, I think that you should give the Julia devs themselves a chance to answer it by asking on julia-users.
Currently, you can completely circumvent macro hygiene by wrapping the whole quote block in your macro in an esc(...) (don't forget to take away the esc around a), but I would in general advise against it - then you are on your own.
I just discovered (to my surprise) that calling the following function
function foo()
if false
fprintf = 1;
else
% do nothing
end
fprintf('test')
gives and error Undefined function or variable "fprintf". My conclusion is that the scope of variables is determined before runtime (in my limited understanding how interpretation of computer languages and specifically Matlab works). Can anyone give me some background information on this?
Edit
Another interesting thing I forgot to mention above is that
function foo()
if false
fprintf = 1;
else
% do nothing
end
clear('fprintf')
fprintf('test')
produces Reference to a cleared variable fprintf.
MATLAB parses the function before it's ever run. It looks for variable names, for instance, regardless of the branching that activates (or doesn't activate) those variables. That is, scope is not determined at runtime.
ADDENDUM: I wouldn't recommend doing this, but I've seen a lot of people doing things with MATLAB that I wouldn't recommend. But... consider what would happen if someone were to define their own function called "false". The pre-runtime parser couldn't know what would happen if that function were called.
It seems that the first time the MATLAB JIT compiler parses the m-file, it identifies all variables declared in the function. It doesn't seem to care whether said variable is being declared in unreachable code. So your local fprintf variable immediately hides the builtin function fprintf. This means that, as far as this function is concerned, there is no builtin function named fprintf.
Of course, once that happens, every reference within the function to fprintf refers to the local variable, and since the variable never actually gets created, attempting to access it results in errors.
Clearing the variable simply clears the local variable, if it exists, it does not bring the builtin function back into scope.
To call a builtin function explicitly, you can use the builtin function.
builtin( 'fprintf', 'test' );
The line above will always print the text at the MATLAB command line, irrespective of local variables that may shadow the fprintf function.
Interesting situation. I doubt if there is detailed information available about how the MATLAB interpreter works in regard to this strange case, but there are a couple of things to note in the documentation...
The function precedence order used by MATLAB places variables first:
Before assuming that a name matches a function, MATLAB checks for a variable with that name in the current workspace.
Of course, in your example the variable fprintf doesn't actually exist in the workspace, since that branch of the conditional statement is never entered. However, the documentation on variable naming says this:
Avoid creating variables with the same name as a function (such as i, j, mode, char, size, and path). In general, variable names take precedence over function names. If you create a variable that uses the name of a function, you sometimes get unexpected results.
This must be one of those "unexpected results", especially when the variable isn't actually created. The conclusion is that there must be some mechanism in MATLAB that parses a file at runtime to determine what possible variables could exist within a given scope, the net result of which is functions can still get shadowed by variables that appear in the m-file even if they don't ultimately appear in the workspace.
EDIT: Even more baffling is that functions like exist and which aren't even aware of the fact that the function appears to be shadowed. Adding these lines before the call to fprintf:
exist('fprintf')
which('fprintf')
Gives this output before the error occurs:
ans =
5
built-in (C:\Program Files\MATLAB\R2012a\toolbox\matlab\iofun\fprintf)
Indicating that they still see the built-in fprintf.
These may provide insight:
https://www.mathworks.com/help/matlab/matlab_prog/base-and-function-workspaces.html
https://www.mathworks.com/help/matlab/matlab_prog/share-data-between-workspaces.html
This can give you some info about what is shadowed:
which -all
(Below was confirmed as a bug)
One gotcha is that Workspace structs, and classes on the path, have particular scoping and type precedence that (if you are me) may catch you out.
E.g. in 2017b:
% In C.m, saved in the current directory
classdef C
properties (Constant)
x = 100;
end
end
% In Command window
C.x = 1;
C.x % 100
C.x % 1 (Note the space)
C.x*C.x % 1
disp(C.x) % 1
I've been getting my hands wet with emacs lisp, and one thing that trips me up sometimes is the dynamic scope. Is there much of a future for it? Most languages I know use static scoping (or have moved to static scoping, like Python), and probably because I know it better I tend to prefer it. Are there specific applications/instances or examples where dynamic scope is more useful?
There's a good discussion of this issue here. The most useful part that pertains to your question is:
Dynamic bindings are great for
modifying the behaviour of subsystems.
Suppose you are using a function ‘foo’
that generates output using ‘print’.
But sometimes you would like to
capture the output in a buffer of your
choosing. With dynamic binding, it’s
easy:
(let ((b (generate-new-buffer-name " *string-output*"))))
(let ((standard-output b))
(foo))
(set-buffer b)
;; do stuff with the output of foo
(kill-buffer b))
(And if you used this kind of thing a
lot, you’d encapsulate it in a macro –
but luckily it’s already been done as
‘with-output-to-temp-buffer’.)
This works because ‘foo’ uses the
dynamic binding of the name
‘standard-output’, so you can
substitute your own binding for that
name to modify the behaviour of ‘foo’
– and of all the functions that ‘foo’
calls.
In a language without dynamic binding,
you’d probably add an optional
argument to ‘foo’ to specify a buffer
and then ‘foo’ would pass that to any
calls to ‘print’. But if ‘foo’ calls
other functions which themselves call
‘print’ you’ll have to alter those
functions as well. And if ‘print’ had
another option, say ‘print-level’,
you’d have to add that as an optional
argument as well… Alternatively, you
could remember the old value of
‘standard-output’, substitute your new
value, call ‘foo’ and then restore the
old value. And remember to handle
non-local exits using ‘throw’. When
you’re through with this, you’ll see
that you’ve implemented dynamic
binding!
That said, lexical binding is IMHO much better for 99% of the cases. Note that modern Lisps are not dynamic-binding-only like Emacs lisp.
Common Lisp supports both forms of binding, though the lexical one is used much more
The Scheme specification doesn't even specify dynamic binding (only lexical one), though many implementations support both.
In addition, modern languages like Python and Ruby that were somewhat inspired by Lisp usually support lexical-binding in a straightforward way, with dynamic binding also available but less straightforward.
If you read the Emacs paper (written in 1981), there's a specific section "Language Features for Extensibility" that addresses this question. In Emacs, there's also the added scope of buffer-local (file local) variables.
I've quoted the most relevant portion below:
Formal Parameters Cannot Replace
Dynamic Scope
Some language designers believe that
dynamic binding should be avoided, and
explicit argument passing should be
used instead. Imagine that function A
binds the variable FOO, and calls the
function B, which calls the function
C, and C uses the value of FOO.
Supposedly A should pass the value as
an argument to B, which should pass it
as an argument to C.
This cannot be done in an extensible
system, however, because the author of
the system cannot know what all the
parameters will be. Imagine that the
functions A and C are part of a user
extension, while B is part of the
standard system. The variable FOO does
not exist in the standard system; it
is part of the extension. To use
explicit argument passing would
require adding a new argument to B,
which means rewriting B and everything
that calls B. In the most common case,
B is the editor command dispatcher
loop, which is called from an awful
number of places.
What's worse, C must also be passed an
additional argument. B doesn't refer
to C by name (C did not exist when B
was written). It probably finds a
pointer to C in the command dispatch
table. This means that the same call
which sometimes calls C might equally
well call any editor command
definition. So all the editing
commands must be rewritten to accept
and ignore the additional argument. By
now, none of the original system is
left!