I'm learning about blocks in Common Lisp and wrote this example to see how blocks and the return-from command work:
(block b1
  (print 1)
  (print 2)
  (print 3)
  (block b2
    (print 4)
    (print 5)
    (return-from b1)
    (print 6))
  (print 7))
It prints 1, 2, 3, 4, and 5, as expected. After changing the return-from to (return-from b2), it prints 1, 2, 3, 4, 5, and 7, as one would expect.
Then I tried to turn this into a function and parameterize the label in the return-from:
(defun test-block (arg)
  (block b1
    (print 1)
    (print 2)
    (print 3)
    (block b2
      (print 4)
      (print 5)
      (return-from (eval arg))
      (print 6))
    (print 7)))
I then called (test-block 'b1) to see if it works, but it doesn't. Is there a way to do this without conditionals?
Using a conditional like CASE to select a block to return from
The recommended way to do it is using case or similar. Common Lisp does not support computed returns from blocks. It also does not support computed gos.
Using a case conditional expression:
(defun test-block (arg)
  (block b1
    (print 1)
    (print 2)
    (print 3)
    (block b2
      (print 4)
      (print 5)
      (case arg
        (b1 (return-from b1))
        (b2 (return-from b2)))
      (print 6))
    (print 7)))
One can't compute lexical go tags, return blocks or local functions from names
CLTL2 says about the restriction for the go construct:
Compatibility note: The ``computed go'' feature of MacLisp is not supported. The syntax of a computed go is idiosyncratic, and the feature is not supported by Lisp Machine Lisp, NIL (New Implementation of Lisp), or Interlisp. The computed go has been infrequently used in MacLisp anyway and is easily simulated with no loss of efficiency by using a case statement each of whose clauses performs a (non-computed) go.
Since features like go and return-from are lexically scoped constructs, computing the targets is not supported. Common Lisp has no way to access lexical environments at runtime and query those. This is for example also not supported for local functions. One can't take a name and ask for a function object with that name in some lexical environment.
Dynamic alternative: CATCH and THROW
The typically less efficient and dynamically scoped alternative is catch and throw. There the tags are computed.
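As a rough sketch of that alternative, here is the function from the question rewritten with catch and throw. The name test-block-catch and the choice of the symbols b1 and b2 as catch tags are only for illustration:

```lisp
(defun test-block-catch (arg)
  ;; ARG is an ordinary run-time value, so the target of THROW is computed.
  (catch 'b1
    (print 1)
    (print 2)
    (print 3)
    (catch 'b2
      (print 4)
      (print 5)
      ;; Unwinds to the most recently established CATCH whose tag is EQ to ARG.
      (throw arg nil)
      (print 6))                        ; never reached
    (print 7)))
```

With this, (test-block-catch 'b1) prints 1 through 5 and (test-block-catch 'b2) prints 1 through 5 and then 7. If arg names no active catch tag, throw signals a control-error.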
I think these sorts of things boil down to the different types of namespaces, bindings, and environments in Common Lisp.
One first point is that a slightly more experienced novice learning Lisp might try to modify your attempted function to say (eval (list 'return-from arg)) instead. This seems to make more sense but still does not work.
Namespaces
A common beginner mistake in a language like Scheme is having a variable called list, as this shadows the top level definition of list as a function and stops the programmer from being able to make lists inside the scope of this binding. The corresponding mistake in Common Lisp is trying to use a symbol as a function when it is only bound as a variable.
In Common Lisp there are namespaces which are mappings from names to things. Some namespaces are:
The functions. To get the corresponding thing either call it: (foo a b c ...), or get the function for a static symbol (function foo) (aka #'foo) or for a dynamic symbol (fdefinition 'foo). Function names are either symbols or lists of setf and one symbol (e.g. (setf bar)). Symbols may alternatively be bound to macros in this namespace, in which case using function on them is an error and fdefinition does not return an ordinary function.
The variables. This maps symbols to the values in the corresponding variable. This also maps symbols to constants. Get the value of a variable by writing it down, foo, or dynamically as (symbol-value 'foo), which only sees special variables and not lexical ones. A symbol may also be bound as a symbol-macro in which case special macro expansion rules apply.
Go tags. This maps symbols to labels to which one can go (like goto in other languages).
Blocks. This maps symbols to places you can return from.
Catch tags. This maps objects to the places which catch them. When you throw to an object, the implementation effectively looks up the corresponding catch in this namespace and unwinds the stack to it.
Classes (and structs, conditions). Every class has a name which is a symbol (so different packages may each have a point class).
Packages. Each package is named by a string and possibly some nicknames. This string is normally the name of a symbol and therefore usually in uppercase.
Types. Every type has a name which is a symbol. Naturally a class definition also defines a type.
Declarations. Introduced with declare, declaim, proclaim.
There might be more. These are all the ones I can think of.
The catch-tag and declarations namespaces aren’t like the others as they don’t really map symbols to things but they do have bindings and environments in the ways described below (note that I have used declarations to refer to the things that have been declared, like the optimisation policy or which variables are special, rather than the namespace in which e.g. optimize, special, and indeed declaration live which seems too small to include).
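To make the separation concrete, here is a small sketch in which one symbol is bound in three of these namespaces at once. The names item and namespace-demo are made up for the example:

```lisp
(defun item () :from-function)            ; ITEM in the function namespace

(defun namespace-demo ()
  (let ((item :from-variable))            ; ITEM in the variable namespace
    (block item                           ; ITEM in the block namespace
      (return-from item
        (list item                        ; variable lookup
              (item)                      ; function call
              (funcall #'item)            ; function object, name fixed in the source
              ;; function object, name supplied as a run-time value
              (funcall (fdefinition 'item)))))))
```

Calling (namespace-demo) returns (:from-variable :from-function :from-function :from-function); none of the three bindings interferes with the others.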
Now let’s talk about the different ways that this mapping may happen.
The binding of a name to a thing in a namespace is the way in which they are associated, in particular, how it may come to be and how it may be inspected.
The environment of a binding is the place where the binding lives. It says how long the binding lives for and where it may be accessed from. Environments are searched to find the thing associated with some name in some namespace.
static and dynamic bindings
We say a binding is static if the name that is bound is fixed in the source code and a binding is dynamic if the name can be determined at run time. For example let, block and tags in a tagbody all introduce static bindings whereas catch and progv introduce dynamic bindings.
Note that my definition for dynamic binding is different from the one in the spec. The spec definition corresponds to my dynamic environment below.
Top level environment
This is the environment that is searched last, and it is where toplevel definitions go; for example, defvar, defun, and defclass operate at this level. If a function or variable binding cannot be found at a closer level then this level is searched. References can sometimes be made to bindings at this level before they are defined, although they may signal warnings. That is, you may define a function bar which calls foo before you have defined foo. In other cases such references are not allowed; for example, you can’t intern or read a symbol foo::bar before the package FOO has been defined. Many namespaces only allow bindings in the top level environment. These are:
constants (within the variables namespace)
classes
packages
types
Although (excepting proclaim) all bindings are static, they can effectively be made dynamic by calling eval which evaluates forms at the top level.
Functions (and [compiler] macros) and special variables (and symbol macros) may also be defined top level. Declarations can be defined toplevel either statically with the macro declaim or dynamically with the function proclaim.
Dynamic environment
A dynamic environment exists for a region of time during the program’s execution. In particular, a dynamic environment begins when control flow enters some (specific type of) form and ends when control flow leaves it, either by returning normally or by some nonlocal transfer of control like a return-from or go. To look up a dynamically bound name in a namespace, the currently active dynamic environments are searched (effectively, i.e. a real system wouldn’t be implemented this way) from most recent to oldest for that name and the first binding wins.
Special variables and catch tags are bound in dynamic environments. Catch tags are bound dynamically using catch while special variables are bound statically using let and dynamically using progv. As we shall discuss later, let can make two different kinds of binding, and it knows to treat a symbol as special if it has been defined with defvar or defparameter or if it has been declared special.
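A minimal sketch of the two ways of binding a special variable; the variable *depth* and the three functions are hypothetical names chosen for the example:

```lisp
(defvar *depth* 0)                        ; DEFVAR proclaims *DEPTH* special

(defun current-depth ()
  ;; Sees whichever binding of *DEPTH* is active at the time of the call.
  *depth*)

(defun with-let ()
  (let ((*depth* 1))                      ; name fixed in the source text
    (current-depth)))

(defun with-progv (symbol value)
  (progv (list symbol) (list value)       ; name computed at run time
    (current-depth)))
```

Here (with-let) returns 1 and (with-progv '*depth* 2) returns 2, while a plain (current-depth) afterwards returns 0 again because both bindings end when control leaves the binding form.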
Lexical environment
A lexical environment corresponds to a region of source code as it is written and a specific runtime instantiation of it. It (slightly loosely) begins at an opening parenthesis and ends at the corresponding closing parenthesis, and is instantiated when control flow hits the opening parenthesis. This description is a little complicated so let’s have an example with variables, which are bound in a lexical environment (unless they are special; by convention, the names of special variables are wrapped in * symbols):
(defun foo ()
  (let ((x 10))
    (bar (lambda () x))))
(defun bar (f)
  (let ((x 20))
    (funcall f)))
Now what happens when we call (foo)? Well if x were bound in a dynamic environment (in foo and bar) then the anonymous function would be called in bar and the first dynamic environment with a binding for x would have it bound to 20.
But this call returns 10 because x is bound in a lexical environment so even though the anonymous function gets passed to bar, it remembers the lexical environment corresponding to the application of foo which created it and in that lexical environment, x is bound to 10. Let’s now have another example to show what I mean by ‘specific runtime instantiation’ above.
(defun baz (islast)
  (let ((x (if islast 10 20)))
    (let ((lx (lambda () x)))
      (if islast
          lx
          (frob lx (baz t))))))
(defun frob (a b)
  (list (funcall a) (funcall b)))
Now running (baz nil) will give us (20 10) because the first function passed to frob remembers the lexical environment for the outer call to baz (where islast is nil) whilst the second remembers the environment for the inner call.
For variables which are not special, let creates static lexical bindings. Block names (introduced statically by block), go tags (scoped inside a tagbody), functions (by flet or labels), macros (macrolet), and symbol macros (symbol-macrolet) are all bound statically in lexical environments. Bindings from a lambda list are also lexically bound. Declarations can be created lexically using (declare ...) in one of the allowed places or by using (locally (declare ...) ...) anywhere.
We note that all lexical bindings are static. The eval trick described above does not work because eval happens in the toplevel environment but references to lexical names happen in the lexical environment. This allows the compiler to optimise references to them to know exactly where they are without running code having to carry around a list of bindings or accessing global state (e.g. lexical variables can live in registers and the stack). It also allows the compiler to work out which bindings can escape or be captured in closures or not and optimise accordingly. The one exception is that the (symbol-)macro bindings can be dynamically inspected in a sense as all macros may take an &environment parameter which should be passed to macroexpand (and other expansion related functions) to allow the macroexpander to search the compile-time lexical environment for the macro definitions.
Another thing to note is that without lambda-expressions, lexical and dynamic environments would behave the same way. But note that if there were only a top level environment then recursion would not work as bindings would not be restored as control flow leaves their scope.
Closure
What happens to a lexical binding captured by an anonymous function when that function escapes the scope it was created in? Well, there are two things that can happen:
Trying to access the binding results in an error
The anonymous function keeps the lexical environment alive for as long as the functions referencing it are alive and they can read and write it as they please.
The second case is called a closure and happens for functions and variables. The first case happens for control flow related bindings because you can’t return from a form that has already returned. Neither happens for macro bindings as they cannot be accessed at run time.
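A small sketch of the closure case, using a hypothetical make-counter function. Each call to it instantiates a fresh lexical environment, and the returned function keeps that environment alive:

```lisp
(defun make-counter ()
  (let ((count 0))
    ;; The lambda captures this particular binding of COUNT and can
    ;; keep reading and writing it after MAKE-COUNTER has returned.
    (lambda () (incf count))))

(defun counter-demo ()
  (let ((first-counter (make-counter))
        (second-counter (make-counter)))
    (funcall first-counter)
    (list (funcall first-counter)         ; second call on the first counter
          (funcall second-counter))))     ; first call on an independent counter
```

(counter-demo) returns (2 1), since the two counters do not share a binding.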
Nonlocal control flow
In a language like Java, control (that is, program execution) flows from one statement to the next, branching for if and switch statements, looping for others with special statements like break and return for certain kinds of jumping. For functions control flow goes into the function until it eventually comes out again when the function returns. The one nonlocal way to transfer control is by using throw and try/catch where if you execute a throw then the stack is unwound piece by piece until a suitable catch is found.
In C there is no throw or try/catch but there is goto. The structure of C programs is secretly flat with the nesting just specifying that “blocks” end in the opposite order to the order they start. What I mean by this is that it is perfectly legal to have a while loop in the middle of a switch with cases inside the loop and it is legal to goto the middle of a loop from outside of that loop. There is a way to do nonlocal control transfer in C: you use setjmp to save the current control state somewhere (with the return value indicating whether you have successfully saved the state or just nonlocally returned there) and longjmp to return control flow to a previously saved state. No real cleanup or freeing of memory happens as the stack unwinds and there needn’t be checks that you still have the function which called setjmp on the callstack so the whole thing can be quite dangerous.
In Common Lisp there’s a range of ways to do nonlocal control transfer but the rules are more strict. Lisp doesn’t really have statements but rather everything is built out of a tree of expressions and so the first rule is that you can’t nonlocally transfer control into a deeper expression, you may only transfer out. Let’s look at how these different methods of control transfer work.
block and return-from
You’ve already seen how these work inside a single function but recall that I said block names are lexically scoped. So how does this interact with anonymous functions?
Well suppose you want to search some big nested data structure for something. If you were writing this function in Java or C then you might implement a special search function to recurse through your data structure until it finds the right thing and then return it all the way up. If you were implementing it in Haskell then you would probably want to do it as some kind of fold and rely on lazy evaluation to not do too much work. In Common Lisp you might have a function which applies some other function passed as a parameter to each item in the data structure. And now you can call that with a searching function. How might you get the result out? Well just return-from to the outer block.
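Something like the following sketch, where walk-tree stands in for the generic traversal function and find-first for the search built on top of it (both names are invented here, and the data structure is assumed to be a nested list):

```lisp
(defun walk-tree (function tree)
  "Call FUNCTION on every non-NIL atom of the nested list TREE."
  (cond ((null tree) nil)
        ((atom tree) (funcall function tree))
        (t (walk-tree function (car tree))
           (walk-tree function (cdr tree)))))

(defun find-first (predicate tree)
  (block found
    (walk-tree (lambda (item)
                 (when (funcall predicate item)
                   ;; FOUND is lexically visible here, so this exits
                   ;; straight out of every pending WALK-TREE call.
                   (return-from found item)))
               tree)
    nil))                                 ; nothing matched
```

For example, (find-first #'evenp '(1 (3 (4 5)) 6)) returns 4 without visiting 5 or 6.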
tagbody and go
A tagbody is like a progn, except that bare symbols in its body are not evaluated; instead they are called tags, and any expression within the tagbody can go to one of them to transfer control there. This is partly like goto if you’re still in the same function, but if your go expression happens inside some anonymous function then it’s like a safe, lexically scoped longjmp.
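Two short sketches of this, with made-up function names: the first uses go as a plain local goto, the second jumps out of an anonymous function that is being called by mapc.

```lisp
(defun count-up-to (limit)
  (let ((i 0)
        (seen '()))
    (tagbody
     again                                ; a tag, not a form to evaluate
       (when (< i limit)
         (push i seen)
         (incf i)
         (go again)))                     ; jump back to the tag
    (nreverse seen)))

(defun first-negative-index (numbers)
  (let ((index 0)
        (result nil))
    (tagbody
       (mapc (lambda (n)
               (when (minusp n)
                 (setf result index)
                 (go done))               ; leaves the lambda and MAPC
               (incf index))
             numbers)
     done)
    result))
```

(count-up-to 3) returns (0 1 2) and (first-negative-index '(3 1 -4 1 -5)) returns 2.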
catch and throw
These are most similar to the Java model. The key difference between block and catch is that block uses lexical scoping and catch uses dynamic scoping. Therefore their relationship is like that between special and regular variables.
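A minimal sketch of the difference, with hypothetical names. The throw sits in a function whose source text contains no catch at all, which a return-from could not do because the block name would not be in lexical scope:

```lisp
(defun deep-helper (n)
  ;; The tag TOO-BIG is looked up at run time among the active catches.
  (when (> n 2)
    (throw 'too-big n))
  n)

(defun call-with-limit (n)
  (catch 'too-big
    (list :ok (deep-helper n))))
```

(call-with-limit 1) returns (:ok 1), while (call-with-limit 5) returns 5 because the throw abandons the pending call to list.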
Finally
In Java one can execute code to tidy things up if the stack has to unwind through it as an exception is thrown. This is done with try/finally. The Common Lisp equivalent is called unwind-protect which ensures a form is executed however control flow may leave it.
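A small sketch (cleanup-demo is an invented name) in which the protected form leaves via return-from and the cleanup form still runs:

```lisp
(defun cleanup-demo ()
  (let ((log '()))
    (block outer
      (unwind-protect
           (progn
             (push :working log)
             (return-from outer))         ; nonlocal exit through UNWIND-PROTECT
        (push :cleaned-up log)))          ; cleanup form, runs on any exit
    (nreverse log)))
```

(cleanup-demo) returns (:working :cleaned-up).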
Errors
It’s perhaps worth looking a little at how errors work in Common Lisp. Which of these methods do they use?
Well it turns out that the answer is that errors instead of generally unwinding the stack start by calling functions. First they look up all the possible restarts (ways to deal with an error) and save them somewhere. Next they look up all applicable handlers (a list of handlers could, for example, be stored in a special variable as handlers have dynamic scope) and try each one at a time. A handler is just a function so it might return (ie not want to handle the error) or it might not return. A handler might not return if it invokes a restart. But restarts are just normal functions so why might these not return? Well restarts are created in a dynamic environment below the one where the error was raised and so they can transfer control straight out of the handler and the code that threw the error to some code to try to do something and then carry on. Restarts can transfer control using go or return-from. It is worth noting that it is important here that we have lexical scope. A recursive function could define a restart on each successive call and so it is necessary to have lexical scope for variables and tags/block names so that we can make sure we transfer control to the right level on the call stack with the right state.
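Here is a rough sketch of that sequence using handler-bind and restart-case. The functions parse-entry and parse-all are hypothetical, and substituting 0 for bad input is just an arbitrary policy for the example:

```lisp
(defun parse-entry (string)
  ;; USE-VALUE is established as a restart around the call that may fail.
  (restart-case (parse-integer string)
    (use-value (value)
      value)))

(defun parse-all (strings)
  (handler-bind ((parse-error
                   (lambda (condition)
                     (declare (ignore condition))
                     ;; The handler is called before any unwinding happens.
                     ;; Invoking the restart transfers control back into
                     ;; PARSE-ENTRY, so this lambda never returns.
                     (invoke-restart 'use-value 0))))
    (mapcar #'parse-entry strings)))
```

(parse-all '("1" "x" "3")) returns (1 0 3): the error for "x" is handled without abandoning the mapcar that was in progress.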
Related
Is the distinction between the 3 different let forms (as in Scheme's let, let*, and letrec) useful in practice?
I am currently in the midst of developing a Lisp-style language that does currently support all 3 forms, yet I have found:
regular "let" is the most inefficient form, effectively having to translate to an immediately called lambda form and the instructions generated are nearly identical. Additionally, I haven't found myself needing this form very often.
let* (sequential binding) seems to be the most practically useful and most often used. This form can be translated to a sequence of nested "lets", each environment storing a single variable. But this again is highly inefficient, wasting space and lookup time.
letrec (recursive binding) can be efficiently implemented, given that no initializer expression refers to an unbound variable. Typically the case is that all initializers are lambda expressions and the above is true.
The question is: since letrec can be efficiently implemented and also subsumes the behavior of let*, regular let is not often used and can be converted to a lambda form with no great loss of efficiency, why not make default "let" have the behavior of the current "letrec" and be rid of the original "let"?
This [let*] form can be translated to a sequence of nested "lets", each environment storing a single variable. But this again is highly inefficient, wasting space and lookup time.
While what you are saying here is not incorrect, in fact there is no need for such a transformation. A compiling strategy for the simple let can handle the semantics of let* with just simple modifications (possibly supporting both with just a flag passed to common code).
let* just alters the scoping rules, which are settled at compile time; it's mostly a matter of which compile-time environment object is used when compiling a given variable init form.
A compiler can use a single environment object for the sequential bindings of a let*, and destructively update it as it compiles the variable init forms, so that each successive init form sees a more and more extended version of that environment which contains more and more variables. At the end of that, the complete environment is available with all the variables, for doing the code generation for generating the frame and whatnot.
One issue to watch out for is that a flat environment representation for let* means that lexical closures captured during the variable binding phase can capture future variables which are lexically invisible to them:
(let* ((past 42)
       (present (lambda () (do-something-with past)))
       (future (construct-huge-cumbersome-object)))
  ...)
If there is a single run-time environment object here containing the compiled versions of the variables past, present and future, then it means that the lambda must capture that environment. Which means that although ostensibly the lambda "sees" only the past variable, because future is not in scope, it has de facto captured future.
Thus, garbage collection will consider the huge-cumbersome-object to be reachable for as long as the lambda remains reachable.
There are ways to address this, like accompanying the environmental reference emanating from the lambda with some kind of frame index which says, "I'm only referencing part of the environment vector up to index 13". Then when the garbage collector traverses this fenced reference, it will only mark the indicated part of the environment vector: cells 0 to 13.
Anyway, about whether to implement both let and let*. I suspect that if Lisp were being "green field" designed from scratch today, many designers would likely want the sequentially binding version to be called let. The parallel construct would be the one available under the special name let*. The situations in which you actually need let to be parallel are fewer. For instance, let allows us to re-bind a pair of variable symbols such that their contents appear exchanged; but this is rarely something that comes up in application programming. In some programming language cultures, variable shadowing is frowned upon entirely; GNU C has a -Wshadow warning against it, for instance.
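The exchange case mentioned above looks like this in Common Lisp (the two function names are made up for the comparison):

```lisp
(defun swap-with-let (a b)
  (let ((a b)                             ; both init forms see the outer A and B
        (b a))
    (list a b)))

(defun swap-with-let* (a b)
  (let* ((a b)                            ; A is rebound first ...
         (b a))                           ; ... so this init form sees the new A
    (list a b)))
```

(swap-with-let 1 2) returns (2 1), whereas (swap-with-let* 1 2) returns (2 2).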
Note how in ANSI Common Lisp, which has let and let*, the optional parameters of a function behave sequentially, like let*, and this is the only binding strategy supported! So that is to say:
(lambda (required &optional opt1 (opt2 opt1)) ...)
Here the value of opt2 is defaulted from whatever the value of opt1 is at the time of the call. The initialization expression of opt2 has the opt1 parameter in scope.
Also, in the same Lisp dialect, the regular setf is sequential; if you want parallel assignment you must use psetf, which is the longer name of the two.
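Both points can be checked with a short sketch; default-demo and assignment-demo are names invented for it:

```lisp
(defun default-demo (required &optional (opt1 required) (opt2 opt1))
  ;; Each default form can refer to the parameters to its left.
  (list required opt1 opt2))

(defun assignment-demo ()
  (let ((a 1) (b 2)
        (c 1) (d 2))
    (setf a b
          b a)                            ; sequential: B gets the new value of A
    (psetf c d
           d c)                           ; parallel: both values are computed first
    (list a b c d)))
```

(default-demo 1) returns (1 1 1), (default-demo 1 2) returns (1 2 2), and (assignment-demo) returns (2 2 2 1).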
Common Lisp already shows evidence that design decisions more recent than let tend to favor sequential operation, and designate the parallel as the extraordinary variant.
Think of metaprogramming. If your default let will sequentially create nested scopes, you'll have to make sure that none of the initialiser expressions are referring to the names from the wrong scopes. You have such a guarantee with a regular let. Control over name scoping is very important when you're generating code.
Letrec is even worse: it introduces very complicated scope rules that cannot be easily reasoned about.
defmacro is documented at http://clhs.lisp.se/Body/m_defmac.htm but the documentation is not entirely clear on exactly when things happen. By experiment with Clisp, I have found the following (assuming all macros and functions defined at top level):
Straight top-level code can only call macros and functions that have been defined earlier.
Code within a macro or function, or generated by a macro, can call any function it likes, including one defined later (as expected from the need to support mutual recursion).
Code within a macro can only call a macro defined earlier than the calling site of the first macro.
Code generated by a macro can call a macro defined later.
Is it the case that Clisp is just following the specification, or is there any variation between implementations in this regard?
Is the exact intended set of rules, and the rationale behind them, documented anywhere?
You are asking about macro expansion - but I'd like to clarify how functions are handled first.
Pay attention to when the calls and the defines actually happen. In your second point you say code within a function can call a function that is defined later. This isn't strictly true.
In languages like C++ you declare and define functions and then compile your app. Ignoring inlining, templates, lambdas and other magic..., when compiling a function, the declarations of all other functions used by that function need to be present - and at link time, the compiled definitions need to be present - all before the program starts running. Once the program starts running, all functions are already fully prepared and ready to be called.
Now in Lisp, things are different. Ignore compilation for now - let's just think about an interpreted environment. If you run:
;; time 1
(defun a () (b))
;; time 2
(defun b () 123)
;; time 3
(a)
At time 1 your program has no functions.
The first defun then creates a function (lambda () (b)), and associates it with the symbol a. This function contains a reference to the symbol b, but at this point in time it is not calling b. a will only call b when a itself gets called.
So, at time 2 your program has one function, associated with the symbol a, but it has not been executed yet.
Now the second defun creates a function (lambda () 123), and associates it with the symbol b.
At time 3 your program has two functions, associated with the symbols a and b, but neither has been called yet.
Now you call a. During its execution, it looks for the function associated with the symbol b, finds that such a function already exists at this point in time, and calls it. b executes and returns 123.
Let's add more code:
;; time 4
(defun b () 456)
;; time 5
(a)
After time 4, a new defun creates a function returning 456, and associates it with the symbol b. This replaces the reference b was holding to the function returning 123, which will then be garbage collected (or whatever your implementation does to take out the trash).
Calling a (or more correctly, the lambda referenced by the function attribute of the symbol a), will now result in a call to a function that returns 456.
If, instead, we had originally written:
;; time 1
(defun a () (b))
;; time 2
(a)
;; time 3
(defun b () 123)
... this would not have worked, because after time 2 when we call a, it can't find a function associated with the symbol b and so it will fail.
Now - compile, eval-when, optimisation and other magic can do all kinds of funky things different from what I've described above, but make sure you first have a grasp of these basics before worrying about that more advanced stuff.
Functions are only created at the time that defun is called. (The interpreter doesn't "look ahead in the file".)
One of the attributes of a symbol is a reference to a function. (The function itself doesn't actually have a name.)
Multiple symbols can reference the same function, e.g. after (setf (symbol-function 'd) (symbol-function 'b)).
Defining a function a that calls function b (speaking colloquially), is OK as long as the symbol b has an associated function by the time a is called. (It is not required at the time of defunning a.)
A symbol can refer to different functions at different times. This affects any functions "calling" that symbol.
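The points above can be replayed as a short sketch. All three names are invented for the example, and the notinline declamation is my own precaution so that compiled code also looks the function up through the symbol at call time:

```lisp
(declaim (notinline callee-demo))

;; CALLEE-DEMO has no function yet; defining CALLER-DEMO is still fine.
(defun caller-demo ()
  (callee-demo))

;; Attach an anonymous function to the symbol CALLEE-DEMO.
(setf (symbol-function 'callee-demo) (lambda () 123))

;; A second symbol now references the very same function object.
(setf (symbol-function 'other-name-demo) (symbol-function 'callee-demo))

;; Point CALLEE-DEMO at a different function; OTHER-NAME-DEMO is unaffected.
(setf (symbol-function 'callee-demo) (lambda () 456))
```

After these forms have run in order, (caller-demo) returns 456 and (funcall 'other-name-demo) returns 123.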
The rules for macros are different (their expansions are static after "read" time), but many of the principles remain the same (Lisp doesn't "look ahead in the file" to find them). Understand that Lisp programs are far more dynamic and "run-time" than most (lesser ;-) ) languages you may be used to. Understand what happens when during execution of a Lisp program, and the rules governing macro expansion will start making sense.
From the CLHS
symbol-macrolet lexically establishes expansion functions for each of the symbol macros named by symbols.
...
The use of symbol-macrolet can be shadowed by let.
This allows the following code to work (inside *b* x is bound to '1'):
CT> (with-slots (x y z) *b*
(let ((x 10))
(format nil "~a ~a ~a" x y z)))
"10 2 3"
My question is: How does symbol-macrolet know which forms are allowed to shadow it? I ask as macros cannot guarantee that let has not been redefined or that the user has not created another form to do the same job as let. Is this a special 'dumb' case that just looks for the cl:let symbol? Or is there some more advanced technique going on?
I am happy to edit this question if it is too vague, I am having difficulty articulating the issue.
See 3.1.1.4 and the surrounding materials.
Where is that quote from? I don't think it's entirely correct, since let is not the only thing that can shadow the name established by macrolet in the lexical environment.
I don't think it does much harm to reveal that the lexical environment isn't just an abstract concept, there is an actual data structure that manifests it. The lexical environment is available to macros at compile time via the &environment binding mechanism. Macros can use that to get a window into the environment and there is a protocol for using the environment. So, to give a simple example, macros can be authored that are sensitive to declarations in the lexical environment, for example expanding one way if a variable is declared fixnum.
The implementation of the environment is left up to the implementers, but it is essentially just a stack of names with information about those names. So lambda bindings, macrolet, labels, let*, etc. merely push new names onto that stack and thus shadow the old names. And lexical declarations add to the information about the names.
The compiler or evaluator then uses the environment (stack) to guide how the resulting code or execution behaves. It's worth noting that this data structure need not survive into runtime, though often some descendant of it does to help the debugger.
So to answer your question: macrolet doesn't know anything about what forms in its &body might be doing.
As you can see SYMBOL-MACROLET is a built-in feature of Common Lisp. Just as LET. These special operators can't be redefined and it is not allowed to do so.
Common Lisp only has a fixed set of special operators and no way to define one by the user. There are only these ones defined: Common Lisp special operators.
Since macros will be expanded, they expand to the basic primitives: function calls and special forms. Thus the compiler/interpreter implements symbol-macrolet and this task is limited by the number of primitive forms. If the user implements his/her own LET, eventually this implementation boils also down to function calls and special forms - all uses of macros will be expanded to those, eventually. But those are known and there is nothing new for symbol-macrolet.
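A small sketch of the shadowing behaviour using symbol-macrolet directly instead of with-slots (shadow-demo and the list held in cell are just illustrative). It also shows that let is not the only form that can shadow a symbol macro:

```lisp
(defun shadow-demo ()
  (let ((cell (list 1 2)))
    (symbol-macrolet ((x (first cell))
                      (y (second cell)))
      (list (list x y)                    ; both symbols expand
            (let ((x 10))                 ; LET binds a variable X that shadows the macro
              (list x y))
            ((lambda (x) (list x y)) 20)  ; a lambda parameter shadows it too
            (first cell)))))              ; the underlying place was never touched
```

(shadow-demo) returns ((1 2) (10 2) (20 2) 1).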
I've been getting my feet wet with Emacs Lisp, and one thing that trips me up sometimes is the dynamic scope. Is there much of a future for it? Most languages I know use static scoping (or have moved to static scoping, like Python), and probably because I know it better I tend to prefer it. Are there specific applications/instances or examples where dynamic scope is more useful?
There's a good discussion of this issue here. The most useful part that pertains to your question is:
Dynamic bindings are great for
modifying the behaviour of subsystems.
Suppose you are using a function ‘foo’
that generates output using ‘print’.
But sometimes you would like to
capture the output in a buffer of your
choosing. With dynamic binding, it’s
easy:
(let ((b (generate-new-buffer-name " *string-output*")))
  (let ((standard-output b))
    (foo))
  (set-buffer b)
  ;; do stuff with the output of foo
  (kill-buffer b))
(And if you used this kind of thing a
lot, you’d encapsulate it in a macro –
but luckily it’s already been done as
‘with-output-to-temp-buffer’.)
This works because ‘foo’ uses the
dynamic binding of the name
‘standard-output’, so you can
substitute your own binding for that
name to modify the behaviour of ‘foo’
– and of all the functions that ‘foo’
calls.
In a language without dynamic binding,
you’d probably add an optional
argument to ‘foo’ to specify a buffer
and then ‘foo’ would pass that to any
calls to ‘print’. But if ‘foo’ calls
other functions which themselves call
‘print’ you’ll have to alter those
functions as well. And if ‘print’ had
another option, say ‘print-level’,
you’d have to add that as an optional
argument as well… Alternatively, you
could remember the old value of
‘standard-output’, substitute your new
value, call ‘foo’ and then restore the
old value. And remember to handle
non-local exits using ‘throw’. When
you’re through with this, you’ll see
that you’ve implemented dynamic
binding!
That said, lexical binding is IMHO much better for 99% of the cases. Note that modern Lisps are not dynamic-binding-only like Emacs Lisp.
Common Lisp supports both forms of binding, though the lexical one is used much more.
The Scheme specification doesn't even specify dynamic binding (only lexical one), though many implementations support both.
In addition, modern languages like Python and Ruby that were somewhat inspired by Lisp usually support lexical-binding in a straightforward way, with dynamic binding also available but less straightforward.
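As a rough illustration of the output-capturing idea from the quoted passage, here is a minimal Scheme sketch. Scheme has no dynamically scoped variables, but parameter objects (SRFI 39, also adopted by R7RS) give a controlled equivalent, and in R7RS current-output-port is itself a parameter. The code assumes an R7RS implementation; foo and capture-output are hypothetical names used only for illustration.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical stand-in for the `foo' of the quoted text: it writes to
;; whatever the current output port happens to be.
(define (foo)
  (display "hello from foo"))

;; Call THUNK with the current output port rebound to a string port,
;; then return everything that was written while it ran.
(define (capture-output thunk)
  (let ((port (open-output-string)))
    (parameterize ((current-output-port port))
      (thunk))
    (get-output-string port)))

(capture-output foo) ; => "hello from foo"
```

Neither foo nor anything it calls needs an extra port argument, and the previous output port is restored when the parameterize body exits, including on non-local exits.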
If you read the Emacs paper (written in 1981), there's a specific section "Language Features for Extensibility" that addresses this question. In Emacs, there's also the added scope of buffer-local (file local) variables.
I've quoted the most relevant portion below:
Formal Parameters Cannot Replace Dynamic Scope
Some language designers believe that dynamic binding should be avoided, and explicit argument passing should be used instead. Imagine that function A binds the variable FOO, and calls the function B, which calls the function C, and C uses the value of FOO. Supposedly A should pass the value as an argument to B, which should pass it as an argument to C.
This cannot be done in an extensible system, however, because the author of the system cannot know what all the parameters will be. Imagine that the functions A and C are part of a user extension, while B is part of the standard system. The variable FOO does not exist in the standard system; it is part of the extension. To use explicit argument passing would require adding a new argument to B, which means rewriting B and everything that calls B. In the most common case, B is the editor command dispatcher loop, which is called from an awful number of places.
What's worse, C must also be passed an additional argument. B doesn't refer to C by name (C did not exist when B was written). It probably finds a pointer to C in the command dispatch table. This means that the same call which sometimes calls C might equally well call any editor command definition. So all the editing commands must be rewritten to accept and ignore the additional argument. By now, none of the original system is left!
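The A, B, C scenario from the quoted section can be sketched in a few lines of Scheme. This is only an illustration under assumptions: it uses R7RS parameter objects as the dynamic-binding mechanism (Emacs itself uses dynamically scoped variables), and command-table and register-command! are hypothetical names, not anything from Emacs.

```scheme
(import (scheme base))

;; --- "Standard system": knows nothing about FOO or C ---------------
;; Hypothetical dispatch table mapping command names to procedures.
(define command-table '())

(define (register-command! name proc)
  (set! command-table (cons (cons name proc) command-table)))

;; B: the dispatcher. It looks the command up and calls it with no
;; arguments; it has no parameter through which FOO could be passed.
(define (b name)
  ((cdr (assq name command-table))))

;; --- "User extension": owns FOO, A and C ---------------------------
(define foo (make-parameter 'default))

;; C reads the current dynamic value of FOO.
(define (c) (foo))
(register-command! 'c c)

;; A binds FOO for the duration of its call to B.
(define (a)
  (parameterize ((foo 'set-by-a))
    (b 'c)))

(a)    ; => set-by-a
(b 'c) ; => default
```

B is unchanged by the extension, yet C sees the value A established, which is the point the paper makes about explicit argument passing.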
My problem isn't with the built-in eval procedure but with how to create a simple version of it. For starters, I would like to take in '(+ 1 2) and have it evaluate the + in the expression, even though quoting normally suppresses evaluation.
I have been thinking about this and found a couple things that might be useful:
Unquote: ,
(quasiquote)
(apply)
My main problem is regaining the value of + as a procedure and not a symbol. Once I get that I think I should just be able to use it with the other contents of the list.
Any tips or guidance would be much appreciated.
Firstly, if you're doing what you're doing, you can't go wrong reading at least the first chapter of the Metalinguistic Abstraction section of Structure and Interpretation of Computer Programs.
Now for a few suggestions from myself.
The usual thing to do with a symbol for a Scheme (or, indeed, any Lisp) interpreter is to look it up in some sort of "environment". If you're going to write your own eval, you will likely want to provide your own environment structures to go with it. The one thing for which you could fall back to the Scheme system you're building your eval on top of is the initial environment containing bindings for things like +, cons etc. As far as I know, this can't be achieved in a 100% portable way, because different Scheme systems provide different means of getting at the initial environment (including the-environment special form in MIT Scheme and interaction-environment in (Petite) Chez Scheme... and don't ask me why this is so), but the basic idea stays the same:
(define (my-eval form env)
  (cond ((self-evaluating? form) form)
        ((symbol? form)
         (if (my-kind-of-env? env)
             (my-lookup form env)
             ;; apparently we're dealing with an environment
             ;; from the underlying Scheme system, so fall back to that
             ;; (note we call PCS's built-in eval here)
             (eval form env)))
        ;; "applicative forms" follow
        ;; -- special forms, macro / function calls
        ...))
Note that you will certainly want to check whether the symbol names a special form (lambda and if are necessary, or you could use cond in place of if, but you're likely to want more and possibly allow for extensions to the basic set, i.e. macros). With the above skeleton eval, this would have to take place in what I called the "applicative form" handlers, but you could also handle this where you deal with symbols, or maybe put special form handlers first, followed by regular symbol lookup and function application.
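To make the last ordering concrete (special forms first, then symbol lookup, then function application), here is a minimal self-contained sketch. It is not the skeleton above filled in: it assumes an R7RS Scheme, represents environments as plain association lists, and copies a few host procedures into a hand-built initial environment instead of reaching into the host system's one. The names mini-eval, initial-env, lookup and extend-env are hypothetical.

```scheme
(import (scheme base) (scheme cxr))

;; Initial environment: an association list from symbols to values.
;; The host Scheme's procedures are stored directly, so looking up the
;; symbol + yields a real procedure.
(define initial-env
  (list (cons '+ +) (cons '- -) (cons '* *) (cons '< <)))

(define (lookup sym env)
  (let ((binding (assq sym env)))
    (if binding
        (cdr binding)
        (error "unbound variable" sym))))

;; New bindings are put in front, so they shadow older ones.
(define (extend-env params args env)
  (append (map cons params args) env))

(define (mini-eval form env)
  (cond ((or (number? form) (string? form) (boolean? form)) form)
        ((symbol? form) (lookup form env))
        ;; special forms come before ordinary application
        ((eq? (car form) 'quote) (cadr form))
        ((eq? (car form) 'if)
         (if (mini-eval (cadr form) env)
             (mini-eval (caddr form) env)
             (mini-eval (cadddr form) env)))
        ((eq? (car form) 'lambda)
         ;; simplification: fixed parameter list, single body expression
         (let ((params (cadr form))
               (body (caddr form)))
           (lambda args
             (mini-eval body (extend-env params args env)))))
        ;; application: evaluate operator and operands, then apply
        (else
         (apply (mini-eval (car form) env)
                (map (lambda (f) (mini-eval f env)) (cdr form))))))

(mini-eval '(+ 1 2) initial-env)                  ; => 3
(mini-eval '((lambda (x) (* x x)) 4) initial-env) ; => 16
```

Because user-defined lambdas are represented as host procedures, apply handles them and primitives such as + in the same way. A fuller evaluator would add define, multi-expression bodies, and error checking for malformed forms.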