How is set! defined in scheme? - lisp

How would you implement your own set! function in Scheme? A set! function is a destructive procedure that changes a value that is defined taking into account the previous value.

As noted in the comments, set! is a primitive in Scheme that must be provided by the implementation. Similarly, you can't implement the assignment operator, =, in most programming languages. In Common Lisp, setf can be extended (using setf-expanders) to allow (setf form value) to work on new kinds of forms.
Because Scheme's set! only modifies variable bindings (like Common Lisp's setq), it is still worth asking how can we implement functions like set-car! and other structural modifiers. This could be seen, in one sense, as a generalization of assignment to variables, but because lexical variables (along with closures) are sufficient to represent arbitrarily complex structures, it can also be seen as a more specialized case. In Scheme (aside from built in primitives like arrays), mutation of object fields is a specialization, because objects can be implemented by lexical closures, and implemented in terms of set!. This is a typical exercise given when showing how structures, e.g., cons cells, can be implemented using lexical closures alone. Here's an example that shows the implementation of single-value mutable cells::
(define (make-cell value)
(lambda (op)
(case op
((update)
(lambda (new-value)
(set! value new-value)))
((retrieve)
(lambda ()
value)))))
(define (set-value! cell new-value)
((cell 'update) new-value))
(define (get-value cell)
((cell 'retrieve)))
Given these definitions, we can create a cell, that starts with the value 4, update the value to 8 using our set-value!, and retrieve the new value:
(let ((c (make-cell 4)))
(set-value! c 8)
(get-value c))
=> 8

As has been mentioned, set! is a primitive and can't be implemented as a procedure. To really understand how it works under the hood, I suggest you take a look at the inner workings of a Lisp interpreter. Here's a great one to start: the metacircular evaluator in SICP, in particular the section titled "Assignments and definitions". Here's an excerpt of the parts relevant for the question:
(define (eval exp env)
(cond ...
((assignment? exp) (eval-assignment exp env))
...
(else (error "Unknown expression type -- EVAL" exp))))
(define (assignment? exp)
(tagged-list? exp 'set!))
(define (eval-assignment exp env)
(set-variable-value! (assignment-variable exp)
(eval (assignment-value exp) env)
env)
'ok)
(define (set-variable-value! var val env)
(define (env-loop env)
(define (scan vars vals)
(cond ((null? vars)
(env-loop (enclosing-environment env)))
((eq? var (car vars))
(set-car! vals val))
(else (scan (cdr vars) (cdr vals)))))
(if (eq? env the-empty-environment)
(error "Unbound variable -- SET!" var)
(let ((frame (first-frame env)))
(scan (frame-variables frame)
(frame-values frame)))))
(env-loop env))
In the end, a set! operation is just a mutation of a binding's value in the environment. Because a modification at this level is off-limits for a "normal" procedure, it has to be implemented as a special form.

Can't, can't, can't. Everyone is so negative! You can definitely do this in Racket. All you need to do is to define your own "lambda" macro, that introduces mutable cells in place of all arguments, and introduces identifier macros for all of those arguments, so that when they're used as regular varrefs they work correctly. And a set! macro that prevents the expansion of those identifier macros, so that they can be mutated.
Piece of cake!

set! modifies a binding between a symbol and a location (anyting really). Compilers and Interpreters would treat set! differently.
An interpreter would have an environment which is a mapping between symbols and values. Strictly set! changes the value of the first occurrence of the symbol to point to the result of the evaluation of the second operand. In many implementations you cannot set! something that is not already bound. In Scheme it is expected that the variable is already bound. Reading SICP or Lisp in small pieces and playing with examples would make you a master implementer of interpreters.
In a compilers situation you don't really need a symbol table. You can keep evaluated operands on a stack and set! need either change what the stack location pointed to or if it's a free variable in a closure you can use assignment conversion. E.g. it could be boxed to a box or a cons.
(define (gen-counter start delta)
(lambda ()
(let ((cur start))
(set! start (+ cur delta))
cur)))
Might be translated to:
(define (gen-counter start delta)
(let ((start (cons start '()))
(lambda ()
(let ((cur (car start)))
(set-car! start (+ cur delta))
cur)))))
You might want to read Control-Flow Analysis of Higher-Order Languages where this method is used together with lots of information on compiler technique.

If you are implementing a simple interpreter, then it is not all that hard. Your environment will map from identifiers to their values (or syntactic keywords to their transformers). The identifier->value mapping will need to account for possible value changes. As such:
(define (make-cell value)
`(CELL ,value))
(define cell-value cadr)
(define (cell-value-set! cell value)
(set-car! (cdr cell) value))
...
((set-stmt? e)
(let ((name (cadr e))
(value (interpret (caddr e) env)))
(let ((cell (env-lookup name)))
(assert (cell? cell))
(cell-value-set! cell value)
'return-value-for-set!)))
...
With two other changes being that when you bind an identifier to a value (like in a let or lambda application) you need to extend the environment with something like:
(env-extend name (cell value) env)
and also when retrieving a value you'll need to use cell-value.
Of course, only mutable identifiers need a cell, but for a simple interpreter allocating a cell for all identifier values is fine.

In scheme there are 2 "global" environments.
There is an environment where you store your highest defined symbols and there is another environment where are stored the primitive functions and the global variables that represent parameters of the system.
When you write something like (set! VAR VAL), the interpreter will look for the binding of VAR.
You cannot use set! on a variable that was not bound. The binding is done either by define or by lambda or by the system for the primitive operators.
Binding means allocating a location in some environment. So the binding function has the signature symbol -> address.
Coming back to set!, the interpreter will look in the environment where VAR is bound. First it looks in local environment (created by lambda), then it its local parent environment, and so on, then in global environment, then in system environment (in that order) until it finds an environment frame that contains a binding for VAR.

Related

What is atom in LISP?

I would like to have a clear understanding, what is 'Atom' in LISP?
Due to lispworks, 'atom - any object that is not a cons.'.
But this definition is not enough clear for me.
For example, in the code below:
(cadr
(caddar (cddddr L)))
Is 'L' an atom? On the one hand, L is not an atom, because it is cons, because it is the list (if we are talking about object, which is associated with the symbol L).
On the other hand, if we are talking about 'L' itself (not about its content, but about the symbol 'L'), it is an atom, because it is not a cons.
I've tried to call function 'atom',
(atom L) => NIL
(atom `L) => T
but still I have no clue... Please, help!
So the final question: in the code above, 'L' is an atom, or not?
P.S. I'm asking this question due to LISP course at my university, where we have a definition of 'simple expression' - it is an expression, which is atom or function call of one or two atomic parameters. Therefore I wonder if expression (cddddr L) is simple, which depends on whether 'L' is atomic parameter or not.
Your Lisp course's private definition of "simple expression" is almost certainly rooted purely in syntax. The idea of "atomic parameter" means that it's not a compound expression. It probably has nothing to do with the run-time value!
Thus, I'm guessing, these are simple expressions:
(+ 1 2)
42
"abc"
whereas these are not:
(+ 1 (* 3 4)) ;; (* 3 4) is not an atomic parameter
(+ a b c) ;; parameters atomic, but more than two
(foo) ;; not simple: fewer than one parameter, not "one or two"
In light of the last counterexample, it would probably behoove them to revise their definition.
On the one hand, L is not an atom, because it is cons, because it is the list (if we are talking about object, which is associated with the symbol L).
You are talking here about the meaning of the code being executed, its semantics. L here stands for a value, which is a list in your tests. At runtime you can inspect values and ask about their types.
On the other hand, if we are talking about 'L' itself (not about its content, but about the symbol 'L'), it is an atom, because it is not a cons.
Here you are looking at the types of the values that make up the syntax of your code, how it is being represented before even being evaluated (or compiled). You are manipulating a tree of symbols, one of them being L. In the source code, this is a symbol. It has no meaning by itself other than being a name.
Code is data
Lisp makes it easy to represent source code using values in the language itself, and easy to manipulate fragments of code at one point to build code that is executed later. This is often called homoiconicity, thought it is somewhat a touchy word because people don't always think the definition is precise enough to be useful. Another saying is "code is data", something that most language designers and programmers will agree to be true.
Lisp code can be built at runtime as follows (> is the prompt of the REPL, what follows is the result of evaluation):
> (list 'defun 'foo (list 'l) (list 'car 'l))
(DEFUN FOO (L) (CAR L))
The resulting form happens to be valid Common Lisp code, not just a generic list of values. If you evaluate it with (eval *), you will define a function named FOO that takes the first element of some list L.
NB. In Common Lisp the asterisk * is bound in the REPL to the last value being successfully returned.
Usually you don't build code like that, the Lisp reader turns a stream of characters into such a tree. For example:
> (read-from-string "(defun foo (l) (car l))")
(DEFUN FOO (L) (CAR L))
But the reader is called also implicitly in the REPL (that's the R in the acronym).
In such a tree of symbols, L is a symbol.
Evaluation model
When you call function FOO after it has been defined, you are evaluating the body of FOO in a context where L is bound to some value. And the rule for evaluating a symbol is to lookup the value it is bound to, and return that. This is the semantics of the code, which the runtime implements for you.
If you are using a simple interpreter, maybe the symbol L is present somewhere at runtime and its binding is looked up. Usually the code is not interpreted like that, it is possible to analyze it and transform it in an efficient way, during compilation. The result of compilation probably does not manipulate symbols anymore here, it just manipulates CPU registers and memory.
In all cases, asking for the type of L at this point is by definition of the semantics just asking the type for whatever value is bound to L in the context it appears.
An atom is anything that is not a cons cell
Really, the definition of atom is no more complex than that. A value in the language that is not a cons-cell is called an atom. This encompasses numbers, strings, everything.
Sometimes you evaluate a tree of symbols that happens to be code, but then the same rule applies.
Simple expressions
P.S. I'm asking this question due to LISP course at my university, where we have a definition of 'simple expression' - it is an expression, which is atom or function call of one or two atomic parameters. Therefore I wonder if expression (cddddr L) is simple, which depends on whether 'L' is atomic parameter or not.
In that course you are writing functions that analyze code. You are given a Lisp value and must decide if it is a simple expression or not.
You are not interested in any particular interpretation of the value being given, at no point you are going to traverse the value, see a a symbol and try to resolve it to a value: you are checking if the syntax is a valid simple expression or not.
Note also that the definitions in your course might be a bit different than the one from any particular exising flavor (this is not necessarily Common Lisp, or Scheme, but a toy LISP dialect). Follow in priority the definitions from your course.
Imagine we have a predicate which tells us if an object is not a number:
(not-number-p 3) -> NIL
(not-number-p "string") -> T
(let ((foo "another string))
(not-number-p foo)) -> T
(not-number '(1 2 3)) -> T
(not-number (first '(1 2 3)) -> NIL
We can define that as:
(defun not-number-p (object)
(not (numberp object))
Above is just the opposite of NUMBERP.
NUMBERP -> T if object is a number
NOT-NUMBER-P -> NIL if object is a number
Now imagine we have a predicate NOT-CONS-P, which tells us if an object is not a CONS cell.
(not-cons-p '(1 . 2)) -> NIL
(let ((c '(1 . 2)))
(not-cons-p c)) -> NIL
(not-cons-p 3) -> T
(let ((n 4))
(not-cons-p n)) -> T
(not-cons-p NIL) -> T
(not-cons-p 'NIL) -> T
(not-cons-p 'a-symbol) -> T
(not-cons-p #\space) -> T
The function NOT-CONS-P can be defined as:
(defun not-cons-p (object)
(if (consp object)
NIL
T))
Or shorter:
(defun not-cons-p (object)
(not (consp object))
The function NOT-CONS-P is traditionally called ATOM in Lisp.
In Common Lisp every object which is not a cons cell is called an atom. The function ATOM is a predicate.
See the Common Lisp HyperSpec: Function Atom
Your question:
(cadr (caddar (cddddr L)))
Is 'L' an atom?
How would we know that? L is a variable. What is the value of L?
(let ((L 10))
(atom L)) -> T
(let ((L (cons 1 2)))
(atom L) -> NIL
(atom l) answers this question:
-> is the value of L an atom
(atom l) does not answer this question:
-> is L an atom? L is a variable and in a function call the value of L is passed to the function ATOM.
If you want to ask if the symbol L is an atom, then you need to quote the symbol:
(atom 'L) -> T
(atom (quote L)) -> T
symbols are atoms. Actually everything is an atom, with the exception of cons cells.

What is the difference between define vs define-syntax and let vs let-syntax when value is syntax-rules?

I'm in a process of implementing Hygienic macros in my Scheme implementation, I've just implemented syntax-rules, but I have this code:
(define odd?
(syntax-rules ()
((_ x) (not (even? x)))))
what should be the difference between that and this:
(define-syntax odd?
(syntax-rules ()
((_ x) (not (even? x)))))
from what I understand syntax-rules just return syntax transformer, why you can't just use define to assign that to symbol? Why I need to use define-syntax? What extra stuff that expression do?
Should first also work in scheme? Or only the second one?
Also what is the difference between let vs let-syntax and letrec vs letrec-syntax. Should (define|let|letrec)-syntax just typecheck if the value is syntax transformer?
EDIT:
I have this implementation, still using lisp macros:
;; -----------------------------------------------------------------------------
(define-macro (let-syntax vars . body)
`(let ,vars
,#(map (lambda (rule)
`(typecheck "let-syntax" ,(car rule) "syntax"))
vars)
,#body))
;; -----------------------------------------------------------------------------
(define-macro (letrec-syntax vars . body)
`(letrec ,vars
,#(map (lambda (rule)
`(typecheck "letrec-syntax" ,(car rule) "syntax"))
vars)
,#body))
;; -----------------------------------------------------------------------------
(define-macro (define-syntax name expr)
(let ((expr-name (gensym)))
`(define ,name
(let ((,expr-name ,expr))
(typecheck "define-syntax" ,expr-name "syntax")
,expr-name))))
This this code correct?
Should this code works?
(let ((let (lambda (x) x)))
(let-syntax ((odd? (syntax-rules ()
((_ x) (not (even? x))))))
(odd? 11)))
This question seems to imply some deep confusion about macros.
Let's imagine a language where syntax-rules returns some syntax transformer function (I am not sure this has to be true in RnRS Scheme, it is true in Racket I think), and where let and let-syntax were the same.
So let's write this function:
(define (f v)
(let ([g v])
(g e (i 10)
(if (= i 0)
i
(e (- i 1))))))
Which we can turn into this, of course:
(define (f v n)
(v e (i n)
(if (<= i 0)
i
(e (- i 1)))))
And I will tell you in addition that there is no binding for e or i in the environment.
What is the interpreter meant to do with this definition? Could it compile it? Could it safely infer that i can't possibly make any sense since it is used as a function and then as a number? Can it safely do anything at all?
The answer is that no, it can't. Until it knows what the argument to the function is it can't do anything. And this means that each time f is called it has to make that decision again. In particular, v might be:
(syntax-rules ()
[(_ name (var init) form ...)
(letrec ([name (λ (var)
form ...)])
(name init))]))
Under which the definition of f does make some kind of sense.
And things get worse: much worse. How about this?
(define (f v1 v2 n)
(let ([v v1])
(v e (i n)
...
(set! v (if (eq? v v1) v2 v1))
...)))
What this means is that a system like this wouldn't know what the code it was meant to interpret meant until, the moment it was interpreting it, or even after that point, as you can see from the second function above.
So instead of this horror, Lisps do something sane: they divide the process of evaluating bits of code into phases where each phase happens, conceptually, before the next one.
Here's a sequence for some imagined Lisp (this is kind of close to what CL does, since most of my knowledge is of that, but it is not intended to represent any particular system):
there's a phase where the code is turned from some sequence of characters to some object, possibly with the assistance of user-defined code;
there's a phase where that object is rewritten into some other object by user- and system-defined code (macros) – the result of this phase is something which is expressed in terms of functions and some small number of primitive special things, traditionally called 'special forms' which are known to the processes of stage 3 and 4;
there may be a phase where the object from phase 2 is compiled, and that phase may involve another set of user-defined macros (compiler macros);
there is a phase where the resulting code is evaluated.
And for each unit of code these phases happen in order, each phase completes before the next one begins.
This means that each phase in which the user can intervene needs its own set of defining and binding forms: it needs to be possible to say that 'this thing controls what happens at phase 2' for instance.
That's what define-syntax, let-syntax &c do: they say that 'these bindings and definitions control what happens at phase 2'. You can't, for instance, use define or let to do that, because at phase 2, these operations don't yet have meaning: they gain meaning (possibly by themselves being macros which expand to some primitive thing) only at phase 3. At phase 2 they are just bits of syntax which the macro is ingesting and spitting out.

Does any Lisp allow mutually recursive macros?

In Common Lisp, a macro definition must have been seen before the first use. This allows a macro to refer to itself, but does not allow two macros to refer to each other. The restriction is slightly awkward, but understandable; it makes the macro system quite a bit easier to implement, and to understand how the implementation works.
Is there any Lisp family language in which two macros can refer to each other?
What is a macro?
A macro is just a function which is called on code rather than data.
E.g., when you write
(defmacro report (x)
(let ((var (gensym "REPORT-")))
`(let ((,var ,x))
(format t "~&~S=<~S>~%" ',x ,var)
,var)))
you are actually defining a function which looks something like
(defun macro-report (system::<macro-form> system::<env-arg>)
(declare (cons system::<macro-form>))
(declare (ignore system::<env-arg>))
(if (not (system::list-length-in-bounds-p system::<macro-form> 2 2 nil))
(system::macro-call-error system::<macro-form>)
(let* ((x (cadr system::<macro-form>)))
(block report
(let ((var (gensym "REPORT-")))
`(let ((,var ,x)) (format t "~&~s=<~s>~%" ',x ,var) ,var))))))
I.e., when you write, say,
(report (! 12))
lisp actually passes the form (! 12) as the 1st argument to macro-report which transforms it into:
(LET ((#:REPORT-2836 (! 12)))
(FORMAT T "~&~S=<~S>~%" '(! 12) #:REPORT-2836)
#:REPORT-2836)
and only then evaluates it to print (! 12)=<479001600> and return 479001600.
Recursion in macros
There is a difference whether a macro calls itself in implementation or in expansion.
E.g., a possible implementation of the macro and is:
(defmacro my-and (&rest args)
(cond ((null args) T)
((null (cdr args)) (car args))
(t
`(if ,(car args)
(my-and ,#(cdr args))
nil))))
Note that it may expand into itself:
(macroexpand '(my-and x y z))
==> (IF X (MY-AND Y Z) NIL) ; T
As you can see, the macroexpansion contains the macro being defined.
This is not a problem, e.g., (my-and 1 2 3) correctly evaluates to 3.
However, if we try to implement a macro using itself, e.g.,
(defmacro bad-macro (code)
(1+ (bad-macro code)))
you will get an error (a stack overflow or undefined function or ...) when you try to use it, depending on the implementation.
Here's why mutually recursive macros can't work in any useful way.
Consider what a system which wants to evaluate (or compile) Lisp code for a slightly simpler Lisp than CL (so I'm avoiding some of the subtleties that happen in CL), such as the definition of a function, needs to do. It has a very small number of things it knows how to do:
it knows how to call functions;
it knows how to evaluate a few sorts of literal objects;
it has some special rules for a few sorts of forms – what CL calls 'special forms', which (again in CL-speak) are forms whose car is a special operator;
finally it knows how to look to see whether forms correspond to functions which it can call to transform the code it is trying to evaluate or compile – some of these functions are predefined but additional ones can be defined.
So the way the evaluator works is by walking over the thing it needs to evaluate looking for these source-code-transforming things, aka macros (the last case), calling their functions and then recursing on the results until it ends up with code which has none left. What's left should consist only of instances of the first three cases, which it then knows how to deal with.
So now think about what the evaluator has to do if it is evaluating the definition of the function corresponding to a macro, called a. In Cl-speak it is evaluating or compiling a's macro function (which you can get at via (macro-function 'a) in CL). Let's assume that at some point there is a form (b ...) in this code, and that b is known also to correspond to a macro.
So at some point it comes to (b ...), and it knows that in order to do this it needs to call b's macro function. It binds suitable arguments and now it needs to evaluate the definition of the body of that function ...
... and when it does this it comes across an expression like (a ...). What should it do? It needs to call a's macro function, but it can't, because it doesn't yet know what it is, because it's in the middle of working that out: it could start trying to work it out again, but this is just a loop: it's not going to get anywhere where it hasn't already been.
Well, there's a horrible trick you could do to avoid this. The infinite regress above happens because the evaluator is trying to expand all of the macros ahead of time, and so there's no base to the recursion. But let's assume that the definition of a's macro function has code which looks like this:
(if <something>
(b ...)
<something not involving b>)
Rather than doing the expand-all-the-macros-first trick, what you could do is to expand only the macros you need, just before you need their results. And if <something> turned out always to be false, then you never need to expand (b ...), so you never get into this vicious loop: the recursion bottoms out.
But this means you must always expand macros on demand: you can never do it ahead of time, and because macros expand to source code you can never compile. In other words a strategy like this is not compatible with compilation. It also means that if <something> ever turns out to be true then you'll end up in the infinite regress again.
Note that this is completely different to macros which expand to code which involves the same macro, or another macro which expands into code which uses it. Here's a definition of a macro called et which does that (it doesn't need to do this of course, this is just to see it happen):
(defmacro et (&rest forms)
(if (null forms)
't
`(et1 ,(first forms) ,(rest forms))))
(defmacro et1 (form more)
(let ((rn (make-symbol "R")))
`(let ((,rn ,form))
(if ,rn
,rn
(et ,#more)))))
Now (et a b c) expands to (et1 a (b c)) which expands to (let ((#:r a)) (if #:r #:r (et b c))) (where all the uninterned things are the same thing) and so on until you get
(let ((#:r a))
(if #:r
#:r
(let ((#:r b))
(if #:r
#:r
(let ((#:r c))
(if #:r
#:r
t))))))
Where now not all the uninterned symbols are the same
And with a plausible macro for let (let is in fact a special operator in CL) this can get turned even further into
((lambda (#:r)
(if #:r
#:r
((lambda (#:r)
(if #:r
#:r
((lambda (#:r)
(if #:r
#:r
t))
c)))
b)))
a)
And this is an example of 'things the system knows how to deal with': all that's left here is variables, lambda, a primitive conditional and function calls.
One of the nice things about CL is that, although there is a lot of useful sugar, you can still poke around in the guts of things if you like. And in particular, you still see that macros are just functions that transform source code. The following does exactly what the defmacro versions do (not quite: defmacro does the necessary cleverness to make sure the macros are available early enough: I'd need to use eval-when to do that with the below):
(setf (macro-function 'et)
(lambda (expression environment)
(declare (ignore environment))
(let ((forms (rest expression)))
(if (null forms)
't
`(et1 ,(first forms) ,(rest forms))))))
(setf (macro-function 'et1)
(lambda (expression environment)
(declare (ignore environment))
(destructuring-bind (_ form more) expression
(declare (ignore _))
(let ((rn (make-symbol "R")))
`(let ((,rn ,form))
(if ,rn
,rn
(et ,#more)))))))
There have been historic Lisp systems that allow this, at least in interpreted code.
We can allow a macro to use itself for its own definition, or two or more macros to mutually use each other, if we follow an extremely late expansion strategy.
That is to say, our macro system expands a macro call just before it is evaluated (and does that each time that same expression is evaluated).
(Such a macro expansion strategy is good for interactive development with macros. If you fix a buggy macro, then all code depending on it automatically benefits from the change, without having to be re-processed in any way.)
Under such a macro system, suppose we have a conditional like this:
(if (condition)
(macro1 ...)
(macro2 ...))
When (condition) is evaluated, then if it yields true, (macro1 ...) is evaluated, otherwise (macro2 ...). But evaluation also means expansion. Thus only one of these two macros is expanded.
This is the key to why mutual references among macros can work: we are able rely on the conditional logic to give us not only conditional evaluation, but conditional expansion also, which then allows the recursion to have ways of terminating.
For example, suppose macro A's body of code is defined with the help of macro B, and vice versa. And when a particular invocation of A is executed, it happens to hit the particular case that requires B, and so that B call is expanded by invocation of macro B. B also hits the code case that depends on A, and so it recurses into A to obtain the needed expansion. But, this time, A is called in a way that avoids requiring, again, an expansion of B; it avoids evaluating any sub-expression containing the B macro. Thus, it calculates the expansion, and returns it to B, which then calculates its expansion returns to the outermost A. A finally expands and the recursion terminates; all is well.
What blocks macros from using each other is the unconditional expansion strategy: the strategy of fully expanding entire top-level forms after they are read, so that the definitions of functions and macros contain only expanded code. In that situation there is no possibility of conditional expansion that would allow for the recursion to terminate.
Note, by the way, that a macro system which expands late doesn't recursively expand macros in a macro expansion. Suppose (mac1 x y) expands into (if x (mac2 y) (mac3 y)). Well, that's all the expansion that is done for now: the if that pops out is not a macro, so expansion stops, and evaluation proceeds. If x yields true, then mac2 is expanded, and mac3 is not.

Custom self-quoting forms: Useful?

Lisps often declare, that certain types are self-evaluating. E.g. in emacs-lisp numbers, "strings", :keyword-symbols and some more evaluate to themselves.
Or, more specifically: Evaluating the form and evaluating the result again gives the same result.
It is also possible to create custom self-evaluating forms, e.g.
(defun my-list (&rest args)
(cons 'my-list (mapcar (lambda (e) (list 'quote e)) args)))
(my-list (+ 1 1) 'hello)
=> (my-list '2 'hello)
(eval (my-list (+ 1 1) 'hello))
=> (my-list '2 'hello)
Are there any practical uses for defining such forms or is this more of an esoteric concept?
I thought of creating "custom-types" as self-evaluating forms, where the evaluation may for instance perform type-checks on the arguments. When trying to use such types in my code, I usually found it inconvenient compared to simply working e.g. with plists though.
*edit* I checked again, and it seems I mixed up "self-evaluating" and "self-quoting". In emacs lisp the later term was applied to the lambda form, at least in contexts without lexical binding. Note that the lambda form does never evaluate to itself (eq), even if the result is equal.
(setq form '(lambda () 1)) ;; => (lambda () 1)
(equal form (eval form)) ;; => t
(equal (eval form) (eval (eval form))) ;; => t
(eq form (eval form)) ;; => nil
(eq (eval form) (eval (eval form))) ;; => nil
As Joshua put it in his answer: Fixed-points of the eval function (with respect to equal).
The code you presented doesn't define a type of self-evaluating form. A self evaluating form that eval would return when passed as an argument. Let's take a closer look. First, there's a function that takes some arguments and returns a new list:
(defun my-list (&rest args)
(cons 'my-list (mapcar (lambda (e) (list 'quote e)) args)))
The new list has the symbol my-list as the first elements. The remaining elements are two-element lists containing the symbol quote and the elements passed to the function:
(my-list (+ 1 1) 'hello)
;=> (my-list '2 'hello)
Now, this does give you a fixed point for eval with regard to equal, since
(eval (my-list (+ 1 1) 'hello))
;=> (my-list '2 'hello)
and
(eval (eval (my-list (+ 1 1) 'hello)))
;=> (my-list '2 'hello)
It's also the case that self-evaluating forms are fixed points with respect to equals, but in Common Lisp, a self-evaluating form is one that is a fixed point for eval with respect to eq (or perhaps eql).
The point of the language specifying self-evaluating forms is really to define what the evaluator has to do with forms. Conceptually eval would be defined something like this:
(defun self-evaluating-p (form)
(or (numberp form)
(stringp form)
(and (listp form)
(eql 2 (length form))
(eq 'quote (first form)))
; ...
))
(defun eval (form)
(cond
((self-evaluating-p form) form)
((symbolp form) (symbol-value-in-environment form))
;...
))
The point is not that a self-evaluating form is one that evaluates to an equivalent (for some equivalence relation) value, but rather one for which eval doesn't have to do any work.
Compiler Macros
While there's generally not a whole lot of use for forms that evaluate to themselves (modulo some equivalence) relation, there is one very important place where something very similar is used Common Lisp: compiler macros (emphasis added):
3.2.2.1 Compiler Macros
The function returned by compiler-macro-function is a function of two
arguments, called the expansion function. To expand a compiler macro,
the expansion function is invoked by calling the macroexpand hook with
the expansion function as its first argument, the entire compiler
macro form as its second argument, and the current compilation
environment (or with the current lexical environment, if the form is
being processed by something other than compile-file) as its third
argument. The macroexpand hook, in turn, calls the expansion function
with the form as its first argument and the environment as its second
argument. The return value from the expansion function, which is
passed through by the macroexpand hook, might either be the same form,
or else a form that can, at the discretion of the code doing the
expansion, be used in place of the original form.
Macro DEFINE-COMPILER-MACRO
Unlike an ordinary macro, a compiler macro can decline to provide an expansion merely by returning a form that is the same as the original
(which can be obtained by using &whole).
As an example:
(defun exponent (base power)
"Just like CL:EXPT, but with a longer name."
(expt base power))
(define-compiler-macro exponent (&whole form base power)
"A compiler macro that replaces `(exponent base 2)` forms
with a simple multiplication. Other invocations are left the same."
(if (eql power 2)
(let ((b (gensym (string '#:base-))))
`(let ((,b ,base))
(* ,b ,b)))
form))
Note that this isn't quite the same as a self-evaluating form, because the compiler is still going through the process of checking whether a form is a cons whose car has an associated compiler macro, and then calling that compiler macro function with the form. But it's similar in that the form goes to something and the case where the same form comes back is important.
What you describe and self-evaluating forms (not types!) is unrelated.
? (list (foo (+ 1 2)))
may evaluate to
-> (foo 3)
But that's running the function foo and it is returning some list with the symbol foo and its first argument value. Nothing more. You've written a function. But not a custom self evaluating form.
A form is some data meant to be evaluated. It needs to be valid Lisp code.
About Evaluation of Forms:
Evaluation of forms is a topic when you have source like this:
(defun foo ()
(list #(1 2 3)))
What's with the above vector? Does (foo) return a list with the vector as its first element?
In Common Lisp such vector forms are self-evaluating. In some other Lisps it was different. In some older Lisp dialect one probably had to write the code below to make the compiler happy. It might even be different with an interpreter. (I've seen this loooong ago in some implementation of a variant of Standard Lisp).
(defun foo ()
(list '#(1 2 3))) ; a vector form quoted
Note the quote. Non-self evaluating forms had to be quoted. That's relatively easy to do. You have to look at the source code and make sure that such forms are quoted. But there is another problem which makes it more difficult. Such data objects could have been introduced by macros in the code. Thus one also had to make sure that all code generated by macros has all literal data quoted. Which makes it a real pain.
This was wrong in some other Lisp dialect (not in Common Lisp):
(defmacro foo (a)
(list 'list a #(1 2 3)))
or even (note the added quote)
(defmacro foo (a)
(list 'list a '#(1 2 3)))
Using
(foo 1)
would be the code (list 1 #(1 2 3)). But in these Lisps there would be a quote missing... so it was wrong there.
One had to write:
(defmacro foo (a)
(list 'list a ''#(1 2 3))) ; note the double quote
Thus
(foo 1)
would be the code (list 1 '#(1 2 3)). Which then works.
To get rid of such problems, Lisp dialects like Common Lisp required that all forms other than symbols and conses are self evaluating. See the CL standard: Self-Evaluating Objects. This is also independent of using an interpreter or compiler.
Note that Common Lisp also provides no mechanism to change that.
What could be done with a custom mechanim? One could let data forms evaluate to something different. Or one could implement different evaluation schemes. But there is nothing like that in Common Lisp. Basically we've got symbols as variables, conses as special forms / functions / macros and the rest is self-evaluating. For anything different you would need to write a custom evaluator/compiler.

SICP: Can or be defined in lisp as a syntactic transformation without gensym?

I am trying to solve the last part of question 4.4 of the Structure and Interpretation of computer programming; the task is to implement or as a syntactic transformation. Only elementary syntactic forms are defined; quote, if, begin, cond, define, apply and lambda.
(or a b ... c) is equal to the first true value or false if no value is true.
The way I want to approach it is to transform for example (or a b c) into
(if a a (if b b (if c c false)))
the problem with this is that a, b, and c would be evaluated twice, which could give incorrect results if any of them had side-effects. So I want something like a let
(let ((syma a))
(if syma syma (let ((symb b))
(if symb symb (let ((symc c))
(if (symc symc false)) )) )) )
and this in turn could be implemented via lambda as in Exercise 4.6. The problem now is determining symbols syma, symb and symc; if for example the expression b contains a reference to the variable syma, then the let will destroy the binding. Thus we must have that syma is a symbol not in b or c.
Now we hit a snag; the only way I can see out of this hole is to have symbols that cannot have been in any expression passed to eval. (This includes symbols that might have been passed in by other syntactic transformations).
However because I don't have direct access to the environment at the expression I'm not sure if there is any reasonable way of producing such symbols; I think Common Lisp has the function gensym for this purpose (which would mean sticking state in the metacircular interpreter, endangering any concurrent use).
Am I missing something? Is there a way to implement or without using gensym? I know that Scheme has it's own hygenic macro system, but I haven't grokked how it works and I'm not sure whether it's got a gensym underneath.
I think what you might want to do here is to transform to a syntactic expansion where the evaluation of the various forms aren't nested. You could do this, e.g., by wrapping each form as a lambda function and then the approach that you're using is fine. E.g., you can do turn something like
(or a b c)
into
(let ((l1 (lambda () a))
(l2 (lambda () b))
(l3 (lambda () c)))
(let ((v1 (l1)))
(if v1 v1
(let ((v2 (l2)))
(if v2 v2
(let ((v3 (l3)))
(if v3 v3
false)))))))
(Actually, the evaluation of the lambda function calls are still nested in the ifs and lets, but the definition of the lambda functions are in a location such that calling them in the nested ifs and lets doesn't cause any difficulty with captured bindings.) This doesn't address the issue of how you get the variables l1–l3 and v1–v3, but that doesn't matter so much, none of them are in scope for the bodies of the lambda functions, so you don't need to worry about whether they appear in the body or not. In fact, you can use the same variable for all the results:
(let ((l1 (lambda () a))
(l2 (lambda () b))
(l3 (lambda () c)))
(let ((v (l1)))
(if v v
(let ((v (l2)))
(if v v
(let ((v (l3)))
(if v v
false)))))))
At this point, you're really just doing loop unrolling of a more general form like:
(define (functional-or . functions)
(if (null? functions)
false
(let ((v ((first functions))))
(if v v
(functional-or (rest functions))))))
and the expansion of (or a b c) is simply
(functional-or (lambda () a) (lambda () b) (lambda () c))
This approach is also used in an answer to Why (apply and '(1 2 3)) doesn't work while (and 1 2 3) works in R5RS?. And none of this required any GENSYMing!
In SICP you are given two ways of implementing or. One that handles them as special forms which is trivial and one as derived expressions. I'm unsure if they actually thought you would see this as a problem, but you can do it by implementing gensym or altering variable? and how you make derived variables like this:
;; a unique tag to identify special variables
(define id (vector 'id))
;; a way to make such variable
(define (make-var x)
(list id x))
;; redefine variable? to handle macro-variables
(define (variable? exp)
(or (symbol? exp)
(tagged-list? exp id)))
;; makes combinations so that you don't evaluate
;; every part twice in case of side effects (set!)
(define (or->combination terms)
(if (null? terms)
'false
(let ((tmp (make-var 'tmp)))
(list (make-lambda (list tmp)
(list (make-if tmp
tmp
(or->combination (cdr terms)))))
(car terms)))))
;; My original version
;; This might not be good since it uses backquotes not introduced
;; until chapter 5 and uses features from exercise 4.6
;; Though, might be easier to read for some so I'll leave it.
(define (or->combination terms)
(if (null? terms)
'false
(let ((tmp (make-var 'tmp)))
`(let ((,tmp ,(car terms)))
(if ,tmp
,tmp
,(or->combination (cdr terms)))))))
How it works is that make-var creates a new list every time it is called, even with the same argument. Since it has id as it's first element variable? will identify it as a variable. Since it's a list it will only match in variable lookup with eq? if it is the same list, so several nested or->combination tmp-vars will all be seen as different by lookup-variable-value since (eq? (list) (list)) => #f and special variables being lists they will never shadow any symbol in code.
This is influenced by eiod, by Al Petrofsky, which implements syntax-rules in a similar manner. Unless you look at others implementations as spoilers you should give it a read.