Correct me if I'm wrong, but there is nothing like gensym in Java, C, C++, Python, Javascript, or any of the other languages I've used, and I've never seemed to need it. Why is it necessary in Lisp and not in other langauges? For clarification, I'm learning Common Lisp.
Common Lisp has a powerful macro system. You can make new syntax patterns that behave exactly the way you want them to behave. It's even expressed in its own language, making everything in the language available to transform the code from what you want to write to something that CL actually understands. All languages with powerful macro systems provide gensym or do it implicitly in their macro implementation.
In Common Lisp you use gensym when you want to make code where the symbol shouldn't match elements used any other places in the result. Without it there is no guarantee that a user uses a symbol that the macro implementer also use and they start to interfere and the result is something different than the intended behavior. It makes sure nested expansions of the same macro don't interfere with previous expansions. With the Common Lisp macro system it's possible to make more restrictive macro systems similar to Scheme syntax-rules and syntax-case.
In Scheme there are several macro systems. One with pattern matching where new introduced symbols act automatically as if they are made with gensym. syntax-case will also by default make new symbols as if they were made with gensym and there is also a way to reduce hygiene. You can make CL defmacro with syntax-case but since Scheme doesn't have gensym you wouldn't be able to make hygienic macros with it.
Java, C, C++, Python, Javascript are all Algol dialects and none of them have other than simple template based macros. Thus they don't have gensym because they don't need it. Since the only way to introduce new syntax in these languages is to wish next version of it will provide it.
There are two Algol dialects with powerful macros that come to mind. Nemerle and Perl6. Both of them have hygienic approach, meaning variables introduced behave as if they are made with gensym.
In CL, Scheme, Nemerle, Perl6 you don't need to wait for language features. You can make them yourself! The news in both Java and PHP are easily implemented with macros in any of them should it not already be available.
Can't say which languages have an equivalent of GENSYM. Many languages don't have a first-class symbol data type (with interned and uninterned symbols) and many are not providing similar code generation (macros, ...) facilities.
An interned symbol is registered in a package. An uninterned is not. If the reader (the reader is the Lisp subsystem which takes textual s-expressions as input and returns data) sees two interned symbols in the same package and with the same name, it assumes that it is the same symbol:
CL-USER 35 > (eq 'cl:list 'cl:list)
T
If the reader sees an uninterned symbol, it creates a new one:
CL-USER 36 > (eq '#:list '#:list)
NIL
Uninterned symbols are written with #: in front of the name.
GENSYM is used in Lisp to create numbered uninterned symbols, because it is sometimes useful in code generation and then debugging this code. Note that the symbols are always new and not eq to anything else. But the symbol name could be the same as the name of another symbol. The number gives a clue to the human reader about the identity.
An example using MAKE-SYMBOL
make-symbol creates a new uninterned symbol using a string argument as its name.
Let's see this function generating some code:
CL-USER 31 > (defun make-tagbody (exp test)
(let ((start-symbol (make-symbol "start"))
(exit-symbol (make-symbol "exit")))
`(tagbody ,start-symbol
,exp
(if ,test
(go ,start-symbol)
(go ,exit-symbol))
,exit-symbol)))
MAKE-TAGBODY
CL-USER 32 > (pprint (make-tagbody '(incf i) '(< i 10)))
(TAGBODY
#:|start| (INCF I)
(IF (< I 10) (GO #:|start|) (GO #:|exit|))
#:|exit|)
Above generated code uses uninterned symbols. Both #:|start| are actually the same symbol. We would see this if we would have *print-circle* to T, since the printer then would clearly label identical objects. But here we don't get this added information. Now if you nest this code, then you would see more than the one start and one exit symbol, each which was used in two places.
An example using GENSYM
Now let's use gensym. Gensym also creates an uninterned symbol. Optionally this symbol is named by a string. A number (see the variable CL:*GENSYM-COUNTER*) is added.
CL-USER 33 > (defun make-tagbody (exp test)
(let ((start-symbol (gensym "start"))
(exit-symbol (gensym "exit")))
`(tagbody ,start-symbol
,exp
(if ,test
(go ,start-symbol)
(go ,exit-symbol))
,exit-symbol)))
MAKE-TAGBODY
CL-USER 34 > (pprint (make-tagbody '(incf i) '(< i 10)))
(TAGBODY
#:|start213051| (INCF I)
(IF (< I 10) (GO #:|start213051|) (GO #:|exit213052|))
#:|exit213052|)
Now the number is an indicator that the two uninterned #:|start213051| symbols are actually the same. When the code would be nested, the new version of the start symbol would have a different number:
CL-USER 7 > (pprint (make-tagbody `(progn
(incf i)
(setf j 0)
,(make-tagbody '(incf ij) '(< j 10)))
'(< i 10)))
(TAGBODY
#:|start2756| (PROGN
(INCF I)
(SETF J 0)
(TAGBODY
#:|start2754| (INCF IJ)
(IF (< J 10)
(GO #:|start2754|)
(GO #:|exit2755|))
#:|exit2755|))
(IF (< I 10) (GO #:|start2756|) (GO #:|exit2757|))
#:|exit2757|)
Thus it helps understanding generated code, without the need to turn *print-circle* on, which would label the identical objects:
CL-USER 8 > (let ((*print-circle* t))
(pprint (make-tagbody `(progn
(incf i)
(setf j 0)
,(make-tagbody '(incf ij) '(< j 10)))
'(< i 10))))
(TAGBODY
#3=#:|start1303| (PROGN
(INCF I)
(SETF J 0)
(TAGBODY
#1=#:|start1301| (INCF IJ)
(IF (< J 10) (GO #1#) (GO #2=#:|exit1302|))
#2#))
(IF (< I 10) (GO #3#) (GO #4=#:|exit1304|))
#4#)
Above is readable for the Lisp reader (the subsystem which reads s-expressions for textual representations), but a bit less for the human reader.
I believe that symbols (in the Lisp sense) are mostly useful in homoiconic languages (those in which the syntax of the language is representable as a data of that language).
Java, C, C++, Python, Javascript are not homoiconic.
Once you have symbols, you want some way to dynamically create them. gensym is a possibility, but you can also intern them.
BTW, MELT is a lisp-like dialect, it does not create symbols with gensym or by interning strings but with clone_symbol. (actually MELT symbols are instances of predefined CLASS_SYMBOL, ...).
gensym is available as a predicate in most of Prolog interpreters. You can find it in the eponym library.
Related
A book [1] that I am reading says this:
One of the most interesting developments in programming languages has
been the creation of extensible languages—languages whose syntax and
semantics can be changed within a program. One of the earliest and
most commonly proposed schemes for language extension is the macro
definition.
Would you give an example (along with an explanation) of a Lisp macro that extends the syntax and semantics of the Lisp programming language, please?
[1] The Theory of Parsing, Translation, and Compiling, Volume 1 Parsing by Aho and Ullman, page 58.
Picture the scene: it is 1958, and FORTRAN has just been invented. Lit only by the afterglow of atomic tests, primitive Lisp programmers are writing loops in primitive Lisp the way primitive FORTRAN programmers always had:
(prog ((i 0)) ;i is 0
start ;label beginning of loop
(if (>= i 10)
(go end)) ;skip to end when finished
(do-hard-sums-on i) ;hard sums!
(setf i (+ i 1)) ;increment i
(go start) ;jump to start
end) ;end
(Except, of course this would all be in CAPITALS because lower-case had not been invented then, and the thing I wrote as setf would be something uglier, because setf (a macro!) also had not been invented then).
Enter, in clouds of only-slightly toxic smoke from their jetpack, another Lisp programmer who had escaped to 1958 from the future. 'Look', they said, 'we could write this strange future thing':
(defmacro sloop ((var init limit &optional (step 1)) &body forms)
(let ((<start> (make-symbol "START")) ;avoid hygiene problems ...
(<end> (make-symbol "END"))
(<limit> (make-symbol "LIMIT")) ;... and multiple evaluation problems
(<step> (make-symbol "STEP")))
`(prog ((,var ,init)
(,<limit> ,limit)
(,<step> ,step))
,<start>
(if (>= ,var ,<limit>)
(go ,<end>))
,#forms
(setf ,var (+ ,var ,<step>))
(go ,<start>)
,<end>)))
'And now', they say, 'you can write this':
(sloop (i 0 10)
(do-hard-sums i))
And thus were simple loops invented.
Back here in the future we can see what this loop expands into:
(sloop (i 0 10)
(format t "~&i = ~D~%" i))
->
(prog ((i 0) (#:limit 10) (#:step 1))
#:start (if (>= i #:limit) (go #:end))
(format t "~&i = ~D~%" i)
(setf i (+ i #:step))
(go #:start)
#:end)
Which is the code that the primitive Lisp programmers used to type in by hand. And we can run this:
> (sloop (i 0 10)
(format t "~&i = ~D~%" i))
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
nil
And in fact this is how loops work in Lisp today. If I try a simple do loop, one of Common Lisp's predefined macros, we can see what it expands into:
(do ((i 0 (+ i 1)))
((>= i 10))
(format t "~&i = ~D~%" i))
->
(block nil
(let ((i 0))
(declare (ignorable i))
(declare)
(tagbody
#:g1481 (if (>= i 10) (go #:g1480))
(tagbody (format t "~&i = ~D~%" i)
(setq i (+ i 1)))
(go #:g1481)
#:g1480)))
Well, this expansion is not the same and it uses constructs I haven't talked about, but you can see the important thing: this loop has been rewritten to use GO. And, although Common Lisp does not define the expansion of it's looping macros, it is almost certainly the case that all the standard ones expand into something like this (but more complicated in general).
In other words: Lisp doesn't have any primitive looping constructs, but all such constructs are added to the language by macros. These macros as well as other macros to extend the language in other ways, can be written by users: they do not have to be provided by the language itself.
Lisp is a programmable programming language.
Well, perhaps the explanations will be terse, but you could look at the macros used in the lisp language itself, defun, for example.
http://clhs.lisp.se/Body/m_defun.htm
In lisp, macros are a big part of the language itself, basically allowing you to rewrite code before it's compiled.
There's more than just macros defined by defmacro. There are also Reader Macros ! as Paul Graham said in On Lisp :
The three big moments in a Lisp expression’s life are read-time,
compile-time, and runtime. Functions are in control at runtime. Macros
give us a chance to perform transformations on programs at
compile-time. …read-macros… do their work at read-time.
Macros and read-macros see your program at different stages. Macros
get hold of the program when it has already been parsed into Lisp
objects by the reader, and read-macros operate on a program while it
is still text. However, by invoking read on this text, a read-macro
can, if it chooses, get parsed Lisp objects as well. Thus read-macros
are at least as powerful as ordinary macros.
With Reader Macros, you can define new semantics way beyond ordinary macros, like for example :
adds support for string interpolation ( cl-interpol )
adds support for JSON directly into the language : See this article to know more.
This macro to implement a C-like for-loop in Lisp is mentioned on this page: https://softwareengineering.stackexchange.com/questions/124930/how-useful-are-lisp-macros
(defmacro for-loop [[sym init check change :as params] & steps]
`(loop [~sym ~init value# nil]
(if ~check
(let [new-value# (do ~#steps)]
(recur ~change new-value#))
value#)))
So than one can use following in code:
(for-loop [i 0 , (< i 10) , (inc i)]
(println i))
How can I convert this macro to be used in Racket language?
I am trying following code:
(define-syntax (for-loop) (syntax-rules (parameterize ((sym) (init) (check) (change)) & steps)
`(loop [~sym ~init value# nil]
(if ~check
(let [new-value# (do ~#steps)]
(recur ~change new-value#))
value#))))
But it give "bad syntax" error.
The snippet of code you have included in your question is written in Clojure, which is one of the many dialects of Lisp. Racket, on the other hand, is descended from Scheme, which is quite a different language from Clojure! Both have macros, yes, but the syntax is going to be a bit different between the two languages.
The Racket macro system is quite powerful, but syntax-rules is actually a slightly simpler way to define macros. Fortunately, for this macro, syntax-rules will suffice. A more or less direct translation of the Clojure macro to Racket would look like this:
(define-syntax-rule (for-loop [sym init check change] steps ...)
(let loop ([sym init]
[value #f])
(if check
(let ([new-value (let () steps ...)])
(loop change new-value))
value)))
It could subsequently be invoked like this:
(for-loop [i 0 (< i 10) (add1 i)]
(println i))
There are a number of changes from the Clojure code:
The Clojure example uses ` and ~ (pronounced “quasiquote” and “unquote” respectively) to “interpolate” values into the template. The syntax-rules form performs this substitution automatically, so there is no need to explicitly perform quotation.
The Clojure example uses names that end in a hash (value# and new-value#) to prevent name conflicts, but Racket’s macro system is hygienic, so that sort of escaping is entirely unnecessary—identifiers bound within macros automatically live in their own scope by default.
The Clojure code uses loop and recur, but Racket supports tail recursion, so the translation just uses “named let”, which is really just some extremely simple sugar for an immediately invoked lambda that calls itself.
There are a few other minor syntactic differences, such as using let instead of do, using ellipses instead of & steps to mark multiple occurrences, the syntax of let, and the use of #f instead of nil to represent the absence of a value.
Finally, commas are not used in the actual use of the for-loop macro because , means something different in Racket. In Clojure, it is treated as whitespace, so it’s totally optional there, too, but in Racket, it would be a syntax error.
A full macro tutorial is well outside the scope of a single Stack Overflow post, though, so if you’re interested in learning more, take a look at the Macros section of the Racket guide.
It’s also worth noting that an ordinary programmer would not need to implement this sort of macro themselves, given that Racket already provides a set of very robust for loops and comprehensions built into the language. In truth, though, they are just defined as macros themselves—there is no special magic just because they are builtins.
Racket’s for loops do not look like traditional C-style for loops, however, because C-style for loops are extremely imperative. On the other hand, Scheme, and therefore Racket, tends to favor a functional style, which avoids mutation and often looks more declarative. Therefore, Racket’s loops attempt to describe higher-level iteration patterns, such as looping through a range of numbers or iterating through a list, rather than low-level semantics like describing how a value should be updated. Of course, if you really want something like that, Racket provides the do loop, which is almost identical to the for-loop macro defined above, albeit with some minor differences.
I want to expand on Alexis's excellent answer a bit. Here's an example usage that demonstrates what she means by do being almost identical to your for-loop:
(do ([i 0 (add1 i)])
((>= i 10) i)
(println i))
This do expression actually expands to the following code:
(let loop ([i 0])
(if (>= i 10)
i
(let ()
(println i)
(loop (add1 i)))))
The above version uses a named let, which is considered the conventional way to write loops in Scheme.
Racket also provides for comprehensions, also mentioned in Alexis's answer, which are also considered conventional, and here's how it'd look like:
(for ([i (in-range 10)])
(println i))
(except that this doesn't actually return the final value of i).
I want to rewrite on Alexis's excellent answer and Chris Jester-Young's excellent answer for people not familiar with let.
#lang racket
(define-syntax-rule (for-loop [var init check change] expr ...)
(local [(define (loop var value)
(if check
(loop change (begin expr ...))
value))]
(loop init #f)))
(for-loop [i 0 (< i 10) (add1 i)]
(println i))
I've written an ad hoc parser generator that creates code to convert an old and little known 7-bit character set into unicode. The call to the parser generator expands into a bunch of defuns enclosed in a progn, which then get compiled. I only want to expose one of the generated defuns--the top-level one--to the rest of the system; all the others are internal to the parser and only get called from within the dynamic scope of the top-level one. Therefore, the other defuns generated have uninterned names (created with gensym). This strategy works fine with SBCL, but I recently tested it for the first time with CLISP, and I get errors like:
*** - FUNCALL: undefined function #:G16985
It seems that CLISP can't handle functions with uninterned names. (Interestingly enough, the system compiled without a problem.) EDIT: It seems that it can handle functions with uninterned names in most cases. See the answer by Rörd below.
My questions is: Is this a problem with CLISP, or is it a limitation of Common Lisp that certain implementations (e.g. SBCL) happen to overcome?
EDIT:
For example, the macro expansion of the top-level generated function (called parse) has an expression like this:
(PRINC (#:G75735 #:G75731 #:G75733 #:G75734) #:G75732)
Evaluating this expression (by calling parse) causes an error like the one above, even though the function is definitely defined within the very same macro expansion:
(DEFUN #:G75735 (#:G75742 #:G75743 #:G75744) (DECLARE (OPTIMIZE (DEBUG 2)))
(DECLARE (LEXER #:G75742) (CONS #:G75743 #:G75744))
(MULTIPLE-VALUE-BIND (#:G75745 #:G75746) (POP-TOKEN #:G75742)
...
The two instances of #:G75735 are definitely the same symbol--not two different symbols with the same name. As I said, this works with SBCL, but not with CLISP.
EDIT:
SO user Joshua Taylor has pointed out that this is due to a long standing CLISP bug.
You don't show one of the lines that give you the error, so I can only guess, but the only thing that could cause this problem as far as I can see is that you are referring to the name of the symbol instead of the symbol itself when trying to call it.
If you were referring to the symbol itself, all your lisp implementation would have to do is lookup that symbol's symbol-function. Whether it's interned or not couldn't possibly matter.
May I ask why you haven't considered another way to hide the functions, i.e. a labels statement or defining the functions within a new package that exports only the one external function?
EDIT: The following example is copied literally from an interaction with the CLISP prompt.
As you can see, calling the function named by a gensym is working as expected.
[1]> (defmacro test ()
(let ((name (gensym)))
`(progn
(defun ,name () (format t "Hello!"))
(,name))))
TEST
[2]> (test)
Hello!
NIL
Maybe your code that's trying to call the function gets evaluated before the defun? If there's any code in the macro expansion besides the various defuns, it may be implementation-dependent what gets evaluated first, and so the behaviour of SBCL and CLISP may differ without any of them violating the standard.
EDIT 2: Some further investigation shows that CLISP's behaviour varies depending upon whether the code is interpreted directly or whether it's first compiled and then interpreted. You can see the difference by either directly loading a Lisp file in CLISP or by first calling compile-file on it and then loading the FASL.
You can see what's going on by looking at the first restart that CLISP offers. It says something like "Input a value to be used instead of (FDEFINITION '#:G3219)." So for compiled code, CLISP quotes the symbol and refers to it by name.
It seems though that this behaviour is standard-conforming. The following definition can be found in the HyperSpec:
function designator n. a designator for a function; that is, an object that denotes a function and that is one of: a symbol (denoting the function named by that symbol in the global environment), or a function (denoting itself). The consequences are undefined if a symbol is used as a function designator but it does not have a global definition as a function, or it has a global definition as a macro or a special form. See also extended function designator.
I think an uninterned symbol matches the "a symbol is used as a function designator but it does not have a global definition as a function" case for unspecified consequences.
EDIT 3: (I can agree that I'm not sure whether CLISP's behaviour is a bug or not. Someone more experienced with details of the standard's terminology should judge this. It comes down to whether the function cell of an uninterned symbol - i.e. a symbol that cannot be referred to by name, only by having a direct hold on the symbol object - would be considered a "global definition" or not)
Anyway, here's an example solution that solves the problem in CLISP by interning the symbols in a throwaway package, avoiding the matter of uninterned symbols:
(defmacro test ()
(let* ((pkg (make-package (gensym)))
(name (intern (symbol-name (gensym)) pkg)))
`(progn
(defun ,name () (format t "Hello!"))
(,name))))
(test)
EDIT 4: As Joshua Taylor notes in a comment to the question, this seems to be a case of the (10 year old) CLISP bug #180.
I've tested both workarounds suggested in that bug report and found that replacing the progn with locally actually doesn't help, but replacing it with let () does.
You can most certainly define functions whose names are uninterned symbols. For instance:
CL-USER> (defun #:foo (x)
(list x))
#:FOO
CL-USER> (defparameter *name-of-function* *)
*NAME-OF-FUNCTION*
CL-USER> *name-of-function*
#:FOO
CL-USER> (funcall *name-of-function* 3)
(3)
However, the sharpsign colon syntax introduces a new symbol each time such a form is read read:
#: introduces an uninterned symbol whose name is symbol-name. Every time this syntax is encountered, a distinct uninterned symbol is created. The symbol-name must have the syntax of a symbol with no package prefix.
This means that even though something like
CL-USER> (list '#:foo '#:foo)
;=> (#:FOO #:FOO)
shows the same printed representation, you actually have two different symbols, as the following demonstrates:
CL-USER> (eq '#:foo '#:foo)
NIL
This means that if you try to call such a function by typing #: and then the name of the symbol naming the function, you're going to have trouble:
CL-USER> (#:foo 3)
; undefined function #:foo error
So, while you can call the function using something like the first example I gave, you can't do this last one. This can be kind of confusing, because the printed representation makes it look like this is what's happening. For instance, you could write such a factorial function like this:
(defun #1=#:fact (n &optional (acc 1))
(if (zerop n) acc
(#1# (1- n) (* acc n))))
using the special reader notation #1=#:fact and #1# to later refer to the same symbol. However, look what happens when you print that same form:
CL-USER> (pprint '(defun #1=#:fact (n &optional (acc 1))
(if (zerop n) acc
(#1# (1- n) (* acc n)))))
(DEFUN #:FACT (N &OPTIONAL (ACC 1))
(IF (ZEROP N)
ACC
(#:FACT (1- N) (* ACC N))))
If you take that printed output, and try to copy and paste it as a definition, the reader creates two symbols named "FACT" when it comes to the two occurrences of #:FACT, and the function won't work (and you might even get undefined function warnings):
CL-USER> (DEFUN #:FACT (N &OPTIONAL (ACC 1))
(IF (ZEROP N)
ACC
(#:FACT (1- N) (* ACC N))))
; in: DEFUN #:FACT
; (#:FACT (1- N) (* ACC N))
;
; caught STYLE-WARNING:
; undefined function: #:FACT
;
; compilation unit finished
; Undefined function:
; #:FACT
; caught 1 STYLE-WARNING condition
I hope I get the issue right. For me it works in CLISP.
I tried it like this: using a macro for creating a function with a GENSYM-ed name.
(defmacro test ()
(let ((name (gensym)))
`(progn
(defun ,name (x) (* x x))
',name)))
Now I can get the name (setf x (test)) and call it (funcall x 2).
Yes, it is perfectly fine defining functions that have names that are unintenred symbols. The problem is that you cannot then call them "by name", since you can't fetch the uninterned symbol by name (that is what "uninterned" means, essentially).
You would need to store the uninterned symbol in some sort of data structure, to then be able to fetch the symbol. Alternatively, store the defined function in some sort of data structure.
Surprisingly, CLISP bug 180 isn't actually an ANSI CL conformance bug. Not only that, but evidently, ANSI Common Lisp is itself so broken in this regard that even the progn based workaround is a courtesy of the implementation.
Common Lisp is a language intended for compilation, and compilation produces issues regarding the identity of objects which are placed into compiled files and later loaded ("externalized" objects). ANSI Common Lisp requires that literal objects reproduced from compiled files are only similar to the original objects. (CLHS 3.2.4 Literal Objects in Compiled Files).
Firstly, according to the definition similarity (3.2.4.2.2 Definition of Similarity), the rules for uninterned symbols is that similarity is name based. If we compile code with a literal that contains an uninterned symbol, then when we load the compiled file, we get a symbol which is similar and not (necessarily) the same object: a symbol which has the same name.
What if the same uninterned symbol is inserted into two different top-level forms which are then compiled as a file? When the file is loaded, are those two similar to each other at least? No, there is no such requirement.
But it gets worse: there is also no requirement that two occurrences of the same uninterned symbol in the same form will be externalized in such a way that their relative identity is preserved: that the re-loaded version of that object will have the same symbol object in all the places where the original was. In fact, the definition of similarity contains no provision for preserving the circular structure and substructure sharing. If we have a literal like '#1=(a b . #1#), as a literal in a compiled file, there appears to be no requirement that this be reproduced as a circular object with the same graph structure as the original (a graph isomorphism). The similarity rule for conses is given as naive recursion: two conses are similar if their respective cars and cdrs are similar. (The rule can't even be evaluated for circular objects; it doesn't terminate).
That the above works is because of implementations going beyond what is required in the spec; they are providing an extension consistent with (3.2.4.3 Extensions to Similarity Rules).
Thus, purely according to ANSI CL, we cannot expect to use macros with gensyms in compiled files, at least in some ways. The expectation expressed in code like the following runs afoul of the spec:
(defmacro foo (arg)
(let ((g (gensym))
(literal '(blah ,g ,g ,arg)))
...))
(defun bar ()
(foo 42))
The bar function contains a literal with two insertions of a gensym, which according to the similarity rules for conses and symbols need not reproduce as a list containing two occurrences of the same object in the second and third positions.
If the above works as expected, it's due to "extensions to the similarity rules".
So the answer to the "Why can't CLISP ..." question is that although CLISP does provide an extension for similarity which preserves the graph structure of literal forms, it doesn't do it across the entire compiled file, only within individual top level items within that file. (It uses *print-circle* to emit the individual items.) The bug is that CLISP doesn't conform to the best possible behavior users can imagine, or at least to a better behavior exhibited by other implementations.
I'd like to learn the internals of Lisp, so I want to see how everything is implemented.
For example,
(macroexpand '(loop for i upto 10 collect i))
gives me (in SBCL)
(BLOCK NIL
(LET ((I 0))
(DECLARE (TYPE (AND NUMBER REAL) I))
(SB-LOOP::WITH-LOOP-LIST-COLLECTION-HEAD (#:LOOP-LIST-HEAD-1026
#:LOOP-LIST-TAIL-1027)
(SB-LOOP::LOOP-BODY NIL
(NIL NIL (WHEN (> I '10) (GO SB-LOOP::END-LOOP)) NIL)
((SB-LOOP::LOOP-COLLECT-RPLACD
(#:LOOP-LIST-HEAD-1026 #:LOOP-LIST-TAIL-1027)
(LIST I)))
(NIL (SB-LOOP::LOOP-REALLY-DESETQ I (1+ I))
(WHEN (> I '10) (GO SB-LOOP::END-LOOP)) NIL)
((RETURN-FROM NIL
(SB-LOOP::LOOP-COLLECT-ANSWER
#:LOOP-LIST-HEAD-1026)))))))
But LOOP-BODY, WITH-LOOP-LIST-COLLECTION-HEAD, etc. are still macros. How can I expand a macro form completely?
To see the full expansion one needs to walk the Lisp form on all levels and expand them. For this it is necessary that this so-called code walker understands Lisp syntax (and not just s-expression syntax). For example in (lambda (a b) (setf a b)), the list (a b) is a parameter list and should not be macro expanded.
Various Common Lisp implementations provide such a tool. The answer of 6502 mentions MACROEXPAND-ALL which is provided by SBCL.
If you use a development environment, it is usually provided as a command:
SLIME: M-x slime-macroexpand-all with C-c M-m
LispWorks: menu Expression > Walk or M-x Walk Form, shorter M-Sh-m.
The other answers are excellent for you question but you say you want to see how everything is implemented.
Many macros (as you know already) are implemented using macros and whilst macroexpand-all is very useful but you can lose the context of what macro was responsible for what change.
One nice middle ground (if you are using slime) is to use slime-expand-1 (C-c Enter) which shows the expansion is another buffer. You can then useslime-expand-1 inside this new buffer to expand macros in-place.
This allows you to walk the tree expanding as you read and also to use undo to close the expansions again.
For me this has been a god-send in understanding other people's macros. Hope this helps you too, have fun!
You can try to use MACROEXPAND-ALL but what you may get is not necessarily useful.
In something like LOOP the real meat is the macro itself, not the generated code.
(Note: If you're not interested in portability, SBCL provides macroexpand-all, which will do what you're after. If you're after a portable solution, read on...)
The quick-and-dirty solution would be to macroexpand the form itself, then recursively macroexpand all but the first element of the resulting list. This is an imperfect solution; it will fail completely the moment it attempts to process a let's bindings (the first argument to let, the list of bindings, is not meant to be macroexpanded, but this code will do it anyway).
;;; Quick-and-dirty macroexpand-all
(defun macroexpand* (form)
(let ((form (macroexpand form)))
(cons (car form) (mapcar #'macroexpand (cdr form)))))
A more complete solution would consider special forms specially, not macroexpanding their unevaluated arguments. I could update with such a solution, if desired.
Also, even if I can use Common Lisp, should I? Is Scheme better?
You have several answers here, but none is really comprehensive (and I'm not talking about having enough details or being long enough). First of all, the bottom line: you should not use Common Lisp if you want to have a good experience with SICP.
If you don't know much Common Lisp, then just take it as that. (Obviously you can disregard this advice as anything else, some people only learn the hard way.)
If you already know Common Lisp, then you might pull it off, but at considerable effort, and at a considerable damage to your overall learning experience. There are some fundamental issues that separate Common Lisp and Scheme, which make trying to use the former with SICP a pretty bad idea. In fact, if you have the knowledge level to make it work, then you're likely above the level of SICP anyway. I'm not saying that it's not possible -- it is of course possible to implement the whole book in Common Lisp (for example, see Bendersky's pages) just as you can do so in C or Perl or whatever. It's just going to harder with languages that are further apart from Scheme. (For example, ML is likely to be easier to use than Common Lisp, even when its syntax is very different.)
Here are some of these major issues, in increasing order of importance. (I'm not saying that this list is exhaustive in any way, I'm sure that there are a whole bunch of additional issues that I'm omitting here.)
NIL and related issues, and different names.
Dynamic scope.
Tail call optimization.
Separate namespace for functions and values.
I'll expand now on each of these points:
The first point is the most technical. In Common Lisp, NIL is used both as the empty list and as the false value. In itself, this is not a big issue, and in fact the first edition of SICP had a similar assumption -- where the empty list and false were the same value. However, Common Lisp's NIL is still different: it is also a symbol. So, in Scheme you have a clear separation: something is either a list, or one of the primitive types of values -- but in Common Lisp, NIL is not only false and the empty list: it is also a symbol. In addition to this, you get a host of slightly different behavior -- for example, in Common Lisp the head and the tail (the car and cdr) of the empty list is itself the empty list, while in Scheme you'll get a runtime error if you try that. To top it off, you have different names and naming convention, for example -- predicates in Common Lisp end by convention with P (eg, listp) while predicates in Scheme end in a question mark (eg, list?); mutators in Common Lisp have no specific convention (some have an N prefix), while in Scheme they almost always have a suffix of !. Also, plain assignment in Common Lisp is usually setf and it can operate on combinations too (eg, (setf (car foo) 1)), while in Scheme it is set! and limited to setting bound variables only. (Note that Common Lisp has the limited version too, it's called setq. Almost nobody uses it though.)
The second point is a much deeper one, and possibly one that will lead to completely incomprehensible behavior of your code. The thing is that in Common Lisp, function arguments are lexically scoped, but variables that are declared with defvar are dynamically scoped. There is a whole range of solutions that rely on lexically scoped bindings -- and in Common Lisp they just won't work. Of course, the fact that Common Lisp has lexical scope means that you can get around this by being very careful about new bindings, and possibly using macros to get around the default dynamic scope -- but again, this requires a much more extensive knowledge than a typical newbie has. Things get even worse than that: if you declare a specific name with a defvar, then that name will be bound dynamically even if they're arguments to functions. This can lead to some extremely difficult to track bugs which manifest themselves in an extremely confusing way (you basically get the wrong value, and you'll have no clue why that happens). Experienced Common Lispers know about it (especially those that have been burnt by it), and will always follow the convention of using stars around dynamically scoped names (eg, *foo*). (And by the way, in Common Lisp jargon, these dynamically scoped variables are called just "special variables" -- which is another source of confusion for newbies.)
The third point was also discussed in some of the previous comments. In fact, Rainer had a pretty good summary of the different options that you have, but he didn't explain just how hard it can make things. The thing is that proper tail-call-optimization (TCO) is one of the fundamental concepts in Scheme. It is important enough that it is a language feature rather than merely an optimization. A typical loop in Scheme is expressed as a tail-calling function (for example, (define (loop) (loop))) and proper Scheme implementations are required to implement TCO which will guarantee that this is, in fact, an infinite loop rather than running for a short while until you blow up the stack space. This is all the essence of Rainer's first non solution, and the reason he labeled it as "BAD".
His third option -- rewriting functional loops (expressed as recursive functions) as Common Lisp loops (dotimes, dolist, and the infamous loop) can work for a few simple cases, but at a very high cost: the fact that Scheme is a language that does proper TCO is not only fundamental to the language -- it is also one of the major themes in the book, so by doing so, you will have lost that point completely. In addition, there are some cases that you just cannot translate Scheme code into a Common Lisp loop construct -- for example, as you work your way through the book, you'll get to implement a meta-circular-interpreter which is an implementation of a mini-Scheme language. It takes a certain click to realize that this meta evaluator implements a language that is itself doing TCO if the language that you implement this evaluator in is itself doing TCO. (Note that I'm talking about the "simple" interpreters -- later in the book you implement this evaluator as something close to a register machine, where you kind of explicitly make it do TCO.) The bottom line to all of this, is that this evaluator -- when implemented in Common Lisp -- will result in a language that is itself not doing TCO. People who are familiar with all of this should not be surprised: after all, the "circularity" of the evaluator means that you're implementing a language with semantics that are very close to the host language -- so in this case you "inherit" the Common Lisp semantics rather than the Scheme TCO semantics. However, this means that your mini-evaluator is now crippled: it has no TCO, so it has no way of doing loops! To get loops in, you will need to implement new constructs in your interpreter, which will usually use the iteration constructs in Common Lisp. But now you're going further away from what's in the book, and you're investing considerable effort in approximately implementing the ideas in SICP to the different language. Note also that all of this is related to the previous point I raised: if you follow the book, then the language that you implement will be lexically scoped, taking it further away from the Common Lisp host language. So overall, you completely lose the "circular" property in what the book calls "meta circular evaluator". (Again, this is something that might not bother you, but it will damage the overall learning experience.) All in all, very few languages get close to Scheme in being able to implement the semantics of the language inside the language as a non-trivial (eg, not using eval) evaluator that easily.
In fact, if you do go with a Common Lisp, then in my opinion, Rainer's second suggestion -- use a Common Lisp implementation that supports TCO -- is the best way to go. However, in Common Lisp this is fundamentally a compiler optimization: so you will likely need to (a) know about the knobs in the implementation that you need to turn to make TCO happen, (b) you will need to make sure that the Common Lisp implementation is actually doing proper TCO, and not just optimization of self calls (which is the much simpler case that is not nearly as important), (c) you would hope that the Common Lisp implementation that does TCO can do so without damaging debugging options (again, since this is considered an optimization in Common Lisp, then turning this knob on, might also be taken by the compiler as saying "I don't care much for debuggability").
Finally, my last point is not too hard to overcome, but it is conceptually the most important one. In Scheme, you have a uniform rule: identifiers have a value, which is determined lexically -- and that's it. It's a very simple language. In Common Lisp, in addition to the historical baggage of sometimes using dynamic scope and sometimes using lexical scope, you have symbols that have two different value -- there's the function value that is used whenever a variable appears at the head of an expression, and there is a different value that is used otherwise. For example, in (foo foo), each of the two instances of foo are interpreted differently -- the first is the function value of foo and the second is its variable value. Again, this is not hard to overcome -- there are a number of constructs that you need to know about to deal with all of this. For example, instead of writing (lambda (x) (x x)) you need to write (lambda (x) (funcall x x)), which makes the function that is being called appear in a variable position, therefore the same value will be used there; another example is (map car something) which you will need to translate to (map #'car something) (or more accurately, you will need to use mapcar which is Common Lisp's equivalent of the car function); yet another thing that you'll need to know is that let binds the value slot of the name, and labels binds the function slot (and has a very different syntax, just like defun and defvar.)
But the conceptual result of all of this is that Common Lispers tend to use higher-order code much less than Schemers, and that goes all the way from the idioms that are common in each language, to what implementations will do with it. (For example, many Common Lisp compilers will never optimize this call: (funcall foo bar), while Scheme compilers will optimize (foo bar) like any function call expression, because there is no other way to call functions.)
Finally, I'll note that much of the above is very good flamewar material: throw any of these issues into a public Lisp or Scheme forum (in particular comp.lang.lisp and comp.lang.scheme), and you'll most likely see a long thread where people explain why their choice is far better than the other, or why some "so called feature" is actually an idiotic decision that was made by language designers that were clearly very drunk at the time, etc etc. But the thing is that these are just differences between the two languages, and eventually people can get their job done in either one. It just happens that if the job is "doing SICP" then Scheme will be much easier considering how it hits each of these issues from the Scheme perspective. If you want to learn Common Lisp, then going with a Common Lisp textbook will leave you much less frustrated.
Using SICP with Common Lisp is possible and fun
You can use Common Lisp for learning with SICP without much problems. The Scheme subset that is used in the book is not very sophisticated. SICP does not use macros and it uses no continuations. There are DELAY and FORCE, which can be written in Common Lisp in a few lines.
Also for a beginner using (function foo) and (funcall foo 1 2 3) is actually better (IMHO !), because the code gets clearer when learning the functional programming parts. You can see where variables and lambda functions are being called/passed.
Tail call optimization in Common Lisp
There is only one big area where using Common Lisp has a drawback: tail call optimization (TCO). Common Lisp does not support TCO in its standard (because of unclear interaction with the rest of the language, not all computer architectures support it directly (think JVM), not all compilers support it (some Lisp Machine), it makes some debugging/tracing/stepping harder, ...).
There are three ways to live with that:
Hope that the stack does not blow out. BAD.
Use a Common Lisp implementation that supports TCO. There are some. See below.
Rewrite the functional loops (and similar constructs) into loops (and similar constructs) using DOTIMES, DO, LOOP, ...
Personally I would recommend 2 or 3.
Common Lisp has excellent and easy to use compilers with TCO support (SBCL, LispWorks, Allegro CL, Clozure CL, ...) and as a development environment use either the built-in ones or GNU Emacs/SLIME.
For use with SICP I would recommend SBCL, since it compiles always by default, has TCO support by default and the compiler catches a lot of coding problems (undeclared variables, wrong argument lists, a bunch of type errors, ...). This helps a lot during learning. Generally make sure the code is compiled, since Common Lisp interpreters will usually not support TCO.
Sometimes it might also helpful to write one or two macros and provide some Scheme function names to make code look a bit more like Scheme. For example you could have a DEFINE macro in Common Lisp.
For the more advanced users, there is an old Scheme implementation written in Common Lisp (called Pseudo Scheme), that should run most of the code in SICP.
My recommendation: if you want to go the extra mile and use Common Lisp, do it.
To make it easier to understand the necessary changes, I've added a few examples - remember, it needs a Common Lisp compiler with support for tail call optimization:
Example
Let's look at this simple code from SICP:
(define (factorial n)
(fact-iter 1 1 n))
(define (fact-iter product counter max-count)
(if (> counter max-count)
product
(fact-iter (* counter product)
(+ counter 1)
max-count)))
We can use it directly in Common Lisp with a DEFINE macro:
(defmacro define ((name &rest args) &body body)
`(defun ,name ,args ,#body))
Now you should use SBCL, CCL, Allegro CL or LispWorks. These compilers support TCO by default.
Let's use SBCL:
* (define (factorial n)
(fact-iter 1 1 n))
; in: DEFINE (FACTORIAL N)
; (FACT-ITER 1 1 N)
;
; caught STYLE-WARNING:
; undefined function: FACT-ITER
;
; compilation unit finished
; Undefined function:
; FACT-ITER
; caught 1 STYLE-WARNING condition
FACTORIAL
* (define (fact-iter product counter max-count)
(if (> counter max-count)
product
(fact-iter (* counter product)
(+ counter 1)
max-count)))
FACT-ITER
* (factorial 1000)
40238726007709....
Another Example: symbolic differentiation
SICP has a Scheme example for differentiation:
(define (deriv exp var)
(cond ((number? exp) 0)
((variable? exp)
(if (same-variable? exp var) 1 0))
((sum? exp)
(make-sum (deriv (addend exp) var)
(deriv (augend exp) var)))
((product? exp)
(make-sum
(make-product (multiplier exp)
(deriv (multiplicand exp) var))
(make-product (deriv (multiplier exp) var)
(multiplicand exp))))
(else
(error "unknown expression type -- DERIV" exp))))
Making this code run in Common Lisp is easy:
some functions have different names, number? is numberp in CL
CL:COND uses T instead of else
CL:ERROR uses CL format strings
Let's define Scheme names for some functions. Common Lisp code:
(loop for (scheme-symbol fn) in
'((number? numberp)
(symbol? symbolp)
(pair? consp)
(eq? eq)
(display-line print))
do (setf (symbol-function scheme-symbol)
(symbol-function fn)))
Our define macro from above:
(defmacro define ((name &rest args) &body body)
`(defun ,name ,args ,#body))
The Common Lisp code:
(define (variable? x) (symbol? x))
(define (same-variable? v1 v2)
(and (variable? v1) (variable? v2) (eq? v1 v2)))
(define (make-sum a1 a2) (list '+ a1 a2))
(define (make-product m1 m2) (list '* m1 m2))
(define (sum? x)
(and (pair? x) (eq? (car x) '+)))
(define (addend s) (cadr s))
(define (augend s) (caddr s))
(define (product? x)
(and (pair? x) (eq? (car x) '*)))
(define (multiplier p) (cadr p))
(define (multiplicand p) (caddr p))
(define (deriv exp var)
(cond ((number? exp) 0)
((variable? exp)
(if (same-variable? exp var) 1 0))
((sum? exp)
(make-sum (deriv (addend exp) var)
(deriv (augend exp) var)))
((product? exp)
(make-sum
(make-product (multiplier exp)
(deriv (multiplicand exp) var))
(make-product (deriv (multiplier exp) var)
(multiplicand exp))))
(t
(error "unknown expression type -- DERIV: ~a" exp))))
Let's try it in LispWorks:
CL-USER 19 > (deriv '(* (* x y) (+ x 3)) 'x)
(+ (* (* X Y) (+ 1 0)) (* (+ (* X 0) (* 1 Y)) (+ X 3)))
Streams example from SICP in Common Lisp
See the book code in chapter 3.5 in SICP. We use the additions to CL from above.
SICP mentions delay, the-empty-stream and cons-stream, but does not implement it. We provide here an implementation in Common Lisp:
(defmacro delay (expression)
`(lambda () ,expression))
(defmacro cons-stream (a b)
`(cons ,a (delay ,b)))
(define (force delayed-object)
(funcall delayed-object))
(defparameter the-empty-stream (make-symbol "THE-EMPTY-STREAM"))
Now comes portable code from the book:
(define (stream-null? stream)
(eq? stream the-empty-stream))
(define (stream-car stream) (car stream))
(define (stream-cdr stream) (force (cdr stream)))
(define (stream-enumerate-interval low high)
(if (> low high)
the-empty-stream
(cons-stream
low
(stream-enumerate-interval (+ low 1) high))))
Now Common Lisp differs in stream-for-each:
we need to use cl:progn instead of begin
function parameters need to be called with cl:funcall
Here is a version:
(defmacro begin (&body body) `(progn ,#body))
(define (stream-for-each proc s)
(if (stream-null? s)
'done
(begin (funcall proc (stream-car s))
(stream-for-each proc (stream-cdr s)))))
We also need to pass functions using cl:function:
(define (display-stream s)
(stream-for-each (function display-line) s))
But then the example works:
CL-USER 20 > (stream-enumerate-interval 10 20)
(10 . #<Closure 1 subfunction of STREAM-ENUMERATE-INTERVAL 40600010FC>)
CL-USER 21 > (display-stream (stream-enumerate-interval 10 1000))
10
11
12
...
997
998
999
1000
DONE
Do you already know some Common Lisp? I assume that is what you mean by 'Lisp'. In that case you might want to use it instead of Scheme. If you don't know either, and you are working through SICP solely for the learning experience, then probably you are better off with Scheme. It has much better support for new learners, and you won't have to translate from Scheme to Common Lisp.
There are differences; specifically, SICP's highly functional style is wordier in Common Lisp because you have to quote functions when passing them around and use funcall to call a function bound to a variable.
However, if you want to use Common Lisp, you can try using Eli Bendersky's Common Lisp translations of the SICP code under the tag SICP.
They are similar but not the same.
I believe If you go with Scheme it would be easier.
Edit: Nathan Sanders' comment is correct. It's clearly been a while since I last read the book, but I just checked and it does not use call/cc directly. I've upvoted Nathan's answer.
Whatever you use needs to implement continuations, which SICP uses a lot. Not even all Scheme interpreters implement them, and I'm not aware of any Common Lisp that does.