S-expressions and keeping track of source location - lisp

Lisp s-expressions are a concise and flexible way to represent code as an abstract syntax tree. Relative to the more specialized data structures used by compilers for other languages, however, they have one drawback: it is difficult to keep track of the file and line number corresponding to any particular point in the code. At least some Lisps end up just punting the problem; in the event of an error, they report source location only as far as function name, not file and line number.
Some dialects of Scheme have solved the problem by representing code not with ordinary cons cells, but with syntax objects, which are isomorphic to cons cells but can also carry additional information such as source location.
Has any implementation of Common Lisp solved this problem? If so, how?

The Common Lisp standard says very little about these things. It mentions for example that the function ed may take a function name and then open the editor with respective source code. But there is no mechanism specified and this feature is entirely provided by the development environment, possibly in combination with the Lisp system.
A typical way to deal with that is to compile a file and the compiler will record the source location of the object defined (a function, a variable, a class, ...). The source location could for example be placed on the property list of the symbol (the name of the thing defined), or recorded in some other place. Also the actual source code as a list structure can be associated with a Lisp symbol. See the function FUNCTION-LAMBDA-EXPRESSION.
Some implementations do more sophisticated source location recording. For example LispWorks can locate a specific part of a function which is currently executed. It also notes when the definition comes from an editor or a Listener. See Dspecs: Tools for Handling Definitions. The debugger then can for example locate where the code of a certain stack frame is located in the source.
SBCL also has a feature to locate source code.
Notice also that the actual 'source code' in Common Lisp is not always a text a file, but the read s-expression. eval and compile - two standard functions - don't take strings or filenames as arguments. They use the actual expressions:
CL-USER 26 > (compile 'foo (lambda (x) (1+ x)))
FOO
NIL
NIL
CL-USER 27 > (foo 41)
42
S-expressions as code are not bound to any particular textual formatting. They can be reformatted by the pretty printer function pprint and this may take available width into account to generate a layout.
So, noting the structure maybe be useful and it would be less useful to record source lines.

My understanding is that whatever data Scheme stores in the AST is data that can be associated to expressions in a CL environment.
Scheme
(defun my-simple-scheme-reader (stream)
(let ((char (read-char stream)))
(or (position char "0123456789")
(and (member char '(#\newline #\space #\tab)) :space)
(case char
(#\) :closing-paren)
(#\( (loop
with beg = (file-position stream)
for x = (my-simple-scheme-reader stream)
until (eq x :closing-paren)
unless (eq x :space)
collect x into items
finally (return (list :beg beg
:end (file-position stream)
:items items))))))))
For example:
(with-input-from-string (in "(0(1 2 3) 4 5 (6 7))")
(my-simple-scheme-reader in))
returns:
(:BEG 1 :END 20 :ITEMS
(0 (:BEG 3 :END 9 :ITEMS (1 2 3)) 4 5 (:BEG 15 :END 19 :ITEMS (6 7))))
The enriched tree represents syntax objects.
Common-Lisp
(defun make-environment ()
(make-hash-table :test #'eq))
(defun my-simple-lisp-reader (stream environment)
(let ((char (read-char stream)))
(or (position char "0123456789")
(and (member char '(#\newline #\space #\tab)) :space)
(case char
(#\) :closing-paren)
(#\( (loop
with beg = (file-position stream)
for x = (my-simple-lisp-reader stream environment)
until (eq x :closing-paren)
unless (eq x :space)
collect x into items
finally
(setf (gethash items environment)
(list :beg beg :end (file-position stream)))
(return items)))))))
Test:
(let ((env (make-environment)))
(with-input-from-string (in "(0(1 2 3) 4 5 (6 7))")
(values
(my-simple-lisp-reader in env)
env)))
Returns two values:
(0 (1 2 3) 4 5 (6 7))
#<HASH-TABLE :TEST EQL :COUNT 3 {1010524CD3}>
Given a cons cell, you can track back its original position. You can add more precise information if you want to. Once you evaluate a defun, for example, the source information can be attached to the function object, or as a symbol property, which means the information is garbage collected on redefinitions.
Remark
Note that in both cases there is no source file to keep track of, unless the system is able to track back to the original string in the source file where the reader is called.

Related

How can I modify the #+ and #- readtable macros in Lisp?

Short version:
I want to change the #+ and #- reader macros to apply to all immediately subsequent tokens starting with ##, in addition to the following token. Therefore, the following code...
#+somefeature
##someattribute1
##someattribute2
(defun ...)
...would, in the absence of somefeature, result in no code.
Long version:
I have written my own readtable-macros which apply transformations to subsequent code. For example:
##traced
(defun ...)
This yields a function that writes its arguments and return values to a file, for debugging.
This fails, however, when used in conjunction with the #+ reader macro:
#+somefeature
##traced
(defun ...)
In the absence of somefeature, the function continues to be defined, albeit without the ##traced modification. This is obviously not the desired outcome.
One possible solution would be to use progn, as follows:
#+somefeature
(progn
##traced
(defun ...))
But that's kind of ugly.
I would like to modify the #+ and #- reader macros, such that they may consume more than one token. Something like this:
(defun conditional-syntax-reader (stream subchar arg)
; If the conditional fails, consume subsequent tokens while they
; start with ##, then consume the next token.
)
(setf *readtable* (copy-readtable))
(set-dispatch-macro-character #\# #\+ #'conditional-syntax-reader)
(set-dispatch-macro-character #\# #\- #'conditional-syntax-reader)
The problem is, I don't know how to "delegate" to the original reader macros; and I don't understand enough about how they were implemented to re-implement them myself in their entirety.
A naive approach would be:
(defun consume-tokens-recursively (stream)
(let ((token (read stream t nil t)))
(when (string= "##" (subseq (symbol-string token) 0 2))
(consume-tokens-recursively stream)))) ; recurse
(defun conditional-syntax-reader (stream subchar arg)
(unless (member (read stream t nil t) *features*)
(consume-tokens-recursively stream)))
However, I'm given to believe that this wouldn't be sufficient:
The #+ syntax operates by first reading the feature specification and then skipping over the form if the feature is false. This skipping of a form is a bit tricky because of the possibility of user-defined macro characters and side effects caused by the #. and #, constructions. It is accomplished by binding the variable read-suppress to a non-nil value and then calling the read function.
This seems to imply that I can just let ((*read-suppress* t)) when using read to solve the issue. Is that right?
EDIT 1
Upon further analysis, it seems the problem is caused by not knowing how many tokens to consume. Consider the following attributes:
##export expects one argument: the (defun ...) to export.
##traced expects two arguments: the debug level and the (defun ...) to trace.
Example:
#+somefeature
##export
##traced 3
(defun ...)
It turns out that #+ and #- are capable of suppressing all these tokens; but there is a huge problem!
When under a suppressing #+ or #-, (read) returns NIL!
Example:
(defun annotation-syntax-reader (stream subchar arg)
(case (read stream t nil t)
('export
(let ((defun-form (read stream t nil t)))))
; do something
('traced
(let* ((debug-level (read stream t nil t))
(defun-form (read stream t nil t)))))))
; do something
(setf *readtable* (copy-readtable))
(set-dispatch-macro-character #\# #\# #'annotation-syntax-reader)
#+(or) ##traced 3 (defun ...)
The ##traced token is being suppressed by the #+. In this situation, all the (read) calls in (annotation-syntax-reader) consume real tokens but return NIL!
Therefore, the traced token is consumed, but the case fails. No additional tokens are thus consumed; and control leaves the scope of the #+.
The (defun ...) clause is executed as normal, and the function comes into being. Clearly not the desired outcome.
The standard readtable
Changing the macros for #+ and #- is a bit excessive solution I think, but in any case remember to not actually change the standard readtable (as you did, but its important to repeat in the answer)
The consequences are undefined if an attempt is made to modify the standard readtable. To achieve the effect of altering or extending standard syntax, a copy of the standard readtable can be created; see the function copy-readtable.
§2.1.1.2 The Standard Readtable
Now, maybe I'm missing something (please give us a hint about how your reader macro is defined if so), but I think it is possible to avoid that and write your custom macros in a way that works for your use case.
Reader macro
Let's define a simple macro as follows:
CL-USER> (defun my-reader (stream char)
(declare (ignore char))
(let ((name (read stream)
(form (read stream))
(unless *read-suppress*
`(with-decoration ,name ,form)))
MY-READER
[NB: This was edited to take into account *read-suppress*: the code always read two forms, but returns nil in case it is being ignored. In the comments you say that you may need to read an indefinite number of forms based on the name of the decoration, but with *read-suppress* the recursive calls to read return nil for symbols, so you don't know which decoration is being applied. In that case it might be better to wrap some arguments in a literal list, or parse the stream manually (read-char, etc.). Also, since you are using a dispatching macro, maybe you can add a numerical argument if you want the decoration to be applied to more than one form (#2#inline), but that could be a bad idea when later the decorated code is being modified.]
Here the reader does a minimal job, namely build a form that is intended to be macroexpanded later. I don't even need to define with-decoration for now, as I'm interested in the read step. The intent is to read the next token (presumably a symbol that indicates what decoration is being applied, and a form to decorate).
I'm binding this macro to a unused character:
CL-USER> (set-macro-character #\§ 'my-reader)
T
Here when I test the macro it wraps the following form:
CL-USER> (read-from-string "§test (defun)")
(WITH-DECORATION TEST (DEFUN))
13 (4 bits, #xD, #o15, #b1101)
And here it works with a preceding QUOTE too, the apostrophe reader grabs the next form, which recursively reads two forms:
CL-USER> '§test (defun)
(WITH-DECORATION TEST (DEFUN))
Likewise, a conditional reader macro will ignore all the next lines:
CL-USER> #+(or) t
; No values
CL-USER> #+(or) §test (defun)
; No values
CL-USER> #+(or) §one §two §three (defun)
; No values
Decoration macro
If you use this syntax, you'll have nested decorated forms:
CL-USER> '§one §two (defun test ())
(WITH-DECORATION ONE (WITH-DECORATION TWO (DEFUN TEST ())))
With respect to defun in toplevel positions, you can arrange for your macros to unwrap the nesting (not completely tested, there might be bugs):
(defun unwrap-decorations (form stack)
(etypecase form
(cons (destructuring-bind (head . tail) form
(case head
(with-decoration (destructuring-bind (token form) tail
(unwrap-decorations form (cons token stack))))
(t `(with-decorations ,(reverse stack) ,form)))))))
CL-USER> (unwrap-decorations ** nil)
(WITH-DECORATIONS (ONE TWO) (DEFUN TEST ()))
And in turn, with-decorations might know about DEFUN forms and how to annotate them as necessary.
For the moment, our original macro is only the following (it needs more error checking):
(defmacro with-decoration (&whole whole &rest args)
(unwrap-decorations whole nil))
For the sake of our example, let's define a generic annotation mechanism:
CL-USER> (defgeneric expand-decoration (type name rest))
#<STANDARD-GENERIC-FUNCTION COMMON-LISP-USER::EXPAND-DECORATION (0)>
It is used in with-decorations to dispatch on an appropriate expander for each decoration. Keep in mind that all the efforts here are to keep defun in a top-level positions (under a progn), a recursive annotation would let evaluation happens (in the case of defun, it would result in the name of the function being defined), and the annotation could be done on the result.
The main macro is then here, with a kind of fold (reduce) mechanism where the forms are decorated using the resulting expansion so far. This allows for expanders to place code before or after the main form (or do other fancy things):
(defmacro with-decorations ((&rest decorations) form)
(etypecase form
(cons (destructuring-bind (head . tail) form
(ecase head
(defun (destructuring-bind (name args . body) tail
`(progn
,#(loop
for b = `((defun ,name ,args ,#body)) then forms
for d in decorations
for forms = (expand-decoration d name b)
finally (return forms))))))))))
(nb. here above we only care about defun but the loop should probably be done outside of the dispatching thing, along with a way to indicate to expander methods that a function is being expanded; well, it could be better)
Say, for example, you want to declare a function as inline, then the declaration must happen before (so that the compiler can know the source code must be kept):
(defmethod expand-decoration ((_ (eql 'inline)) name rest)
`((declaim (inline ,name)) ,#rest))
Likewise, if you want to export the name of the function being defined, you can export it after the function is defined (order is not really important here):
(defmethod expand-decoration ((_ (eql 'export)) name rest)
`(,#rest (export ',name)))
The resulting code allows you to have a single (progn ...) form with a defun in toplevel position:
CL-USER> (macroexpand '§inline §export (defun my-test-fn () "hello"))
(PROGN
(DECLAIM (INLINE MY-TEST-FN))
(DEFUN MY-TEST-FN () "hello")
(EXPORT 'MY-TEST-FN))

lisp: concat arbitrary number of lists

In my ongoing quest to recreate lodash in lisp as a way of getting familiar with the language I am trying to write a concat-list function that takes an initial list and an arbitrary number of additional lists and concatenates them.
I'm sure that this is just a measure of getting familiar with lisp convention, but right now my loop is just returning the second list in the argument list, which makes sense since it is the first item of other-lists.
Here's my non-working code (edit: refactored):
(defun concat-list (input-list &rest other-lists)
;; takes an arbitrary number of lists and merges them
(loop
for list in other-lists
append list into input-list
return input-list
)
)
Trying to run (concat-list '(this is list one) '(this is list two) '(this is list three)) and have it return (this is list one this is list two this is list three).
How can I spruce this up to return the final, merged list?
The signature of your function is a bit unfortunate, it becomes easier if you don't treat the first list specially.
The easy way:
(defun concat-lists (&rest lists)
(apply #'concatenate 'list lists))
A bit more lower level, using loop:
(defun concat-lists (&rest lists)
(loop :for list :in lists
:append list))
Going lower, using dolist:
(defun concat-lists (&rest lists)
(let ((result ()))
(dolist (list lists (reverse result))
(setf result (revappend list result)))))
Going even lower would maybe entail implementing revappend yourself.
It's actually good style in Lisp not to use LABELS based iteration, since a) it's basically a go-to like low-level iteration style and it's not everywhere supported. For example the ABCL implementation of Common Lisp on the JVM does not support TCO last I looked. Lisp has wonderful iteration facilities, which make the iteration intention clear:
CL-USER 217 > (defun my-append (&rest lists &aux result)
(dolist (list lists (nreverse result))
(dolist (item list)
(push item result))))
MY-APPEND
CL-USER 218 > (my-append '(1 2 3) '(4 5 6) '(7 8 9))
(1 2 3 4 5 6 7 8 9)
Some pedagogical solutions to this problem
If you just want to do this, then use append, or nconc (destructive), which are the functions which do it.
If you want to learn how do to it, then learning about loop is not how to do that, assuming you want to learn Lisp: (loop for list in ... append list) really teaches you nothing but how to write a crappy version of append using arguably the least-lispy part of CL (note I have nothing against loop & use it a lot, but if you want to learn lisp, learning loop is not how to do that).
Instead why not think about how you would write this if you did not have the tools to do it, in a Lispy way.
Well, here's how you might do that:
(defun append-lists (list &rest more-lists)
(labels ((append-loop (this more results)
(if (null this)
(if (null more)
(nreverse results)
(append-loop (first more) (rest more) results))
(append-loop (rest this) more (cons (first this) results)))))
(append-loop list more-lists '())))
There's a dirty trick here: I know that results is completely fresh so I am using nreverse to reverse it, which does so destructively. Can we write nreverse? Well, it's easy to write reverse, the non-destructive variant:
(defun reverse-nondestructively (list)
(labels ((r-loop (tail reversed)
(if (null tail)
reversed
(r-loop (rest tail) (cons (first tail) reversed)))))
(r-loop list '())))
And it turns out that a destructive reversing function is only a little harder:
(defun reverse-destructively (list)
(labels ((rd-loop (tail reversed)
(if (null tail)
reversed
(let ((rtail (rest tail)))
(setf (rest tail) reversed)
(rd-loop rtail tail)))))
(rd-loop list '())))
And you can check it works:
> (let ((l (make-list 1000 :initial-element 1)))
(time (reverse-destructively l))
(values))
Timing the evaluation of (reverse-destructively l)
User time = 0.000
System time = 0.000
Elapsed time = 0.000
Allocation = 0 bytes
0 Page faults
Why I think this is a good approach to learning Lisp
[This is a response to a couple of comments which I thought was worth adding to the answer: it is, of course, my opinion.]
I think that there are at least three different reasons for wanting to solve a particular problem in a particular language, and the approach you might want to take depends very much on what your reason is.
The first reason is because you want to get something done. In that case you want first of all to find out if it has been done already: if you want to do x and the language a built-in mechanism for doing x then use that. If x is more complicated but there is some standard or optional library which does it then use that. If there's another language you could use easily which does x then use that. Writing a program to solve the problem should be something you do only as a last resort.
The second reason is because you've fallen out of the end of the first reason, and you now find yourself needing to write a program. In that case what you want to do is use all of the tools the language provides in the best way to solve the problem, bearing in mind things like maintainability, performance and so on. In the case of CL, then if you have some problem which naturally involves looping, then, well, use loop if you want to. It doesn't matter whether loop is 'not lispy' or 'impure' or 'hacky': just do what you need to do to get the job done and make the code maintainable. If you want to print some list of objects, then by all means write (format t "~&~{~A~^, ~}~%" things).
The third reason is because you want to learn the language. Well, assuming you can program in some other language there are two approaches to doing this.
the first is to say 'I know how to do this thing (write loops, say) in languages I know – how do I do it in Lisp?', and then iterate this for all the thing you already know how to do in some other language;
the second is to say 'what is it that makes Lisp distinctive?' and try and understand those things.
These approaches result in very approaches to learning. In particular I think the first approach is often terrible: if the language you know is, say, Fortran, then you'll end up writing Fortran dressed up as Lisp. And, well, there are perfectly adequate Fortran compilers out there: why not use them? Even worse, you might completely miss important aspects of the language and end up writing horrors like
(defun sum-list (l)
(loop for i below (length l)
summing (nth i l)))
And you will end up thinking that Lisp is slow and pointless and return to the ranks of the heathen where you will spread such vile calumnies until, come the great day, the golden Lisp horde sweeps it all away. This has happened.
The second approach is to ask, well, what are the things that are interesting about Lisp? If you can program already, I think this is a much better approach to the first, because learning the interesting and distinctive features of a language first will help you understand, as quickly as possible, whether its a language you might actually want to know.
Well, there will inevitably be argument about what the interesting & distinctive features of Lisp are, but here's a possible, partial, set.
The language has a recursively-defined data structure (S expressions or sexprs) at its heart, which is used among other things to represent the source code of the language itself. This representation of the source is extremely low-commitment: there's nothing in the syntax of the language which says 'here's a block' or 'this is a conditiona' or 'this is a loop'. This low-commitment can make the language hard to read, but it has huge advantages.
Recursive processes are therefore inherently important and the language is good at expressing them. Some variants of the language take this to the extreme by noticing that iteration is simply a special case of recursion and have no iterative constructs at all (CL does not do this).
There are symbols, which are used as names for things both in the language itself and in programs written in the language (some variants take this more seriously than others: CL takes it very seriously).
There are macros. This really follows from the source code of the language being represented as sexprs and this structure having a very low commitment to what it means. Macros, in particular, are source-to-source transformations, with the source being represented as sexprs, written in the language itself: the macro language of Lisp is Lisp, without restriction. Macros allow the language itself to be seamlessly extended: solving problems in Lisp is done by designing a language in which the problem can be easily expressed and solved.
The end result of this is, I think two things:
recursion, in addition to and sometimes instead of iteration is an unusually important technique in Lisp;
in Lisp, programming means building a programming language.
So, in the answer above I've tried to give you examples of how you might think about solving problems involving a recursive data structure recursively: by defining a local function (append-loop) which then recursively calls itself to process the lists. As Rainer pointed out that's probably not a good way of solving this problem in Common Lisp as it tends to be hard to read and it also relies on the implementation to turn tail calls into iteration which is not garuanteed in CL. But, if your aim is to learn to think the way Lisp wants you to think, I think it is useful: there's a difference between code you might want to write for production use, and code you might want to read and write for pedagogical purposes: this is pedagogical code.
Indeed, it's worth looking at the other half of how Lisp might want you to think to solve problems like this: by extending the language. Let's say that you were programming in 1960, in a flavour of Lisp which has no iterative constructs other than GO TO. And let's say you wanted to process some list iteratively. Well, you might write this (this is in CL, so it is not very like programming in an ancient Lisp would be: in CL tagbody establishes a lexical environment in the body of which you can have tags – symbols – and then go will go to those tags):
(defun print-list-elements (l)
;; print the elements of a list, in order, using GO
(let* ((tail l)
(current (first tail)))
(tagbody
next
(if (null tail)
(go done)
(progn
(print current)
(setf tail (rest tail)
current (first tail))
(go next)))
done)))
And now:
> (print-list-elements '(1 2 3))
1
2
3
nil
Let's program like it's 1956!
So, well, let's say you don't like writing this sort of horror. Instead you'd like to be able to write something like this:
(defun print-list-elements (l)
;; print the elements of a list, in order, using GO
(do-list (e l)
(print e)))
Now if you were using most other languages you need to spend several weeks mucking around with the compiler to do this. But in Lisp you spend a few minutes writing this:
(defmacro do-list ((v l &optional (result-form nil)) &body forms)
;; Iterate over a list. May be buggy.
(let ((tailn (make-symbol "TAIL"))
(nextn (make-symbol "NEXT"))
(donen (make-symbol "DONE")))
`(let* ((,tailn ,l)
(,v (first ,tailn)))
(tagbody
,nextn
(if (null ,tailn)
(go ,donen)
(progn
,#forms
(setf ,tailn (rest ,tailn)
,v (first ,tailn))
(go ,nextn)))
,donen
,result-form))))
And now your language has an iteration construct which it previously did not have. (In real life this macro is called dolist).
And you can go further: given our do-list macro, let's see how we can collect things into a list:
(defun collect (thing)
;; global version: just signal an error
(declare (ignorable thing))
(error "not collecting"))
(defmacro collecting (&body forms)
;; Within the body of this macro, (collect x) will collect x into a
;; list, which is returned from the macro.
(let ((resultn (make-symbol "RESULT"))
(rtailn (make-symbol "RTAIL")))
`(let ((,resultn '())
(,rtailn nil))
(flet ((collect (thing)
(if ,rtailn
(setf (rest ,rtailn) (list thing)
,rtailn (rest ,rtailn))
(setf ,resultn (list thing)
,rtailn ,resultn))
thing))
,#forms)
,resultn)))
And now we can write the original append-lists function entirely in terms of constructs we've invented:
(defun append-lists (list &rest more-lists)
(collecting
(do-list (e list) (collect e))
(do-list (l more-lists)
(do-list (e l)
(collect e)))))
If that's not cool then nothing is.
In fact we can get even more carried away. My original answer above used labels to do iteration As Rainer has pointed out, this is not safe in CL since CL does not mandate TCO. I don't particularly care about that (I am happy to use only CL implementations which mandate TCO), but I do care about the problem that using labels this way is hard to read. Well, you can, of course, hide this in a macro:
(defmacro looping ((&rest bindings) &body forms)
;; A sort-of special-purpose named-let.
(multiple-value-bind (vars inits)
(loop for b in bindings
for var = (typecase b
(symbol b)
(cons (car b))
(t (error "~A is hopeless" b)))
for init = (etypecase b
(symbol nil)
(cons (unless (null (cddr b))
(error "malformed binding ~A" b))
(second b))
(t
(error "~A is hopeless" b)))
collect var into vars
collect init into inits
finally (return (values vars inits)))
`(labels ((next ,vars
,#forms))
(next ,#inits))))
And now:
(defun append-lists (list &rest more-lists)
(collecting
(looping ((tail list) (more more-lists))
(if (null tail)
(unless (null more)
(next (first more) (rest more)))
(progn
(collect (first tail))
(next (rest tail) more))))))
And, well, I just think it is astonishing that I get to use a programming language where you can do things like this.
Note that both collecting and looping are intentionally 'unhygenic': they introduce a binding (for collect and next respectively) which is visible to code in their bodies and which would shadow any other function definition of that name. That's fine, in fact, since that's their purpose.
This kind of iteration-as-recursion is certainly cool to think about, and as I've said I think it really helps you to think about how the language can work, which is my purpose here. Whether it leads to better code is a completely different question. Indeed there is a famous quote by Guy Steele from one of the 'lambda the ultimate ...' papers:
procedure calls may be usefully thought of as GOTO statements which also pass parameters
And that's a lovely quote, except that it cuts both ways: procedure calls, in a language which optimizes tail calls, are pretty much GOTO, and you can do almost all the horrors with them that you can do with GOTO. But GOTO is a problem, right? Well, it turns out so are procedure calls, for most of the same reasons.
So, pragmatically, even in a language (or implementation) where procedure calls do have all these nice characteristics, you end up wanting constructs which can express iteration and not recursion rather than both. So, for instance, Racket which, being a Scheme-family language, does mandate tail-call elimination, has a whole bunch of macros with names like for which do iteration.
And in Common Lisp, which does not mandate tail-call elimination but which does have GOTO, you also need to build macros to do iteration, in the spirit of my do-list above. And, of course, a bunch of people then get hopelessly carried away and the end point is a macro called loop: loop didn't exist (in its current form) in the first version of CL, and it was common at that time to simply obtain a copy of it from somewhere, and make sure it got loaded into the image. In other words, loop, with all its vast complexity, is just a macro which you can define in a CL which does not have it already.
OK, sorry, this is too long.
(loop for list in (cons '(1 2 3)
'((4 5 6) (7 8 9)))
append list)

Why does lisp use gensym and other languages don't?

Correct me if I'm wrong, but there is nothing like gensym in Java, C, C++, Python, Javascript, or any of the other languages I've used, and I've never seemed to need it. Why is it necessary in Lisp and not in other langauges? For clarification, I'm learning Common Lisp.
Common Lisp has a powerful macro system. You can make new syntax patterns that behave exactly the way you want them to behave. It's even expressed in its own language, making everything in the language available to transform the code from what you want to write to something that CL actually understands. All languages with powerful macro systems provide gensym or do it implicitly in their macro implementation.
In Common Lisp you use gensym when you want to make code where the symbol shouldn't match elements used any other places in the result. Without it there is no guarantee that a user uses a symbol that the macro implementer also use and they start to interfere and the result is something different than the intended behavior. It makes sure nested expansions of the same macro don't interfere with previous expansions. With the Common Lisp macro system it's possible to make more restrictive macro systems similar to Scheme syntax-rules and syntax-case.
In Scheme there are several macro systems. One with pattern matching where new introduced symbols act automatically as if they are made with gensym. syntax-case will also by default make new symbols as if they were made with gensym and there is also a way to reduce hygiene. You can make CL defmacro with syntax-case but since Scheme doesn't have gensym you wouldn't be able to make hygienic macros with it.
Java, C, C++, Python, Javascript are all Algol dialects and none of them have other than simple template based macros. Thus they don't have gensym because they don't need it. Since the only way to introduce new syntax in these languages is to wish next version of it will provide it.
There are two Algol dialects with powerful macros that come to mind. Nemerle and Perl6. Both of them have hygienic approach, meaning variables introduced behave as if they are made with gensym.
In CL, Scheme, Nemerle, Perl6 you don't need to wait for language features. You can make them yourself! The news in both Java and PHP are easily implemented with macros in any of them should it not already be available.
Can't say which languages have an equivalent of GENSYM. Many languages don't have a first-class symbol data type (with interned and uninterned symbols) and many are not providing similar code generation (macros, ...) facilities.
An interned symbol is registered in a package. An uninterned is not. If the reader (the reader is the Lisp subsystem which takes textual s-expressions as input and returns data) sees two interned symbols in the same package and with the same name, it assumes that it is the same symbol:
CL-USER 35 > (eq 'cl:list 'cl:list)
T
If the reader sees an uninterned symbol, it creates a new one:
CL-USER 36 > (eq '#:list '#:list)
NIL
Uninterned symbols are written with #: in front of the name.
GENSYM is used in Lisp to create numbered uninterned symbols, because it is sometimes useful in code generation and then debugging this code. Note that the symbols are always new and not eq to anything else. But the symbol name could be the same as the name of another symbol. The number gives a clue to the human reader about the identity.
An example using MAKE-SYMBOL
make-symbol creates a new uninterned symbol using a string argument as its name.
Let's see this function generating some code:
CL-USER 31 > (defun make-tagbody (exp test)
(let ((start-symbol (make-symbol "start"))
(exit-symbol (make-symbol "exit")))
`(tagbody ,start-symbol
,exp
(if ,test
(go ,start-symbol)
(go ,exit-symbol))
,exit-symbol)))
MAKE-TAGBODY
CL-USER 32 > (pprint (make-tagbody '(incf i) '(< i 10)))
(TAGBODY
#:|start| (INCF I)
(IF (< I 10) (GO #:|start|) (GO #:|exit|))
#:|exit|)
Above generated code uses uninterned symbols. Both #:|start| are actually the same symbol. We would see this if we would have *print-circle* to T, since the printer then would clearly label identical objects. But here we don't get this added information. Now if you nest this code, then you would see more than the one start and one exit symbol, each which was used in two places.
An example using GENSYM
Now let's use gensym. Gensym also creates an uninterned symbol. Optionally this symbol is named by a string. A number (see the variable CL:*GENSYM-COUNTER*) is added.
CL-USER 33 > (defun make-tagbody (exp test)
(let ((start-symbol (gensym "start"))
(exit-symbol (gensym "exit")))
`(tagbody ,start-symbol
,exp
(if ,test
(go ,start-symbol)
(go ,exit-symbol))
,exit-symbol)))
MAKE-TAGBODY
CL-USER 34 > (pprint (make-tagbody '(incf i) '(< i 10)))
(TAGBODY
#:|start213051| (INCF I)
(IF (< I 10) (GO #:|start213051|) (GO #:|exit213052|))
#:|exit213052|)
Now the number is an indicator that the two uninterned #:|start213051| symbols are actually the same. When the code would be nested, the new version of the start symbol would have a different number:
CL-USER 7 > (pprint (make-tagbody `(progn
(incf i)
(setf j 0)
,(make-tagbody '(incf ij) '(< j 10)))
'(< i 10)))
(TAGBODY
#:|start2756| (PROGN
(INCF I)
(SETF J 0)
(TAGBODY
#:|start2754| (INCF IJ)
(IF (< J 10)
(GO #:|start2754|)
(GO #:|exit2755|))
#:|exit2755|))
(IF (< I 10) (GO #:|start2756|) (GO #:|exit2757|))
#:|exit2757|)
Thus it helps understanding generated code, without the need to turn *print-circle* on, which would label the identical objects:
CL-USER 8 > (let ((*print-circle* t))
(pprint (make-tagbody `(progn
(incf i)
(setf j 0)
,(make-tagbody '(incf ij) '(< j 10)))
'(< i 10))))
(TAGBODY
#3=#:|start1303| (PROGN
(INCF I)
(SETF J 0)
(TAGBODY
#1=#:|start1301| (INCF IJ)
(IF (< J 10) (GO #1#) (GO #2=#:|exit1302|))
#2#))
(IF (< I 10) (GO #3#) (GO #4=#:|exit1304|))
#4#)
Above is readable for the Lisp reader (the subsystem which reads s-expressions for textual representations), but a bit less for the human reader.
I believe that symbols (in the Lisp sense) are mostly useful in homoiconic languages (those in which the syntax of the language is representable as a data of that language).
Java, C, C++, Python, Javascript are not homoiconic.
Once you have symbols, you want some way to dynamically create them. gensym is a possibility, but you can also intern them.
BTW, MELT is a lisp-like dialect, it does not create symbols with gensym or by interning strings but with clone_symbol. (actually MELT symbols are instances of predefined CLASS_SYMBOL, ...).
gensym is available as a predicate in most of Prolog interpreters. You can find it in the eponym library.

Why can't CLISP call certain functions with uninterned names?

I've written an ad hoc parser generator that creates code to convert an old and little known 7-bit character set into unicode. The call to the parser generator expands into a bunch of defuns enclosed in a progn, which then get compiled. I only want to expose one of the generated defuns--the top-level one--to the rest of the system; all the others are internal to the parser and only get called from within the dynamic scope of the top-level one. Therefore, the other defuns generated have uninterned names (created with gensym). This strategy works fine with SBCL, but I recently tested it for the first time with CLISP, and I get errors like:
*** - FUNCALL: undefined function #:G16985
It seems that CLISP can't handle functions with uninterned names. (Interestingly enough, the system compiled without a problem.) EDIT: It seems that it can handle functions with uninterned names in most cases. See the answer by Rörd below.
My questions is: Is this a problem with CLISP, or is it a limitation of Common Lisp that certain implementations (e.g. SBCL) happen to overcome?
EDIT:
For example, the macro expansion of the top-level generated function (called parse) has an expression like this:
(PRINC (#:G75735 #:G75731 #:G75733 #:G75734) #:G75732)
Evaluating this expression (by calling parse) causes an error like the one above, even though the function is definitely defined within the very same macro expansion:
(DEFUN #:G75735 (#:G75742 #:G75743 #:G75744) (DECLARE (OPTIMIZE (DEBUG 2)))
(DECLARE (LEXER #:G75742) (CONS #:G75743 #:G75744))
(MULTIPLE-VALUE-BIND (#:G75745 #:G75746) (POP-TOKEN #:G75742)
...
The two instances of #:G75735 are definitely the same symbol--not two different symbols with the same name. As I said, this works with SBCL, but not with CLISP.
EDIT:
SO user Joshua Taylor has pointed out that this is due to a long standing CLISP bug.
You don't show one of the lines that give you the error, so I can only guess, but the only thing that could cause this problem as far as I can see is that you are referring to the name of the symbol instead of the symbol itself when trying to call it.
If you were referring to the symbol itself, all your lisp implementation would have to do is lookup that symbol's symbol-function. Whether it's interned or not couldn't possibly matter.
May I ask why you haven't considered another way to hide the functions, i.e. a labels statement or defining the functions within a new package that exports only the one external function?
EDIT: The following example is copied literally from an interaction with the CLISP prompt.
As you can see, calling the function named by a gensym is working as expected.
[1]> (defmacro test ()
(let ((name (gensym)))
`(progn
(defun ,name () (format t "Hello!"))
(,name))))
TEST
[2]> (test)
Hello!
NIL
Maybe your code that's trying to call the function gets evaluated before the defun? If there's any code in the macro expansion besides the various defuns, it may be implementation-dependent what gets evaluated first, and so the behaviour of SBCL and CLISP may differ without any of them violating the standard.
EDIT 2: Some further investigation shows that CLISP's behaviour varies depending upon whether the code is interpreted directly or whether it's first compiled and then interpreted. You can see the difference by either directly loading a Lisp file in CLISP or by first calling compile-file on it and then loading the FASL.
You can see what's going on by looking at the first restart that CLISP offers. It says something like "Input a value to be used instead of (FDEFINITION '#:G3219)." So for compiled code, CLISP quotes the symbol and refers to it by name.
It seems though that this behaviour is standard-conforming. The following definition can be found in the HyperSpec:
function designator n. a designator for a function; that is, an object that denotes a function and that is one of: a symbol (denoting the function named by that symbol in the global environment), or a function (denoting itself). The consequences are undefined if a symbol is used as a function designator but it does not have a global definition as a function, or it has a global definition as a macro or a special form. See also extended function designator.
I think an uninterned symbol matches the "a symbol is used as a function designator but it does not have a global definition as a function" case for unspecified consequences.
EDIT 3: (I can agree that I'm not sure whether CLISP's behaviour is a bug or not. Someone more experienced with details of the standard's terminology should judge this. It comes down to whether the function cell of an uninterned symbol - i.e. a symbol that cannot be referred to by name, only by having a direct hold on the symbol object - would be considered a "global definition" or not)
Anyway, here's an example solution that solves the problem in CLISP by interning the symbols in a throwaway package, avoiding the matter of uninterned symbols:
(defmacro test ()
(let* ((pkg (make-package (gensym)))
(name (intern (symbol-name (gensym)) pkg)))
`(progn
(defun ,name () (format t "Hello!"))
(,name))))
(test)
EDIT 4: As Joshua Taylor notes in a comment to the question, this seems to be a case of the (10 year old) CLISP bug #180.
I've tested both workarounds suggested in that bug report and found that replacing the progn with locally actually doesn't help, but replacing it with let () does.
You can most certainly define functions whose names are uninterned symbols. For instance:
CL-USER> (defun #:foo (x)
(list x))
#:FOO
CL-USER> (defparameter *name-of-function* *)
*NAME-OF-FUNCTION*
CL-USER> *name-of-function*
#:FOO
CL-USER> (funcall *name-of-function* 3)
(3)
However, the sharpsign colon syntax introduces a new symbol each time such a form is read read:
#: introduces an uninterned symbol whose name is symbol-name. Every time this syntax is encountered, a distinct uninterned symbol is created. The symbol-name must have the syntax of a symbol with no package prefix.
This means that even though something like
CL-USER> (list '#:foo '#:foo)
;=> (#:FOO #:FOO)
shows the same printed representation, you actually have two different symbols, as the following demonstrates:
CL-USER> (eq '#:foo '#:foo)
NIL
This means that if you try to call such a function by typing #: and then the name of the symbol naming the function, you're going to have trouble:
CL-USER> (#:foo 3)
; undefined function #:foo error
So, while you can call the function using something like the first example I gave, you can't do this last one. This can be kind of confusing, because the printed representation makes it look like this is what's happening. For instance, you could write such a factorial function like this:
(defun #1=#:fact (n &optional (acc 1))
(if (zerop n) acc
(#1# (1- n) (* acc n))))
using the special reader notation #1=#:fact and #1# to later refer to the same symbol. However, look what happens when you print that same form:
CL-USER> (pprint '(defun #1=#:fact (n &optional (acc 1))
(if (zerop n) acc
(#1# (1- n) (* acc n)))))
(DEFUN #:FACT (N &OPTIONAL (ACC 1))
(IF (ZEROP N)
ACC
(#:FACT (1- N) (* ACC N))))
If you take that printed output, and try to copy and paste it as a definition, the reader creates two symbols named "FACT" when it comes to the two occurrences of #:FACT, and the function won't work (and you might even get undefined function warnings):
CL-USER> (DEFUN #:FACT (N &OPTIONAL (ACC 1))
(IF (ZEROP N)
ACC
(#:FACT (1- N) (* ACC N))))
; in: DEFUN #:FACT
; (#:FACT (1- N) (* ACC N))
;
; caught STYLE-WARNING:
; undefined function: #:FACT
;
; compilation unit finished
; Undefined function:
; #:FACT
; caught 1 STYLE-WARNING condition
I hope I get the issue right. For me it works in CLISP.
I tried it like this: using a macro for creating a function with a GENSYM-ed name.
(defmacro test ()
(let ((name (gensym)))
`(progn
(defun ,name (x) (* x x))
',name)))
Now I can get the name (setf x (test)) and call it (funcall x 2).
Yes, it is perfectly fine defining functions that have names that are unintenred symbols. The problem is that you cannot then call them "by name", since you can't fetch the uninterned symbol by name (that is what "uninterned" means, essentially).
You would need to store the uninterned symbol in some sort of data structure, to then be able to fetch the symbol. Alternatively, store the defined function in some sort of data structure.
Surprisingly, CLISP bug 180 isn't actually an ANSI CL conformance bug. Not only that, but evidently, ANSI Common Lisp is itself so broken in this regard that even the progn based workaround is a courtesy of the implementation.
Common Lisp is a language intended for compilation, and compilation produces issues regarding the identity of objects which are placed into compiled files and later loaded ("externalized" objects). ANSI Common Lisp requires that literal objects reproduced from compiled files are only similar to the original objects. (CLHS 3.2.4 Literal Objects in Compiled Files).
Firstly, according to the definition similarity (3.2.4.2.2 Definition of Similarity), the rules for uninterned symbols is that similarity is name based. If we compile code with a literal that contains an uninterned symbol, then when we load the compiled file, we get a symbol which is similar and not (necessarily) the same object: a symbol which has the same name.
What if the same uninterned symbol is inserted into two different top-level forms which are then compiled as a file? When the file is loaded, are those two similar to each other at least? No, there is no such requirement.
But it gets worse: there is also no requirement that two occurrences of the same uninterned symbol in the same form will be externalized in such a way that their relative identity is preserved: that the re-loaded version of that object will have the same symbol object in all the places where the original was. In fact, the definition of similarity contains no provision for preserving the circular structure and substructure sharing. If we have a literal like '#1=(a b . #1#), as a literal in a compiled file, there appears to be no requirement that this be reproduced as a circular object with the same graph structure as the original (a graph isomorphism). The similarity rule for conses is given as naive recursion: two conses are similar if their respective cars and cdrs are similar. (The rule can't even be evaluated for circular objects; it doesn't terminate).
That the above works is because of implementations going beyond what is required in the spec; they are providing an extension consistent with (3.2.4.3 Extensions to Similarity Rules).
Thus, purely according to ANSI CL, we cannot expect to use macros with gensyms in compiled files, at least in some ways. The expectation expressed in code like the following runs afoul of the spec:
(defmacro foo (arg)
(let ((g (gensym))
(literal '(blah ,g ,g ,arg)))
...))
(defun bar ()
(foo 42))
The bar function contains a literal with two insertions of a gensym, which according to the similarity rules for conses and symbols need not reproduce as a list containing two occurrences of the same object in the second and third positions.
If the above works as expected, it's due to "extensions to the similarity rules".
So the answer to the "Why can't CLISP ..." question is that although CLISP does provide an extension for similarity which preserves the graph structure of literal forms, it doesn't do it across the entire compiled file, only within individual top level items within that file. (It uses *print-circle* to emit the individual items.) The bug is that CLISP doesn't conform to the best possible behavior users can imagine, or at least to a better behavior exhibited by other implementations.

How could I implement the push macro?

Can someone help me understand how push can be implemented as a macro? The naive version below evaluates the place form twice, and does so before evaluating the element form:
(defmacro my-push (element place)
`(setf ,place (cons ,element ,place)))
But if I try to fix this as below then I'm setf-ing the wrong place:
(defmacro my-push (element place)
(let ((el-sym (gensym))
(place-sym (gensym)))
`(let ((,el-sym ,element)
(,place-sym ,place))
(setf ,place-sym (cons ,el-sym ,place-sym)))))
CL-USER> (defparameter *list* '(0 1 2 3))
*LIST*
CL-USER> (my-push 'hi *list*)
(HI 0 1 2 3)
CL-USER> *list*
(0 1 2 3)
How can I setf the correct place without evaluating twice?
Doing this right seems to be a little more complicated. For instance, the code for push in SBCL 1.0.58 is:
(defmacro-mundanely push (obj place &environment env)
#!+sb-doc
"Takes an object and a location holding a list. Conses the object onto
the list, returning the modified list. OBJ is evaluated before PLACE."
(multiple-value-bind (dummies vals newval setter getter)
(sb!xc:get-setf-expansion place env)
(let ((g (gensym)))
`(let* ((,g ,obj)
,#(mapcar #'list dummies vals)
(,(car newval) (cons ,g ,getter))
,#(cdr newval))
,setter))))
So reading the documentation on get-setf-expansion seems to be useful.
For the record, the generated code looks quite nice:
Pushing into a symbol:
(push 1 symbol)
expands into
(LET* ((#:G906 1) (#:NEW905 (CONS #:G906 SYMBOL)))
(SETQ SYMBOL #:NEW905))
Pushing into a SETF-able function (assuming symbol points to a list of lists):
(push 1 (first symbol))
expands into
(LET* ((#:G909 1)
(#:SYMBOL908 SYMBOL)
(#:NEW907 (CONS #:G909 (FIRST #:SYMBOL908))))
(SB-KERNEL:%RPLACA #:SYMBOL908 #:NEW907))
So unless you take some time to study setf, setf expansions and company, this looks rather arcane (it may still look so even after studying them). The 'Generalized Variables' chapter in OnLisp may be useful too.
Hint: if you compile your own SBCL (not that hard), pass the --fancy argument to make.sh. This way you'll be able to quickly see the definitions of functions/macros inside SBCL (for instance, with M-. inside Emacs+SLIME). Obviously, don't delete those sources (you can run clean.sh after install.sh, to save 90% of the space).
Taking a look at how the existing one (in SBCL, at least) does things, I see:
* (macroexpand-1 '(push 1 *foo*))
(LET* ((#:G823 1) (#:NEW822 (CONS #:G823 *FOO*)))
(SETQ *FOO* #:NEW822))
T
So, I imagine, mixing in a combination of your version and what this generates, one might do:
(defmacro my-push (element place)
(let ((el-sym (gensym))
(new-sym (gensym "NEW")))
`(let* ((,el-sym ,element)
(,new-sym (cons ,el-sym ,place)))
(setq ,place ,new-sym)))))
A few observations:
This seems to work with either setq or setf. Depending on what problem you're actually trying to solve (I presume re-writing push isn't the actual end goal), you may favor one or the other.
Note that place does still get evaluated twice... though it does at least do so only after evaluating element. Is the double evaluation something you actually need to avoid? (Given that the built-in push doesn't, I'm left wondering if/how you'd be able to... though I'm writing this up before spending terribly much time thinking about it.) Given that it's something that needs to evaluate as a "place", perhaps this is normal?
Using let* instead of let allows us to use ,el-sym in the setting of ,new-sym. This moves where the cons happens, such that it's evaluated in the first evaluation of ,place, and after the evaluation of ,element. Perhaps this gets you what you need, with respect to evaluation ordering?
I think the biggest problem with your second version is that your setf really does need to operate on the symbol passed in, not on a gensym symbol.
Hopefully this helps... (I'm still somewhat new to all this myself, so I'm making some guesses here.)