Let: creating a temporary variable in Common Lisp - lisp

Given a function:
(defun foo (bar)
(let ((baz bar))
(setf baz (+ baz 1)))
I have been given to understand (perhaps incorrectly?) that baz becomes some sort of reference to bar, instead of being a true copy of bar.
What I would like to do is create a true temporary variable so that I can ensure that I can muck about with the passed in variables all I want, without having any side effects.

I think I'd rather say "baz becomes a reference to the same thing that bar is a reference to". But you're right that let does not do any copying.
If you want to make a copy of bar, you certainly can, though how you do that depends on what bar is: a list, a vector, etc.
For the curious, Kent Pitman wrote a great article on the subject of "Why is there no generic COPY function?".

Typically, you can muck about with several references to a single "thing" without any issues, as long as you do not use any mutating functions (slot accessors, descructive list operations, that sort of thing).
In the case you've copied, you're using numbers and those are "always" safe (if you do arithmetic ona new copy, a new copy will be generated).
Unfortunately, there is no generic "Make a copy" function, partly because it's not obvious what "make a copy" means in all cases and partly because it's not necessarily trivial to copy circular data structures (something a generic copy function would have to be able to do).

Related

how can I compare syntax objects in racket?

I'd like to compare the code contents of two syntax objects and ignore things like contexts. Is converting them to datum the only way to do so? Like:
(equal? (syntax->datum #'(x+1)) (syntax->datum #'(x+1)))
If you want to compare both objects without deconstructing them at all, then yes.
HOWEVER, the problem with this method is that it only compares the datum attached to two syntax objects, and won't actually compare their binding information.
The analogy that I've heard (from Ryan Culpepper), is this is kind of like taking two paintings, draining of them of their color, and seeing if they are identical. While they might be similar in some ways, you will miss a lot of differences from the different colors.
A better approach (although it does require some work), is to use syntax-e to destruct the syntax object into more primitive lists of syntax objects, and do this until you get identifiers (basically a syntax object whose datum is a symbol), from there, you can generally use free-identifier=? (and sometimes bound-identifier=? to see if each identifier can bind each other, and identifier-binding to compare module level identifiers.
The reason why there isn't a single simple predicate to compare two arbitrary syntax objects is because, generally, there isn't really one good definition for what makes two pieces of code equal, even if you only care about syntactic equality. For example, using the functions referenced above doesn't track internal bindings in a syntax object, so you will still get a very strict definition of what it means to be 'equal'. that is, both syntax objects have the same structure with identifiers that are either bound to the same module, or are free-identifier=?.
As such, before you use this answer, I highly recommend you take a step back and make sure this is really what you want to do. Once in a blue moon it is, but most of the time you actually are trying to solve a similar, yet simpler, problem.
Here's a concrete example of one possible way you could do the "better approach" Leif Andersen mentioned.
I have used this in multiple places for testing purposes, though if anyone wanted to use it in non-test code, they would probably want to re-visit some of the design decisions.
However, things like the equal?/recur pattern used here should be helpful no matter how you decide to define what equality means.
Some of the decisions you might want to make different choices on:
On identifiers, do you want to check that the scopes are exactly the same (bound-identifier=?), or would you want to assume that they would be bound outside of the syntax object and check that they are bound to the same thing, even if they have different scopes (free-identifier=?)? Note that if you choose the first one, then checking the results of macro expansion will sometimes return #false because of scope differences, but if you choose the second one, then if any identifier is not bound outside of the syntax object, then it would be as if you only care about symbol=? equality on names, so it will return #true in some places where it shouldn't. I chose the first one bound-identifier=? here because for testing, a "false positive" where the test fails is better than a "false negative" where the tests succeeds in cases it shouldn't.
On source locations, do you want to check that they are equal, or do you want to ignore them? This code ignores them because it's only for testing purposes, but if you want equality only for things which have the same source location, you might want to check that using functions like build-source-location-list.
On syntax properties, do you want to check that they are equal, or do you want to ignore them? This code ignores them because it's only for testing purposes, but if you want to check that you might want to use functions like syntax-property-symbol-keys.
Finally here is the code. It may not be exactly what you want depending on how you answered the questions above. However, its structure and how it uses equal?/recur might be helpful to you.
(require rackunit)
;; Works on fully wrapped, non-wrapped, and partially
;; wrapped values, and it checks that the inputs
;; are wrapped in all the same places. It checks scopes,
;; but it does not check source location.
(define-binary-check (check-stx=? stx=? actual expected))
;; Stx Stx -> Bool
(define (stx=? a b)
(cond
[(and (identifier? a) (identifier? b))
(bound-identifier=? a b)]
[(and (syntax? a) (syntax? b))
(and (bound-identifier=? (datum->syntax a '||) (datum->syntax b '||))
(stx=? (syntax-e a) (syntax-e b)))]
[else
(equal?/recur a b stx=?)]))

Parameterizable return-from in Common Lisp

I'm learning blocks in Common lisp and did this example to see how blocks and the return-from command work:
(block b1
(print 1)
(print 2)
(print 3)
(block b2
(print 4)
(print 5)
(return-from b1)
(print 6)
)
(print 7))
It will print 1, 2, 3, 4, and 5, as expected. Changing the return-from to (return-from b2) it'll print 1, 2, 3, 4, 5, and 7, as one would expect.
Then I tried turn this into a function and paremetrize the label on the return-from:
(defun test-block (arg) (block b1
(print 1)
(print 2)
(print 3)
(block b2
(print 4)
(print 5)
(return-from (eval arg))
(print 6)
)
(print 7)))
and using (test-block 'b1) to see if it works, but it doesn't. Is there a way to do this without conditionals?
Using a conditional like CASE to select a block to return from
The recommended way to do it is using case or similar. Common Lisp does not support computed returns from blocks. It also does not support computed gos.
Using a case conditional expression:
(defun test-block (arg)
(block b1
(print 1)
(print 2)
(print 3)
(block b2
(print 4)
(print 5)
(case arg
(b1 (return-from b1))
(b2 (return-from b2)))
(print 6))
(print 7)))
One can't compute lexical go tags, return blocks or local functions from names
CLTL2 says about the restriction for the go construct:
Compatibility note: The ``computed go'' feature of MacLisp is not supported. The syntax of a computed go is idiosyncratic, and the feature is not supported by Lisp Machine Lisp, NIL (New Implementation of Lisp), or Interlisp. The computed go has been infrequently used in MacLisp anyway and is easily simulated with no loss of efficiency by using a case statement each of whose clauses performs a (non-computed) go.
Since features like go and return-from are lexically scoped constructs, computing the targets is not supported. Common Lisp has no way to access lexical environments at runtime and query those. This is for example also not supported for local functions. One can't take a name and ask for a function object with that name in some lexical environment.
Dynamic alternative: CATCH and THROW
The typically less efficient and dynamically scoped alternative is catch and throw. There the tags are computed.
I think these sorts of things boils down to the different types of namespaces bindings and environments in Common Lisp.
One first point is that a slightly more experienced novice learning Lisp might try to modify your attempted function to say (eval (list 'return-from ,arg)) instead. This seems to make more sense but still does not work.
Namespaces
A common beginner mistake in a language like scheme is having a variable called list as this shadows the top level definition of this as a function and stops the programmer from being able to make lists inside the scope for this binding. The corresponding mistake in Common Lisp is trying to use a symbol as a function when it is only bound as a variable.
In Common Lisp there are namespaces which are mappings from names to things. Some namespaces are:
The functions. To get the corresponding thing either call it: (foo a b c ...), or get the function for a static symbol (function foo) (aka #'foo) or for a dynamic symbol (fdefinition 'foo). Function names are either symbols or lists of setf and one symbol (e.g. (serf bar)). Symbols may alternatively be bound to macros in this namespace in which case function and fdefinition signal errors.
The variables. This maps symbols to the values in the corresponding variable. This also maps symbols to constants. Get the value of a variable by writing it down, foo or dynamically as (symbol-value). A symbol may also be bound as a symbol-macro in which case special macro expansion rules apply.
Go tags. This maps symbols to labels to which one can go (like goto in other languages).
Blocks. This maps symbols to places you can return from.
Catch tags. This maps objects to the places which catch them. When you throw to an object, the implementation effectively looks up the corresponding catch in this namespace and unwinds the stack to it.
classes (and structs, conditions). Every class has a name which is a symbol (so different packages may have a point class)
packages. Each package is named by a string and possibly some nicknames. This string is normally the name of a symbol and therefore usually in uppercase
types. Every type has a name which is a symbol. Naturally a class definition also defines a type.
declarations. Introduced with declare, declaim, proclaim
there might be more. These are all the ones I can think of.
The catch-tag and declarations namespaces aren’t like the others as they don’t really map symbols to things but they do have bindings and environments in the ways described below (note that I have used declarations to refer to the things that have been declared, like the optimisation policy or which variables are special, rather than the namespace in which e.g. optimize, special, and indeed declaration live which seems too small to include).
Now let’s talk about the different ways that this mapping may happen.
The binding of a name to a thing in a namespace is the way in which they are associated, in particular, how it may come to be and how it may be inspected.
The environment of a binding is the place where the binding lives. It says how long the binding lives for and where it may be accessed from. Environments are searched for to find the thing associated with some name in some namespace.
static and dynamic bindings
We say a binding is static if the name that is bound is fixed in the source code and a binding is dynamic if the name can be determined at run time. For example let, block and tags in a tagbody all introduce static bindings whereas catch and progv introduce dynamic bindings.
Note that my definition for dynamic binding is different from the one in the spec. The spec definition corresponds to my dynamic environment below.
Top level environment
This is the environment where names are searched for last and it is where toplevel definitions go to, for example defvar, defun, defclass operate at this level. This is where names are looked up last after all other applicable environments have been searched, e.g. if a function or variable binding can not be found at a closer level then this level is searched. References can sometimes be made to bindings at this level before they are defined, although they may signal warnings. That is, you may define a function bar which calls foo before you have defined foo. In other cases references are not allowed, for example you can’t try to intern or read a symbol foo::bar before the package FOO has been defined. Many namespaces only allow bindings in the top level environment. These are
constants (within the variables namespace)
classes
packages
types
Although (excepting proclaim) all bindings are static, they can effectively be made dynamic by calling eval which evaluates forms at the top level.
Functions (and [compiler] macros) and special variables (and symbol macros) may also be defined top level. Declarations can be defined toplevel either statically with the macro declaim or dynamically with the function proclaim.
Dynamic environment
A dynamic environment exists for a region of time during the programs execution. In particular, a dynamic environment begins when control flow enters some (specific type of) form and ends when control flow leaves it, either by returning normally or by some nonlocal transfer of control like a return-from or go. To look up a dynamically bound name in a namespace, the currently active dynamic environments are searched (effectively, ie a real system wouldn’t be implemented this way) from most recent to oldest for that name and the first binding wins.
Special variables and catch tags are bound in dynamic environments. Catch tags are bound dynamically using catch while special variables are bound statically using let and dynamically using progv. As we shall discuss later, let can make two different kinds of binding and it knows to treat a symbol as special if it has been defined with defvar or ‘defparameteror if it has been declared asspecial`.
Lexical environment
A lexical environment corresponds to a region of source code as it is written and a specific runtime instantiation of it. It (slightly loosely) begins at an opening parenthesis and ends at the corresponding closing parenthesis, and is instantiated when control flow hits the opening parenthesis. This description is a little complicated so let’s have an example with variables which are bound in a lexically environment (unless they are special. By convention the names special variables are wrapped in * symbols)
(defun foo ()
(let ((x 10))
(bar (lambda () x))))
(defun bar (f)
(let ((x 20))
(funcall f)))
Now what happens when we call (foo)? Well if x were bound in a dynamic environment (in foo and bar) then the anonymous function would be called in bar and the first dynamic environment with a binding for x would have it bound to 20.
But this call returns 10 because x is bound in a lexical environment so even though the anonymous function gets passed to bar, it remembers the lexical environment corresponding to the application of foo which created it and in that lexical environment, x is bound to 10. Let’s now have another example to show what I mean by ‘specific runtime instantiation’ above.
(defun baz (islast)
(let ((x (if islast 10 20)))
(let ((lx (lambda () x)))
(if islast
lx
(frob lx (baz t))))))
(defun frob (a b)
(list (funcall a) (funcall b)))
Now running (baz nil) will give us (20 10) because the first function passed to frob remembers the lexical environment for the outer call to baz (where islast is nil) whilst the second remembers the environment for the inner call.
For variables which are not special, let creates static lexical bindings. Block names (introduced statically by block), go tags (scopes inside a tagbody), functions (by felt or labels), macros (macrolet), and symbol macros (symbol-macrolet) are all bound statically in lexical environments. Bindings from a lambda list are also lexically bound. Declarations can be created lexically using (declare ...) in one of the allowed places or by using (locally (declare ...) ...) anywhere.
We note that all lexical bindings are static. The eval trick described above does not work because eval happens in the toplevel environment but references to lexical names happen in the lexical environment. This allows the compiler to optimise references to them to know exactly where they are without running code having to carry around a list of bindings or accessing global state (e.g. lexical variables can live in registers and the stack). It also allows the compiler to work out which bindings can escape or be captured in closures or not and optimise accordingly. The one exception is that the (symbol-)macro bindings can be dynamically inspected in a sense as all macros may take an &environment parameter which should be passed to macroexpand (and other expansion related functions) to allow the macroexpander to search the compile-time lexical environment for the macro definitions.
Another thing to note is that without lambda-expressions, lexical and dynamic environments would behave the same way. But note that if there were only a top level environment then recursion would not work as bindings would not be restored as control flow leaves their scope.
Closure
What happens to a lexical binding captured by an anonymous function when that function escapes the scope it was created in? Well there are two things that can happen
Trying to access the binding results in an error
The anonymous function keeps the lexical environment alive for as long as the functions referencing it are alive and they can read and write it as they please.
The second case is called a closure and happens for functions and variables. The first case happens for control flow related bindings because you can’t return from a form that has already returned. Neither happens for macro bindings as they cannot be accessed at run time.
Nonlocal control flow
In a language like Java, control (that is, program execution) flows from one statement to the next, branching for if and switch statements, looping for others with special statements like break and return for certain kinds of jumping. For functions control flow goes into the function until it eventually comes out again when the function returns. The one nonlocal way to transfer control is by using throw and try/catch where if you execute a throw then the stack is unwound piece by piece until a suitable catch is found.
In C there are is no throw or try/catch but there is goto. The structure of C programs is secretly flat with the nesting just specifying that “blocks” end in the opposite order to the order they start. What I mean by this is that it is perfectly legal to have a while loop in the middle of a switch with cases inside the loop and it is legal to goto the middle of a loop from outside of that loop. There is a way to do nonlocal control transfer in C: you use setjmp to save the current control state somewhere (with the return value indicating whether you have successfully saved the state or just nonlocally returned there) and longjmp to return control flow to a previously saved state. No real cleanup or freeing of memory happens as the stack unwinds and there needn’t be checks that you still have the function which called setjmp on the callstack so the whole thing can be quite dangerous.
In Common Lisp there’s a range of ways to do nonlocal control transfer but the rules are more strict. Lisp doesn’t really have statements but rather everything is built out of a tree of expressions and so the first rule is that you can’t nonlocally transfer control into a deeper expression, you may only transfer out. Let’s look at how these different methods of control transfer work.
block and return-from
You’ve already seen how these work inside a single function but recall that I said block names are lexically scoped. So how does this interact with anonymous functions?
Well suppose you want to search some big nested data structure for something. If you were writing this function in Java or C then you might implement a special search function to recurse through your data structure until it finds the right thing and then return it all the way up. If you were implementing it in Haskell then you would probably want to do it as some kind of fold and rely on lazy evaluation to not do too much work. In Common Lisp you might have a function which applies some other function passed as a parameter to each item in the data structure. And now you can call that with a searching function. How might you get the result out? Well just return-from to the outer block.
tagbody and go
A tagbody is like a progn but instead of evaluating single symbols in the body, they are called tags and any expression within the tagbody can go to them to transfer control to it. This is partly like goto, if you’re still in the same function but if your go expression happens inside some anonymous function then it’s like a safe lexically scoped longjmp.
catch and throw
These are most similar to the Java model. The key difference between block and catch is that block uses lexical scoping and catch uses dynamic scoping. Therefore their relationship is like that between special and regular variables.
Finally
In Java one can execute code to tidy things up if the stack has to unwind through it as an exception is thrown. This is done with try/finally. The Common Lisp equivalent is called unwind-protect which ensures a form is executed however control flow may leave it.
Errors
It’s perhaps worth looking a little at how errors work in Common Lisp. Which of these methods do they use?
Well it turns out that the answer is that errors instead of generally unwinding the stack start by calling functions. First they look up all the possible restarts (ways to deal with an error) and save them somewhere. Next they look up all applicable handlers (a list of handlers could, for example, be stored in a special variable as handlers have dynamic scope) and try each one at a time. A handler is just a function so it might return (ie not want to handle the error) or it might not return. A handler might not return if it invokes a restart. But restarts are just normal functions so why might these not return? Well restarts are created in a dynamic environment below the one where the error was raised and so they can transfer control straight out of the handler and the code that threw the error to some code to try to do something and then carry on. Restarts can transfer control using go or return-from. It is worth noting that it is important here that we have lexical scope. A recursive function could define a restart on each successive call and so it is necessary to have lexical scope for variables and tags/block names so that we can make sure we transfer control to the right level on the call stack with the right state.

Bizarre quoted list example from On Lisp

This passage from On Lisp is genuinely confusing -- it is not clear how returning a quoted list such as '(oh my) can actually alter how the function behaves in the future: won't the returned list be generated again in the function from scratch, the next time it is called?
If we define exclaim so that its return value
incorporates a quoted list,
(defun exclaim (expression)
(append expression ’(oh my)))
Then any later destructive modification of the return value
(exclaim ’(lions and tigers and bears))
-> (LIONS AND TIGERS AND BEARS OH MY)
(nconc * ’(goodness))
-> (LIONS AND TIGERS AND BEARS OH MY GOODNESS)
could alter the list within the function:
(exclaim ’(fixnums and bignums and floats))
-> (FIXNUMS AND BIGNUMS AND FLOATS OH MY GOODNESS)
To make exclaim proof against such problems, it should be written:
(defun exclaim (expression)
(append expression (list ’oh ’my)))
How exactly is that last call to exclaim adding the word goodness to the result? The function is not referencing any outside variable so how did the separate call to nconc actually alter how the exclaim function works?
a) the effects of modifying literal lists is undefined in the Common Lisp standard. What you here see as an example is one possible behavior.
(1 2 3 4) is a literal list. But a call to LIST like in (list 1 2 3 4) returns a freshly consed list at runtime.
b) the list is literal data in the code of the function. Every call will return exactly this data object. If you want to provide a fresh list on each call, then you need to use something like LIST or COPY-LIST.
c) Since the returned list is always the same literal data object, modifying it CAN have this effect as described. One could imagine also that an error happens if the code and its objects are allocated in a read-only memory. Modifying the list then would try to write to read-only memory.
d) One thing to keep in mind when working with literal list data in source code is this: the Lisp compiler is free to optimize the storage. If a list happens to be multiple times in the source code, a compiler is allowed to detect this and to only create ONE list. All the various places would then point to this one list. Thus modifying the list would have the effect, that these changes could be visible in several places.
This may also happen with other literal data objects like arrays/vectors.
If your data structure is a part of the code, you return this internal data structure, you modify this data structure - then you try to modify your code.
Note also that Lisp can be executed by an Interpreter. The interpreter typically works on the Lisp source structure - the code is not machine code, but interpreted Lisp code as Lisp data. Here you might be able to modify the source code at runtime, not only the data embedded in the source code.

How to delete variable/forms in Lisp?

In Python we have the del statement for deleting variables.
E.g:
a = 1
del a
What the equivalent of this in Lisp?
(setq foo 1)
;; (del foo) ?
In Common Lisp.
For symbols as variables:
CL-USER 7 > (setf foo 42)
42
CL-USER 8 > foo
42
CL-USER 9 > (makunbound 'foo)
FOO
CL-USER 10 > foo
Error: The variable FOO is unbound.
See:
MAKUNBOUND (defined)
SLOT-MAKUNBOUND (defined)
FMAKUNBOUND (defined)
Python names reside in namespaces, del removes a name from a namespace. Common Lisp has a different design, one much more sympathetic to compiling efficient code.
In common lisp we have two kinds of "variables." Lexical variables account for the majority of these. Lexical variables are analogous to a C local. At runtime a lexical variable usually is implemented as a bit of storage (on the stack say) and the association with it's name is retained only for debugging purposes. It doesn't really make any sense to talk about deleting lexical variables in the sense python uses because the closest analogy to python's namespace that exists for lexical variables is the lexical scope, and that purely an abstraction used by the spec and the compiler/evaluator.
The second variable kind of "variable" in CL are "global" symbols. Symbols are very rich data structures, much richer than a label in python. They can have lots of information associated with them, a value, a printed name, their "home" package, a function, and other arbitrary information stored in a list of properties. Most of these are optional. When you use a name in your source code, e.g. (+ X 3), that name X will usually denote a lexical symbol. But failing that the compiler/evaluator will assume you want the value of the "global" symbol. I.e. you effectively wrote (symbol-value 'X) rather than X. Because of typos, programming conventions, and other things some decades ago the compilers started complaining about references to "global" symbols in the absence of a declaration that signaled that the symbols was intended to be a "global." This declaration is known as "special." Yes it's a stupid bit of nomenclature. And worse, special variables aren't just global they also have a very useful feature known as dynamic binding - but that's another topic.
Symbols that are special are almost always declared using defvar, defparameter, or defconstant. There is a nearly mandatory coding convention that they are spelled uniquely, i.e. *X* rather than X. Some compilers, and most developers, will complain if you deviate from that convention.
Ok. So now we can get back to del. Special variables are denoted with a symbol; and this is analogous to how in python variables are denoted with a name. In python the names are looked up in the current namespace. In Common Lisp they are looked up in the current package. But when the lookup happens differs. In python it's done at runtime, since names can by dynamically added and removed as the program is running. In Common Lisp the names are looked up as the program is read from a file prior to compiling it. (There are exceptions but let's avoid thinking about those.)
You can remove a symbol from a package (see unintern). But this is an rare thing and is likely to just make your brain hurt. It is a simple operation but it get's confusing around the edges because the package system has a small dose of clever features which while very helpful take a bit of effort to become comfortable with. So, in a sense, the short answer to your question is that for global symbols the unintern is the analogous operation. But your probably doing something quite exceptional (and likely wrong) if your using that.
While what #Ben writes is true, my guess is that what you are looking for is makunbound, not unintern. The former does not remove the symbol from the obarray (for Emacs Lisp) or package (for Common Lisp). It just removes its symbol-value, that is, its value as a variable. If you want the behavior that trying to get the variable value results in a not-bound (aka void) error, then try makunbound.
I am not contradicting the previous answers but adding.
Consider that Lisp has garbage collection. As you know, you get a symbol defined in many ways: setf, defvar, defparameter, make-symbol, and a lot of other ways.
But how do you get a clean system? How do you make sure that you don't use a variable by mistake? For example, you defined abc, then later decided you don't want to use abc. Instead you want an abc-1 and an abc-2. And you want the Lisp system to signal error if you try to use abc. If you cannot somehow erase abc, then the system won't stop you by signalling "undefined" error.
The other answers basically tell you that Lisp does not provide a way to get rid of abc. You can use makunbound so that the symbol is fresh with no value assigned to it. And you can use unintern so that the symbol in not in any package. Yet boundp will tell you the symbol still exists.
So, I think that as long as no other symbol refers to abc, garbage collection will eventually get rid of abc. For example, (setf pt-to-abc abc). As long as pt-to-abc is still bound to the symbol abc, abc will continue to exist even with garbage collection.
The other way to really get rid of abc is to close Lisp and restart it. Doesn't seem so desirable. But I think that closing Lisp and starting fresh is what actually gets rid of all the symbols. Then you define the symbols you want.
You probably usually want makunbound, because then, the system will signal error if you try to add the value of abc to something else because abc won't have any value. And if you try to append abc to a string, the system will signal error because abc has no string. Etc.

Strange Lisp Quoting scenario - Graham's On Lisp, page 37

I'm working my way through Graham's book "On Lisp" and can't understand the following example at page 37:
If we define exclaim so that its return value
incorporates a quoted list,
(defun exclaim (expression)
(append expression ’(oh my)))
> (exclaim ’(lions and tigers and bears))
(LIONS AND TIGERS AND BEARS OH MY)
> (nconc * ’(goodness))
(LIONS AND TIGERS AND BEARS OH MY GOODNESS)
could alter the list within the function:
> (exclaim ’(fixnums and bignums and floats))
(FIXNUMS AND BIGNUMS AND FLOATS OH MY GOODNESS)
To make exclaim proof against such problems, it should be written:
(defun exclaim (expression)
(append expression (list ’oh ’my)))
Does anyone understand what's going on here? This is seriously screwing with my mental model of what quoting does.
nconc is a destructive operation that alters its first argument by changing its tail. In this case, it means that the constant list '(oh my) gets a new tail.
To hopefully make this clearer. It's a bit like this:
; Hidden variable inside exclaim
oh_my = oh → my → nil
(exclaim '(lions and tigers and bears)) =
lions → and → tigers → and → bears → oh_my
(nconc * '(goodness)) destructively appends goodness to the last result:
lions → and → tigers → and → bears → oh → my → goodness → nil
so now, oh_my = oh → my → goodness → nil
Replacing '(oh my) with (list 'oh 'my) fixes this because there is no longer a constant being shared by all and sundry. Each call to exclaim generates a new list (the list function's purpose in life is to create brand new lists).
The observation that your mental model of quoting may be flawed is an excellent one—although it may or may not apply depending on what that mental model is.
First, remember that there are various stages to program execution. A Lisp environment must first read the program text into data structures (lists, symbols, and various literal data such as strings and numbers). Next, it may or may not compile those data structures into machine code or some sort of intermediary format. Finally, the resulting code is evaluated (in the case of machine code, of course, this may simply mean jumping to the appropriate address).
Let's put the issue of compilation aside for now and focus on the reading and evaluation stages, assuming (for simplicity) that the evaluator's input is the list of data structures read by the reader.
Consider a form (QUOTE x) where x is some textual representation of an object. This may be symbol literal as in (QUOTE ABC), a list literal as in (QUOTE (A B C)), a string literal as in (QUOTE "abc"), or any other kind of literal. In the reading stage, the reader will read the form as a list (call it form1) whose first element is the symbol QUOTE and whose second element is the object x' whose textual representation is x. Note that I'm specifically saying that the object x' is stored within the list that represents the expression, i.e. in a sense, it's stored as a part of the code itself.
Now it's the evaluator's turn. The evaluator's input is form1, which is a list. So it looks at the first element of form1, and, having determined that it is the symbol QUOTE, it returns as the result of the evaluation the second element of the list. This is the crucial point. The evaluator returns the second element of the list to be evaluated, which is what the reader read in in the first execution stage (prior to compilation!). That's all it does. There's no magic to it, it's very simple, and significantly, no new objects are created and no existing ones are copied.
Therefore, whenever you modify a “quoted list”, you're modifying the code itself. Self-modifying code is a very confusing thing, and in this case, the behaviour is actually undefined (because ANSI Common Lisp permits implementations to put code in read-only memory).
Of course, the above is merely a mental model. Implementations are free to implement the model in various ways, and in fact, I know of no implementation of Common Lisp that, like my explanation, does no compilation at all. Still, this is the basic idea.
In Common Lisp.
Remember:
'(1 2 3 4)
Above is a literal list. Constant data.
(list 1 2 3 4)
LIST is a function that when call returns a fresh new list with its arguments as list elements.
Avoid modifying literal lists. The effects are not standardized. Imagine a Lisp that compiles all constant data into a read only memory area. Imagine a Lisp that takes constant lists and shares them across functions.
(defun a () '(1 2 3)
(defun b () '(1 2 3))
A Lisp compiler may create one list that is shared by both functions.
If you modify the list returned by function a
it might not be changed
it might be changed
it might be an error
it might also change the list returned by function b
Implementations have the freedom to do what they like. This leaves room for optimizations.