values function in Common Lisp

Is the values function in Common Lisp just syntactic sugar for packaging multiple values into a list that gets destructured by the caller? I am asking because I thought Common Lisp supports "true" multiple value return rather than returning a tuple or a list as in other languages, such as Python. Someone just told me that it is just syntactic sugar, so I would like someone to kindly explain it. To try to understand the type returned by the values function, I typed (type-of (values 1 2 3)), and the output was BIT. I searched the Common Lisp reference for that and could not find it mentioned in the datatypes section. Also, can anyone share some resources that explain how the values function is implemented in Common Lisp? Thank you.

Multiple Values in CL
The language Common Lisp is described in the ANSI standard INCITS 226-1994 (R2004) and has many implementations. Each can implement multiple values as it sees fit, and they are allowed, of course, to cons up a list for them (in fact, the Emacs Lisp compatibility layer for CL does just that - but it is, emphatically and intentionally, not a Common Lisp implementation).
Purpose
However, the intent of this facility is to enable passing (at least some) multiple values without consing (i.e., without allocating heap memory), and all CL implementations I know of do that. In this sense the multiple values facility is an optimization.
Of course, the implementation of this feature can be very different for different platforms and scenarios. E.g., the first few (say, 20 - the minimum the standard requires implementations to support) might be stored in a static or thread-local vector, the next several (1000?) allocated on the stack, and the rest (if needed) allocated on the heap as a vector or list.
Usage
E.g., the function floor returns two values. If you write
(setq a (floor 10 3))
you capture only the first value and discard the second one; to capture both values you need to write
(setf (values q r) (floor 10 3))
This is similar to what other languages might express as
q, r = floor(10, 3)
using tuples, except that CL does not allocate memory to pass (just a few) multiple values, and the other languages often do.
IOW, one can think of multiple values as an ephemeral struct.
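Multiple values can also flow directly into another call, again without building any intermediate structure; at a hypothetical REPL:
(multiple-value-call #'+ (floor 10 3) (floor 20 7))  ; => 12, i.e. 3 + 1 + 2 + 6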
Note that CL can convert multiple values to lists:
(destructuring-bind (q r) (multiple-value-list (floor 10 3))
  ;; use q & r here
  ...)
instead of the more efficient and concise
(multiple-value-bind (q r) (floor 10 3)
  ;; use q & r here
  ...)
MV & type
CL does not have a special type for the "multiple value object"
exactly because it does not allocate a separate object to pass
around multiple values. In this sense one can, indeed, claim that
values is syntactic sugar.
However, in CL one can declare a
function type returning
multiple values:
(declaim (ftype (function (real &optional real) (values real real)) floor))
This means that floor returns two values, both
reals (as opposed to returning
a value of type (values real real)), i.e., in this case one might
claim abuse of notation.
Your case
In your specific case, type-of
is an ordinary function (i.e., not a macro or special operator).
You pass it a single object, 1, because, unless you are using
multiple-value-bind and
friends, only the first value is used, so
(type-of (values 1 2 3))
is identical to
(type-of 1)
and the type of 1 is BIT.
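To inspect the other values you must capture them first, e.g. with multiple-value-bind (the exact type names reported by type-of are implementation-dependent; the questioner's implementation says BIT for 1):
(multiple-value-bind (a b c) (values 1 2 3)
  (list (type-of a) (type-of b) (type-of c)))
;; => a list of three type specifiers; the first is BIT in the questioner's implementation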
PS: Control return values
One use of values is to
control the return values of a function.
Normally a CL function's return values are those of the last form.
Sometimes this is not desirable, e.g., the last form returns multiple values and you want your function to return one value (or none, like void in C):
(defun 2values (x y)
  (floor y x))

(defun 1value (x y)
  (values (floor y x)))

(defun no-values (x)
  (print x)
  (values))
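At a hypothetical REPL, the difference shows up in how many values each call returns:
(2values 3 10)   ; => 3, 1 (two values)
(1value 3 10)    ; => 3 (the second value of floor is dropped)
(no-values 5)    ; prints 5 and returns no values at all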

The values function isn't just syntactic sugar for making a list for the caller to destructure.
For one, if the caller expects only a single value, it will get only one value (the first), not a list, from a form that returns multiple values. Since type-of takes only one value as an argument, it is giving you the type of the first value, 1. 1 is of type BIT.
Each Common Lisp implementation is free to pursue its own strategy for implementing multiple values. I learned a lot from what Frode Fjeld wrote about how his implementation, Movitz, handles it in The Movitz development platform, section 2.5.

If you were writing a CL implementation you could implement multiple values with lists, as long as the result conforms to the spec. You would need to handle the single-value case specially, and you would need some way to tag the cases of zero and of two or more values; the other functions would need to understand that format, and print could be made to display it the same way as in other implementations.
Most likely values and its sister functions are an optimization where the implementation uses the stack instead of consing the values into a list structure just to have it destructured at the next level. In the old days, when RAM and CPU cycles were not to be wasted, this was very important, but I doubt you will notice real trouble if you use destructuring-bind instead of multiple-value-bind today.
Common Lisp differs from Scheme in a positive way here: you can make a function like floor, whose calculation yields the remainder in addition to the quotient, return all of those values at once, and yet callers are still allowed to use it as if it returned only the first value. I really miss that sometimes when writing Scheme, since Scheme demands call-with-values (similar to multiple-value-call) or syntactic sugar like let-values to handle all the returned values, which tends to make you write three versions of a function in case a caller needs only one of the values.

Related

Understanding `define-symbolic`

I'm reading the Rosette Essentials.
The guide explains the use of define-symbolic:
The define-symbolic form binds the variable to the same (unique) [symbolic] constant every time it is evaluated.
(define (static)
  (define-symbolic x boolean?) ; Creates the same constant when evaluated.
  x)

(define (dynamic)
  (define-symbolic* y integer?) ; Creates a fresh constant when evaluated.
  y)

> (eq? (static) (static))
#t
> (eq? (dynamic) (dynamic))
(= y$1 y$2)
Printed constant names, such as x or b, are just comments. Two [symbolic] constants created by evaluating two distinct define-symbolic (or, define-symbolic*) forms are distinct, even if they have the same printed name. They may still represent the same concrete value, but that is determined by the solver:
(define (yet-another-x)
  (define-symbolic x boolean?)
  x)

> (eq? (static) (yet-another-x))
(<=> x x)
I don't really understand the explanation of define-symbolic:
Why does (eq? (static) (static)) return #t, but (eq? (static) (yet-another-x)) return (<=> x x)?
Since define-symbolic binds the variable to the same (unique) constant every time it is evaluated, why does (eq? (static) (yet-another-x)) not return #t, just like (eq? (static) (static))?
Does it mean that two symbolic variables that occur in different scopes with the same name are still different, even if we use define-symbolic?
What's the difference between <=> and =? E.g. (<=> x x) vs (= y$1 y$2).
Thanks.
Here's an attempt to explain what the docs are saying:
It matters which use of define-symbolic two different symbolic constants come from. So the two uses inside static and yet-another-x are different uses and thus different symbolic constants.
It doesn't matter how many times you evaluate the same use of define-symbolic, it always produces the same symbolic constant. You can think of it as being hoisted out to the top-level of your program.
For define-symbolic*, it does matter when you evaluate things, so (eq? (dynamic) (dynamic)) produces a constraint that might or might not be true, depending on what the solver says and what other constraints you add.
= only works on numbers. <=> only works on booleans. In general eq? probably isn't what you want for these comparisons.
As you notice, there's something going on here with regard to scope, which is left entirely out of the docs, unspecified, not mentioned at all. Very unfortunate; extremely vague. A few more remarks could probably go a long way toward clarifying the situation; as it is, we're left to guess.
My guess is that the form (define-symbolic x boolean?) knows something about where it appears, and acts accordingly each time it (the form) is evaluated. In
(define (static)
  (define-symbolic x boolean?) ; Creates the same constant when evaluated.
  x)
what is the scope of x? The only possible reading seems to me to be, that the call (define-symbolic x boolean?) creates a local binding inside static and sets it to the "symbolic constant" it (i.e. the call) creates. In normal Scheme these would still be different entities on each different invocation of static, of course. Rosette does its own thing here, evidently.
All we have is the docs (sans the sources of course), so we might as well just go by them.
As to what "symbolic constants" are, they seem closely related to Prolog's logical variables, but such that they also know something about their own type (and probably more "constraints" or what have you).
So it looks like (static) becomes kind of a generator of a logical variable. It somehow knows about its internal binding, knows whether it is called for the very first time or not, and acts accordingly.
In short, just read it slowly and take it on faith. Nothing else to do, unless you're willing to study the sources.
Regarding (<=> x x) and (= y$1 y$2), these look like constraints. The guide calls them "symbolic expressions". The <=> is described there as "the logical bi-implication of two boolean values", and the = means equality. x, x, y$1 and y$2 are printed representations of the four different symbolic constants.
Two different symbolic constants can be known (required) to be (eventually) equal, as with the second constraint, or logically equivalent, as with the first.
While they (the constants) don't yet have any concrete value associated with them the constraint remains symbolic; but it can be actually checked as soon as the symbolic constant/logical variable gets its concrete value (or possibly gets more constraints associated with it, so those constraints could in theory be checked for consistency).

What's the difference between a cons cell and a 2-vector?

In Lisps that have vectors, why are cons cells still necessary? As I understand it, a cons cell is:
A structure with exactly 2 elements
Ordered
Access is O(1)
All these also apply to a 2-vector, though. So what's the difference? Are cons cells just a vestige from before Lisps had vectors? Or are there other differences I'm unaware of?
Although, physically, conses resemble any other two-element aggregate structure, they are not simply an obsolete form of a 2-vector.
Firstly, all types in Lisp are partitioned into cons and atom. Only conses are of type cons; everything else is an atom. A vector is an atom!
Conses form the representational basis for nested lists, which of course are used to write code. They have a special printed notation, such that the object produced by (cons 1 (cons 2 nil)) conveniently prints as (1 2) and the object produced by (cons 1 (cons 2 3)) prints as (1 2 . 3).
The cons versus atom distinction is important in the syntax, because an expression which satisfies the consp test is treated as a compound form. Whereas atoms that are not keyword symbols, t or nil evaluate to themselves.
To get the list itself instead of the value of the compound form, we use quote, for which we have a nice shorthand.
It's useful to have a vector type which is free from being entangled into the evaluation semantics this way: whose instances are just self-evaluating atoms.
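A small illustration of that difference in evaluation semantics, at a hypothetical REPL:
(eval '(+ 1 2))   ; => 3         -- the cons (+ 1 2) is treated as a compound form (a call)
(eval #(+ 1 2))   ; => #(+ 1 2)  -- the vector is an atom and simply evaluates to itself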
Cons cells are not a vestige from before Lisps had vectors. Firstly, there was almost no such time: the Lisp 1 manual from 1960 already describes arrays. Secondly, new dialects since then still have conses.
Objects that have a similar representation are not simply redundant for each other. Type distinctions are important. For instance, we would not consider the following two to be redundant for each other just because they both have three slots:
(defstruct name first initial last)
(defstruct bank-transaction account type amount)
In the TXR Lisp dialect, I once had it so that the syntactic sugar a..b denoted (cons a b) for ranges. But this meant that ranges were consp, which was silly due to the ambiguity against lists. I eventually changed it so that a..b denotes (rcons a b): a form which constructs a range object, which prints as #R(x y). (and can be specified that way as a literal). This creates a useful nuance because we can distinguish whether a function argument is a range (rangep) or a list (consp). Just like we care whether some object is a bank-transaction or name. Range objects are represented exactly like conses and allocated from the same heaps; just they have a different type which makes them atoms. If evaluated as forms, they evaluate to themselves.
Basically, we must regard type as if it were an extra slot. A two-element vector really has (at least) three properties, not just two: it has a first element, a second element and a type. The vector #(1 1) differs from the cons cell (1 . 1) in that they both have this third aspect, type, which is not the same.
The immutable properties of an object which it shares with all other objects of its kind can still be regarded as "slots". Effectively, all objects have a "type slot". So conses are actually three-property objects having a car, cdr and type:
(car '(a . b)) -> A
(cdr '(a . b)) -> B
(type-of '(a . b)) -> CONS
Here is a fourth "slot":
(class-of '(a . b)) -> #<BUILT-IN-CLASS CONS>
We can't look at objects in terms of just their per-instance storage vector allocated on the heap.
By the way, the 1960's MacLisp dialect extended the concept of a cons into fixed-size aggregate objects that had more named fields (in addition to car and cdr): the cxr-s. These objects were called "hunks" and are documented in Kent Pitman's MacLisp manual. Hunks do not satisfy the predicate consp, but hunkp; i.e., they are considered atoms. However, they extend the cons notation with multiple dots.
In a typical Common Lisp implementation, a cons cell will be represented as "two machine words" (one for the car pointer, one for the cdr pointer; the fact that it's a cons cell is encoded in the pointer constructed to reference it). However, arrays are more complicated objects, and unless you have a dedicated "two-element-only vector of type T", you'd end up with an array header, containing type information and size, in addition to the storage needed for the elements (probably hard to squeeze into less than "four machine words").
So while it would be eminently possible to use two-element vectors/arrays as cons cells, there's some efficiency to be had by using a dedicated type, based on the fact that cons cells and lists are so frequently used in existing Lisp code.
I think that they are different data structures; for example, Java has Vector and List classes. One is suitable for random access and lists are more suitable for sequential access, so in any language vectors and lists can coexist.
As for implementing a Lisp using your approach, I believe that it is possible; it depends on your implementation details. But in ANSI Common Lisp there is a convention, because there is no primitive list datatype distinct from cons:
CL-USER> (type-of (list 1 2 3))
CONS
This is a CONS, and the convention says something like this (from the Common Lisp HyperSpec):
list n.
1. a chain of conses in which the car of each cons is an element of the list, and the cdr of each cons is either the next link in the
chain or a terminating atom. See also proper list, dotted list, or
circular list.
2. the type that is the union of null and cons.
So if you create a Lisp using vectors instead of conses, it will not be ANSI CL.
You can create lists by "consing" things; nil is a list, and there are different kinds of list that you can create by consing.
Normally you create a proper list:
(list 1 2 3) = (cons 1 (cons 2 (cons 3 nil))) = '(1 2 3)
When the list does not end with nil it is a dotted list, and a circular list has a reference to itself.
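A small illustration of those cases at a hypothetical REPL:
(cons 1 2)           ; => (1 . 2), a dotted pair
(listp (cons 1 2))   ; => T   (any cons is of type LIST)
(listp nil)          ; => T   (NIL is the empty list)
(listp #(1 2))       ; => NIL (a vector is not a list)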
So, for example, if we create a string, Common Lisp implements it as a simple-array, which is faster for random access than a list:
CL-USER> (type-of "this is a string")
(SIMPLE-ARRAY CHARACTER (16))
Land of Lisp (a great book about Common Lisp) describes conses as the glue for building Common Lisp data structures and for processing lists, so of course if you replace cons with something else that behaves similarly, you will build something similar to Common Lisp.
Finally, lists, vectors and strings are all part of the Common Lisp sequence type hierarchy; the complete tree of sequence types can be found in the HyperSpec.
Are cons cells just a vestige from before Lisps had vectors?
Exactly. cons, car, and cdr were the constructor and accessors of the only compound data structure in the first Lisp. To differentiate it from the only atomic type, symbols, one had atom, which was T for symbols but false for conses. This was extended to other types in Lisp 1.5, including vectors (called arrays; see page 35). Common Lisp was a combination of commercial Lisps that all built upon Lisp 1.5. Perhaps things would have been different if both types had been there from the beginning.
If you were to make a Common Lisp implementation, you don't need two different underlying representations, as long as your implementation behaves according to the spec. If I remember correctly, Racket actually implements struct with vectors, and vector? is made to answer #f for the vectors that really represent structs. In CL you could implement defstruct the same way, implement cons as such a struct, and provide the functions it needs to be compatible with the HyperSpec. You might be using vectors when you create conses in your favorite implementation without even knowing it.
Historically you still have the old functions, so that John McCarthy's code still works even 58 years after the first Lisp. It didn't need to be that way, but it doesn't hurt to have a little legacy in a language that had features modern languages are only getting today.
If you used two-element vectors you would store their size (and type) in every node of the list.
This is ridiculously wasteful.
You can get around this wastefulness by introducing a special 2-element vector type whose elements can be anything.
Or in other words: by re-introducing the cons cell.
On one hand this is an "implementation detail": given vectors, one can implement cons cells (and thus linked lists) using vectors of length 2.
On the other hand this is a fairly important detail: the ANSI Common Lisp standard specifies that the types vector and cons are disjoint, so, in fact, you cannot use the trick to implement an ANSI CL.
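That disjointness is easy to check at a hypothetical REPL:
(typep (cons 1 2) 'vector)   ; => NIL
(typep (vector 1 2) 'cons)   ; => NIL
(subtypep 'cons 'vector)     ; => NIL, T (definitely not a subtype)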

Evaluating arguments of Clojure tagged literals

I have been trying to use Clojure tagged literals, and noticed that the reader does not evaluate the arguments, very much like a macro. This makes sense, but what is the appropriate solution for doing this? Explicit eval?
Example: given this function
(defn my-data
  ([[arg]]
   (prn (symbol? arg))
   :ok))
and this definition in data_readers.clj:
{myns/my-data my.ns/my-data}
The following behave differently:
> (let [x 1] (my.ns/my-data [x]))
false
:ok
So the x passed in is evaluated before being passed into my-data. On the other hand:
> (let [x 1] #myns/my-data [x])
true
:ok
So if I want to use the value of x inside my-data, the my-data function needs to do something about it, namely check that x is a symbol, and if so, use (eval x). That seems ugly. Is there a better approach?
Summary
There is no way to get at the value of the local x in your example, primarily because locals only get assigned values at runtime, whereas tagged literals are handled at read time. (There's also compile time in between; it is impossible to get at locals' values at compile time; therefore macros cannot get at locals' values either.)1
The better approach is to use a regular function at runtime, since after all you want to construct a value based on the runtime values of some parameters. Tagged literals really are literals and should be used as such.
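A minimal sketch of that approach, using a hypothetical make-my-data constructor function in place of the tagged literal:
(defn make-my-data [[arg]]
  (prn (symbol? arg))
  :ok)

(let [x 1] (make-my-data [x]))
;; prints false and returns :ok -- arg is the runtime value 1, not the symbol x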
Extended discussion
To illustrate the issue described above:
(binding [*data-readers* {'bar (fn [_] (java.util.Date.))}]
  (eval (read-string "(defn foo [] #bar x)")))
foo will always return the same value, because the reader has only one opportunity to return a value for #bar x which is then baked into foo's bytecode.
Notice also that instead of passing it to eval directly, we could store the data structure returned by the call to read-string and compile it at an arbitrary point in the future; the value returned by foo would remain the same. Clearly there's no way for such a literal value to depend on the future values of any locals; in fact, during the reader's operation, it is not even clear which symbols will come to name locals -- that's for the compiler to determine and the result of that determination may be non-obvious in cases where macros are involved.
Of course the reader is free to return a form which looks like a function call, the name of a local etc. To show one example:
(binding [*data-readers* {'bar (fn [sym] (list sym 1 2 3 4 5))}]
  (eval (read-string "#bar *")))
;= 120
;; substituting + for * in the string yields a value of 15
Here #bar f becomes equivalent to (f 1 2 3 4 5). Needless to say, this is an abuse of notation and doesn't really do what you asked for.
1 It's worth pointing out that eval has no access to locals (it always operates in the global scope), but the issue with locals not having values assigned before runtime is more fundamental.
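A quick illustration of that footnote (a sketch; the exact error text depends on the Clojure version):
(let [x 1] (eval 'x))
;; throws -- eval resolves x in the global scope, so the let-bound local is invisible to it
(def x 42)
(let [x 1] (eval 'x))
;; => 42, the global var's value, not the local's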

Strange Lisp Quoting scenario - Graham's On Lisp, page 37

I'm working my way through Graham's book "On Lisp" and can't understand the following example at page 37:
If we define exclaim so that its return value
incorporates a quoted list,
(defun exclaim (expression)
  (append expression '(oh my)))

> (exclaim '(lions and tigers and bears))
(LIONS AND TIGERS AND BEARS OH MY)
> (nconc * '(goodness))
(LIONS AND TIGERS AND BEARS OH MY GOODNESS)
could alter the list within the function:
> (exclaim '(fixnums and bignums and floats))
(FIXNUMS AND BIGNUMS AND FLOATS OH MY GOODNESS)
To make exclaim proof against such problems, it should be written:
(defun exclaim (expression)
  (append expression (list 'oh 'my)))
Does anyone understand what's going on here? This is seriously screwing with my mental model of what quoting does.
nconc is a destructive operation that alters its first argument by changing its tail. In this case, it means that the constant list '(oh my) gets a new tail.
To hopefully make this clearer. It's a bit like this:
; Hidden variable inside exclaim
oh_my = oh → my → nil
(exclaim '(lions and tigers and bears)) =
lions → and → tigers → and → bears → oh_my
(nconc * '(goodness)) destructively appends goodness to the last result:
lions → and → tigers → and → bears → oh → my → goodness → nil
so now, oh_my = oh → my → goodness → nil
Replacing '(oh my) with (list 'oh 'my) fixes this because there is no longer a constant being shared by all and sundry. Each call to exclaim generates a new list (the list function's purpose in life is to create brand new lists).
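With the fixed definition (the one using (list 'oh 'my)), each call builds a fresh tail, so a later nconc cannot leak into other results; at a hypothetical REPL:
> (nconc (exclaim '(lions)) '(goodness))
(LIONS OH MY GOODNESS)
> (exclaim '(tigers))
(TIGERS OH MY)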
The observation that your mental model of quoting may be flawed is an excellent one—although it may or may not apply depending on what that mental model is.
First, remember that there are various stages to program execution. A Lisp environment must first read the program text into data structures (lists, symbols, and various literal data such as strings and numbers). Next, it may or may not compile those data structures into machine code or some sort of intermediary format. Finally, the resulting code is evaluated (in the case of machine code, of course, this may simply mean jumping to the appropriate address).
Let's put the issue of compilation aside for now and focus on the reading and evaluation stages, assuming (for simplicity) that the evaluator's input is the list of data structures read by the reader.
Consider a form (QUOTE x) where x is some textual representation of an object. This may be symbol literal as in (QUOTE ABC), a list literal as in (QUOTE (A B C)), a string literal as in (QUOTE "abc"), or any other kind of literal. In the reading stage, the reader will read the form as a list (call it form1) whose first element is the symbol QUOTE and whose second element is the object x' whose textual representation is x. Note that I'm specifically saying that the object x' is stored within the list that represents the expression, i.e. in a sense, it's stored as a part of the code itself.
Now it's the evaluator's turn. The evaluator's input is form1, which is a list. So it looks at the first element of form1, and, having determined that it is the symbol QUOTE, it returns as the result of the evaluation the second element of the list. This is the crucial point. The evaluator returns the second element of the list to be evaluated, which is what the reader read in in the first execution stage (prior to compilation!). That's all it does. There's no magic to it, it's very simple, and significantly, no new objects are created and no existing ones are copied.
Therefore, whenever you modify a “quoted list”, you're modifying the code itself. Self-modifying code is a very confusing thing, and in this case, the behaviour is actually undefined (because ANSI Common Lisp permits implementations to put code in read-only memory).
Of course, the above is merely a mental model. Implementations are free to implement the model in various ways, and in fact, I know of no implementation of Common Lisp that, like my explanation, does no compilation at all. Still, this is the basic idea.
In Common Lisp.
Remember:
'(1 2 3 4)
Above is a literal list. Constant data.
(list 1 2 3 4)
LIST is a function that, when called, returns a fresh new list with its arguments as list elements.
Avoid modifying literal lists. The effects are not standardized. Imagine a Lisp that compiles all constant data into a read only memory area. Imagine a Lisp that takes constant lists and shares them across functions.
(defun a () '(1 2 3))
(defun b () '(1 2 3))
A Lisp compiler may create one list that is shared by both functions.
If you modify the list returned by function a
it might not be changed
it might be changed
it might be an error
it might also change the list returned by function b
Implementations have the freedom to do what they like. This leaves room for optimizations.
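If you do need a list you can safely mutate, build a fresh one on every call; a minimal sketch (fresh-list and also-fresh are made-up names):
(defun fresh-list ()
  (copy-list '(1 2 3)))  ; copies the literal, so the result may be modified

(defun also-fresh ()
  (list 1 2 3))          ; LIST always returns a brand new list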

Apply-recur macro in Clojure

I'm not very familiar with Clojure/Lisp macros. I would like to write an apply-recur macro which would have the same meaning as (apply recur ...).
I guess there is no real need for such macro but I think it's a good exercise. So I'm asking for your solution.
Well, there really is no need for that, if only because recur cannot take varargs (a recur to the top of the function takes a single final seqable argument grouping all arguments past the last required argument). This doesn't affect the validity of the exercise, of course.
However, there is a problem in that a "proper" apply-recur should presumably handle argument seqs returned by arbitrary expressions and not only literals:
;; this should work...
(apply-recur [1 2 3])
;; ...and this should have the same effect...
(apply-recur (vector 1 2 3))
;; ...as should this, if (foo) returns [1 2 3]
(apply-recur (foo))
However, the value of an arbitrary expression such as (foo) is simply not available, in general, at macro expansion time. (Perhaps (vector 1 2 3) might be assumed to always yield the same value, but foo might mean different things at different times (one reason eval wouldn't work), be a let-bound local rather than a Var (another reason eval wouldn't work) etc.)
Thus to write a fully general apply-recur, we would need to be able to determine how many arguments a regular recur form would expect and have (apply-recur some-expression) expand to something like
(let [seval# some-expression]
  (recur (nth seval# 0)
         (nth seval# 1)
         ...
         (nth seval# n-1))) ; n-1 being the number of the final parameter
(The final nth might need to be nthnext if we're dealing with varargs, which presents a problem similar to what is described in the next paragraph. Also, it would be a good idea to add an assertion to check the length of the seqable returned by some-expression.)
I am not aware of any method to determine the proper arity of a recur at a particular spot in the code at macro-expansion time. That does not mean one isn't available -- that's something the compiler needs to know anyway, so perhaps there is a way to extract that information from its internals. Even so, any method for doing that would almost certainly need to rely on implementation details which might change in the future.
Thus the conclusion is this: even if it is at all possible to write such a macro (which might not even be the case), it is likely that any implementation would be very fragile.
As a final remark, writing an apply-recur which would only be capable of dealing with literals (actually the general structure of the arg seq would need to be given as a literal; the arguments themselves -- not necessarily, so this could work: (apply-recur [foo bar baz]) => (recur foo bar baz)) would be fairly simple. I'm not spoiling the exercise by giving away the solution, but, as a hint, consider using ~@.
apply is a function that takes another function as an argument. recur is a special form, not a function, so it cannot be passed to apply.
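A quick way to see that (the exact compiler message varies between Clojure versions):
(defn f [x] (apply recur [x]))
;; fails to compile -- recur is a special form, so it cannot be referenced as a value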