Good style in Lisp: cons vs list

Is it good style to use cons for pairs of things, or would it be preferable to stick to lists?
For instance, questions and answers:
(list (cons "Favorite color?" "red")
      (cons "Favorite number?" "123")
      (cons "Favorite fruit?" "avocado"))
I mean, some things come naturally in pairs; there is no need for something that can hold more than two, so I feel like cons would be the natural choice. However, I also feel like I should be sticking to one thing (lists).
What would be the better or more accepted style?

What you have there is an association list (alist). Alist entries are, indeed, often simple conses rather than lists (though that is a matter of preference: some people use lists for alist entries too), so what you have is fine. That said, I usually prefer to use literal syntax:
'(("Favorite color?" . "red")
  ("Favorite number?" . "123")
  ("Favorite fruit?" . "avocado"))
Alists often use a symbol as the key, because symbols are interned, so symbol alists can be looked up using assq instead of assoc (in Scheme and Emacs Lisp, that is; in Common Lisp, assoc already compares keys with eql by default). Here's how it might look:
'((color . "red")
  (number . "123")
  (fruit . "avocado"))
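A minimal Common Lisp sketch of looking things up in the two alists above. The variable and function names are hypothetical. In Common Lisp, assoc compares keys with eql by default, so the string-keyed alist needs an explicit :test, while the symbol-keyed one works as-is.

```lisp
;; Question/answer alist with string keys. ASSOC uses EQL by default,
;; and two strings with the same characters need not be EQL, so pass :TEST.
(defparameter *answers-by-question*
  '(("Favorite color?" . "red")
    ("Favorite number?" . "123")
    ("Favorite fruit?" . "avocado")))

;; The same data keyed by symbols: the default EQL test is enough.
(defparameter *answers-by-symbol*
  '((color . "red")
    (number . "123")
    (fruit . "avocado")))

(defun answer-for-question (question)
  "Return the answer string for QUESTION, or NIL if it is absent."
  (cdr (assoc question *answers-by-question* :test #'string=)))

(defun answer-for-key (key)
  "Return the answer string stored under the symbol KEY, or NIL."
  (cdr (assoc key *answers-by-symbol*)))
```

For example, (answer-for-question "Favorite fruit?") yields "avocado", and (answer-for-key 'color) yields "red".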

The default data structure for such a case should be a HASH-TABLE.
An association list of cons pairs is also a valid option and was widely used historically; tradition and simplicity are in its favor. But you should not use it when the number of pairs exceeds a handful (about 10 is probably a reasonable threshold), because lookup time in an alist is linear, while in a hash table it is (on average) constant.
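A minimal sketch, assuming string keys as in the question, of the same answers stored in a hash table. Because the keys are strings, the table is created with an equal test; make-answer-table and *answers* are hypothetical names.

```lisp
(defun make-answer-table (pairs)
  "Build an EQUAL hash table from an alist of (question . answer) conses."
  ;; EQUAL compares strings by content; the default EQL test would not.
  (let ((table (make-hash-table :test #'equal)))
    (loop for (question . answer) in pairs
          do (setf (gethash question table) answer))
    table))

(defparameter *answers*
  (make-answer-table '(("Favorite color?" . "red")
                       ("Favorite number?" . "123")
                       ("Favorite fruit?" . "avocado"))))
```

Lookups are then (gethash "Favorite color?" *answers*), which returns "red" as its primary value.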
Using a list for this task is also possible, but will be both ugly and inefficient.

You need to decide based on the circumstances; there isn't a universal answer, because different tasks suit different structures. Consider the following:
Looking up keys is faster in a hash table than in an alist.
It is easier to have an iterator and save its state when working with an alist: a hash table would need to export all of its keys as an array or a list and keep a pointer into that list, while with an alist it is enough to remember a pointer into the alist itself to restore the iterator's state and continue the iteration.
Alist vs. flat list: an alist of n pairs and a flat list of 2n alternating keys and values use the same number of conses (2n), assuming the keys and values are atoms. With a flat list, however, you must make sure it never has an odd number of elements (and you may discover that it does too late), which is bad.
On the other hand, many more functions, including built-in ones, work on proper lists than on the dotted pairs that make up alist entries. For example, (nth 1 '(a . 1)) signals an error, because it reaches a cdr that is not a list.
Sometimes certain macros will not behave as you'd like with alists. For example, this:
(destructuring-bind (a b c d)
    '((100 . 200) (300 . 400))
  (format t "~&~{~s~^,~}" (list a b c d)))
will not work as you might expect: the pattern asks for four elements, but the alist has only two, so an error is signaled.
On the other hand, certain procedures may be "tricked" into doing something they wouldn't do for a list of proper lists. For instance, when you copy an alist with copy-list, only the spine conses (those whose cdr is a list) are copied anew; the key/value conses are shared with the original. Depending on the circumstances, this may be the desired result; copy-alist copies the entries as well.
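A short Common Lisp sketch of the two points above (the function and variable names are hypothetical): a dotted destructuring pattern takes alist entries apart directly, and copy-list shares the entry conses while copy-alist copies them.

```lisp
;; A dotted pattern matches each (key . value) entry of a two-entry alist.
(defun flatten-two-pairs (alist)
  "Return the keys and values of a two-entry ALIST as one flat list."
  (destructuring-bind ((a . b) (c . d)) alist
    (list a b c d)))

(defparameter *pairs* (list (cons 100 200) (cons 300 400)))

;; COPY-LIST copies only the spine: the entry conses are shared.
(defparameter *list-copy* (copy-list *pairs*))

;; COPY-ALIST additionally copies each (key . value) cons.
(defparameter *alist-copy* (copy-alist *pairs*))
```

So (flatten-two-pairs *pairs*) gives (100 200 300 400), and destructively modifying an entry of *list-copy* would also change *pairs*, whereas *alist-copy* is independent at that level.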

Related

What's the difference between a cons cell and a 2-vector?

In Lisps that have vectors, why are cons cells still necessary? As I understand it, a cons cell is:
A structure with exactly 2 elements
Ordered
Access is O(1)
All these also apply to a 2-vector, though. So what's the difference? Are cons cells just a vestige from before Lisps had vectors? Or are there other differences I'm unaware of?
Although, physically, conses resemble any other two-element aggregate structure, they are not simply an obsolete form of a 2-vector.
Firstly, every Lisp object is either a cons or an atom: only conses are of type cons; everything else is an atom. A vector is an atom!
Conses form the representational basis for nested lists, which of course are used to write code. They have a special printed notation, such that the object produced by (cons 1 (cons 2 nil)) conveniently prints as (1 2) and the object produced by (cons 1 (cons 2 3)) prints as (1 2 . 3).
The cons versus atom distinction is important in the syntax, because an expression which satisfies the consp test is treated as a compound form. By contrast, atoms that are not symbols evaluate to themselves, as do keyword symbols, t and nil; other symbols are evaluated as variables.
To get the list itself instead of the value of the compound form, we use quote, for which we have the nice shorthand '.
It's useful to have a vector type that is not entangled with the evaluation semantics in this way, and whose instances are simply self-evaluating atoms.
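A small sketch of the cons/atom split and how it interacts with evaluation (classify and the variable names are hypothetical): a cons built with list is evaluated as a compound form, while a vector with the same elements is a self-evaluating atom.

```lisp
(defun classify (object)
  "Return :CONS for conses and :ATOM for every other object."
  (if (consp object) :cons :atom))

;; A list whose car is a symbol is evaluated as a function call ...
(defparameter *call-result* (eval (list '+ 1 2)))

;; ... while a vector, being an atom, evaluates to itself.
(defparameter *vector* (vector '+ 1 2))
(defparameter *vector-result* (eval *vector*))
```

Here *call-result* is 3, and *vector-result* is the very same vector object as *vector*.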
Cons cells are not a vestige from before Lisps had vectors. Firstly, there was almost no such time: the Lisp 1 manual from 1960 already describes arrays. Secondly, new dialects since then still have conses.
Objects that have a similar representation are not simply redundant for each other. Type distinctions are important. For instance, we would not consider the following two to be redundant for each other just because they both have three slots:
(defstruct name first initial last)
(defstruct bank-transaction account type amount)
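Using the two structures above, a quick sketch showing that equal slot counts do not make the types interchangeable: each instance answers only its own predicate, and type-of reports the structure name. The slot values are made up for illustration.

```lisp
(defstruct name first initial last)
(defstruct bank-transaction account type amount)

;; Made-up example instances; both types have exactly three slots.
(defparameter *person*
  (make-name :first "Jane" :initial "Q" :last "Public"))
(defparameter *payment*
  (make-bank-transaction :account 42 :type :deposit :amount 100))
```

(name-p *payment*) is false even though a bank-transaction has the same number of slots as a name.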
In the TXR Lisp dialect, I once had it so that the syntactic sugar a..b denoted (cons a b) for ranges. But this meant that ranges were consp, which was silly due to the ambiguity with lists. I eventually changed it so that a..b denotes (rcons a b): a form which constructs a range object, which prints as #R(x y) (and can be specified that way as a literal). This creates a useful nuance, because we can distinguish whether a function argument is a range (rangep) or a list (consp), just as we care whether some object is a bank-transaction or a name. Range objects are represented exactly like conses and allocated from the same heaps; they just have a different type, which makes them atoms. If evaluated as forms, they evaluate to themselves.
Basically, we must regard type as if it were an extra slot. A two-element vector really has (at least) three properties, not just two: it has a first element, a second element and a type. The vector #(1 1) differs from the cons cell (1 . 1) in that they both have this third aspect, type, which is not the same.
The immutable properties of an object which it shares with all other objects of its kind can still be regarded as "slots". Effectively, all objects have a "type slot". So conses are actually three-property objects having a car, cdr and type:
(car '(a . b)) -> A
(cdr '(a . b)) -> B
(type-of '(a . b)) -> CONS
Here is a fourth "slot":
(class-of '(a . b)) -> #<BUILT-IN-CLASS CONS>
We can't look at objects in terms of just their per-instance storage vector allocated on the heap.
By the way, the 1960s MacLisp dialect extended the concept of a cons into fixed-size aggregate objects that had more named fields (in addition to car and cdr): the cxr-s. These objects were called "hunks" and are documented in Kent Pitman's MacLisp manual. Hunks do not satisfy the predicate consp, but hunkp; i.e. they are considered atoms. However, they extend the cons notation with multiple dots.
In a typical Common Lisp implementation, a cons cell will be represented as "two machine words" (one for the car pointer, one for the cdr pointer; the fact that it's a cons cell is encoded in the pointer constructed to reference it). However, arrays are more complicated objects, and unless you have a dedicated "two-element-only vector of type T", you'd end up with an array header, containing type information and size, in addition to the storage needed for the elements (probably hard to squeeze into less than "four machine words").
So while it would be eminently possible to use two-element vectors/arrays as cons cells, there's some efficiency to be had by using a dedicated type, based on the fact that cons cells and lists are so frequently used in existing Lisp code.
I think they are different data structures; for example, Java has both vector and list classes. Vectors are suited to random access, while lists are better suited to sequential access. So in any language, vectors and lists can coexist.
As for implementing a Lisp using your approach, I believe it is possible, depending on your implementation details. But ANSI Common Lisp has a convention here, because there is no separate list data type:
CL-USER> (type-of (list 1 2 3))
CONS
It is a CONS, and the glossary of the Common Lisp HyperSpec says:
list n.
1. a chain of conses in which the car of each cons is an element of the list, and the cdr of each cons is either the next link in the chain or a terminating atom. See also proper list, dotted list, or circular list.
2. the type that is the union of null and cons.
So if you create a Lisp using vectors instead of conses, it will not be ANSI CL.
You create lists by "consing" things together; nil is a list, and there are different kinds of list that you can create by consing.
Normally you create a proper list:
(list 1 2 3) = (cons 1 (cons 2 (cons 3 nil))) = '(1 2 3)
When the list does not end with nil, it is a dotted list; a circular list contains a reference to itself.
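A sketch of telling these kinds of list apart with the tortoise-and-hare technique; list-kind is a hypothetical name. (The standard list-length returns nil for a circular list, but it should signal an error for a dotted list, so it does not cover all three cases on its own.)

```lisp
(defun list-kind (object)
  "Classify OBJECT as :PROPER, :DOTTED or :CIRCULAR, or return :NOT-A-LIST."
  (if (not (listp object))
      :not-a-list
      (let ((slow object)
            (fast object))
        (loop
          ;; FAST advances two conses per step; inspect both of them.
          (cond ((null fast) (return :proper))
                ((atom fast) (return :dotted))
                ((null (cdr fast)) (return :proper))
                ((atom (cdr fast)) (return :dotted)))
          (setf fast (cddr fast)
                slow (cdr slow))
          ;; FAST can only catch up with SLOW if the list loops back on itself.
          (when (eq fast slow)
            (return :circular))))))
```

For example, (list-kind '(1 2 . 3)) returns :dotted, while a list made circular with (setf (cdr (last c)) c) returns :circular.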
For example, if we create a string, Common Lisp implements it as a simple-array, which is faster than a list for random access:
CL-USER> (type-of "this is a string")
(SIMPLE-ARRAY CHARACTER (16))
Land of Lisp (a great book about Common Lisp) describes cons as the glue for building Common Lisp data and for processing lists, so of course if you replace cons with something similar, you will build something similar to Common Lisp.
Are cons cells just a vestige from before Lisps had vectors?
Exactly. cons, car and cdr were the constructor and accessors of the only compound data structure in the first Lisp. To distinguish it from the only atomic type, symbols, there was atom, which returned T for symbols but false for conses. This was extended to other types in Lisp 1.5, including vectors (called arrays; see page 35). Common Lisp was a combination of commercial Lisps that all built upon Lisp 1.5. Perhaps things would have been different if both types had existed from the beginning.
If you were to make a Common Lisp implementation, you wouldn't need two different underlying representations, as long as your implementation behaves according to the spec. If I remember correctly, Racket implements structs with vectors, and vector? is overloaded to return #f for vectors that actually represent a struct instance. In CL you could implement defstruct the same way, and then implement cons as a struct, along with the functions it needs to be compatible with the HyperSpec. You might be using vectors when you create conses in your favorite implementation without even knowing it.
Historically, the old functions were kept so that John McCarthy's code still works even 58 years after the first Lisp. That wasn't necessary, but it doesn't hurt to have a little legacy in a language that had features modern languages are only getting today.
If you used two-element vectors you would store their size (and type) in every node of the list.
This is ridiculously wasteful.
You can get around this wastefulness by introducing a special 2-element vector type whose elements can be anything.
Or in other words: by re-introducing the cons cell.
On one hand this is an "implementation detail": given vectors, one can implement cons cells (and thus linked lists) using vectors of length 2.
On the other hand this is a fairly important detail: the ANSI Common Lisp standard specifies that the types vector and cons are disjoint, so, in fact, you cannot use the trick to implement an ANSI CL.
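A minimal sketch of the "implementation detail" view, with hypothetical vcons, vcar and vcdr built on two-element simple vectors. Each node is a full vector object carrying its own header (the per-node overhead described above), and, consistent with the standard keeping vector and cons disjoint, none of these nodes satisfies consp.

```lisp
;; Hypothetical pair operations emulated with two-element vectors.
(defun vcons (a d) (vector a d))
(defun vcar (pair) (if (null pair) nil (svref pair 0)))
(defun vcdr (pair) (if (null pair) nil (svref pair 1)))

(defun vlist (&rest items)
  "Build a NIL-terminated chain of VCONS nodes holding ITEMS."
  (reduce #'vcons items :from-end t :initial-value nil))

(defun vlist-to-list (head)
  "Convert a chain of VCONS nodes back into an ordinary list."
  (loop for cell = head then (vcdr cell)
        until (null cell)
        collect (vcar cell)))
```

For instance, (vlist-to-list (vlist 1 2 3)) yields (1 2 3), yet (consp (vcons 1 2)) is false.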

Standard name for function collecting subtrees satisfying a predicate?

I've had to re-implement a particular function in pretty much every Lisp program I've ever written. Since this function is so useful, it must have been implemented before. I'd expect it to be well known. Perhaps it is part of Common Lisp's standard library. What is it called and what library is it from?
(defun unknown-function (predicate tree)
  (loop for item in tree
        if (funcall predicate item) collect item
        else if (listp item) append (unknown-function predicate item)))
It descends through a tree and creates a flat list of all the nodes in that tree that satisfy the predicate.
My original statement is wrong, because of the subtlety wherein the sublists are tested by the predicate before they are descended into. Here it is, for posterity:
There's no standard name for this. It's just a combination of flattening a list of lists and filtering out the elements that don't satisfy a predicate. In Common Lisp, there's no built-in flatten, but it would be a combination of your own flatten and the standard remove-if-not.
This has a bit more in common with the subst family of functions, which do check the subtrees in addition to the leaves. However, they replace individual elements of the tree rather than removing them completely. So there's something in common with subst-if and subst-if-not, but they're still not quite perfect matches.
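A sketch contrasting the two readings (flatten is a hypothetical helper, since Common Lisp has no built-in one). For a predicate that only matches leaves, unknown-function and flatten plus remove-if-not agree; a predicate such as consp shows that unknown-function tests each sublist before descending into it.

```lisp
(defun unknown-function (predicate tree)
  "Collect items of TREE satisfying PREDICATE, descending into sublists that fail it."
  (loop for item in tree
        if (funcall predicate item) collect item
        else if (listp item) append (unknown-function predicate item)))

;; Hypothetical naive flatten, used only for comparison.
(defun flatten (tree)
  "Return the non-list leaves of TREE in order."
  (loop for item in tree
        if (listp item) append (flatten item)
        else collect item))
```

With #'numberp on '(1 (2 (3 a)) b 4) both approaches give (1 2 3 4); with #'consp on '(1 (2 (3)) 4), unknown-function returns ((2 (3))), while flattening first leaves no conses to find.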

Is an empty list in Lisp built from a cons cell?

I'm trying to emulate Lisp-like list in JavaScript (just an exercise with no practical reason), but I'm struggling to figure out how to best represent an empty list.
Is an empty list just a nil value or is it under the hood stored in a cons cell?
I can:
(car '())
NIL
(cdr '())
NIL
but an empty list for sure can not be (cons nil nil), because it would be indistinguishable from a list storing a single nil. It would need to store some other special value.
On the other hand, if an empty list is not built from a cons cell, it seems impossible to have a consistent high-level interface for appending a single value to an existing list. A function like:
(defun append-value (list value) ...
would modify its argument, but only if it is not an empty list, which seems ugly.
Believe it or not, this is actually a religious question.
There are dialects that people dare to refer to as some kind of Lisp in which empty lists are conses or aggregate objects of some kind, rather than just an atom like nil.
For example, in "MatzLisp" (better known as Ruby) lists are actually arrays.
In NewLisp, lists are containers: objects of list type which contain a linked list of the items, so empty lists are empty containers. [Reference].
In Lisp languages that aren't spectacular cluster-fumbles of this sort, empty lists are atoms, and non-empty lists are binary cells with a field which holds the first item, and another field that holds the rest of the list. Lists can share suffixes. Given a list like (1 2 3) we can use cons to create (a 1 2 3) and (b c 1 2 3) both of which share the storage for (1 2 3).
(In ANSI Common Lisp, the empty list atom () is the same object as the symbol nil, which evaluates to itself and also serves as Boolean false. In Scheme, () isn't a symbol, and is distinct from the Boolean false #f object. However Scheme lists are still made up of pairs, and terminated by an atom.)
The ability to evaluate (car nil) does not automatically follow from the cons-and-nil representation of lists, and if we look at ancient Lisp documentation, such as the Lisp 1.5 manual from early 1960-something, we find that it was absent. Initially, car was strictly a way to access a field of the cons cell, and it required a cons cell argument.
Good ideas like allowing (car nil) to Just Work (so that hackers could trim many useless lines of code from their programs) didn't appear overnight. The idea of allowing (car nil) may have originated in InterLisp. In any case, the Evolution of Lisp paper claims that MacLisp (one of the important predecessors of Common Lisp, unrelated to the Apple Macintosh, which came twenty years later) imitated this feature from InterLisp (another one of the significant predecessors).
Little details like this make the difference between pleasant programming and swearing at the monitor: see for instance A Short Ballad Dedicated to the Growth of Programs inspired by one Lisp programmer's struggle with a bletcherous dialect in which empty lists cannot be accessed with car, and do not serve as a boolean false.
An empty list is simply the nil symbol (and symbols, by definition, are not conses). car and cdr are defined to return nil if given nil.
As for list-mutation functions, they return a value that you are supposed to reassign to your variable. For example, look at the specification for the nreverse function: it may modify the given list, or not, and you are supposed to use the return value, and not rely on it to be modified in-place.
Even nconc, the quintessential destructive-append function, works that way: its return value is the appended list that you're supposed to use. It is specified to modify the given lists (except the last one) in-place, but if you give it nil as the first argument, it can't very well modify that, so you still have to use the return value.
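A minimal sketch of the question's append-value written in this style (the name comes from the question; the body is an assumption): it uses nconc and returns the result, so callers reassign it, which also covers the empty-list case.

```lisp
(defun append-value (items value)
  "Destructively append VALUE to ITEMS and return the resulting list.
When ITEMS is empty there is nothing to modify, so callers must use the
return value, exactly as with NCONC itself."
  (nconc items (list value)))
```

Usage follows the nconc convention: (setf items (append-value items 1)) works whether or not items started out empty.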
NIL is a somewhat strange beast in Common Lisp, because:
it's a symbol (symbolp returns T)
it's a list
it's NOT a cons cell (consp returns NIL)
you can take its CAR and CDR anyway
Note that the reasons behind this are probably also historical and you shouldn't think that this is the only reasonable solution. Other Lisp dialects made different choices.
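A quick check of the properties listed above; nil-properties is a hypothetical helper that just collects the answers of a few standard predicates and accessors.

```lisp
(defun nil-properties ()
  "Return a plist of how NIL answers a few standard predicates and accessors."
  (list :symbolp (symbolp nil)
        :listp (listp nil)
        :consp (consp nil)
        :car (car nil)
        :cdr (cdr nil)
        :empty-list-p (eq nil '())))
```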
Try it with your Lisp interpreter:
(eq nil '())
=> t
Several operations are special-cased to do unorthogonal (or even curious :-) things when operating on nil / an empty list. The behavior of car and cdr you were investigating is one of those things.
The identity of nil as the empty list is one of the first things you learn about Lisp. I tried to come up with a good Google hit, but I'll just pick one because there are so many: http://www.cs.sfu.ca/CourseCentral/310/pwfong/Lisp/1/tutorial1.html

Iterate over Emacs Lisp hash-table

How to iterate over items (keys, values) in Elisp hash-tables?
I created a hash-table (map, dictionary) using (make-hash-table) and populated it with different items. In Python I could iterate over dicts with:
for k in d # iterate over keys
for k in d.keys() # same
for k in d.values() # iterate over values
for k in d.items() # iterate over tuples (key, value)
How can I do the same in the most succinct and elegant way possible, preferably without the loop macro?
(maphash (lambda (key value) ....your code here...) hash-table)
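For comparison, a Common Lisp sketch of the Python-style keys/values/items views built on maphash, which in both Common Lisp and Emacs Lisp takes the function first and the table second. The helper names are hypothetical, and the order of the results is unspecified, just like the hash table's iteration order.

```lisp
(defun hash-keys (table)
  "List of TABLE's keys, in unspecified order."
  (let ((result '()))
    (maphash (lambda (key value)
               (declare (ignore value))
               (push key result))
             table)
    result))

(defun hash-values (table)
  "List of TABLE's values, in unspecified order."
  (let ((result '()))
    (maphash (lambda (key value)
               (declare (ignore key))
               (push value result))
             table)
    result))

(defun hash-items (table)
  "List of (key . value) conses for TABLE, in unspecified order."
  (let ((result '()))
    (maphash (lambda (key value) (push (cons key value) result)) table)
    result))
```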
I'm going to advertise myself a bit, so take it with a grain of salt, but here are, basically, your options:
maphash: this is the built-in iteration primitive; fundamentally, no other way to do it exists.
(loop for KEY being the hash-keys of TABLE using (hash-values VALUE) ...) is available in the cl package. It will internally use maphash anyway, but it offers you some unification on top of different iteration primitives. You can use the loop macro to iterate over many different things, and it reduces clutter by keeping the technical details out of sight.
http://code.google.com/p/i-iterate/ Here's a library I'm working on to provide more versatile ways of iterating over different things and in different ways in Emacs Lisp. It is inspired by Common Lisp Iterate library, but it departed from it quite far (however, some basic principles still hold). If you were to try this library, the iteration over the hash-table would look like this: (++ (for (KEY VALUE) pairs TABLE) ...) or (++ (for KEY keys TABLE) ...) or (++ (for VALUE values TABLE) ...).
I will try to describe the pros and cons of using either cl's loop or i-iterate.
Unlike loop, iterate allows iterating over multiple hash-tables at once (but you must be aware of the additional cost it incurs: the keys of the second, third etc. hash-tables must be collected into a list before iterating, this is done behind the scenes).
Iterate provides arguably more Lisp-y syntax, which is easier to format in the editor.
With iterate you have more (and potentially even more in the future) options to combine iteration with other operations.
No one else is using it so far, besides myself :) It probably still has bugs and some things may be reworked, but it is near feature-freeze and is getting ready for proper use.
Significantly more people are familiar with either the built-in iteration primitives or the cl library.
Just as an aside, the full version of the iterate on hash-tables looks like this: (for VAR pairs|keys|values TABLE &optional limit LIMIT), where LIMIT stands for the number of elements you want to look at (it will generate more efficient code than if you were to break from the loop using more general-purpose tools).
maphash is the function you want. I would also suggest looking at the manual: (info "(elisp) Hash Tables")
Starting from 2013 there is a third-party library ht, which provides many convenient functions to operate on Elisp hash-tables.
Suppose you have a hash-table, where keys are strings and values are integers. To iterate over a hash-table and return a list, use ht-map:
(ht-map (lambda (k v) (+ (length k) v)) table)
;; return list of all values added to length of their keys
ht-each is just an alias for maphash. There are also anaphoric versions of the above 2 functions, called ht-amap and ht-aeach. Instead of accepting an anonymous function, they expose variables key and value. Here's the equivalent expression to the one above:
(ht-amap (+ (length key) value) table)
I would have preferred to put this into a comment, but my reputation rating ironically prevents me from writing this in the appropriate format...
The unprefixed loop is considered deprecated, as is the whole cl library, because it didn't adhere to the convention of prefixing all symbols with a common library prefix and thus polluted the obarray with symbols without a clear library association. Instead, use cl-lib, which defines the same functions and macros but names them, e.g., cl-loop and cl-defun instead of loop and defun*. If you need only the macros, you can import cl-macs instead.

Racket/Scheme: checking for structure equality

OK, I need some help thinking through this conceptually.
I need to check whether two lists are structurally equal.
For example:
(a (bc) de) is the same as (f (gh) ij), because they have the same structure.
Clearly, the base case is that if both lists are empty, they are structurally equal.
The recursive case on the other hand I'm not sure where to start.
Some ideas:
We aren't going to care whether the elements are == to each other, because that doesn't matter; we only care about the structure. I do know we will take the car of the list and recursively call the function on the cdr of the list.
The part that confuses me is how to determine whether an atom or sublist has the same structure.
Any help will be appreciated.
You're getting there. In the (free, online, excellent) textbook How to Design Programs, this falls into section 17.3, "Processing two lists simultaneously: Case 3". I suggest you take a look.
http://www.htdp.org/2003-09-26/Book/curriculum-Z-H-1.html#node_toc_node_sec_17.3
One caveat: it looks like the data definition you're working with is "s-expression", which you can state like this:
;; an s-expression is either
;; - the empty list, or
;; - (cons symbol s-expression), or
;; - (cons s-expression s-expression)
Since this data definition has three cases, there are nine possibilities when considering two of them.
John Clements
(Yes, you could reduce the number of cases by embedding the data in the more general one that includes improper lists. Doesn't sound like a good idea to me.)
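A Common Lisp sketch (rather than Racket) of one way to finish the recursion, assuming nil plays the role of the empty list and every other atom counts as a symbol; same-shape-p is a hypothetical name. The cond clauses fold the nine combinations from the data definition into five checks; in Racket the same shape would use null? and pair?.

```lisp
(defun same-shape-p (a b)
  "True if A and B have the same list structure, ignoring which symbols appear."
  (cond ((and (null a) (null b)) t)          ; both empty
        ((or (null a) (null b)) nil)         ; only one is empty
        ((and (consp a) (consp b))           ; both are conses: compare parts
         (and (same-shape-p (car a) (car b))
              (same-shape-p (cdr a) (cdr b))))
        ((or (consp a) (consp b)) nil)       ; a cons versus a symbol
        (t t)))                              ; two symbols always match
```

For example, (same-shape-p '(a (bc) de) '(f (gh) ij)) is true, while (same-shape-p '(a (b c) d) '(a b c d)) is false.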