racket/scheme Checking for struture equality - racket

Ok I need some help with thinking through this conceputally.
I need to check if a list and another list is structurally equal.
For example:
(a (bc) de)) is the same as (f (gh) ij)), because they have the same structure.
Now cleary the base case will be if both list are empty they are structurally equal.
The recursive case on the other hand I'm not sure where to start.
Some ideas:
Well we are not going to care if the elements are == to each other because that doesn't matter. We just care in the structure. I do know we will car down the list and recursively call the function with the cdr of the list.
The part that confuses me is how do you determine wheter an atom or sublist has the same structure?
Any help will be appreciated.

You're getting there. In the (free, online, excellent) textbook, this falls into section 17.3, "Processing two lists simultaneously: Case 3". I suggest you take a look.
http://www.htdp.org/2003-09-26/Book/curriculum-Z-H-1.html#node_toc_node_sec_17.3
One caveat: it looks like the data definition you're working with is "s-expression", which you can state like this:
;; an s-expression is either
;; - the empty list, or
;; - (cons symbol s-expression), or
;; - (cons s-expression s-expression)
Since this data definition has three cases, there are nine possibilities when considering two of them.
John Clements
(Yes, you could reduce the number of cases by embedding the data in the more general one that includes improper lists. Doesn't sound like a good idea to me.)

Related

how can I compare syntax objects in racket?

I'd like to compare the code contents of two syntax objects and ignore things like contexts. Is converting them to datum the only way to do so? Like:
(equal? (syntax->datum #'(x+1)) (syntax->datum #'(x+1)))
If you want to compare both objects without deconstructing them at all, then yes.
HOWEVER, the problem with this method is that it only compares the datum attached to two syntax objects, and won't actually compare their binding information.
The analogy that I've heard (from Ryan Culpepper), is this is kind of like taking two paintings, draining of them of their color, and seeing if they are identical. While they might be similar in some ways, you will miss a lot of differences from the different colors.
A better approach (although it does require some work), is to use syntax-e to destruct the syntax object into more primitive lists of syntax objects, and do this until you get identifiers (basically a syntax object whose datum is a symbol), from there, you can generally use free-identifier=? (and sometimes bound-identifier=? to see if each identifier can bind each other, and identifier-binding to compare module level identifiers.
The reason why there isn't a single simple predicate to compare two arbitrary syntax objects is because, generally, there isn't really one good definition for what makes two pieces of code equal, even if you only care about syntactic equality. For example, using the functions referenced above doesn't track internal bindings in a syntax object, so you will still get a very strict definition of what it means to be 'equal'. that is, both syntax objects have the same structure with identifiers that are either bound to the same module, or are free-identifier=?.
As such, before you use this answer, I highly recommend you take a step back and make sure this is really what you want to do. Once in a blue moon it is, but most of the time you actually are trying to solve a similar, yet simpler, problem.
Here's a concrete example of one possible way you could do the "better approach" Leif Andersen mentioned.
I have used this in multiple places for testing purposes, though if anyone wanted to use it in non-test code, they would probably want to re-visit some of the design decisions.
However, things like the equal?/recur pattern used here should be helpful no matter how you decide to define what equality means.
Some of the decisions you might want to make different choices on:
On identifiers, do you want to check that the scopes are exactly the same (bound-identifier=?), or would you want to assume that they would be bound outside of the syntax object and check that they are bound to the same thing, even if they have different scopes (free-identifier=?)? Note that if you choose the first one, then checking the results of macro expansion will sometimes return #false because of scope differences, but if you choose the second one, then if any identifier is not bound outside of the syntax object, then it would be as if you only care about symbol=? equality on names, so it will return #true in some places where it shouldn't. I chose the first one bound-identifier=? here because for testing, a "false positive" where the test fails is better than a "false negative" where the tests succeeds in cases it shouldn't.
On source locations, do you want to check that they are equal, or do you want to ignore them? This code ignores them because it's only for testing purposes, but if you want equality only for things which have the same source location, you might want to check that using functions like build-source-location-list.
On syntax properties, do you want to check that they are equal, or do you want to ignore them? This code ignores them because it's only for testing purposes, but if you want to check that you might want to use functions like syntax-property-symbol-keys.
Finally here is the code. It may not be exactly what you want depending on how you answered the questions above. However, its structure and how it uses equal?/recur might be helpful to you.
(require rackunit)
;; Works on fully wrapped, non-wrapped, and partially
;; wrapped values, and it checks that the inputs
;; are wrapped in all the same places. It checks scopes,
;; but it does not check source location.
(define-binary-check (check-stx=? stx=? actual expected))
;; Stx Stx -> Bool
(define (stx=? a b)
(cond
[(and (identifier? a) (identifier? b))
(bound-identifier=? a b)]
[(and (syntax? a) (syntax? b))
(and (bound-identifier=? (datum->syntax a '||) (datum->syntax b '||))
(stx=? (syntax-e a) (syntax-e b)))]
[else
(equal?/recur a b stx=?)]))

Standard name for function collecting subtrees satisfying a predicate?

I've had to re-implement a particular function in pretty much every Lisp program I've ever written. Since this function is so useful, it must have been implemented before. I'd expect it to be well known. Perhaps it is part of Common Lisp's standard library. What is it called and what library is it from?
(defun unknown-function (predicate tree)
(loop for item in tree
if (funcall predicate item) collect item
else if (listp item) append (unknown-function predicate item)))
It descends through a tree and creates a flat list of all the nodes in that tree that satisfy the predicate.
My original statement is wrong, because of the subtlety wherein the sublists are tested by the predicate before they are descended into. Here it is, for posterity:
There's no standard name for this. It's just a combination of flattening a list of lists and filtering out the elements that don't satisfy a predicate. In Common Lisp, there's no built-in flatten, but it would be a combination of your own flatten, and the standard remove-if-not.
This has a bit more in common with subst family of functions which do check the subtrees in addition to the leaves. However, they're replacing individuals elements of the tree, rather than removing them completely. So there's something in common with subst-if and subst-if-not, but they're still not quite perfect matches.

Is an empty list in Lisp built from a cons cell?

I'm trying to emulate Lisp-like list in JavaScript (just an exercise with no practical reason), but I'm struggling to figure out how to best represent an empty list.
Is an empty list just a nil value or is it under the hood stored in a cons cell?
I can:
(car '())
NIL
(cdr '())
NIL
but an empty list for sure can not be (cons nil nil), because it would be indistinguishable from a list storing a single nil. It would need to store some other special value.
On the other hand, if an empty list is not built from a cons cell, it seems impossible to have a consistent high-level interface for appending a single value to an existing list. A function like:
(defun append-value (list value) ...
Would modify its argument, but only if it is not an empty list, which seems ugly.
Believe it or not, this is actually a religious question.
There are dialects that people dare to refer to as some kind of Lisp in which empty lists are conses or aggregate objects of some kind, rather than just an atom like nil.
For example, in "MatzLisp" (better known as Ruby) lists are actually arrays.
In NewLisp, lists are containers: objects of list type which contain a linked list of the items, so empty lists are empty containers. [Reference].
In Lisp languages that aren't spectacular cluster-fumbles of this sort, empty lists are atoms, and non-empty lists are binary cells with a field which holds the first item, and another field that holds the rest of the list. Lists can share suffixes. Given a list like (1 2 3) we can use cons to create (a 1 2 3) and (b c 1 2 3) both of which share the storage for (1 2 3).
(In ANSI Common Lisp, the empty list atom () is the same object as the symbol nil, which evaluates to itself and also serves as Boolean false. In Scheme, () isn't a symbol, and is distinct from the Boolean false #f object. However Scheme lists are still made up of pairs, and terminated by an atom.)
The ability to evaluate (car nil) does not automatically follow from the cons-and-nil representation of lists, and if we look at ancient Lisp documentation, such as the Lisp 1.5 manual from early 1960-something, we will find that this was absent. Initially, car was strictly a way to access a field of the cons cell, and required strictly a cons cell argument.
Good ideas like allowing (car nil) to Just Work (so that hackers could trim many useless lines of code from their programs) didn't appear overnight. The idea of allowing (car nil) may have appeared from InterLisp. In any case, Evolution Of Lisp paper claims that MacLisp (one of the important predecessors of Common Lisp, unrelated to the Apple Macintosh which came twenty years later), imitated this feature from InterLisp (another one of the significant predecessors).
Little details like this make the difference between pleasant programming and swearing at the monitor: see for instance A Short Ballad Dedicated to the Growth of Programs inspired by one Lisp programmer's struggle with a bletcherous dialect in which empty lists cannot be accessed with car, and do not serve as a boolean false.
An empty list is simply the nil symbol (and symbols, by definition, are not conses). car and cdr are defined to return nil if given nil.
As for list-mutation functions, they return a value that you are supposed to reassign to your variable. For example, look at the specification for the nreverse function: it may modify the given list, or not, and you are supposed to use the return value, and not rely on it to be modified in-place.
Even nconc, the quintessential destructive-append function, works that way: its return value is the appended list that you're supposed to use. It is specified to modify the given lists (except the last one) in-place, but if you give it nil as the first argument, it can't very well modify that, so you still have to use the return value.
NIL is somewhat a strange beast in Common Lisp because
it's a symbol (meaning that symbolp returns T)
is a list
is NOT a cons cell (consp returns NIL)
you can take CAR and CDR of it anyway
Note that the reasons behind this are probably also historical and you shouldn't think that this is the only reasonable solution. Other Lisp dialects made different choices.
Try it with your Lisp interpreter:
(eq nil '())
=> t
Several operations are special-cased to do unorthogonal (or even curious :-) things when operating on nil / an empty list. The behavior of car and cdr you were investigating is one of those things.
The idenity of nil as the empty list is one of the first things you learn about Lisp. I tried to come up with a good Google hit but I'll just pick one because there are so many: http://www.cs.sfu.ca/CourseCentral/310/pwfong/Lisp/1/tutorial1.html

good style in lisp: cons vs list

Is it good style to use cons for pairs of things or would it be preferable to stick to lists?
like for instance questions and answers:
(list
(cons
"Favorite color?"
"red")
(cons
"Favorite number?"
"123")
(cons
"Favorite fruit?"
"avocado"))
I mean, some things come naturally in pairs; there is no need for something that can hold more than two, so I feel like cons would be the natural choice. However, I also feel like I should be sticking to one thing (lists).
What would be the better or more accepted style?
What you have there is an association list (alist). Alist entries are, indeed, often simple conses rather than lists (though that is a matter of preference: some people use lists for alist entries too), so what you have is fine. Though, I usually prefer to use literal syntax:
'(("Favorite color?" . "red")
("Favorite number?" . "123")
("Favorite fruit?" . "avocado"))
Alists usually use a symbol as the key, because symbols are interned, and so symbol alists can be looked up using assq instead of assoc. Here's how it might look:
'((color . "red")
(number . "123")
(fruit . "avocado"))
The default data-structure for such case should be a HASH-TABLE.
An association list of cons pairs is also a possible variant and was widely used historically. It is a valid variant, because of tradition and simplicity. But you should not use it, when the number of pairs exceeds several (probably, 10 is a good threshold), because search time is linear, while in hash-table it is constant.
Using a list for this task is also possible, but will be both ugly and inefficient.
You would need to decide for yourself based upon circumstances. There isn't a universal answer. Different tasks work differently with structures. Consider the following:
It is faster to search in a hash-table for keys, then it is in the alist.
It is easier to have an iterator and save its state, when working with alist (hash-table would need to export all of its keys as an array or a list and have a pointer into that list, while it is enough to only remember the pointer into alist to be able to restore the iterator's state and continue the iteration.
Alist vs list: they use the same amount of conses for even number of elements, given all other characters are atoms. When using lists vs alists you would have to thus make sure there isn't an odd number of elements (and you may discover it too late), which is bad.
But there are a lot more functions, including the built-in ones, which work on proper lists, and don't work on alists. For example, nth will error on alist, if it hits the cdr, which is not a list.
Some times certain macros would not function as you'd like them to with alists, for example, this:
(destructuring-bind (a b c d)
'((100 . 200) (300 . 400))
(format t "~&~{~s~^,~}" (list a b c d)))
will not work as you might've expected.
On the other hand, certain procedures may be "tricked" into doing something which they don't do for proper lists. For instance, when copying an alist with copy-list, only the conses, whose cdr is a list will be copied anew (depending upon the circumstances this may be a desired result).

What does "my other car is a cdr" mean?

Can anyone well versed in lisp explain this joke to me?
I've done some reading on functional programming languages and know that CAR/CDR mean Contents of Address/Decrement Register but I still don't really understand the humour.
In Lisp, a linked list element is called a CONS. It is a data structure with two elements, called the CAR and the CDR for historical reasons. (Some Common Lisp programmers prefer to refer to them using the FIRST and REST functions, while others like CAR and CDR because they fit well with the precomposed versions such as (CADR x) ≡ (CAR (CDR x)).
The joke is a parody of the bumper stickers you sometimes see on beat-up old cars saying "My other car is a Porsche/BMW/etc."
My response to this joke has always been "My other CAR is a CADR. CDR isn't a CAR at all."
Yes, definitely a geek joke.
The names come from the IBM 704, but that's not the joke.
The joke is (bad) pun on "my other car is a ___." But the in-joke is about recursion.
When you loop/manipulate/select/invoke/more in lisp you use a combination of car (the first element in the list) and cdr (the rest of the list) to juggle functions.
So you've got a car, but your other car is your cdr because you can always get a car from a cdr since the cdr is always (in recursion) more elements. Get it? Laugh yet?
You'll probably have to learn lisp to actually chuckle a bit, or not. Of course, by then, you'll probably find yourself chuckling randomly for no apparent reason because:
Lisp makes you loopy.
//Coming from Scheme
Scheme has very few data structures, one of them is a tuple: '(first . second). In this case, car is the first element, and cdr is the second. This construct can be extended to create lists, trees, and other structures.
The joke isn't very funny.