I am trying to write a Lisp function to return a list of characters (no repeats) from a list (with ints, characters, etc). I'm still a beginner with Lisp and am having trouble starting. Our prof mentioned using atom but I can't figure out what she meant. Here is the question:
"Write a lisp function that accepts a list as the input argument (the list is mixed up integers, decimals,
characters and nested lists) and creates a list including all the characters in the original list without any
duplication. Sample program output is shown below:
'((z f) (b a 5 3.5) 6 (7) (a) c) -> (z f b a c)
'( (n) 2 (6 h 7.8) (w f) (n) (c) n) -> (h w f c n) "
What your assignment calls “characters” are actually symbols with a name of length 1. It seems that you can just mentally replace the word “characters” with “symbols” and work with this.
An atom is anything that is not a cons—any non-empty list consists of a chain of conses. For example, symbols, numbers, strings, and nil are atoms.
A cons (strictly, a cons cell) is a simple data structure that holds two things. In a list, the first slot of each cons holds a list element, and the second holds either a pointer to the next cons or nil. List elements can themselves be lists; the first slot then points to another list, and the whole structure is formally a tree. The accessor function for the first slot of a cons is called car or first; the accessor function for the second slot is called cdr or rest. Car and cdr are a bit archaic and are mainly used when you see the cons cell as a tree node, while first and rest are more modern and are mainly used when you see the cons cell as a link in a list.
You can test whether a thing is an atom with the function atom. If it is not an atom, it is a list with at least one element.
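A few evaluations at the REPL may make the atom/cons distinction concrete. This is only a small sketch; the results are shown as a standard Common Lisp printer would display them.

```lisp
(atom 'a)           ; => T    (a symbol)
(atom 3.5)          ; => T    (a number)
(atom "abc")        ; => T    (a string)
(atom nil)          ; => T    (the empty list is an atom too)
(atom '(a b))       ; => NIL  (a non-empty list is a cons)

(car '(a b))        ; => A    (first slot of the first cons)
(cdr '(a b))        ; => (B)  (second slot: the rest of the list)
(first '((z f) b))  ; => (Z F) (a list element may itself be a list)
```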
Your assignment has a few parts:
Walk the tree to look at each element. This can be done through recursion, or through looping in one direction and recursing in the other.
Keep a list of symbols that you already found.
If the element you look at is a symbol (with a name of length 1…), then check whether it is new, if yes, add it to your list.
Finally return that list.
One useful idiom is to use push or pushnew, which put new elements at the front of the list, and at the end reverse it.
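The steps above could be put together roughly as follows. This is a minimal sketch, not the one true solution: the names single-char-symbol-p and collect-chars are made up for illustration, and it assumes that "character" means a symbol whose name has length 1.

```lisp
(defun single-char-symbol-p (x)
  "True if X is a non-NIL symbol whose name is one character long."
  (and x (symbolp x) (= (length (symbol-name x)) 1)))

(defun collect-chars (tree)
  "Return the distinct one-character symbols in TREE, in order of first appearance."
  (let ((found '()))                        ; symbols seen so far, newest first
    (labels ((walk (x)
               (cond ((null x))             ; end of a list: nothing to do
                     ((atom x)              ; a leaf: keep it if it qualifies
                      (when (single-char-symbol-p x)
                        (pushnew x found))) ; pushnew skips duplicates
                     (t (walk (car x))      ; a cons: recurse into both slots
                        (walk (cdr x))))))
      (walk tree))
    (nreverse found)))                      ; restore left-to-right order

(collect-chars '((z f) (b a 5 3.5) 6 (7) (a) c))   ; => (Z F B A C)
```

Note that this keeps the first occurrence of each symbol, so for the second sample it returns (N H W F C). The ordering in the assignment's sample, (h w f c n), corresponds to keeping the last occurrence instead, which is what remove-duplicates does by default when applied to a flattened list.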
I've read the docs (several different versions!) but I can't quite get my head wrapped around multiple-value-bind.
Here's what I (think I) know:
The first parameter is a list of variables that are going to get bound.
The next parameter is a list of values that get bound to the variables.
Am I right that these 2 lists have to be the same length?
The last parameter (is it optional?) is a body of code that can act on the variables with their newly-bound values.
That sure seems to be how the docs read, and it fits with code I'm reading but not-quite following. I get into trouble when I try to create a multiple-value-bind statement from scratch, as a test. I end up with results like this:
? (mulitple-value-bind (x y z) (values 11 22 33) (+ x y z)) ;; EDIT: contains typo
> Error: Unbound variable: Y
> While executing: CCL::CHEAP-EVAL-IN-ENVIRONMENT, in process Listener(7).
> Type cmd-/ to continue, cmd-. to abort, cmd-\ for a list of available restarts.
> If continued: Retry getting the value of Y.
> Type :? for other options.
1 >
(I was sort of hoping for output along the lines of 66.) (I'm using Clozure-CL if it matters, though I don't think it should.)
Also, I'm looking at some sample code (trying to understand Project Euler Problem 24) that reads like this:
(multiple-value-bind
(q r)
(floor n m)
(cons (nth q lst) (permute-b r (remove-nth q lst)))
)
(NOTE: I may have mis-indented it, which may be affecting my lack of understanding)
What I don't get about this is it looks to me as if there are 2 variables being multiply-bound (q & r), but only one value (floor n m). Or is the other value the cons statement, and there is no body?!
As you can see, I completely don't get multiple-value-bind; please enlighten me.
Thanks!
Your first example with the "unbound variable" is due to your misspelling multiple-value-bind. Try fixing the spelling; you should see a different result.
As to your second question, floor returns two values, the floor and the remainder. Remember that values is not the only function that returns multiple values!
So, basically, the multiple-value-bind form looks like this:
(multiple-value-bind (var-1 .. var-n) expr
  body)
where expr is an expression that returns multiple values, which are bound to the variable names given in var-1 .. var-n; those variables are available for use in body. It is okay for expr to return more or fewer values than are given as variables; nil is used as the default value for any absent values, and any excess values are discarded.
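To see these rules in action, here are three small forms you could try at the REPL. They are only illustrative; the variable names are arbitrary.

```lisp
;; floor returns two values: the quotient and the remainder.
(multiple-value-bind (q r) (floor 17 5)
  (list q r))                ; => (3 2)

;; Fewer values than variables: the missing one is bound to NIL.
(multiple-value-bind (a b c) (values 1 2)
  (list a b c))              ; => (1 2 NIL)

;; More values than variables: the surplus values are discarded.
(multiple-value-bind (a) (values 1 2 3)
  (list a))                  ; => (1)
```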
I'm using Emacs Lisp, but have the cl package loaded, for some common lisp features.
I have a hash table containing up to 50K entries, with integer keys mapped to triplets, something like this (but in actual lisp):
{
8 => '(9 300 12)
27 => '(5 125 9)
100 => '(10 242 14)
}
The second value in the triplet is a score that has been calculated during a complex algorithm that built the hash-table. I need to collect a regular lisp list with all of the keys from the hash, ordered by the score (i.e. all keys ordered by the cadr of the value).
So for the above, I need this list:
'(27 100 8)
I'm currently doing this with two phases, which feels less efficient than it needs to be.
Is there a good way to do this?
My current solution uses maphash to collect the keys and the values into two new lists, then does a sort in the normal way, referring to the list of scores in the predicate. It feels like I could be combining the collection and the sorting together, however.
EDIT | I'm also not attached to using hash-table, though I do need constant access time for the integer keys, which are not linearly spaced.
EDIT 2| It looks like implementing a binary tree sort could work, where the labels in the tree are the scores and the values are the keys... this way I'm doing the sort as I map over the hash.
... Continues reading wikipedia page on sorting algorithms
Basically, your solution is correct: you need to collect the keys into a list:
(defun hash-table-keys (hash-table)
  (let ((keys ()))
    (maphash (lambda (k v) (push k keys)) hash-table)
    keys))
and then sort the list:
(sort (hash-table-keys hash-table)
      (lambda (k1 k2)
        (< (second (gethash k1 hash-table))
           (second (gethash k2 hash-table)))))
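As a quick check, here is the same collect-then-sort idea run on the sample data from the question. This sketch is written in Common Lisp rather than Emacs Lisp (it relies on the :key argument of Common Lisp's sort), and the names keys-by-score and *sample* are made up for illustration.

```lisp
(defun keys-by-score (table)
  "Return the keys of TABLE sorted by the second element of each value."
  (let ((keys '()))
    ;; Phase 1: collect the keys.
    (maphash (lambda (k v)
               (declare (ignore v))
               (push k keys))
             table)
    ;; Phase 2: sort them, looking each score up in the table.
    (sort keys #'< :key (lambda (k) (second (gethash k table))))))

(defparameter *sample*
  (let ((h (make-hash-table)))
    (setf (gethash 8 h)   '(9 300 12)
          (gethash 27 h)  '(5 125 9)
          (gethash 100 h) '(10 242 14))
    h))

(keys-by-score *sample*)   ; => (27 100 8)
```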
Combining key collection with sorting is possible: you would insert the keys into a tree ordered by score and then "flatten" the tree. However, this will only matter if you are dealing with really huge tables. Also, since the built-in sort is implemented in C while your tree code would run as byte-compiled Emacs Lisp, you might find that sort is still faster than using a tree. Consider also the development cost: you would be writing code whose value is mostly educational.
Delving deeper, collecting the keys allocates the list of keys (which you will certainly need anyway for the result) and sort operates "in-place", so the "simple way" is about as good as it gets.
The "tree" way will allocate the tree (the same memory footprint as the required list of keys) and populating and flattening it will be the same O(n*log(n)) process as the "collect+sort" way. However, keeping the tree balanced, and then flattening it "in-place" (i.e., without allocating a new list) is not a simple exercise.
The bottom line is: KISS.
What would be good purely functional data structures for text editors? I want to be able to insert single characters into the text and delete single characters from the text with acceptable efficiency, and I would like to be able to hold on to old versions, so I can undo changes with ease.
Should I just use a list of strings and reuse the lines that don't change from version to version?
I don't know whether this suggestion is "good" for sophisticated definitions of "good", but it's easy and fun. I often set an exercise to write the core of a text editor in Haskell, linking with rendering code that I provide. The data model is as follows.
First, I define what it is to be a cursor inside a list of x-elements, where the information available at the cursor has some type m. (The x will turn out to be Char or String.)
type Cursor x m = (Bwd x, m, [x])
This Bwd thing is just the backward "snoc-lists". I want to keep strong spatial intuitions, so I turn things around in my code, not in my head. The idea is that the stuff nearest the cursor is the most easily accessible. That's the spirit of The Zipper.
data Bwd x = B0 | Bwd x :< x deriving (Show, Eq)
I provide a gratuitous singleton type to act as a readable marker for the cursor...
data Here = Here deriving Show
...and I can thus say what it is to be somewhere in a String
type StringCursor = Cursor Char Here
Now, to represent a buffer of multiple lines, we need Strings above and below the line with the cursor, and a StringCursor in the middle, for the line we're currently editing.
type TextCursor = Cursor String StringCursor
This TextCursor type is all I use to represent the state of the edit buffer. It's a two layer zipper. I provide the students with code to render a viewport on the text in an ANSI-escape-enabled shell window, ensuring that the viewport contains the cursor. All they have to do is implement the code that updates the TextCursor in response to keystrokes.
handleKey :: Key -> TextCursor -> Maybe (Damage, TextCursor)
where handleKey should return Nothing if the keystroke is meaningless, but otherwise deliver Just an updated TextCursor and a "damage report", the latter being one of
data Damage
  = NoChange      -- use this if nothing at all happened
  | PointChanged  -- use this if you moved the cursor but kept the text
  | LineChanged   -- use this if you changed text only on the current line
  | LotsChanged   -- use this if you changed text off the current line
  deriving (Show, Eq, Ord)
(If you're wondering what the difference is between returning Nothing and returning Just (NoChange, ...), consider whether you also want the editor to go beep.) The damage report tells the renderer how much work it needs to do to bring the displayed image up to date.
The Key type just gives a readable datatype representation to the possible keystrokes, abstracting away from the raw ANSI escape sequences. It's unremarkable.
I provide the students with a big clue about how to go up and down with this data model by offering these pieces of kit:
deactivate :: Cursor x Here -> (Int, [x])
deactivate c = outward 0 c where
  outward i (B0, Here, xs)       = (i, xs)
  outward i (xz :< x, Here, xs)  = outward (i + 1) (xz, Here, x : xs)
The deactivate function is used to shift focus out of a Cursor, giving you an ordinary list, but telling you where the cursor was. The corresponding activate function attempts to place the cursor at a given position in a list:
activate :: (Int, [x]) -> Cursor x Here
activate (i, xs) = inward i (B0, Here, xs) where
  inward _ c@(_, Here, [])     = c  -- we can go no further
  inward 0 c                   = c  -- we should go no further
  inward i (xz, Here, x : xs)  = inward (i - 1) (xz :< x, Here, xs)  -- and on!
I offer the students a deliberately incorrect and incomplete definition of handleKey
handleKey :: Key -> TextCursor -> Maybe (Damage, TextCursor)
handleKey (CharKey c) (sz,
                       (cz, Here, cs),
                       ss)
  = Just (LineChanged, (sz,
                        (cz, Here, c : cs),
                        ss))
handleKey _ _ = Nothing
which just handles ordinary character keystrokes but makes the text come out backwards. It's easy to see that the character c appears right of Here. I invite them to fix the bug and add functionality for the arrow keys, backspace, delete, return, and so on.
It may not be the most efficient representation ever, but it's purely functional and enables the code to conform concretely to our spatial intuitions about the text that's being edited.
A Vector[Vector[Char]] would probably be a good bet. It is an IndexedSeq, so it has decent update / prepend / append performance, unlike the List you mention. If you look at Performance Characteristics, it's the only immutable collection mentioned that has effectively constant-time update.
We use a text zipper in Yi, a serious text editor implementation in Haskell.
The implementation of the immutable state types is described in the following,
http://publications.lib.chalmers.se/records/fulltext/local_94979.pdf
http://publications.lib.chalmers.se/records/fulltext/local_72549.pdf
and other papers.
Fingertrees
Ropes
scala.collection.immutable.IndexedSeq
I'd suggest to use zippers in combination with Data.Sequence.Seq which is based on finger trees. So you could represent the current state as
data Cursor = Cursor { upLines   :: Seq Line
                     , curLine   :: CurLine
                     , downLines :: Seq Line }
This gives you O(1) complexity for moving the cursor up/down a single line, and since splitAt and (><) (concatenation) both have O(log(min(n1,n2))) complexity, you'll get O(log(L)) complexity for skipping L lines up/down.
You could have a similar zipper structure for CurLine to keep a sequence of characters before, at and after the cursor.
Line could be something space-efficient, such as ByteString.
I've implemented a zipper for this purpose for my vty-ui library. You can take a look here:
https://github.com/jtdaugherty/vty-ui/blob/master/src/Graphics/Vty/Widgets/TextZipper.hs
The Clojure community is looking at RRB Trees (Relaxed Radix Balanced) as a persistent data structure for vectors of data that can be efficiently concatenated / sliced / inserted etc.
It allows concatenation, insert-at-index and split operations in O(log N) time.
I imagine a RRB Tree specialised for character data would be perfectly suited for large "editable" text data structures.
The possibilities that spring to mind are:
The "Text" type with a numerical index. It keeps text in a linked list of buffers (internal representation is UTF16), so while in theory its computational complexity is usually that of a linked list (e.g. indexing is O(n)), in practice it's so much faster than a conventional linked list that you could probably just forget about the impact of n unless you are storing the whole of Wikipedia in your buffer. Try some experiments on 1 million character text to see if I'm right (something I haven't actually done, BTW).
A text zipper: store the text after the cursor in one text element, and the text before the cursor in another. To move the cursor transfer text from one side to the other.
The documentation for Coq carries the general admonition not to rely on the builtin naming mechanism, but select one's own names, lest the changes in the naming mechanism render past proofs invalid.
When considering expressions of the form remember Expr as v, we set the variable v to the expression Expr. But the name of the assumption is selected automatically, and is something like Heqv, so we have:
Heqv: v = Expr
How can I select my own name instead of Heqv? I can always rename it to whatever I like using the rename command, but that doesn't keep my proofs independent of the hypothetical future changes in the builtin naming mechanism in Coq.
If you can do without the separate equality, try set (name := val). Use unfold instead of rewrite to get the value back in place.
If you need the equality for more than the rewrite <-, I know of no built in tactic that does this. You can do it manually, though, or build a tactic / notation. I just threw this together. (Note: I'm not an expert, this might be done more easily.)
Tactic Notation "remember_as_eq" constr(expr) ident(vname) ident(eqname) :=
  let v := fresh in
  let HHelp := fresh in
  set (v := expr);
  (assert (HHelp : sigT (fun x => x = v)) by ( apply (existT _ v); reflexivity));
  inversion HHelp as [vname eqname];
  unfold v in *; clear v HHelp;
  rewrite <- eqname in *.
Use as remember_as_eq (2+2) four Heqfour to get the same result as with remember (2+2) as four.
Note: Updated to handle more cases, the old version failed on some combinations of value and goal type. Leave a comment if you find another case that works with rewrite but not this one.