How to iterate over items (keys, values) in Elisp hash-tables?
I created a hash-table (map, dictionary) using (make-hash-table) and populated it with different items. In Python I could iterate over dicts by:
for k in d # iterate over keys
for k in d.keys() # same
for k in d.values() # iterate over values
for k in d.items() # iterate over tuples (key, value)
How can I do the same in the most succinct and elegant way possible, preferably without the loop macro?
(maphash (lambda (key value) ....your code here...) hash-table)
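As a minimal sketch of how the one-liner above maps onto the three Python idioms from the question, here are helpers built only on maphash. The my- names are hypothetical and not part of Emacs, and the order of the returned lists should not be relied on:

```elisp
;; Hypothetical helpers built only on `maphash'; the my- names are
;; illustrative and not part of Emacs.
(defun my-hash-keys (table)
  "Return a list of the keys in TABLE (like Python's d.keys())."
  (let (keys)
    (maphash (lambda (key _value) (push key keys)) table)
    keys))

(defun my-hash-values (table)
  "Return a list of the values in TABLE (like Python's d.values())."
  (let (values)
    (maphash (lambda (_key value) (push value values)) table)
    values))

(defun my-hash-items (table)
  "Return an alist of (KEY . VALUE) pairs in TABLE (like d.items())."
  (let (items)
    (maphash (lambda (key value) (push (cons key value) items)) table)
    items))
```

With these, (dolist (pair (my-hash-items table)) ...) iterates over the pairs, with (car pair) as the key and (cdr pair) as the value.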
I'm going to advertise myself a bit, so take it with a grain of salt, but here are, basically, your options:
maphash: this is the built-in iteration primitive; fundamentally, there is no other way to do it.
(loop for KEY being the hash-keys of TABLE using (hash-values VALUE) ...) is available in the cl package. It uses maphash internally anyway, but it gives you a uniform syntax on top of the different iteration primitives. Note that the key and the value must be bound in a single clause with using; a loop cannot contain two separate for clauses that each iterate over a hash-table. You can use the loop macro to iterate over many different kinds of things, and it reduces clutter by hiding the technical details.
http://code.google.com/p/i-iterate/ Here's a library I'm working on to provide more versatile ways of iterating over different things and in different ways in Emacs Lisp. It is inspired by Common Lisp Iterate library, but it departed from it quite far (however, some basic principles still hold). If you were to try this library, the iteration over the hash-table would look like this: (++ (for (KEY VALUE) pairs TABLE) ...) or (++ (for KEY keys TABLE) ...) or (++ (for VALUE values TABLE) ...).
I will try to describe the pros and cons of using either cl loop or i-iterate.
Pros of i-iterate:
Unlike loop, iterate allows iterating over multiple hash-tables at once (but you must be aware of the additional cost it incurs: the keys of the second, third, etc. hash-tables must be collected into a list before iterating; this is done behind the scenes).
Iterate provides an arguably more Lisp-y syntax, which is easier to format in the editor.
With iterate you have more (and potentially even more in the future) options to combine iteration with other operations.
Cons of i-iterate:
No one else so far is using it besides myself :) It probably still has bugs and some things may be reworked, but it is near feature-freeze and is getting ready for proper use.
Significantly more people are familiar with either the built-in iteration primitives or the cl library.
Just as an aside, the full version of the iterate form on hash-tables looks like this: (for VAR pairs|keys|values TABLE &optional limit LIMIT), where LIMIT stands for the number of elements you want to look at (it will generate more efficient code than if you were to break from the loop using more general-purpose tools).
maphash is the function you want. In addition, I suggest looking at the manual: (info "(elisp) Hash Tables")
Since 2013 there has been a third-party library, ht, which provides many convenient functions for operating on Elisp hash-tables.
Suppose you have a hash-table, where keys are strings and values are integers. To iterate over a hash-table and return a list, use ht-map:
(ht-map (lambda (k v) (+ (length k) v)) table)
;; return list of all values added to length of their keys
ht-each is just an alias for maphash. There are also anaphoric versions of the above 2 functions, called ht-amap and ht-aeach. Instead of accepting an anonymous function, they expose variables key and value. Here's the equivalent expression to the one above:
(ht-amap (+ (length key) value) table)
I would have preferred to put this into a comment, but my reputation
rating ironically prevents me from writing this in the appropriate
format...
loop is considered deprecated and so is the cl library,
because it didn't adhere to the convention of prefixing all symbols by
a common library prefix and thus polluted the obarray with symbols
without clear library association.
Instead use cl-lib which defines the same functions and macros but
names them e.g. cl-loop and cl-defun instead of loop and
defun*. If you need only the macros, you can import cl-macs
instead.
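A minimal sketch of what this looks like for the hash-table question, using the prefixed cl-loop from cl-lib. The my- function names are hypothetical; the key and the value are bound in one clause via using:

```elisp
(require 'cl-lib)

;; Hypothetical helper names; only `cl-loop' itself comes from cl-lib.
(defun my-hash-pairs (table)
  "Return the entries of TABLE as a list of (KEY . VALUE) pairs."
  (cl-loop for key being the hash-keys of table
           using (hash-values value)
           collect (cons key value)))

(defun my-sum-values (table)
  "Return the sum of the (numeric) values in TABLE."
  (cl-loop for value being the hash-values of table
           sum value))
```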
Related
I have an SBCL hash table where the hash keys are symbols. If the hash table was made with eq, will calling gethash give random access to the elements? I know these details are implementation specific, but so far I haven't been able to find a clear answer in the documentation.
I assume (also from the discussion in the comments) that by "give random access" you mean that the distribution of elements in the hash-table will be random and hence it will have O(1) access performance. The answer is yes, it will be. There are some degraded cases like this one (Why does `sxhash` return a constant for all structs?) when the distribution becomes skewed, but this is definitely not it. For eq comparisons the implementations will use the address of an object for hashing. In the case of SBCL, here's the actual code:
(defun eq-hash (key)
  (declare (values hash (member t nil)))
  ;; I think it would be ok to pick off SYMBOL here and use its hash slot
  ;; as far as semantics are concerned, but EQ-hash is supposed to be
  ;; the lightest-weight in terms of speed, so I'm letting everything use
  ;; address-based hashing, unlike the other standard hash-table hash functions
  ;; which try use the hash slot of certain objects.
  (values (pointer-hash key)
          (sb-vm:is-lisp-pointer (get-lisp-obj-address key))))
However, you can also opt to use an eql hash-table (which I'd recommend: using eq should be reserved only for those who know what they are doing :) ). For this case, SBCL has a special function to hash symbols: symbol-hash. I assume other implementations also do something similar, since symbols are probably the most frequent type of hash-table key.
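To illustrate why eq tables call for some care, here is a small sketch. It is written in Emacs Lisp rather than SBCL (an assumption made to keep the examples on this page in one language), and the function name is hypothetical. Symbols are interned, so they work as eq keys; two strings with the same contents are distinct objects, so they do not:

```elisp
(defun my-eq-vs-equal-demo ()
  "Show how the hash-table test affects lookups by symbol and by string."
  (let ((by-eq (make-hash-table :test 'eq))
        (by-equal (make-hash-table :test 'equal))
        (key (copy-sequence "color")))
    (puthash 'color "red" by-eq)   ; symbol key in an eq table
    (puthash key "red" by-eq)      ; string key in an eq table
    (puthash key "red" by-equal)   ; string key in an equal table
    (list (gethash 'color by-eq)                       ; "red"
          (gethash (copy-sequence "color") by-equal)   ; "red": same contents
          (gethash (copy-sequence "color") by-eq))))   ; nil: different object
```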
Hash tables, by design, give O(1) access and update of their elements. It's not implementation specific.
Since hashing works differently from comparing, hash tables in standard CL are limited to the tests eq, eql (the default), equal, and equalp. In practice this only means that two objects considered the same by the chosen test will have the same hash value. SBCL lets you define your own hash functions, but that is not portable.
I've had to re-implement a particular function in pretty much every Lisp program I've ever written. Since this function is so useful, it must have been implemented before. I'd expect it to be well known. Perhaps it is part of Common Lisp's standard library. What is it called and what library is it from?
(defun unknown-function (predicate tree)
  (loop for item in tree
        if (funcall predicate item) collect item
        else if (listp item) append (unknown-function predicate item)))
It descends through a tree and creates a flat list of all the nodes in that tree that satisfy the predicate.
My original statement is wrong, because of the subtlety wherein the sublists are tested by the predicate before they are descended into. Here it is, for posterity:
There's no standard name for this. It's just a combination of flattening a list of lists and filtering out the elements that don't satisfy a predicate. In Common Lisp, there's no built-in flatten, but it would be a combination of your own flatten, and the standard remove-if-not.
This has a bit more in common with the subst family of functions, which do check the subtrees in addition to the leaves. However, they replace individual elements of the tree, rather than removing them completely. So there's something in common with subst-if and subst-if-not, but they're still not quite perfect matches.
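The subtlety mentioned above can be sketched as follows. This is an Emacs Lisp rendition using cl-lib (the question's code is Common Lisp), and all my- names are hypothetical. The two approaches agree when the predicate only matches atoms, and differ when it matches a sublist:

```elisp
(require 'cl-lib)

(defun my-tree-filter (predicate tree)
  "The question's function: test each item before descending into it."
  (cl-loop for item in tree
           if (funcall predicate item) collect item
           else if (listp item) append (my-tree-filter predicate item)))

(defun my-flatten (tree)
  "Return the leaves of TREE as a flat list."
  (cl-loop for item in tree
           if (consp item) append (my-flatten item)
           else collect item))

(defun my-flatten-then-filter (predicate tree)
  "Flatten TREE first, then keep only the leaves satisfying PREDICATE."
  (cl-remove-if-not predicate (my-flatten tree)))
```

For example, with the predicate numberp both return (1 2 3) for the tree (1 (2 "a" (3)) "b"). With the predicate consp and the tree (1 (2 3) 4), my-tree-filter returns ((2 3)), while my-flatten-then-filter returns nil, because the sublist no longer exists after flattening.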
What would be the canonical way for Emacs to represent JSON-like structures, or nested hashmaps?
I have a structure with approximately 25 top-level keys. Each key has no more than one sub-key (i.e. the value is another key/value element). Some of the final values are FIFO arrays.
I started to model this using hash-map, but it feels cumbersome. Now I just stumbled upon assoc-lists. What would be the most appropriate in my case?
Note: I intend to replicate parinfer in elisp, this part for now, and learn elisp at the same time.
You should use association lists (alists), which are the standard Emacs way of representing a map/dictionary/table. You see them in a lot of places: auto-mode-alist, minor-mode-alist, interpreter-mode-alist, etc. Hash tables are only meant for speed, when you have 1000+ entries.
There's even an official way to convert JSON to an assoc-list:
(json-read-from-string "{\"foo\": {\"bar\": 5}}")
=> ((foo (bar . 5)))
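Reading values back out of such a nested alist can be sketched with alist-get. The helper name my-nested-get is hypothetical, and the sketch assumes every intermediate value is itself an alist:

```elisp
(require 'json)

;; Hypothetical helper; assumes each intermediate value is an alist.
(defun my-nested-get (alist &rest keys)
  "Follow KEYS through nested ALIST and return the final value, or nil."
  (dolist (key keys alist)
    (setq alist (alist-get key alist))))

(defun my-nested-get-demo ()
  "Parse the JSON from the example above and fetch foo.bar."
  (my-nested-get (json-read-from-string "{\"foo\": {\"bar\": 5}}")
                 'foo 'bar))
```

Here (my-nested-get-demo) returns 5, and asking for a key that is not present returns nil.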
Is the values function in Common Lisp just syntactic sugar for packaging multiple values into a list that gets destructured by the caller? I am asking because I thought Common Lisp supports "true" multiple value return rather than returning a tuple or a list as in other languages, such as Python. Someone just told me that it's just syntactic sugar, so I would like someone to kindly explain it. To try to understand the type that is returned by the values function, I typed (type-of (values 1 2 3)), and the output was BIT. I searched in the Common Lisp reference for that and I couldn't find it mentioned in the datatypes section. Also, can anyone share some resources that suggest how the values function is implemented in Common Lisp? Thank you.
Multiple Values in CL
The language Common Lisp
is described in the ANSI standard INCITS 226-1994 (R2004) and has many
implementations.
Each can implement multiple values
as it sees fit, and they are allowed, of course, to cons up a list for them
(in fact, the Emacs Lisp compatibility layer for CL does just
that -
but it is, emphatically and intentionally, not a Common Lisp implementation).
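The Emacs Lisp case mentioned above can be checked directly. A small sketch, assuming cl-lib is available (the function name is hypothetical): in cl-lib, cl-values simply builds a list, and cl-floor returns its two results as a list.

```elisp
(require 'cl-lib)

(defun my-elisp-values-demo ()
  "Show that cl-lib represents multiple values as ordinary lists."
  (list
   (cl-values 1 2 3)                 ; a plain list: (1 2 3)
   (type-of (cl-values 1 2 3))       ; cons, not the type of 1
   (cl-floor 10 3)                   ; quotient and remainder: (3 1)
   (cl-multiple-value-bind (q r) (cl-floor 10 3)
     (+ (* q 3) r))))                ; 10
```

Note the contrast with the question: in a real Common Lisp, (type-of (values 1 2 3)) sees only the first value, whereas here type-of sees the whole list.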
Purpose
However, the intent of this facility is to enable passing (at least
some) multiple values without consing (i.e., without allocating
heap memory) and all CL
implementations I know of do that.
In this sense the multiple values facility is an optimization.
Of course, the implementation of this feature can be very different for
different platforms and scenarios. E.g., the first few (say, 20 -
required by the standard) are
stored in a static or thread-local vector, the next several (1000?) are
allocated on the stack, and the rest (if needed) are allocated on the
heap as a vector or list.
Usage
E.g., the function floor returns two values.
If you write
(setq a (floor 10 3))
you capture only the first one and discard the second one. You need to
write
(setf (values q r) (floor 10 3))
to capture both values. This is similar to what other
languages might express as
q,r = floor(10,3)
using tuples, except that CL does
not allocate memory to pass (just a few) multiple values, and the
other languages often do.
IOW, one can think of multiple values as an ephemeral struct.
Note that CL can convert multiple values to lists:
(destructuring-bind (q r) (multiple-value-list (floor 10 3))
  ; use q & r here
  ...)
instead of the more efficient and concise
(multiple-value-bind (q r) (floor 10 3)
  ; use q & r here
  ...)
MV & type
CL does not have a special type for the "multiple value object"
exactly because it does not allocate a separate object to pass
around multiple values. In this sense one can, indeed, claim that
values is syntactic sugar.
However, in CL one can declare a
function type returning
multiple values:
(declaim (ftype (function (real &optional real) (values real real)) floor))
This means that floor returns two values, both
reals (as opposed to returning
a value of type (values real real)), i.e., in this case one might
claim abuse of notation.
Your case
In your specific case, type-of
is an ordinary function (i.e., not a macro or special operator).
You pass it a single object, 1, because, unless you are using
multiple-value-bind and
friends, only the first value is used, so
(type-of (values 1 2 3))
is identical to
(type-of 1)
and type of 1 is bit.
PS: Control return values
One use of values is to
control the return values of a function.
Normally a CL function's return values are those of the last form.
Sometimes it is not desirable, e.g., the last form returns multiple
values and you want your function to return one value (or none,
like void in C):
(defun 2values (x y)
  (floor y x))
(defun 1value (x y)
  (values (floor y x)))
(defun no-values (x)
  (print x)
  (values))
The values function isn't just syntactic sugar for making a list for the caller to destructure.
For one, if the caller expects only a single value, it will get only one value (the first), not a list, from a form that returns multiple values. Since type-of takes only one value as an argument, it is giving you the type of the first value, 1. 1 is of type BIT.
Each Common Lisp implementation is free to pursue its own strategy for implementing multiple values. I learned a lot from what Frode Fjeld wrote about how his implementation, Movitz, handles it in The Movitz development platform, section 2.5.
If you make a CL implementation, you could implement it with lists as long as it conforms to the spec. You need to handle the single-value case specially, and you need some way to tag zero or 2..n values; the other functions need to understand that format, and print can be made to display it the same way as in other implementations.
Most likely values and its sister functions are an optimization, where implementations use the stack instead of consing the values into a list structure just to have it destructured at the next level. In the olden times, when RAM and CPU were not to be wasted, it was very important, but I doubt you'll notice real trouble should you use destructuring-bind instead of multiple-value-bind today.
Common Lisp differs from Scheme a great deal, in a positive direction, in that a function such as floor, whose calculation ends up with the remainder in addition to the quotient, can return all values at the same time while still letting you use it as if it returned only the very first value. I really miss that sometimes when writing Scheme, since it demands call-with-values (which is similar to multiple-value-call) or syntactic sugar like let-values to handle all the returned values, which in turn makes you end up writing three versions in case you need just one of the values.
Is it good style to use cons for pairs of things or would it be preferable to stick to lists?
like for instance questions and answers:
(list
 (cons
  "Favorite color?"
  "red")
 (cons
  "Favorite number?"
  "123")
 (cons
  "Favorite fruit?"
  "avocado"))
I mean, some things come naturally in pairs; there is no need for something that can hold more than two, so I feel like cons would be the natural choice. However, I also feel like I should be sticking to one thing (lists).
What would be the better or more accepted style?
What you have there is an association list (alist). Alist entries are, indeed, often simple conses rather than lists (though that is a matter of preference: some people use lists for alist entries too), so what you have is fine. Though, I usually prefer to use literal syntax:
'(("Favorite color?" . "red")
("Favorite number?" . "123")
("Favorite fruit?" . "avocado"))
Alists usually use a symbol as the key, because symbols are interned, and so symbol alists can be looked up using assq instead of assoc. Here's how it might look:
'((color . "red")
(number . "123")
(fruit . "avocado"))
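A short sketch of the difference between the two lookups, in Emacs Lisp (the function name is hypothetical). assq compares keys with eq, which is fine for symbols; string keys need assoc, which compares with equal:

```elisp
(defun my-alist-lookup-demo ()
  "Compare `assq' and `assoc' lookups on small alists."
  (let ((by-symbol '((color . "red") (number . "123") (fruit . "avocado")))
        (by-string '(("Favorite color?" . "red")
                     ("Favorite number?" . "123")))
        ;; A fresh string: `equal' to the key above, but not `eq' to it.
        (question (copy-sequence "Favorite color?")))
    (list (cdr (assq 'fruit by-symbol))      ; symbol keys: `assq' is enough
          (cdr (assoc question by-string))   ; string keys need `assoc'
          (assq question by-string))))       ; `assq' misses the string key
```

Calling (my-alist-lookup-demo) returns ("avocado" "red" nil).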
The default data structure for such a case should be a HASH-TABLE.
An association list of cons pairs is also a possible variant and was widely used historically. It is a valid variant, because of tradition and simplicity. But you should not use it, when the number of pairs exceeds several (probably, 10 is a good threshold), because search time is linear, while in hash-table it is constant.
Using a list for this task is also possible, but will be both ugly and inefficient.
You would need to decide for yourself based upon circumstances. There isn't a universal answer. Different tasks work differently with structures. Consider the following:
It is faster to search for keys in a hash-table than in an alist.
It is easier to have an iterator and save its state when working with an alist (a hash-table would need to export all of its keys as an array or a list and keep a pointer into that list, while for an alist it is enough to remember a pointer into the alist to be able to restore the iterator's state and continue the iteration).
Alist vs list: they use the same number of conses for an even number of elements, given that all the elements are atoms. When using a flat list instead of an alist, you would thus have to make sure there isn't an odd number of elements (and you may discover it too late), which is bad.
But there are a lot more functions, including the built-in ones, which work on proper lists and don't work on the dotted pairs of an alist. For example, nth will signal an error on a dotted pair if it reaches a cdr that is not a list.
Sometimes certain macros will not work with alists the way you'd like them to, for example, this:
(destructuring-bind (a b c d)
    '((100 . 200) (300 . 400))
  (format t "~&~{~s~^,~}" (list a b c d)))
will not work as you might've expected.
On the other hand, certain procedures may be "tricked" into doing something which they don't do for proper lists. For instance, when copying an alist with copy-list, only the conses that make up the top-level list (those whose cdr is a list) will be copied anew, while the key/value pairs themselves stay shared with the original (depending upon the circumstances this may be a desired result).
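The sharing described above can be sketched in Emacs Lisp, where copy-sequence plays the role of Common Lisp's copy-list for a list and copy-alist also copies the pairs. The function name is hypothetical, and this is an Emacs Lisp rendition rather than the Common Lisp the answer refers to:

```elisp
(defun my-copy-demo ()
  "Show which conses are shared after shallow and alist copies."
  (let* ((original (list (cons 'a 1) (cons 'b 2)))
         (shallow (copy-sequence original))  ; new spine, shared pairs
         (deep (copy-alist original)))       ; new spine and new pairs
    (setcdr (assq 'a shallow) 100)  ; also changes the pair in ORIGINAL
    (setcdr (assq 'b deep) 200)     ; ORIGINAL is unaffected
    original))
```

Calling (my-copy-demo) returns ((a . 100) (b . 2)): the change made through the shallow copy is visible in the original, the change made through the copy-alist copy is not.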