Emacs - represent JSON-like structures - emacs

What would be the canonical way for Emacs to represent JSON-like structures, or nested hashmaps?
I have a structure with approximately 25 top-level keys. Each key has no more than one sub-key (i.e. the value is another key/value element). Some of the final values are FIFO arrays.
I started to model this using hash-map, but it feels cumbersome. Now I just stumbled upon assoc-lists. Which would be the most appropriate in my case?
Note: I intend to replicate parinfer in elisp (just this part for now) and learn elisp at the same time.

You should use assoc-lists (alists), which are the standard Emacs way of representing a map/dictionary/table. You see them in a lot of places: auto-mode-alist, minor-mode-alist, interpreter-mode-alist, etc. Hash tables are mainly worth it for speed, when you have 1000+ entries.
There's even an official way to convert JSON to an assoc-list:
(require 'json)
(json-read-from-string "{\"foo\": {\"bar\": 5}}")
=> ((foo (bar . 5)))
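To read such a nested alist back, something along these lines should work. This is a minimal sketch, assuming Emacs 25 or later (where alist-get and let-alist are available); the variable name my-config is made up for illustration.

```elisp
(require 'json)
(require 'let-alist)

;; `my-config' is a hypothetical name used only for this example.
(defvar my-config (json-read-from-string "{\"foo\": {\"bar\": 5}}"))

;; Walk the nesting one level at a time with `alist-get'.
(alist-get 'bar (alist-get 'foo my-config))   ; => 5

;; `let-alist' gives dotted access to nested symbol keys.
(let-alist my-config .foo.bar)                ; => 5

;; `alist-get' is also a generalized place, so `setf' can update a
;; nested value in place.
(setf (alist-get 'bar (alist-get 'foo my-config)) 6)
```

With only one level of sub-keys, as in the question, two nested alist-get calls (or one let-alist form) are enough to reach any value.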

Related

Random access for hash table

I have an SBCL hash table where the hash keys are symbols. If the hash table was made with eq, will calling gethash give random access to the elements? I know these details are implementation specific, but so far I haven't been able to find a clear answer in the documentation.
I assume (also from the discussion in the comments) that by "give random access" you mean that the distribution of elements in the hash-table will be random and hence it will have O(1) access performance. The answer is yes, it will be. There are some degraded cases like this one (Why does `sxhash` return a constant for all structs?) when the distribution becomes skewed, but this is definitely not it. For eq comparisons the implementations will use the address of an object for hashing. In the case of SBCL, here's the actual code:
(defun eq-hash (key)
  (declare (values hash (member t nil)))
  ;; I think it would be ok to pick off SYMBOL here and use its hash slot
  ;; as far as semantics are concerned, but EQ-hash is supposed to be
  ;; the lightest-weight in terms of speed, so I'm letting everything use
  ;; address-based hashing, unlike the other standard hash-table hash functions
  ;; which try use the hash slot of certain objects.
  (values (pointer-hash key)
          (sb-vm:is-lisp-pointer (get-lisp-obj-address key))))
However, you can also opt to use an eql hash-table (which I'd recommend: using eq should be reserved only for those who know what they are doing :) ). For this case, SBCL has a special function to hash symbols: symbol-hash. I assume other implementations do something similar, since symbols are probably the most frequent type of hash-table key.
Hash tables, by design, give O(1) access and update of their elements. It's not implementation specific.
Since hashing works differently than comparing, the test for hash tables in standard CL is limited to eq, eql (default), equal, and equalp. In practice this only means that two values which the chosen test considers equal will have the same hash value. SBCL lets you define your own hash functions, but that is not portable.
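The effect of the table's test can be seen with a small experiment. The sketch below is written in Emacs Lisp rather than Common Lisp (make-hash-table takes a :test argument in both), and the variable names are hypothetical.

```elisp
;; Symbols are interned, so the same name always yields the same object
;; and an `eq' table finds it.
(defvar my-eq-table (make-hash-table :test 'eq))
(puthash 'apple 1 my-eq-table)
(gethash 'apple my-eq-table)                      ; => 1

;; Two strings with the same contents are distinct objects, so an `eq'
;; table misses where an `equal' table hits.
(defvar my-equal-table (make-hash-table :test 'equal))
(puthash "apple" 1 my-eq-table)
(puthash "apple" 1 my-equal-table)
(gethash (copy-sequence "apple") my-eq-table)     ; => nil
(gethash (copy-sequence "apple") my-equal-table)  ; => 1
```

For symbol keys, as in the question, eq and eql behave the same way, so the choice between them does not change which entries are found.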

Is there any benefit to using an obarray rather than a hash-table in Emacs Lisp?

I have an Emacs Lisp program that needs to keep track of a set of strings, use them for completion and test other strings for membership in the set. In most languages without a built-in set type, I would use a dictionary or hash table with a dummy t or 1 value for this, but it occurred to me that Elisp's obarray type could also serve the purpose, with intern, intern-soft and unintern taking the place of puthash, gethash and remhash.
(I know about the cl-lib functions which operate on lists as sets, but those are not particularly relevant for this problem, which only needs a set membership test).
Is there any advantage (in speed, memory usage or otherwise) in using an obarray rather than a hash table in a modern Emacs, or are obarrays other than the main symbol table more of a leftover from before Emacs Lisp had a separate hash-table type?
Since both work, it's to a large extent a question of taste or performance.
In terms of memory usage (counted in words), an obarray uses 1 array of fixed size N plus one symbol per entry (of size 6), whereas a hash-table has a size that is more or less 5 per element plus a bit more. So memorywise, it's a wash.
In terms of speed, I don't know anyone who has bothered to measure it, so it's probably not a big issue either.
IOW, it's a question of taste. FWIW, I prefer hash tables which offer more options; obarrays are largely a historical accident in my view.
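For comparison, here is the same string set built both ways. This is only a sketch, assuming Emacs 25 or later for obarray-make; the variable names are invented for the example.

```elisp
(require 'obarray)

;; Hypothetical sample data.
(defvar my-words '("apple" "apricot" "banana"))

;; Hash table as a set: string keys with a dummy value of t.
(defvar my-hash-set (make-hash-table :test 'equal))
(dolist (w my-words) (puthash w t my-hash-set))

;; Obarray as a set: each member is interned as a symbol.
(defvar my-obarray-set (obarray-make))
(dolist (w my-words) (intern w my-obarray-set))

;; Membership tests.
(gethash "apple" my-hash-set)            ; non-nil
(intern-soft "apple" my-obarray-set)     ; the symbol, hence non-nil
(intern-soft "cherry" my-obarray-set)    ; => nil

;; Both kinds of object are accepted as completion collections.
(all-completions "ap" my-hash-set)
(all-completions "ap" my-obarray-set)

;; Removal.
(remhash "banana" my-hash-set)
(unintern "banana" my-obarray-set)
```

The order of the lists returned by all-completions is not something to rely on, so sort them if the order matters.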

Retrieve keys from hash-table, sorted by the values, efficiently

I'm using Emacs Lisp, but have the cl package loaded, for some common lisp features.
I have a hash table containing up to 50K entries, with integer keys mapped to triplets, something like this (but in actual lisp):
{
8 => '(9 300 12)
27 => '(5 125 9)
100 => '(10 242 14)
}
The second value in the triplet is a score that has been calculated during a complex algorithm that built the hash-table. I need to collect a regular lisp list with all of the keys from the hash, ordered by the score (i.e. all keys ordered by the cadr of the value).
So for the above, I need this list:
'(27 100 8)
I'm currently doing this with two phases, which feels less efficient than it needs to be.
Is there a good way to do this?
My current solution uses maphash to collect the keys and the values into two new lists, then does a sort in the normal way, referring to the list of scores in the predicate. It feels like I could be combining the collection and the sorting together, however.
EDIT | I'm also not attached to using hash-table, though I do need constant access time for the integer keys, which are not linearly spaced.
EDIT 2| It looks like implementing a binary tree sort could work, where the labels in the tree are the scores and the values are the keys... this way I'm doing the sort as I map over the hash.
... Continues reading wikipedia page on sorting algorithms
Basically, your solution is correct: you need to collect the keys into a list:
(defun hash-table-keys (hash-table)
  (let ((keys ()))
    ;; The value is not needed here, hence the underscore prefix.
    (maphash (lambda (k _v) (push k keys)) hash-table)
    keys))
and then sort the list:
;; cadr picks the score (second element) without needing the cl package.
(sort (hash-table-keys hash-table)
      (lambda (k1 k2)
        (< (cadr (gethash k1 hash-table))
           (cadr (gethash k2 hash-table)))))
Combining key collection with sorting is possible: you need to collect the keys into a tree and then "flatten" the tree. However, this will only matter if you are dealing with really huge tables. Also, since Emacs Lisp compiles to bytecodes, you might find that using the sort built-in is still faster than using a tree. Consider also the development cost - you will need to write code whose value will be mostly educational.
Delving deeper, collecting the keys allocates the list of keys (which you will certainly need anyway for the result) and sort operates "in-place", so the "simple way" is about as good as it gets.
The "tree" way will allocate the tree (the same memory footprint as the required list of keys) and populating and flattening it will be the same O(n*log(n)) process as the "collect+sort" way. However, keeping the tree balanced, and then flattening it "in-place" (i.e., without allocating a new list) is not a simple exercise.
The bottom line is: KISS.
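One small variation on the simple approach is to record each score next to its key while mapping over the table, so the sort predicate does not have to call gethash. The following is a sketch under that assumption; my-keys-by-score and my-score-table are hypothetical names, and the sample data is taken from the question.

```elisp
(defun my-keys-by-score (table)
  "Return the keys of TABLE sorted by ascending score.
The score is assumed to be the second element of each value."
  (let (pairs)
    ;; One pass over the table: remember each key next to its score.
    (maphash (lambda (key value)
               (push (cons (cadr value) key) pairs))
             table)
    ;; `sort' is destructive, which is fine for the fresh list built above.
    (mapcar #'cdr
            (sort pairs (lambda (a b) (< (car a) (car b)))))))

(defvar my-score-table (make-hash-table :test 'eql))
(puthash 8   '(9 300 12)  my-score-table)
(puthash 27  '(5 125 9)   my-score-table)
(puthash 100 '(10 242 14) my-score-table)

(my-keys-by-score my-score-table)   ; => (27 100 8)
```

This is still collect-then-sort with the same O(n*log(n)) cost; it only trades one extra cons per entry for cheaper comparisons.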

Iterate over Emacs Lisp hash-table

How to iterate over items (keys, values) in Elisp hash-tables?
I created a hash-table (map, dictionary) using (make-hash-table) and populated it with different items. In Python I could iterate over dicts by:
for k in d: ...             # iterate over keys
for k in d.keys(): ...      # same
for v in d.values(): ...    # iterate over values
for k, v in d.items(): ...  # iterate over tuples (key, value)
How can I do the same in the most succinct and elegant way possible, preferably without the loop macro?
(maphash (lambda (key value) ....your code here...) hash-table)
I'm going to advertise myself a bit, so take it with a grain of salt, but here are, basically, your options:
maphash - this is the built-in iteration primitive, fundamentally, no more ways to do it exist.
(loop for KEY being the hash-key of TABLE for VALUE being the hash-value of TABLE ...) is available in cl package. It will internally use maphash anyway, but it offers you some unification on top of different iterating primitives. You can use loop macro to iterate over multiple different things, and it reduces the clutter by removing the technical info from sight.
http://code.google.com/p/i-iterate/ Here's a library I'm working on to provide more versatile ways of iterating over different things and in different ways in Emacs Lisp. It is inspired by Common Lisp Iterate library, but it departed from it quite far (however, some basic principles still hold). If you were to try this library, the iteration over the hash-table would look like this: (++ (for (KEY VALUE) pairs TABLE) ...) or (++ (for KEY keys TABLE) ...) or (++ (for VALUE values TABLE) ...).
I will try to describe the pros and cons of using either cl loop or i-iterate.
Unlike loop, iterate allows iterating over multiple hash-tables at once (but you must be aware of the additional cost it incurs: the keys of the second, third etc. hash-tables must be collected into a list before iterating, this is done behind the scenes).
Iterate provides arguably more Lisp-y syntax, which is easier to format in the editor.
With iterate you have more (and potentially even more in the future) options to combine iteration with other operations.
No one else so far is using it besides myself :) It probably still has bugs and some things may be reworked, but it is near feature-freeze and is getting ready for proper use.
Significantly more people are familiar with either the built-in iteration primitives or the cl library.
Just as an aside, the full version of the iterate on hash-tables looks like this: (for VAR pairs|keys|values TABLE &optional limit LIMIT), where LIMIT stands for the number of elements you want to look at (it will generate more efficient code than if you were to break from the loop using more general-purpose tools).
maphash is the function you want. In addition, I would suggest you look at the manual (info "(elisp) Hash Tables")
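As a rough counterpart to the Python loops in the question, keys, values and pairs can each be collected with maphash. This is a sketch only; the helper names and the sample table are hypothetical.

```elisp
;; Hypothetical sample table.
(defvar my-iter-table (make-hash-table :test 'equal))
(puthash "one" 1 my-iter-table)
(puthash "two" 2 my-iter-table)

(defun my-hash-keys (table)
  "Return a list of the keys of TABLE."
  (let (keys)
    (maphash (lambda (k _v) (push k keys)) table)
    keys))

(defun my-hash-values (table)
  "Return a list of the values of TABLE."
  (let (values)
    (maphash (lambda (_k v) (push v values)) table)
    values))

(defun my-hash-items (table)
  "Return a list of (KEY . VALUE) pairs of TABLE."
  (let (items)
    (maphash (lambda (k v) (push (cons k v) items)) table)
    items))
```

The order of the returned lists should not be relied upon; sort them if a particular order is needed.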
Starting from 2013 there is a third-party library ht, which provides many convenient functions to operate on Elisp hash-tables.
Suppose you have a hash-table, where keys are strings and values are integers. To iterate over a hash-table and return a list, use ht-map:
(ht-map (lambda (k v) (+ (length k) v)) table)
;; return list of all values added to length of their keys
ht-each is just an alias for maphash. There are also anaphoric versions of the above 2 functions, called ht-amap and ht-aeach. Instead of accepting an anonymous function, they expose variables key and value. Here's the equivalent expression to the one above:
(ht-amap (+ (length key) value) table)
I would have preferred to put this into a comment, but my reputation rating ironically prevents me from writing this in the appropriate format...
loop is considered deprecated and so is the cl library, because it didn't adhere to the convention of prefixing all symbols by a common library prefix and thus polluted the obarray with symbols without clear library association.
Instead use cl-lib which defines the same functions and macros but names them e.g. cl-loop and cl-defun instead of loop and defun*. If you need only the macros, you can import cl-macs instead.
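With cl-lib loaded, the hash-table clauses of the loop macro look like this. The sketch below uses made-up function and variable names.

```elisp
(require 'cl-lib)

;; Hypothetical sample table.
(defvar my-loop-table (make-hash-table :test 'equal))
(puthash "one" 1 my-loop-table)
(puthash "two" 2 my-loop-table)

(defun my-loop-pairs (table)
  "Collect (KEY . VALUE) pairs of TABLE with `cl-loop'."
  (cl-loop for k being the hash-keys of table using (hash-values v)
           collect (cons k v)))

(defun my-loop-sum (table)
  "Sum the values of TABLE, which are assumed to be numbers."
  (cl-loop for v being the hash-values of table
           sum v))

(my-loop-sum my-loop-table)   ; => 3
```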

good style in lisp: cons vs list

Is it good style to use cons for pairs of things or would it be preferable to stick to lists?
like for instance questions and answers:
(list
 (cons
  "Favorite color?"
  "red")
 (cons
  "Favorite number?"
  "123")
 (cons
  "Favorite fruit?"
  "avocado"))
I mean, some things come naturally in pairs; there is no need for something that can hold more than two, so I feel like cons would be the natural choice. However, I also feel like I should be sticking to one thing (lists).
What would be the better or more accepted style?
What you have there is an association list (alist). Alist entries are, indeed, often simple conses rather than lists (though that is a matter of preference: some people use lists for alist entries too), so what you have is fine. Though, I usually prefer to use literal syntax:
'(("Favorite color?" . "red")
  ("Favorite number?" . "123")
  ("Favorite fruit?" . "avocado"))
Alists usually use a symbol as the key, because symbols are interned, and so symbol alists can be looked up using assq instead of assoc. Here's how it might look:
'((color . "red")
  (number . "123")
  (fruit . "avocado"))
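The difference between the two lookup functions can be sketched as follows, here in Emacs Lisp (the variable names are hypothetical, and alist-get assumes Emacs 25 or later):

```elisp
(defvar my-symbol-alist
  '((color . "red") (number . "123") (fruit . "avocado")))

(defvar my-string-alist
  '(("Favorite color?" . "red")
    ("Favorite number?" . "123")
    ("Favorite fruit?" . "avocado")))

;; Symbol keys can be compared with `eq', so `assq' is enough.
(cdr (assq 'color my-symbol-alist))                        ; => "red"

;; String keys need `assoc', which compares with `equal'.
(cdr (assoc "Favorite color?" my-string-alist))            ; => "red"

;; `assq' misses a string that is equal but not the same object.
(assq (copy-sequence "Favorite color?") my-string-alist)   ; => nil

;; `alist-get' uses `assq' by default and returns the value directly.
(alist-get 'fruit my-symbol-alist)                         ; => "avocado"
```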
The default data-structure for such a case should be a HASH-TABLE.
An association list of cons pairs is also a possible variant and was widely used historically. It is a valid variant, because of tradition and simplicity. But you should not use it, when the number of pairs exceeds several (probably, 10 is a good threshold), because search time is linear, while in hash-table it is constant.
Using a list for this task is also possible, but will be both ugly and inefficient.
You would need to decide for yourself based upon circumstances. There isn't a universal answer. Different tasks work differently with structures. Consider the following:
It is faster to search a hash-table for keys than it is to search an alist.
It is easier to have an iterator and save its state when working with an alist (a hash-table would need to export all of its keys as an array or a list and keep a pointer into that list, while it is enough to remember only the pointer into the alist to be able to restore the iterator's state and continue the iteration).
Alist vs list: they use the same number of conses for an even number of elements, given that all the elements are atoms. When using lists instead of alists you would thus have to make sure there isn't an odd number of elements (and you may discover it too late), which is bad.
But there are a lot more functions, including the built-in ones, which work on proper lists, and don't work on alists. For example, nth will error on alist, if it hits the cdr, which is not a list.
Sometimes certain macros will not function as you'd like them to with alists, for example, this:
(destructuring-bind (a b c d)
    '((100 . 200) (300 . 400))
  (format t "~&~{~s~^,~}" (list a b c d)))
will not work as you might've expected.
On the other hand, certain procedures may be "tricked" into doing something which they don't do for proper lists. For instance, when copying an alist with copy-list, only the top-level conses (those whose cdr is a list) are copied anew, while the key/value pairs themselves are shared with the original (depending upon the circumstances this may be the desired result).
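The last three points can be tried out with a short experiment. The sketch below is an Emacs Lisp analogue of the Common Lisp snippets above, not the same code: it uses cl-destructuring-bind from cl-lib and copy-sequence in place of copy-list, and all my- names are hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical sample alist built from fresh conses.
(defvar my-pairs (list (cons 100 200) (cons 300 400)))

(defun my-nth-on-pair ()
  "Call `nth' past the car of a dotted pair and return the error symbol."
  (condition-case err
      (nth 1 (car my-pairs))          ; the cdr, 200, is not a list
    (wrong-type-argument (car err))))

(defun my-destructure-pairs ()
  "Destructure `my-pairs' with a pattern that mirrors the dotted pairs."
  (cl-destructuring-bind ((a . b) (c . d)) my-pairs
    (list a b c d)))

(my-nth-on-pair)         ; => wrong-type-argument
(my-destructure-pairs)   ; => (100 200 300 400)

;; `copy-sequence' copies only the top-level conses, so the pairs are
;; shared with the original; `copy-alist' copies the pairs as well.
(eq (car my-pairs) (car (copy-sequence my-pairs)))   ; => t
(eq (car my-pairs) (car (copy-alist my-pairs)))      ; => nil
```

Whether sharing the pairs is acceptable depends on whether the copy will later be modified with setcdr or similar destructive operations.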