I have an SBCL hash table where the hash keys are symbols. If the hash table was made with eq, will calling gethash give random access to the elements? I know these details are implementation specific, but so far I haven't been able to find a clear answer in the documentation.
I assume (also from the discussion in the comments) that by "give random access" you mean that the distribution of elements in the hash-table will be random and hence it will have O(1) access performance. The answer is yes, it will be. There are some degraded cases like this one (Why does `sxhash` return a constant for all structs?) when the distribution becomes skewed, but this is definitely not it. For eq comparisons the implementations will use the address of an object for hashing. In the case of SBCL, here's the actual code:
(defun eq-hash (key)
(declare (values hash (member t nil)))
;; I think it would be ok to pick off SYMBOL here and use its hash slot
;; as far as semantics are concerned, but EQ-hash is supposed to be
;; the lightest-weight in terms of speed, so I'm letting everything use
;; address-based hashing, unlike the other standard hash-table hash functions
;; which try use the hash slot of certain objects.
(values (pointer-hash key)
(sb-vm:is-lisp-pointer (get-lisp-obj-address key))))
However, you can also opt to use an eql hash-table (which I'd recommend: using eq should be reserved only for those who know what they are doing :) ). For this case, SBCL has a special function to hash symbols: symbol-hash. I assume, other implementation also do something similar, for symbol is, probably, the most frequent type of hash-table keys.
Hash tables, by design, give O(1) access and update of their elements. It's not implementation specific.
Since hashing works differently than comparing hash tables in standard CL is limited to eq, eql (default), equal, and equalp. In reality this only means the hash value for two values considered by one of these to be true will have the same hash value. SBCL lets you define hash functions but that is not portable.
Related
I'd like to compare the code contents of two syntax objects and ignore things like contexts. Is converting them to datum the only way to do so? Like:
(equal? (syntax->datum #'(x+1)) (syntax->datum #'(x+1)))
If you want to compare both objects without deconstructing them at all, then yes.
HOWEVER, the problem with this method is that it only compares the datum attached to two syntax objects, and won't actually compare their binding information.
The analogy that I've heard (from Ryan Culpepper), is this is kind of like taking two paintings, draining of them of their color, and seeing if they are identical. While they might be similar in some ways, you will miss a lot of differences from the different colors.
A better approach (although it does require some work), is to use syntax-e to destruct the syntax object into more primitive lists of syntax objects, and do this until you get identifiers (basically a syntax object whose datum is a symbol), from there, you can generally use free-identifier=? (and sometimes bound-identifier=? to see if each identifier can bind each other, and identifier-binding to compare module level identifiers.
The reason why there isn't a single simple predicate to compare two arbitrary syntax objects is because, generally, there isn't really one good definition for what makes two pieces of code equal, even if you only care about syntactic equality. For example, using the functions referenced above doesn't track internal bindings in a syntax object, so you will still get a very strict definition of what it means to be 'equal'. that is, both syntax objects have the same structure with identifiers that are either bound to the same module, or are free-identifier=?.
As such, before you use this answer, I highly recommend you take a step back and make sure this is really what you want to do. Once in a blue moon it is, but most of the time you actually are trying to solve a similar, yet simpler, problem.
Here's a concrete example of one possible way you could do the "better approach" Leif Andersen mentioned.
I have used this in multiple places for testing purposes, though if anyone wanted to use it in non-test code, they would probably want to re-visit some of the design decisions.
However, things like the equal?/recur pattern used here should be helpful no matter how you decide to define what equality means.
Some of the decisions you might want to make different choices on:
On identifiers, do you want to check that the scopes are exactly the same (bound-identifier=?), or would you want to assume that they would be bound outside of the syntax object and check that they are bound to the same thing, even if they have different scopes (free-identifier=?)? Note that if you choose the first one, then checking the results of macro expansion will sometimes return #false because of scope differences, but if you choose the second one, then if any identifier is not bound outside of the syntax object, then it would be as if you only care about symbol=? equality on names, so it will return #true in some places where it shouldn't. I chose the first one bound-identifier=? here because for testing, a "false positive" where the test fails is better than a "false negative" where the tests succeeds in cases it shouldn't.
On source locations, do you want to check that they are equal, or do you want to ignore them? This code ignores them because it's only for testing purposes, but if you want equality only for things which have the same source location, you might want to check that using functions like build-source-location-list.
On syntax properties, do you want to check that they are equal, or do you want to ignore them? This code ignores them because it's only for testing purposes, but if you want to check that you might want to use functions like syntax-property-symbol-keys.
Finally here is the code. It may not be exactly what you want depending on how you answered the questions above. However, its structure and how it uses equal?/recur might be helpful to you.
(require rackunit)
;; Works on fully wrapped, non-wrapped, and partially
;; wrapped values, and it checks that the inputs
;; are wrapped in all the same places. It checks scopes,
;; but it does not check source location.
(define-binary-check (check-stx=? stx=? actual expected))
;; Stx Stx -> Bool
(define (stx=? a b)
(cond
[(and (identifier? a) (identifier? b))
(bound-identifier=? a b)]
[(and (syntax? a) (syntax? b))
(and (bound-identifier=? (datum->syntax a '||) (datum->syntax b '||))
(stx=? (syntax-e a) (syntax-e b)))]
[else
(equal?/recur a b stx=?)]))
I have an Emacs Lisp program that needs to keep track of a set of strings, use them for completion and test other strings for membership in the set. In most languages without a built-in set type, I would use a dictionary or hash table with a dummy t or 1 value for this, but it occurred to me that Elisp's obarray type could also serve the purpose, with intern, intern-soft and unintern taking the place of puthash, gethash and remhash.
(I know about the cl-lib functions which operate on lists as sets, but those are not particularly relevant for this problem, which only needs a set membership test).
Is there any advantage (in speed, memory usage or otherwise) in using an obarray rather than a hash table in a modern Emacs, or are obarrays other than the main symbol table more of a leftover from before Emacs Lisp had a separate hash-table type?
Since both work, it's to a large extent a question of taste or performance.
In terms of memory usage (counted in words), an obarray uses 1 array of fixed size N plus one symbol per entry (of size 6), whereas a hash-table has a size that is more or less 5 per element plus a bit more. So memorywise, it's a wash.
In terms of speed, I don't know anyone who has bothered to measure it, so it's probably not a big issue either.
IOW, it's a question of taste. FWIW, I prefer hash tables which offer more options; obarrays are largely a historical accident in my view.
Is the values function in Common Lisp just syntactic sugar for packaging multiple values into a list that gets destructured by the caller?. I am asking because I thought Common Lisp supports "true" multiple value return rather than returning a tuple or a list as in other languages, such as python. Someone just told me that it's just syntactic sugar, so I would like someone to kindly explain it. To try to understand the type that is returned by the values function, I typed (type-of (values 1 2 3)), and the output was BIT. I searched in the Common Lisp reference for that and I couldn't find it mentioned in the datatypes section. Also, can anyone share some resources that suggest how the values function is implemented in Common Lisp?. Thank you.
Multiple Values in CL
The language Common lisp
is described in the ANSI standard INCITS 226-1994 (R2004) and has many
implementations.
Each can implement multiple values
as it sees fit, and they are allowed, of course, to cons up a list for them
(in fact, the Emacs Lisp compatibility layer for CL does just
that -
but it is, emphatically and intentionally, not a Common Lisp implementation).
Purpose
However, the intent of this facility is to enable passing (at least
some) multiple values without consing (i.e., without allocating
heap memory) and all CL
implementations I know of do that.
In this sense the multiple values facility is an optimization.
Of course, the implementation of this feature can be very different for
different platforms and scenarios. E.g., the first few (say, 20 -
required by the standard) are
stored in a static of thread-local vector, the next several (1000?) are
allocated on the stack, and the rest (if needed) are allocated on the
heap as a vector or list.
Usage
E.g., the function floor returns two values.
If you write
(setq a (floor 10 3))
you capture only the first one and discard the second one, you need to
write
(setf (values q r) (floor 10 3))
to capture both values. This is similar to what other
languages might express as
q,r = floor(10,3)
using tuples, except that CL does
not allocate memory to pass (just a few) multiple values, and the
other languages often do.
IOW, one can think of multiple values as an ephemeral struct.
Note that CL can convert multiple values to lists:
(destructuring-bind (q r) (multiple-value-list (floor 10 3))
; use q & r here
...)
instead of the more efficient and concise
(multiple-value-bind (q r) (floor 10 3)
; use q & r here
...)
MV & type
CL does not have a special type for the "multiple value object"
exactly because it does not allocate a separate object to pass
around multiple values. In this sense one can, indeed, claim that
values is syntactic sugar.
However, in CL one can declare a
function type returning
multiple values:
(declaim (ftype (real &optional real) (values real real)) floor)
This means that floor returns two values, both
reals (as opposed to returning
a value of type (values real real)), i.e., in this case one might
claim abuse of notation.
Your case
In your specific case, type-of
is an ordinary function (i.e., not a macro or special operator).
You pass it a single object, 1, because, unless you are using
multiple-value-bind and
friends, only the first value is used, so
(type-of (values 1 2 3))
is identical to
(type-of 1)
and type of 1 is bit.
PS: Control return values
One use of values is to
control the return values of a function.
Normally a CL function's return values are those of the last form.
Sometimes it is not desirable, e.g., the last form return multiple
values and you want your function to return one value (or none,
like void in C):
(defun 2values (x y)
(floor y x))
(defun 1value (x y)
(values (floor y x)))
(defun no-values (x)
(print x)
(values))
The values function isn't just syntactic sugar for making a list for the caller to destructure.
For one, if the caller expects only a single value, it will get only one value (the first), not a list, from a form that returns multiple values. Since type-of takes only one value as an argument, it is giving you the type of the first value, 1. 1 is of type BIT.
Each Common Lisp implementation is free to pursue its own strategy for implementing multiple values. I learned a lot from what Frode Fjeld wrote about how his implementation, Movitz, handles it in The Movitz development platform, section 2.5.
If you make a CL implementation you could implement it with lists as long as it coheres to the spec. You need to handle one value specific and you need some way to tag zero, 2..n values and the other functions need to understand that format and print can be made to display it the same way as in other makes.
Most likely values and its sister functions is an optimization where the implementations use the stack instead of consing the values to a list structure just to get it destructured in the next level. In the olden times where RAM and CPU was not to be wasted it was very important, but I doubt you'll notice real trouble should you use destructuring-bind instead of multiple-value-bind today.
Common Lisp differs from Scheme a great deal in the positive direction that you can make a function, eg. floor where in it's calculations end up with the remainder in addition to the quotient answer, return all values at the same time but you are allowed to use it as if it only returned the very first value. I really miss that sometimes when writing Scheme since it demands you have a call-with-values that is similar to multiple-value-call or syntactic sugar like let-values to handle all the returned values that again makes you end up with making three versions in case you only need just one of the values.
How to iterate over items (keys, values) in Elisp hash-tables?
I created a hash-table (map, dictionary) using (make-hash-table) and populated it with different items. In Python i could iterate over dicts by:
for k in d # iterate over keys
for k in d.keys() # same
for k in d.values() # iterate over values
for k in d.items() # iterate over tuples (key, value)
How can i do the same the most succinct and elegant way possible, preferably without loop-macro?
(maphash (lambda (key value) ....your code here...) hash-table)
I'm going to advertise myself a bit, so take it with a grain of salt, but here are, basically, your options:
maphash - this is the built-in iteration primitive, fundamentally, no more ways to do it exist.
(loop for KEY being the hash-key of TABLE for VALUE being the hash-value of TABLE ...) is available in cl package. It will internally use maphash anyway, but it offers you some unification on top of different iterating primitives. You can use loop macro to iterate over multiple different things, and it reduces the clutter by removing the technical info from sight.
http://code.google.com/p/i-iterate/ Here's a library I'm working on to provide more versatile ways of iterating over different things and in different ways in Emacs Lisp. It is inspired by Common Lisp Iterate library, but it departed from it quite far (however, some basic principles still hold). If you were to try this library, the iteration over the hash-table would look like this: (++ (for (KEY VALUE) pairs TABLE) ...) or (++ (for KEY keys TABLE) ...) or (++ (for VALUE values TABLE) ...).
I will try to describe cons and pros of using either cl loop or i-iterate.
Unlike loop, iterate allows iterating over multiple hash-tables at once (but you must be aware of the additional cost it incurs: the keys of the second, third etc. hash-tables must be collected into a list before iterating, this is done behind the scenes).
Iterate provides arguably more Lisp-y syntax, which is easier to format in the editor.
With iterate you have more (and potentially even more in the future) options to combine iteration with other operations.
No one else so far is using it, beside myself :) It probably still has bugs and some things may be reworked, but it is near feature-freeze and is getting ready for proper use.
Significantly more people are familiar with either the built-in iteration primitives or the cl library.
Just as an aside, the full version of the iterate on hash-tables looks like this: (for VAR pairs|keys|values TABLE &optional limit LIMIT), where LIMIT stands for the number of element you want to look at (it will generate more efficient code, then if you were to break from the loop using more general-purpose tools).
maphash is the function you want. In addition I would suggest you to look at the manual (info "(elisp) Hash Tables")
Starting from 2013 there is a third-party library ht, which provides many convenient functions to operate on Elisp hash-tables.
Suppose you have a hash-table, where keys are strings and values are integers. To iterate over a hash-table and return a list, use ht-map:
(ht-map (lambda (k v) (+ (length k) v)) table)
;; return list of all values added to length of their keys
ht-each is just an alias for maphash. There are also anaphoric versions of the above 2 functions, called ht-amap and ht-aeach. Instead of accepting an anonymous function, they expose variables key and value. Here's the equivalent expression to the one above:
(ht-amap (+ (length key) value) table)
I would have preferred to put this into a comment, but my reputation
rating ironically prevents me from writing this in the appropriate
format...
loop is considered deprecated and so is the cl library,
because it didn't adhere to the convention of prefixing all symbols by
a common library prefix and thus polluted the obarray with symbols
without clear library association.
Instead use cl-lib which defines the same functions and macros but
names them e.g. cl-loop and cl-defun instead of loop and
defun*. If you need only the macros, you can import cl-macs
instead.
Is it good style to use cons for pairs of things or would it be preferable to stick to lists?
like for instance questions and answers:
(list
(cons
"Favorite color?"
"red")
(cons
"Favorite number?"
"123")
(cons
"Favorite fruit?"
"avocado"))
I mean, some things come naturally in pairs; there is no need for something that can hold more than two, so I feel like cons would be the natural choice. However, I also feel like I should be sticking to one thing (lists).
What would be the better or more accepted style?
What you have there is an association list (alist). Alist entries are, indeed, often simple conses rather than lists (though that is a matter of preference: some people use lists for alist entries too), so what you have is fine. Though, I usually prefer to use literal syntax:
'(("Favorite color?" . "red")
("Favorite number?" . "123")
("Favorite fruit?" . "avocado"))
Alists usually use a symbol as the key, because symbols are interned, and so symbol alists can be looked up using assq instead of assoc. Here's how it might look:
'((color . "red")
(number . "123")
(fruit . "avocado"))
The default data-structure for such case should be a HASH-TABLE.
An association list of cons pairs is also a possible variant and was widely used historically. It is a valid variant, because of tradition and simplicity. But you should not use it, when the number of pairs exceeds several (probably, 10 is a good threshold), because search time is linear, while in hash-table it is constant.
Using a list for this task is also possible, but will be both ugly and inefficient.
You would need to decide for yourself based upon circumstances. There isn't a universal answer. Different tasks work differently with structures. Consider the following:
It is faster to search in a hash-table for keys, then it is in the alist.
It is easier to have an iterator and save its state, when working with alist (hash-table would need to export all of its keys as an array or a list and have a pointer into that list, while it is enough to only remember the pointer into alist to be able to restore the iterator's state and continue the iteration.
Alist vs list: they use the same amount of conses for even number of elements, given all other characters are atoms. When using lists vs alists you would have to thus make sure there isn't an odd number of elements (and you may discover it too late), which is bad.
But there are a lot more functions, including the built-in ones, which work on proper lists, and don't work on alists. For example, nth will error on alist, if it hits the cdr, which is not a list.
Some times certain macros would not function as you'd like them to with alists, for example, this:
(destructuring-bind (a b c d)
'((100 . 200) (300 . 400))
(format t "~&~{~s~^,~}" (list a b c d)))
will not work as you might've expected.
On the other hand, certain procedures may be "tricked" into doing something which they don't do for proper lists. For instance, when copying an alist with copy-list, only the conses, whose cdr is a list will be copied anew (depending upon the circumstances this may be a desired result).