I am doing some text processing, part of it is splitting words into single characters. Every character gets interned as a symbol in upper case with some frequency value assigned to it just for the sake of easiness but as one might imagine, there is a stumbling block in form of the T constant.
The solution I am looking at now is to simply use a lowercase symbol instead of upper case T, however I am wondering if there would be a quick and easy way to verify if the symbol at hand is T.
All I can think of is:
(intern (if (string= "T" (symbol-name symbol)) #\t symbol)
but that just does not look nice since string comparison is not cumbersome. Any ideas?
PS. I need all the symbols in upper case since it is less hassle to evaluate them in listener but I can live with one lowercase t.
You should use a hash table instead of hacking the current package into an ad-hoc one. It sidesteps the T issue entirely, and is a far cleaner solution.
If concision is a concern, you can have a function like (defun frequency (char) (gethash char the-table)), which you ought to use even in the main body of code, since aside from being shorter it means your code is written in terms of "frequencies of characters" rather than in terms of "looking up values in a hash table."
If you're looking for the ultimate in keyboarding minimalism for the REPL, you can go so far as to define a reader macro such as:
(set-macro-character #\?
(lambda (stream char)
(declare (ignore char))
(let ((char (read stream)))
`(frequency (character ',char))))
t)
Which I guess you might not understand wholly, but nevertheless you can inspect the frequency of #\A with something as simple as ?A.
Anyways, the point is to write code that accomplishes its objectives simply, perspicuously, and aligned to good style and "best practises," because if you desire something special-purpose like less-typing-in-the-REPL you can always pile on another abstraction layer.
You may shadow the symbol T:
CL-USER> (shadow 't)
COMMON-LISP:T
CL-USER> (let ((t 17)) t)
17
The shadowed constant T may still be referred to as cl:t.
Related
These all work:
(defun testcaseexpr (thecase)
(case thecase
('foo (format t "matched foo"))
(bar (format t "matched bar"))
((funk) (format t "matched funky"))))
Which of these three expressions is considered the idiomatic way? And perhaps as a side point, why are they all working, when clearly they are not the same syntax. In fact in other contexts they have different semantics completely. A list (funk) is certainly not the same as a quoted atom, 'foo. Yet just passing in the words foo bar and funk all work the same.
First, note that you've actually only got two cases here. 'foo is expanded by the reader as (quote foo), so your code is equivalent to
(defun testcaseexpr (thecase)
(case thecase
((quote foo) (format t "matched foo"))
(bar (format t "matched bar"))
((funk) (format t "matched funky"))))
wherein the first and third cases have the same structure; the keys part of the clause is a list of objects.
Perhaps this question is off-topic, since it's asking for the “best”, and that might be primarily opinion based. I agree with the points made in wvxvw's answer, but I tend to use the style you've shown in the third case almost exclusively. I've got a couple reasons for this:
It's the most general form.
It's the most general form. In the documentation for case, we read that in an normal-clause ::= (keys form*) keys is a designator for a list of keys. This means that a clause like (2 (print 'two)) is equivalent to ((2) (print 'two)). You never lose anything by using a list instead of a non-list, but if you have some clauses with multiple objects and some with single objects, you'll have consistent syntax for all of them. E.g., you can have
(case operator
((and or) ...)
((if iff) ...)
((not) ...))
It's harder to mess up.
It makes it harder to mess up the special cases of t and otherwise. The documentation says about keys that (emphasis added):
keys—a designator for a list of objects. In the case of case, the
symbols t and otherwise may not be used as the keys designator. To
refer to these symbols by themselves as keys, the designators (t) and
(otherwise), respectively, must be used instead.
In practice, some implementations will let you use t and otherwise as keys in normal-clauses, even though it seems like this shouldn't be allowed. E.g., in SBCL:
CL-USER> (macroexpand-1 '(case keyform
(otherwise 'a)
(otherwise 'b)))
(LET ((#:G962 KEYFORM))
(DECLARE (IGNORABLE #:G962))
(COND ((EQL #:G962 'OTHERWISE) NIL 'A)
(T NIL 'B)))
Using explicit lists removes any ambiguity about what you're trying to do. Even though t and otherwise are called out specifically, keys is a list designator, which means that nil (an atom and a list) needs some special consideration. Will the following code produce a or b? (Can you tell without testing it or checking the spec? This case is actually highlighted in the examples.)
(case nil
(nil 'a)
(otherwise 'b))
It returns b. To return a, the first normal-clause would have to be ((nil) 'a).
Conclusion
If you always make sure that keys is a list, you'll:
end up with more consistent looking code;
avoid edge-case bugs (especially if you're writing macros that expand into case); and
make your intentions clearer.
Second :)
First is never used, unless you expand a macro into something like it by accident, and third is used when you have more then one matching symbol (a fall-through case).
I've written an ad hoc parser generator that creates code to convert an old and little known 7-bit character set into unicode. The call to the parser generator expands into a bunch of defuns enclosed in a progn, which then get compiled. I only want to expose one of the generated defuns--the top-level one--to the rest of the system; all the others are internal to the parser and only get called from within the dynamic scope of the top-level one. Therefore, the other defuns generated have uninterned names (created with gensym). This strategy works fine with SBCL, but I recently tested it for the first time with CLISP, and I get errors like:
*** - FUNCALL: undefined function #:G16985
It seems that CLISP can't handle functions with uninterned names. (Interestingly enough, the system compiled without a problem.) EDIT: It seems that it can handle functions with uninterned names in most cases. See the answer by Rörd below.
My questions is: Is this a problem with CLISP, or is it a limitation of Common Lisp that certain implementations (e.g. SBCL) happen to overcome?
EDIT:
For example, the macro expansion of the top-level generated function (called parse) has an expression like this:
(PRINC (#:G75735 #:G75731 #:G75733 #:G75734) #:G75732)
Evaluating this expression (by calling parse) causes an error like the one above, even though the function is definitely defined within the very same macro expansion:
(DEFUN #:G75735 (#:G75742 #:G75743 #:G75744) (DECLARE (OPTIMIZE (DEBUG 2)))
(DECLARE (LEXER #:G75742) (CONS #:G75743 #:G75744))
(MULTIPLE-VALUE-BIND (#:G75745 #:G75746) (POP-TOKEN #:G75742)
...
The two instances of #:G75735 are definitely the same symbol--not two different symbols with the same name. As I said, this works with SBCL, but not with CLISP.
EDIT:
SO user Joshua Taylor has pointed out that this is due to a long standing CLISP bug.
You don't show one of the lines that give you the error, so I can only guess, but the only thing that could cause this problem as far as I can see is that you are referring to the name of the symbol instead of the symbol itself when trying to call it.
If you were referring to the symbol itself, all your lisp implementation would have to do is lookup that symbol's symbol-function. Whether it's interned or not couldn't possibly matter.
May I ask why you haven't considered another way to hide the functions, i.e. a labels statement or defining the functions within a new package that exports only the one external function?
EDIT: The following example is copied literally from an interaction with the CLISP prompt.
As you can see, calling the function named by a gensym is working as expected.
[1]> (defmacro test ()
(let ((name (gensym)))
`(progn
(defun ,name () (format t "Hello!"))
(,name))))
TEST
[2]> (test)
Hello!
NIL
Maybe your code that's trying to call the function gets evaluated before the defun? If there's any code in the macro expansion besides the various defuns, it may be implementation-dependent what gets evaluated first, and so the behaviour of SBCL and CLISP may differ without any of them violating the standard.
EDIT 2: Some further investigation shows that CLISP's behaviour varies depending upon whether the code is interpreted directly or whether it's first compiled and then interpreted. You can see the difference by either directly loading a Lisp file in CLISP or by first calling compile-file on it and then loading the FASL.
You can see what's going on by looking at the first restart that CLISP offers. It says something like "Input a value to be used instead of (FDEFINITION '#:G3219)." So for compiled code, CLISP quotes the symbol and refers to it by name.
It seems though that this behaviour is standard-conforming. The following definition can be found in the HyperSpec:
function designator n. a designator for a function; that is, an object that denotes a function and that is one of: a symbol (denoting the function named by that symbol in the global environment), or a function (denoting itself). The consequences are undefined if a symbol is used as a function designator but it does not have a global definition as a function, or it has a global definition as a macro or a special form. See also extended function designator.
I think an uninterned symbol matches the "a symbol is used as a function designator but it does not have a global definition as a function" case for unspecified consequences.
EDIT 3: (I can agree that I'm not sure whether CLISP's behaviour is a bug or not. Someone more experienced with details of the standard's terminology should judge this. It comes down to whether the function cell of an uninterned symbol - i.e. a symbol that cannot be referred to by name, only by having a direct hold on the symbol object - would be considered a "global definition" or not)
Anyway, here's an example solution that solves the problem in CLISP by interning the symbols in a throwaway package, avoiding the matter of uninterned symbols:
(defmacro test ()
(let* ((pkg (make-package (gensym)))
(name (intern (symbol-name (gensym)) pkg)))
`(progn
(defun ,name () (format t "Hello!"))
(,name))))
(test)
EDIT 4: As Joshua Taylor notes in a comment to the question, this seems to be a case of the (10 year old) CLISP bug #180.
I've tested both workarounds suggested in that bug report and found that replacing the progn with locally actually doesn't help, but replacing it with let () does.
You can most certainly define functions whose names are uninterned symbols. For instance:
CL-USER> (defun #:foo (x)
(list x))
#:FOO
CL-USER> (defparameter *name-of-function* *)
*NAME-OF-FUNCTION*
CL-USER> *name-of-function*
#:FOO
CL-USER> (funcall *name-of-function* 3)
(3)
However, the sharpsign colon syntax introduces a new symbol each time such a form is read read:
#: introduces an uninterned symbol whose name is symbol-name. Every time this syntax is encountered, a distinct uninterned symbol is created. The symbol-name must have the syntax of a symbol with no package prefix.
This means that even though something like
CL-USER> (list '#:foo '#:foo)
;=> (#:FOO #:FOO)
shows the same printed representation, you actually have two different symbols, as the following demonstrates:
CL-USER> (eq '#:foo '#:foo)
NIL
This means that if you try to call such a function by typing #: and then the name of the symbol naming the function, you're going to have trouble:
CL-USER> (#:foo 3)
; undefined function #:foo error
So, while you can call the function using something like the first example I gave, you can't do this last one. This can be kind of confusing, because the printed representation makes it look like this is what's happening. For instance, you could write such a factorial function like this:
(defun #1=#:fact (n &optional (acc 1))
(if (zerop n) acc
(#1# (1- n) (* acc n))))
using the special reader notation #1=#:fact and #1# to later refer to the same symbol. However, look what happens when you print that same form:
CL-USER> (pprint '(defun #1=#:fact (n &optional (acc 1))
(if (zerop n) acc
(#1# (1- n) (* acc n)))))
(DEFUN #:FACT (N &OPTIONAL (ACC 1))
(IF (ZEROP N)
ACC
(#:FACT (1- N) (* ACC N))))
If you take that printed output, and try to copy and paste it as a definition, the reader creates two symbols named "FACT" when it comes to the two occurrences of #:FACT, and the function won't work (and you might even get undefined function warnings):
CL-USER> (DEFUN #:FACT (N &OPTIONAL (ACC 1))
(IF (ZEROP N)
ACC
(#:FACT (1- N) (* ACC N))))
; in: DEFUN #:FACT
; (#:FACT (1- N) (* ACC N))
;
; caught STYLE-WARNING:
; undefined function: #:FACT
;
; compilation unit finished
; Undefined function:
; #:FACT
; caught 1 STYLE-WARNING condition
I hope I get the issue right. For me it works in CLISP.
I tried it like this: using a macro for creating a function with a GENSYM-ed name.
(defmacro test ()
(let ((name (gensym)))
`(progn
(defun ,name (x) (* x x))
',name)))
Now I can get the name (setf x (test)) and call it (funcall x 2).
Yes, it is perfectly fine defining functions that have names that are unintenred symbols. The problem is that you cannot then call them "by name", since you can't fetch the uninterned symbol by name (that is what "uninterned" means, essentially).
You would need to store the uninterned symbol in some sort of data structure, to then be able to fetch the symbol. Alternatively, store the defined function in some sort of data structure.
Surprisingly, CLISP bug 180 isn't actually an ANSI CL conformance bug. Not only that, but evidently, ANSI Common Lisp is itself so broken in this regard that even the progn based workaround is a courtesy of the implementation.
Common Lisp is a language intended for compilation, and compilation produces issues regarding the identity of objects which are placed into compiled files and later loaded ("externalized" objects). ANSI Common Lisp requires that literal objects reproduced from compiled files are only similar to the original objects. (CLHS 3.2.4 Literal Objects in Compiled Files).
Firstly, according to the definition similarity (3.2.4.2.2 Definition of Similarity), the rules for uninterned symbols is that similarity is name based. If we compile code with a literal that contains an uninterned symbol, then when we load the compiled file, we get a symbol which is similar and not (necessarily) the same object: a symbol which has the same name.
What if the same uninterned symbol is inserted into two different top-level forms which are then compiled as a file? When the file is loaded, are those two similar to each other at least? No, there is no such requirement.
But it gets worse: there is also no requirement that two occurrences of the same uninterned symbol in the same form will be externalized in such a way that their relative identity is preserved: that the re-loaded version of that object will have the same symbol object in all the places where the original was. In fact, the definition of similarity contains no provision for preserving the circular structure and substructure sharing. If we have a literal like '#1=(a b . #1#), as a literal in a compiled file, there appears to be no requirement that this be reproduced as a circular object with the same graph structure as the original (a graph isomorphism). The similarity rule for conses is given as naive recursion: two conses are similar if their respective cars and cdrs are similar. (The rule can't even be evaluated for circular objects; it doesn't terminate).
That the above works is because of implementations going beyond what is required in the spec; they are providing an extension consistent with (3.2.4.3 Extensions to Similarity Rules).
Thus, purely according to ANSI CL, we cannot expect to use macros with gensyms in compiled files, at least in some ways. The expectation expressed in code like the following runs afoul of the spec:
(defmacro foo (arg)
(let ((g (gensym))
(literal '(blah ,g ,g ,arg)))
...))
(defun bar ()
(foo 42))
The bar function contains a literal with two insertions of a gensym, which according to the similarity rules for conses and symbols need not reproduce as a list containing two occurrences of the same object in the second and third positions.
If the above works as expected, it's due to "extensions to the similarity rules".
So the answer to the "Why can't CLISP ..." question is that although CLISP does provide an extension for similarity which preserves the graph structure of literal forms, it doesn't do it across the entire compiled file, only within individual top level items within that file. (It uses *print-circle* to emit the individual items.) The bug is that CLISP doesn't conform to the best possible behavior users can imagine, or at least to a better behavior exhibited by other implementations.
Does emacs lisp have a function that provides a unique object identifier, such as e.g. a memory address? Python has id(), which returns an integer guaranteed to be unique among presently existing objects. What about elisp?
The only reason I know for wanting a function like id() is to compare objects, and ensure that they only compare equal if they are the same (as in, in the same memory location). In Lisps, this is done a bit differently from in Python:
In most lisps, including elisp, there are several different notions of equality. The most expensive, and weakest equivalence is equal. This is not what you want, since two lists (say) are equal if they have the same elements (tested recursively with equal). As such
(equal (list 1 2) (list 1 2)) => T
is true. At the other end of the spectrum is eq, which tests "identity" rather than equality:
(eq (list 1 2) (list 1 2)) => NIL
This is what you want, I think.
So, it seems that Python works by providing one equality test, and then a function that gives you a memory location for each object, which then can be compared as integers. In Elisp (and at least Common Lisp too), on the other hand, there is more than one meaning of "equality".
Note, there is also "eql", which lies somewhere between the two.
(EDIT: My original answer probably wasn't clear enough about why the distinction between eq and equal probably solves the problem the original poster was having)
There is no such feature in Emacs Lisp, as far as I know. If you only need equality, use eq, which performs a pointer comparison behind the scenes.
If you need a printable unique identifier, use gensym from the cl package.
If you need a unique identifier to serve as an index in a data structure, use gensym (or maintain your own unique id — gensym is simpler and less error-prone).
Some languages bake a unique id into every object, but this has a cost: either every object needs extra memory to store the id, or the id is derived from the address of the object, which precludes modifying the address. Python chooses to pay the cost, Emacs chooses not to.
My whole point in asking the question was that I was looking for a way to distinguish between the printed representations of different symbols that have the same name. Thanks to the elisp manual, I've discovered the variable print-gensym, which, when non-nil, causes #: to be prepended to uninterned symbols printed. Moreover, if the same call to print prints the same uninterned symbol more than once, it will mark the first one with #N= and subsequent ones with `#N#. This is exactly the kind of functionality I was looking for. For example:
(setq print-gensym t)
==> t
(make-symbol "foo")
==> #:foo
(setq a (make-symbol "foo"))
==> #:foo
(cons a a)
==> (#1=#:foo . #1#)
(setq b (make-symbol "foo"))
==> #:foo
(cons a b)
==> (#:foo . #:foo)
The #: notation works for read as well:
(setq a '#:foo)
==> #:foo
(symbol-name a)
==> "foo"
Note the ' on '#:foo--the #: notation is a symbol-literal. Without the ', the uninterned symbol is evaluated:
(symbol-name '#:foo)
==> "foo"
(symbol-name #:foo)
==> (void-variable #:foo)
i've been reading some lisp code and came across this section, didn't quite understand what it specifically does, though the whole function is supposed to count how many times the letters from a -z appear in an entered text.
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
can anyone explain step by step what is happening? I know that it's somehow counting the letters but not quite sure how.
This Lisp code is slightly unusual, since it uses read-time evaluation. #.expr means that the expression will be evaluated only once, during read-time.
In this case a clever compiler might have guessed that the character code of a given character is known and could have removed the computation of character codes from the DO loop. The author of that code chose to do that by evaluating the expressions before the compiler sees it.
The source looks like this:
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
...)
When Lisp reads in the s-expression, we get this new code as the result (assuming a usual encoding of characters):
(do ((i 97 (1+ i)))
((> i 122))
...)
So that's a loop which counts the variable i up from 97 to 122.
Lisp codes are written as S-Expression. In a typical S-Expression sytax, the first element of any S-expression is treated as operator and the rest as operand. Operands can either be an atom or another S-expression. Please note, an atom is a single data object. Keeping this in mind
char-code
(char-code #\a) - returns the ascii representation of a character here its 'a'.
The do syntax looks similar to the below
(do ((var1 init1 step1)
(var2 init2 step2)
...)
(end-test result)
statement1
...)
So in your example
(do ((i #.(char-code #\a) (1+ i)))
((> i #.(char-code #\z)))
)
The first s-expression operand of do is the loop initialization, the second s-expression operand is the end-test.
So this means you are simply iterating over 'a' through 'z' incrementing i by 1.
In C++ (Not sure your other language comfort level, you can write
for(i='a';i<='z';i++);
the trick with the code you show is in poor form. i know this because i do it all
the time. the code makes an assumtion that the compiler will know the current fixnum
for each character. #.(char-code #\a) eq [(or maybe eql if you are so inclided) unsigned small integer or unsigned 8 bit character with a return value of a positive fixnum].
The # is a reader macro (I'm fairly sure you know this :). Using two reader macros is
not a great idea but it is fast when the compiler knows the datatype.
I have another example. Need to search for ascii in a binary stream:
(defmacro code-char= (byte1 byte2)
(flet ((maybe-char-code (x) (if characterp x) (char-code x) x)))
`(the fixnum (= (the fixnum ,(maybe-char-code byte1)
(the fixnum ,(maybe-char-code byte2)))))
))
Declaring the return type in sbcl will probably insult the complier, but I leave it as a sanity check (4 me not u).
(code-char= #\$ #x36)
=>
t
. At least I think so. But somehow I think you might know your way around some macros ... Hmmmm... I should turn on the machine...
If you're seriously interested, there is some assembler for the 286 (8/16 bit dos assembler) that you can use a jump table. It works fast for the PC , I'd have to look it up...
I've been working through Practical Common Lisp and as an exercise decided to write a macro to determine if a number is a multiple of another number:
(defmacro multp (value factor)
`(= (rem ,value ,factor) 0))
so that :
(multp 40 10)
evaluates to true whilst
(multp 40 13)
does not
The question is does this macro leak in some way? Also is this "good" Lisp? Is there already an existing function/macro that I could have used?
Siebel gives an extensive rundown (for simple cases anyway) of possible sources of leaks, and there aren't any of those here. Both value and factor are evaluated only once and in order, and rem doesn't have any side effects.
This is not good Lisp though, because there's no reason to use a macro in this case. A function
(defun multp (value factor)
(zerop (rem value factor)))
is identical for all practical purposes. (Note the use of zerop. I think it makes things clearer in this case, but in cases where you need to highlight, that the value you're testing might still be meaningful if it's something other then zero, (= ... 0) might be better)
Your macro looks fine to me. I don't know what a leaky macro is, but yours is pretty straightforward and doesn't require any gensyms. As far as if this is "good" Lisp, my rule of thumb is to use a macro only when a function won't do, and in this case a function can be used in place of your macro. However, if this solution works for you there's no reason not to use it.
Well, in principle, a user could do this:
(flet ((= (&rest args) nil))
(multp 40 10))
which would evaluate to NIL... except that ANSI CL makes it illegal to rebind most standard symbols, including CL:=, so you're on the safe side in this particular case.
In generial, of course, you should be aware of both referential untransparency (capturing identifiers from the context the macro is expanded in) and macro unhygiene (leaking identifiers to expanded code).
No, no symbol introduced in the macro's "lexical closure" is released to the outside.
Note that leaking isn't NECESSARILY a bad thing, even if accidental leaking almost always is. For one project I worked on, I found that a macro similar to this was useful:
(defmacro ana-and (&rest forms)
(loop for form in (reverse forms)
for completion = form then `(let ((it ,form))
(when it
,completion))
finally (return completion)))
This allowed me to get "short-circuiting" of things needed to be done in sequence, with arguments carried over from previous calls in the sequence (and a failure signalled by returning NIL). The specific context this code is from is for a hand-written parser for a configuration file that has a cobbled-together-enough syntax that writing a proper parser using a parser generator was more work than hand-rolling.