Why is the null predicate called null, not nullp?

Basically, the title says it all: In Common Lisp, why is the null predicate called null, not nullp (to conform to other predicates such as evenp or oddp)? Is there a special reason for this?

First of all, null is not the only one. See atom.
Second, I think these predicates are fundamental ones and thus very old. I don't think the 'ends with p' convention existed from the very beginning of LISP.
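A quick REPL check shows both naming styles side by side (standard Common Lisp; results are those mandated by the spec):
(null '())   ; => T   -- fundamental predicate, no trailing p
(atom 'x)    ; => T   -- likewise no trailing p
(evenp 2)    ; => T   -- later convention: trailing p
(numberp 42) ; => T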
Also interesting info on the topic:
By convention, the names of predicates usually end in the letter p (which stands for 'predicate'). Common Lisp uses a uniform convention in hyphenating names of predicates. If the name of the predicate is formed by adding a p to an existing name, such as the name of a data type, a hyphen is placed before the final p if and only if there is a hyphen in the existing name. For example, number begets numberp but standard-char begets standard-char-p. On the other hand, if the name of a predicate is formed by adding a prefixing qualifier to the front of an existing predicate name, the two names are joined with a hyphen and the presence or absence of a hyphen before the final p is not changed. For example, the predicate string-lessp has no hyphen before the p because it is the string version of lessp (a MacLisp function that has been renamed < in Common Lisp). The name string-less-p would incorrectly imply that it is a predicate that tests for a kind of object called a string-less, and the name stringlessp would connote a predicate that tests whether something has no strings (is "stringless")!
Source: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node69.html

The predicate NULL is very old. It's called NULL in Common Lisp in the hope of encouraging the conversion of legacy Lisp code into Common Lisp.
Gosh! It appears in the March 1959 version, and possibly earlier, of McCarthy's original memos (see page 3 of the PDF found here: http://dspace.mit.edu/handle/1721.1/6096) as one of the three basic predicates.
Amusingly, this is before the introduction of car and cdr. Fascinating: in that memo, elements in lists are separated by commas.

Are periods in object names bad practice?

For example, a constraint for a default value of 0 could be named DF__tablename.columnname.
Although my search for this being bad practice doesn't yield results, in the numerous constraint examples I've seen on SO and many other sites I have never spotted a period.
Using a period in an object name is bad practice.
Don't use the dot character in an identifier. Yes, it can be done, but the drawbacks outweigh any benefits.
tl;dr
Special characters, such as a dot, are not allowed in regular identifiers. If an identifier does not follow the rules for regular identifiers, then references to it must be enclosed in square brackets (or ANSI double quotes).
https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-identifiers?view=sql-server-2017
In terms of the period (dot character): it is not allowed in a regular identifier, but it can be used inside square brackets.
The dot character is even more of a special-ish character in SQL; it's used to separate an identifier from a preceding qualifier.
SELECT mytable.mycolumn FROM mytable
We could also write that as
SELECT [mytable].[mycolumn] FROM mytable
We could also write
SELECT [mytable.mycolumn] FROM mytable
but that means something very different. With that, we aren't referencing a column named mycolumn, we are now referencing an identifier that contains a dot character.
SQL Server will deal with this just fine.
But if we do this, and start using the dot character in our identifiers, we will be causing confusion and frustration to future readers. Any benefit we would gain by using dot characters in identifiers is going to be far outweighed by the downside for others.
Similarly, this is why we don't create tables named WHERE (1=1) OR, or columns named SUBSTR(foo.bar,1,10), to avoid monstrosities like
SELECT [SUBSTR(foo.bar,1,10)] FROM [WHERE (1=1) OR]
Which may be valid SQL, but it will cause future readers to become very upset, and cause them to curse us, our descendants and loved ones. Don't make them do that. For the love of all that is good and beautiful in this world, don't use dot characters in identifiers.
It is perfectly valid to have periods in object names. However, this requires you to use square brackets around the object name when referring to it. If you forget these square brackets, you will get error messages that can be unintuitive to an inexperienced developer. For this reason I recommend not using periods in object names. I would also guess this is the main reason you don't often see examples of periods in object names on the internet.
In your example, you could use another underscore instead of the period, like this: DF__tablename_columnname

Mathematical formula terms in Scala

Our application relies on lots of equations, which, to correspond with the standard scientific names, use variable names like mu_k (if the standard notation is $\mu_k$). (We could debate whether scientists should switch to CS-style descriptive variable names, but often the terms don't really describe anything; they are just parts of equations. Moreover, we need our code to match the known literature.)
In C it is easy to name variables this way: int mu_k. We are considering porting our code to Scala, but I know that val mu_k is discouraged in Scala, because underscores have special meanings.
If we use underscores only in the middle of the var name (e.g. mu_k) and not beginning or end (e.g. _x or x_), will this present a problem in Scala?
What is the recommended naming convention for Scala in this case?
You are right that underscores are discouraged in variable names in Scala, which implies that they are not forbidden. In my opinion, a convention should be followed wherever sensible.
In the case of mathematical formulae, I disagree that the Greek letters don't convey a meaning; the meaning is not necessarily intuitively descriptive for non-mathematicians, but as you say, the reference to the usage in a paper may be meaningful and important. Therefore, sticking with the underscore won't hurt, although I would probably prefer a more Scala-style name such as muX when possible and meaningful. If you want a perfect answer, you might need to perform a usability test with your developers. In the specific example, I personally find mu_x more readable than muX, but that might differ among individuals.
I don't think the Scala compiler has a problem with underscores in the examples you described. Presumably, even leading and trailing underscores are fine, but they should be strictly avoided because they have a special meaning: http://docs.scala-lang.org/style/naming-conventions.html#methods.
Underscores are not special in any way in identifiers. There are a lot of special meanings for the underscore in Scala, but not in identifiers. (There is a special rule in identifiers that if you want to mix alphanumeric characters and operator characters in the same identifier, they have to be separated by an underscore, e.g. foo? is not a legal identifier, but foo_? is.)
So, there is no problem using an identifier with an underscore in it.
It is generally preferred to use camelCase and PascalCase for alphanumeric identifiers, and not mix alphanumeric and operator characters in the same identifier (i.e. use maxBy instead of max_by and use isFoo instead of foo_?) but that's just a coding convention whose purpose is to reduce the number of "unspecial" underscores, so that you can quickly scan for the "special" ones.
But in your case, you are using special naming conventions anyway, so you don't need to adhere to the community naming conventions as strictly.
However, I personally would actually prefer the name µ_k over mu_k.
That's as far as it goes with Scala, unfortunately. The Fortress programming language by Sun/Oracle did allow boldface, overstrike, superscripts and subscripts in identifier names, so something like µ with a subscript k would have been possible as a legal identifier, but sadly, Fortress was abandoned a couple of years ago.
I'm not saying this is the correct way, and I would myself be rather reluctant to do this, but you can use full string literals as identifiers:
From: http://www.scala-lang.org/files/archive/spec/2.11/01-lexical-syntax.html
id ::= plainid
| ‘`’ stringLiteral ‘`’
Finally, an identifier may also be formed by an arbitrary string
between back-quotes (host systems may impose some restrictions on
which strings are legal for identifiers). The identifier then is
composed of all characters excluding the backquotes themselves.
So this is valid:
val `mu k` = 0

Why can't CASE be used on string values, only on symbol values?

In the book 'Land of Lisp' I read:
Because the case command uses eq for comparisons, it is usually used
only for branching on symbol values. It cannot be used to branch on
string values, among other things.
Please explain why?
The other two excellent answers do answer the question asked. I will try to answer the natural next question - why does case use eql?
The reason is actually the same as in C (where the corresponding switch statement uses numeric comparison): the case forms in Lisp are usually compiled to something like goto, so (case x (1 ...) (2 ...) (3 ...)) is much more efficient than the corresponding cond. This is often accomplished by compiling case to a hash table lookup which maps the value being compared to the clause directly.
That said, the next question would be - why not have a case variant with equal hash table clause lookup instead of eql? Well, this is not in the ANSI standard, but implementations can provide such extensions, e.g., ext:fcase in CLISP.
See also why eql is the default comparison.
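Since the standard has no equal-based case, a common portable workaround is a plain cond using string= (a minimal sketch; the color-code function and its cases are made up for illustration):
(defun color-code (name)
  ;; Branch on string contents with COND + STRING=,
  ;; since CASE would compare the strings with EQL.
  (cond ((string= name "red")   0)
        ((string= name "green") 1)
        ((string= name "blue")  2)
        (t nil)))

(color-code "green") ; => 1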
Two strings with the same content, "foo" and "foo", are not EQL. CASE uses EQL as its comparison (not EQ, as in your question). Usually one might want different tests for strings: case-sensitive and case-insensitive comparison, for example. But with CASE one cannot use another test; EQL is built in. EQL compares object identity, numbers, and characters, but not string contents. You can test whether two strings are the identical data object, though.
So, two strings "FOO" and "FOO" are usually two different objects.
But two symbols FOO and FOO are usually really the same object. That's a basic feature of Lisp. Thus they are EQL and CASE can be used to compare them.
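A quick REPL check makes the difference concrete (the result for the two string literals is implementation-dependent; a compiler may coalesce identical literals, but usually they are distinct objects):
(eql 'foo 'foo)       ; => T           -- the same interned symbol
(eql "FOO" "FOO")     ; => usually NIL -- two distinct string objects
(string= "FOO" "FOO") ; => T           -- compares contents, but CASE never uses STRING=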
Because (eq "foo" "foo") is not necessarily true. Each time you type a string literal, it may create a fresh, unique string. So when CASE is comparing the value with the literals in the cases with EQ, they won't match.

valid characters for lisp symbols

First of all, as I understand it, variable identifiers are called symbols in Common Lisp.
I noted that while in languages like C variable identifiers can only be alphanumerics and underscores, Common Lisp allows many more characters to be used, like "*" and (at least Scheme does) "?".
So, what I want to know is: what exactly is the full set of characters that Common Lisp allows to have in a symbol (or variable identifier if I'm wrong)? is that the same for Scheme?
Also, is the set of characters different for function names?
I've been googling, looking in the CLHS, and in Practical Common Lisp, and for the life of me, something must be wrong because I can't seem to find the answer.
A detailed answer is a bit tricky. There is the ANSI standard for Common Lisp. It defines the set of available characters. Basically you can use all those defined characters for symbols. See also Symbols as Tokens.
For example
|Polynom 2 * x ** 3 - 5 * x ** 2 + 10|
is a valid symbol. Note that the vertical bars mark the symbol and do not belong to the symbol name.
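For example, asking for the symbol's name shows that the bars are only part of the printed notation (a small sketch):
(symbol-name '|Polynom 2 * x ** 3 - 5 * x ** 2 + 10|)
; => "Polynom 2 * x ** 3 - 5 * x ** 2 + 10"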
Then there are the existing implementations of Common Lisp and their support of various character sets and string types. So several support Unicode (or similar) and allow Unicode characters in symbol names.
LispWorks:
CL-USER 1 > (list 'δ 'ψ 'σ '\|)
(δ ψ σ \|)
[From a Schemer's perspective. Even though some concepts in Scheme and Common Lisp have the same name, it does not mean that they mean the same thing in the two languages.]
First note that symbols and identifiers are two different things.
Symbols can be thought of as strings which support fast equality comparison.
Two symbols s and t are equal (more or less) if they are spelled the same way. The operation string=? needs to loop over the characters in the strings and see if they are all alike. This takes time proportional to the length of the shortest string. Symbols, on the other hand, are automatically (by the runtime system) put into a table (typically a hash table). Therefore symbol=? boils down to a simple pointer comparison and is thus very fast. Symbols are often used in cases where in C one would use enumerations.
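This answer speaks in Scheme terms; the same idea in Common Lisp, as a small sketch (INTERN puts a name into the current package's table and returns the same symbol object every time):
(eq (intern "FOO") (intern "FOO")) ; => T -- both calls yield the identical symbol
(eq 'foo 'foo)                     ; => T -- reading the same name twice gives the same object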
Symbols are values that can be present at runtime.
Identifiers are simply names of variables in a program.
Now if said program is to be represented as a Scheme value, one choice would be to use symbols to represent identifiers - but that does not mean symbols are identifiers (or vice versa). A better representation of identifiers (still in Scheme) is syntax objects, which besides the name of the identifier also record where the identifier was read (or constructed). Say you encounter an undefined variable and want to signal where in the program the undefined variable is; then it is very convenient that the source location is part of the representation of the identifier.
Last but not least: what are the legal characters of an identifier? Here it is best to quote chapter and verse from R6RS:
4.2.4 Identifiers
Most identifiers allowed by other programming languages are also
acceptable to Scheme. In general, a sequence of letters, digits, and
“extended alphabetic characters” is an identifier when it begins with
a character that cannot begin a representation of a number object. In
addition, +, -, and ... are identifiers, as is a sequence of letters,
digits, and extended alphabetic characters that begins with the
two-character sequence ->. Here are some examples of identifiers:
lambda q soup
list->vector + V17a
<= a34kTMNs ->-
the-word-recursion-has-many-meanings
Extended alphabetic characters may be used within identifiers as if
they were letters. The following are extended alphabetic characters:
! $ % & * + - . / : < = > ? @ ^ _ ~
Moreover, all characters whose Unicode scalar values are greater than
127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within
identifiers. In addition, any character can be used within an
identifier when specified via an <inline hex escape>. For
example, the identifier H\x65;llo is the same as the identifier
Hello, and the identifier \x3BB; is the same as the identifier
λ.
Any identifier may be used as a variable or as a syntactic keyword
(see sections 5.2 and 9.2) in a Scheme program. Any identifier may
also be used as a syntactic datum, in which case it represents a
symbol (see section 11.10).
From: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4
See Chapter 2 of the CLHS, which describes the reader algorithm in detail. But the simple answer is that if a token isn't a readmacro invocation (section 2.4), and isn't a number or all dots, it defaults to being interpreted as a symbol.
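For a rough illustration of that default, read-from-string can be used (a sketch; the symbols end up in the current package, and only the primary return value is shown):
(read-from-string "123")     ; => 123     -- a valid number token
(read-from-string "1+")      ; => 1+      -- not a number, so it becomes a symbol
(read-from-string "foo.bar") ; => FOO.BAR -- dots inside a token are fine; only a token of all dots is an error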

Why is the hyphen conventional in symbol names in LISP?

What's the reason for this recommendation? Why not stay consistent with other programming languages, which use the underscore instead?
I think that LISP uses the hyphen for two reasons: "history" and "because you can".
History
LISP is an old language, and in the early days typing an underscore could be challenging. For example, the first terminal I used for LISP was an ASR-33 teletype. On some hosts and teletype models, the key sequence for the underscore character would be interpreted as a left-pointing arrow (the assignment operator in Smalltalk). Hyphens could be typed more reliably.
Because You Can
In LISP, there are no infix operators (well, few). So there is no ambiguity concerning whether x-1 means "x minus 1" or "x hyphen 1". The early pioneers liked the look of the hyphen for multiword symbols (or were stuck on ASR-33s as well :).
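A small sketch of that lack of ambiguity: x-1 is just another variable name, while subtraction is always written in prefix form:
(let ((x 10)
      (x-1 99))       ; x-1 is a single symbol, not "x minus 1"
  (list x-1 (- x 1))) ; => (99 9)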
Just a guess: it may be because it resembles the English language's compound words (like "well-known", "merry-go-round", etc.). Like Paul said in a comment, it's one of the oldest languages, and for the creators of LISP the hyphen might have seemed more natural than, for example, an underscore.
Side note: I, personally, do like it, because it separates words, but at the same time makes a long identifier read as a whole (compare fooBarBaz, foo-bar-baz and foo_bar_baz).
In written natural languages the - sign is often used as a way to make compound words. In German, for example, we compose nouns just by appending them:
Hofbräuhaus
The above consists of three parts: Hof, bräu, haus.
But when we write concepts in German which have foreign names as a part, we write them like this:
Mubarak-Regime
In natural languages it is not common to compose words by CamelCase or Under_Score.
The design of most Lisps was more oriented towards the linguistic tradition. The convention in some languages to use the underscore came up because in those languages the - sign was already taken for the minus operation, and identifiers were not allowed to include the - sign. The - sign is an identifier-terminating character in those languages. Not in Lisp.
Note, though, that one can use the underscore in Lisp identifiers, though this is rarely used in code, for aesthetic reasons.
One can also use any character in an identifier. Vertical bars enclose an arbitrary symbol name:
|this *#^! symbol is valid - why is that po_ss_ib_le?|
> (defun |this *#^! symbol is valid - why is that po_ss_ib_le?| (|what? really?|)
(+ |what? really?| 42))
|this *#^! symbol is valid - why is that po_ss_ib_le?|
> (|this *#^! symbol is valid - why is that po_ss_ib_le?| 42)
84
Note that the backslash is an escape character in symbol names.