I read in the book "Land of Lisp", the author mentions syntax expression. Does that mean the ability to express syntax as a form of data? Is this the same as S-expression (symbolic expression)?
A symbolic expression is data which is serialized as known from Lisp. It uses symbols, strings, numbers, lists and more. Lists are written in the form of ( expression* ).
Thee author of Land of Lisp talks about syntax expressions and Lisp syntax expressions. Seems like this is something he invented (discovered?). ;-) He probably means an expression in Lisp syntax, where something like (walk right) is such an expression with the first element of the list being a verb.
In Common Lisp a valid expression of the programming language is called a Lisp form. So an s-expression can express all kinds of data, but not all s-expressions are valid Lisp programs. For example (defun) is not a valid Common Lisp program, since it lacks a function name and a parameter list - plus the optional declarations, documentation and implementation body: (defun foo ()).
Related
I understand that both suppress evaluation of a symbol or expression. But the backtick is used for macro definitions while the apostrophe is used for symbols (among other things). What is the difference, semantically speaking, between these two notations?
Backticks allow for ,foo and ,#foo to interpolate dynamic parts into the quoted expression.
' straight up quotes everything literally.
If there are no comma parts in the expression, ` and ' can be used interchangeably.
A standard quote is a true constant literal and similar lists and list that end with the same structure can share values:
'(a b c d) ; ==> (a b c d)
A backquoted structure might not be a literal. It is evaluated as every unquote needs to be evaluated and inserted into place. This means that something like `(a ,#b ,c d) actually gets expanded to something similar to (cons 'a (append b (cons c '(d)))).
The standard is very flexible on how the implementations solves this so if you try to macroexpand the expression you get many different solutions and sometimes internal functions. The result though is well explained in the standard.
NB: Even though two separate evaluation produces different values the implementation is still free to share structure and thus in my example '(d) has the potential to be shared and if one would use mutating concatenation of the result might end up with an infinite structure.
A parallel to this is that in some algol languages you have two types of strings. One that interpolates variables and one that don't. Eg. in PHP
"Hello $var"; // ==> 'Hello Shoblade'
'Hello $var'; // ==> 'Hello $var'
For the love of the almighty I have yet to understand the purpose of the symbol 'iamasymbol. I understand numbers, booleans, strings... variables. But symbols are just too much for my little imperative-thinking mind to take. What exactly do I use them for? How are they supposed to be used in a program? My grasp of this concept is just fail.
In Scheme and Racket, a symbol is like an immutable string that happens to be interned so that symbols can be compared with eq? (fast, essentially pointer comparison). Symbols and strings are separate data types.
One use for symbols is lightweight enumerations. For example, one might say a direction is either 'north, 'south, 'east, or 'west. You could of course use strings for the same purpose, but it would be slightly less efficient. Using numbers would be a bad idea; represent information in as obvious and transparent a manner as possible.
For another example, SXML is a representation of XML using lists, symbols, and strings. In particular, strings represent character data and symbols represent element names. Thus the XML <em>hello world</em> would be represented by the value (list 'em "hello world"), which can be more compactly written '(em "hello world").
Another use for symbols is as keys. For example, you could implement a method table as a dictionary mapping symbols to implementation functions. To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language). That correspondence makes it easy to support macros, which implement user-defined syntactic extensions to the language. For example, one could implement a class system as a macro library, using the implicit correspondence between "method names" (a syntactic notion defined by the class system) and symbols:
(send obj meth arg1 arg2)
=>
(apply (lookup-method obj 'meth) obj (list arg1 arg2))
(In other Lisps, what I've said is mostly truish, but there are additional things to know about, like packages and function vs variable slots, IIRC.)
A symbol is an object with a simple string representation that (by default) is guaranteed to be interned; i.e., any two symbols that are written the same are the same object in memory (reference equality).
Why do Lisps have symbols? Well, it's largely an artifact of the fact that Lisps embed their own syntax as a data type of the language. Compilers and interpreters use symbols to represent identifiers in a program; since Lisp allows you to represent a program's syntax as data, it provides symbols because they're part of the representation.
What are they useful apart from that? Well, a few things:
Lisp is commonly used to implement embedded domain-specific languages. Many of the techniques used for that come from the compiler world, so symbols are an useful tool here.
Macros in Common Lisp usually involve dealing with symbols in more detail than this answer provides. (Though in particular, generation of unique identifiers for macro expansions requires being able to generate a symbol that's guaranteed never to be equal to any other.)
Fixed enumeration types are better implemented as symbols than strings, because symbols can be compared by reference equality.
There are many data structures you can construct where you can get a performance benefit from using symbols and reference equality.
Symbols in lisp are human-readable identifiers. They are all singletons. So if you declare 'foo somewhere in your code and then use 'foo again, it will point to the same place in memory.
Sample use: different symbols can represent different pieces on a chessboard.
From Structure and Interpretation of Computer Programs Second Edition by Harold Abelson and Gerald Jay Sussman 1996:
In order to manipulate symbols we need a new element in our language:
the ability to quote a data object. Suppose we want to construct the list
(a b). We can’t accomplish this with (list a b), because this expression
constructs a list of the values of a and b rather than the symbols themselves.
This issue is well known in the context of natural languages, where words
and sentences may be regarded either as semantic entities or as character
strings (syntactic entities). The common practice in natural languages is to use quotation marks to indicate that a word or a sentence is to be treated
literally as a string of characters. For instance, the first letter of “John” is
clearly “J.” If we tell somebody “say your name aloud,” we expect to hear
that person’s name. However, if we tell somebody “say ‘your name’ aloud,”
we expect to hear the words “your name.” Note that we are forced to nest
quotation marks to describe what somebody else might say.
We can follow this same practice to identify lists and symbols that are
to be treated as data objects rather than as expressions to be evaluated.
However, our format for quoting differs from that of natural languages in
that we place a quotation mark (traditionally, the single quote symbol ’)
only at the beginning of the object to be quoted. We can get away with this in Scheme syntax because we rely on blanks and parentheses to delimit
objects. Thus, the meaning of the single quote character is to quote the
next object.
Now we can distinguish between symbols and their values:
(define a 1)
(define b 2)
(list a b)
(1 2)
(list ’a ’b)
(a b)
(list ’a b)
(a 2)
Lists containing symbols can look just like the expressions of our language:
(* (+ 23 45) (+ x 9))
(define (fact n) (if (= n 1) 1 (* n (fact (- n 1)))))
Example: Symbolic Differentiation
A symbol is just a special name for a value. The value could be anything, but the symbol is used to refer to the same value every time, and this sort of thing is used for fast comparisons. As you say you are imperative-thinking, they are like numerical constants in C, and this is how they are usually implemented (internally stored numbers).
To illustrate the point made by Luis Casillas, it might be useful to observe how symbols eval differently than strings.
The example below is for mit-scheme (Release 10.1.10). For convenience, I use this function as eval:
(define my-eval (lambda (x) (eval x (scheme-report-environment 5))))
A symbol can easily evaluate to the value or function it names:
(define s 2) ;Value: s
(my-eval "s") ;Value: "s"
(my-eval s) ;Value: 2
(define a '+) ;Value: a
(define b "+") ;Value: b
(my-eval a) ;Value: #[arity-dispatched-procedure 12]
(my-eval b) ;Value: "+"
((my-eval a) 2 3) ;Value: 5
((my-eval b) 2 3) ;ERROR: The object "+" is not applicable.
How do digital calculators convert the expression for parsing? Is it infix, prefix, or postfix? How do calculators solve expressions?
Nowadays it is most common to enter single line expressions in digital calculators just as you would write them down, so normally infix notation. Converting to postfix is a common way to evaluate those expressions in order to solve them.
There are lots of examples around, this C++ implementation has lots of implementation information and also briefly explains handling parentheses, left associative vs. right associative operators and stuff like that.
What's the reason of this recommendation? Why not keeping consistent with other programming languages which use underscore instead?
I think that LISP uses the hyphen for two reasons: "history" and "because you can".
History
LISP is an old language, and in the early days typing an underscore could be challenging. For example, the first terminal I used for LISP was an ASR-33 teletype. On some hosts and teletype models, the key sequence for the underscore character would be interpreted as a left-pointing arrow (the assignment operator in Smalltalk). Hyphens could be typed more reliably.
Because You Can
In LISP, there are no infix operators (well, few). So there is no ambiguity concerning whether x-1 means "x minus 1" or "x hyphen 1". The early pioneers liked the look of the hypen for multiword symbols (or were stuck on ASR-33s as well :).
Just a guess: it may be because it resembles English language's compound words (like "well-known", "merry-go-round", etc). Like Paul said in comment, it's one of the oldest languages and for the creators of LISP hyphen might have seemed more natural than, for example, an underscore.
Side note: I, personally, do like it, because it separates words, but at the same time makes the long identifier look as a whole (compare fooBarBaz, foo-bar-baz and foo_bar_baz).
In written natural languages the - sign is often used as a way to make compound words. In German for example we compose german nouns just by appending them:
Hofbräuhaus
Above consist of three parts: Hof bräu haus.
But when we write concepts in German which have foreign names as their part we write them like this:
Mubarak-Regime
In natural languages it is not common to compose words by CamelCase or Under_Score.
The design of most Lisps was more oriented towards the linguistic tradition. The convention in some languages to use the underscore came up, because in these languages the - sign was already taken for the minus operation and the identifiers of theses were not allowed to include the - sign. The - sign is a identifier terminating character in these languages. Not in Lisp.
Note though that one can use the underscore in Lisp identifiers, though this is rarely use in code for aesthetic reasons.
One can also use every character in an identifier. The vertical bar encloses an arbitrary symbol:
|this *#^! symbol is valid - why is that po_ss_ib_le?|
> (defun |this *#^! symbol is valid - why is that po_ss_ib_le?| (|what? really?|)
(+ |what? really?| 42))
|this *#^! symbol is valid - why is that po_ss_ib_le?|
> (|this *#^! symbol is valid - why is that po_ss_ib_le?| 42)
84
Note that the backslash is an escape character in symbol names.
I'm sure I join many in being glad there's finally a powerful language tied tightly to a mainstream GUI/Database/Communication framework.
I haven't been sure where to post this, but here seems the best spot.
I need to use Unicode symbol characters either as operators or as function names. I'd like syntactic sugar, but I don't need it.
Guy Steele pointed out in Communications of the ACM that "*" was a forced choice when it was adopted from Ascii as multiply, but my software works in Unicode, so I'm not tethered to Ascii anymore.
!$%&*+-./<=>?,#^|~:
Part of localization includes local programmers. Why limit the set of operators that can be defined in F#? It isn’t orthogonal to C#'s and F#'s acceptance of many Unicode IsLetter in identifiers.
Also, F# is likely to be used for symbolic manipulation of problems from logic, math, physicists, etc. It makes work much easier if there’s a direct mapping into the language of the basic operators. (F# and C# accept many Unicode IsLetter? as well as IsDigit’? This is a request to allow Unicode IsSymbol? As operators with the precedence of, for example, *, or, since “+” is both a unary and binary operator, I could put up with the precedence of + and make up the difference with parenthesized groupings.
Consider the domain-specific needs of logicians, mathematicians, physicists, etc. I’d rather write a symbolic differentiator or integrator using math symbols than Ascii permutations of already-taken operators.
Logic: ∀ ∃ ⇒
Math: ∑ ∫ ∂
Group theory: ≤ ≥ ∈ ∉
Set Theory: ⊆ ⊇ ⊃ ∪ ∩
Tensors: ⊗
I’ve written many languages in other languages, but because F# is tightly .Net-integrated, this issue poses special challenges without language support:
It’s trivial to cobble up a translator that takes Unicode-operator F# source and maps it, line-by-line, to Ascii-operator F# source.
But when debugging, how do I make sure the programmer still sees their untranslated source? And that they can see variable values.
Operators and converts them is trivial. But how do I ensure the translation is what gets compiled, while the programmer sees their own source? If I map line-for-line correctly, how do I ensure they can still point at a variable and see its value?
There is a math (Unicode) symbol extension for F# available in the Visual Studio Gallery.
This allows you to define Unicode symbols, e.g.:
let inline (~∑) xs = xs |> Seq.sum
let total = ∑myList
You may be interested in Project Fortress which is a new functional programming language that embraces the Unicode character set (among many other features). In particular, see the Mathematical Syntax in Fortress page which contains some sample code.
For an interesting discussion on this check: http://cs.hubfs.net/forums/thread/9690.aspx
Other languages, such as Scala, do permit operators from outside the ASCII range -- mathematical symbols(Sm) and other symbols(So)