Writing an Eval Procedure in Scheme? - lisp

My problem isn't with the built-in eval procedure but how to create a simplistic version of it. Just for starters I would like to be able to take this in '(+ 1 2) and have it evaluate the expression + where the quote usually takes off the evaluation.
I have been thinking about this and found a couple things that might be useful:
Unquote: ,
(quasiquote)
(apply)
My main problem is regaining the value of + as a procedure and not a symbol. Once I get that I think I should just be able to use it with the other contents of the list.
Any tips or guidance would be much appreciated.

Firstly, if you're doing what you're doing, you can't go wrong reading at least the first chapter of the Metalinguistic Abstraction section of Structure and Interpretation of Computer Programs.
Now for a few suggestions from myself.
The usual thing to do with a symbol for a Scheme (or, indeed, any Lisp) interpreter is to look it up in some sort of "environment". If you're going to write your own eval, you will likely want to provide your own environment structures to go with it. The one thing for which you could fall back to the Scheme system you're building your eval on top of is the initial environment containing bindings for things like +, cons etc.; this can't be achieved in a 100% portable way, as far as I know, due to various Scheme systems providing different means of getting at the initial environment (including the-environment special form in MIT Scheme and interaction-environment in (Petite) Chez Scheme... and don't ask me why this is so), but the basic idea stays the same:
(define (my-eval form env)
(cond ((self-evaluating? form) form)
((symbol? form)
;; note the following calls PCS's built-in eval
(if (my-kind-of-env? env)
(my-lookup form env)
;; apparently we're dealing with an environment
;; from the underlying Scheme system, so fall back to that
;; (note we call the built-in eval here)
(eval form env)))
;; "applicative forms" follow
;; -- special forms, macro / function calls
...))
Note that you will certainly want to check whether the symbol names a special form (lambda and if are necessary -- or you could use cond in place of if -- but you're likely to want more and possibly allow for extentions to the basic set, i.e. macros). With the above skeleton eval, this would have to take place in what I called the "applicative form" handlers, but you could also handle this where you deal with symbols, or maybe put special form handlers first, followed by regular symbol lookup and function application.

Related

lisp: when to use a function vs. a macro

In my ongoing quest to learn lisp, I'm running into a conceptual problem. It's somewhat akin to the question here, but maybe it's thematically appropriate to lisp that my question is a level of abstraction up.
As a rule, when should you create a macro vs. a function? It seems to me, maybe naively, that there would be very few cases where you must create a macro instead of a function, and that in most remainder cases, a function would generally suffice. Of these remainder cases, it seems like the main additional value of a macro would be in clarity of syntax. And if that's the case, then it seems like not just the decision to opt for macro use but also the design of their structures might be fundamentally idiosyncratic to the individual programmer.
Is this wrong? Is there a general case outlining when to use macros over functions? Am I right that the cases where a macro is required by the language are generally few? And lastly, is there a general syntactic form that's expected of macros, or are they generally used as shorthands by programmers?
I found a detailed answer, from Paul Graham's On Lisp, bold emphases added:
Macros can do two things that functions can’t: they can control (or prevent) the evaluation of their arguments, and they are expanded right into the calling context. Any application which requires macros requires, in the end, one or both of these properties.
...
Macros use this control in four major ways:
Transformation. The Common Lisp setf macro is one of a class of macros which pick apart their arguments before evaluation. A built-in access function will often have a converse whose purpose is to set what the access function retrieves. The converse of car is rplaca, of cdr, rplacd, and so on. With setf we can use calls to such access functions as if they were variables to be set, as in (setf (car x) ’a), which could expand into (progn (rplaca x ’a) ’a).
To perform this trick, setf has to look inside its first argument. To know that the case above requires rplaca, setf must be able to see that the first argument is an expression beginning with car. Thus setf, and any other operator which transforms its arguments, must be written as a macro.
Binding. Lexical variables must appear directly in the source code. The first argument to setq is not evaluated, for example, so anything built on setq must be a macro which expands into a setq, rather than a function which calls it. Likewise for operators like let, whose arguments are to appear as parameters in a lambda expression, for macros like do which expand into lets, and so on. Any new operator which is to alter the lexical bindings of its arguments must be written as a macro.
Conditional evaluation. All the arguments to a function are evaluated. In constructs like when, we want some arguments to be evaluated only under certain conditions. Such flexibility is only possible with macros.
Multiple evaluation. Not only are the arguments to a function all evaluated, they are all evaluated exactly once. We need a macro to define a construct like do, where certain arguments are to be evaluated repeatedly.
There are also several ways to take advantage of the inline expansion of macros. It’s important to emphasize that the expansions thus appear in the lexical context of the macro call, since two of the three uses for macros depend on that fact. They are:
Using the calling environment. A macro can generate an expansion containing a variable whose binding comes from the context of the macro call. The behavior of the following macro:
(defmacro foo (x) ‘(+ ,x y))
depends on the binding of y where foo is called.
This kind of lexical intercourse is usually viewed more as a source of contagion than a source of pleasure. Usually it would be bad style to write such a macro. The ideal of functional programming applies as well to macros: the preferred way to communicate with a macro is through its parameters. Indeed, it is so rarely necessary to use the calling environment that most of the time it happens, it happens by mistake...
Wrapping a new environment. A macro can also cause its arguments to be evaluated in a new lexical environment. The classic example is let, which could be implemented as a macro on lambda. Within the body of an expression like (let ((y 2)) (+ x y)), y will refer to a new variable.
Saving function calls. The third consequence of the inline insertion of macro expansions is that in compiled code there is no overhead associated with a macro call. By runtime, the macro call has been replaced by its expansion. (The same is true in principle of functions declared inline.)
...
What about those operators which could be written either way [i.e. as a function or a macro]?... Here are several points to consider when we face such choices:
THE PROS
Computation at compile-time. A macro call involves computation at two times: when the macro is expanded, and when the expansion is evaluated. All the macro expansion in a Lisp program is done when the program is compiled, and every bit of computation which can be done at compile-time is one bit that won’t slow the program down when it’s running. If an operator could be written to do some of its work in the macro expansion stage, it will be more efficient to make it a macro, because whatever work a smart compiler can’t do itself, a function has to do at runtime. Chapter 13 describes macros like avg which do some of their work during the expansion phase.
Integration with Lisp. Sometimes, using macros instead of functions will make a program more closely integrated with Lisp. Instead of writing a program to solve a certain problem, you may be able to use macros to transform the problem into one that Lisp already knows how to solve. This approach, when possible, will usually make programs both smaller and more efficient: smaller because Lisp is doing some of your work for you, and more efficient because production Lisp systems generally have had more of the fat sweated out of them than user programs. This advantage appears mostly in embedded languages, which are described starting in Chapter 19.
Saving function calls. A macro call is expanded right into the code where it appears. So if you write some frequently used piece of code as a macro, you can save a function call every time it’s used. In earlier dialects of Lisp, programmers took advantage of this property of macros to save function calls at runtime. In Common Lisp, this job is supposed to be taken over by functions declared inline.
By declaring a function to be inline, you ask for it to be compiled right into the calling code, just like a macro. However, there is a gap between theory and practice here; CLTL2 (p. 229) says that “a compiler is free to ignore this declaration,” and some Common Lisp compilers do. It may still be reasonable to use macros to save function calls, if you are compelled to use such a compiler...
THE CONS
Functions are data, while macros are more like instructions to the compiler. Functions can be passed as arguments (e.g. to apply), returned by functions, or stored in data structures. None of these things are possible with macros.
In some cases, you can get what you want by enclosing the macro call within a lambda-expression. This works, for example, if you want to apply or funcall certain macros:> (funcall #’(lambda (x y) (avg x y)) 1 3) --> 2. However, this is an inconvenience. It doesn’t always work, either: even if, like avg, the macro has an &rest parameter, there is no way to pass it a varying number of arguments.
Clarity of source code. Macro definitions can be harder to read than the equivalent function definitions. So if writing something as a macro would only make a program marginally better, it might be better to use a function instead.
Clarity at runtime. Macros are sometimes harder to debug than functions. If you get a runtime error in code which contains a lot of macro calls, the code you see in the backtrace could consist of the expansions of all those macro calls, and may bear little resemblance to the code you originally wrote.
And because macros disappear when expanded, they are not accountable at runtime. You can’t usually use trace to see how a macro is being called. If it worked at all, trace would show you the call to the macro’s expander function, not the macro call itself.
Recursion. Using recursion in macros is not so simple as it is in functions. Although the expansion function of a macro may be recursive, the expansion itself may not be. Section 10.4 deals with the subject of recursion in macros...
Having considered what can be done with macros, the next question to ask is: in what sorts of applications can we use them? The closest thing to a general description of macro use would be to say that they are used mainly for syntactic transformations. This is not to suggest that the scope for macros is restricted. Since Lisp programs are made from lists, which are Lisp data structures, “syntactic transformation” can go a long way indeed...
Macro applications form a continuum between small general-purpose macros like while, and the large, special-purpose macros defined in the later chapters. On one end are the utilities, the macros resembling those that every Lisp has built-in. They are usually small, general, and written in isolation. However, you can write utilities for specific classes of programs too, and when you have a collection of macros for use in, say, graphics programs, they begin to look like a programming language for graphics. At the far end of the continuum, macros allow you to write whole programs in a language distinctly different from Lisp. Macros used in this way are said to implement embedded languages.
Yes, the first rule is: don't use a macro where a function will do.
There are a few things you can't do with functions, for example conditional evaluation of code. Others become quite unwieldy.
In general I am aware of three recurring use cases for macros (which doesn't mean that there aren't any others):
Defining forms (e. g. defun, defmacro, define-frobble-twiddle)
These often have to take some code snippet, wrap it (e. g. in a lamdba form), and register it somewhere, maybe even multiple places. The users (programmers) should only concern themselves with the code snippet. This is thus mostly about removing boilerplate. Additionally, the macro can process the body, e. g. registering docstrings, handle declarations etc.
Example: Imagine that you are writing a sort of event mini-framework. Your event handlers are pure functions that take some input and produce an effect declaration (think re-frame from the Clojure world). You want these functions to be normal named functions so that you can just test them with the usual testing frameworks, but also register them in a lookup table for your event loop mechanism. You'd maybe want to have something like a define-handler macro:
(defvar *handlers* (make-hash-table)) ; internal for the framework
(defmacro define-handler (&whole whole name lambda-list &body body)
`(progn (defun ,#(rest whole))
(setf (gethash ,name *handlers*)
(lambda ,lambda-list ,#body)))) ; could also be #',name
Control constructs (e. g. case, cond, switch, some->)
These use conditional evaluation and convenient re-arrangement of the expression.
With- style wrappers
This is an idiom to provide unwind-protect functionality to some arbitrary resource. The difference to a general with construct (as in Clojure) is that the resource type can be anything, you don't have to reify it with something like a Closable interface.
Example:
(defmacro with-foo-bar-0 (&body body)
(let ((foo-bar (gensym "FOO-BAR")))
`(let (,foo-bar))
(shiftf ,foo-bar (aref (gethash :foo *buzz*) 0) 0)
(unwind-protect (progn ,#body)
(setf (aref (gethash :foo *buzz*) 0) ,foo-bar)))))
This sets something inside a nested data structure to 0, and ensures that it is reset to the value it had before on any, even non-local, exit.
[This is a much-reduced version of a longer, incomplete answer which I decided was not appropriate for SE.]
There are no cases where you must use a macro. Indeed, there are no cases where you must use a programming language at all: if you are happy to learn the order code for the machine you are using and competent with a keypunch then you can program that way.
Most of us are not happy doing that: we like to use programming languages. These have two obvious benefits and one less-obvious but far more important one. The two obvious benefits:
programming languages make programming easier;
programming languages make programs portable across machines.
The more important reason is that building languages is an enormously successful approach to problem solving for human beings. It's so successful that we do it all the time, without even thinking we are doing it. Every time we invent some new term for something we are in fact inventing a language; every time a mathematician invents some new bit of notation they are inventing a language. People like to sneer at these languages by calling them 'jargon', 'slang' or 'dialect' but, famously: a shprakh iz a dialekt mit an armey un flot (translated: a language is a dialect with an army and navy).
The same thing is true for programming languages as is true for natural languages, except that programming languages are designed to communicate both with other humans and with a machine, and the machine requires very precise instructions. This means that it can be rather hard to build programming languages, so people tend to stick with the languages they know.
Except that they don't: the approach of building a language to describe some problem is so powerful that people in fact do this anyway. But they don't know that they are doing it and they don't have the tools to do it so what they end up with tends to be a hideous monster stitched together from pieces of other things with the robustness and readability of custard. We've all dealt with such things. A common characteristic is 'language in a string' where one language appears within strings of another language, with constructs of this inner language being put together by string operations in the outer language. If you are really lucky this will go several levels deep (I have seen three).
These things are abominations, but they are still the best way of dealing with large problem areas. Well, they are the best way if you live in a world where constructing a new programming language is so hard that only special clever people can do it
But it's hard only because if your only tool is C then everything looks like a PDP-11. If instead we used a tool which made the incremental construction of programming languages easy by allowing them to be defined in terms of simpler versions of themselves in a lightweight way, then we could just construct whole families of programming languages in which to talk about various problems, each of which would simply be a point in the space of possible languages. And anyone could do this: it would be a little bit harder than just writing functions, because working out grammar rules is a little bit harder than thinking up new words, but it would not be a lot harder.
And that's what macros do: they let you define programming languages to talk about a particular problem area in a way which is extremely lightweight. One such language is Common Lisp, but it's just one starting point in the space of Lisp-family languages: a point from which you can build the language you actually want (and people, of course, will belittle these languages by calling them 'dialects': well, a programming language is only a dialect with a standards committee).
Functions let you add to the vocabulary of the language you are building. Macros let you add to the grammar of the language. Between them they let you define a new language in which to talk about the problem area you are interested in. And doing that is the whole point of programming in Lisp: Lisp is about building languages to talk about problem areas.
An soon as you are little familiar to macros, you will wonder why you ever had this question. :-)
Macros are in no way alternatives to functions and neither vice versa. It just seems to be so, if you are working on the REPL, because macro expansion, compilation and running is happening within the moment you are pressing [enter].
Macros are running at compile time, so any macro-processing is finished, as son as your definition runs. There is no way to "call" a macro at the runtime of the definition that involves this very macro.
Macros just calculate S-exprs, that will be passed to the compiler.
Just think of a macro as something, that is coding for you.
This is easier to understand with little more code in your editor than with small definitions the REPL. Good luck!

Lisp source code rewriting system

I would like to take Emacs Lisp code that has been macro expanded and unmacro expand it. I have asked this on the Emacs forum with no success. See:
https://emacs.stackexchange.com/questions/35913/program-rewriting-systems-unexpanded-a-defmacro-given-a-list-of-macros-to-undo
However one would think that this kind of thing, S-expression transformation, is right up Lisp's alley. And defmacro is I believe available in Lisp as it is in Emacs Lisp.
So surely there are program transformation systems, or term-rewriting systems that can be adapted here.
Ideally, in certain situations such a tool would be able to work directly off the defmacro to do its pattern find and replace on. However even if I have to come up with specific search and replace patterns manually to add to the transformation system, having such a framework to work in would still be useful
Summary of results so far: Although there have been a few answers that explore interesting possibilities, right now there is nothing definitive. So I think best to leave this open. I'll summarize some of the suggestions. (I've upvoted all the answers that were in fact answers instead of commentary on the difficulty.)
First, many people suggest considered the special form of macros that do expansion only,or as Drew puts it:
macro-expansion (i.e., not expansion followed by Lisp evaluation).
Macro-expansion is another way of saying reduction semantics, or
rewriting.
The current front-runner to my mind is in phils post where he uses a pattern-matching facility that seems specific to Emacs: pcase. I will be exploring this and will post results of my findings. If anyone else has thoughts on this please chime in.
Drew wrote a program called FTOC whose purpose was to convert Franz Lisp to Common Lisp; googling turns up a comp.lang.lisp posting
I found a Common Lisp package called optima with fare-quasiquote. Paulo thinks however this might not be powerful enough since it doesn't handle backtracking out of the box, but might be programmed in by hand. Although the generality of backtracking might be nice, I'm not convinced I need that for the most-used situations.)
Side note: Some seem put off by the specific application causing my initial interest. (But note that in research, it is not uncommon for good solutions to get applied in ways not initially envisioned.)
So in that spirit, here are a couple of suggestions for changing the end application. A good solution for these would probably translate to a solution for Emacs Lisp. (And if if helps you to pretend I'm not interested in Emacs Lisp, that's okay with me). Instead of a decompiler for Emacs Lisp, suppose I want to write a decompiler for clojure or some Common Lisp system. Or as suggested by Sylwester's answer, suppose I would like to automatically refactor my code by taking into account the benefit of using more concise macros that exist or that have gotten improved. Recall that at one time Emacs Lisp didn't have "when" or "unless" macros.
30-some years ago I did something similar, using macrolet.
(Actually, I used defmacro because we had only an early implementation of Common Lisp, which did not yet have macrolet. But macrolet is the right thing to use.)
I didn't translate macro-expanded code to what it was expanded from, but the idea is pretty much the same. You will come across some different difficulties, I expect, since your translation is even farther away from one-to-one.
I wrote a translator from (what was then) Franz Lisp to Common Lisp, to help with porting lots of existing code to a Lisp+Prolog-machine project. Franz Lisp back then was only dynamically scoped, while Common Lisp is (in general) lexically scoped.
And yes, obviously there is no general way to automatically translate Lisp code (in particular), especially considering that it can generate and then evaluate other code - but even ignoring that special case. Many functions are quite similar, but there is the lexical/dynamic difference, as well as significant differences in the semantics of some seemingly similar functions.
All of that has to be understood and taken for granted from the outset, by anyone wanting to make use of the results of translation.
Still, much that is useful can be done. And if the resulting code is self-documenting, telling you what it was derived from etc., then when in the resulting context you can decide just what to do with this or that bit that might be tricky (e.g., rewrite it manually, from scratch or just tweak it). In practice, lots of code was easily converted from Franz to Common - it saved much reprogramming effort.
The translator program was written in Common Lisp. It could be used interactively as well as in batch. When used interactively it provided, in effect, a Franz Lisp interpreter on top of Common Lisp.
The program used only macro-expansion (i.e., not expansion followed by Lisp evaluation). Macro-expansion is another way of saying reduction semantics, or rewriting.
Input Franz-Lisp code was macro-expanded via function-definition mapping macros to produce Common-Lisp code. Code that was problematic for translation was flagged (in code) with a description/analysis that described the situation.
The program was called FTOC. I think you can still find it, or at least references to it, by googling (ftoc lisp). (It was the first Lisp program I wrote, and I still have fond memories of the experience. It was a good way to learn both Lisp dialects and to learn Lisp in general.)
Have fun!
In general, I don't think you can do this. The expansion of an lisp macro is Turing complete, so you have to be able to predict the output of a program which could have arbitrary input.
There are some simple things that you could do. defmacros with backquoted forms in appear fairly similar in the output form and might be detected. This sort of heuristic would probably get you a long way.
What I don't understand is your use case. The macro-expanded version of a piece of code is usually only present in the compiled (or in emacs-lisp byte-compiled) form.
Ok so other people have pointed out the fact that this problem is impossible in general. There are two hard parts to this problem: one is that it could be a lot of work to find a preimage of some code fragment through a macro and it is also impossible to determine whether a macro was called or not—there are examples where one may write code which could have come from a macro without using that macro. Imagine for the sake of illustration an sha macro which expands to the SHA hash of the string literal passed to it. Then if you see some sha hash in your expanded code, it would obviously be silly to try to unexpand it. But it may be that the hash was put into the code as a literal, e.g. referencing a specific point in the history of a git repository so it would also be unhelpful to unexpand the macro.
Tractable subproblems
Let me preface this by saying that whilst these may be a little tractable, I still wouldn’t try to solve this problem.
Let’s ignore all the macros that do weird things (like the example above) and all the macros that are just as likely to not have been used in the original (e.g. cond vs if) and all the macros which generate complex code which seems like it would be difficult to unravel (e.g. loop, do, and backquote. Annoyingly these difficult cases are some of those which you would perhaps most want to unexpand). The type this leaves us with (that I’d like to focus on) are macros which basically just reduce boilerplate, e.g. save-excursion or with-XXXX. These are macros whose implementation consists of possibly making some fresh symbols (via gensym) and then having a big simple backquoted block of code. I still think it would be too hard to automatically go from defmacro to a function for unexpansion but I think you could attack some of these on a case-by-case basis. Do this by looking for the forms generated by the macro that delimit (I.e. begin/end) the expanded code. I can’t really offer much beyond that. This is still a hard problem and I don’t think any existing solutions (to other problems) will get you very far on your way.
A further complication I understand is that you do not start at the macroexpanded code but rather at the bytecode. Without knowing anything about the elisp compiler, I worry that more information would be lost in the compilation step and you would have to undo that as well, e.g. perhaps it is hard to determine which code goes inside a let or even when a let begins, or bytecode starts using goto type features even though elisp doesn’t have them.
You suggest that the reason you would like to unexpand macros is so you can decompile bytecode which sometimes comes up in the Emacs debugger and that this would be useful as even though the source code is available in theory, it isn’t always at your fingertips. I put it to you that if you want to make your life debugging elisp easier it would be more worthwhile to figure out how to have the Emacs debugger always take you to the source code for internal functions. This might involve installing extra debugging related packages or downloading the Emacs source code and setting some variable so Emacs knows where to find it or compiling Emacs yourself from source. I don’t really know about that but I bet getting thrown into bytecode instead of source would have been enough of a problem for Emacs developers over the past thirty years that a solution to that problem does exist.
If however what you really want to do is to try to implement a decompiler for elisp then I suppose that’s what you should do. A final observation is that while Lisp provides facilities which make manipulating Lisp code easy, this doesn’t help much with decompiling as all these facilities can be used in compilation so there are infinitely more patterns one might want to detect than in e.g. a C decompiler. Perhaps scheme style macros would be easier to unexpand, although they would still be hard.
If you’re decompiling because you want to give a better idea of which exact subexpression rather than line is being evaluated (normally Lisp debuggers work on expressions not lines anyway) in the debugger then perhaps it would actually be useful to see the code at the expanded level rather than the unexpanded one. Or perhaps it would be best to see both and maybe in between as well. Keeping track of what’s what through forwards macroexpansion is already difficult and fiddly. Doing it in reverse certainly won’t be easier. Good luck!
Edit: seeing as your not currently using Lisp anyway, I wonder if you might have more success using something like prolog for your unexpanding. You’d still have to manually write rules but I think it would be a large amount of work to try to derive rules from macro definitions.
I would like to take Emacs Lisp code that has been macro expanded and unmacro expand it.
Macros generate arbitrary expressions, which may contain macros recursively. You have no general way to revert the transformations, because it's not pattern-based.
Even if macros were pattern-based, they could still be infinite.
Even if macros were not infinite, they can certainly contain bugs in expansions of patterns that never matched. Given arbitrary code to try to unwind, it could match an expansion that looks like the code and try to revert to its pattern. Without bugs, you could still abuse this.
Even if you could revert macro expansion, some macros expand to the same code. An approach could be signalling a warning with a restart when all reversions expand equally minus the operator, such that if the restart doesn't handle the signal, it would choose the first expansion; and otherwise signalling an error with a restart, such that if the restart doesn't handle the signal, it errors. Or you could configure it to choose certain macros under certain conditions, such as in which package the code was found.
In practice, there are very few cases where reverting an expansion makes any sense. It could be a useful development tool that suggests macros, but I wouldn't generally rely on it for whole source transformations.
One way you could achieve what you want is through a controlled pattern matching. You could initially create patterns manually, which would already handle cases you care about directly, such as the ones you mention:
(if (not <cond>) <expr>) and (if (not <cond>) (progn <&expr>)) to (unless <cond> <&expr>)
You'd have to decide whether null would be equivalent to not. I personally don't mix the boolean meaning of nil with that of empty list or something else, e.g. no result, nothing found, null object, a designator, etc. But perhaps Lisp code as old as that in Emacs just uses them interchangeably.
(if <cond> <expr>) and (if <cond> (progn <&expr>)) to (when <cond> <&expr>)
If you feel like improving code overall, include cond with a single condition. And be careful with cond clauses with only the condition.
You should have a few dozen more, to see how the pattern matching behaves with more patterns to match in terms of time (CPU) and space (memory).
From the description of fare-quasiquote, optima doesn't support backtracking, which you probably want.
But you can do backtracking with optima by yourself, using recursion on complex inner patterns, and if nothing matches, return a control value to keep searching for matching patterns from the outer input.
Another approach is to treat a pattern as a description of a state machine, and handle each new token to advance the current state machines until one of them reaches the end, discarding the state machines that couldn't advance. This approach may consume more memory, depending on the amount of patterns, the similarity between patterns (if many have the same starting token, many state machines will be generated on a matching token), the length of the patterns and, last but not least, the length of the input (s-expression).
An advantage of this approach is that you can use it interactively to see which patterns have matched the most tokens, and you can give weights to patterns instead of just taking the first that matches.
A disadvantage is that, most probably, you'll have to spend effort to develop it.
EDIT: I just lousily described a kind of trie or radix tree.
Once you got something working, maybe try to obtain patterns automatically. This is really hard, you must probably limit it to simple backquoting and accept the fact you can't generalize for anything that contains more complex code.
I believe the hardest will be code walking, which is hard enough with source code, but much more with macro-expanded code. Perhaps if you could expand the whole picture a bit further to understand the goal, maybe someone could suggest a better approach other than operating on macro-expanded code.
However one would think that this kind of thing, S-expression transformation, is right up Lisp's alley. And defmacro is I believe available in Lisp as it is in Emacs Lisp.
So surely there are program transformation systems, or term-rewriting systems that can be adapted here.
There's a huge step from expanding code with defmacro and all that generality. Most Lisp developers will know about hygienic macros, at least in terms of symbols as variables.
But there's still hygienic macros in terms of symbols as operators1, code walking, interaction with a containing macro (usually using macrolet), etc. It's way too complex.
1.
Common Lisp evaluates the operator in a compound form in the lexical environment, and probably everyone makes macros that assume that the global macro or function definition of a symbol will be used.
But it might not be so:
(defmacro my-macro-1 ()
`1)
(defmacro my-macro-2 ()
`(my-function (my-macro-1)))
(defun my-function (n)
(* n 100))
(macrolet ((my-macro-1 ()
`2))
(flet ((my-function (n)
(* n 1000)))
(my-macro-2)))
That last line will expand to (my-function (my-macro-2)), which will be recursively expanded to (my-function 2). When evaluated, it will yield 2000.
For proper operator hygiene, you'd have to do something like this:
(defmacro my-macro-2 ()
;; capture global bindings of my-macro-1 and my-function-1 by name
(flet ((my-macro-1-global (form env)
(funcall (macro-function 'my-macro-1) form env))
(my-function-global (&rest args)
;; hope the compiler can optimize this
(apply 'my-function args)))
;; store them globally in uninterned symbols
;; hopefully, no one will mess with them
(let ((my-macro-1-symbol (gensym (symbol-name 'my-macro-1)))
(my-function-symbol (gensym (symbol-name 'my-function))))
(setf (macro-function my-macro-1-symbol) #'my-macro-1-global)
(setf (symbol-function my-function-symbol) #'my-function-global)
`(,my-function-symbol (,my-macro-1-symbol)))))
With this definition, the example will yield 100.
Common Lisp has some restrictions to avoid this, but it only states the consequences are undefined when (re)defining symbols in the common-lisp package, globally or locally. It doesn't require errors or warnings to be signaled.
I don't think it is possible to do this in general, but you can undo a pattern back into a macro use for every match if you supply code for each unmacroing. Code that mixed cond and if will end up being just if and your code would remove all if into cond making the reverse not the same as the starting point. The more macros you have and the more they expand into each other the more uncertain of the end result will be of the starting point.
You could have rules such that if is not translated into cond unless you used one of the features, like more than one predicate or implicit progn, but you have no idea if the coder actually did use cond everywhere because he liked in consistent regardless. Thus your unmacroing will acyually be more of a simplification.
I don't believe there's a general solution to that, and you certainly
can't guarantee that the structure of the output would match that of
the original code, and I'm not going near the idea of auto-generating
patterns and desired transformations from macro definitions; but you
might achieve a simple version of this with Emacs' own pcase pattern
matching facility.
Here's the simplest example I could think of:
With reference to the definition of when:
(defmacro when (cond &rest body)
(list 'if cond (cons 'progn body)))
We can transform code using a pcase pattern like so:
(let ((form '(if (and foo bar baz) (progn do (all the) things))))
(pcase form
(`(if ,cond (progn . ,body))
`(when ,cond ,#body))
(_ form)))
=> (when (and foo bar baz) do (all the) things)
Obviously if the macro definitions change, then your patterns will
cease to work (but that's a pretty safe kind of failure).
Caveat: This is the first time I've written a pcase form, and I
don't know what I don't know. It seems to work as intended, though.

Parameterizable return-from in Common Lisp

I'm learning blocks in Common lisp and did this example to see how blocks and the return-from command work:
(block b1
(print 1)
(print 2)
(print 3)
(block b2
(print 4)
(print 5)
(return-from b1)
(print 6)
)
(print 7))
It will print 1, 2, 3, 4, and 5, as expected. Changing the return-from to (return-from b2) it'll print 1, 2, 3, 4, 5, and 7, as one would expect.
Then I tried turn this into a function and paremetrize the label on the return-from:
(defun test-block (arg) (block b1
(print 1)
(print 2)
(print 3)
(block b2
(print 4)
(print 5)
(return-from (eval arg))
(print 6)
)
(print 7)))
and using (test-block 'b1) to see if it works, but it doesn't. Is there a way to do this without conditionals?
Using a conditional like CASE to select a block to return from
The recommended way to do it is using case or similar. Common Lisp does not support computed returns from blocks. It also does not support computed gos.
Using a case conditional expression:
(defun test-block (arg)
(block b1
(print 1)
(print 2)
(print 3)
(block b2
(print 4)
(print 5)
(case arg
(b1 (return-from b1))
(b2 (return-from b2)))
(print 6))
(print 7)))
One can't compute lexical go tags, return blocks or local functions from names
CLTL2 says about the restriction for the go construct:
Compatibility note: The ``computed go'' feature of MacLisp is not supported. The syntax of a computed go is idiosyncratic, and the feature is not supported by Lisp Machine Lisp, NIL (New Implementation of Lisp), or Interlisp. The computed go has been infrequently used in MacLisp anyway and is easily simulated with no loss of efficiency by using a case statement each of whose clauses performs a (non-computed) go.
Since features like go and return-from are lexically scoped constructs, computing the targets is not supported. Common Lisp has no way to access lexical environments at runtime and query those. This is for example also not supported for local functions. One can't take a name and ask for a function object with that name in some lexical environment.
Dynamic alternative: CATCH and THROW
The typically less efficient and dynamically scoped alternative is catch and throw. There the tags are computed.
I think these sorts of things boils down to the different types of namespaces bindings and environments in Common Lisp.
One first point is that a slightly more experienced novice learning Lisp might try to modify your attempted function to say (eval (list 'return-from ,arg)) instead. This seems to make more sense but still does not work.
Namespaces
A common beginner mistake in a language like scheme is having a variable called list as this shadows the top level definition of this as a function and stops the programmer from being able to make lists inside the scope for this binding. The corresponding mistake in Common Lisp is trying to use a symbol as a function when it is only bound as a variable.
In Common Lisp there are namespaces which are mappings from names to things. Some namespaces are:
The functions. To get the corresponding thing either call it: (foo a b c ...), or get the function for a static symbol (function foo) (aka #'foo) or for a dynamic symbol (fdefinition 'foo). Function names are either symbols or lists of setf and one symbol (e.g. (serf bar)). Symbols may alternatively be bound to macros in this namespace in which case function and fdefinition signal errors.
The variables. This maps symbols to the values in the corresponding variable. This also maps symbols to constants. Get the value of a variable by writing it down, foo or dynamically as (symbol-value). A symbol may also be bound as a symbol-macro in which case special macro expansion rules apply.
Go tags. This maps symbols to labels to which one can go (like goto in other languages).
Blocks. This maps symbols to places you can return from.
Catch tags. This maps objects to the places which catch them. When you throw to an object, the implementation effectively looks up the corresponding catch in this namespace and unwinds the stack to it.
classes (and structs, conditions). Every class has a name which is a symbol (so different packages may have a point class)
packages. Each package is named by a string and possibly some nicknames. This string is normally the name of a symbol and therefore usually in uppercase
types. Every type has a name which is a symbol. Naturally a class definition also defines a type.
declarations. Introduced with declare, declaim, proclaim
there might be more. These are all the ones I can think of.
The catch-tag and declarations namespaces aren’t like the others as they don’t really map symbols to things but they do have bindings and environments in the ways described below (note that I have used declarations to refer to the things that have been declared, like the optimisation policy or which variables are special, rather than the namespace in which e.g. optimize, special, and indeed declaration live which seems too small to include).
Now let’s talk about the different ways that this mapping may happen.
The binding of a name to a thing in a namespace is the way in which they are associated, in particular, how it may come to be and how it may be inspected.
The environment of a binding is the place where the binding lives. It says how long the binding lives for and where it may be accessed from. Environments are searched for to find the thing associated with some name in some namespace.
static and dynamic bindings
We say a binding is static if the name that is bound is fixed in the source code and a binding is dynamic if the name can be determined at run time. For example let, block and tags in a tagbody all introduce static bindings whereas catch and progv introduce dynamic bindings.
Note that my definition for dynamic binding is different from the one in the spec. The spec definition corresponds to my dynamic environment below.
Top level environment
This is the environment where names are searched for last and it is where toplevel definitions go to, for example defvar, defun, defclass operate at this level. This is where names are looked up last after all other applicable environments have been searched, e.g. if a function or variable binding can not be found at a closer level then this level is searched. References can sometimes be made to bindings at this level before they are defined, although they may signal warnings. That is, you may define a function bar which calls foo before you have defined foo. In other cases references are not allowed, for example you can’t try to intern or read a symbol foo::bar before the package FOO has been defined. Many namespaces only allow bindings in the top level environment. These are
constants (within the variables namespace)
classes
packages
types
Although (excepting proclaim) all bindings are static, they can effectively be made dynamic by calling eval which evaluates forms at the top level.
Functions (and [compiler] macros) and special variables (and symbol macros) may also be defined top level. Declarations can be defined toplevel either statically with the macro declaim or dynamically with the function proclaim.
Dynamic environment
A dynamic environment exists for a region of time during the programs execution. In particular, a dynamic environment begins when control flow enters some (specific type of) form and ends when control flow leaves it, either by returning normally or by some nonlocal transfer of control like a return-from or go. To look up a dynamically bound name in a namespace, the currently active dynamic environments are searched (effectively, ie a real system wouldn’t be implemented this way) from most recent to oldest for that name and the first binding wins.
Special variables and catch tags are bound in dynamic environments. Catch tags are bound dynamically using catch while special variables are bound statically using let and dynamically using progv. As we shall discuss later, let can make two different kinds of binding and it knows to treat a symbol as special if it has been defined with defvar or ‘defparameteror if it has been declared asspecial`.
Lexical environment
A lexical environment corresponds to a region of source code as it is written and a specific runtime instantiation of it. It (slightly loosely) begins at an opening parenthesis and ends at the corresponding closing parenthesis, and is instantiated when control flow hits the opening parenthesis. This description is a little complicated so let’s have an example with variables which are bound in a lexically environment (unless they are special. By convention the names special variables are wrapped in * symbols)
(defun foo ()
(let ((x 10))
(bar (lambda () x))))
(defun bar (f)
(let ((x 20))
(funcall f)))
Now what happens when we call (foo)? Well if x were bound in a dynamic environment (in foo and bar) then the anonymous function would be called in bar and the first dynamic environment with a binding for x would have it bound to 20.
But this call returns 10 because x is bound in a lexical environment so even though the anonymous function gets passed to bar, it remembers the lexical environment corresponding to the application of foo which created it and in that lexical environment, x is bound to 10. Let’s now have another example to show what I mean by ‘specific runtime instantiation’ above.
(defun baz (islast)
(let ((x (if islast 10 20)))
(let ((lx (lambda () x)))
(if islast
lx
(frob lx (baz t))))))
(defun frob (a b)
(list (funcall a) (funcall b)))
Now running (baz nil) will give us (20 10) because the first function passed to frob remembers the lexical environment for the outer call to baz (where islast is nil) whilst the second remembers the environment for the inner call.
For variables which are not special, let creates static lexical bindings. Block names (introduced statically by block), go tags (scopes inside a tagbody), functions (by felt or labels), macros (macrolet), and symbol macros (symbol-macrolet) are all bound statically in lexical environments. Bindings from a lambda list are also lexically bound. Declarations can be created lexically using (declare ...) in one of the allowed places or by using (locally (declare ...) ...) anywhere.
We note that all lexical bindings are static. The eval trick described above does not work because eval happens in the toplevel environment but references to lexical names happen in the lexical environment. This allows the compiler to optimise references to them to know exactly where they are without running code having to carry around a list of bindings or accessing global state (e.g. lexical variables can live in registers and the stack). It also allows the compiler to work out which bindings can escape or be captured in closures or not and optimise accordingly. The one exception is that the (symbol-)macro bindings can be dynamically inspected in a sense as all macros may take an &environment parameter which should be passed to macroexpand (and other expansion related functions) to allow the macroexpander to search the compile-time lexical environment for the macro definitions.
Another thing to note is that without lambda-expressions, lexical and dynamic environments would behave the same way. But note that if there were only a top level environment then recursion would not work as bindings would not be restored as control flow leaves their scope.
Closure
What happens to a lexical binding captured by an anonymous function when that function escapes the scope it was created in? Well there are two things that can happen
Trying to access the binding results in an error
The anonymous function keeps the lexical environment alive for as long as the functions referencing it are alive and they can read and write it as they please.
The second case is called a closure and happens for functions and variables. The first case happens for control flow related bindings because you can’t return from a form that has already returned. Neither happens for macro bindings as they cannot be accessed at run time.
Nonlocal control flow
In a language like Java, control (that is, program execution) flows from one statement to the next, branching for if and switch statements, looping for others with special statements like break and return for certain kinds of jumping. For functions control flow goes into the function until it eventually comes out again when the function returns. The one nonlocal way to transfer control is by using throw and try/catch where if you execute a throw then the stack is unwound piece by piece until a suitable catch is found.
In C there are is no throw or try/catch but there is goto. The structure of C programs is secretly flat with the nesting just specifying that “blocks” end in the opposite order to the order they start. What I mean by this is that it is perfectly legal to have a while loop in the middle of a switch with cases inside the loop and it is legal to goto the middle of a loop from outside of that loop. There is a way to do nonlocal control transfer in C: you use setjmp to save the current control state somewhere (with the return value indicating whether you have successfully saved the state or just nonlocally returned there) and longjmp to return control flow to a previously saved state. No real cleanup or freeing of memory happens as the stack unwinds and there needn’t be checks that you still have the function which called setjmp on the callstack so the whole thing can be quite dangerous.
In Common Lisp there’s a range of ways to do nonlocal control transfer but the rules are more strict. Lisp doesn’t really have statements but rather everything is built out of a tree of expressions and so the first rule is that you can’t nonlocally transfer control into a deeper expression, you may only transfer out. Let’s look at how these different methods of control transfer work.
block and return-from
You’ve already seen how these work inside a single function but recall that I said block names are lexically scoped. So how does this interact with anonymous functions?
Well suppose you want to search some big nested data structure for something. If you were writing this function in Java or C then you might implement a special search function to recurse through your data structure until it finds the right thing and then return it all the way up. If you were implementing it in Haskell then you would probably want to do it as some kind of fold and rely on lazy evaluation to not do too much work. In Common Lisp you might have a function which applies some other function passed as a parameter to each item in the data structure. And now you can call that with a searching function. How might you get the result out? Well just return-from to the outer block.
tagbody and go
A tagbody is like a progn but instead of evaluating single symbols in the body, they are called tags and any expression within the tagbody can go to them to transfer control to it. This is partly like goto, if you’re still in the same function but if your go expression happens inside some anonymous function then it’s like a safe lexically scoped longjmp.
catch and throw
These are most similar to the Java model. The key difference between block and catch is that block uses lexical scoping and catch uses dynamic scoping. Therefore their relationship is like that between special and regular variables.
Finally
In Java one can execute code to tidy things up if the stack has to unwind through it as an exception is thrown. This is done with try/finally. The Common Lisp equivalent is called unwind-protect which ensures a form is executed however control flow may leave it.
Errors
It’s perhaps worth looking a little at how errors work in Common Lisp. Which of these methods do they use?
Well it turns out that the answer is that errors instead of generally unwinding the stack start by calling functions. First they look up all the possible restarts (ways to deal with an error) and save them somewhere. Next they look up all applicable handlers (a list of handlers could, for example, be stored in a special variable as handlers have dynamic scope) and try each one at a time. A handler is just a function so it might return (ie not want to handle the error) or it might not return. A handler might not return if it invokes a restart. But restarts are just normal functions so why might these not return? Well restarts are created in a dynamic environment below the one where the error was raised and so they can transfer control straight out of the handler and the code that threw the error to some code to try to do something and then carry on. Restarts can transfer control using go or return-from. It is worth noting that it is important here that we have lexical scope. A recursive function could define a restart on each successive call and so it is necessary to have lexical scope for variables and tags/block names so that we can make sure we transfer control to the right level on the call stack with the right state.

Should macros have side effects?

Can (or should) a macro expansion have side effects? For example, here is a macro which actually goes and grabs the contents of a webpage at compile time:
#lang racket
(require (for-syntax net/url))
(require (for-syntax racket/port))
(define-syntax foo
(lambda (syntx)
(datum->syntax #'lex
(port->string
(get-pure-port
(string->url
(car (cdr (syntax->datum syntx)))))))))
Then, I can do (foo "http://www.pointlesssites.com/") and it will be replaced with "\r\n<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"\r\n\t <and so on>"
Is this good practice, or not? Am I garunteed that Racket will only run this code once? If I add a (display "running...") line to the macro, it only prints once, but I would hate to generalize from one example ...
PS - the reason I'm asking is because I actually think this could be really useful sometimes. For example this is a library which allows you to load (at compile-time) a discovery document from the Google API Discovery service and automatically create wrappers for it. I think it would be really cool if the library actually fetched the discovery document from the web, instead of from a local file.
Also, to give an example of a macro with a different kind of side effects: I once built a macro which translated a small subset of Racket into (eta-expanded) lambda calculus (which is, of course, still runnable in Racket). Whenever the macro finished translating a function, it would store the result in a dictionary so that later invocations of the macro could make use of that function definition in their own translations.
The short answer
It's fine for macros to have side effects, but you should make sure that your program doesn't change behavior when it's compiled ahead of time.
The longer answer
Macros with side effects are a powerful tool, and can let you do things that make programs much easier to write, or enable things that wouldn't be possible at all. But there are pitfalls to be aware of when you use side effects in macros. Fortunately, Racket provides all the tools to make sure that you can do this correctly.
The simplest kind of macro side effect is where you use some external state to find the code you want to generate. The examples you list in the question (reading Google API description) are of this kind. An even simpler example is the include macro:
#lang racket
(include "my-file.rktl")
This reads the contents of myfile.rktl and drops it in place right where the include form is used.
Now, include isn't a good way to structure your program, but this is a quite benign sort of side effect in the macro. It works the same if you compile the file ahead of time as if you don't, since the result of the include is part of the file.
Another simple example that's not good is something like this:
#lang racket
(define-syntax (show-file stx)
(printf "using file ~a\n" (syntax-source stx))
#'(void))
(show-file)
That's because the printf gets executed only at compile time, so if you compile your program that uses show-file ahead of time (as with raco make) then the printf will happen then, and won't happen when the program is run, which probably isn't the intention.
Fortunately, Racket has a technique for letting you write macros like show-file effectively. The basic idea is to leave residual code that actually performs the side effect. In particular, you can use Racket's begin-for-syntax form for this purpose. Here's how I would write show-file:
#lang racket
(define-syntax (show-file stx)
#`(begin-for-syntax
(printf "using file ~a\n" #,(syntax-source stx))))
(show-file)
Now, instead of happening when the show-file macro is expanded, the printf happens in the code that show-file generates, with the source embedded in the expanded syntax. That way your program keeps working correctly in the presence of ahead-of-time compilation.
There are other uses of macros with side effects, too. One of the most prominent in Racket is for inter-module communication -- because require doesn't produce values that the requiring module can get, to communicate between modules the most effective way is to use side effects. To make this work in the presence of compilation requires almost exactly the same trick with begin-for-syntax.
This is a topic that the Racket community, and I in particular, have thought a lot about, and there are several academic papers talking about how this works:
Composable and Compilable Macros: You want it when?, Matthew Flatt, ICFP 2002
Advanced Macrology and the Implementation of Typed Scheme, Ryan Culpepper, Sam Tobin-Hochstadt, and Matthew Flatt, Scheme Workshop 2007
Languages as Libraries, Sam Tobin-Hochstadt, Ryan Culpepper, Vincent St-Amour, Matthew Flatt, and Matthias Felleisen, PLDI 2011
In common Lisp the function eval-when allows you to decide when the macro will be expanded.

What makes Lisp macros so special?

Reading Paul Graham's essays on programming languages one would think that Lisp macros are the only way to go. As a busy developer, working on other platforms, I have not had the privilege of using Lisp macros. As someone who wants to understand the buzz, please explain what makes this feature so powerful.
Please also relate this to something I would understand from the worlds of Python, Java, C# or C development.
To give the short answer, macros are used for defining language syntax extensions to Common Lisp or Domain Specific Languages (DSLs). These languages are embedded right into the existing Lisp code. Now, the DSLs can have syntax similar to Lisp (like Peter Norvig's Prolog Interpreter for Common Lisp) or completely different (e.g. Infix Notation Math for Clojure).
Here is a more concrete example:Python has list comprehensions built into the language. This gives a simple syntax for a common case. The line
divisibleByTwo = [x for x in range(10) if x % 2 == 0]
yields a list containing all even numbers between 0 and 9. Back in the Python 1.5 days there was no such syntax; you'd use something more like this:
divisibleByTwo = []
for x in range( 10 ):
if x % 2 == 0:
divisibleByTwo.append( x )
These are both functionally equivalent. Let's invoke our suspension of disbelief and pretend Lisp has a very limited loop macro that just does iteration and no easy way to do the equivalent of list comprehensions.
In Lisp you could write the following. I should note this contrived example is picked to be identical to the Python code not a good example of Lisp code.
;; the following two functions just make equivalent of Python's range function
;; you can safely ignore them unless you are running this code
(defun range-helper (x)
(if (= x 0)
(list x)
(cons x (range-helper (- x 1)))))
(defun range (x)
(reverse (range-helper (- x 1))))
;; equivalent to the python example:
;; define a variable
(defvar divisibleByTwo nil)
;; loop from 0 upto and including 9
(loop for x in (range 10)
;; test for divisibility by two
if (= (mod x 2) 0)
;; append to the list
do (setq divisibleByTwo (append divisibleByTwo (list x))))
Before I go further, I should better explain what a macro is. It is a transformation performed on code by code. That is, a piece of code, read by the interpreter (or compiler), which takes in code as an argument, manipulates and the returns the result, which is then run in-place.
Of course that's a lot of typing and programmers are lazy. So we could define DSL for doing list comprehensions. In fact, we're using one macro already (the loop macro).
Lisp defines a couple of special syntax forms. The quote (') indicates the next token is a literal. The quasiquote or backtick (`) indicates the next token is a literal with escapes. Escapes are indicated by the comma operator. The literal '(1 2 3) is the equivalent of Python's [1, 2, 3]. You can assign it to another variable or use it in place. You can think of `(1 2 ,x) as the equivalent of Python's [1, 2, x] where x is a variable previously defined. This list notation is part of the magic that goes into macros. The second part is the Lisp reader which intelligently substitutes macros for code but that is best illustrated below:
So we can define a macro called lcomp (short for list comprehension). Its syntax will be exactly like the python that we used in the example [x for x in range(10) if x % 2 == 0] - (lcomp x for x in (range 10) if (= (% x 2) 0))
(defmacro lcomp (expression for var in list conditional conditional-test)
;; create a unique variable name for the result
(let ((result (gensym)))
;; the arguments are really code so we can substitute them
;; store nil in the unique variable name generated above
`(let ((,result nil))
;; var is a variable name
;; list is the list literal we are suppose to iterate over
(loop for ,var in ,list
;; conditional is if or unless
;; conditional-test is (= (mod x 2) 0) in our examples
,conditional ,conditional-test
;; and this is the action from the earlier lisp example
;; result = result + [x] in python
do (setq ,result (append ,result (list ,expression))))
;; return the result
,result)))
Now we can execute at the command line:
CL-USER> (lcomp x for x in (range 10) if (= (mod x 2) 0))
(0 2 4 6 8)
Pretty neat, huh? Now it doesn't stop there. You have a mechanism, or a paintbrush, if you like. You can have any syntax you could possibly want. Like Python or C#'s with syntax. Or .NET's LINQ syntax. In end, this is what attracts people to Lisp - ultimate flexibility.
You will find a comprehensive debate around lisp macro here.
An interesting subset of that article:
In most programming languages, syntax is complex. Macros have to take apart program syntax, analyze it, and reassemble it. They do not have access to the program's parser, so they have to depend on heuristics and best-guesses. Sometimes their cut-rate analysis is wrong, and then they break.
But Lisp is different. Lisp macros do have access to the parser, and it is a really simple parser. A Lisp macro is not handed a string, but a preparsed piece of source code in the form of a list, because the source of a Lisp program is not a string; it is a list. And Lisp programs are really good at taking apart lists and putting them back together. They do this reliably, every day.
Here is an extended example. Lisp has a macro, called "setf", that performs assignment. The simplest form of setf is
(setf x whatever)
which sets the value of the symbol "x" to the value of the expression "whatever".
Lisp also has lists; you can use the "car" and "cdr" functions to get the first element of a list or the rest of the list, respectively.
Now what if you want to replace the first element of a list with a new value? There is a standard function for doing that, and incredibly, its name is even worse than "car". It is "rplaca". But you do not have to remember "rplaca", because you can write
(setf (car somelist) whatever)
to set the car of somelist.
What is really happening here is that "setf" is a macro. At compile time, it examines its arguments, and it sees that the first one has the form (car SOMETHING). It says to itself "Oh, the programmer is trying to set the car of somthing. The function to use for that is 'rplaca'." And it quietly rewrites the code in place to:
(rplaca somelist whatever)
Common Lisp macros essentially extend the "syntactic primitives" of your code.
For example, in C, the switch/case construct only works with integral types and if you want to use it for floats or strings, you are left with nested if statements and explicit comparisons. There's also no way you can write a C macro to do the job for you.
But, since a lisp macro is (essentially) a lisp program that takes snippets of code as input and returns code to replace the "invocation" of the macro, you can extend your "primitives" repertoire as far as you want, usually ending up with a more readable program.
To do the same in C, you would have to write a custom pre-processor that eats your initial (not-quite-C) source and spits out something that a C compiler can understand. It's not a wrong way to go about it, but it's not necessarily the easiest.
Lisp macros allow you to decide when (if at all) any part or expression will be evaluated. To put a simple example, think of C's:
expr1 && expr2 && expr3 ...
What this says is: Evaluate expr1, and, should it be true, evaluate expr2, etc.
Now try to make this && into a function... thats right, you can't. Calling something like:
and(expr1, expr2, expr3)
Will evaluate all three exprs before yielding an answer regardless of whether expr1 was false!
With lisp macros you can code something like:
(defmacro && (expr1 &rest exprs)
`(if ,expr1 ;` Warning: I have not tested
(&& ,#exprs) ; this and might be wrong!
nil))
now you have an &&, which you can call just like a function and it won't evaluate any forms you pass to it unless they are all true.
To see how this is useful, contrast:
(&& (very-cheap-operation)
(very-expensive-operation)
(operation-with-serious-side-effects))
and:
and(very_cheap_operation(),
very_expensive_operation(),
operation_with_serious_side_effects());
Other things you can do with macros are creating new keywords and/or mini-languages (check out the (loop ...) macro for an example), integrating other languages into lisp, for example, you could write a macro that lets you say something like:
(setvar *rows* (sql select count(*)
from some-table
where column1 = "Yes"
and column2 like "some%string%")
And thats not even getting into Reader macros.
Hope this helps.
I don't think I've ever seen Lisp macros explained better than by this fellow: http://www.defmacro.org/ramblings/lisp.html
A lisp macro takes a program fragment as input. This program fragment is represented a data structure which can be manipulated and transformed any way you like. In the end the macro outputs another program fragment, and this fragment is what is executed at runtime.
C# does not have a macro facility, however an equivalent would be if the compiler parsed the code into a CodeDOM-tree, and passed that to a method, which transformed this into another CodeDOM, which is then compiled into IL.
This could be used to implement "sugar" syntax like the for each-statement using-clause, linq select-expressions and so on, as macros that transforms into the underlying code.
If Java had macros, you could implement Linq syntax in Java, without needing Sun to change the base language.
Here is pseudo-code for how a lisp-style macro in C# for implementing using could look:
define macro "using":
using ($type $varname = $expression) $block
into:
$type $varname;
try {
$varname = $expression;
$block;
} finally {
$varname.Dispose();
}
Since the existing answers give good concrete examples explaining what macros achieve and how, perhaps it'd help to collect together some of the thoughts on why the macro facility is a significant gain in relation to other languages; first from these answers, then a great one from elsewhere:
... in C, you would have to write a custom pre-processor [which would probably qualify as a sufficiently complicated C program] ...
—Vatine
Talk to anyone that's mastered C++ and ask them how long they spent learning all the template fudgery they need to do template metaprogramming [which is still not as powerful].
—Matt Curtis
... in Java you have to hack your way with bytecode weaving, although some frameworks like AspectJ allows you to do this using a different approach, it's fundamentally a hack.
—Miguel Ping
DOLIST is similar to Perl's foreach or Python's for. Java added a similar kind of loop construct with the "enhanced" for loop in Java 1.5, as part of JSR-201. Notice what a difference macros make. A Lisp programmer who notices a common pattern in their code can write a macro to give themselves a source-level abstraction of that pattern. A Java programmer who notices the same pattern has to convince Sun that this particular abstraction is worth adding to the language. Then Sun has to publish a JSR and convene an industry-wide "expert group" to hash everything out. That process--according to Sun--takes an average of 18 months. After that, the compiler writers all have to go upgrade their compilers to support the new feature. And even once the Java programmer's favorite compiler supports the new version of Java, they probably ''still'' can't use the new feature until they're allowed to break source compatibility with older versions of Java. So an annoyance that Common Lisp programmers can resolve for themselves within five minutes plagues Java programmers for years.
—Peter Seibel, in "Practical Common Lisp"
Think of what you can do in C or C++ with macros and templates. They're very useful tools for managing repetitive code, but they're limited in quite severe ways.
Limited macro/template syntax restricts their use. For example, you can't write a template which expands to something other than a class or a function. Macros and templates can't easily maintain internal data.
The complex, very irregular syntax of C and C++ makes it difficult to write very general macros.
Lisp and Lisp macros solve these problems.
Lisp macros are written in Lisp. You have the full power of Lisp to write the macro.
Lisp has a very regular syntax.
Talk to anyone that's mastered C++ and ask them how long they spent learning all the template fudgery they need to do template metaprogramming. Or all the crazy tricks in (excellent) books like Modern C++ Design, which are still tough to debug and (in practice) non-portable between real-world compilers even though the language has been standardised for a decade. All of that melts away if the langauge you use for metaprogramming is the same language you use for programming!
I'm not sure I can add some insight to everyone's (excellent) posts, but...
Lisp macros work great because of the Lisp syntax nature.
Lisp is an extremely regular language (think of everything is a list); macros enables you to treat data and code as the same (no string parsing or other hacks are needed to modify lisp expressions). You combine these two features and you have a very clean way to modify code.
Edit: What I was trying to say is that Lisp is homoiconic, which means that the data structure for a lisp program is written in lisp itself.
So, you end up with a way of creating your own code generator on top of the language using the language itself with all its power (eg. in Java you have to hack your way with bytecode weaving, although some frameworks like AspectJ allows you to do this using a different approach, it's fundamentally a hack).
In practice, with macros you end up building your own mini-language on top of lisp, without the need to learn additional languages or tooling, and with using the full power of the language itself.
Lisp macros represents a pattern that occurs in almost any sizeable programming project. Eventually in a large program you have a certain section of code where you realize it would be simpler and less error prone for you to write a program that outputs source code as text which you can then just paste in.
In Python objects have two methods __repr__ and __str__. __str__ is simply the human readable representation. __repr__ returns a representation that is valid Python code, which is to say, something that can be entered into the interpreter as valid Python. This way you can create little snippets of Python that generate valid code that can be pasted into your actually source.
In Lisp this whole process has been formalized by the macro system. Sure it enables you to create extensions to the syntax and do all sorts of fancy things, but it's actual usefulness is summed up by the above. Of course it helps that the Lisp macro system allows you to manipulate these "snippets" with the full power of the entire language.
In short, macros are transformations of code. They allow to introduce many new syntax constructs. E.g., consider LINQ in C#. In lisp, there are similar language extensions that are implemented by macros (e.g., built-in loop construct, iterate). Macros significantly decrease code duplication. Macros allow embedding «little languages» (e.g., where in c#/java one would use xml to configure, in lisp the same thing can be achieved with macros). Macros may hide difficulties of using libraries usage.
E.g., in lisp you can write
(iter (for (id name) in-clsql-query "select id, name from users" on-database *users-database*)
(format t "User with ID of ~A has name ~A.~%" id name))
and this hides all the database stuff (transactions, proper connection closing, fetching data, etc.) whereas in C# this requires creating SqlConnections, SqlCommands, adding SqlParameters to SqlCommands, looping on SqlDataReaders, properly closing them.
While the above all explains what macros are and even have cool examples, I think the key difference between a macro and a normal function is that LISP evaluates all the parameters first before calling the function. With a macro it's the reverse, LISP passes the parameters unevaluated to the macro. For example, if you pass (+ 1 2) to a function, the function will receive the value 3. If you pass this to a macro, it will receive a List( + 1 2). This can be used to do all kinds of incredibly useful stuff.
Adding a new control structure, e.g. loop or the deconstruction of a list
Measure the time it takes to execute a function passed in. With a function the parameter would be evaluated before control is passed to the function. With the macro, you can splice your code between the start and stop of your stopwatch. The below has the exact same code in a macro and a function and the output is very different. Note: This is a contrived example and the implementation was chosen so that it is identical to better highlight the difference.
(defmacro working-timer (b)
(let (
(start (get-universal-time))
(result (eval b))) ;; not splicing here to keep stuff simple
((- (get-universal-time) start))))
(defun my-broken-timer (b)
(let (
(start (get-universal-time))
(result (eval b))) ;; doesn't even need eval
((- (get-universal-time) start))))
(working-timer (sleep 10)) => 10
(broken-timer (sleep 10)) => 0
One-liner answer:
Minimal syntax => Macros over Expressions => Conciseness => Abstraction => Power
Lisp macros do nothing more than writing codes programmatically. That is, after expanding the macros, you got nothing more than Lisp code without macros. So, in principle, they achieve nothing new.
However, they differ from macros in other programming languages in that they write codes on the level of expressions, whereas others' macros write codes on the level of strings. This is unique to lisp thanks to their parenthesis; or put more precisely, their minimal syntax which is possible thanks to their parentheses.
As shown in many examples in this thread, and also Paul Graham's On Lisp, lisp macros can then be a tool to make your code much more concise. When conciseness reaches a point, it offers new levels of abstractions for codes to be much cleaner. Going back to the first point again, in principle they do not offer anything new, but that's like saying since paper and pencils (almost) form a Turing machine, we do not need an actual computer.
If one knows some math, think about why functors and natural transformations are useful ideas. In principle, they do not offer anything new. However by expanding what they are into lower-level math you'll see that a combination of a few simple ideas (in terms of category theory) could take 10 pages to be written down. Which one do you prefer?
I got this from the common lisp cookbook and I think it explained why lisp macros are useful.
"A macro is an ordinary piece of Lisp code that operates on another piece of putative Lisp code, translating it into (a version closer to) executable Lisp. That may sound a bit complicated, so let's give a simple example. Suppose you want a version of setq that sets two variables to the same value. So if you write
(setq2 x y (+ z 3))
when z=8 both x and y are set to 11. (I can't think of any use for this, but it's just an example.)
It should be obvious that we can't define setq2 as a function. If x=50 and y=-5, this function would receive the values 50, -5, and 11; it would have no knowledge of what variables were supposed to be set. What we really want to say is, When you (the Lisp system) see (setq2 v1 v2 e), treat it as equivalent to (progn (setq v1 e) (setq v2 e)). Actually, this isn't quite right, but it will do for now. A macro allows us to do precisely this, by specifying a program for transforming the input pattern (setq2 v1 v2 e)" into the output pattern (progn ...)."
If you thought this was nice you can keep on reading here:
http://cl-cookbook.sourceforge.net/macros.html
In python you have decorators, you basically have a function that takes another function as input. You can do what ever you want: call the function, do something else, wrap the function call in a resource acquire release, etc. but you don't get to peek inside that function. Say we wanted to make it more powerful, say your decorator received the code of the function as a list then you could not only execute the function as is but you can now execute parts of it, reorder lines of the function etc.