Why are macros based on abstract syntax trees better than macros based on string preprocessing?

I am beginning my journey of learning Rust. I came across this line in Rust by Example:
However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don't get unexpected precedence bugs.
Why is an abstract syntax tree better than string preprocessing?

If you have this in C:
#define X(A,B) A+B
int r = X(1,2) * 3;
The value of r will be 7, because the preprocessor expands it to 1+2 * 3, which is 1+(2*3).
In Rust, you would have:
macro_rules! X { ($a:expr,$b:expr) => { $a+$b } }
let r = X!(1,2) * 3;
This will evaluate to 9, because the compiler will interpret the expansion as (1+2)*3. This is because the compiler knows that the result of the macro is supposed to be a complete, self-contained expression.
That said, the C macro could also be defined like so:
#define X(A,B) ((A)+(B))
This would avoid any non-obvious evaluation problems, including the arguments themselves being reinterpreted due to context. However, when you're using a macro, you can never be sure whether or not the macro has correctly accounted for every possible way it could be used, so it's hard to tell what any given macro expansion will do.
By using AST nodes instead of text, Rust ensures this ambiguity can't happen.
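To see this end to end, here is a complete, runnable sketch of the example above (the macro is lowercased to x! only to follow Rust naming convention):
macro_rules! x {
    ($a:expr, $b:expr) => { $a + $b };
}
fn main() {
    // The expansion is a single expression node in the AST, so the
    // surrounding `* 3` cannot rebind precedence: (1 + 2) * 3 = 9.
    let r = x!(1, 2) * 3;
    assert_eq!(r, 9);
}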

A classic example using the C preprocessor is
#define MUL(a, b) a * b
// ...
int res = MUL(x + y, 5);
The use of the macro will expand to
int res = x + y * 5;
which is very far from the expected
int res = (x + y) * 5;
This happens because the C preprocessor does simple text-based substitution; it is not an integral part of the language itself. Preprocessing and parsing are two separate steps.
If the preprocessor instead parsed the macro like the rest of the compiler, as happens in languages where macros are part of the actual language syntax, this is no longer a problem, because things like precedence (as mentioned) and associativity are taken into account.
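The conventional C-side defense, as the first answer notes, is to parenthesize both the body and every parameter; a minimal sketch:
#include <stdio.h>
/* Fully parenthesized: precedence can no longer leak in or out. */
#define MUL(a, b) ((a) * (b))
int main(void) {
    int x = 1, y = 2;
    printf("%d\n", MUL(x + y, 5)); /* prints 15, as expected */
    /* Caveat: this is still textual substitution. A macro that repeats a
       parameter, e.g. SQR(i++) with #define SQR(a) ((a)*(a)), would still
       evaluate the side effect twice. */
    return 0;
}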

Related

Calling macro from within generated function in Julia

I have been messing around with generated functions in Julia and have come across a problem I do not fully understand. My final goal involves calling a macro (more specifically @tullio) from within a generated function, to perform some tensor contractions that depend on the input tensors. But I have been having problems, which I narrowed down to calling the macro from within the generated function.
To illustrate the problem, let's consider a very simple example that also fails:
macro my_add(a, b)
    return :($a + $b)
end

function add_one_expr(x::T) where T
    y = one(T)
    return :( @my_add($x, $y) )
end

@generated function add_one_gen(x::T) where T
    y = one(T)
    return :( @my_add($x, $y) )
end
With these declarations, I find that eval(add_one_expr(2.0)) works just as expected and returns an expression
:(@my_add 2.0 1.0)
which correctly evaluates to 3.0.
However evaluating add_one_gen(2.0) returns the following error:
MethodError: no method matching +(::Type{Float64}, ::Float64)
Doing some research, I have found that @generated actually produces two pieces of code, and in one of them only the types of the variables can be used. I think this is what is happening here, but I do not fully understand it. It must be some weird interaction between macros and generated functions.
Can someone explain and/or propose a solution? Thank you!
I find it helpful to think of generated functions as having two components: the body and any generated code (the stuff inside a quote..end). The body is evaluated at compile time, and doesn't "know" the values, only the types. So for a generated function taking x::T as an argument, any references to x in the body will actually point to the type T. This can be very confusing. To make things clearer, I recommend the body only refer to types, never to values.
Here's a little example:
julia> @generated function show_val_and_type(x::T) where {T}
           quote
               println("x is ", x)
               println("\$x is ", $x)
               println("T is ", T)
               println("\$T is ", $T)
           end
       end
show_val_and_type
julia> show_val_and_type(3)
x is 3
$x is Int64
T is Int64
$T is Int64
The interpolated $x means "take the x from the body (which refers to T) and splice it in."
If you follow the approach of never referring to values in the body, you can test generated functions by removing the @generated, like this:
julia> function add_one_gen(x::T) where T
           y = one(T)
           quote
               @my_add(x, $y)
           end
       end
add_one_gen
julia> add_one_gen(3)
quote
    #= REPL[42]:4 =#
    #= REPL[42]:4 =# @my_add x 1
end
That looks reasonable, but when we restore @generated and test it, we get
julia> add_one_gen(3)
ERROR: UndefVarError: x not defined
Stacktrace:
 [1] macro expansion
   @ ./REPL[48]:4 [inlined]
 [2] add_one_gen(x::Int64)
   @ Main ./REPL[48]:1
 [3] top-level scope
   @ REPL[49]:1
So let's see what the macro gives us:
julia> @macroexpand @my_add x 1
:(Main.x + 1)
It's pointing to Main.x, which doesn't exist: the macro resolves x in its own module rather than in the caller's scope, so we need to escape it. The standard way to do this is with esc. So finally, this works:
julia> macro my_add(a, b)
           return :($(esc(a)) + $(esc(b)))
       end
@my_add

julia> @generated function add_one_gen(x::T) where T
           y = one(T)
           quote
               @my_add(x, $y)
           end
       end
add_one_gen
julia> add_one_gen(3)
4
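As a quick sanity check (not part of the original answer), expanding the fixed macro shows that the escaped symbol now resolves in the caller's scope rather than in Main:
julia> @macroexpand @my_add x 1
:(x + 1)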

Fast iteration over unicode string Cython

I have the following Cython function.
01:
+02: cdef int count_char_in_x(unicode x,Py_UCS4 c):
03: cdef:
+04: int count = 0
05: Py_UCS4 x_k
06:
+07: for x_k in x: ## Yellow
+08: if x_k == c:
+09: count+=1
10:
+11: return count
Line 07 is not properly optimized.
In the annotated HTML, line 07 expands to:
+07: for x_k in x: ## Yellow
if (unlikely(__pyx_v_x == Py_None)) {
PyErr_SetString(PyExc_TypeError, "'NoneType' is not iterable");
__PYX_ERR(0, 8, __pyx_L1_error)
}
__Pyx_INCREF(__pyx_v_x);
__pyx_t_1 = __pyx_v_x;
__pyx_t_6 = __Pyx_init_unicode_iteration(__pyx_t_1, (&__pyx_t_3), (&__pyx_t_4), (&__pyx_t_5)); if (unlikely(__pyx_t_6 == ((int)-1))) __PYX_ERR(0, 8, __pyx_L1_error)
for (__pyx_t_7 = 0; __pyx_t_7 < __pyx_t_3; __pyx_t_7++) {
__pyx_t_2 = __pyx_t_7;
__pyx_v_x_k = __Pyx_PyUnicode_READ(__pyx_t_5, __pyx_t_4, __pyx_t_2);
Any tips on how this could be improved?
I think it should be possible to write a cdef/cpdef function that completely avoids the Python None type check at runtime. Any idea how this could be done?
The generated C code looks pretty good to me. The loop overall is an int-iterated for loop (i.e. it's not relying on calling the Python methods __iter__ and __next__).
__Pyx_PyUnicode_READ is translated pretty directly to PyUnicode_READ (depending slightly on the Python version you're using). PyUnicode_READ is a C macro which is as close to a direct array access as you can get.
This is probably as good as it's getting. You might get a small improvement by using bytes rather than unicode (provided you're dealing with ASCII characters). You might just consider whether it's really worth reimplementing unicode.count.
If it were a regular def function you could declare x as unicode not None to remove the None check before the loop. That might make a small difference. However, as @ead points out, that isn't supported for cdef functions. The cost of a def function call will likely be slightly larger than the cost of a None check, but you should time it if you care.
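For comparison, here is a minimal sketch of that def variant (same counting logic as above; whether the saved None check outweighs the def call overhead is something to measure):
def count_char_def(unicode x not None, Py_UCS4 c):
    # "not None" lets Cython assume x is a real str, so the None check
    # before the loop disappears; the trade-off is a Python-level call.
    cdef int count = 0
    cdef Py_UCS4 x_k
    for x_k in x:
        if x_k == c:
            count += 1
    return count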

Performance difference between functions and pattern matching in Mathematica

Mathematica differs from other dialects of Lisp in that it blurs the line between functions and macros. In Mathematica, a user who wants to write a mathematical function will likely use pattern matching, like f[x_] := x*x, instead of f = Function[{x}, x*x], though both return the same result when called as f[x]. My understanding is that the first approach is roughly equivalent to a Lisp macro, and in my experience it is favored because of the more concise syntax.
So I have two questions: is there a performance difference between executing functions and the pattern-matching/macro approach? Part of me wouldn't be surprised if functions were actually transformed into some version of macros to allow features like Listable to be implemented.
The reason I care about this question is because of the recent set of questions (1) (2) about trying to catch Mathematica errors in large programs. If most of the computations were defined in terms of Functions, it seems to me that keeping track of the order of evaluation and where the error originated would be easier than trying to catch the error after the input has been rewritten by the successive application of macros/patterns.
The way I understand Mathematica is that it is one giant search-and-replace engine. All functions, variables, and other assignments are essentially stored as rules, and during evaluation Mathematica goes through this global rule base and applies rules until the resulting expression stops changing.
It follows that the fewer passes you have to make through the list of rules, the faster the evaluation. Looking at what happens using Trace (with gdelfino's functions g and h):
In[1]:= Trace@(#*#)&@x
Out[1]= {x x, x^2}
In[2]:= Trace@g@x
Out[2]= {g[x], x x, x^2}
In[3]:= Trace@h@x
Out[3]= {{h, Function[{x}, x x]}, Function[{x}, x x][x], x x, x^2}
it becomes clear why anonymous functions are fastest and why using Function introduces additional overhead over a simple SetDelayed. I recommend looking at the introduction of Leonid Shifrin's excellent book, where these concepts are explained in some detail.
I have on occasion constructed a Dispatch table of all the functions I need and manually applied it to my starting expression. This provides a significant speed increase over normal evaluation as none of Mathematica's inbuilt functions need to be matched against my expression.
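A minimal sketch of that technique, with a hypothetical rule set (not the original code):
rules = Dispatch[{f[x_] :> x^2, g[x_] :> x + 1}];
g[f[3]] //. rules  (* only these rules are tried; evaluates to 10 *)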
My understanding is that the first approach is something equivalent to a lisp macro and in my experience is favored because of the more concise syntax.
Not really. Mathematica is a term rewriter, as are Lisp macros.
So I have two questions, is there a performance difference between executing functions versus the pattern matching/macro approach?
Yes. Note that you are never really "executing functions" in Mathematica. You are just applying rewrite rules to change one expression into another.
Consider mapping the Sqrt function over a packed array of floating point numbers. The fastest solution in Mathematica is to apply the Sqrt function directly to the packed array because it happens to implement exactly what we want and is optimized for this special case:
In[1] := xs = N@Range[100000];
In[2] := Sqrt[xs]; // AbsoluteTiming
Out[2] = {0.0060000, Null}
We might define a global rewrite rule that has terms of the form sqrt[x] rewritten to Sqrt[x] such that the square root will be calculated:
In[3] := Clear[sqrt];
sqrt[x_] := Sqrt[x];
Map[sqrt, xs]; // AbsoluteTiming
Out[3] = {0.4800007, Null}
Note that this is ~100× slower than the previous solution.
Alternatively, we might define a global rewrite rule that replaces the symbol sqrt with a lambda function that invokes Sqrt:
In[4] := Clear[sqrt];
sqrt = Function[{x}, Sqrt[x]];
Map[sqrt, xs]; // AbsoluteTiming
Out[4] = {0.0500000, Null}
Note that this is ~10× faster than the previous solution.
Why? Because the slow second solution looks up the rewrite rule sqrt[x_] :> Sqrt[x] in the inner loop (once for each element of the array), whereas the fast third solution looks up the value Function[...] of the symbol sqrt once and then applies that lambda function repeatedly. In contrast, the fastest first solution is a loop, written in C, calling sqrt directly. So searching the global rewrite table inside an inner loop is extremely expensive.
If so, why is Sqrt ever fast? You might expect a 2× slowdown instead of 10× because we've replaced one lookup for Sqrt with two lookups for sqrt and Sqrt in the inner loop but this is not so because Sqrt has the special status of being a built-in function that will be matched in the core of the Mathematica term rewriter itself rather than via the general-purpose global rewrite table.
Other people have described much smaller performance differences between similar functions. I believe the performance differences in those cases are just minor differences in the exact implementation of Mathematica's internals. The biggest issue with Mathematica is the global rewrite table. In particular, this is where Mathematica diverges from traditional term-level interpreters.
You can learn a lot about Mathematica's performance by writing mini Mathematica implementations. In this case, the above solutions might be compiled to (for example) F#. The array may be created like this:
> let xs = [|1.0..100000.0|];;
...
The built-in sqrt function can be converted into a closure and given to the map function like this:
> Array.map sqrt xs;;
Real: 00:00:00.006, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
...
This takes 6ms just like Sqrt[xs] in Mathematica. But that is to be expected because this code has been JIT compiled down to machine code by .NET for fast evaluation.
Looking up rewrite rules in Mathematica's global rewrite table is similar to looking up the closure in a dictionary keyed on its function name. Such a dictionary can be constructed like this in F#:
> open System.Collections.Generic;;
> let fns = Dictionary<string, (obj -> obj)>(dict["sqrt", unbox >> sqrt >> box]);;
This is similar to the DownValues data structure in Mathematica, except that we aren't searching multiple resulting rules for the first to match on the function arguments.
The program then becomes:
> Array.map (fun x -> fns.["sqrt"] (box x)) xs;;
Real: 00:00:00.044, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
...
Note that we get a similar 10× performance degradation due to the hash table lookup in the inner loop.
An alternative would be to store the DownValues associated with a symbol in the symbol itself in order to avoid the hash table lookup.
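A hedged sketch of that alternative (hypothetical types, not from the article): attach the rule to the symbol itself so that application dereferences a field instead of hashing a string:
> type Sym = { Name: string; mutable DownValue: (obj -> obj) option };;
> let sqrtSym = { Name = "sqrt"; DownValue = Some (unbox >> sqrt >> box) };;
> // The inner loop now follows a pointer instead of consulting a hash table.
  let apply (s: Sym) (x: obj) =
      match s.DownValue with
      | Some f -> f x
      | None -> failwithf "no DownValue for %s" s.Name;;
> Array.map (fun x -> apply sqrtSym (box x)) xs;;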
We can even write a complete term rewriter in just a few lines of code. Terms may be expressed as values of the following type:
> type expr =
| Float of float
| Symbol of string
| Packed of float []
| Apply of expr * expr [];;
Note that Packed implements Mathematica's packed lists, i.e. unboxed arrays.
The following init function constructs a List with n elements using the function f, returning a Packed if every return value was a Float or a more general Apply(Symbol "List", ...) otherwise:
> let init n f =
      let rec packed ys i =
          if i=n then Packed ys else
              match f i with
              | Float y ->
                  ys.[i] <- y
                  packed ys (i+1)
              | y ->
                  Apply(Symbol "List", Array.init n (fun j ->
                      if j<i then Float ys.[j]
                      elif j=i then y
                      else f j))
      packed (Array.zeroCreate n) 0;;
val init : int -> (int -> expr) -> expr
The following rule function uses pattern matching to identify expressions that it can understand and replaces them with other expressions:
> let rec rule = function
      | Apply(Symbol "Sqrt", [|Float x|]) ->
          Float(sqrt x)
      | Apply(Symbol "Map", [|f; Packed xs|]) ->
          init xs.Length (fun i -> rule(Apply(f, [|Float xs.[i]|])))
      | f -> f;;
val rule : expr -> expr
Note that the type of this function expr -> expr is characteristic of term rewriting: rewriting replaces expressions with other expressions rather than reducing them to values.
Our program can now be defined and executed by our custom term rewriter:
> rule (Apply(Symbol "Map", [|Symbol "Sqrt"; Packed xs|]));;
Real: 00:00:00.049, CPU: 00:00:00.046, GC gen0: 24, gen1: 0, gen2: 0
We've recovered the performance of Map[Sqrt, xs] in Mathematica!
We can even recover the performance of Sqrt[xs] by adding an appropriate rule:
| Apply(Symbol "Sqrt", [|Packed xs|]) ->
    Packed(Array.map sqrt xs)
I wrote an article on term rewriting in F#.
Some measurements
Based on @gdelfino's answer and comments by @rcollyer I made this small program:
j = # # + # # &;
g[x_] := x x + x x ;
h = Function[{x}, x x + x x ];
anon = Table[Timing[Do[ # # + # # &[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
jj = Table[Timing[Do[ j[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
gg = Table[Timing[Do[ g[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
hh = Table[Timing[Do[ h[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
ListLinePlot[ {anon, jj, gg, hh},
PlotStyle -> {Black, Red, Green, Blue},
PlotRange -> All]
The results are, at least for me, very surprising: [timing plot not reproduced here]
Any explanations? Please feel free to edit this answer (comments are a mess for long text)
Edit
Tested with the identity function f[x_] = x to isolate parsing from the actual evaluation. Results (same colors): [plot not reproduced here]
Note: the results are very similar to the plot above for constant functions (f[x_] := 1).
Pattern matching seems faster:
In[1]:= g[x_] := x*x
In[2]:= h = Function[{x}, x*x];
In[3]:= Do[h[RandomInteger[100]], {1000000}] // Timing
Out[3]= {1.53927, Null}
In[4]:= Do[g[RandomInteger[100]], {1000000}] // Timing
Out[4]= {1.15919, Null}
Pattern matching is also more flexible as it allows you to overload a definition:
In[5]:= g[x_] := x * x
In[6]:= g[x_,y_] := x * y
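Both definitions now coexist and are selected by the shape of the arguments, for example:
g[3]     (* matches g[x_]: 9 *)
g[3, 4]  (* matches g[x_, y_]: 12 *)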
For simple functions you can compile to get the best performance:
In[7]:= k = Compile[{x}, x*x];
In[8]:= Do[k[RandomInteger[100]], {100000}] // Timing
Out[8]= {0.083517, Null}
You can use the function recordSteps from a previous answer to see what Mathematica actually does with Function. It treats it just like any other head. For example, suppose you have the following:
f = Function[{x}, x + 2];
f[2]
It first transforms f[2] into
Function[{x}, x + 2][2]
At the next step, x+2 is transformed into 2+2. Essentially, "Function" evaluation behaves like an application of pattern matching rules, so it shouldn't be surprising that it's not faster.
You can think of everything in Mathematica as an expression, where evaluation is the process of rewriting parts of the expression in a predefined sequence; this applies to Function just as to any other head.

Scala closures on wikipedia

Found the following snippet on the Closure page on wikipedia
// Return a list of all books with at least 'threshold' copies sold.
def bestSellingBooks(threshold: Int) = bookList.filter(book => book.sales >= threshold)
// or
def bestSellingBooks(threshold: Int) = bookList.filter(_.sales >= threshold)
Correct me if I'm wrong, but this isn't a closure? It is a function literal, an anonymous function, a lambda function, but not a closure?
Well... if you want to be technical, this is a function literal which is translated at runtime into a closure, closing over the open terms (binding them to a val/var in the scope of the function literal). In the context of this function literal (_.sales >= threshold), threshold is a free variable, as the function literal itself doesn't give it any meaning. By itself, _.sales >= threshold is an open term. At runtime, it is bound to the local variable of the function each time the function is called.
Take this function for example, generating closures:
def makeIncrementer(inc: Int): (Int => Int) = (x: Int) => x + inc
At runtime, the following code produces 3 closures. It's also interesting to note that b and c are not the same closure (b == c gives false).
val a = makeIncrementer(10)
val b = makeIncrementer(20)
val c = makeIncrementer(20)
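To make the point concrete (hypothetical REPL values):
a(1) // 11: this closure captured inc = 10
b(1) // 21
c(1) // also 21, yet b == c is false: each call to makeIncrementer
     // allocates a distinct closure object, compared by reference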
I still think the example given on Wikipedia is a good one, albeit one that doesn't quite cover the whole story. It's quite hard to give an example of actual closures by the strictest definition without a memory dump of a running program. It's the same with the class-object relation: you usually give an example of an object by defining a class Foo { ... and then instantiating it with val f = new Foo, saying that f is the object.
-- Flaviu Cipcigan
Notes:
Reference: Programming in Scala, Martin Odersky, Lex Spoon, Bill Venners
Code compiled with Scala version 2.7.5.final running on Java 1.6.0_14.
I'm not entirely sure, but I think you're right. Doesn't a closure require state (I guess free variables...)?
Or maybe the bookList is the free variable?
As far as I understand, this is a closure: it captures the formal parameter threshold and the variable bookList from the enclosing scope. So the return value of the function may change between calls while applying the filter predicate, varying with the elements of the captured bookList.

Why does Scala's semicolon inference fail here?

On compiling the following code with Scala 2.7.3,
package spoj
object Prime1 {
def main(args: Array[String]) {
def isPrime(n: Int) = (n != 1) && (2 to n/2 forall (n % _ != 0))
val read = new java.util.Scanner(System.in)
var nTests = read nextInt // [*]
while(nTests > 0) {
val (start, end) = (read nextInt, read nextInt)
start to end filter(isPrime(_)) foreach println
println
nTests -= 1
}
}
}
I get the following compile time error :
PRIME1.scala:8: error: illegal start of simple expression
while(nTests > 0) {
^
PRIME1.scala:14: error: block must end in result expression, not in definition
}
^
two errors found
When I add a semicolon at the end of the line commented as [*], the program compiles fine. Can anyone please explain why Scala's semicolon inference fails to work on that particular line?
It is because Scala assumes that you are using the syntax a foo b (equivalent to a.foo(b)) in your call to nextInt. That is, it assumes that the while loop is the argument to nextInt (recall that every expression has a type) and hence the last statement is a definition:
var ntests = read nextInt x
where x is your while block.
I must say that, as a point of preference, I've now returned to using the usual a.foo(b) syntax over a foo b unless specifically working with a DSL which was designed with that use in mind (like actors' a ! b). It makes things much clearer in general and you don't get bitten by weird stuff like this!
Additional comment to the answer by oxbow_lakes...
var ntests = read nextInt()
should fix things for you, as an alternative to the semicolon.
To add a little more about the semicolon inference: Scala actually does this in two stages. First it infers a special token called nl, per the language spec. The parser allows nl to be used as a statement separator, as well as semicolons. However, nl is also permitted in a few other places by the grammar. In particular, a single nl is allowed after infix operators when the first token on the next line can start an expression, and while can start an expression, which is why it interprets it that way. Unfortunately, although while can start an expression, a while statement cannot be used in an infix expression, hence the error. Personally, it seems a rather quirky way for the parser to work, but there's quite plausibly a sane rationale behind it for all I know!
As yet another option, putting a blank line between your [*] line and the while line will also fix the problem, because only a single nl is permitted after infix operators, so multiple nls force a different interpretation by the parser.
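Putting the suggested fixes side by side (a sketch; any one of them resolves the parse):
var nTests = read.nextInt()  // explicit dot-and-parens call: no infix ambiguity
// var nTests = read nextInt; // or: the semicolon ends the statement
// or: keep `read nextInt` and leave a blank line before the while loop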