Performance difference between functions and pattern matching in Mathematica

Performance difference between functions and pattern matching in Mathematica - lisp

So Mathematica is different from other dialects of lisp because it blurs the lines between functions and macros. In Mathematica if a user wanted to write a mathematical function they would likely use pattern matching like f[x_]:= x*x instead of f=Function[{x},x*x] though both would return the same result when called with f[x]. My understanding is that the first approach is something equivalent to a lisp macro and in my experience is favored because of the more concise syntax.
So I have two questions, is there a performance difference between executing functions versus the pattern matching/macro approach? Though part of me wouldn't be surprised if functions were actually transformed into some version of macros to allow features like Listable to be implemented.
The reason I care about this question is because of the recent set of questions (1) (2) about trying to catch Mathematica errors in large programs. If most of the computations were defined in terms of Functions, it seems to me that keeping track of the order of evaluation and where the error originated would be easier than trying to catch the error after the input has been rewritten by the successive application of macros/patterns.

The way I understand Mathematica is that it is one giant search replace engine. All functions, variables, and other assignments are essentially stored as rules and during evaluation Mathematica goes through this global rule base and applies them until the resulting expression stops changing.
It follows that the fewer times you have to go through the list of rules the faster the evaluation. Looking at what happens using Trace (using gdelfino's function g and h)
In[1]:= Trace#(#*#)&#x
Out[1]= {x x,x^2}
In[2]:= Trace#g#x
Out[2]= {g[x],x x,x^2}
In[3]:= Trace#h#x
Out[3]= {{h,Function[{x},x x]},Function[{x},x x][x],x x,x^2}
it becomes clear why anonymous functions are fastest and why using Function introduces additional overhead over a simple SetDelayed. I recommend looking at the introduction of Leonid Shifrin's excellent book, where these concepts are explained in some detail.
I have on occasion constructed a Dispatch table of all the functions I need and manually applied it to my starting expression. This provides a significant speed increase over normal evaluation as none of Mathematica's inbuilt functions need to be matched against my expression.

My understanding is that the first approach is something equivalent to a lisp macro and in my experience is favored because of the more concise syntax.
Not really. Mathematica is a term rewriter, as are Lisp macros.
So I have two questions, is there a performance difference between executing functions versus the pattern matching/macro approach?
Yes. Note that you are never really "executing functions" in Mathematica. You are just applying rewrite rules to change one expression into another.
Consider mapping the Sqrt function over a packed array of floating point numbers. The fastest solution in Mathematica is to apply the Sqrt function directly to the packed array because it happens to implement exactly what we want and is optimized for this special case:
In[1] := N#Range[100000];
In[2] := Sqrt[xs]; // AbsoluteTiming
Out[2] = {0.0060000, Null}
We might define a global rewrite rule that has terms of the form sqrt[x] rewritten to Sqrt[x] such that the square root will be calculated:
In[3] := Clear[sqrt];
sqrt[x_] := Sqrt[x];
Map[sqrt, xs]; // AbsoluteTiming
Out[3] = {0.4800007, Null}
Note that this is ~100× slower than the previous solution.
Alternatively, we might define a global rewrite rule that replaces the symbol sqrt with a lambda function that invokes Sqrt:
In[4] := Clear[sqrt];
sqrt = Function[{x}, Sqrt[x]];
Map[sqrt, xs]; // AbsoluteTiming
Out[4] = {0.0500000, Null}
Note that this is ~10× faster than the previous solution.
Why? Because the slow second solution is looking up the rewrite rule sqrt[x_] :> Sqrt[x] in the inner loop (for each element of the array) whereas the fast third solution looks up the value Function[...] of the symbol sqrt once and then applies that lambda function repeatedly. In contrast, the fastest first solution is a loop calling sqrt written in C. So searching the global rewrite rules is extremely expensive and term rewriting is expensive.
If so, why is Sqrt ever fast? You might expect a 2× slowdown instead of 10× because we've replaced one lookup for Sqrt with two lookups for sqrt and Sqrt in the inner loop but this is not so because Sqrt has the special status of being a built-in function that will be matched in the core of the Mathematica term rewriter itself rather than via the general-purpose global rewrite table.
Other people have described much smaller performance differences between similar functions. I believe the performance differences in those cases are just minor differences in the exact implementation of Mathematica's internals. The biggest issue with Mathematica is the global rewrite table. In particular, this is where Mathematica diverges from traditional term-level interpreters.
You can learn a lot about Mathematica's performance by writing mini Mathematica implementations. In this case, the above solutions might be compiled to (for example) F#. The array may be created like this:
> let xs = [|1.0..100000.0|];;
...
The built-in sqrt function can be converted into a closure and given to the map function like this:
> Array.map sqrt xs;;
Real: 00:00:00.006, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
...
This takes 6ms just like Sqrt[xs] in Mathematica. But that is to be expected because this code has been JIT compiled down to machine code by .NET for fast evaluation.
Looking up rewrite rules in Mathematica's global rewrite table is similar to looking up the closure in a dictionary keyed on its function name. Such a dictionary can be constructed like this in F#:
> open System.Collections.Generic;;
> let fns = Dictionary<string, (obj -> obj)>(dict["sqrt", unbox >> sqrt >> box]);;
This is similar to the DownValues data structure in Mathematica, except that we aren't searching multiple resulting rules for the first to match on the function arguments.
The program then becomes:
> Array.map (fun x -> fns.["sqrt"] (box x)) xs;;
Real: 00:00:00.044, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
...
Note that we get a similar 10× performance degradation due to the hash table lookup in the inner loop.
An alternative would be to store the DownValues associated with a symbol in the symbol itself in order to avoid the hash table lookup.
We can even write a complete term rewriter in just a few lines of code. Terms may be expressed as values of the following type:
> type expr =
| Float of float
| Symbol of string
| Packed of float []
| Apply of expr * expr [];;
Note that Packed implements Mathematica's packed lists, i.e. unboxed arrays.
The following init function constructs a List with n elements using the function f, returning a Packed if every return value was a Float or a more general Apply(Symbol "List", ...) otherwise:
> let init n f =
let rec packed ys i =
if i=n then Packed ys else
match f i with
| Float y ->
ys.[i] <- y
packed ys (i+1)
| y ->
Apply(Symbol "List", Array.init n (fun j ->
if j<i then Float ys.[i]
elif j=i then y
else f j))
packed (Array.zeroCreate n) 0;;
val init : int -> (int -> expr) -> expr
The following rule function uses pattern matching to identify expressions that it can understand and replaces them with other expressions:
> let rec rule = function
| Apply(Symbol "Sqrt", [|Float x|]) ->
Float(sqrt x)
| Apply(Symbol "Map", [|f; Packed xs|]) ->
init xs.Length (fun i -> rule(Apply(f, [|Float xs.[i]|])))
| f -> f;;
val rule : expr -> expr
Note that the type of this function expr -> expr is characteristic of term rewriting: rewriting replaces expressions with other expressions rather than reducing them to values.
Our program can now be defined and executed by our custom term rewriter:
> rule (Apply(Symbol "Map", [|Symbol "Sqrt"; Packed xs|]));;
Real: 00:00:00.049, CPU: 00:00:00.046, GC gen0: 24, gen1: 0, gen2: 0
We've recovered the performance of Map[Sqrt, xs] in Mathematica!
We can even recover the performance of Sqrt[xs] by adding an appropriate rule:
| Apply(Symbol "Sqrt", [|Packed xs|]) ->
Packed(Array.map sqrt xs)
I wrote an article on term rewriting in F#.

Some measurements
Based on #gdelfino answer and comments by #rcollyer I made this small program:
j = # # + # # &;
g[x_] := x x + x x ;
h = Function[{x}, x x + x x ];
anon = Table[Timing[Do[ # # + # # &[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
jj = Table[Timing[Do[ j[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
gg = Table[Timing[Do[ g[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
hh = Table[Timing[Do[ h[i], {i, k}]][[1]], {k, 10^5, 10^6, 10^5}];
ListLinePlot[ {anon, jj, gg, hh},
PlotStyle -> {Black, Red, Green, Blue},
PlotRange -> All]
The results are, at least for me, very surprising:
Any explanations? Please feel free to edit this answer (comments are a mess for long text)
Edit
Tested with the identity function f[x] = x to isolate the parsing from the actual evaluation. Results (same colors):
Note: results are very similar to this Plot for constant functions (f[x]:=1);

Pattern matching seems faster:
In[1]:= g[x_] := x*x
In[2]:= h = Function[{x}, x*x];
In[3]:= Do[h[RandomInteger[100]], {1000000}] // Timing
Out[3]= {1.53927, Null}
In[4]:= Do[g[RandomInteger[100]], {1000000}] // Timing
Out[4]= {1.15919, Null}
Pattern matching is also more flexible as it allows you to overload a definition:
In[5]:= g[x_] := x * x
In[6]:= g[x_,y_] := x * y
For simple functions you can compile to get the best performance:
In[7]:= k[x_] = Compile[{x}, x*x]
In[8]:= Do[k[RandomInteger[100]], {100000}] // Timing
Out[8]= {0.083517, Null}

You can use function recordSteps in previous answer to see what Mathematica actually does with Functions. It treats it just like any other Head. IE, suppose you have the following
f = Function[{x}, x + 2];
f[2]
It first transforms f[2] into
Function[{x}, x + 2][2]
At the next step, x+2 is transformed into 2+2. Essentially, "Function" evaluation behaves like an application of pattern matching rules, so it shouldn't be surprising that it's not faster.
You can think of everything in Mathematica as an expression, where evaluation is the process of rewriting parts of the expression in a predefined sequence, this applies to Function like to any other head

Related

Can Julia macros be used to generate code based on specific function implementation?

I am fairly new to Julia and I am learning about metaprogramming.
I would like to write a macro that receive in input a function and returns another function based on the implementation details of its input.
For example given:
function f(x)
x + 100
end
function g(x)
f(x)*x
end
function h(x)
g(x)-0.5*f(x)
end
I would like to write a macro that returns something like that:
function h_traced(x)
f = x + 100
println("loc 1 x: ", x)
g = f * x
println("loc 2 x: ", x)
res = g - 0.5 * f
println("loc 3 x: ", x)
Now both code_lowered and code_typed seems to give me back the AST in the form of CodeInfo, however when I try to use it programmatically in my macro I get empty object.
macro myExpand(f)
body = code_lowered(f)
println("myExpand Body lenght: ",length(body))
end
called like this
#myExpand :(h)
however the same call outside the macro works ok.
code_lowered(h)
At last even the following return an empty CodeInfo.
macro myExpand(f)
body = code_lowered(Symbol("h"))
println("myExpand Body lenght: ",length(body))
end
This might be incredible trivial but I could not work out myseld why the h symbol does not resolve to the function defined. Am I missing something about the scope of symbols?

I find it useful to think about macros as a way to transform an input syntax into an output syntax.
So you could very well define a macro #my_macro such that
#my_macro function h(x)
g(x)-0.5*f(x)
end
would expand to something like
function h_traced(x)
println("entering function: x=", x)
g(x)-0.5*f(x)
end
But to such a macro, h is merely a name, an identifier (technically, a Symbol) that can be transformed into h_traced. h is not the function that is bound to this name (in the same way as x = 2 involves binding a name x, to an integer value 2, but x is not 2; x is merely a name that can be used to refer to 2). In contrast to this, when you call code_lowered(h), h gets evaluated first, and code_lowered is passed its value (which is a function) as argument.
Back to our macro: expanding to an expression that involves the definition of g and f goes way further than mere syntax transformations: we're leaving the purely syntactic domain, since such a transformation would need to "understand" that these are functions, look up their definitions and so on.
You are right to think about code_lowered and friends: this is IMO the adequate level of abstraction for what you're trying to achieve. You should probably look into tools like Cassette.jl or IRTools.jl. That being said, if you're still relatively new to Julia, you might want to get a bit more used to the language before delving too deeply into such topics.

You don't need a macro, you need a generated function. They can not only return code (Expr), but also IR (lowered code). Usually, for this kind of thing, people use Base.uncompressed_ast, not code_lowered. Both Cassette and IRTools simplify the implementation for you, in different ways.
The basic idea is:
Have a generated function that takes a function and its arguments
In that function, get the IR of that function, and modify it to your purposes
Return the new IR from the generated function. This will then be compiled and called on the original arguments.
A short demonstration with IRTools:
julia> IRTools.#dynamo function traced(args...)
ir = IRTools.IR(args...)
p = IRTools.Pipe(ir)
for (v, stmt) in p
IRTools.insertafter!(p, v, IRTools.xcall(println, "loc $v"))
end
return IRTools.finish(p)
end
julia> function h(x)
sin(x)-0.5*cos(x)
end
h (generic function with 1 method)
julia> #code_ir traced(h, 1)
1: (%1, %2)
%3 = Base.getfield(%2, 1)
%4 = Base.getfield(%2, 2)
%5 = Main.sin(%4)
%6 = (println)("loc %3")
%7 = Main.cos(%4)
%8 = (println)("loc %4")
%9 = 0.5 * %7
%10 = (println)("loc %5")
%11 = %5 - %9
%12 = (println)("loc %6")
return %11
julia> traced(h, 1)
loc %3
loc %4
loc %5
loc %6
0.5713198318738266
The rest is left as an exercise. The numbers of the variables are off, because they are, of course, shifted during the transformation. You'd have to add some bookkeeping for that, or use the substitute function on Pipe in some way (but I never quite understood it). If you need the name of the variables, you can get the IR with slots preserved by using a different method of the IR constructor.
(And now the advertisement: I have written something like this. It's currently quite inefficient, but you might get some ideas from it.)

Why are macros based on abstract syntax trees better than macros based on string preprocessing?

I am beginning my journey of learning Rust. I came across this line in Rust by Example:
However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don't get unexpected precedence bugs.
Why is an abstract syntax tree better than string preprocessing?

If you have this in C:
#define X(A,B) A+B
int r = X(1,2) * 3;
The value of r will be 7, because the preprocessor expands it to 1+2 * 3, which is 1+(2*3).
In Rust, you would have:
macro_rules! X { ($a:expr,$b:expr) => { $a+$b } }
let r = X!(1,2) * 3;
This will evaluate to 9, because the compiler will interpret the expansion as (1+2)*3. This is because the compiler knows that the result of the macro is supposed to be a complete, self-contained expression.
That said, the C macro could also be defined like so:
#define X(A,B) ((A)+(B))
This would avoid any non-obvious evaluation problems, including the arguments themselves being reinterpreted due to context. However, when you're using a macro, you can never be sure whether or not the macro has correctly accounted for every possible way it could be used, so it's hard to tell what any given macro expansion will do.
By using AST nodes instead of text, Rust ensures this ambiguity can't happen.

A classic example using the C preprocessor is
#define MUL(a, b) a * b
// ...
int res = MUL(x + y, 5);
The use of the macro will expand to
int res = x + y * 5;
which is very far from the expected
int res = (x + y) * 5;
This happens because the C preprocessor really just does simple text-based substitutions, it's not really an integral part of the language itself. Preprocessing and parsing are two separate steps.
If the preprocessor instead parsed the macro like the rest of the compiler, which happens for languages where macros are part of the actual language syntax, this is no longer a problem as things like precedence (as mentioned) and associativity are taken into account.

How to prevent this Cyclic polynomial hash function from using a type constraint?

I am trying to implement the Cyclic polynomial hash function in f#. It uses the bit-wise operators ^^^ and <<<. Here is an example of a function that hashes an array:
let createBuzhash (pattern : array<'a>) =
let n = pattern.Length
let rec loop index pow acc =
if index < n then
loop (index+1) (pow-1) (acc ^^^ ((int pattern.[index]) <<< pow))
else
acc
loop 0 (n-1) 0
My problem is that the type of 'a will be constrained to an int, while i want this function to work with any of the types that work with bit-wise operators, for example a char. I tried using inline, but that creates some problems farther down in my library. Is there a way to fix this without using inline?
Edit for clarity: The function will be part of a library, and another hash function is provided for types that don't support the bit-wise operators. I want this function to work with arrays of numeric types and/or chars.
Edit 2 (problem solved) : The problem with inline was the way how i loaded the function from my library. instead of
let hashedPattern = library.createBuzhash targetPattern
I used this binding:
let myFunction = library.createBuzhash
let hashedPattern = myFunction targetPattern
that constraints the input type for myFunction to int, although the createBuzhash function is an inline function in the library. Changing the way I call the function fixed the type constraint problem, and inline works perfectly fine, as the answer below suggests.

In the implementation, you are converting the value in the array to an Integer using the int function as follows: int pattern.[index]
This creates a constraint on the type of array elements requiring them to be "something that can be converted to int". If you mark the function as inline, it will actually work for types like char and you'll be able to write:
createBuzhash [|'a'; 'b'|]
But there are still many other types that cannot be converted to integer using the int function.
To make this work for any type, you have to decide how you want to handle types that are not numeric. Do you want to:
Provide your own hashing function for all values?
Use the built-in .NET GetHashCode operation?
Only make your function work on numeric types and arrays of numeric types?
One option would be to add a parameter that specifies how to do the conversion:
let inline createBuzhash conv (pattern : array<'a>) =
let n = pattern.Length
let rec loop index pow acc =
if index < pattern.Length then
loop (index+1) (pow-1) (acc ^^^ ((conv pattern.[index]) <<< pow))
else
acc
loop 0 (n-1) 0
When calling createBuzhash, you now need to give it a function for hashing the elements. This works on primitive types using the int function:
createBuzhash int [| 0 .. 10 |]
createBuzhash int [|'a'; 'b'|]
But you can also use built-in F# hashing mechanism:
createBuzhash hash [| (1,"foo"); (2,"bar") |]
And you can even handle nested arrays by passing the function to itself:
createBuzhash (createBuzhash int) [| [| 1 |]; [| 2 |] |]

Turn off Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1) for gfortran?

I am intentionally casting an array of boolean values to integers but I get this warning:
Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1)
which I don't want. Can I either
(1) Turn off that warning in the Makefile?
or (more favorably)
(2) Explicitly make this cast in the code so that the compiler doesn't need to worry?
The code will looking something like this:
A = (B.eq.0)
where A and B are both size (n,1) integer arrays. B will be filled with integers ranging from 0 to 3. I need to use this type of command again later with something like A = (B.eq.1) and I need A to be an integer array where it is 1 if and only if B is the requested integer, otherwise it should be 0. These should act as boolean values (1 for .true., 0 for .false.), but I am going to be using them in matrix operations and summations where they will be converted to floating point values (when necessary) for division, so logical values are not optimal in this circumstance.
Specifically, I am looking for the fastest, most vectorized version of this command. It is easy to write a wrapper for testing elements, but I want this to be a vectorized operation for efficiency.
I am currently compiling with gfortran, but would like whatever methods are used to also work in ifort as I will be compiling with intel compilers down the road.
update:
Both merge and where work perfectly for the example in question. I will look into performance metrics on these and select the best for vectorization. I am also interested in how this will work with matrices, not just arrays, but that was not my original question so I will post a new one unless someone wants to expand their answer to how this might be adapted for matrices.

I have not found a compiler option to solve (1).
However, the type conversion is pretty simple. The documentation for gfortran specifies that .true. is mapped to 1, and false to 0.
Note that the conversion is not specified by the standard, and different values could be used by other compilers. Specifically, you should not depend on the exact values.
A simple merge will do the trick for scalars and arrays:
program test
integer :: int_sca, int_vec(3)
logical :: log_sca, log_vec(3)
log_sca = .true.
log_vec = [ .true., .false., .true. ]
int_sca = merge( 1, 0, log_sca )
int_vec = merge( 1, 0, log_vec )
print *, int_sca
print *, int_vec
end program
To address your updated question, this is trivial to do with merge:
A = merge(1, 0, B == 0)
This can be performed on scalars and arrays of arbitrary dimensions. For the latter, this can easily be vectorized be the compiler. You should consult the manual of your compiler for that, though.
The where statement in Casey's answer can be extended in the same way.
Since you convert them to floats later on, why not assign them as floats right away? Assuming that A is real, this could look like:
A = merge(1., 0., B == 0)

Another method to compliment #AlexanderVogt is to use the where construct.
program test
implicit none
integer :: int_vec(5)
logical :: log_vec(5)
log_vec = [ .true., .true., .false., .true., .false. ]
where (log_vec)
int_vec = 1
elsewhere
int_vec = 0
end where
print *, log_vec
print *, int_vec
end program test
This will assign 1 to the elements of int_vec that correspond to true elements of log_vec and 0 to the others.
The where construct will work for any rank array.

For this particular example you could avoid the logical all together:
A=1-(3-B)/3
Of course not so good for readability, but it might be ok performance-wise.
Edit, running performance tests this is 2-3 x faster than the where construct, and of course absolutely standards conforming. In fact you can throw in an absolute value and generalize as:
integer,parameter :: h=huge(1)
A=1-(h-abs(B))/h
and still beat the where loop.

New Scope in Coq

I'd like my own scope, to play around with long distfixes.
Declare Scope my_scope.
Delimit Scope my_scope with my.
Open Scope my_scope.
Definition f (x y a b : nat) : nat := x+y+a+b.
Notation "x < y * a = b" := (f x y a b)
(at level 100, no associativity) : my_scope.
Check (1 < 2 * 3 = 4)%my.
How do you make a new scope?
EDIT: I chose "x < y * a = b" to override Coq's operators (each with a different precedence).

The command Declare Scope does not exist. The various commands about scopes are described in section 12.2 of the Coq manual.
Your choice of an example notation has inherent problems, because it clashes with pre-defined notations, which seem to be used before your notation.
When looking at the first components the parser sees _ < _ and thinks that you are actually talking about comparison of integers, then it sees the second part as being an instance of the notation _ * _, then it sees that all that is the left hand side of an equality. And all along the parser is happy, it constructs an expression of the form:
(1 < (2 * 3)) = 4
This is constructed by the parser, and the type system has not been solicited yet. The type checker sees a natural number as the first child of (_ < _) and is happy. It sees (_ * _) as the second child and it is happy, it now knows that the first child of that product should be a nat number and it is still happy; in the end it has an equality, and the first component of this equality is in type Prop, but the second component is in type nat.
If you type Locate "_ < _ * _ = _". the answer tells you that you did define a new notation. The problem is that this notation never gets used, because the parser always finds another notation it can use before. Understanding why a notation is preferred to another one requires more knowledge of parsing technology, as alluded to in Coq's manual, chapter 12, in the sentence (obscure to me):
Coq extensible parsing is performed by Camlp5 which is essentially a LL1 parser.
You have to choose the levels of the various variables, x, y, a, and b so that none of these variables will be able to match too much of the text. For instance, I tried defining a notation close to yours, but with a starting and an ending bracket (and I guess this simplifies the task greatly).
Notation "<< x < y * a = b >>" := (f x y a b)
(x at level 59, y at level 39, a at level 59) : my_scope.
The level of x is chosen to be lower than the level of =, the level of y is chosen to be lower than the level of *, the level of a is chosen to be lower than =. The levels were obtained by reading the answer of the command Print Grammar constr. It seems to work, as the following command is accepted.
Check << 1 < 2 * 3 = 4 >>.
But you may need to include a little more engineering to have a really good notation.

To answer the actual question in your title:
The new scope gets created when you declare a notation that uses it. That is, you don’t declare a new scope my_scope separately. You just write
Notation "x <<< y" := (f x y) (at level 100, no associativity) : my_scope.
and that declares a new scope my_scope.

The answers for this question only apply to older versions of Coq. I'm not sure when it started but in at least Coq 8.13.2, Coq prefers the user to first use Declare Scope create a new scope. What the OP has in their code is Coq's preferred way to declare scopes now.
See the current manual

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse