how to return list for eager compilation? - numba

The numba allows eager compilation by telling the function signature. But, I can not find some information about the list type. My test code is:
import numba as nb
import numpy as np
# #nb.jit(nb.ListType())
def Cal():
return [1, np.zeros(shape=[5, 5]), 3]
a = Cal()
What's the function signature for Cal?
In addition, what if there are three outputs? How to provide the function signature? For example:
def TwoOutput():
return 1, 2
Any suggestion is appreciated

You cannot return the list [1, np.zeros(shape=[5, 5]), 3] from a Numba jitted function. In fact, if you try, Numba will throw few error saying that the "compilation is falling back to object mode WITH looplifting enabled because Function "Cal" failed type inference" due to the output type being not well defined. Indeed, the type of the items in the list needs to be all the same and not objects (called typed list). This is not the case here (reflected list). Reflected lists are not supported by Numba (anymore). What makes Numba fast is the type inference enabling it to generate a fast native code. With Python dynamic objects there is no way to generate a fast code (due to many heavy overhead like type checking, reference counting, allocations, etc.).
Note that you can return a tuple if the output always have the same small number of item defined at compile-time. Also note that Numba can automatically infer the output type so you can use #nb.jit(()) here to enable the eager compilation of the target function.
Put it shortly, Numba is not meant to support/speed-up this use-case. Note that Cython can and be slightly faster. You need not to use reflected lists (nor dynamic objects) if you want to get a fast code.

Related

numba error with tuple sorting containing numpy arrays

I have a (working) function that uses the heapq module to build a priority queue of tuples and I would like to compile it with numba, however I get a very long and unclear error. It seems to boil down to a problem with tuple order comparison needed for the queue. The tuples have a fixed format, where the first item is a a floating point number (whose order I care about) and then a numpy array, which I need for computation but never get compared when running normally. This is intended because comparison on numpy arrays yields an array which cannot be used in conditionals and raises an exception. However, I guess numba needs a scalar yielding comparison to be at least defined for all items in the tuple, and hence the numba error.
I have a very minimal example:
#numba.njit
def f():
return 1 if (1, numpy.arange(3)) < (2, numpy.arange(3)) else 2
f()
where the numba compilation fails (without numba it works since it never needs to actually compare the arrays, as in the original code).
Here is a slightly less minimal but maybe clearer example, which shows what I am actually doing:
from heapq import heappush
import numpy
import numba
#numba.njit
def f(n):
heap = [(1, 0, numpy.random.rand(2, 3))]
for unique_id in range(n):
order = numpy.random.rand()
data = numpy.random.rand(2, 3)
heappush(heap, (order, unique_id, data))
return heap[0]
f(100)
Here order is the variable whose order I care about in the queue, unique_id is a trick to avoid that when order is the same the comparison goes on to data and throws an exception.
I tried to bypass the problem converting the numpy array to a list when in the tuple and back to array for computation, but while this compiles, the numba version is slower than the interpreted one, even thought the array is quite small (usually 2x3). Without converting I would need to rewrite the code as loops which I would prefer to avoid (but is doable).
Is there a better alternative to get this working with numba, hopefully running faster than the python interpreter?
I'll try to respond based on the minimal example you provided.
I think that the problem here is not related to the ability of numba to perform comparison between all the elements of the tuple, but rather on where to store the result of such a comparison. This is stated in the error log returned when trying to execute your example:
cannot store {i8*, i8*, i64, i64, i8*, [1 x i64], [1 x i64]} to i1*: mismatching types
Basically, you are trying to store the result of a comparison between a pair of floats and a pair of arrays into a single boolean, and numba doesn't know how to do that.
If you are only interested in comparing the first elements of the tuples, the quickest workaround I can think of is forcing the comparison to happen only on the first elements, e.g.
#numba.njit
def f():
return 1 if (1, numpy.arange(3))[0] < (2, numpy.arange(3))[0] else 2
f()
If this is not applicable to your use case, please provide more details about it.
EDIT
According to the further information you provided, I think the best way to solve this is avoiding pushing the numpy arrays to the heap. Since you're only interested in the ordering properties of the heap, you can just push the keys to the heap and store the corresponding numpy arrays in a separate dictionary, using as keys the same values you push in the heap.
As a sidenote, when you use standard library functions in nopython-jitted functions, you are resorting on specific numba's re-implementation of those functions rather then the "original" python's ones. A comprehensive list of the available python's features in numba can be found here.
Ok, I found a solution to the problem: since storing the array in the heap tuple is the cause of the numba error, it is enough to store it in a separate dictionary with an unique key and store only the key in the heap tuple. For instance, using an integer as the key:
from heapq import heappush
import numpy
import numba
#numba.njit
def f(n):
key = 0
array_storage = {key: numpy.random.rand(2, 3)}
heap = [(1.0, key)]
for _ in range(n):
order = numpy.random.rand()
data = numpy.random.rand(2, 3)
key += 1
heappush(heap, (order, key))
array_storage[key] = data
return heap[0]
f(100)
Now the tuples in the heap can be compared yielding a boolean value and I still get to associate the data with its tuple. I am not completely satisfied since it seems a workaround, but it works pretty well and it is not overly complicated. If anyone has a better one please let me know!

How are methods evaluated in Scala?

Method types have no value. How do we evaluate a method?
Using SML as an example, I have
fun myFunc(x) = x + 5
val b = myFunc(2)
In the second expression, myFun has a type and a value, we use its type to do type checking and use its value together with its argument to evaluate value for b
But in Scala methods without a value how do we evaluate? I am pretty new to Scala so it may not be very clear.
def myFunc(x) = x + 5
val b = myFunc(2)
From val b = myFunc(2) to val b = 2 + 5, what happened in between? From where or what object do we know that myFunc(x) is x + 5?
THanks!!
The simple answer is: just because a method is not a value in the sense of "a thing that can be manipulated by you" doesn't mean that it is not a value in the sense of "a thing that can be manipulated by the author of the compiler".
Of course, a method will have an object representing it inside of the compiler. In fact, that object will probably look very similar to the object representing a function inside, say, the MLTon SML compiler or SML/NJ.
In SML, syntax is not a value, but you are not questioning how it is possible to write a function call either, aren't you? After all, in order to call a function in SML, I need to write a function call using function call syntax, so how can I do that when syntax is not a value?
Well, the answer is the same: just because syntax is not a value that the programmer can manipulate, the compiler (or more precisely the parser) obviously does know about syntax.
I can't tell you why the decision was made to have functions be values in Scala but not methods, but I can make a guess. Scala is an object-oriented language. In an object-oriented language, every value is an object, and every object has methods that are bound to that object. So, if methods are objects, they need to have methods, which are objects, which have methods, which are objects, and so on.
There are ways to deal with this, of course, but it makes the language more complex. For a similar reason, classes aren't objects (unlike, say, in Smalltalk, Python, and Ruby). Note that even in highly reflective, introspective, dynamic languages like Ruby, methods are not objects. Classes are, but not methods.
It is possible using reflection to get a proxy object that represents a method, but that object is not the method itself. And you can actually do the same in Scala as well.
And of course it is possible to turn a method into a function value by η-expansion.
I'm assuming that you're compiling to Java Virtual Machine (JVM) bytecode, such as with scalac, which is probably the most common way to use Scala. Disclaimer: I'm not an expert on the JVM, so some parts of this answer might be a bit wrong, but the general idea is right.
Essentially, a method is a set of instructions for the runtime to execute. It exists as part of the compiled code on disk (e.g. a .class file). When the JVM loads the class, it pulls the entire class file into memory, including the methods. When the JVM encounters a method call, it looks up the method and starts executing the instructions in it. If the method returns a result, the JVM makes that result available in the calling code, then does whatever you wanted to do with it there, such as assigning to a variable.
With that knowledge, we can answer some of your questions:
From val b = myFunc(2) to val b = 2 + 5, what happened in between?
This isn't quite how it works, as the JVM doesn't "expand" myFunc in place, but instead looks up myFunc and executes the instructions in it.
From where or what object do we know that myFunc(x) is x + 5?
Not from any object. While myFunc is in memory, it's in an area of memory that you can't access directly (but the JVM can).
why can't it be a value since it is a chunk of memory?
Not all memory fits into the nice abstractions of types and values.

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite a source (or an already several steps lowered) AST into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols which are then used by the next pass which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a).}.
Then types equations are solved (e.g., by simply running them as a Prolog program), and, if successful, all the tags (which are variables in this program) are now bound to concrete types, and if not, they're left as type parameters.
In a next step, our previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites, no need to replace anything in your AST destructively. A typical compilation pipeline may take a couple of dozens of rewrites, some of them changing the AST datatype.
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I cant speak for how it is supposed to be done, but I did do this in F# for a C# compiler here
The approach was basically - build an AST from the source, leaving things like type information unconstrained - So AST.fs basically is the AST which strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source - lets call these type-stubs). This then gives us the information needed to created method-stubs (the code may have signatures that include type-stubs as well as built in types). From here we now have enough type information to resolve any of the type names, or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass, however the approach may be naive.
Now we have a fully typed AST you could then do things like compile it, fully analyze it, or whatever you like with it.
So in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I cant say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info IIRC)
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given the ASTs are essentially mirrored in my case, it would be possible to make it generic and transform the tree, but the code may end up (more) horrendous.
i.e.
type 'type AST;
| MethodInvoke of 'type * Name * 'type list
| ....
Like in the case when dealing with relational databases, in functional programming it is often a good idea not to put everything in a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (like an integer) to each node of the tree already during parsing and have some suitable data structure (like a hash map) that associates those node-ids with types. The job of the type inference pass, then, would be just to create this map.

Prevent automatic hash function for mutable classes

Python allows hash values only for immutable objects. For example,
hash((1,2,3))
works, but
hash([1,2,3])
raises a TypeError: unhashable type: 'list'. See the Python documentation. However, when I wrap a C++ class in Boost.Python via the usual boost::python::class_<> function, every generated Python class has a default hash function, where the hash value is related to the object's location in memory. (On my 64-bit OS, the hash value is the location divided by 8.)
When I expose a class to Python whose members can be changed (any mutable data structure, so this is a very common situation!), I do not want a default hash function but want a call to hash() raise the same TypeError as users receive for Python's own mutable data types. In particular, users shouldn't be able to accidentally use mutable objects as dictionary keys. How can I achieve this in the C++ code?
I found out how it goes:
boost::python::class_<MyClass>("MyClass")
.setattr("__hash__", boost::python::object());
A boost::python::object which is initialized with no arguments corresponds to None. The procedure for disabling hash generation in the pure Python C API is a little more complicated, as is described in the Python documentation. However, the above code snippet apparently does the job in boost::python.
On a sidenote: The Boost.Python behaviour mirrors the default behaviour of classes in Python, where objects are basically hashable as of object id (derived from id(x)):
>>> hash(object())
8795488122377
>>> class MyClass(object): pass
...
>>> hash(MyClass)
878579
>>> hash(MyClass())
8795488082665
>>>

How can I reuse definition (AST) subtrees in a macro?

I am working in a Scala embedded DSL and macros are becoming a main tool for achieving my purposes. I am getting an error while trying to reuse a subtree from the incoming macro expression into the resulting one. The situation is quite complex, but (I hope) I have simplified it for its understanding.
Suppose we have this code:
val y = transform {
val x = 3
x
}
println(y) // prints 3
where 'transform' is the involved macro. Although it could seem it does absolutely nothing, it is really transforming the shown block into this expression:
3 match { case x => x }
It is done with this macro implementation:
def transform(c: Context)(block: c.Expr[Int]): c.Expr[Int] = {
import c.universe._
import definitions._
block.tree match {
/* {
* val xNam = xVal
* xExp
* }
*/
case Block(List(ValDef(_, xNam, _, xVal)), xExp) =>
println("# " + showRaw(xExp)) // prints Ident(newTermName("x"))
c.Expr(
Match(
xVal,
List(CaseDef(
Bind(xNam, Ident(newTermName("_"))),
EmptyTree,
/* xExp */ Ident(newTermName("x")) ))))
case _ =>
c.error(c.enclosingPosition, "Can't transform block to function")
block // keep original expression
}
}
Notice that xNam corresponds with the variable name, xVal corresponds with its associated value and finally xExp corresponds with the expression containing the variable. Well, if I print the xExp raw tree I get Ident(newTermName("x")), and that is exactly what is set in the case RHS. Since the expression could be modified (for instance x+2 instead of x), this is not a valid solution for me. What I want to do is to reuse the xExp tree (see the xExp comment) while altering the 'x' meaning (it is a definition in the input expression but will be a case LHS variable in the output one), but it launches a long error summarized in:
symbol value x does not exist in org.habla.main.Main$delayedInit$body.apply); see the error output for details.
My current solution consists on the parsing of the xExp to sustitute all the Idents with new ones, but it is totally dependent on the compiler internals, and so, a temporal workaround. It is obvious that the xExp comes along with more information that the offered by showRaw. How can I clean that xExp for allowing 'x' to role the case variable? Can anyone explain the whole picture of this error?
PS: I have been trying unsuccessfully to use the substitute* method family from the TreeApi but I am missing the basics to understand its implications.
Disassembling input expressions and reassembling them in a different fashion is an important scenario in macrology (this is what we do internally in the reify macro). But unfortunately, it's not particularly easy at the moment.
The problem is that input arguments of the macro reach macro implementation being already typechecked. This is both a blessing and a curse.
Of particular interest for us is the fact that variable bindings in the trees corresponding to the arguments are already established. This means that all Ident and Select nodes have their sym fields filled in, pointing to the definitions these nodes refer to.
Here is an example of how symbols work. I'll copy/paste a printout from one of my talks (I don't give a link here, because most of the info in my talks is deprecated by now, but this particular printout has everlasting usefulness):
>cat Foo.scala
def foo[T: TypeTag](x: Any) = x.asInstanceOf[T]
foo[Long](42)
>scalac -Xprint:typer -uniqid Foo.scala
[[syntax trees at end of typer]]// Scala source: Foo.scala
def foo#8339
[T#8340 >: Nothing#4658 <: Any#4657]
(x#9529: Any#4657)
(implicit evidence$1#9530: TypeTag#7861[T#8341])
: T#8340 =
x#9529.asInstanceOf#6023[T#8341];
Test#14.this.foo#8339[Long#1641](42)(scala#29.reflect#2514.`package`#3414.mirror#3463.TypeTag#10351.Long#10361)
To recap, we write a small snippet and then compile it with scalac, asking the compiler to dump the trees after the typer phase, printing unique ids of the symbols assigned to trees (if any).
In the resulting printout we can see that identifiers have been linked to corresponding definitions. For example, on the one hand, the ValDef("x", ...), which represents the parameter of the method foo, defines a method symbol with id=9529. On the other hand, the Ident("x") in the body of the method got its sym field set to the same symbol, which establishes the binding.
Okay, we've seen how bindings work in scalac, and now is the perfect time to introduce a fundamental fact.
If a symbol has been assigned to an AST node,
then subsequent typechecks will never reassign it.
This is why reify is hygienic. You can take a result of reify and insert it into an arbitrary tree (that possibly defines variables with conflicting names) - the original bindings will remain intact. This works because reify preserves the original symbols, so subsequent typechecks won't rebind reified AST nodes.
Now we're all set to explain the error you're facing:
symbol value x does not exist in org.habla.main.Main$delayedInit$body.apply); see the error output for details.
The argument of the transform macro contains both a definition and a reference to a variable x. As we've just learned, this means that the corresponding ValDef and Ident will have their sym fields synchronized. So far, so good.
However unfortunately the macro corrupts the established binding. It recreates the ValDef, but doesn't clean up the sym field of the corresponding Ident. Subsequent typecheck assigns a fresh symbol to the newly created ValDef, but doesn't touch the original Ident that is copied to the result verbatim.
After the typecheck, the original Ident points to a symbol that no longer exists (this is exactly what the error message was saying :)), which leads to a crash during bytecode generation.
So how do we fix the error? Unfortunately there is no easy answer.
One option would be to utilize c.resetLocalAttrs, which recursively erases all symbols in a given AST node. Subsequent typecheck will then reestablish the bindings granted that the code you generated doesn't mess with them (if, for example, you wrap xExp in a block that itself defines a value named x, then you're in trouble).
Another option is to fiddle with symbols. For example, you could write your own resetLocalAttrs that only erases corrupted bindings and doesn't touch the valid ones. You could also try and assign symbols by yourself, but that's a short road to madness, though sometimes one is forced to walk it.
Not cool at all, I agree. We're aware of that and intend to try and fix this fundamental issue sometimes. However right now our hands are full with bugfixing before the final 2.10.0 release, so we won't be able to address the problem in the nearest future. upd. See https://groups.google.com/forum/#!topic/scala-internals/rIyJ4yHdPDU for some additional information.
Bottom line. Bad things happen, because bindings get messed up. Try resetLocalAttrs first, and if it doesn't work, prepare yourself for a chore.