Is it possible to define a numba jitclass of a recursive data type?

I'd like to speed up my Python code, a simple implementation of a tree-like data structure. The first thing that comes to mind is to jit it with numba. The problem I'm running into is how to tell numba the types of Node's members, since one of them is itself of type Node. Here's a minimal example:
spec=[("value", nb.float64), ("parent", Node.class_type.instance_type)]
#nb.jitclass(spec)
class Node:
def __init__(self, value:float, parent:"Node") -> None:
self.value:float = value
self.parent:Node = parent
This example obviously doesn't work, since at the point where spec is defined the class Node doesn't exist yet.
Is this even possible in numba?
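For what it's worth, numba's documentation handles exactly this situation in its linked-list example with nb.deferred_type(): declare a placeholder type, use it in the spec, and define it once the class exists. A minimal sketch along those lines, assuming a numba version that provides deferred_type and optional (the optional wrapper lets a root node have parent=None):

import numba as nb

# Placeholder for the not-yet-defined Node type.
node_type = nb.deferred_type()

spec = [
    ("value", nb.float64),
    # optional(...) allows parent to be None for the root node.
    ("parent", nb.optional(node_type)),
]

@nb.jitclass(spec)  # in newer numba: from numba.experimental import jitclass
class Node:
    def __init__(self, value, parent):
        self.value = value
        self.parent = parent

# Resolve the placeholder now that Node exists.
node_type.define(Node.class_type.instance_type)

root = Node(1.0, None)
child = Node(2.0, root)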

Related

Pass information from one compiler component to another without mutation

I am building a compiler plugin that has two components:
Permission Accumulator: loads the function definitions and some extra metadata about them into a structure like a Map[String, (...)], where the String key is the function name and the tuple contains the meta information plus the definition in scope.
Function Transformer: recursively traverses the function bodies to check that the metadata of the caller aligns with the callee; more specifically, caller.metadata ⊆ callee.metadata.
This kind of preloading is a rather common thing in compilers (Zinc, Unison, etc. all pull similar tricks). The first component needs to pass the information it has accumulated to the second component.
Unfortunately the current implementation uses a mutable.Map in the Plugin class and initiates the phases with a reference to this mutable Map. Granted, this code won't be surfaced to the end user, so some amount of mutation could be tolerated; but if someone (including myself) were to add another component/phase that touched this Map in the future, things could go very wrong, resulting in a situation that is painful to debug.
Question: I am wondering if there is a way to instantiate one component, extract some information from it, use that info to init the second component and run it.
Current Implementation:
import scala.collection.mutable.{Map => MMap}

class Contentable(override val global: Global) extends Plugin {
  val functions: MMap[String, (String, List[String])] = MMap()

  val components = new PermissionAccumulator(global, functions) ::
    new FunctionRewriter(global, functions) :: Nil
}
The first component mutates the Map as such:
functions += (dd.name.toString -> ((md5HashString(dd.rhs.toString()), roles)))
What I have tried:
My original plan was to encapsulate the mutation inside the first component and do something like secondComponent(global, firstComponent.functions), but because Scala classes create a copy of their arguments when an instance is created, the changes to this Map are not reflected in the second component.
Note: I have no problem turning these components into phases if that makes a difference.
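One direction that matches the question's shape (a sketch only; the compiler-plugin wiring with Plugin/PluginComponent is elided since the framework imposes its own lifecycle, and the names below are the question's or hypothetical): keep the mutable Map private to the accumulator, expose only an immutable snapshot, and construct the second component from that snapshot once the first has run.

import scala.collection.mutable

final class PermissionAccumulator {
  // Mutation stays private to this component.
  private val accumulated = mutable.Map.empty[String, (String, List[String])]

  def record(name: String, hash: String, roles: List[String]): Unit =
    accumulated += (name -> ((hash, roles)))

  // Callers only ever see an immutable snapshot.
  def functions: Map[String, (String, List[String])] = accumulated.toMap
}

final class FunctionRewriter(functions: Map[String, (String, List[String])]) {
  def run(): Unit =
    functions.foreach { case (name, (hash, roles)) =>
      println(s"$name -> $hash, roles = $roles")
    }
}

object Demo extends App {
  val acc = new PermissionAccumulator
  acc.record("save", "d41d8cd9", List("admin"))

  // The second component is built *after* the first has finished, from an
  // immutable snapshot rather than a shared mutable Map.
  new FunctionRewriter(acc.functions).run()
}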

How to use Scala Cats Validated the correct way?

Following is my use case
I am using Cats for validation of my config. My config file is in JSON.
I deserialize the config file to my case class Config using lift-json and then validate it using Cats. I am using this as a guide.
My motive for using Cats is to collect all the errors, if present, at validation time.
My problem is that the examples given in the guide are of this shape:

case class Person(name: String, age: Int)

def validatePerson(name: String, age: Int): ValidationResult[Person] = {
  (validateName(name), validateAge(age)).mapN(Person)
}
But in my case I have already deserialized my config into my case class (below is a sample) and then I pass it for validation:

case class Config(source: List[String], dest: List[String], extra: List[String])

def validateConfig(config: Config): ValidationResult[Config] = {
  (validateSource(config.source), validateDestination(config.dest))
    .mapN { case _ => config }
}
The difference here is mapN { case _ => config }. Since I already have a Config, I don't want to create it anew from its members if everything is valid. This arises because I am passing the whole config to the validation function, not its members.
A person at my workplace told me this is not the correct way, as Cats Validated provides a way to construct an object only if its members are valid; the object should not exist, or should not be constructible, if its members are invalid. That makes complete sense to me.
So should I make any changes? Is what I'm doing above acceptable?
PS: The above Config is just an example; my real config can have other case classes as its members, which themselves can depend on other case classes.
One of the central goals of the kind of programming promoted by libraries like Cats is to make invalid states unrepresentable. In a perfect world, according to this philosophy, it would be impossible to create an instance of Config with invalid member data (through the use of a library like Refined, where complex constraints can be expressed in and tracked by the type system, or simply by hiding unsafe constructors). In a slightly less perfect world, it might still be possible to construct invalid instances of Config, but discouraged, e.g. through the use of safe constructors (like your validatePerson method for Person).
It sounds like you're in an even less perfect world where you have instances of Config that may or may not contain invalid data, and you want to validate them to get "new" instances of Config that you know are valid. This is totally possible, and in some cases reasonable, and your validateConfig method is a perfectly legitimate way to solve this problem, if you're stuck in that imperfect world.
The downside, though, is that the compiler can't track the difference between the already-validated Config instances and the not-yet-validated ones. You'll have Config instances floating around in your program, and if you want to know whether they've already been validated or not, you'll have to trace through all the places they could have come from. In some contexts this might be just fine, but for large or complex programs it's not ideal.
To sum up: ideally you'd validate Config instances whenever they are created (possibly even making it impossible to create invalid ones), so that you don't have to remember whether any given Config is good or not—the type system can remember for you. If that's not possible, because of e.g. APIs or definitions you don't control, or if it just seems too burdensome for a simple use case, what you're doing with validateConfig is totally reasonable.
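To make the safe-constructor idea concrete, here is a minimal sketch (the rules inside validateSource/validateDest are hypothetical placeholders, and ValidationResult is the usual ValidatedNel alias): the constructor is private, so the only way to obtain a Config is through validation.

import cats.data.ValidatedNel
import cats.implicits._

final case class Config private (source: List[String], dest: List[String])

object Config {
  type ValidationResult[A] = ValidatedNel[String, A]

  // The only public way to obtain a Config, so every Config in the
  // program is known to have passed validation.
  def validated(source: List[String], dest: List[String]): ValidationResult[Config] =
    (validateSource(source), validateDest(dest)).mapN(new Config(_, _))

  private def validateSource(s: List[String]): ValidationResult[List[String]] =
    if (s.nonEmpty) s.validNel else "source must not be empty".invalidNel

  private def validateDest(d: List[String]): ValidationResult[List[String]] =
    if (d.nonEmpty) d.validNel else "dest must not be empty".invalidNel
}

(One caveat: on Scala versions before 2.13, the synthetic apply and copy methods may not respect the private constructor, so the guarantee is weaker there.)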
As a footnote, since you say above that you're interested in looking in more detail at Refined, what it provides for you in a situation like this is a way to avoid even more functions of the shape A => ValidationResult[A]. Right now your validateName method, for example, probably takes a String and returns a ValidationResult[String]. You can make exactly the same argument against this signature as I have against Config => ValidationResult[Config] above—once you're working with the result (by mapping a function over the Validated or whatever), you just have a string, and the type doesn't tell you that it's already been validated.
What Refined allows you to do is write a method like this:
def validateName(in: String): ValidationResult[Refined[String, SomeProperty]] = ...
…where SomeProperty might specify a minimum length, or the fact that the string matches a particular regular expression, etc. The important point is that you're not validating a String and returning a String that only you know something about—you're validating a String and returning a String that the compiler knows something about (via the Refined[A, Prop] wrapper).
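A rough sketch of what this can look like (NonEmpty stands in for whatever property you actually need; refineV returns an Either that converts to Validated):

import cats.data.ValidatedNel
import cats.syntax.either._
import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.refineV

type Name = String Refined NonEmpty

// The result type records, at compile time, that the string has
// been checked for non-emptiness.
def validateName(in: String): ValidatedNel[String, Name] =
  refineV[NonEmpty](in).toValidatedNel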
Again, this may be (okay, probably is) overkill for your use case—you just might find it nice to know that you can push this principle (tracking validation in types) even further down through your program.

What is a "<refinement>" type gotten through a TypeTag?

I have a method:
import scala.reflect.runtime.universe.{TypeTag,typeOf}
def print[T:TypeTag] = println(typeOf[T].typeSymbol.name.toString)
Most of the time, print[MyClass] prints MyClass when invoked, but sometimes it prints <refinement> instead.
I am working on a fairly complex system (multiple interconnecting jars, 100K lines of code), and I cannot seem to identify what determines which behaviour I get. But if I knew what <refinement> means, or what triggers it, maybe I could?
Refinements can be explained as anonymous class types. E.g.
import scala.reflect.runtime.universe.{TypeTag, typeOf}

def print[T: TypeTag] = println(typeOf[T].typeSymbol.name.toString)

class C
trait T

print[C with T]

type A = C with T
print[A]
Output will be <refinement> in both cases.
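If what you want is just a readable name back, one small sketch (printType is my name, not part of any API): print the full Type instead of the type symbol, since the refinement's structure survives in the Type itself.

import scala.reflect.runtime.universe.{TypeTag, typeOf}

// typeSymbol.name collapses a refinement to "<refinement>", but the
// Type's own toString keeps its structure.
def printType[T: TypeTag]: Unit = println(typeOf[T].toString)

class C
trait T

printType[C with T]  // prints: C with T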

In Scala, which is cleaner code: a def (member) or passing a function parameter?

Which is cleaner?
def version:

trait Foo {
  def db: DB
  def save() = db.save()
  def load() = db.load()
}
versus the parametric version:

trait Foo {
  def save(db: DB) = db.save()
  def load(db: DB) = db.load()
}
(I intentionally left out other parameters/members; I want to focus on this one.)
I have to say that when I look at complex projects, I thank god when functions take all their dependencies as parameters:
I can unit test them easily without overriding members, and each function tells me everything it depends on right in its signature.
I don't have to read its internal code to understand what it does; its name, its input, and its output are all in the function signature.
But I've also noticed that in Scala it's very conventional to use the def version, and I have to say that in complex projects such code is much less readable for me. Am I missing something?
I think in this case it highly depends on what the relationship is between Foo and DB. Would it ever be the case that a single instance of Foo would use one DB for load and another for save? If yes, then DB isn't really a dependency of Foo and the first example makes no sense. But it seems to me that the answer is no, that if you call load with one DB, you'll be using the same DB when you call save.
In your first example, that information is encoded into the type system. You're effectively letting the compiler do some correctness checking for you, since now you're enforcing at compile-time that for a single Foo, load and save will be called on the same DB (yes it's possible that db is a var, but that in itself is another issue).
Furthermore, it seems inevitable that you're just going to be passing around a DB every place you pass a Foo. Suppose you have a function that uses Foo. In the first example, your function would look like
def loadFoo(foo: Foo) {
  foo.load()
}
whereas in the second it would look like:
def loadFoo(foo: Foo, db: DB) {
  foo.load(db)
}
So all you've done is lengthened every function signature and opened up room for errors.
Lastly, I would argue that your points about unit testing and not needing to read a function's code are invalid. In the first example, it's true that you can't see all of load's dependencies just by looking at the function signature. But load is not an isolated function; it is a method that is part of a trait. A method is not identical to a plain old function: methods exist in the context of their defining trait.
In other words, you should not be thinking about unit testing the functions, but rather unit testing the trait. They're a package deal, and you should have no expectation that their behavior is independent of each other. If you do want that kind of independence, then Foo should be an object, which basically makes load and save static methods (although even then objects can have internal state, but that is far less idiomatic).
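To illustrate that testing the trait as a unit stays straightforward, here is a small sketch (DB, save, and load are the question's names; the stub wiring is mine):

trait DB {
  def save(): Unit
  def load(): Unit
}

trait Foo {
  def db: DB
  def save() = db.save()
  def load() = db.load()
}

object FooSpec extends App {
  var saves = 0

  // A stub DB that just counts calls.
  val stubDb = new DB {
    def save(): Unit = saves += 1
    def load(): Unit = ()
  }

  // Supplying the dependency is a one-liner; no overriding gymnastics needed.
  val foo = new Foo { def db = stubDb }

  foo.save()
  assert(saves == 1)
  println("ok")
}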
Plus, you can never really tell what a function is doing just by looking at its dependencies. After all, I could write a function like this:
def save(db: DB) {
  throw new Exception("hello!!")
}

(Usage of class variables) Pythonic, or a nasty habit learnt from Java?

Hello Pythoneers: the following code is only a mock-up of what I'm trying to do, but it should illustrate my question.
I would like to know whether this is a dirty trick I picked up from Java programming, or a valid and Pythonic way of doing things: basically I'm creating a load of instances, but I need to track 'static' data about all the instances as they are created.
class Myclass:
    counter = 0
    last_value = None

    def __init__(self, name):
        self.name = name
        Myclass.counter += 1
        Myclass.last_value = name
And some output from using this simple class, showing that everything works as I expected:
>>> x=Myclass("hello")
>>> print x.name
hello
>>> print Myclass.last_value
hello
>>> y=Myclass("goodbye")
>>> print y.name
goodbye
>>> print x.name
hello
>>> print Myclass.last_value
goodbye
So is this a generally acceptable way of doing this kind of thing, or an anti-pattern ?
[For instance, I'm not too happy that I can apparently set the counter both from within the class (good) and from outside it (bad); I'm also not keen on having to use the full namespace 'Myclass' from within the class code itself, which just looks bulky; and lastly, I'm initially setting the values to None, which is probably me aping statically-typed languages.]
I'm using Python 2.6.2 and the program is single-threaded.
Class variables are perfectly Pythonic in my opinion.
Just watch out for one thing: an instance variable can hide a class variable. Continuing the session above, where counter has reached 2:

x.counter = 5          # creates an instance variable on the object x
print x.counter        # instance variable, prints 5
print y.counter        # class variable, prints 2
print Myclass.counter  # class variable, prints 2
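Here is the same shadowing as a self-contained session, including how deleting the instance attribute unhides the class variable (Python 2 syntax, matching the question):

class Myclass(object):
    counter = 2

x = Myclass()
y = Myclass()

x.counter = 5    # instance attribute now shadows the class attribute on x
print x.counter  # 5
print y.counter  # 2 (class attribute, unaffected)

del x.counter    # removing the instance attribute...
print x.counter  # 2 ...lets lookup fall through to the class attribute again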
Do. Not. Have. Stateful. Class. Variables.
It's a nightmare to debug, since the class object now has special features.
Stateful classes conflate two (2) unrelated responsibilities: state of object creation and the created objects. Do not conflate responsibilities because it "seems" like they belong together. In this example, the counting of created objects is the responsibility of a Factory. The objects which are created have completely unrelated responsibilities (which can't easily be deduced from the question).
Also, please use Upper Case Class Names.
class MyClass(object):
    def __init__(self, name):
        self.name = name

def myClassFactory(iterable):
    # i is the sequence counter; this is where count state would live.
    for i, name in enumerate(iterable):
        yield MyClass(name)
The sequence counter is now part of the factory, where the state and counts should be maintained. In a separate factory.
[For folks playing Code Golf, this is shorter. But that's not the point. The point is that the class is no longer stateful.]
It's not clear from the question how Myclass instances get created. Lacking any clue, there isn't much more that can be said about how to use the factory. An iterable is the usual culprit: perhaps something that iterates through a list, a file, or some other iterable data structure.
Also, for folks just off the boat from Java: the factory object is just a function. Nothing more is needed.
Since the example in the question is perfectly unclear, it's hard to know why (1) two unique objects are created with (2) a counter. The two unique objects are already two unique objects, and a counter isn't needed.
For example, the class variables in Myclass are never referenced anywhere. That makes it very, very hard to understand the example.
x, y = myClassFactory(["hello", "goodbye"])
If the count or last value were actually used for something, then perhaps a meaningful example could be created.
You can solve this problem by splitting the code into two separate classes.
The first class will be for the object you are trying to create:
class MyClass(object):
    def __init__(self, name):
        self.Name = name
And the second class will create the objects and keep track of them:
class MyClassFactory(object):
    Counter = 0
    LastValue = None

    @classmethod
    def Build(cls, name):
        inst = MyClass(name)
        cls.Counter += 1
        cls.LastValue = inst.Name
        return inst
This way, you can create new instances of the class as needed, and the information about the created instances will still be correct.
>>> x = MyClassFactory.Build("Hello")
>>> MyClassFactory.Counter
1
>>> MyClassFactory.LastValue
'Hello'
>>> y = MyClassFactory.Build("Goodbye")
>>> MyClassFactory.Counter
2
>>> MyClassFactory.LastValue
'Goodbye'
>>> x.Name
'Hello'
>>> y.Name
'Goodbye'
Finally, this approach avoids the problem of instance variables hiding class variables, because MyClass instances have no knowledge of the factory that created them.
>>> x.Counter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'Counter'
You don't have to use a class variable here; this is a perfectly valid case for using module-level globals:

_counter = 0
_last_value = None

class Myclass(object):
    def __init__(self, name):
        global _counter, _last_value
        self.name = name
        _counter += 1
        _last_value = name
I have a feeling some people will knee-jerk against globals out of habit, so a quick review may be in order of what's wrong--and not wrong--with globals.
Globals traditionally are variables which are visible and changeable, unscoped, from anywhere in the program. This is a real problem with globals in languages like C. It's completely irrelevant to Python: these "globals" are scoped to the module. The class name Myclass is equally global; both names are scoped identically, in the module that contains them. Most variables, in Python as in C++, are logically part of object instances or locally scoped, but this is clearly shared state across all users of the class.
I don't have any strong inclination against using class variables for this (and using a factory is completely unnecessary), but globals are how I'd generally do it.
Is this pythonic? Well, it's definitely more pythonic than having global variables for a counter and the value of the most recent instance.
It's said in Python that there's only one right way to do anything. I can't think of a better way to implement this, so keep going. Despite the fact that many will criticize you for "non-pythonic" solutions to problems (like the needless object-orientation that Java coders like or the "do-it-yourself" attitude that many from C and C++ bring), in most cases your Java habits will not send you to Python hell.
And beyond that, who cares if it's "pythonic"? It works, and it's not a performance issue, is it?