Scala compiler plugin to rewrite method calls - scala

I'm trying to write a Scala compiler plugin that rewrites instantiations of Set into instantiations of LinkedHashSet. Unfortunately I can't find any working example that does this already. The following code fails with "no-symbol does not have an owner":
object DemoErasureComponent extends PluginComponent with TypingTransformers with Transform {
  val global: DemoPlugin.this.global.type = DemoPlugin.this.global
  import global._

  override val runsAfter = List("erasure")
  val phaseName = "rewrite-sets"

  def newTransformer(unit: CompilationUnit) = new SetTransformer(unit)

  class SetTransformer(unit: CompilationUnit) extends TypingTransformer(unit) {
    override def transform(tree: Tree): Tree = tree match {
      case a @ Apply(r @ Select(rcvr @ Select(predef, set), name), args) if name.toString == "Set" =>
        localTyper.typed(treeCopy.Apply(tree, Ident(newTermName("LinkedHashSet")), args))
      case t => super.transform(tree)
    }
  }
}
For the record, I've found these resources so far:
Scalaxy compiler plugin: https://github.com/ochafik/Scalaxy
Boxer compiler plugin example: https://github.com/retronym/boxer
outdated compiler plugin inside the Scala compiler: http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/docs/examples/plugintemplate/src/plugintemplate/TemplateTransformComponent.scala

localTyper.typed(treeCopy.Apply(tree, Ident(newTermName("LinkedHashSet")), args))
Here, you are creating a new Apply node via a tree copy, which will copy the type and symbol from tree.
When you typecheck this node, the typer will not recurse into its children as it is already typed, so the Ident will pass through without a type and symbol, which will likely lead to a crash in the code generation phase.
Instead of creating an unattributed Ident and typechecking it, it is more customary to create a fully attributed reference with one of the utility methods in TreeGen.
gen.mkAttributedRef(typeOf[scala.collection.mutable.LinkedHashSet.type].termSymbol)
The case is also pretty suspicious. You should never need to compare strings like that. It is always better to compare Symbols.
Furthermore, you have to be aware of the tree shapes at the phase where you're inserting your compiler plugin. Notice below how, after typer, the tree is expanded to include the call to the apply method, and how, after erasure, the variable arguments are wrapped in a single argument.
% qscalac -Xprint:parser,typer,erasure sandbox/test.scala
[[syntax trees at end of parser]] // test.scala
package <empty> {
  object Test extends scala.AnyRef {
    def <init>() = {
      super.<init>();
      ()
    };
    Set(1, 2, 3)
  }
}
[[syntax trees at end of typer]] // test.scala
package <empty> {
  object Test extends scala.AnyRef {
    def <init>(): Test.type = {
      Test.super.<init>();
      ()
    };
    scala.this.Predef.Set.apply[Int](1, 2, 3)
  }
}
[[syntax trees at end of erasure]] // test.scala
package <empty> {
  object Test extends Object {
    def <init>(): Test.type = {
      Test.super.<init>();
      ()
    };
    scala.this.Predef.Set().apply(scala.this.Predef.wrapIntArray(Array[Int]{1, 2, 3}))
  }
}
Tracking down non-determinism caused by making assumptions about the iteration order of HashMaps can be a bit of a nightmare. But I'd caution against this sort of rewriting. If you want a policy within your system saying that non-deterministic Sets and Maps should not be used, removing direct usages of them is insufficient. Every call to .toSet or .toMap will end up using them anyway.
You might be better served to instrument the bytecode of the standard library (or use a patched standard library) in a test mode to catch all instantiations of these structures (perhaps logging a stack trace). Or, as a more fine-grained alternative, find calls to places like HashTrieSet#foreach (although you'll need to filter out benign usages, like the call to foreach within collection.immutable.HashSet(1, 2, 3).count(_ => true)).
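To make the ordering contrast concrete, here is a small self-contained sketch (plain library code, nothing plugin-specific; the object name is made up). It shows that mutable.LinkedHashSet iterates in insertion order, and that .toSet hands the elements back to an immutable Set with no ordering guarantee, which is exactly why removing direct instantiations is insufficient:

```scala
import scala.collection.mutable.LinkedHashSet

object OrderDemo {
  // mutable.LinkedHashSet guarantees iteration in insertion order.
  val linked: LinkedHashSet[String] = LinkedHashSet("b", "a", "c")

  // The point made above: even if you only ever build LinkedHashSets,
  // .toSet converts back to an immutable Set with no ordering guarantee.
  val roundTripped: Set[String] = linked.toSet

  def main(args: Array[String]): Unit = {
    assert(linked.toList == List("b", "a", "c")) // insertion order preserved
    println(linked.mkString(", "))
  }
}
```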

Related

scala 3 macro: get class properties

I want to write a macro to get the property names of a class, but I cannot use the Symbol module in a quoted statement. I receive the error below:
inline def getProps(inline className: String): Iterable[String] = ${ getPropsImpl('className) }

private def getPropsImpl(className: Expr[String])(using Quotes): Expr[Iterable[String]] = {
  import quotes.reflect.*
  val props = '{
    Symbol.classSymbol($className).fieldMembers.map(_.name)
    // error: access to parameter x$2 from wrong staging level:
    //  - the definition is at level 0,
    //  - but the access is at level 1.
  }
  props
}
There are compile time and runtime of macros. And there are compile time and runtime of main code. The runtime of macros is the compile time of main code.
def getPropsImpl... =
'{ Symbol.classSymbol($className).fieldMembers.map(_.name) }
...
is incorrect because what Scala 3 macros do is transform trees into trees (i.e. Exprs into Exprs; an Expr is a wrapper over a tree) (*). The tree
Symbol.classSymbol($className).fieldMembers.map(_.name)
will make no sense inside the scope of the application site. Symbol, Symbol.classSymbol etc. make sense here, inside the scope of the macro.
def getPropsImpl... =
Symbol.classSymbol(className).fieldMembers.map(_.name)
...
would be also incorrect because className as a value doesn't exist yet, it's just a tree now.
I guess the correct way is with .valueOrAbort:
import scala.quoted.*

inline def getProps(inline className: String): Iterable[String] = ${ getPropsImpl('className) }

def getPropsImpl(className: Expr[String])(using Quotes): Expr[Iterable[String]] = {
  import quotes.reflect.*
  Expr.ofSeq(
    Symbol.classSymbol(className.valueOrAbort).fieldMembers.map(s =>
      Literal(StringConstant(s.name)).asExprOf[String]
    )
  )
}
Usage:
// in another file
getProps("mypackage.App.A") // ArraySeq(s, i)

// in another subproject
package mypackage

object App {
  case class A(i: Int, s: String)
}
(*) Scala 2 macros can do more, with c.eval. In Scala 3 there is a similar thing, staging.run, but it's forbidden in macros.
Actually, c.eval (or the forbidden staging.run) can be emulated in Scala 3 too.

In scala macro, is it possible to execute code after the macro context has aborted?

I'm solving a macro bug that may affect shapeless & singleton-ops for Scala 2; both contain the following code to override the implicitNotFound annotation of a particular type symbol:
def setAnnotation(msg: String, annotatedSym: TypeSymbol): Unit = {
  import c.internal._
  import decorators._

  val tree0 =
    c.typecheck(
      q"""
        new _root_.scala.annotation.implicitNotFound("dummy")
      """,
      silent = false
    )

  class SubstMessage extends Transformer {
    val global = c.universe.asInstanceOf[scala.tools.nsc.Global]

    override def transform(tree: Tree): Tree = {
      super.transform {
        tree match {
          case Literal(Constant("dummy")) => Literal(Constant(msg))
          case t => t
        }
      }
    }
  }

  val tree = new SubstMessage().transform(tree0)
  annotatedSym.setAnnotations(Annotation(tree))
  ()
}
The above code overrides the error reporting when c.abort() is called, after which this stateful change should be regarded as useless and reverted ASAP. In fact, since the TypeSymbol is unique within each compiler invocation, failing to do so will cause the remainder of the compilation to behave erratically (see https://github.com/fthomas/singleton-ops/issues/234 for details).
To the best of my knowledge, what I'm asking for seems to be beyond the capability of the macro system. I need to add the clean-up code into a thunk which can be called by the context AFTER the top-level implicit-not-found error has been reported. How should I do this?

scala typeclass extension/generalization initialization

I ran into a weird and puzzling NPE. Consider the following use case: writing a generic algorithm (binary search in my case), where you'd want to generalize over the type, but need some extras.
E.g. maybe you want to cut a range in half, and you need a generic "two" or a couple of other "consts".
The Integral typeclass is not enough, since it only offers one and zero, so I came up with:
trait IntegralConsts[N] {
  val tc: Integral[N]
  val two = tc.plus(tc.one, tc.one)
  val four = tc.plus(two, two)
}

object IntegralConsts {
  implicit def consts[N : Integral] = new IntegralConsts[N] {
    override val tc = implicitly[Integral[N]]
  }
}
and used it as follows:
def binRangeSearch[N : IntegralConsts]( /* irrelevant args */ ) = {
  val consts = implicitly[IntegralConsts[N]]
  val math = consts.tc
  // some irrelevant logic, which contains expressions like:
  val halfRange = math.quot(range, consts.two)
  // ...
}
In runtime, this throws a puzzling NullPointerException on this line: val two = tc.plus(tc.one,tc.one).
As a workaround, I just added lazy to the typeclass' vals, and it all worked out:
trait IntegralConsts[N] {
  val tc: Integral[N]
  lazy val two = tc.plus(tc.one, tc.one)
  lazy val four = tc.plus(two, two)
}
But I would want to know why I got this weird NPE. Initialization order should be known, and tc should have already been instantiated when reaching val two ...
Initialization order should be known, and tc should have already been
instantiated when reaching val two
Not according to the specification. What really happens is that while constructing the anonymous class, first IntegralConsts[N] is initialized, and only then is the overriding of tc evaluated in the derived anonymous class, which is why you're experiencing the NullPointerException.
The specification section §5.1 (Templates) says:
Template Evaluation
Consider a template sc with mt1 with ... with mtn { stats }.
If this is the template of a trait then its mixin-evaluation consists of an evaluation of the statement sequence stats.
If this is not a template of a trait, then its evaluation consists of the following steps:
First, the superclass constructor sc is evaluated.
Then, all base classes in the template's linearization up to the template's superclass denoted by sc are mixin-evaluated. Mixin-evaluation happens in reverse order of occurrence in the linearization.
Finally the statement sequence stats is evaluated.
We can verify this by looking at the compiled code with -Xprint:typer:
final class $anon extends AnyRef with IntegralConsts[N] {
  def <init>(): <$anon: IntegralConsts[N]> = {
    $anon.super.<init>();
    ()
  };
  private[this] val tc: Integral[N] = scala.Predef.implicitly[Integral[N]](evidence$1);
  override <stable> <accessor> def tc: Integral[N] = $anon.this.tc
};
We see that first, super.<init> is invoked, and only then is the val tc initialized.
Adding to that, let's look at "Why is my abstract or overridden val null?":
A ‘strict’ or ‘eager’ val is one which is not marked lazy.
In the absence of "early definitions" (see below), initialization of strict vals is done in the following order:
Superclasses are fully initialized before subclasses.
Otherwise, in declaration order.
Naturally when a val is overridden, it is not initialized more than once ... This is not the case: an overridden val will appear to be null during the construction of superclasses, as will an abstract val.
We can also verify this by passing the -Xcheckinit flag to scalac:
> set scalacOptions := Seq("-Xcheckinit")
[info] Defining *:scalacOptions
[info] The new value will be used by compile:scalacOptions
[info] Reapplying settings...
[info] Set current project to root (in build file:/C:/)
> console
> :pa // paste code here
defined trait IntegralConsts
defined module IntegralConsts
binRangeSearch: [N](range: N)(implicit evidence$2: IntegralConsts[N])Unit
scala> binRangeSearch(100)
scala.UninitializedFieldError: Uninitialized field: <console>: 16
at IntegralConsts$$anon$1.tc(<console>:16)
at IntegralConsts$class.$init$(<console>:9)
at IntegralConsts$$anon$1.<init>(<console>:15)
at IntegralConsts$.consts(<console>:15)
at .<init>(<console>:10)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
As you've noted, since this is an anonymous class, adding lazy to the definitions avoids the initialization quirk altogether. An alternative would be to use an early definition:
object IntegralConsts {
  implicit def consts[N : Integral] = new {
    override val tc = implicitly[Integral[N]]
  } with IntegralConsts[N]
}
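The initialization quirk can be reproduced in isolation with a minimal sketch (the trait and object names here are made up): the strict val NPEs while the trait initializer runs, because the overriding tc has not been assigned yet, whereas the lazy variant defers evaluation until after construction.

```scala
trait StrictConsts {
  val tc: Integral[Int]
  val two = tc.plus(tc.one, tc.one)      // runs during trait init, before tc is assigned
}

trait LazyConsts {
  val tc: Integral[Int]
  lazy val two = tc.plus(tc.one, tc.one) // deferred until first access, after tc is assigned
}

object InitOrderDemo {
  // Constructing the strict variant throws: the trait initializer reads
  // tc while it is still null.
  def strictThrows: Boolean =
    try { new StrictConsts { override val tc = implicitly[Integral[Int]] }; false }
    catch { case _: NullPointerException => true }

  // The lazy variant is safe: by the time `two` is forced, tc is set.
  def lazyTwo: Int =
    (new LazyConsts { override val tc = implicitly[Integral[Int]] }).two
}
```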

Implicit resolution in scala 2.10.x. what's going on?

I am using scala 2.10.0-snapshot dated (20120522) and have the following Scala files:
this one defines the typeclass and a basic typeclass instance:
package com.netgents.typeclass.hole

case class Rabbit()

trait Hole[A] {
  def findHole(x: A): String
}

object Hole {
  def apply[A: Hole] = implicitly[Hole[A]]

  implicit val rabbitHoleInHole = new Hole[Rabbit] {
    def findHole(x: Rabbit) = "Rabbit found the hole in Hole companion object"
  }
}
this is the package object:
package com.netgents.typeclass

package object hole {
  def findHole[A: Hole](x: A) = Hole[A].findHole(x)

  implicit val rabbitHoleInHolePackage = new Hole[Rabbit] {
    def findHole(x: Rabbit) = "Rabbit found the hole in Hole package object"
  }
}
and here is the test:
package com.netgents.typeclass.hole

object Test extends App {
  implicit val rabbitHoleInOuterTest = new Hole[Rabbit] {
    def findHole(x: Rabbit) = "Rabbit found the hole in outer Test object"
  }
  {
    implicit val rabbitHoleInInnerTest = new Hole[Rabbit] {
      def findHole(x: Rabbit) = "Rabbit found the hole in inner Test object"
    }
    println(findHole(Rabbit()))
  }
}
As you can see, Hole is a simple typeclass that defines a method which a Rabbit is trying to find. I am trying to figure out the implicit resolution rules on it.
with all four typeclass instances uncommented, scalac complains about ambiguities on rabbitHoleInHolePackage and rabbitHoleInHole. (Why?)
if I comment out rabbitHoleInHole, scalac compiles and I get back "Rabbit found the hole in Hole package object". (Shouldn't implicits in the local scope take precedence?)
if I then comment out rabbitHoleInHolePackage, scalac complains about ambiguities on rabbitHoleInOuterTest and rabbitHoleInInnerTest. (Why? In the article by eed3si9n linked below, he found that implicits in inner and outer scope can take different precedence.)
if I then comment out rabbitHoleInInnerTest, scalac compiles and I get back "Rabbit found the hole in outer Test object".
As you can see, the above behaviors do not follow the rules I've read about implicit resolution at all. I've only described a fraction of the combinations you can get by commenting/uncommenting instances, and most of them are very strange indeed - and I haven't gotten into imports and subclasses yet.
I've read and watched presentation by suereth, stackoverflow answer by sobral, and a very elaborate revisit by eed3si9n, but I am still completely baffled.
Let's start with the implicits in the package object and the type class companion disabled:
package rabbit {
  trait TC

  object Test extends App {
    implicit object testInstance1 extends TC { override def toString = "test1" }
    {
      implicit object testInstance2 extends TC { override def toString = "test2" }
      println(implicitly[TC])
    }
  }
}
Scalac looks for any in-scope implicits and finds testInstance1 and testInstance2. The fact that one is in a tighter scope is only relevant if they have the same name -- the normal rules of shadowing apply. We've chosen distinct names, and neither implicit is more specific than the other, so an ambiguity is correctly reported.
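The shadowing caveat can be checked directly: give the two implicits the same name, and the inner definition shadows the outer one, leaving a single candidate and no ambiguity (a minimal sketch; the names are made up):

```scala
object ShadowDemo {
  trait TC { def label: String }

  def resolve: String = {
    implicit val inst: TC = new TC { def label = "outer" }
    {
      // Same name as the outer implicit: the inner definition shadows it,
      // so implicit search sees exactly one candidate.
      implicit val inst: TC = new TC { def label = "inner" }
      implicitly[TC].label
    }
  }
}
```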
Let's try another example, this time we'll play off an implicit in the local scope against one in the package object.
package rabbit {
  object `package` {
    implicit object packageInstance extends TC { override def toString = "package" }
  }

  trait TC

  object Test extends App {
    {
      implicit object testInstance2 extends TC { override def toString = "test2" }
      println(implicitly[TC])
    }
  }
}
What happens here? The first phase of the implicit search, as before, considers all implicits in scope at the call site. In this case, we have testInstance2 and packageInstance. These are ambiguous, but before reporting that error, the second phase kicks in, and searches the implicit scope of TC.
But what is in the implicit scope here? TC doesn't even have a companion object? We need to review the precise definition here, in 7.2 of the Scala Reference.
The implicit scope of a type T consists of all companion modules (§5.4) of classes that are associated with the implicit parameter's type. Here, we say a class C is associated with a type T if it is a base class (§5.1.2) of some part of T.
The parts of a type T are:
if T is a compound type T1 with ... with Tn,
the union of the parts of T1, ..., Tn, as well as T itself,
if T is a parameterized type S[T1, ..., Tn], the union of the parts of S and T1, ..., Tn,
if T is a singleton type p.type, the parts of the type of p,
if T is a type projection S#U, the parts of S as well as T itself,
in all other cases, just T itself.
We're searching for rabbit.TC. From a type system perspective, this is shorthand for rabbit.type#TC, where rabbit.type is a type representing the package, as though it were a regular object. Invoking rule 4 gives us the parts of rabbit.type, as well as TC itself.
So, what does that all mean? Simply, implicit members in the package object are part of the implicit scope, too!
In the example above, this gives us an unambiguous choice in the second phase of the implicit search.
The other examples can be explained in the same way.
In summary:
Implicit search proceeds in two phases. The usual rules of importing and shadowing determine a list of candidates.
Implicit members in an enclosing package object may also be in scope, assuming you are using nested packages.
If there is more than one candidate, the rules of static overloading are used to see if there is a winner. As an additional tiebreaker, the compiler prefers one implicit over another that is defined in a superclass of the first.
If the first phase fails, the implicit scope is consulted in much the same way. (A difference is that implicit members from different companions may have the same name without shadowing each other.)
Implicits in package objects from enclosing packages are also part of this implicit scope.
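The last two points can be exercised in one self-contained file (assuming Scala 2, the version this answer discusses; the package and object names are made up). No TC implicit is in lexical scope at the call site, yet resolution succeeds via the implicit scope of rabbit.TC, which includes rabbit's package object:

```scala
package rabbit {
  trait TC

  // Declaring `object `package`` inside a package block stands in for a
  // package object, as in the examples above.
  object `package` {
    implicit val packageInstance: TC = new TC { override def toString = "package" }
  }
}

package demo {
  object Main {
    // Phase 1 finds no TC implicit in lexical scope here; phase 2 then
    // searches the implicit scope of rabbit.TC, which includes the
    // rabbit package object, and finds packageInstance unambiguously.
    def resolved: rabbit.TC = implicitly[rabbit.TC]
  }
}
```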
UPDATE
In Scala 2.9.2, the behaviour is different and wrong.
package rabbit {
  trait TC

  object Test extends App {
    implicit object testInstance1 extends TC { override def toString = "test1" }
    {
      implicit object testInstance2 extends TC { override def toString = "test2" }
      // wrongly considered non-ambiguous in 2.9.2. The sub-class rule
      // incorrectly considers:
      //
      //   isProperSubClassOrObject(value <local Test>, object Test)
      //   isProperSubClassOrObject(value <local Test>, {object Test}.linkedClassOfClass)
      //   isProperSubClassOrObject(value <local Test>, <none>)
      //   (value <local Test>) isSubClass <none>
      //   <notype> baseTypeIndex <none> >= 0
      //   0 >= 0
      //   true
      //   true
      //   true
      //   true
      //
      // 2.10.x correctly reports the ambiguity, since the fix for
      //
      //   https://issues.scala-lang.org/browse/SI-5354?focusedCommentId=57914#comment-57914
      //   https://github.com/scala/scala/commit/6975b4888d
      //
      println(implicitly[TC])
    }
  }
}

How to add a new Class in a Scala Compiler Plugin?

In a Scala compiler plugin, I'm trying to create a new class that implements a pre-existing trait. So far my code looks like this:
def trait2Impl(original: ClassDef, newName: String): ClassDef = {
  val impl = original.impl
  // Seems OK to have same self, but does not make sense to me ...
  val self = impl.self
  // TODO: implement methods ...
  val body = impl.body
  // We implement original
  val parents = original :: impl.parents
  val newImpl = treeCopy.Template(impl, parents, self, body)
  val name = newTypeName(newName)
  // We are a synthetic class, not a user-defined trait
  val mods = (original.mods | SYNTHETIC) &~ TRAIT
  val tp = original.tparams
  val result = treeCopy.ClassDef(original, mods, name, tp, newImpl)
  // Same package?
  val owner = original.symbol.owner
  // New symbol. What's a Position good for?
  val symbol = new TypeSymbol(owner, NoPosition, name)
  result.setSymbol(symbol)
  symbol.setFlag(SYNTHETIC)
  symbol.setFlag(ABSTRACT)
  symbol.resetFlag(INTERFACE)
  symbol.resetFlag(TRAIT)
  owner.info.decls.enter(symbol)
  result
}
But it doesn't seem to get added to the package. I suspect that is because the package got "traversed" before the trait that triggers the generation, and/or because the transform(tree: Tree): Tree method of the TypingTransformer can only return one Tree for every Tree it receives, so it cannot actually produce a new Tree, only modify one.
So, how do you add a new class to an existing package? Maybe it would work if I transformed the package when transform(Tree) gets it, but at that point I don't know the content of the package yet, so I cannot generate the new class this early (or could I?). Or maybe it's related to the Position parameter of the Symbol?
So far I have found several examples where Trees are modified, but none where a completely new class is created in a compiler plugin.
The full source code is here: https://gist.github.com/1794246
The trick is to store the newly created ClassDefs and use them when creating a new PackageDef. Note that you need to deal with both Symbols and trees: a package symbol is just a handle. In order to generate code, you need to generate an AST (just like for a class, where the symbol holds the class name and type, but the code is in the ClassDef trees).
As you noted, package definitions are higher up the tree than classes, so you'd need to recurse first (assuming you'll generate the new class from an existing class). Then, once the subtrees are traversed, you can prepare a new PackageDef (every compilation unit has a package definition, which by default is the empty package) with the new classes.
In the example, assuming the source code is
class Foo {
  def foo {
    "spring"
  }
}
the compiler wraps it into
package <empty> {
  class Foo {
    def foo {
      "spring"
    }
  }
}
and the plugin transforms it into
package <empty> {
  class Foo {
    def foo {
      "spring"
    }
  }
  package mypackage {
    class MyClass extends AnyRef
  }
}