Pass information from one compiler component to another without mutation - scala

I am building a compiler plugin that has two components
Permission Accumulator: Load the function definitions and some extra meta data about them into a structure like a Map[String, (...)] where String keys represents the function name and the tuple contains the meta information + the definition in scope.
Function Transformer: Recursively traverse the function bodies to check if the metadata of the caller aligns with the callee. More specifically caller.metadata ⊆ callee.metada
This kind of preloading is a rather common thing in compilers (Zinc, Unison etc. all have similar tricks they pull). The first component needs to pass this information it has accumulated to the second component.
Unfortunately the current implementation uses a mutable.Map in the Plugin class and initiates the phases with a reference to this mutable Map. While given the fact that this code won't be surfaced to the end user and some amount of mutation could be tolerated, if someone (including myself) were to add another component/phase that touched this Map in the future, things can go very wrong, resulting in a situation that is painful to debug.
Question: I am wondering if there is a way to instantiate one component, extract some information from it, use that info to init the second component and run it.
Current Implementation:
import scala.collection.mutable.{ Map => MMap }
class Contentable(override val global: Global) extends Plugin {
val functions: MMap[String, (String, List[String])] = MMap()
val components = new PermissionAccumulator(global, functions) :: new FunctionRewriter(global, functions) :: Nil
}
The first component mutates the Map as such:
functions += (dd.name.toString -> ((md5HashString(dd.rhs.toString()), roles)))
What I have tried:
Original plan was to encapsulate the mutation inside the first component and do something like secondComponent(global, firstComponent.functions) but because Scala class create a copy of their arguments when an instance is created, the changes to this Map is not reflected in the second component
Note: I have no problem turning these component to phases if that makes a difference.

Related

How to create immutable Doubly LinkedList in Scala?

Can someone help me to create immutable doubly linked list in Scala by showing one of the method implementation? Also, can you provide me one of example of it by let's implementing prepend(element:Int): DoublyLinkedList[Int] ?
Complementing the already linked video with a link to a text version and includinging important quotes:
This is described at Immutable Doubly Linked Lists in Scala with Call-By-Name and Lazy Values
A doubly-linked list doesn’t offer too much benefit in Scala, quite the contrary.
So this data structure is pretty useless from a practical standpoint, which is why ... there is no doubly-linked list in the Scala standard collection library
All that said, a doubly-linked list is an excellent exercise on the mental model of immutable data structures.
Think about it: this operation in Java would have been done in 3 steps:
new node
new node’s prev is 1
1’s next is the new node
If we can execute that new 1 ⇆ 2 in the same expression, we’ve made it; we can then recursively apply this operation on the rest of the list. Strangely enough, it’s possible.
class DLCons[+T](override val value: T, p: => DLList[T], n: => DLList[T]) extends DLList[T] {
// ... other methods
override def updatePrev[S >: T](newPrev: => DLList[S]) = {
lazy val result: DLCons[S] = new DLCons(value, newPrev, n.updatePrev(result))
result
}
}
Look at updatePrev: we’re creating a new node, whose next reference immediately points to a recursive call; the recursive call is on the current next node, which will update its own previous reference to the result we’re still in the process of defining! This forward reference is only possible in the presence of lazy values and the by-name arguments.
Call By Name & lazy val (Call by need) makes it possible to have immutable doubly linked list.

Losing path dependent type when extracting value from Try in scala

I'm working with scalax to generate a graph of my Spark operationS. So, I have a custom library that generates my graph. So, let me show a sample:
val DAGWithoutGet = createGraphFromOps(ops)
val DAGWithGet = createGraphFromOps(ops).get
The return type of DAGWithoutGet is
scala.util.Try[scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge]],
and, for DAGWithGet is
scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge].
Here, typeA is a project related class representing a single Spark operation, not relevant for the context of this question. (for context only: What my custom library does is, essentially, generate a map of dependencies between those operations, creating a big Map object, and calling Graph(myBigMap: _*) to generate the graph).
As far as I know, calling the .get command on this point of my code or later should not make any difference, but that is not what I'm seeing.
Calling DAGWithoutGet.get.nodes has a return type of scalax.collection.Graph[typeA,DiEdge]#NodeSetT,
while calling DAGWithGet.nodes returns DAGWithGet.NodeSetT.
When I extract one of those nodes (using the .find method), I receive scalax.collection.Graph[typeA,DiEdge]#NodeT and DAGWithGet.NodeT types, respectively. Much to my dismay, even the methods available in each case are different - I cannot use pathTo (which happens to be what I want) or withSubgraph on the former, only on the latter.
My doubt is, then, after this relatively complex example: what is going on here? Why extracting the value from the Try construct on different moments leads to different types, one path dependent, and the other not - or, if that isn't correct, what may I be missing here?

Is it possible for a structure made of immutable types to have a cycle?

Consider the following:
case class Node(var left: Option[Node], var right: Option[Node])
It's easy to see how you could traverse this, search it, whatever. But now imagine you did this:
val root = Node(None, None)
root.left = root
Now, this is bad, catastrophic. In fact, you type it into a REPL, you'll get a StackOverflow (hey, that would be a good name for a band!) and a stack trace a thousand lines long. If you want to try it, do this:
{ root.left = root }: Unit
to suppress the REPL well-intentioned attempt to print out the results.
But to construct that, I had to specifically give the case-class mutable members, something I would never do in real life. If I use ordinary mutable members, I get a problem with construction. The closest I can come is
case class Node(left: Option[Node], right: Option[Node])
val root: Node = Node(Some(loop), None)
Then root has the rather ugly value Node(Some(null),None), but it's still not cyclic.
So my question is, if a data-structure is transitively immutable (that is, all of its members are either immutable values or references to other data-structures that are themselves transitively immutable), is it guaranteed to be acyclic?
It would be cool if it were.
Yes, it is possible to create cyclic data structures even with purely immutable data structures in a pure, referentially transparent, effect-free language.
The "obvious" solution is to pull out the potentially cyclic references into a separate data structure. For example, if you represent a graph as an adjacency matrix, then you don't need cycles in your data structure to represent cycles in your graph. But that's cheating: every problem can be solved by adding a layer of indirection (except the problem of having too many layers of indirection).
Another cheat would be to circumvent Scala's immutability guarantees from the outside, e.g. on the default Scala-JVM implementation by using Java reflection methods.
It is possible to create actual cyclic references. The technique is called Tying the Knot, and it relies on laziness: you can actually set the reference to an object that you haven't created yet because the reference will be evaluated lazily, by which time the object will have been created. Scala has support for laziness in various forms: lazy vals, by-name parameters, and the now deprecated DelayedInit. Also, you can "fake" laziness using functions or method: wrap the thing you want to make lazy in a function or method which produces the thing, and it won't be created until you call the function or method.
So, the same techniques should be possible in Scala as well.
How about using lazy with call by name ?
scala> class Node(l: => Node, r: => Node, v: Int)
// defined class Node
scala> lazy val root: Node = new Node(root, root, 5)
// root: Node = <lazy>

Extracting the complete call graph of a scala project (tough one)

I would like to extract from a given Scala project, the call graph of all methods which are part of the project's own source.
As I understand, the presentation compiler doesn't enable that, and it requires going down all the way down to the actual compiler (or a compiler plugin?).
Can you suggest complete code, that would safely work for most scala projects but those that use the wackiest dynamic language features? for the call graph, I mean a directed (possibly cyclic) graph comprising class/trait + method vertices where an edge A -> B indicates that A may call B.
Calls to/from libraries should be avoided or "marked" as being outside the project's own source.
EDIT:
See my macro paradise derived prototype solution, based on #dk14's lead, as an answer below. Hosted on github at https://github.com/matanster/sbt-example-paradise.
Here's the working prototype, which prints the necessary underlying data to the console as a proof of concept. http://goo.gl/oeshdx.
How This Works
I have adapted the concepts from #dk14 on top boilerplate from macro paradise.
Macro paradise lets you define an annotation that will apply your macro over any annotated object in your source code. From there you have access to the AST that the compiler generates for the source, and scala reflection api can be used to explore the type information of the AST elements. Quasiquotes (the etymology is from haskell or something) are used to match the AST for the relevant elements.
More about Quasiquotes
The generally important thing to note is that quasiquotes work over an AST, but they are a strange-at-first-glance api and not a direct representation of the AST (!). The AST is picked up for you by paradise's macro annotation, and then quasiquotes are the tool for exploring the AST at hand: you match, slice and dice the abstract syntax tree using quasiquotes.
The practical thing to note about quasiquotes is that there are fixed quasiquote templates for matching each type of scala AST - a template for a scala class definition, a template for a scala method definition, etc. These tempaltes are all provided here, making it very simple to match and deconstruct the AST at hand to its interesting constituents. While the templates may look daunting at first glance, they are mostly just templates mimicking the scala syntax, and you may freely change the $ prepended variable names within them to names that feel nicer to your taste.
I still need to further hone the quasiquote matches I use, which currently aren't perfect. However, my code seems to produce the desired result for many cases, and honing the matches to 95% precision may be well doable.
Sample Output
found class B
class B has method doB
found object DefaultExpander
object DefaultExpander has method foo
object DefaultExpander has method apply
which calls Console on object scala of type package scala
which calls foo on object DefaultExpander.this of type object DefaultExpander
which calls <init> on object new A of type class A
which calls doA on object a of type class A
which calls <init> on object new B of type class B
which calls doB on object b of type class B
which calls mkString on object tags.map[String, Seq[String]](((tag: logTag) => "[".+(Util.getObjectName(tag)).+("]")))(collection.this.Seq.canBuildFrom[String]) of type trait Seq
which calls map on object tags of type trait Seq
which calls $plus on object "[".+(Util.getObjectName(tag)) of type class String
which calls $plus on object "[" of type class String
which calls getObjectName on object Util of type object Util
which calls canBuildFrom on object collection.this.Seq of type object Seq
which calls Seq on object collection.this of type package collection
.
.
.
It is easy to see how callers and callees can be correlated from this data, and how call targets outside the project's source can be filtered or marked out. This is all for scala 2.11. Using this code, one will need to prepend an annotation to each class/object/etc in each source file.
The challenges that remain are mostly:
Challenges remaining:
This crashes after getting the job done. Hinging on https://github.com/scalamacros/paradise/issues/67
Need to find a way to ultimately apply the magic to entire source files without manually annotating each class and object with the static annotation. This is rather minor for now, and admittedly, there are benefits for being able to control classes to include and ignore anyway. A preprocessing stage that implants the annotation before (almost) every top level source file definition, would be one nice solution.
Honing the matchers such that all and only relevant definitions are matched - to make this general and solid beyond my simplistic and cursory testing.
Alternative Approach to Ponder
acyclic brings to mind a quite opposite approach that still sticks to the realm of the scala compiler - it inspects all symbols generated for the source, by the compiler (as much as I gather from the source). What it does is check for cyclic references (see the repo for a detailed definition). Each symbol supposedly has enough information attached to it, to derive the graph of references that acyclic needs to generate.
A solution inspired by this approach may, if feasible, locate the parent "owner" of every symbol rather than focus on the graph of source files connections as acyclic itself does. Thus with some effort it would recover the class/object ownership of each method. Not sure if this design would not computationally explode, nor how to deterministically obtain the class encompassing each symbol.
The upside would be that there is no need for macro annotations here. The downside is that this cannot sprinkle runtime instrumentation as the macro paradise rather easily allows, which could be at times useful.
It requires more precise analysis, but as a start this simple macro will print all possible applyies, but it requires macro-paradise and all traced classess should have #trace annotation:
class trace extends StaticAnnotation {
def macroTransform(annottees: Any*) = macro tracerMacro.impl
}
object tracerMacro {
def impl(c: Context)(annottees: c.Expr[Any]*): c.Expr[Any] = {
import c.universe._
val inputs = annottees.map(_.tree).toList
def analizeBody(name: String, method: String, body: c.Tree) = body.foreach {
case q"$expr(..$exprss)" => println(name + "." + method + ": " + expr)
case _ =>
}
val output = inputs.head match {
case q"class $name extends $parent { ..$body }" =>
q"""
class $name extends $parent {
..${
body.map {
case x#q"def $method[..$tt] (..$params): $typ = $body" =>
analizeBody(name.toString, method.toString, body)
x
case x#q"def $method[..$tt]: $typ = $body" =>
analizeBody(name.toString, method.toString, body)
x
}
}
}
"""
case x => sys.error(x.toString)
}
c.Expr[Any](output)
}
}
Input:
#trace class MyF {
def call(param: Int): Int = {
call2(param)
if(true) call3(param) else cl()
}
def call2(oaram: Int) = ???
def cl() = 5
def call3(param2: Int) = ???
}
Output (as compiler's warnings, but you may output to file intead of println):
Warning:scalac: MyF.call: call2
Warning:scalac: MyF.call: call3
Warning:scalac: MyF.call: cl
Of course, you might want to c.typeCheck(input) it (as now expr.tpe on found trees is equals null) and find which class this calling method belongs to actually, so the resulting code may not be so trivial.
P.S. macroAnnotations give you unchecked tree (as it's on earlier compiler stage than regular macroses), so if you want something typechecked - the best way is surround the piece of code you want to typecheck with call of some your regular macro, and process it inside this macro (you can even pass some static parameters). Every regular macro inside tree produced by macro-annotation - will be executed as usual.
Edit
The basic idea in this answer was to bypass the (pretty complex) Scala compiler completely, and extract the graph from the generated .class files in the end. It appeared that a decompiler with sufficiently verbose output could reduce the problem to basic text manipulation. However, after a more detailed examination it turned out that this is not the case. One would just get back to square one, but with obfuscated Java code instead of the original Scala code. So this proposal does not really work, although there is some rationale behind working with the final .class files instead of intermediate structures used internally by the Scala compiler.
/Edit
I don't know whether there are tools out there that do it out of the box (I assume that you have checked that). I have only a very rough idea what the presentation compiler is. But if all that you want is to extract a graph with methods as nodes and potential calls of methods as edges, I have a proposal for a quick-and-dirty solution. This would work only if you want to use it for some sort of visualization, it doesn't help you at all if you want to perform some clever refactoring operations.
In case that you want to attempt building such a graph-generator yourself, it might turn out much simpler than you think. But for this, you need to go all the way down, even past the compiler. Just grab your compiled .class files, and use something like the CFR java decompiler on it.
When used on a single compiled .class file, CFR will generate list of classes that the current class depends on (here I use my little pet project as example):
import akka.actor.Actor;
import akka.actor.ActorContext;
import akka.actor.ActorLogging;
import akka.actor.ActorPath;
import akka.actor.ActorRef;
import akka.actor.Props;
import akka.actor.ScalaActorRef;
import akka.actor.SupervisorStrategy;
import akka.actor.package;
import akka.event.LoggingAdapter;
import akka.pattern.PipeToSupport;
import akka.pattern.package;
import scala.Function1;
import scala.None;
import scala.Option;
import scala.PartialFunction;
...
(very long list with all the classes this one depends on)
...
import scavenger.backend.worker.WorkerCache$class;
import scavenger.backend.worker.WorkerScheduler;
import scavenger.backend.worker.WorkerScheduler$class;
import scavenger.categories.formalccc.Elem;
Then it will spit out some horribly looking code, that might look like this (small excerpt):
public PartialFunction<Object, BoxedUnit> handleLocalResponses() {
return SimpleComputationExecutor.class.handleLocalResponses((SimpleComputationExecutor)this);
}
public Context provideComputationContext() {
return ContextProvider.class.provideComputationContext((ContextProvider)this);
}
public ActorRef scavenger$backend$worker$MasterJoin$$_master() {
return this.scavenger$backend$worker$MasterJoin$$_master;
}
#TraitSetter
public void scavenger$backend$worker$MasterJoin$$_master_$eq(ActorRef x$1) {
this.scavenger$backend$worker$MasterJoin$$_master = x$1;
}
public ActorRef scavenger$backend$worker$MasterJoin$$_masterProxy() {
return this.scavenger$backend$worker$MasterJoin$$_masterProxy;
}
#TraitSetter
public void scavenger$backend$worker$MasterJoin$$_masterProxy_$eq(ActorRef x$1) {
this.scavenger$backend$worker$MasterJoin$$_masterProxy = x$1;
}
public ActorRef master() {
return MasterJoin$class.master((MasterJoin)this);
}
What one should notice here is that all methods come with their full signature, including the class in which they are defined, for example:
Scheduler.class.schedule(...)
ContextProvider.class.provideComputationContext(...)
SimpleComputationExecutor.class.fulfillPromise(...)
SimpleComputationExecutor.class.computeHere(...)
SimpleComputationExecutor.class.handleLocalResponses(...)
So if you need a quick-and-dirty solution, it might well be that you could get away with just ~10 lines of awk,grep,sort and uniq wizardry to get nice adjacency lists with all your classes as nodes and methods as edges.
I've never tried it, it's just an idea. I cannot guarantee that Java decompilers work well on Scala code.

Configuration data in Scala -- should I use the Reader monad?

How do I create a properly functional configurable object in Scala? I have watched Tony Morris' video on the Reader monad and I'm still unable to connect the dots.
I have a hard-coded list of Client objects:
class Client(name : String, age : Int){ /* etc */}
object Client{
//Horrible!
val clients = List(Client("Bob", 20), Client("Cindy", 30))
}
I want Client.clients to be determined at runtime, with the flexibility of either reading it from a properties file or from a database. In the Java world I'd define an interface, implement the two types of source, and use DI to assign a class variable:
trait ConfigSource {
def clients : List[Client]
}
object ConfigFileSource extends ConfigSource {
override def clients = buildClientsFromProperties(Properties("clients.properties"))
//...etc, read properties files
}
object DatabaseSource extends ConfigSource { /* etc */ }
object Client {
#Resource("configuration_source")
private var config : ConfigSource = _ //Inject it at runtime
val clients = config.clients
}
This seems like a pretty clean solution to me (not a lot of code, clear intent), but that var does jump out (OTOH, it doesn't seem to me really troublesome, since I know it will be injected once-and-only-once).
What would the Reader monad look like in this situation and, explain it to me like I'm 5, what are its advantages?
Let's start with a simple, superficial difference between your approach and the Reader approach, which is that you no longer need to hang onto config anywhere at all. Let's say you define the following vaguely clever type synonym:
type Configured[A] = ConfigSource => A
Now, if I ever need a ConfigSource for some function, say a function that gets the n'th client in the list, I can declare that function as "configured":
def nthClient(n: Int): Configured[Client] = {
config => config.clients(n)
}
So we're essentially pulling a config out of thin air, any time we need one! Smells like dependency injection, right? Now let's say we want the ages of the first, second and third clients in the list (assuming they exist):
def ages: Configured[(Int, Int, Int)] =
for {
a0 <- nthClient(0)
a1 <- nthClient(1)
a2 <- nthClient(2)
} yield (a0.age, a1.age, a2.age)
For this, of course, you need some appropriate definition of map and flatMap. I won't get into that here, but will simply say that Scalaz (or Rúnar's awesome NEScala talk, or Tony's which you've seen already) gives you all you need.
The important point here is that the ConfigSource dependency and its so-called injection are mostly hidden. The only "hint" that we can see here is that ages is of type Configured[(Int, Int, Int)] rather than simply (Int, Int, Int). We didn't need to explicitly reference config anywhere.
As an aside, this is the way I almost always like to think about monads: they hide their effect so it's not polluting the flow of your code, while explicitly declaring the effect in the type signature. In other words, you needn't repeat yourself too much: you say "hey, this function deals with effect X" in the function's return type, and don't mess with it any further.
In this example, of course the effect is to read from some fixed environment. Another monadic effect you might be familiar with include error-handling: we can say that Option hides error-handling logic while making the possibility of errors explicit in your method's type. Or, sort of the opposite of reading, the Writer monad hides the thing we're writing to while making its presence explicit in the type system.
Now finally, just as we normally need to bootstrap a DI framework (somewhere outside our usual flow of control, such as in an XML file), we also need to bootstrap this curious monad. Surely we'll have some logical entry point to our code, such as:
def run: Configured[Unit] = // ...
It ends up being pretty simple: since Configured[A] is just a type synonym for the function ConfigSource => A, we can just apply the function to its "environment":
run(ConfigFileSource)
// or
run(DatabaseSource)
Ta-da! So, contrasting with the traditional Java-style DI approach, we don't have any "magic" occurring here. The only magic, as it were, is encapsulated in the definition of our Configured type and the way it behaves as a monad. Most importantly, the type system keeps us honest about which "realm" dependency injection is occurring in: anything with type Configured[...] is in the DI world, and anything without it is not. We simply don't get this in old-school DI, where everything is potentially managed by the magic, so you don't really know which portions of your code are safe to reuse outside of a DI framework (for example, within your unit tests, or in some other project entirely).
update: I wrote up a blog post which explains Reader in greater detail.