ANTLR not walking whole tree when certain syntax appears - scala

I am working on a project which reads source code in various languages. The project itself is written in Scala, but what I'm doing should be familiar if you know ANTLR. I have used the scala.g4 grammar on GitHub to generate the parser, lexer, etc. for ANTLR 4. I have written a subclass of ScalaBaseListener which overrides the enter methods and simply prints the matched text in each one, e.g.:
override def enterClassDef(ctx: ScalaParser.ClassDefContext): Unit = {
println(ctx.getText)
}
In my application's main I am attempting to walk the whole tree from a file source like so:
import ScalaLexer._
import org.antlr.v4.runtime._
import org.antlr.v4.runtime.tree._
import scala.io.Source
object Main extends App {
val fileContents = Source.fromFile(args(0)).getLines.mkString
val charStream = new ANTLRInputStream(fileContents)
val lexer = new ScalaLexer(charStream)
val tokens = new CommonTokenStream(lexer)
val parser = new ScalaParser(tokens)
val tree = parser.compilationUnit
ParseTreeWalker.DEFAULT
.walk(new ScalaMySubclassListener(), tree)
}
I have found that if the source file is, say, just a couple of classes:
class Foo {
def bar = {
1
}
def baz = 1
}
class Foo1 {
def bar = {
1
}
def baz = 1
}
I can see from the output of my program that every leaf in the tree is walked.
However, if I were to add an import statement at the top of the file (as there will often be in a scala source file)
import Thing._
class Foo {
def bar = {
1
}
def baz = 1
}
class Foo1 {
def bar = {
1
}
def baz = 1
}
only the leaves in the import statement get walked. The rest of the file gets ignored.
When I parse the source file using the antlr4 GUI, the whole tree is visible.

The first thing to do when the parse tree seems cut off is to check whether there are any syntax errors, as that would be the most common cause. Since you didn't change the error handling at all in your code, any syntax errors should be printed to stderr. None are, so apparently there weren't any syntax errors.
But let's not give up on the idea of there being a syntax error just yet. One common pitfall when it comes to syntax errors in ANTLR is if your start rule does not end with an EOF. If that's the case, ANTLR will simply try to find a prefix of the input that's syntactically valid and ignore the rest. That is, it will stop at the first syntax error without actually producing an error message (as long as there's a valid program leading up to that error - since many grammars accept empty programs, that is very often the case). And sure enough: if we look at Scala.g4 there's no EOF anywhere in the grammar (at the time of this writing anyway). So let's add EOF at the end of the compilationUnit rule. Now if we recompile everything and run your code again, we finally get a syntax error:
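To make the EOF pitfall concrete, here is a minimal hand-written sketch. It is not ANTLR's actual parsing algorithm and not the Scala grammar; the toy grammar, the object name EofPitfallDemo and its methods are all hypothetical. The start rule matches a sequence of "keyword NAME" statements: without an EOF check the longest valid prefix counts as success, while with one any leftover token becomes a syntax error.

```scala
// Toy model of a start rule with and without a trailing EOF.
// Hypothetical grammar, for illustration only:
//   program    : stmt* ;        // no EOF: a valid prefix is enough
//   programEof : stmt* EOF ;    // EOF: all input must be consumed
//   stmt       : ('import' | 'class') NAME ;
object EofPitfallDemo {
  private val keywords = Set("import", "class")

  // Greedily matches stmt* and returns the recognised statements together
  // with the index of the first token that could not be matched.
  private def stmts(tokens: Vector[String]): (List[String], Int) = {
    val found = List.newBuilder[String]
    var pos = 0
    while (pos + 1 < tokens.length && keywords(tokens(pos))) {
      found += s"${tokens(pos)} ${tokens(pos + 1)}"
      pos += 2
    }
    (found.result(), pos)
  }

  // Without EOF the loop simply stops at the first token it cannot use,
  // and the rest of the input is silently ignored.
  def parseNoEof(tokens: Vector[String]): List[String] = stmts(tokens)._1

  // With EOF, any leftover token is reported as a syntax error.
  def parseWithEof(tokens: Vector[String]): Either[String, List[String]] = {
    val (found, pos) = stmts(tokens)
    if (pos == tokens.length) Right(found)
    else Left(s"mismatched input '${tokens(pos)}' expecting <EOF>")
  }
}
```

For the tokens import X class Foo oops class Bar, parseNoEof returns List("import X", "class Foo") and never mentions oops, while parseWithEof returns Left("mismatched input 'oops' expecting <EOF>").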
line 1:20 mismatched input 'Foo' expecting {<EOF>, '.', ',', 'implicit', 'lazy', 'case', '#', 'override', 'abstract', 'final', 'sealed', 'private', 'protected', 'import', 'class', 'object', 'trait', 'package'}
Now there are two things that might strike you as curious:
Why does ANTLR detect a syntax error when run from your code, but not from the TestRig GUI? (Even after adding the EOF, the GUI will still show a correct tree.)
Why does the error message claim that Foo appears at column 20 of line 1 when it's actually on line 3?
The answer to both of these questions is the same: the input that you're feeding ANTLR is not what's in your test file. To verify this, try printing fileContents after you read it in. You'll see that all the input is on a single line, starting with import Thing._class Foo, which clearly isn't correct syntax.
The reason this happens is that getLines gives you the lines without their line endings, and mkString joins them together without any separator. The quick fix would be to simply pass "\n" as the separator to mkString, but the better solution is to not read the file yourself at all.
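A small self-contained sketch of the difference, using Source.fromString on an in-memory string as a stand-in for the test file (MkStringDemo and its members are names made up for illustration):

```scala
import scala.io.Source

object MkStringDemo {
  // Stand-in for the contents of the test file.
  val fileText: String = "import Thing._\nclass Foo {\n  def baz = 1\n}\n"

  // What the question's code does: getLines strips the line terminators
  // and mkString glues the lines back together with no separator.
  def joinedWithoutSeparator: String =
    Source.fromString(fileText).getLines().mkString

  // The quick fix: put a newline back between the lines.
  def joinedWithNewline: String =
    Source.fromString(fileText).getLines().mkString("\n")

  // Reading the whole source at once keeps the original line endings.
  def wholeSource: String = Source.fromString(fileText).mkString
}
```

joinedWithoutSeparator starts with import Thing._class Foo, which is exactly the single-line input the lexer ended up seeing.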
Instead you can make ANTLR do it by creating your input stream using CharStreams.fromFileName. This will also get rid of the warning about ANTLRInputStream being deprecated.

Related

Scala illegal start of simple expression with postfix function

I am playing around with calling an external command from Scala. Here is a stripped-down example of what I am working on:
import scala.sys.process._
object Testing {
def main(args: Array[String]) {
val command = "ls"
val result = command!
if (result != 0) { // <---- illegal start of simple expression
println("Error")
return
}
}
}
I am getting a compile error: illegal start of simple expression for the line with the if statement. I can fix it with a new line:
val result = command!
// Add a line
if (result != 0) {
My suspicion is that it has something to do with the ! postfix function, but it was my understanding that extra blank lines and whitespace shouldn't make a difference to the compiler.
You need to explicitly enable postfix expressions:
1) Importing the flag locally: import scala.language.postfixOps
2) or adding the flag to the project itself: scalacOptions += "-language:postfixOps"
The link in the comment from @Łukasz above contains lots of info about this feature. Also, see the "Suffix Notation" section of http://docs.scala-lang.org/style/method-invocation.html for your exact use case.
EDIT: maybe it was not clear enough, but as @Łukasz pointed out in the comments, importing/enabling postfix expressions doesn't make your code compile. It just avoids the compiler warning. Your code won't compile because semicolons are optional and the compiler treats the ! operator as infix, and thus takes the expression on the next line as its argument. This is exactly what the documentation linked above states, with essentially the same example:
This style is unsafe, and should not be used. Since semicolons are
optional, the compiler will attempt to treat it as an infix method if
it can, potentially taking a term from the next line.
names toList
val answer = 42 // will not compile!
This may result in unexpected compile errors at best, and happily
compiled faulty code at worst. Although the syntax is used by some
DSLs, it should be considered deprecated, and avoided.
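A minimal sketch of the question's program rewritten with ordinary dot notation, assuming a Unix-like system where ls is available. With command.! there is nothing the parser can treat as an infix operator, so the if on the next line starts a new statement and no postfixOps import is needed. The run helper is a hypothetical refactoring that returns the exit code so it can be checked.

```scala
import scala.sys.process._

object Testing {
  // Runs the command and returns its exit code. `command.!` uses dot
  // notation, so `!` cannot be mistaken for an infix operator that takes
  // the next line as its argument.
  def run(command: String): Int = {
    val result = command.!
    if (result != 0) {
      println("Error")
    }
    result
  }

  def main(args: Array[String]): Unit = {
    run("ls")
  }
}
```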

How can I reuse definition (AST) subtrees in a macro?

I am working in a Scala embedded DSL and macros are becoming a main tool for achieving my purposes. I am getting an error while trying to reuse a subtree from the incoming macro expression into the resulting one. The situation is quite complex, but (I hope) I have simplified it for its understanding.
Suppose we have this code:
val y = transform {
val x = 3
x
}
println(y) // prints 3
where 'transform' is the macro in question. Although it may seem to do absolutely nothing, it actually transforms the shown block into this expression:
3 match { case x => x }
It is done with this macro implementation:
def transform(c: Context)(block: c.Expr[Int]): c.Expr[Int] = {
import c.universe._
import definitions._
block.tree match {
/* {
* val xNam = xVal
* xExp
* }
*/
case Block(List(ValDef(_, xNam, _, xVal)), xExp) =>
println("# " + showRaw(xExp)) // prints Ident(newTermName("x"))
c.Expr(
Match(
xVal,
List(CaseDef(
Bind(xNam, Ident(newTermName("_"))),
EmptyTree,
/* xExp */ Ident(newTermName("x")) ))))
case _ =>
c.error(c.enclosingPosition, "Can't transform block to function")
block // keep original expression
}
}
Notice that xNam corresponds to the variable name, xVal to its associated value, and xExp to the expression containing the variable. If I print the raw xExp tree I get Ident(newTermName("x")), and that is exactly what is set in the case RHS. Since the expression could be modified (for instance x+2 instead of x), this is not a valid solution for me. What I want to do is reuse the xExp tree (see the xExp comment) while altering the meaning of 'x' (it is a definition in the input expression but will be a case LHS variable in the output one), but that produces a long error, summarized as:
symbol value x does not exist in org.habla.main.Main$delayedInit$body.apply); see the error output for details.
My current solution consists of parsing xExp to substitute all the Idents with new ones, but it is totally dependent on the compiler internals and is therefore only a temporary workaround. Clearly xExp carries more information than what showRaw displays. How can I clean xExp so that 'x' can play the role of the case variable? Can anyone explain the whole picture behind this error?
PS: I have been trying unsuccessfully to use the substitute* method family from the TreeApi but I am missing the basics to understand its implications.
Disassembling input expressions and reassembling them in a different fashion is an important scenario in macrology (this is what we do internally in the reify macro). But unfortunately, it's not particularly easy at the moment.
The problem is that input arguments of the macro reach macro implementation being already typechecked. This is both a blessing and a curse.
Of particular interest for us is the fact that variable bindings in the trees corresponding to the arguments are already established. This means that all Ident and Select nodes have their sym fields filled in, pointing to the definitions these nodes refer to.
Here is an example of how symbols work. I'll copy/paste a printout from one of my talks (I don't give a link here, because most of the info in my talks is deprecated by now, but this particular printout has everlasting usefulness):
>cat Foo.scala
def foo[T: TypeTag](x: Any) = x.asInstanceOf[T]
foo[Long](42)
>scalac -Xprint:typer -uniqid Foo.scala
[[syntax trees at end of typer]]// Scala source: Foo.scala
def foo#8339
[T#8340 >: Nothing#4658 <: Any#4657]
(x#9529: Any#4657)
(implicit evidence$1#9530: TypeTag#7861[T#8341])
: T#8340 =
x#9529.asInstanceOf#6023[T#8341];
Test#14.this.foo#8339[Long#1641](42)(scala#29.reflect#2514.`package`#3414.mirror#3463.TypeTag#10351.Long#10361)
To recap, we write a small snippet and then compile it with scalac, asking the compiler to dump the trees after the typer phase, printing unique ids of the symbols assigned to trees (if any).
In the resulting printout we can see that identifiers have been linked to their corresponding definitions. For example, on the one hand, the ValDef("x", ...), which represents the parameter of the method foo, defines a term symbol with id=9529. On the other hand, the Ident("x") in the body of the method got its sym field set to the same symbol, which establishes the binding.
Okay, we've seen how bindings work in scalac, and now is the perfect time to introduce a fundamental fact.
If a symbol has been assigned to an AST node,
then subsequent typechecks will never reassign it.
This is why reify is hygienic. You can take a result of reify and insert it into an arbitrary tree (that possibly defines variables with conflicting names) - the original bindings will remain intact. This works because reify preserves the original symbols, so subsequent typechecks won't rebind reified AST nodes.
Now we're all set to explain the error you're facing:
symbol value x does not exist in org.habla.main.Main$delayedInit$body.apply); see the error output for details.
The argument of the transform macro contains both a definition and a reference to a variable x. As we've just learned, this means that the corresponding ValDef and Ident will have their sym fields synchronized. So far, so good.
Unfortunately, however, the macro corrupts the established binding. It recreates the definition of x (as the Bind in the new case clause), but doesn't clean up the sym field of the corresponding Ident. The subsequent typecheck assigns a fresh symbol to the newly created definition, but doesn't touch the original Ident, which is copied to the result verbatim.
After the typecheck, the original Ident points to a symbol that no longer exists (this is exactly what the error message was saying :)), which leads to a crash during bytecode generation.
So how do we fix the error? Unfortunately there is no easy answer.
One option would be to use c.resetLocalAttrs, which recursively erases all symbols in a given AST node. A subsequent typecheck will then reestablish the bindings, provided that the code you generate doesn't interfere with them (if, for example, you wrap xExp in a block that itself defines a value named x, then you're in trouble).
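Below is a toy model of the mechanism described above, written in plain Scala rather than against the compiler API. Sym, Def, Ref, typecheck and resetSym are all hypothetical names; the only rule taken from the explanation is that an already assigned symbol is never reassigned. It reproduces the dangling reference and shows why erasing the symbol (the role c.resetLocalAttrs plays) lets the next typecheck rebind it.

```scala
// Toy model only: NOT scalac's Symbol/Tree API.
object BindingModel {
  final case class Sym(id: Int)

  // A definition (think ValDef or Bind) and a reference (think Ident),
  // each with a mutable symbol slot like a tree's sym field.
  final class Def(val name: String, var sym: Option[Sym] = None)
  final class Ref(val name: String, var sym: Option[Sym] = None)

  private var lastId = 0
  private def freshSym(): Sym = { lastId += 1; Sym(lastId) }

  // Gives the definition a fresh symbol and binds a same-named reference
  // to it, but never reassigns a symbol that is already set.
  def typecheck(d: Def, r: Ref): Unit = {
    if (d.sym.isEmpty) d.sym = Some(freshSym())
    if (r.sym.isEmpty && r.name == d.name) r.sym = d.sym
  }

  // Stand-in for what c.resetLocalAttrs does to a tree: forget the symbol.
  def resetSym(r: Ref): Unit = r.sym = None
}
```

Typechecking Def("x") with Ref("x") binds both to one Sym. Typechecking a freshly built Def("x") with that same Ref leaves the Ref pointing at the old Sym, the analogue of "symbol value x does not exist". Calling resetSym on the Ref first makes the next typecheck bind it to the new definition.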
Another option is to fiddle with symbols. For example, you could write your own resetLocalAttrs that only erases corrupted bindings and doesn't touch the valid ones. You could also try and assign symbols by yourself, but that's a short road to madness, though sometimes one is forced to walk it.
Not cool at all, I agree. We're aware of that and intend to try to fix this fundamental issue at some point. However, right now our hands are full with bugfixing before the final 2.10.0 release, so we won't be able to address the problem in the near future. Update: see https://groups.google.com/forum/#!topic/scala-internals/rIyJ4yHdPDU for some additional information.
Bottom line. Bad things happen, because bindings get messed up. Try resetLocalAttrs first, and if it doesn't work, prepare yourself for a chore.

Why is the main function not running in the REPL?

This is a simple program. I expected main to run in interpreted mode. But the presence of another object caused it to do nothing. If the QSort were not present, the program would have executed.
Why is main not called when I run this in the REPL?
object MainObject{
def main(args: Array[String])={
val unsorted = List(8,3,1,0,4,6,4,6,5)
print("hello" + unsorted toString)
//val sorted = QSort(unsorted)
//sorted foreach println
}
}
//this must not be present
object QSort{
def apply(array: List[Int]):List[Int]={
array
}
}
EDIT: Sorry for causing confusion, I am running the script as scala filename.scala.
What's happening
If the parameter to scala is an existing .scala file, it will be compiled in memory and run. When there is a single top-level object, it is searched for a main method and, if one is found, that method is executed. Otherwise, the top-level statements are wrapped in a synthetic main method, which gets executed instead.
This is why removing the top-level QSort object allows your main method to run.
If you're going to expand this into a full program, I advise compiling it and running the compiled .class files (or using a build tool like sbt):
scalac main.scala && scala MainObject
If you're writing a single file script, just drop the main method (and its object) and write the statements you want executed in the outer scope, like:
// qsort.scala
object QSort{
def apply(array: List[Int]):List[Int]={
array
}
}
val unsorted = List(8,3,1,0,4,6,4,6,5)
print("hello" + unsorted.toString)
val sorted = QSort(unsorted)
sorted foreach println
and run with: scala qsort.scala
A little context
The scala command is meant for executing both scala "scripts" (single file programs) and complex java-like programs (with a main object and a bunch of classes in the classpath).
From man scala:
The scala utility runs Scala code using a Java runtime environment.
The Scala code to run is specified in one of three ways:
1. With no arguments specified, a Scala shell starts and reads commands interactively.
2. With -howtorun:object specified, the fully qualified name of a top-level Scala object may be specified. The object should previously have been compiled using scalac(1).
3. With -howtorun:script specified, a file containing Scala code may be specified.
If not explicitly specified, the howtorun mode is guessed from the arguments passed to the script.
When given a fully qualified name of an object, scala will guess -howtorun:object and expect a compiled object with that name on the path.
Otherwise, if the parameter to scala is an existing .scala file, -howtorun:script is guessed and the entry point is selected as described above.
Any method of an object can be run in the REPL by calling it explicitly and passing the arguments it requires, if any. For example:
scala> object MainObject{
| def main(args: Array[String])={
| val unsorted = List(9,3,1,0,7,5,9,3,11)
| print("sorted: " + unsorted.sorted)
| }
| def fun = println("fun here")
| }
defined module MainObject
scala> MainObject.main(Array(""))
sorted: List(0, 1, 3, 3, 5, 7, 9, 9, 11)
scala> MainObject.fun
fun here
In some cases this can be useful for quick testing and troubleshooting.

What are the limitations and walkarounds of Scala interpreter?

What kind of constructs need 'scalac' compile and how to make an equivalent that will work in interpreter?
Edit: I want to use scala instead of python as a scripting language. (with #!/usr/bin/scala)
You ought to be able to do anything in the REPL that you can do in outside code. Keep in mind that:
Things that refer to each other circularly need to be inside one block. So the following can't be entered as-is; you have to wrap it inside some other object:
class C(i : Int) { def succ = C(i+1) }
object C { def apply(i: Int) = new C(i) }
The execution environment is somewhat different, so benchmarking timings will not always come out the same way as if you run them from compiled code.
You enter the execution path a different way; if you want to call a main method, though, you certainly can from inside the REPL.
You can't just cut-and-paste an entire library into the REPL and have it work exactly like the library did; the REPL has a different structure than normal packages do. So drop the "package" declarations during testing.
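A minimal sketch of the wrapping advice for the C example above. Wrapper is an arbitrary name, and i is made a val only so the result can be inspected. Because both definitions are entered as one unit, class C can see its companion object. In the Scala 2 REPL, :paste mode should also let you enter the two definitions together.

```scala
object Wrapper {
  // Entered together, so each definition can refer to the other.
  class C(val i: Int) { def succ: C = C(i + 1) }
  object C { def apply(i: Int): C = new C(i) }
}
```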
EDIT: after reading your question again, I have to admit that I didn't really answer it ;). But maybe it still helps.
I know of two limitations of the interpreter (or REPL) when it comes to loading Scala files (in order to test them interactively).
You can't load Scala files with package definitions in them. The REPL complains and does not load all the classes in the Scala file. I read that this has to do with the fact that files loaded into the REPL are treated as an object, which of course can't contain any package definitions.
The REPL is strange (or a little bit unpredictable) when class files of the loaded Scala files are on the classpath. Check out this question of mine, and especially my last two comments on the second answer.
Furthermore, there is a problem with circular dependencies that I don't know a workaround for: suppose there is a class A that uses class B, which in turn needs A to do its job. Of course you can't define A since there is no definition of B, and vice versa. Providing a dummy for one of them doesn't work either:
scala> class A {
| def alterString(s:String) = s
| def printStuff(s:String) = println(alterString(s))
| }
defined class A
scala> class B {
| val prefix = "this is a test: "
| def doJob() = new A() printStuff "1 2 3"
| }
defined class B
scala> class A {
| def alterString(s:String) = new B().prefix + s
| def printStuff(s:String) = println(alterString(s))
| }
defined class A
scala> new B().doJob()
1 2 3
scala>
Although I already provided a newer definition of A, class B still used the one that was present when I defined it.
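One possible workaround, in the same spirit as wrapping companions in one object: define A and B in a single block so that each sees the final version of the other. This is a sketch; Together is an arbitrary name, and the infix call is written with a dot for clarity.

```scala
object Together {
  class A {
    // Refers to B, which is defined in the same block.
    def alterString(s: String): String = new B().prefix + s
    def printStuff(s: String): Unit = println(alterString(s))
  }

  class B {
    val prefix = "this is a test: "
    // Refers back to A; both classes are compiled together.
    def doJob(): Unit = new A().printStuff("1 2 3")
  }
}
```

With this, new Together.B().doJob() prints this is a test: 1 2 3, because B is compiled against the A that already knows about prefix.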
I think everything in Scala will work from the interpreter as well (which I believe just calls the compiler under the hood anyway).
Do you suspect something not to work? Or is this a trick/interview question? (I suppose anything that wants to interact directly with the class loader or class files may behave differently in the two environments, but I really have no idea.)

"dangling" local blocks in scala

In Scala it is possible to define a local block in a function. The local block evaluates to its last expression, for example,
val x = {val x =1;x+1}
Here x==2, the inner val x is local to that block.
However, those local blocks can cause sneaky bugs when writing anonymous classes. For example (from the Scala reference):
new Iterator[Int]
{...} // new anonymous class inheriting from Iterator[Int]
new Iterator[Int]

{...} // new Iterator[Int] followed by a "dangling" local block
Differentiating between the two cases is frustrating.
Sometimes both code snippets compile, for instance if Range(0,1,1) is used instead of Iterator[Int].
I thought about it and couldn't find a case where a "dangling" local block (i.e., a local block whose value isn't used) is needed (or makes the code more elegant).
Is there a case where we want a local block without using its value (and without putting it in a separate function and calling that function)? I'd be glad to see an example.
If not, I think it would be nice to issue a warning (or even forbid it altogether) whenever scalac encounters a "dangling" local block. Am I missing something?
Why not write
new Iterator[Int] {
...
}
Edit:
This is the style used by Programming in Scala (see sample chapter pdf)
new RationalTrait {
val numerArg = 1 * x
val denomArg = 2 * x
}
and Java Coding Conventions.
Open brace "{" appears at the end of the same line as the declaration statement
One case where a dangling local block is genuinely useful is limiting the scope of imports:
{
import my.crazy.implicit.functions._
// use them...
}
// code I know isn't touched by them.
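A small runnable sketch of the same idea (ImportScopeDemo and log are illustrative names): the imports of sqrt and pow are visible only inside the unnamed block, whose value is discarded, so code after the block cannot pick them up by accident.

```scala
import scala.collection.mutable.ListBuffer

object ImportScopeDemo {
  val log: ListBuffer[String] = ListBuffer.empty[String]

  def run(): Unit = {
    {
      // These imports are in scope only until the closing brace.
      import scala.math.{pow, sqrt}
      log += s"sqrt=${sqrt(16.0)}"
      log += s"pow=${pow(2.0, 3.0)}"
    }
    // sqrt and pow are not in scope here; using them would not compile.
    log += "done"
  }
}
```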