recognizing eol in scala parser combinators - scala

I'm trying to make a very simple parser with parser combinators (to parse something similar to BNF). I've read several blog posts that explain the subject (the top-ranked ones on Google, at least for me), and I think I understand it, but the tests say otherwise.
I've checked the related questions on Stack Overflow, and while some of them might apply, whenever I try to apply them something else breaks. So the best way to proceed is to go through a specific example.
This is my main method:
def main(args: Array[String]): Unit = {
  val parser: BaseParser = new BaseParser
  val eol = sys.props("line.separator")
  val test = s"a = b ${eol} a = c ${eol}"
  System.out.println(test)
  parser.parse(test)
}
This is the parser:
import com.github.trylks.tests.parser.ParserClasses._
import scala.util.parsing.combinator.syntactical._
import scala.util.parsing.combinator.ImplicitConversions
import scala.util.parsing.combinator.PackratParsers
class BaseParser extends StandardTokenParsers with ImplicitConversions with PackratParsers {
  val eol = sys.props("line.separator")
  lexical.delimiters += ("=", "|", "*", "[", "]", "(", ")", ";", eol)
  def rules = rep1sep(rule, eol) ^^ { Rules(_) }
  def rule = id ~ "=" ~ repsep(expression, "|") ^^ flatten3 { (e1: ID, _: Any, e3: List[Expression]) => Rule(e1, e3) }
  def expression: Parser[Expression] = (element | parenthesized | optional) ^^ { x => x } // and sequence and repetition, but that's another problem...
  def parenthesized: Parser[Expression] = "(" ~> expression <~ ")" ^^ { x => x }
  def optional: Parser[Expression] = "[" ~> expression <~ "]" ^^ { Optional(_) }
  def element: Parser[Element] = (id | constant) ^^ { x => x }
  def constant: Parser[Constant] = stringLit ^^ { Constant(_) }
  def id: Parser[ID] = ident ^^ { ID(_) }
  def parse(text: String): Option[Rules] = {
    val s = rules(new lexical.Scanner(text))
    s match {
      case Success(res, next) => {
        println("Success!\n" + res.toString)
        Some(res)
      }
      case Error(msg, next) => {
        println("error: " + msg)
        None
      }
      case Failure(msg, next) => {
        println("failure: " + msg)
        None
      }
    }
  }
}
These are the classes that were missing from the previous part of the code:
object ParserClasses {
  abstract class Element extends Expression
  case class ID(value: String) extends Element {
    override def toString(): String = value
  }
  case class Constant(value: String) extends Element {
    override def toString(): String = value
  }
  abstract class Expression
  case class Optional(value: Expression) extends Expression {
    override def toString() = s"[$value]"
  }
  case class Rule(head: ID, body: List[Expression]) {
    override def toString() = s"$head = ${body.mkString(" | ")}"
  }
  case class Rules(rules: List[Rule]) {
    override def toString() = rules.mkString("\n")
  }
}
The problem is that, as the code stands, it doesn't work: it parses only one rule (not both). If I replace eol with ";" (in both main and the parser), it works (at least for this test).
Most people seem to prefer regex parsers, but none of the blog posts explaining parser combinators go into detail about which traits can be extended, so I don't know how they differ or why there are several of them (I mention this because it may matter for understanding why the code doesn't work). Another problem: if I try to use regex parsers, I get errors for all the strings I have used in the parsers ("=", "*", etc.).
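A likely cause, offered as an assumption rather than a confirmed diagnosis: StandardTokenParsers tokenizes the input with StdLexical, whose whitespace rule also covers line breaks, so the scanner probably discards the newlines before the eol delimiter can ever match. rep1sep(rule, eol) then stops after the first rule, and since rules is applied directly rather than through phrase, the unconsumed rest is not reported as an error. The sketch below uses RegexParsers instead and narrows whiteSpace to spaces and tabs, so line breaks stay visible and can be matched explicitly. It assumes the scala-parser-combinators module is on the classpath, uses a deliberately simplified grammar (identifiers separated by |), and EolRuleParser and parseRules are hypothetical names; it is not a drop-in replacement for BaseParser.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Sketch: rules separated by line breaks, e.g. "a = b | c\nd = e".
// Assumes the scala-parser-combinators module is available.
object EolRuleParser extends RegexParsers {
  // Only spaces and tabs are skipped automatically, so line breaks
  // remain significant and can act as rule separators.
  override protected val whiteSpace: Regex = """[ \t]+""".r

  // Accept \r\n, \n or \r, independent of the platform's line.separator.
  def eol: Parser[String] = """\r\n|\n|\r""".r

  def id: Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r

  // rule ::= id "=" id ("|" id)*
  def rule: Parser[(String, List[String])] =
    id ~ ("=" ~> rep1sep(id, "|")) ^^ { case head ~ body => (head, body) }

  // One or more rules separated by line breaks; trailing line breaks are allowed.
  def rules: Parser[List[(String, List[String])]] =
    rep1sep(rule, rep1(eol)) <~ rep(eol)

  // parseAll fails when input is left over, instead of silently
  // succeeding after the first rule.
  def parseRules(text: String): ParseResult[List[(String, List[String])]] =
    parseAll(rules, text)
}
```

Because whitespace skipping no longer crosses line boundaries, every place where a newline may legitimately appear has to mention eol explicitly, as rules does here.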

Related

Scala 3 Manifest replacement

My task is to print out type information in Java-like notation (using < and > for type arguments). In Scala 2 I have this small method that uses scala.reflect.Manifest as the source for the type symbol and its parameters:
def typeOf[T](implicit manifest: Manifest[T]): String = {
  def loop[T0](m: Manifest[T0]): String =
    if (m.typeArguments.isEmpty) m.runtimeClass.getSimpleName
    else {
      val typeArguments = m.typeArguments.map(loop(_)).mkString(",")
      raw"""${m.runtimeClass.getSimpleName}<$typeArguments>"""
    }
  loop(manifest)
}
Unfortunately, Manifests are not available in Scala 3. Is there a Scala 3 native way to rewrite this? I'm open to some inline macro stuff. What I have tried so far is:
inline def typeOf[T]: String = ${ typeOfImpl[T] }
private def typeOfImpl[T: Type](using Quotes): Expr[String] =
  import quotes.reflect.*
  val tree = TypeTree.of[T]
  Expr(tree.show)
  // ^^ show is parameterized with a Printer, but AFAIK there's no way
  // to provide your own implementation for it. You can only choose
  // from the predefined ones. So how do I proceed from here?
I know that not all Scala types can be represented as Java types. I aim to cover only the simple ones that the original method was able to handle: no wildcards or existentials, only fully resolved types like:
List[String]             res: List<String>
List[Option[String]]     res: List<Option<String>>
Map[String,Option[Int]]  res: Map<String,Option<Int>>
I post this answer even though it's not a definitive solution and there's probably a better way but hopefully it can give you some ideas.
I think a good start is using TypeRepr:
val tpr: TypeRepr = TypeRepr.of[T]
val typeParams: List[TypeRepr] = tpr match {
  case a: AppliedType => a.args
  case _ => Nil
}
Then with a recursive method you should be able to work something out.
Inspired by https://github.com/gaeljw/typetrees/blob/main/src/main/scala/io/github/gaeljw/typetrees/TypeTreeTagMacros.scala#L12:
import scala.quoted.*
import scala.reflect.ClassTag
private def getTypeString[T](using Type[T], Quotes): Expr[String] = {
  import quotes.reflect._
  def getTypeStringRec(tpr: TypeRepr)(using Quotes): Expr[String] = {
    tpr.asType match {
      case '[t] => getTypeString[t]
    }
  }
  val tpr: TypeRepr = TypeRepr.of[T]
  val typeParams: List[TypeRepr] = tpr match {
    case a: AppliedType => a.args
    case _ => Nil
  }
  val selfTag: Expr[ClassTag[T]] = getClassTag[T]
  val argsStrings: Expr[List[String]] =
    Expr.ofList(typeParams.map(getTypeStringRec))
  '{ /* Compute something using selfTag and argsStrings */ }
}
private def getClassTag[T](using Type[T], Quotes): Expr[ClassTag[T]] = {
  import quotes.reflect._
  Expr.summon[ClassTag[T]] match {
    case Some(ct) =>
      ct
    case None =>
      report.error(
        s"Unable to find a ClassTag for type ${Type.show[T]}",
        Position.ofMacroExpansion
      )
      throw new Exception("Error when applying macro")
  }
}
The final working solution I came up with was:
import scala.quoted.*
inline def typeOf[T]: String = ${ typeOfImpl[T] }
def typeOfImpl[T: Type](using Quotes): Expr[String] = {
  import quotes.reflect.*
  TypeRepr.of[T] match {
    case AppliedType(tpr, args) =>
      val typeName = Expr(tpr.show)
      val typeArguments = Expr.ofList(args.map {
        _.asType match {
          case '[t] => typeOfImpl[t]
        }
      })
      '{
        val tpeName = ${ typeName }
        val typeArgs = ${ typeArguments }
        typeArgs.mkString(tpeName + "<", ", ", ">")
      }
    case tpr: TypeRef => Expr(tpr.show)
    case other =>
      report.errorAndAbort(s"unsupported type: ${other.show}", Position.ofMacroExpansion)
  }
}

How to match methods which return a Future and have multiple arguments or multiple arguments list (curried)?

I am playing with scalameta and I want to have a generic measurement annotation which sends measurements about how long the method execution took.
I used Qing Wei's cache annotation demo.
https://www.cakesolutions.net/teamblogs/scalameta-tut-cache
It works for non-async methods, but my annotation doesn't match methods that return a Future, because of their extra ExecutionContext parameter list.
My annotation looks like this:
package measurements
import scala.concurrent.Future
import scala.meta._
class measure(name: String) extends scala.annotation.StaticAnnotation {
  inline def apply(defn: Any): Any = meta {
    defn match {
      case defn: Defn.Def => {
        this match {
          case q"new $_($backendParam)" =>
            val body: Term = MeasureMacroImpl.expand(backendParam, defn)
            defn.copy(body = body)
          case x =>
            abort(s"Unrecognized pattern $x")
        }
      }
      case _ =>
        abort("This annotation only works on `def`")
    }
  }
}
object MeasureMacroImpl {
  def expand(nameExpr: Term.Arg, annotatedDef: Defn.Def): Term = {
    val name: Term.Name = Term.Name(nameExpr.syntax)
    annotatedDef match {
      case q"..$_ def $methodName[..$tps](..$nonCurriedParams): $rtType = $expr" => {
        rtType match {
          case f: Future[Any] => q"""
            val name = $name
            println("before " + name)
            val future: ${rtType} = ${expr}
            future.map(result => {
              println("after " + name)
              result
            })
            """
          case _ => q"""
            val name = $name
            println("before " + name)
            val result: ${rtType} = ${expr}
            println("after " + name)
            result
            """
        }
      }
      case _ => abort("This annotation only works on `def`")
    }
  }
}
I use the annotation like this:
@measure("A")
def test(x: String): String = x
@measure("B")
def testMultipleArg(x: Int, y: Int): Int = x + y
I would like to use it with async methods like this:
@measure("C")
def testAsync(x: String)(implicit ec: ExecutionContext): Future[String] = {
  Future(test(x))
}
but I get the following error:
exception during macro expansion:
scala.meta.internal.inline.AbortException: This annotation only works on `def`
I assume the issue is the pattern in MeasureMacroImpl, but I am not sure how to match multiple parameter lists. Could you help me? Any ideas or sample code would be greatly appreciated. I am pretty new to Scala and scalameta, so apologies if this is a trivial question.
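You are getting the error because the quasiquote pattern in MeasureMacroImpl does not match curried parameter lists, so testAsync falls through to its case _ => abort(...) branch.
Matching curried parameter lists is straightforward; simply use: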
case q"..$_ def $methodName[..$tps](...$nonCurriedParams): $rtType = $expr"
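Notice the ...$nonCurriedParams (three dots, which matches a list of parameter lists) instead of ..$nonCurriedParams (two dots, which matches a single parameter list).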

Matching braces code doesn't terminate

I have the code below to check a string. I want to verify that it starts with '{', ends with '}', and in between contains only sequences of non-brace characters and nested strings that have the same property.
import util.parsing.combinator._
class Comp extends RegexParsers with PackratParsers {
  lazy val bracefree: PackratParser[String] = """[^{}]*""".r ^^ {
    case a => a
  }
  lazy val matching: PackratParser[String] = (
    "{" ~ rep(bracefree | matching) ~ "}") ^^ {
    case a ~ b ~ c => a + b.mkString("") + c
  }
}
object Brackets extends Comp {
  def main(args: Array[String]): Unit = {
    println(parseAll(matching, "{ foo {hello 3 } {}}").get)
  }
}
The desired output for this is to echo { foo {hello 3 } {}}, but it ends up taking a long time before dying from java.lang.OutOfMemoryError: GC overhead limit exceeded. What am I doing wrong and what should I have done instead?
Your regular expression for bracefree matches even the empty string, so the parser produced by rep() keeps succeeding without consuming any input and loops endlessly, collecting empty results until memory runs out.
Use a + quantifier instead of *:
lazy val bracefree: PackratParser[String] = """[^{}]+""".r ^^ {
  case a => a
}
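The effect of the quantifier can be checked with plain scala.util.matching.Regex, without the parser library. This small sketch (BracefreeDemo and prefix are hypothetical names) shows what each pattern matches at the start of the input when the next character is a brace, which is where rep(bracefree | matching) gets stuck: with * the result is an empty match, which counts as a success but does not advance the input, and because bracefree comes first in the alternation, matching is never tried.

```scala
import scala.util.matching.Regex

object BracefreeDemo {
  val star: Regex = """[^{}]*""".r
  val plus: Regex = """[^{}]+""".r

  // What the pattern matches at the very start of `input` (None = no match).
  def prefix(pattern: Regex, input: String): Option[String] =
    pattern.findPrefixOf(input)
}
```

With * the call on "{hello 3 }" returns Some(""), an empty success; with + it returns None, so the alternation falls through to matching as intended.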
Also, by default RegexParsers skips whitespace before each literal and regex parser, which would drop the spaces you want to echo back. To turn that behavior off, override skipWhitespace to always return false. In the end your parser will look like this:
import util.parsing.combinator._
class Comp extends RegexParsers with PackratParsers {
  override def skipWhitespace = false
  lazy val bracefree: PackratParser[String] = """[^{}]+""".r ^^ {
    case a => a
  }
  lazy val matching: PackratParser[String] = (
    "{" ~ rep(bracefree | matching) ~ "}") ^^ {
    case a ~ b ~ c => a + b.mkString("") + c
  }
}
object Brackets extends Comp {
  def main(args: Array[String]): Unit = {
    println(parseAll(matching, "{ foo {hello 3 } {}}").get)
    // prints: { foo {hello 3 } {}}
  }
}

scala parser combinator infinite loop

I'm trying to write a simple parser in Scala, but when I add a repeated token, Scala seems to get stuck in an infinite loop.
I have two parse methods below. The non-repetitive one (parse1) works as expected, though it's not what I want; the rep() version (parse2) results in an infinite loop.
EDIT:
This was a learning exercise in which I tried to enforce that the '=' is surrounded by whitespace.
If it is helpful, this is my actual test file:
a = 1
b = 2
c = 1 2 3
I was able to parse this (with the parse1 method):
K = V
but then ran into this problem when I tried to extend the exercise to:
K = V1 V2 V3
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends RegexParsers {
  override def skipWhitespace(): Boolean = { false }
  def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
  def eq: Parser[String] = """\s+=\s+""".r ^^ { _.toString.trim }
  def string: Parser[String] = """[^ \t\n]*""".r ^^ { _.toString.trim }
  def value: Parser[List[String]] = rep(string)
  def foo(key: String, value: String): Boolean = {
    println(key + " = " + value)
    true
  }
  def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
  def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }
  def parseLine(line: String): Boolean = {
    parse(parse2, line) match {
      case Success(matched, _) => true
      case Failure(msg, _) => false
      case Error(msg, _) => false
    }
  }
}
object TestParser {
  def usage() = {
    System.out.println("<file>")
  }
  def main(args: Array[String]) : Unit = {
    if (args.length != 1) {
      usage()
    } else {
      val mp = new MyParser()
      fromFile(args(0)).getLines().foreach { mp.parseLine }
      println("done")
    }
  }
}
Next time, please provide some concrete examples; it's not obvious what your input is supposed to look like.
Meanwhile, you can try this; maybe you'll find it helpful:
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends JavaTokenParsers {
  // override def skipWhitespace(): Boolean = { false }
  def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
  def eq: Parser[String] = "="
  def string: Parser[String] = """[^ \t\n]+""".r
  def value: Parser[List[String]] = rep(string)
  def foo(key: String, value: String): Boolean = {
    println(key + " = " + value)
    true
  }
  def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
  def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }
  def parseLine(line: String): Boolean = {
    parseAll(parse2, line) match {
      case Success(matched, _) => true
      case Failure(msg, _) => false
      case Error(msg, _) => false
    }
  }
}
val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup")) {
  println(mp.parseLine(line))
}
Explanation:
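JavaTokenParsers extends RegexParsers, and both skip whitespace by default; the difference in your code is that you had switched skipping off with skipWhitespace = false (commented out above).
JavaTokenParsers is not specific to Java: its ident, wholeNumber and stringLiteral parsers fit most non-esoteric languages. As long as you are not trying to parse Whitespace (the esoteric language), JavaTokenParsers is a good starting point.
Your string definition used *, so it also matched the empty string; rep(string) could then keep succeeding without consuming input, which caused the infinite loop.
Your eq definition consumed the surrounding whitespace itself (\s+=\s+), working against the built-in whitespace handling (don't do this unless it's really necessary).
Furthermore, if you want to parse the whole line, you must call parseAll; otherwise parsing stops as soon as the parser succeeds on a prefix of the line, and the rest of the input is silently ignored.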
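A small sketch of the parse/parseAll difference, assuming scala-parser-combinators is on the classpath (PrefixDemo, prefixOnly and wholeLine are hypothetical names): parse succeeds as soon as key matches the start of the input and leaves the rest unconsumed, while parseAll requires the whole input to be consumed.

```scala
import scala.util.parsing.combinator.RegexParsers

object PrefixDemo extends RegexParsers {
  def key: Parser[String] = """[a-zA-Z]+""".r

  // Succeeds on a matching prefix; the remaining input is left in `next`.
  def prefixOnly(line: String): ParseResult[String] = parse(key, line)

  // Fails unless `key` consumes the whole line.
  def wholeLine(line: String): ParseResult[String] = parseAll(key, line)
}
```

On "foo = bar", prefixOnly reports success with the result "foo" and " = bar" still unread, whereas wholeLine reports a failure.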
Final remark: for parsing key-value pairs line by line, String.split and String.trim would be completely sufficient; Scala parser combinators are a little overkill for that.
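For comparison, a minimal sketch of that split/trim approach (SplitKeyValue is a hypothetical name). Unlike the parser versions, it does not require whitespace around the = sign: it splits on the first = and then on runs of whitespace.

```scala
object SplitKeyValue {
  // Parses "key = v1 v2 v3" into Some((key, List(v1, v2, v3))).
  // Returns None when the line has no '=' or the key is blank.
  def parseLine(line: String): Option[(String, List[String])] =
    line.split("=", 2) match {
      // limit 2: split on the first '=' only
      case Array(k, v) if k.trim.nonEmpty =>
        val values = v.trim.split("\\s+").filter(_.nonEmpty).toList
        Some((k.trim, values))
      case _ => None
    }
}
```

Applied to the test file from the question, it yields ("a", List("1")), ("b", List("2")) and ("c", List("1", "2", "3")).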
PS: Hmm... did you want to allow = signs in your key names? Then my version would not work here, because it does not require whitespace after the key name.
This is not a duplicate; it is a different version with RegexParsers that handles whitespace explicitly.
If for some reason you really care about the whitespace, you could stick with RegexParsers and do the following (notice skipWhitespace = false, the explicit whitespace parser ws, the two ws with squiggles (~> and <~) around the equals sign, and the repsep with an explicitly specified ws):
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends RegexParsers {
  override def skipWhitespace(): Boolean = false
  def ws: Parser[String] = "[ \t]+".r
  def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
  def eq: Parser[String] = ws ~> """=""" <~ ws
  def string: Parser[String] = """[^ \t\n]+""".r
  def value: Parser[List[String]] = repsep(string, ws)
  def foo(key: String, value: String): Boolean = {
    print(key + " = " + value)
    true
  }
  def parse1: Parser[Boolean] = (key ~ eq ~ string) ^^ { case k ~ e ~ v => foo(k, v) }
  def parse2: Parser[Boolean] = (key ~ eq ~ value) ^^ { case k ~ e ~ v => foo(k, v.toString) }
  def parseLine(line: String): Boolean = {
    parseAll(parse2, line) match {
      case Success(matched, _) => true
      case Failure(msg, _) => false
      case Error(msg, _) => false
    }
  }
}
val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup", "foo= bar baz", "foo =bar baz")) {
  println(" (Matches: " + mp.parseLine(line) + ")")
}
Now the parser rejects the lines where there is no whitespace around the equals sign:
hey = List(hou) (Matches: true)
hello = List(world, ppl) (Matches: true)
foo = List(bar, baz, blup) (Matches: true)
(Matches: false)
(Matches: false)
The bug with * instead of + in string has been removed, just like in the previous version.

How to convert context.universe.Annotation to MyAnnotation in a Scala Macro

I am on Scala 2.11.1 and have an annotation:
case class MyAnnotation(id: String, message: String) extends StaticAnnotation
I would like to create a macro MessageFromMyAnnotation that transforms the following code
class Foo {
  @MyAnnotation("001", "James Bond 001")
  def foox = {
    ... // my messy code
    val x = MessageFromMyAnnotation("001")
    ... // my messy code
  }
}
to
class Foo {
  @MyAnnotation("001", "James Bond 001")
  def foox = {
    ... // my messy code
    val x = "Hello world, James Bond 001"
    ... // my messy code
  }
}
In brief, the macro finds, on its enclosing element, the @MyAnnotation whose id is "001" and returns "Hello world, " + message.
This is the macro:
object MessageFromMyAnnotation {
  def apply(id: String) = macro impl
  def impl(c: Context)(id: c.Expr[String]): c.Expr[String] = {
    c.internal.enclosingOwner.annotations.filter( anno =>
      anno.tpe =:= c.universe.typeOf[MyAnnotation] &&
      anno.asInstanceOf[MyAnnotation].id == id.value //this does not work
    ) match {
      case anno :: Nil => c.universe.reify("Hello world, " + ...)
      case x => c.abort(c.enclosingPosition, c.universe.showRaw(x))
    }
  }
}
I want to convert anno, of type c.universe.Annotation, to MyAnnotation and compare its id with the argument id of type c.Expr[String], but anno.asInstanceOf[MyAnnotation] throws a ClassCastException, and id.value gives me the error message
cannot use value except for signatures of macro implementations
So, please help me with two questions:
How to convert anno of type c.universe.Annotation to MyAnnotation
How to compare its id with the argument id of type c.Expr[String]
I have successfully made it work thanks to @Imm's suggestion:
You don't have an instance of MyAnnotation - this is compile time, you only
have an AST that represents the call. You can get the Expr that's the
parameter given to the c.universe.Annotation, and either splice it, or pattern
match it as a String literal and then take the value out of that.
And here is the code:
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context
object MessageFromMyAnnotation {
  def apply(id: String) = macro impl
  def impl(c: Context)(id: c.Expr[String]): c.Expr[String] = {
    import c.universe._
    id match {
      case Expr(Literal(Constant(idVal: String))) =>
        (for {
          Annotation(tpe,
            Literal(Constant(annoIdVal: String)) ::
            Literal(Constant(annoMessageVal: String)) ::
            Nil, _) <- c.internal.enclosingOwner.annotations
          if tpe =:= typeOf[MyAnnotation] && idVal == annoIdVal
        } yield (annoIdVal, annoMessageVal)
        ) match {
          case (annoIdVal, annoMessageVal) :: Nil =>
            reify(c.literal("Hello world, " + annoMessageVal).splice)
          // two or more matches (the old pattern only caught exactly two)
          case _ :: _ :: _ => c.abort(c.enclosingPosition, "Found more than one @MyAnnotation with the same id")
          case _ => c.abort(c.enclosingPosition, "No @MyAnnotation found with the specified id")
        }
      // the id is not a string literal, so it cannot be compared at compile time
      case _ => c.abort(c.enclosingPosition, "The id argument must be a string literal")
    }
  }
}
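The key idea in Imm's suggestion is that at compile time you only see trees, and a string literal argument is a Literal(Constant(...)) node whose value can be extracted by pattern matching. The same extractors exist in the runtime reflection universe, so the idea can be tried outside a macro. A minimal sketch, assuming Scala 2 with the scala-reflect module on the classpath (LiteralExtract and stringValue are hypothetical names):

```scala
import scala.reflect.runtime.universe._

object LiteralExtract {
  // Some(value) when `tree` is a string literal such as "001"; None otherwise,
  // e.g. for an identifier, whose value is not known at compile time.
  def stringValue(tree: Tree): Option[String] = tree match {
    case Literal(Constant(s: String)) => Some(s)
    case _                            => None
  }
}
```

Inside the macro, the same match runs against c.universe trees, which is what the case Expr(Literal(Constant(idVal: String))) line in the code above does.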