Scala parser combinator with recursive structures

I'm a beginner with Scala and am learning Scala parser combinators by writing "MiniLogicParser", a mini parser for propositional logic formulas. I can parse part of the input, but I cannot convert the result to case classes. I tried the code below.
import java.io._
import scala.util.parsing.combinator._
sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](children: List[Bool[A]]) extends Bool[A]
object LogicParser extends RegexParsers {
override def skipWhitespace = true
def formula = ( TRUE | not | and | textdata )
def TRUE = "TRUE"
def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ formula) => Not(formula)}
def and : Parser[_] = LPARENTHESIS ~ formula ~ opt(CONJUNCT ~ formula) ~ RPARENTHESIS
def NEG = "!"
def CONJUNCT = "&&"
def LPARENTHESIS = '('
def RPARENTHESIS = ')'
def textdata = "[a-zA-Z0-9]+".r
def apply(input: String): Either[String, Any] = parseAll(formula, input) match {
case Success(logicData, next) => Right(logicData)
case NoSuccess(errorMessage, next) => Left(s"$errorMessage on line ${next.pos.line} on column ${next.pos.column}")
}
}
However, compilation failed with the following error message:
[error] ... MiniLogicParser.scala:15 type mismatch;
[error] found : Any
[error] required: Bool[?]
[error] def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ formula) => Not(formula)}
I partly understand the error message: a type mismatch occurs on line 15, where I try to convert the parse result to a case class. However, I do not understand how to fix it.

I've adapted your parser a little bit.
import scala.util.parsing.combinator._
sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](l: Bool[A], r: Bool[A]) extends Bool[A]
object LogicParser extends RegexParsers with App {
override def skipWhitespace = true
def NEG = "!"
def CONJUNCT = "&&"
def LP = '('
def RP = ')'
def TRUE = literal("TRUE") ^^ { case _ => True }
def textdata = "[a-zA-Z0-9]+".r ^^ { case x => Var(x) }
def formula: Parser[Bool[_]] = TRUE | not | and | textdata // try TRUE first so it is not parsed as a variable
def not = NEG ~ formula ^^ { case n ~ f => Not(f) }
def and = LP ~> formula ~ CONJUNCT ~ formula <~ RP ^^ { case f1 ~ c ~ f2 => And(f1, f2) }
def apply(input: String): Either[String, Any] = parseAll(formula, input) match {
case Success(logicData, next) => Right(logicData)
case NoSuccess(errorMessage, next) => Left(s"$errorMessage on line ${next.pos.line} on column ${next.pos.column}")
}
println(apply("TRUE")) // Right(True)
println(apply("(A && B)")) // Right(And(Var(A),Var(B)))
println(apply("((A && B) && C)")) // Right(And(And(Var(A),Var(B)),Var(C)))
println(apply("!(A && !B)")) // Right(Not(And(Var(A),Not(Var(B)))))
}

The child of the Not-node is of type Bool. In line 15 however, formula, the value that you want to pass to Not's apply method, is of type Any. You can restrict the extractor (i.e., the case-statement) to only match values of formula that are of type Bool by adding the type information after a colon:
case ( "!" ~ (formula: Bool[_]))
Hence, the not method would look like this:
def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ (formula: Bool[_])) => Not(formula)}
However, now, e.g., "!TRUE" does not match anymore, because "TRUE" is not yet of type Bool. This can be fixed by extending your parser to convert the string to a Bool, e.g.,
def TRUE = "TRUE" ^^ (_ => True)
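A minimal sketch of how these suggestions fit together, assuming the scala-parser-combinators module is on the classpath. Every rule is declared with the precise result type Parser[Bool[String]], so the pattern no longer needs a type ascription. The names TypedLogicParser, tru, variable, and run are made up for illustration, and And takes two children here rather than a list.

```scala
import scala.util.parsing.combinator._

sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](l: Bool[A], r: Bool[A]) extends Bool[A]

// Hypothetical names: TypedLogicParser, tru, variable, run.
object TypedLogicParser extends RegexParsers {
  // The keyword rule is tried before the variable rule so "TRUE" becomes True.
  def formula: Parser[Bool[String]] = tru | not | and | variable
  def tru: Parser[Bool[String]] = "TRUE" ^^ (_ => True)
  // ~> drops the "!" token, so only the child formula reaches the function.
  def not: Parser[Bool[String]] = "!" ~> formula ^^ (f => Not(f))
  def and: Parser[Bool[String]] =
    "(" ~> formula ~ ("&&" ~> formula) <~ ")" ^^ { case l ~ r => And(l, r) }
  def variable: Parser[Bool[String]] = "[a-zA-Z0-9]+".r ^^ (s => Var(s))

  def run(input: String): Either[String, Bool[String]] =
    parseAll(formula, input) match {
      case Success(result, _) => Right(result)
      case NoSuccess(msg, _)  => Left(msg)
    }
}
```

Because the keyword rule comes first, a variable whose name merely starts with TRUE would be rejected by this sketch; treating TRUE as reserved is an assumption made here.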

Related

Creating AST for arithmetic expression in Scala

I would like to make an AST for arithmetic expression using fastparse from Scala.
For me, an arithmetic expression looks like this:
var_name := value; // value can be an integer, or a whole expression
For the moment I have these parsers:
def word[_:P] = P((CharIn("a-z") | CharIn("A-Z") | "_").rep(1).!)
def digits[_ : P] = P(CharIn("0-9").rep.!)
def div_mul[_: P] = P( digits~ space.? ~ (("*" | "/").! ~ space.? ~/ digits).rep ).map(eval)
def add_sub[_: P] = P( div_mul ~ space.? ~ (("+" | "-").! ~ space.? ~/ div_mul).rep ).map(eval)
def expr[_: P]= P( " ".rep ~ add_sub ~ " ".rep ~ End )
def var_assig[_:P] = P(word ~ " " ~ ":=" ~ " " ~ (value | expr) ~ ";")
I want to create an AST for an arithmetic expression (2+3*2, for example).
Expected result: Assignment[2,plus[mult,[3,2]]] // symbol[left, right]
My questions are:
What should the Tree class/object look like, if one is necessary, given that I want to evaluate the result? I will also use this class for the rest of the parser (if, while).
What should the eval function look like, which takes a String or Seq[String] as input and returns an AST matching my expected result?
Here is my way of doing it.
I have defined the components of the Arithmetic Expression using the following Trait:
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression
And defined the following fastparse patterns (similar to what is described here: https://com-lihaoyi.github.io/fastparse/#Math)
def number[_: P]: P[Expression] = P(CharIn("0-9").rep(1)).!.map(Num)
def parens[_: P]: P[Expression] = P("(" ~/ addSub ~ ")")
def factor[_: P]: P[Expression] = P(number | parens)
def divMul[_: P]: P[Expression] = P(factor ~ (CharIn("*/").! ~/ factor).rep).map((astBuilder _).tupled)
def addSub[_: P]: P[Expression] = P(divMul ~ (CharIn("+\\-").! ~/ divMul).rep).map((astBuilder _).tupled)
def expr[_: P]: P[Expression] = P(addSub ~ End)
Instead of the eval function that was used in the map, I have written a similar one, which returns a folded entity of the previously defined case classes:
def astBuilder(initial: Expression, rest: Seq[(String, Expression)]): Expression = {
rest.foldLeft(initial) {
case (left, (operator, right)) =>
operator match {
case "*" => Mul(left, right)
case "/" => Div(left, right)
case "+" => Add(left, right)
case "-" => Sub(left, right)
}
}
}
If we run the following expression:
val Parsed.Success(res, _) = parse("2+3*2", expr(_))
The result would be: Add(Num(2),Mul(Num(3),Num(2)))
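Since the question also asks about evaluating the result, here is a minimal sketch of an evaluator over the same case classes. It uses only the standard library; the Evaluator object and its evaluate method are hypothetical names, and evaluating to Double is an assumption made for illustration.

```scala
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression

// Hypothetical helper, not part of fastparse: walks the tree bottom-up.
object Evaluator {
  def evaluate(e: Expression): Double = e match {
    case Num(v)    => v.toDouble
    case Add(l, r) => evaluate(l) + evaluate(r)
    case Sub(l, r) => evaluate(l) - evaluate(r)
    case Mul(l, r) => evaluate(l) * evaluate(r)
    case Div(l, r) => evaluate(l) / evaluate(r)
  }
}
```

Applied to Add(Num("2"), Mul(Num("3"), Num("2"))), the tree produced for 2+3*2, evaluate returns 8.0.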

Stack overflow in mutually recursive scala parser

So, I'm working on this thing in Scala to try to parse arithmetic expressions. I have the code below, where an expr can be either an add of two exprs or an integer constant, but it gets stuck in an infinite loop of add calling expr calling add calling expr... I'm pretty new to Scala, but not to parsing. I know I'm doing something wrong, but the real question is: is it something simple?
import scala.util.parsing.combinator._
abstract class Expr
case class Add(x: Expr, y: Expr) extends Expr
case class Constant(con: String) extends Expr
class Comp extends RegexParsers {
def integer:Parser[Expr] = """-?\d+""".r ^^ {
s => Constant(s)
}
def add: Parser[Expr] = expr ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
def expr: Parser[Expr] = (add | integer)
}
object Compiler extends Comp {
def main(args: Array[String]) = parse(expr, "5+ -3")//println("5+ -3")
}
Basic RegexParsers can't parse left-recursive grammars. To make it work, you can either modify the rule for add to remove left-recursiveness:
def add: Parser[Expr] = integer ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
or use PackratParsers, which can parse such grammars:
class Comp extends RegexParsers with PackratParsers {
lazy val integer:PackratParser[Expr] = """-?\d+""".r ^^ {
s => Constant(s)
}
lazy val add: PackratParser[Expr] = expr ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
lazy val expr: PackratParser[Expr] = (add | integer)
}
object Compiler extends Comp {
def main(args: Array[String]) = parseAll(expr, "5+ -3")
}
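A third option, sketched here under the assumption that the scala-parser-combinators module is on the classpath, is to replace the recursion with repetition and fold the operands from the left. Unlike the integer ~ "+" ~ expr rewrite, which nests to the right, this keeps the sum left-associative without PackratParsers. IterativeComp is a made-up name.

```scala
import scala.util.parsing.combinator._

sealed trait Expr
case class Add(x: Expr, y: Expr) extends Expr
case class Constant(con: String) extends Expr

// Hypothetical name: IterativeComp.
object IterativeComp extends RegexParsers {
  def integer: Parser[Expr] = """-?\d+""".r ^^ (s => Constant(s))

  // One integer, then zero or more "+ integer" tails; no rule calls itself
  // before consuming input, so there is no left recursion.
  def expr: Parser[Expr] = integer ~ rep("+" ~> integer) ^^ {
    case first ~ rest => rest.foldLeft(first)((acc, next) => Add(acc, next))
  }
}
```

With this grammar, "1+2+3" becomes Add(Add(Constant(1),Constant(2)),Constant(3)).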

scala parser combinator infinite loop

I'm trying to write a simple parser in Scala, but when I add a repeated token, the parser seems to get stuck in an infinite loop.
I have two parse methods below; one uses rep(). The non-repetitive version works as expected (though it is not what I want), while the rep() version results in an infinite loop.
EDIT:
This was a learning example where I tried to enforce that the '=' is surrounded by whitespace.
If it is helpful, this is my actual test file:
a = 1
b = 2
c = 1 2 3
I was able to parse: (with the parse1 method)
K = V
but then ran into this problem when I tried to expand the exercise to:
K = V1 V2 V3
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends RegexParsers {
override def skipWhitespace(): Boolean = { false }
def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
def eq: Parser[String] = """\s+=\s+""".r ^^ { _.toString.trim }
def string: Parser[String] = """[^ \t\n]*""".r ^^ { _.toString.trim }
def value: Parser[List[String]] = rep(string)
def foo(key: String, value: String): Boolean = {
println(key + " = " + value)
true
}
def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }
def parseLine(line: String): Boolean = {
parse(parse2, line) match {
case Success(matched, _) => true
case Failure(msg, _) => false
case Error(msg, _) => false
}
}
}
object TestParser {
def usage() = {
System.out.println("<file>")
}
def main(args: Array[String]) : Unit = {
if (args.length != 1) {
usage()
} else {
val mp = new MyParser()
fromFile(args(0)).getLines().foreach { mp.parseLine }
println("done")
}
}
}
Next time, please provide some concrete examples; it's not obvious what your input is supposed to look like.
Meanwhile, you can try this; maybe you will find it helpful:
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends JavaTokenParsers {
// override def skipWhitespace(): Boolean = { false }
def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
def eq: Parser[String] = "="
def string: Parser[String] = """[^ \t\n]+""".r
def value: Parser[List[String]] = rep(string)
def foo(key: String, value: String): Boolean = {
println(key + " = " + value)
true
}
def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }
def parseLine(line: String): Boolean = {
parseAll(parse2, line) match {
case Success(matched, _) => true
case Failure(msg, _) => false
case Error(msg, _) => false
}
}
}
val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup")) {
println(mp.parseLine(line))
}
Explanation:
JavaTokenParsers extends RegexParsers, so both skip white space by default; the difference here is that skipWhitespace is no longer overridden to return false.
With the default in place, the parser handles the white space for you. JavaTokenParsers is not specific to Java; it works for most non-esoteric languages. As long as you are not trying to parse white space itself, it is a good starting point.
Your string definition used *, so the regex could match the empty string. rep(string) then kept succeeding without consuming any input, which caused the infinite loop.
Your eq definition matched the surrounding white space itself, which interferes with the built-in white space handling (don't do this unless it's really necessary).
Furthermore, if you want to parse the whole line, you must call parseAll;
otherwise parse accepts any matching prefix of the input and ignores the rest.
Final remark: for parsing key-value pairs line by line, some String.split and
String.trim would be completely sufficient. Scala Parser Combinators are a little overkill for that.
PS: Hmm... Did you want to allow =-signs in your key-names? Then my version would not work here, because it does not enforce an empty space after the key-name.
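The split-and-trim alternative from the final remark could look roughly like this. It is a sketch using only the standard library; KeyValueLines and parseLine are hypothetical names, and splitting at the first '=' is an assumption about the input format.

```scala
// Hypothetical helper: parses "key = v1 v2 v3" without parser combinators.
object KeyValueLines {
  def parseLine(line: String): Option[(String, List[String])] =
    line.split("=", 2) match {
      // Exactly one key part and one value part, split at the first '='.
      case Array(k, v) if k.trim.nonEmpty =>
        val values = v.trim.split("\\s+").filter(_.nonEmpty).toList
        Some((k.trim, values))
      // No '=' at all, or an empty key.
      case _ => None
    }
}
```

For the test file in the question, KeyValueLines.parseLine("c = 1 2 3") yields Some((c,List(1, 2, 3))), and a line without an equals sign yields None.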
This is not a duplicate, it's a different version with RegexParsers that takes care of whitespace explicitly
If, for some reason, you really care about the white space, you could stick with RegexParsers and do the following (notice skipWhitespace = false, the explicit whitespace parser ws, the two ws with squiggly arrows around the equals sign, and the repsep with ws as the explicit separator):
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends RegexParsers {
override def skipWhitespace(): Boolean = false
def ws: Parser[String] = "[ \t]+".r
def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
def eq: Parser[String] = ws ~> """=""" <~ ws
def string: Parser[String] = """[^ \t\n]+""".r
def value: Parser[List[String]] = repsep(string, ws)
def foo(key: String, value: String): Boolean = {
print(key + " = " + value)
true
}
def parse1: Parser[Boolean] = (key ~ eq ~ string) ^^ { case k ~ e ~ v => foo(k, v) }
def parse2: Parser[Boolean] = (key ~ eq ~ value) ^^ { case k ~ e ~ v => foo(k, v.toString) }
def parseLine(line: String): Boolean = {
parseAll(parse2, line) match {
case Success(matched, _) => true
case Failure(msg, _) => false
case Error(msg, _) => false
}
}
}
val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup", "foo= bar baz", "foo =bar baz")) {
println(" (Matches: " + mp.parseLine(line) + ")")
}
Now the parser rejects the lines where there is no whitespace around the equal sign:
hey = List(hou) (Matches: true)
hello = List(world, ppl) (Matches: true)
foo = List(bar, baz, blup) (Matches: true)
(Matches: false)
(Matches: false)
The bug with * instead of + in string has been removed, just like in the previous version.
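To see why the starred pattern is a problem, it helps to look at the two regexes in isolation. The sketch below uses only scala.util.matching.Regex from the standard library and assumes that RegexParsers applies each regex as a prefix match at the current input position; EmptyMatchDemo is a made-up name.

```scala
// Hypothetical demo object comparing the two versions of the string pattern.
object EmptyMatchDemo {
  val star = """[^ \t\n]*""".r
  val plus = """[^ \t\n]+""".r

  // Positioned at a space, the starred pattern still matches: the empty string.
  val starAtSpace: Option[String] = star.findPrefixOf(" 2 3")
  // The plus pattern requires at least one character, so it fails here.
  val plusAtSpace: Option[String] = plus.findPrefixOf(" 2 3")
}
```

An empty but successful match consumes no input, so a repetition built on it has no reason to stop, whereas the failing match of the plus version ends the repetition.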

Transforming Parser[Any] to a Stricter Type

Programming in Scala's Chapter 33 explains Combinator Parsing:
It provides this example:
import scala.util.parsing.combinator._
class Arith extends JavaTokenParsers {
def expr: Parser[Any] = term~rep("+"~term | "-"~term)
def term: Parser[Any] = factor~rep("*"~factor | "/"~factor)
def factor: Parser[Any] = floatingPointNumber | "("~expr~")"
}
How can I map expr to a narrower type than Parser[Any]? In other words,
I'd like to take def expr: Parser[Any] and map that via ^^ into a stricter type.
Note - I asked this question in Scala Google Groups - https://groups.google.com/forum/#!forum/scala-user, but haven't received a complete answer that helped me out.
As already stated in the comments, you can narrow down the type to anything you like. You just have to specify it after the ^^.
Here is a complete example with a data structure from your given code.
object Arith extends JavaTokenParsers {
trait Expression //The data structure
case class FNumber(value: Float) extends Expression
case class Plus(e1: Expression, e2: Expression) extends Expression
case class Minus(e1: Expression, e2: Expression) extends Expression
case class Mult(e1: Expression, e2: Expression) extends Expression
case class Div(e1: Expression, e2: Expression) extends Expression
def expr: Parser[Expression] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
case term ~ rest => rest.foldLeft(term)((result, elem) => elem match {
case "+" ~ e => Plus(result, e)
case "-" ~ e => Minus(result, e)
})
}
def term: Parser[Expression] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case factor ~ rest => rest.foldLeft(factor)((result, elem) => elem match {
case "*" ~ e => Mult(result, e)
case "/" ~ e => Div(result, e)
})
}
def factor: Parser[Expression] = floatingPointNumber ^^ (f => FNumber(f.toFloat)) | "(" ~> expr <~ ")"
def parseInput(input: String): Expression = parse(expr, input) match {
case Success(ex, _) => ex
case _ => throw new IllegalArgumentException //or change the result to Try[Expression]
}
}
Now we can start to parse something.
Arith.parseInput("(1.3 + 2.0) * 2")
//yields: Mult(Plus(FNumber(1.3),FNumber(2.0)),FNumber(2.0))
Of course, you can also have a Parser[String] or a Parser[Float], where you directly transform or evaluate the input String. As I said, it is up to you.
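As a sketch of that last variant, the same grammar can evaluate while parsing instead of building a tree. This assumes the scala-parser-combinators module is on the classpath; ArithEval is a made-up name, and Double is used instead of Float purely for illustration.

```scala
import scala.util.parsing.combinator._

// Hypothetical name: ArithEval. Each rule returns the computed value directly.
object ArithEval extends JavaTokenParsers {
  def expr: Parser[Double] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
    case first ~ rest => rest.foldLeft(first) {
      case (acc, "+" ~ t) => acc + t
      case (acc, _ ~ t)   => acc - t // the only other operator here is "-"
    }
  }

  def term: Parser[Double] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
    case first ~ rest => rest.foldLeft(first) {
      case (acc, "*" ~ f) => acc * f
      case (acc, _ ~ f)   => acc / f // the only other operator here is "/"
    }
  }

  def factor: Parser[Double] =
    floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"
}
```

ArithEval.parseAll(ArithEval.expr, "2 + 3 * 2").get then returns 8.0 rather than an Expression.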

Idiomatic Scala way of deserializing delimited strings into case classes

Suppose I was dealing with a simple colon-delimited text protocol that looked something like:
Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
OPC:m3node:1-10-2(P):A7:NAT0
....
I'd like to deserialize each line as an instance of a case class, but in a type-safe way. My first attempt uses type classes to define 'read' methods for each possible type that I can encounter, in addition to the 'tupled' method on the case class to get back a function that can be applied to a tuple of arguments, something like the following:
case class Foo(a: String, b: Integer)
trait Reader[T] {
def read(s: String): T
}
object Reader {
implicit object StringParser extends Reader[String] { def read(s: String): String = s }
implicit object IntParser extends Reader[Integer] { def read(s: String): Integer = s.toInt }
}
def create[A1, A2, Ret](fs: Seq[String], f: ((A1, A2)) => Ret)(implicit A1Reader: Reader[A1], A2Reader: Reader[A2]): Ret = {
f((A1Reader.read(fs(0)), A2Reader.read(fs(1))))
}
create(Seq("foo", "42"), Foo.tupled) // gives me a Foo("foo", 42)
The problem though is that I'd need to define the create method for each tuple and function arity, so that means up to 22 versions of create. Additionally, this doesn't take care of validation, or receiving corrupt data.
Since the question carries a Shapeless tag, here is a possible solution using it; I'm not an expert, though, and I suspect one can do better.
First, regarding the lack of validation: simply have read return Try, or scalaz.Validation, or just Option if you do not care about an error message.
Then, regarding boilerplate, you may try using HList. This way you don't need to handle every arity separately.
import scala.util._
import shapeless._
trait Reader[+A] { self =>
def read(s: String) : Try[A]
def map[B](f: A => B): Reader[B] = new Reader[B] {
def read(s: String) = self.read(s).map(f)
}
}
object Reader {
// convenience
def apply[A: Reader] : Reader[A] = implicitly[Reader[A]]
def read[A: Reader](s: String): Try[A] = implicitly[Reader[A]].read(s)
// base types
implicit object StringReader extends Reader[String] {
def read(s: String) = Success(s)
}
implicit object IntReader extends Reader[Int] {
def read(s: String) = Try {s.toInt}
}
// HLists, parts separated by ":"
implicit object HNilReader extends Reader[HNil] {
def read(s: String) =
if (s.isEmpty()) Success(HNil)
else Failure(new Exception("Expect empty"))
}
implicit def HListReader[A : Reader, H <: HList : Reader] : Reader[A :: H]
= new Reader[A :: H] {
def read(s: String) = {
val (before, colonAndBeyond) = s.span(_ != ':')
val after = if (colonAndBeyond.isEmpty()) "" else colonAndBeyond.tail
for {
a <- Reader.read[A](before)
b <- Reader.read[H](after)
} yield a :: b
}
}
}
Given that, you have a reasonably short reader for Foo :
case class Foo(a: Int, s: String)
object Foo {
implicit val FooReader : Reader[Foo] =
Reader[Int :: String :: HNil].map(Generic[Foo].from _)
}
It works :
println(Reader.read[Foo]("12:text"))
Success(Foo(12,text))
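For comparison, the validation part alone can be sketched without Shapeless by having each reader return Try. The code below uses only the standard library and is fixed to two fields, so it does not solve the arity problem that the HList version addresses. TwoFieldReader and read2 are hypothetical names.

```scala
import scala.util.{Failure, Success, Try}

trait Reader[A] { def read(s: String): Try[A] }

object Reader {
  implicit val stringReader: Reader[String] = new Reader[String] {
    def read(s: String): Try[String] = Success(s)
  }
  implicit val intReader: Reader[Int] = new Reader[Int] {
    def read(s: String): Try[Int] = Try(s.toInt) // captures NumberFormatException
  }
}

case class Foo(a: Int, s: String)

// Hypothetical helper: reads exactly two colon-separated fields.
object TwoFieldReader {
  def read2[A, B, R](line: String)(build: (A, B) => R)(
      implicit ra: Reader[A], rb: Reader[B]): Try[R] =
    line.split(":", -1) match {
      case Array(x, y) =>
        for { a <- ra.read(x); b <- rb.read(y) } yield build(a, b)
      case parts =>
        Failure(new IllegalArgumentException(s"expected 2 fields, got ${parts.length}"))
    }
}
```

A call such as TwoFieldReader.read2[Int, String, Foo]("12:text")((a, s) => Foo(a, s)) returns Success(Foo(12,text)), while a non-numeric first field or a wrong field count produces a Failure instead of an exception.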
Without scalaz and shapeless, I think the idiomatic Scala way to parse input is Scala parser combinators. For your example, I would try something like this:
import org.joda.time.DateTime
import scala.util.parsing.combinator.JavaTokenParsers
val input =
"""Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
|RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
|M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
|OPC:m3node:1-10-2(P):A7:NAT0""".stripMargin
trait LineContent
case class Event(number : Int, typ : String, when : DateTime, stuff : List[String]) extends LineContent
case class Reset(node : String, stuff : List[String]) extends LineContent
case class Other(typ : String, stuff : List[String]) extends LineContent
object LineContentParser extends JavaTokenParsers {
override val whiteSpace=""":""".r
val space="""\s+""".r
val lineEnd = """\n""".r //"""\s*(\r?\n\r?)+""".r
val field = """[^:\n]+""".r // must not match the empty string or a newline, otherwise rep(field) never terminates
def stuff : Parser[List[String]] = rep(field)
def integer : Parser[Int] = log(wholeNumber ^^ {_.toInt})("integer")
def date : Parser[DateTime] = log((repsep(integer, space) filter (_.length == 6)) ^^ (l =>
new DateTime(l(0), l(1), l(2), l(3), l(4), l(5), 0)
))("date")
def event : Parser[Event] = "Event" ~> integer ~ field ~ date ~ stuff ^^ {
case number~typ~when~stuff => Event(number, typ, when, stuff)}
def reset : Parser[Reset] = "RSET" ~> field ~ stuff ^^ { case node~stuff =>
Reset(node, stuff)
}
def other : Parser[Other] = ("M3UA_IP_LINK" | "OPC") ~ stuff ^^ { case typ~stuff =>
Other(typ, stuff)
}
def line : Parser[LineContent] = event | reset | other
def lines = repsep(line, lineEnd)
def parseLines(s : String) = parseAll(lines, s)
}
LineContentParser.parseLines(input)
The patterns in the parser combinators are self-explanatory. I always convert each successfully parsed chunk into a partial result as early as possible. The partial results are then combined into the final result.
A hint for debugging: you can always wrap a rule in the log parser. It prints a message before and after the rule is applied. Together with the given name (e.g. "date"), it also prints the current position in the input where the rule is applied and, when applicable, the parsed partial result.
An example output looks like this:
trying integer at scala.util.parsing.input.CharSequenceReader#108589b
integer --> [1.13] parsed: 5003
trying date at scala.util.parsing.input.CharSequenceReader#cec2e3
trying integer at scala.util.parsing.input.CharSequenceReader#cec2e3
integer --> [1.30] parsed: 2013
trying integer at scala.util.parsing.input.CharSequenceReader#14da3
integer --> [1.33] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader#1902929
integer --> [1.36] parsed: 6
trying integer at scala.util.parsing.input.CharSequenceReader#17e4dce
integer --> [1.39] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader#1747fd8
integer --> [1.42] parsed: 37
trying integer at scala.util.parsing.input.CharSequenceReader#1757f47
integer --> [1.45] parsed: 55
date --> [1.45] parsed: 2013-12-06T12:37:55.000+01:00
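A stripped-down sketch of the same debugging technique, assuming the scala-parser-combinators module is on the classpath: only the integer rule is wrapped in log, so each attempt and its result are printed to standard output while the parse itself is unaffected. LoggedInts and ints are made-up names.

```scala
import scala.util.parsing.combinator._

// Hypothetical name: LoggedInts.
object LoggedInts extends JavaTokenParsers {
  // log(p)(name) wraps p and prints a trace line before and after it runs.
  def integer: Parser[Int] = log(wholeNumber ^^ (_.toInt))("integer")
  def ints: Parser[List[Int]] = rep(integer)
}
```

Calling LoggedInts.parseAll(LoggedInts.ints, "1 2 3") prints one "trying integer" line per attempt, including the final failed attempt at the end of the input that terminates rep.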
I think this is an easy and maintainable way to parse input into well-typed Scala objects. It is all in the core Scala API, hence I would call it "idiomatic". When I typed the example code into an IDEA Scala worksheet, completion and type information worked very well, so this approach seems to be well supported by IDEs.