Stack overflow in mutually recursive Scala parser

So, I'm working on this thing in Scala to try to parse arithmetic expressions. I have the code below, where an expr can be either an add of two exprs or an integer constant, but it gets stuck in an infinite loop of add calling expr calling add calling expr... I'm pretty new to Scala, but not to parsing. I know I'm doing something wrong, but the real question is: is it something simple?
import scala.util.parsing.combinator._
abstract class Expr
case class Add(x: Expr, y: Expr) extends Expr
case class Constant(con: String) extends Expr
class Comp extends RegexParsers {
def integer:Parser[Expr] = """-?\d+""".r ^^ {
s => Constant(s)
}
def add: Parser[Expr] = expr ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
def expr: Parser[Expr] = (add | integer)
}
object Compiler extends Comp {
def main(args: Array[String]) = parse(expr, "5+ -3")//println("5+ -3")
}

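Basic RegexParsers can't parse left-recursive grammars: expr tries add first, and add immediately calls expr again without consuming any input, so the recursion never ends. To make it work, you can either modify the rule for add to remove the left recursion: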
def add: Parser[Expr] = integer ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
or use PackratParsers, which can parse such grammars:
class Comp extends RegexParsers with PackratParsers {
lazy val integer:PackratParser[Expr] = """-?\d+""".r ^^ {
s => Constant(s)
}
lazy val add: PackratParser[Expr] = expr ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
lazy val expr: PackratParser[Expr] = (add | integer)
}
object Compiler extends Comp {
def main(args: Array[String]) = parseAll(expr, "5+ -3")
}
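Note that the first fix changes the shape of the result: because the right-hand operand is expr, a chain such as 1+2+3 nests to the right. If left-nested Add nodes are wanted without PackratParsers, one common approach is to parse a first integer followed by a repetition of "+ integer" and fold that list from the left. The following is only a minimal sketch under stated assumptions: the names LeftAssocSum and run are made up for illustration, and it assumes the scala-parser-combinators module is on the classpath (the scala-cli directive in the first comment is one way to get it).

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
// (assumption: run with scala-cli; otherwise add the module to your build)
import scala.util.parsing.combinator._

sealed trait Expr
case class Add(x: Expr, y: Expr) extends Expr
case class Constant(con: String) extends Expr

// Hypothetical name; not part of the original question or answer.
object LeftAssocSum extends RegexParsers {
  def integer: Parser[Expr] = """-?\d+""".r ^^ (s => Constant(s))

  // No rule calls itself before consuming input, so there is no left recursion.
  // The fold rebuilds the left-nested tree: ((a + b) + c).
  def expr: Parser[Expr] = integer ~ rep("+" ~> integer) ^^ {
    case first ~ rest => rest.foldLeft(first)((acc, next) => Add(acc, next))
  }

  // Convenience wrapper: Some(tree) on success, None on any parse failure.
  def run(input: String): Option[Expr] = parseAll(expr, input) match {
    case Success(result, _) => Some(result)
    case _                  => None
  }
}
```

For example, LeftAssocSum.run("5+ -3") gives Some(Add(Constant(5),Constant(-3))), and a longer chain nests to the left.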

Related

Scala Parser Combinator

I am trying to write a Scala Parser combinator for the following input.
The input can be
10
(10)
((10))
(((10)))
Here the number of brackets can keep growing, but they must always match, so parsing should fail for ((((10))).
The result of parsing should always be the number at the center.
I wrote the following parser:
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def n = "(" ~ i ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
def expr = i | n
}
val parser = new MyParser
parser.parseAll(parser.expr, "10")
parser.parseAll(parser.expr, "(10)")
But how do I handle the case where the number of brackets keeps growing while still matching?
Easy, just make the parser recursive:
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def expr: Parser[Int] = i | "(" ~ expr ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
}
(but note that scala-parser-combinators has trouble with left-recursive definitions: Recursive definitions with scala-parser-combinators)
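To see that the recursive rule really rejects unbalanced input, here is a small self-contained sketch. It uses ~> and <~ to keep only the inner result, and relies on parseAll requiring the whole input to be consumed. The names NestedParens, number and run are assumptions made for this illustration, and the code assumes the scala-parser-combinators module is available.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
// (assumption: run with scala-cli; otherwise add the module to your build)
import scala.util.parsing.combinator._

// Hypothetical name; a variation on the MyParser class above.
object NestedParens extends RegexParsers {
  def number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // Either a bare number, or an expr wrapped in one more pair of brackets.
  // ~> drops the left result and <~ drops the right one, so only the Int remains.
  def expr: Parser[Int] = number | ("(" ~> expr <~ ")")

  // Some(value) when the whole input parses, None otherwise.
  def run(input: String): Option[Int] = parseAll(expr, input) match {
    case Success(n, _) => Some(n)
    case _             => None
  }
}
```

With this sketch, run("(((10)))") yields Some(10), while both ((((10))) and ((10))) yield None: the first is missing a closing bracket, and the second leaves an unconsumed bracket that parseAll rejects.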

Scala parser combinator with recursive structures

I'm a beginner with Scala and am learning Scala parser combinators by writing "MiniLogicParser", a small parser for propositional logic formulas. I can parse the input, but I cannot convert the result to case classes. I tried the code below.
import java.io._
import scala.util.parsing.combinator._
sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](children: List[Bool[A]]) extends Bool[A]
object LogicParser extends RegexParsers {
override def skipWhitespace = true
def formula = ( TRUE | not | and | textdata )
def TRUE = "TRUE"
def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ formula) => Not(formula)}
def and : Parser[_] = LPARENTHESIS ~ formula ~ opt(CONJUNCT ~ formula) ~ RPARENTHESIS
def NEG = "!"
def CONJUNCT = "&&"
def LPARENTHESIS = '('
def RPARENTHESIS = ')'
def textdata = "[a-zA-Z0-9]+".r
def apply(input: String): Either[String, Any] = parseAll(formula, input) match {
case Success(logicData, next) => Right(logicData)
case NoSuccess(errorMessage, next) => Left(s"$errorMessage on line ${next.pos.line} on column ${next.pos.column}")
}
}
But compilation failed with the following error message:
[error] ... MiniLogicParser.scala:15 type mismatch;
[error] found : Any
[error] required: Bool[?]
[error] def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ formula) => Not(formula)}
I partly understand the error message: on line 15, where I try to convert the parse result into a case class, a type mismatch occurs. However, I do not understand how to fix it.
I've adapted your parser a little bit.
import scala.util.parsing.combinator._
sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](l: Bool[A], r: Bool[A]) extends Bool[A]
object LogicParser extends RegexParsers with App {
override def skipWhitespace = true
def NEG = "!"
def CONJUNCT = "&&"
def LP = '('
def RP = ')'
def TRUE = literal("TRUE") ^^ { case _ => True }
def textdata = "[a-zA-Z0-9]+".r ^^ { case x => Var(x) }
def formula: Parser[Bool[_]] = textdata | and | not | TRUE
def not = NEG ~ formula ^^ { case n ~ f => Not(f) }
def and = LP ~> formula ~ CONJUNCT ~ formula <~ RP ^^ { case f1 ~ c ~ f2 => And(f1, f2) }
def apply(input: String): Either[String, Any] = parseAll(formula, input) match {
case Success(logicData, next) => Right(logicData)
case NoSuccess(errorMessage, next) => Left(s"$errorMessage on line ${next.pos.line} on column ${next.pos.column}")
}
println(apply("TRUE")) // Right(Var(TRUE))
println(apply("(A && B)")) // Right(And(Var(A),Var(B)))
println(apply("((A && B) && C)")) // Right(And(And(Var(A),Var(B)),Var(C)))
println(apply("!(A && !B)")) // Right(Not(And(Var(A),Not(Var(B)))))
}
The child of the Not-node is of type Bool. In line 15 however, formula, the value that you want to pass to Not's apply method, is of type Any. You can restrict the extractor (i.e., the case-statement) to only match values of formula that are of type Bool by adding the type information after a colon:
case ( "!" ~ (formula: Bool[_]))
Hence, the not method would look like this:
def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ (formula: Bool[_])) => Not(formula)}
However, now, e.g., "!TRUE" does not match anymore, because "TRUE" is not yet of type Bool. This can be fixed by extending your parser to convert the string to a Bool, e.g.,
def TRUE = "TRUE" ^^ (_ => True)
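Another way to avoid Any altogether is to give every rule an explicit result type, so that no alternative contributes a plain String. The sketch below is only one possible arrangement, not the original author's code: the names TypedLogicParser, Formula, truth, variable and run are hypothetical, the keyword rule is tried before the variable rule (with a word boundary so that a name such as TRUEX is still a variable), and the scala-parser-combinators module is assumed to be on the classpath.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
// (assumption: run with scala-cli; otherwise add the module to your build)
import scala.util.parsing.combinator._

sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](l: Bool[A], r: Bool[A]) extends Bool[A]

// Hypothetical name; every rule returns Parser[Formula], never Parser[Any].
object TypedLogicParser extends RegexParsers {
  type Formula = Bool[String]

  // \b keeps identifiers that merely start with TRUE from matching here.
  def truth: Parser[Formula]    = """TRUE\b""".r ^^^ True
  def variable: Parser[Formula] = "[a-zA-Z0-9]+".r ^^ (name => Var(name))
  def not: Parser[Formula]      = "!" ~> formula ^^ (f => Not(f))
  def and: Parser[Formula] =
    ("(" ~> formula <~ "&&") ~ (formula <~ ")") ^^ { case l ~ r => And(l, r) }

  // The keyword is tried first, so "TRUE" becomes True rather than Var("TRUE").
  def formula: Parser[Formula] = truth | not | and | variable

  def run(input: String): Option[Formula] = parseAll(formula, input) match {
    case Success(result, _) => Some(result)
    case _                  => None
  }
}
```

Because the result type is fixed up front, the compiler checks each ^^ function against Bool[String], and no type pattern is needed in the case clauses.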

How to use scala combinators to give arbitrary values to expressions

Let's say I want to parse a string in Scala, and every time parentheses are nested within each other I want to multiply some number by itself. For example,
(()) +() + ((())) with number=3 would be 3*3 + 3 + 3*3*3. How would I do this with Scala parser combinators?
class SimpleParser extends JavaTokenParsers {
def Base:Parser[Int] = """(""" ~remainder ~ """)"""
def Plus = atom ~ '+' ~ remainder
def Parens = Base
def remainder:Parser[Int] =(Next|Start) }
How would I make it so that every time an atom is parsed the number is multiplied by itself, and whatever is inside the atom is parsed as well?
Would I put a method after the atom def, like
def Base:Parser[Int] = """(""" ~remainder ~ """)""" ^^(2*parser(remainder))
? I don't understand how to do this because of its recursive nature: if I find parentheses, I must then multiply by three whatever is inside those parentheses.
This is easiest if you build up the number from the inside out. For the parenthetical groups, we start with the base case (which will result in simply the number itself), and then multiply by the number again for each level of nesting. For the sum, we start with a single parenthetical group and then optionally add summands until we run out:
import scala.util.parsing.combinator.JavaTokenParsers
class SimpleParser(number: Int) extends JavaTokenParsers {
def base: Parser[Int] = literal("()").map(_ => number)
def pars: Parser[Int] = base | ("(" ~> pars <~ ")").map(_ * number)
def plus: Parser[Int] = "+" ~> expr
def expr: Parser[Int] = (pars ~ opt(plus).map(_.getOrElse(0))).map {
case first ~ rest => first + rest
}
}
object ParserWith3 extends SimpleParser(3)
And then:
scala> ParserWith3.parseAll(ParserWith3.expr, "(())+()+((()))")
res0: ParserWith3.ParseResult[Int] = [1.15] parsed: 39
I'm using map because I can't stand the parsing library's little operator party, but you could replace all the maps with ^^ or ^^^ if you really wanted to.
You can also use the fact that Scala parser combinators handle right-recursive rules (here, for example, mult appears on the right-hand side of its own definition):
import scala.util.parsing.combinator.RegexParsers
trait ExprsParsers extends RegexParsers {
val value = 3
lazy val mult: Parser[Int] =
"(" ~> mult <~ ")" ^^ { _ * value } |||
"()" ^^ { _ => value }
lazy val plus: Parser[Int] =
(mult <~ "+") ~ plus ^^ { case m ~ p => m + p } |||
mult
}
To use that code you simply create a structure that inherits ExprsParsers, e.g. :
object MainObj extends ExprsParsers {
def main(args: Array[String]): Unit = {
println(parseAll(plus, "() + ()")) //[1.8] parsed: 6
println(parseAll(plus, "() + (())")) //[1.10] parsed: 12
println(parseAll(plus, "((())) + ()")) //[1.12] parsed: 30
}
}
Check the Scala source file of the parser combinators for any operator you don't understand.
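For comparison, the same computation can be written more compactly with opt and rep1sep, both of which exist in the Parsers trait. This is a sketch under stated assumptions rather than a replacement for the answers above: the names DepthPower, group, sum and run are invented for illustration, the base number is hard-coded to 3, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
// (assumption: run with scala-cli; otherwise add the module to your build)
import scala.util.parsing.combinator._

// Hypothetical name; computes base^depth for each group and sums the groups.
object DepthPower extends RegexParsers {
  val base = 3

  // "()" is worth base; each extra enclosing pair multiplies by base again.
  def group: Parser[Int] = "(" ~> opt(group) <~ ")" ^^ {
    case Some(inner) => inner * base
    case None        => base
  }

  // One or more groups separated by "+", added together.
  def sum: Parser[Int] = rep1sep(group, "+") ^^ (_.sum)

  def run(input: String): Option[Int] = parseAll(sum, input) match {
    case Success(total, _) => Some(total)
    case _                 => None
  }
}
```

With this sketch, the input from the question, (()) +() + ((())), evaluates to 3*3 + 3 + 3*3*3 = 39.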

Transforming Parser[Any] to a Stricter Type

Chapter 33 of Programming in Scala explains combinator parsing and provides this example:
import scala.util.parsing.combinator._
class Arith extends JavaTokenParsers {
def expr: Parser[Any] = term~rep("+"~term | "-"~term)
def term: Parser[Any] = factor~rep("*"~factor | "/"~factor)
def factor: Parser[Any] = floatingPointNumber | "("~expr~")"
}
How can I map expr to a narrower type than Parser[Any]? In other words,
I'd like to take def expr: Parser[Any] and map that via ^^ into a stricter type.
Note - I asked this question in Scala Google Groups - https://groups.google.com/forum/#!forum/scala-user, but haven't received a complete answer that helped me out.
As already stated in the comments, you can narrow down the type to anything you like. You just have to build it in the function that follows ^^.
Here is a complete example with a data structure from your given code.
object Arith extends JavaTokenParsers {
trait Expression //The data structure
case class FNumber(value: Float) extends Expression
case class Plus(e1: Expression, e2: Expression) extends Expression
case class Minus(e1: Expression, e2: Expression) extends Expression
case class Mult(e1: Expression, e2: Expression) extends Expression
case class Div(e1: Expression, e2: Expression) extends Expression
def expr: Parser[Expression] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
case term ~ rest => rest.foldLeft(term)((result, elem) => elem match {
case "+" ~ e => Plus(result, e)
case "-" ~ e => Minus(result, e)
})
}
def term: Parser[Expression] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case factor ~ rest => rest.foldLeft(factor)((result, elem) => elem match {
case "*" ~ e => Mult(result, e)
case "/" ~ e => Div(result, e)
})
}
def factor: Parser[Expression] = floatingPointNumber ^^ (f => FNumber(f.toFloat)) | "(" ~> expr <~ ")"
def parseInput(input: String): Expression = parse(expr, input) match {
case Success(ex, _) => ex
case _ => throw new IllegalArgumentException //or change the result to Try[Expression]
}
}
Now we can start to parse something.
Arith.parseInput("(1.3 + 2.0) * 2")
//yields: Mult(Plus(FNumber(1.3),FNumber(2.0)),FNumber(2.0))
Of course, you can also have a Parser[String] or a Parser[Float], where you directly transform or evaluate the input String. As I said, it is up to you.
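As an illustration of the "evaluate directly" option, the grammar can return numbers instead of a tree. The following is a minimal sketch, not code from the book or from the answer above: the names ArithEval and run are assumptions, Double is used in place of Float, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
// (assumption: run with scala-cli; otherwise add the module to your build)
import scala.util.parsing.combinator._

// Hypothetical name; same grammar shape as Arith, but each rule yields a Double.
object ArithEval extends JavaTokenParsers {
  def expr: Parser[Double] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
    case first ~ rest => rest.foldLeft(first) {
      case (acc, "+" ~ t) => acc + t
      case (acc, _ ~ t)   => acc - t // the only other operator here is "-"
    }
  }

  def term: Parser[Double] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
    case first ~ rest => rest.foldLeft(first) {
      case (acc, "*" ~ f) => acc * f
      case (acc, _ ~ f)   => acc / f // the only other operator here is "/"
    }
  }

  def factor: Parser[Double] =
    floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"

  // Some(value) when the whole input parses, None otherwise.
  def run(input: String): Option[Double] = parseAll(expr, input) match {
    case Success(value, _) => Some(value)
    case _                 => None
  }
}
```

The left folds keep subtraction and division left-associative, so ArithEval.run("1 - 2 - 3") evaluates as (1 - 2) - 3.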

General type parameter for nested lists

I'm trying to write a parser which should parse a Prolog list (for example [1,2,3,4]) into the corresponding Scala List. I wrote the parser with Scala's parser combinators.
My parser looks like this so far:
class PListParser extends JavaTokenParsers{
def list:Parser[List[Any]] = "[" ~> listArgs <~ "]"
def listArgs:Parser[List[Any]] = list | repsep(args, ",")
def args:Parser[String] = "(.)*".r
}
Is there any way to turn the type parameters of the first two parsers into something more specific, such as a general parameter for nested lists of arbitrary depth but the same underlying type?
I think it should be a tree, which is the proper structure for lists nested to arbitrary depth:
sealed trait Tree[A]
case class Leaf[A](value: A) extends Tree[A] {
override def toString: String = value.toString
}
case class Node[A](items: List[Tree[A]]) extends Tree[A] {
override def toString: String = "Node(" + items.mkString(", ") + ")"
}
(Write toString as you like, but I think the default ones are rather too verbose.)
Then, with minor fixes to your grammar (plus a parse method, just to test easily in the REPL):
object PrologListParser extends JavaTokenParsers{
def list:Parser[Tree[String]] = "[" ~> listArgs <~ "]"
def listArgs:Parser[Tree[String]] = repsep(list | args, ",") ^^ {Node(_)}
def args:Parser[Tree[String]] = """([^,\[\]])*""".r ^^ {Leaf(_)}
def parse(s: String): ParseResult[Tree[String]] = parse(list, s)
}
PrologListParser.parse("[a, b, [c, [d, e], f, [g], h], [i, j], k]")
res0: PrologListParser.ParseResult[Tree[String]] = [1.42] parsed: Node(a, b, Node(c, Node(d, e), f, Node(g), h), Node(i, j), k)
(Not tested)
sealed trait PrologTerm
case class PInt(i: Integer) extends PrologTerm {
override def toString = i.toString
}
case class PAtom(s: String) extends PrologTerm {
override def toString = s.toString
}
case class PComplex(f: PAtom, args: List[PrologTerm]) extends PrologTerm {
override def toString = f.toString+"("+args.mkString(", ")+")"
}
case class PList(items: List[PrologTerm]) extends PrologTerm {
override def toString = "["+items.mkString(", ")+"]"
}
object PrologListParser extends JavaTokenParsers {
def term : Parser[PrologTerm] = int | complex | atom | list
def int : Parser[PInt] = wholeNumber ^^ {s => PInt(s.toInt)}
def complex : Parser[PComplex] =
(atom ~ ("(" ~> repsep(term, ",") <~ ")")) ^^ {case f ~ args => PComplex(f, args)}
def atom : Parser[PAtom] = "[a-z][a-zA-Z_]*".r ^^ {PAtom(_)}
def list : Parser[PList] = ("[" ~> repsep(term, ",") <~ "]") ^^ {PList(_)}
}