Recursive definitions with scala-parser-combinators

I have been trying to build a SQL parser with the scala-parser-combinator library, which I've simplified greatly into the code below.
import scala.util.Try
import scala.util.parsing.combinator.RegexParsers

class Expression
case class FalseExpr() extends Expression
case class TrueExpr() extends Expression
case class AndExpression(expr1: Expression, expr2: Expression) extends Expression

object SimpleSqlParser {
  def parse(sql: String): Try[Expression] = new SimpleSqlParser().parse(sql)
}

class SimpleSqlParser extends RegexParsers {
  def parse(sql: String): Try[_ <: Expression] = parseAll(expression, sql) match {
    case Success(matched, _) => scala.util.Success(matched)
    case Failure(msg, remaining) => scala.util.Failure(new Exception(
      "Parser failed: " + msg + ", remaining: " + remaining.source.toString.drop(remaining.offset)))
    case Error(msg, _) => scala.util.Failure(new Exception(msg))
  }

  private def expression: Parser[_ <: Expression] =
    andExpr | falseExpr | trueExpr

  private def falseExpr: Parser[FalseExpr] =
    "false" ^^ (_ => FalseExpr())

  private def trueExpr: Parser[TrueExpr] =
    "true" ^^ (_ => TrueExpr())

  private def andExpr: Parser[Expression] =
    expression ~ "and" ~ expression ^^ { case e1 ~ and ~ e2 => AndExpression(e1, e2) }
}
Without the 'and' parsing, it works fine. But I want to be able to parse things like 'true AND (false OR true)', for example. When I add the 'and' part to the definition of an expression, I get a StackOverflowError; the stack alternates between the definitions of 'andExpr' and 'expression'.
I understand why this is happening - the definition of expression begins with andExpr, and vice versa. But this seems like the most natural way to model this problem. In reality an expression could also be LIKE, EQUALS etc. Is there another way to model this kind of thing in general, in order to get around the problem of recursive definitions?

scala.util.parsing.combinator.RegexParsers cannot handle left-recursive grammars. Your grammar can be summarized by the following production rules:
expression -> andExpr | falseExpr | trueExpr
...
andExpr -> expression "and" expression
expression is indirectly left-recursive via andExpr.
To avoid the infinite recursion, you need to reformulate the grammar so that it is not left-recursive anymore. One frequently-used way is to use repetition combinators, such as chainl1:
private def expression: Parser[_ <: Expression] =
  chainl1(falseExpr | trueExpr, "and" ^^^ { AndExpression(_, _) })
The new expression matches one or more falseExpr/trueExpr, separated by "and", and combines the matched elements with AndExpression in a left-associative way. Conceptually, it corresponds to the following production rule:
expression -> (falseExpr | trueExpr) ("and" (falseExpr | trueExpr))*
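Since the question also asks about inputs like 'true AND (false OR true)', here is a rough sketch (my addition, not part of the original answer; OrExpression is a hypothetical case class defined by analogy with AndExpression) of how the same reformulation extends to OR and parenthesized sub-expressions, with AND binding tighter than OR:
case class OrExpression(expr1: Expression, expr2: Expression) extends Expression

private def expression: Parser[Expression] =
  chainl1(andTerm, "or" ^^^ { OrExpression(_, _) })

private def andTerm: Parser[Expression] =
  chainl1(factor, "and" ^^^ { AndExpression(_, _) })

// Recursing through factor is safe: "(" is consumed before expression is
// re-entered, so the grammar is no longer left-recursive.
private def factor: Parser[Expression] =
  falseExpr | trueExpr | "(" ~> expression <~ ")"
These definitions would replace expression and andExpr inside SimpleSqlParser.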
If your grammar contains many tangled left-recursive production rules, you might want to consider other parser combinator libraries, such as GLL combinators, that directly support left recursion.

Related

What is the ^^^ operator in Scala?

I have seen the use of ^^^ in some Scala code and don't understand it, nor can I find any documentation about it. I'm pretty sure it doesn't come from an external library, but maybe it does? I know that ^^ is an operator for .map(), but I don't know if there is a similar meaning when you add a third ^.
Example of use :
case object TypeFoo extends Type {
  override def toString() = "Foo"
}
case object TypeBar extends Type {
  override def toString() = "Bar"
}

def repType = (
    "Foo" ^^^ TypeFoo
  | "Bar" ^^^ TypeBar
)
From what I can tell, it could mean "is defined by", but I'm really not sure, hence my question.
The Scaladoc for the ^^^ method says:
A parser combinator that changes a successful result into the specified value.
p ^^^ v succeeds if p succeeds; discards its result, and returns v instead.
@param v The new result for the parser, evaluated at most once (if p succeeds), not evaluated at all if p fails.
@return a parser that has the same behaviour as the current parser, but whose successful result is v
In other words, "Foo" ^^^ TypeFoo is just a shorthand for "Foo" ^^ (_ => TypeFoo).
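To make the equivalence concrete, here is a minimal self-contained sketch (my own illustration, not from the original answer; BoolParser and its members are made-up names):
import scala.util.parsing.combinator.RegexParsers

object BoolParser extends RegexParsers {
  sealed trait Bool
  case object True extends Bool

  // Both parsers match the literal "true", discard the matched string,
  // and return the constant True.
  val withCaret3: Parser[Bool] = "true" ^^^ True
  val withCaret2: Parser[Bool] = "true" ^^ (_ => True)
}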

Scala Parser Combinator ignoring errors in optional elements

We are building a parser for a DSL that looks like SQL. Like SQL it has multiple blocks, namely select, where, order by, group by etc. We made some of these blocks, such as where and order by, optional. When we do this, the parser silently ignores errors in the input for those blocks.
DSL Code Snippet
case class Query(select: Select, where: Where)
case class Select(cols: Seq[String])
case class Where(conditions: Seq[String])
object QueryLanguage extends StandardTokenParsers {
  lazy val sql: Parser[Query] =
    select_block ~ (where_block?) ^^ { case slct ~ whr => Query(slct, whr.getOrElse(null)) }
  lazy val select_block: Parser[Select] =
    "SELECT" ~> rep1sep(ident, ",") ^^ { case cols => Select(cols) }
  lazy val where_block: Parser[Where] =
    "WHERE" ~> rep1sep(ident, "and") ^^ { case conds => Where(conds) }
}
Any syntax errors in the select block are reported, but errors in the other blocks are not. This is frustrating for someone who wants a where block in their query but has no way of seeing errors in its definition.
For example, a query like the one below simply ignores the WHERE clause without reporting an error (note that the and keyword is misspelled):
select A, B
WHERE a=1 andd b=2
The SQL parses correctly if I fix the spelling in the input or make the where clause mandatory in the DSL, as shown below:
lazy val sql: Parser[Query] = select ~ where ~ (orderby?)
Is there another way to handle this or override this default behavior?

error: left- and right-associative operators with same precedence may not be mixed

I'm trying to make a URI DSL in Scala, but infix methods are giving me trouble even after committing the lengthy and rather unintuitive precedence rules to memory.
class Foo {
  def `://`(a: Unit) = this
  def `:`(b: Unit) = this
}

object Foo {
  def main(args: Array[String]): Unit = {
    new Foo `://` {} `:` {}
  }
}
yields
left- and right-associative operators with same precedence may not be mixed
new Foo `://` {} `:` {}
^
What does this mean? I thought all operators were left-associative.
Is there any way for me to write a DSL that looks like this?
"https" `://` "example.com" `:` 80
There are two problems with the operator names you have chosen:
The name :// contains a double slash, so without backquotes the compiler can misinterpret it as the start of a comment.
The name :, like every operator ending with :, creates a right-associative operator; this is handy for operators like :: or #:: that build sequences starting from the head. Mixing operators of the same precedence but different associativity is not allowed without parentheses, since it is not clear where the compiler should start building your expression.
So my suggestion is to get rid of the double slash and the trailing colon, and create a perhaps slightly less intuitive, but correct, DSL syntax:
object URILanguage extends App {
  case class URL(protocol: String, hostname: String, port: Option[Int] = None, path: Seq[String] = Nil) {
    def %(port: Int) = copy(port = Some(port))
    def /(component: String) = copy(path = path :+ component)
  }

  implicit class WithHostname(protocol: String) {
    def ~(hostname: String) = URL(protocol, hostname)
  }

  println("http" ~ "example.com" % 8080 / "mysite" / "index.html")
}
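(As an aside, not part of the original answer: because ~ binds tighter than % and /, running this should print something like URL(http,example.com,Some(8080),List(mysite, index.html)), the default case-class toString.)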
From The Scala Language Specification, section 6.12.3:
The associativity of an operator is determined by the operator's last character. Operators ending in a colon ':' are right-associative. All other operators are left-associative.
The compiler doesn't know whether to interpret your code as this:
80.`:`("https".`://`("example.com"))
or this:
"https".`://`(80.`:`("example.com"))
I don't think there's a way to prevent ':' from being treated as a right-associative operator. You could help the compiler out by using parentheses; otherwise, you have to change your operator names.
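As a side illustration (my own addition, not from the original answers), the last-character rule is the same one that makes list construction read naturally:
// '::' ends in ':' so it is right-associative: desugars to Nil.::(2).::(1)
val xs = 1 :: 2 :: Nil        // List(1, 2)
// '+' does not end in ':' so it is left-associative: ("a" + "b") + "c"
val s = "a" + "b" + "c"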

Scala: parsing multiple files using Scala's combinators

I am writing a DSL using Scala parser combinators and have a working version that can read a single file and parse it. However, I would like to split my input into several files where some files are 'standard' and can be used with any top-level file. What I would like is something like:
import "a.dsl"
import "b.dsl"
// rest of file using {a, b}
It isn't important what order the files are read in, or that something is necessarily 'defined' before being referred to, so parsing the top-level file first and then parsing the closure of all imports into a single model is sufficient. I will then post-process the resulting model for my own purposes.
The question I have is: is there a reasonable way of accomplishing this? If necessary I could iterate over the closure, parsing each file into a separate model, and manually 'merge' the resulting models, but this feels clunky and ugly to me.
BTW, I am using an extension of StandardTokenParsers, if that matters.
I think the only approach would be to open and parse the file indicated by the import directly. From there you can create a sub-expression tree for the module. You may not need to manually merge the trees when parsing: if you're already using ^^ and/or ^^^ to return your own Expressions, then you should be able to simply emit the relevant expression type in the correct place within the tree, for example:
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.io.Source

object Example {
  sealed trait Expr
  case class Imports(modules: List[Module]) extends Expr
  case class Module(modulePath: String, root: Option[Expr]) extends Expr
  case class BracedExpr(x: String, y: String) extends Expr
  case class Main(imports: Imports, braced: BracedExpr) extends Expr

  class BlahTest extends StandardTokenParsers {
    def importExpr: Parser[Module] = "import" ~> "\"" ~> stringLit <~ "\"" ^^ {
      case modulePath =>
        // you could use something other than `expr` below if you
        // wanted to limit the expressions available in modules,
        // e.g. you could stop one module importing another
        phrase(expr)(new lexical.Scanner(Source.fromFile(modulePath).mkString)) match {
          case Success(result, _) =>
            Module(modulePath, Some(result))
          case failure: NoSuccess =>
            // TODO log or act on failure
            Module(modulePath, None)
        }
    }

    def prologExprs = rep(importExpr) ^^ {
      case modules => Imports(modules)
    }

    def bracedExpr = "{" ~> stringLit ~ "," ~ stringLit <~ "}" ^^ {
      case x ~ "," ~ y => BracedExpr(x, y)
    }

    def bodyExprs = bracedExpr

    def expr = prologExprs ~ bodyExprs ^^ {
      case prolog ~ body => Main(prolog, body)
    }
  }
}
You could simply add an eval to your Expression trait, implement each eval as necessary on the sub-classes and then have a visitor recursively descend your AST. In this manner you would not need to manually merge expression trees together.
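A rough sketch of that last suggestion (my own addition, not from the original answer, reusing the Expr classes from the example above): each node evaluates its children, so the imported module trees are traversed in place and never need to be merged by hand.
sealed trait Expr { def eval(): Unit }
case class BracedExpr(x: String, y: String) extends Expr {
  def eval(): Unit = println(s"braced: $x, $y")
}
case class Module(modulePath: String, root: Option[Expr]) extends Expr {
  def eval(): Unit = root.foreach(_.eval())     // descend into the imported module
}
case class Imports(modules: List[Module]) extends Expr {
  def eval(): Unit = modules.foreach(_.eval())  // visit every imported module
}
case class Main(imports: Imports, braced: BracedExpr) extends Expr {
  def eval(): Unit = { imports.eval(); braced.eval() }
}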

Scala parser-combinators: how to invert matches?

Is it possible to invert matches with Scala parser combinators? I am trying to write a parser that matches lines that do not start with a given set of keywords. I could do this with an annoying zero-width negative-lookahead regular expression (e.g. "(?!h1|h2).*"), but I'd rather do it with a Scala parser. The best I've been able to come up with is this:
def keyword = "h1." | "h2."
def alwaysfails = "(?=a)b".r
def linenotstartingwithkeyword = keyword ~! alwaysfails | ".*".r
The idea here is that I use ~! to forbid backtracking to the all-matching regex, and then continue with a regex "(?=a)b".r that matches nothing. (By the way, is there a predefined parser that always fails?) That way the line is not matched if a keyword is found, but is matched if no keyword matches.
I am wondering if there is a better way to do this. Is there?
You can use not here:
import scala.util.parsing.combinator._
object MyParser extends RegexParsers {
  val keyword = "h1." | "h2."
  val lineNotStartingWithKeyword = not(keyword) ~> ".*".r
  def apply(s: String) = parseAll(lineNotStartingWithKeyword, s)
}
Now:
scala> MyParser("h1. test")
res0: MyParser.ParseResult[String] =
[1.1] failure: Expected failure
h1. test
^
scala> MyParser("h1 test")
res1: MyParser.ParseResult[String] = [1.8] parsed: h1 test
Note that there is also a failure method on Parsers, so you could just as well have written your version as keyword ~! failure("keyword!"). But not is a lot nicer, anyway.
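For completeness, here is a sketch of that failure-based variant (my own addition under the same assumptions as the answer above; MyFailureParser is a made-up name):
import scala.util.parsing.combinator._

object MyFailureParser extends RegexParsers {
  val keyword = "h1." | "h2."
  // If a keyword is found, ~! commits and failure(...) raises an Error, so the
  // ".*" alternative is never tried; otherwise the whole line is matched.
  // The trailing ^^^ "" only keeps the alternation's result type at Parser[String].
  val lineNotStartingWithKeyword = keyword ~! failure("keyword!") ^^^ "" | ".*".r
  def apply(s: String) = parseAll(lineNotStartingWithKeyword, s)
}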