Building a boolean logic parser in Scala with PackratParser

I am trying to build a Boolean logic parser e.g. A == B AND C == D to output something like And(Equals(A,B), Equals(C,D))
My parser has the following definitions:
def program: Parser[Operator] = {
phrase(operator)
}
def operator: PackratParser[Operator] = {
leaf | node
}
def node: PackratParser[Operator] = {
and | or
}
def leaf: PackratParser[Operator] = {
equal | greater | less
}
def and: PackratParser[Operator] = {
(operator ~ ANDT() ~ operator) ^^ {
case left ~ _ ~ right => And(left, right)}
}
I would expect the parser to map to program -> operator -> node -> and -> operator (left) -> leaf -> equal -> operator (right) -> leaf -> equal. This doesn't work.
However if in the above code I do the changes
def operatorWithParens: PackratParser[Operator] = {
lparen ~> (operator | operatorWithParens) <~ rparen
}
and change and to be
def and: PackratParser[Operator] = {
(operatorWithParens ~ ANDT() ~ operatorWithParens) ^^ {
case left ~ _ ~ right => And(left, right)}
}
Parsing (A == B) AND (C == D) succeeds.
I cannot wrap my head around why the former doesn't work while the latter does.
How should I change my code to be able to parse A == B AND C == D?
EDIT:
Following @Andrey Tyukin's advice, I've modified the grammar to account for precedence:
def program: Parser[Operator] = positioned {
phrase(expr)
}
def expr: PackratParser[Operator] = positioned {
(expr ~ ORT() ~ expr1) ^^ {
case left ~ _ ~ right => Or(left, right)} | expr1
}
def expr1: PackratParser[Operator] = positioned {
(expr1 ~ ANDT() ~ expr2) ^^ {
case left ~ _ ~ right => And(left, right)} | expr2
}
def expr2: PackratParser[Operator] = positioned {
(NOTT() ~ expr2) ^^ {case _ ~ opr => Not(opr)} | expr3
}
def expr3: PackratParser[Operator] = {
lparen ~> (expr) <~ rparen | leaf
}
Although PackratParser supports left-recursive grammars, I run into an infinite loop that never leaves expr.

It looks like there is a path from operator to a shorter operator:
operator -> node -> and -> (operator ~ somethingElse)
You seem to be assuming that the shorter operator (left) will somehow reduce to leaf, whereas the outermost operator would skip the leaf and pick the node, for whatever reason. What it does instead is choke on the first leaf it encounters: the leaf alternative succeeds on A == B, the choice never tries node, and phrase then fails because input is left over.
You could try to move the node before the leaf, so that the whole operator doesn't choke on the first A when seeing something like A == B AND ....
Otherwise, I'd suggest refactoring it into
  disjunctions
    of conjunctions
      of atomic formulas,
where atomic formulas are either
  comparisons, or
  indivisible parenthesized top-level elements (i.e. parenthesized disjunctions, in this case).
Expect to use repsep (or rep1sep) quite a few times.
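A minimal sketch of that layering, assuming the scala-parser-combinators library and a string-based RegexParsers rather than the token parsers from the question. The AST classes and the names BoolParser and parseFormula are made up for illustration. Because each level is a separated repetition, nothing is left-recursive and no PackratParser is needed:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST; the names mirror the question's And/Equals but are assumptions.
sealed trait Operator
case class Equals(left: String, right: String) extends Operator
case class And(left: Operator, right: Operator) extends Operator
case class Or(left: Operator, right: Operator) extends Operator

object BoolParser extends RegexParsers {
  // Identifiers, excluding the keywords so that AND/OR are never read as names.
  private val ident: Parser[String] = """(?!(?:AND|OR)\b)[A-Za-z_]\w*""".r

  // Atomic formula, first kind: a comparison such as A == B.
  private def comparison: Parser[Operator] =
    ident ~ ("==" ~> ident) ^^ { case l ~ r => Equals(l, r) }

  // Atomic formula, second kind: a parenthesized disjunction.
  private def atom: Parser[Operator] =
    comparison | "(" ~> disjunction <~ ")"

  // One or more atoms separated by AND, folded into a left-nested And tree.
  private def conjunction: Parser[Operator] =
    rep1sep(atom, "AND") ^^ (atoms => atoms.reduceLeft[Operator](And(_, _)))

  // One or more conjunctions separated by OR, so AND binds tighter than OR.
  def disjunction: Parser[Operator] =
    rep1sep(conjunction, "OR") ^^ (conjs => conjs.reduceLeft[Operator](Or(_, _)))

  def parseFormula(input: String): ParseResult[Operator] =
    parseAll(disjunction, input)
}
```

With this sketch, BoolParser.parseFormula("A == B AND C == D") yields And(Equals(A,B),Equals(C,D)) without requiring parentheses.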

Related

Creating AST for arithmetic expression in Scala

I would like to make an AST for arithmetic expression using fastparse from Scala.
For me an arithmetic expression is like:
var_name := value; // value can be an integer, or a whole expression
For the moment I have these parsers:
def word[_:P] = P((CharIn("a-z") | CharIn("A-Z") | "_").rep(1).!)
def digits[_ : P] = P(CharIn("0-9").rep.!)
def div_mul[_: P] = P( digits~ space.? ~ (("*" | "/").! ~ space.? ~/ digits).rep ).map(eval)
def add_sub[_: P] = P( div_mul ~ space.? ~ (("+" | "-").! ~ space.? ~/ div_mul).rep ).map(eval)
def expr[_: P]= P( " ".rep ~ add_sub ~ " ".rep ~ End )
def var_assig[_:P] = P(word ~ " " ~ ":=" ~ " " ~ (value | expr) ~ ";")
I want to create AST for arithmetic expression (2+3*2 for example).
Expected result: Assignment[2,plus[mult,[3,2]]] // symbol[left, right]
My questions are:
What should the Tree class/object look like, if one is necessary, given that I want to evaluate the result? I will use this class for the rest of the parser (if, while).
What should the eval function look like, so that it takes a String or Seq[String] as input and returns an AST with my expected result?
Here is my way of doing it.
I have defined the components of the Arithmetic Expression using the following Trait:
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression
And defined the following fastparse patterns (similar to what is described here: https://com-lihaoyi.github.io/fastparse/#Math)
def number[_: P]: P[Expression] = P(CharIn("0-9").rep(1)).!.map(Num)
def parens[_: P]: P[Expression] = P("(" ~/ addSub ~ ")")
def factor[_: P]: P[Expression] = P(number | parens)
def divMul[_: P]: P[Expression] = P(factor ~ (CharIn("*/").! ~/ factor).rep).map(astBuilder _ tupled)
def addSub[_: P]: P[Expression] = P(divMul ~ (CharIn("+\\-").! ~/ divMul).rep).map(astBuilder _ tupled)
def expr[_: P]: P[Expression] = P(addSub ~ End)
Instead of the eval function that was used in the map, I have written a similar one, which returns a folded entity of the previously defined case classes:
def astBuilder(initial: Expression, rest: Seq[(String, Expression)]): Expression = {
rest.foldLeft(initial) {
case (left, (operator, right)) =>
operator match {
case "*" => Mul(left, right)
case "/" => Div(left, right)
case "+" => Add(left, right)
case "-" => Sub(left, right)
}
}
}
And if we would run the following expression:
val Parsed.Success(res, _) = parse("2+3*2", expr(_))
The result would be: Add(Num(2),Mul(Num(3),Num(2)))
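Since the question also asks about evaluating the tree, here is a minimal sketch of an evaluator over those case classes. The function name evaluate and the choice of Int arithmetic (so Div truncates and division by zero throws) are assumptions, not part of the answer above:

```scala
// The same ADT as above, repeated so the snippet compiles on its own.
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression

// Hypothetical evaluator: walks the tree bottom-up using integer arithmetic.
def evaluate(expression: Expression): Int = expression match {
  case Num(value) => value.toInt
  case Add(l, r)  => evaluate(l) + evaluate(r)
  case Sub(l, r)  => evaluate(l) - evaluate(r)
  case Mul(l, r)  => evaluate(l) * evaluate(r)
  case Div(l, r)  => evaluate(l) / evaluate(r) // integer division
}

// The tree produced for "2+3*2" in the answer above.
val result = evaluate(Add(Num("2"), Mul(Num("3"), Num("2")))) // 8
```

Because Expression is sealed, the compiler can check that the match covers every node type, which helps when more nodes (if, while) are added later.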

Scala IF-ELSEIF-ELSE Simple parser combinator for control flow

I am writing a parser combinator to parse simple control flow statements
and execute some code. The structure of language is roughly this -
val resultId = 200
val s = s"""(IF $resultId == 100 GOTO NODE-1-->NODE-2) (ELSE IF $resultId > 100 GOTO NODE-1-->NODE-3) (ELSE GOTO NODE-1-->NODE-4)""".stripMargin
private val result= new ConditionalParserCombinator().run(s)
In the above scenario, for example, I should get GOTO NODE-1-->NODE-3, but instead I get false after the else expression is evaluated. The code of the combinator is outlined below:
final class ConditionalParserCombinator extends JavaTokenParsers with ParserCombinatorLike {
def IF = "IF"
def ELSE = "ELSE"
def ELSEIF = ELSE ~ IF
def NULL = "NULL"
def GOTO = "GOTO"
def node_id = wholeNumber | floatingPointNumber | stringLiteral
def NODE = "NODE" ~ "-" ~ node_id ^^ (e ⇒ NodeExpression(e._2))
def EDGE = NODE ~ "-->" ~ NODE ^^ (e ⇒ EdgeExpression(e._1._1, e._2))
def lhs = ident | wholeNumber | floatingPointNumber | stringLiteral
def rhs = ident | wholeNumber | floatingPointNumber | stringLiteral | NULL
def operator = "==" | "*" | "/" | "||" | "&&" | ">" | "<" | ">=" | "<="
def block = GOTO ~ EDGE
def expression_block = lhs ~ operator ~ rhs ~ block ^^ {
case lhs ~ operator ~ rhs ~ block ⇒ ExpressionBlock(lhs, rhs, operator, block._2)
}
def ifExpression = IF ~ expression_block ^^ (e ⇒ e._2.operator match {
case "==" ⇒ if (e._2.lhs == e._2.rhs) Block(e._2.block) else false
case ">" ⇒ if (e._2.lhs > e._2.rhs) Block(e._2.block) else false
case "<" ⇒ if (e._2.lhs < e._2.rhs) Block(e._2.block) else false
case _ ⇒ false
})
def elseIFExpression = ELSEIF ~ expression_block ^^ (e ⇒ e._2.operator match {
case "==" ⇒ if (e._2.lhs == e._2.rhs) Block(e._2.block) else false
case ">" ⇒ if (e._2.lhs > e._2.rhs) {
println("matched elseif")
Block(e._2.block)
} else false
case "<" ⇒ if (e._2.lhs < e._2.rhs) Block(e._2.block) else false
case _ ⇒ false
})
def elseExpression = ELSE ~ block ^^ (e ⇒ Block(e._2._2))
override def grammar = "(" ~> log(ifExpression)("ifexpression") <~ ")" ~!
"(" ~> log(elseIFExpression)("elseifexpression") <~ ")" ~!
"(" ~> log(elseExpression)("elseexpression") <~ ")"
}
I am printing result.get and I see false as the result.
Additional details: Block and ExpressionBlock are case classes that are useful for a few things I may do later on.
I think it's cleaner to parse an expression into a type that you can understand (meaning I have custom Product/case classes defined for it) and then evaluate it; these are two different things. In hindsight I'm not sure why I got the two mixed up. Here's the logic that works:
def IF = "IF"
def ELSE = "ELSE"
def ELSEIF = ELSE ~ IF
def NULL = "NULL"
def GOTO = "GOTO"
def dataType: Parser[DataType] = "[" ~ "Integer" ~ "]" ^^ { e ⇒ DataType("", "Integer") }
def node_id = wholeNumber | floatingPointNumber | stringLiteral
def NODE = "NODE" ~ "-" ~ node_id ^^ (e ⇒ ParseableNode(e._2, DataType({}, "Unit")))
def EDGE = NODE ~ "-->" ~ NODE ^^ (e ⇒ EdgeExpression(e._1._1, e._2))
def lhs = ident | wholeNumber | floatingPointNumber | stringLiteral
def rhs = ident | wholeNumber | floatingPointNumber | stringLiteral | NULL
def operator = "==" | "*" | "/" | "||" | "&&" | ">" | "<" | ">=" | "<="
def block = GOTO ~ EDGE
def expression_block(expType: ConditionalKind) = dataType ~ lhs ~ operator ~ rhs ~ block ^^ {
case dataType ~ lhs ~ operator ~ rhs ~ block ⇒ ExpressionBlock(ParseableNode(lhs, dataType), ParseableNode(rhs, dataType), operator, block._2, expType)
}
def ifExpression = IF ~ expression_block(ConditionalKind("IF")) ^^ {
case "IF" ~ expression_block ⇒ ExpressionBlock(expression_block.lhs, expression_block.rhs, expression_block.operator, expression_block.block, expression_block.conditionalKind)
}
def elseIFExpression = ELSEIF ~ expression_block(ConditionalKind("ELSEIF")) ^^ {
case "ELSE" ~ "IF" ~ expression_block ⇒ ExpressionBlock(expression_block.lhs, expression_block.rhs, expression_block.operator, expression_block.block, expression_block.conditionalKind)
}
def elseExpression = ELSE ~ block ^^ { case "ELSE" ~ block ⇒ Block(block._2) }
override def grammar = log(ifExpression)("ifexpression") ~ log(elseIFExpression)("elseifexpression") ~ log(elseExpression)("elseexpression") ^^ {
case ifExpression ~ elseIFExpression ~ elseExpression ⇒
ConditionalExpressions(List(ifExpression, elseIFExpression), elseExpression)
}
The above logic works after being evaluated like this -
object BasicSelectorExpressionEvaluator extends EvaluatorLike {
override def eval(parseable: Parseable) = parseable match {
case ConditionalExpressions(ifElseIfs, otherwise) ⇒
val mappedIfElseIfs: immutable.Seq[Block] = ifElseIfs.map { e ⇒
println(s"e ==>$e")
e.operator match {
case "==" ⇒ if (e.lhs == e.rhs) {
println("matched ==")
Block(e.block)
} else Block.Unit
case "<" ⇒ if (e.lhs.value.toInt < e.rhs.value.toInt) {
println("matched <")
Block(e.block)
} else Block.Unit
case ">" ⇒ if (e.lhs.value.toInt > e.rhs.value.toInt) {
println("matched >")
Block(e.block)
} else Block.Unit
case "<=" ⇒ if (e.lhs.value.toInt <= e.rhs.value.toInt) {
println("matched <=")
Block(e.block)
} else Block.Unit
case ">=" ⇒ if (e.lhs.value.toInt >= e.rhs.value.toInt) {
println("matched >=")
Block(e.block)
} else Block.Unit
}
}
val filteredMappedIFElseIfs = mappedIfElseIfs.filterNot(e ⇒ e.equals(Block.Unit))
println(s"filteredMappedIFElseIfs == $filteredMappedIFElseIfs")
if (filteredMappedIFElseIfs.nonEmpty) PResult(filteredMappedIFElseIfs.head.block) else PResult(otherwise.block)
}
}
So the above can parse this grammar -
val s = s""" IF [Integer] $resultId == 100 GOTO NODE-1-->NODE-2 ELSE IF [Integer] $resultId > 100 GOTO NODE-1-->NODE-3 ELSE GOTO NODE-1-->NODE-4""".stripMargin
It could be done better, e.g. grammar seems to violate DRY by embedding data types on every If, but I suppose people can derive things out of it.
Edit - Also note - this toInt thing is a bit ugly, needs to be better designed, I will maybe post an update once I do so. I need to rework all grammar now that it all works - suggestions/improvements welcome, still learning.
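One possible way to tidy up the evaluation step, sketched with hypothetical, simplified types (Branch, compare and firstTarget are not part of the code above). It assumes the operands have already been converted to Int during parsing, so no toInt is needed per case, and it treats non-comparison operators such as "*" or "&&" as "no match" rather than letting the match fail:

```scala
// Hypothetical stand-in for ExpressionBlock: operands already typed, target kept as plain text.
final case class Branch(lhs: Int, operator: String, rhs: Int, target: String)

// Evaluates one comparison; None means the operator is not a comparison.
def compare(operator: String, lhs: Int, rhs: Int): Option[Boolean] = operator match {
  case "==" => Some(lhs == rhs)
  case "<"  => Some(lhs < rhs)
  case ">"  => Some(lhs > rhs)
  case "<=" => Some(lhs <= rhs)
  case ">=" => Some(lhs >= rhs)
  case _    => None
}

// First branch whose condition holds wins; otherwise fall back to the ELSE target.
def firstTarget(branches: List[Branch], otherwise: String): String =
  branches.collectFirst {
    case b if compare(b.operator, b.lhs, b.rhs).contains(true) => b.target
  }.getOrElse(otherwise)

// The example from the question with resultId = 200.
val resultId = 200
val chosen = firstTarget(
  List(
    Branch(resultId, "==", 100, "NODE-1-->NODE-2"),
    Branch(resultId, ">", 100, "NODE-1-->NODE-3")
  ),
  otherwise = "NODE-1-->NODE-4"
)
```

collectFirst stops at the first matching branch, which matches the IF / ELSE IF semantics without building and filtering an intermediate list.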

Scala warning match may not be exhaustive while parsing

class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
I get the following warning when compiling
warning: match may not be exhaustive.
It would fail on the following input: ~((x: String forSome x not in ("+", "-")), _)
("+" | "-") ~ term ^^ {
^
one warning found
I heard that the @unchecked annotation can help. But in this case, where should I put it?
The issue here is that with ("+" | "-") you are creating a parser that accepts only two possible strings. However when you map on the resulting parser to extract the value, the result you're going to extract will just be String.
In your pattern matching you only have cases for the strings "+" and "-", but the compiler has no way of knowing that those are the only possible strings that will show up, so it's telling you here that your match may not be exhaustive since it can't know any better.
You could use an unchecked annotation to suppress the warning, but there are much better, more idiomatic ways, to eliminate the issue. One way to solve this is to replace those strings with some kind of structured type as soon as possible. For example, create an ADT
sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation
//then in your parser
("+" ^^^ Plus | "-" ^^^ Minus) ~ term ^^ {
case Plus ~ t => t
case Minus ~ t => -t
}
Now it should be able to realize that the only possible cases are Plus and Minus
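For reference, a complete version of the parser with that ADT might look like the sketch below. It assumes the scala-parser-combinators library; the sign parser and its explicit Parser[Operation] type annotation are my additions, so that the match is typed against the sealed trait rather than against an inferred intersection type:

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation

class ExprParser extends RegexParsers {
  val number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // Convert the operator strings into the ADT as early as possible.
  val sign: Parser[Operation] = "+" ^^^ Plus | "-" ^^^ Minus

  def expr: Parser[Int] = term ~ rep(sign ~ term ^^ {
    case Plus ~ t  => t
    case Minus ~ t => -t
  }) ^^ { case t ~ r => t + r.sum }

  def term: Parser[Int] = factor ~ rep("*" ~> factor) ^^ {
    case f ~ r => f * r.product
  }

  def factor: Parser[Int] = number | "(" ~> expr <~ ")"
}

val parser = new ExprParser
val value = parser.parseAll(parser.expr, "2+3*4-1").get // 13
```

Using "*" ~> factor also drops the operator token up front, so the later r.map(_._2) is no longer needed.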
Add a case to remove the warning
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
case _ ~ t => t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}

Type mismatch java.io.Serializable and GenTraversableOnce

I'm trying to get a parser to take a sequence of semicolon-separated words and convert them into an array.
Here's an SSCCE.
import util.parsing.combinator._
class Element {
def getSuper() = "TODO"
}
class Comp extends RegexParsers with PackratParsers {
lazy val element: PackratParser[Element] = (
"foo") ^^ {
s => new Element
}
lazy val list: PackratParser[Array[Element]] = (
(element ~ ";" ~ list) |
element ~ ";") ^^
{
case a ~ ";" ~ b => Array(a) ++ b
case a ~ ";" => Array(a)
}
}
object Compiler extends Comp {
def main(args: Array[String]) = {
println(parseAll(list, "foo; foo; foo;"))
}
}
It's not working and it's not compiling, if it was I wouldn't be asking about it. This is the error message I'm getting. Is there a way to convert from Serializable to GenTraversableOnce?
~/Documents/Git/Workspace/Uncool/Scales$ scalac stov.scala
stov.scala:19: error: type mismatch;
found : java.io.Serializable
required: scala.collection.GenTraversableOnce[?]
case a ~ ";" ~ b => Array(a) ++ b
^
one error found
My suspicion goes on the | combinator.
The type of (element ~ ";" ~ list) is ~[~[Element, String], Array[Element]] and the type of element ~ ";" is ~[Element, String].
Thus when applying the | combinator on these parsers, it returns a Parser[U] where U is a supertype of T ([U >: T]).
Here T is ~[~[Element, String], Array[Element]], the right-hand parser produces ~[Element, String], and U is inferred as the most specific common supertype of the two.
Since ~ is covariant in both components, that supertype is computed component by component. For the second component, the most specific common supertype of Array[Element] and String is Serializable.
For the first component, between ~[Element, String] and Element, it's Object. That's why the result type of | is ~[Object, Serializable].
So when applying the map operation, you need to provide a function ~[Object, Serializable] => U where U is Array[Element] in your case, since the return type of your function is PackratParser[Array[Element]].
Now the only possible match is:
case obj ~ ser => //do what you want
Now you see that the pattern you're trying to match in your map is fundamentally wrong. Even if you return an empty array (just so that it compiles), you'll see that it leads to a match error at runtime.
That said, what I suggest is first to map separately each combinator:
lazy val list: PackratParser[Array[Element]] =
(element ~ ";" ~ list) ^^ {case a ~ ";" ~ b => Array(a) ++ b} |
(element ~ ";") ^^ {case a ~ ";" => Array(a)}
But the pattern you are looking for is already implemented using the rep combinator (you could also take a look at repsep but you'd need to handle the last ; separately):
lazy val list: PackratParser[Array[Element]] = rep(element <~ ";") ^^ (_.toArray)
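A self-contained version of the rep-based approach, as a sketch assuming the scala-parser-combinators library. The object name ListParser is made up, and plain RegexParsers is used because nothing here is left-recursive, so PackratParsers is not required:

```scala
import scala.util.parsing.combinator.RegexParsers

class Element

object ListParser extends RegexParsers {
  val element: Parser[Element] = "foo" ^^ (_ => new Element)

  // Zero or more elements, each followed by ";" (the ";" itself is discarded by <~).
  val list: Parser[Array[Element]] = rep(element <~ ";") ^^ (_.toArray)
}

val parsed = ListParser.parseAll(ListParser.list, "foo; foo; foo;")
```

Note that rep also accepts empty input; rep1 could be used instead if at least one element is required.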

Scala parsing left-associative subscript operator

I've mastered this syntax for building a left-associative tree for infix operators:
term * (
"+" ^^^ { (a:Expr, b:Expr) => new FunctionCall(plus, a::b::Nil) } |
"-" ^^^ { (a:Expr, b:Expr) => new FunctionCall(minus, a::b::Nil) } )
Though I have to confess I don't fully understand how it works. What I want to do now is to achieve a similar effect for syntax that might look like
a[b](c)(d)[e]
which should parse as
sub(call(call(sub(a, b), c), d), e)
Can the high-level "^^^" magic be extended to cover a case where it's not a pure infix operator? Or do I have to implement some kind of fold-left logic myself? If so, any hints as to what it might look like?
I have solved the problem as follows. I'm happy with the solution, but if any Scala experts out there can help me improve it, that's very welcome.
def subscript: Parser[Expr => Expr] = {
"[" ~> expr <~ "]" ^^ {
case sub => {
{ (base: Expr) => new FunctionCall(subscriptFn, base :: sub :: Nil)}
}
}
}
def argumentList: Parser[Expr => Expr] = {
"(" ~> repsep(expr, ",") <~ ")" ^^ {
case args => {
{ (base: Expr) => new FunctionCall(base :: args)}
}
}
}
def postfixExpr: Parser[Expr] = {
primary ~ rep ( subscript | argumentList ) ^^ {
case base ~ suffixes => {
(base /: suffixes)((b:Expr, f:Expr=>Expr) => f(b))
}
}
}
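To see what the fold in postfixExpr does in isolation, here is a small sketch with hypothetical AST classes (Name, Sub and Call stand in for FunctionCall and are not from the code above). Each suffix is a function from the expression built so far to a bigger expression, and folding from the left applies them in source order. It uses foldLeft, which is equivalent to the /: operator and is, as far as I know, the preferred spelling in newer Scala versions:

```scala
// Hypothetical AST used only for this illustration.
sealed trait Expr
case class Name(id: String) extends Expr
case class Sub(base: Expr, index: Expr) extends Expr
case class Call(base: Expr, args: List[Expr]) extends Expr

// What rep(subscript | argumentList) would produce for the suffixes of a[b](c)(d)[e].
val suffixes: List[Expr => Expr] = List[Expr => Expr](
  base => Sub(base, Name("b")),
  base => Call(base, List(Name("c"))),
  base => Call(base, List(Name("d"))),
  base => Sub(base, Name("e"))
)

// Start from the primary expression and wrap it once per suffix, left to right.
val tree: Expr = suffixes.foldLeft(Name("a"): Expr)((acc, suffix) => suffix(acc))
```

The innermost node ends up being the first suffix applied, which is exactly the left-associative shape sub(call(call(sub(a, b), c), d), e) asked for.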