I am writing a parser combinator to parse simple control flow statements
and execute some code. The structure of the language is roughly this:
val resultId = 200
val s = s"""(IF $resultId == 100 GOTO NODE-1-->NODE-2) (ELSE IF $resultId > 100 GOTO NODE-1-->NODE-3) (ELSE GOTO NODE-1-->NODE-4)""".stripMargin
private val result= new ConditionalParserCombinator().run(s)
In the above scenario, for example, I should get GOTO NODE-1-->NODE-3. Instead I get false after the else expression is evaluated. The code of the combinator is outlined below:
final class ConditionalParserCombinator extends JavaTokenParsers with ParserCombinatorLike {
def IF = "IF"
def ELSE = "ELSE"
def ELSEIF = ELSE ~ IF
def NULL = "NULL"
def GOTO = "GOTO"
def node_id = wholeNumber | floatingPointNumber | stringLiteral
def NODE = "NODE" ~ "-" ~ node_id ^^ (e ⇒ NodeExpression(e._2))
def EDGE = NODE ~ "-->" ~ NODE ^^ (e ⇒ EdgeExpression(e._1._1, e._2))
def lhs = ident | wholeNumber | floatingPointNumber | stringLiteral
def rhs = ident | wholeNumber | floatingPointNumber | stringLiteral | NULL
def operator = "==" | "*" | "/" | "||" | "&&" | ">" | "<" | ">=" | "<="
def block = GOTO ~ EDGE
def expression_block = lhs ~ operator ~ rhs ~ block ^^ {
case lhs ~ operator ~ rhs ~ block ⇒ ExpressionBlock(lhs, rhs, operator, block._2)
}
def ifExpression = IF ~ expression_block ^^ (e ⇒ e._2.operator match {
case "==" ⇒ if (e._2.lhs == e._2.rhs) Block(e._2.block) else false
case ">" ⇒ if (e._2.lhs > e._2.rhs) Block(e._2.block) else false
case "<" ⇒ if (e._2.lhs < e._2.rhs) Block(e._2.block) else false
case _ ⇒ false
})
def elseIFExpression = ELSEIF ~ expression_block ^^ (e ⇒ e._2.operator match {
case "==" ⇒ if (e._2.lhs == e._2.rhs) Block(e._2.block) else false
case ">" ⇒ if (e._2.lhs > e._2.rhs) {
println("matched elseif")
Block(e._2.block)
} else false
case "<" ⇒ if (e._2.lhs < e._2.rhs) Block(e._2.block) else false
case _ ⇒ false
})
def elseExpression = ELSE ~ block ^^ (e ⇒ Block(e._2._2))
override def grammar = "(" ~> log(ifExpression)("ifexpression") <~ ")" ~!
"(" ~> log(elseIFExpression)("elseifexpression") <~ ")" ~!
"(" ~> log(elseExpression)("elseexpression") <~ ")"
}
I am printing result.get and I see false as the result.
Additional details: Block and ExpressionBlock are case classes that are useful for a few things I may do later on.
I think it's cleaner to parse an expression into a type that you can understand (meaning I have custom Product/case classes defined for it) and then evaluate it; these are two different things. In hindsight I'm not sure why I got the two mixed up. Here's the logic that works:
def IF = "IF"
def ELSE = "ELSE"
def ELSEIF = ELSE ~ IF
def NULL = "NULL"
def GOTO = "GOTO"
def dataType: Parser[DataType] = "[" ~ "Integer" ~ "]" ^^ { e ⇒ DataType("", "Integer") }
def node_id = wholeNumber | floatingPointNumber | stringLiteral
def NODE = "NODE" ~ "-" ~ node_id ^^ (e ⇒ ParseableNode(e._2, DataType({}, "Unit")))
def EDGE = NODE ~ "-->" ~ NODE ^^ (e ⇒ EdgeExpression(e._1._1, e._2))
def lhs = ident | wholeNumber | floatingPointNumber | stringLiteral
def rhs = ident | wholeNumber | floatingPointNumber | stringLiteral | NULL
def operator = "==" | "*" | "/" | "||" | "&&" | ">" | "<" | ">=" | "<="
def block = GOTO ~ EDGE
def expression_block(expType: ConditionalKind) = dataType ~ lhs ~ operator ~ rhs ~ block ^^ {
case dataType ~ lhs ~ operator ~ rhs ~ block ⇒ ExpressionBlock(ParseableNode(lhs, dataType), ParseableNode(rhs, dataType), operator, block._2, expType)
}
def ifExpression = IF ~ expression_block(ConditionalKind("IF")) ^^ {
case "IF" ~ expression_block ⇒ ExpressionBlock(expression_block.lhs, expression_block.rhs, expression_block.operator, expression_block.block, expression_block.conditionalKind)
}
def elseIFExpression = ELSEIF ~ expression_block(ConditionalKind("ELSEIF")) ^^ {
case "ELSE" ~ "IF" ~ expression_block ⇒ ExpressionBlock(expression_block.lhs, expression_block.rhs, expression_block.operator, expression_block.block, expression_block.conditionalKind)
}
def elseExpression = ELSE ~ block ^^ { case "ELSE" ~ block ⇒ Block(block._2) }
override def grammar = log(ifExpression)("ifexpression") ~ log(elseIFExpression)("elseifexpression") ~ log(elseExpression)("elseexpression") ^^ {
case ifExpression ~ elseIFExpression ~ elseExpression ⇒
ConditionalExpressions(List(ifExpression, elseIFExpression), elseExpression)
}
The above logic works when the parse result is evaluated like this:
object BasicSelectorExpressionEvaluator extends EvaluatorLike {
override def eval(parseable: Parseable) = parseable match {
case ConditionalExpressions(ifElseIfs, otherwise) ⇒
val mappedIfElseIfs: immutable.Seq[Block] = ifElseIfs.map { e ⇒
println(s"e ==>$e")
e.operator match {
case "==" ⇒ if (e.lhs == e.rhs) {
println("matched ==")
Block(e.block)
} else Block.Unit
case "<" ⇒ if (e.lhs.value.toInt < e.rhs.value.toInt) {
println("matched <")
Block(e.block)
} else Block.Unit
case ">" ⇒ if (e.lhs.value.toInt > e.rhs.value.toInt) {
println("matched >")
Block(e.block)
} else Block.Unit
case "<=" ⇒ if (e.lhs.value.toInt <= e.rhs.value.toInt) {
println("matched <=")
Block(e.block)
} else Block.Unit
case ">=" ⇒ if (e.lhs.value.toInt >= e.rhs.value.toInt) {
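println("matched >=")
Block(e.block)
} else Block.Unit
// The grammar also accepts operators such as "*" or "&&"; treat them as "no match" instead of throwing a MatchError
case _ ⇒ Block.Unit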
}
}
val filteredMappedIFElseIfs = mappedIfElseIfs.filterNot(e ⇒ e.equals(Block.Unit))
println(s"filteredMappedIFElseIfs == $filteredMappedIFElseIfs")
if (filteredMappedIFElseIfs.nonEmpty) PResult(filteredMappedIFElseIfs.head.block) else PResult(otherwise.block)
}
}
So the above can parse this input:
val s = s""" IF [Integer] $resultId == 100 GOTO NODE-1-->NODE-2 ELSE IF [Integer] $resultId > 100 GOTO NODE-1-->NODE-3 ELSE GOTO NODE-1-->NODE-4""".stripMargin
It could be done better; e.g. the grammar seems to violate DRY by embedding the data type on every IF, but I suppose people can derive their own version from it.
Edit: also note that the toInt calls are a bit ugly and need a better design; I may post an update once I have one. I need to rework the whole grammar now that it all works. Suggestions and improvements are welcome; I'm still learning.
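One possible way to tidy up the toInt calls mentioned above is to convert the operands once, when the parse result is built, and let the evaluator pick the first condition that holds. The sketch below is only an illustration under that assumption: Condition, holds and firstMatch are hypothetical names, and Condition is a simplified stand-in for ExpressionBlock, not the actual case class from the post.

```scala
object ConditionalEval {
  // Hypothetical, simplified stand-in for ExpressionBlock: the operands are
  // already Ints, so the evaluator never has to call toInt.
  final case class Condition(lhs: Int, operator: String, rhs: Int, target: String)

  // Decide whether a single IF / ELSE IF condition holds.
  // Operators outside this list are treated as "does not hold".
  def holds(c: Condition): Boolean = c.operator match {
    case "==" => c.lhs == c.rhs
    case "<"  => c.lhs < c.rhs
    case ">"  => c.lhs > c.rhs
    case "<=" => c.lhs <= c.rhs
    case ">=" => c.lhs >= c.rhs
    case _    => false
  }

  // Return the target of the first condition that holds, or the ELSE target.
  def firstMatch(conditions: List[Condition], otherwise: String): String =
    conditions.find(holds).map(_.target).getOrElse(otherwise)
}
```

With resultId = 200, calling firstMatch with the two conditions from the example input (== 100 and > 100) and the ELSE target NODE-1-->NODE-4 selects NODE-1-->NODE-3, because find stops at the first condition that holds.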
I am trying to build a Boolean logic parser that turns e.g. A == B AND C == D into something like And(Equals(A,B), Equals(C,D)).
My parser has the following definitions:
def program: Parser[Operator] = {
phrase(operator)
}
def operator: PackratParser[Operator] = {
leaf | node
}
def node: PackratParser[Operator] = {
and | or
}
def leaf: PackratParser[Operator] = {
equal | greater | less
}
def and: PackratParser[Operator] = {
(operator ~ ANDT() ~ operator) ^^ {
case left ~ _ ~ right => And(left, right)}
}
I would expect the parser to map to program -> operator -> node -> and -> operator (left) -> leaf -> equal -> operator (right) -> leaf -> equal. This doesn't work.
However, if I make the following changes to the above code:
def operatorWithParens: PackratParser[Operator] = {
lparen ~> (operator | operatorWithParens) <~ rparen
}
and change and to be
def and: PackratParser[Operator] = {
(operatorWithParens ~ ANDT() ~ operatorWithParens) ^^ {
case left ~ _ ~ right => And(left, right)}
}
Parsing (A == B) AND (C == D) succeeds.
I cannot wrap my head around why the former doesn't work while the latter does.
How should I change my code to be able to parse A == B AND C == D?
EDIT:
Following @Andrey Tyukin's advice, I've modified the grammar to account for precedence:
def program: Parser[Operator] = positioned {
phrase(expr)
}
def expr: PackratParser[Operator] = positioned {
(expr ~ ORT() ~ expr1) ^^ {
case left ~ _ ~ right => Or(left, right)} | expr1
}
def expr1: PackratParser[Operator] = positioned {
(expr1 ~ ANDT() ~ expr2) ^^ {
case left ~ _ ~ right => And(left, right)} | expr2
}
def expr2: PackratParser[Operator] = positioned {
(NOTT() ~ expr2) ^^ {case _ ~ opr => Not(opr)} | expr3
}
def expr3: PackratParser[Operator] = {
lparen ~> (expr) <~ rparen | leaf
}
And although PackratParser supports left-recursive grammars, I run into an infinite loop that never leaves expr.
It looks like there is a path from operator to a shorter operator:
operator -> node -> and -> (operator ~ somethingElse)
You seem to be assuming that the shorter operator (left) will somehow reduce to leaf, whereas the outermost operator would skip the leaf and pick the node, for whatever reason. What it does instead is just choke on the first leaf it encounters.
You could try to move the node before the leaf, so that the whole operator doesn't choke on the first A when seeing something like A == B AND ....
Otherwise, I'd suggest refactoring it into disjunctions of conjunctions of atomic formulas, where atomic formulas are either comparisons or indivisible parenthesized top-level elements (i.e. parenthesized disjunctions, in this case).
Expect to use quite a few repSeps.
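The layering described above could look roughly like the following. This is a minimal sketch, not the asker's code: it assumes the scala-parser-combinators library is on the classpath, works on raw text with RegexParsers instead of the ANDT()/ORT() tokens from the question, and uses illustrative names (Formula, BoolParser, parseFormula). It uses rep1sep, the non-empty variant of repsep, so no left recursion is needed.

```scala
// Assumes the scala-parser-combinators library is available.
import scala.util.parsing.combinator.RegexParsers

sealed trait Formula
case class Equals(left: String, right: String) extends Formula
case class And(left: Formula, right: Formula) extends Formula
case class Or(left: Formula, right: Formula) extends Formula

object BoolParser extends RegexParsers {
  // Identifiers; the lookahead keeps the keywords AND / OR from being read as names.
  def name: Parser[String] = """(?!(?:AND|OR)\b)[A-Za-z_]\w*""".r

  // Atomic formula, variant 1: a comparison such as A == B.
  def comparison: Parser[Formula] =
    name ~ ("==" ~> name) ^^ { case l ~ r => Equals(l, r) }

  // Atomic formula, variant 2: a parenthesized disjunction.
  def atom: Parser[Formula] = comparison | "(" ~> disjunction <~ ")"

  // One or more atoms separated by AND, folded to the left.
  def conjunction: Parser[Formula] =
    rep1sep(atom, "AND") ^^ { atoms => atoms.reduceLeft[Formula]((l, r) => And(l, r)) }

  // One or more conjunctions separated by OR, so AND binds tighter than OR.
  def disjunction: Parser[Formula] =
    rep1sep(conjunction, "OR") ^^ { cs => cs.reduceLeft[Formula]((l, r) => Or(l, r)) }

  def parseFormula(input: String): ParseResult[Formula] = parseAll(disjunction, input)
}
```

Under these assumptions, BoolParser.parseFormula("A == B AND C == D") succeeds with And(Equals(A,B),Equals(C,D)), and parentheses are only needed to override the default precedence.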
I would like to make an AST for arithmetic expression using fastparse from Scala.
For me, an arithmetic expression looks like:
var_name := value; // value can be an integer, or a whole expression
For the moment I have these parsers:
def word[_:P] = P((CharIn("a-z") | CharIn("A-Z") | "_").rep(1).!)
def digits[_ : P] = P(CharIn("0-9").rep.!)
def div_mul[_: P] = P( digits~ space.? ~ (("*" | "/").! ~ space.? ~/ digits).rep ).map(eval)
def add_sub[_: P] = P( div_mul ~ space.? ~ (("+" | "-").! ~ space.? ~/ div_mul).rep ).map(eval)
def expr[_: P]= P( " ".rep ~ add_sub ~ " ".rep ~ End )
def var_assig[_:P] = P(word ~ " " ~ ":=" ~ " " ~ (value | expr) ~ ";")
I want to create an AST for an arithmetic expression (2+3*2, for example).
Expected result: Assignment[2,plus[mult,[3,2]]] // symbol[left, right]
My questions are:
What should the Tree class/object look like, if one is necessary, given that I want to evaluate the result? I will use this class for the rest of the parsing (if, while).
What should the eval function look like, taking a String or Seq[String] as input and returning an AST with my expected result?
Here is my way of doing it.
I have defined the components of the Arithmetic Expression using the following Trait:
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression
And defined the following fastparse patterns (similar to what is described here: https://com-lihaoyi.github.io/fastparse/#Math)
def number[_: P]: P[Expression] = P(CharIn("0-9").rep(1)).!.map(Num)
def parens[_: P]: P[Expression] = P("(" ~/ addSub ~ ")")
def factor[_: P]: P[Expression] = P(number | parens)
def divMul[_: P]: P[Expression] = P(factor ~ (CharIn("*/").! ~/ factor).rep).map((astBuilder _).tupled)
def addSub[_: P]: P[Expression] = P(divMul ~ (CharIn("+\\-").! ~/ divMul).rep).map((astBuilder _).tupled)
def expr[_: P]: P[Expression] = P(addSub ~ End)
Instead of the eval function that was used in the map, I have written a similar one, which returns a folded entity of the previously defined case classes:
def astBuilder(initial: Expression, rest: Seq[(String, Expression)]): Expression = {
rest.foldLeft(initial) {
case (left, (operator, right)) =>
operator match {
case "*" => Mul(left, right)
case "/" => Div(left, right)
case "+" => Add(left, right)
case "-" => Sub(left, right)
}
}
}
And if we run the following expression:
val Parsed.Success(res, _) = parse("2+3*2", expr(_))
The result would be: Add(Num(2),Mul(Num(3),Num(2)))
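Since the question also asks about evaluating the result, here is a minimal sketch of an evaluator over such a tree. It repeats the case classes so that it stands alone, wraps everything in a hypothetical AstEval object, and assumes plain Int arithmetic (so Div is integer division); none of this comes from fastparse itself.

```scala
object AstEval {
  // Same shape as the case classes defined in the answer above.
  sealed trait Expression
  case class Add(l: Expression, r: Expression) extends Expression
  case class Sub(l: Expression, r: Expression) extends Expression
  case class Mul(l: Expression, r: Expression) extends Expression
  case class Div(l: Expression, r: Expression) extends Expression
  case class Num(value: String) extends Expression

  // Walk the tree bottom-up; the sealed trait lets the compiler
  // check that every node type is covered.
  def eval(e: Expression): Int = e match {
    case Num(v)    => v.toInt
    case Add(l, r) => eval(l) + eval(r)
    case Sub(l, r) => eval(l) - eval(r)
    case Mul(l, r) => eval(l) * eval(r)
    case Div(l, r) => eval(l) / eval(r) // integer division, by assumption
  }
}
```

For the tree Add(Num(2),Mul(Num(3),Num(2))) this evaluates the multiplication first and returns 8. Further node types (assignment, if, while) could be added as more case classes extending the same trait.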
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
I get the following warning when compiling
warning: match may not be exhaustive.
It would fail on the following input: ~((x: String forSome x not in ("+", "-")), _)
("+" | "-") ~ term ^^ {
^
one warning found
I heard that the @unchecked annotation can help, but where should I put it in this case?
The issue here is that with ("+" | "-") you are creating a parser that accepts only two possible strings. However when you map on the resulting parser to extract the value, the result you're going to extract will just be String.
In your pattern matching you only have cases for the strings "+" and "-", but the compiler has no way of knowing that those are the only possible strings that will show up, so it's telling you here that your match may not be exhaustive since it can't know any better.
You could use an unchecked annotation to suppress the warning, but there are much better, more idiomatic ways, to eliminate the issue. One way to solve this is to replace those strings with some kind of structured type as soon as possible. For example, create an ADT
sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation
//then in your parser
("+" ^^^ Plus | "-" ^^^ Minus) ~ term ^^ {
case Plus ~ t => t
case Minus ~ t => -t
}
Now the compiler should be able to see that the only possible cases are Plus and Minus.
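For completeness, if the annotation route is still wanted, @unchecked is placed on the value being matched, which means naming the argument of the function passed to ^^ instead of using a bare case block. The following is a small sketch under the assumption that the scala-parser-combinators library is available; UncheckedExprParser and signedTerm are illustrative names, and the grammar is cut down to numbers joined by + and -.

```scala
// Assumes the scala-parser-combinators library is available.
import scala.util.parsing.combinator.RegexParsers

object UncheckedExprParser extends RegexParsers {
  def number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // The annotation goes on the scrutinee of the match, which tells the
  // compiler to skip the exhaustivity check for this match only.
  def signedTerm: Parser[Int] = ("+" | "-") ~ number ^^ { pair =>
    (pair: @unchecked) match {
      case "+" ~ t => t
      case "-" ~ t => -t
    }
  }

  def expr: Parser[Int] = number ~ rep(signedTerm) ^^ { case t ~ r => t + r.sum }
}
```

This only silences the warning; unlike the ADT version, a mismatch between the parser and the match would still surface as a MatchError at runtime.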
Add a catch-all case to remove the warning:
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
case _ ~ t => t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
I am writing a parser in Scala and got stuck at this point:
private def expression : Parser[Expression] = cond | variable | integer | liste | function
private def cond : Parser[Expression] = "if" ~ predicate ~ "then" ~ expression ~ "else" ~ expression ^^ {case _~i~_~t~_~el => Cond(i,t,el)}
private def predicate: Parser[Predicate] = identifier ~ "?" ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~_~el~_ => Predicate(n,el)}
private def function: Parser[Expression] = identifier ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~el~_ => Function(n,el)}
private def liste: Parser[Expression] = "[" ~ repsep(expression, ",") ~ "]" ^^ {case _~ls~_ => Liste(ls)}
private def variable: Parser[Expression] = identifier ^^ {case v => Variable(v)}
def identifier: Parser[String] = """[a-zA-Z0-9]+""".r ^^ { _.toString }
def integer: Parser[Integer] = num ^^ { case i => Integer(i)}
def num: Parser[String] = """(-?\d*)""".r ^^ {_.toString}
My problem is that when it comes to an expression, the parser does not always take the right path. For example, given funk(x,y) it tries to parse it as a variable and not as a function.
Any ideas?
Change the order of the parsers in your expression parser: put function before variable and after cond. In general, when you compose parsers using an alternative A | B, parser A shouldn't be able to parse input that is a prefix of input parsable by parser B.
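The effect of the ordering can be seen in a cut-down grammar. This is a sketch assuming the scala-parser-combinators library; Expr, Call, OrderDemo and wrongOrder are illustrative names rather than the asker's definitions, and only variables and function calls are modelled.

```scala
// Assumes the scala-parser-combinators library is available.
import scala.util.parsing.combinator.RegexParsers

sealed trait Expr
case class Variable(name: String) extends Expr
case class Call(name: String, args: List[Expr]) extends Expr

object OrderDemo extends RegexParsers {
  def identifier: Parser[String] = """[a-zA-Z][a-zA-Z0-9]*""".r

  def variable: Parser[Expr] = identifier ^^ (Variable(_))

  def call: Parser[Expr] =
    identifier ~ ("(" ~> repsep(expression, ",") <~ ")") ^^ { case n ~ args => Call(n, args) }

  // The more specific alternative comes first: `call` is tried before
  // `variable`, which would otherwise accept the bare name "funk" and stop.
  def expression: Parser[Expr] = call | variable

  // Same alternatives in the problematic order, for comparison.
  def wrongOrder: Parser[Expr] = variable | call
}
```

With this sketch, parseAll(expression, "funk(x,y)") yields Call(funk,List(Variable(x), Variable(y))), whereas parseAll(wrongOrder, "funk(x,y)") fails: variable succeeds on funk, the alternative is never retried, and the remaining (x,y) is left unconsumed. The library also provides |||, which tries both alternatives and keeps the longer match, at the cost of parsing the input twice.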
I am developing the lexical analysis for my programming language. I want to produce an error token for a string that has an opening quote but no closing quote. For example: "hello
class SimpleLexer extends StdLexical {
import scala.util.parsing.input.CharArrayReader.EofCh
def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
(r findPrefixMatchOf (source.subSequence(offset, source.length))) match {
case Some(matched) =>
Success(source.subSequence(offset, offset + matched.end).toString,
in.drop(matched.end))
case None =>
Failure("string matching regex `" + r + "' expected but `" + in.first + "' found", in.drop(0))
}
}
}
override def token: Parser[Token] = {
// Adapted from StdLexical
(
'\"' ~ rep( chrExcept('\"', '\n','\t','\b','\f','\r', EofCh) ) ~ '\"' ^^ { case '\"' ~ chars ~ '\"' => StringLit(chars mkString "") }
|'\"' ~> failure("Unclosed string: "+"??") // I want produce fail string
|EofCh ^^^ EOF
|delim
)
}
override def whitespace: Parser[Any] = rep(
whitespaceChar
| '/' ~ '*' ~ comment
| '/' ~ '*' ~> failure("unclosed comment"))
override protected def comment: Parser[Any] = (
'*' ~ '/' ^^ { case _ => ' ' }
| chrExcept(EofCh) ~ comment)
}
Example:
input: " safs i
output: ErrorToken(Unclosed string: " safs i)
Can you help me solve this problem?
Thanks.
My answer to your question:
override def token: Parser[Token] = {
// Adapted from StdLexical
(
'\"' ~ rep( chrExcept('\"', '\n','\t','\b','\f','\r', EofCh) ) ~ '\"' ^^ { case '\"' ~ chars ~ '\"' => StringLit(chars mkString "") }
|'\"' ~> rep( chrExcept('\"', '\n','\t','\b','\f','\r', EofCh) ) ^^ {chars => ErrorToken("Unclosed string: \"" + chars.mkString)} // message format follows the expected output in the question
|EofCh ^^^ EOF
|delim
)
}