Scala parsing left-associative subscript operator

I've mastered this syntax for building a left-associative tree for infix operators:
term * (
"+" ^^^ { (a:Expr, b:Expr) => new FunctionCall(plus, a::b::Nil) } |
"-" ^^^ { (a:Expr, b:Expr) => new FunctionCall(minus, a::b::Nil) } )
Though I have to confess I don't fully understand how it works. What I want to do now is to achieve a similar effect for syntax that might look like
a[b](c)(d)[e]
which should parse as
sub(call(call(sub(a, b), c), d), e)
Can the high-level "^^^" magic be extended to cover a case where it's not a pure infix operator? Or do I have to implement some kind of fold-left logic myself? If so, any hints as to what it might look like?
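For readers puzzled by the same idiom, here is a minimal sketch of what `term * (...)` does, assuming the scala-parser-combinators library is on the classpath. The infix `*` is the left-associative chaining combinator (`chainl1`): it parses operands separated by a parser whose result is the function used to combine the two neighbouring values, and `^^^` simply replaces a parser's result with a fixed value (here, that function). The names `ChainDemo`, `num` and `op` are made up for this illustration, and strings stand in for the `Expr` tree.

```scala
import scala.util.parsing.combinator.RegexParsers

object ChainDemo extends RegexParsers {
  // Operands: plain integers, kept as strings for readability.
  def num: Parser[String] = """\d+""".r

  // Each operator parser yields the function that combines left and right.
  def op: Parser[(String, String) => String] =
    "+" ^^^ { (a: String, b: String) => s"plus($a, $b)" } |
    "-" ^^^ { (a: String, b: String) => s"minus($a, $b)" }

  // `num * op` chains operands left to right, applying each function as it goes.
  def expr: Parser[String] = num * op

  def render(input: String): String = parseAll(expr, input).get
}
```

Under these assumptions, `ChainDemo.render("1 - 2 + 3")` groups the subtraction first and then applies the addition to its result, which is the left-associative shape the question describes.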

I have solved the problem as follows. I'm happy with the solution, but if any Scala experts out there can help me improve it, that's very welcome.
def subscript: Parser[Expr => Expr] = {
  "[" ~> expr <~ "]" ^^ {
    case sub => {
      { (base: Expr) => new FunctionCall(subscriptFn, base :: sub :: Nil) }
    }
  }
}
def argumentList: Parser[Expr => Expr] = {
  "(" ~> repsep(expr, ",") <~ ")" ^^ {
    case args => {
      { (base: Expr) => new FunctionCall(base :: args) }
    }
  }
}
def postfixExpr: Parser[Expr] = {
  primary ~ rep ( subscript | argumentList ) ^^ {
    case base ~ suffixes => {
      (base /: suffixes)((b: Expr, f: Expr => Expr) => f(b))
    }
  }
}
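The folding step can be looked at in isolation, without any parser. The sketch below uses hypothetical stand-ins (`Name`, `Sub`, `Call`) for the `Expr` and `FunctionCall` types of the question, which are not shown in the original, and builds the tree for `a[b](c)(d)[e]` by applying a list of suffix functions from left to right.

```scala
// Hypothetical stand-ins for the question's Expr and FunctionCall types.
sealed trait Expr
case class Name(id: String) extends Expr
case class Sub(base: Expr, index: Expr) extends Expr
case class Call(fn: Expr, args: List[Expr]) extends Expr

object PostfixFold {
  // Each suffix is a function that waits for the expression to its left.
  def subscript(index: Expr): Expr => Expr = base => Sub(base, index)
  def call(args: Expr*): Expr => Expr = fn => Call(fn, args.toList)

  // Apply the suffixes in source order, threading the result through.
  def build(base: Expr, suffixes: List[Expr => Expr]): Expr =
    suffixes.foldLeft(base)((acc, suffix) => suffix(acc))

  // a[b](c)(d)[e]
  val tree: Expr = build(
    Name("a"),
    List(subscript(Name("b")), call(Name("c")), call(Name("d")), subscript(Name("e")))
  )
}
```

`foldLeft` is the method form of the `/:` operator used above, so the nesting comes out as sub(call(call(sub(a, b), c), d), e).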

Related

Building a boolean logic parser in Scala with PackratParser

I am trying to build a Boolean logic parser e.g. A == B AND C == D to output something like And(Equals(A,B), Equals(C,D))
My parser has the following definitions:
def program: Parser[Operator] = {
phrase(operator)
}
def operator: PackratParser[Operator] = {
leaf | node
}
def node: PackratParser[Operator] = {
and | or
}
def leaf: PackratParser[Operator] = {
equal | greater | less
}
def and: PackratParser[Operator] = {
(operator ~ ANDT() ~ operator) ^^ {
case left ~ _ ~ right => And(left, right)}
}
I would expect the parser to map to program -> operator -> node -> and -> operator (left) -> leaf -> equal -> operator (right) -> leaf -> equal. This doesn't work.
However if in the above code I do the changes
def operatorWithParens: PackratParser[Operator] = {
lparen ~> (operator | operatorWithParens) <~ rparen
}
and change and to be
def and: PackratParser[Operator] = {
(operatorWithParens ~ ANDT() ~ operatorWithParens) ^^ {
case left ~ _ ~ right => And(left, right)}
}
Parsing (A == B) AND (C == D) succeeds.
I cannot wrap my head around why the former doesn't work while the latter does.
How should I change my code to be able to parse A == B AND C == D?
EDIT:
Following @Andrey Tyukin's advice, I've modified the grammar to account for precedence
def program: Parser[Operator] = positioned {
phrase(expr)
}
def expr: PackratParser[Operator] = positioned {
(expr ~ ORT() ~ expr1) ^^ {
case left ~ _ ~ right => Or(left, right)} | expr1
}
def expr1: PackratParser[Operator] = positioned {
(expr1 ~ ANDT() ~ expr2) ^^ {
case left ~ _ ~ right => And(left, right)} | expr2
}
def expr2: PackratParser[Operator] = positioned {
(NOTT() ~ expr2) ^^ {case _ ~ opr => Not(opr)} | expr3
}
def expr3: PackratParser[Operator] = {
lparen ~> (expr) <~ rparen | leaf
}
And although PackratParser supports left-recursive grammar, I run into an infinite loop that never leaves expr
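One thing worth checking here: the PackratParsers documentation asks for left-recursive productions to be declared as `lazy val` (or `val`) of type `PackratParser`, because memoization relies on each production being a single parser instance, whereas a `def` builds a fresh parser on every call. The sketch below is a simplified version of the grammar above under that assumption. It is based on `RegexParsers` instead of the token parser from the question, and the `Operator` case classes and the `ident` rule are invented for the example.

```scala
import scala.util.parsing.combinator.{PackratParsers, RegexParsers}

// Hypothetical AST, standing in for the question's Operator hierarchy.
sealed trait Operator
case class Equals(left: String, right: String) extends Operator
case class And(left: Operator, right: Operator) extends Operator
case class Or(left: Operator, right: Operator) extends Operator

object BoolPackrat extends RegexParsers with PackratParsers {
  def ident: Parser[String] = """[A-Za-z_]\w*""".r

  // lazy val: one memoizing parser instance per production.
  lazy val expr: PackratParser[Operator] =
    expr ~ ("OR" ~> expr1) ^^ { case l ~ r => Or(l, r) } | expr1
  lazy val expr1: PackratParser[Operator] =
    expr1 ~ ("AND" ~> expr2) ^^ { case l ~ r => And(l, r) } | expr2
  lazy val expr2: PackratParser[Operator] =
    "(" ~> expr <~ ")" | leaf
  lazy val leaf: PackratParser[Operator] =
    ident ~ ("==" ~> ident) ^^ { case l ~ r => Equals(l, r) }

  def parseExpr(input: String): ParseResult[Operator] = parseAll(expr, input)
}
```

With this layout, `BoolPackrat.parseExpr("A == B AND C == D")` should terminate and produce an `And` node over two `Equals` leaves.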
It looks like there is a path from operator to a shorter operator:
operator -> node -> and -> (operator ~ somethingElse)
You seem to be assuming that the shorter operator (left) will somehow reduce to leaf, whereas the outermost operator would skip the leaf and pick the node, for whatever reason. What it does instead is just choke on the first leaf it encounters.
You could try to move the node before the leaf, so that the whole operator doesn't choke on the first A when seeing something like A == B AND ....
Otherwise, I'd suggest refactoring it into disjunctions of conjunctions of atomic formulas, where atomic formulas are either comparisons or indivisible parenthesized top-level elements (i.e. parenthesized disjunctions, in this case).
Expect to use quite a few repseps.
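A minimal sketch of that layering, assuming `RegexParsers` from scala-parser-combinators in place of the token-based parser in the question. The case classes and the `ident` rule are illustrative names, not taken from the original code. `rep1sep` is used instead of `repsep` so that every level contains at least one element.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST, standing in for the question's Operator hierarchy.
sealed trait Operator
case class Equals(left: String, right: String) extends Operator
case class And(left: Operator, right: Operator) extends Operator
case class Or(left: Operator, right: Operator) extends Operator

object BoolParser extends RegexParsers {
  def ident: Parser[String] = """[A-Za-z_]\w*""".r

  // Atomic formulas: a comparison or a parenthesized disjunction.
  def comparison: Parser[Operator] =
    ident ~ ("==" ~> ident) ^^ { case l ~ r => Equals(l, r) }
  def atom: Parser[Operator] = comparison | "(" ~> disjunction <~ ")"

  // Conjunctions of atoms, folded to the left.
  def conjunction: Parser[Operator] =
    rep1sep(atom, "AND") ^^ (_.reduceLeft[Operator]((l, r) => And(l, r)))

  // Disjunctions of conjunctions, folded to the left.
  def disjunction: Parser[Operator] =
    rep1sep(conjunction, "OR") ^^ (_.reduceLeft[Operator]((l, r) => Or(l, r)))

  def parseExpr(input: String): ParseResult[Operator] = parseAll(disjunction, input)
}
```

Because no rule starts by calling itself, this version needs neither left recursion nor packrat parsing, and AND binds tighter than OR simply because conjunctions sit one level below disjunctions.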

Scala warning match may not be exhaustive while parsing

class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
I get the following warning when compiling
warning: match may not be exhaustive.
It would fail on the following input: ~((x: String forSome x not in ("+", "-")), _)
("+" | "-") ~ term ^^ {
^
one warning found
I heard that the @unchecked annotation can help. But in this case, where should I put it?
The issue here is that with ("+" | "-") you are creating a parser that accepts only two possible strings. However when you map on the resulting parser to extract the value, the result you're going to extract will just be String.
In your pattern matching you only have cases for the strings "+" and "-", but the compiler has no way of knowing that those are the only possible strings that will show up, so it's telling you here that your match may not be exhaustive since it can't know any better.
You could use an unchecked annotation to suppress the warning, but there are better, more idiomatic ways to eliminate the issue. One way to solve this is to replace those strings with some kind of structured type as soon as possible. For example, create an ADT
sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation
// then in your parser
("+" ^^^ Plus | "-" ^^^ Minus) ~ term ^^ {
  case Plus ~ t => t
  case Minus ~ t => -t
}
Now it should be able to realize that the only possible cases are Plus and Minus
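The difference between the two kinds of match can be seen without any parser. In this small sketch, the `Signs` object and its method names are invented for illustration. The first method matches on a raw `String` and needs a fallback case, while the second matches on the sealed ADT and lists every possible value.

```scala
sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation

object Signs {
  // Raw String: nothing tells the compiler that only "+" and "-" can occur,
  // so a fallback case is needed to cover every input.
  def signedFromString(op: String, t: Int): Int = op match {
    case "+"   => t
    case "-"   => -t
    case other => throw new IllegalArgumentException(s"unexpected operator: $other")
  }

  // Sealed ADT: Plus and Minus are the only subtypes, so these two cases
  // cover the whole type.
  def signed(op: Operation, t: Int): Int = op match {
    case Plus  => t
    case Minus => -t
  }
}
```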
Add a case to remove the warning
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
case _ ~ t => t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}

recognizing eol in scala parser combinators

I'm trying to make a very simple parser with parser combinators (to parse something similar to BNF). I've checked several blog posts that explain the matter (the ones top-ranked at Google (for me)) and I think I understand it but the tests say otherwise.
I've checked the questions on StackOverflow, and while some of them could maybe be applied and be useful, whenever I try to apply them something else breaks, so the best way forward is to go through a specific example:
This is my main:
def main(args: Array[String]) {
val parser: BaseParser = new BaseParser
val eol = sys.props("line.separator")
val test = s"a = b ${eol} a = c ${eol}"
System.out.println(test)
parser.parse(test)
}
This is the parser:
import com.github.trylks.tests.parser.ParserClasses._
import scala.util.parsing.combinator.syntactical._
import scala.util.parsing.combinator.ImplicitConversions
import scala.util.parsing.combinator.PackratParsers
class BaseParser extends StandardTokenParsers with ImplicitConversions with PackratParsers {
val eol = sys.props("line.separator")
lexical.delimiters += ("=", "|", "*", "[", "]", "(", ")", ";", eol)
def rules = rep1sep(rule, eol) ^^ { Rules(_) }
def rule = id ~ "=" ~ repsep(expression, "|") ^^ flatten3 { (e1: ID, _: Any, e3: List[Expression]) => Rule(e1, e3) }
def expression: Parser[Expression] = (element | parenthesized | optional) ^^ { x => x } // and sequence and repetition, but that's another problem...
def parenthesized: Parser[Expression] = "(" ~> expression <~ ")" ^^ { x => x }
def optional: Parser[Expression] = "[" ~> expression <~ "]" ^^ { Optional(_) }
def element: Parser[Element] = (id | constant) ^^ { x => x }
def constant: Parser[Constant] = stringLit ^^ { Constant(_) }
def id: Parser[ID] = ident ^^ { ID(_) }
def parse(text: String): Option[Rules] = {
val s = rules(new lexical.Scanner(text))
s match {
case Success(res, next) => {
println("Success!\n" + res.toString)
Some(res)
}
case Error(msg, next) => {
println("error: " + msg)
None
}
case Failure(msg, next) => {
println("failure: " + msg)
None
}
}
}
}
These are the classes that you are missing from the previous part of the code:
object ParserClasses {
abstract class Element extends Expression
case class ID(value: String) extends Element {
override def toString(): String = value
}
case class Constant(value: String) extends Element {
override def toString(): String = value
}
abstract class Expression
case class Optional(value: Expression) extends Expression {
override def toString() = s"[$value]"
}
case class Rule(head: ID, body: List[Expression]) {
override def toString() = s"$head = ${body.mkString(" | ")}"
}
case class Rules(rules: List[Rule]) {
override def toString() = rules.mkString("\n")
}
}
The problem is: as the code is now, it doesn't work, it parses only one rule (not both). If I replace eol with ";" (in the main and the parser) then it works (at least for this test).
Most people seem to prefer regex parsers, every blog explaining parser combinators doesn't get into details about the traits that could be extended or not, so I have no idea about those differences or why there are several (I say this because it may be important to understand why the code doesn't work). The problem is: If I try to use regex parsers then I get errors for all the strings that I have specified in the parsers "=", "*", etc.
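A likely cause, assuming the default `StdLexical` behaviour, is that the lexer treats line breaks as whitespace and skips them before the `eol` delimiter can ever be produced as a token. One possible way around this is a regex-based parser whose `whiteSpace` pattern excludes newlines. The sketch below is a cut-down illustration and not a drop-in replacement for the parser above: the rule bodies are reduced to identifiers, and the names `LineParser`, `id`, `rule` and `rules` are chosen for the example.

```scala
import scala.util.parsing.combinator.RegexParsers

object LineParser extends RegexParsers {
  // Skip only spaces and tabs, so line breaks stay visible to the grammar.
  override protected val whiteSpace = """[ \t]+""".r

  // One or more line breaks, Unix or Windows style.
  def eol: Parser[String] = """(\r?\n)+""".r
  def id: Parser[String] = """[A-Za-z_]\w*""".r

  // head = alt1 | alt2 | ...
  def rule: Parser[(String, List[String])] =
    id ~ ("=" ~> rep1sep(id, "|")) ^^ { case head ~ body => (head, body) }

  // Rules separated by line breaks, with an optional trailing line break.
  def rules: Parser[List[(String, List[String])]] =
    rep1sep(rule, eol) <~ opt(eol)

  def parseRules(text: String): ParseResult[List[(String, List[String])]] =
    parseAll(rules, text)
}
```

In a `RegexParsers` grammar, string literals such as `"="` are turned into parsers by an implicit conversion and do not have to be registered as delimiters.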

Transforming Parser[Any] to a Stricter Type

Programming in Scala's Chapter 33 explains Combinator Parsing:
It provides this example:
import scala.util.parsing.combinator._
class Arith extends JavaTokenParsers {
def expr: Parser[Any] = term~rep("+"~term | "-"~term)
def term: Parser[Any] = factor~rep("*"~factor | "/"~factor)
def factor: Parser[Any] = floatingPointNumber | "("~expr~")"
}
How can I map expr to a narrower type than Parser[Any]? In other words,
I'd like to take def expr: Parser[Any] and map that via ^^ into a stricter type.
Note - I asked this question in Scala Google Groups - https://groups.google.com/forum/#!forum/scala-user, but haven't received a complete answer that helped me out.
As already stated in the comments, you can narrow down the type to anything you like. You just have to specify it after the ^^.
Here is a complete example with a data structure from your given code.
object Arith extends JavaTokenParsers {
trait Expression //The data structure
case class FNumber(value: Float) extends Expression
case class Plus(e1: Expression, e2: Expression) extends Expression
case class Minus(e1: Expression, e2: Expression) extends Expression
case class Mult(e1: Expression, e2: Expression) extends Expression
case class Div(e1: Expression, e2: Expression) extends Expression
def expr: Parser[Expression] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
case term ~ rest => rest.foldLeft(term)((result, elem) => elem match {
case "+" ~ e => Plus(result, e)
case "-" ~ e => Minus(result, e)
})
}
def term: Parser[Expression] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case factor ~ rest => rest.foldLeft(factor)((result, elem) => elem match {
case "*" ~ e => Mult(result, e)
case "/" ~ e => Div(result, e)
})
}
def factor: Parser[Expression] = floatingPointNumber ^^ (f => FNumber(f.toFloat)) | "(" ~> expr <~ ")"
def parseInput(input: String): Expression = parse(expr, input) match {
case Success(ex, _) => ex
case _ => throw new IllegalArgumentException //or change the result to Try[Expression]
}
}
Now we can start to parse something.
Arith.parseInput("(1.3 + 2.0) * 2")
//yields: Mult(Plus(FNumber(1.3),FNumber(2.0)),FNumber(2.0))
Of course you can also have a Parser[String] or a Parser[Float], where you directly transform or evaluate the input String. As I said, it is up to you.

Scala parser combinator based calculator that can also take a dataRecord

I have created a Scala parser combinator to filter data records based on the answer I got to a previous question: How to parse a string with filter citeria in scala and use it to filter objects
I would like to add the calculator parser combinator from the answer to this question
Operator Precedence with Scala Parser Combinators
to the bottom of the parser combinator that I created based on the first question. The calculator parser combinator therefore needs to accept a dataRecord, so that an expression like "( doubleValue1 / 10 ) * 2 + doubleValue2" can be parsed into a function that can subsequently take a dataRecord.
This is what I came up with, but the plus, minus, times and divide parser combinators are now broken because the + - * / operators are members of Double and not of the function DataRecord => Double. How can I fix these parser combinators so that an expression like "( doubleValue1 / 10 ) * 2 + doubleValue2" can be successfully parsed and results in a function that can take a dataRecord?
import scala.util.parsing.combinator._
import scala.util.parsing.combinator.JavaTokenParsers
object Main extends Arith with App {
val dataRecord = new DataRecord(100, 75 )
val input = "( doubleValue1 / 10 ) * 2 + doubleValue2"
println(parseAll(arithmicExpr, input).get(dataRecord)) // prints 95
}
class DataRecord( val doubleValue1 : Double, val doubleValue2 : Double )
class Arith extends JavaTokenParsers {
type D = Double
type Extractor[Double] = DataRecord => Double
//arithmic expression
def arithmicExpr: Parser[Extractor[D]] = term ~ rep(plus | minus) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def plus: Parser[Extractor[D]=>Extractor[D]] = "+" ~ term ^^ {case "+"~b => _ + b}
def minus: Parser[Extractor[D]=>Extractor[D]] = "-" ~ term ^^ {case "-"~b => _ - b}
def term: Parser[Extractor[D]] = factor ~ rep(times | divide) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def times: Parser[Extractor[D]=>Extractor[D]] = "*" ~ factor ^^ {case "*"~b => _ * (b) }
def divide: Parser[Extractor[D]=>Extractor[D]] = "/" ~ factor ^^ {case "/"~b => _ / b}
def factor: Parser[Extractor[D]] = fpn | "(" ~> arithmicExpr <~ ")" | intExtractor
def fpn: Parser[Extractor[D]] = floatingPointNumber ^^ (s => Function.const(s.toDouble)_)
def intExtractor: Parser[Extractor[D]] = ("doubleValue1" | "doubleValue2") ^^ {
case "doubleValue1" => _.doubleValue1
case "doubleValue2" => _.doubleValue2
}
}
Your approach to avoid a left recursive grammar is nice, but makes the types really complex. I prefer a different approach:
object ArithParser extends JavaTokenParsers {
  // Type aliases carried over from the question's code
  type D = Double
  type Extractor[A] = DataRecord => A
  // arithmetic expression
  def arithmicExpr: Parser[Extractor[D]] = plus
  // sum of products
  def plus: Parser[Extractor[D]] = repsep(times, "+") ^^ { summands : List[Extractor[D]] =>
    (in : DataRecord) => summands.map((e : Extractor[D]) => e(in)).foldLeft(0d)(_ + _)
  }
  // product of quotients
  def times: Parser[Extractor[D]] = repsep(division, "*") ^^ { factors : List[Extractor[D]] =>
    (in : DataRecord) => factors.map((e : Extractor[D]) => e(in)).foldLeft(1d)(_ * _)
  }
  // left-to-right division of one or more numbers
  def division : Parser[Extractor[D]] = rep1sep(number, "/") ^^ { operands : List[Extractor[D]] =>
    (in : DataRecord) => operands.map((e : Extractor[D]) => e(in)).reduce(_ / _)
  } | number
  def number : Parser[Extractor[D]] = fpn | intExtractor
  // a literal becomes a constant function that ignores the record
  def fpn: Parser[Extractor[D]] = floatingPointNumber ^^ (s => Function.const(s.toDouble)_)
  def intExtractor: Parser[Extractor[D]] = ("doubleValue1" | "doubleValue2") ^^ {
    case "doubleValue1" => _.doubleValue1
    case "doubleValue2" => _.doubleValue2
  }
}
This code can be further improved: it contains lots of repeating structures. Perhaps this is a good case for Stack Exchange's Code Review site.
Enhancements for other arithmetic operators, for mathematical functions and especially for parentheses are straightforward.
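As for the type error in the question itself (`+`, `-`, `*` and `/` are defined on Double, not on DataRecord => Double), one option is to lift each operator so that it combines two extractors into a new extractor. The sketch below shows only that lifting step, without a parser. `LiftedArith`, `lift` and `constant` are names invented for the illustration, and `DataRecord` is redeclared as a case class to keep the block self-contained.

```scala
object LiftedArith {
  final case class DataRecord(doubleValue1: Double, doubleValue2: Double)
  type Extractor = DataRecord => Double

  // Turn an operation on Doubles into an operation on extractors:
  // evaluate both sides against the same record, then combine.
  def lift(op: (Double, Double) => Double)(a: Extractor, b: Extractor): Extractor =
    record => op(a(record), b(record))

  // A literal is an extractor that ignores the record.
  def constant(d: Double): Extractor = _ => d

  // ( doubleValue1 / 10 ) * 2 + doubleValue2, built by hand
  val expr: Extractor =
    lift(_ + _)(
      lift(_ * _)(lift(_ / _)(_.doubleValue1, constant(10.0)), constant(2.0)),
      _.doubleValue2
    )
}
```

In the fold-based grammar of the question, a rule such as `plus` could then return `(a: Extractor[D]) => lift(_ + _)(a, b)` instead of `_ + b`. This is a suggestion under the assumptions above and not part of the answer's code.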