Scala RegexParsers calculator example: right-associativity

The Scaladoc for the RegexParsers trait contains the following example:
object Calculator extends RegexParsers {
def number: Parser[Double] = """\d+(\.\d*)?""".r ^^ { _.toDouble }
def factor: Parser[Double] = number | "(" ~> expr <~ ")"
def term : Parser[Double] = factor ~ rep( "*" ~ factor | "/" ~ factor) ^^ {
case number ~ list => (number /: list) {
case (x, "*" ~ y) => x * y
case (x, "/" ~ y) => x / y
}
}
def expr : Parser[Double] = term ~ rep("+" ~ log(term)("Plus term") | "-" ~ log(term)("Minus term")) ^^ {
case number ~ list => list.foldLeft(number) { // same as before, using alternate name for /:
case (x, "+" ~ y) => x + y
case (x, "-" ~ y) => x - y
}
}
def apply(input: String): Double = parseAll(expr, input) match {
case Success(result, _) => result
case failure : NoSuccess => scala.sys.error(failure.msg)
}
}
When parsing the expression 5 - 4 - 2, it treats it as (5 - 4) - 2 and returns -1. How can I change this parser to be right-associative so that it evaluates 5 - (4 - 2) and returns 3?

I have managed to make the grammar right-associative:
trait Tree
case class Node(op: String, left: Tree, right: Tree) extends Tree
case class Leaf(value: BigInt) extends Tree
class ExpressionParser extends RegexParsers {
var result : BigInt = 0
def number: Parser[Tree] = """-?\d+""".r ^^ { s => Leaf(BigInt(s))}
def expr : Parser[Tree] = (term ~ ("+" | "-") ~ expr ^^ {
case ((x ~ op) ~ y) => Node(op, x, y)
}) | term
def term : Parser[Tree] = (factor ~ ("*" | "/") ~ term ^^ {
case ((x ~ op) ~ y) => Node(op, x, y)
}) | factor
def factor : Parser[Tree] = number | "(" ~> expr <~ ")"
}
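The grammar above only builds a tree; it does not compute a value. As a minimal sketch of how the result could be evaluated, the block below repeats the Tree, Node and Leaf definitions and adds a hypothetical eval helper (not part of the original answer). The two sample trees are written by hand to show how the grouping changes the result of 5 - 4 - 2.

```scala
sealed trait Tree
case class Node(op: String, left: Tree, right: Tree) extends Tree
case class Leaf(value: BigInt) extends Tree

object TreeEval {
  // Hypothetical helper: walks the tree bottom-up and applies each operator.
  def eval(t: Tree): BigInt = t match {
    case Leaf(v)         => v
    case Node("+", l, r) => eval(l) + eval(r)
    case Node("-", l, r) => eval(l) - eval(r)
    case Node("*", l, r) => eval(l) * eval(r)
    case Node("/", l, r) => eval(l) / eval(r)
    case Node(op, _, _)  => sys.error(s"unknown operator: $op")
  }

  // 5 - (4 - 2): the shape a right-recursive rule such as `term ~ op ~ expr` produces.
  val rightAssoc: Tree = Node("-", Leaf(5), Node("-", Leaf(4), Leaf(2)))

  // (5 - 4) - 2: the shape a left fold over `term ~ rep(op ~ term)` produces.
  val leftAssoc: Tree = Node("-", Node("-", Leaf(5), Leaf(4)), Leaf(2))
}
```

TreeEval.eval(TreeEval.rightAssoc) gives 3, while TreeEval.eval(TreeEval.leftAssoc) gives -1.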

Related

Why can the expr parser only parse its first alternative?

I have a parser:
package app
import scala.util.parsing.combinator._
class MyParser extends JavaTokenParsers {
import MyParser._
def expr =
plus | sub | multi | divide | num
def num = floatingPointNumber ^^ (x => Value(x.toDouble).e)
def plus = num ~ rep("+" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e) {
(x, y) => Operation("+", x, y)
}
}
def sub = num ~ rep("-" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e){
(x, y) => Operation("-", x, y)
}
}
def multi = num ~ rep("*" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e){
(x, y) => Operation("*", x, y)
}
}
def divide = num ~ rep("/" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e){
(x, y) => Operation("/", x, y)
}
}
}
object MyParser {
sealed trait Expr {
def e = this.asInstanceOf[Expr]
def compute: Double = this match {
case Value(x) => x
case Operation(op, left, right) => (op : @unchecked) match {
case "+" => left.compute + right.compute
case "-" => left.compute - right.compute
case "*" => left.compute * right.compute
case "/" => left.compute / right.compute
}
}
}
case class Value(x: Double) extends Expr
case class Operation(op: String, left: Expr, right: Expr) extends Expr
}
and I use it to parse an expression:
package app
object Runner extends App {
val p = new MyParser
println(p.parseAll(p.expr, "1 * 11"))
}
it prints
[1.3] failure: end of input expected
1 * 11
^
but if I parse the expression 1 + 11, it succeeds:
[1.7] parsed: Operation(+,Value(1.0),Value(11.0))
I can also parse input with the plus, multi, divide, num and sub combinators individually, but the expr combinator can only parse what the first alternative of its | chain accepts.
So why does expr only handle its first alternative? And how can I change the parser definitions so that parsing succeeds?
The problem is that you're using rep which matches zero or more times.
def rep[T](p: => Parser[T]): Parser[List[T]] = rep1(p) | success(List())
You need to use rep1 instead, which requires at least one match.
If you replace all rep with rep1, your code works.
Check out the changes on scastie
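To make the difference between rep and rep1 concrete, here is a small self-contained model. It is a sketch only and not the library's implementation: the type P and the helpers tok, seq, rep, rep1, or, parseAll, opChain and expr are all made up for illustration, and the input is assumed to be already split into tokens. It shows that a zero-or-more chain succeeds after consuming just the first number, so the ordered choice never reaches the later alternatives.

```scala
object ToyChoice {
  // Toy parser over pre-split tokens: Some((result, remaining)) on success, None on failure.
  type P[A] = List[String] => Option[(A, List[String])]

  def tok(s: String): P[String] = {
    case h :: t if h == s => Some((h, t))
    case _                => None
  }

  val num: P[String] = {
    case h :: t if h.nonEmpty && h.forall(_.isDigit) => Some((h, t))
    case _                                           => None
  }

  // Sequence: run p, then q on what p left over.
  def seq[A, B](p: P[A], q: P[B]): P[(A, B)] = in =>
    p(in).flatMap { case (a, r1) => q(r1).map { case (b, r2) => ((a, b), r2) } }

  // Zero or more repetitions: this can never fail.
  def rep[A](p: P[A]): P[List[A]] = in =>
    p(in) match {
      case Some((a, rest)) => rep(p)(rest).map { case (xs, r) => (a :: xs, r) }
      case None            => Some((Nil, in))
    }

  // One or more repetitions: fails if the first attempt fails.
  def rep1[A](p: P[A]): P[List[A]] = in =>
    seq(p, rep(p))(in).map { case ((a, xs), r) => (a :: xs, r) }

  // Ordered choice: q is consulted only when p fails.
  def or[A](p: P[A], q: => P[A]): P[A] = in => p(in).orElse(q(in))

  // Succeeds only if the whole input was consumed.
  def parseAll[A](p: P[A])(in: List[String]): Option[A] =
    p(in).collect { case (a, Nil) => a }

  // num followed by (op num)*, or (op num)+ when atLeastOne is set.
  def opChain(op: String, atLeastOne: Boolean): P[String] = {
    val operand: P[String] = in => seq(tok(op), num)(in).map { case ((_, n), r) => (n, r) }
    val tail: P[List[String]] = if (atLeastOne) rep1(operand) else rep(operand)
    in => seq(num, tail)(in).map { case ((first, rest), r) => ((first :: rest).mkString(s" $op "), r) }
  }

  def expr(atLeastOne: Boolean): P[String] =
    or(opChain("+", atLeastOne), or(opChain("*", atLeastOne), num))
}
```

With atLeastOne = false, parsing List("1", "*", "11") fails: the "+" chain succeeds with zero repetitions and leaves "*" and "11" unconsumed. With atLeastOne = true, the "+" chain fails outright, so the choice moves on to the "*" chain and the whole input is consumed.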
Run an experiment:
println(p.parseAll(p.expr, "1 + 11"))
println(p.parseAll(p.expr, "1 - 11"))
println(p.parseAll(p.expr, "1 * 11"))
println(p.parseAll(p.expr, "1 / 11"))
What will happen?
[1.7] parsed: Operation(+,Value(1.0),Value(11.0))
[1.3] failure: end of input expected
1 - 11
^
[1.3] failure: end of input expected
1 * 11
^
[1.3] failure: end of input expected
1 / 11
The + case is parsed, but everything else fails. Let's change the definition of expr:
def expr =
multi | plus | sub | divide | num
[1.3] failure: end of input expected
1 + 11
^
[1.3] failure: end of input expected
1 - 11
^
[1.7] parsed: Operation(*,Value(1.0),Value(11.0))
[1.3] failure: end of input expected
1 / 11
^
With multi moved to the front, the * case passes, but + fails.
def expr =
num | multi | plus | sub | divide
[1.3] failure: end of input expected
1 + 11
^
[1.3] failure: end of input expected
1 - 11
^
[1.3] failure: end of input expected
1 * 11
^
[1.3] failure: end of input expected
1 / 11
With num as the first case everything fails. It is apparent now that this code
num | multi | plus | sub | divide
does NOT match if any of its parts matches, but only if the first one does.
What do the docs say about it?
/** A parser combinator for alternative composition.
*
* `p | q` succeeds if `p` succeeds or `q` succeeds.
* Note that `q` is only tried if `p`s failure is non-fatal (i.e., back-tracking is allowed).
*
* @param q a parser that will be executed if `p` (this parser) fails (and allows back-tracking)
* @return a `Parser` that returns the result of the first parser to succeed (out of `p` and `q`)
* The resulting parser succeeds if (and only if)
* - `p` succeeds, ''or''
* - if `p` fails allowing back-tracking and `q` succeeds.
*/
def | [U >: T](q: => Parser[U]): Parser[U] = append(q).named("|")
Important note: backtracking has to be allowed. If it isn't, a failure in the first parser fails the whole alternative without trying the second parser at all.
How do you make your parser backtrack? Well, you would have to use PackratParsers, as this is the only parser in the library that supports backtracking. Or rewrite your code so that it does not rely on backtracking in the first place.
Personally, I recommend not using Scala Parser Combinators and instead using a library where you explicitly decide when you can still backtrack and when you should not allow it, e.g. fastparse.

Scala pattern matching with disjunctions not working

I am learning Scala and don't understand why the following is not working.
I want to refactor a (tested) mergeAndCount function which is part of a counting inversions algorithm to utilize pattern matching. Here is the unrefactored method:
def mergeAndCount(b: Vector[Int], c: Vector[Int]): (Int, Vector[Int]) = {
if (b.isEmpty && c.isEmpty)
(0, Vector())
else if (!b.isEmpty && (c.isEmpty || b.head < c.head)) {
val (count, r) = mergeAndCount(b drop 1, c)
(count, b.head +: r)
} else {
val (count, r) = mergeAndCount(b, c drop 1)
(count + b.length, c.head +: r)
}
}
Here is my refactored method mergeAndCount2, which works fine.
def mergeAndCount2(b: Vector[Int], c: Vector[Int]): (Int, Vector[Int]) = (b, c) match {
case (Vector(), Vector()) =>
(0, Vector())
case (bh +: br, Vector()) =>
val (count, r) = mergeAndCount2(br, c)
(count, bh +: r)
case (bh +: br, ch +: cr) if bh < ch =>
val (count, r) = mergeAndCount2(br, c)
(count, bh +: r)
case (_, ch +: cr) =>
val (count, r) = mergeAndCount2(b, cr)
(count + b.length, ch +: r)
}
However, as you can see, the second and third cases contain duplicate code. I therefore wanted to combine them using a disjunction like this:
case (bh +: br, Vector()) | (bh +: br, ch +: cr) if bh < ch =>
val (count, r) = mergeAndCount2(br, c)
(count, bh +: r)
This gives me an error though (on the case line): illegal variable in pattern alternative.
What am I doing wrong?
Any help (also on style) is greatly appreciated.
Update: thanks to your suggestions here is my result:
@tailrec
def mergeAndCount3(b: Vector[Int], c: Vector[Int], acc : (Int, Vector[Int])): (Int, Vector[Int]) = (b, c) match {
case (Vector(), Vector()) =>
acc
case (bh +: br, _) if c.isEmpty || bh < c.head =>
mergeAndCount3(br, c, (acc._1, acc._2 :+ bh))
case (_, ch +: cr) =>
mergeAndCount3(b, cr, (acc._1 + b.length, acc._2 :+ ch))
}
When pattern matching with pipe (|) you are not allowed to bind any variable other than wildcard (_).
This is easy to understand: in the body of your case, what would be the actual type of bh or br for example if your two alternatives match different types?
Edit - from the scala reference:
8.1.11 Pattern Alternatives
Syntax: Pattern ::= Pattern1 { ‘|’ Pattern1 }
A pattern alternative p1 | ... | pn consists of a number of alternative patterns pi. All alternative patterns are type checked with the expected type of the pattern. They may not bind variables other than wildcards. The alternative pattern matches a value v if at least one of its alternatives matches v.
Edit after first comment - you can use the wildcard to match something like this for example:
try {
...
} catch {
case (_: NullPointerException | _: IllegalArgumentException) => ...
}
If you think about it, looking at your case clause, how should the compiler know whether the case body is allowed to use ch and cr or not?
Questions like this make it very hard for the compiler to support disjunction and variable binding in the same case clause, so it is not allowed at all.
Your mergeAndCount2 function looks quite fine with respect to pattern matching. I think its most evident problem is that it is not tail-recursive and thus does not run in constant stack space. If you can solve this problem, you will probably end up with something that is less repetitive as well.
You can rewrite the case expression and move the disjunction to the if part
case (bh +: br, cr) if cr.isEmpty || bh < cr.head =>
val (count, r) = mergeAndCount2(br, c)
(count, bh +: r)
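As a sketch of how the guard-based case fits into the whole function, here is a complete variant with the two duplicate cases merged. The wrapping object Merge is an assumption added only to make the snippet self-contained; the guard tests c directly instead of binding a second name for it.

```scala
object Merge {
  // Returns (number of inversions between b and c, merged vector); b and c must be sorted.
  def mergeAndCount(b: Vector[Int], c: Vector[Int]): (Int, Vector[Int]) = (b, c) match {
    case (Vector(), Vector()) =>
      (0, Vector())
    // The disjunction lives in the guard, so bh and br are bound exactly once.
    case (bh +: br, _) if c.isEmpty || bh < c.head =>
      val (count, r) = mergeAndCount(br, c)
      (count, bh +: r)
    // Taking from c: every element still left in b forms an inversion with ch.
    case (_, ch +: cr) =>
      val (count, r) = mergeAndCount(b, cr)
      (count + b.length, ch +: r)
  }
}
```

For example, Merge.mergeAndCount(Vector(1, 3, 5), Vector(2, 4)) yields (3, Vector(1, 2, 3, 4, 5)): the inversions are (3, 2), (5, 2) and (5, 4).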
Update:
You can simplify it a little further:
import scala.annotation.tailrec
@tailrec
def mergeAndCount3(b: Vector[Int], c: Vector[Int],
count: Int = 0, r: Vector[Int] = Vector()): (Int, Vector[Int]) =
(b, c) match {
case (bh +: br, _) if c.isEmpty || bh < c.head =>
mergeAndCount3(br, c, count, r :+ bh) // append, so the accumulator stays in merged order
case (_, ch +: cr) =>
mergeAndCount3(b, cr, count + b.length, r :+ ch)
case _ => (count, r) // both inputs exhausted
}

Efficient scalar map / for idiom in Scala?

What is the most concise and bytecode efficient way to access a scalar expression multiple times from deep within another expression?
All of the functions in the following code (except scalar4) work as desired. But only bytecoder emits efficient bytecode (although it ends badly with ISTORE 2 ILOAD 2); the others each generate half a dozen INVOKEs.
This idiom is also handy for passing arbitrary parts of a tuple as parameters:
for (a_tuple) { f(_._3, _._1) + g(_._2) } // caution NOT legal Scala
In this example intro represents an expensive function that should only be called once.
object Hack extends App
{
@inline final def fur[T, V](x :T)(f :T => V) :V = f(x)
@inline final def pfor[T, V](x :T)(pf :PartialFunction[T, V]) = pf(x)
@inline final def cfor[T, V](x :T)(f :T => V) :V = x match { case x => f(x) }
def intro :Int = 600 // only one chance to make a first impression
def bytecoder = intro match { case __ => __ + __ / 600 }
def functional = fur(intro) (x => x + x / 600)
def partial = pfor(intro) { case __ => __ + __ / 600 }
def cased = cfor(intro) ($ => $ + $ / 600)
def optional = Some(intro).map(? => ? + ? / 600).get
def folder = Some(intro).fold(0)(? => ? + ? / 600)
// the for I wish for
def scalar4 = for(intro) (_ + _ / 600) // single underline!
println(bytecoder, functional, partial, cased, optional, folder)
}
public bytecoder()I
ALOAD 0
INVOKEVIRTUAL com/_601/hack/Hack$.intro ()I
ISTORE 1
ILOAD 1
ILOAD 1
SIPUSH 600
IDIV
IADD
ISTORE 2
ILOAD 2
IRETURN
Just create a local block with a temporary val. Seriously. It's compact: just one character longer than the "idiomatic" pipe:
{ val x = whatever; x * x / 600 }
whatever match { case x => x * x / 600 }
whatever |> { x => x * x / 600 }
It's efficient: minimum bytecode possible.
// def localval = { val x = whatever; x * x / 600 }
public int localval();
Code:
0: aload_0
1: invokevirtual #18; //Method whatever:()I
4: istore_1
5: iload_1
6: iload_1
7: imul
8: sipush 600
11: idiv
12: ireturn
The only thing it doesn't do is act as a postfix operator, and you have match for that when you really need that form and can't tolerate extra bytecode.
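A runnable comparison of the three forms, as a sketch: the |> operator used above is not part of the standard library, so this block substitutes pipe from scala.util.chaining, which assumes Scala 2.13 or later. The object name Once and the calls counter are made up for illustration; the counter is there only to show that intro is evaluated once per expression.

```scala
import scala.util.chaining._

object Once {
  var calls = 0
  // Stand-in for an expensive function; counts how often it is evaluated.
  def intro: Int = { calls += 1; 600 }

  // Local block with a temporary val.
  def localVal: Int = { val x = intro; x + x / 600 }

  // Postfix form using match.
  def matched: Int = intro match { case x => x + x / 600 }

  // Library pipe (Scala 2.13+); goes through a function object, unlike the two forms above.
  def piped: Int = intro.pipe(x => x + x / 600)
}
```

Each of the three methods returns 601 and evaluates intro exactly once per call.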
// Canadian scalar "for" expression
@inline final case class four[T](x: T)
{
@inline def apply(): T = x
@inline def apply[V](f: Function1[T, V]): V = f(x)
@inline def apply[V](f: Function2[T, T, V]): V = { val $ = x; f($, $) }
@inline def apply[V](f: Function3[T, T, T, V]): V = { val $ = x; f($, $, $) }
@inline def apply[V](f: Function4[T, T, T, T, V]): V = { val $ = x; f($, $, $, $) }
// ...
}
// Usage
val x = System.currentTimeMillis.toInt % 1 + 600
def a = four(x)() + 1
def b = four(x)(_ + 1)
def c = four(x)(_ + _ / x)
def d = four(x)(_ + _ / _)
def e = four(x)(_ + _ / _ - _) + 600
println(a, b, c, d, e)
With this four(){}, bytecode and performance are sacrificed in favour of style.
Also, this dangerously breaks with the tradition that each underscore is used only once per parameter.

Operator Precedence with Scala Parser Combinators

I am working on parsing logic that needs to take operator precedence into consideration. My needs are not too complex. To start with, I need multiplication and division to take higher precedence than addition and subtraction.
For example: 1 + 2 * 3 should be treated as 1 + (2 * 3). This is a simple example but you get the point!
[There are a couple more custom tokens that I need to add to the precedence logic, which I may be able to add based on the suggestions I receive here.]
Here is one example of dealing with operator precedence: http://jim-mcbeath.blogspot.com/2008/09/scala-parser-combinators.html#precedencerevisited.
Are there any other ideas?
This is a bit simpler than Jim McBeath's example, but it does what you say you need, i.e. correct arithmetic precedence, and it also allows for parentheses. I adapted the example from Programming in Scala to get it to actually do the calculation and provide the answer.
It should be quite self-explanatory. There is a hierarchy formed by saying that an expr consists of terms interspersed with operators, terms consist of factors with operators, and factors are floating point numbers or expressions in parentheses.
import scala.util.parsing.combinator.JavaTokenParsers
class Arith extends JavaTokenParsers {
type D = Double
def expr: Parser[D] = term ~ rep(plus | minus) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def plus: Parser[D=>D] = "+" ~ term ^^ {case "+"~b => _ + b}
def minus: Parser[D=>D] = "-" ~ term ^^ {case "-"~b => _ - b}
def term: Parser[D] = factor ~ rep(times | divide) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def times: Parser[D=>D] = "*" ~ factor ^^ {case "*"~b => _ * b }
def divide: Parser[D=>D] = "/" ~ factor ^^ {case "/"~b => _ / b}
def factor: Parser[D] = fpn | "(" ~> expr <~ ")"
def fpn: Parser[D] = floatingPointNumber ^^ (_.toDouble)
}
object Main extends Arith with App {
val input = "(1 + 2 * 3 + 9) * 2 + 1"
println(parseAll(expr, input).get) // prints 33.0
}
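The least obvious part of the parser above is probably (a /: b)((acc, f) => f(acc)): plus, minus, times and divide each return a function Double => Double that already holds its right operand, and the fold applies those functions from left to right. The sketch below shows that step on its own, without any parser, using foldLeft (the named equivalent of /:, which is deprecated as of Scala 2.13). The object name FoldOps and the sample values are made up for illustration.

```scala
object FoldOps {
  // What rep(plus | minus) would hand back for "1 + 6 + 9" after the first term 1.0.
  val addOps: List[Double => Double] = List(_ + 6.0, _ + 9.0)
  val sum: Double = addOps.foldLeft(1.0)((acc, f) => f(acc))   // (1 + 6) + 9

  // Same shape for "5 - 4 - 2": applying left to right gives (5 - 4) - 2.
  val subOps: List[Double => Double] = List(_ - 4.0, _ - 2.0)
  val diff: Double = subOps.foldLeft(5.0)((acc, f) => f(acc))
}
```

FoldOps.sum is 16.0 and FoldOps.diff is -1.0, which also shows that this style of parser is left-associative.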

Scala. Difficult expression parser. OutOfMemoryError

I would like to create a parser for complex expressions that respects the order of operations. I have an example, but it runs very slowly and throws an OutOfMemoryError. How can I improve it?
def expr: Parser[Expression] = term5
def term5: Parser[Expression] =
(term4 ~ "OR" ~ term4) ^^ { case lhs~o~rhs => BinaryOp("OR", lhs, rhs) } |
term4
def term4: Parser[Expression] =
(term3 ~ "AND" ~ term3) ^^ { case lhs~a~rhs => BinaryOp("AND", lhs, rhs) } |
term3
def term3: Parser[Expression] =
(term2 ~ "<>" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) } |
(term2 ~ "=" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) } |
(term2 ~ "NE" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) } |
(term2 ~ "EQ" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) } |
term2
def term2: Parser[Expression] =
(term1 ~ "<" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) } |
(term1 ~ ">" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) } |
(term1 ~ "<=" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) } |
(term1 ~ ">=" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) } |
(term1 ~ "LT" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) } |
(term1 ~ "GT" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) } |
(term1 ~ "LE" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) } |
(term1 ~ "GE" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) } |
term1
def term1: Parser[Expression] =
(term ~ "+" ~ term) ^^ { case lhs~plus~rhs => BinaryOp("+", lhs, rhs) } |
(term ~ "-" ~ term) ^^ { case lhs~minus~rhs => BinaryOp("-", lhs, rhs) } |
(term ~ ":" ~ term) ^^ { case lhs~concat~rhs => BinaryOp(":", lhs, rhs) } |
term
def term: Parser[Expression] =
(factor ~ "*" ~ factor) ^^ { case lhs~times~rhs => BinaryOp("*", lhs, rhs) } |
(factor ~ "/" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("/", lhs, rhs) } |
(factor ~ "MOD" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("MOD", lhs, rhs) } |
factor
def factor: Parser[Expression] =
"(" ~> expr <~ ")" |
("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) } |
function |
numericLit ^^ { case x => Number(x/*.toFloat*/) } |
stringLit ^^ { case s => Literal(s) } |
ident ^^ { case id => Variable(id) }
Basically, it's slow and consumes too much memory because your grammar is incredibly inefficient.
Let's consider the second line: B = A:(1+2). It will try to parse this line like this:
term4 OR term4 and then term4.
term3 AND term3 and then term3.
term2 <> term2, then =, then NE then EQ and then term2.
term1 8 different operators term1, then term1.
term + term, term - term, term : term and then term.
factor * factor, factor / factor, factor MOD factor and then factor.
parenthesis expression, unary factor, function, numeric literal, string literal, ident.
The first try is like this:
ident * factor + term < term1 <> term2 AND term3 OR term4
I'm skipping parentheses, unary, function, numeric and string literals because they don't match A (though function probably does, but its definition isn't available). Now, since : doesn't match *, it will try next:
ident / factor + term < term1 <> term2 AND term3 OR term4
ident MOD factor + term < term1 <> term2 AND term3 OR term4
ident + term < term1 <> term2 AND term3 OR term4
Now it goes to the next term1:
ident * factor - term < term1 <> term2 AND term3 OR term4
ident / factor - term < term1 <> term2 AND term3 OR term4
ident MOD factor - term < term1 <> term2 AND term3 OR term4
ident - term < term1 <> term2 AND term3 OR term4
And next:
ident * factor : term < term1 <> term2 AND term3 OR term4
ident / factor : term < term1 <> term2 AND term3 OR term4
ident MOD factor : term < term1 <> term2 AND term3 OR term4
ident : term < term1 <> term2 AND term3 OR term4
Aha! We finally got a match on term1! But ( doesn't match <, so it has to try the next term2:
ident * factor + term > term1 <> term2 AND term3 OR term4
etc...
All because the first term in each line for each term will always match! To match a simple number it has to parse factor 2 * 2 * 5 * 9 * 4 * 4 = 2880 times!
But that's not even half of the story! You see, because termX appears twice in each rule, it will repeat all this work on both sides. For example, the first match for A:(1+2) is this:
ident : term < term1 <> term2 AND term3 OR term4
where ident = A
and term = (1+2)
Which is incorrect, so it will try > instead of <, and then <=, etc, etc.
I'm putting a logging version of this parser below. Try to run it and see all the things it is trying to parse.
Meanwhile, there are good examples of how to write these parsers available. Using sbaz, try:
sbaz install scala-devel-docs
Then look inside the doc/scala-devel-docs/examples/parsing directory of the Scala distribution and you'll find several examples.
Here's a version of your parser (without function) that logs everything it tries:
sealed trait Expression
case class Variable(id: String) extends Expression
case class Literal(s: String) extends Expression
case class Number(x: String) extends Expression
case class UnaryOp(op: String, rhs: Expression) extends Expression
case class BinaryOp(op: String, lhs: Expression, rhs: Expression) extends Expression
object TestParser extends scala.util.parsing.combinator.syntactical.StdTokenParsers {
import scala.util.parsing.combinator.lexical.StdLexical
type Tokens = StdLexical
val lexical = new StdLexical
lexical.delimiters ++= List("(", ")", "+", "-", "*", "/", "=", "OR", "AND", "NE", "EQ", "LT", "GT", "LE", "GE", ":", "MOD")
def stmts: Parser[Any] = log(expr.*)("stmts")
def stmt: Parser[Expression] = log(expr <~ "\n")("stmt")
def expr: Parser[Expression] = log(term5)("expr")
def term5: Parser[Expression] = (
log((term4 ~ "OR" ~ term4) ^^ { case lhs~o~rhs => BinaryOp("OR", lhs, rhs) })("term5 OR")
| log(term4)("term5 term4")
)
def term4: Parser[Expression] = (
log((term3 ~ "AND" ~ term3) ^^ { case lhs~a~rhs => BinaryOp("AND", lhs, rhs) })("term4 AND")
| log(term3)("term4 term3")
)
def term3: Parser[Expression] = (
log((term2 ~ "<>" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) })("term3 <>")
| log((term2 ~ "=" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) })("term3 =")
| log((term2 ~ "NE" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) })("term3 NE")
| log((term2 ~ "EQ" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) })("term3 EQ")
| log(term2)("term3 term2")
)
def term2: Parser[Expression] = (
log((term1 ~ "<" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) })("term2 <")
| log((term1 ~ ">" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) })("term2 >")
| log((term1 ~ "<=" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) })("term2 <=")
| log((term1 ~ ">=" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) })("term2 >=")
| log((term1 ~ "LT" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) })("term2 LT")
| log((term1 ~ "GT" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) })("term2 GT")
| log((term1 ~ "LE" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) })("term2 LE")
| log((term1 ~ "GE" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) })("term2 GE")
| log(term1)("term2 term1")
)
def term1: Parser[Expression] = (
log((term ~ "+" ~ term) ^^ { case lhs~plus~rhs => BinaryOp("+", lhs, rhs) })("term1 +")
| log((term ~ "-" ~ term) ^^ { case lhs~minus~rhs => BinaryOp("-", lhs, rhs) })("term1 -")
| log((term ~ ":" ~ term) ^^ { case lhs~concat~rhs => BinaryOp(":", lhs, rhs) })("term1 :")
| log(term)("term1 term")
)
def term: Parser[Expression] = (
log((factor ~ "*" ~ factor) ^^ { case lhs~times~rhs => BinaryOp("*", lhs, rhs) })("term *")
| log((factor ~ "/" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("/", lhs, rhs) })("term /")
| log((factor ~ "MOD" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("MOD", lhs, rhs) })("term MOD")
| log(factor)("term factor")
)
def factor: Parser[Expression] = (
log("(" ~> expr <~ ")")("factor (expr)")
| log(("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) })("factor +-")
//| function |
| log(numericLit ^^ { case x => Number(x/*.toFloat*/) })("factor numericLit")
| log(stringLit ^^ { case s => Literal(s) })("factor stringLit")
| log(ident ^^ { case id => Variable(id) })("factor ident")
)
def parse(s: String) = stmts(new lexical.Scanner(s))
}
My first improvement looked like this:
def term3: Parser[Expression] =
log((term2 ~ ("<>" | "=" | "NE" | "EQ") ~ term2) ^^ { case lhs~op~rhs => BinaryOp(op, lhs, rhs) })("term3 <>,=,NE,EQ") |
log(term2)("term3 term2")
It works without an OutOfMemoryError but is too slow. After viewing doc/scala-devel-docs/examples/parsing/lambda/TestParser.scala I arrived at this source:
def expr: Parser[Expression] = term5
def term5: Parser[Expression] =
log(chainl1(term4, term5, "OR" ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term5 OR")
def term4: Parser[Expression] =
log(chainl1(term3, term4, "AND" ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term4 AND")
def term3: Parser[Expression] =
log(chainl1(term2, term3, ("<>" | "=" | "NE" | "EQ") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term3 <>,=,NE,EQ")
def term2: Parser[Expression] =
log(chainl1(term1, term2, ("<" | ">" | "<=" | ">=" | "LT" | "GT" | "LE" | "GE") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term2 <,>,...")
def term1: Parser[Expression] =
log(chainl1(term, term1, ("+" | "-" | ":") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term1 +,-,:")
def term: Parser[Expression] =
log(chainl1(factor, term, ("*" | "/" | "MOD") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term *,/,MOD")
def factor: Parser[Expression] =
log("(" ~> expr <~ ")")("factor ()") |
log(("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) })("factor unary") |
log(function)("factor function") |
log(numericLit ^^ { case x => Number(x/*.toFloat*/) })("factor numLit") |
log(stringLit ^^ { case s => Literal(s) })("factor strLit") |
log(ident ^^ { case id => Variable(id) })("factor ident")
It works fast. I'm sorry, but I cannot understand how the chainl1 function improves my source. I don't understand how it works.
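As a rough model of what chainl1 does (a sketch under stated assumptions, not the library source): it parses one operand, then keeps parsing operator/operand pairs for as long as they match, combining the results from left to right. Each operand is parsed once. By contrast, a rule like (term4 ~ "OR" ~ term4) | term4 parses term4, and if no "OR" follows it throws that work away and parses term4 again, and this repeats at every precedence level. The type P and the helpers num, binOp, or and loop below are made up for illustration and work on pre-split tokens.

```scala
import scala.annotation.tailrec

object ChainSketch {
  // Toy parser over tokens: Some((result, remaining)) on success, None on failure.
  type P[A] = List[String] => Option[(A, List[String])]

  val num: P[Int] = {
    case h :: t if h.nonEmpty && h.forall(_.isDigit) => Some((h.toInt, t))
    case _                                           => None
  }

  // Recognises one operator token and yields the function that combines two operands.
  def binOp(sym: String)(f: (Int, Int) => Int): P[(Int, Int) => Int] = {
    case h :: t if h == sym => Some((f, t))
    case _                  => None
  }

  def or[A](p: P[A], q: => P[A]): P[A] = in => p(in).orElse(q(in))

  // One operand first, then zero or more (operator, operand) pairs.
  def chainl1[A](operand: P[A], op: P[(A, A) => A]): P[A] = in =>
    operand(in).map { case (first, rest) => loop(first, rest, operand, op) }

  // Folds to the left: the accumulator is always the left argument of the next operator.
  @tailrec
  private def loop[A](acc: A, in: List[String], operand: P[A], op: P[(A, A) => A]): (A, List[String]) = {
    val step = op(in).flatMap { case (f, r1) => operand(r1).map { case (b, r2) => (f(acc, b), r2) } }
    step match {
      case Some((next, rest)) => loop(next, rest, operand, op)
      case None               => (acc, in) // no further operator: stop, keep what was parsed
    }
  }

  // Two precedence levels, each expressed as a chain over the next tighter level.
  val term: P[Int] = chainl1[Int](num, or(binOp("*")(_ * _), binOp("/")(_ / _)))
  val expr: P[Int] = chainl1[Int](term, or(binOp("+")(_ + _), binOp("-")(_ - _)))
}
```

ChainSketch.expr(List("1", "+", "2", "*", "3")) returns Some((7, Nil)), and ChainSketch.expr(List("5", "-", "4", "-", "2")) returns Some((-1, Nil)), i.e. (5 - 4) - 2.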