Scala. Difficult expression parser. OutOfMemoryError

I would like to create a parser for complex expressions that respects the order of operations. I have an example, but it runs very slowly and throws an OutOfMemoryError. How can I improve it?
def expr: Parser[Expression] = term5
def term5: Parser[Expression] =
(term4 ~ "OR" ~ term4) ^^ { case lhs~o~rhs => BinaryOp("OR", lhs, rhs) } |
term4
def term4: Parser[Expression] =
(term3 ~ "AND" ~ term3) ^^ { case lhs~a~rhs => BinaryOp("AND", lhs, rhs) } |
term3
def term3: Parser[Expression] =
(term2 ~ "<>" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) } |
(term2 ~ "=" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) } |
(term2 ~ "NE" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) } |
(term2 ~ "EQ" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) } |
term2
def term2: Parser[Expression] =
(term1 ~ "<" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) } |
(term1 ~ ">" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) } |
(term1 ~ "<=" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) } |
(term1 ~ ">=" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) } |
(term1 ~ "LT" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) } |
(term1 ~ "GT" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) } |
(term1 ~ "LE" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) } |
(term1 ~ "GE" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) } |
term1
def term1: Parser[Expression] =
(term ~ "+" ~ term) ^^ { case lhs~plus~rhs => BinaryOp("+", lhs, rhs) } |
(term ~ "-" ~ term) ^^ { case lhs~minus~rhs => BinaryOp("-", lhs, rhs) } |
(term ~ ":" ~ term) ^^ { case lhs~concat~rhs => BinaryOp(":", lhs, rhs) } |
term
def term: Parser[Expression] =
(factor ~ "*" ~ factor) ^^ { case lhs~times~rhs => BinaryOp("*", lhs, rhs) } |
(factor ~ "/" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("/", lhs, rhs) } |
(factor ~ "MOD" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("MOD", lhs, rhs) } |
factor
def factor: Parser[Expression] =
"(" ~> expr <~ ")" |
("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) } |
function |
numericLit ^^ { case x => Number(x/*.toFloat*/) } |
stringLit ^^ { case s => Literal(s) } |
ident ^^ { case id => Variable(id) }

Basically, it's slow and consumes too much memory because your grammar is incredibly inefficient.
Let's consider the second line: B = A:(1+2). It will try to parse this line like this:
term4 OR term4 and then term4.
term3 AND term3 and then term3.
term2 <> term2, then =, then NE then EQ and then term2.
term1 8 different operators term1, then term1.
term + term, term - term, term : term and then term.
factor * factor, factor / factor, factor MOD factor and then factor.
parenthesis expression, unary factor, function, numeric literal, string literal, ident.
The first try is like this:
ident * factor + term < term1 <> term2 AND term3 OR term4
I'm skipping parentheses, unary, function, numeric and string literals because they don't match A (though function probably does, but its definition isn't available). Now, since : doesn't match *, it will try next:
ident / factor + term < term1 <> term2 AND term3 OR term4
ident MOD factor + term < term1 <> term2 AND term3 OR term4
ident + term < term1 <> term2 AND term3 OR term4
Now it goes to the next term1:
ident * factor - term < term1 <> term2 AND term3 OR term4
ident / factor - term < term1 <> term2 AND term3 OR term4
ident MOD factor - term < term1 <> term2 AND term3 OR term4
ident - term < term1 <> term2 AND term3 OR term4
And next:
ident * factor : term < term1 <> term2 AND term3 OR term4
ident / factor : term < term1 <> term2 AND term3 OR term4
ident MOD factor : term < term1 <> term2 AND term3 OR term4
ident : term < term1 <> term2 AND term3 OR term4
Aha! We finally got a match on term1! But ( doesn't match <, so it has to try the next term2:
ident * factor + term > term1 <> term2 AND term3 OR term4
etc...
All because the first term in each line for each term will always match! To match a simple number it has to parse factor 2 * 2 * 5 * 9 * 4 * 4 = 2880 times!
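The 2880 figure is the product of the number of alternatives at each precedence level, from term5 down to term. A minimal sketch of that arithmetic (the object and value names are made up for illustration):

```scala
object AttemptCount {
  // Alternatives per level in the original grammar:
  // term5 = 2, term4 = 2, term3 = 5, term2 = 9, term1 = 4, term = 4
  val alternativesPerLevel: List[Int] = List(2, 2, 5, 9, 4, 4)

  // Every alternative at one level re-parses everything below it,
  // so the counts multiply.
  val attempts: Int = alternativesPerLevel.product
}
```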
But that's not half of the story! You see, because termX is repeated twice, it will repeat all this stuff on both sides. For example, the first match for A:(1+2) is this:
ident : term < term1 <> term2 AND term3 OR term4
where ident = A
and term = (1+2)
Which is incorrect, so it will try > instead of <, and then <=, etc, etc.
I'm putting a logging version of this parser below. Try to run it and see all the things it is trying to parse.
Meanwhile, there are good examples of how to write these parsers available. Using sbaz, try:
sbaz install scala-devel-docs
Then look inside the doc/scala-devel-docs/examples/parsing directory of the Scala distribution and you'll find several examples.
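One common way to avoid re-parsing the left operand for every operator is to parse it once and make the operator and right operand optional. The following is only a sketch of that idea for a single level, assuming JavaTokenParsers (not the StdTokenParsers used in the question) and a reduced factor that accepts only identifiers; the object name Factored is hypothetical.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object Factored extends JavaTokenParsers {
  sealed trait Expression
  case class Variable(id: String) extends Expression
  case class BinaryOp(op: String, lhs: Expression, rhs: Expression) extends Expression

  // Reduced factor for illustration: identifiers only.
  def factor: Parser[Expression] = ident ^^ (id => Variable(id))

  // The left operand is parsed exactly once; the operator and the
  // right operand are optional, so no alternative repeats the work.
  def term: Parser[Expression] =
    factor ~ opt(("*" | "/") ~ factor) ^^ {
      case lhs ~ Some(op ~ rhs) => BinaryOp(op, lhs, rhs)
      case lhs ~ None           => lhs
    }
}
```

With this shape, parsing "a" and parsing "a * b" both call factor on a only once.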
Here's a version of your parser (without function) that logs everything it tries:
sealed trait Expression
case class Variable(id: String) extends Expression
case class Literal(s: String) extends Expression
case class Number(x: String) extends Expression
case class UnaryOp(op: String, rhs: Expression) extends Expression
case class BinaryOp(op: String, lhs: Expression, rhs: Expression) extends Expression
object TestParser extends scala.util.parsing.combinator.syntactical.StdTokenParsers {
import scala.util.parsing.combinator.lexical.StdLexical
type Tokens = StdLexical
val lexical = new StdLexical
lexical.delimiters ++= List("(", ")", "+", "-", "*", "/", "=", "OR", "AND", "NE", "EQ", "LT", "GT", "LE", "GE", ":", "MOD")
def stmts: Parser[Any] = log(expr.*)("stmts")
def stmt: Parser[Expression] = log(expr <~ "\n")("stmt")
def expr: Parser[Expression] = log(term5)("expr")
def term5: Parser[Expression] = (
log((term4 ~ "OR" ~ term4) ^^ { case lhs~o~rhs => BinaryOp("OR", lhs, rhs) })("term5 OR")
| log(term4)("term5 term4")
)
def term4: Parser[Expression] = (
log((term3 ~ "AND" ~ term3) ^^ { case lhs~a~rhs => BinaryOp("AND", lhs, rhs) })("term4 AND")
| log(term3)("term4 term3")
)
def term3: Parser[Expression] = (
log((term2 ~ "<>" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) })("term3 <>")
| log((term2 ~ "=" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) })("term3 =")
| log((term2 ~ "NE" ~ term2) ^^ { case lhs~ne~rhs => BinaryOp("NE", lhs, rhs) })("term3 NE")
| log((term2 ~ "EQ" ~ term2) ^^ { case lhs~eq~rhs => BinaryOp("EQ", lhs, rhs) })("term3 EQ")
| log(term2)("term3 term2")
)
def term2: Parser[Expression] = (
log((term1 ~ "<" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) })("term2 <")
| log((term1 ~ ">" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) })("term2 >")
| log((term1 ~ "<=" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) })("term2 <=")
| log((term1 ~ ">=" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) })("term2 >=")
| log((term1 ~ "LT" ~ term1) ^^ { case lhs~lt~rhs => BinaryOp("LT", lhs, rhs) })("term2 LT")
| log((term1 ~ "GT" ~ term1) ^^ { case lhs~gt~rhs => BinaryOp("GT", lhs, rhs) })("term2 GT")
| log((term1 ~ "LE" ~ term1) ^^ { case lhs~le~rhs => BinaryOp("LE", lhs, rhs) })("term2 LE")
| log((term1 ~ "GE" ~ term1) ^^ { case lhs~ge~rhs => BinaryOp("GE", lhs, rhs) })("term2 GE")
| log(term1)("term2 term1")
)
def term1: Parser[Expression] = (
log((term ~ "+" ~ term) ^^ { case lhs~plus~rhs => BinaryOp("+", lhs, rhs) })("term1 +")
| log((term ~ "-" ~ term) ^^ { case lhs~minus~rhs => BinaryOp("-", lhs, rhs) })("term1 -")
| log((term ~ ":" ~ term) ^^ { case lhs~concat~rhs => BinaryOp(":", lhs, rhs) })("term1 :")
| log(term)("term1 term")
)
def term: Parser[Expression] = (
log((factor ~ "*" ~ factor) ^^ { case lhs~times~rhs => BinaryOp("*", lhs, rhs) })("term *")
| log((factor ~ "/" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("/", lhs, rhs) })("term /")
| log((factor ~ "MOD" ~ factor) ^^ { case lhs~div~rhs => BinaryOp("MOD", lhs, rhs) })("term MOD")
| log(factor)("term factor")
)
def factor: Parser[Expression] = (
log("(" ~> expr <~ ")")("factor (expr)")
| log(("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) })("factor +-")
//| function |
| log(numericLit ^^ { case x => Number(x/*.toFloat*/) })("factor numericLit")
| log(stringLit ^^ { case s => Literal(s) })("factor stringLit")
| log(ident ^^ { case id => Variable(id) })("factor ident")
)
def parse(s: String) = stmts(new lexical.Scanner(s))
}

My first improvement looked like this:
def term3: Parser[Expression] =
log((term2 ~ ("<>" | "=" | "NE" | "EQ") ~ term2) ^^ { case lhs~op~rhs => BinaryOp(op, lhs, rhs) })("term3 <>,=,NE,EQ") |
log(term2)("term3 term2")
It works without OutOfMemoryError but is too slow. After viewing doc/scala-devel-docs/examples/parsing/lambda/TestParser.scala I got this source:
def expr: Parser[Expression] = term5
def term5: Parser[Expression] =
log(chainl1(term4, term5, "OR" ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term5 OR")
def term4: Parser[Expression] =
log(chainl1(term3, term4, "AND" ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term4 AND")
def term3: Parser[Expression] =
log(chainl1(term2, term3, ("<>" | "=" | "NE" | "EQ") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term3 <>,=,NE,EQ")
def term2: Parser[Expression] =
log(chainl1(term1, term2, ("<" | ">" | "<=" | ">=" | "LT" | "GT" | "LE" | "GE") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term2 <,>,...")
def term1: Parser[Expression] =
log(chainl1(term, term1, ("+" | "-" | ":") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term1 +,-,:")
def term: Parser[Expression] =
log(chainl1(factor, term, ("*" | "/" | "MOD") ^^ {o => (a: Expression, b: Expression) => BinaryOp(o, a, b)}))("term *,/,MOD")
def factor: Parser[Expression] =
log("(" ~> expr <~ ")")("factor ()") |
log(("+" | "-") ~ factor ^^ { case op~rhs => UnaryOp(op, rhs) })("factor unary") |
log(function)("factor function") |
log(numericLit ^^ { case x => Number(x/*.toFloat*/) })("factor numLit") |
log(stringLit ^^ { case s => Literal(s) })("factor strLit") |
log(ident ^^ { case id => Variable(id) })("factor ident")
It works fast. I'm sorry, but I cannot understand how the chainl1 function improves my source. I don't understand how it works.
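As a rough illustration of what chainl1 does, the sketch below compares the two-argument chainl1(p, q) with a hand-written equivalent: parse one operand, then repeat "operator followed by operand" and fold the results from the left. Each operand is parsed once, with no alternatives that start over. The object and parser names (ChainDemo, viaChain, viaRep) are made up, and it uses RegexParsers on integers rather than the question's token parser.

```scala
import scala.util.parsing.combinator.RegexParsers

object ChainDemo extends RegexParsers {
  def num: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // The operator parser returns the function that combines two operands.
  def minus: Parser[(Int, Int) => Int] = "-" ^^^ ((a: Int, b: Int) => a - b)

  // Library version: operands separated by operators, combined left to right.
  def viaChain: Parser[Int] = chainl1(num, minus)

  // Hand-written sketch of the same behaviour: first operand, then zero or
  // more (operator, operand) pairs folded into the running result.
  def viaRep: Parser[Int] = num ~ rep(minus ~ num) ^^ {
    case first ~ rest => rest.foldLeft(first) { case (acc, f ~ n) => f(acc, n) }
  }
}
```

Both parsers read "5 - 4 - 2" as (5 - 4) - 2.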

Related

Why the expr parser only can parse the first item of it?

I have a parser:
package app
import scala.util.parsing.combinator._
class MyParser extends JavaTokenParsers {
import MyParser._
def expr =
plus | sub | multi | divide | num
def num = floatingPointNumber ^^ (x => Value(x.toDouble).e)
def plus = num ~ rep("+" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e) {
(x, y) => Operation("+", x, y)
}
}
def sub = num ~ rep("-" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e){
(x, y) => Operation("-", x, y)
}
}
def multi = num ~ rep("*" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e){
(x, y) => Operation("*", x, y)
}
}
def divide = num ~ rep("/" ~> num) ^^ {
case num ~ nums => nums.foldLeft(num.e){
(x, y) => Operation("/", x, y)
}
}
}
object MyParser {
sealed trait Expr {
def e = this.asInstanceOf[Expr]
def compute: Double = this match {
case Value(x) => x
case Operation(op, left, right) => (op: @unchecked) match {
case "+" => left.compute + right.compute
case "-" => left.compute - right.compute
case "*" => left.compute * right.compute
case "/" => left.compute / right.compute
}
}
}
case class Value(x: Double) extends Expr
case class Operation(op: String, left: Expr, right: Expr) extends Expr
}
and I use it to parse an expression:
package app
object Runner extends App {
val p = new MyParser
println(p.parseAll(p.expr, "1 * 11"))
}
it prints
[1.3] failure: end of input expected
1 * 11
^
but if I parse the expression 1 + 11, it will succeed in parsing it.
[1.7] parsed: Operation(+,Value(1.0),Value(11.0))
and I can parse input through the plus, multi, divide, num, and sub combinators individually, but the expr combinator can only parse with the first alternative of the | combinator.
So why can it only parse the first alternative of the expr parser? And how can I change the definitions of the parsers to make the parse succeed?
The problem is that you're using rep which matches zero or more times.
def rep[T](p: => Parser[T]): Parser[List[T]] = rep1(p) | success(List())
you need to use rep1 instead which would require at least one match.
If you replace all rep with rep1, your code works.
Check out the changes on scastie
Run an experiment:
println(p.parseAll(p.expr, "1 + 11"))
println(p.parseAll(p.expr, "1 - 11"))
println(p.parseAll(p.expr, "1 * 11"))
println(p.parseAll(p.expr, "1 / 11"))
What will happen?
[1.7] parsed: Operation(+,Value(1.0),Value(11.0))
[1.3] failure: end of input expected
1 - 11
^
[1.3] failure: end of input expected
1 * 11
^
[1.3] failure: end of input expected
1 / 11
+ is consumed, but everything else fails. Let's change the definition of expr:
def expr =
multi | plus | sub | divide | num
[1.3] failure: end of input expected
1 + 11
^
[1.3] failure: end of input expected
1 - 11
^
[1.7] parsed: Operation(*,Value(1.0),Value(11.0))
[1.3] failure: end of input expected
1 / 11
^
By moving multi to the beginning, * case passed, but + failed.
def expr =
num | multi | plus | sub | divide
[1.3] failure: end of input expected
1 + 11
^
[1.3] failure: end of input expected
1 - 11
^
[1.3] failure: end of input expected
1 * 11
^
[1.3] failure: end of input expected
1 / 11
With num as the first case everything fails. It is apparent now that this code
num | multi | plus | sub | divide
is NOT matching if any of its parts match, but only if the first one matches.
What do the docs say about it?
/** A parser combinator for alternative composition.
*
* `p | q` succeeds if `p` succeeds or `q` succeeds.
* Note that `q` is only tried if `p`s failure is non-fatal (i.e., back-tracking is allowed).
*
* @param q a parser that will be executed if `p` (this parser) fails (and allows back-tracking)
* @return a `Parser` that returns the result of the first parser to succeed (out of `p` and `q`)
* The resulting parser succeeds if (and only if)
* - `p` succeeds, ''or''
* - if `p` fails allowing back-tracking and `q` succeeds.
*/
def | [U >: T](q: => Parser[U]): Parser[U] = append(q).named("|")
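Important note: backtracking has to be allowed. If it isn't, then a failure to match the first parser results in failing the alternative without trying the second parser at all. Note also that | commits to the first alternative that succeeds: with rep, the first alternative succeeds after matching just the leading number, so the remaining alternatives are never tried and parseAll then complains about the leftover input.
How do you get the other alternatives tried? Ordinary parsers already backtrack after a non-fatal failure, but not after a success. You could use ||| instead of |, which runs both alternatives and keeps the longer match, look at PackratParsers, or rewrite your code so that it does not rely on backtracking in the first place.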
Personally, I recommend not using Scala Parser Combinators and instead use a library where you explicitly decide when you can still backtrack, and when you should not allow it, like e.g. fastparse.
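One way to rewrite the grammar so that it does not depend on which alternative is tried first is to use a single rule: one number followed by any number of (operator, number) pairs. This is only a sketch, and it assumes flat left-to-right evaluation with no operator precedence, as in the original question. The names FlatParser and op are hypothetical, and the Expr types are redeclared here to keep the block self-contained.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object FlatParser extends JavaTokenParsers {
  sealed trait Expr
  case class Value(x: Double) extends Expr
  case class Operation(op: String, left: Expr, right: Expr) extends Expr

  def num: Parser[Expr] = floatingPointNumber ^^ (x => Value(x.toDouble))

  // All four operators live in one parser, so their order does not
  // decide which inputs are accepted.
  def op: Parser[String] = "+" | "-" | "*" | "/"

  // One operand, then zero or more (operator, operand) pairs,
  // folded from the left into nested Operation nodes.
  def expr: Parser[Expr] = num ~ rep(op ~ num) ^^ {
    case first ~ rest =>
      rest.foldLeft(first) { case (acc, o ~ n) => Operation(o, acc, n) }
  }
}
```

Here rep matching zero times is harmless, because a lone number is a valid expression and there is no later alternative that needed to be tried.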

Scala RegexParser calculator example right-associativity

Javadoc for the RegexParsers trait contains the following example:
object Calculator extends RegexParsers {
def number: Parser[Double] = """\d+(\.\d*)?""".r ^^ { _.toDouble }
def factor: Parser[Double] = number | "(" ~> expr <~ ")"
def term : Parser[Double] = factor ~ rep( "*" ~ factor | "/" ~ factor) ^^ {
case number ~ list => (number /: list) {
case (x, "*" ~ y) => x * y
case (x, "/" ~ y) => x / y
}
}
def expr : Parser[Double] = term ~ rep("+" ~ log(term)("Plus term") | "-" ~ log(term)("Minus term")) ^^ {
case number ~ list => list.foldLeft(number) { // same as before, using alternate name for /:
case (x, "+" ~ y) => x + y
case (x, "-" ~ y) => x - y
}
}
def apply(input: String): Double = parseAll(expr, input) match {
case Success(result, _) => result
case failure : NoSuccess => scala.sys.error(failure.msg)
}
}
When parsing expression 5 - 4 - 2 it treats it like (5 - 4) - 2 and returns -1. How can I change this parser to be right-associative so it will actually evaluate 5 - (4 - 2) and return 3?
I have managed to make the grammar right-associative:
trait Tree
case class Node(op: String, left: Tree, right: Tree) extends Tree
case class Leaf(value: BigInt) extends Tree
class ExpressionParser extends RegexParsers {
var result : BigInt = 0
def number: Parser[Tree] = """-?\d+""".r ^^ { s => Leaf(BigInt(s))}
def expr : Parser[Tree] = (term ~ ("+" | "-") ~ expr ^^ {
case ((x ~ op) ~ y) => Node(op, x, y)
}) | term
def term : Parser[Tree] = (factor ~ ("*" | "/") ~ term ^^ {
case ((x ~ op) ~ y) => Node(op, x, y)
}) | factor
def factor : Parser[Tree] = number | "(" ~> expr <~ ")"
}
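The parser above builds a Tree but does not compute a value. To check the associativity, a small evaluator can be run over the tree. This is a sketch: eval is a hypothetical helper that is not part of the answer, the Tree types are redeclared (as sealed) so the block stands alone, and the two trees are written out by hand to show the shapes a right-recursive and a left-associative parse of 5 - 4 - 2 would produce.

```scala
sealed trait Tree
case class Node(op: String, left: Tree, right: Tree) extends Tree
case class Leaf(value: BigInt) extends Tree

object TreeEval {
  // Hypothetical helper: recursively evaluates a Tree.
  def eval(t: Tree): BigInt = t match {
    case Leaf(v)         => v
    case Node("+", l, r) => eval(l) + eval(r)
    case Node("-", l, r) => eval(l) - eval(r)
    case Node("*", l, r) => eval(l) * eval(r)
    case Node("/", l, r) => eval(l) / eval(r)
    case Node(op, _, _)  => sys.error(s"unknown operator: $op")
  }

  // Shape the right-recursive grammar yields for "5 - 4 - 2".
  val rightAssoc: Tree = Node("-", Leaf(5), Node("-", Leaf(4), Leaf(2)))

  // Shape a left-associative parse would yield instead.
  val leftAssoc: Tree = Node("-", Node("-", Leaf(5), Leaf(4)), Leaf(2))
}
```

Evaluating rightAssoc gives 3 and leftAssoc gives -1, matching the two readings described in the question.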

Scala - StdTokenParsers parse not exactly as expected

I have an assignment to parse a demo language. This is the code that has problems; the rest works as I expected:
def parse(s: String) = phrase(program)(new lexical.Scanner(s))
def program: Parser[Any] = rep(sttment)
//Operator
def expression: Parser[Any] = lv1 ~ rep(("<" | ">" | "<=" | ">=") ~ lv1)
def lv1: Parser[Any] = lv2 ~ rep(("<>" | "==") ~ lit)
def lv2: Parser[Any] = lit ~ opt("." ~ ident ~ opt("(" ~ repsep(expression, ",") ~ ")"))
def lit: Parser[Any] = ident | boollit | floatlit | intlit | stringlit
// Statements
def sttment: Parser[Any] = sttm | "{" ~ rep(sttment) ~ "}"
def sttm: Parser[Any] = (assignment ||| returnsttm ||| invokesttm ||| ifsttm ||| whilesttm |||
repeatsttm ||| forsttm ||| breaksttm ||| continuesttm ) ~ ";"
def assignment: Parser[Any] = lhs ~ ":=" ~ expression
def lhs: Parser[Any] = ( "self" ~ "." ~ ident )|||( ident ~ "." ~ ident )|||( ident ~ "[" ~ expression ~ "]")|||ident
def ifsttm: Parser[Any] = "if" ~ expression ~ "then" ~ sttment ~ opt("else" ~ sttment)
def whilesttm: Parser[Any] = "while" ~ expression ~ "do" ~ sttment
def repeatsttm: Parser[Any] = "repeat" ~ sttment ~ "until" ~ expression
def forsttm: Parser[Any] = "for" ~ ident ~ ":=" ~ expression ~ ("to" | "downto") ~ expression ~ "do" ~ sttment
def breaksttm: Parser[Any] = "break"
def continuesttm: Parser[Any] = "continue"
def returnsttm: Parser[Any] = "return" ~ expression
def invokesttm: Parser[Any] = expression ~ "." ~ ident ~ "(" ~ repsep(expression, ",") ~ ")"
def primtype: Parser[Any] = "integer" | "float" | "bool" | "string" | "void"
def boollit: Parser[Any] = elem("boolean", _.isInstanceOf[lexical.BooleanLit])
def floatlit: Parser[Any] = elem("real", _.isInstanceOf[lexical.FloatLit])
def intlit: Parser[Any] = elem("integer", _.isInstanceOf[lexical.IntLit])
def stringlit: Parser[Any] = elem("string", _.isInstanceOf[lexical.StringLit])
For example, when I parse this string:
io.writeFloatLn(s.getArea());
It returns:
``.'' expected but `;' found
at the "return 1". Can someone tell me what mistake I made?
Edit:
- I am sorry: I didn't understand my problem well enough and asked it the wrong way. I have now written the exact error it produces.
Delimiter and keywords list:
reserved ++= List("bool", "break", "continue", "do", "downto", "else", "float", "for",
"if", "integer", "new", "repeat", "string", "then", "to", "until", "while", "return",
"true", "false", "void", "null", "self", "final", "class", "extends", "abstract")
delimiters ++= List("[", "]", "(", ")", ":", ";", ".", ",", "{", "}", "+", "=",
"-", "*", "/", "\\", "%", ":=", "==", "<", "<=", ">", ">=", "<>", "&&", "!", "||", "^")
It seems the parsers work as expected: io.writeFloatLn(s.getArea()) was parsed as an expression inside a statement, so the parser is still waiting for the "." that your invokesttm requires after the expression, something like:
io.writeFloatLn(s.getArea()).writeFloatLn()
I think you should allow parentheses either only as part of a statement (a call operation) or only as part of an expression (an application), depending on which kind of language you need, imperative or functional. The same goes for the "." operator.
I would have to see more of your code to be sure but I don't see returnsttm in your sttm list. That would prevent sttment from matching returnsttm before looking for the opening brace.

How to use ~> and <~ in grammar rule definition in Scala?

How can I ignore all strings in these grammar rules using correct placement of ~> or <~ operators?
def typeDefBody = ident ~ ":" ~ ident ~ "{" ~ fieldBody ~ "}"
def fieldBody = "validation" ~ "{" ~ validationBody ~ "}"
def validationBody = length ~ pattern
def length = "length" ~ "=" ~ wholeNumber ~ "to" ~ wholeNumber
def pattern = "pattern" ~ "=" ~ stringLiteral
I found the solution: I should break typeDefBody into three non-terminal rules, as below:
def typeDefBody = ident ~ typeDefBodySequence1
def typeDefBodySequence1 = ":" ~> ident ~ typeDefBodySequence2
def typeDefBodySequence2 = "{" ~> fieldBody <~ "}"
def fieldBody = "validation" ~ "{" ~> validationBody <~ "}"
def validationBody = length ~ pattern
def length = "length" ~ "=" ~> wholeNumber ~ "to" ~ wholeNumber
def pattern = "pattern" ~ "=" ~> stringLiteral
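To see what survives after ~> and <~, it can help to give a rule an explicit result type. In the length rule above, "length" and "=" are dropped but "to" is still part of the result. The sketch below, assuming JavaTokenParsers and a hypothetical object name LengthRule, also drops "to" by wrapping it with its own ~>, so that only the two numbers remain.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object LengthRule extends JavaTokenParsers {
  // ("length" ~ "=") ~> wholeNumber keeps only the first number;
  // ("to" ~> wholeNumber) keeps only the second one.
  def length: Parser[(Int, Int)] =
    "length" ~ "=" ~> wholeNumber ~ ("to" ~> wholeNumber) ^^ {
      case lo ~ hi => (lo.toInt, hi.toInt)
    }
}
```

The parentheses around "to" ~> wholeNumber matter: ~ and ~> have the same precedence and associate to the left, so without them the earlier number would be discarded along with "to".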

Operator Precedence with Scala Parser Combinators

I am working on a Parsing logic that needs to take operator precedence into consideration. My needs are not too complex. To start with I need multiplication and division to take higher precedence than addition and subtraction.
For example: 1 + 2 * 3 should be treated as 1 + (2 * 3). This is a simple example but you get the point!
[There are couple more custom tokens that I need to add to the precedence logic, which I may be able to add based on the suggestions I receive here.]
Here is one example of dealing with operator precedence: http://jim-mcbeath.blogspot.com/2008/09/scala-parser-combinators.html#precedencerevisited.
Are there any other ideas?
This is a bit simpler than Jim McBeath's example, but it does what you say you need, i.e. correct arithmetic precedence, and also allows for parentheses. I adapted the example from Programming in Scala to get it to actually do the calculation and provide the answer.
It should be quite self-explanatory. There is a hierarchy formed by saying an expr consists of terms interspersed with operators, terms consist of factors with operators, and factors are floating point numbers or expressions in parentheses.
import scala.util.parsing.combinator.JavaTokenParsers
class Arith extends JavaTokenParsers {
type D = Double
def expr: Parser[D] = term ~ rep(plus | minus) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def plus: Parser[D=>D] = "+" ~ term ^^ {case "+"~b => _ + b}
def minus: Parser[D=>D] = "-" ~ term ^^ {case "-"~b => _ - b}
def term: Parser[D] = factor ~ rep(times | divide) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def times: Parser[D=>D] = "*" ~ factor ^^ {case "*"~b => _ * b }
def divide: Parser[D=>D] = "/" ~ factor ^^ {case "/"~b => _ / b}
def factor: Parser[D] = fpn | "(" ~> expr <~ ")"
def fpn: Parser[D] = floatingPointNumber ^^ (_.toDouble)
}
object Main extends Arith with App {
val input = "(1 + 2 * 3 + 9) * 2 + 1"
println(parseAll(expr, input).get) // prints 33.0
}