Lets say I want to parse a string in scala, and every time there were parenthesis nested within each other I would multiply some number with itself . Ex
(()) +() + ((())) with number=3 would be 3*3 + 3 + 3*3*3. How would I do this with scala combinators.
class SimpleParser extends JavaTokenParsers {
def Base:Parser[Int] = """(""" ~remainder ~ """)"""
def Plus = atom ~ '+' ~ remainder
def Parens = Base
def remainder:Parser[Int] =(Next|Start) }
How would I make it so that every time an atom is parsed the number would multiply by itself, and then what was inside the atom will also be parsed?
would I put a method after the atom def like
def Base:Parser[Int] = """(""" ~remainder ~ """)""" ^^(2*paser(remainder))
? I don't understand how to do this because of the recursive nature of it, as if I find parenthesis, I must then multiply by three times whatever is in these parenthesis.
This is easiest if you build up the number from the inside out. For the parenthetical groups, we start with the base case (which will result in simply the number itself), and then add the number again for each nesting. For the sum, we start with a single parenthetical group and then optionally add summands until we run out:
import scala.util.parsing.combinator.JavaTokenParsers
class SimpleParser(number: Int) extends JavaTokenParsers {
def base: Parser[Int] = literal("()").map(_ => number)
def pars: Parser[Int] = base | ("(" ~> pars <~ ")").map(_ + number)
def plus: Parser[Int] = "+" ~> expr
def expr: Parser[Int] = (pars ~ opt(plus).map(_.getOrElse(0))).map {
case first ~ rest => first + rest
}
}
object ParserWith3 extends SimpleParser(3)
And then:
scala> ParserWith3.parseAll(ParserWith3.expr, "(())+()+((()))")
res0: ParserWith3.ParseResult[Int] = [1.15] parsed: 18
I'm using map because I can't stand the parsing library's little operator party, but you could replace all the maps with ^^ or ^^^ if you really wanted to.
If you use the fact that you can build right recursive rules using scala parser combinators(here mult appears on the right of its own definition for example):
import scala.util.parsing.combinator.RegexParsers
trait ExprsParsers extends RegexParsers {
val value = 3
lazy val mult: Parser[Int] =
"(" ~> mult <~ ")" ^^ { _ * value } |||
"()" ^^ { _ => value }
lazy val plus: Parser[Int] =
(mult <~ "+") ~ plus ^^ { case m ~ p => m + p } |||
mult
}
To use that code you simply create a structure that inherits ExprsParsers, e.g. :
object MainObj extends ExprsParsers {
def main(args: Array[String]): Unit = {
println(parseAll(plus, "() + ()")) //[1.8] parsed: 6
println(parseAll(plus, "() + (())")) //[1.10] parsed: 12
println(parseAll(plus, "((())) + ()")) //[1.12] parsed: 30
}
}
check scala source file for parser for any operator you don't understand.
Related
I would like to make an AST for arithmetic expression using fastparse from Scala.
For me a arithmetic expression is like:
var_name := value; // value can be an integer, or a whole expression
For the moment I have this parsers:
def word[_:P] = P((CharIn("a-z") | CharIn("A-Z") | "_").rep(1).!)
def digits[_ : P] = P(CharIn("0-9").rep.!)
def div_mul[_: P] = P( digits~ space.? ~ (("*" | "/").! ~ space.? ~/ digits).rep ).map(eval)
def add_sub[_: P] = P( div_mul ~ space.? ~ (("+" | "-").! ~ space.? ~/ div_mul).rep ).map(eval)
def expr[_: P]= P( " ".rep ~ add_sub ~ " ".rep ~ End )
def var_assig[_:P] = P(word ~ " " ~ ":=" ~ " " ~ (value | expr) ~ ";")
I want to create AST for arithmetic expression (2+3*2 for example).
Expected result: Assignment[2,plus[mult,[3,2]]] // symbol[left, right]
My questions is:
What should be like the Tree class/object, if it is necessary, because I want to evaluate that result? This class I will use for the rest parse(if, while).
What should be like the eval function, who takes the input an string, or Seq[String] and return a AST with my expected result?
Here is my way of doing it.
I have defined the components of the Arithmetic Expression using the following Trait:
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression
And defined the following fastparse patterns (similar to what is described here: https://com-lihaoyi.github.io/fastparse/#Math)
def number[_: P]: P[Expression] = P(CharIn("0-9").rep(1)).!.map(Num)
def parens[_: P]: P[Expression] = P("(" ~/ addSub ~ ")")
def factor[_: P]: P[Expression] = P(number | parens)
def divMul[_: P]: P[Expression] = P(factor ~ (CharIn("*/").! ~/ factor).rep).map(astBuilder _ tupled)
def addSub[_: P]: P[Expression] = P(divMul ~ (CharIn("+\\-").! ~/ divMul).rep).map(astBuilder _ tupled)
def expr[_: P]: P[Expression] = P(addSub ~ End)
Instead of the eval function that was used in the map, I have written a similar one, which returns a folded entity of the previously defined case classes:
def astBuilder(initial: Expression, rest: Seq[(String, Expression)]): Expression = {
rest.foldLeft(initial) {
case (left, (operator, right)) =>
operator match {
case "*" => Mul(left, right)
case "/" => Div(left, right)
case "+" => Add(left, right)
case "-" => Sub(left, right)
}
}
}
And if we would run the following expression:
val Parsed.Success(res, _) = parse("2+3*2", expr(_))
The result would be: Add(Num(2),Mul(Num(3),Num(2)))
I am trying to write a Scala Parser combinator for the following input.
The input can be
10
(10)
((10)))
(((10)))
Here the number of brackets can keep on growing. but they should always match. So parsing should fail for ((((10)))
The result of parsing should always be the number at the center
I wrote the following parser
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def n = "(" ~ i ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
def expr = i | n
}
val parser = new MyParser
parser.parseAll(parser.expr, "10")
parser.parseAll(parser.expr, "(10)")
but now how do I handle the case where the number of brackets keep growing but matched?
Easy, just make the parser recursive:
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def expr: Parser[Int] = i | "(" ~ expr ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
}
(but note that scala-parser-combinators has trouble with left-recursive definitions: Recursive definitions with scala-parser-combinators)
So, I'm working on this thing in scala to try to parse arithmetic expressions. I have this below where an expr can either be an add of two exprs or an integer constant, but it gets stuck in an infinite loop of add calling expr calling add calling expr... I'm pretty new to scala, but not to parsing. I know I'm doing something wrong, but the real question is, it it something simple?
import scala.util.parsing.combinator._
abstract class Expr
case class Add(x: Expr, y: Expr) extends Expr
case class Constant(con: String) extends Expr
class Comp extends RegexParsers {
def integer:Parser[Expr] = """-?\d+""".r ^^ {
s => Constant(s)
}
def add: Parser[Expr] = expr ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
def expr: Parser[Expr] = (add | integer)
}
object Compiler extends Comp {
def main(args: Array[String]) = parse(expr, "5+ -3"))//println("5+ -3")
}
Basic RegexParsers can't parse left-recursive grammars. To make it work, you can either modify the rule for add to remove left-recursiveness:
def add: Parser[Expr] = integer ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
or use PackratParsers, which can parse such grammars:
class Comp extends RegexParsers with PackratParsers {
lazy val integer:PackratParser[Expr] = """-?\d+""".r ^^ {
s => Constant(s)
}
lazy val add: PackratParser[Expr] = expr ~ "+" ~ expr ^^ {
case(a ~ "+" ~ b) => Add(a, b)
}
lazy val expr: PackratParser[Expr] = (add | integer)
}
object Compiler extends Comp {
def main(args: Array[String]) = parseAll(expr, "5+ -3")
}
I want to parse a String with scala parser combinators. Lets take
abcd,123,ghijk
as example. So we have 2 words and an Integer joined by comma.
I can do it like that:
import scala.util.parsing.combinator._
case class MyObject(field1:String, field2:Integer, field3:String)
object Test3 extends RegexParsers {
def main(args:Array[String]) {
val testRow = "abcd,123,ghijk"
val parseResult = Test3.parse(Test3.myObject, testRow)
println(parseResult)
}
def word = "\\w+".r ^^ { _ toString }
def int = """\d+""".r ^^ { _ toInt }
def comma = "," ^^ { _ toString }
def myObject = word ~ comma ~ int ~ comma ~ word ^^ {
case wordfield1 ~ sep1 ~ intfield ~ sep2 ~ wordfield2
=> MyObject(wordfield1, intfield, wordfield2)
}
}
However, I want to use the logic "joined by comma". Therefore rather than explicit writing word ~ comma ~ int ~ comma ~ word it should look more like
List(word, int, word) someFunctionIDontKnow {
(resultParser, nextParser) => resultParser ~ comma ~ nextParser
}
I am a little stuck here because I'm not sure how to save my parsers (with different types: Parser[int] and Parser[String]) into a List while maintaining type safety and what function to use to combine these like i did manually. Is what I want even possible or am I on the wrong track here?
I was inspired to use reverse polish notation as an example of parser combinators for a course I will be teaching, however, my solution ends up using the type List[Any] to store floating point numbers and binary operators respectively. In the end, I recursively deconstruct the list and apply binary operators whenever I meet them. The entire implementation is here:
import scala.util.parsing.combinator._
trait Ops {
type Op = (Float,Float) => Float
def add(x: Float, y: Float) = x + y
def sub(x: Float, y: Float) = x - y
def mul(x: Float, y: Float) = x * y
def div(x: Float, y: Float) = x / y
}
trait PolishParser extends Ops with JavaTokenParsers {
// Converts a floating point number as a String to Float
def num: Parser[Float] = floatingPointNumber ^^ (_.toFloat)
// Parses an operator and converts it to the underlying function it logically maps to
def operator: Parser[Op] = ("*" | "/" | "+" | "-") ^^ {
case "+" => add
case "-" => sub
case "*" => mul
case "/" => div
}
}
trait PolishSemantics extends PolishParser {
def polish:Parser[Float] = rep(num | operator) ^^ ( xs => reduce(xs).head )
def pop2(xs:List[Float],f:Op) = (xs take 2 reduce f) :: (xs drop 2)
def reduce(input:List[Any],stack:List[Float] = Nil):List[Float] = input match {
case (f:Op) :: xs => reduce(xs,pop2(stack,f))
case (x:Float) :: xs => reduce(xs,x :: stack)
case Nil => stack
case _ => sys.error("Unexpected input")
}
}
class PolishInterpreter extends PolishParser with PolishSemantics {
// Parse an expression and return the calculated result as a String
def interpret(expression: String) = parseAll(polish, expression)
}
object Calculator extends PolishSemantics {
def main(args: Array[String]) {
val pi = new PolishInterpreter
println("input: " + args(0))
println("result: " + pi.interpret(args(0)))
}
}
What I want to achieve is to not use the type-pattern in the reduce function. One solution is of course to make a custom type hierarchy in the following sense:
trait Elem
case class Floating(f:Float) extends Elem
case class Operator(o: (Float,Float) => Float) extends Elem
By this, I would be able to use pattern matching on case-classes through their unapply methods, but this would also require extensive refactoring of the code.
One other approach could be to apply the semantics directly while parsing, which would allow me to use only a "stack" of floats and then deal with operators immediately after parsing them. This would of course totally ruin the declarative fashion in which the parser-combinators work and would be a crime against everything that is good in the world.
I realize, of course, that this is nit-picking, but inside everyone is a software engineer trying to get out, and I am ready for that last suggestion to make the example perfect. Any ideas? :)