I am trying to write a Scala Parser combinator for the following input.
The input can be
10
(10)
((10))
(((10)))
Here the number of brackets can keep growing, but they should always match, so parsing should fail for ((((10))).
The result of parsing should always be the number at the center.
I wrote the following parser:
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def n = "(" ~ i ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
def expr = i | n
}
val parser = new MyParser
parser.parseAll(parser.expr, "10")
parser.parseAll(parser.expr, "(10)")
But how do I handle the case where the number of brackets keeps growing but stays matched?
Easy, just make the parser recursive:
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def expr: Parser[Int] = i | "(" ~ expr ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
}
(Note, however, that scala-parser-combinators has trouble with left-recursive definitions; see "Recursive definitions with scala-parser-combinators".)
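Here is a self-contained version of the recursive parser, as a minimal sketch assuming the scala-parser-combinators library is on the classpath. It uses ~> and <~ so only the value between the parentheses is kept, and parseAll so that unbalanced input such as ((((10))) is rejected. The object name and the parseNumber helper are hypothetical conveniences, not part of the original answer.

```scala
import scala.util.parsing.combinator.RegexParsers

object BalancedParser extends RegexParsers {
  // A bare integer literal
  def number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // Either a bare integer, or expr wrapped in one more pair of parentheses;
  // ~> and <~ discard the parentheses and keep only the inner value
  def expr: Parser[Int] = number | "(" ~> expr <~ ")"

  // Hypothetical helper: Some(n) only if the whole input is a balanced expression
  def parseNumber(s: String): Option[Int] = parseAll(expr, s) match {
    case Success(n, _) => Some(n)
    case _             => None
  }
}
```

With this, parseNumber("(((10)))") gives Some(10), while parseNumber("((((10)))") gives None because the last closing parenthesis is missing.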
I would like to build an AST for arithmetic expressions using fastparse in Scala.
For me, an arithmetic expression looks like:
var_name := value; // value can be an integer, or a whole expression
For the moment I have these parsers:
def word[_:P] = P((CharIn("a-z") | CharIn("A-Z") | "_").rep(1).!)
def digits[_ : P] = P(CharIn("0-9").rep.!)
def div_mul[_: P] = P( digits~ space.? ~ (("*" | "/").! ~ space.? ~/ digits).rep ).map(eval)
def add_sub[_: P] = P( div_mul ~ space.? ~ (("+" | "-").! ~ space.? ~/ div_mul).rep ).map(eval)
def expr[_: P]= P( " ".rep ~ add_sub ~ " ".rep ~ End )
def var_assig[_:P] = P(word ~ " " ~ ":=" ~ " " ~ (value | expr) ~ ";")
I want to create an AST for an arithmetic expression (2+3*2, for example).
Expected result: Assignment[2,plus[mult,[3,2]]] // symbol[left, right]
My questions are:
What should the Tree class/object look like (if one is necessary), given that I want to evaluate the result? I will reuse this class for parsing the rest of the language (if, while).
What should the eval function look like, taking a String or Seq[String] as input and returning an AST with my expected result?
Here is my way of doing it.
I have defined the components of the arithmetic expression using the following sealed trait and case classes:
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression
And defined the following fastparse parsers (similar to what is described here: https://com-lihaoyi.github.io/fastparse/#Math):
def number[_: P]: P[Expression] = P(CharIn("0-9").rep(1)).!.map(Num)
def parens[_: P]: P[Expression] = P("(" ~/ addSub ~ ")")
def factor[_: P]: P[Expression] = P(number | parens)
def divMul[_: P]: P[Expression] = P(factor ~ (CharIn("*/").! ~/ factor).rep).map((astBuilder _).tupled)
def addSub[_: P]: P[Expression] = P(divMul ~ (CharIn("+\\-").! ~/ divMul).rep).map((astBuilder _).tupled)
def expr[_: P]: P[Expression] = P(addSub ~ End)
Instead of the eval function used in the map, I wrote a similar one that folds the operators into a tree of the case classes defined above:
def astBuilder(initial: Expression, rest: Seq[(String, Expression)]): Expression = {
rest.foldLeft(initial) {
case (left, (operator, right)) =>
operator match {
case "*" => Mul(left, right)
case "/" => Div(left, right)
case "+" => Add(left, right)
case "-" => Sub(left, right)
}
}
}
If we run the following expression:
val Parsed.Success(res, _) = parse("2+3*2", expr(_))
The result would be: Add(Num(2),Mul(Num(3),Num(2)))
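The question also asks how to evaluate the tree. One way, sketched here under the assumption of integer arithmetic, is a recursive function that pattern matches on the same case classes. The object and method names (Evaluator, evaluate) are made up for illustration; because the trait is sealed, the compiler can check that every node type is handled.

```scala
sealed trait Expression
case class Add(l: Expression, r: Expression) extends Expression
case class Sub(l: Expression, r: Expression) extends Expression
case class Mul(l: Expression, r: Expression) extends Expression
case class Div(l: Expression, r: Expression) extends Expression
case class Num(value: String) extends Expression

object Evaluator {
  // Walks the tree bottom-up. Num keeps the matched digits as a String,
  // so they are converted here. Integer division is an assumption;
  // dividing by zero throws an ArithmeticException.
  def evaluate(e: Expression): Long = e match {
    case Num(v)    => v.toLong
    case Add(l, r) => evaluate(l) + evaluate(r)
    case Sub(l, r) => evaluate(l) - evaluate(r)
    case Mul(l, r) => evaluate(l) * evaluate(r)
    case Div(l, r) => evaluate(l) / evaluate(r)
  }
}
```

Applied to the tree above, Add(Num("2"), Mul(Num("3"), Num("2"))), this gives 8. Statements such as assignments, if or while would become further case classes (possibly under a separate sealed trait) with their own cases.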
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
I get the following warning when compiling:
warning: match may not be exhaustive.
It would fail on the following input: ~((x: String forSome x not in ("+", "-")), _)
("+" | "-") ~ term ^^ {
^
one warning found
I heard that the @unchecked annotation can help, but where should I put it in this case?
The issue here is that with ("+" | "-") you are creating a parser that accepts only two possible strings. However, when you map over the resulting parser to extract the value, the value you extract is typed simply as String.
In your pattern matching you only have cases for the strings "+" and "-", but the compiler has no way of knowing that those are the only strings that can show up, so it warns you that your match may not be exhaustive.
You could use an @unchecked annotation to suppress the warning, but there are much better, more idiomatic ways to eliminate the issue. One way is to replace those strings with a structured type as soon as possible. For example, create an ADT:
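sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation
// then, in your parser (term and factor stay unchanged):
def expr: Parser[Int] = term ~ rep(
("+" ^^^ Plus | "-" ^^^ Minus) ~ term ^^ {
case Plus ~ t => t
case Minus ~ t => -t
}) ^^ { case t ~ r => t + r.sum }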
Now the compiler should be able to see that the only possible cases are Plus and Minus.
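For completeness, since the question asks where @unchecked would go: it annotates the scrutinee of a match, so the pattern-matching anonymous function after ^^ has to be written as an explicit match on its argument. A minimal sketch of the original parser with that change (term is also rewritten with rep and ~>, which is only a stylistic choice):

```scala
import scala.util.parsing.combinator.RegexParsers

class ExprParser extends RegexParsers {
  val number = "[0-9]+".r

  def expr: Parser[Int] = term ~ rep(
    ("+" | "-") ~ term ^^ { x =>
      // @unchecked goes on the value being matched and disables the exhaustiveness check
      (x: @unchecked) match {
        case "+" ~ t => t
        case "-" ~ t => -t
      }
    }) ^^ { case t ~ r => t + r.sum }

  // factor followed by any number of "* factor"; ~> drops the "*" token
  def term: Parser[Int] = factor ~ rep("*" ~> factor) ^^ { case f ~ r => f * r.product }

  def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
```

This only silences the warning; the ADT approach above is still preferable because it keeps the check meaningful.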
Add a catch-all case to remove the warning:
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
case _ ~ t => t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
Let's say I want to parse a string in Scala, and every time there are parentheses nested within each other I multiply some number by itself. For example,
(()) +() + ((())) with number=3 would be 3*3 + 3 + 3*3*3. How would I do this with Scala parser combinators?
class SimpleParser extends JavaTokenParsers {
def Base:Parser[Int] = """(""" ~remainder ~ """)"""
def Plus = atom ~ '+' ~ remainder
def Parens = Base
def remainder:Parser[Int] =(Next|Start) }
How can I make it so that every time an atom is parsed the number is multiplied by itself, and whatever is inside the atom is then parsed as well?
Would I put a method after the atom definition, like
def Base:Parser[Int] = """(""" ~remainder ~ """)""" ^^(2*paser(remainder))
? I don't understand how to do this because of its recursive nature: if I find parentheses, I must then multiply whatever is inside those parentheses by three.
This is easiest if you build up the number from the inside out. For the parenthetical groups, we start with the base case (which will result in simply the number itself), and then add the number again for each nesting. For the sum, we start with a single parenthetical group and then optionally add summands until we run out:
import scala.util.parsing.combinator.JavaTokenParsers
class SimpleParser(number: Int) extends JavaTokenParsers {
def base: Parser[Int] = literal("()").map(_ => number)
def pars: Parser[Int] = base | ("(" ~> pars <~ ")").map(_ + number)
def plus: Parser[Int] = "+" ~> expr
def expr: Parser[Int] = (pars ~ opt(plus).map(_.getOrElse(0))).map {
case first ~ rest => first + rest
}
}
object ParserWith3 extends SimpleParser(3)
And then:
scala> ParserWith3.parseAll(ParserWith3.expr, "(())+()+((()))")
res0: ParserWith3.ParseResult[Int] = [1.15] parsed: 18
I'm using map because I can't stand the parsing library's little operator party, but you could replace all the maps with ^^ or ^^^ if you really wanted to.
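The answer above adds number once for each extra level of nesting, so the example evaluates to 18. If each extra level should instead multiply, as in the question's 3*3 + 3 + 3*3*3, only the step in pars changes. Here is a minimal sketch of that variant, written with ^^ and ^^^ instead of map; the class name NestingParser is made up for illustration.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Variant of SimpleParser: each extra level of nesting multiplies by `number`
class NestingParser(number: Int) extends JavaTokenParsers {
  // "()" on its own is worth `number`
  def base: Parser[Int] = "()" ^^^ number

  // one more pair of parentheses multiplies the inner value by `number`
  def pars: Parser[Int] = base | "(" ~> pars <~ ")" ^^ (_ * number)

  // a group followed by an optional "+ expr"
  def expr: Parser[Int] = pars ~ opt("+" ~> expr) ^^ {
    case first ~ rest => first + rest.getOrElse(0)
  }
}
```

With number = 3, (())+()+((())) then evaluates to 9 + 3 + 27 = 39.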
You can also rely on the fact that scala parser combinators let you build right-recursive rules (here, for example, mult appears on the right-hand side of its own definition):
import scala.util.parsing.combinator.RegexParsers
trait ExprsParsers extends RegexParsers {
val value = 3
lazy val mult: Parser[Int] =
"(" ~> mult <~ ")" ^^ { _ * value } |||
"()" ^^ { _ => value }
lazy val plus: Parser[Int] =
(mult <~ "+") ~ plus ^^ { case m ~ p => m + p } |||
mult
}
To use that code, simply create an object that extends ExprsParsers, e.g.:
object MainObj extends ExprsParsers {
def main(args: Array[String]): Unit = {
println(parseAll(plus, "() + ()")) //[1.8] parsed: 6
println(parseAll(plus, "() + (())")) //[1.10] parsed: 12
println(parseAll(plus, "((())) + ()")) //[1.12] parsed: 30
}
}
Check the scala-parser-combinators source for any operator you don't understand.
I have the program below. I can parse a pattern like convert(a.ACCOUNT_ID, string) into an expression, but I want to replace this pattern with CAST(a.ACCOUNT_ID AS VARCHAR). I could parse the result expression and replace the strings with the one above, but there are expressions like this hence I don't want to do that way. Is there any way I can do a pattern replace? That is, if I find a pattern like convert(a.ACCOUNT_ID, string), replace it with CAST(a.ACCOUNT_ID AS VARCHAR).
import scala.util.parsing.combinator._
import scala.util.parsing.combinator.lexical._
import scala.util.parsing.combinator.syntactical._
import scala.util.parsing.combinator.token._
import scala.util.parsing.input.CharSequenceReader
trait QParser extends RegexParsers with JavaTokenParsers {
def knownFunction: Parser[Any] = ident ~ "(" ~ ident ~ ("." ~ ident <~ "," ~ ident ~ ")")
def parse(inputString: String): Any = synchronized {
phrase(knownFunction)(new CharSequenceReader(inputString)) match {
case Success(result, _) => result
case Failure(msg,_) => throw new DataTypeException(msg)
case Error(msg,_) => throw new DataTypeException(msg)
}
}
class DataTypeException(message: String) extends Exception(message)
}
object Parser extends QParser {
def main(args: Array[String]): Unit = {
println(parse("convert(a.ACCOUNT_ID, string)"))
}
}
Output: (((convert~()~a)~(.~ACCOUNT_ID))
I am not exactly sure what you mean by "there are expressions like this hence I don't want to do that way", but you can transform the result of your parser using the ^^ operator.
A transformation function for your parser could be:
def knownFunction: Parser[String] =
ident ~ "(" ~ ident ~ "." ~ ident ~ "," ~ ident ~ ")" ^^ {
case func ~ "(" ~ obj ~ "." ~ value ~ "," ~ castType ~ ")" =>
val sqlFunc = Map("convert" -> "CAST")
val sqlType = Map("string" -> "VARCHAR")
s"${sqlFunc(func)}($obj.$value AS ${sqlType(castType)})"
}
Using this updated function, the output of your application would be:
CAST(a.ACCOUNT_ID AS VARCHAR)
More information about Scala combinator parsing can be found in a chapter of Programming in Scala, 1st edition.
Chapter 33 of Programming in Scala explains combinator parsing.
It provides this example:
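The answer above rewrites a single convert(...) call that makes up the whole input. If such calls are embedded in a longer SQL string, one possible sketch (not from the original answer) is to parse the text as a sequence of either convert calls or single ordinary characters, with whitespace skipping turned off so the surrounding text is copied verbatim. The object name, the sqlType table and the fallback of upper-casing unknown types are assumptions; the sketch also does not check word boundaries, so something like reconvert(...) would be rewritten too.

```scala
import scala.util.parsing.combinator.RegexParsers

// Rewrites every convert(obj.field, type) call inside a larger string
object ConvertRewriter extends RegexParsers {
  // Whitespace must be kept so the text around each call is copied verbatim
  override val skipWhitespace = false

  // Assumed type mapping; unknown types are simply upper-cased
  private val sqlType = Map("string" -> "VARCHAR")

  private val ws: Parser[String]   = """\s*""".r
  private val name: Parser[String] = """[A-Za-z_][A-Za-z0-9_]*""".r

  // convert(obj.field, type)  becomes  CAST(obj.field AS TYPE)
  def convertCall: Parser[String] =
    ("convert" ~ ws ~ "(" ~ ws) ~> name ~ ("." ~> name) ~ (ws ~ "," ~ ws ~> name <~ ws ~ ")") ^^ {
      case obj ~ field ~ tpe =>
        s"CAST($obj.$field AS ${sqlType.getOrElse(tpe.toLowerCase, tpe.toUpperCase)})"
    }

  // Any other single character (including newlines) is copied unchanged
  private val other: Parser[String] = """(?s).""".r

  def document: Parser[String] = rep(convertCall | other) ^^ (_.mkString)

  // `other` matches any character, so parsing the whole input always succeeds
  def rewrite(sql: String): String = parseAll(document, sql).get
}
```

For example, rewrite("SELECT convert(a.ACCOUNT_ID, string), b FROM t") returns "SELECT CAST(a.ACCOUNT_ID AS VARCHAR), b FROM t".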
import scala.util.parsing.combinator._
class Arith extends JavaTokenParsers {
def expr: Parser[Any] = term~rep("+"~term | "-"~term)
def term: Parser[Any] = factor~rep("*"~factor | "/"~factor)
def factor: Parser[Any] = floatingPointNumber | "("~expr~")"
}
How can I map expr to a narrower type than Parser[Any]? In other words, I'd like to take def expr: Parser[Any] and map it via ^^ into a stricter type.
Note: I asked this question in the Scala Google Group (https://groups.google.com/forum/#!forum/scala-user) but haven't received a complete answer that helped me out.
As already stated in the comments, you can narrow the type down to anything you like: the function you pass to ^^ determines the result type, which you then declare on the parser.
Here is a complete example that builds a data structure from your code.
object Arith extends JavaTokenParsers {
trait Expression //The data structure
case class FNumber(value: Float) extends Expression
case class Plus(e1: Expression, e2: Expression) extends Expression
case class Minus(e1: Expression, e2: Expression) extends Expression
case class Mult(e1: Expression, e2: Expression) extends Expression
case class Div(e1: Expression, e2: Expression) extends Expression
def expr: Parser[Expression] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
case term ~ rest => rest.foldLeft(term)((result, elem) => elem match {
case "+" ~ e => Plus(result, e)
case "-" ~ e => Minus(result, e)
})
}
def term: Parser[Expression] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case factor ~ rest => rest.foldLeft(factor)((result, elem) => elem match {
case "*" ~ e => Mult(result, e)
case "/" ~ e => Div(result, e)
})
}
def factor: Parser[Expression] = floatingPointNumber ^^ (f => FNumber(f.toFloat)) | "(" ~> expr <~ ")"
def parseInput(input: String): Expression = parse(expr, input) match {
case Success(ex, _) => ex
case _ => throw new IllegalArgumentException //or change the result to Try[Expression]
}
}
Now we can start to parse something.
Arith.parseInput("(1.3 + 2.0) * 2")
//yields: Mult(Plus(FNumber(1.3),FNumber(2.0)),FNumber(2.0))
Of course, you can also have a Parser[String] or a Parser[Float] that directly transforms or evaluates the input String. As I said, it is up to you.
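As a sketch of the "evaluate directly" option, the same grammar can produce a Parser[Double] instead of a tree. Rather than folding by hand, this version uses chainl1 from Parsers, which parses p (op p)* and combines the results from the left; each operator parser uses ^^^ to yield the function to apply. The object name ArithEval and the evaluate helper are made up for illustration.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Evaluates directly to a Double instead of building a tree
object ArithEval extends JavaTokenParsers {
  // chainl1(p, op) parses p (op p)* and folds the results from the left
  def expr: Parser[Double] =
    chainl1(term, "+" ^^^ { (a: Double, b: Double) => a + b } | "-" ^^^ { (a: Double, b: Double) => a - b })

  def term: Parser[Double] =
    chainl1(factor, "*" ^^^ { (a: Double, b: Double) => a * b } | "/" ^^^ { (a: Double, b: Double) => a / b })

  def factor: Parser[Double] = floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"

  def evaluate(input: String): Double = parseAll(expr, input) match {
    case Success(value, _) => value
    case failure           => throw new IllegalArgumentException(failure.toString)
  }
}
```

Because chainl1 folds from the left, 8 - 2 - 1 evaluates as (8 - 2) - 1 = 5.0, and (1.5 + 2.5) * 2 evaluates to 8.0.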