I want to parse a String with Scala parser combinators. Let's take
abcd,123,ghijk
as an example: two words and an integer, joined by commas.
I can do it like this:
import scala.util.parsing.combinator._
case class MyObject(field1: String, field2: Int, field3: String)
object Test3 extends RegexParsers {
def main(args: Array[String]): Unit = {
val testRow = "abcd,123,ghijk"
val parseResult = Test3.parse(Test3.myObject, testRow)
println(parseResult)
}
def word: Parser[String] = "\\w+".r
def int: Parser[Int] = """\d+""".r ^^ { _.toInt }
def comma: Parser[String] = ","
def myObject = word ~ comma ~ int ~ comma ~ word ^^ {
case wordfield1 ~ sep1 ~ intfield ~ sep2 ~ wordfield2
=> MyObject(wordfield1, intfield, wordfield2)
}
}
However, I want to express the logic "joined by commas" directly. So rather than explicitly writing word ~ comma ~ int ~ comma ~ word, it should look more like
List(word, int, word) someFunctionIDontKnow {
(resultParser, nextParser) => resultParser ~ comma ~ nextParser
}
I am a little stuck here because I'm not sure how to store my parsers (which have different types: Parser[Int] and Parser[String]) in a List while keeping type safety, or which function to use to combine them the way I did manually. Is what I want even possible, or am I on the wrong track?
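A plain List can only hold parsers with different result types under a common supertype (here List[Parser[Any]]), so folding it with something like reduceLeft would compile but lose the per-field types. A minimal sketch of a type-safe alternative, assuming the scala-parser-combinators module is on the classpath: a small binary combinator, here called ~~ (a hypothetical name, not a library method), that puts the comma between two parsers. CommaJoined and CommaJoin are likewise made-up names.

```scala
import scala.util.parsing.combinator.RegexParsers

case class MyObject(field1: String, field2: Int, field3: String)

object CommaJoined extends RegexParsers {
  def word: Parser[String] = """\w+""".r
  def int: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // Hypothetical helper (not part of the library): `p ~~ q` parses p,
  // then a comma, then q. The comma is discarded with ~>, so the result
  // is still A ~ B and each side keeps its own type.
  implicit class CommaJoin[A](p: Parser[A]) {
    def ~~[B](q: => Parser[B]): Parser[A ~ B] = p ~ ("," ~> q)
  }

  // Reads as "word, int, word joined by commas".
  def myObject: Parser[MyObject] = word ~~ int ~~ word ^^ {
    case w1 ~ i ~ w2 => MyObject(w1, i, w2)
  }
}
```

Because ~~ starts with ~, it has the same precedence and left associativity as ~, so word ~~ int ~~ word yields a String ~ Int ~ String that the case pattern destructures with full types.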
I am trying to write a Scala Parser combinator for the following input.
The input can be
10
(10)
((10))
(((10)))
Here the number of brackets can keep growing, but they should always match, so parsing should fail for ((((10))).
The result of parsing should always be the number at the center.
I wrote the following parser
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def n = "(" ~ i ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
def expr = i | n
}
val parser = new MyParser
parser.parseAll(parser.expr, "10")
parser.parseAll(parser.expr, "(10)")
But how do I handle the case where the number of brackets keeps growing but stays matched?
Easy, just make the parser recursive:
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
// expr calls itself, so it needs an explicit result type
def expr: Parser[Int] = i | "(" ~ expr ~ ")" ^^ {case _ ~ b ~ _ => b}
}
(But note that scala-parser-combinators has trouble with left-recursive definitions; see the question "Recursive definitions with scala-parser-combinators".)
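A runnable sketch of the recursive answer, assuming scala-parser-combinators is on the classpath. It uses ~> and <~ to drop the brackets instead of pattern matching, and parseAll so that leftover input such as an extra ")" makes the parse fail. NestedNumber and run are hypothetical names.

```scala
import scala.util.parsing.combinator.RegexParsers

object NestedNumber extends RegexParsers {
  def i: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // Recursive but not left-recursive: "(" is consumed before expr
  // calls itself again, so every level makes progress on the input.
  def expr: Parser[Int] = i | "(" ~> expr <~ ")"

  // parseAll also requires that nothing is left over at the end.
  def run(s: String): Option[Int] = parseAll(expr, s) match {
    case Success(n, _) => Some(n)
    case _             => None
  }
}
```

With this, run("(((10)))") gives Some(10), while both ((((10))) and ((10))) are rejected.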
Let's say I want to parse a string in Scala, and every time there are parentheses nested within each other, I multiply some number by itself. For example,
(()) +() + ((())) with number=3 would be 3*3 + 3 + 3*3*3. How would I do this with Scala parser combinators?
class SimpleParser extends JavaTokenParsers {
def Base:Parser[Int] = """(""" ~remainder ~ """)"""
def Plus = atom ~ '+' ~ remainder
def Parens = Base
def remainder:Parser[Int] =(Next|Start) }
How would I make it so that every time an atom is parsed, the number is multiplied by itself, and then what was inside the atom is also parsed?
Would I put a method after the atom def, like
def Base:Parser[Int] = """(""" ~remainder ~ """)""" ^^(2*paser(remainder))
? I don't understand how to do this because of the recursive nature of it: if I find parentheses, I must then multiply whatever is inside them by three.
This is easiest if you build up the number from the inside out. For the parenthetical groups, we start with the base case (which results in simply the number itself), and then multiply by the number again for each additional level of nesting. For the sum, we start with a single parenthetical group and then optionally add summands until we run out:
import scala.util.parsing.combinator.JavaTokenParsers
class SimpleParser(number: Int) extends JavaTokenParsers {
def base: Parser[Int] = literal("()").map(_ => number)
def pars: Parser[Int] = base | ("(" ~> pars <~ ")").map(_ * number)
def plus: Parser[Int] = "+" ~> expr
def expr: Parser[Int] = (pars ~ opt(plus).map(_.getOrElse(0))).map {
case first ~ rest => first + rest
}
}
object ParserWith3 extends SimpleParser(3)
And then:
scala> ParserWith3.parseAll(ParserWith3.expr, "(())+()+((()))")
res0: ParserWith3.ParseResult[Int] = [1.15] parsed: 39
I'm using map because I can't stand the parsing library's little operator party, but you could replace all the maps with ^^ or ^^^ if you really wanted to.
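For comparison, a sketch of the same inside-out idea written with ^^ and ^^^ instead of map, computing the question's multiplicative value (3*3 + 3 + 3*3*3 = 39 for the sample input). It assumes scala-parser-combinators is on the classpath; OperatorParser and OperatorParserWith3 are illustrative names.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

class OperatorParser(number: Int) extends JavaTokenParsers {
  // "()" on its own is worth `number`; ^^^ ignores the matched text
  // and returns a constant.
  def base: Parser[Int] = "()" ^^^ number

  // Each extra level of parentheses multiplies the inner value by `number`.
  def pars: Parser[Int] = base | ("(" ~> pars <~ ")") ^^ (_ * number)

  // One group, optionally followed by "+" and the rest of the sum.
  def expr: Parser[Int] = pars ~ opt("+" ~> expr) ^^ {
    case first ~ rest => first + rest.getOrElse(0)
  }
}

object OperatorParserWith3 extends OperatorParser(3)
```

Since ^^ binds more tightly than |, the multiplication applies only to the bracketed alternative, not to base.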
You can also use the fact that Scala parser combinators let you build right-recursive rules (here, for example, mult appears on the right-hand side of its own definition):
import scala.util.parsing.combinator.RegexParsers
trait ExprsParsers extends RegexParsers {
val value = 3
lazy val mult: Parser[Int] =
"(" ~> mult <~ ")" ^^ { _ * value } |||
"()" ^^ { _ => value }
lazy val plus: Parser[Int] =
(mult <~ "+") ~ plus ^^ { case m ~ p => m + p } |||
mult
}
To use that code, create an object that extends ExprsParsers, e.g.:
object MainObj extends ExprsParsers {
def main(args: Array[String]): Unit = {
println(parseAll(plus, "() + ()")) //[1.8] parsed: 6
println(parseAll(plus, "() + (())")) //[1.10] parsed: 12
println(parseAll(plus, "((())) + ()")) //[1.12] parsed: 30
}
}
Check the scala-parser-combinators source (the Parsers trait) for any operator you don't understand.
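The answer above uses |||, which behaves differently from |: | commits to the first alternative that succeeds, while ||| runs both alternatives and keeps whichever consumed more input. A small illustration, assuming scala-parser-combinators is on the classpath (AlternativeDemo is a hypothetical name):

```scala
import scala.util.parsing.combinator.RegexParsers

object AlternativeDemo extends RegexParsers {
  // `|` stops at the first alternative that succeeds ("a"),
  // even though "ab" would have consumed more input.
  def firstMatch: Parser[String] = "a" | "ab"

  // `|||` tries both alternatives and keeps the longest match.
  def longestMatch: Parser[String] = "a" ||| "ab"
}
```

On the input "ab", parse(firstMatch, ...) returns "a" and leaves "b" unconsumed, so parseAll fails, whereas longestMatch returns "ab".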
I have the program below. I can parse a pattern like convert(a.ACCOUNT_ID, string) into an expression, but I want to replace this pattern with CAST(a.ACCOUNT_ID AS VARCHAR). I could parse the result expression and replace the strings with the one above, but there are expressions like this hence I don't want to do that way. Is there any way I can do a pattern replace? That is, if I find the pattern convert(a.ACCOUNT_ID, string), replace it with CAST(a.ACCOUNT_ID AS VARCHAR).
import scala.util.parsing.combinator._
import scala.util.parsing.combinator.lexical._
import scala.util.parsing.combinator.syntactical._
import scala.util.parsing.combinator.token._
import scala.util.parsing.input.CharSequenceReader
trait QParser extends RegexParsers with JavaTokenParsers {
def knownFunction: Parser[Any] = ident ~ "(" ~ ident ~ ("." ~ ident <~ "," ~ ident ~ ")")
def parse(inputString: String): Any = synchronized {
phrase(knownFunction)(new CharSequenceReader(inputString)) match {
case Success(result, _) => result
case Failure(msg,_) => throw new DataTypeException(msg)
case Error(msg,_) => throw new DataTypeException(msg)
}
}
class DataTypeException(message: String) extends Exception(message)
}
object Parser extends QParser {
def main(args: Array[String]): Unit = {
println(parse("convert(a.ACCOUNT_ID, string)"))
}
}
Output: (((convert~()~a)~(.~ACCOUNT_ID))
I am not exactly sure what you mean by "there are expressions like this hence I don't want to do that way", but you can transform the result of your parser using the ^^ operator.
A transformation function for your parser could be:
def knownFunction: Parser[String] =
ident ~ "(" ~ ident ~ "." ~ ident ~ "," ~ ident ~ ")" ^^ {
case func ~ "(" ~ obj ~ "." ~ value ~ "," ~ castType ~ ")" =>
val sqlFunc = Map("convert" -> "CAST")
val sqlType = Map("string" -> "VARCHAR")
s"${sqlFunc(func)}($obj.$value AS ${sqlType(castType)})"
}
Using this updated function, the output of your application would be:
CAST(a.ACCOUNT_ID AS VARCHAR)
More information about Scala's combinator parsing can be found in the chapter on combinator parsing in Programming in Scala, 1st edition.
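A self-contained variant of the transformation above, as a sketch assuming scala-parser-combinators is on the classpath. Two changes from the answer are assumptions of mine: ~> and <~ drop the punctuation so the pattern binds only the four identifiers, and getOrElse passes unknown function or type names through unchanged instead of throwing NoSuchElementException. CastRewriter and rewrite are made-up names.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object CastRewriter extends JavaTokenParsers {
  // Illustrative lookup tables; names not listed are kept as-is.
  val sqlFunc = Map("convert" -> "CAST")
  val sqlType = Map("string" -> "VARCHAR")

  // func(obj.column, castType); punctuation is matched but discarded.
  def knownFunction: Parser[String] =
    (ident <~ "(") ~ (ident <~ ".") ~ (ident <~ ",") ~ (ident <~ ")") ^^ {
      case func ~ obj ~ column ~ castType =>
        s"${sqlFunc.getOrElse(func, func)}($obj.$column AS ${sqlType.getOrElse(castType, castType)})"
    }

  def rewrite(input: String): Either[String, String] =
    parseAll(knownFunction, input) match {
      case Success(sql, _) => Right(sql)
      case Failure(msg, _) => Left(msg)
      case Error(msg, _)   => Left(msg)
    }
}
```

For example, rewrite("convert(a.ACCOUNT_ID, string)") returns Right("CAST(a.ACCOUNT_ID AS VARCHAR)"), and malformed input comes back as a Left with the parser's message rather than an exception.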
I have the code below to check a string. I want to verify that it starts with '{' and ends with '}', and that in between it contains only sequences of non-brace characters and nested strings that have this same property.
import util.parsing.combinator._
class Comp extends RegexParsers with PackratParsers {
lazy val bracefree: PackratParser[String] = """[^{}]*""".r ^^ {
case a => a
}
lazy val matching: PackratParser[String] = (
"{" ~ rep(bracefree | matching) ~ "}") ^^ {
case a ~ b ~ c => a + b.mkString("") + c
}
}
object Brackets extends Comp {
def main(args: Array[String])= {
println(parseAll(matching, "{ foo {hello 3 } {}}").get)
}
}
The desired output for this is to echo { foo {hello 3 } {}}, but it ends up taking a long time before dying from java.lang.OutOfMemoryError: GC overhead limit exceeded. What am I doing wrong and what should I have done instead?
Your regular expression for bracefree matches even the empty string, so the parser produced by rep() can succeed without consuming any input, and it loops endlessly.
Use a + quantifier instead of *:
lazy val bracefree: PackratParser[String] = """[^{}]+""".r ^^ {
case a => a
}
Also, by default RegexParsers skips whitespace before each literal and regular expression, so some of the spaces inside the braces would be dropped from the result. To turn that behavior off, override the method skipWhitespace to always return false. In the end your parser will look like this:
import util.parsing.combinator._
class Comp extends RegexParsers with PackratParsers {
override def skipWhitespace = false
lazy val bracefree: PackratParser[String] = """[^{}]+""".r ^^ {
case a => a
}
lazy val matching: PackratParser[String] = (
"{" ~ rep(bracefree | matching) ~ "}") ^^ {
case a ~ b ~ c => a + b.mkString("") + c
}
}
object Brackets extends Comp {
def main(args: Array[String])= {
println(parseAll(matching, "{ foo {hello 3 } {}}").get)
// prints: { foo {hello 3 } {}}
}
}
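Because matching always consumes "{" before recursing, the grammar is not left-recursive, so PackratParsers should not be required for it. A sketch of a plain RegexParsers version, assuming scala-parser-combinators is on the classpath: it keeps the + fix and the skipWhitespace override, and uses ~> / <~ instead of rebuilding the braces from a ~ b ~ c. BraceChecker and isBalanced are hypothetical names.

```scala
import scala.util.parsing.combinator.RegexParsers

object BraceChecker extends RegexParsers {
  // Keep whitespace: it is part of the text we echo back.
  override def skipWhitespace: Boolean = false

  // `+` rather than `*`, so this parser never succeeds on empty input
  // (which would make rep(...) below loop forever).
  def bracefree: Parser[String] = """[^{}]+""".r

  // "{", any mix of brace-free runs and nested groups, then "}".
  def matching: Parser[String] = "{" ~> rep(bracefree | matching) <~ "}" ^^ {
    parts => "{" + parts.mkString + "}"
  }

  // parseAll rejects trailing input such as an extra "}".
  def isBalanced(s: String): Boolean = parseAll(matching, s).successful
}
```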
I'm trying to make a very simple parser with parser combinators (to parse something similar to BNF). I've checked several blog posts that explain the subject (the top-ranked ones on Google, for me), and I think I understand it, but the tests say otherwise.
I've checked the questions on Stack Overflow, and while some could perhaps be applied and be useful, whenever I try to apply them something else breaks, so the best way to go is through a specific example:
This is my main:
def main(args: Array[String]): Unit = {
val parser: BaseParser = new BaseParser
val eol = sys.props("line.separator")
val test = s"a = b ${eol} a = c ${eol}"
System.out.println(test)
parser.parse(test)
}
This is the parser:
import com.github.trylks.tests.parser.ParserClasses._
import scala.util.parsing.combinator.syntactical._
import scala.util.parsing.combinator.ImplicitConversions
import scala.util.parsing.combinator.PackratParsers
class BaseParser extends StandardTokenParsers with ImplicitConversions with PackratParsers {
val eol = sys.props("line.separator")
lexical.delimiters += ("=", "|", "*", "[", "]", "(", ")", ";", eol)
def rules = rep1sep(rule, eol) ^^ { Rules(_) }
def rule = id ~ "=" ~ repsep(expression, "|") ^^ flatten3 { (e1: ID, _: Any, e3: List[Expression]) => Rule(e1, e3) }
def expression: Parser[Expression] = (element | parenthesized | optional) ^^ { x => x } // and sequence and repetition, but that's another problem...
def parenthesized: Parser[Expression] = "(" ~> expression <~ ")" ^^ { x => x }
def optional: Parser[Expression] = "[" ~> expression <~ "]" ^^ { Optional(_) }
def element: Parser[Element] = (id | constant) ^^ { x => x }
def constant: Parser[Constant] = stringLit ^^ { Constant(_) }
def id: Parser[ID] = ident ^^ { ID(_) }
def parse(text: String): Option[Rules] = {
val s = rules(new lexical.Scanner(text))
s match {
case Success(res, next) => {
println("Success!\n" + res.toString)
Some(res)
}
case Error(msg, next) => {
println("error: " + msg)
None
}
case Failure(msg, next) => {
println("failure: " + msg)
None
}
}
}
}
These are the classes missing from the previous part of the code:
object ParserClasses {
abstract class Element extends Expression
case class ID(value: String) extends Element {
override def toString(): String = value
}
case class Constant(value: String) extends Element {
override def toString(): String = value
}
abstract class Expression
case class Optional(value: Expression) extends Expression {
override def toString() = s"[$value]"
}
case class Rule(head: ID, body: List[Expression]) {
override def toString() = s"$head = ${body.mkString(" | ")}"
}
case class Rules(rules: List[Rule]) {
override def toString() = rules.mkString("\n")
}
}
The problem is that, as the code is now, it doesn't work: it parses only one rule (not both). If I replace eol with ";" (in both the main and the parser), then it works (at least for this test).
Most people seem to prefer RegexParsers, and none of the blogs explaining parser combinators go into detail about which traits can or should be extended, so I have no idea about their differences or why there are several of them (I mention this because it may be important for understanding why the code doesn't work). The other problem is: if I try to use RegexParsers instead, I get errors for all the strings I have specified in the parsers ("=", "*", etc.).
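A likely cause, worth checking against the StdLexical source: its scanner treats every character up to ' ' (including '\n' and '\r') as whitespace and skips it before reading each token, so the eol delimiter never reaches the parser. Also, since rules(...) is applied without phrase, parsing stops after the first rule and still reports Success. A minimal RegexParsers sketch under those assumptions: whiteSpace is overridden to skip only spaces and tabs, line breaks are matched explicitly, and parseAll requires the whole input to be consumed. In RegexParsers, string literals such as "=" and "|" become parsers through its implicit literal conversion. The AST classes are simplified stand-ins for ParserClasses, and LineBnfParser and parseRules are hypothetical names.

```scala
import scala.util.parsing.combinator.RegexParsers

// Simplified stand-ins for the question's ParserClasses (toString overrides omitted).
sealed trait Expression
case class ID(value: String) extends Expression
case class Constant(value: String) extends Expression
case class Optional(value: Expression) extends Expression
case class Rule(head: ID, body: List[Expression])

object LineBnfParser extends RegexParsers {
  // Skip only spaces and tabs so line breaks stay visible to the grammar
  // (the default whiteSpace is \s+, which would swallow them).
  override protected val whiteSpace = """[ \t]+""".r

  def eol: Parser[String] = """\r?\n""".r
  def id: Parser[ID] = """[A-Za-z_]\w*""".r ^^ (ID(_))
  // A double-quoted literal; the quotes are stripped from the value.
  def constant: Parser[Constant] =
    "\"[^\"]*\"".r ^^ (s => Constant(s.substring(1, s.length - 1)))
  def expression: Parser[Expression] =
    id | constant | "(" ~> expression <~ ")" | "[" ~> expression <~ "]" ^^ (Optional(_))
  def rule: Parser[Rule] =
    (id <~ "=") ~ repsep(expression, "|") ^^ { case h ~ b => Rule(h, b) }
  // Rules are separated by one or more line breaks; blank lines at the
  // start or end are allowed.
  def rules: Parser[List[Rule]] = rep(eol) ~> repsep(rule, rep1(eol)) <~ rep(eol)

  // Unlike calling `rules` directly, parseAll fails if input is left over.
  def parseRules(text: String): ParseResult[List[Rule]] = parseAll(rules, text)
}
```

With the question's test input ("a = b \n a = c \n" on a system using "\n"), this sketch returns both rules instead of silently stopping after the first.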