I am writing a cron parser, but the compiler complains about an illegal rule composition.
What's wrong with my parser?
import org.parboiled2._
sealed trait Part
case class Fixed(points: Seq[Int]) extends Part
case class Range(start: Int, end: Int) extends Part
case class Every(start: Int, interval: Int) extends Part
case object Full extends Part
case object Ignore extends Part
class CronParser(val input: ParserInput) extends Parser {
def number = rule { capture(digits) ~> (_.toInt) }
def digits = rule { oneOrMore(CharPredicate.Digit) }
def fixed = rule { oneOrMore(number).separatedBy(",") ~> Fixed }
def range = rule { digits ~ '-' ~ digits ~> Range }
def every = rule { digits ~ '/' ~ digits ~> Every }
def full = rule { '*' ~ push(Full) }
def ignore = rule { '?' ~ push(Ignore) }
def part = rule { fixed | range | every | full | ignore }
def expr = rule { part ~ part ~ part ~ part ~ part}
}
You're using digits where I think you want to be using number. The following should work just fine:
class CronParser(val input: ParserInput) extends Parser {
def number = rule { capture(digits) ~> (_.toInt) }
def digits = rule { oneOrMore(CharPredicate.Digit) }
def fixed = rule { oneOrMore(number).separatedBy(",") ~> Fixed }
def range = rule { number ~ '-' ~ number ~> Range }
def every = rule { number ~ '/' ~ number ~> Every }
def full = rule { '*' ~ push(Full) }
def ignore = rule { '?' ~ push(Ignore) }
def part = rule { fixed | range | every | full | ignore }
def expr = rule { part ~ part ~ part ~ part ~ part }
}
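The problem was that digits only matches input and doesn't push a value onto the value stack. With digits, the ~> Range and ~> Every actions had no values of their own to consume, so range and every became rules that wanted to pop values off the stack, and such rules can't be composed with ~ the way part and expr need.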
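To make the difference concrete, here is a minimal sketch of just the range rule with explicit result types. The names CronRange, RangeParser and RangeDemo are made up for illustration, and it assumes parboiled2 is on the classpath. The annotations show that digits is a Rule0 (pushes nothing) while number is a Rule1[Int] (pushes one Int):

```scala
import org.parboiled2._

// Hypothetical stand-in for the question's Range case class.
case class CronRange(start: Int, end: Int)

class RangeParser(val input: ParserInput) extends Parser {
  // Rule0: matches digits but leaves the value stack untouched.
  def digits: Rule0 = rule { oneOrMore(CharPredicate.Digit) }

  // Rule1[Int]: captures the matched text and pushes it as an Int.
  def number: Rule1[Int] = rule { capture(digits) ~> ((s: String) => s.toInt) }

  // Two Ints are pushed by the two number rules, so the action has
  // exactly the two values it needs and the result is a Rule1.
  def range: Rule1[CronRange] = rule {
    number ~ '-' ~ number ~> ((start: Int, end: Int) => CronRange(start, end)) ~ EOI
  }
}

object RangeDemo {
  def main(args: Array[String]): Unit =
    println(new RangeParser("1-5").range.run())
}
```

If you swap number for digits in range, the Rule1[CronRange] annotation should no longer type-check, which is the same complaint the compiler makes about the original parser.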
Related
I took this from a project that claims to parse real numbers, but it somehow eats the pre-decimal part:
object Main extends App {
import org.parboiled.scala._
val res = TestParser.parseDouble("2.3")
println(s"RESULT: ${res.result}")
object TestParser extends Parser {
def RealNumber = rule {
oneOrMore(Digit) ~ optional( "." ~ oneOrMore(Digit) ) ~> { s =>
println(s"CAPTURED '$s'")
s.toDouble
}
}
def Digit = rule { "0" - "9" }
def parseDouble(input: String): ParsingResult[Double] =
ReportingParseRunner(RealNumber).run(input)
}
}
This prints:
CAPTURED '.3'
RESULT: Some(0.3)
What is wrong here? Note that currently I cannot go from Parboiled-1 to Parboiled-2, because I have a larger grammar that would have to be rewritten.
As stated in the parboiled documentation, action rules like ~> take the match of the immediately preceding peer rule. In the rule sequence oneOrMore(Digit) ~ optional( "." ~ oneOrMore(Digit) ), the immediately preceding rule is optional( "." ~ oneOrMore(Digit) ), so the action receives only its match, ".3".
To fix that you may, for example, extract the first two elements into a separate rule:
def RealNumberString = rule {
oneOrMore(Digit) ~ optional( "." ~ oneOrMore(Digit) )
}
def RealNumber = rule {
RealNumberString ~> { s =>
println(s"CAPTURED '$s'")
s.toDouble
}
}
Or push both parts onto the stack and then combine them:
def RealNumber = rule {
oneOrMore(Digit) ~> identity ~
optional( "." ~ oneOrMore(Digit) ) ~> identity ~~> { (s1, s2) =>
val s = s1 + s2
println(s"CAPTURED '$s'")
s.toDouble
}
}
Here is one solution, but it looks very ugly. Probably there is a better way:
def Decimal = rule {
Integer ~ optional[Int]("." ~ PosInteger) ~~> { (a: Int, bOpt: Option[Int]) =>
bOpt.fold(a.toDouble)(b => s"$a.$b".toDouble) /* ??? */
}
}
def PosInteger = rule { Digits ~> (_.toInt) }
def Integer = rule { optional[Unit]("-" ~> (_ => ())) /* ??? */ ~
PosInteger ~~> { (neg: Option[Unit], num: Int) =>
if (neg.isDefined) -num else num
}
}
def Digit = rule { "0" - "9" }
def Digits = rule { oneOrMore(Digit) }
def parseDouble(input: String): ParsingResult[Double] =
ReportingParseRunner(Decimal).run(input)
This seems pretty simple!
class SeparatedParser(val input: ParserInput, val delimiter: String = ",") extends Parser {
def pipedField = rule { (zeroOrMore(field).separatedBy("|")) }
def field = rule { capture(zeroOrMore(noneOf(delimiter))) }
def d = delimiter
def record = rule {
field ~ d ~ pipedField ~ d ~ field ~ EOI
}
}
I try:
val parser = new SeparatedParser("""49798,piped1|piped2,sklw""")
val parsed = parser.record.run()
parsed match {
case Success(rel) => println(rel)
case Failure(pe: ParseError) => println(parser.formatError(pe))
}
But I get:
49798 :: Vector(piped1|piped2) :: sklw :: HNil
I would expect the Vector to have two separate elements: piped1 and piped2.
What dumbass mistake am I making?
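One likely explanation: field only stops at the delimiter (","), so a single field happily consumes "piped1|piped2" including the pipe, and separatedBy("|") never gets a chance to split anything. A minimal sketch of a possible fix follows, assuming parboiled2 on the classpath. The pipedItem rule is an addition that is not in the original code; it excludes "|" as well as the delimiter:

```scala
import org.parboiled2._

class SeparatedParser(val input: ParserInput, val delimiter: String = ",") extends Parser {
  // A plain field stops only at the record delimiter.
  def field = rule { capture(zeroOrMore(noneOf(delimiter))) }

  // Hypothetical extra rule: an item inside the piped field must also stop at '|'.
  def pipedItem = rule { capture(zeroOrMore(noneOf(delimiter + "|"))) }

  // Now each item ends before the pipe, so separatedBy can see the separator.
  def pipedField = rule { zeroOrMore(pipedItem).separatedBy("|") }

  def d = delimiter

  def record = rule {
    field ~ d ~ pipedField ~ d ~ field ~ EOI
  }
}

object SeparatedDemo {
  def main(args: Array[String]): Unit =
    println(new SeparatedParser("""49798,piped1|piped2,sklw""").record.run())
}
```

The outer fields still use field, so a "|" in the first or last column would be kept as ordinary text; whether that is what you want depends on your data.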
I have the code below to check a string. We want to verify that it starts with '{' and ends with '}', and that in between it contains only sequences of characters other than '{' and '}' and nested strings that also have this property.
import util.parsing.combinator._
class Comp extends RegexParsers with PackratParsers {
lazy val bracefree: PackratParser[String] = """[^{}]*""".r ^^ {
case a => a
}
lazy val matching: PackratParser[String] = (
"{" ~ rep(bracefree | matching) ~ "}") ^^ {
case a ~ b ~ c => a + b.mkString("") + c
}
}
object Brackets extends Comp {
def main(args: Array[String])= {
println(parseAll(matching, "{ foo {hello 3 } {}}").get)
}
}
The desired output for this is to echo { foo {hello 3 } {}}, but it ends up taking a long time before dying from java.lang.OutOfMemoryError: GC overhead limit exceeded. What am I doing wrong and what should I have done instead?
Your regular expression for bracefree matches even the empty string, so the parser produced by rep() keeps succeeding without consuming any input and loops endlessly.
Use a + quantifier instead of *:
lazy val bracefree: PackratParser[String] = """[^{}]+""".r ^^ {
case a => a
}
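You can check the empty-match behaviour without any parser library. This is a small sketch using only the standard library regex API; the EmptyMatchDemo object is just an illustrative name:

```scala
object EmptyMatchDemo {
  val star = """[^{}]*""".r
  val plus = """[^{}]+""".r

  def main(args: Array[String]): Unit = {
    // Positioned at a closing brace, the * pattern still "succeeds" with an
    // empty match, so a repetition around it can never make progress.
    println(star.findPrefixOf("}")) // Some()
    // The + pattern fails here, which lets the repetition stop.
    println(plus.findPrefixOf("}")) // None
  }
}
```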
Also, by default RegexParsers skips leading whitespace before each literal or regex match, which would drop spaces from the echoed string. To turn that behavior off, override the skipWhitespace method so it always returns false. In the end your parser will look like this:
import util.parsing.combinator._
class Comp extends RegexParsers with PackratParsers {
override def skipWhitespace = false
lazy val bracefree: PackratParser[String] = """[^{}]+""".r ^^ {
case a => a
}
lazy val matching: PackratParser[String] = (
"{" ~ rep(bracefree | matching) ~ "}") ^^ {
case a ~ b ~ c => a + b.mkString("") + c
}
}
object Brackets extends Comp {
def main(args: Array[String])= {
println(parseAll(matching, "{ foo {hello 3 } {}}").get)
// prints: { foo {hello 3 } {}}
}
}
I want to parse a String with Scala parser combinators. Let's take
abcd,123,ghijk
as an example. So we have two words and an integer joined by commas.
I can do it like this:
import scala.util.parsing.combinator._
case class MyObject(field1: String, field2: Int, field3: String)
object Test3 extends RegexParsers {
def main(args: Array[String]): Unit = {
val testRow = "abcd,123,ghijk"
val parseResult = Test3.parse(Test3.myObject, testRow)
println(parseResult)
}
def word = "\\w+".r ^^ { _.toString }
def int = """\d+""".r ^^ { _.toInt }
def comma = "," ^^ { _.toString }
def myObject = word ~ comma ~ int ~ comma ~ word ^^ {
case wordfield1 ~ sep1 ~ intfield ~ sep2 ~ wordfield2
=> MyObject(wordfield1, intfield, wordfield2)
}
}
However, I want to express the "joined by comma" logic directly. So rather than explicitly writing word ~ comma ~ int ~ comma ~ word, it should look more like
List(word, int, word) someFunctionIDontKnow {
(resultParser, nextParser) => resultParser ~ comma ~ nextParser
}
I am a little stuck here because I'm not sure how to store my parsers, which have different types (Parser[Int] and Parser[String]), in a List while maintaining type safety, or what function to use to combine them the way I did manually. Is what I want even possible, or am I on the wrong track here?
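A plain List would force all elements to a common type such as Parser[Any], so the field types would be lost. One way to keep the types is to skip the List and define a small combinator that means "then a comma, then the next parser". The sketch below assumes the scala-parser-combinators library; the object name CommaJoined and the ~~ operator are invented for this example and are not part of the library:

```scala
import scala.util.parsing.combinator._

case class MyObject(field1: String, field2: Int, field3: String)

object CommaJoined extends RegexParsers {
  def word: Parser[String] = """\w+""".r
  def int: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // Hypothetical helper: sequence two parsers with a comma in between,
  // keeping both results and discarding the comma.
  implicit class CommaOps[A](left: Parser[A]) {
    def ~~[B](right: => Parser[B]): Parser[A ~ B] = left ~ ("," ~> right)
  }

  // Each step keeps its own result type, so the pattern match stays type safe.
  def myObject: Parser[MyObject] =
    word ~~ int ~~ word ^^ { case w1 ~ i ~ w2 => MyObject(w1, i, w2) }

  def main(args: Array[String]): Unit =
    println(parseAll(myObject, "abcd,123,ghijk"))
}
```

Because ~~ starts with ~, it has the same precedence as the built-in ~ and binds tighter than ^^, so no extra parentheses are needed. If you really need a list of parsers of different types, that usually means reaching for a heterogeneous list type from a separate library, which is a bigger step than this helper.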
I'm trying to make a very simple parser with parser combinators (to parse something similar to BNF). I've read several blog posts that explain the matter (the top-ranked ones on Google, at least for me) and I think I understand it, but the tests say otherwise.
I've checked the questions on StackOverflow, and while some of them might apply, whenever I try to apply them something else breaks. So the best way to go is through a specific example:
This is my main:
def main(args: Array[String]) {
val parser: BaseParser = new BaseParser
val eol = sys.props("line.separator")
val test = s"a = b ${eol} a = c ${eol}"
System.out.println(test)
parser.parse(test)
}
This is the parser:
import com.github.trylks.tests.parser.ParserClasses._
import scala.util.parsing.combinator.syntactical._
import scala.util.parsing.combinator.ImplicitConversions
import scala.util.parsing.combinator.PackratParsers
class BaseParser extends StandardTokenParsers with ImplicitConversions with PackratParsers {
val eol = sys.props("line.separator")
lexical.delimiters += ("=", "|", "*", "[", "]", "(", ")", ";", eol)
def rules = rep1sep(rule, eol) ^^ { Rules(_) }
def rule = id ~ "=" ~ repsep(expression, "|") ^^ flatten3 { (e1: ID, _: Any, e3: List[Expression]) => Rule(e1, e3) }
def expression: Parser[Expression] = (element | parenthesized | optional) ^^ { x => x } // and sequence and repetition, but that's another problem...
def parenthesized: Parser[Expression] = "(" ~> expression <~ ")" ^^ { x => x }
def optional: Parser[Expression] = "[" ~> expression <~ "]" ^^ { Optional(_) }
def element: Parser[Element] = (id | constant) ^^ { x => x }
def constant: Parser[Constant] = stringLit ^^ { Constant(_) }
def id: Parser[ID] = ident ^^ { ID(_) }
def parse(text: String): Option[Rules] = {
val s = rules(new lexical.Scanner(text))
s match {
case Success(res, next) => {
println("Success!\n" + res.toString)
Some(res)
}
case Error(msg, next) => {
println("error: " + msg)
None
}
case Failure(msg, next) => {
println("failure: " + msg)
None
}
}
}
}
These are the classes that you are missing from the previous part of the code:
object ParserClasses {
abstract class Element extends Expression
case class ID(value: String) extends Element {
override def toString(): String = value
}
case class Constant(value: String) extends Element {
override def toString(): String = value
}
abstract class Expression
case class Optional(value: Expression) extends Expression {
override def toString() = s"[$value]"
}
case class Rule(head: ID, body: List[Expression]) {
override def toString() = s"$head = ${body.mkString(" | ")}"
}
case class Rules(rules: List[Rule]) {
override def toString() = rules.mkString("\n")
}
}
The problem is that the code as it stands doesn't work: it parses only one rule, not both. If I replace eol with ";" (in both the main and the parser), then it works (at least for this test).
Most people seem to prefer regex parsers, but the blogs explaining parser combinators don't go into detail about which traits can be extended, so I have no idea what the differences are or why there are several. I mention this because it may be important for understanding why the code doesn't work. If I try to use regex parsers instead, I get errors for all the strings that I have specified in the parsers ("=", "*", etc.).
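A plausible cause is that the standard lexer treats line breaks as ordinary whitespace and skips them, so the eol delimiter never reaches the grammar as a token, whereas ";" does. As a rough sketch of the regex-based alternative, the following assumes the scala-parser-combinators library and simplifies the grammar to identifiers only. The LineRules object and its rule names are illustrative and not taken from the original code. The key point is overriding whiteSpace so that only spaces and tabs are skipped:

```scala
import scala.util.parsing.combinator._

object LineRules extends RegexParsers {
  // Skip only spaces and tabs, so line breaks stay visible to the grammar.
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def id: Parser[String] = """[A-Za-z_]\w*""".r

  // Simplified rule: head = alt1 | alt2 | ...
  def ruleDef: Parser[(String, List[String])] =
    id ~ ("=" ~> repsep(id, "|")) ^^ { case head ~ body => (head, body) }

  // Rules are separated by line breaks; a trailing line break is allowed.
  def rules: Parser[List[(String, List[String])]] =
    rep1sep(ruleDef, eol) <~ opt(eol)

  def main(args: Array[String]): Unit =
    println(parseAll(rules, "a = b \n a = c \n"))
}
```

In RegexParsers, string literals such as "=" are converted to parsers implicitly, so no delimiter registration is needed; the errors you saw probably came from mixing this style with the token-based traits.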