Combinator Definition gives an error, unable to understand why - scala

I am trying to write a simple parser to be able to generate a DDL for an RDBMS, but got stuck in defining the combinator.
import scala.util.parsing.combinator._
object DocumentParser extends RegexParsers {
  override protected val whiteSpace = """(\s|//.*)+""".r //To include comments in what is regarded as white space, to be ignored
  case class DocumentAttribute(attributeName : String, attributeType : String)
  case class Document(documentName : String, documentAttributeList : List[DocumentAttribute])
  def document : Parser[Document]= "document" ~> documentName <~ "{" ~> attributeList <~ "}" ^^ {case n ~ l => Document(n, l)} //Here is where I get an error
  def documentName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
  def attributeList : Parser[List[DocumentAttribute]] = repsep(attribute, ",")
  def attribute : Parser[DocumentAttribute] = attributeName ~ attributeType ^^ {case n ~ t => DocumentAttribute(n, t)}
  def attributeName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
  def attributeType : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
}
It seems that I have defined it correctly. Is there something obvious I am missing or something fundamental about combinators I don't understand? Thanks!

You have to use the following code for document:
def document : Parser[Document]= "document" ~> documentName ~ ("{" ~> attributeList <~ "}") ^^ {case n ~ l => Document(n, l)}
Note the ~ after documentName and the parentheses around "{" ~> attributeList <~ "}". Without them, the chain of <~ and ~> discards everything except documentName, so the parser yields a plain String and the pattern n ~ l cannot match it.
Basically, without any parentheses, the result is that everything to the right of the leftmost <~ is discarded, and then everything to the left of the rightmost ~> still remaining is discarded. For example:
def foo: Parser[String ~ String] = "a" ~> "b" ~> "c" ~ "d" <~ "e" ~> "f" <~ "g"
                                  |<-discarded->|         |  <- discarded ->  |
With this change your code works:
scala> DocumentParser.document(new CharSequenceReader(
""" document foo {bar baz, // comment
| qaz wsx}""".stripMargin))
res4: DocumentParser.ParseResult[DocumentParser.Document] = [2.10] parsed: Document(foo,List(DocumentAttribute(bar,baz), DocumentAttribute(qaz,wsx)))
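To see which results survive a chain of ~, ~> and <~, it can help to model only the "what is kept" part and let the Scala compiler do the grouping. The sketch below is a toy stand-in and not the parser combinator library: the names KeptResultDemo, Kept, One, Both and P are invented for illustration, and nothing is actually parsed. It relies only on Scala's rule that an operator's precedence comes from its first character, so ~ and ~> bind tighter than <~.

```scala
object KeptResultDemo {
  // What a toy parser keeps: one named result, or a pair of kept results.
  sealed trait Kept
  final case class One(name: String) extends Kept
  final case class Both(left: Kept, right: Kept) extends Kept

  // Toy stand-in for Parser (hypothetical): no input is consumed,
  // only the result that would be kept is tracked.
  final case class P(kept: Kept) {
    def ~(that: P): P  = P(Both(kept, that.kept)) // keep both sides
    def ~>(that: P): P = P(that.kept)             // keep the right side only
    def <~(that: P): P = P(kept)                  // keep the left side only
  }

  private def p(name: String): P = P(One(name))

  // Same operator chain as the question's `document`, without parentheses.
  // Scala groups it as (("document" ~> name) <~ ("{" ~> attrs)) <~ "}".
  val ungrouped: Kept =
    (p("document") ~> p("documentName") <~ p("{") ~> p("attributeList") <~ p("}")).kept

  // Same chain as the answer's version, with ~ and parentheses.
  val grouped: Kept =
    (p("document") ~> p("documentName") ~ (p("{") ~> p("attributeList") <~ p("}"))).kept

  def main(args: Array[String]): Unit = {
    println(ungrouped) // One(documentName)
    println(grouped)   // Both(One(documentName),One(attributeList))
  }
}
```

In this model the ungrouped chain keeps a single result, which is why a pattern of the form n ~ l has nothing to destructure, while the parenthesized chain keeps a pair.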

Related

Using keep-left/right combinator is not working with result converter

I have a combinator and a result converter that looks like so:
// parses a line like so:
//
// 2
// 00:00:01.610 --> 00:00:02.620 align:start position:0%
//
private def subtitleHeader: Parser[SubtitleBlock] = {
  (subtitleNumber ~ whiteSpace).? ~>
    time ~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
  case
    startTime ~ _ ~ endTime ~ _ ~ _
      => SubtitleBlock(startTime, endTime, List(""))
}
Because the arrow, textline and eol are not important to my result converter, I was hoping I could use <~ and ~> in the right places within my combinator such that my converter doesn't have to deal with them. As an experiment, I changed the first ~ in the parser to <~ and removed the ~ _ where the "arrow" would be matched in the case statement like so:
private def subtitleHeader: Parser[SubtitleBlock] = {
  (subtitleNumber ~ whiteSpace).? ~>
    time <~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
  case
    startTime ~ endTime ~ _ ~ _
      => SubtitleBlock(startTime, endTime, List(""))
}
However, I get red-squigglies in IntelliJ with the error message:
Error:(44, 31) constructor cannot be instantiated to expected type;
found : caption.vttdissector.VttParsers.~[a,b] required: Int
startTime ~ endTime ~ _ ~ _
What am I doing wrong?
Since you didn't insert any parentheses in the chain of ~ and <~, most matched subexpressions are thrown out "with the bathwater" (or rather "with the whitespace and arrows"). Just insert some parentheses.
Here is the general pattern it should follow:
(irrelevant ~> irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
(irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
...
i.e. every "relevant" subexpression is surrounded by irrelevant stuff and a pair of parentheses, and then the parenthesized subexpressions are connected by ~'s.
Your example:
import scala.util.parsing.combinator._
import scala.util.{Either, Left, Right}
case class SubtitleBlock(startTime: String, endTime: String, text: List[String])
object YourParser extends RegexParsers {
  def subtitleHeader: Parser[SubtitleBlock] = {
    (subtitleNumber.? ~> time <~ arrow) ~
    time ~
    (opt(textLine) <~ eol)
  } ^^ {
    case startTime ~ endTime ~ _ => SubtitleBlock(startTime, endTime, Nil)
  }
  // Skip only spaces and tabs, so that "\n" stays visible to `eol`
  override val whiteSpace = "[ \t]+".r
  def arrow: Parser[String] = "-->".r
  def subtitleNumber: Parser[String] = "\\d+".r
  // The dot before the milliseconds is escaped so that it matches a literal "."
  def time: Parser[String] = "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}".r
  def textLine: Parser[String] = ".*".r
  def eol: Parser[String] = "\n".r
  def parseStuff(s: String): scala.util.Either[String, SubtitleBlock] =
    parseAll(subtitleHeader, s) match {
      case Success(t, _) => scala.util.Right(t)
      case f => scala.util.Left(f.toString)
    }
  def main(args: Array[String]): Unit = {
    val examples: List[String] = List(
      "2 00:00:01.610 --> 00:00:02.620 align:start position:0%\n"
    ) ++ args.map(_ + "\n")
    for (x <- examples) {
      println(parseStuff(x))
    }
  }
}
finds:
Right(SubtitleBlock(00:00:01.610,00:00:02.620,List()))

scala.util.parsing.combinator.RegexParsers constructor cannot be instantiated to expected type

I want to be able to parse strings like the one below with Scala parser combinators.
aaa22[bbb33[ccc]ddd]eee44[fff]
Before every open square bracket an integer literal is guaranteed to exist.
The code I have so far:
import scala.util.parsing.combinator.RegexParsers
trait AST
case class LetterSeq(value: String) extends AST
case class IntLiteral(value: String) extends AST
case class Repeater(count: AST, content: List[AST]) extends AST
class ExprParser extends RegexParsers {
  def intLiteral: Parser[AST] = "[0-9]+".r ^^ IntLiteral
  def letterSeq: Parser[AST] = "[a-f]+".r ^^ LetterSeq
  def term: Parser[AST] = letterSeq | repeater
  def expr: Parser[List[AST]] = rep1(term)
  def repeater: Parser[AST] = intLiteral ~> "[" ~> expr <~ "]" ^^ {
    case intLiteral ~ expr => Repeater(intLiteral, expr)
  }
}
The message I get:
<console>:25: error: constructor cannot be instantiated to expected type;
found : ExprParser.this.~[a,b]
required: List[AST]
case intLiteral ~ expr => Repeater(intLiteral, expr)
Any ideas?
Later Edit: After making the change suggested by @sepp2k I still get the same error. The change being:
def repeater: Parser[AST] = intLiteral ~ "[" ~> expr <~ "]" ^^ {
The error message is telling you that you're pattern matching a list against the ~ constructor, which isn't allowed. In order to use ~ in your pattern, you need to have used ~ in the parser.
It looks like in this case the problem is simply that you discarded the value of intLiteral using ~> when you did not mean to. If you use ~ instead of ~> here and add parentheses [1], that should fix your problem.
[1] The parentheses are required, so that the following ~> only throws away the bracket instead of the result of intLiteral ~ "[". intLiteral ~ "[" ~> expr <~ "]" is parsed as (intLiteral ~ "[") ~> expr <~ "]", which still throws away the intLiteral. You want intLiteral ~ ("[" ~> expr <~ "]") which only throws away the [ and ].
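Putting the suggested grouping into the question's parser might look like the sketch below. It assumes the scala-parser-combinators module is on the classpath; the object names FixedExprParser and FixedExprParserDemo are made up for this example, and the case-class constructors are wrapped in explicit lambdas only to keep the snippet independent of the Scala version.

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait AST
case class LetterSeq(value: String) extends AST
case class IntLiteral(value: String) extends AST
case class Repeater(count: AST, content: List[AST]) extends AST

// Hypothetical name; same grammar as the question, with the repeater regrouped.
object FixedExprParser extends RegexParsers {
  def intLiteral: Parser[AST] = "[0-9]+".r ^^ (s => IntLiteral(s))
  def letterSeq: Parser[AST]  = "[a-f]+".r ^^ (s => LetterSeq(s))
  def term: Parser[AST]       = letterSeq | repeater
  def expr: Parser[List[AST]] = rep1(term)

  // `~` keeps the count; the parentheses make `~>` and `<~` drop only the brackets.
  def repeater: Parser[AST] = intLiteral ~ ("[" ~> expr <~ "]") ^^ {
    case count ~ body => Repeater(count, body)
  }
}

object FixedExprParserDemo {
  def main(args: Array[String]): Unit =
    println(FixedExprParser.parseAll(FixedExprParser.expr, "aaa22[bbb33[ccc]ddd]eee44[fff]"))
}
```

With this grouping the parser passed to ^^ has type Parser[AST ~ List[AST]], so the pattern count ~ body has a ~ value to take apart.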

Parsing in scala/Java

I am writing a Parser in scala and got stuck at this point:
private def expression : Parser[Expression] = cond | variable | integer | liste | function
private def cond : Parser[Expression] = "if" ~ predicate ~ "then" ~ expression ~ "else" ~ expression ^^ {case _~i~_~t~_~el => Cond(i,t,el)}
private def predicate: Parser[Predicate] = identifier ~ "?" ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~_~el~_ => Predicate(n,el)}
private def function: Parser[Expression] = identifier ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~el~_ => Function(n,el)}
private def liste: Parser[Expression] = "[" ~ repsep(expression, ",") ~ "]" ^^ {case _~ls~_ => Liste(ls)}
private def variable: Parser[Expression] = identifier ^^ {case v => Variable(v)}
def identifier: Parser[String] = """[a-zA-Z0-9]+""".r ^^ { _.toString }
def integer: Parser[Integer] = num ^^ { case i => Integer(i)}
def num: Parser[String] = """(-?\d*)""".r ^^ {_.toString}
My problem is that when it comes to an "expression", the parser does not always take the right path. For example, given funk(x,y) it tries to parse it as a variable and not as a function.
Any idea?
Change the order of the alternatives in your expression parser: put function before variable and after cond. In general, when you compose parsers with the alternative A | B, parser A should not be able to succeed on a prefix of the input that parser B would parse, because | settles on the first alternative that succeeds.
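The effect of alternative order can be reproduced without the library. The following is a hand-rolled sketch, not the scala-parser-combinators API: a "parser" is modeled as a plain function from input to an optional (result, remaining input) pair, and the names OrderedChoiceDemo, P, variable, function and or are all invented for the illustration.

```scala
object OrderedChoiceDemo {
  // Toy parser type (hypothetical): input => optional (result, remaining input).
  type P = String => Option[(String, String)]

  // Succeeds on a leading run of letters or digits and tags it as a variable.
  val variable: P = in => {
    val name = in.takeWhile(_.isLetterOrDigit)
    if (name.isEmpty) None else Some((s"Variable($name)", in.drop(name.length)))
  }

  // Succeeds on an identifier followed by "(...)"; nested parentheses are not handled.
  val function: P = in => {
    val name  = in.takeWhile(_.isLetterOrDigit)
    val rest  = in.drop(name.length)
    val close = rest.indexOf(')')
    if (name.nonEmpty && rest.startsWith("(") && close > 0)
      Some((s"Function($name,${rest.substring(1, close)})", rest.drop(close + 1)))
    else None
  }

  // Ordered choice: the second alternative is tried only if the first one fails.
  def or(a: P, b: P): P = in => a(in).orElse(b(in))

  // variable succeeds on the prefix "funk" and leaves "(x,y)" unparsed.
  val variableFirst: Option[(String, String)] = or(variable, function)("funk(x,y)")

  // function gets the first try and consumes the whole input.
  val functionFirst: Option[(String, String)] = or(function, variable)("funk(x,y)")

  def main(args: Array[String]): Unit = {
    println(variableFirst)
    println(functionFirst)
  }
}
```

When variable comes first it succeeds on "funk" alone, so function is never tried and the argument list is left over. This matches the behaviour described in the question.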

How to allow optional outermost parenthesis?

I am writing a parser for certain expressions. I want to allow parentheses to be optional at the outermost level. My current parser looks like this:
class MyParser extends JavaTokenParsers {
  def expr = andExpr | orExpr | term
  def andExpr = "(" ~> expr ~ "and" ~ expr <~ ")"
  def orExpr = "(" ~> expr ~ "or" ~ expr <~ ")"
  def term = """[a-z]""".r
}
As it is, this parser accepts only fully parenthesized expressions, such as:
val s1 = "(a and b)"
val s2 = "((a and b) or c)"
val s3 = "((a and b) or (c and d))"
My question is, is there any modification I can make to this parser in order for the outermost parenthesis to be optional? I would like to accept the string:
val s4 = "(a and b) or (c and d)"
Thanks!
import scala.util.parsing.combinator.JavaTokenParsers
class MyParser extends JavaTokenParsers {
  // Recursive definitions need explicit result types; Parser[Any] is the loosest choice.
  // The complete parser can be either a parenthesisless "andExpr", a parenthesisless
  // "orExpr", or an "expr".
  def complete: Parser[Any] = andExpr | orExpr | expr
  def expr: Parser[Any] = parenthesis | term
  // Moved the parentheses out of andExpr and orExpr so I don't have to create
  // an extra parenthesisless andExpr and orExpr.
  def parenthesis: Parser[Any] = "(" ~> (andExpr | orExpr) <~ ")"
  def andExpr: Parser[Any] = expr ~ "and" ~ expr
  def orExpr: Parser[Any] = expr ~ "or" ~ expr
  def term: Parser[String] = """[a-z]""".r
}

Scala Parser, why doesn't "pat <~ pat ~> pat" work?

Trying out a simple parser combinator, I'm running into compilation errors.
I would like to parse "Smith, Joe" into a Name object like Name(Joe, Smith). Simple enough, I guess.
Here is the code related with that:
import util.parsing.combinator._
class NameParser extends JavaTokenParsers {
  lazy val name: Parser[Name] =
    lastName <~ "," ~> firstName ^^ {case (l ~ f) => Name(f, l)}
  lazy val lastName = stringLiteral
  lazy val firstName = stringLiteral
}
case class Name(firstName:String, lastName: String)
And I'm testing it via
object NameParserTest {
  def main(args: Array[String]) {
    val parser = new NameParser()
    println(parser.parseAll(parser.name, "Schmo, Joe"))
  }
}
Getting a compilation error:
error: constructor cannot be instantiated to expected type;
found : NameParser.this.~[a,b]
required: java.lang.String
lazy val name: Parser[Name] = lastName <~ "," ~> firstName ^^ {case (l ~ f) => Name(f, l)}
What is it that I am missing here?
In this line here:
lazy val name: Parser[Name] =
lastName <~ "," ~> firstName ^^ {case (l ~ f) => Name(f, l)}
you don't want to use both <~ and ~>. You're creating a parser that matches "," and firstName and keeps only ",", and then you're creating a parser that matches lastName and the previous parser and keeps only lastName.
You can replace it with this:
(lastName <~ ",") ~ firstName ^^ {case (l ~ f) => Name(f, l)}
However, although this will compile and combine the way you want, it won't parse what you want it to. I got this output when I tried:
[1.1] failure: string matching regex `"([^"\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})*"' expected but `S' found
Schmo, Joe
^
stringLiteral expects something that looks like a string literal in code (something in quotation marks). (JavaTokenParsers is meant to parse stuff that looks like Java.) This works:
scala> val x = new NameParser
x: NameParser = NameParser@1ea8dbd
scala> x.parseAll(x.name, "\"Schmo\", \"Joe\"")
res0: x.ParseResult[Name] = [1.15] parsed: Name("Joe","Schmo")
You should probably replace stringLiteral with a regex that specifies what kind of strings you will accept for names. If you look at the documentation for RegexParsers, you'll see:
implicit def regex (r: Regex) : Parser[String]
A parser that matches a regex string
So you can just put a Regex object there and it will be converted into a parser that matches it.
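As a rough sketch of that suggestion, the parser below swaps stringLiteral for a regex. It assumes the scala-parser-combinators module is available. The rule that a name part is a run of ASCII letters is an arbitrary choice for the example, and the names PlainNameParser, namePart and PlainNameParserDemo are invented here.

```scala
import scala.util.parsing.combinator.RegexParsers

case class Name(firstName: String, lastName: String)

// Hypothetical parser: the same shape as NameParser, but using a regex for name parts.
object PlainNameParser extends RegexParsers {
  // Assumed rule for this sketch: a name part is one or more ASCII letters.
  // The Regex is turned into a Parser[String] by the implicit `regex` conversion.
  def namePart: Parser[String] = """[A-Za-z]+""".r

  // `<~` drops the comma; `~` keeps both name parts for the pattern match.
  def name: Parser[Name] =
    (namePart <~ ",") ~ namePart ^^ { case last ~ first => Name(first, last) }
}

object PlainNameParserDemo {
  def main(args: Array[String]): Unit =
    println(PlainNameParser.parseAll(PlainNameParser.name, "Schmo, Joe"))
}
```

The space after the comma needs no handling of its own, because RegexParsers skips whitespace before each literal and regex by default. A real name grammar would likely need a more permissive regex (hyphens, apostrophes, non-ASCII letters).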
The ~> combinator ignores the left side and the <~ combinator ignores the right side. So the result of lastName <~ "," ~> firstName can never include the results of both firstName and lastName. Actually it is only the parse result of lastName, because the result of "," ~> firstName is ignored. You need to use sequential composition here:
lazy val name: Parser[Name] =
lastName ~ "," ~ firstName ^^ {case (l ~_~ f) => Name(f, l)}
Or if you want a prettier pattern match:
lazy val name: Parser[Name] =
lastName ~ ("," ~> firstName) ^^ {case (l ~ f) => Name(f, l)}
The code
lastName <~ "," ~> firstName
will end up throwing away the result of parsing firstName. Because of the operator precedence rules in Scala, the statement is parsed as if it were parenthesized like so:
lastName <~ ("," ~> firstName)
but even if it were grouped differently you are still only dealing with three parsers and throwing away the result of two of them.
So you end up with a String being passed into your mapping function, which is written to expect a ~[String, String] instead. That's why you get the compiler error you do.
One helpful technique for troubleshooting this sort of thing is to add ascriptions to subexpressions:
lazy val name: Parser[Name] =
((lastName <~ "," ~> firstName): Parser[String ~ String]) ^^ { case l ~ f => Name(f, l) }
which can help you to determine where exactly reality and your expectations diverge.