Scala Parser, why doesn't "pat <~ pat ~> pat" work? - scala

Trying out a simple parser combinator, I'm running into compilation errors.
I would like to parse "Smith, Joe" into a Name object such as Name(Joe, Smith). Simple enough, I thought.
Here is the relevant code:
import util.parsing.combinator._
class NameParser extends JavaTokenParsers {
  lazy val name: Parser[Name] =
    lastName <~ "," ~> firstName ^^ {case (l ~ f) => Name(f, l)}
  lazy val lastName = stringLiteral
  lazy val firstName = stringLiteral
}
case class Name(firstName:String, lastName: String)
And I'm testing it via
object NameParserTest {
  def main(args: Array[String]) {
    val parser = new NameParser()
    println(parser.parseAll(parser.name, "Schmo, Joe"))
  }
}
Getting a compilation error:
error: constructor cannot be instantiated to expected type;
found : NameParser.this.~[a,b]
required: java.lang.String
lazy val name: Parser[Name] = lastName <~ "," ~> firstName ^^ {case (l ~ f) => Name(f, l)}
What am I missing here?

In this line here:
lazy val name: Parser[Name] =
lastName <~ "," ~> firstName ^^ {case (l ~ f) => Name(f, l)}
you don't want to use both <~ and ~>. Because ~> binds more tightly than <~, you're first creating a parser that matches "," and firstName and keeps only firstName, and then a parser that matches lastName followed by that parser and keeps only lastName.
You can replace it with this:
(lastName <~ ",") ~ firstName ^^ {case (l ~ f) => Name(f, l)}
However, although this will compile and combine the way you want, it won't parse what you want it to. I got this output when I tried:
[1.1] failure: string matching regex `"([^"\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})*"' expected but `S' found
Schmo, Joe
^
stringLiteral expects something that looks like a string literal in code (something in quotation marks). (JavaTokenParsers is meant to parse stuff that looks like Java.) This works:
scala> val x = new NameParser
x: NameParser = NameParser@1ea8dbd
scala> x.parseAll(x.name, "\"Schmo\", \"Joe\"")
res0: x.ParseResult[Name] = [1.15] parsed: Name("Joe","Schmo")
You should probably replace it with a regex that specifies what kind of strings you will accept for names. If you look at the documentation for RegexParsers, you'll see:
implicit def regex (r: Regex) : Parser[String]
A parser that matches a regex string
So you can just put a Regex object there and it will be converted into a parser that matches it.
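A minimal sketch of that regex-based replacement, assuming the scala-parser-combinators module is on the classpath. The object name RegexNameParser, the namePart parser, and the choice of "one or more ASCII letters" as the name pattern are illustrative assumptions, not part of the original code:

```scala
import scala.util.parsing.combinator._

case class Name(firstName: String, lastName: String)

// Hypothetical variant of NameParser that uses a regex instead of stringLiteral.
object RegexNameParser extends RegexParsers {
  // Assumption: a name part is one or more ASCII letters.
  // The Regex is converted to a Parser[String] by the implicit `regex` conversion.
  def namePart: Parser[String] = "[A-Za-z]+".r

  // Drop only the comma, keep both name parts.
  def name: Parser[Name] =
    (namePart <~ ",") ~ namePart ^^ { case l ~ f => Name(f, l) }
}
```

With this, RegexNameParser.parseAll(RegexNameParser.name, "Schmo, Joe") should succeed without requiring quotation marks in the input; real names with hyphens, apostrophes or non-ASCII letters would need a wider pattern.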

The ~> combinator ignores the left side and the <~ combinator ignores the right side. So the result of lastName <~ "," ~> firstName can never include the results of both firstName and lastName. In fact it is only the parse result of lastName, because the result of "," ~> firstName is ignored. You need to use sequential composition here:
lazy val name: Parser[Name] =
lastName ~ "," ~ firstName ^^ {case (l ~_~ f) => Name(f, l)}
Or if you want a prettier pattern match:
lazy val name: Parser[Name] =
lastName ~ ("," ~> firstName) ^^ {case (l ~ f) => Name(f, l)}

The code
lastName <~ "," ~> firstName
will end up throwing away the result of parsing firstName. Because of the operator precedence rules in Scala, the statement is parsed as if it were parenthesized like so:
lastName <~ ("," ~> firstName)
but even if it were grouped differently you are still only dealing with three parsers and throwing away the result of two of them.
So you end up with a String being passed into your mapping function, which is written to expect a ~[String, String] instead. That's why you get the compiler error you do.
One helpful technique for troubleshooting this sort of thing is to add ascriptions to subexpressions:
lazy val name: Parser[Name] =
((lastName <~ "," ~> firstName): Parser[String ~ String]) ^^ { case l ~ f => Name(f, l) }
which can help you to determine where exactly reality and your expectations diverge.
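As a rough illustration of the same technique, the sketch below ascribes the type that each grouping actually has. It assumes the scala-parser-combinators module is available; the object Groupings and the parser word (standing in for lastName and firstName) are made-up names:

```scala
import scala.util.parsing.combinator._

// Hypothetical demo object; `word` stands in for lastName/firstName.
object Groupings extends RegexParsers {
  def word: Parser[String] = "[A-Za-z]+".r

  // As written in the question: this groups as word <~ ("," ~> word),
  // so the result is a single String (the first word).
  def asWritten: Parser[String] = word <~ "," ~> word

  // Parenthesized so that only the comma is dropped: the result is a pair.
  def keepBoth: Parser[String ~ String] = (word <~ ",") ~ word
}
```

Swapping the two ascriptions makes the compiler complain at the parser definition itself rather than inside the mapping function, which is usually easier to read.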

Related

Using keep-left/right combinator is not working with result converter

I have a combinator and a result converter that looks like so:
// parses a line like so:
//
// 2
// 00:00:01.610 --> 00:00:02.620 align:start position:0%
//
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time ~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ _ ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
Because the arrow, textline and eol are not important to my result converter, I was hoping I could use <~ and ~> in the right places within my combinator such that my converter doesn't have to deal with them. As an experiment, I changed the first ~ in the parser to <~ and removed the ~ _ where the "arrow" would be matched in the case statement like so:
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time <~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
However, I get red-squigglies in IntelliJ with the error message:
Error:(44, 31) constructor cannot be instantiated to expected type;
found : caption.vttdissector.VttParsers.~[a,b] required: Int
startTime ~ endTime ~ _ ~ _
What am I doing wrong?
Since you didn't insert any parentheses in the chain of ~ and <~, most matched subexpressions are thrown out "with the bathwater" (or rather "with the whitespace and arrows"). Just insert some parentheses.
Here is the general pattern it should follow:
(irrelevant ~> irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
(irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
...
i.e. every "relevant" subexpression is surrounded by irrelevant stuff and a pair of parentheses, and then the parenthesized subexpressions are connected by ~'s.
Your example:
import scala.util.parsing.combinator._
import scala.util.{Either, Left, Right}
case class SubtitleBlock(startTime: String, endTime: String, text: List[String])
object YourParser extends RegexParsers {
  def subtitleHeader: Parser[SubtitleBlock] = {
    (subtitleNumber.? ~> time <~ arrow) ~
    time ~
    (opt(textLine) <~ eol)
  } ^^ {
    case startTime ~ endTime ~ _ => SubtitleBlock(startTime, endTime, Nil)
  }
  // Only spaces and tabs are skipped, so that "\n" can be matched by eol.
  override val whiteSpace = "[ \t]+".r
  def arrow: Parser[String] = "-->".r
  def subtitleNumber: Parser[String] = "\\d+".r
  // The dot before the milliseconds is escaped so it matches a literal '.'.
  def time: Parser[String] = "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}".r
  def textLine: Parser[String] = ".*".r
  def eol: Parser[String] = "\n".r
  def parseStuff(s: String): scala.util.Either[String, SubtitleBlock] =
    parseAll(subtitleHeader, s) match {
      case Success(t, _) => scala.util.Right(t)
      case f => scala.util.Left(f.toString)
    }
  def main(args: Array[String]): Unit = {
    val examples: List[String] = List(
      "2 00:00:01.610 --> 00:00:02.620 align:start position:0%\n"
    ) ++ args.map(_ + "\n")
    for (x <- examples) {
      println(parseStuff(x))
    }
  }
}
finds:
Right(SubtitleBlock(00:00:01.610,00:00:02.620,List()))

scala.util.parsing.combinator.RegexParsers constructor cannot be instantiated to expected type

I want to be able to parse strings like the one below with Scala parser combinators.
aaa22[bbb33[ccc]ddd]eee44[fff]
Before every open square bracket an integer literal is guaranteed to exist.
The code I have so far:
import scala.util.parsing.combinator.RegexParsers
trait AST
case class LetterSeq(value: String) extends AST
case class IntLiteral(value: String) extends AST
case class Repeater(count: AST, content: List[AST]) extends AST
class ExprParser extends RegexParsers {
  def intLiteral: Parser[AST] = "[0-9]+".r ^^ IntLiteral
  def letterSeq: Parser[AST] = "[a-f]+".r ^^ LetterSeq
  def term: Parser[AST] = letterSeq | repeater
  def expr: Parser[List[AST]] = rep1(term)
  def repeater: Parser[AST] = intLiteral ~> "[" ~> expr <~ "]" ^^ {
    case intLiteral ~ expr => Repeater(intLiteral, expr)
  }
}
The message I get:
<console>:25: error: constructor cannot be instantiated to expected type;
found : ExprParser.this.~[a,b]
required: List[AST]
case intLiteral ~ expr => Repeater(intLiteral, expr)
Any ideas?
Later Edit: After making the change suggested by @sepp2k I still get the same error. The change being:
def repeater: Parser[AST] = intLiteral ~ "[" ~> expr <~ "]" ^^ {
The error message is telling you that you're pattern matching a list against the ~ constructor, which isn't allowed. In order to use ~ in your pattern, you need to have used ~ in the parser.
It looks like in this case the problem is simply that you discarded the value of intLiteral using ~> when you did not mean to. If you use ~ instead of ~> here and add parentheses [1], that should fix your problem.
[1] The parentheses are required so that the following ~> only throws away the bracket instead of the result of intLiteral ~ "[". intLiteral ~ "[" ~> expr <~ "]" is parsed as (intLiteral ~ "[") ~> expr <~ "]", which still throws away the intLiteral. You want intLiteral ~ ("[" ~> expr <~ "]"), which only throws away the [ and ].
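Putting the suggested grouping into the parser from the question gives something like the sketch below. It assumes the scala-parser-combinators module is available; the object name FixedExprParser is made up, and the mapping functions are written as explicit lambdas only to keep the example independent of how case class companions are treated:

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait AST
case class LetterSeq(value: String) extends AST
case class IntLiteral(value: String) extends AST
case class Repeater(count: AST, content: List[AST]) extends AST

// Hypothetical corrected version of ExprParser.
object FixedExprParser extends RegexParsers {
  def intLiteral: Parser[AST] = "[0-9]+".r ^^ (s => IntLiteral(s))
  def letterSeq: Parser[AST] = "[a-f]+".r ^^ (s => LetterSeq(s))
  def term: Parser[AST] = letterSeq | repeater
  def expr: Parser[List[AST]] = rep1(term)

  // ~ keeps the count; the parenthesized part drops only the brackets.
  def repeater: Parser[AST] =
    intLiteral ~ ("[" ~> expr <~ "]") ^^ { case n ~ body => Repeater(n, body) }
}
```

Calling FixedExprParser.parseAll(FixedExprParser.expr, "aaa22[bbb33[ccc]ddd]eee44[fff]") should then yield a list of four top-level terms, two of them Repeater nodes.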

Parsing in scala/Java

I am writing a Parser in scala and got stuck at this point:
private def expression : Parser[Expression] = cond | variable | integer | liste | function
private def cond : Parser[Expression] = "if" ~ predicate ~ "then" ~ expression ~ "else" ~ expression ^^ {case _~i~_~t~_~el => Cond(i,t,el)}
private def predicate: Parser[Predicate] = identifier ~ "?" ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~_~el~_ => Predicate(n,el)}
private def function: Parser[Expression] = identifier ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~el~_ => Function(n,el)}
private def liste: Parser[Expression] = "[" ~ repsep(expression, ",") ~ "]" ^^ {case _~ls~_ => Liste(ls)}
private def variable: Parser[Expression] = identifier ^^ {case v => Variable(v)}
def identifier: Parser[String] = """[a-zA-Z0-9]+""".r ^^ { _.toString }
def integer: Parser[Integer] = num ^^ { case i => Integer(i)}
def num: Parser[String] = """(-?\d*)""".r ^^ {_.toString}
My problem is that when it comes to an "expression", the parser does not always take the right branch. For example, given funk(x,y) it tries to parse it as a variable and not as a function.
Any idea?
Change the order of the parsers in your expression parser: put function before variable and after cond. In general, when you compose parsers using the alternative A | B, parser A shouldn't be able to successfully parse a prefix of an input that parser B is meant to parse, because B is only tried when A fails.
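The effect of the ordering can be seen in a reduced sketch with only two alternatives. It assumes the scala-parser-combinators module is available; the names Expr, Call, OrderedAlternatives and wrongOrder are invented for the illustration and do not come from the question's code:

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait Expr
case class Variable(name: String) extends Expr
case class Call(name: String, args: List[Expr]) extends Expr

// Hypothetical reduced grammar: only variables and function calls.
object OrderedAlternatives extends RegexParsers {
  def identifier: Parser[String] = "[a-zA-Z0-9]+".r
  def variable: Parser[Expr] = identifier ^^ (n => Variable(n))
  def call: Parser[Expr] =
    identifier ~ ("(" ~> repsep(expression, ",") <~ ")") ^^ {
      case n ~ args => Call(n, args)
    }

  // The longer alternative comes first, so "funk(x,y)" is tried as a call.
  def expression: Parser[Expr] = call | variable

  // Here variable succeeds on "funk" and call is never tried.
  def wrongOrder: Parser[Expr] = variable | call
}
```

With parseAll, wrongOrder fails on "funk(x,y)" because variable consumes only "funk" and the remaining "(x,y)" is left over, while expression parses the whole input as a call.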

Type mismatch java.io.Serializable and GenTraversableOnce

I'm trying to get a parser to take a sequence of semicolon-separated words and convert them into an array.
Here's an SSCCE.
import util.parsing.combinator._
class Element {
  def getSuper() = "TODO"
}
class Comp extends RegexParsers with PackratParsers {
  lazy val element: PackratParser[Element] = (
    "foo") ^^ {
      s => new Element
    }
  lazy val list: PackratParser[Array[Element]] = (
    (element ~ ";" ~ list) |
    element ~ ";") ^^
    {
      case a ~ ";" ~ b => Array(a) ++ b
      case a ~ ";" => Array(a)
    }
}
object Compiler extends Comp {
  def main(args: Array[String]) = {
    println(parseAll(list, "foo; foo; foo;"))
  }
}
It doesn't compile. This is the error message I'm getting. Is there a way to convert from Serializable to GenTraversableOnce?
~/Documents/Git/Workspace/Uncool/Scales$ scalac stov.scala
stov.scala:19: error: type mismatch;
found : java.io.Serializable
required: scala.collection.GenTraversableOnce[?]
case a ~ ";" ~ b => Array(a) ++ b
^
one error found
My suspicion falls on the | combinator.
The type of (element ~ ";" ~ list) is ~[~[Element, String], Array[Element]] and the type of element ~ ";" is ~[Element, String].
When the | combinator is applied to a Parser[T], it returns a Parser[U] where U is a supertype of T ([U >: T]).
Here T is ~[~[Element, String], Array[Element]] and the other alternative has type ~[Element, String], so U is inferred as the most specific common supertype of the two.
For the right-hand components, the most specific common supertype of Array[Element] and String is Serializable.
For the left-hand components, ~[Element, String] and Element, it is Object. That's why the result type of | is ~[Object, Serializable].
So when applying the map operation, you need to provide a function ~[Object, Serializable] => U where U is Array[Element] in your case, since the declared type of list is PackratParser[Array[Element]].
Now the only possible match is:
case obj ~ ser => //do what you want
Now you see that the pattern you're trying to match in your map is fundamentally wrong. Even if you return an empty array (just so that it compiles), you'll see that it leads to a match error at runtime.
That said, what I suggest is first to map separately each combinator:
lazy val list: PackratParser[Array[Element]] =
(element ~ ";" ~ list) ^^ {case a ~ ";" ~ b => Array(a) ++ b} |
(element ~ ";") ^^ {case a ~ ";" => Array(a)}
But the pattern you are looking for is already implemented using the rep combinator (you could also take a look at repsep but you'd need to handle the last ; separately):
lazy val list: PackratParser[Array[Element]] = rep(element <~ ";") ^^ (_.toArray)
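Both suggestions can be tried side by side in a small sketch. It assumes the scala-parser-combinators module is available, uses plain RegexParsers and List instead of PackratParsers and Array to keep it short, and the names ListParser, listRec and listRep are made up:

```scala
import scala.util.parsing.combinator._

class Element

// Hypothetical demo of the two approaches: map each alternative, or use rep.
object ListParser extends RegexParsers {
  def element: Parser[Element] = "foo" ^^ (_ => new Element)

  // Each alternative is mapped on its own, so both sides of | already
  // have type Parser[List[Element]] and no common supertype is inferred.
  def listRec: Parser[List[Element]] =
    (element ~ (";" ~> listRec)) ^^ { case a ~ rest => a :: rest } |
    (element <~ ";") ^^ (a => List(a))

  // The same language (plus the empty input) using the built-in rep combinator.
  def listRep: Parser[List[Element]] = rep(element <~ ";")
}
```

Both parsers accept "foo; foo; foo;" and produce three elements; they differ only on the empty input, which listRep accepts and listRec rejects.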

Combinator Definition gives an error, unable to understand why

I am trying to write a simple parser to be able to generate a DDL for an RDBMS, but got stuck in defining the combinator.
import scala.util.parsing.combinator._
object DocumentParser extends RegexParsers {
override protected val whiteSpace = """(\s|//.*)+""".r //To include comments in what is regarded as white space, to be ignored
case class DocumentAttribute(attributeName : String, attributeType : String)
case class Document(documentName : String, documentAttributeList : List[DocumentAttribute])
def document : Parser[Document]= "document" ~> documentName <~ "{" ~> attributeList <~ "}" ^^ {case n ~ l => Document(n, l)} //Here is where I get an error
def documentName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
def attributeList : Parser[List[DocumentAttribute]] = repsep(attribute, ",")
def attribute : Parser[DocumentAttribute] = attributeName ~ attributeType ^^ {case n ~ t => DocumentAttribute(n, t)}
def attributeName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
def attributeType : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
}
It seems that I have defined it correctly. Is there something obvious I am missing or something fundamental about combinators I don't understand? Thanks!
You have to use the following code for document:
def document : Parser[Document]= "document" ~> documentName ~ ("{" ~> attributeList <~ "}") ^^ {case n ~ l => Document(n, l)}
Note the ~ after documentName and the parentheses around "{" ~> attributeList <~ "}". Otherwise, all those <~ and ~> will discard everything except documentName.
Basically, without any parentheses, the result is that everything to the right of the leftmost <~ is discarded, and then everything to the left of the rightmost ~> still remaining is discarded. For example:
def foo: Parser[String ~ String] = "a" ~> "b" ~> "c" ~ "d" <~ "e" ~> "f" <~ "g"
                                   |<-dropped->|           |<---- dropped --->|
With this change your code works:
scala> DocumentParser.document(new CharSequenceReader(
""" document foo {bar baz, // comment
| qaz wsx}""".stripMargin))
res4: DocumentParser.ParseResult[DocumentParser.Document] = [2.10] parsed: Document(foo,List(DocumentAttribute(bar,baz), DocumentAttribute(qaz,wsx)))
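The discard rule illustrated by the foo example can also be checked directly. The sketch below assumes the scala-parser-combinators module is available; DiscardDemo is a made-up object name, and the result type is ascribed as a pair because only "c" and "d" survive:

```scala
import scala.util.parsing.combinator._

// Hypothetical demo of which results survive an unparenthesized chain.
object DiscardDemo extends RegexParsers {
  // Groups as ((("a" ~> "b") ~> "c") ~ "d") <~ ("e" ~> "f") <~ "g",
  // so only the results of "c" and "d" are kept.
  def foo: Parser[String ~ String] =
    "a" ~> "b" ~> "c" ~ "d" <~ "e" ~> "f" <~ "g"
}
```

Every literal still has to be present in the input; the combinators only decide which results are kept, not which input is required.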