Parsing in scala/Java - scala

I am writing a Parser in scala and got stuck at this point:
private def expression : Parser[Expression] = cond | variable | integer | liste | function
private def cond : Parser[Expression] = "if" ~ predicate ~ "then" ~ expression ~ "else" ~ expression ^^ {case _~i~_~t~_~el => Cond(i,t,el)}
private def predicate: Parser[Predicate] = identifier ~ "?" ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~_~el~_ => Predicate(n,el)}
private def function: Parser[Expression] = identifier ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~el~_ => Function(n,el)}
private def liste: Parser[Expression] = "[" ~ repsep(expression, ",") ~ "]" ^^ {case _~ls~_ => Liste(ls)}
private def variable: Parser[Expression] = identifier ^^ {case v => Variable(v)}
def identifier: Parser[String] = """[a-zA-Z0-9]+""".r ^^ { _.toString }
def integer: Parser[Integer] = num ^^ { case i => Integer(i)}
def num: Parser[String] = """(-?\d*)""".r ^^ {_.toString}
My problem is that when it comes to an "expression" the Parser does not always takes the right way. Like if its funk(x,y) it tries to parse it like a variable ant not like a function.
Any idea?

Change order of parsers in your expression parser - put function before variable and after cond. In general, when you compose parsers using alternative A | B, then parser A shouldn't be able to parse input that is prefix of input parsable by parser B.

Related

Scala Parser Combinator

I am trying to write a Scala Parser combinator for the following input.
The input can be
10
(10)
((10)))
(((10)))
Here the number of brackets can keep on growing. but they should always match. So parsing should fail for ((((10)))
The result of parsing should always be the number at the center
I wrote the following parser
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def n = "(" ~ i ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
def expr = i | n
}
val parser = new MyParser
parser.parseAll(parser.expr, "10")
parser.parseAll(parser.expr, "(10)")
but now how do I handle the case where the number of brackets keep growing but matched?
Easy, just make the parser recursive:
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def expr: Parser[Int] = i | "(" ~ expr ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
}
(but note that scala-parser-combinators has trouble with left-recursive definitions: Recursive definitions with scala-parser-combinators)

Using keep-left/right combinator is not working with result converter

I have a combinator and a result converter that looks like so:
// parses a line like so:
//
// 2
// 00:00:01.610 --> 00:00:02.620 align:start position:0%
//
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time ~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ _ ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
Because the arrow, textline and eol are not important to my result converter, I was hoping I could use <~ and ~> in the right places within my combinator such that my converter doesn't have to deal with them. As an experiment, I changed the first ~ in the parser to <~ and removed the ~ _ where the "arrow" would be matched in the case statement like so:
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time <~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
However, I get red-squigglies in IntelliJ with the error message:
Error:(44, 31) constructor cannot be instantiated to expected type;
found : caption.vttdissector.VttParsers.~[a,b] required: Int
startTime ~ endTime ~ _ ~ _
What am I doing wrong?
Since you didn't insert any parentheses in the chain of ~ and <~, most matched subexpressions are thrown out "with the bathwater" (or rather "with the whitespace and arrows"). Just insert some parentheses.
Here is the general pattern what it should look like:
(irrelevant ~> irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
(irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
...
i.e. every "relevant" subexpression is surrounded by irrelevant stuff and a pair of parentheses, and then the parenthesized subexpressions are connected by ~'s.
Your example:
import scala.util.parsing.combinator._
import scala.util.{Either, Left, Right}
case class SubtitleBlock(startTime: String, endTime: String, text: List[String])
object YourParser extends RegexParsers {
def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber.? ~> time <~ arrow) ~
time ~
(opt(textLine) <~ eol)
} ^^ {
case startTime ~ endTime ~ _ => SubtitleBlock(startTime, endTime, Nil)
}
override val whiteSpace = "[ \t]+".r
def arrow: Parser[String] = "-->".r
def subtitleNumber: Parser[String] = "\\d+".r
def time: Parser[String] = "\\d{2}:\\d{2}:\\d{2}.\\d{3}".r
def textLine: Parser[String] = ".*".r
def eol: Parser[String] = "\n".r
def parseStuff(s: String): scala.util.Either[String, SubtitleBlock] =
parseAll(subtitleHeader, s) match {
case Success(t, _) => scala.util.Right(t)
case f => scala.util.Left(f.toString)
}
def main(args: Array[String]): Unit = {
val examples: List[String] = List(
"2 00:00:01.610 --> 00:00:02.620 align:start position:0%\n"
) ++ args.map(_ + "\n")
for (x <- examples) {
println(parseStuff(x))
}
}
}
finds:
Right(SubtitleBlock(00:00:01.610,00:00:02.620,List()))

Scala warning match may not be exhaustive while parsing

class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}
I get the following warning when compiling
warning: match may not be exhaustive.
It would fail on the following input: ~((x: String forSome x not in ("+", "-")), _)
("+" | "-") ~ term ^^ {
^
one warning found
I heard that #unchecked annotation can help. But in this case where should I put it?
The issue here is that with ("+" | "-") you are creating a parser that accepts only two possible strings. However when you map on the resulting parser to extract the value, the result you're going to extract will just be String.
In your pattern matching you only have cases for the strings "+" and "-", but the compiler has no way of knowing that those are the only possible strings that will show up, so it's telling you here that your match may not be exhaustive since it can't know any better.
You could use an unchecked annotation to suppress the warning, but there are much better, more idiomatic ways, to eliminate the issue. One way to solve this is to replace those strings with some kind of structured type as soon as possible. For example, create an ADT
sealed trait Operation
case object Plus extends Operation
case object Minus extends Operation
//then in your parser
("+" ^^^ Plus | "-" ^^^ Minus) ~ term ^^ {
case PLus ~ t => t
case Minus ~ t => -t
}
Now it should be able to realize that the only possible cases are Plus and Minus
Add a case to remove the warning
class ExprParser extends RegexParsers {
val number = "[0-9]+".r
def expr: Parser[Int] = term ~ rep(
("+" | "-") ~ term ^^ {
case "+" ~ t => t
case "-" ~ t => -t
case _ ~ t => t
}) ^^ { case t ~ r => t + r.sum }
def term: Parser[Int] = factor ~ (("*" ~ factor)*) ^^ {
case f ~ r => f * r.map(_._2).product
}
def factor: Parser[Int] = number ^^ { _.toInt } | "(" ~> expr <~ ")"
}

Combinator Definition gives an error, unable to understand why

I am trying to write a simple parser to be able to generate a DDL for an RDBMS, but got stuck in defining the combinator.
import scala.util.parsing.combinator._
object DocumentParser extends RegexParsers {
override protected val whiteSpace = """(\s|//.*)+""".r //To include comments in what is regarded as white space, to be ignored
case class DocumentAttribute(attributeName : String, attributeType : String)
case class Document(documentName : String, documentAttributeList : List[DocumentAttribute])
def document : Parser[Document]= "document" ~> documentName <~ "{" ~> attributeList <~ "}" ^^ {case n ~ l => Document(n, l)} //Here is where I get an error
def documentName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
def attributeList : Parser[List[DocumentAttribute]] = repsep(attribute, ",")
def attribute : Parser[DocumentAttribute] = attributeName ~ attributeType ^^ {case n ~ t => DocumentAttribute(n, t)}
def attributeName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
def attributeType : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
}
It seems that I have defined it correctly. Is there something obvious I am missing or something fundamental about combinators I don't understand? Thanks!
You have to use the following code for document:
def document : Parser[Document]= "document" ~> documentName ~ ("{" ~> attributeList <~ "}") ^^ {case n ~ l => Document(n, l)}
Note the ~ after documentName and brackets around "{" ~> attributeList <~ "}". Otherwise, all those <~ and ~> will discard everything except attributeList.
Basically, without any parentheses, the result is that everything to the right of the leftmost <~ is discarded, and then everything to the left of the rightmost ~> still remaining is discarded. For example:
def foo: Parser[String] = "a" ~> "b" ~> "c" ~ "d" <~ "e" ~> "f" <~ "g"
|<-discarded->| | <- discarded -> |
With this change your code works:
scala> DocumentParser.document(new CharSequenceReader(
""" document foo {bar baz, // comment
| qaz wsx}""".stripMargin))
res4: DocumentParser.ParseResult[DocumentParser.Document] = [2.10] parsed: Document(foo,List(DocumentAttribute(bar,baz), DocumentAttribute(qaz,wsx)))

How to allow optional outermost parenthesis?

I am writing a parser for certain expressions. I want to allow parentheses to be optional at the outermost level. My current parser looks like this:
class MyParser extends JavaTokenParsers {
def expr = andExpr | orExpr | term
def andExpr = "(" ~> expr ~ "and" ~ expr <~ ")"
def orExpr = "(" ~> expr ~ "or" ~ expr <~ ")"
def term = """[a-z]""".r
}
As it is, this parser accepts only fully parenthesized expressions, such as:
val s1 = "(a and b)"
val s2 = "((a and b) or c)"
val s3 = "((a and b) or (c and d))"
My question is, is there any modification I can make to this parser in order for the outermost parenthesis to be optional? I would like to accept the string:
val s4 = "(a and b) or (c and d)"
Thanks!
class MyParser extends JavaTokenParsers {
// the complete parser can be either a parenthesisless "andExpr" or parenthesisless
// "orExpr " or "expr"
def complete = andExpr | orExpr | expr
def expr = parenthesis | term
// moved the parenthesis from the andExpr and orExpr so I dont have to create
// an extra parenthesisless andExpr and orExpr
def parenthesis = "(" ~> (andExpr | orExpr) <~ ")"
def andExpr = expr ~ "and" ~ expr
def orExpr = expr ~ "or" ~ expr
def term = """[a-z]""".r
}