How to allow optional outermost parenthesis? - scala

I am writing a parser for certain expressions. I want to allow parentheses to be optional at the outermost level. My current parser looks like this:
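import scala.util.parsing.combinator.JavaTokenParsers

class MyParser extends JavaTokenParsers {
  // Mutually recursive parsers need explicit result types to compile.
  def expr: Parser[Any] = andExpr | orExpr | term
  def andExpr: Parser[Any] = "(" ~> expr ~ "and" ~ expr <~ ")"
  def orExpr: Parser[Any] = "(" ~> expr ~ "or" ~ expr <~ ")"
  def term: Parser[Any] = """[a-z]""".r
}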
As it is, this parser accepts only fully parenthesized expressions, such as:
val s1 = "(a and b)"
val s2 = "((a and b) or c)"
val s3 = "((a and b) or (c and d))"
My question is: is there any modification I can make to this parser so that the outermost parentheses are optional? I would like to accept the string:
val s4 = "(a and b) or (c and d)"
Thanks!

import scala.util.parsing.combinator.JavaTokenParsers

class MyParser extends JavaTokenParsers {
  // The complete parser is either an "andExpr" or an "orExpr" without outer
  // parentheses, or a plain "expr". Use it as the entry point.
  def complete: Parser[Any] = andExpr | orExpr | expr
  def expr: Parser[Any] = parenthesis | term
  // The parentheses are moved out of andExpr and orExpr, so no separate
  // parenthesis-free variants of those two are needed.
  def parenthesis: Parser[Any] = "(" ~> (andExpr | orExpr) <~ ")"
  def andExpr: Parser[Any] = expr ~ "and" ~ expr
  def orExpr: Parser[Any] = expr ~ "or" ~ expr
  def term: Parser[Any] = """[a-z]""".r
}
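If you want a usable tree rather than nested ~ values, the same grammar can build a small AST. This is only a sketch: the names BoolExpr, Term, And, Or and BoolParser are made up for illustration, and it assumes the scala-parser-combinators library is on the classpath.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical AST types, not part of the original question.
sealed trait BoolExpr
case class Term(name: String) extends BoolExpr
case class And(left: BoolExpr, right: BoolExpr) extends BoolExpr
case class Or(left: BoolExpr, right: BoolExpr) extends BoolExpr

object BoolParser extends JavaTokenParsers {
  // Entry point: outermost parentheses are optional here only.
  def complete: Parser[BoolExpr] = andExpr | orExpr | expr
  def expr: Parser[BoolExpr] = parenthesis | term
  def parenthesis: Parser[BoolExpr] = "(" ~> (andExpr | orExpr) <~ ")"
  // "and" / "or" are matched and dropped; only the operands are kept.
  def andExpr: Parser[BoolExpr] = expr ~ ("and" ~> expr) ^^ { case l ~ r => And(l, r) }
  def orExpr: Parser[BoolExpr] = expr ~ ("or" ~> expr) ^^ { case l ~ r => Or(l, r) }
  def term: Parser[BoolExpr] = """[a-z]""".r ^^ (s => Term(s))
}
```

Calling BoolParser.parseAll(BoolParser.complete, "(a and b) or (c and d)") should succeed, while an input with unparenthesized nesting such as "a and b or c" is still rejected, because only the single outermost pair of parentheses is optional.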

Related

Scala Parser Combinator

I am trying to write a Scala Parser combinator for the following input.
The input can be
10
(10)
((10))
(((10)))
Here the number of brackets can keep growing, but they should always match, so parsing should fail for ((((10)))
The result of parsing should always be the number at the center
I wrote the following parser
import scala.util.parsing.combinator._
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def n = "(" ~ i ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
def expr = i | n
}
val parser = new MyParser
parser.parseAll(parser.expr, "10")
parser.parseAll(parser.expr, "(10)")
But how do I handle the case where the number of brackets keeps growing while staying matched?
Easy, just make the parser recursive:
class MyParser extends RegexParsers {
def i = "[0-9]+".r ^^ (_.toInt)
def expr: Parser[Int] = i | "(" ~ expr ~ ")" ^^ {case _ ~ b ~ _ => b.toInt}
}
(But note that scala-parser-combinators has trouble with left-recursive definitions; see the question "Recursive definitions with scala-parser-combinators".)
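To check that unbalanced input really is rejected, a variant of the recursive parser can be run through parseAll. This is a sketch that assumes scala-parser-combinators is on the classpath; the object name BracketParser is made up, and it uses ~> and <~ to drop the brackets instead of a pattern match.

```scala
import scala.util.parsing.combinator.RegexParsers

object BracketParser extends RegexParsers {
  def number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)
  // Either a bare number, or an expr wrapped in one more pair of brackets.
  def expr: Parser[Int] = number | "(" ~> expr <~ ")"
}

// parseAll requires the whole input to be consumed, so a stray bracket fails.
val balanced = BracketParser.parseAll(BracketParser.expr, "(((10)))")
val unbalanced = BracketParser.parseAll(BracketParser.expr, "((((10)))")
```

Here balanced.successful should be true with balanced.get equal to 10, and unbalanced.successful should be false.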

scala.util.parsing.combinator.RegexParsers constructor cannot be instantiated to expected type

I want to be able to parse strings like the one below with Scala parser combinators.
aaa22[bbb33[ccc]ddd]eee44[fff]
Before every open square bracket an integer literal is guaranteed to exist.
The code I have so far:
import scala.util.parsing.combinator.RegexParsers
trait AST
case class LetterSeq(value: String) extends AST
case class IntLiteral(value: String) extends AST
case class Repeater(count: AST, content: List[AST]) extends AST
class ExprParser extends RegexParsers {
def intLiteral: Parser[AST] = "[0-9]+".r ^^ IntLiteral
def letterSeq: Parser[AST] = "[a-f]+".r ^^ LetterSeq
def term: Parser[AST] = letterSeq | repeater
def expr: Parser[List[AST]] = rep1(term)
def repeater: Parser[AST] = intLiteral ~> "[" ~> expr <~ "]" ^^ {
case intLiteral ~ expr => Repeater(intLiteral, expr)
}
}
The message I get:
<console>:25: error: constructor cannot be instantiated to expected type;
found : ExprParser.this.~[a,b]
required: List[AST]
case intLiteral ~ expr => Repeater(intLiteral, expr)
Any ideas?
Later edit: after making the change suggested by @sepp2k, I still get the same error. The change being:
def repeater: Parser[AST] = intLiteral ~ "[" ~> expr <~ "]" ^^ {
The error message is telling you that you're pattern matching a list against the ~ constructor, which isn't allowed. In order to use ~ in your pattern, you need to have used ~ in the parser.
It looks like in this case the problem is simply that you discarded the value of intLiteral by using ~> when you did not mean to. If you use ~ instead of ~> here and add parentheses [1], that should fix your problem.
[1] The parentheses are required so that the following ~> only throws away the bracket instead of the result of intLiteral ~ "[". Without them, intLiteral ~ "[" ~> expr <~ "]" is parsed as (intLiteral ~ "[") ~> expr <~ "]", which still throws away the intLiteral. You want intLiteral ~ ("[" ~> expr <~ "]"), which only throws away the [ and ].
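Putting the fix together, the corrected parser might look like the sketch below. It assumes scala-parser-combinators is on the classpath; ExprParser is written as an object rather than a class purely for convenience, and the constructors are applied through explicit lambdas.

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait AST
case class LetterSeq(value: String) extends AST
case class IntLiteral(value: String) extends AST
case class Repeater(count: AST, content: List[AST]) extends AST

object ExprParser extends RegexParsers {
  def intLiteral: Parser[AST] = "[0-9]+".r ^^ (s => IntLiteral(s))
  def letterSeq: Parser[AST] = "[a-f]+".r ^^ (s => LetterSeq(s))
  def term: Parser[AST] = letterSeq | repeater
  def expr: Parser[List[AST]] = rep1(term)
  // ~ keeps the count; the parenthesized part drops only "[" and "]".
  def repeater: Parser[AST] = intLiteral ~ ("[" ~> expr <~ "]") ^^ {
    case count ~ content => Repeater(count, content)
  }
}

val parsed = ExprParser.parseAll(ExprParser.expr, "aaa22[bbb33[ccc]ddd]eee44[fff]")
```

For the sample string, parsed should be a success holding four top-level elements: two letter sequences and two repeaters, the first of which contains a nested repeater.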

Parsing in scala/Java

I am writing a Parser in scala and got stuck at this point:
private def expression : Parser[Expression] = cond | variable | integer | liste | function
private def cond : Parser[Expression] = "if" ~ predicate ~ "then" ~ expression ~ "else" ~ expression ^^ {case _~i~_~t~_~el => Cond(i,t,el)}
private def predicate: Parser[Predicate] = identifier ~ "?" ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~_~el~_ => Predicate(n,el)}
private def function: Parser[Expression] = identifier ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~el~_ => Function(n,el)}
private def liste: Parser[Expression] = "[" ~ repsep(expression, ",") ~ "]" ^^ {case _~ls~_ => Liste(ls)}
private def variable: Parser[Expression] = identifier ^^ {case v => Variable(v)}
def identifier: Parser[String] = """[a-zA-Z0-9]+""".r ^^ { _.toString }
def integer: Parser[Integer] = num ^^ { case i => Integer(i)}
def num: Parser[String] = """(-?\d*)""".r ^^ {_.toString}
My problem is that when it comes to an "expression", the parser does not always take the right path. For example, given funk(x,y), it tries to parse it as a variable and not as a function.
Any idea?
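Change the order of the alternatives in your expression parser: put function before variable and after cond. The | combinator returns the result of the first alternative that succeeds and does not try the remaining ones, so variable matches funk and function is never attempted. In general, when you compose parsers as A | B, A should not be able to succeed on a prefix of an input that B is meant to parse.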
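The effect of the ordering can be seen in a stripped-down version of the grammar. This is a sketch, not the asker's full parser: the names OrderDemo, Call, wrongOrder and rightOrder are made up, arguments are restricted to plain variables to keep it short, and scala-parser-combinators is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

object OrderDemo extends RegexParsers {
  sealed trait Expr
  case class Variable(name: String) extends Expr
  case class Call(name: String, args: List[Expr]) extends Expr

  def identifier: Parser[String] = """[a-zA-Z0-9]+""".r
  def variable: Parser[Expr] = identifier ^^ (n => Variable(n))
  def function: Parser[Expr] =
    identifier ~ ("(" ~> repsep(variable, ",") <~ ")") ^^ { case n ~ as => Call(n, as) }

  // variable succeeds on "funk", so function is never tried and "(x,y)" is left over.
  def wrongOrder: Parser[Expr] = variable | function
  // function is tried first; variable is the fallback when there is no "(".
  def rightOrder: Parser[Expr] = function | variable
}
```

With parseAll, wrongOrder should fail on "funk(x,y)" because input remains after the variable, while rightOrder should produce a Call with two Variable arguments.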

Combinator Definition gives an error, unable to understand why

I am trying to write a simple parser to be able to generate a DDL for an RDBMS, but got stuck in defining the combinator.
import scala.util.parsing.combinator._
object DocumentParser extends RegexParsers {
override protected val whiteSpace = """(\s|//.*)+""".r //To include comments in what is regarded as white space, to be ignored
case class DocumentAttribute(attributeName : String, attributeType : String)
case class Document(documentName : String, documentAttributeList : List[DocumentAttribute])
def document : Parser[Document]= "document" ~> documentName <~ "{" ~> attributeList <~ "}" ^^ {case n ~ l => Document(n, l)} //Here is where I get an error
def documentName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
def attributeList : Parser[List[DocumentAttribute]] = repsep(attribute, ",")
def attribute : Parser[DocumentAttribute] = attributeName ~ attributeType ^^ {case n ~ t => DocumentAttribute(n, t)}
def attributeName : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
def attributeType : Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r ^^ {_.toString}
}
It seems that I have defined it correctly. Is there something obvious I am missing or something fundamental about combinators I don't understand? Thanks!
You have to use the following code for document:
def document : Parser[Document]= "document" ~> documentName ~ ("{" ~> attributeList <~ "}") ^^ {case n ~ l => Document(n, l)}
Note the ~ after documentName and the parentheses around "{" ~> attributeList <~ "}". Otherwise, all those <~ and ~> will discard everything except documentName, so the parser yields a plain String and the n ~ l pattern cannot match it.
Basically, without any parentheses, the result is that everything to the right of the leftmost <~ is discarded, and then everything to the left of the rightmost ~> still remaining is discarded. For example:
def foo: Parser[String ~ String] = "a" ~> "b" ~> "c" ~ "d" <~ "e" ~> "f" <~ "g"
                                   |-discarded-|           |--- discarded ----|
With this change your code works:
scala> DocumentParser.document(new CharSequenceReader(
""" document foo {bar baz, // comment
| qaz wsx}""".stripMargin))
res4: DocumentParser.ParseResult[DocumentParser.Document] = [2.10] parsed: Document(foo,List(DocumentAttribute(bar,baz), DocumentAttribute(qaz,wsx)))
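The grouping rule can be checked directly. Scala derives operator precedence from the first character of the operator, so ~> (which starts with ~) binds tighter than <~ (which starts with <). The sketch below uses made-up parser names and assumes scala-parser-combinators is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

object Precedence extends RegexParsers {
  // Without parentheses this groups as "a" <~ ("b" ~> "c"), so only "a" is kept.
  def implicitGrouping: Parser[String] = "a" <~ "b" ~> "c"
  def explicitGrouping: Parser[String] = "a" <~ ("b" ~> "c")
  // Using ~ for the outer step keeps both sides; "b" is still dropped.
  def keepBoth: Parser[(String, String)] =
    "a" ~ ("b" ~> "c") ^^ { case x ~ y => (x, y) }
}
```

On the input "a b c", the first two parsers should both yield "a", and keepBoth should yield the pair ("a", "c").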

Transforming Parser[Any] to a Stricter Type

Chapter 33 of Programming in Scala explains combinator parsing and provides this example:
import scala.util.parsing.combinator._
class Arith extends JavaTokenParsers {
def expr: Parser[Any] = term~rep("+"~term | "-"~term)
def term: Parser[Any] = factor~rep("*"~factor | "/"~factor)
def factor: Parser[Any] = floatingPointNumber | "("~expr~")"
}
How can I map expr to a narrower type than Parser[Any]? In other words,
I'd like to take def expr: Parser[Any] and map that via ^^ into a stricter type.
Note - I asked this question in Scala Google Groups - https://groups.google.com/forum/#!forum/scala-user, but haven't received a complete answer that helped me out.
As already stated in the comments, you can narrow the type down to anything you like: the result type of the function you pass to ^^ becomes the parser's result type.
Here is a complete example with a data structure from your given code.
object Arith extends JavaTokenParsers {
trait Expression //The data structure
case class FNumber(value: Float) extends Expression
case class Plus(e1: Expression, e2: Expression) extends Expression
case class Minus(e1: Expression, e2: Expression) extends Expression
case class Mult(e1: Expression, e2: Expression) extends Expression
case class Div(e1: Expression, e2: Expression) extends Expression
def expr: Parser[Expression] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
case term ~ rest => rest.foldLeft(term)((result, elem) => elem match {
case "+" ~ e => Plus(result, e)
case "-" ~ e => Minus(result, e)
})
}
def term: Parser[Expression] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case factor ~ rest => rest.foldLeft(factor)((result, elem) => elem match {
case "*" ~ e => Mult(result, e)
case "/" ~ e => Div(result, e)
})
}
def factor: Parser[Expression] = floatingPointNumber ^^ (f => FNumber(f.toFloat)) | "(" ~> expr <~ ")"
def parseInput(input: String): Expression = parse(expr, input) match {
case Success(ex, _) => ex
case _ => throw new IllegalArgumentException //or change the result to Try[Expression]
}
}
Now we can start to parse something.
Arith.parseInput("(1.3 + 2.0) * 2")
//yields: Mult(Plus(FNumber(1.3),FNumber(2.0)),FNumber(2.0))
Of course you can also have a Parser[String] or a Parser[Float], where you directly transform or evaluate the input String. As I said, it is up to you.
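As an illustration of evaluating directly, the same grammar can return a number instead of a tree. This is a sketch under a few assumptions: it uses Double rather than Float, the names ArithEval and eval are made up, it uses parseAll so that trailing input is rejected, and scala-parser-combinators must be on the classpath.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object ArithEval extends JavaTokenParsers {
  // Each parser yields the value of what it parsed instead of an AST node.
  def expr: Parser[Double] = term ~ rep("+" ~ term | "-" ~ term) ^^ {
    case first ~ rest => rest.foldLeft(first) {
      case (acc, "+" ~ t) => acc + t
      case (acc, _ ~ t)   => acc - t
    }
  }
  def term: Parser[Double] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
    case first ~ rest => rest.foldLeft(first) {
      case (acc, "*" ~ f) => acc * f
      case (acc, _ ~ f)   => acc / f
    }
  }
  def factor: Parser[Double] =
    floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"

  // Returns None when the input is not a complete, valid expression.
  def eval(input: String): Option[Double] = parseAll(expr, input) match {
    case Success(value, _) => Some(value)
    case _                 => None
  }
}
```

For example, ArithEval.eval("(1.5 + 2.5) * 2") should give Some(8.0), and an incomplete input such as "2 + " should give None.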