I am trying to parse a simple syntax like so
mod A do
end
where the body between do and end may or may not contain content. For some odd reason this combinator does not parse it correctly:
def module: Parser[Any] = "mod" ~> moduleIdent ~ "do" ~ opt(rep(_)) ~ "end"
where _ is the repetition of an optional function definition parser.
I am getting the following error.
[2.1] failure: 'do' expected but 'e' found
end
moduleIdent code:
def moduleIdent: Parser[String] = "[A-Z]+".r ~ opt(ident) ^^ {
case first ~ optAll => optAll match {
case Some(all) => first ++ all
case None => first
}
}
running the parsing code:
parseAll(module, Source.fromFile(source).mkString)
where source is the given code that is failing.
Sometimes I am also getting "end of input" error. I am not sure what I am doing wrong. I am using the JavaTokenParsers class.
Thank you for your help!
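A likely cause, judging from the error position: in JavaTokenParsers, ident skips leading whitespace and accepts any Java identifier, so the opt(ident) inside moduleIdent consumes the keyword do (the module name becomes "Ado"). The parser then looks for "do" at the start of line 2 and finds end. Below is a minimal sketch of one way around this, assuming a module name is capital letters followed by optional word characters. funDef is a hypothetical stand-in for the function-definition parser written as _ in the question.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object ModuleParser extends JavaTokenParsers {
  // Assumption: the whole name is matched by ONE regex, so whitespace
  // skipping cannot pull the following keyword into the name.
  def moduleIdent: Parser[String] = """[A-Z]+\w*""".r

  // Hypothetical stand-in for the function-definition parser.
  def funDef: Parser[String] = "def" ~> ident

  // rep already succeeds on zero matches, so opt(rep(...)) is not needed.
  def module: Parser[String ~ List[String]] =
    "mod" ~> moduleIdent ~ ("do" ~> rep(funDef) <~ "end")
}
```

With this sketch, ModuleParser.parseAll(ModuleParser.module, "mod A do\nend") succeeds with the name "A" and an empty list of definitions.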
I am writing a parser for boolean expressions and am trying to parse input like "true and false":
def boolExpression: Parser[BoolExpression] = boolLiteral | andExpression
def andExpression: Parser[AndExpression] = (boolExpression ~ "and" ~ boolExpression) ^^ {
case b1 ~ "and" ~ b2 => AndExpression(b1, b2)
}
def boolLiteral: Parser[BoolLiteral] = ("true" | "false") ^^ {
s => BoolLiteral(java.lang.Boolean.valueOf(s))
}
The above code does not parse "true and false", since it reads only "true" and applies the rule boolLiteral immediately.
But if I change the rule boolExpression to this:
def boolExpression: Parser[BoolExpression] = andExpression | boolLiteral
then, when parsing "true and false", the code throws a StackOverflowError due to endless recursion:
java.lang.StackOverflowError
at Parser.boolExpression(NewParser.scala:58)
at Parser.andExpression(NewParser.scala:62)
at Parser.boolExpression(NewParser.scala:58)
at Parser.andExpression(NewParser.scala:62)
...
How to solve this?
This appears to be parsing a string of boolean constants separated by "and", which is best done using the chainl1 primitive of the parser library. It processes a chain of operations a op b op c op d, grouping them from left to right (left-associatively).
It might look something like this totally untested code:
trait BoolExpression
case class BoolLiteral(value: Boolean) extends BoolExpression
case class AndExpression(l: BoolExpression, r: BoolExpression) extends BoolExpression
def boolLiteral: Parser[BoolLiteral] =
("true" | "false") ^^ { s => BoolLiteral(java.lang.Boolean.valueOf(s)) }
def andExpression: Parser[(BoolExpression, BoolExpression) => BoolExpression] =
"and" ^^ { _ => (l: BoolExpression, r: BoolExpression) => AndExpression(l, r) }
def boolExpression: Parser[BoolExpression] =
chainl1(boolLiteral, andExpression) ^^ { expr => expr }
Presumably the requirement is more complex than this, but the chainl1 parser is a good starting point.
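For reference, here is a self-contained sketch of the same idea that can be pasted into a file. The enclosing object BoolParser and the name andOp are assumptions made for this example, and boolLiteral is typed as Parser[BoolExpression] so that the element type chainl1 needs is unambiguous.

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait BoolExpression
case class BoolLiteral(value: Boolean) extends BoolExpression
case class AndExpression(l: BoolExpression, r: BoolExpression) extends BoolExpression

object BoolParser extends RegexParsers {
  def boolLiteral: Parser[BoolExpression] =
    ("true" | "false") ^^ { s => BoolLiteral(s.toBoolean) }

  // Parses the operator and yields the function that combines two operands.
  def andOp: Parser[(BoolExpression, BoolExpression) => BoolExpression] =
    "and" ^^ { _ => (l: BoolExpression, r: BoolExpression) => AndExpression(l, r) }

  // chainl1 parses: boolLiteral (andOp boolLiteral)*, folding to the left.
  def boolExpression: Parser[BoolExpression] = chainl1(boolLiteral, andOp)
}
```

Parsing "true and false and true" with BoolParser.parseAll(BoolParser.boolExpression, ...) nests to the left: AndExpression(AndExpression(BoolLiteral(true), BoolLiteral(false)), BoolLiteral(true)). No parser calls itself before consuming input, so there is no endless recursion.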
The core of the issue is that the usual parser combinator libraries use parsing algorithms that don't support left recursion. One solution is to rewrite the grammar without left recursion, meaning that every parser needs to consume at least some input before invoking itself recursively. For instance, your grammar can be written like so:
def boolExpression: Parser[BoolExpression] = andExpression | boolLiteral
def andExpression: Parser[AndExpression] = (boolLiteral ~ "and" ~ boolExpression) ^^ {
case b1 ~ "and" ~ b2 => AndExpression(b1, b2)
}
But you can also use a parser combinator library that's based on another parsing algorithm that supports left recursion. I'm only aware of one for Scala:
https://github.com/djspiewak/gll-combinators
I don't know to what degree it is production ready.
//edit:
I think this one might also support left recursion. Again, I don't know the degree to which it is production ready.
https://github.com/djspiewak/parseback
I'm a beginner in Scala and am looking at this tutorial:
http://enear.github.io/2016/03/31/parser-combinators/
Even though it is explained just below the snippet:
The ^^ operator acts as a map over the parse result. The regex
"[a-zA-Z_][a-zA-Z0-9_]*".r is implicitly converted to an instance of
Parser[String], on which we map a function (String => IDENTIFIER),
thus returning a instance of Parser[IDENTIFIER].
I don't understand this code snippet:
def identifier: Parser[IDENTIFIER] = {
"[a-zA-Z_][a-zA-Z0-9_]*".r ^^ { str => IDENTIFIER(str) }
}
Can someone explain the ^^ operator and how it is mapped to a block of code?
Thanks
It defines the operation to perform on the result of the left-hand side parser: whenever that parser succeeds, the function on the right of ^^ is applied to what it parsed.
For instance, we have some code like this:
def symbol: Parser[Any] = "+" | "-" | "*"
def number: Parser[Int] = """(0|[1-9]\d*)""".r ^^ {string => string.toInt }
def expression = {
number ~ symbol ~ number ^^ { case firstOperand ~ operator ~ secondOperand =>
firstOperand + secondOperand
}
}
So, in number we convert the matched String to an Int. This means that when we use our parser like this:
parse(expression, "3 + 2")
it returns 5 (this simple example ignores the operator and always adds). Without the conversion the operands would remain Strings, and + would concatenate them into "32".
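To make the "map over the parse result" reading concrete, here is a small sketch built around the identifier parser from the question. The IDENTIFIER case class and the enclosing object are assumptions for illustration. In scala-parser-combinators, p ^^ f behaves like p.map(f), so the two definitions below should be interchangeable.

```scala
import scala.util.parsing.combinator.RegexParsers

// Assumed shape of the token type used in the tutorial.
case class IDENTIFIER(str: String)

object IdentifierParser extends RegexParsers {
  // The regex is implicitly converted to a Parser[String].
  def raw: Parser[String] = "[a-zA-Z_][a-zA-Z0-9_]*".r

  // The block after ^^ is an ordinary function String => IDENTIFIER.
  def identifier: Parser[IDENTIFIER] = raw ^^ { str => IDENTIFIER(str) }

  // The same parser written with map instead of ^^.
  def identifierViaMap: Parser[IDENTIFIER] = raw.map(str => IDENTIFIER(str))
}
```

The braces after ^^ are just Scala's syntax for passing a function literal as the single argument of a method, so { str => IDENTIFIER(str) } could also be written as (str => IDENTIFIER(str)).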
I am trying to understand how Try works in Scala (scala.util.Try, not try/catch). As an example, here I want to check whether a file exists and, if it does, use its contents later in the code, but it doesn't work:
val texte = Try(Source.fromFile(chemin_texte).getLines().filter(_!="").foldLeft(""){_+_})
texte match {
case Success(x) => x
case Failure(e) => println("An error occurred with the text file"); println("Error: " + e.getMessage)
}
/*phrases du texte*/
val phrases_txt = split_phrases(texte).map(phrase => phrase)
At val phrases_txt I want to use the contents of texte if the file exists; if not, the program should halt at Failure(e).
The error I get is: type mismatch; found: scala.util.Try[String] required: String
Any help? Thanks.
Think of Try as just a container for a computation that can fail. It is not comparable with a try and catch block because they just "throw" the exceptions, which are expected to be handled later on in the program. Scala Try forces you to ensure that a possible error is handled at all times from that point onwards in your program.
You can do something like this:
val texte = Try(Source.fromFile(chemin_texte).getLines().filter(_!="").foldLeft(""){_+_})
val phrases: Try[List[String]] = texte.map(split_phrases)
I don't see the point of .map(phrase => phrase), because it returns an equal collection. For a container T[A], map takes a function f: A => B, runs it on the contents of the container, and produces a container T[B]. Here texte.map(split_phrases) turns a Try[String] into a Try[List[String]]; if reading the file failed, the Failure is passed through unchanged.
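As a rough illustration of keeping the value inside the Try and unwrapping it only at the edge of the program, consider the sketch below. splitPhrases is a hypothetical stand-in for split_phrases from the question, and the by-name parameter read stands in for the file access, so the example runs without a file.

```scala
import scala.util.{Failure, Success, Try}

object TryDemo {
  // Hypothetical stand-in for split_phrases: split on '.' and drop blanks.
  def splitPhrases(text: String): List[String] =
    text.split('.').map(_.trim).filter(_.nonEmpty).toList

  // Try(...) evaluates `read` and captures any non-fatal exception it throws.
  def phrases(read: => String): Try[List[String]] =
    Try(read).map(splitPhrases)

  // The match is an expression: both branches produce the String result.
  def describe(result: Try[List[String]]): String = result match {
    case Success(ps) => s"${ps.size} phrases"
    case Failure(e)  => "Error: " + e.getMessage
  }
}
```

The match in the question compiled but threw its result away, which is why texte was still a Try[String] on the next line. To halt instead of returning a message, the Failure branch could call sys.exit(1), whose result type Nothing fits any expected type.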
If you wish to further use your phrases object in your program with other values that produce Try values, you can use the flatMap function or for expressions that make life easier. For example:
val morePhrases: Try[List[String]] = ???
def mergePhrases(phrases1: List[String], phrases2: List[String]): List[String] = phrases1 ++ phrases2
val mergedPhrases: Try[List[String]] = for {
p1 <- phrases
p2 <- morePhrases
} yield mergePhrases(p1, p2) // Only for demonstration, you could also do yield p1 ++ p2
The mergedPhrases value in the code above is just a Try container holding the result of applying the mergePhrases function to the contents of phrases and morePhrases. If either of them is a Failure, mergedPhrases is that Failure.
Note that Try may not always be the best way to capture errors: at the end of your program you'll know that an error occurred (the first one), but you won't know what exactly the error was. That's why we have things like Either.
If I have simple parser defs like this:
def term: Parser[String] = """[a-zA-Z"']+""".r ^^ { _.toString }
def intWhole: Parser[String] = wholeNumber ^^ { w => w }
def simpleTerm: Parser[String] = term >> {
case t:String => failure("Oops!")
}
If I parse against simpleTerm (with any string) it fails as expected with my "Oops!" message.
Now if I add this:
def repTerm: Parser[Unit] = rep(simpleTerm | intWhole) ^^ { _ => () }
If I now parse against repTerm, again with just a non-numeric character string, what I'd hope to happen is have it fail with the same "Oops!" message--basically an aborted parse. What happens instead is that I get no error at all; just the returned Unit.
Is there a way to make parsing stop once it hits a failure, and return that failure, during a rep() clause?
I looked at the library code. rep() treats failures and errors differently. A failure() just tells the repetition to stop, i.e. it marks the end of the repeated clause; it is not necessarily a breakage. An err() means something broke: rep() propagates the error and stops further parsing.
Changing my failure() in code above to err() produces the desired result of stopping further parsing.
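The difference can be seen side by side in the following sketch, which reuses the parsers from the question. The object name RepDemo and the names softTerm, hardTerm, repSoft and repHard are made up for this example.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object RepDemo extends JavaTokenParsers {
  def term: Parser[String] = """[a-zA-Z"']+""".r
  def intWhole: Parser[String] = wholeNumber

  // failure(): a recoverable Failure, so | and rep may backtrack past it.
  def softTerm: Parser[String] = term >> { _ => failure("Oops!") }

  // err(): a non-recoverable Error, which | and rep pass through unchanged.
  def hardTerm: Parser[String] = term >> { _ => err("Oops!") }

  def repSoft: Parser[List[String]] = rep(softTerm | intWhole)
  def repHard: Parser[List[String]] = rep(hardTerm | intWhole)
}
```

With the input "abc", RepDemo.parse(RepDemo.repSoft, "abc") succeeds with an empty list because rep treats the Failure as "no more elements", while RepDemo.parse(RepDemo.repHard, "abc") yields an Error carrying the "Oops!" message.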
I'm trying to use a JavaTokenParsers combinator parser to pull out a particular match that's in the middle of a larger string (i.e. ignore an arbitrary run of prefix characters). However, I can't get it working, and I think I'm getting caught out by a greedy parser and/or CRs and LFs (the prefix characters can be basically anything). I have:
class RuleHandler extends JavaTokenParsers {
def allowedPrefixChars = """[a-zA-Z0-9=*+-/<>!\_(){}~\\s]*""".r
def findX: Parser[Double] = allowedPrefixChars ~ "(x=" ~> floatingPointNumber <~ ")" ^^ { case num => num.toDouble}
}
and then in my test case ..
"when looking for the X value" in {
"must find and correctly interpret X" in {
val testString =
"""
|Looking (only)
|for (x=45) within
|this string
""".stripMargin
val answer = ruleHandler.parse(ruleHandler.findX, testString)
System.out.println(" X value is : " + answer.toString)
}
}
I think it's similar to this SO question. Can anyone see what's wrong, please? Thanks.
First, you should not double the backslash in "\\s" inside a triple-quoted string (""" """):
def allowedPrefixChars = """[a-zA-Z0-9=*+-/<>!\_(){}~\s]*?""".r
In your version the character class contained a literal backslash and a literal s, not the whitespace class \s.
Second, your allowedPrefixChars parser includes (, x and =, so it consumes the whole string, including (x=, and nothing is left for the subsequent parsers.
The solution is to be more specific about the prefix you want:
object ruleHandler extends JavaTokenParsers {
def allowedPrefixChar: Parser[String] = """[a-zA-Z0-9=*+-/<>!\_){}~\s]""".r //no "(" here
def findX: Parser[Double] = rep(allowedPrefixChar | "\\((?!x=)".r ) ~ "(x=" ~> floatingPointNumber <~ ")" ^^ { case num => num.toDouble}
}
ruleHandler.parse(ruleHandler.findX, testString)
res14: ruleHandler.ParseResult[Double] = [3.11] parsed: 45.0
The prefix parser now accepts a ( only when it is not followed by x= (that is what the negative lookahead (?!x=) does), so the (x= we are looking for is left for the next parser.
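To see what the negative lookahead does on its own, here is a small sketch that uses only scala.util.matching.Regex. The sample text and the names openNotX, text and starts are made up for illustration.

```scala
object LookaheadDemo {
  // Matches "(" only when the next two characters are NOT "x=".
  val openNotX = """\((?!x=)""".r

  val text = "Looking (only) for (x=45)"

  // Start offsets of every "(" that the prefix parser would be allowed to eat.
  val starts: List[Int] = openNotX.findAllMatchIn(text).map(_.start).toList
}
```

Here starts contains only the offset of the parenthesis in "(only)", which is 8. The parenthesis in "(x=45)" is skipped by the lookahead, and the lookahead itself consumes no input.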
Alternative:
"""\(x=(.*?)\)""".r.findAllMatchIn(testString).map(_.group(1).toDouble).toList
res22: List[Double] = List(45.0)
If you want to use parsers properly, I would recommend describing the whole BNF grammar (with all possible usages of (, ) and =), not just a fragment. For example, include (only) in your parser if it is a keyword, and use something like "(" ~> valueName <~ "=" ~ value to get the value. Don't forget that scala-parser-combinators is intended to give you an AST, not just some matched value. Plain regexps are better for regular matching in unstructured data.
An example of how the parsers could be used properly (I didn't try to compile it):
trait Command
case class Rule(name: String, value: Double) extends Command
case class Directive(name: String) extends Command
class RuleHandler extends JavaTokenParsers { //why `JavaTokenParsers` (not `RegexParsers`) if you don't use tokens from Java Language Specification ?
def string: Parser[String] = """[a-zA-Z0-9*+-/<>!\_{}~\s]*""".r // still too permissive: prefer the predefined Java-like literals from JavaTokenParsers
// parenthesize (string <~ "=") so that both the name and the value are kept
def rule = "(" ~> (string <~ "=") ~ string <~ ")" ^^ { case name ~ num => Rule(name, num.toDouble) }
def directive = "(" ~> string <~ ")" ^^ { case name => Directive(name) }
def commands: Parser[List[Command]] = repsep(rule | directive, string) // repsep yields a List
}
If you need to process natural language (Chomsky type-0), scalanlp or something similar fits better.