In the following Parser:
object Foo extends JavaTokenParsers {
def word(x: String) = s"\\b$x\\b".r
lazy val expr = aSentence | something
lazy val aSentence = noun ~ verb ~ obj
lazy val noun = word("noun")
lazy val verb = word("verb") | err("not a verb!")
lazy val obj = word("object")
lazy val something = word("FOO")
}
It will parse noun verb object.
scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But, when entering a valid noun, but an invalid verb, why won't the err("not a verb!") return an Error with that particular error message?
scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found
noun vedsfasdf
^
credit: Thanks to Travis Brown for explaining the need for the word function here.
This question seems similar, but I'm not sure how to handle err with the ~ function.
Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it's then going to try something.
The culprit should be obvious when you think about it: what in that source code is taking two Failure results and choosing one? | (aka append).
This method on Parser will feed the input to both parsers, and then call append on ParseResult. That method is abstract at that level, and defined on Success, Failure and Error in different ways.
On both Success and Error, it always take this (that is, the parser on the left). On Failure, though, it does something else:
case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
/** The toString method of a Failure yields an error message. */
override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString
def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
case Success(_, _) => alt
case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
}}
}
Or, in other words, if both sides have failed, then it will take the side that read the most of the input (which is why it won't complain about a missing FOO), but if both have read the same amount, it will give precedence to the second failure.
I do wonder if it shouldn't check whether the right side is an Error, and, if so, return that. After all, if the left side is an Error, it always return that. This look suspicious to me, but maybe it's supposed to be that way. But I digress.
Back to the problem, it would seem that it should have gone with err, as they both consumed the same amount of input, right? Well... Here's the thing: regex parsers skip whiteSpace first, but that's for regex literals and literal strings. It does not apply over all other methods, including err.
That means that err's input is at the whitespace, while the word's input is at the word, and, therefore, further on the input. Try this:
lazy val verb = word("verb") | " *".r ~ err("not a verb!")
Arguably, err ought to be overridden by RegexParsers to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a Pull Request implementing the change. It will have the impact of changing error messages for some parser (well, that's the whole purpose of changing it :).
Related
I am reading Lihaoyi's presentation of his parser combinators framework while knowing only the basics of Scala.
I came across this line that I don't understand at all:
val Parsed.Success(2, _) = parse("1+1", expr(_))
Coming from Java, it looks weird. Anybody knows what it does?
Thanks in advance.
https://www.lihaoyi.com/fastparse/
What would be the equivalent in Java?
Scala knows Extractor Objects - see docs.scala-lang.org
They are mostly used for pattern matching - see docs.scala-lang.org
So this can be used with vals:
val customer2ID = CustomerID("Nico")
val CustomerID(name) = customer2ID
println(name) // prints Nico
Your example will throw a scala.MatchError if the parser would not work.
Try val Parsed.Success(2, _) = parse("1+2", expr(_)) // should be 3
Other answers explained what this is technically, but why did the author of fastparse write
val Parsed.Success(2, _) = parse("1+1", expr(_))
where we see no variable is actually bound in the sense of being usable after this statement executes? This kind of looks like a side-effect returning unit. I believe the author is deploying this technique as an alternative to assertion
assert(parse("1+1", expr(_)) == Parsed.Success(2, 3))
but they want to emphasise they do not care about the index value of Success[+T](value: T, index: Int) which they cannot do using regular assert
assert(parse("1+1", expr(_)) == Parsed.Success(2, _)) // error: missing parameter type for expanded function
Function parse is returning a Parsed[T], which is matched against the extractor of the Success case class (inside the object Parsed).
It could be rewritten as:
parse("1+1", expr(_)) match {
case Parsed.Success(2, _/*index*/) => ???
}
The previous make it obvious that it's not an exhaustive match (and so the val pattern would raise a MatchError).
Could also be used as bellow, to be more explicit and avoid MatchError.
val res: Option[(Int/* value */, Int/* index */)] =
Parsed.Success.unapply(parse("1+2", expr(_)))
I am a new to Scala coming from Java background, currently confused about the best practice considering Option[T].
I feel like using Option.map is just more functional and beautiful, but this is not a good argument to convince other people. Sometimes, isEmpty check feels more straight forward thus more readable. Is there any objective advantages, or is it just personal preference?
Example:
Variation 1:
someOption.map{ value =>
{
//some lines of code
}
} orElse(foo)
Variation 2:
if(someOption.isEmpty){
foo
} else{
val value = someOption.get
//some lines of code
}
I intentionally excluded the options to use fold or pattern matching. I am simply not pleased by the idea of treating Option as a collection right now, and using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO. But no matter why I dislike these options, I want to keep the scope of this question to be the above two variations as named in the title.
Is there any objective advantages, or is it just personal preference?
I think there's a thin line between objective advantages and personal preference. You cannot make one believe there is an absolute truth to either one.
The biggest advantage one gains from using the monadic nature of Scala constructs is composition. The ability to chain operations together without having to "worry" about the internal value is powerful, not only with Option[T], but also working with Future[T], Try[T], Either[A, B] and going back and forth between them (also see Monad Transformers).
Let's try and see how using predefined methods on Option[T] can help with control flow. For example, consider a case where you have an Option[Int] which you want to multiply only if it's greater than a value, otherwise return -1. In the imperative approach, we get:
val option: Option[Int] = generateOptionValue
var res: Int = if (option.isDefined) {
val value = option.get
if (value > 40) value * 2 else -1
} else -1
Using collections style method on Option, an equivalent would look like:
val result: Int = option
.filter(_ > 40)
.map(_ * 2)
.getOrElse(-1)
Let's now consider a case for composition. Let's say we have an operation which might throw an exception. Additionaly, this operation may or may not yield a value. If it returns a value, we want to query a database with that value, otherwise, return an empty string.
A look at the imperative approach with a try-catch block:
var result: String = _
try {
val maybeResult = dangerousMethod()
if (maybeResult.isDefined) {
result = queryDatabase(maybeResult.get)
} else result = ""
}
catch {
case NonFatal(e) => result = ""
}
Now let's consider using scala.util.Try along with an Option[String] and composing both together:
val result: String = Try(dangerousMethod())
.toOption
.flatten
.map(queryDatabase)
.getOrElse("")
I think this eventually boils down to which one can help you create clear control flow of your operations. Getting used to working with Option[T].map rather than Option[T].get will make your code safer.
To wrap up, I don't believe there's a single truth. I do believe that composition can lead to beautiful, readable, side effect deferring safe code and I'm all for it. I think the best way to show other people what you feel is by giving them examples as we just saw, and letting them feel for themselves the power they can leverage with these sets of tools.
using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO
If you do just want an isEmpty check, isEmpty/isDefined is perfectly fine. But in your case you also want to get the value. And using pattern matching for this is not abuse; it's precisely the basic use-case. Using get allows to very easily make errors like forgetting to check isDefined or making the wrong check:
if(someOption.isEmpty){
val value = someOption.get
//some lines of code
} else{
//some other lines
}
Hopefully testing would catch it, but there's no reason to settle for "hopefully".
Combinators (map and friends) are better than get for the same reason pattern matching is: they don't allow you to make this kind of mistake. Choosing between pattern matching and combinators is a different question. Generally combinators are preferred because they are more composable (as Yuval's answer explains). If you want to do something covered by a single combinator, I'd generally choose them; if you need a combination like map ... getOrElse, or a fold with multi-line branches, it depends on the specific case.
It seems similar to you in case of Option but just consider the case of Future. You will not be able to interact with the future's value after going out of Future monad.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Promise
import scala.util.{Success, Try}
// create a promise which we will complete after sometime
val promise = Promise[String]();
// Now lets consider the future contained in this promise
val future = promise.future;
val byGet = if (!future.value.isEmpty) {
val valTry = future.value.get
valTry match {
case Success(v) => v + " :: Added"
case _ => "DEFAULT :: Added"
}
} else "DEFAULT :: Added"
val byMap = future.map(s => s + " :: Added")
// promise was completed now
promise.complete(Try("PROMISE"))
//Now lets print both values
println(byGet)
// DEFAULT :: Added
println(byMap)
// Success(PROMISE :: Added)
The following scala code fails to work as expected:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.util.parsing.combinator.lexical.StdLexical
object Minimal extends StandardTokenParsers with PackratParsers {
override val lexical = new StdLexical
lexical.delimiters += ("<", "(", ")")
lazy val expression: PackratParser[Any] = (
numericLit
| numericLit ~ "<" ~ numericLit
)
def parseAll[T](p: PackratParser[T], in: String): ParseResult[T] =
phrase(p)(new PackratReader(new lexical.Scanner(in)))
def main(args: Array[String]) = println(parseAll(expression, "2 < 4"))
}
I get the error message:
[1.3] failure: end of input expected
2 < 4
^
If however I change the definition of "expression" to
lazy val expression: PackratParser[Any] = (
numericLit ~ "<" ~ numericLit
| numericLit
)
the problem disappears.
The problem seems to be that with the original definition code for "expression" the first rule consisting only of "numericLit" is applied, such that the parser indeed expects the input to end immediately afterwards. I do not understand why the parser does not backtrack as soon as it notices that the input does not indeed end; scala PackratParsers are supposed to be backtracking, and I also made sure to replace "def" by "lazy val" as suggested in the answer to another question.
The reason you are seeing this behaviour is that the alternation operator (vertical bar) is designed to accept the first of its alternatives that succeeds. In your case numericLit succeeds so the alternation never considers other alternatives.
With this kind of grammar specification you have to be careful if one alternative can match a prefix of another. As you've seen, the longer alternative should be placed earlier in the alternatives, otherwise it can never succeed.
If you wish the shorter alternative to match only if there is no more input after it, then you could try using the not combinator to express that extra condition. However, this approach will cause problems if expression is intended to be used inside other constructs.
It has nothing to do with packrat parser.
What you need to know is that in PEG, the choice operator selects the first match, which is numericLit in your case.
Is it possible to invert matches with Scala parser combinators? I am trying to match lines with a parser that do not start with a set of keywords. I could do this with an annoying zero width negative lookahead regular expression (e.g. "(?!h1|h2).*"), but I'd rather do it with a Scala parser. The best I've been able to come up with is this:
def keyword = "h1." | "h2."
def alwaysfails = "(?=a)b".r
def linenotstartingwithkeyword = keyword ~! alwaysfails | ".*".r
The idea is here that I use ~! to forbid backtracking to the all-matching regexp, and then continue with a regex "(?=a)b".r that matches nothing. (By the way, is there a predefined parser that always fails?) That way the line would not be matched if a keyword is found but would be matched if keyword does not match.
I am wondering if there is a better way to do this. Is there?
You can use not here:
import scala.util.parsing.combinator._
object MyParser extends RegexParsers {
val keyword = "h1." | "h2."
val lineNotStartingWithKeyword = not(keyword) ~> ".*".r
def apply(s: String) = parseAll(lineNotStartingWithKeyword, s)
}
Now:
scala> MyParser("h1. test")
res0: MyParser.ParseResult[String] =
[1.1] failure: Expected failure
h1. test
^
scala> MyParser("h1 test")
res1: MyParser.ParseResult[String] = [1.8] parsed: h1 test
Note that there is also a failure method on Parsers, so you could just as well have written your version with keyword ~! failure("keyword!"). But not's a lot nicer, anyway.
I'm trying to define a grammar for the commands below.
object ParserWorkshop {
def main(args: Array[String]) = {
ChoiceParser("todo link todo to database")
ChoiceParser("todo link todo to database deadline: next tuesday context: app.model")
}
}
The second command should be tokenized as:
action = todo
message = link todo to database
properties = [deadline: next tuesday, context: app.model]
When I run this input on the grammar defined below, I receive the following error message:
[1.27] parsed: Command(todo,link todo to database,List())
[1.36] failure: string matching regex `\z' expected but `:' found
todo link todo to database deadline: next tuesday context: app.model
^
As far as I can see it fails because the pattern for matching the words of the message is nearly identical to the pattern for the key of the property key:value pair, so the parser cannot tell where the message ends and the property starts. I can solve this by insisting that start token be used for each property like so:
todo link todo to database :deadline: next tuesday :context: app.model
But i would prefer to keep the command as close natural language as possible.
I have two questions:
What does the error message actually mean?
And how would I modify the existing grammar to work for the given input strings?
import scala.util.parsing.combinator._
case class Command(action: String, message: String, properties: List[Property])
case class Property(name: String, value: String)
object ChoiceParser extends JavaTokenParsers {
def apply(input: String) = println(parseAll(command, input))
def command = action~message~properties ^^ {case a~m~p => new Command(a, m, p)}
def action = ident
def message = """[\w\d\s\.]+""".r
def properties = rep(property)
def property = propertyName~":"~propertyValue ^^ {
case n~":"~v => new Property(n, v)
}
def propertyName: Parser[String] = ident
def propertyValue: Parser[String] = """[\w\d\s\.]+""".r
}
It is really simple. When you use ~, you have to understand that there's no backtracking on individual parsers which have completed succesfully.
So, for instance, message got everything up to before the colon, as all of that is an acceptable pattern. Next, properties is a rep of property, which requires propertyName, but it only finds the colon (the first char not gobbled by message). So propertyName fails, and property fails. Now, properties, as mentioned, is a rep, so it finishes succesfully with 0 repetitions, which then makes command finish succesfully.
So, back to parseAll. The command parser returned succesfully, having consumed everything before the colon. It then asks the question: are we at the end of the input (\z)? No, because there is a colon right next. So, it expected end-of-input, but got a colon.
You'll have to change the regex so it won't consume the last identifier before a colon. For example:
def message = """[\w\d\s\.]+(?![:\w])""".r
By the way, when you use def you force the expression to be reevaluated. In other words, each of these defs create a parser every time each one is called. The regular expressions are instantiated every time the parsers they belong to are processed. If you change everything to val, you'll get much better performance.
Remember, these things define the parser, they do not run it. It is parseAll which runs a parser.