Scala Parser Combinator to match element from list

I want a parser that matches if and only if the parsed String is contained by a given list of Strings.
def box: Parser[String] = // match if token is element of boxSyms: List[String]
Even after hours of searching the web, I have no idea how to achieve this, which makes me think I've been searching for the wrong thing.
Edit:
This is only a snippet from a bigger parser. The input is going to be used in further parser combinators:
lazy val boxModal = box ~ formula ^^ {
case boxSym ~ formula => Box(boxSyms.get(boxSym).get, formula)
}
The problem is that the List boxSyms is unknown at compile time.

Maybe something like this would work:
lazy val boxModal = box ~ formula ^^ {
case boxSym ~ formula if boxSyms.contains(boxSym) =>
Box(boxSyms.get(boxSym).get, formula)
}
Or some other, more specific condition.
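One caveat with a guard inside ^^: when the guard is false, the partial function has no matching case and a scala.MatchError is thrown instead of the parser failing. A minimal sketch of an alternative follows, assuming the parser extends RegexParsers and that boxSyms is really a Map from symbol to some value (the question calls boxSyms.get(boxSym)). BoxParser, the Box case class and the formula parser are placeholders invented for illustration, not the asker's definitions.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST node; the real Box in the question may look different.
case class Box(index: Int, formula: String)

// boxSyms arrives at run time; modelled as a Map because the question calls boxSyms.get(sym).
class BoxParser(boxSyms: Map[String, Int]) extends RegexParsers {
  // Placeholder standing in for the asker's formula parser.
  def formula: Parser[String] = """[a-z]+""".r

  // One literal parser per known symbol, longest first so that a symbol
  // which is a prefix of another one cannot shadow it.
  def box: Parser[String] = {
    val alternatives: List[Parser[String]] =
      boxSyms.keys.toList.sortBy(-_.length).map(literal)
    alternatives.reduceOption(_ | _).getOrElse(failure("no box symbols defined"))
  }

  // box only succeeds on a known symbol, so the lookup here cannot miss.
  lazy val boxModal: Parser[Box] = box ~ formula ^^ {
    case sym ~ f => Box(boxSyms(sym), f)
  }
}
```

Because the membership check happens inside box, an unknown symbol produces an ordinary parse failure that surrounding combinators can backtrack over, e.g. new BoxParser(Map("[]" -> 0)) rejects "<> p" without throwing.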

Related

Understanding the val declaration syntax for type Some in scala

While going through Spray.io examples library I came across this declaration of val in FileUploadHandler example of routing app.
val Some(HttpHeaders.`Content-Type`(ContentType(multipart: MultipartMediaType, _))) = header[HttpHeaders.`Content-Type`]
As per my understanding the variable declaration goes as val <identifier> = ...
Please help in understanding this paradigm of syntax.
val is a bit more complex than just an assignment operator.
A definition
val p = e
where p is not just a variable name, is expanded to
val x = e match { case p => x }
Take a look at the simplest example:
val Some(s) = Some(5)
As a result, s is equal to 5.
In your example, the result of header[HttpHeaders.`Content-Type`] is matched against Some(...).
According to Scala language spec: Value definitions can alternatively have a pattern as left-hand side. Watch out for PatDef in the document.
Section "Patterns in Value Definitions" of Daniel Westheide's Blog gives a nice overview on the usage.
You're looking for extractors/pattern matching in Scala, please see http://www.scala-lang.org/old/node/112.
You need a simple form of it, take a look at this snippet:
scala> val Some(t) = Some("Hello")
t: String = Hello
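A small sketch of the same idea with a nested pattern, including what happens when the value does not fit the pattern. The object and value names are made up for illustration, and the code assumes Scala 2 (Scala 3 may warn about refutable patterns in val definitions).

```scala
import scala.util.Try

object PatternValDemo {
  val opt: Option[(String, Int)] = Some(("answer", 42))

  // Pattern on the left-hand side: binds name and n in one definition.
  val Some((name, n)) = opt

  // Roughly the match the definition above stands for.
  val (name2, n2) = opt match { case Some((a, b)) => (a, b) }

  // A value that does not fit the pattern throws scala.MatchError at run time,
  // so this Try ends up as a Failure.
  val none: Option[Int] = None
  val failed: Try[Int] = Try { val Some(x) = none; x }
}
```

The last definition is the reason such declarations are best kept for cases where the shape of the value is known, as with the header lookup in the Spray example.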

Can I tell scala.xml to match any of two tags?

body \\ "div" matches the "div" tags, and body \\ "p" matches the "p" tags.
But what if I'd like to match all the "div" and "p" tags? Is it possible with one expression in scala.xml?
And if not, is there another way to iterate over all the "div" and "p" tags in the document in the order in which they appear?
If you take a look at the source for \\ in NodeSeq.scala you can see that it's really just a bit of sugar for a filter operation over descendant_or_self, which is a List[Node], using the node's label.
So you could do the same thing yourself, matching against a set of labels, like this:
val searchedLabels = Set("p", "div")
val results = body.descendant_or_self.filter(node => searchedLabels.contains(node.label))
Or if you really want it to seem like "built-in" functionality, you can pimp-on a suitable method to scala.xml.Node like so:
class ExtendedNode(n: Node) {
def \\\(labels: Set[String]): NodeSeq = {
n.descendant_or_self.filter(node => labels.contains(node.label))
}
}
implicit def node2extendedNode(n: Node): ExtendedNode = new ExtendedNode(n)
val results = body \\\ Set("p", "div")
although I must say I'm not sure I like either the method-name or the use of an implicit here :-(
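For what it's worth, on Scala 2.10 or later the same extension can be sketched with an implicit class and an ordinary method name. This assumes the scala-xml module is on the classpath; XmlLabelSyntax, LabelSearch and descendantsLabelled are names invented here, not part of scala.xml.

```scala
import scala.xml.{Node, NodeSeq, XML}

object XmlLabelSyntax {
  // Hypothetical helper; scala.xml itself has no such method.
  implicit class LabelSearch(n: Node) {
    // Keeps every descendant (or the node itself) whose label is in the set,
    // in the order descendant_or_self returns them.
    def descendantsLabelled(labels: Set[String]): NodeSeq =
      NodeSeq.fromSeq(n.descendant_or_self.filter(d => labels.contains(d.label)))
  }
}

object XmlLabelDemo {
  import XmlLabelSyntax._

  val body: Node =
    XML.loadString("<body><div>a</div><p>b</p><span>c</span><div><p>d</p></div></body>")

  // Labels of the matched nodes, in document order.
  val labels: Seq[String] = body.descendantsLabelled(Set("p", "div")).map(_.label)
}
```

The implicit only applies where XmlLabelSyntax._ is imported, which keeps the extension from leaking into unrelated code.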

Using `err` in a Child Parser

In the following Parser:
object Foo extends JavaTokenParsers {
def word(x: String) = s"\\b$x\\b".r
lazy val expr = aSentence | something
lazy val aSentence = noun ~ verb ~ obj
lazy val noun = word("noun")
lazy val verb = word("verb") | err("not a verb!")
lazy val obj = word("object")
lazy val something = word("FOO")
}
It will parse noun verb object.
scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But when I enter a valid noun followed by an invalid verb, why doesn't err("not a verb!") return an Error with that particular error message?
scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found
noun vedsfasdf
^
credit: Thanks to Travis Brown for explaining the need for the word function here.
This question seems similar, but I'm not sure how to handle err with the ~ function.
Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it's then going to try something.
The culprit should be obvious when you think about it: what in that source code is taking two Failure results and choosing one? | (aka append).
This method on Parser will feed the input to both parsers, and then call append on ParseResult. That method is abstract at that level, and defined on Success, Failure and Error in different ways.
On both Success and Error, it always takes this (that is, the result of the parser on the left). On Failure, though, it does something else:
case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
/** The toString method of a Failure yields an error message. */
override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString
def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
case Success(_, _) => alt
case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
}}
}
Or, in other words, if both sides have failed, then it will take the side that read the most of the input (which is why it won't complain about a missing FOO), but if both have read the same amount, it will give precedence to the second failure.
I do wonder whether it shouldn't check if the right side is an Error and, if so, return that. After all, if the left side is an Error, it always returns that. This looks suspicious to me, but maybe it's supposed to be that way. But I digress.
Back to the problem: it would seem that it should have gone with err, as they both consumed the same amount of input, right? Well... Here's the thing: regex parsers skip whiteSpace first, but only for regex literals and literal strings. That does not apply to other methods, including err.
That means that err's input is at the whitespace, while the word's input is at the word, and, therefore, further on the input. Try this:
lazy val verb = word("verb") | " *".r ~ err("not a verb!")
Arguably, err ought to be overridden by RegexParsers to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a Pull Request implementing the change. It will have the impact of changing error messages for some parsers (well, that's the whole purpose of changing it :).
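The selection rule discussed in this answer can be modelled without the library. The following is a simplified stand-in, not the real ParseResult hierarchy: Ok, Fail, Err and AppendModel are invented names, and positions are reduced to plain character offsets.

```scala
// Simplified stand-ins for Success, Failure and Error; pos is the offset reached.
sealed trait Result { def pos: Int }
case class Ok(pos: Int) extends Result
case class Fail(msg: String, pos: Int) extends Result
case class Err(msg: String, pos: Int) extends Result

object AppendModel {
  // Models how `left | right` picks a result, following the rule quoted from Failure.append.
  def append(left: Result, right: => Result): Result = left match {
    case Ok(_) | Err(_, _) => left // Success and Error always keep the left side
    case f: Fail =>
      right match {
        case ok: Ok => ok
        // Both sides failed: the one that got further wins, ties go to the right.
        case other => if (other.pos < f.pos) f else other
      }
  }
}
```

With the input "noun vedsfasdf", the regex failure sits at offset 5 (after the skipped space) while a bare err sits at offset 4, so append(Fail("verb expected", 5), Err("not a verb!", 4)) keeps the regex failure. Once err is also positioned at offset 5, the tie goes to the right-hand side and the custom message is chosen.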

Scala parser combinator reduce/foldLeft

I'm trying to make the following, from a dynamically filled List:
val primitives = "x" | "y" | "z" // what I want
val primitives2 = List("x", "y", "z") // what I need to transform from
I figured something like this might work:
primitives2.reduce(_|_)
But no go. I then found this snippet, which works:
primitives2.foldRight(failure("no matching delimiter"): Parser[Any])(_|_)
However, the base case failure("no matching delimiter") is confusing. Is that just the equivalent Nil case for Parser objects?
I'm going to assume that you're working with RegexParsers or one of its descendants. If so, then the issue is just that the implicit conversion from String to Parser[String] won't kick in automatically with reduce(_ | _). If you explicitly convert every item in your list first, like this:
val anyPrimitive = primitives2.map(literal).reduce(_ | _)
You'll be perfectly fine—except that this will leave you with slightly confusing error messages, like this:
scala> parser.parseAll(parser.anyPrimitive, "a")
res8: parser.ParseResult[Any] =
[1.1] failure: `z' expected but `a' found
a
^
If you want a clearer error message, then you'll need to provide your own starting value using the fold approach.
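A minimal sketch of that fold, assuming a RegexParsers object; PrimitiveParser and noMatch are names made up for illustration. The failure parser plays the role of the starting value for |: it never matches, so it contributes nothing except its message when every alternative has failed.

```scala
import scala.util.parsing.combinator.RegexParsers

object PrimitiveParser extends RegexParsers {
  // Filled at run time in the question; fixed here for illustration.
  val primitives2: List[String] = List("x", "y", "z")

  // Never succeeds; only supplies a readable message.
  val noMatch: Parser[String] =
    failure("expected one of: " + primitives2.mkString(", "))

  // Equivalent in shape to "x" | ("y" | ("z" | noMatch)).
  val anyPrimitive: Parser[String] =
    primitives2.map(literal).foldRight(noMatch)(_ | _)
}
```

Because noMatch is the last alternative tried, its message should be the one reported when none of the literals match at the same position, rather than the message of whichever literal happened to come last in the list.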

Scala parser-combinators: how to invert matches?

Is it possible to invert matches with Scala parser combinators? I am trying to write a parser that matches lines which do not start with any of a set of keywords. I could do this with an annoying zero-width negative lookahead regular expression (e.g. "(?!h1|h2).*"), but I'd rather do it with a Scala parser. The best I've been able to come up with is this:
def keyword = "h1." | "h2."
def alwaysfails = "(?=a)b".r
def linenotstartingwithkeyword = keyword ~! alwaysfails | ".*".r
The idea here is that I use ~! to forbid backtracking to the all-matching regexp, and then continue with a regex "(?=a)b".r that matches nothing. (By the way, is there a predefined parser that always fails?) That way the line would not be matched if a keyword is found but would be matched if keyword does not match.
I am wondering if there is a better way to do this. Is there?
You can use not here:
import scala.util.parsing.combinator._
object MyParser extends RegexParsers {
val keyword = "h1." | "h2."
val lineNotStartingWithKeyword = not(keyword) ~> ".*".r
def apply(s: String) = parseAll(lineNotStartingWithKeyword, s)
}
Now:
scala> MyParser("h1. test")
res0: MyParser.ParseResult[String] =
[1.1] failure: Expected failure
h1. test
^
scala> MyParser("h1 test")
res1: MyParser.ParseResult[String] = [1.8] parsed: h1 test
Note that there is also a failure method on Parsers, so you could just as well have written your version with keyword ~! failure("keyword!"). But not's a lot nicer, anyway.