Scala PackratParsers: backtracking seems not to work - scala

The following scala code fails to work as expected:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.util.parsing.combinator.lexical.StdLexical
object Minimal extends StandardTokenParsers with PackratParsers {
override val lexical = new StdLexical
lexical.delimiters += ("<", "(", ")")
lazy val expression: PackratParser[Any] = (
numericLit
| numericLit ~ "<" ~ numericLit
)
def parseAll[T](p: PackratParser[T], in: String): ParseResult[T] =
phrase(p)(new PackratReader(new lexical.Scanner(in)))
def main(args: Array[String]) = println(parseAll(expression, "2 < 4"))
}
I get the error message:
[1.3] failure: end of input expected
2 < 4
^
If however I change the definition of "expression" to
lazy val expression: PackratParser[Any] = (
numericLit ~ "<" ~ numericLit
| numericLit
)
the problem disappears.
The problem seems to be that with the original definition code for "expression" the first rule consisting only of "numericLit" is applied, such that the parser indeed expects the input to end immediately afterwards. I do not understand why the parser does not backtrack as soon as it notices that the input does not indeed end; scala PackratParsers are supposed to be backtracking, and I also made sure to replace "def" by "lazy val" as suggested in the answer to another question.

The reason you are seeing this behaviour is that the alternation operator (vertical bar) is designed to accept the first of its alternatives that succeeds. In your case numericLit succeeds so the alternation never considers other alternatives.
With this kind of grammar specification you have to be careful if one alternative can match a prefix of another. As you've seen, the longer alternative should be placed earlier in the alternatives, otherwise it can never succeed.
If you wish the shorter alternative to match only if there is no more input after it, then you could try using the not combinator to express that extra condition. However, this approach will cause problems if expression is intended to be used inside other constructs.

It has nothing to do with packrat parser.
What you need to know is that in PEG, the choice operator selects the first match, which is numericLit in your case.

Related

Calling methods with parameters without using parentheses

I am using the following implicit class in order to call a function which would otherwise take parameters without having to write the parameters in brackets:
scala> implicit class Print(string: String) {
| def echo: Unit = Console.println(string)
| }
defined class Print
scala> "Hello world" echo
Hello world
However while this works, I don't really like how it looks and my goal is to get the method call in front of the input variable as I think it reads better.
Is there any simple way, without relying on external libraries, to be able to call a method before supplying the parameters and without needing brackets? Implicit classes are what I've been using so far but that doesn't have to be the final solution.
What I would like to type instead of "Hello world" echo:
scala> echo "Hello world"
Alternatives I have tried:
Object with apply method
Requires parentheses
object echo {
def apply(string: String): Unit = Console.println(string)
}
echo "Hello world" // Error: ';' or newline expected
extending Dynamic [see here]
Doesn't seem to work in my version of Scala
Special characters [see here]
Looks ugly and not what I am looking for
Scalaz [see here]
Looks to do basically what my implicit class solution does, and I don't want any external dependencies.
EDIT
This answer has been pointed to as a potential solution, but again it doesn't address my issue as it relies on Dynamic to achieve a solution. As previously mentioned, Dynamic does not solve my problem for a couple of reasons:
It behaves funnily
If you define a val and try to println that val, it gives you back the val's name and not its value (as pointed out by #metaphori):
object println extends Dynamic {
def typed[T] = asInstanceOf[T]
def selectDynamic(name: String) = Console.println(name)
}
val x = "hello"
println x // prints: x
The specific example linked to did not work when I tried to recreate it - it still gave the ';' or newline expected error
If I just misunderstood how to implement it then I would appreciate a scalafiddle demonstrating that this solution solves my problem and will happily concede that this question is a duplicate of the previously mentioned answer, but until then I do contest it.
AFAIK only way to do something similar to what you want is extending Dynamic like this:
object println extends Dynamic {
def typed[T] = asInstanceOf[T]
def selectDynamic(name: String) = Console.println(name)
}
and using it with:
println `Hello World`
edit: of course you need to enable related features either by adding compiler parameters -language:postfixOps and -language:dynamics or by importing scala.language.dynamics and scala.language.postfixOps
You cannot achieve
echo "str"
since Scala is not e.g. Ruby: it syntactically requires that function invocations use parentheses.
It is not a matter of semantics or how things are implemented or what techniques are used: here is the parser that complains.
The point is that x y is actually interpreted as x.y, which means that y must be a method.
Refer to the Scala Language Specification, section 6.6 Function Applications:
SimpleExpr ::= SimpleExpr1 ArgumentExprs
ArgumentExprs ::= ‘(’ [Exprs] ‘)’
| ‘(’ [Exprs ‘,’] PostfixExpr ‘:’ ‘_’ ‘*’ ‘)’
| [nl] BlockExpr
Exprs ::= Expr {‘,’ Expr}
I do not like the trick of #hüseyin-zengin since it leverages dynamic method invocations, and also does not work as expected:
val x = "hello"
println x // prints: x
To partially achieve what you like you need to use infix operator notation
object run { def echo(s: String) = println(s) }
run echo "hello" // OK
run.echo "hello" // error: ';' expected but string literal found.
You may also use a symbol to reduce "typing" overhead (though may be perceived weirdly):
object > { def echo(s: String) = println(s) }
> echo "hello" // OK

Using `err` in a Child Parser

In the following Parser:
object Foo extends JavaTokenParsers {
def word(x: String) = s"\\b$x\\b".r
lazy val expr = aSentence | something
lazy val aSentence = noun ~ verb ~ obj
lazy val noun = word("noun")
lazy val verb = word("verb") | err("not a verb!")
lazy val obj = word("object")
lazy val something = word("FOO")
}
It will parse noun verb object.
scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But, when entering a valid noun, but an invalid verb, why won't the err("not a verb!") return an Error with that particular error message?
scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found
noun vedsfasdf
^
credit: Thanks to Travis Brown for explaining the need for the word function here.
This question seems similar, but I'm not sure how to handle err with the ~ function.
Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it's then going to try something.
The culprit should be obvious when you think about it: what in that source code is taking two Failure results and choosing one? | (aka append).
This method on Parser will feed the input to both parsers, and then call append on ParseResult. That method is abstract at that level, and defined on Success, Failure and Error in different ways.
On both Success and Error, it always take this (that is, the parser on the left). On Failure, though, it does something else:
case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
/** The toString method of a Failure yields an error message. */
override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString
def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
case Success(_, _) => alt
case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
}}
}
Or, in other words, if both sides have failed, then it will take the side that read the most of the input (which is why it won't complain about a missing FOO), but if both have read the same amount, it will give precedence to the second failure.
I do wonder if it shouldn't check whether the right side is an Error, and, if so, return that. After all, if the left side is an Error, it always return that. This look suspicious to me, but maybe it's supposed to be that way. But I digress.
Back to the problem, it would seem that it should have gone with err, as they both consumed the same amount of input, right? Well... Here's the thing: regex parsers skip whiteSpace first, but that's for regex literals and literal strings. It does not apply over all other methods, including err.
That means that err's input is at the whitespace, while the word's input is at the word, and, therefore, further on the input. Try this:
lazy val verb = word("verb") | " *".r ~ err("not a verb!")
Arguably, err ought to be overridden by RegexParsers to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a Pull Request implementing the change. It will have the impact of changing error messages for some parser (well, that's the whole purpose of changing it :).

Parsing sentences using Scala parser combinator

I just started playing with parser combinators in Scala, but got stuck on a parser to parse sentences such as "I like Scala." (words end on a whitespace or a period (.)).
I started with the following implementation:
package example
import scala.util.parsing.combinator._
object Example extends RegexParsers {
override def skipWhitespace = false
def character: Parser[String] = """\w""".r
def word: Parser[String] =
rep(character) <~ (whiteSpace | guard(literal("."))) ^^ (_.mkString(""))
def sentence: Parser[List[String]] = rep(word) <~ "."
}
object Test extends App {
val result = Example.parseAll(Example.sentence, "I like Scala.")
println(result)
}
The idea behind using guard() is to have a period demarcate word endings, but not consume it so that sentences can. However, the parser gets stuck (adding log() reveals that it is repeatedly trying the word and character parser).
If I change the word and sentence definitions as follows, it parses the sentence, but the grammar description doesn't look right and will not work if I try to add parser for paragraph (rep(sentence)) etc.
def word: Parser[String] =
rep(character) <~ (whiteSpace | literal(".")) ^^ (_.mkString(""))
def sentence: Parser[List[String]] = rep(word) <~ opt(".")
Any ideas what may be going on here?
However, the parser gets stuck (adding log() reveals that it is repeatedly trying the word and character parser).
The rep combinator corresponds to a * in perl-style regex notation. This means it matches zero or more characters. I think you want it to match one or more characters. Changing that to a rep1 (corresponding to + in perl-style regex notation) should fix the problem.
However, your definition still seems a little verbose to me. Why are you parsing individual characters instead of just using \w+ as the pattern for a word? Here's how I'd write it:
object Example extends RegexParsers {
override def skipWhitespace = false
def word: Parser[String] = """\w+""".r
def sentence: Parser[List[String]] = rep1sep(word, whiteSpace) <~ "."
}
Notice that I use rep1sep to parse a non-empty list of words separated by whitespace. There's a repsep combinator as well, but I think you'd want at least one word per sentence.

Scala parser-combinators: how to invert matches?

Is it possible to invert matches with Scala parser combinators? I am trying to match lines with a parser that do not start with a set of keywords. I could do this with an annoying zero width negative lookahead regular expression (e.g. "(?!h1|h2).*"), but I'd rather do it with a Scala parser. The best I've been able to come up with is this:
def keyword = "h1." | "h2."
def alwaysfails = "(?=a)b".r
def linenotstartingwithkeyword = keyword ~! alwaysfails | ".*".r
The idea is here that I use ~! to forbid backtracking to the all-matching regexp, and then continue with a regex "(?=a)b".r that matches nothing. (By the way, is there a predefined parser that always fails?) That way the line would not be matched if a keyword is found but would be matched if keyword does not match.
I am wondering if there is a better way to do this. Is there?
You can use not here:
import scala.util.parsing.combinator._
object MyParser extends RegexParsers {
val keyword = "h1." | "h2."
val lineNotStartingWithKeyword = not(keyword) ~> ".*".r
def apply(s: String) = parseAll(lineNotStartingWithKeyword, s)
}
Now:
scala> MyParser("h1. test")
res0: MyParser.ParseResult[String] =
[1.1] failure: Expected failure
h1. test
^
scala> MyParser("h1 test")
res1: MyParser.ParseResult[String] = [1.8] parsed: h1 test
Note that there is also a failure method on Parsers, so you could just as well have written your version with keyword ~! failure("keyword!"). But not's a lot nicer, anyway.

Using LabelDef in scala macros (2.10)

I'm experimenting with the scala 2.10 macro features. I have trouble using LabelDef in some cases, though. To some extent I peeked in the compiler's code, read excerpts of Miguel Garcia's papers but I'm still stuck.
If my understanding is correct, a pseudo-definition would be:
LabelDef(labelName, listOfParameters, stmsAndApply) where the 3 arguments are Trees and:
- labelNameis the identifier of the label $L being defined
- listOfParameters correspond to the arguments passed when label-apply occurs, as in $L(a1,...,an), and can be empty
- stmsAndApplycorresponds to the block of statements (possibly none) and final apply-expression
label-apply meaning more-or-less a GOTO to a label
For instance, in the case of a simple loop, a LabelDef can eventually apply itself:
LabelDef($L, (), {...; $L()})
Now, if I want to define 2 LabelDef that jump to each other:
...
LabelDef($L1, (), $L2())
...
LabelDef($L2, (), $L1())
...
The 2nd LabelDef is fine, but the compiler outputs an error on the 1st, "not found: value $L2". I guess that is because $L2 isn't yet defined while there is an attempt to apply it. This is a tree being constructed so that would make sense to me. Is my understanding correct so far? Because if no error is expected, that means my macro implementation is probably buggy.
Anyway, I believe there must be a way to apply $L2 (i.e. Jumping to $L2) from $L1, somehow, but I just have no clue how to do it. Does someone have an example of doing that, or any pointer?
Other unclear points (but less of a concern right now) about using LabelDef in macros are:
-what the 2nd argument is, concretely, how is it used when non-empty? In other words, what are the mechanisms of a label-apply with parameters?
-is it valid to put in the 3rd argument's final expression anything else than a label-apply? (not that I can't try, but macros are still experimental)
-is it possible to perform a forwarding label-apply outside a LabelDef? (maybe this is a redundant question)
Any macro implementation example in the answer is, of course, very welcome!
Cheers,
Because if no error is expected, that means my macro implementation is probably buggy.
Yes, it seems that was a bug (^^; Although I'm not sure whether or not the limitation with the Block/LabelDef combination exists on purpose.
def EVIL_LABELS_MACRO = macro EVIL_LABELS_MACRO_impl
def EVIL_LABELS_MACRO_impl(c:Context):c.Expr[Unit] = { // fails to expand
import c.universe._
val lt1 = newTermName("$L1"); val lt2 = newTermName("$L2")
val ld1 = LabelDef(lt1, Nil, Block(c.reify{println("$L1")}.tree, Apply(Ident(lt2), Nil)))
val ld2 = LabelDef(lt2, Nil, Block(c.reify{println("$L2")}.tree, Apply(Ident(lt1), Nil)))
c.Expr( Block(ld1, c.reify{println("ignored")}.tree, ld2) )
}
def FINE_LABELS_MACRO = macro FINE_LABELS_MACRO_impl
def FINE_LABELS_MACRO_impl(c:Context):c.Expr[Unit] = { // The End isn't near
import c.universe._
val lt1 = newTermName("$L1"); val lt2 = newTermName("$L2")
val ld1 = LabelDef(lt1, Nil, Block(c.reify{println("$L1")}.tree, Apply(Ident(lt2), Nil)))
val ld2 = LabelDef(lt2, Nil, Block(c.reify{println("$L2")}.tree, Apply(Ident(lt1), Nil)))
c.Expr( Block(ld1, c.reify{println("ignored")}.tree, ld2, c.reify{println("The End")}.tree) )
}
I think a Block is parsed into { statements; expression } thus the last argument is the expression. If a LabelDef "falls in" expression, e.g. the EVIL_LABELS_MACRO pattern, its expansion won't be visible in statements; hence error "not found: value $L2".
So it's better to make sure all LabelDef "fall in" statements. FINE_LABELS_MACRO does that and expands to:
{
$L1(){
scala.this.Predef.println("$L1");
$L2()
};
scala.this.Predef.println("ignored");
$L2(){
scala.this.Predef.println("$L2");
$L1()
};
scala.this.Predef.println("The End")
}