Stack overflow when using parser combinators - scala

import scala.util.parsing.combinator._
object ExprParser extends JavaTokenParsers {
lazy val name: Parser[_] = "a" ~ rep("a" | "1") | function_call
lazy val function_call = name ~ "(" ~> name <~ ")"
}
This recurses indefinitely for function_call.parseAll("aaa(1)"). Obviously, this is because 1 cannot match the first alternative of name, so name falls through to function_call, which tries name again, which falls through to function_call, and so on. How do you resolve such situations?
One suggested solution was to reduce name to a simple identifier:
def name = rep1("a" | "1")
def function_call = name ~ "(" ~ (function_call | name) ~ ")"
but I would rather not do this, because name ::= identifier | function_call is how the VHDL specification defines it in BNF, and function_call is probably shared elsewhere. The left-recursion elimination found here is undesirable for the same reason:
def name: Parser[_] = "a" ~ rep("a" | "1") ~ pared_name
def pared_name: Parser[_] = "(" ~> name <~ ")" | ""
BTW, I also wonder: once the error is fixed, will name.parseAll consume only "aaa", via the first alternative of the name rule, or the whole "aaa(1)"? How can I make name try to consume the whole aaa(1) before settling for just aaa? I guess I should put function_call first among the alternatives in name, but won't that overflow the stack even more eagerly?

An easy solution is to use the packrat parser:
object ExprParser extends JavaTokenParsers with PackratParsers {
lazy val name: PackratParser[_] = "a" ~ rep("a" | "1") | function_call
lazy val function_call: PackratParser[_] = name ~ "(" ~> name <~ ")"
}
Output:
scala> ExprParser.parseAll(ExprParser.function_call, "aaa(1)")
res0: ExprParser.ParseResult[Any] =
[1.5] failure: Base Failure
aaa(1)
^
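For comparison, here is a minimal sketch of direct left recursion handled by PackratParsers. It uses a hypothetical grammar (sum ::= sum "+" num | num), not the VHDL one above, and the names SumParser, num, sum and eval are made up for illustration. It assumes the scala-parser-combinators module is on the classpath (it ships separately from the standard library since Scala 2.11).

```scala
import scala.util.parsing.combinator._

object SumParser extends JavaTokenParsers with PackratParsers {
  // The PackratParser annotation matters: the right-hand side goes through a
  // by-name implicit conversion, so the self-reference below is not forced eagerly.
  lazy val num: PackratParser[Int] = wholeNumber ^^ (_.toInt)

  // Directly left-recursive rule: sum ::= sum "+" num | num
  lazy val sum: PackratParser[Int] =
    sum ~ ("+" ~> num) ^^ { case l ~ r => l + r } | num

  // parseAll wraps the input in a packrat reader, which enables memoisation.
  def eval(input: String): ParseResult[Int] = parseAll(sum, input)
}
```

SumParser.eval("1+2+3") should then succeed instead of overflowing the stack. Whether the same approach gives the desired result for the name/function_call grammar is a separate matter, as the failure above shows.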

Related

How to parse until a token is found on a line by itself

I'm trying to parse the following document:
val doc = """BEGIN
A Bunch
Of Text
With linebreaks
##
"""
The idea here being that when I see a ## on a line of its own, I should consider that the end of parsing.
I've tried using the following code to parse this document:
object MyParser extends RegexParsers {
val begin: Parser[String] = "BEGIN"
val lines: Parser[Seq[String]] = repsep(line, eol)
val line: Parser[String] = """.+""".r
val eol: Parser[Any] = "\n" | "\r\n" | "\r"
val end: Parser[String] = "##"
val document: Parser[Seq[String]] =
begin ~> lines <~ end
}
MyParser.parseAll(MyParser.document, doc)
However, when I try to execute this (in an Ammonite script), I get the following:
java.lang.NullPointerException
scala.util.parsing.combinator.Parsers$class.rep1sep(Parsers.scala:771)
ammonite.$file.vtt$minusparser$MyParser$.rep1sep(vtt-parser.sc:3)
scala.util.parsing.combinator.Parsers$class.repsep(Parsers.scala:687)
ammonite.$file.vtt$minusparser$MyParser$.repsep(vtt-parser.sc:3)
ammonite.$file.vtt$minusparser$MyParser$.<init>(vtt-parser.sc:5)
ammonite.$file.vtt$minusparser$MyParser$.<clinit>(vtt-parser.sc)
ammonite.$file.vtt$minusparser$.<init>(vtt-parser.sc:22)
ammonite.$file.vtt$minusparser$.<clinit>(vtt-parser.sc)
Can anyone see where I'm going wrong?
The reason for the error is that line and eol are defined as normal class field vals, but they are used in lines before their definition. The code that assigns values to class fields is executed sequentially in the constructor, and line and eol are both still null, when lines is being assigned.
To solve this, define line and eol as lazy vals or defs, or just put them before lines in the code.
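The initialization-order effect can be reproduced without any parser library. The sketch below uses made-up names (InitOrderDemo, Broken, Fixed, eager, later) and only plain strings; it illustrates the mechanism rather than reproducing the exact failure above.

```scala
object InitOrderDemo {
  object Broken {
    // Fields are initialised top to bottom, so `later` is still null here.
    val eager: String = "saw: " + later
    val later: String = "value"
  }

  object Fixed {
    // A lazy val is initialised on first access, so the order no longer matters.
    val eager: String = "saw: " + later
    lazy val later: String = "value"
  }
}
```

Here Broken.eager ends up containing the text null, while Fixed.eager sees the real value. In the parser above the null is dereferenced inside rep1sep, hence the NullPointerException. The compiler may warn about the forward reference in Broken.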
The parser itself also has some problems. By default, RegexParsers skips whitespace, including EOLs, before every literal and regex it matches. Since the regexp .+ without any flags does not match EOLs, line naturally means "the rest of the line up to the line break", so you don't have to analyze EOLs at all.
Secondly, the lines parser as defined is greedy. It will happily consume everything including the final ##. To make it stop before end you can, for example, use the not combinator.
With all the changes, the parser looks like this:
object MyParser extends RegexParsers {
val begin: Parser[String] = "BEGIN"
val line: Parser[String] = """.+""".r
val lines: Parser[Seq[String]] = rep(not(end) ~> line)
val end: Parser[String] = "##"
val document: Parser[Seq[String]] =
begin ~> lines <~ end
}
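As a quick check of the greediness point, the following sketch puts a greedy variant next to the not-based one and runs both on the question's document. The object name GreedyDemo and the members greedyDocument, guardedDocument and doc are assumptions made for this example, and the parser definitions are repeated only to keep the block self-contained. It assumes scala-parser-combinators is on the classpath.

```scala
import scala.util.parsing.combinator._

object GreedyDemo extends RegexParsers {
  val begin: Parser[String] = "BEGIN"
  val end: Parser[String] = "##"
  val line: Parser[String] = """.+""".r

  // Greedy: rep(line) also swallows the "##" line, so `end` has nothing left to match.
  val greedyDocument: Parser[Seq[String]] = begin ~> rep(line) <~ end

  // Guarded: not(end) stops the repetition right before the terminator.
  val guardedDocument: Parser[Seq[String]] = begin ~> rep(not(end) ~> line) <~ end

  val doc = "BEGIN\nA Bunch\nOf Text\nWith linebreaks\n##\n"

  def greedy(): ParseResult[Seq[String]] = parseAll(greedyDocument, doc)
  def guarded(): ParseResult[Seq[String]] = parseAll(guardedDocument, doc)
}
```

GreedyDemo.greedy() is expected to fail, while GreedyDemo.guarded() should return the three text lines.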
You may also override the behaviour of skipping the whitespace, and analyze all whitespace manually. This includes the whitespace after BEGIN and after the ##:
object MyParser extends RegexParsers {
override def skipWhitespace = false
val eol: Parser[Any] = "\n" | "\r\n" | "\r"
val begin: Parser[String] = "BEGIN" <~ eol
val line: Parser[String] = """.*""".r
val lines: Parser[Seq[String]] = rep(not(end) ~> line <~ eol)
val end: Parser[String] = "##"
val document: Parser[Seq[String]] =
begin ~> lines <~ end <~ whiteSpace
}
Note that line is defined as .* instead of .+ here. This way the parser won't fail if there are any empty lines in the input.

scala fastparse typechecking

I am puzzled by why the following code using scala fastparse 0.4.3 fails typechecking.
val White = WhitespaceApi.Wrapper{
import fastparse.all._
NoTrace(CharIn(" \t\n").rep)
}
import fastparse.noApi._
import White._
case class Term(tokens: Seq[String])
case class Terms(terms: Seq[Term])
val token = P[String] ( CharIn('a' to 'z', 'A' to 'Z', '0' to '9').rep(min=1).!)
val term: P[Term] = P("[" ~ token.!.rep(sep=" ", min=1) ~ "]").map(x => Term(x))
val terms = P("(" ~ term.!.rep(sep=" ", min=1) ~ ")").map{x => Terms(x)}
val parse = terms.parse("([ab bd ef] [xy wa dd] [jk mn op])")
The error messages:
[error] .../MyParser.scala: type mismatch;
[error] found : Seq[String]
[error] required: Seq[Term]
[error] val terms = P("(" ~ term.!.rep(sep=" ", min=1) ~")").map{x => Terms(x)}
[error] ^
I would imagine that since term is of type Term and since the terms pattern uses term.!.rep(..., it should get a Seq[Term].
I figured it out. My mistake was capturing (with !) redundantly in terms. That line should instead be written:
val terms = P("(" ~ term.rep(sep=" ", min=1) ~ ")").map{x => Terms(x)}
Notice that term.!.rep( has been rewritten to term.rep(.
Apparently, capturing any rule with ! returns the text that the captured subrule matched, overriding whatever the subrule itself returns. I guess this is a feature when used correctly. :)
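The effect of the extra capture can be seen on a smaller rule. This is a sketch against the fastparse 0.4.x API used in the question (fastparse.all._); the names CaptureDemo, number, numberText and value are made up for illustration.

```scala
import fastparse.all._

object CaptureDemo {
  // A rule that already yields a typed value.
  val number: P[Int] = P(CharIn('0' to '9').rep(min = 1).!.map(_.toInt))

  // Capturing it again discards the Int and yields the matched text instead.
  val numberText: P[String] = P(number.!)

  // Helper for this example: extract the parsed value, if any.
  def value(p: P[Any], in: String): Option[Any] = p.parse(in) match {
    case Parsed.Success(v, _) => Some(v)
    case _                    => None
  }
}
```

With this, number yields an Int for "42", whereas numberText yields the String "42", which mirrors the Seq[String] versus Seq[Term] mismatch above.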

FastParse - out of memory error

I'm trying to use the FastParse library to create a parser for a very primitive templating system like this:
Hello, your name is {{name}} and today is {{date}}.
So far I have:
scala> import fastparse.all._
import fastparse.all._
scala> val FieldStart = "{{"
FieldStart: String = {{
scala> val FieldEnd = "}}"
FieldEnd: String = }}
scala> val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep.! ~ FieldEnd)
Field: fastparse.all.Parser[String] = Field
scala> val Static = P((!FieldStart ~ !FieldEnd ~ AnyChar).rep.!)
Static: fastparse.all.Parser[String] = Static
scala> val Template = P(Start ~ (Field | Static) ~ End)
Template: fastparse.all.Parser[String] = Template
scala> Template parse "{{foo}}"
res0: fastparse.core.Parsed[String,Char,String] = Success(foo,7)
scala> Template parse "foo"
res1: fastparse.core.Parsed[String,Char,String] = Success(foo,3)
scala> Template parse "{{foo"
res2: fastparse.core.Parsed[String,Char,String] = Failure(End:1:1 ..."{{foo")
But when I try what I think should be the correct final form:
scala> val Template = P(Start ~ (Field | Static).rep ~ End)
Template: fastparse.all.Parser[Seq[String]] = Template
I get:
scala> Template parse "{{foo}}"
java.lang.OutOfMemoryError: Java heap space
at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:103)
at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:84)
at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:48)
at fastparse.core.Implicits$LowPriRepeater$GenericRepeater.accumulate(Implicits.scala:47)
at fastparse.core.Implicits$LowPriRepeater$GenericRepeater.accumulate(Implicits.scala:44)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:462)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:489)
at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:297)
at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:319)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:160)
at fastparse.core.Parser.parseInput(Parsing.scala:374)
at fastparse.core.Parser.parse(Parsing.scala:358)
... 19 elided
What am I doing wrong?
Try like this:
val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep(min=1).! ~ FieldEnd)
val Static = P((!(FieldStart | FieldEnd) ~ AnyChar).rep(min=1).!)
val Template = P(Start ~ (Field | Static).rep ~ End)
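You should be careful with .rep: it literally means zero or more, so a parser built with it can succeed without consuming any input. Wrapped in an outer .rep, such a parser keeps succeeding at the same position, and the accumulated results eventually exhaust the heap.
Also, in the Static parser, the negative lookahead should look like !(FieldStart | FieldEnd),
I think, because you want to reject both opening and closing braces.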
Hope it helps! ;)
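To see why a zero-width success inside a repetition runs away, here is a toy model in plain Scala. It is an assumption-laden illustration, not FastParse's implementation: Toy, letters and countMatches are hypothetical names, and the cap parameter stands in for the heap limit so that the demo terminates.

```scala
object RepLoopDemo {
  // Toy parser: takes the input and a position, returns the new position on success.
  type Toy = (String, Int) => Option[Int]

  // Consumes letters; succeeds only if at least `min` of them were consumed.
  def letters(min: Int): Toy = (in, pos) => {
    val end = in.indexWhere(!_.isLetter, pos) match {
      case -1 => in.length
      case i  => i
    }
    if (end - pos >= min) Some(end) else None
  }

  // Naive repetition: apply `p` until it fails, counting the successes.
  def countMatches(p: Toy, in: String, cap: Int): Int = {
    var pos = 0
    var n = 0
    var running = true
    while (running && n < cap) {
      p(in, pos) match {
        case Some(next) => pos = next; n += 1
        case None       => running = false
      }
    }
    n
  }
}
```

With letters(0) the repetition only stops at the cap, because the inner parser keeps succeeding at the end of the input without moving. With letters(1) it stops after a single match.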

Scala FastParse Library Error

I am trying to learn the scala fast parse library. Towards this I have written the following code
import fastparse.noApi._
import fastparse.WhitespaceApi
object FastParsePOC {
val White = WhitespaceApi.Wrapper{
import fastparse.all._
NoTrace(" ".rep)
}
def print(input : Parsed[String]): Unit = {
input match {
case Parsed.Success(value, index) => println(s"Success: $value $index")
case f @ Parsed.Failure(error, line, col) => println(s"Error: $error $line $col ${f.extra.traced.trace}")
}
}
def main(args: Array[String]) : Unit = {
import White._
val parser = P("Foo" ~ "(" ~ AnyChar.rep(1).! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
}
}
But I get this error:
Error: ")" 21 Extra(Foo(Bar(10), Baz(20)), [traced - not evaluated]) parser:1:1 / (AnyChar | ")"):1:21 ...""
My expected output was "Bar(10), Baz(20)". It seems the parser above does not like the ending ")".
AnyChar.rep(1) is greedy and also consumes the ) at the end of the input string, so the closing ")" in ~ ")" is never reached.
If the ) symbol weren't used in Bar and Baz, this could be solved by excluding ) from AnyChar like this:
val parser = P("Foo" ~ "(" ~ (!")" ~ AnyChar).rep(1).! ~ ")")
val input1 = "Foo(Bar(10*, Baz(20*)"
To make Bar and Baz work with the ) symbol, you could define a separate parser for each of them, again excluding the ) symbol from AnyChar. The following solution is a bit more flexible, as it allows more occurrences of Bar and Baz, but I hope you get the idea.
val bar = P("Bar" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val baz = P("Baz" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val parser = P("Foo" ~ "(" ~ (bar | baz).rep(sep = ",").! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
Result:
Success: Bar(10), Baz(20) 21

Scala parser combinator for Logo list?

I am trying to make a token-based Scala parser for UCB Logo. The problem I am facing is that in UCB Logo the values in a list can be delimited by one of ']', '[', ' '. If there are any other kinds of delimiters, the content of the list should be treated as a word.
In short, how can I make a token parser that will consider the following:
[ 4 3 2 ] - should be a list
[ [ 4 3 2 ] ] - should be a list within a list
[ 1 + 2 ] - should be a word inside a list
[ [ 1 2 3 ] + ] - should be a word inside a list
The following
'[' ~ rep(chrExcept('[', ']')) ~ ']'
produces these tokens:
Tokens: List([, [1 2 3], +, ])
from [ [ 1 2 3 ] + ]. I believe it should produce the tokens:
List([, [1 2 3] +, ]) -> merge the + sign with the token [1 2 3].
This is the current code of the Lexical I am using:
package lexical
import scala.language.postfixOps
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.input.CharSequenceReader._
/**
* Created by Marin on 28/03/16.
*/
class MyLexical extends Lexical with MyTokens {
def token: Parser[Token] = (
//procDef ^^ { case first ~ chars => processNewProcedure(chars mkString "") }
word2 ^^ { case rest => {
/*val s = if (second.isEmpty) "" else second mkString ""
val t = if(third.isEmpty) "" else third mkString ""
val f = if(fourth.isEmpty) "" else fourth mkString ""
StringLit(s"$first$s$t$f$rest")*/
println(rest)
StringLit("Smth")
}
}
| formalChar ~ rep(identChar | digit) ^^ { case first ~ rest => Formal(first :: rest mkString "") }
| identChar ~ rep(identChar | digit) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
| procDigit ^^ { case first ~ second ~ rest => NumericLit((first mkString "") :: second.getOrElse("") :: rest mkString "") }
| '\"' ~ rep(chrExcept('\"', EofCh)) ~ ' ' ^^ { case '\"' ~ chars ~ ' ' => StringLit(chars mkString "") }
| EofCh ^^^ EOF
| delim
| failure("Illegal character")
)
def processNewProcedure(chars: String) =
if(reserved.contains(chars)) throw new RuntimeException
else {
Identifier(chars)
}
def procDef = toSeq ~> identChar ~ rep(identChar | elem('_')) <~ formalChar.* <~ endSeq
def toSeq = 't' ~ 'o' ^^^ "to"
def endSeq = 'e' ~ 'n' ~ 'd' ^^^ "end"
def processIdent(name: String) = {
if (reserved contains name) {
Keyword(name)
} else {
Identifier(name)
}
}
def word = {
'[' ~ ((whitespaceChar | digit)*) ~ (_delim | identChar) ~ rep(whitespaceChar | digit) ~ ']'
}
def word2 = {
//'[' ~> rep(whitespaceChar | digit) ~> rep(_delim | identChar) <~ rep(whitespaceChar | digit) <~ ']'
//'[' ~ rep(chrExcept('[', ']')) ~ ']'
rep1('[') ~ rep1(chrExcept('[', ']') | digit) ~ rep(_delim) ~ rep1(']')
//rep1('[') ~ identChar ~ rep(']') ~ rep('+') ~ rep1(']')
//'[' ~ (_delim | chrExcept('[', ']')) ~ ']'
}
def word3 = {
'[' ~> rep(digit | letter | _delim) <~ ']'
}
def procDigit = digit.+ ~ '.'.? ~ digit.*
def identChar = letter | elem('_')
def formalChar = ':' ~ identChar
override def whitespace: Parser[Any] = rep[Any] (
whitespaceChar
| ';' ~ comment
)
def comment: Parser[Any] = rep(chrExcept(EofCh, ';')) ^^ { case _ => ' ' }
/****** Pure copy-paste ******/
/** The set of reserved identifiers: these will be returned as `Keyword`s. */
val reserved = new scala.collection.mutable.HashSet[String]
/** The set of delimiters (ordering does not matter). */
val delimiters = new scala.collection.mutable.HashSet[String]
private lazy val _delim: Parser[Token] = {
// construct parser for delimiters by |'ing together the parsers for the individual delimiters,
// starting with the longest one -- otherwise a delimiter D will never be matched if there is
// another delimiter that is a prefix of D
def parseDelim(s: String): Parser[Token] = accept(s.toList) ^^ { x => Keyword(s) }
val d = new Array[String](delimiters.size)
delimiters.copyToArray(d, 0)
scala.util.Sorting.quickSort(d)
(d.toList map parseDelim).foldRight(failure("no matching delimiter"): Parser[Token])((x, y) => y | x)
}
protected def delim: Parser[Token] = _delim
}