Scala FastParse Library Error

I am trying to learn the Scala FastParse library. To that end I have written the following code:
import fastparse.noApi._
import fastparse.WhitespaceApi
object FastParsePOC {
  val White = WhitespaceApi.Wrapper{
    import fastparse.all._
    NoTrace(" ".rep)
  }
  def print(input: Parsed[String]): Unit = {
    input match {
      case Parsed.Success(value, index) => println(s"Success: $value $index")
      case f @ Parsed.Failure(error, line, col) => println(s"Error: $error $line $col ${f.extra.traced.trace}")
    }
  }
  def main(args: Array[String]): Unit = {
    import White._
    val parser = P("Foo" ~ "(" ~ AnyChar.rep(1).! ~ ")")
    val input1 = "Foo(Bar(10), Baz(20))"
    print(parser.parse(input1))
  }
}
But I get this error:
Error: ")" 21 Extra(Foo(Bar(10), Baz(20)), [traced - not evaluated]) parser:1:1 / (AnyChar | ")"):1:21 ...""
My expected output was "Bar(10), Baz(20)". It seems the parser above does not like the final ")".

AnyChar.rep(1) also consumes the ) symbol at the end of the input string, so the closing ) in ~ ")" is never reached.
If the ) symbol weren't used inside Bar and Baz, this could be solved by excluding ) from AnyChar like this:
val parser = P("Foo" ~ "(" ~ (!")" ~ AnyChar).rep(1).! ~ ")")
val input1 = "Foo(Bar(10*, Baz(20*)"
To make Bar and Baz work with the ) symbol you could define separate parsers for each of them (again excluding ) from AnyChar). The following solution is a bit more flexible in that it allows any number of Bar and Baz occurrences, but I hope you get the idea:
val bar = P("Bar" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val baz = P("Baz" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val parser = P("Foo" ~ "(" ~ (bar | baz).rep(sep = ",").! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
Result:
Success: Bar(10), Baz(20) 21
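As a side note (not part of the original answer, and assuming the fastparse 0.4.x API used above), the (!")" ~ AnyChar).rep(1) pattern can usually also be written with CharsWhile, which consumes characters while a predicate holds and by default requires at least one; notCloseParen below is just a helper name for this sketch:
// equivalent in spirit to (!")" ~ AnyChar).rep(1): one or more characters that are not ')'
val notCloseParen = P(CharsWhile(_ != ')'))
val bar = P("Bar" ~ "(" ~ notCloseParen ~ ")")
val baz = P("Baz" ~ "(" ~ notCloseParen ~ ")")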

How to use if-else condition in Scala's Filter?

I have an ArrayBuffer with data in the following format: period_name:character varying(15) year:bigint. Each entry represents a table column name and its datatype. My requirement is to extract the column name (period_name) and the datatype, just character varying with the substring from "(" to ")" excluded, and then send all the elements to a ListBuffer. I came up with the following logic:
for(i <- receivedGpData) {
  gpTypes = i.split("\\:")
  if(gpTypes(1).contains("(")) {
    gpColType = gpTypes(1).substring(0, gpTypes(1).indexOf("("))
    prepList += gpTypes(0) + " " + gpColType
  } else {
    prepList += gpTypes(0) + " " + gpTypes(1)
  }
}
The above code works, but I am trying to implement the same thing using Scala's map and filter functions. What I don't understand is how to apply the if-else logic after the filter condition:
var reList = receivedGpData.map(element => element.split(":"))
  .filter{ x => x(1).contains("(")
  }
Could anyone let me know how I can implement the same for-loop logic using Scala's map & filter functions?
val receivedGpData = Array("bla:bla(1)","bla2:cat")
val res = receivedGpData
  .map(_.split(":"))
  .map(s=>(s(0),s(1).takeWhile(_!='(')))
  .map(s => s"${s._1} ${s._2}").toList
println(res)
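If you specifically want to keep the if-else from the original loop, it can live inside a single map call instead of a filter; this is just the question's own logic (with the question's variable names) moved into map:
val res = receivedGpData
  .map(_.split(":"))
  .map { s =>
    if (s(1).contains("(")) s(0) + " " + s(1).substring(0, s(1).indexOf("("))
    else s(0) + " " + s(1)
  }
  .toList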
Using regex:
val p = "(\\w+):([.[^(]]*)(\\(.*\\))?".r
val res = data.map{case p(x,y,_)=>x+" "+y}
In Scala REPL:
scala> val data = Array("period_name:character varying(15)","year:bigint")
data: Array[String] = Array(period_name:character varying(15), year:bigint)
scala> val p = "(\\w+):([.[^(]]*)(\\(.*\\))?".r
p: scala.util.matching.Regex = (\w+):([.[^(]]*)(\(.*\))?
scala> val res = data.map{case p(x,y,_)=>x+" "+y}
res: Array[String] = Array(period_name character varying, year bigint)

scala fastparse typechecking

I am puzzled by why the following code using scala fastparse 0.4.3 fails typechecking.
val White = WhitespaceApi.Wrapper{
  import fastparse.all._
  NoTrace(CharIn(" \t\n").rep)
}
import fastparse.noApi._
import White._
case class Term(tokens: Seq[String])
case class Terms(terms: Seq[Term])
val token = P[String] ( CharIn('a' to 'z', 'A' to 'Z', '0' to '9').rep(min=1).!)
val term: P[Term] = P("[" ~ token.!.rep(sep=" ", min=1) ~ "]").map(x => Term(x))
val terms = P("(" ~ term.!.rep(sep=" ", min=1) ~ ")").map{x => Terms(x)}
val parse = terms.parse("([ab bd ef] [xy wa dd] [jk mn op])")
The error messages:
[error] .../MyParser.scala: type mismatch;
[error] found : Seq[String]
[error] required: Seq[Term]
[error] val terms = P("(" ~ term.!.rep(sep=" ", min=1) ~")").map{x => Terms(x)}
[error] ^
I would imagine that since term is of type P[Term], and since the terms pattern uses term.!.rep(..., it should get a Seq[Term].
I figured it out. My mistake was capturing (with !) redundantly in terms. That line should instead be written:
val terms = P("(" ~ term.rep(sep=" ", min=1) ~ ")").map{x => Terms(x)}
Notice that term.!.rep( has been rewritten to term.rep(.
Apparently capturing in any rule returns the text that the captured subrule matches, overriding what the subrule actually returns. I guess this is a feature when used correctly. :)
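A tiny illustration of that behaviour (my own sketch, same fastparse 0.4.x API): wrapping an already-mapped parser in .! discards its value and yields the matched text instead.
import fastparse.all._
val num: P[Int] = P(CharIn('0' to '9').rep(min = 1).!).map(_.toInt)
val raw: P[String] = num.!   // the capture overrides the Int result with the raw text
// num.parse("42")  => Success(42, 2)
// raw.parse("42")  => Success("42", 2)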

FastParse - out of memory error

I'm trying to use the FastParse library to create a parser for a very primitive templating system like this:
Hello, your name is {{name}} and today is {{date}}.
So far I have:
scala> import fastparse.all._
import fastparse.all._
scala> val FieldStart = "{{"
FieldStart: String = {{
scala> val FieldEnd = "}}"
FieldEnd: String = }}
scala> val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep.! ~ FieldEnd)
Field: fastparse.all.Parser[String] = Field
scala> val Static = P((!FieldStart ~ !FieldEnd ~ AnyChar).rep.!)
Static: fastparse.all.Parser[String] = Static
scala> val Template = P(Start ~ (Field | Static) ~ End)
Template: fastparse.all.Parser[String] = Template
scala> Template parse "{{foo}}"
res0: fastparse.core.Parsed[String,Char,String] = Success(foo,7)
scala> Template parse "foo"
res1: fastparse.core.Parsed[String,Char,String] = Success(foo,3)
scala> Template parse "{{foo"
res2: fastparse.core.Parsed[String,Char,String] = Failure(End:1:1 ..."{{foo")
But when I try what I think should be the correct final form:
scala> val Template = P(Start ~ (Field | Static).rep ~ End)
Template: fastparse.all.Parser[Seq[String]] = Template
I get:
scala> Template parse "{{foo}}"
java.lang.OutOfMemoryError: Java heap space
at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:103)
at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:84)
at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:48)
at fastparse.core.Implicits$LowPriRepeater$GenericRepeater.accumulate(Implicits.scala:47)
at fastparse.core.Implicits$LowPriRepeater$GenericRepeater.accumulate(Implicits.scala:44)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:462)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:489)
at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:297)
at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:319)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:160)
at fastparse.core.Parser.parseInput(Parsing.scala:374)
at fastparse.core.Parser.parse(Parsing.scala:358)
... 19 elided
What am I doing wrong?
Try like this:
val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep(min=1).! ~ FieldEnd)
val Static = P((!(FieldStart | FieldEnd) ~ AnyChar).rep(min=1).!)
val Template = P(Start ~ (Field | Static) ~ End)
You should be careful with .rep: it literally means zero or more, so a parser like the original Static can succeed without consuming anything, and repeating it in (Field | Static).rep then loops forever while accumulating empty matches, which is where the OutOfMemoryError comes from. Using rep(min=1) forces each iteration to consume at least one character.
Also, in the Static parser, the negative lookahead should look like !(FieldStart | FieldEnd), I think, because you don't want either open braces or close braces.
Hope it helps! ;)
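For completeness (my addition, not part of the answer above): once both inner parsers require at least one character, the repeated form from the question no longer loops, because every iteration of the outer .rep must consume input.
val Template = P(Start ~ (Field | Static).rep ~ End)
// Template parse "Hello, your name is {{name}} and today is {{date}}."
// now yields a Success with Seq("Hello, your name is ", "name", " and today is ", "date", ".")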

Scala parser combinator for Logo list?

I am trying to make a token-based Scala parser for UCB Logo. The problem I am facing is that in UCB Logo the values in a list can only be delimited by one of ']', '[', ' '. If any other kind of delimiter appears, the content of the list should be treated as a word.
In short, how can I make a token parser that handles the following:
[ 4 3 2 ] - should be a list
[ [ 4 3 2 ] ] - should be a list within a list
[ 1 + 2 ] - should be a word inside a list
[ [ 1 2 3 ] + ] - should be a word inside a list
The following
'[' ~ rep(chrExcept('[', ']')) ~ ']'
produces these tokens:
Tokens: List([, [1 2 3], +, ])
from [ [ 1 2 3 ] + ]. I believe it should instead produce the tokens:
List([, [1 2 3] +, ]), i.e. merge the + sign with the token [1 2 3].
This is the current code of the Lexical I am using:
package lexical
import scala.language.postfixOps
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.input.CharSequenceReader._
/**
 * Created by Marin on 28/03/16.
 */
class MyLexical extends Lexical with MyTokens {
  def token: Parser[Token] = (
    //procDef ^^ { case first ~ chars => processNewProcedure(chars mkString "") }
    word2 ^^ { case rest => {
        /*val s = if (second.isEmpty) "" else second mkString ""
        val t = if(third.isEmpty) "" else third mkString ""
        val f = if(fourth.isEmpty) "" else fourth mkString ""
        StringLit(s"$first$s$t$f$rest")*/
        println(rest)
        StringLit("Smth")
      }
    }
    | formalChar ~ rep(identChar | digit) ^^ { case first ~ rest => Formal(first :: rest mkString "") }
    | identChar ~ rep(identChar | digit) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
    | procDigit ^^ { case first ~ second ~ rest => NumericLit((first mkString "") :: second.getOrElse("") :: rest mkString "") }
    | '\"' ~ rep(chrExcept('\"', EofCh)) ~ ' ' ^^ { case '\"' ~ chars ~ ' ' => StringLit(chars mkString "") }
    | EofCh ^^^ EOF
    | delim
    | failure("Illegal character")
  )
  def processNewProcedure(chars: String) =
    if(reserved.contains(chars)) throw new RuntimeException
    else {
      Identifier(chars)
    }
  def procDef = toSeq ~> identChar ~ rep(identChar | elem('_')) <~ formalChar.* <~ endSeq
  def toSeq = 't' ~ 'o' ^^^ "to"
  def endSeq = 'e' ~ 'n' ~ 'd' ^^^ "end"
  def processIdent(name: String) = {
    if (reserved contains name) {
      Keyword(name)
    } else {
      Identifier(name)
    }
  }
  def word = {
    '[' ~ ((whitespaceChar | digit)*) ~ (_delim | identChar) ~ rep(whitespaceChar | digit) ~ ']'
  }
  def word2 = {
    //'[' ~> rep(whitespaceChar | digit) ~> rep(_delim | identChar) <~ rep(whitespaceChar | digit) <~ ']'
    //'[' ~ rep(chrExcept('[', ']')) ~ ']'
    rep1('[') ~ rep1(chrExcept('[', ']') | digit) ~ rep(_delim) ~ rep1(']')
    //rep1('[') ~ identChar ~ rep(']') ~ rep('+') ~ rep1(']')
    //'[' ~ (_delim | chrExcept('[', ']')) ~ ']'
  }
  def word3 = {
    '[' ~> rep(digit | letter | _delim) <~ ']'
  }
  def procDigit = digit.+ ~ '.'.? ~ digit.*
  def identChar = letter | elem('_')
  def formalChar = ':' ~ identChar
  override def whitespace: Parser[Any] = rep[Any] (
    whitespaceChar
    | ';' ~ comment
  )
  def comment: Parser[Any] = rep(chrExcept(EofCh, ';')) ^^ { case _ => ' ' }
  /****** Pure copy-paste ******/
  /** The set of reserved identifiers: these will be returned as `Keyword`s. */
  val reserved = new scala.collection.mutable.HashSet[String]
  /** The set of delimiters (ordering does not matter). */
  val delimiters = new scala.collection.mutable.HashSet[String]
  private lazy val _delim: Parser[Token] = {
    // construct parser for delimiters by |'ing together the parsers for the individual delimiters,
    // starting with the longest one -- otherwise a delimiter D will never be matched if there is
    // another delimiter that is a prefix of D
    def parseDelim(s: String): Parser[Token] = accept(s.toList) ^^ { x => Keyword(s) }
    val d = new Array[String](delimiters.size)
    delimiters.copyToArray(d, 0)
    scala.util.Sorting.quickSort(d)
    (d.toList map parseDelim).foldRight(failure("no matching delimiter"): Parser[Token])((x, y) => y | x)
  }
  protected def delim: Parser[Token] = _delim
}

Stack overflow when using parser combinators

import scala.util.parsing.combinator._
object ExprParser extends JavaTokenParsers {
  lazy val name: Parser[_] = "a" ~ rep("a" | "1") | function_call
  lazy val function_call = name ~ "(" ~> name <~ ")"
}
recurs indefinitely for function_call.parseAll("aaa(1)"). Obviously, this is because 1 cannot enter the name, and name enters the function_call, which tries the name, which enters the function_call again. How do you resolve such situations?
There was a solution that reduces name to a simple identifier:
def name = rep1("a" | "1")
def function_call = name ~ "(" ~ (function_call | name) ~ ")"
but I prefer not to do this because name ::= identifier | function_call is how it is BNF-ed in the VHDL specification, and function_call is probably shared elsewhere. The left-recursion elimination found here is undesirable for the same reason:
def name: Parser[_] = "a" ~ rep("a" | "1") ~ pared_name
def pared_name: Parser[_] = "(" ~> name <~ ")" | ""
BTW, I also wonder: if I fix the error, will name.parseAll consume only "aaa", as the first alternative in the name rule, or take the whole "aaa(1)"? How can I make name consume the whole aaa(1) rather than only aaa? I guess I should put function_call as the first alternative in name, but it would stack overflow even more eagerly in that case?
An easy solution is to use the packrat parser:
object ExprParser extends JavaTokenParsers with PackratParsers {
  lazy val name: PackratParser[_] = "a" ~ rep("a" | "1") | function_call
  lazy val function_call: PackratParser[_] = name ~ "(" ~> name <~ ")"
}
Output:
scala> ExprParser.parseAll(ExprParser.function_call, "aaa(1)")
res0: ExprParser.ParseResult[Any] =
[1.5] failure: Base Failure
aaa(1)
^
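Note that packrat parsing only removes the infinite recursion here: the parse of "aaa(1)" still fails at position 5 because name, as written, has to start with "a", so the argument "1" is not accepted by either alternative. As a quick check (my own example, not from the answer), an argument that does start with "a" goes through:
ExprParser.parseAll(ExprParser.function_call, "aaa(a1)")
// succeeds, because "a1" matches the first alternative of name ("a" ~ rep("a" | "1"))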