I'm trying to use the FastParse library to create a parser for a very primitive templating system like this:
Hello, your name is {{name}} and today is {{date}}.
So far I have:
scala> import fastparse.all._
import fastparse.all._
scala> val FieldStart = "{{"
FieldStart: String = {{
scala> val FieldEnd = "}}"
FieldEnd: String = }}
scala> val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep.! ~ FieldEnd)
Field: fastparse.all.Parser[String] = Field
scala> val Static = P((!FieldStart ~ !FieldEnd ~ AnyChar).rep.!)
Static: fastparse.all.Parser[String] = Static
scala> val Template = P(Start ~ (Field | Static) ~ End)
Template: fastparse.all.Parser[String] = Template
scala> Template parse "{{foo}}"
res0: fastparse.core.Parsed[String,Char,String] = Success(foo,7)
scala> Template parse "foo"
res1: fastparse.core.Parsed[String,Char,String] = Success(foo,3)
scala> Template parse "{{foo"
res2: fastparse.core.Parsed[String,Char,String] = Failure(End:1:1 ..."{{foo")
But when I try what I think should be the correct final form:
scala> val Template = P(Start ~ (Field | Static).rep ~ End)
Template: fastparse.all.Parser[Seq[String]] = Template
I get:
scala> Template parse "{{foo}}"
java.lang.OutOfMemoryError: Java heap space
at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:103)
at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:84)
at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:48)
at fastparse.core.Implicits$LowPriRepeater$GenericRepeater.accumulate(Implicits.scala:47)
at fastparse.core.Implicits$LowPriRepeater$GenericRepeater.accumulate(Implicits.scala:44)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:462)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:489)
at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:297)
at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:319)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:160)
at fastparse.core.Parser.parseInput(Parsing.scala:374)
at fastparse.core.Parser.parse(Parsing.scala:358)
... 19 elided
What am I doing wrong?
Try like this:
val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep(min=1).! ~ FieldEnd)
val Static = P((!(FieldStart | FieldEnd) ~ AnyChar).rep(min=1).!)
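val Template = P(Start ~ (Field | Static).rep ~ End)
Be careful with .rep: it literally means zero or more. Your Static parser can succeed without consuming any input, so the outer .rep keeps matching the empty string at the same position and accumulates results until the heap runs out. Using rep(min=1) forces every iteration to consume at least one character.
Also, in the Static parser, the negative lookahead reads more clearly as !(FieldStart | FieldEnd),
I think, because static text should stop at either an opening or a closing pair of braces.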
Hope it helps! ;)
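Putting the pieces together, here is a minimal sketch of the whole template parser. It assumes FastParse 0.4.x (the fastparse.all API used in the question); the object name TemplateDemo is made up for illustration.

```scala
import fastparse.all._

// Hypothetical wrapper object; the parser definitions follow the answer above.
object TemplateDemo {
  val FieldStart = "{{"
  val FieldEnd = "}}"

  // A field: "{{", at least one character that does not start "}}", then "}}".
  val Field = P(FieldStart ~ (!FieldEnd ~ AnyChar).rep(min = 1).! ~ FieldEnd)

  // Static text: at least one character that starts neither "{{" nor "}}".
  val Static = P((!(FieldStart | FieldEnd) ~ AnyChar).rep(min = 1).!)

  // Both alternatives consume input, so the outer rep always makes progress.
  val Template = P(Start ~ (Field | Static).rep ~ End)

  def main(args: Array[String]): Unit = {
    val input = "Hello, your name is {{name}} and today is {{date}}."
    Template.parse(input) match {
      case Parsed.Success(parts, _) => parts.foreach(println)
      case Parsed.Failure(_, index, _) => println(s"parse failed at index $index")
    }
  }
}
```

Note that Field and Static both produce a plain String here, so the result does not say which parts were fields; mapping each alternative to its own case class would be one way to keep that information.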
Can someone tell me why there are two separate ways of passing the pipe (|) and the comma (,) to split? Like
sc.textFile(file).map( x => x.split(","))
for comma, and
sc.textFile(file).map( x => x.split('|'))
for pipe.
With both in double quotes, the pipe version fails while the comma version gives me the correct result.
Below is the full code which I am running:
package com.rakesh.singh
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._
object MPMovie {
def namex ( x : String) = {
val fields = x.split('|')
val id = fields(0).toInt
val name = fields(1).toString
(id , name)
}
def main(rakesh : Array[String]) = {
Logger.getLogger("yoyo").setLevel(Level.ERROR)
val conf = new SparkConf().setAppName("Movies").setMaster("local[2]")
val sc = new SparkContext(conf)
val rdd = sc.textFile("F:/Raakesh/ml-100k/movies.data")
val names = sc.textFile("F:/Raakesh/ml-100k/names.data")
val mappednames = names.map(namex)
val splited = rdd.map(x => (x.split("\t")(1).toInt,1))
//.map(x => (x,1))
val counteachmovie = splited.reduceByKey( (a ,b )=> a + b).map( x => (x._2 , x._1))
val mpm = counteachmovie.max()
println(s"the final value of mpm is $mpm")
mappednames.foreach(println)
val finalname = mappednames.lookup(mpm._2)(0)
println(s"the final value of mpm is $finalname")
}
}
and data files are
movies.data
196 101 3 881250949
186 101 3 891717742
22 103 1 878887116
244 102 2 880606923
names.data
101|Sajan
102|Mela
103|Hum
There are two different split methods:
The split(",") method is Java's String.split(regex: String). It accepts arbitrary regexes as separators, e.g.
scala> "helloABCworldCABfooBBACCAbar".split("[ABC]+")
res0: Array[String] = Array(hello, world, foo, bar)
The other split('|') comes from StringOps.split(separator: Char), and is rather like a generic Scala-collection operation. It doesn't work with regex, but it works on all StringLike collections, for example on StringBuilders:
scala> val b = new StringBuilder
b: StringBuilder =
scala> b ++= "hello|"
res2: b.type = hello|
scala> b ++= "world"
res3: b.type = hello|world
scala> b.split('|')
res4: Array[String] = Array(hello, world)
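The "|" doesn't work with the first method because, as a regex, it is an alternation between two empty patterns: it matches the empty string at every position, so the input is split between characters instead of at the pipe. To use the pipe | with the split(regex: String) version, you either have to escape it like this "\\|" or (often easier) enclose it in a "[|]" character class.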
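The difference can be checked with nothing but the standard library. This is a small sketch; the object name SplitDemo and the sample line are only for illustration, and Pattern.quote is shown as one more way to get a literal separator.

```scala
import java.util.regex.Pattern

object SplitDemo {
  val line = "101|Sajan"

  // Char overload (StringOps.split): the separator is a literal character.
  val byChar: Array[String] = line.split('|')

  // String overload (java.lang.String.split): the argument is a regex,
  // so the pipe has to be escaped or quoted to be taken literally.
  val byEscapedRegex: Array[String] = line.split("\\|")
  val byCharClass: Array[String] = line.split("[|]")
  val byQuotedRegex: Array[String] = line.split(Pattern.quote("|"))

  // Unescaped, "|" matches the empty string, so the line is cut apart
  // between characters rather than at the pipe.
  val byBarePipe: Array[String] = line.split("|")

  def main(args: Array[String]): Unit = {
    println(byChar.mkString("[", ", ", "]"))
    println(byEscapedRegex.mkString("[", ", ", "]"))
    println(byCharClass.mkString("[", ", ", "]"))
    println(byQuotedRegex.mkString("[", ", ", "]"))
    println(byBarePipe.mkString("[", ", ", "]"))
  }
}
```

The first four results all contain the two fields 101 and Sajan; only the bare "|" version breaks the line into many small pieces.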
I am trying to learn the Scala FastParse library. To that end I have written the following code:
import fastparse.noApi._
import fastparse.WhitespaceApi
object FastParsePOC {
val White = WhitespaceApi.Wrapper{
import fastparse.all._
NoTrace(" ".rep)
}
def print(input : Parsed[String]): Unit = {
input match {
case Parsed.Success(value, index) => println(s"Success: $value $index")
case f @ Parsed.Failure(error, line, col) => println(s"Error: $error $line $col ${f.extra.traced.trace}")
}
}
def main(args: Array[String]) : Unit = {
import White._
val parser = P("Foo" ~ "(" ~ AnyChar.rep(1).! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
}
}
But I get this error:
Error: ")" 21 Extra(Foo(Bar(10), Baz(20)), [traced - not evaluated]) parser:1:1 / (AnyChar | ")"):1:21 ...""
My expected output was "Bar(10), Baz(20)". It seems the parser above does not like the ending ")".
AnyChar.rep(1) is greedy and FastParse does not backtrack into it, so it also consumes the final ) of the input string. As a result, the closing ")" at the end of the parser is never reached.
If the ) symbol weren't used inside Bar and Baz, this could be solved by excluding ) from AnyChar like this:
val parser = P("Foo" ~ "(" ~ (!")" ~ AnyChar).rep(1).! ~ ")")
val input1 = "Foo(Bar(10*, Baz(20*)"
To make Bar and Baz work with the ) symbol, you could define separate parsers for each of them (again excluding the ) symbol from AnyChar). The following solution is a bit more flexible, as it allows more occurrences of Bar and Baz, but I hope that you get the idea.
val bar = P("Bar" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val baz = P("Baz" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val parser = P("Foo" ~ "(" ~ (bar | baz).rep(sep = ",").! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
Result:
Success: Bar(10), Baz(20) 21
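If the argument names are not known in advance, the same idea can be generalised with a recursive rule for balanced parentheses. The following is only a sketch under some assumptions: FastParse 0.4.x, the plain fastparse.all API instead of the White wrapper from the question (so no whitespace is skipped between tokens), and the made-up names NestedParens, plain, group and body.

```scala
import fastparse.all._

// Hypothetical object and rule names, for illustration only.
object NestedParens {
  // A run of one or more characters that are not parentheses.
  val plain: P[Unit] = P(CharsWhile(c => c != '(' && c != ')'))

  // A balanced group: "(" ... ")", whose body may contain further groups.
  // P(...) takes its argument by name, so referring to body here is fine.
  val group: P[Unit] = P("(" ~ body ~ ")")

  // Zero or more plain runs or groups; each alternative consumes input.
  val body: P[Unit] = P((plain | group).rep)

  // Capture everything between the outer parentheses of Foo(...).
  val parser: P[String] = P("Foo" ~ "(" ~ body.! ~ ")" ~ End)

  def main(args: Array[String]): Unit = {
    parser.parse("Foo(Bar(10), Baz(20))") match {
      case Parsed.Success(value, index) => println(s"Success: $value $index")
      case Parsed.Failure(_, index, _) => println(s"Error at index $index")
    }
  }
}
```

Because body stops at the first unmatched ), the final ")" of the outer parser is still available to be matched.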
import scala.util.parsing.combinator._
object ExprParser extends JavaTokenParsers {
lazy val name: Parser[_] = "a" ~ rep("a" | "1") | function_call
lazy val function_call = name ~ "(" ~> name <~ ")"
}
This recurses indefinitely for parseAll(function_call, "aaa(1)"). Obviously, that is because 1 cannot enter name, so name enters function_call, which tries name, which enters function_call again. How do you resolve such situations?
There was a solution to reduce name to simple identifier
def name = rep1("a" | "1")
def function_call = name ~ "(" ~ (function_call | name) ~ ")"
but I prefer not to do this, because name ::= identifier | function_call is the BNF given in the VHDL specification and function_call is probably shared elsewhere. The left-recursion elimination found here is undesirable for the same reason:
def name: Parser[_] = "a" ~ rep("a" | "1") ~ pared_name
def pared_name: Parser[_] = "(" ~> name <~ ")" | ""
BTW, I also wonder: if I fix the error, will name consume only "aaa", as the first alternative of the name rule, or take the whole "aaa(1)"? How can I make name try to consume the whole aaa(1) before settling for aaa? I guess that I should put function_call first among the alternatives of name, but won't it overflow the stack even more eagerly in that case?
An easy solution is to use the packrat parser, which supports left-recursive grammars:
object ExprParser extends JavaTokenParsers with PackratParsers {
lazy val name: PackratParser[_] = "a" ~ rep("a" | "1") | function_call
lazy val function_call: PackratParser[_] = name ~ "(" ~> name <~ ")"
}
Output:
scala> ExprParser.parseAll(ExprParser.function_call, "aaa(1)")
res0: ExprParser.ParseResult[Any] =
[1.5] failure: Base Failure
aaa(1)
^
I have a Scala expression stored in a String variable:
val myExpr = "(xml \\ \"node\")"
How do I execute this?
s"${myExpr}"
Right now it only gives me the string contents.
What I'm trying to achieve is parsing user string input of the form:
"/some/node/in/xml"
and get that corresponding node in Scala:
(xml \ "node" \ "in" \ "xml")
For the REPL, my init includes:
implicit class interpoleter(val sc: StringContext) {def i(args: Any*) = $intp interpret sc.s(args: _*) }
with which
scala> val myExpr = "(xml \\ \"node\")"
myExpr: String = (xml \ "node")
scala> val xml = <x><node/></x>
xml: scala.xml.Elem = <x><node/></x>
scala> i"${myExpr}"
res3: scala.xml.NodeSeq = NodeSeq(<node/>)
res2: scala.tools.nsc.interpreter.IR.Result = Success
because isn't code really just a string, like everything else?
There is probably a more idiomatic way in recent Scala versions, but you can use Twitter's Eval for that:
import com.twitter.util.Eval
val i: Int = new Eval()("1 + 1") // => 2
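If the real goal is only to follow a slash-separated path, evaluating code may not be needed at all. As an alternative to the eval-based answers above (not what they propose), here is a minimal sketch that folds the path segments over the \ operator. It assumes the scala-xml module is on the classpath; the helper name select and the sample document are made up for illustration.

```scala
import scala.xml.{Elem, NodeSeq}

object XmlPathDemo {
  // Hypothetical helper: descend one child step per non-empty path segment,
  // i.e. "/node/in/xml" becomes root \ "node" \ "in" \ "xml".
  def select(root: NodeSeq, path: String): NodeSeq =
    path.split('/').filter(_.nonEmpty).foldLeft(root)((nodes, name) => nodes \ name)

  def main(args: Array[String]): Unit = {
    val xml: Elem = <x><node><in><xml/></in></node></x>
    println(select(xml, "/node/in/xml"))
  }
}
```

This avoids running arbitrary user input as code, at the cost of supporting only plain child steps.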
I would like to re-resolve a config object.
For example, if I define this config:
val conf = ConfigFactory.parseString(
"""
| foo = a
| bar = ${foo}1
| baz = ${foo}2
""".stripMargin).resolve()
I get these values:
conf.getString("bar") //res0: String = a1
conf.getString("baz") //res1: String = a2
Given the object conf, what I want is to be able to change the value of foo and get updated values for bar and baz.
Something like:
val conf2 = conf
.withValue("foo", ConfigValueFactory.fromAnyRef("b"))
.resolve()
and get:
conf2.getString("bar") //res0: String = b1
conf2.getString("baz") //res1: String = b2
but running this code will result in:
conf2.getString("foo") //res0: String = b
conf2.getString("bar") //res1: String = a1
conf2.getString("baz") //res2: String = a2
Is this even possible?
It's not possible once resolve is called. In the documentation for resolve, it says:
Returns a replacement config with all substitutions resolved... Resolving an already-resolved config is a harmless no-op.
In other words, once you call resolve, all the substitutions have been applied, and the config keeps no reference to the original HOCON substitution syntax.
Of course, you can keep the unresolved Config object as a variable, and then use withValue:
val rawConf = ConfigFactory.parseString(
"""
| foo = a
| bar = ${foo}1
| baz = ${foo}2
""".stripMargin)
val conf2 = rawConf.withValue("foo", ConfigValueFactory.fromAnyRef("b")).resolve
val conf = rawConf.resolve
conf.getString("bar") //a1
conf2.getString("bar") //b1, as desired