Using: scala.util.parsing.combinator
def varToken: Parser[ZVAR] = """[A-Za-z_][A-Za-z_0-9]*""".r ^^ { str => println("varToken: "+str);ZVAR(str) }
def number: Parser[ZNUMBER] = """-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d*)?""".r ^^ { str => ZNUMBER(str) }
def dot: Parser[ZDOT] = """\.""".r ^^ { _ => ZDOT() }
def ref: Parser[_] = varToken ~ refArgDot.* ^^ {r => println("ref: "+r)} // ascii token + optional repeating "dot" notations
def refArgDot = dot ~ varToken ^^ { a => println("Dot: "+a)}
val define = ref ~ number.* // top-level match
Input string:
foo.bar.baz 1 2 3
Output:
varToken: foo
varToken: bar
Dot: (ZDOT()~ZVAR(bar))
varToken: baz
Dot: (ZDOT()~ZVAR(baz))
ref: (ZVAR(foo)~List((), ()))
Matching (top-level) against define.
You can see the varTokens match as expected, as do the dots (refArgDot).
But look at the ref: I expect it to capture [foo,dot,bar,dot,baz] but it only captures foo and nothing else.
It's basically failing to capture the ref properly.
I would have expected something along the lines of:
ref: (ZVAR(foo)~List(List(ZDOT()~ZVAR(bar)), List(ZDOT()~ZVAR(baz))))
The println blocks are swallowing the return value. Should be:
def ref: Parser[_] = varToken ~ refArgDot.* ^^ {r => println("ref: "+r); r}
Related
I have one library which supports some kind of custom language. The parser is written using scala RegexParsers. Now I'm trying to rewrite our parser using fastparse library to speedup our engine.
The question is: Is it possible to parse properly params inside our pseudolanguage function?
Here is an example:
$out <= doSomething('/mypath[text() != '']', 'def f(a) {a * 2}', ',') <= $in
here is a function doSomething with 3 params:
/mypath[text() != '']
def f(a) {a * 2}
,
I'm expecting to get a tree for the function with params:
Function(
name = doSomething
params = List[String](
"/mypath[text() != '']",
"def f(a) {a * 2}",
","
)
)
What I do:
val ws = P(CharsWhileIn(" \r\n"))
def wsSep(sep: String) = P(ws.? ~ sep ~ ws.?)
val name = P(CharsIn('a' to 'z', 'A' to 'Z'))
val param = P(ws.? ~ "'" ~ CharPred(_ != '\'').rep ~ "'" ~ ws.?)
val params = P("(" ~ param.!.rep(sep = wsSep(",")) ~ ")")
val function = P(name.! ~ params.?).map(case (name, params) => Function(name, params.getOrElse(List())))
The problem here that the single quotes represent a String in my code, but inside that string sometimes we have additional single quotes like here:
/mypath[text() != '']
So, I can't use CharPred(_ != '\'') in my case
We also have a commas inside a Strings like in 3rd param
This is works somehow using scala parser but I can't parse the same using fastparse
Does anyone have ideas how to make the parser work properly?
Update
Got it!
The main magic is in val param
object X {
import fastparse.all._
case class Fn(name: String, params: Seq[String])
val ws = P(CharsWhileIn(" \r\n"))
def wsSep(sep: String) = P(ws.? ~ sep ~ ws.?)
val name = P(CharIn('a' to 'z', 'A' to 'Z').rep)
val param = P(ws.? ~ "'" ~ (!("'" ~ ws.? ~ ("," | ")")) ~ AnyChar).rep ~ "'" ~ ws.?)
val params = P("(" ~ param.!.rep(sep = wsSep(",")) ~ ")")
val function = P(name.! ~ params.?).map{case (name, params) => Fn(name, params.getOrElse(Seq()))}
}
object Test extends App {
val res = X.function.parse("myFunction('/hello[name != '']' , 'def f(a) {mytest}', ',')")
res match {
case Success(r, z) =>
println(s"fn name: ${r.name}")
println(s"params:\n {${r.params.mkString("\n")}\n}")
case Failure(e, z, m) => println(m)
}
}
out:
name: myFunction
params:
'/hello[name != '']'
'def f(a) {mytest}'
','
Using scala parser combinators I have parsed some text input and created some of my own types along the way. The result prints fine. Now I need to go through the output, which I presume is a nested structure that includes the types I created. How do I go about this?
This is how I call the parser:
GMParser1.parseItem(i_inputHard_2) match {
case GMParser1.Success(res, _) =>
println(">" + res + "< of type: " + res.getClass.getSimpleName)
case x =>
println("Could not parse the input string:" + x)
}
EDIT
What I get back from MsgResponse(O2 Flow|Off) is:
>(MsgResponse~(O2 Flow~Off))< of type: $tilde
And what I get back from WthrResponse(Id(Tube 25,Carbon Monoxide)|0.20) is:
>(WthrResponse~IdWithValue(Id(Tube 25,Carbon Monoxide),0.20))< of type: $tilde
Just to give context to the question here is some of the input parsing. I will want to get at Id:
trait Keeper
case class Id(leftContents:String,rightContents:String) extends Keeper
And here is Id being created:
def id = "Id(" ~> idContents <~ ")" ^^ { contents => Id(contents._1,contents._2) }
And here is the whole of the parser:
object GMParser1 extends RegexParsers {
override def skipWhitespace = false
def number = regex(new Regex("[-+]?(\\d*[.])?\\d+"))
def idContents = text ~ ("," ~> text)
def id = "Id(" ~> idContents <~ ")" ^^ { contents => Id(contents._1,contents._2) }
def text = """[A-Za-z0-9* ]+""".r
def wholeWord = """[A-Za-z]+""".r
def idBracketContents = id ~ ( "|" ~> number ) ^^ { contents => IdWithValue(contents._1,contents._2) }
def nonIdBracketContents = text ~ ( "|" ~> text )
def bracketContents = idBracketContents | nonIdBracketContents
def outerBrackets = "(" ~> bracketContents <~ ")"
def target = wholeWord ~ outerBrackets
def parseItem(str: String): ParseResult[Any] = parse(target, str)
trait Keeper
case class Id(leftContents:String,rightContents:String) extends Keeper
case class IdWithValue(leftContents:Id,numberContents:String) extends Keeper
}
The parser created by the ~ operator produces a value of the ~ case class. To get at its contents, you can pattern match on it like on any other case class (keeping in mind that its name is symbolic, so it's used infix).
So you can replace case GMParser1.Success(res, _) => ... with case GMParser1.Success(functionName ~ argument) => ... to get at the function name and argument (or whatever the semantics of wholeWord and bracketContents in wholeWord "(" bracketContents ")" are). You can then similarly use a nested pattern to get at the individual parts of the argument.
You could (and probably should) also use ^^ together with pattern matching in your rules to create a more meaningful AST structure that doesn't contain ~. This would be useful to distinguish a nonIdBracketContents result from a bracketContents result for example.
I am writing a parser trying to calculate the result of an expression containing float and an RDD, I have override + - / * and it works fine. In one part I am getting the famous error the "reassignment to val" but cannot figure out how to solve it.
Part of the code is as follow:
def calc: Parser[Any]=rep(term2 ~ operator) ^^ {
//match a list of term~operator
case termss =>
var stack =List[Either[RDD[(Int,Array[Float])], Float]]()
var lastop:(Either[RDD[(Int,Array[Float])], Float], Either[RDD[(Int,Array[Float])], Float]) => RDD[(Int,Array[Float])] = add
termss.foreach(t =>
t match { case nums ~ op => {
if (nums=="/path1/test3D.xml")
nums=sv.getInlineArrayRDD()
lastop = op; stack = reduce(stack ++ nums, op)}}
)
stack.reduceRight((x, y) => lastop(y, x))
}
def term2: Parser[List[Any]] = rep(factor2)
def factor2: Parser[Any] = pathIdent | num | "(" ~> calc <~ ")"
def num: Parser[Float] = floatingPointNumber ^^ (_.toFloat)
I defined pathIdent to parse paths.
Here is the error:
[error] reassignment to val:
[error] nums=sv.getInlineArrayRDD()
[error] ^
I have changed def in term2, factor2, and num to var although I knew it seems incorrect but that's the only thing came into my mind to test and it didn't work.
Where is it coming from?
In this piece of code:
case nums ~ op => {
if (nums=="/path1/test3D.xml")
nums=sv.getInlineArrayRDD()
The nums isn't reassignable because it comes from the pattern matching (see the case line). The last line (nums = ...) is trying to assign to nums when it can't.
I have a working parser, but I've just realised I do not cater for comments. In the DSL I am parsing, comments start with a ; character. If a ; is encountered, the rest of the line is ignored (not all of it however, unless the first character is ;).
I am extending RegexParsers for my parser and ignoring whitespace (the default way), so I am losing the new line characters anyway. I don't wish to modify each and every parser I have to cater for the possibility of comments either, because statements can span across multiple lines (thus each part of each statement may end with a comment). Is there any clean way to acheive this?
One thing that may influence your choice is whether comments can be found within your valid parsers. For instance let's say you have something like:
val p = "(" ~> "[a-z]*".r <~ ")"
which would parse something like ( abc ) but because of comments you could actually encounter something like:
( ; comment goes here
abc
)
Then I would recommend using a TokenParser or one of its subclass. It's more work because you have to provide a lexical parser that will do a first pass to discard the comments. But it is also more flexible if you have nested comments or if the ; can be escaped or if the ; can be inside a string literal like:
abc = "; don't ignore this" ; ignore this
On the other hand, you could also try to override the value of whitespace to be something like
override protected val whiteSpace = """(\s|;.*)+""".r
Or something along those lines.
For instance using the example from the RegexParsers scaladoc:
import scala.util.parsing.combinator.RegexParsers
object so1 {
Calculator("""(1 + ; foo
(1 + 2))
; bar""")
}
object Calculator extends RegexParsers {
override protected val whiteSpace = """(\s|;.*)+""".r
def number: Parser[Double] = """\d+(\.\d*)?""".r ^^ { _.toDouble }
def factor: Parser[Double] = number | "(" ~> expr <~ ")"
def term: Parser[Double] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case number ~ list => (number /: list) {
case (x, "*" ~ y) => x * y
case (x, "/" ~ y) => x / y
}
}
def expr: Parser[Double] = term ~ rep("+" ~ log(term)("Plus term") | "-" ~ log(term)("Minus term")) ^^ {
case number ~ list => list.foldLeft(number) { // same as before, using alternate name for /:
case (x, "+" ~ y) => x + y
case (x, "-" ~ y) => x - y
}
}
def apply(input: String): Double = parseAll(expr, input) match {
case Success(result, _) => result
case failure: NoSuccess => scala.sys.error(failure.msg)
}
}
This prints:
Plus term --> [2.9] parsed: 2.0
Plus term --> [2.10] parsed: 3.0
res0: Double = 4.0
Just filter out all the comments with a regex before you pass the code into your parser.
def removeComments(input: String): String = {
"""(?ms)\".*?\"|;.*?$|.+?""".r.findAllIn(input).map(str => if(str.startsWith(";")) "" else str).mkString
}
val code =
"""abc "def; ghij"
abc ;this is a comment
def"""
println(removeComments(code))
Let's say I want to parse a string with various opening and closing brackets (I used parentheses in the title because I believe it is more common -- the question is the same nevertheless) so that I get all the higher levels separated in a list.
Given:
[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]
I want:
List("[hello:=[notting],[hill]]", "[3.4(4.56676|5.67787)]", "[the[hill[is[high]]not]]")
The way I am doing this is by counting the opening and closing brackets and adding to the list whenever I get my counter to 0. However, I have an ugly imperative code. You may assume that the original string is well formed.
My question is: what would be a nice functional approach to this problem?
Notes: I have thought of using the for...yield construct but given the use of the counters I cannot get a simple conditional (I must have conditionals just for updating the counters as well) and I do not know how I could use this construct in this case.
Quick solution using Scala parser combinator library:
import util.parsing.combinator.RegexParsers
object Parser extends RegexParsers {
lazy val t = "[^\\[\\]\\(\\)]+".r
def paren: Parser[String] =
("(" ~ rep1(t | paren) ~ ")" |
"[" ~ rep1(t | paren) ~ "]") ^^ {
case o ~ l ~ c => (o :: l ::: c :: Nil) mkString ""
}
def all = rep(paren)
def apply(s: String) = parseAll(all, s)
}
Checking it in REPL:
scala> Parser("[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]")
res0: Parser.ParseResult[List[String]] = [1.72] parsed: List([hello:=[notting],[hill]], [3.4(4.56676|5.67787)], [the[hill[is[high]]not]])
What about:
def split(input: String): List[String] = {
def loop(pos: Int, ends: List[Int], xs: List[String]): List[String] =
if (pos >= 0)
if ((input charAt pos) == ']') loop(pos-1, pos+1 :: ends, xs)
else if ((input charAt pos) == '[')
if (ends.size == 1) loop(pos-1, Nil, input.substring(pos, ends.head) :: xs)
else loop(pos-1, ends.tail, xs)
else loop(pos-1, ends, xs)
else xs
loop(input.length-1, Nil, Nil)
}
scala> val s1 = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
s1: String = [hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]
scala> val s2 = "[f[sad][add]dir][er][p]"
s2: String = [f[sad][add]dir][er][p]
scala> split(s1) foreach println
[hello:=[notting],[hill]]
[3.4(4.56676|5.67787)]
[the[hill[is[high]]not]]
scala> split(s2) foreach println
[f[sad][add]dir]
[er]
[p]
Given your requirements counting the parenthesis seems perfectly fine. How would you do that in a functional way? You can make the state explicitly passed around.
So first we define our state which accumulates results in blocks or concatenates the next block and keeps track of the depth:
case class Parsed(blocks: Vector[String], block: String, depth: Int)
Then we write a pure function that processed that returns the next state. Hopefully, we can just carefully look at this one function and ensure it's correct.
def nextChar(parsed: Parsed, c: Char): Parsed = {
import parsed._
c match {
case '[' | '(' => parsed.copy(block = block + c,
depth = depth + 1)
case ']' | ')' if depth == 1
=> parsed.copy(blocks = blocks :+ (block + c),
block = "",
depth = depth - 1)
case ']' | ')' => parsed.copy(block = block + c,
depth = depth - 1)
case _ => parsed.copy(block = block + c)
}
}
Then we just used a foldLeft to process the data with an initial state:
val data = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
val parsed = data.foldLeft(Parsed(Vector(), "", 0))(nextChar)
parsed.blocks foreach println
Which returns:
[hello:=[notting],[hill]]
[3.4(4.56676|5.67787)]
[the[hill[is[high]]not]]
You have an ugly imperative solution, so why not make a good-looking one? :)
This is an imperative translation of huynhjl's solution, but just posting to show that sometimes imperative is concise and perhaps easier to follow.
def parse(s: String) = {
var res = Vector[String]()
var depth = 0
var block = ""
for (c <- s) {
block += c
c match {
case '[' => depth += 1
case ']' => depth -= 1
if (depth == 0) {
res :+= block
block = ""
}
case _ =>
}
}
res
}
Try this:
val s = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
s.split("]\\[").toList
returns:
List[String](
[hello:=[notting],[hill],
3.4(4.56676|5.67787),
the[hill[is[high]]not]]
)