parsing white spaces in front of `elem` method - scala

I tried to parse an input of two Ints and some elements and the end:
import scala.util.parsing.combinator.JavaTokenParsers
class X extends JavaTokenParsers {
lazy val elems = elem("wrong elem", "#WB-" contains _)
lazy val lists = repsep(rep(elems), ",")
lazy val p1 = int ~ int ~ lists
lazy val p2 = int ~ int ~ (whiteSpace ~> lists)
def go[A](p: Parser[A]) = parseAll(p, "1 2 WB#,---,BBB") match {
case NoSuccess(msg, _) => sys.error(msg)
case _ =>
}
lazy val int: Parser[Int] =
wholeNumber ^^ {
try _.toInt catch {
case e: NumberFormatException => sys.error("invalid number")
}
}
}
An example input is given in method go. The Ints and the elements at the end have to be delimited by spaces. But this works only for the Ints and not for the elements. When I type in
val x = new X
x go x.p1
I get following error:
java.lang.RuntimeException: string matching regex `\z' expected but `W' found
But when I type in
x go x.p1
I get:
java.lang.RuntimeException: string matching regex `\s+' expected but `W' found
At the end I want to have a Parser[Int ~ Int ~ List[List[Char]]]. Why does inserting white spaces in front of elem not work? And how can I get this code to work?

Just replace elems by a RegEx Parser :
import scala.util.parsing.combinator.JavaTokenParsers
class X extends JavaTokenParsers {
lazy val elems = "[#WB-]".r
lazy val lists = repsep(rep(elems), ",")
lazy val p1 = int ~ int ~ lists
def go[A](p: Parser[A]) = parseAll(p, "1 2 WB#,---,BBB") match {
case NoSuccess(msg, _) => sys.error(msg)
case _ =>
}
lazy val int: Parser[Int] =
wholeNumber ^^ {
try _.toInt catch {
case e: NumberFormatException => sys.error("invalid number")
}
}
}
i have removed p2 because is not useful now

Related

Troubles with collections type in pattern matching in Scala

I tried to do collection matching in Scala without using scala.reflect.ClassTag
case class Foo(name: String)
case class Bar(id: Int)
case class Items(items: Vector[AnyRef])
val foo = Vector(Foo("a"), Foo("b"), Foo("c"))
val bar = Vector(Bar(1), Bar(2), Bar(3))
val fc = Items(foo)
val bc = Items(bar)
we can not do this:
fc match {
case Items(x) if x.isInstanceOf[Vector[Foo]]
}
because:
Warning: non-variable type argument Foo in type scala.collection.immutable.Vector[Foo] (the underlying of Vector[Foo]) is unchecked since it is eliminated by erasure
and this:
fc match {
case Items(x: Vector[Foo]) =>
}
but we can do this:
fc match {
case Items(x#(_: Foo) +: _) => ...
case Items(x#(_: Bar) +: _) => ...
}
bc match {
case Items(x#(_: Foo) +: _) => ...
case Items(x#(_: Bar) +: _) => ...
}
As you can see, we are check - is collection Foo + vector or Bar + vector.
And here we are have some problems:
We can do Vector(Foo("1"), Bar(2)), and this is will be match with Foo.
We are still need "val result = x.asInstanceOf[Vector[Bar]]" class casting for result extraction
Is there are some more beautiful way?
Like this:
fc match {
case Items(x: Vector[Foo]) => // result is type of Vector[Foo] already
}
What you're doing here is fundamentally just kind of unpleasant, so I'm not sure making it possible to do it in a beautiful way is a good thing, but for what it's worth, Shapeless's TypeCase is a little nicer:
case class Foo(name: String)
case class Bar(id: Int)
case class Items(items: Vector[AnyRef])
val foo = Vector(Foo("a"), Foo("b"), Foo("c"))
val bar = Vector(Bar(1), Bar(2), Bar(3))
val fc = Items(foo)
val bc = Items(bar)
val FooVector = shapeless.TypeCase[Vector[Foo]]
val BarVector = shapeless.TypeCase[Vector[Bar]]
And then:
scala> fc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res0: Vector[Foo] = Vector(Foo(a), Foo(b), Foo(c))
scala> bc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res1: Vector[Foo] = Vector()
Note that while ClassTag instances can also be used in this way, they don't do what you want:
scala> val FooVector = implicitly[scala.reflect.ClassTag[Vector[Foo]]]
FooVector: scala.reflect.ClassTag[Vector[Foo]] = scala.collection.immutable.Vector
scala> fc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res2: Vector[Foo] = Vector(Foo(a), Foo(b), Foo(c))
scala> bc match {
| case Items(FooVector(items)) => items
| case _ => Vector.empty
| }
res3: Vector[Foo] = Vector(Bar(1), Bar(2), Bar(3))
…which will of course throw ClassCastExceptions if you try to use res3.
This really isn't a nice thing to do, though—inspecting types at runtime undermines parametricity, makes your code less robust, etc. Type erasure is a good thing, and the only problem with type erasure on the JVM is that it's not more complete.
If you want something that is simple using implicit conversions. then try this!
implicit def VectorConversionI(items: Items): Vector[AnyRef] = items match { case x#Items(v) => v }
Example:
val fcVertor: Vector[AnyRef] = fc // Vector(Foo(a), Foo(b), Foo(c))
val bcVertor: Vector[AnyRef] = bc // Vector(Bar(1), Bar(2), Bar(3))

Scala parser combinator with recursive structures

I'm a beginner with Scala, and now learning Scala parser combinator, writing "MiniLogicParser", a mini parser for propositional logic formula. I am successful for parsing it partly, but can not convert to case class. I tried some codes like below.
import java.io._
import scala.util.parsing.combinator._
sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](children: List[Bool[A]]) extends Bool[A]
object LogicParser extends RegexParsers {
override def skipWhitespace = true
def formula = ( TRUE | not | and | textdata )
def TRUE = "TRUE"
def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ formula) => Not(formula)}
def and : Parser[_] = LPARENTHESIS ~ formula ~ opt(CONJUNCT ~ formula) ~ RPARENTHESIS
def NEG = "!"
def CONJUNCT = "&&"
def LPARENTHESIS = '('
def RPARENTHESIS = ')'
def textdata = "[a-zA-Z0-9]+".r
def apply(input: String): Either[String, Any] = parseAll(formula, input) match {
case Success(logicData, next) => Right(logicData)
case NoSuccess(errorMessage, next) => Left(s"$errorMessage on line ${next.pos.line} on column ${next.pos.column}")
}
}
but, the compilation failed with the following error message
[error] ... MiniLogicParser.scala:15 type mismatch;
[error] found : Any
[error] required: Bool[?]
[error] def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ formula) => Not(formula)}
I can partly understand the error message; i.e., it means for line 15 where I tried to convert the result of parsing to case class, type mismatch is occurring. However, I do not understand how to fix this error.
I've adapted your parser a little bit.
import scala.util.parsing.combinator._
sealed trait Bool[+A]
case object True extends Bool[Nothing]
case class Var[A](label: A) extends Bool[A]
case class Not[A](child: Bool[A]) extends Bool[A]
case class And[A](l: Bool[A], r: Bool[A]) extends Bool[A]
object LogicParser extends RegexParsers with App {
override def skipWhitespace = true
def NEG = "!"
def CONJUNCT = "&&"
def LP = '('
def RP = ')'
def TRUE = literal("TRUE") ^^ { case _ => True }
def textdata = "[a-zA-Z0-9]+".r ^^ { case x => Var(x) }
def formula: Parser[Bool[_]] = textdata | and | not | TRUE
def not = NEG ~ formula ^^ { case n ~ f => Not(f) }
def and = LP ~> formula ~ CONJUNCT ~ formula <~ RP ^^ { case f1 ~ c ~ f2 => And(f1, f2) }
def apply(input: String): Either[String, Any] = parseAll(formula, input) match {
case Success(logicData, next) => Right(logicData)
case NoSuccess(errorMessage, next) => Left(s"$errorMessage on line ${next.pos.line} on column ${next.pos.column}")
}
println(apply("TRUE")) // Right(Var(TRUE))
println(apply("(A && B)")) // Right(And(Var(A),Var(B)))
println(apply("((A && B) && C)")) // Right(And(And(Var(A),Var(B)),Var(C)))
println(apply("!(A && !B)")) // Right(Not(And(Var(A),Not(Var(B)))))
}
The child of the Not-node is of type Bool. In line 15 however, formula, the value that you want to pass to Not's apply method, is of type Any. You can restrict the extractor (i.e., the case-statement) to only match values of formula that are of type Bool by adding the type information after a colon:
case ( "!" ~ (formula: Bool[_]))
Hence, the not method would look like this:
def not : Parser[_] = NEG ~ formula ^^ {case ( "!" ~ (formula: Bool[_])) => Not(formula)}
However, now, e.g., "!TRUE" does not match anymore, because "TRUE" is not yet of type Bool. This can be fixed by extending your parser to convert the string to a Bool, e.g.,
def TRUE = "TRUE" ^^ (_ => True)

Getting separatedBy to work in parboiled2

This seems pretty simple!
class SeparatedParser(val input: ParserInput, val delimiter: String = ",") extends Parser {
def pipedField = rule { (zeroOrMore(field).separatedBy("|")) }
def field = rule { capture(zeroOrMore(noneOf(delimiter))) }
def d = delimiter
def record = rule {
field ~ d ~ pipedField ~ d ~ field ~ EOI
}
}
I try:
val parser = new SeparatedParser("""49798,piped1|piped2,sklw""")
val parsed = parser.record.run()
parsed match {
case Success(rel) => println(rel)
case Failure(pe:ParseError) =>println(parser.formatError(pe))
}
But I get:
49798 :: Vector(piped1|piped2) :: sklw :: HNil
I would expect the Vector to have two separate elements: piped1 and piped2.
What dumbass mistake am I making?

save all parse results into one map in scala

Suppose I've got following txt file:
--quest_29540602496284069
Operator Name : Kevin
Account Id: 1444
Text: This is Kevin and this my text.
Age: 16
--quest_=29540602496284069--
I want to transform it to simple scala map:
(pseudo-code)
{
quest_id: 29540602496284069
operation_name = Kevin
account_id: 1444
text: This is Kevin and this my text.
operator_age: 16
}
So, i started to created case content class in order to store it in target object for future use:
case class MapContent(map: Map[String, String])
Then, i have created scala class with extending RegexpParsers:
class OperatorParser extends RegexParsers {
def parseFullRequest(input: String): MapContent = parseAll(parseRequest, input) match {
case Success(result, _) => result
case NoSuccess(msg, _) => throw new SomeParserException(msg)
}
// main entry
def parseRequest: Parser[MapContent] = parseQuestBody ~ parseAnotherBody
def parseQuestBody: Parser[MapContent] = parseQuestId
def parseQuestId: Parser[MapContent] = "--quest_" ~> """.+\n?""".r ^^ { case res =>
MapContent(Map("quest_id" -> res.toString))
}
def parseAnotherBody: Parser[MapContent] = """.+""".r ^^ { case res =>
MapContent(Map("another_body" -> res.toString))
}
}
When i'm doing
parseQuestBody ~ parseAnotherBody
it causes an error that Operator[MapContent, MapContent] isn't an Operator[MapContent]
I need a solution how to store exactly in MapContent during the whole parse process. Is there a possible way to do that? For now i can only store the quest_id number and not able to continue next.
You can use ^^ like in other parsers:
def parseRequest: Parser[MapContent] = parseQuestBody ~ parseAnotherBody ^^ {
case res => MapContent(res._1.map ++ res._2.map)
}
or
def parseRequest: Parser[MapContent] = parseQuestBody ~ parseAnotherBody ^^ {
case a ~ b => MapContent(a.map ++ b.map)
}
It transforms "tuple" of MapContents into single MapContent
Edit:
what if i will have a lot of text values, so in scala it will be
looking like a.map ++ b.map ++ c.map ++ d... + f.. + ... + x1 ? is
there more common way?
Folding it is one of possibilities:
def parseRequest: Parser[MapContent] = parseQuestId ~ rep(parseAnotherBody) ~ parseQuestId ^^ {
case value1 ~ list1 ~ value2=> {
val merged = List(value1, value2) ++ list1
merged.foldLeft(MapContent(Map.empty[String, String]))((a, b) => MapContent(a.map ++ b.map))
}
}
It returns map with only 2 values because you can't store in map 2 values with same another_body key

scala parser combinator infinite loop

I'm trying to write a simple parser in scala but when I add a repeated token Scala seems to get stuck in an infinite loop.
I have 2 parse methods below. One uses rep(). The non repetitive version works as expected (not what I want though) using the rep() version results in an infinite loop.
EDIT:
This was a learning example where I tired to enforce the '=' was surrounded by whitespace.
If it is helpful this is my actual test file:
a = 1
b = 2
c = 1 2 3
I was able to parse: (with the parse1 method)
K = V
but then ran into this problem when tried to expand the exercise out to:
K = V1 V2 V3
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends RegexParsers {
override def skipWhitespace(): Boolean = { false }
def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
def eq: Parser[String] = """\s+=\s+""".r ^^ { _.toString.trim }
def string: Parser[String] = """[^ \t\n]*""".r ^^ { _.toString.trim }
def value: Parser[List[String]] = rep(string)
def foo(key: String, value: String): Boolean = {
println(key + " = " + value)
true
}
def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }
def parseLine(line: String): Boolean = {
parse(parse2, line) match {
case Success(matched, _) => true
case Failure(msg, _) => false
case Error(msg, _) => false
}
}
}
object TestParser {
def usage() = {
System.out.println("<file>")
}
def main(args: Array[String]) : Unit = {
if (args.length != 1) {
usage()
} else {
val mp = new MyParser()
fromFile(args(0)).getLines().foreach { mp.parseLine }
println("done")
}
}
}
Next time, please provide some concrete examples, it's not obvious what your input is supposed to look like.
Meanwhile, you can try this, maybe you find it helpful:
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends JavaTokenParsers {
// override def skipWhitespace(): Boolean = { false }
def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
def eq: Parser[String] = "="
def string: Parser[String] = """[^ \t\n]+""".r
def value: Parser[List[String]] = rep(string)
def foo(key: String, value: String): Boolean = {
println(key + " = " + value)
true
}
def parse1: Parser[Boolean] = key ~ eq ~ string ^^ { case k ~ eq ~ string => foo(k, string) }
def parse2: Parser[Boolean] = key ~ eq ~ value ^^ { case k ~ eq ~ value => foo(k, value.toString) }
def parseLine(line: String): Boolean = {
parseAll(parse2, line) match {
case Success(matched, _) => true
case Failure(msg, _) => false
case Error(msg, _) => false
}
}
}
val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup")) {
println(mp.parseLine(line))
}
Explanation:
JavaTokenParsers and RegexParsers treat white space differently.
The JavaTokenParsers handles the white space for you, it's not specific for Java, it works for most non-esoteric languages. As long as you are not trying to parse Whitespace, JavaTokenParsers is a good starting point.
Your string definition included a *, which caused the infinite recursion.
Your eq definition included something that messed with the empty space handling (don't do this unless it's really necessary).
Furthermore, if you want to parse the whole line, you must call parseAll,
otherwise it parses only the beginning of the string in non-greedy manner.
Final remark: for parsing key-value pairs line by line, some String.split and
String.trim would be completely sufficient. Scala Parser Combinators are a little overkill for that.
PS: Hmm... Did you want to allow =-signs in your key-names? Then my version would not work here, because it does not enforce an empty space after the key-name.
This is not a duplicate, it's a different version with RegexParsers that takes care of whitespace explicitly
If you for some reason really care about the white space, then you could stick to the RegexParsers, and do the following (notice the skipWhitespace = false, explicit parser for whitespace ws, the two ws with squiglies around the equality sign, and the repsep with explicitly specified ws):
import scala.util.parsing.combinator._
import scala.io.Source.fromFile
class MyParser extends RegexParsers {
override def skipWhitespace(): Boolean = false
def ws: Parser[String] = "[ \t]+".r
def key: Parser[String] = """[a-zA-Z]+""".r ^^ { _.toString }
def eq: Parser[String] = ws ~> """=""" <~ ws
def string: Parser[String] = """[^ \t\n]+""".r
def value: Parser[List[String]] = repsep(string, ws)
def foo(key: String, value: String): Boolean = {
print(key + " = " + value)
true
}
def parse1: Parser[Boolean] = (key ~ eq ~ string) ^^ { case k ~ e ~ v => foo(k, v) }
def parse2: Parser[Boolean] = (key ~ eq ~ value) ^^ { case k ~ e ~ v => foo(k, v.toString) }
def parseLine(line: String): Boolean = {
parseAll(parse2, line) match {
case Success(matched, _) => true
case Failure(msg, _) => false
case Error(msg, _) => false
}
}
}
val mp = new MyParser()
for (line <- List("hey = hou", "hello = world ppl", "foo = bar baz blup", "foo= bar baz", "foo =bar baz")) {
println(" (Matches: " + mp.parseLine(line) + ")")
}
Now the parser rejects the lines where there is no whitespace around the equal sign:
hey = List(hou) (Matches: true)
hello = List(world, ppl) (Matches: true)
foo = List(bar, baz, blup) (Matches: true)
(Matches: false)
(Matches: false)
The bug with * instead of + in string has been removed, just like in the previous version.