Trying to understand scala ^^ syntax - scala

I'm beginner in scala and looking at this tutorial :
http://enear.github.io/2016/03/31/parser-combinators/
Event it is explained just below :
The ^^ operator acts as a map over the parse result. The regex
"[a-zA-Z_][a-zA-Z0-9_]*".r is implicitly converted to an instance of
Parser[String], on which we map a function (String => IDENTIFIER),
thus returning a instance of Parser[IDENTIFIER].
I dont understand this code snippet :
def identifier: Parser[IDENTIFIER] = {
"[a-zA-Z_][a-zA-Z0-9_]*".r ^^ { str => IDENTIFIER(str) }
}
Can someone explain the ^^ operator and how it is mapped to a block code ?
Thanks

It defines the operation that needs to be performed when the left-hand side expression is evaluated.
For instance, we have some code like this:
def symbol: Parser[Any] = "+" | "-" | "*"
def number: Parser[Int] = """(0|[1-9]\d*)""".r ^^ {string => string.toInt }
def expression = {
number ~ symbol ~ number ^^ { case firstOperand ~ operator ~ secondOperand =>
firstOperand + secondOperand
}
}
So, here in the number we will convert String to Int. This means that when we will use our parser like this:
parse(expression, "3 + 2")
It will return us 5, if we will not convert it and leave it as Strings we will get "32".

Related

SCALA: How to convert a Parser Combinator result to Scala List[String]?

I am trying to write a parser for a language very similar to Milner's CCS. Basically what I am parsing so far are expressions of the following sort:
a.b.a.1
a.0
An expression must start with a letter (excluding t) and could have any number of letters following the first letter (separated by a '.'). The Expression must terminate with a digit (for simplicity I chose digits between 0 and 2 for now). I want to use Parser Combinators for Scala, however this is the first time that I am working with them. This is what I have so far:
import scala.util.parsing.combinator._
class SimpleParser extends RegexParsers {
def alpha: Parser[String] = """[^t]{1}""".r ^^ { _.toString }
def digit: Parser[Int] = """[0-2]{1}""".r ^^ { _.toInt }
def expr: Parser[Any] = alpha ~ "." ~ digit ^^ {
case al ~ "." ~ di => List(al, di)
}
def simpleExpression: Parser[Any] = alpha ~ "." ~ rep(alpha ~ ".") ~ digit //^^ { }
}
As you can see in def expr :Parser[Any] I am trying to return the result as a list, since Lists in Scala are very easy to work with (in my opinion). Is this the correct way how to convert a Parser[Any] result to a List? Can anyone give me any tips on how I can do this for def simpleExpression:Parser[Any].
The main reason why I want to use Lists is because after parsing and Expression I want to be able to consume it. For example, given the expression a.b.1, if I am given an 'a', I would like to consume the expression to end up with a new expression: b.1 (i.e. a.b.1 ->(a)-> b.1). The idea behind this is to simulate finite state automatas. Any tips on how I may improve my implementation are appreciated.
To keeps things type safe, I recommend a parser that produces a tuple of a list of strings and an int. That is, the input a.b.a.1 would get parsed as (List("a", "b", "a"), 1). Note also that the regex for alpha was modified to exclude anything that is not a lowercase letter (in addition to t).
class SimpleParser extends RegexParsers {
def alpha: Parser[String] = """[a-su-z]{1}""".r ^^ { _.toString }
def digit: Parser[Int] = """[0-2]{1}""".r ^^ { _.toInt }
def repAlpha: Parser[List[String]] = rep1sep(alpha, ".")
def expr: Parser[(List[String], Int)] = repAlpha ~ "." ~ digit ^^ {
case alphas ~ _ ~ num =>
(alphas, num)
}
}
With an instance of this SimpleParser, here's the output I got:
println(parser.parse(parser.expr, "a.b.a.1"))
// [1.8] parsed: (List(a, b, a),1)
println(parser.parse(parser.expr, "a.0"))
// [1.4] parsed: (List(a),0)

Ignoring prefixes in a JavaToken combinator parser

I'm trying to use a JavaToken combinator parser to pull out a particular match that's in the middle of larger string (ie ignore a random set of prefix chars). However I can't get it working and think I'm getting caught out by a greedy parser and/or CRs LFs. (the prefix chars can be basically anything). I have:
class RuleHandler extends JavaTokenParsers {
def allowedPrefixChars = """[a-zA-Z0-9=*+-/<>!\_(){}~\\s]*""".r
def findX: Parser[Double] = allowedPrefixChars ~ "(x=" ~> floatingPointNumber <~ ")" ^^ { case num => num.toDouble}
}
and then in my test case ..
"when looking for the X value" in {
"must find and correctly interpret X" in {
val testString =
"""
|Looking (only)
|for (x=45) within
|this string
""".stripMargin
val answer = ruleHandler.parse(ruleHandler.findX, testString)
System.out.println(" X value is : " + answer.toString)
}
}
I think it's similar to this SO question. Can anyone see whats wrong pls ? Tks.
First, you should not escape "\\s" twice inside """ """:
def allowedPrefixChars = """[a-zA-Z0-9=*+-/<>!\_(){}~\s]*?""".r
In your case it was interpreted separately "\" or "s" (s as symbol, not \s)
Second, your allowedPrefixChars parser includes (, x, =, so it captures the whole string, including (x=, nothing is left to subsequent parsers.
The solution is to be more concrete about prefix you want:
object ruleHandler extends JavaTokenParsers {
def allowedPrefixChar: Parser[String] = """[a-zA-Z0-9=*+-/<>!\_){}~\s]""".r //no "(" here
def findX: Parser[Double] = rep(allowedPrefixChar | "\\((?!x=)".r ) ~ "(x=" ~> floatingPointNumber <~ ")" ^^ { case num => num.toDouble}
}
ruleHandler.parse(ruleHandler.findX, testString)
res14: ruleHandler.ParseResult[Double] = [3.11] parsed: 45.0
I've told the parser to ignore (, that has x= going after (it's just negative lookahead).
Alternative:
"""\(x=(.*?)\)""".r.findAllMatchIn(testString).map(_.group(1).toDouble).toList
res22: List[Double] = List(45.0)
If you want to use parsers correctly, I would recommend you to describe the whole BNF grammar (with all possible (,) and = usages) - not just fragment. For example, include (only) into your parser if it's keyword, "(" ~> valueName <~ "=" ~ value to get value. Don't forget that scala-parser is intended to return you AST, not just some matched value. Pure regexps are better for regular matching from unstructured data.
Example how it would like to use parsers in correct way (didn't try to compile):
trait Command
case class Rule(name: String, value: Double) extends Command
case class Directive(name: String) extends Command
class RuleHandler extends JavaTokenParsers { //why `JavaTokenParsers` (not `RegexParsers`) if you don't use tokens from Java Language Specification ?
def string = """[a-zA-Z0-9*+-/<>!\_{}~\s]*""".r //it's still wrong you should use some predefined Java-like literals from **JavaToken**Parsers
def rule = "(" ~> string <~ "=" ~> string <~ ")" ^^ { case name ~ num => Rule(name, num.toDouble} }
def directive = "(" ~> string <~ ")" ^^ { case name => Directive(name) }
def commands: Parser[Command] = repsep(rule | directive, string)
}
If you need to process natural language (Chomsky type-0), scalanlp or something similar fits better.

How to ignore single line comments in a parser-combinator

I have a working parser, but I've just realised I do not cater for comments. In the DSL I am parsing, comments start with a ; character. If a ; is encountered, the rest of the line is ignored (not all of it however, unless the first character is ;).
I am extending RegexParsers for my parser and ignoring whitespace (the default way), so I am losing the new line characters anyway. I don't wish to modify each and every parser I have to cater for the possibility of comments either, because statements can span across multiple lines (thus each part of each statement may end with a comment). Is there any clean way to acheive this?
One thing that may influence your choice is whether comments can be found within your valid parsers. For instance let's say you have something like:
val p = "(" ~> "[a-z]*".r <~ ")"
which would parse something like ( abc ) but because of comments you could actually encounter something like:
( ; comment goes here
abc
)
Then I would recommend using a TokenParser or one of its subclass. It's more work because you have to provide a lexical parser that will do a first pass to discard the comments. But it is also more flexible if you have nested comments or if the ; can be escaped or if the ; can be inside a string literal like:
abc = "; don't ignore this" ; ignore this
On the other hand, you could also try to override the value of whitespace to be something like
override protected val whiteSpace = """(\s|;.*)+""".r
Or something along those lines.
For instance using the example from the RegexParsers scaladoc:
import scala.util.parsing.combinator.RegexParsers
object so1 {
Calculator("""(1 + ; foo
(1 + 2))
; bar""")
}
object Calculator extends RegexParsers {
override protected val whiteSpace = """(\s|;.*)+""".r
def number: Parser[Double] = """\d+(\.\d*)?""".r ^^ { _.toDouble }
def factor: Parser[Double] = number | "(" ~> expr <~ ")"
def term: Parser[Double] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case number ~ list => (number /: list) {
case (x, "*" ~ y) => x * y
case (x, "/" ~ y) => x / y
}
}
def expr: Parser[Double] = term ~ rep("+" ~ log(term)("Plus term") | "-" ~ log(term)("Minus term")) ^^ {
case number ~ list => list.foldLeft(number) { // same as before, using alternate name for /:
case (x, "+" ~ y) => x + y
case (x, "-" ~ y) => x - y
}
}
def apply(input: String): Double = parseAll(expr, input) match {
case Success(result, _) => result
case failure: NoSuccess => scala.sys.error(failure.msg)
}
}
This prints:
Plus term --> [2.9] parsed: 2.0
Plus term --> [2.10] parsed: 3.0
res0: Double = 4.0
Just filter out all the comments with a regex before you pass the code into your parser.
def removeComments(input: String): String = {
"""(?ms)\".*?\"|;.*?$|.+?""".r.findAllIn(input).map(str => if(str.startsWith(";")) "" else str).mkString
}
val code =
"""abc "def; ghij"
abc ;this is a comment
def"""
println(removeComments(code))

equivalent of pythons repr() in scala

Is it there an equivalent of Pythons repr function in scala?
Ie a function which you can give any scala object an it will produce a string representation of the object which is valid scala code.
eg:
val l = List(Map(1 -> "a"))
print(repr(l))
Would produce
List(Map(1 -> "a"))
There is mostly only the toString method on every object. (Inherited from Java.) This may or may not result in a parseable representation. In most generic cases it probably won’t; there is no real convention for this as there is in Python but some of the collection classes at least try to. (As long as they are not infinite.)
The point where it breaks down is of course already reached when Strings are involved
"some string".toString == "some string"
however, for a proper representation, one would need
repr("some string") == "\"some string\""
As far as I know there is no such thing in Scala. Some of the serialisation libraries might be of some help for this, though.
Based on the logic at Java equivalent of Python repr()?, I wrote this little function:
object Util {
def repr(s: String): String = {
if (s == null) "null"
else s.toList.map {
case '\0' => "\\0"
case '\t' => "\\t"
case '\n' => "\\n"
case '\r' => "\\r"
case '\"' => "\\\""
case '\\' => "\\\\"
case ch if (' ' <= ch && ch <= '\u007e') => ch.toString
case ch => {
val hex = Integer.toHexString(ch.toInt)
"\\u%s%s".format("0" * (4 - hex.length), hex)
}
}.mkString("\"", "", "\"")
}
}
I've tried it with a few values and it seems to work, though I'm pretty sure sticking in a Unicode character above U+FFFF would cause problems.
If you deal with case classes, you can mix in the following trait StringMaker, so that calling toString on such case classes will work even if their arguments are strings:
trait StringMaker {
override def toString = {
this.getClass.getName + "(" +
this.getClass.getDeclaredFields.map{
field =>
field.setAccessible(true)
val name = field.getName
val value = field.get(this)
value match {
case s: String => "\"" + value + "\"" //Or Util.repr(value) see the other answer
case _ => value.toString
}
}
.reduceLeft{_+", "+_} +
")"
}
}
trait Expression
case class EString(value: String, i: Int) extends Expression with StringMaker
case class EStringBad(value: String, i: Int) extends Expression //w/o StringMaker
val c_good = EString("641", 151)
val c_bad = EStringBad("641", 151)
will result in:
c_good: EString = EString("641", 151)
c_bad: EStringBad = EStringBad(641,151)
So you can parse back the firsst expression, but not the first one.
No, there is no such feature in Scala.

Grammars, Scala Parsing Combinators and Orderless Sets

I'm writing an application that will take in various "command" strings. I've been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These tokens are an orderless set, and so they can appear in any order, and some might not appear".
With my current knowledge of grammars I would have to define all combinations of sequences as such (pseudo grammar):
command = action~content
action = alphanum
content = (tokenA~tokenB~tokenC | tokenB~tokenC~tokenA | tokenC~tokenB~tokenA ....... )
So my question is, considering tokenA-C are unique, is there a shorter way to define a set of any order using a grammar?
You can use the "Parser.^?" operator to check a group of parse elements for duplicates.
def tokens = tokenA | tokenB | tokenC
def uniqueTokens = (tokens*) ^? (
{ case t if (t == t.removeDuplicates) => t },
{ "duplicate tokens found: " + _ })
Here is an example that allows you to enter any of the four stooges in any order, but fails to parse if a duplicate is encountered:
package blevins.example
import scala.util.parsing.combinator._
case class Stooge(name: String)
object StoogesParser extends RegexParsers {
def moe = "Moe".r
def larry = "Larry".r
def curly = "Curly".r
def shemp = "Shemp".r
def stooge = ( moe | larry | curly | shemp ) ^^ { case s => Stooge(s) }
def certifiedStooge = stooge | """\w+""".r ^? (
{ case s: Stooge => s },
{ "not a stooge: " + _ })
def stooges = (certifiedStooge*) ^? (
{ case x if (x == x.removeDuplicates) => x.toSet },
{ "duplicate stooge in: " + _ })
def parse(s: String): String = {
parseAll(stooges, new scala.util.parsing.input.CharSequenceReader(s)) match {
case Success(r,_) => r.mkString(" ")
case Failure(r,_) => "failure: " + r
case Error(r,_) => "error: " + r
}
}
}
And some example usage:
package blevins.example
object App extends Application {
def printParse(s: String): Unit = println(StoogesParser.parse(s))
printParse("Moe Shemp Larry")
printParse("Moe Shemp Shemp")
printParse("Curly Beyonce")
/* Output:
Stooge(Moe) Stooge(Shemp) Stooge(Larry)
failure: duplicate stooge in: List(Stooge(Moe), Stooge(Shemp), Stooge(Shemp))
failure: not a stooge: Beyonce
*/
}
There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any other, but must appear once, and only once.
OTOH, you could write a combinator, if this pattern happens often:
def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a
I would not try to enforce this requirement syntactically. I'd write a production that admits multiple tokens from the set allowed and then use a non-parsing approach to ascertaining the acceptability of the keywords actually given. In addition to allowing a simpler grammar, it will allow you to more easily continue parsing after emitting a diagnostic about the erroneous usage.
Randall Schulz
I don't know what kind of constructs you want to support, but I gather you should be specifying a more specific grammar. From your comment to another answer:
todo message:link Todo class to database
I guess you don't want to accept something like
todo message:database Todo to link class
So you probably want to define some message-level keywords like "link" and "to"...
def token = alphanum~':'~ "link" ~ alphanum ~ "class" ~ "to" ~ alphanum
^^ { (a:String,b:String,c:String) => /* a == "message", b="Todo", c="database" */ }
I guess you would have to define your grammar at that level.
You could of course write a combination rule that does this for you if you encounter this situation frequently.
On the other hand, maybe the option exists to make "tokenA..C" just "token" and then differentiate inside the handler of "token"