Equivalent of Python's repr() in Scala - scala

Is there an equivalent of Python's repr function in Scala?
I.e. a function to which you can give any Scala object and which will produce a string representation of the object that is valid Scala code.
E.g.:
val l = List(Map(1 -> "a"))
print(repr(l))
Would produce
List(Map(1 -> "a"))

There is mostly only the toString method on every object. (Inherited from Java.) This may or may not result in a parseable representation. In most generic cases it probably won’t; there is no real convention for this as there is in Python but some of the collection classes at least try to. (As long as they are not infinite.)
The point where it breaks down is of course already reached when Strings are involved:
"some string".toString == "some string"
however, for a proper representation, one would need
repr("some string") == "\"some string\""
As far as I know there is no such thing in Scala. Some of the serialisation libraries might be of some help for this, though.

Based on the logic at Java equivalent of Python repr()?, I wrote this little function:
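object Util {
  def repr(s: String): String = {
    if (s == null) "null"
    else s.toList.map {
      // NUL is written as a Unicode escape; the octal form '\0' is rejected by recent Scala compilers.
      case '\u0000' => "\\u0000"
      case '\t' => "\\t"
      case '\n' => "\\n"
      case '\r' => "\\r"
      case '\"' => "\\\""
      case '\\' => "\\\\"
      // Printable ASCII is copied through unchanged.
      case ch if (' ' <= ch && ch <= '\u007e') => ch.toString
      // Everything else becomes a zero-padded four-digit Unicode escape.
      case ch => {
        val hex = Integer.toHexString(ch.toInt)
        "\\u%s%s".format("0" * (4 - hex.length), hex)
      }
    }.mkString("\"", "", "\"")
  }
}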
I've tried it with a few values and it seems to work, though I'm pretty sure sticking in a Unicode character above U+FFFF would cause problems.
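A minimal sketch of how the escaping idea above could be pushed toward the List(Map(1 -> "a")) example from the question. The names ReprSketch and escape are hypothetical, and only String, List and Map are handled; everything else falls back to toString, so this is an illustration under those assumptions rather than a general repr.

```scala
object ReprSketch {
  // Escapes only the most common special characters; a fuller version
  // would also cover control and non-ASCII characters.
  private def escape(s: String): String = s.toList.map {
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case '\n' => "\\n"
    case '\t' => "\\t"
    case '\r' => "\\r"
    case ch   => ch.toString
  }.mkString

  // Recurses into List and Map so that nested strings get quoted too.
  def repr(x: Any): String = x match {
    case null         => "null"
    case s: String    => "\"" + escape(s) + "\""
    case l: List[_]   => l.map(repr).mkString("List(", ", ", ")")
    case m: Map[_, _] =>
      m.toList.map { case (k, v) => repr(k) + " -> " + repr(v) }.mkString("Map(", ", ", ")")
    case other        => other.toString
  }
}
```

Any type not listed explicitly (case classes, Option, arrays, ...) would need its own case before the output could be pasted back as Scala code.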

If you deal with case classes, you can mix in the following trait StringMaker, so that calling toString on such case classes will work even if their arguments are strings:
trait StringMaker {
  override def toString = {
    this.getClass.getName + "(" +
    this.getClass.getDeclaredFields.map{
      field =>
        field.setAccessible(true)
        val name = field.getName
        val value = field.get(this)
        value match {
          case s: String => "\"" + value + "\"" //Or Util.repr(value) see the other answer
          case _ => value.toString
        }
    }
    .reduceLeft{_+", "+_} +
    ")"
  }
}
trait Expression
case class EString(value: String, i: Int) extends Expression with StringMaker
case class EStringBad(value: String, i: Int) extends Expression //w/o StringMaker
val c_good = EString("641", 151)
val c_bad = EStringBad("641", 151)
will result in:
c_good: EString = EString("641", 151)
c_bad: EStringBad = EStringBad(641,151)
So you can parse back the first expression, but not the second one.

No, there is no such feature in Scala.

Related

dynamic string interpolation

I would like to pretty-print a Product, such as a case class, so I create the following trait:
trait X extends Product {
  def fmtStrs =
    productIterator map {
      case _ : Double => "%8.2f"
      case _ => "%4s"
    } map (_ + separator) toSeq
  override def toString = {
    new StringContext("" +: fmtStrs : _*) f (productIterator.toSeq : _*)
  }
}
This uses string interpolation as described in the ScalaDoc for StringContext.
But this won't compile, with this cryptic error:
Error:(69, 70) too many arguments for interpolated string
new StringContext("" +: fmtStrs : _*) f (productIterator.toSeq : _*)
Is this a bug, or limitation of a macro? Note that doing the following works fine, so I suspect this may be related to the variable argument list:
scala> val str2 = StringContext("","%4s,","%8.2f").f(1,23.4)
str2: String = " 1, 23.40"
The reason f is a macro is so that it can give you an error when types of format specifiers and arguments don't match, and this isn't possible to check by looking at ("" +: fmtStrs : _*) and (productIterator.toSeq : _*), so it isn't particularly surprising this doesn't work. The error message could be clearer, so let's see what exactly happens.
If you look at the implementation of f (it took me some time to actually find it, I finally did by searching for the error message), you'll see
c.macroApplication match {
  //case q"$_(..$parts).f(..$args)" =>
  case Applied(Select(Apply(_, parts), _), _, argss) =>
    val args = argss.flatten
    def badlyInvoked = (parts.length != args.length + 1) && truly {
      def because(s: String) = s"too $s arguments for interpolated string"
      val (p, msg) =
        if (parts.length == 0) (c.prefix.tree.pos, "there are no parts")
        else if (args.length + 1 < parts.length)
          (if (args.isEmpty) c.enclosingPosition else args.last.pos, because("few"))
        else (args(parts.length-1).pos, because("many"))
      c.abort(p, msg)
    }
    if (badlyInvoked) c.macroApplication else interpolated(parts, args)
With your call you have a single tree in both parts and argss, and parts.length != args.length + 1 is true, so badlyInvoked is true.
s doesn't care what its arguments look like, so it's just a method and your scenario works.
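Since the format string is only known at run time here, one way around the macro is to skip the f interpolator and build an ordinary format string instead. The sketch below assumes a "," separator (the question's separator is not shown) and uses the hypothetical names Formatted and Row; it gives up the compile-time checking that f provides.

```scala
import java.util.Locale

trait Formatted extends Product {
  // Assumed separator; the original question does not show its definition.
  def separator: String = ","

  // Builds one format specifier per field, chosen by the field's runtime type.
  private def formatString: String =
    productIterator.map {
      case _: Double => "%8.2f"
      case _         => "%4s"
    }.mkString(separator)

  // formatLocal with Locale.ROOT keeps the decimal separator a dot.
  override def toString: String =
    formatString.formatLocal(Locale.ROOT, productIterator.toSeq: _*)
}

case class Row(count: Int, amount: Double) extends Formatted
```

A mismatch between a specifier and a value would then surface as a runtime exception rather than a compile error.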

Combining a Parser using andThen to create another Parser of different type

I'm trying to parse a date using combinator parsers to return a Parser[java.util.Date].
I do that in two phases: first, I parse a year using a simpleYear Parser, then I try to plug the result of this simple parser into andThen, where I manipulate the input so that the output of this andThen is a ParseResult[Date].
Sadly I get the following error from the compiler at the line of declaration:
type mismatch;
found : parser.DateParser.Input ⇒ parser.DateParser.ParseResult[java.util.Date]
(which expands to) scala.util.parsing.input.Reader[Char] ⇒ parser.DateParser.ParseResult[java.util.
required: parser.DateParser.Parser[java.util.Date]
Here is the code:
import java.text.{ParseException, SimpleDateFormat}
import java.util.Date
import scala.util.parsing.combinator.RegexParsers

object DateParser extends RegexParsers {
  val formatter: SimpleDateFormat = new SimpleDateFormat("yyyy-MM-dd")
  def year = """\d{4}""".r
  def day = """[0-2]\d""".r | """3[01]""".r
  def month = """0\d""".r | """1[0-2]""".r
  def simpleDate: Parser[String] =
    (year ~ "-" ~ month ~ "-" ~ day) ^^
      { case y ~ "-" ~ m ~ "-" ~ d => y + "-" + m + "-" + d }
  // This is the declaration that the compiler rejects with the type mismatch.
  def date: Parser[Date] = simpleDate andThen {
    case Success(s, in) =>
      val x: ParseResult[Date] = try {
        Success(formatter.parse(s), in)
      } catch {
        case pe: ParseException => Failure("date format invalid", in)
      }
      x
    case f: Failure => f
  }
}
It seems that the Scala compiler can't implicitly convert the type of date into a Parser[Date] (maybe because of the try/catch?).
Is there some other way to do what I want to do?
Parser[T] is a subclass of a function Input => ParseResult[T] and the method andThen you are using comes from the function. You are passing to it a function ParseResult[String] => ParseResult[Date], so you get back Input => ParseResult[Date], which doesn't match the type Parser[Date], and that's why you get this error.
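The same effect can be reproduced without the parser library. In this sketch MiniParser is a hypothetical stand-in for Parser[T]: a class that is also a function. Because andThen is inherited from Function1, its result is a plain function and has to be wrapped again to get the richer type back.

```scala
// Hypothetical stand-in for Parser[T]: a class that extends a function type.
class MiniParser[T](f: String => Option[T]) extends (String => Option[T]) {
  def apply(in: String): Option[T] = f(in)
}

object AndThenDemo {
  val digits = new MiniParser[String](s =>
    if (s.nonEmpty && s.forall(_.isDigit)) Some(s) else None)

  // andThen comes from Function1, so the result is only a String => Option[Int].
  val asInt: String => Option[Int] = digits andThen (_.map(_.toInt))

  // Assigning that result to a MiniParser[Int] would be a type mismatch:
  // val bad: MiniParser[Int] = digits andThen (_.map(_.toInt))

  // Wrapping the plain function restores the MiniParser type.
  val wrapped: MiniParser[Int] = new MiniParser(asInt)
}
```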
But you can simply wrap a function of type Input => ParseResult[T] in a Parser method to get a Parser[T]. So you can define date like this:
def date: Parser[Date] = Parser(simpleDate andThen {
  // Cleaned up `Success` case a bit
  case Success(s, in) =>
    try Success(formatter.parse(s), in)
    catch {
      case pe: ParseException => Failure("date format invalid", in)
    }
  // It's better to use `NoSuccess` instead of `Failure`,
  // to cover the `Error` case as well.
  case f: NoSuccess => f
})
That said, it's not the best/cleanest method. As you want to call a function on the parser result to modify it somehow, you can use the Parser's methods map and flatMap or their equivalents (map equivalent to ^^, flatMap equivalent to into and >>). Those have the same idea as map and flatMap of other Scala classes such as Try or Future.
In this case you have to account for the possibility of failure, so you have to use flatMap. A definition of date using flatMap may look like this:
def date: Parser[Date] = simpleDate >> (s =>
try success(formatter.parse(s))
catch { case pe: ParseException => failure("date format invalid") })
Also, you may want (if you haven't already done it yourself) to set formatter to be non-lenient: formatter.setLenient(false). Otherwise it will do things like parsing 2000-02-31 as the 2nd of March!
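The difference between lenient and strict parsing can be checked with the JDK alone. This is a small sketch; LenientDemo and newFormatter are names made up for the illustration.

```scala
import java.text.SimpleDateFormat
import scala.util.Try

object LenientDemo {
  private def newFormatter(lenient: Boolean): SimpleDateFormat = {
    val f = new SimpleDateFormat("yyyy-MM-dd")
    f.setLenient(lenient)
    f
  }

  // Lenient parsing rolls the non-existent 31st of February over into March.
  val rolledOver: String = {
    val f = newFormatter(lenient = true)
    f.format(f.parse("2000-02-31"))
  }

  // Strict parsing throws a ParseException instead, captured here by Try.
  val strictFails: Boolean =
    Try(newFormatter(lenient = false).parse("2000-02-31")).isFailure
}
```

With the strict formatter, the ParseException branch in the date parser is what turns such input into a parse failure.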

Filtering inside `for` with pattern matching

I am reading a TSV file using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am interested both in filtering out entries whose entryType is set to 0 and in ignoring lines whose column count is not exactly 2 (that is, lines that do not match the constructor). I was wondering if there's an idiomatic way to achieve this, maybe using pattern matching and an unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept a solution that does not involve a for loop but still returns Iterator[Entry]. The solutions must be tolerant of malformed input.
This is more state-of-arty:
package object liner {
  implicit class R(val sc: StringContext) {
    object r {
      def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
    }
  }
}
package liner {
  case class Entry(entryType: Int, value: Int)
  object I {
    def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
  }
  object Test extends App {
    def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
    def entries = lines flatMap {
      case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
      case __________________________________________________ => None
    }
    Console println entries
  }
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
  "1\t2",
  "1\t",
  "2",
  "hello",
  "1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
  def unapply(line: String) = {
    line.split("\t").map(x => Try(x.toInt)) match {
      case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
      case _ =>
        println("Malformed line: " + line)
        None
    }
  }
}
for {
  line <- lines
  entryOption = Entry.unapply(line)
  if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Entry] = for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2
  // now you may use pattern matching to extract the array
  Array(entryType, value) = arr
  // keep only entries whose entryType is not 0, as the question asks
  if entryType != 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.
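A sketch of the Try-based variant mentioned above, written with collect instead of a for-comprehension. TolerantEntries is a hypothetical name, and the input is passed in as an Iterator[String] so that the example does not depend on a file.

```scala
import scala.util.Try

case class Entry(entryType: Int, value: Int)

object TolerantEntries {
  def parse(lines: Iterator[String]): Iterator[Entry] =
    lines
      // Each field becomes Some(int) or None, so bad numbers never throw.
      .map(_.split("\t").map(field => Try(field.toInt).toOption))
      // Only arrays of exactly two parsed Ints with a non-zero entryType survive.
      .collect {
        case Array(Some(entryType), Some(value)) if entryType != 0 =>
          Entry(entryType, value)
      }
}
```

Lines with the wrong column count, non-numeric fields, or an entryType of 0 are silently dropped; logging them would need an extra case.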

Pattern matching on Class[_] type?

I'm trying to use Scala pattern matching on Java Class[_] (in context of using Java reflection from Scala) but I'm getting some unexpected error. The following gives "unreachable code" on the line with case jLong
def foo[T](paramType: Class[_]): Unit = {
  val jInteger = classOf[java.lang.Integer]
  val jLong = classOf[java.lang.Long]
  paramType match {
    case jInteger => println("int")
    case jLong => println("long")
  }
}
Any ideas why this is happening ?
The code works as expected if you change the variable names to upper case (or surround them with backticks in the pattern):
scala> def foo[T](paramType: Class[_]): Unit = {
| val jInteger = classOf[java.lang.Integer]
| val jLong = classOf[java.lang.Long]
| paramType match {
| case `jInteger` => println("int")
| case `jLong` => println("long")
| }
| }
foo: [T](paramType: Class[_])Unit
scala> foo(classOf[java.lang.Integer])
int
In your code the jInteger in the first pattern is a new variable—it's not the jInteger from the surrounding scope. From the specification:
8.1.1 Variable Patterns
... A variable pattern x is a simple identifier which starts with a lower case letter. It
matches any value, and binds the variable name to that value.
...
8.1.5 Stable Identifier Patterns
... To resolve the syntactic overlap with a variable pattern, a stable
identifier pattern may not be a simple name starting with a lower-case
letter. However, it is possible to enclose a such a variable name in
backquotes; then it is treated as a stable identifier pattern.
See this question for more information.
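The difference between the two kinds of pattern can be seen side by side in a small sketch. ClassMatch, describe and shadowed are names invented for this illustration.

```scala
object ClassMatch {
  private val jInteger = classOf[java.lang.Integer]
  private val jLong = classOf[java.lang.Long]

  // Backticks turn the names into stable identifier patterns,
  // so the scrutinee is compared against the existing values.
  def describe(c: Class[_]): String = c match {
    case `jInteger` => "int"
    case `jLong`    => "long"
    case _          => "other"
  }

  // Without backticks, jInteger is a fresh variable that matches anything.
  def shadowed(c: Class[_]): String = c match {
    case jInteger => "int"
  }
}
```

Calling shadowed with classOf[java.lang.Long] still yields "int", which is why the compiler flags any later case as unreachable.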
In your pattern match, each of the two cases introduces a new placeholder variable instead of matching against the class value as expected.
If the names start with an upper-case letter, you'll be fine:
def foo[T](paramType: Class[_]): Unit = {
  val JInteger = classOf[Int]
  val JLong = classOf[Long]
  paramType match {
    case JInteger => println("int")
    case JLong => println("long")
  }
}
scala> foo(1.getClass)
int
JMPL is a simple Java library that can emulate some pattern-matching features using Java 8 features.
matches(data).as(
Integer.class, i -> { System.out.println(i * i); },
Byte.class, b -> { System.out.println(b * b); },
Long.class, l -> { System.out.println(l * l); },
String.class, s -> { System.out.println(s * s); },
Null.class, () -> { System.out.println("Null value "); },
Else.class, () -> { System.out.println("Default value: " + data); }
);

Accessing Scala Parser regular expression match data

I'm wondering if it's possible to get the MatchData generated from the matching regular expression in the grammar below.
object DateParser extends JavaTokenParsers {
  ....
  val dateLiteral = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r ^^ {
    ... get MatchData
  }
}
One option of course is to perform the match again inside the block, but since the RegexParser has already performed the match I'm hoping that it passes the MatchData to the block, or stores it?
Here is the implicit definition that converts your Regex into a Parser:
/** A parser that matches a regex string */
implicit def regex(r: Regex): Parser[String] = new Parser[String] {
  def apply(in: Input) = {
    val source = in.source
    val offset = in.offset
    val start = handleWhiteSpace(source, offset)
    (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
      case Some(matched) =>
        Success(source.subSequence(start, start + matched.end).toString,
                in.drop(start + matched.end - offset))
      case None =>
        Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
    }
  }
}
Just adapt it:
object X extends RegexParsers {
  /** A parser that matches a regex string and returns the Match */
  def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
        case Some(matched) =>
          Success(matched,
                  in.drop(start + matched.end - offset))
        case None =>
          Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
      }
    }
  }
  val t = regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ { case m => (m.group(1), m.group(2), m.group(3)) }
}
Example:
scala> X.parseAll(X.t, "23/03/1971")
res8: X.ParseResult[(String, String, String)] = [1.11] parsed: (23,03,1971)
No, you can't do this. If you look at the definition of the Parser used when you convert a regex to a Parser, it throws away all context and just returns the full matched string:
http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/util/parsing/combinator/RegexParsers.scala?view=markup#L55
You have a couple of other options, though:
break up your parser into several smaller parsers (for the tokens you actually want to extract)
define a custom parser that extracts the values you want and returns a domain object instead of a string
The first would look like
val separator = "-" | "/"
val year = ("""\d{4}"""r) <~ separator
val month = ("""\d\d"""r) <~ separator
val day = """\d\d"""r
val date = ((year?) ~ (month?) ~ day) map {
  case year ~ month ~ day =>
    (year.getOrElse("2009"), month.getOrElse("11"), day)
}
The <~ means "require these two tokens together, but only give me the result of the first one."
The ~ means "require these two tokens together and tie them together in a pattern-matchable ~ object."
The ? means that the parser is optional and will return an Option.
The .getOrElse bit provides a default value for when the parser didn't define a value.
When a Regex is used in a RegexParsers instance, the implicit def regex(Regex): Parser[String] in RegexParsers is used to apply that Regex to the input. The Match instance yielded upon successful application of the RE at the current input is used to construct a Success in the regex() method, but only its "end" value is used, so any captured sub-matches are discarded by the time that method returns.
As it stands (in the 2.7 source I looked at), you're out of luck, I believe.
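What gets lost can be seen with the standard library's Regex alone. In this sketch (PrefixMatchDemo, groups and matchedText are hypothetical names), findPrefixMatchOf returns a Match that still carries the groups, whereas keeping only the text up to its end, as described above, leaves just the matched string.

```scala
import scala.util.matching.Regex

object PrefixMatchDemo {
  val date: Regex = """(\d\d)/(\d\d)/(\d\d\d\d)""".r

  // Keeps the Match, so the captured groups remain available.
  def groups(input: CharSequence): Option[(String, String, String)] =
    date.findPrefixMatchOf(input).map(m => (m.group(1), m.group(2), m.group(3)))

  // Uses only the end offset of the Match, so the groups are discarded.
  def matchedText(input: CharSequence): Option[String] =
    date.findPrefixMatchOf(input).map(m => input.subSequence(0, m.end).toString)
}
```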
I ran into a similar issue using scala 2.8.1 and trying to parse input of the form "name:value" using the RegexParsers class:
package scalucene.query
import scala.util.matching.Regex
import scala.util.parsing.combinator._
object QueryParser extends RegexParsers {
  override def skipWhitespace = false
  private def quoted = regex(new Regex("\"[^\"]+"))
  private def colon = regex(new Regex(":"))
  private def word = regex(new Regex("\\w+"))
  private def fielded = (regex(new Regex("[^:]+")) <~ colon) ~ word
  private def term = (fielded | word | quoted)
  def parseItem(str: String) = parse(term, str)
}
It seems that you can grab the matched groups after parsing like this:
QueryParser.parseItem("nameExample:valueExample") match {
  case QueryParser.Success(result:scala.util.parsing.combinator.Parsers$$tilde, _) => {
    println("Name: " + result.productElement(0) + " value: " + result.productElement(1))
  }
}
}