Idiomatic Scala way of deserializing delimited strings into case classes

Idiomatic Scala way of deserializing delimited strings into case classes - scala

Suppose I was dealing with a simple colon-delimited text protocol that looked something like:
Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
OPC:m3node:1-10-2(P):A7:NAT0
....
I'd like to deserialize each line as an instance of a case class, but in a type-safe way. My first attempt uses type classes to define 'read' methods for each possible type that I can encounter, in addition to the 'tupled' method on the case class to get back a function that can be applied to a tuple of arguments, something like the following:
case class Foo(a: String, b: Integer)
trait Reader[T] {
def read(s: String): T
}
object Reader {
implicit object StringParser extends Reader[String] { def read(s: String): String = s }
implicit object IntParser extends Reader[Integer] { def read(s: String): Integer = s.toInt }
}
def create[A1, A2, Ret](fs: Seq[String], f: ((A1, A2)) => Ret)(implicit A1Reader: Reader[A1], A2Reader: Reader[A2]): Ret = {
f((A1Reader.read(fs(0)), A2Reader.read(fs(1))))
}
create(Seq("foo", "42"), Foo.tupled) // gives me a Foo("foo", 42)
The problem though is that I'd need to define the create method for each tuple and function arity, so that means up to 22 versions of create. Additionally, this doesn't take care of validation, or receiving corrupt data.

As there is a Shapeless tag, a possible solution using it, but I'm not an expert and I guess one can do better :
First, about the lack of validation, you should simply have read return Try, or scalaz.Validation or just option if you do not care about an error message.
Then about boilerplate, you may try to use HList. This way you don't need to go for all the arities.
import scala.util._
import shapeless._
trait Reader[+A] { self =>
def read(s: String) : Try[A]
def map[B](f: A => B): Reader[B] = new Reader[B] {
def read(s: String) = self.read(s).map(f)
}
}
object Reader {
// convenience
def apply[A: Reader] : Reader[A] = implicitly[Reader[A]]
def read[A: Reader](s: String): Try[A] = implicitly[Reader[A]].read(s)
// base types
implicit object StringReader extends Reader[String] {
def read(s: String) = Success(s)
}
implicit object IntReader extends Reader[Int] {
def read(s: String) = Try {s.toInt}
}
// HLists, parts separated by ":"
implicit object HNilReader extends Reader[HNil] {
def read(s: String) =
if (s.isEmpty()) Success(HNil)
else Failure(new Exception("Expect empty"))
}
implicit def HListReader[A : Reader, H <: HList : Reader] : Reader[A :: H]
= new Reader[A :: H] {
def read(s: String) = {
val (before, colonAndBeyond) = s.span(_ != ':')
val after = if (colonAndBeyond.isEmpty()) "" else colonAndBeyond.tail
for {
a <- Reader.read[A](before)
b <- Reader.read[H](after)
} yield a :: b
}
}
}
Given that, you have a reasonably short reader for Foo :
case class Foo(a: Int, s: String)
object Foo {
implicit val FooReader : Reader[Foo] =
Reader[Int :: String :: HNil].map(Generic[Foo].from _)
}
It works :
println(Reader.read[Foo]("12:text"))
Success(Foo(12,text))

Without scalaz and shapeless, I think the ideomatic Scala way to parse some input are Scala parser combinators. In your example, I would try something like this:
import org.joda.time.DateTime
import scala.util.parsing.combinator.JavaTokenParsers
val input =
"""Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
|RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
|M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
|OPC:m3node:1-10-2(P):A7:NAT0""".stripMargin
trait LineContent
case class Event(number : Int, typ : String, when : DateTime, stuff : List[String]) extends LineContent
case class Reset(node : String, stuff : List[String]) extends LineContent
case class Other(typ : String, stuff : List[String]) extends LineContent
object LineContentParser extends JavaTokenParsers {
override val whiteSpace=""":""".r
val space="""\s+""".r
val lineEnd = """"\n""".r //"""\s*(\r?\n\r?)+""".r
val field = """[^:]*""".r
def stuff : Parser[List[String]] = rep(field)
def integer : Parser[Int] = log(wholeNumber ^^ {_.toInt})("integer")
def date : Parser[DateTime] = log((repsep(integer, space) filter (_.length == 6)) ^^ (l =>
new DateTime(l(0), l(1), l(2), l(3), l(4), l(5), 0)
))("date")
def event : Parser[Event] = "Event" ~> integer ~ field ~ date ~ stuff ^^ {
case number~typ~when~stuff => Event(number, typ, when, stuff)}
def reset : Parser[Reset] = "RSET" ~> field ~ stuff ^^ { case node~stuff =>
Reset(node, stuff)
}
def other : Parser[Other] = ("M3UA_IP_LINK" | "OPC") ~ stuff ^^ { case typ~stuff =>
Other(typ, stuff)
}
def line : Parser[LineContent] = event | reset | other
def lines = repsep(line, lineEnd)
def parseLines(s : String) = parseAll(lines, s)
}
LineContentParser.parseLines(input)
The patterns in the parser combinators are self explanatory. I always convert each successfully parsed chunk as early as possible to an partial result. Then the partial results will be combined to the final result.
A hint for debugging: You can always add the log parser. It will print before and after when a rule is applied. Together with the given name (e.g. "date") it will also print the current position of the input source, where the rule is applied and when applicable the parsed partial result.
An example output looks like this:
trying integer at scala.util.parsing.input.CharSequenceReader#108589b
integer --> [1.13] parsed: 5003
trying date at scala.util.parsing.input.CharSequenceReader#cec2e3
trying integer at scala.util.parsing.input.CharSequenceReader#cec2e3
integer --> [1.30] parsed: 2013
trying integer at scala.util.parsing.input.CharSequenceReader#14da3
integer --> [1.33] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader#1902929
integer --> [1.36] parsed: 6
trying integer at scala.util.parsing.input.CharSequenceReader#17e4dce
integer --> [1.39] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader#1747fd8
integer --> [1.42] parsed: 37
trying integer at scala.util.parsing.input.CharSequenceReader#1757f47
integer --> [1.45] parsed: 55
date --> [1.45] parsed: 2013-12-06T12:37:55.000+01:00
I think this is an easy and maintainable way to parse input into well typed Scala objects. It is all in the core Scala API, hence I would call it "idiomatic". When typing the example code in an Idea Scala worksheet, completion and type information worked very well. So this way seems to well supported by the IDEs.

Related

What scala language feature is this using? It seems to extract values from a JsObject and cast it?

I'm just reading some scala source code to learn from, and I came accorss this:
How is o str "t" working here?
It seems to be extracting the "t" key from the JsObject and casting it to a string?
Try(Json parse str) map {
case o: JsObject =>
o str "t" flatMap {
case "p" => Some(Ping(o int "l"))
There are similar patters like:
case "opening" =>
for {
d <- o obj "d"
path <- d str "path"
fen <- d str "fen"
variant = dataVariant(d)
} yield Opening(variant, Path(path), FEN(fen))
Not sure what o obj "d" is doing? Or more importantly, how it is doing this?
I can't seem to find any scalaz references or package level implicits that would allow this?
Reference: https://github.com/ornicar/lila-ws/blob/master/src/main/scala/ipc/ClientOut.scala#L112

The expression o obj "d" uses combination of extension method via implicit conversion and infix notation. Desugared it translates to
augment(o).str("t")
where augment is defined by
final class LilaJsObject(private val js: JsObject) extends AnyVal {
def str(key: String): Option[String] =
(js \ key).asOpt[String]
...
}
object LilaJsObject {
implicit def augment(o: JsObject) = new LilaJsObject(o)
}
This low level implementation of extension method could be simplified using implicit classes
implicit final class LilaJsObject(private val js: JsObject) extends AnyVal {
def str(key: String): Option[String] =
(js \ key).asOpt[String]
}
In Scala 3 extension method syntax becomes even simpler.
Try seeing how infix punctuation-free notation desugars by executing
scala -Xprint:parser -e '"Hello" charAt 2'
which should display "Hello".charAt(2). Infix notation can lead to beautiful DSL, for example,
object repeat {
def it(n: Int) = new {
def times[A](f: => A): Unit = (1 to n).foreach(_ => f)
}
}
can express in almost human language
repeat it 5 times { println("hello") }

How to express Function type?

I am currently reading Hutton's and Meijer's paper on parsing combinators in Haskell http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf. For the sake of it I am trying to implement them in scala. I would like to construct something easy to code, extend and also simple and elegant. I have come up with two solutions for the following haskell code
/* Haskell Code */
type Parser a = String -> [(a,String)]
result :: a -> Parser a
result v = \inp -> [(v,inp)]
zero :: Parser a
zero = \inp -> []
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
/* Scala Code */
object Hutton1 {
type Parser[A] = String => List[(A, String)]
def Result[A](v: A): Parser[A] = str => List((v, str))
def Zero[A]: Parser[A] = str => List()
def Character: Parser[Char] = str => if (str.isEmpty) List() else List((str.head, str.tail))
}
object Hutton2 {
trait Parser[A] extends (String => List[(A, String)])
case class Result[A](v: A) extends Parser[A] {
def apply(str: String) = List((v, str))
}
case object Zero extends Parser[T forSome {type T}] {
def apply(str: String) = List()
}
case object Character extends Parser[Char] {
def apply(str: String) = if (str.isEmpty) List() else List((str.head, str.tail))
}
}
object Hutton extends App {
object T1 {
import Hutton1._
def run = {
val r: List[(Int, String)] = Zero("test") ++ Result(5)("test")
println(r.map(x => x._1 + 1) == List(6))
println(Character("abc") == List(('a', "bc")))
}
}
object T2 {
import Hutton2._
def run = {
val r: List[(Int, String)] = Zero("test") ++ Result(5)("test")
println(r.map(x => x._1 + 1) == List(6))
println(Character("abc") == List(('a', "bc")))
}
}
T1.run
T2.run
}
Question 1
In Haskell, zero is a function value that can be used as it is, expessing all failed parsers whether they are of type Parser[Int] or Parser[String]. In scala we achieve the same by calling the function Zero (1st approach) but in this way I believe that I just generate a different function everytime Zero is called. Is this statement true? Is there a way to mitigate this?
Question 2
In the second approach, the Zero case object is extending Parser with the usage of existential types Parser[T forSome {type T}] . If I replace the type with Parser[_] I get the compile error
Error:(19, 28) class type required but Hutton2.Parser[_] found
case object Zero extends Parser[_] {
^
I thought these two expressions where equivalent. Is this the case?
Question 3
Which approach out of the two do you think that will yield better results in expressing the combinators in terms of elegance and simplicity?
I use scala 2.11.8

Note: I didn't compile it, but I know the problem and can propose two solutions.
The more Haskellish way would be to not use subtyping, but to define zero as a polymorphic value. In that style, I would propose to define parsers not as objects deriving from a function type, but as values of one case class:
final case class Parser[T](run: String => List[(T, String)])
def zero[T]: Parser[T] = Parser(...)
As shown by #Alec, yes, this will produce a new value every time, since a def is compiled to a method.
If you want to use subtyping, you need to make Parser covariant. Then you can give zero a bottom result type:
trait Parser[+A] extends (String => List[(A, String)])
case object Zero extends Parser[Nothing] {...}
These are in some way quite related; in system F_<:, which is the base of what Scala uses, the types _|_ (aka Nothing) and \/T <: Any. T behave the same (this hinted at in Types and Programming Languages, chapter 28). The two possibilities given here are a consequence of this fact.
With existentials I'm not so familiar with, but I think that while unbounded T forSome {type T} will behave like Nothing, Scala does not allow inhertance from an existential type.

Question 1
I think that you are right, and here is why: Zero1 below prints hello every time you use it. The solution, Zero2, involves using a val instead.
def Zero1[A]: Parser[A] = { println("hi"); str => List() }
val Zero2: Parser[Nothing] = str => List()
Question 2
No idea. I'm still just starting out with Scala. Hope someone answers this.
Question 3
The trait one will play better with Scala's for (since you can define custom flatMap and map), which turns out to be (somewhat) like Haskell's do. The following is all you need.
trait Parser[A] extends (String => List[(A, String)]) {
def flatMap[B](f: A => Parser[B]): Parser[B] = {
val p1 = this
new Parser[B] {
def apply(s1: String) = for {
(a,s2) <- p1(s1)
p2 = f(a)
(b,s3) <- p2(s2)
} yield (b,s3)
}
}
def map[B](f: A => B): Parser[B] = {
val p = this
new Parser[B] {
def apply(s1: String) = for ((a,s2) <- p(s1)) yield (f(a),s2)
}
}
}
Of course, to do anything interesting you need more parsers. I'll propose to you one simple parser combinator: Choice(p1: Parser[A], p2: Parser[A]): Parser[A] which tries both parsers. (And rewrite your existing parsers more to my style).
def choice[A](p1: Parser[A], p2: Parser[A]): Parser[A] = new Parser[A] {
def apply(s: String): List[(A,String)] = { p1(s) ++ p2(s) }
}
def unit[A](x: A): Parser[A] = new Parser[A] {
def apply(s: String): List[(A,String)] = List((x,s))
}
val character: Parser[Char] = new Parser[Char] {
def apply(s: String): List[(Char,String)] = List((s.head,s.tail))
}
Then, you can write something like the following:
val parser: Parser[(Char,Char)] = for {
x <- choice(unit('x'),char)
y <- char
} yield (x,y)
And calling parser("xyz") gives you List((('x','x'),"yz"), (('x','y'),"z")).

What is the best way to return more than 2 different variable type values in Scala

It's been while Since I've started working on scala and I am wondering what kind of variable type is the best when I create a method which requires to return multiple data.
let's say If I have to make a method to get user info and it'll be called from many places.
def getUserParam(userId: String):Map[String,Any] = {
//do something
Map(
"isExist" -> true,
"userDataA" -> "String",
"userDataB" -> 1 // int
)
}
in this case, the result type is Map[String,Any] and since each param would be recognized as Any, You cannot pass the value to some other method requiring something spesifically.
def doSomething(foo: String){}
val foo = getUserParam("bar")
doSomething(foo("userDataA")) // type mismatch error
If I use Tuple, I can avoid that error, but I don't think it is easy to guess what each indexed number contains.
and of course there is a choice to use Case Class but once I use case class as a return type, I need to import the case class where ever I call the method.
What I want to ask is what is the best way to make a method returning more than 2 different variable type values.

Here are three options. Even though you might like the third option (using anonymous class) it's actually my least favorite. As you can see, it requires you to enable reflective calls (otherwise it throws a compilation warning). Scala will use reflection to achieve this which is not that great.
Personally, if there are only 2 values I use tuple. If there are more than two I will use a case class since it greatly improves code readability. The anonymous class option I knew it existed for a while, but I never used that it my code.
import java.util.Date
def returnTwoUsingTuple: (Date, String) = {
val date = new Date()
val str = "Hello world"
(date,str)
}
val tupleVer = returnTwoUsingTuple
println(tupleVer._1)
println(tupleVer._2)
case class Reply(date: Date, str: String)
def returnTwoUsingCaseClass: Reply = {
val date = new Date()
val str = "Hello world"
Reply(date,str)
}
val caseClassVer = returnTwoUsingCaseClass
println(caseClassVer.date)
println(caseClassVer.str)
import scala.language.reflectiveCalls
def returnTwoUsingAnonymousClass = {
val date = new Date()
val str = "Hello world"
new {
val getDate = date
val getStr = str
}
}
val anonClassVer = returnTwoUsingAnonymousClass
println(anonClassVer.getDate)
println(anonClassVer.getStr)

Sinse your logic with Map[String,Any] is more like for each key I have one of .. not for each key I have both ... more effective use in this case would be Either or even more effectively - scalaz.\/
scalaz.\/
import scalaz._
import scalaz.syntax.either._
def getUserParam(userId: String): Map[String, String \/ Int \/ Boolean] = {
//do something
Map(
"isExist" -> true.right,
"userDataA" -> "String".left.left,
"userDataB" -> 1.right.left
)
}
String \/ Int \/ Boolean is left-associatited to (String \/ Int) \/ Boolean
now you have
def doSomething(foo: String){}
unluckily it's the most complex case, if for example you had
def doSomethingB(foo: Boolean){}
you could've just
foo("userDataA").foreach(doSomethingB)
since the right value considered as correct so for String which is left to the left you could write
foo("userdata").swap.foreach(_.swap.foreach(doSomething))
Closed Family
Or you could craft you own simple type for large number of alternatives like
sealed trait Either3[+A, +B, +C] {
def ifFirst[T](action: A => T): Option[T] = None
def ifSecond[T](action: B => T): Option[T] = None
def ifThird[T](action: C => T): Option[T] = None
}
case class First[A](x: A) extends Either3[A, Nothing, Nothing] {
override def ifFirst[T](action: A => T): Option[T] = Some(action(x))
}
case class Second[A](x: A) extends Either3[Nothing, A, Nothing] {
override def ifSecond[T](action: A => T): Option[T] = Some(action(x))
}
case class Third[A](x: A) extends Either3[Nothing, Nothing, A] {
override def ifThird[T](action: A => T): Option[T] = Some(action(x))
}
now having
def getUserParam3(userId: String): Map[String, Either3[Boolean, String, Int]] = {
//do something
Map(
"isExist" -> First(true),
"userDataA" -> Second("String"),
"userDataB" -> Third(1)
)
}
val foo3 = getUserParam3("bar")
you can use your values as
foo3("userdata").ifSecond(doSomething)

Scala Functional "no-op" syntax [duplicate]

When programming in java, I always log input parameter and return value of a method, but in scala, the last line of a method is the return value. so I have to do something like:
def myFunc() = {
val rs = calcSomeResult()
logger.info("result is:" + rs)
rs
}
in order to make it easy, I write a utility:
class LogUtil(val f: (String) => Unit) {
def logWithValue[T](msg: String, value: T): T = { f(msg); value }
}
object LogUtil {
def withValue[T](f: String => Unit): ((String, T) => T) = new LogUtil(f).logWithValue _
}
Then I used it as:
val rs = calcSomeResult()
withValue(logger.info)("result is:" + rs, rs)
it will log the value and return it. it works for me,but seems wierd. as I am a old java programmer, but new to scala, I don't know whether there is a more idiomatic way to do this in scala.
thanks for your help, now I create a better util using Kestrel combinator metioned by romusz
object LogUtil {
def kestrel[A](x: A)(f: A => Unit): A = { f(x); x }
def logV[A](f: String => Unit)(s: String, x: A) = kestrel(x) { y => f(s + ": " + y)}
}
I add f parameter so that I can pass it a logger from slf4j, and the test case is:
class LogUtilSpec extends FlatSpec with ShouldMatchers {
val logger = LoggerFactory.getLogger(this.getClass())
import LogUtil._
"LogUtil" should "print log info and keep the value, and the calc for value should only be called once" in {
def calcValue = { println("calcValue"); 100 } // to confirm it's called only once
val v = logV(logger.info)("result is", calcValue)
v should be === 100
}
}

What you're looking for is called Kestrel combinator (K combinator): Kxy = x. You can do all kinds of side-effect operations (not only logging) while returning the value passed to it. Read https://github.com/raganwald/homoiconic/blob/master/2008-10-29/kestrel.markdown#readme
In Scala the simplest way to implement it is:
def kestrel[A](x: A)(f: A => Unit): A = { f(x); x }
Then you can define your printing/logging function as:
def logging[A](x: A) = kestrel(x)(println)
def logging[A](s: String, x: A) = kestrel(x){ y => println(s + ": " + y) }
And use it like:
logging(1 + 2) + logging(3 + 4)
your example function becomes a one-liner:
def myFunc() = logging("result is", calcSomeResult())
If you prefer OO notation you can use implicits as shown in other answers, but the problem with such approach is that you'll create a new object every time you want to log something, which may cause performance degradation if you do it often enough. But for completeness, it looks like this:
implicit def anyToLogging[A](a: A) = new {
def log = logging(a)
def log(msg: String) = logging(msg, a)
}
Use it like:
def myFunc() = calcSomeResult().log("result is")

You have the basic idea right--you just need to tidy it up a little bit to make it maximally convenient.
class GenericLogger[A](a: A) {
def log(logger: String => Unit)(str: A => String): A = { logger(str(a)); a }
}
implicit def anything_can_log[A](a: A) = new GenericLogger(a)
Now you can
scala> (47+92).log(println)("The answer is " + _)
The answer is 139
res0: Int = 139
This way you don't need to repeat yourself (e.g. no rs twice).

If you like a more generic approach better, you could define
implicit def idToSideEffect[A](a: A) = new {
def withSideEffect(fun: A => Unit): A = { fun(a); a }
def |!>(fun: A => Unit): A = withSideEffect(fun) // forward pipe-like
def tap(fun: A => Unit): A = withSideEffect(fun) // public demand & ruby standard
}
and use it like
calcSomeResult() |!> { rs => logger.info("result is:" + rs) }
calcSomeResult() tap println

Starting Scala 2.13, the chaining operation tap can be used to apply a side effect (in this case some logging) on any value while returning the original value:
def tap[U](f: (A) => U): A
For instance:
scala> val a = 42.tap(println)
42
a: Int = 42
or in our case:
import scala.util.chaining._
def myFunc() = calcSomeResult().tap(x => logger.info(s"result is: $x"))

Let's say you already have a base class for all you loggers:
abstract class Logger {
def info(msg:String):Unit
}
Then you could extend String with the ## logging method:
object ExpressionLog {
// default logger
implicit val logger = new Logger {
def info(s:String) {println(s)}
}
// adding ## method to all String objects
implicit def stringToLog (msg: String) (implicit logger: Logger) = new {
def ## [T] (exp: T) = {
logger.info(msg + " = " + exp)
exp
}
}
}
To use the logging you'd have to import members of ExpressionLog object and then you could easily log expressions using the following notation:
import ExpressionLog._
def sum (a:Int, b:Int) = "sum result" ## (a+b)
val c = sum("a" ## 1, "b" ##2)
Will print:
a = 1
b = 2
sum result = 3
This works because every time when you call a ## method on a String compiler realises that String doesn't have the method and silently converts it into an object with anonymous type that has the ## method defined (see stringToLog). As part of the conversion compiler picks the desired logger as an implicit parameter, this way you don't have to keep passing on the logger to the ## every time yet you retain full control over which logger needs to be used every time.
As far as precedence goes when ## method is used in infix notation it has the highest priority making it easier to reason about what will be logged.
So what if you wanted to use a different logger in one of your methods? This is very simple:
import ExpressionLog.{logger=>_,_} // import everything but default logger
// define specific local logger
// this can be as simple as: implicit val logger = new MyLogger
implicit val logger = new Logger {
var lineno = 1
def info(s:String) {
println("%03d".format(lineno) + ": " + s)
lineno+=1
}
}
// start logging
def sum (a:Int, b:Int) = a+b
val c = "sum result" ## sum("a" ## 1, "b" ##2)
Will output:
001: a = 1
002: b = 2
003: sum result = 3

Compiling all the answers, pros and cons, I came up with this (context is a Play application):
import play.api.LoggerLike
object LogUtils {
implicit class LogAny2[T](val value : T) extends AnyVal {
def ##(str : String)(implicit logger : LoggerLike) : T = {
logger.debug(str);
value
}
def ##(f : T => String)(implicit logger : LoggerLike) : T = {
logger.debug(f(value))
value
}
}
As you can see, LogAny is an AnyVal so there shouldn't be any overhead of new object creation.
You can use it like this:
scala> import utils.LogUtils._
scala> val a = 5
scala> val b = 7
scala> implicit val logger = play.api.Logger
scala> val c = a + b ## { c => s"result of $a + $b = $c" }
c: Int = 12
Or if you don't need a reference to the result, just use:
scala> val c = a + b ## "Finished this very complex calculation"
c: Int = 12
Any downsides to this implementation?
Edit:
I've made this available with some improvements in a gist here

Hidden features of Scala

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
What are the hidden features of Scala that every Scala developer should be aware of?
One hidden feature per answer, please.

Okay, I had to add one more. Every Regex object in Scala has an extractor (see answer from oxbox_lakes above) that gives you access to the match groups. So you can do something like:
// Regex to split a date in the format Y/M/D.
val regex = "(\\d+)/(\\d+)/(\\d+)".r
val regex(year, month, day) = "2010/1/13"
The second line looks confusing if you're not used to using pattern matching and extractors. Whenever you define a val or var, what comes after the keyword is not simply an identifier but rather a pattern. That's why this works:
val (a, b, c) = (1, 3.14159, "Hello, world")
The right hand expression creates a Tuple3[Int, Double, String] which can match the pattern (a, b, c).
Most of the time your patterns use extractors that are members of singleton objects. For example, if you write a pattern like
Some(value)
then you're implicitly calling the extractor Some.unapply.
But you can also use class instances in patterns, and that is what's happening here. The val regex is an instance of Regex, and when you use it in a pattern, you're implicitly calling regex.unapplySeq (unapply versus unapplySeq is beyond the scope of this answer), which extracts the match groups into a Seq[String], the elements of which are assigned in order to the variables year, month, and day.

Structural type definitions - i.e. a type described by what methods it supports. For example:
object Closer {
def using(closeable: { def close(): Unit }, f: => Unit) {
try {
f
} finally { closeable.close }
}
}
Notice that the type of the parameter closeable is not defined other than it has a close method

Type-Constructor Polymorphism (a.k.a. higher-kinded types)
Without this feature you can, for example, express the idea of mapping a function over a list to return another list, or mapping a function over a tree to return another tree. But you can't express this idea generally without higher kinds.
With higher kinds, you can capture the idea of any type that's parameterised with another type. A type constructor that takes one parameter is said to be of kind (*->*). For example, List. A type constructor that returns another type constructor is said to be of kind (*->*->*). For example, Function1. But in Scala, we have higher kinds, so we can have type constructors that are parameterised with other type constructors. So they're of kinds like ((*->*)->*).
For example:
trait Functor[F[_]] {
def fmap[A, B](f: A => B, fa: F[A]): F[B]
}
Now, if you have a Functor[List], you can map over lists. If you have a Functor[Tree], you can map over trees. But more importantly, if you have Functor[A] for any A of kind (*->*), you can map a function over A.

Extractors which allow you to replace messy if-elseif-else style code with patterns. I know that these are not exactly hidden but I've been using Scala for a few months without really understanding the power of them. For (a long) example I can replace:
val code: String = ...
val ps: ProductService = ...
var p: Product = null
if (code.endsWith("=")) {
p = ps.findCash(code.substring(0, 3)) //e.g. USD=, GBP= etc
}
else if (code.endsWith(".FWD")) {
//e.g. GBP20090625.FWD
p = ps.findForward(code.substring(0,3), code.substring(3, 9))
}
else {
p = ps.lookupProductByRic(code)
}
With this, which is much clearer in my opinion
implicit val ps: ProductService = ...
val p = code match {
case SyntheticCodes.Cash(c) => c
case SyntheticCodes.Forward(f) => f
case _ => ps.lookupProductByRic(code)
}
I have to do a bit of legwork in the background...
object SyntheticCodes {
// Synthetic Code for a CashProduct
object Cash extends (CashProduct => String) {
def apply(p: CashProduct) = p.currency.name + "="
//EXTRACTOR
def unapply(s: String)(implicit ps: ProductService): Option[CashProduct] = {
if (s.endsWith("=")
Some(ps.findCash(s.substring(0,3)))
else None
}
}
//Synthetic Code for a ForwardProduct
object Forward extends (ForwardProduct => String) {
def apply(p: ForwardProduct) = p.currency.name + p.date.toString + ".FWD"
//EXTRACTOR
def unapply(s: String)(implicit ps: ProductService): Option[ForwardProduct] = {
if (s.endsWith(".FWD")
Some(ps.findForward(s.substring(0,3), s.substring(3, 9))
else None
}
}
But the legwork is worth it for the fact that it separates a piece of business logic into a sensible place. I can implement my Product.getCode methods as follows..
class CashProduct {
def getCode = SyntheticCodes.Cash(this)
}
class ForwardProduct {
def getCode = SyntheticCodes.Forward(this)
}

Manifests which are a sort of way at getting the type information at runtime, as if Scala had reified types.

In scala 2.8 you can have tail-recursive methods by using the package scala.util.control.TailCalls (in fact it's trampolining).
An example:
def u(n:Int):TailRec[Int] = {
if (n==0) done(1)
else tailcall(v(n/2))
}
def v(n:Int):TailRec[Int] = {
if (n==0) done(5)
else tailcall(u(n-1))
}
val l=for(n<-0 to 5) yield (n,u(n).result,v(n).result)
println(l)

Case classes automatically mixin the Product trait, providing untyped, indexed access to the fields without any reflection:
case class Person(name: String, age: Int)
val p = Person("Aaron", 28)
val name = p.productElement(0) // name = "Aaron": Any
val age = p.productElement(1) // age = 28: Any
val fields = p.productIterator.toList // fields = List[Any]("Aaron", 28)
This feature also provides a simplified way to alter the output of the toString method:
case class Person(name: String, age: Int) {
override def productPrefix = "person: "
}
// prints "person: (Aaron,28)" instead of "Person(Aaron, 28)"
println(Person("Aaron", 28))

It's not exactly hidden, but certainly a under advertised feature: scalac -Xprint.
As a illustration of the use consider the following source:
class A { "xx".r }
Compiling this with scalac -Xprint:typer outputs:
package <empty> {
class A extends java.lang.Object with ScalaObject {
def this(): A = {
A.super.this();
()
};
scala.this.Predef.augmentString("xx").r
}
}
Notice scala.this.Predef.augmentString("xx").r, which is a the application of the implicit def augmentString present in Predef.scala.
scalac -Xprint:<phase> will print the syntax tree after some compiler phase. To see the available phases use scalac -Xshow-phases.
This is a great way to learn what is going on behind the scenes.
Try with
case class X(a:Int,b:String)
using the typer phase to really feel how useful it is.

You can define your own control structures. It's really just functions and objects and some syntactic sugar, but they look and behave like the real thing.
For example, the following code defines dont {...} unless (cond) and dont {...} until (cond):
def dont(code: => Unit) = new DontCommand(code)
class DontCommand(code: => Unit) {
def unless(condition: => Boolean) =
if (condition) code
def until(condition: => Boolean) = {
while (!condition) {}
code
}
}
Now you can do the following:
/* This will only get executed if the condition is true */
dont {
println("Yep, 2 really is greater than 1.")
} unless (2 > 1)
/* Just a helper function */
var number = 0;
def nextNumber() = {
number += 1
println(number)
number
}
/* This will not be printed until the condition is met. */
dont {
println("Done counting to 5!")
} until (nextNumber() == 5)

#switch annotation in Scala 2.8:
An annotation to be applied to a match
expression. If present, the compiler
will verify that the match has been
compiled to a tableswitch or
lookupswitch, and issue an error if it
instead compiles into a series of
conditional expressions.
Example:
scala> val n = 3
n: Int = 3
scala> import annotation.switch
import annotation.switch
scala> val s = (n: #switch) match {
| case 3 => "Three"
| case _ => "NoThree"
| }
<console>:6: error: could not emit switch for #switch annotated match
val s = (n: #switch) match {

Dunno if this is really hidden, but I find it quite nice.
Typeconstructors that take 2 type parameters can be written in infix notation
object Main {
class FooBar[A, B]
def main(args: Array[String]): Unit = {
var x: FooBar[Int, BigInt] = null
var y: Int FooBar BigInt = null
}
}

Scala 2.8 introduced default and named arguments, which made possible the addition of a new "copy" method that Scala adds to case classes. If you define this:
case class Foo(a: Int, b: Int, c: Int, ... z:Int)
and you want to create a new Foo that's like an existing Foo, only with a different "n" value, then you can just say:
foo.copy(n = 3)

in scala 2.8 you can add #specialized to your generic classes/methods. This will create special versions of the class for primitive types (extending AnyVal) and save the cost of un-necessary boxing/unboxing :
class Foo[#specialized T]...
You can select a subset of AnyVals :
class Foo[#specialized(Int,Boolean) T]...

Extending the language. I always wanted to do something like this in Java (couldn't). But in Scala I can have:
def timed[T](thunk: => T) = {
val t1 = System.nanoTime
val ret = thunk
val time = System.nanoTime - t1
println("Executed in: " + time/1000000.0 + " millisec")
ret
}
and then write:
val numbers = List(12, 42, 3, 11, 6, 3, 77, 44)
val sorted = timed { // "timed" is a new "keyword"!
numbers.sortWith(_<_)
}
println(sorted)
and get
Executed in: 6.410311 millisec
List(3, 3, 6, 11, 12, 42, 44, 77)

You can designate a call-by-name parameter (EDITED: this is different then a lazy parameter!) to a function and it will not be evaluated until used by the function (EDIT: in fact, it will be reevaluated every time it is used). See this faq for details
class Bar(i:Int) {
println("constructing bar " + i)
override def toString():String = {
"bar with value: " + i
}
}
// NOTE the => in the method declaration. It indicates a lazy paramter
def foo(x: => Bar) = {
println("foo called")
println("bar: " + x)
}
foo(new Bar(22))
/*
prints the following:
foo called
constructing bar 22
bar with value: 22
*/

You can use locally to introduce a local block without causing semicolon inference issues.
Usage:
scala> case class Dog(name: String) {
| def bark() {
| println("Bow Vow")
| }
| }
defined class Dog
scala> val d = Dog("Barnie")
d: Dog = Dog(Barnie)
scala> locally {
| import d._
| bark()
| bark()
| }
Bow Vow
Bow Vow
locally is defined in "Predef.scala" as:
#inline def locally[T](x: T): T = x
Being inline, it does not impose any additional overhead.

Early Initialization:
trait AbstractT2 {
println("In AbstractT2:")
val value: Int
val inverse = 1.0/value
println("AbstractT2: value = "+value+", inverse = "+inverse)
}
val c2c = new {
// Only initializations are allowed in pre-init. blocks.
// println("In c2c:")
val value = 10
} with AbstractT2
println("c2c.value = "+c2c.value+", inverse = "+c2c.inverse)
Output:
In AbstractT2:
AbstractT2: value = 10, inverse = 0.1
c2c.value = 10, inverse = 0.1
We instantiate an anonymous inner
class, initializing the value field
in the block, before the with
AbstractT2 clause. This guarantees
that value is initialized before the
body of AbstractT2 is executed, as
shown when you run the script.

You can compose structural types with the 'with' keyword
object Main {
type A = {def foo: Unit}
type B = {def bar: Unit}
type C = A with B
class myA {
def foo: Unit = println("myA.foo")
}
class myB {
def bar: Unit = println("myB.bar")
}
class myC extends myB {
def foo: Unit = println("myC.foo")
}
def main(args: Array[String]): Unit = {
val a: A = new myA
a.foo
val b: C = new myC
b.bar
b.foo
}
}

placeholder syntax for anonymous functions
From The Scala Language Specification:
SimpleExpr1 ::= '_'
An expression (of syntactic category Expr) may contain embedded underscore symbols _ at places where identifiers are legal. Such an expression represents an anonymous function where subsequent occurrences of underscores denote successive parameters.
From Scala Language Changes:
_ + 1 x => x + 1
_ * _ (x1, x2) => x1 * x2
(_: Int) * 2 (x: Int) => x * 2
if (_) x else y z => if (z) x else y
_.map(f) x => x.map(f)
_.map(_ + 1) x => x.map(y => y + 1)
Using this you could do something like:
def filesEnding(query: String) =
filesMatching(_.endsWith(query))

Implicit definitions, particularly conversions.
For example, assume a function which will format an input string to fit to a size, by replacing the middle of it with "...":
def sizeBoundedString(s: String, n: Int): String = {
if (n < 5 && n < s.length) throw new IllegalArgumentException
if (s.length > n) {
val trailLength = ((n - 3) / 2) min 3
val headLength = n - 3 - trailLength
s.substring(0, headLength)+"..."+s.substring(s.length - trailLength, s.length)
} else s
}
You can use that with any String, and, of course, use the toString method to convert anything. But you could also write it like this:
def sizeBoundedString[T](s: T, n: Int)(implicit toStr: T => String): String = {
if (n < 5 && n < s.length) throw new IllegalArgumentException
if (s.length > n) {
val trailLength = ((n - 3) / 2) min 3
val headLength = n - 3 - trailLength
s.substring(0, headLength)+"..."+s.substring(s.length - trailLength, s.length)
} else s
}
And then, you could pass classes of other types by doing this:
implicit def double2String(d: Double) = d.toString
Now you can call that function passing a double:
sizeBoundedString(12345.12345D, 8)
The last argument is implicit, and is being passed automatically because of the implicit de declaration. Furthermore, "s" is being treated like a String inside sizeBoundedString because there is an implicit conversion from it to String.
Implicits of this type are better defined for uncommon types to avoid unexpected conversions. You can also explictly pass a conversion, and it will still be implicitly used inside sizeBoundedString:
sizeBoundedString(1234567890L, 8)((l : Long) => l.toString)
You can also have multiple implicit arguments, but then you must either pass all of them, or not pass any of them. There is also a shortcut syntax for implicit conversions:
def sizeBoundedString[T <% String](s: T, n: Int): String = {
if (n < 5 && n < s.length) throw new IllegalArgumentException
if (s.length > n) {
val trailLength = ((n - 3) / 2) min 3
val headLength = n - 3 - trailLength
s.substring(0, headLength)+"..."+s.substring(s.length - trailLength, s.length)
} else s
}
This is used exactly the same way.
Implicits can have any value. They can be used, for instance, to hide library information. Take the following example, for instance:
case class Daemon(name: String) {
def log(msg: String) = println(name+": "+msg)
}
object DefaultDaemon extends Daemon("Default")
trait Logger {
private var logd: Option[Daemon] = None
implicit def daemon: Daemon = logd getOrElse DefaultDaemon
def logTo(daemon: Daemon) =
if (logd == None) logd = Some(daemon)
else throw new IllegalArgumentException
def log(msg: String)(implicit daemon: Daemon) = daemon.log(msg)
}
class X extends Logger {
logTo(Daemon("X Daemon"))
def f = {
log("f called")
println("Stuff")
}
def g = {
log("g called")(DefaultDaemon)
}
}
class Y extends Logger {
def f = {
log("f called")
println("Stuff")
}
}
In this example, calling "f" in an Y object will send the log to the default daemon, and on an instance of X to the Daemon X daemon. But calling g on an instance of X will send the log to the explicitly given DefaultDaemon.
While this simple example can be re-written with overload and private state, implicits do not require private state, and can be brought into context with imports.

Maybe not too hidden, but I think this is useful:
#scala.reflect.BeanProperty
var firstName:String = _
This will automatically generate a getter and setter for the field that matches bean convention.
Further description at developerworks

Implicit arguments in closures.
A function argument can be marked as implicit just as with methods. Within the scope of the body of the function the implicit parameter is visible and eligible for implicit resolution:
trait Foo { def bar }
trait Base {
def callBar(implicit foo: Foo) = foo.bar
}
object Test extends Base {
val f: Foo => Unit = { implicit foo =>
callBar
}
def test = f(new Foo {
def bar = println("Hello")
})
}

Build infinite data structures with Scala's Streams :
http://www.codecommit.com/blog/scala/infinite-lists-for-the-finitely-patient

Result types are dependent on implicit resolution. This can give you a form of multiple dispatch:
scala> trait PerformFunc[A,B] { def perform(a : A) : B }
defined trait PerformFunc
scala> implicit val stringToInt = new PerformFunc[String,Int] {
def perform(a : String) = 5
}
stringToInt: java.lang.Object with PerformFunc[String,Int] = $anon$1#13ccf137
scala> implicit val intToDouble = new PerformFunc[Int,Double] {
def perform(a : Int) = 1.0
}
intToDouble: java.lang.Object with PerformFunc[Int,Double] = $anon$1#74e551a4
scala> def foo[A, B](x : A)(implicit z : PerformFunc[A,B]) : B = z.perform(x)
foo: [A,B](x: A)(implicit z: PerformFunc[A,B])B
scala> foo("HAI")
res16: Int = 5
scala> foo(1)
res17: Double = 1.0

Scala's equivalent of Java double brace initializer.
Scala allows you to create an anonymous subclass with the body of the class (the constructor) containing statements to initialize the instance of that class.
This pattern is very useful when building component-based user interfaces (for example Swing , Vaadin) as it allows to create UI components and declare their properties more concisely.
See http://spot.colorado.edu/~reids/papers/how-scala-experience-improved-our-java-development-reid-2011.pdf for more information.
Here is an example of creating a Vaadin button:
val button = new Button("Click me"){
setWidth("20px")
setDescription("Click on this")
setIcon(new ThemeResource("icons/ok.png"))
}

Excluding members from import statements
Suppose you want to use a Logger that contains a println and a printerr method, but you only want to use the one for error messages, and keep the good old Predef.println for standard output. You could do this:
val logger = new Logger(...)
import logger.printerr
but if logger also contains another twelve methods that you would like to import and use, it becomes inconvenient to list them. You could instead try:
import logger.{println => donotuseprintlnt, _}
but this still "pollutes" the list of imported members. Enter the über-powerful wildcard:
import logger.{println => _, _}
and that will do just the right thing™.

require method (defined in Predef) that allow you to define additional function constraints that would be checked during run-time. Imagine that you developing yet another twitter client and you need to limit tweet length up to 140 symbols. Moreover you can't post empty tweet.
def post(tweet: String) = {
require(tweet.length < 140 && tweet.length > 0)
println(tweet)
}
Now calling post with inappropriate length argument will cause an exception:
scala> post("that's ok")
that's ok
scala> post("")
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:145)
at .post(<console>:8)
scala> post("way to looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong tweet")
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:145)
at .post(<console>:8)
You can write multiple requirements or even add description to each:
def post(tweet: String) = {
require(tweet.length > 0, "too short message")
require(tweet.length < 140, "too long message")
println(tweet)
}
Now exceptions are verbose:
scala> post("")
java.lang.IllegalArgumentException: requirement failed: too short message
at scala.Predef$.require(Predef.scala:157)
at .post(<console>:8)
One more example is here.
Bonus
You can perform an action every time requirement fails:
scala> var errorcount = 0
errorcount: Int = 0
def post(tweet: String) = {
require(tweet.length > 0, {errorcount+=1})
println(tweet)
}
scala> errorcount
res14: Int = 0
scala> post("")
java.lang.IllegalArgumentException: requirement failed: ()
at scala.Predef$.require(Predef.scala:157)
at .post(<console>:9)
...
scala> errorcount
res16: Int = 1

Traits with abstract override methods are a feature in Scala that is as not widely advertised as many others. The intend of methods with the abstract override modifier is to do some operations and delegating the call to super. Then these traits have to be mixed-in with concrete implementations of their abstract override methods.
trait A {
def a(s : String) : String
}
trait TimingA extends A {
abstract override def a(s : String) = {
val start = System.currentTimeMillis
val result = super.a(s)
val dur = System.currentTimeMillis-start
println("Executed a in %s ms".format(dur))
result
}
}
trait ParameterPrintingA extends A {
abstract override def a(s : String) = {
println("Called a with s=%s".format(s))
super.a(s)
}
}
trait ImplementingA extends A {
def a(s: String) = s.reverse
}
scala> val a = new ImplementingA with TimingA with ParameterPrintingA
scala> a.a("a lotta as")
Called a with s=a lotta as
Executed a in 0 ms
res4: String = sa attol a
While my example is really not much more than a poor mans AOP, I used these Stackable Traits much to my liking to build Scala interpreter instances with predefined imports, custom bindings and classpathes. The Stackable Traits made it possible to create my factory along the lines of new InterpreterFactory with JsonLibs with LuceneLibs and then have useful imports and scope varibles for the users scripts.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Idiomatic Scala way of deserializing delimited strings into case classes - scala

Related

What scala language feature is this using? It seems to extract values from a JsObject and cast it?

How to express Function type?

What is the best way to return more than 2 different variable type values in Scala

Scala Functional "no-op" syntax [duplicate]

Hidden features of Scala

Categories

Resources