How to convert values of a case class into Seq? - scala

I am new to Scala and I need to provide values extracted from an object/case class as a Seq. I was wondering whether there is a generic way of extracting the values of an object into a Seq of those values, in order?
Convert the following:
case class Customer(name: Option[String], age: Int)
val customer = Customer(Some("John"), 24)
into:
val values = Seq("John", 24)

A case class extends the Product trait, which provides exactly such a method:
case class Person(age: Int, name: String, lastName: Option[String])
def seq(p: Product) = p.productIterator.toList
val s: Seq[Any] = seq(Person(100, "Albert", Some("Einstein")))
println(s) // List(100, Albert, Some(Einstein))
https://scalafiddle.io/sf/oD7qk8u/0
The problem is that you get an untyped list/array from it. Most of the time that is not the optimal way of doing things, and you should prefer statically typed solutions.
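To see the cost concretely (a small sketch reusing seq and Person from above): with a Seq[Any], recovering the element types requires a cast or pattern match at every use site:
val s: Seq[Any] = seq(Person(100, "Albert", Some("Einstein")))
s.head match {
  case age: Int => println(age + 1) // we must re-discover the type by matching
  case other    => println(s"unexpected: $other")
}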

Scala 3 (Dotty) gives us tuples that behave like HLists out of the box, which is a way of getting a product's values without losing type information. Given val picard = Customer(Some("Picard"), 75) consider the difference between
val l: List[Any] = picard.productIterator.toList
l(1)
// val res0: Any = 75
and
val hl: (Option[String], Int) = Tuple.fromProductTyped(picard)
hl(1)
// val res1: Int = 75
Note how res1 did not lose type information.
Informally, it might help to think of an HList as making a case class more generic by dropping its name whilst retaining its fields. For example, whilst Person and Robot are two separate models
Robot(name: Option[String], age: Int)
Person(name: Option[String], age: Int)
they could both be represented by a common "HList" that looks something like
(_: Option[String], _: Int) // I dropped the names
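Incidentally (a small Scala 3 sketch, using the same picard as above): even converting the typed tuple to a List does not collapse everything to Any, because Scala 3 keeps a union of the element types:
val values: List[Option[String] | Int] = Tuple.fromProductTyped(picard).toList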

If it's enough for you to have a Seq[Any], you can use the productIterator approach proposed by @Scalway. If I understood correctly, you also want to unpack Option fields. But you haven't specified what to do with the None case, like Customer(None, 24).
val values: Seq[Any] = customer.productIterator.map {
  case Some(x) => x
  case x       => x
}.toSeq // List(John, 24)
A statically typed solution would be to use a heterogeneous collection, e.g. a shapeless HList:
import shapeless._

class Default[A](val value: A)
object Default {
  implicit val int: Default[Int] = new Default(0)
  implicit val string: Default[String] = new Default("")
  //...
}

trait LowPriorityUnpackOption extends Poly1 {
  implicit def default[A]: Case.Aux[A, A] = at(identity)
}

object unpackOption extends LowPriorityUnpackOption {
  implicit def option[A](implicit default: Default[A]): Case.Aux[Option[A], A] = at {
    case Some(a) => a
    case None    => default.value
  }
}

val values: String :: Int :: HNil =
  Generic[Customer].to(customer).map(unpackOption) // John :: 24 :: HNil
Generally, it would be better to work with Options monadically rather than to unpack them.
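For example (a sketch using the customer from the question), you can map over the Option and choose a fallback only where one is actually needed:
val greeting: Option[String] = customer.name.map(n => s"Hello, $n")
val display: String = customer.name.getOrElse("<unknown>")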

Related

How do I put a case class in an rdd and have it act like a tuple(pair)?

Say, for example, I have a simple case class
case class Foo(k: String, v1: String, v2: String)
Can I get Spark to recognise this as a tuple for the purposes of something like the following, without converting to a tuple in, say, a map or keyBy step?
val rdd = sc.parallelize(List(Foo("k", "v1", "v2")))
// Swap values
rdd.mapValues(v => (v._2, v._1))
I don't even care if it loses the original case class after such an operation. I've tried the following with no luck. I'm fairly new to Scala; am I missing something?
case class Foo(k: String, v1: String, v2: String)
  extends Tuple2[String, (String, String)](k, (v1, v2))
edit: In the above snippet the case class extends Tuple2, but this does not produce the desired effect: the RDD class and functions do not treat it like a tuple and do not allow PairRDDFunctions, such as mapValues, values, reduceByKey, etc.
Extending TupleN isn't a good idea for a number of reasons, with one of the best being the fact that it's deprecated, and on 2.11 it's not even possible to extend TupleN with a case class. Even if you make your Foo a non-case class, defining it on 2.11 with -deprecation will show you this: "warning: inheritance from class Tuple2 in package scala is deprecated: Tuples will be made final in a future version.".
If what you care about is convenience of use and you don't mind the (almost certainly negligible) overhead of the conversion to a tuple, you can enrich a RDD[Foo] with the syntax provided by PairRDDFunctions with a conversion like this:
import org.apache.spark.rdd.{ PairRDDFunctions, RDD }

case class Foo(k: String, v1: String, v2: String)

implicit def fooToPairRDDFunctions(rdd: RDD[Foo]): PairRDDFunctions[String, (String, String)] =
  new PairRDDFunctions(
    rdd.map {
      case Foo(k, v1, v2) => k -> (v1, v2)
    }
  )
And then:
scala> val rdd = sc.parallelize(List(Foo("a", "b", "c"), Foo("d", "e", "f")))
rdd: org.apache.spark.rdd.RDD[Foo] = ParallelCollectionRDD[6] at parallelize at <console>:34
scala> rdd.mapValues(_._1).first
res0: (String, String) = (a,b)
The reason your version with Foo extending Tuple2[String, (String, String)] doesn't work is that RDD.rddToPairRDDFunctions targets an RDD[Tuple2[K, V]] and RDD isn't covariant in its type parameter, so an RDD[Foo] isn't a RDD[Tuple2[K, V]]. A simpler example might make this clearer:
case class Box[A](a: A)

class Foo(k: String, v: String) extends Tuple2[String, String](k, v)

class PairBoxFunctions(box: Box[(String, String)]) {
  def pairValue: String = box.a._2
}

implicit def toPairBoxFunctions(box: Box[(String, String)]): PairBoxFunctions =
  new PairBoxFunctions(box)
And then:
scala> Box(("a", "b")).pairValue
res0: String = b
scala> Box(new Foo("a", "b")).pairValue
<console>:16: error: value pairValue is not a member of Box[Foo]
       Box(new Foo("a", "b")).pairValue
                              ^
But if you make Box covariant…
case class Box[+A](a: A)

class Foo(k: String, v: String) extends Tuple2[String, String](k, v)

class PairBoxFunctions(box: Box[(String, String)]) {
  def pairValue: String = box.a._2
}

implicit def toPairBoxFunctions(box: Box[(String, String)]): PairBoxFunctions =
  new PairBoxFunctions(box)
…everything's fine:
scala> Box(("a", "b")).pairValue
res0: String = b
scala> Box(new Foo("a", "b")).pairValue
res1: String = b
You can't make RDD covariant, though, so defining your own implicit conversion to add the syntax is your best bet. Personally I'd probably choose to do the conversion explicitly, but this is a relatively un-horrible use of implicit conversions.
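For completeness, a sketch of the explicit variant, which avoids the implicit conversion entirely (in recent Spark versions the pair syntax is available automatically on an RDD of tuples):
val pairs: RDD[(String, (String, String))] =
  rdd.map { case Foo(k, v1, v2) => k -> (v1, v2) }

pairs.mapValues(_._1).first // (String, String) = (a,b)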
Not sure if I get your question right, but let's say you have a case class:
import org.apache.spark.rdd.RDD

case class DataFormat(id: Int, name: String, value: Double)

val data: Seq[(Int, String, Double)] = Seq(
  (1, "Joe", 0.1),
  (2, "Mike", 0.3)
)

val rdd: RDD[DataFormat] =
  sc.parallelize(data).map(x => DataFormat(x._1, x._2, x._3))

// Print all data
rdd.foreach(println)

// Print only names
rdd.map(x => x.name).foreach(println)

Different types in Map Scala

I need a Map where I can put different types of values (Double, String, Int, ...), with String keys.
Is there a way to do this, so that I get the correct type with map.apply(k) like
val map: Map[String, SomeType] = Map()
val d: Double = map.apply("double")
val str: String = map.apply("string")
I already tried it with a generic type
class Container[T](element: T) {
  def get: T = element
}

val d: Container[Double] = new Container(4.0)
val str: Container[String] = new Container("string")
val m: Map[String, Container] = Map("double" -> d, "string" -> str) // does not compile
but that's not possible, since Container takes a type parameter. Is there any solution to this?
This is not straightforward.
The type of the value depends on the key. So the key has to carry the information about what type its value is. This is a common pattern. It is used for example in SBT (see for example SettingsKey[T]) and Shapeless Records (Example). However, in SBT the keys are a huge, complex class hierarchy of its own, and the HList in shapeless is pretty complex and also does more than you want.
So here is a small example of how you could implement this. The key knows the type, and the only way to create a Record or to get a value out of a Record is the key. We use a Map[Key, Any] internally as storage, but the casts are hidden and guaranteed to succeed. There is an operator to create records from keys, and an operator to merge records. I chose the operators so you can concatenate Records without having to use brackets.
sealed trait Record {
  def apply[T](key: Key[T]): T
  def get[T](key: Key[T]): Option[T]
  def ++(that: Record): Record
}

private class RecordImpl(private val inner: Map[Key[_], Any]) extends Record {
  def apply[T](key: Key[T]): T = inner.apply(key).asInstanceOf[T]
  def get[T](key: Key[T]): Option[T] = inner.get(key).asInstanceOf[Option[T]]
  def ++(that: Record) = that match {
    case that: RecordImpl => new RecordImpl(this.inner ++ that.inner)
  }
}

final class Key[T] {
  def ~>(value: T): Record = new RecordImpl(Map(this -> value))
}

object Key {
  def apply[T] = new Key[T]
}
Here is how you would use this. First define some keys:
val a = Key[Int]
val b = Key[String]
val c = Key[Float]
Then use them to create a record
val record = a ~> 1 ++ b ~> "abc" ++ c ~> 1.0f
When accessing the record using the keys, you will get a value of the right type back
scala> record(a)
res0: Int = 1
scala> record(b)
res1: String = abc
scala> record(c)
res2: Float = 1.0
I find this sort of data structure very useful. Sometimes you need more flexibility than a case class provides, but you don't want to resort to something completely type-unsafe like a Map[String,Any]. This is a good middle ground.
Edit: another option would be to have a map that uses a (name, type) pair as the real key internally. You have to provide both the name and the type when getting a value. If you choose the wrong type there is no entry. However this has a big potential for errors, like when you put in a byte and try to get out an int. So I think this is not a good idea.
import reflect.runtime.universe.TypeTag

class TypedMap[K](val inner: Map[(K, TypeTag[_]), Any]) extends AnyVal {
  def updated[V](key: K, value: V)(implicit tag: TypeTag[V]) =
    new TypedMap[K](inner + ((key, tag) -> value))
  def apply[V](key: K)(implicit tag: TypeTag[V]) =
    inner.apply((key, tag)).asInstanceOf[V]
  def get[V](key: K)(implicit tag: TypeTag[V]) =
    inner.get((key, tag)).asInstanceOf[Option[V]]
}

object TypedMap {
  def empty[K] = new TypedMap[K](Map.empty)
}
Usage:
scala> val x = TypedMap.empty[String].updated("a", 1).updated("b", "a string")
x: TypedMap[String] = TypedMap@30e1a76d
scala> x.apply[Int]("a")
res0: Int = 1
scala> x.apply[String]("b")
res1: String = a string
// this is what happens when you try to get something out with the wrong type.
scala> x.apply[Int]("b")
java.util.NoSuchElementException: key not found: (b,Int)
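Since get (defined above) returns an Option, a lookup with the wrong type can also be handled without an exception:
scala> x.get[Int]("b")
res2: Option[Int] = None

scala> x.get[String]("b")
res3: Option[String] = Some(a string)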
This is now very straightforward in shapeless,
scala> import shapeless._ ; import syntax.singleton._ ; import record._
import shapeless._
import syntax.singleton._
import record._
scala> val map = ("double" ->> 4.0) :: ("string" ->> "foo") :: HNil
map: ... <complex type elided> ... = 4.0 :: foo :: HNil
scala> map("double")
res0: Double with shapeless.record.KeyTag[String("double")] = 4.0
scala> map("string")
res1: String with shapeless.record.KeyTag[String("string")] = foo
scala> map("double")+1.0
res2: Double = 5.0
scala> val map2 = map.updateWith("double")(_+1.0)
map2: ... <complex type elided> ... = 5.0 :: foo :: HNil
scala> map2("double")
res3: Double = 5.0
This is with shapeless 2.0.0-SNAPSHOT as of the date of this answer.
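A nice property of this approach is that looking up a key that isn't in the record is a compile-time error, not a runtime one; roughly (error message abbreviated):
scala> map("missing")
<console>: error: could not find implicit value for parameter selector ...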
I finally found my own solution, which worked best in my case:
case class Container[+T](element: T) {
  def get[T]: T = {
    element.asInstanceOf[T]
  }
}
val map: Map[String, Container[Any]] = Map("a" -> Container[Double](4.0), "b" -> Container[String]("test"))
val double: Double = map.apply("a").get[Double]
val string: String = map.apply("b").get[String]
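One caveat worth noting about this solution: get[T] is an unchecked cast (T is erased), so asking for the wrong type still compiles and only fails at the use site:
val oops: String = map.apply("a").get[String] // compiles, but throws ClassCastException at runtime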
(a) Scala containers don't track type information for what's placed inside them, and
(b) the return "type" for an apply/get method with a simple String parameter/key is going to be static for a given instance of the object the method is to be applied to.
This feels very much like a design decision that needs to be rethought.
I don't think there's a way to get bare map.apply() to do what you'd want. As the other answers suggest, some sort of container class will be necessary. Here's an example that restricts the values to be only certain types (String, Double, Int, in this case):
sealed trait MapVal
case class StringMapVal(value: String) extends MapVal
case class DoubleMapVal(value: Double) extends MapVal
case class IntMapVal(value: Int) extends MapVal

val myMap: Map[String, MapVal] =
  Map("key1" -> StringMapVal("value1"),
      "key2" -> DoubleMapVal(3.14),
      "key3" -> IntMapVal(42))

myMap.keys.foreach { k =>
  val message =
    myMap(k) match { // map.apply() in your example code
      case StringMapVal(x) => "string: %s".format(x)
      case DoubleMapVal(x) => "double: %.2f".format(x)
      case IntMapVal(x)    => "int: %d".format(x)
    }
  println(message)
}
The main benefit of the sealed trait is compile-time checking for non-exhaustive pattern matches.
I also like this approach because it's relatively simple by Scala standards. You can go off into the weeds for something more robust, but in my opinion you're into diminishing returns pretty quickly.
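To make the exhaustiveness benefit concrete, a small sketch: if a case is dropped from the match, the compiler warns.
myMap("key1") match {
  case StringMapVal(x) => x
  case DoubleMapVal(x) => x.toString
}
// warning: match may not be exhaustive.
// It would fail on the following input: IntMapVal(_)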
If you want to do this you'd have to specify the type of Container to be Any, because Any is a supertype of both Double and String.
val d: Container[Any] = new Container(4.0)
val str: Container[Any] = new Container("string")
val m: Map[String, Container[Any]] = Map("double" -> d, "string" -> str)
Or to make things easier, you can change the definition of Container so that it's covariant rather than invariant:
class Container[+T](element: T) {
  def get: T = element
  override def toString = s"Container($element)"
}
val d: Container[Double] = new Container(4.0)
val str: Container[String] = new Container("string")
val m: Map[String, Container[Any]] = Map("double" -> d, "string" -> str)
There is a way but it's complicated. See Unboxed union types in Scala. Essentially you'll have to type the Map to some type Int |v| Double to be able to hold both Int and Double. You'll also pay a high price in compile times.
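For reference, a condensed sketch of the encoding from that linked answer. Note that it constrains a method's type parameter; it does not give you a first-class element type you could use directly as a Map value type:
type ¬[A] = A => Nothing
type ¬¬[A] = ¬[¬[A]]
type ∨[T, U] = ¬[¬[T] with ¬[U]]
type |∨|[T, U] = { type λ[X] = ¬¬[X] <:< (T ∨ U) }

// size accepts an Int or a String, and nothing else, checked at compile time
def size[T: (Int |∨| String)#λ](t: T) = t match {
  case i: Int    => i
  case s: String => s.length
}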

DSL for safe navigation operator in Scala

I want to build a Scala DSL to convert from an existing structure of Java POJOs to a structure equivalent to a Map.
However, the incoming object structure is very likely to contain a lot of null references, which should result in no value in the output map.
Performance is very important in this context, so I need to avoid both reflection and throwing/catching NPEs.
I have already considered this topic, which does not meet my requirements.
I think the answer may lie in the use of macros to generate some special type, but I have no experience with Scala macros.
More formally:
POJO classes provided by the project (there will be around 50 POJOs, nested, so I want a solution that does not require hand-writing and maintaining a class or trait for each of them):
import scala.beans.BeanProperty

case class Level1(
  @BeanProperty var a: String,
  @BeanProperty var b: Int)

case class Level2(
  @BeanProperty var p: Level1,
  @BeanProperty var b: Int)
expected behaviour:
println(convert(null)) // == Map()
println(convert(Level2(null, 3))) // == Map("l2.b" -> 3)
println(convert(Level2(Level1("a", 2), 3))) // == Map(l2.p.a -> a, l2.p.b -> 2, l2.b -> 3)
a correct implementation, but I want an easier DSL for writing the mappings:
implicit def toOptionBuilder[T](f: => T) = new {
  def ? : Option[T] = Option(f)
}

def convert(l2: Level2): Map[String, _] = l2.? match {
  case None     => Map()
  case Some(o2) => convert(o2.p, "l2.p.") + ("l2.b" -> o2.b)
}

def convert(l1: Level1, prefix: String = ""): Map[String, _] = l1.? match {
  case None     => Map()
  case Some(o1) => Map(
    prefix + "a" -> o1.a,
    prefix + "b" -> o1.b)
}
Here is how I would like to write it with a DSL:
def convertDsl(l2: Level2) = {
  Map(
    "l2.b" -> l2?.b,
    "l2.p.a" -> l2?.l1?.a,
    "l2.p.b" -> l2?.l1?.b
  )
}
Note that it is perfectly fine for me to specify that a property is optional with '?'.
What I want is to statically generate, using a macro, a method l2.?l1 or l2?.l1 which returns Option[Level1] (so type checking is done correctly in my DSL).
I couldn't refine it down to precisely the syntax you gave above, but generally, something like this might work:
sealed trait FieldSpec

sealed trait ValueFieldSpec[+T] extends FieldSpec {
  def value: Option[T]
}

case class IntFieldSpec(value: Option[Int]) extends ValueFieldSpec[Int]
case class StringFieldSpec(value: Option[String]) extends ValueFieldSpec[String]

case class Level1FieldSpec(input: Option[Level1]) extends FieldSpec {
  def a: ValueFieldSpec[_] = StringFieldSpec(input.map(_.a))
  def b: ValueFieldSpec[_] = IntFieldSpec(input.map(_.b))
}

case class Level2FieldSpec(input: Option[Level2]) extends FieldSpec {
  def b: ValueFieldSpec[_] = IntFieldSpec(input.map(_.b))
  def l1 = Level1FieldSpec(input.map(_.p))
}

case class SpecArrowAssoc(str: String) {
  def ->(value: ValueFieldSpec[_]) = (str, value)
}

implicit def str2SpecArrowAssoc(str: String) = SpecArrowAssoc(str)
implicit def Level2ToFieldSpec(input: Option[Level2]) = Level2FieldSpec(input)

def map(fields: (String, ValueFieldSpec[_])*): Map[String, _] =
  Map[String, Any]((for {
    field <- fields
    value <- field._2.value
  } yield (field._1, value)): _*)

// Note: this relies on the question's toOptionBuilder implicit being in
// scope, so that l2.? yields an Option[Level2], which is then converted
// to a Level2FieldSpec by Level2ToFieldSpec.
def convertDsl(implicit l2: Level2): Map[String, _] = {
  map(
    "l2.b" -> l2.?.b,
    "l2.p.a" -> l2.?.l1.a,
    "l2.p.b" -> l2.?.l1.b
  )
}
Then we get:
scala> val myL2 = Level2(Level1("a", 2), 3)
myL2: Level2 = Level2(Level1(a,2),3)
scala> convertDsl(myL2)
res0: scala.collection.immutable.Map[String,Any] = Map(l2.b -> 3, l2.p.a -> a, l2.p.b -> 2)
Note that the DSL uses '.?' rather than just '?', as that was the only way I could see around Scala's trouble with semicolon inference and postfix operators (see, e.g., @0__'s answer to this Scala syntactic sugar question).
Also, the strings you provide are arbitrary (no checking or parsing of them is done), and this simplistic 'FieldSpec' hierarchy assumes that all your POJOs use 'a' for String fields and 'b' for Int fields, etc.
With a bit of time and effort I'm sure this could be improved on.

construct case class from collection of parameters

Given:
case class Thing(a:Int, b:String, c:Double)
val v = Vector(1, "str", 7.3)
I want something that will magically create:
Thing(1, "str", 7.3)
Does such a thing exist (for arbitrary size Things)?
My first time dipping my toes into the 2.10 experimental reflection facilities. So mostly following this outline http://docs.scala-lang.org/overviews/reflection/overview.html, I came up with this:
import scala.reflect.runtime.{universe => ru}

case class Thing(a: Int, b: String, c: Double)

object Test {
  def main(args: Array[String]) {
    val v = Vector(1, "str", 7.3)
    val thing: Thing = Ref.runtimeCtor[Thing](v)
    println(thing) // prints: Thing(1,str,7.3)
  }
}

object Ref {
  def runtimeCtor[T: ru.TypeTag](args: Seq[Any]): T = {
    val typeTag = ru.typeTag[T]
    val runtimeMirror = ru.runtimeMirror(getClass.getClassLoader)
    val classSymbol = typeTag.tpe.typeSymbol.asClass
    val classMirror = runtimeMirror.reflectClass(classSymbol)
    val constructorSymbol = typeTag.tpe.declaration(ru.nme.CONSTRUCTOR).asMethod
    val constructorMirror = classMirror.reflectConstructor(constructorSymbol)
    constructorMirror(args: _*).asInstanceOf[T]
  }
}
Note that when I had the case class inside the main method, this did not compile. I don't know if type tags can only be generated for non-inner case classes.
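One caveat worth spelling out: because everything happens through runtime reflection, an argument list that doesn't match the constructor only fails at runtime, not at compile time. A hypothetical misuse:
// Too few arguments for Thing's constructor: this compiles, but fails
// at runtime with an arity error from the reflection layer.
Ref.runtimeCtor[Thing](Vector(1, "str"))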
I don't know if it's possible to get a solution that fails with a compile-time error, but this is my solution using pattern matching:
case class Thing(a: Int, b: String, c: Double)

def printThing(t: Thing) {
  println(t.toString)
}

implicit def vectToThing(v: Vector[Any]) = v match {
  case Vector(a: Int, b: String, c: Double) => new Thing(a, b, c)
}

val v = Vector(1, "str", 7.3) // this is of type Vector[Any]
printThing(v) // prints Thing(1,str,7.3)
printThing(Vector(2.0, 1.0)) // this is actually a MatchError
Is there an actual purpose to this "Thing" conversion, or would you rather use a Tuple3[Int, String, Double] instead of a Vector[Any]?
From your question it's not clear what you will use it for. What you call a Thing might actually be an HList or a KList. HList stands for Heterogeneous List, which is an "arbitrary-length tuple".
I am unsure how hard it would be to add an 'unapply' or 'unapplySeq' method in order for it to behave more like a case class.
I have little experience with them, but a good explanation can be found here: http://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/
If this is not what you need, it might be a good idea to tell us what you want to achieve.

Can I name a tuple (define a structure?) in Scala 2.8?

It does not look very good to repeat a line-long tuple definition every time I need it. Can I just name it and use the name as a type? It would also be nice to name its fields instead of using ._1, ._2, etc.
Regarding your first question, you can simply use a type alias:
type KeyValue = (Int, String)
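For example (a small usage sketch), the alias then works anywhere the tuple type is expected:
val kv: KeyValue = (42, "Hello")
def describe(kv: KeyValue): String = s"${kv._1} -> ${kv._2}"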
And, of course, Scala is an object-oriented language, so regarding your second question about how to specialize a tuple, the magic word is inheritance:
case class KeyValue(key: Int, value: String) extends (Int, String)(key, value)
That's it. The class doesn't even need a body.
val kvp = KeyValue(42, "Hello")
kvp._1 // => res0: Int = 42
kvp.value // => res1: String = "Hello"
Note, however, that inheriting from case classes (which Tuple2 is) is deprecated and may be disallowed in the future. Here's the compiler warning you get for the above class definition:
warning: case class class KeyValue has case class ancestor class Tuple2. This has been deprecated for unduly complicating both usage and implementation. You should instead use extractors for pattern matching on non-leaf nodes.
A type alias is fine for naming your tuple, but try using a case class instead: you will be able to use named parameters.
Example with a tuple:
def foo(a: Int): (Int, String) = {
  (a, "bar")
}

val res = foo(1)
val size = res._1
val name = res._2
With a case class:
case class Result(size: Int, name: String)

def foo(a: Int): Result = {
  Result(a, "bar")
}

val res = foo(1)
val size = res.size
val name = res.name
Here's a solution that creates a type alias and a factory object.
scala> type T = (Int, String)
defined type alias T
scala> object T { def apply(i: Int, s: String): T = (i, s) }
defined module T
scala> new T(1, "a")
res0: (Int, String) = (1,a)
scala> T(1, "a")
res1: (Int, String) = (1,a)
However as others have mentioned, you probably should just create a case class.
As others have said, explicit (case) classes are best in the general case.
However, for localized scenarios you can use the tuple extractor to improve code readability:
val (first, second) = incrementPair(3, 4)
println(s"$first ... $second")
Given a method returning a tuple:
def incrementPair(pair: (Int, Int)): (Int, Int) = {
  val (first, second) = pair
  (first + 1, second + 1)
}