Get each field as an Option from an optional object - Scala

I have a case class that looks like this:
case class EmotionData(
  fearful: Double,
  angry: Double,
  sad: Double,
  neutral: Double,
  disgusted: Double,
  surprised: Double,
  happy: Double
)
I receive an Option[EmotionData] and I need each emotion data as an Option[Double].
What I did was:
val (fearful, angry, sad, neutral, disgusted, surprised, happy) = videoResult.emotion match {
  case Some(e) => (Some(e.fearful), Some(e.angry), Some(e.sad), Some(e.neutral), Some(e.disgusted), Some(e.surprised), Some(e.happy))
  case None => (None, None, None, None, None, None, None)
}
This way I have each field as an Option[Double] value.
But isn't there a way to do this in Scala where I can iterate through all fields of an object and extract them without rewriting each field?

Here's a slightly different approach that might be, perhaps, a little more palatable.
val vidEmo: Option[EmotionData] = videoResult.emotion
val (fearful, angry, sad, neutral, disgusted, surprised, happy) =
(vidEmo.map(_.fearful)
,vidEmo.map(_.angry)
,vidEmo.map(_.sad)
,vidEmo.map(_.neutral)
,vidEmo.map(_.disgusted)
,vidEmo.map(_.surprised)
,vidEmo.map(_.happy))
But really, you should just keep vidEmo around and extract what you need when you need it.

Yes, there is a way to iterate through the fields of an object, using productIterator. It would look something like this:
val List(fearful, angry, sad, neutral, disgusted, surprised, happy) =
videoResult.emotion.map(_.productIterator.map(f => Some(f.asInstanceOf[Double])).toList)
.getOrElse(List.fill(7)(None))
As you can see, this isn't much better than what you already have, and is more prone to error. The problem is that the number and order of fields is explicit in the result you have specified, so there are limits to how much this can be automated. And this only works because the type of all the fields is the same.
Personally I would keep the value as Option[EmotionData] as long as possible, and pick out individual values as needed, like this:
val opt = videoResult.emotion
val fearful = opt.map(_.fearful) // Option[Double]
val angry = opt.map(_.angry) // Option[Double]
val sad = opt.map(_.sad) // Option[Double]
val happy = opt.fold(0.0)(_.happy) // Double, default is 0.0 if opt is None
val ok = opt.forall(e => e.happy > e.sad) // True if emotion not set or more happy than sad
val disgusted = opt.exists(_.disgusted > 1.0) // True if emotion is set and disgusted value is large

Maybe this?
case class EmotionData(
fearful: Double,
angry: Double,
sad: Double,
neutral: Double,
disgusted: Double,
surprised: Double,
happy: Double
)
val s = Some(EmotionData(1,2,3,4,5,6,7))
val n:Option[EmotionData] = None
val emotionsOpt = s.map { x =>
x.productIterator.toVector.map(x => Some(x.asInstanceOf[Double]))
}.getOrElse(List.fill(7)(None))
// Or if you want an iterator:
val emotionsOptItr = n.map { x =>
x.productIterator.map(x => Some(x.asInstanceOf[Double]))
}.getOrElse(List.fill(7)(None))
println(emotionsOpt)
println(emotionsOptItr)
Which results in:
Vector(Some(1), Some(2), Some(3), Some(4), Some(5), Some(6), Some(7))
List(None, None, None, None, None, None, None)

You can do something like this:
object Solution1 extends App {
  val defaultEmotionData = EmotionData(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
case class EmotionData(
fearful: Double,
angry: Double,
sad: Double,
neutral: Double,
disgusted: Double,
surprised: Double,
happy: Double
)
case class EmotionDataOption(
fearfulOpt: Option[Double],
angryOpt: Option[Double],
sadOpt: Option[Double],
neutralOpt: Option[Double],
disgustedOpt: Option[Double],
surprisedOpt: Option[Double],
happyOpt: Option[Double]
)
val emotion = Some(EmotionData(1.2, 3.4, 5, 6, 7.8, 3, 12))
val ans: EmotionDataOption = emotion.getOrElse(defaultEmotionData).toOption
implicit class ToOption(emotionData: EmotionData) {
  def toOption = EmotionDataOption(
    Some(emotionData.fearful), Some(emotionData.angry), Some(emotionData.sad),
    Some(emotionData.neutral), Some(emotionData.disgusted),
    Some(emotionData.surprised), Some(emotionData.happy)
  )
}
}
Now wherever you have an object of type EmotionData, you can call toOption on it to convert it into an EmotionDataOption, whose fields are all Option[Double].
If you return a Tuple7 the values are awkward to access, which is why I think converting into another case class EmotionDataOption is a good idea: you can access the values easily by parameter name.
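For reference, a self-contained sketch of that extension in use, runnable as a script (names follow the answer above):

```scala
case class EmotionData(
  fearful: Double, angry: Double, sad: Double, neutral: Double,
  disgusted: Double, surprised: Double, happy: Double
)

case class EmotionDataOption(
  fearfulOpt: Option[Double], angryOpt: Option[Double], sadOpt: Option[Double],
  neutralOpt: Option[Double], disgustedOpt: Option[Double],
  surprisedOpt: Option[Double], happyOpt: Option[Double]
)

implicit class ToOption(e: EmotionData) {
  def toOption: EmotionDataOption = EmotionDataOption(
    Some(e.fearful), Some(e.angry), Some(e.sad), Some(e.neutral),
    Some(e.disgusted), Some(e.surprised), Some(e.happy)
  )
}

// Field access by name, which a Tuple7 would not give you:
val opts = EmotionData(1.2, 3.4, 5, 6, 7.8, 3, 12).toOption
println(opts.happyOpt)    // Some(12.0)
println(opts.fearfulOpt)  // Some(1.2)
```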

Related

How to convert values of a case class into Seq?

I am new to Scala and I need to provide values extracted from a case class as a Seq. Is there a generic way of extracting the values of an object into a Seq of those values, in order?
Convert the following:
case class Customer(name: Option[String], age: Int)
val customer = Customer(Some("John"), 24)
into:
val values = Seq("John", 24)
A case class extends Product, which provides exactly such a method:
case class Person(age:Int, name:String, lastName:Option[String])
def seq(p:Product) = p.productIterator.toList
val s:Seq[Any] = seq(Person(100, "Albert", Some("Einstain")))
println(s) //List(100, Albert, Some(Einstain))
https://scalafiddle.io/sf/oD7qk8u/0
The problem is that you get an untyped list/array from it. Most of the time this is not the optimal way of doing things, and you should prefer statically typed solutions.
Scala 3 (Dotty) gives us typed tuples out of the box, which are a way of getting a product's values without losing type information. Given val picard = Customer(Some("Picard"), 75), consider the difference between
val l: List[Any] = picard.productIterator.toList
l(1)
// val res0: Any = 75
and
val hl: (Option[String], Int) = Tuple.fromProductTyped(picard)
hl(1)
// val res1: Int = 75
Note how res1 did not lose type information.
Informally, it might help to think of an HList as making a case class more generic by dropping its name whilst retaining its fields, for example, whilst Person and Robot are two separate models
Robot(name: Option[String], age: Int)
Person(name: Option[String], age: Int)
they could both be represented by a common tuple ("HList") that looks something like
(_: Option[String], _: Int) // I dropped the names
If it's enough for you to have Seq[Any], you can use the productIterator approach proposed by @Scalway. If I understood correctly, you also want to unpack the Option fields. But you haven't specified what should happen in the None case, e.g. Customer(None, 24).
val values: Seq[Any] = customer.productIterator.map {
case Some(x) => x
case x => x
}.toSeq // List(John, 24)
A statically typed solution would be to use a heterogeneous collection, e.g. shapeless's HList:
import shapeless._

class Default[A](val value: A)

object Default {
  implicit val int: Default[Int] = new Default(0)
  implicit val string: Default[String] = new Default("")
  //...
}

trait LowPriorityUnpackOption extends Poly1 {
  implicit def default[A]: Case.Aux[A, A] = at(identity)
}

object unpackOption extends LowPriorityUnpackOption {
  implicit def option[A](implicit default: Default[A]): Case.Aux[Option[A], A] = at {
    case Some(a) => a
    case None => default.value
  }
}

val values: String :: Int :: HNil =
  Generic[Customer].to(customer).map(unpackOption) // John :: 24 :: HNil
Generally it would be better to work with Option monadically rather than to unpack them.
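As a sketch of what "working with Option monadically" looks like here (the Customer shape is taken from the question):

```scala
case class Customer(name: Option[String], age: Int)

val customer = Customer(Some("John"), 24)

// Combine the optional field with the rest of the data without unpacking it:
val greeting: Option[String] =
  for (n <- customer.name) yield s"$n is ${customer.age}"

println(greeting)  // Some(John is 24)

// The None case simply propagates; no default value is needed:
println(Customer(None, 24).name.map(_.toUpperCase))  // None
```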

Spark: sum over list containing None and Some()?

I already understand that I can sum over a list easily using List.sum:
var mylist = List(1,2,3,4,5)
mylist.sum
// res387: Int = 15
However, I have a list that contains elements like None and Some(1). These values were produced after running a left outer join.
Now, when I try to run List.sum, I get an error:
var mylist= List(Some(0), None, Some(0), Some(0), Some(1))
mylist.sum
<console>:27: error: could not find implicit value for parameter num: Numeric[Option[Int]]
mylist.sum
^
How can I fix this problem? Can I somehow convert the None and Some values to integers, perhaps right after the left outer join?
You can use the List.collect method with pattern matching:
mylist.collect{ case Some(x) => x }.sum
// res9: Int = 1
This ignores the None element.
Another option is to use getOrElse on the Option to extract the values, here you can choose what value you want to replace None with:
mylist.map(_.getOrElse(0)).sum
// res10: Int = 1
I find the easiest way to deal with a collection of Option[A] is to flatten it:
val myList = List(Some(0), None, Some(0), Some(0), Some(1))
myList.flatten.sum
The call to flatten will remove all None values and turn the remaining Some[Int] into plain old Int--ultimately leaving you with a collection of Int.
And by the way, embrace that immutability is a first-class citizen in Scala and prefer val to var.
If you want to avoid creating extra intermediate collections with flatten or map you should consider using an Iterator, e.g.
mylist.iterator.flatten.sum
or
mylist.iterator.collect({ case Some(x) => x }).sum
or
mylist.iterator.map(_.getOrElse(0)).sum
I think the first and second approaches are a bit better since they avoid unnecessary additions of 0. I'd probably go with the first approach due to its simplicity.
If you want to get a bit fancy (or needed the extra generality) you could define your own Numeric[Option[Int]] instance. Something like this should work for any type Option[N] where type N itself has a Numeric instance, i.e. Option[Int], Option[Double], Option[BigInt], Option[Option[Int]], etc.
implicit def optionNumeric[N](implicit num: Numeric[N]): Numeric[Option[N]] =
  new Numeric[Option[N]] {
    def compare(x: Option[N], y: Option[N]) = ??? //left as an exercise :-)
    def fromInt(x: Int) = if (x != 0) Some(num.fromInt(x)) else None
    def minus(x: Option[N], y: Option[N]) = x.map(vx => y.map(num.minus(vx, _)).getOrElse(vx)).orElse(negate(y))
    def negate(x: Option[N]) = x.map(num.negate(_))
    def plus(x: Option[N], y: Option[N]) = x.map(vx => y.map(num.plus(vx, _)).getOrElse(vx)).orElse(y)
    def times(x: Option[N], y: Option[N]) = x.flatMap(vx => y.map(num.times(vx, _)))
    def toDouble(x: Option[N]) = x.map(num.toDouble(_)).getOrElse(0d)
    def toFloat(x: Option[N]) = x.map(num.toFloat(_)).getOrElse(0f)
    def toInt(x: Option[N]) = x.map(num.toInt(_)).getOrElse(0)
    def toLong(x: Option[N]) = x.map(num.toLong(_)).getOrElse(0L)
    override val zero = None
    override val one = Some(num.one)
  }
Examples:
List(Some(3), None, None, Some(5), Some(1), None).sum
//Some(9)
List[Option[Int]](Some(2), Some(4)).product
//Some(8)
List(Some(2), Some(4), None).product
//None
List(Some(Some(3)), Some(None), Some(Some(5)), None, Some(Some(1)), Some(None)).sum
//Some(Some(9))
List[Option[Option[Int]]](Some(Some(2)), Some(Some(4))).product
//Some(Some(8))
List[Option[Option[Int]]](Some(Some(2)), Some(Some(4)), None).product
//None
List[Option[Option[Int]]](Some(Some(2)), Some(Some(4)), Some(None)).product
//Some(None) !?!?!
Note that there may be multiple ways of representing "zero", e.g. None or Some(0) in the case of Option[Int], though preference is given to None. Also, note this approach contains the basic idea of how one goes about turning a semigroup (without an additive identity) into a monoid.
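The semigroup-to-monoid idea mentioned above can be sketched without the full Numeric machinery: wrapping values in Option adds the missing identity element (None) to any associative combine operation:

```scala
// Lift an associative combine (a semigroup op) to Option, gaining an identity.
def combineOpt[A](combine: (A, A) => A)(x: Option[A], y: Option[A]): Option[A] =
  (x, y) match {
    case (Some(a), Some(b)) => Some(combine(a, b))
    case (a, None)          => a
    case (None, b)          => b
  }

val total = List(Some(3), None, Some(5), Some(1))
  .foldLeft(Option.empty[Int])(combineOpt[Int](_ + _))
println(total)  // Some(9)
```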
You can use a .fold or .reduce and implement the sum of two Options manually, but I would go with the @Psidom approach.
Folding over the list is a more optimized solution. Beware of chaining function calls on collections, as you may be iterating over something like a List multiple times.
A more optimized approach would look something like
val foo = List(Some(1), Some(2), None, Some(3))
foo.foldLeft(0)((acc, optNum) => acc + optNum.getOrElse(0))

How do I put a case class in an rdd and have it act like a tuple(pair)?

Say for example, I have a simple case class
case class Foo(k:String, v1:String, v2:String)
Can I get Spark to recognise this as a tuple for the purposes of something like the following, without converting to a tuple in, say, a map or keyBy step?
val rdd = sc.parallelize(List(Foo("k", "v1", "v2")))
// Swap values
rdd.mapValues(v => (v._2, v._1))
I don't even care if it loses the original case class after such an operation. I've tried the following with no luck. I'm fairly new to Scala; am I missing something?
case class Foo(k:String, v1:String, v2:String)
extends Tuple2[String, (String, String)](k, (v1, v2))
edit: In the above snippet the case class extends Tuple2, but this does not produce the desired effect: the RDD class and functions still do not treat it like a tuple and do not allow the PairRDDFunctions, such as mapValues, values, reduceByKey, etc.
Extending TupleN isn't a good idea for a number of reasons, with one of the best being the fact that it's deprecated, and on 2.11 it's not even possible to extend TupleN with a case class. Even if you make your Foo a non-case class, defining it on 2.11 with -deprecation will show you this: "warning: inheritance from class Tuple2 in package scala is deprecated: Tuples will be made final in a future version.".
If what you care about is convenience of use and you don't mind the (almost certainly negligible) overhead of the conversion to a tuple, you can enrich a RDD[Foo] with the syntax provided by PairRDDFunctions with a conversion like this:
import org.apache.spark.rdd.{ PairRDDFunctions, RDD }
case class Foo(k: String, v1: String, v2: String)
implicit def fooToPairRDDFunctions[K, V](rdd: RDD[Foo]): PairRDDFunctions[String, (String, String)] =
  new PairRDDFunctions(
    rdd.map {
      case Foo(k, v1, v2) => k -> (v1, v2)
    }
  )
And then:
scala> val rdd = sc.parallelize(List(Foo("a", "b", "c"), Foo("d", "e", "f")))
rdd: org.apache.spark.rdd.RDD[Foo] = ParallelCollectionRDD[6] at parallelize at <console>:34
scala> rdd.mapValues(_._1).first
res0: (String, String) = (a,b)
The reason your version with Foo extending Tuple2[String, (String, String)] doesn't work is that RDD.rddToPairRDDFunctions targets an RDD[Tuple2[K, V]] and RDD isn't covariant in its type parameter, so an RDD[Foo] isn't a RDD[Tuple2[K, V]]. A simpler example might make this clearer:
case class Box[A](a: A)
class Foo(k: String, v: String) extends Tuple2[String, String](k, v)
class PairBoxFunctions(box: Box[(String, String)]) {
def pairValue: String = box.a._2
}
implicit def toPairBoxFunctions(box: Box[(String, String)]): PairBoxFunctions =
new PairBoxFunctions(box)
And then:
scala> Box(("a", "b")).pairValue
res0: String = b
scala> Box(new Foo("a", "b")).pairValue
<console>:16: error: value pairValue is not a member of Box[Foo]
Box(new Foo("a", "b")).pairValue
^
But if you make Box covariant…
case class Box[+A](a: A)
class Foo(k: String, v: String) extends Tuple2[String, String](k, v)
class PairBoxFunctions(box: Box[(String, String)]) {
def pairValue: String = box.a._2
}
implicit def toPairBoxFunctions(box: Box[(String, String)]): PairBoxFunctions =
new PairBoxFunctions(box)
…everything's fine:
scala> Box(("a", "b")).pairValue
res0: String = b
scala> Box(new Foo("a", "b")).pairValue
res1: String = b
You can't make RDD covariant, though, so defining your own implicit conversion to add the syntax is your best bet. Personally I'd probably choose to do the conversion explicitly, but this is a relatively un-horrible use of implicit conversions.
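The explicit conversion mentioned at the end is just a single map before the pair operations. Sketched here without a Spark runtime (plain collections stand in for the RDD, so the shapes are the same but the API is the standard library's):

```scala
case class Foo(k: String, v1: String, v2: String)

val foos = List(Foo("a", "b", "c"), Foo("d", "e", "f"))

// Explicitly convert to key -> value pairs once, then use pair-style operations:
val pairs: List[(String, (String, String))] =
  foos.map { case Foo(k, v1, v2) => k -> (v1, v2) }

// Analogous to rdd.mapValues(_._1):
println(pairs.map { case (k, v) => (k, v._1) }.head)  // (a,b)
```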
Not sure if I get your question right, but let's say you have a case class:
import org.apache.spark.rdd.RDD
case class DataFormat(id: Int, name: String, value: Double)
val data: Seq[(Int, String, Double)] = Seq(
(1, "Joe", 0.1),
(2, "Mike", 0.3)
)
val rdd: RDD[DataFormat] =
  sc.parallelize(data).map(x => DataFormat(x._1, x._2, x._3))
// Print all data
rdd.foreach(println)
// Print only names
rdd.map(x=>x.name).foreach(println)

construct case class from collection of parameters

Given:
case class Thing(a:Int, b:String, c:Double)
val v = Vector(1, "str", 7.3)
I want something that will magically create:
Thing(1, "str", 7.3)
Does such a thing exist (for arbitrary size Things)?
My first time dipping my toes into the 2.10 experimental reflection facilities. So mostly following this outline http://docs.scala-lang.org/overviews/reflection/overview.html, I came up with this:
import scala.reflect.runtime.{universe=>ru}
case class Thing(a: Int, b: String, c: Double)
object Test {
def main(args: Array[String]) {
val v = Vector(1, "str", 7.3)
val thing: Thing = Ref.runtimeCtor[Thing](v)
println(thing) // prints: Thing(1,str,7.3)
}
}
object Ref {
def runtimeCtor[T: ru.TypeTag](args: Seq[Any]): T = {
val typeTag = ru.typeTag[T]
val runtimeMirror = ru.runtimeMirror(getClass.getClassLoader)
val classSymbol = typeTag.tpe.typeSymbol.asClass
val classMirror = runtimeMirror.reflectClass(classSymbol)
val constructorSymbol = typeTag.tpe.declaration(ru.nme.CONSTRUCTOR).asMethod
val constructorMirror = classMirror.reflectConstructor(constructorSymbol)
constructorMirror(args: _*).asInstanceOf[T]
}
}
Note that this did not compile when I had the case class inside the main method. I don't know whether type tags can only be generated for non-inner case classes.
I don't know whether it's possible to get a solution that fails at compile time rather than at runtime, but this is my solution using matching:
case class Thing(a: Int, b: String, c: Double)
def printThing(t: Thing) {
println(t.toString)
}
implicit def vectToThing(v: Vector[Any]) = v match {
case (Vector(a: Int, b: String, c: Double)) => new Thing(a, b, c)
}
val v = Vector(1, "str", 7.3) // this is of type Vector[Any]
printThing(v) // prints Thing(1,str,7.3)
printThing(Vector(2.0, 1.0)) // this is actually a MatchError
Is there an actual purpose to this "Thing" conversion, or would you rather use a Tuple3[Int, String, Double] instead of Vector[Any]?
From your question it's not clear what you will use it for. What you call a Thing might actually be an HList or a KList. HList stands for heterogeneous list, which is an "arbitrary-length tuple".
I am unsure how hard it would be to add an unapply or unapplySeq method in order for it to behave more like a case class.
I have little experience with them, but a good explanation can be found here: http://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/
If this is not what you need it might be a good idea to tell us what you want to achieve.
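As an aside: on Scala 3 the typed direction of this is built in via Mirror, so a typed tuple (rather than a Vector[Any]) can be turned into the case class without runtime reflection. A minimal sketch (Scala 3 only):

```scala
case class Thing(a: Int, b: String, c: Double)

// Scala 3: build a case class from a typed tuple instead of a Vector[Any].
val t: (Int, String, Double) = (1, "str", 7.3)
val thing: Thing = summon[deriving.Mirror.ProductOf[Thing]].fromProduct(t)
println(thing)  // Thing(1,str,7.3)
```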

Best way to store current extremum in collection type

I’m currently a little tired, so I might be missing the obvious.
I have a var _minVal: Option[Double], which shall hold the minimal value contained in a collection of Doubles (or None if the collection is empty).
When adding a new item to the collection, I have to check whether _minVal is either None or greater than the new item (the candidate for the new minimum).
I’ve gone from
_minVal = Some(_minVal match {
  case Some(oldMin) => if (candidate < oldMin) candidate else oldMin
  case None => candidate
})
(not very DRY) to
_minVal = Some(min(_minVal getOrElse candidate, candidate))
but still think I might be missing something…
Without Scalaz, you are going to pay some RY. But I'd write it as:
_minVal = _minVal map (candidate min) orElse Some(candidate)
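A quick sketch checking that one-liner against the original match version (the candidate values are hypothetical):

```scala
var minVal: Option[Double] = None

// The one-liner: keep the smaller of the old minimum and the candidate,
// or start with the candidate if no minimum exists yet.
def update(candidate: Double): Unit =
  minVal = minVal.map(_ min candidate) orElse Some(candidate)

update(3.0); update(1.5); update(2.0)
println(minVal)  // Some(1.5)
```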
EDIT
Eric Torreborre, of Specs/Specs2 fame, was kind enough to pursue the Scalaz solution that has eluded me. Being a testing framework guy, he wrote the answer in a testing format, instead of the imperative, side-effecting original. :-)
Here's the version using _minVal, Double instead of Int, side-effects, and some twists of mine now that Eric has done the hard work.
// From the question (candidate provided for testing purposes)
var _minVal: Option[Double] = None
def candidate = scala.util.Random.nextDouble
// A function "min"
def min = (_: Double) min (_: Double)
// A function "orElse"
def orElse = (_: Option[Double]) orElse (_: Option[Double])
// Extract function to decrease noise
def updateMin = _minVal map min.curried(_: Double)
// This is the Scalaz vesion for the above -- type inference is not kind to it
// def updateMin = (_minVal map min.curried).sequence[({type lambda[a] = (Double => a)})#lambda, Double]
// Say the magic words
import scalaz._
import Scalaz._
def orElseSome = (Option(_: Double)) andThen orElse.flip.curried
def updateMinOrSome = updateMin <*> orElseSome
// TAH-DAH!
_minVal = updateMinOrSome(candidate)
Here is an update to Daniel's answer, using Scalaz:
Here's a curried 'min' function:
def min = (i: Int) => (j: Int) => if (i < j) i else j
And 2 variables:
// the last minimum value
def lastMin: Option[Int] = None
// the new value
def current = 1
Now let's define 2 new functions
// this one does the minimum update
def updateMin = (i: Int) => lastMin map (min(i))
// this one provides a default value if the option o2 is not defined
def orElse = (o1: Int) => (o2: Option[Int]) => o2 orElse Some(o1)
Then, using the excellent explanation by @dibblego of why Function1[T, _] is an applicative functor, we can avoid the repetition of the 'current' variable:
(updateMin <*> orElse).apply(current) === Some(current)
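For readers without Scalaz on hand, the `<*>` used above is the Function1 ("reader") applicative; a minimal hand-rolled version makes the plumbing visible:

```scala
// ap for the reader applicative: both functions receive the same input once.
def ap[A, B, C](ff: A => B => C)(fa: A => B): A => C = a => ff(a)(fa(a))

def min = (i: Int) => (j: Int) => if (i < j) i else j
def lastMin: Option[Int] = None
def current = 1

def updateMin = (i: Int) => lastMin map min(i)
def orElse = (i: Int) => (o2: Option[Int]) => o2 orElse Some(i)

// Mirrors (updateMin <*> orElse) from the answer: 'current' is supplied once.
val updateMinOrSome: Int => Option[Int] = ap(orElse)(updateMin)
println(updateMinOrSome(current))  // Some(1)
```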