Spark: sum over list containing None and Some()?


I already understand that I can sum over a list easily using List.sum:
var mylist = List(1,2,3,4,5)
mylist.sum
// res387: Int = 15
However, I have a list that contains elements like None and Some(1). These values were produced after running a left outer join.
Now, when I try to run List.sum, I get an error:
var mylist= List(Some(0), None, Some(0), Some(0), Some(1))
mylist.sum
<console>:27: error: could not find implicit value for parameter num: Numeric[Option[Int]]
mylist.sum
^
How can I fix this problem? Can I somehow convert the None and Some values to integers, perhaps right after the left outer join?

You can use the List.collect method with pattern matching:
mylist.collect{ case Some(x) => x }.sum
// res9: Int = 1
This ignores the None elements.
Another option is to call getOrElse on each Option to extract its value; this lets you choose the value that replaces None:
mylist.map(_.getOrElse(0)).sum
// res10: Int = 1

I find the easiest way to deal with a collection of Option[A] is to flatten it:
val myList = List(Some(0), None, Some(0), Some(0), Some(1))
myList.flatten.sum
The call to flatten removes all None values and unwraps the remaining Some[Int] into plain old Int, ultimately leaving you with a collection of Int.
And by the way, immutability is a first-class citizen in Scala, so prefer val to var.
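As a quick sanity check, here is a minimal sketch showing that the approaches suggested so far agree when None is meant to count as 0. The value names (joined, viaFlatten and so on) are made up for illustration, and the list simply mirrors the one in the question.

```scala
// Hypothetical values such as a left outer join might produce
val joined: List[Option[Int]] = List(Some(0), None, Some(0), Some(0), Some(1))

// flatten drops every None and unwraps each Some
val viaFlatten: Int = joined.flatten.sum

// collect keeps only the elements matched by the partial function
val viaCollect: Int = joined.collect { case Some(x) => x }.sum

// getOrElse substitutes an explicit default for None
val viaDefault: Int = joined.map(_.getOrElse(0)).sum
```

The three only differ if you pick a default other than 0 in getOrElse.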

If you want to avoid creating extra intermediate collections with flatten or map you should consider using an Iterator, e.g.
mylist.iterator.flatten.sum
or
mylist.iterator.collect({ case Some(x) => x }).sum
or
mylist.iterator.map(_.getOrElse(0)).sum
I think the first and second approaches are a bit better since they avoid unnecessary additions of 0. I'd probably go with the first approach due to its simplicity.
If you want to get a bit fancy (or need the extra generality) you could define your own Numeric[Option[Int]] instance. Something like this should work for any type Option[N] where type N itself has a Numeric instance, i.e. Option[Int], Option[Double], Option[BigInt], Option[Option[Int]], etc.
implicit def optionNumeric[N](implicit num: Numeric[N]): Numeric[Option[N]] =
  new Numeric[Option[N]] {
    def compare(x: Option[N], y: Option[N]) = ??? // left as an exercise :-)
    def fromInt(x: Int) = if (x != 0) Some(num.fromInt(x)) else None
    def minus(x: Option[N], y: Option[N]) = x.map(vx => y.map(num.minus(vx, _)).getOrElse(vx)).orElse(negate(y))
    def negate(x: Option[N]) = x.map(num.negate(_))
    def plus(x: Option[N], y: Option[N]) = x.map(vx => y.map(num.plus(vx, _)).getOrElse(vx)).orElse(y)
    def times(x: Option[N], y: Option[N]) = x.flatMap(vx => y.map(num.times(vx, _)))
    def toDouble(x: Option[N]) = x.map(num.toDouble(_)).getOrElse(0d)
    def toFloat(x: Option[N]) = x.map(num.toFloat(_)).getOrElse(0f)
    def toInt(x: Option[N]) = x.map(num.toInt(_)).getOrElse(0)
    def toLong(x: Option[N]) = x.map(num.toLong(_)).getOrElse(0L)
    override val zero = None
    override val one = Some(num.one)
    // Note: on Scala 2.13+ Numeric also requires parseString(str: String): Option[Option[N]]
  }
Examples:
List(Some(3), None, None, Some(5), Some(1), None).sum
//Some(9)
List[Option[Int]](Some(2), Some(4)).product
//Some(8)
List(Some(2), Some(4), None).product
//None
List(Some(Some(3)), Some(None), Some(Some(5)), None, Some(Some(1)), Some(None)).sum
//Some(Some(9))
List[Option[Option[Int]]](Some(Some(2)), Some(Some(4))).product
//Some(Some(8))
List[Option[Option[Int]]](Some(Some(2)), Some(Some(4)), None).product
//None
List[Option[Option[Int]]](Some(Some(2)), Some(Some(4)), Some(None)).product
//Some(None) !?!?!
Note that there may be multiple ways of representing "zero", e.g. None or Some(0) in the case of Option[Int], though preference is given to None. Also, note this approach contains the basic idea of how one goes about turning a semigroup (without an additive identity) into a monoid.
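The semigroup-to-monoid remark can be illustrated without a full Numeric instance. The following is only a sketch under that reading: liftToOption and plusOpt are hypothetical helper names, not library functions. Any combine function on A is lifted to Option[A], and None acts as the identity element.

```scala
// Lift a combine function on A to Option[A]; None plays the role of the identity.
// `liftToOption` and `plusOpt` are illustrative names, not library functions.
def liftToOption[A](combine: (A, A) => A)(x: Option[A], y: Option[A]): Option[A] =
  (x, y) match {
    case (Some(a), Some(b)) => Some(combine(a, b))
    case (None, other)      => other // None on the left changes nothing
    case (other, None)      => other // None on the right changes nothing
  }

def plusOpt(x: Option[Int], y: Option[Int]): Option[Int] =
  liftToOption[Int](_ + _)(x, y)

// Folding from None: the result is None only if no element carried a value
val total: Option[Int] =
  List[Option[Int]](Some(3), None, Some(5), Some(1)).foldLeft(Option.empty[Int])(plusOpt)

val ofNothing: Option[Int] =
  List.empty[Option[Int]].foldLeft(Option.empty[Int])(plusOpt)
```

The same lifting works for operations that have no natural identity of their own, such as taking the minimum of two values.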

You can use .fold or .reduce and implement the sum of two Options manually, but I would go with @Psidom's approach.

Folding over the list does the work in a single pass. Beware of chaining function calls on strict collections such as List: each call (map, flatten, and so on) traverses the list again and builds an intermediate collection.
A single-pass approach would look something like
val foo = List(Some(1), Some(2), None, Some(3))
foo.foldLeft(0)((acc, optNum) => acc + optNum.getOrElse(0))
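If the same fold is needed for element types other than Int, it can be generalised over Numeric. This is a sketch only; sumDefined is a hypothetical name, and treating None as "contributes nothing" is the same assumption the fold above makes.

```scala
// Single pass, no intermediate collection; None leaves the accumulator unchanged.
// `sumDefined` is an illustrative name, not a standard library method.
def sumDefined[N](xs: Iterable[Option[N]])(implicit num: Numeric[N]): N =
  xs.foldLeft(num.zero)((acc, opt) => opt.fold(acc)(num.plus(acc, _)))

val ints: List[Option[Int]] = List(Some(1), Some(2), None, Some(3))
val doubles: List[Option[Double]] = List(Some(1.5), None, Some(2.0))

val intTotal: Int = sumDefined(ints)
val doubleTotal: Double = sumDefined(doubles)
```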

Related

How to improve this function?

Suppose I've got a data structure like that:
case class B(bx: Int)
case class A(ax: Int, bs: Seq[B])
I am writing a function A => Seq[(Int, Option[Int])] as follows:
def foo(a: A): Seq[(Int, Option[Int])] =
  if (a.bs.isEmpty) Seq((a.ax, None)) else a.bs.map(b => (a.ax, Some(b.bx)))
It seems to work, but I don't like the branching. How would you improve foo?

Another option - add an auxiliary function that takes a Seq[T] and returns a Seq[Option[T]] where the output is never empty - if the input is empty, the output would have a single None element in its result:
def foo(a: A): Seq[(Int, Option[Int])] = toOptions(a.bs.map(_.bx)).map((a.ax, _))
// always returns a non-empty list - with None as the only value for empty input
def toOptions[T](s: Seq[T]): Seq[Option[T]] = s.headOption +: s.drop(1).map(Some(_))
Benefits:
This truly has no branching (including getOrElse which is a kind of branching, albeit a more elegant one)
No repetition of building the tuple (a.ax called once)
Nice separation of concerns (building a never-empty list vs. dealing with A and Bs)
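To see how the helper behaves on both kinds of input, here is a small self-contained check that repeats the definitions from the question and from this answer. The sample values (emptyCase, fullCase) are made up for illustration.

```scala
case class B(bx: Int)
case class A(ax: Int, bs: Seq[B])

// Never returns an empty Seq: an empty input yields Seq(None)
def toOptions[T](s: Seq[T]): Seq[Option[T]] = s.headOption +: s.drop(1).map(Some(_))

def foo(a: A): Seq[(Int, Option[Int])] = toOptions(a.bs.map(_.bx)).map((a.ax, _))

val emptyCase: Seq[(Int, Option[Int])] = foo(A(1, Seq.empty))
val fullCase: Seq[(Int, Option[Int])] = foo(A(1, Seq(B(10), B(20))))
```

For the empty input, headOption is None and drop(1) is empty, so the result is the single pair (1, None); otherwise headOption supplies the first Some and the remaining elements are wrapped one by one.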

Use the Option companion object to compose:
def foo(a: A): Seq[(Int, Option[Int])] =
  Option(a.bs).filterNot(_.isEmpty)
    .map(list => list.map(b => (a.ax, Some(b.bx))))
    .getOrElse(Seq((a.ax, None)))

Scala - Difference between map and flatMap [duplicate]

Can anyone teach me the proper use cases of map and flatMap?
For Option, I know the two methods have the signatures def map(f: A => B): Option[B] and def flatMap(f: A => Option[B]): Option[B].
So, I can get the same value in two ways:
scala> val a = Some(1).map(_ + 2)
a: Option[Int] = Some(3)
scala> val a2 = Some(1).flatMap(n => Some(n + 2))
a2: Option[Int] = Some(3)
When I write a method: def plusTwo(n: Int), is there any difference between
def plusTwo(n: Int): Int = n + 2
Some(1).map(plusTwo)
and
def plusTwo(n: Int): Option[Int] = Some(n + 2)
Some(1).flatMap(plusTwo)
flatMap can be written as a for-comprehension, so is it better for almost all methods to return an Option-wrapped value?

Let's say you have a List:
val names = List("Benny", "Danna", "Tal")
names: List[String] = List(Benny, Danna, Tal)
Now let's go with your example. Say we have a function that returns an Option:
def f(name: String) = if (name contains "nn") Some(name) else None
The map function works by applying a function to each element in the list:
names.map(name => f(name))
List[Option[String]] = List(Some(Benny), Some(Danna), None)
On the other hand, flatMap applies a function that returns a sequence for each element in the list, and flattens the results into a single list:
names.flatMap(name => f(name))
List[String] = List(Benny, Danna)
As you can see, flatMap removed the Some/None layer and kept only the values that were wrapped in a Some.

Your function plusTwo returns a valid result for every input, since you can add 2 to any Int.
There is no need to declare that it returns Option[Int], because it never returns None. That's why you use Option.map with such functions.
But not all functions have a meaningful result for every input. For example, if your function divides some number by its parameter, it makes no sense to pass zero to it.
Let's say we have a function:
def divideTenBy(a: Int): Double = 10 / a
When you invoke it with zero, an ArithmeticException is thrown. You then have to remember to catch this exception, so it's better to make the function less error-prone:
def divideTenBy(a: Int): Option[Double] = if (a == 0) None else Some(10.0 / a)
With such functions you use flatMap, since either the Option you call it on (the left operand) can be None, or the given function can return None.
Now you can safely apply this function to any value:
scala> None.flatMap(divideTenBy)
res9: Option[Double] = None
scala> Some(2).flatMap(divideTenBy)
res10: Option[Double] = Some(5.0)
scala> Some(0).flatMap(divideTenBy)
res11: Option[Double] = None
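Since the question also asks about for-comprehensions, here is a sketch of how the two methods combine. It assumes the Option-returning divideTenBy from above (written with 10.0 so the division is floating-point); viaCalls and viaFor are illustrative names.

```scala
def divideTenBy(a: Int): Option[Double] = if (a == 0) None else Some(10.0 / a)

// flatMap chains a step that may itself fail; map transforms the final value
def viaCalls(input: Option[Int]): Option[Double] =
  input.flatMap(divideTenBy).map(_ + 1)

// The same pipeline as a for-comprehension, which the compiler
// rewrites into flatMap and map calls
def viaFor(input: Option[Int]): Option[Double] =
  for {
    n <- input
    d <- divideTenBy(n)
  } yield d + 1
```

A rule of thumb that follows from this: use map when the function cannot fail (A => B), and flatMap when it can (A => Option[B]), so plain functions like plusTwo do not need to return Option.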

Use List as monad in Scala

I'm wondering what the idiomatic way is to apply some operation to a List if it is not empty, and to return an empty List (Nil) if it is empty.
val result = myList match {
  case Nil => Nil // this one looks bad to me
  case nonEmpty => myService.getByFilters(nonEmpty)
}
Just using map on the list would loop over its elements, but I want the same behaviour that map has on Option, i.e. do something only once if the List is non-empty, and do nothing if it is empty.

I think your design may not be quite right. You should be able to pass any list into the getByFilters function, and it should handle lists of any length, so there should be no need for these sorts of checks.
If the design change is not possible, there is nothing wrong with if:
val result = if (myList.isEmpty) Nil else myService.getByFilters(myList)
It's idiomatic because if is an expression that returns a value. Maybe there are other clean ways, I don't know.
If you just want to require a non-empty list argument, you can use HList or, alternatively, this trick:
def takesNonEmptyList[T](head: T, tail: T*): List[T] = head :: tail.toList
You can do something contrived to make it look idiomatic, but I would not recommend it. It's unclear and an unnecessary complication:
def getByFilters(xs: List[Int]) = xs.filter(_ % 2 == 0)
val l = List(1, 2, 3, 4) // example input, assumed here
val res = l.headOption.map(_ :: l.tail).map(getByFilters).getOrElse(Nil)
println(res)
prints List(2, 4)

If you really want it, you can just implement your own semantics:
implicit class MySpecialList[T](xs: List[T]) {
  def mapIfNotEmpty[R](f: List[T] => List[R]): List[R] =
    if (xs.isEmpty) Nil else f(xs)
}
def getStuff(xs: List[Int]) = xs.map(x => s"$x OK")
val x: List[Int] = List(1, 2, 3)
val y: List[Int] = List()
def main(args: Array[String]): Unit = {
  val xx = x.mapIfNotEmpty(getStuff) // List("1 OK", "2 OK", "3 OK")
  val yy = y.mapIfNotEmpty(getStuff) // List()
}

List has a headOption method, so you could use Option semantics to lift a List to an Option[List]:
import scala.collection.TraversableLike // available up to Scala 2.12
implicit class TraversableOption[T <: TraversableLike[_, T]](traversable: T) {
  def opt: Option[T] = traversable.headOption.map(_ => traversable)
}
You can use it as:
val result = myList.opt.fold[List[Int]](Nil)(myService.getByFilters)

If you invoke the filter service separately for each element,
myList.flatMap(filter => myService.getByFilters(List(filter)))
the result is an empty list when myList is empty. If performance matters, also consider a parallel version with
myList.par

Getting Value of Either

Besides using match, is there an Option-like way to getOrElse the actual content of the Right or Left value?
scala> val x: Either[String,Int] = Right(5)
scala> val a: String = x match {
  case Right(x) => x.toString
  case Left(x) => "left"
}
a: String = 5

Nicolas Rinaudo's answer regarding calling getOrElse on either the left or right projection is probably the closest to Option.getOrElse.
Alternatively, you can fold the either:
scala> val x: Either[String,Int] = Right(5)
x: Either[String,Int] = Right(5)
scala> val a: String = x.fold(l => "left", r => r.toString)
a: String = 5
As l is not used in the above fold, you could also write x.fold(_ => "left", r => r.toString)
Edit:
Actually, you can literally have Option.getOrElse by calling toOption on the left or right projection of the either, e.g.,
scala> val o: Option[Int] = x.right.toOption
o: Option[Int] = Some(5)
scala> val a: String = o.map(_.toString).getOrElse("left")
a: String = 5

I don't particularly like Either and as a result I'm not terribly familiar with it, but I believe you're looking for projections: either.left.getOrElse or either.right.getOrElse.
Note that projections can be used in for-comprehensions as well. This is an example straight from the documentation:
def interactWithDB(x: Query): Either[Exception, Result] =
  try {
    Right(getResultFromDatabase(x))
  } catch {
    case ex: Exception => Left(ex)
  }
// this will only be executed if interactWithDB returns a Right
val report =
  for (r <- interactWithDB(someQuery).right) yield generateReport(r)
if (report.isRight)
  send(report)
else
  log("report not generated, reason was " + report.left.get)

When both sides have the same type A, that is, Either[A, A], we can use Either.merge to extract the value regardless of whether it is a Left or a Right.
Note that if the left and right types differ, the result type is the least upper bound of the two types, which in the worst case is Any:
val e: Either[Int, String] = Right("hello")
e.merge // hello: Any

In Scala 2.12 there is a getOrElse method on Either itself for getting the "right" value, but you cannot use it for the "left" value directly. However, you can swap the sides first: e.swap.getOrElse(42).
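Putting the options from these answers side by side, here is a minimal sketch that assumes Scala 2.12 or later, where Either is right-biased. The values ok and failed are made up for illustration.

```scala
val ok: Either[String, Int] = Right(5)
val failed: Either[String, Int] = Left("boom")

// Right-biased getOrElse: returns the Right value or the default
val rightOrDefault: Int = failed.getOrElse(0)

// swap exchanges the sides, so getOrElse now targets the original Left
val leftOrDefault: String = ok.swap.getOrElse("no error")
val leftValue: String = failed.swap.getOrElse("no error")

// fold handles both sides in one expression
val folded: String = ok.fold(_ => "left", _.toString)

// toOption keeps a Right as Some and turns a Left into None
val asOption: Option[Int] = failed.toOption
```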

Best way to score current extremum in collection type

I’m currently a little tired so I might be missing the obvious.
I have a var _minVal: Option[Double], which holds the minimal value contained in a collection of Doubles (or None if the collection is empty).
When adding a new item to the collection, I have to check whether _minVal is either None or greater than the new item (= candidate for the new minimum).
I’ve gone from
_minVal = Some(_minVal match {
  case Some(oldMin) => if (candidate < oldMin) candidate
                       else oldMin
  case None => candidate
})
(not very DRY) to
_minVal = Some(min(_minVal getOrElse candidate, candidate))
but still think I might be missing something…

Without Scalaz, you are going to pay some RY. But I'd write it as:
_minVal = _minVal map (candidate min) orElse Some(candidate)
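The same update rule can be written as a pure function and checked against a list of values. This is a sketch only: updateMin, samples and the other names are hypothetical, and the var from the question is replaced by a fold so the example stays side-effect free.

```scala
// Same rule as above: keep the smaller value, or start with the candidate
def updateMin(current: Option[Double], candidate: Double): Option[Double] =
  current.map(_ min candidate).orElse(Some(candidate))

val samples: List[Double] = List(3.0, 1.5, 2.0)

// Feeding the items one at a time, starting from "no minimum yet"
val runningMin: Option[Double] = samples.foldLeft(Option.empty[Double])(updateMin)

// When the whole collection is at hand, reduceOption gives the same answer
val direct: Option[Double] = samples.reduceOption(_ min _)
```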
EDIT
Eric Torreborre, of Specs/Specs2 fame, was kind enough to pursue the Scalaz solution that has eluded me. Being a testing framework guy, he wrote the answer in a testing format, instead of the imperative, side-effecting original. :-)
Here's the version using _minVal, Double instead of Int, side-effects, and some twists of mine now that Eric has done the hard work.
// From the question (candidate provided for testing purposes)
var _minVal: Option[Double] = None
def candidate = scala.util.Random.nextDouble
// A function "min"
def min = (_: Double) min (_: Double)
// A function "orElse"
def orElse = (_: Option[Double]) orElse (_: Option[Double])
// Extract function to decrease noise
def updateMin = _minVal map min.curried(_: Double)
// This is the Scalaz version for the above -- type inference is not kind to it
// def updateMin = (_minVal map min.curried).sequence[({type lambda[a] = (Double => a)})#lambda, Double]
// Say the magic words
import scalaz._
import Scalaz._
def orElseSome = (Option(_: Double)) andThen orElse.flip.curried
def updateMinOrSome = updateMin <*> orElseSome
// TAH-DAH!
_minVal = updateMinOrSome(candidate)

Here is an update to Daniel's answer, using Scalaz:
Here's a curried 'min' function:
def min = (i: Int) => (j: Int) => if (i < j) i else j
And 2 variables:
// the last minimum value
def lastMin: Option[Int] = None
// the new value
def current = 1
Now let's define 2 new functions
// this one does the minimum update
def updateMin = (i: Int) => lastMin map (min(i))
// this one provides a default value if the option o2 is not defined
def orElse = (o1: Int) => (o2: Option[Int]) => o2 orElse Some(o1)
Then, using the excellent explanation by @dibblego of why Function1[T, _] is an applicative functor, we can avoid the repetition of the 'current' variable:
(updateMin <*> orElse).apply(current) === Some(current)
