Option monad in scala - scala

How is the Option monad meant to work? I'm browsing the Scala API, and there is an example (I mean the second one) with this note:
Because of how for comprehension works, if None is returned from request.getParameter, the entire expression results in None
But when I try this code:
val upper = for {
name <- None //request.getParameter("name")
trimmed <- Some(name.trim)
upper <- Some(trimmed.toUpperCase) if trimmed.length != 0
} yield upper
println(upper.getOrElse(""))
I get a compile error. How is this supposed to work?

You get a compiler error because of this line:
name <- None
That way, the type of None is set to None.type and the variable name is inferred to be of type Nothing. (That is, it would have this type if it actually existed but obviously the for comprehension does not even get to creating it at runtime.) Therefore no method name.trim exists and it won’t compile.
If you had request.getParameter("name") available, its type would be Option[String], name would potentially have type String and name.trim would compile.
You can work around this by specifying the type of None:
name <- None: Option[String]
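A minimal sketch of this workaround, assuming a hypothetical helper named shout that takes the place of the request.getParameter("name") call. Option.empty[String] is a standard-library alternative to the None: Option[String] ascription; both give the compiler the element type it needs.

```scala
object TypedNoneDemo {
  // Hypothetical helper (not from the question): the parameter plays the role
  // of request.getParameter("name").
  def shout(param: Option[String]): Option[String] =
    for {
      name <- param
      trimmed <- Some(name.trim)
      upper <- Some(trimmed.toUpperCase)
      if trimmed.length != 0
    } yield upper

  def main(args: Array[String]): Unit = {
    // Both forms give the empty value the static type Option[String].
    val viaAscription = None: Option[String]
    val viaEmpty = Option.empty[String]

    println(shout(viaAscription).getOrElse(""))
    println(shout(viaEmpty).getOrElse(""))
    println(shout(Some("  bob ")).getOrElse(""))
  }
}
```

With a bare None in the generator, the same comprehension would not type-check, because name would be inferred as Nothing.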

To expand on Kevin's answer, you can avoid wrapping values in Some() by using the = operator instead of the <- operator:
val upper = for {
name <- None: Option[String] //request.getParameter("name")
trimmed = name.trim
upper = trimmed.toUpperCase if trimmed.nonEmpty
} yield upper
The for-comprehension will compile to something very similar to Kevin's version, but I often find it clearer to use map and filter explicitly to avoid clutter (e.g. extra variable names) that adds nothing to the semantic content of the expression.

To expand on Debilski's answer, you also don't need to explicitly wrap subsequent values in Some(); the only value you're actually mapping over is the original name.
A better approach would be to use the map and filter operations directly instead of a for-comprehension:
NOTE: behind the scenes, the Scala compiler will convert a for-comprehension to a combination of map/flatMap/filter anyway, so this approach will never be less efficient than a for-comprehension, and may well be more efficient
def clean(x:Option[String]) = x map { _.trim.toUpperCase } filterNot { _.isEmpty }
val param = None : Option[String] // request.getParameter("name")
println( clean(param).getOrElse("") )
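To make the NOTE above concrete, here is a small sketch that puts a for-comprehension next to a hand-written chain of calls. The names viaFor and viaCalls are made up for this illustration, and the chain only approximates what the compiler generates (the real output also threads intermediate tuples through).

```scala
object DesugarDemo {
  // For-comprehension with a value definition and a guard.
  def viaFor(x: Option[String]): Option[String] =
    for {
      name <- x
      trimmed = name.trim
      if trimmed.nonEmpty
    } yield trimmed.toUpperCase

  // Hand-written approximation of the desugared form.
  def viaCalls(x: Option[String]): Option[String] =
    x.map(_.trim).withFilter(_.nonEmpty).map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    val inputs = List(Some("  bob "), Some("   "), None)
    // Print both results side by side for each input.
    inputs.foreach(in => println(s"${viaFor(in)} ${viaCalls(in)}"))
  }
}
```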

Related

Scala: do something if get the value in getOrElse

Suppose a variable is an Option[Account], and the class Account has a string field called accountName.
e.g:
val allAccounts: Set[Option[Account]] = Set(Some(Account1), Some(Account2), None)
How do I get the accountName from Some(Account) if I get something from getOrElse?
I tried allAccounts.map(_.getOrElse("").accountName) but it doesn't work: accountName applies to the "get" part but not to the "OrElse" part.
Thanks for your help!
PS: I wonder why allAccounts.map(_.map(_.accountName).getOrElse("")) works fine with a None value, but if I create another variable, val sampleAccount2 = None, then sampleAccount2.map(_.accountName).getOrElse("") fails? Basically I just go from Set(None) to None.
Is this what you ultimately wanted to achieve?
final case class Account(accountName: String)
val allAccounts: Set[Option[Account]] =
Set(Some(Account("Account1")), Some(Account("Account2")), None)
def getAccountNames(maybeAccounts: Set[Option[Account]]): Set[String] =
maybeAccounts.map(_.fold("")(_.accountName))
assert(getAccountNames(allAccounts) == Set("Account1", "Account2", ""))
You can play around with this code here on Scastie.
Another way to write getAccountNames is by using a combination of map and getOrElse instead of fold, like so:
def getAccountNames(maybeAccounts: Set[Option[Account]]): Set[String] =
maybeAccounts.map(_.map(_.accountName).getOrElse(""))
This is probably closer to what you initially wanted to write. In this case fold and map with getOrElse are basically equivalent; choose whichever makes more sense given your knowledge of the code base you're working on at the moment.
This version is also available here on Scastie.
The problem with your attempt is that you were applying getOrElse to the Option[Account] type, meaning that you were trying to return something that was either an Account (within the Option) or a String, and from that thing you were then asking for the accountName, which only makes sense on Account but not on String. The key difference is that in this case you first map on Option[Account] to get the accountName on Somes, getting an Option[String], and then you either get what's in there or the default value if the Option is empty.
As further input, please note that since you are using a Set, if you have multiple empty values in your input, they will be effectively collapsed into one, as in the following example:
assert(getAccountNames(Set(None, None)) == Set(""))
If by any chance you would rather remove any empty value entirely from the output, you can do so by rewriting the function above so that it's defined like so (Scastie):
def getAccountNames(maybeAccounts: Set[Option[Account]]): Set[String] =
maybeAccounts.flatMap(_.map(_.accountName))
In this case getAccountNames can be redefined in terms of a for-comprehension (more on the topic here in the Scala documentation):
def getAccountNames(maybeAccounts: Set[Option[Account]]): Set[String] =
for {
maybeAccount <- maybeAccounts
account <- maybeAccount
} yield account.accountName
This last example is also available here on Scastie for you to play around with it.
In both cases, the assertion that holds now changes to the following:
assert(getAccountNames(allAccounts) == Set("Account1", "Account2"))
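On the PS in the question: a likely explanation is that val sampleAccount2 = None is inferred as None.type, so the lambda passed to map receives a Nothing, which has no accountName member. A minimal sketch, assuming a hypothetical helper nameOrEmpty, shows that a type annotation is enough to make it compile:

```scala
object AccountDemo {
  final case class Account(accountName: String)

  // Hypothetical helper: the account name, or "" when there is no account.
  def nameOrEmpty(maybeAccount: Option[Account]): String =
    maybeAccount.map(_.accountName).getOrElse("")

  def main(args: Array[String]): Unit = {
    // val sampleAccount2 = None  // inferred as None.type: _.accountName would not compile
    val sampleAccount2: Option[Account] = None // annotated: map now sees an Account
    println(nameOrEmpty(sampleAccount2))
    println(nameOrEmpty(Some(Account("Account1"))))
  }
}
```

Inside Set[Option[Account]] the element type is already known, which is why the same expression works there.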

Why is a scala Set casted to a Vector instead of a List?

I wonder why a Set[A] is converted to a Vector[A] if I ask for a Seq[A] subclass? To illustrate this take the following example:
val A = Set("one", "two")
val B = Set("one", "two", "three")
def f(one: Seq[String], other : Seq[String]) = {
one.intersect(other) match {
case head :: tail => head
case _ => "unknown"
}
}
f(A.to, B.to)
This function will return "unknown" instead of "one". The reason is that A.to is converted to a Vector[String]. The cons operator (::) is defined for Lists but not for Vectors, so the second case applies and "unknown" is returned. To fix this problem I could use the +: operator, which is defined for all Seqs, or convert the Set to a List (A.to[List]). So my (academic) question is:
Why does A.to return a Vector? At least according to the Scala docs, the default implementation of Seq is LinearSeq, and the default of that is List. What did I get wrong?
Because it can. You are depending on runtime class implementation details instead of compile-time type guarantees. The to or toSeq method is free to return anything that typechecks; it could even generate a random number and choose a concrete class based on that number, so you might get a List sometimes and a Vector or something else at other times. It might even decide based on the operating system. Of course, I am being pedantic here, and hopefully they do not do that, but my point is that we can't really explain it: that is what the implementation does, and it may change in the future.
Also, "the default implementation of Seq is a List" applies only to the constructor. And again, they may change that at any moment.
So, if you want a List ask for a List, not for a Seq.
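As a sketch of both fixes mentioned in the question, the function below (renamed firstCommon here purely for illustration) matches with the +: extractor, which works on any Seq, so it no longer matters which concrete class the conversion produces:

```scala
object SeqMatchDemo {
  // +: matches any non-empty Seq, whereas :: only matches a List.
  def firstCommon(one: Seq[String], other: Seq[String]): String =
    one.intersect(other) match {
      case head +: _ => head
      case _         => "unknown"
    }

  def main(args: Array[String]): Unit = {
    val a = Set("one", "two")
    val b = Set("one", "two", "three")
    // Works whatever Seq implementation toSeq happens to return.
    println(firstCommon(a.toSeq, b.toSeq))
    // Asking for a List explicitly also works.
    println(firstCommon(a.toList, b.toList))
  }
}
```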

Pattern match on value of Either inside a for comprehension?

I have a for comprehension like this:
for {
(value1: String, value2: String, value3: String) <- getConfigs(args)
// more stuff using those values
}
getConfigs returns an Either[Throwable, (Seq[String], String, String)] and when I try to compile I get this error:
value withFilter is not a member of Either[Throwable,(Seq[String], String, String)]
How can I use this method (that returns an Either) in the for comprehension?
Like this:
for {
tuple <- getConfigs()
} println(tuple)
Joking aside, I think that is an interesting question but it is misnamed a bit.
The problem (see above) is not that for comprehensions are not possible but that pattern matching inside the for comprehension is not possible within Either.
There is documentation on how for comprehensions are translated, but it doesn't cover every case. This one is not covered there, as far as I can see. So I looked it up in my copy of "Programming in Scala", Second Edition (because that is the one I have by my side on dead trees).
Section 23.4 - Translation of for-expressions
There is a subsection, "Translating patterns in generators", which covers exactly the problem here, as described above. It lists two cases:
Case One: Tuples
Is exactly our case:
for ((x1, …, xn) <- expr1) yield expr2
should translate to expr1.map { case (x1, …, xn) => expr2 }.
Which is exactly what IntelliJ does when you select the code and run a "Desugar for comprehension" action. Yay!
… but that makes it even weirder in my eyes, because the desugared code actually runs without problems.
So this is the case that (imho) matches ours, but it is not what actually happens. At least, it is not what we observed. Hm?!
Case two: Arbitrary patterns
for (pat <- expr1) yield expr2
translates to
expr1 withFilter {
case pat => true
case _ => false
} map {
case pat => expr2
}
where there is now a withFilter method!
This case totally explains the error message and why pattern matching in an Either is not possible.
The chapter ultimately refers to the scala language specification (to an older one though) which is where I stop now.
So I am sorry I can't fully answer the question, but hopefully I could hint well enough at the root of the problem here.
Intuition
So why is Either problematic, and why doesn't it provide a withFilter method, where Try and Option do?
Because filter removes elements from the "container", possibly all of them, so we need something that represents an "empty container".
That is easy for Option, where this is obviously None. It is also easy for e.g. List. It is not so easy for Try, because there are many possible Failure values, each holding a specific exception. However, a few exceptions are used to fill this role:
NoSuchElementException and
UnsupportedOperationException
which is why Try[X] works, but Either[Throwable, X] does not.
It's almost the same thing, but not entirely. Try knows that its failure side is always a Throwable, and the library authors can take advantage of that.
However, on an Either (which is now right-biased) the "empty" case is the Left case, which is generic. The user determines its type, so the library authors cannot pick a suitable "empty" instance for every possible Left type.
I think this is why Either doesn't provide a withFilter out of the box and why your expression fails.
Btw. the
expr1.map { case (x1, …, xn) => expr2 }
case works because, on a mismatch, it throws a MatchError up the call stack and panics out of the problem, which in itself might be a greater problem.
Oh and for the ones that are brave enough: I didn't use the "Monad" word up until now, because Scala doesn't have a datastructure for it, but for-comprehensions work just without it. But maybe a reference won't hurt: Additive Monads have this "zero" value, which is exactly what Either misses here and what I tried to give some meaning in the "intuition" part.
I guess you want your loop to run only if the value is a Right. If it is a Left, it should not run. This can be achieved really easily:
for {
(value1, value2, value3) <- getConfigs(args).right.toOption
// more stuff using those values
}
Sidenote: I don't know what your exact use case is, but scala.util.Try is better suited for cases where you either have a result or a failure (an exception).
Just write Try { /*some code that may throw an exception*/ } and you'll either have Success(/*the result*/) or a Failure(/*the caught exception*/).
If your getConfigs method returns a Try instead of an Either, then your code above would work without any changes.
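A minimal sketch of the Try variant, assuming a hypothetical getConfigs with a Boolean flag that simulates success or failure. It also shows what Try.filter does when a predicate rejects the value, which is the "empty" case that withFilter relies on:

```scala
import scala.util.Try

object TryFilterDemo {
  // Hypothetical stand-in for getConfigs, returning a Try instead of an Either.
  def getConfigs(ok: Boolean): Try[(String, String, String)] =
    Try {
      if (ok) ("a", "b", "c")
      else throw new IllegalStateException("no config")
    }

  // The tuple pattern in the generator compiles because Try has withFilter.
  def first(ok: Boolean): Try[String] =
    for {
      (value1, value2, value3) <- getConfigs(ok)
    } yield value1

  // A rejected predicate turns a Success into a Failure holding a NoSuchElementException.
  val rejected: Try[Int] = Try(1).filter(_ > 5)

  def main(args: Array[String]): Unit = {
    println(first(true))
    println(first(false))
    println(rejected)
  }
}
```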
You can do this using Oleg's better-monadic-for compiler plugin:
build.sbt:
addCompilerPlugin("com.olegpy" %% "better-monadic-for" % "0.2.4")
And then:
object Test {
def getConfigs: Either[Throwable, (String, String, String)] = Right(("a", "b", "c"))
def main(args: Array[String]): Unit = {
val res = for {
(fst, snd, third) <- getConfigs
} yield fst
res.foreach(println)
}
}
Yields:
a
This works because the plugin removes the unnecessary withFilter and unchecked while desugaring and uses a .map call. Thus, we get:
val res: Either[Throwable, String] =
getConfigs
.map[String](((x$1: (String, String, String)) => x$1 match {
case (_1: String, _2: String, _3: String)
(String, String, String)((fst @ _), (snd @ _), (third @ _)) => fst
}));
I think the part you may find surprising is that the Scala compiler emits this error because you deconstruct the tuple in place. Surprisingly, this forces the compiler to look for a withFilter method, because to the compiler it looks like an implicit check on the type of the value inside the container, and checks on values are implemented using withFilter. If you write your code as
for {
tmp <- getConfigs(args)
(value1: Seq[String], value2: String, value3: String) = tmp
// more stuff using those values
}
it should compile without errors.
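Here is a self-contained sketch of that workaround next to the explicit map form. It assumes a right-biased Either (Scala 2.12 or later) and a hypothetical getConfigs whose Boolean flag simulates success or failure; summary and summaryViaMap are names invented for the example.

```scala
object EitherPatternDemo {
  // Hypothetical stand-in for getConfigs from the question.
  def getConfigs(ok: Boolean): Either[Throwable, (Seq[String], String, String)] =
    if (ok) Right((Seq("x", "y"), "b", "c"))
    else Left(new IllegalStateException("no config"))

  // Bind the tuple with <- first, then destructure it with =.
  def summary(ok: Boolean): Either[Throwable, String] =
    for {
      tmp <- getConfigs(ok)
      (value1, value2, value3) = tmp
    } yield value1.mkString(",") + "/" + value2 + "/" + value3

  // The same result with an explicit map and a pattern-matching function.
  def summaryViaMap(ok: Boolean): Either[Throwable, String] =
    getConfigs(ok).map { case (value1, value2, value3) =>
      value1.mkString(",") + "/" + value2 + "/" + value3
    }

  def main(args: Array[String]): Unit = {
    println(summary(true))
    println(summary(false).isLeft)
  }
}
```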

Spark Cassandra Connector: for comprehension error (type mismatch)

Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first for comprehension line is commented out, the code works. I ultimately want a Set[Int] instead of '1 to 2', but it serves to show the problem. The first two lines of the for should not need a type specifier, but I include it to show that I've tried the obvious.
Tools/Jars
IntelliJ 2016.1
Java 8
Scala 2.10.5
Cassandra 3.x
spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid:Int, eventhistids:Seq[Int], yosemiteid:String, initiatorname:String)
case class NotifHistSingle(intnotifhistid:Int, inteventhistid:Int, dataCenter:String, initiatorname:String)
object SparkCassandraConnectorJoins {
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeId:Int) {
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
...etc...
}
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
**Error:(88, 21) type mismatch;
found : scala.collection.immutable.IndexedSeq[Nothing]
required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
orgNodeId:Int <- 1 to 2
^**
Later
@slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also keep state from the second statement to fill elements in the NotifHistSingle ctor, so I don't see how to get the equivalent map/flatmap to work. Therefore, I went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeIds:Set[Int]) {
def notifHistForOrg(orgNodeId:Int): RDD[NotifHistSingle] = {
for {
notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
}
val emptyTable:RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
val notifHistForAllOrgs:RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}
A for comprehension is actually syntactic sugar; what's really going on underneath is a series of chained flatMap calls, with a single map at the end that replaces yield. The Scala compiler translates every for comprehension like this. If you use if conditions in your for comprehension, they are translated into filters, and if you don't yield anything, foreach is used. For more information, see here.
So, to explain on your case - this:
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(...)
is translated by the compiler to roughly this:
val notifHist:RDD[NotifHistSingle] = (1 to 2)
.flatMap(orgNodeId => sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
.flatMap(notifHist => notifHist.eventhistids
.map(eventHistId => NotifHistSingle(...))))
You get the error when you include the 1 to 2 line because that makes your for comprehension operate on a sequence (vector, to be more precise). So when invoking flatMap(), the compiler expects you to supply a function that transforms each element of your vector into a GenTraversableOnce. If you take a closer look at the signature of flatMap (most IDEs will display it just by hovering over it) you can see it for yourself:
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That
This is the problem. The compiler doesn't know how to flatMap the vector 1 to 2 using a function that returns a CassandraRDD. It wants a function that returns a GenTraversableOnce. If you remove the 1 to 2 line, you remove this restriction.
Bottom line - if you want to use a for comprehension and yield values out of it, you have to obey the type rules. It's impossible to flatten a sequence consisting of elements which are not sequences and cannot be turned into sequences.
You can always map instead of flatMap since map is less restrictive (it requires A => B instead of A => GenTraversableOnce[B]). This means that instead of getting all results in one giant sequence, you will get a sequence where each element is a group of results (one group for each query). You can also play around with the types, trying to get a GenTraversableOnce from your query result (e.g. invoking sc.cassandraTable().where().toArray or something; I don't really work with Cassandra so I don't know).
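The same type rule can be reproduced without Spark. The sketch below uses a made-up Box class as a stand-in for an RDD: it has map and flatMap but is not a Scala collection, so it cannot appear as an inner generator under a Range. Box, query, perQuery and allResults are all hypothetical names for this illustration.

```scala
object FlatMapTypesDemo {
  // Hypothetical stand-in for an RDD: has map/flatMap but is not a Scala collection.
  final case class Box[A](items: List[A]) {
    def map[B](f: A => B): Box[B] = Box(items.map(f))
    def flatMap[B](f: A => Iterable[B]): Box[B] = Box(items.flatMap(f))
  }

  // Hypothetical "query": one row per id, each row holding a list of values.
  def query(id: Int): Box[List[Int]] = Box(List(List(id, id * 10)))

  // Does not compile: Range.flatMap needs a function returning a collection, not a Box.
  // for { id <- 1 to 2; row <- query(id); x <- row } yield x

  // map is less restrictive: one Box of results per id.
  def perQuery(ids: Seq[Int]): Seq[Box[Int]] =
    ids.map(id => for { row <- query(id); x <- row } yield x)

  // Flattening works once each Box is turned into a real collection.
  def allResults(ids: Seq[Int]): Seq[Int] =
    perQuery(ids).flatMap(_.items)

  def main(args: Array[String]): Unit = {
    println(perQuery(1 to 2))
    println(allResults(1 to 2))
  }
}
```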

Scala Option object inside another Option object

I have a model that has some Option fields, which in turn contain other Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSON documents, and sometimes it may contain nulls, which was the reason for this model design.
So the question is: what is the best way to get a deepest field?
First.get.second.get.third.get.numberOfSmth.get
The method above looks really ugly, and it may throw an exception if one of the objects is None. I was looking into the Scalaz lib, but didn't figure out a better way to do that.
Any ideas?
The solution is to use Option.map and Option.flatMap:
First.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
First flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None, otherwise it will be Some(count) where count is the value returned by numberOfSmth.
Of course this can get ugly very fast. For this reason scala supports for comprehensions as a syntactic sugar. The above can be rewritten as:
for {
first <- First
second <- first.second
third <- second.third
} yield third.numberOfSmth
Which is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using scala), and generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words, x flatMap(y flatMap z) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
There is no need for scalaz:
for {
first <- yourFirst
second <- first.second
third <- second.third
number <- third.numberOfSmth
} yield number
Alternatively you can use nested flatMaps
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
for {
f <- first
s <- f.second
t <- s.third
n <- t.numberOfSmth
} yield n
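For completeness, a runnable sketch that puts the question's case classes together with the flatMap chain. The sample values full and broken are made up for illustration.

```scala
object NestedOptionDemo {
  case class Third(numberOfSmth: Option[Int])
  case class Second(third: Option[Third], title: Option[String])
  case class First(second: Option[Second], name: Option[String])

  // Each flatMap short-circuits to None as soon as one level is missing.
  def getN(first: Option[First]): Option[Int] =
    first.flatMap(_.second).flatMap(_.third).flatMap(_.numberOfSmth)

  // Sample data (assumed for this example only).
  val full: Option[First] =
    Some(First(Some(Second(Some(Third(Some(42))), None)), None))
  val broken: Option[First] =
    Some(First(None, Some("no second")))

  def main(args: Array[String]): Unit = {
    println(getN(full))
    println(getN(broken))
    println(getN(None))
  }
}
```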
I think it is an overkill for your problem but just as a general reference:
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As an introduction you might want to check for instance this SO answer or this tutorial. Whether it makes sense to use Lenses in your case depends on whether you also have to perform a lot of updates in your nested option structure (note: update not in the mutable sense, but returning a new modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.