List/Array with optionals - scala

Which is better in practice? Having an optional List or having optional items in the list?
Currently I'm following an optional list.
List[Option[T]] or Option[List[T]]?
Edit:
The problem I'm running into is that i have crud operations that i'm returning optional types from. I have a situation where I have a method that does a single lookup and i want to leverage it to make a function to return a list. I'm still getting my feet wet with scala so I'm curious what the best practice is.
Example:
def findOne(id: Int): Option[T]
regardless of implementation I want to use it for something like these, but which is better? They both seem to be weird to map from. Maybe there's something i'm missing all together:
def find(ids: List[Int]) : Option[List[T]]
vs
def find(ids: List[Int]) : List[Option[T]]

Those two types mean very different things. List[Option[T]] looks like an intermediate result that would you would flatten. (I'd look into using flatMap in this case.)
The second type, Option[List[T]] says there may or may not be a list. This would be a good type to use when you need to distinguish between the "no result" case and the "result is an empty list" case.
I can't think of a situation where both types would make sense.

If you want to retrieve several things that might exist, and it's sensible for some of them to exist and some of them to not exist, it's List[Option[T]] - a list of several entries, each of which is present or not. This would make sense in e.g. a "search" situation, where you want to display whichever ones exist, possibly only some of the requested things. You could implement that method as:
def find(ids: List[Int]) = ids map findOne
If you're using Option to represent something like an an "error" case, and you want "if any of them failed then the whole thing is a failure", then you want Option[List[T]] - either a complete list, or nothing at all. You could implement that, using Scalaz, as:
def find(ids: List[Int]) = ids traverse findOne

DaoWen already got to the point regarding your considerations.
List[Option[T]] doesn't even encode more information than List[T] without the implicit knowledge that your list of ids is in the same order than your result list.
I'd actualy favour
def find(ids: Seq[Int]): Seq[T]
or
def find(ids: Seq[Int]): Option[NonEmptyList[T]]
where NonEmptyList is a type of sequence that actually enforces being not empty, e.g. in Scalaz. Unless you really wanna point out the difference between None and some empty list.
Btw, you might wanna use the more general Seq[T] for your interface instead of List[T]

Related

Traversing Scalaz Tree

I'm trying to understand the scalaz tree structure and am having some difficulty!
First I've defined a tree:
val tree: Tree[Int] =
1.node(
2.leaf,
3.node(
4.leaf,
5.leaf))
So far using TreeLoc I've worked out how to find the first element that matches some predicate. E.g. to find the first node where the value is 3:
tree.loc.find(x => x.getLabel == 3)
My next challenge was to try and find all nodes that match some predicate. For example I would like to find all leaf nodes (which should be pretty easy using TreeLoc and isLeaf). Unfortunately I can't for the life of me work out how to walk the tree to do this.
Edit: Sorry I don't think I was clear enough in my original question. To be clear I want to walk the tree in such a way that I have information about the Node available to me. Flatten, foldRight etc just allow me to operate on [Int] whereas I want to be able to operate on Tree[Int] (or TreeLoc[Int]).
Having a look to how find is implemented in scalaz, my suggestion is to implement something like:
implicit class FilterTreeLoc[A](treeLoc: TreeLoc[A]){
def filter(p: TreeLoc[A] => Boolean): Stream[TreeLoc[A]] =
Cobind[TreeLoc].cojoin(treeLoc).tree.flatten.filter(p)
}
It behaves like the find but it gives you back instead a Stream[TreeLoc[A]] instead of an Option[TreeLoc[A]].
You can use this as tree.loc.filter(_.isLeaf) and tree.loc.filter(_.getLabel == 3).
Note: the use of an implicit class can be obviously avoided if you prefer to have this declared as a method instead.

Option as a singleton collection - real life use cases

The title pretty much sums it up. Option as a singleton collection can sometimes be confusing, but sometimes it allows for an interesting application. I have one example on top of my head, and would like to learn more of such examples.
My only example is running for comprehension on the Option[List[T]]. We can do the following:
val v = Some(List(1, 2, 3))
for {
list <- v.toList
elem <- list
} yield elem + 1
Without having Option.toList, it wouldn't be possible to stay in the same for comprehension, and I'd be forced to write something like this:
for {
list <- v
} yield for {
elem <- list
} yield elem + 1
The first example is cleaner, and it's an advantage of Option being a collection. Of course, the result type will be different in these 2 examples, but let's assume it doesn't matter for the sake of discussion.
Any other examples? I'd especially like to concentrate on collection-like usage, and not usage of Option's monadic properties - those are pretty much obvious. In other words, map and flatMap functions are out of scope of this question. They're definitely very useful, just coming from elsewhere.
I find that working with Option[T] as a collection's main benefit is that you get to use operations defined on a collection, such as map, flatmap, filter, foreach etc. This makes it easier to do operations on a given option, instead of using pattern matching or checking Option[T].isDefined to see if a value exists.
For example, let's take the user repository example from Daniel Westheide blog post about Option[T]:
Say you have a UserRepository object which returns users based on their ID. The user may or may not exist, hence it returns an Option[Person]. Now let's say we want to search a person by id and then filter their age. We can do:
val age: Some[Int] = UserRepository.findById(1).map(_.age)
Now let's say that a Person also has a gender property of type Option[String]. If you wanted to extract that out, you could use map:
val gender: Option[Option[String]] = UserRepository.findById(1).map(_.gender)
But working with nested options isn't too convenient. For that, you have flatMap:
val gender: Option[String] = UserRepository.findById(1).flatMap(_.gender)
And if we want to print out the gender if it exists, we can use foreach:
gender.foreach(println)
You'll find yourself working with scala types that have nested Option[T] fields defined and it's really handy to have collection like methods which help you remove out boilerplate and noise for extracting the actual value out of the operation.
A more real life use case I just encountered the other day was working with the awscala SDK, where I wanted to retrieve an object from S3 storage:
val bucket: Option[Bucket] = s3.bucket(amazonConfig.bucketName)
val result: Option[S3Object] = bucket.flatMap(_.get(amazonConfig.offsetKey))
result.flatMap(s3Object =>
Source.fromInputStream(s3Object.content).mkString.decodeOption[Array[KafkaOffset]])
So what happens here is that you query the S3 service for a bucket, which may or may not exist. Then, you want to extract an S3Object out of it which actually contains the data, but the API itself returns an Option[S3Object], so it's handy to use flatMap to flat out get an Option[S3Object] instead of Option[Option[S3Object]]. Finally, I want to deserialize the S3Object which actually contains a JSON, and using the Argonaut library, it returns an Option[MyObject], so then again using flatMap to the rescue of extracting the inner option type.
Edit:
As you pointed out, map and flatMap belong to the monadic property of Option[T]. I've written a blog post describing the reduction of two options where the final solution was:
def reduce[T](a: Option[T], b: Option[T], f: (T, T) => T): Option[T] = {
(a ++ b).reduceLeftOption(f)
}
Which takes advantage of the ++ operator defined on any collection which is also specifically defined on Option[T], being a collection.
I'd suggest to take a look at the corresponding chapter of The Neophyte's Guide to Scala.
In my experience, most useful use-cases of Option-as-collection are to filter an option and to make flatMap that implicitly filters None values.

Possible to make use of Scala's Option flatMap method more concise?

I'm admittedly very new to Scala, and I'm having trouble with the syntactical sugar I see in many Scala examples.
It often results in a very concise statement, but honestly so far (for me) a bit unreadable.
So I wish to take a typical use of the Option class, safe-dereferencing, as a good place to start for understanding, for example, the use of the underscore in a particular example I've seen.
I found a really nice article showing examples of the use of Option to avoid the case of null.
https://medium.com/#sinisalouc/demystifying-the-monad-in-scala-cc716bb6f534#.fhrljf7nl
He describes a use as so:
trait User {
val child: Option[User]
}
By the way, you can also write those functions as in-place lambda
functions instead of defining them a priori. Then the code becomes
this:
val result = UserService.loadUser("mike")
.flatMap(user => user.child)
.flatMap(user => user.child)
That looks great! Maybe not as concise as one can do in groovy, but not bad.
So I thought I'd try to apply it to a case I am trying to solve.
I have a type Person where the existence of a Person is optional, but if we have a person, his attributes are guaranteed. For that reason, there are no use of the Option type within the Person type itself.
The Person has an PID which is of type Id. The Id type consists of two String types; the Id-Type and the Id-Value.
I've used the Scala console to test the following:
class Id(val idCode : String, val idVal : String)
class Person(val pid : Id, val name : String)
val anId: Id = new Id("Passport_number", "12345")
val person: Person = new Person(anId, "Sean")
val operson : Option[Person] = Some(person)
OK. That setup my person and it's optional instance.
I learned from the above linked article that I could get the Persons Id-Val by using flatMap; Like this:
val result = operson.flatMap(person => Some(person.pid)).flatMap(pid => Some(pid.idVal)).getOrElse("NoValue")
Great! That works. And if I infact have no person, my result is "NoValue".
I used flatMap (and not Map) because, unless I misunderstand (and my tests with Map were incorrect) if I use Map I have to provide an alternate or default Person instance. That I didn't want to have to do.
OK, so, flatMap is the way to go.
However, that is really not a very concise statement.
If I were writing that in more of a groovy style, I guess i'd be able to do something like this:
val result = person?.pid.idVal
Wow, that's a bit nicer!
Surely Scala has a means to provide something at least nearly as nice as Groovy?
In the above linked example, he was able to make his statement more concise using some of that syntactical sugar I mentioned before. The underscore:
or even more concise:
val result = UserService.loadUser("mike")
.flatMap(_.child)
.flatMap(_.child)
So, it seems in this case the underscore character allows you to skip specifying the type (as the type is inferred) and replace it with underscore.
However, when I try the same thing with my example:
val result = operson.flatMap(Some(_.pid)).flatMap(Some(_.idVal)).getOrElse("NoValue")
Scala complains.
<console>:15: error: missing parameter type for expanded function ((x$2) => x$2.idVal)
val result = operson.flatMap(Some(_.pid)).flatMap(Some(_.idVal)).getOrElse("NoValue")
Can someone help me along here?
How am I misunderstanding this?
Is there a short-hand method of writing my above lengthy statement?
Is flatMap the best way to achieve what I am after? Or is there a better more concise and/or readable way to do it ?
thanks in advance!
Why do you insist on using flatMap? I'd just use map for your example instead:
val result = operson.map(_.pid).map(_.idVal).getOrElse("NoValue")
or even shorter:
val result = operson.map(_.pid.idVal).getOrElse("NoValue")
You should only use flatMap with functions that return Options. Your pid and idVals are not Options, so just map them instead.
You said
I have a type Person where the existence of a Person is optional, but if we have a person, his attributes are guaranteed. For that reason, there are no use of the Option type within the Person type itself.
This is the essential difference between your example and the User example. In the User example, both the existence of a User instance, and its child field are options. This is why, to get a child, you need to flatMap. However, since in your example, only the existence of a Person is not guaranteed, after you've retrieved an Option[Person], you can safely map to any of its fields.
Think of flatMap as a map, followed by a flatten (hence its name). If I mapped on child:
val ouser = Some(new User())
val child: Option[Option[User]] = ouser.map(_.child)
I would end up with an Option[Option[User]]. I need to flatten that to a single Option level, that's why I use flatMap in the first place.
If you looking for the most concise solution, consider this:
val result = operson.fold("NoValue")(_.pid.idVal)
Though one could find it not clear or confusing

Returning the same value type, but with success/failure

Let's say I have a method..
def foo(b: Bar): Try[Bar] = ???
Try is just a placeholder here. foo does something with Bar, then returns a value to indicate success/failure. I want to return the original value with the success/failure indication, so when I have a collection, I can know which ones failed and succeeded, and do something with them. Try doesn't really work for me, because Failure wraps an exception (let's say I don't care about the reason why it failed).
I could maybe return Either[Bar, Bar], but it seems redundant to repeat the type parameter.
Are there better alternatives than this?
Either[Bar, Bar] and (Boolean, Bar) are isomorphic and the choice between them is a matter of taste.
I'd personally prefer Either because you get a nicer set of operations for mapping over the collection with pattern matching, etc., as well as a merge extension method that allows you to write results.map(_.merge) to get a Seq[Bar] if in some situation you no longer need to make a distinction between successful and failed results. I also find this:
val result: Either[Bar, Bar] = foo(input).toOption.toRight(input)
A little nicer than:
val result: (Boolean, Bar) =
foo(input).map((true, _)).getOrElse((false, input))
Or the alternatives, but your mileage may vary.
If I understood your question: Your solution is almost what you were about to achieve.
def foo(b: Bar): Try[Bar] = Try(...)
val succeed = foo(b).getOrElse()
In succeed you have the value you wanted, or nothing (since you don't care about exception). Very good article is: http://danielwestheide.com/blog/2012/12/26/the-neophytes-guide-to-scala-part-6-error-handling-with-try.html

When applying `map` to a `Set` you sometimes want the result not to be a set but overlook this

Or how to avoid accidental removal of duplicates when mapping a Set?
This is a mistake I'm doing very often. Look at the following code:
def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum
The function shall count the accumulated size of all the contained lists. The problem is that after mapping the lists to their lengths, the result is still a Set and all lists of size 1 are reduced to a single representative.
Is it just me having this problem? Is there something I can do to prevent this happening? I think I'd love to have two methods mapToSet and mapToSeq for Set. But there is no way to enforce this, and sometimes you don't locally notice that you are working with a Set.
Maybe it's even possible that you were writing code for a Seq and something changes in another class and the underlying object becomes a Set?
Maybe something like a best practise to not let this situation arise at all?
Remote edits break my code
Imagine the following situation:
val totalEdges = graph.nodes.map(_.getEdges).map(_.size).sum / 2
You fetch a collection of Node objects from a graph, use them to get their adjacent edges and sum over them. This works if graph.nodes returns a Seq.
And it breaks if someone decides to make Graph return its nodes as a Set; without this code looking suspicious (at least not to me, do you expect every collection could possibly end up being a Set?) and without touching it.
It seems there will be many possible "gotcha's" if one expects a Seq and gets a Set. It's not a surprise that method implementations can depend on the type of the object and (with overloading) the arguments. With Scala implicits, the method can even depend on the expected return type.
A way to defend against surprises is to explicitly label types. For example, annotating methods with return types, even if it's not required. At least this way, if the type of graph.nodes is changed from Seq to Set, the programmer is aware that there's potential breakage.
For your specific issue, why not define your ownmapToSeq method,
scala> def mapToSeq[A, B](t: Traversable[A])(f: A => B): Seq[B] =
t.map(f)(collection.breakOut)
mapToSeq: [A, B](t: Traversable[A])(f: A => B)Seq[B]
scala> mapToSeq(Set(Seq(1), Seq(1,2)))(_.sum)
res1: Seq[Int] = Vector(1, 3)
scala> mapToSeq(Seq(Seq(1), Seq(1,2)))(_.sum)
res2: Seq[Int] = Vector(1, 3)
The advantage of using breakOut: CanBuildFrom is that the conversion from a Set to a Seq has no additional overhead.
You can make use the pimp my library pattern to make mapToSeq appear to be part of the Traversable trait, inherited by Seq and Set.
Workarounds:
def countSubelements[A](sl: Set[List[A]]): Int = sl.toSeq.map(_.size).sum
def countSubelements[A](sl: Set[List[A]]): Int = sl.foldLeft(0)(_ + _.size)