Traversing Scalaz Tree - scala

I'm trying to understand the scalaz tree structure and am having some difficulty!
First I've defined a tree:
val tree: Tree[Int] =
  1.node(
    2.leaf,
    3.node(
      4.leaf,
      5.leaf))
So far using TreeLoc I've worked out how to find the first element that matches some predicate. E.g. to find the first node where the value is 3:
tree.loc.find(x => x.getLabel == 3)
My next challenge was to try and find all nodes that match some predicate. For example I would like to find all leaf nodes (which should be pretty easy using TreeLoc and isLeaf). Unfortunately I can't for the life of me work out how to walk the tree to do this.
Edit: Sorry, I don't think I was clear enough in my original question. To be clear, I want to walk the tree in such a way that I have information about the node available to me. flatten, foldRight etc. just allow me to operate on the Int values, whereas I want to be able to operate on Tree[Int] (or TreeLoc[Int]).

Having a look at how find is implemented in scalaz, my suggestion is to implement something like:
implicit class FilterTreeLoc[A](treeLoc: TreeLoc[A]) {
  def filter(p: TreeLoc[A] => Boolean): Stream[TreeLoc[A]] =
    Cobind[TreeLoc].cojoin(treeLoc).tree.flatten.filter(p)
}
It behaves like find, but it gives you back a Stream[TreeLoc[A]] instead of an Option[TreeLoc[A]].
You can use this as tree.loc.filter(_.isLeaf) and tree.loc.filter(_.getLabel == 3).
Note: the use of an implicit class can obviously be avoided if you prefer to have this declared as a method instead.
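For example, a quick usage sketch (assuming the tree defined in the question and the FilterTreeLoc class above), collecting the labels of all leaf nodes:
val leafLabels: Stream[Int] = tree.loc.filter(_.isLeaf).map(_.getLabel)
// yields the labels 2, 4, 5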

Related

Option as a singleton collection - real life use cases

The title pretty much sums it up. Option as a singleton collection can sometimes be confusing, but sometimes it allows for an interesting application. I have one example off the top of my head, and would like to learn more such examples.
My only example is running for comprehension on the Option[List[T]]. We can do the following:
val v = Some(List(1, 2, 3))
for {
  list <- v.toList
  elem <- list
} yield elem + 1
Without having Option.toList, it wouldn't be possible to stay in the same for comprehension, and I'd be forced to write something like this:
for {
  list <- v
} yield for {
  elem <- list
} yield elem + 1
The first example is cleaner, and it's an advantage of Option being a collection. Of course, the result type will be different in these 2 examples, but let's assume it doesn't matter for the sake of discussion.
Any other examples? I'd especially like to concentrate on collection-like usage, and not usage of Option's monadic properties - those are pretty much obvious. In other words, map and flatMap functions are out of scope of this question. They're definitely very useful, just coming from elsewhere.
I find that the main benefit of working with Option[T] as a collection is that you get to use the operations defined on collections, such as map, flatMap, filter, foreach, etc. This makes it easier to operate on a given option, instead of using pattern matching or checking Option[T].isDefined to see if a value exists.
For example, let's take the user repository example from Daniel Westheide blog post about Option[T]:
Say you have a UserRepository object which returns users based on their ID. The user may or may not exist, hence it returns an Option[Person]. Now let's say we want to look up a person by id and then get their age. We can do:
val age: Option[Int] = UserRepository.findById(1).map(_.age)
Now let's say that a Person also has a gender property of type Option[String]. If you wanted to extract that out, you could use map:
val gender: Option[Option[String]] = UserRepository.findById(1).map(_.gender)
But working with nested options isn't too convenient. For that, you have flatMap:
val gender: Option[String] = UserRepository.findById(1).flatMap(_.gender)
And if we want to print out the gender if it exists, we can use foreach:
gender.foreach(println)
You'll find yourself working with scala types that have nested Option[T] fields defined and it's really handy to have collection like methods which help you remove out boilerplate and noise for extracting the actual value out of the operation.
A more real-life use case I encountered just the other day was working with the awscala SDK, where I wanted to retrieve an object from S3 storage:
val bucket: Option[Bucket] = s3.bucket(amazonConfig.bucketName)
val result: Option[S3Object] = bucket.flatMap(_.get(amazonConfig.offsetKey))
result.flatMap(s3Object =>
  Source.fromInputStream(s3Object.content).mkString.decodeOption[Array[KafkaOffset]])
So what happens here is that you query the S3 service for a bucket, which may or may not exist. Then you want to extract an S3Object out of it which actually contains the data, but the API itself returns an Option[S3Object], so it's handy to use flatMap to get an Option[S3Object] directly instead of an Option[Option[S3Object]]. Finally, I want to deserialize the S3Object, which actually contains JSON, and using the Argonaut library it returns an Option[MyObject], so again flatMap comes to the rescue for extracting the inner option type.
Edit:
As you pointed out, map and flatMap belong to the monadic property of Option[T]. I've written a blog post describing the reduction of two options where the final solution was:
def reduce[T](a: Option[T], b: Option[T], f: (T, T) => T): Option[T] = {
  (a ++ b).reduceLeftOption(f)
}
This takes advantage of the ++ operator, which is defined on any collection and also works on Option[T] when it is treated as a collection.
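For example, a quick sketch of how this reduce behaves (with Int values and + as the combining function):
reduce[Int](Some(1), Some(2), _ + _)   // Some(3)
reduce[Int](Some(1), None, _ + _)      // Some(1): a lone value is returned as-is
reduce[Int](None, None, _ + _)         // None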
I'd suggest taking a look at the corresponding chapter of The Neophyte's Guide to Scala.
In my experience, the most useful use cases of Option-as-collection are filtering an option and using flatMap in a way that implicitly filters out None values.
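Two small sketches of those idioms (hypothetical values, just for illustration):
val age: Option[Int] = Some(16)
age.filter(_ >= 18)                                      // None: the predicate fails, so the value is dropped

val lookups: List[Option[String]] = List(Some("a"), None, Some("c"))
lookups.flatMap(x => x)                                  // List(a, c): the None entries vanish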

How to iterate over a Tree in Scalaz

The Scalaz Tree class provides seemingly very useful "Zipper" functionality via TreeLoc (Scaladoc).
However, it's not apparent to me how to easily iterate through a tree (e.g. to find the k-th node in a tree containing a total of n > k nodes) without doing a lot of conditional hedging on whether the zipper is at the end of the list of current children.
Is there an easy way to do this that I'm missing?
Stealing from the TreeLoc.find method, you could do something like this:
def findAt[A](tree: TreeLoc[A], k: Int): Option[TreeLoc[A]] = {
  Cobind[TreeLoc].cojoin(tree).tree.flatten.drop(k).headOption
}
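A quick usage sketch, assuming the same tree as in the first question above (1 at the root, children 2 and 3, and 4 and 5 under 3):
findAt(tree.loc, 2).map(_.getLabel)   // Some(3): nodes are visited in pre-order, zero-based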

List/Array with optionals

Which is better in practice? Having an optional List or having optional items in the list?
Currently I'm following an optional list.
List[Option[T]] or Option[List[T]]?
Edit:
The problem I'm running into is that I have CRUD operations that return optional types. I have a method that does a single lookup, and I want to leverage it to build a function that returns a list. I'm still getting my feet wet with Scala, so I'm curious what the best practice is.
Example:
def findOne(id: Int): Option[T]
Regardless of the implementation, I want to use it for something like the following, but which is better? They both seem awkward to map from. Maybe there's something I'm missing altogether:
def find(ids: List[Int]) : Option[List[T]]
vs
def find(ids: List[Int]) : List[Option[T]]
Those two types mean very different things. List[Option[T]] looks like an intermediate result that you would flatten. (I'd look into using flatMap in this case.)
The second type, Option[List[T]] says there may or may not be a list. This would be a good type to use when you need to distinguish between the "no result" case and the "result is an empty list" case.
I can't think of a situation where both types would make sense.
If you want to retrieve several things that might exist, and it's sensible for some of them to exist and some of them to not exist, it's List[Option[T]] - a list of several entries, each of which is present or not. This would make sense in e.g. a "search" situation, where you want to display whichever ones exist, possibly only some of the requested things. You could implement that method as:
def find(ids: List[Int]) = ids map findOne
If you're using Option to represent something like an "error" case, and you want "if any of them failed then the whole thing is a failure", then you want Option[List[T]] - either a complete list, or nothing at all. You could implement that, using Scalaz, as:
def find(ids: List[Int]) = ids traverse findOne
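To make the difference concrete, a small sketch with hypothetical findOne results:
// suppose findOne(1) == Some(a), findOne(2) == None, findOne(3) == Some(c)
List(1, 2, 3) map findOne        // List(Some(a), None, Some(c)): partial results are kept
List(1, 2, 3) traverse findOne   // None: a single missing lookup collapses the whole result
List(1, 3) traverse findOne      // Some(List(a, c)): every lookup succeeded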
DaoWen already got to the point regarding your considerations.
List[Option[T]] doesn't even encode more information than List[T] without the implicit knowledge that your list of ids is in the same order as your result list.
I'd actually favour
def find(ids: Seq[Int]): Seq[T]
or
def find(ids: Seq[Int]): Option[NonEmptyList[T]]
where NonEmptyList is a sequence type that actually enforces being non-empty, e.g. the one in Scalaz, unless you really want to point out the difference between None and some empty list.
By the way, you might want to use the more general Seq[T] for your interface instead of List[T].

Understanding GenericTraversableTemplate and other Scala collection internals

I was exchanging emails with an acquaintance who is a big Kotlin, Clojure and Java 8 fan and asked him why not Scala. He provided many reasons (Scala is too academic, has too many features; it's not the first time I've heard this, and I think it's very subjective),
but his biggest pain point was, as an example, that he doesn't like a language where he can't understand the implementation of basic data structures, and he gave LinkedList as an example.
I took a look at scala.collection.LinkedList and counted the things I either understand or somewhat understand.
CanBuildFrom - after some effort, I get it, type classes, not the longest suicide note
in history [1]
LinkedListLike - I can't remember where I read it, but I got convinced this is there for a good reason
But then I started to stare at these
GenericTraversableTemplate - now I'm scratching my head as well...
SeqFactory, GenericCompanion - OK, now you lost me, I start to understand his point
Can someone who understands this well please explain GenericTraversableTemplate, SeqFactory and GenericCompanion in the context of LinkedList? What are they for, and what impact do they have on the end user? (I'm sure they are there for a good reason; what is that reason?)
Are they there for a practical reason, or are they a level of abstraction that could have been simplified?
I like Scala collections because I don't have to understand the internals to be able to use them effectively. I don't mind a complex implementation if it helps me keep my usage simpler. E.g. I don't mind paying the price of a complex library if I get the ability to write cleaner, more elegant code using it in return. But it would sure be nice to understand it better.
[1] - Is the Scala 2.8 collections library a case of "the longest suicide note in history"?
I will try to describe the concepts from the point of view of a random pedestrian (I've never contributed a single line to the Scala collection library, so don't hit me too hard if I'm wrong).
Since LinkedList is now deprecated, and because Maps provide a better example, I will use TreeMap as example.
CanBuildFrom
The motivation is this: If we take a TreeMap[Int, Int] and map it with
case (x, y) => (2 * x, y * y * 0.3d)
we get TreeMap[Int, Double]. This type safety alone would already explain the necessity for
simple genericBuilder[X] constructs.
However, if we map it with
case (x, y) => x
we obtain an Iterable[Int] (more precisely: a List[Int]). This is no longer a Map; the type of the container has changed. This is where CBFs come into play:
CanBuildFrom[This, X, That]
can be seen as a kind of "type-level function" that tells us: if we map a collection of type
This with a function that returns values of type X, we can build a That. The most specific
CBF is provided at compile time, in the first case it will be something like
CanBuildFrom[TreeMap[_,_], (X,Y), TreeMap[X,Y]]
in the second case it will be something like
CanBuildFrom[TreeMap[_,_], X, Iterable[X]]
and so we always get the right type of the container. The pattern is pretty general.
Every time you have a generic function
foo[X1, ..., Xn](x1: X1, ..., xn: Xn): Y
where the result type Y depends on X1, ..., Xn, you can introduce an implicit parameter as
follows:
foo[X1, ..., Xn, Y](x1: X1, ..., xn: Xn)(implicit cff: CanFooFrom[X1, ..., Xn, Y]): Y
and then define the type-level function X1, ..., Xn -> Y piecewise by providing multiple
implicit CanFooFrom's.
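A minimal, self-contained sketch of that pattern with toy names (not the real collections code): the result type of combine is chosen by whichever implicit instance is in scope.
trait CanCombineFrom[X1, X2, Y] {
  def combine(x1: X1, x2: X2): Y
}

object CanCombineFrom {
  // piecewise definition of the type-level function (Int, Int) -> Int ...
  implicit val intInt: CanCombineFrom[Int, Int, Int] =
    new CanCombineFrom[Int, Int, Int] { def combine(a: Int, b: Int) = a + b }
  // ... and (Int, String) -> String
  implicit val intString: CanCombineFrom[Int, String, String] =
    new CanCombineFrom[Int, String, String] { def combine(a: Int, b: String) = a.toString + b }
}

def foo[X1, X2, Y](x1: X1, x2: X2)(implicit cff: CanCombineFrom[X1, X2, Y]): Y =
  cff.combine(x1, x2)

foo(1, 2)      // 3 (an Int): the implicit picked Y = Int
foo(1, "two")  // "1two" (a String): the implicit picked Y = String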
LinkedListLike
In the class definition, we see something like this:
TreeMap[A, B] extends SortedMap[A, B] with SortedMapLike[A, B, TreeMap[A, B]]
This is Scala's way to express the so-called F-bounded polymorphism.
The motivation is as follows: Suppose we have a dozen (or at least two...) implementations of the trait SortedMap[A, B]. Now we want to implement a method withoutHead, it could look
somewhat like this:
def withoutHead = this.remove(this.head)
If we move the implementation into SortedMap[A, B] itself, the best we can do is this:
def withoutHead: SortedMap[A, B] = this.remove(this.head)
But is this the most specific result type we can get? No, that's too vague.
We would like to return TreeMap[A, B] if the original map is a TreeMap, and
CrazySortedLinkedHashMap (or whatever...) if the original was a CrazySortedLinkedHashMap.
This is why we move the implementation into SortedMapLike, and give the following signature to the withoutHead method:
trait SortedMapLike[A, B, Repr <: SortedMap[A, B]] {
  ...
  def withoutHead: Repr = this.remove(this.head)
}
Now, because TreeMap[A, B] extends SortedMapLike[A, B, TreeMap[A, B]], the result type of
withoutHead is TreeMap[A, B]. The same holds for CrazySortedLinkedHashMap: we get the exact type back. In Java, you would either have to return SortedMap[A, B] or override the method in each subclass (which turned out to be a maintenance nightmare for the feature-rich traits in Scala).
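A tiny, self-contained sketch of the same trick outside the collections library (made-up names): the implementation lives once in the base trait, yet callers get the concrete type back.
trait ContainerLike[A, Repr <: ContainerLike[A, Repr]] {
  def head: A
  def remove(a: A): Repr
  // implemented once here, but its result type is the concrete Repr
  def withoutHead: Repr = remove(head)
}

final case class IntBag(elems: List[Int]) extends ContainerLike[Int, IntBag] {
  def head: Int = elems.head
  def remove(a: Int): IntBag = IntBag(elems.filterNot(_ == a))  // drops every occurrence of a
}

val bag: IntBag = IntBag(List(1, 2, 3)).withoutHead   // IntBag(List(2, 3)), not just ContainerLike[...]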
GenericTraversableTemplate
The type is: GenericTraversableTemplate[+A, +CC[X] <: GenTraversable[X]]
As far as I can tell, this is just a trait that provides implementations of
methods that somehow return regular collections with same container type but
possibly different content type (stuff like flatten, transpose, unzip).
Stuff like foldLeft, reduce, exists are not here because these methods care only about content type, not container type.
Stuff like flatMap is not here, because the container type can change (again, CBF's).
Why is it a separate trait, is there a fundamental reason why it exists?
I don't think so... It probably would be possible to group the godzillion of methods somewhat differently. But this is just what happens naturally: you start to implement a trait, and it turns out that it has very many methods. So instead you group loosely related methods, put them into 10 different traits with awkward names like "GenTraversableTemplate", and then mix them all into the traits/classes where you need them...
GenericCompanion
This is just an abstract class that implements some basic functionality which is common
for companion objects of most collection classes (essentially, it just implements very
simple factory methods apply(varargs) and empty).
For example there is method apply that takes varargs of some type A and returns a collection of type CC[A]:
Array(1, 2, 3, 4) // calls Array.apply[A](elems: A*) on the companion object
List(1, 2, 3, 4) // same for List
The implementation is very simple, it's something like this:
def apply[A](varargs: A*): CC[A] = {
  val builder = newBuilder[A]
  for (arg <- varargs) builder += arg
  builder.result()
}
This is obviously the same for Arrays and Lists and TreeMaps and almost everything else, except 'constrained irregular collections' like BitSet. So this is just common functionality in a common ancestor class of most companion objects. Nothing special about that.
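A hedged, minimal sketch of that idea with toy names (not the real GenericCompanion): one abstract companion base class that only needs newBuilder from its subclasses.
import scala.language.higherKinds
import scala.collection.mutable

abstract class MyCompanion[CC[_]] {
  def newBuilder[A]: mutable.Builder[A, CC[A]]
  def empty[A]: CC[A] = apply()
  def apply[A](elems: A*): CC[A] = {
    val b = newBuilder[A]
    for (e <- elems) b += e
    b.result()
  }
}

object MyListCompanion extends MyCompanion[List] {
  def newBuilder[A]: mutable.Builder[A, List[A]] = List.newBuilder[A]
}

MyListCompanion(1, 2, 3)   // List(1, 2, 3), built by the shared apply
MyListCompanion.empty[Int] // List()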
SeqFactory
Similar to GenericCompanion, but this time more specifically for Sequences.
It adds some common factory methods like fill(), iterate(), tabulate(), etc.
Again, nothing particularly rocket-scientific here...
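For example, these are the methods you reach through the concrete companions (a quick sketch):
List.fill(3)("x")               // List(x, x, x)
Vector.tabulate(4)(i => i * i)  // Vector(0, 1, 4, 9)
List.iterate(1, 5)(_ * 2)       // List(1, 2, 4, 8, 16)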
A few general remarks
In general: I don't think that one should attempt to understand every single trait in this library. Rather, one should try to look at the library as a whole. As a whole, it has a very interesting architecture. And in my personal opinion, it's actually a very aesthetic piece of software, but one has to stare at it for quite a while (and try to re-implement the whole architectural pattern several times) to grasp it.
On the other hand: CBFs, for example, are a kind of "design pattern" that clearly should be eliminated in successors of this language. The whole story with the scope of implicit CBFs still seems like a total nightmare to me. But many things seemed completely inscrutable at first, and almost always, it ended with an epiphany (which is very specific to Scala: for the majority of other languages, such struggles usually end with the thought "the author of this is a complete idiot").

Scala: selecting function returning Option versus PartialFunction

I'm a relative Scala beginner and would like some advice on how to proceed on an implementation that seems like it can be done either with a function returning Option or with PartialFunction. I've read all the related posts I could find (see bottom of question), but these seem to involve the technical details of using PartialFunction or converting one to the other; I am looking for an answer of the type "if the circumstances are X,Y,Z, then use A else B, but also consider C".
My example use case is a path search between locations using a library of path finders. Say the locations are of type L, a path is of type P, and the desired path search result would be an Iterable[P]. The path search result should be assembled by asking all the path finders (in something like Google Maps these might be Bicycle, Car, Walk, Subway, etc.) for their path suggestions, which may or may not be defined for a particular start/end location pair.
There seem to be two ways to go about this:
(a) define a path finder as f: (L,L) => Option[P] and then get the result via something like finders.map( _.apply(l1,l2) ).filter( _.isDefined ).map( _.get )
(b) define a path finder as f: PartialFunction[(L,L),P] and then get the result via something like finders.filter( _.isDefinedAt( (l1,l2) ) ).map( _.apply( (l1,l2) ) )
It seems like using a function returning Option[P] would avoid double evaluation of results, so for an expensive computation this may be preferable unless one caches the results. It also seems like using Option one can have an arbitrary input signature, whereas PartialFunction expects a single argument. But I am particularly interested in hearing from someone with practical experience about less immediate, more "bigger picture" considerations, such as the interaction with the Scala library. Would using a PartialFunction have significant benefits in making available certain methods of the collections API that might pay off in other ways? Would such code generally be more concise?
Related but different questions:
Inverse of PartialFunction's lift method
Is the PartialFunction design inefficient?
How to convert X => Option[R] to PartialFunction[X,R]
Is there a nicer way of lifting a PartialFunction in Scala?
costly computation occuring in both isDefined and Apply of a PartialFunction
It's not all that well known, but since 2.8 Scala has had a collect method defined on its collections. collect is similar to filter, but takes a partial function and has the semantics you describe.
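For example, a quick sketch of collect on an ordinary list:
List(1, "two", 3, "four").collect { case i: Int => i * 10 }
// List(10, 30): elements where the partial function is not defined are simply skipped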
It feels like Option might fit your use case better.
My interpretation is that partial functions work well when combined over input ranges. So if f is defined over (SanDiego, Irvine) and g is defined over (Paris, London), then you can get a function that is defined over the combined input (SanDiego, Irvine) and (Paris, London) by doing f orElse g.
But in your case it seems, things happen for a given (l1,l2) location tuple and then you do some work...
If you find yourself writing a lot of {case (L,M) => ... case (P,Q) => ...} then it may be the sign that partial functions are a better fit.
Otherwise options work well with the rest of the collections and can be used like this instead of your (a) proposal:
val processedPaths = for {
  f <- finders
  p <- f(l1, l2)
} yield process(p)
Within the for comprehension, p is lifted into a Traversable, so you don't even have to call filter, isDefined or get to skip the finders without results.
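For reference, that comprehension desugars to roughly the following (same hypothetical finders, l1, l2 and process names as above); the Option returned by each finder is implicitly widened to an Iterable, which is exactly the lifting mentioned above:
val processedPaths = finders.flatMap(f => f(l1, l2).map(p => process(p)))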