Deserialize a binary tree breadth-first in Functional Programming - scala

I have an imperative program that deserializes a binary tree from an array. It's a BFS algorithm. I was wondering how to do this in Scala with functional programming concepts.
class TreeNode(_value: Int = 0, _left: TreeNode = null, _right: TreeNode = null) {
val value: Int = _value
val left: TreeNode = _left
val right: TreeNode = _right
}
def createTree(list: Array[Int]): TreeNode = ???
For reference, this is the imperative version in JavaScript. The algorithm is described here: https://support.leetcode.com/hc/en-us/articles/360011883654-What-does-1-null-2-3-mean-in-binary-tree-representation-
class TreeNode {
constructor(val){
this.val = val;
this.left = this.right = null;
}
}
function makeTree(arr){
let root = new TreeNode(arr[0]);
let q = [root];
let index = 1;
while(q.length){
let node = q.splice(0,1)[0];
if(arr[index] != null){
node.left = new TreeNode(arr[index]);
q.push(node.left);
}
index++;
if(arr[index] != null){
node.right = new TreeNode(arr[index]);
q.push(node.right);
}
index++;
}
return root;
}

First of all, you can use a case class to simplify your tree class, and you should use Option instead of null:
case class Tree(value: Int, left: Option[Tree], right: Option[Tree])
Next, the main difficulty is that your tree is immutable, so it has to be built with a depth-first post-order traversal, while your serialization format requires a breadth-first level-order traversal. So you first have to deserialize to a data structure that can then be traversed in depth-first order. The following uses a Map from (row, column) to the node value:
import scala.collection.immutable.Queue

@scala.annotation.tailrec
private def bfsTraverse(
    serialized: List[Option[Int]],
    queue: Queue[(Int, Int)],
    map: Map[(Int, Int), Int]): Map[(Int, Int), Int] = {
  // Check for termination before dequeuing: the queue may already be empty
  // when the input ends with explicit nulls, and dequeue would then throw.
  if (serialized.isEmpty || queue.isEmpty) {
    map
  } else {
    val ((row, col), queueTail) = queue.dequeue
    serialized.head match {
      case None =>
        bfsTraverse(serialized.tail, queueTail, map)
      case Some(value) =>
        val newQueue = queueTail.enqueueAll(List((row + 1, col * 2), (row + 1, col * 2 + 1)))
        bfsTraverse(serialized.tail, newQueue, map + ((row, col) -> value))
    }
  }
}
Now you can use the output of that function to build your Tree:
private def buildTree(row: Int, col: Int, map: Map[(Int, Int), Int]): Option[Tree] = {
  map.get((row, col)).map { value =>
    Tree(value, buildTree(row + 1, col * 2, map), buildTree(row + 1, col * 2 + 1, map))
  }
}
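The two functions still have to be wired together. The following is a minimal sketch of how that might look, assuming Scala 2.13 (for Queue.enqueueAll). The wrapper object BfsMapSketch and the entry point createTree returning Option[Tree] are hypothetical names chosen for illustration, and the emptiness check is done before dequeuing so that trailing nulls in the input cannot hit an empty queue.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object BfsMapSketch {
  final case class Tree(value: Int, left: Option[Tree], right: Option[Tree])

  // Walk the level-order input and record each present value under its (row, col) slot.
  @tailrec
  def bfsTraverse(
      serialized: List[Option[Int]],
      queue: Queue[(Int, Int)],
      slots: Map[(Int, Int), Int]): Map[(Int, Int), Int] =
    serialized match {
      case Nil                => slots
      case _ if queue.isEmpty => slots // more input than open slots: ignore the rest
      case head :: tail =>
        val ((row, col), rest) = queue.dequeue
        head match {
          case None => bfsTraverse(tail, rest, slots)
          case Some(value) =>
            val children = List((row + 1, col * 2), (row + 1, col * 2 + 1))
            bfsTraverse(tail, rest.enqueueAll(children), slots + ((row, col) -> value))
        }
    }

  // Depth-first construction from the slot map.
  def buildTree(row: Int, col: Int, slots: Map[(Int, Int), Int]): Option[Tree] =
    slots.get((row, col)).map { value =>
      Tree(value, buildTree(row + 1, col * 2, slots), buildTree(row + 1, col * 2 + 1, slots))
    }

  // Hypothetical entry point: the root always lives in slot (0, 0).
  def createTree(serialized: List[Option[Int]]): Option[Tree] =
    buildTree(0, 0, bfsTraverse(serialized, Queue((0, 0)), Map.empty))
}
```

With this sketch, BfsMapSketch.createTree(List(Some(1), None, Some(2), Some(3))) gives a root 1 with no left child and a right child 2 whose left child is 3, which matches the [1, null, 2, 3] example from the linked article.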

This solution is a bit verbose but uses some functional concepts and defines the data structures thoroughly.
The algorithm you provided works better with mutable nodes. It's possible to have a shorter solution with just one mutable class, but here, there are two versions (one node class with mutable left/right and the other completely immutable).
Case classes automatically provide a lot of useful features, such as comparison and a friendly print-out.
The processValues function tail-recursively performs the tasks equivalent to the makeTree function you provided.
The @tailrec annotation tells the compiler to check that the function is tail recursive.
Pattern matching using the match and case keywords replaces checking for null-ness or different subtypes, and the compiler can check for a non-exhaustive match clause.
The App trait allows you to easily make an object (static class) into an entrypoint to run some examples.
import scala.annotation.tailrec
sealed trait TreeNode[T]
sealed trait MutableTreeNode[T]
object MutableTreeNode {
final case class Empty[T]() extends MutableTreeNode[T]
case class Branch[T](val value: T) extends MutableTreeNode[T] {
protected var left: MutableTreeNode[T] = Empty()
protected var right: MutableTreeNode[T] = Empty()
def setLeft(newLeft: T): Branch[T] = {
val branch = Branch(newLeft) // keep the precise Branch[T] type so no cast is needed
left = branch
branch
}
def setRight(newRight: T): Branch[T] = {
val branch = Branch(newRight)
right = branch
branch
}
override def toString: String = this.toImmutable().toString
/* converts given node to immutable version */
private def toImmutable(node: MutableTreeNode[T]): TreeNode[T] = {
node match {
case Empty() => TreeNode.Empty()
case b@Branch(value) => TreeNode.Branch(value, toImmutable(b.left), toImmutable(b.right))
}
}
def toImmutable():TreeNode[T] = toImmutable(this)
}
/**
* Modifies nodes inside of queue
*/
@tailrec def processValues[T](values: Seq[Option[T]], queue: Seq[MutableTreeNode.Branch[T]]): Unit = {
(queue, values) match {
case (Nil, _) => ()
case (_, Nil) => ()
case (qHead :: qTail, Some(vLeft) :: Some(vRight) :: vTail) =>
processValues(vTail, qTail :+ qHead.setLeft(vLeft) :+ qHead.setRight(vRight))
case (qHead :: qTail, Some(vLeft) :: None :: vTail) =>
processValues(vTail, qTail :+ qHead.setLeft(vLeft))
case (qHead :: qTail, None :: Some(vRight) :: vTail) =>
processValues(vTail, qTail :+ qHead.setRight(vRight))
case (qHead :: qTail, None :: None :: vTail) =>
processValues(vTail, qTail)
}
}
}
object TreeNode {
final case class Empty[T]() extends TreeNode[T]
final case class Branch[T](value: T, left: TreeNode[T], right: TreeNode[T]) extends TreeNode[T]
def deserialize[T](values: Seq[Option[T]]): TreeNode[T] = {
values match {
case Some(headVal) :: tail =>
val root: MutableTreeNode.Branch[T] = MutableTreeNode.Branch(headVal)
MutableTreeNode.processValues(tail, Seq(root))
root.toImmutable()
case Nil => Empty()
case _ => throw new RuntimeException("Invalid argument values")
}
}
}
object TreeNodeTest extends App {
val input = Seq(Some(5), Some(4), Some(7), None, None, Some(2), None)
val treeNode:TreeNode[Int] = TreeNode.deserialize(input)
println(treeNode)
}

As has been noted, Scala avoids null whenever possible, preferring Option to indicate the absence of a value.
Mutable variables are also shunned, which makes it much easier to construct a binary tree in a depth-first manner rather than breadth-first.
So all you really need is an easy-to-use breadth-first-serialization --to--> depth-first-serialization translator.
I did it in two steps.
//from Breadth-First-Serialization to full tree representation
def BFS2full[A](bfs:IndexedSeq[Option[A]]) :List[List[Option[A]]] = {
val bfsLen = bfs.length
if (bfs.isEmpty) Nil
else
List(bfs.head) :: List.unfold((List(bfs.head),1)){case (pr,idx) =>
Option.when(bfsLen > idx){
val ns = pr.foldLeft((List.empty[Option[A]],idx)){
case ((acc,x), None) => (acc ++ List(None,None), x)
case ((acc,x), _) => (acc ++ List(bfs.lift(x).flatten
,bfs.lift(x+1).flatten), x+2)
}
(ns._1, ns)
}
}
}
//from full tree representation to Depth-First-Serialization
def toDFS[A](lloa :List[List[Option[A]]]
,lvl :Int = 0) :List[Option[A]] = lloa match {
case Nil => List(None, None)
case List(None) :: Nil => List(None)
case List( oa ) :: tl => oa :: toDFS(tl, lvl)
case row :: tl => row.drop(lvl*2) match {
case List(None,None,_*) => List(None, None)
case List(None, ob ,_*) => None :: (ob::toDFS(tl,2*lvl + 1))
case List( oa ,None,_*) => (oa::toDFS(tl,2*lvl)) ++ List(None)
case List( oa , ob ,_*) => (oa :: toDFS(tl, 2*lvl)) ++
(ob :: toDFS(tl,2*lvl + 1))
}
}
Now let's parameterize the tree so that we can build Int trees, Float trees, String trees, etc.
We're also going to make the constructor private so that node creation is only done via factory methods.
case class Tree[A] private (value : A
,left : Option[Tree[A]]
,right : Option[Tree[A]])
All that's left is to supply the factory methods.
object Tree {
private def BFS2full[A]( . . . //as above
private def toDFS[A]( . . . //as above
def fromDFS[A](dfs :IterableOnce[Option[A]]) :Option[Tree[A]] = {
val itr = dfs.iterator
def loop(): Option[Tree[A]] =
Option.when(itr.hasNext)(itr.next())
.flatten
.map(new Tree(_,loop(),loop()))
loop()
}
def fromBFS[A](bfs:IndexedSeq[Option[A]]) :Option[Tree[A]] =
fromDFS(toDFS(BFS2full(bfs)))
}
testing:
Tree.fromBFS(Vector(Some('A'),None,Some('B'),Some('C'))).get
//res0: Tree[Char] = Tree(A,None,Some(Tree(B,Some(Tree(C,None,None)),None)))
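To make the depth-first step easier to follow in isolation, here is a small sketch of what fromDFS consumes: a pre-order listing in which None marks an absent child. It assumes Scala 2.13 (for IterableOnce). The object name DfsSketch is hypothetical, the Tree here is a simplified stand-in with a public constructor, and the input list is a hand-written pre-order encoding of the tree from the test above, not necessarily the exact list that toDFS emits.

```scala
object DfsSketch {
  // Simplified stand-in for the Tree above (public constructor, for brevity).
  final case class Tree[A](value: A, left: Option[Tree[A]], right: Option[Tree[A]])

  // Pre-order input where None marks an absent child.
  // Input that ends early is read as if it were padded with None.
  def fromDFS[A](dfs: IterableOnce[Option[A]]): Option[Tree[A]] = {
    val itr = dfs.iterator
    def loop(): Option[Tree[A]] =
      if (!itr.hasNext) None
      else
        itr.next().map { value =>
          val left = loop()  // the whole left subtree is consumed first
          val right = loop() // then the whole right subtree
          Tree(value, left, right)
        }
    loop()
  }

  // Hand-written pre-order encoding of: A with no left child, right child B, B's left child C.
  val example: Option[Tree[Char]] =
    fromDFS(List(Some('A'), None, Some('B'), Some('C'), None, None, None))
}
```

Because the iterator is shared by every recursive call, each subtree simply picks up where the previous one stopped, which is why no index bookkeeping is needed on the depth-first side.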

Here's a basic solution. It doesn't use lazy vals or any data structures other than List, Option, and Either. You might find it easier to understand because of that or harder because of the verbosity.
I've defined Tree like this, just to make things easier.
sealed trait Tree[+T]
case class Node[+T](data: T, left: Tree[T], right: Tree[T]) extends Tree[T]
case object Empty extends Tree[Nothing]
Also, instead of an Array[Int], I'm using a List[Option[T]] (where T is a type parameter of the makeTree method). Some means there's a node, None is like -1 in your code. This is more idiomatic and also works for types other than Int.
The thing about doing this breadth-first is that you need to keep passing the input list around to the children. First you try to make the left child, then when that's done, you try to make the right child. Then you try to make the left child's children, then the right child's children, then their children, and so on.
One way to deal with this without using var would be to take in the input list containing the serialized binary tree and see what the head of the list is. If it's a None, we can just return an Empty, because we know it's the end of the tree. If it's a Some, however, we can't yet return a tree, but we can return another function that takes in the next round of input and returns a tree.
Since there are 2 different types that can be returned, these functions will be of type List[Option[T]] => Either[List[Option[T]] => ThisFunctionsType, Tree[T]]. Since the function may return another function of the same type, we'll have to define a new type that can return itself:
trait RecFun[T] extends ((List[Option[T]]) => (List[Option[T]], Either[RecFun[T], Tree[T]]))
The reason that it's List[Option[T]] => (List[Option[T]], Either[RecFun[T], Tree[T]]) and not just List[Option[T]] => Either[RecFun[T], Tree[T]] is that in case one child turns out to be a leaf somewhere in the middle of the list, we still need to continue, so the first element of the returned tuple contains the rest of the list after processing.
Now we can define this. The helper function is so that as long as the RecFun returns a Left[RecFun], it keeps passing the remaining input into that function.
def makeTree[T](list: List[Option[T]]): Tree[T] = {
def helper(f: RecFun[T], l: List[Option[T]]): Tree[T] =
f(l) match {
case (_, Right(tree)) => tree
case (next, Left(f)) => helper(f, next)
}
list match {
case Some(x) :: tail => helper(createRec(x), tail)
case _ => Empty
}
}
def createRec[T](data: T): RecFun[T] = {
case None :: Nil | Nil => (Nil, Right(Node(data, Empty, Empty)))
case Some(l) :: Nil => (Nil, Right(Node(data, Node(l, Empty, Empty), Empty)))
case lo :: ro :: rest =>
(rest, (lo, ro) match {
case (Some(l), Some(r)) =>
Left(waitForChildren(data, createRec(l), createRec(r)))
case (Some(l), None) =>
Left(waitForChild(Node(data, _, Empty), createRec(l)))
case (None, Some(r)) =>
Left(waitForChild(Node(data, Empty, _), createRec(r)))
case (None, None) => Right(Node(data, Empty, Empty))
})
}
def waitForChildren[T](data: T, leftF: RecFun[T], rightF: RecFun[T]): RecFun[T] =
input => {
val (next, res) = leftF(input)
res match {
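case Right(leftTree) =>
// The left subtree is complete, but the right subtree must still consume its
// share of this level now, or later input would be attributed to the wrong node.
val (next2, res2) = rightF(next)
(next2, res2 match {
case Right(rightTree) => Right(Node(data, leftTree, rightTree))
case Left(rightF2) => Left(waitForChild(Node(data, leftTree, _), rightF2))
})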
case Left(leftF2) => {
val (next2, res2) = rightF(next)
(next2, Left(res2 match {
case Right(tree) => waitForChild(Node(data, _, tree), leftF2)
case Left(rightF2) => waitForChildren(data, leftF2, rightF2)
}))
}
}
}
def waitForChild[T](ctor: Tree[T] => Node[T], f: RecFun[T]): RecFun[T] =
input => {
val (next, res) = f(input)
(next, res match {
case Right(tree) => Right(ctor(tree))
case Left(recFun) => Left(waitForChild(ctor, recFun))
})
}


Functional patterns for better chaining of collect

I often find myself needing to chain collects where I want to do multiple collects in a single traversal. I also would like to return a "remainder" for things that don't match any of the collects.
For example:
sealed trait Animal
case class Cat(name: String) extends Animal
case class Dog(name: String, age: Int) extends Animal
val animals: List[Animal] =
List(Cat("Bob"), Dog("Spot", 3), Cat("Sally"), Dog("Jim", 11))
// Normal way
val cats: List[Cat] = animals.collect { case c: Cat => c }
val dogAges: List[Int] = animals.collect { case Dog(_, age) => age }
val rem: List[Animal] = Nil // No easy way to create this without repeated code
This really isn't great: it requires multiple iterations, and there is no reasonable way to calculate the remainder. I could write a very complicated fold to pull this off, but it would be really nasty.
Instead, I usually opt for mutation which is fairly similar to the logic you would have in a fold:
import scala.collection.mutable.ListBuffer
// Ugly, hide the mutation away
val (cats2, dogsAges2, rem2) = {
// Lose some benefits of type inference
val cs = ListBuffer[Cat]()
val da = ListBuffer[Int]()
val rem = ListBuffer[Animal]()
// Bad separation of concerns, I have to merge all of my functions
animals.foreach {
case c: Cat => cs += c
case Dog(_, age) => da += age
case other => rem += other
}
(cs.toList, da.toList, rem.toList)
}
I don't like this one bit: it has worse type inference and worse separation of concerns, since I have to merge all of the various partial functions. It also requires lots of lines of code.
What I want are some useful patterns, like a collect that returns the remainder (I grant that partitionMap, new in 2.13, does this, but it is uglier). I could also use some form of pipe or map for operating on parts of tuples. Here are some made-up utilities:
implicit class ListSyntax[A](xs: List[A]) {
import scala.collection.mutable.ListBuffer
// Collect and return remainder
// A specialized form of new 2.13 partitionMap
def collectR[B](pf: PartialFunction[A, B]): (List[B], List[A]) = {
val rem = new ListBuffer[A]()
val res = new ListBuffer[B]()
val f = pf.lift
for (elt <- xs) {
f(elt) match {
case Some(r) => res += r
case None => rem += elt
}
}
(res.toList, rem.toList)
}
}
implicit class Tuple2Syntax[A, B](x: Tuple2[A, B]){
def chainR[C](f: B => C): Tuple2[A, C] = x.copy(_2 = f(x._2))
}
Now I can write this in a way that could be done in a single traversal (with a lazy data structure) and yet follows functional, immutable practice:
// Relatively pretty, can imagine lazy forms using a single iteration
val (cats3, (dogAges3, rem3)) =
animals.collectR { case c: Cat => c }
.chainR(_.collectR { case Dog(_, age) => age })
My question is, are there patterns like this? It smells like the type of thing that would be in a library like Cats, FS2, or ZIO, but I am not sure what it might be called.
Scastie link of code examples: https://scastie.scala-lang.org/Egz78fnGR6KyqlUTNTv9DQ
I wanted to see just how "nasty" a fold() would be.
val (cats
,dogAges
,rem) = animals.foldRight((List.empty[Cat]
,List.empty[Int]
,List.empty[Animal])) {
case (c:Cat, (cs,ds,rs)) => (c::cs, ds, rs)
case (Dog(_,d),(cs,ds,rs)) => (cs, d::ds, rs)
case (r, (cs,ds,rs)) => (cs, ds, r::rs)
}
Eye of the beholder I suppose.
How about defining a couple utility classes to help you with this?
case class ListCollect[A](list: List[A]) {
def partialCollect[B](f: PartialFunction[A, B]): ChainCollect[List[B], A] = {
val (cs, rem) = list.partition(f.isDefinedAt)
new ChainCollect((cs.map(f), rem))
}
}
case class ChainCollect[A, B](tuple: (A, List[B])) {
def partialCollect[C](f: PartialFunction[B, C]): ChainCollect[(A, List[C]), B] = {
val (cs, rem) = tuple._2.partition(f.isDefinedAt)
ChainCollect(((tuple._1, cs.map(f)), rem))
}
}
ListCollect is just meant to start the chain, and ChainCollect takes the previous remainder (the second element of the tuple) and tries to apply a PartialFunction to it, creating a new ChainCollect object. I'm not particularly fond of the nested tuples this produces, but you may be able to make it look a bit better if you use Shapeless's HLists.
val ((cats, dogs), rem) = ListCollect(animals)
.partialCollect { case c: Cat => c }
.partialCollect { case Dog(_, age) => age }
.tuple
Dotty's *: type makes this a bit easier:
opaque type ChainResult[Prev <: Tuple, Rem] = (Prev, List[Rem])
extension [P <: Tuple, R, N](chainRes: ChainResult[P, R]) {
def partialCollect(f: PartialFunction[R, N]): ChainResult[List[N] *: P, R] = {
val (cs, rem) = chainRes._2.partition(f.isDefinedAt)
(cs.map(f) *: chainRes._1, rem)
}
}
This does result in the output being reversed, but it doesn't have that ugly nesting from my previous approach:
val ((owls, dogs, cats), rem) = (EmptyTuple, animals)
.partialCollect { case c: Cat => c }
.partialCollect { case Dog(_, age) => age }
.partialCollect { case Owl(wisdom) => wisdom }
/* more animals */
case class Owl(wisdom: Double) extends Animal
case class Fly(isAnimal: Boolean) extends Animal
val animals: List[Animal] =
List(Cat("Bob"), Dog("Spot", 3), Cat("Sally"), Dog("Jim", 11), Owl(200), Fly(false))
And if you still don't like that, you can always define a few more helper methods to reverse the tuple, add the extension on a List without requiring an EmptyTuple to begin with, etc.
//Add this to the ChainResult extension
def end: Reverse[List[R] *: P] = {
def revHelp[A <: Tuple, R <: Tuple](acc: A, rest: R): RevHelp[A, R] =
rest match {
case EmptyTuple => acc.asInstanceOf[RevHelp[A, R]]
case h *: t => revHelp(h *: acc, t).asInstanceOf[RevHelp[A, R]]
}
revHelp(EmptyTuple, chainRes._2 *: chainRes._1)
}
//Helpful types for safety
type Reverse[T <: Tuple] = RevHelp[EmptyTuple, T]
type RevHelp[A <: Tuple, R <: Tuple] <: Tuple = R match {
case EmptyTuple => A
case h *: t => RevHelp[h *: A, t]
}
And now you can do this:
val (cats, dogs, owls, rem) = (EmptyTuple, animals)
.partialCollect { case c: Cat => c }
.partialCollect { case Dog(_, age) => age }
.partialCollect { case Owl(wisdom) => wisdom }
.end
Since you mentioned Cats, I would also add a solution using foldMap:
import cats.implicits._ // brings foldMap and the Monoid instances into scope
sealed trait Animal
case class Cat(name: String) extends Animal
case class Dog(name: String) extends Animal
case class Snake(name: String) extends Animal
val animals: List[Animal] = List(Cat("Bob"), Dog("Spot"), Cat("Sally"), Dog("Jim"), Snake("Billy"))
val map = animals.foldMap{ //Map(other -> List(Snake(Billy)), cats -> List(Cat(Bob), Cat(Sally)), dogs -> List(Dog(Spot), Dog(Jim)))
case d: Dog => Map("dogs" -> List(d))
case c: Cat => Map("cats" -> List(c))
case o => Map("other" -> List(o))
}
val tuples = animals.foldMap{ //(List(Dog(Spot), Dog(Jim)),List(Cat(Bob), Cat(Sally)),List(Snake(Billy)))
case d: Dog => (List(d), Nil, Nil)
case c: Cat => (Nil, List(c), Nil)
case o => (Nil, Nil, List(o))
}
Arguably it's more succinct than the fold version, but it has to combine partial results using monoids, so it won't be as performant.
This code is dividing a list into three sets, so the natural way to do this is to use partition twice:
val (cats, notCat) = animals.partitionMap{
case c: Cat => Left(c)
case x => Right(x)
}
val (dogAges, rem) = notCat.partitionMap {
case Dog(_, age) => Left(age)
case x => Right(x)
}
A helper method can simplify this:
def partitionCollect[T, U](list: List[T])(pf: PartialFunction[T, U]): (List[U], List[T]) =
list.partitionMap {
case t if pf.isDefinedAt(t) => Left(pf(t))
case x => Right(x)
}
val (cats, notCat) = partitionCollect(animals) { case c: Cat => c }
val (dogAges, rem) = partitionCollect(notCat) { case Dog(_, age) => age }
This is clearly extensible to more categories, with the slight irritation of having to invent temporary variable names (which could be overcome by explicit n-way partition methods).
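As an illustration of what such an explicit n-way partition method could look like, here is a minimal sketch for two partial functions plus a remainder. The names PartitionSketch and partitionCollect2 are hypothetical, and the Animal hierarchy is repeated only to keep the example self-contained. It does a single traversal with foldRight, and each element goes to the first partial function that accepts it.

```scala
object PartitionSketch {
  sealed trait Animal
  final case class Cat(name: String) extends Animal
  final case class Dog(name: String, age: Int) extends Animal
  final case class Owl(wisdom: Double) extends Animal

  // Hypothetical helper: route each element to the first partial function
  // that is defined for it, or to the remainder if neither is.
  def partitionCollect2[T, A, B](list: List[T])(
      pfA: PartialFunction[T, A],
      pfB: PartialFunction[T, B]): (List[A], List[B], List[T]) =
    list.foldRight((List.empty[A], List.empty[B], List.empty[T])) {
      case (elem, (accA, accB, rest)) =>
        pfA.lift(elem) match {
          case Some(a) => (a :: accA, accB, rest)
          case None =>
            pfB.lift(elem) match {
              case Some(b) => (accA, b :: accB, rest)
              case None    => (accA, accB, elem :: rest)
            }
        }
    }

  val animals: List[Animal] =
    List(Cat("Bob"), Dog("Spot", 3), Cat("Sally"), Dog("Jim", 11), Owl(200))

  // Type arguments are spelled out here to keep inference out of the picture.
  val (cats, dogAges, rem) =
    partitionCollect2[Animal, Cat, Int](animals)(
      { case c: Cat => c },
      { case Dog(_, age) => age }
    )
}
```

The trade-off is that a separate method is needed for each arity, which is the same boilerplate the tuple-based answers above try to avoid.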

Merging elements of a list of case classes

I have the following case class:
case class GHUser(login:String, contributions:Option[Int])
And a list of such elements:
val list = List(
List(GHUser("a", Some(10)), GHUser("b", Some(10))), List(GHUser("b", Some(300)))
).flatten
And now I would like to merge all elements such that all contributions are added together for the same user. At first I thought I could apply a Monoid to my case class, like this:
trait Semigroup[A] {
def combine(x: A, y: A): A
}
trait Monoid[A] extends Semigroup[A] {
def empty: A
}
case class GHUser(login: String, contributions: Option[Int])
object Main extends App {
val ghMonoid: Monoid[GHUser] = new Monoid[GHUser] {
def empty: GHUser = GHUser("", None)
def combine(x: GHUser, y: GHUser): GHUser = {
x match {
case GHUser(_, None) => GHUser(y.login, y.contributions)
case GHUser(_, Some(xv)) =>
y match {
case GHUser(_, None) => GHUser(x.login, x.contributions)
case GHUser(_, Some(yv)) => GHUser(x.login, Some(xv + yv))
}
}
}
}
val list = List(
List(GHUser("a", Some(10)), GHUser("b", Some(10))), List(GHUser("b", Some(300)))
).flatten
val b = list.groupBy(_.login)
val c = b.mapValues(_.foldLeft(ghMonoid.empty)(ghMonoid.combine))
println(c.valuesIterator mkString("\n"))
// GHUser(a,Some(10))
// GHUser(b,Some(310))
}
And it works, but I feel like I am not following the Monoid laws, as it requires all users to have the same login (for that reason I did the groupBy call).
Is there a cleaner solution?
Update
Rereading my question, it seems like I do not want a Monoid but a Semigroup, am I right?
groupMapReduce() (Scala 2.13) handles most of what you need.
list.groupMapReduce(_.login)(_.contributions){case (a,b) => a.fold(b)(n => Some(n+b.getOrElse(0)))}
.map(GHUser.tupled)
//res0 = List(GHUser(a,Some(10)), GHUser(b,Some(310)))
The Reduce part is a bit convoluted but it gets the job done.
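If the convoluted part is pulled out into a named function, it becomes clearer that only the contributions need a combining operation, while the login acts purely as the grouping key. The following is a minimal sketch along those lines, assuming Scala 2.13 for groupMapReduce. The names GhUserMergeSketch, combineContributions and mergeUsers are hypothetical, and the final sortBy is only there to make the output order deterministic.

```scala
object GhUserMergeSketch {
  final case class GHUser(login: String, contributions: Option[Int])

  // Associative combine on Option[Int]; None behaves as the identity element.
  def combineContributions(x: Option[Int], y: Option[Int]): Option[Int] =
    (x, y) match {
      case (Some(a), Some(b)) => Some(a + b)
      case (a, None)          => a
      case (None, b)          => b
    }

  // Group by login, keep only the contributions, and reduce them per group.
  def mergeUsers(users: List[GHUser]): List[GHUser] =
    users
      .groupMapReduce(_.login)(_.contributions)(combineContributions)
      .toList
      .map { case (login, total) => GHUser(login, total) }
      .sortBy(_.login)
}
```

Since reduce is only ever applied to non-empty groups, no empty element is required here, which fits the observation in the update that a Semigroup is enough.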
Here is a simple solution:
list.groupBy(_.login).map{
case (k, v) =>
GHUser(k, Some(v.flatMap(_.contributions).sum))
}
This will give Some(0) for users with no contributions. If you want None in this case, it gets uglier:
list.groupBy(_.login).map{
case (k, v) =>
val c = v.flatMap(_.contributions)
GHUser(k, c.headOption.map(_ => c.sum))
}

Rewriting imperative for loop to declarative style in Scala

How do I rewrite the following loop (pattern) into Scala, either using built-in higher order functions or tail recursion?
This is an example of an iteration pattern where you do a computation (a comparison, for example) on two list elements, but only if the second one comes after the first one in the original input. Note that the +1 step is used here, but in general, it could be +n.
public <T, U> List<U> mapNext(List<T> list) {
    List<U> results = new ArrayList<>();
    for (int i = 0; i < list.size() - 1; i++) {
        for (int j = i + 1; j < list.size(); j++) {
            results.add(doSomething(list.get(i), list.get(j)));
        }
    }
    return results;
}
So far, I've come up with this in Scala:
def mapNext[T, U](list: List[T])(f: (T, T) => U): List[U] = {
@scala.annotation.tailrec
def loop(ix: List[T], jx: List[T], res: List[U]): List[U] = (ix, jx) match {
case (_ :: _ :: is, Nil) => loop(ix, ix.tail, res)
case (i :: _ :: is, j :: Nil) => loop(ix.tail, Nil, f(i, j) :: res)
case (i :: _ :: is, j :: js) => loop(ix, js, f(i, j) :: res)
case _ => res
}
loop(list, Nil, Nil).reverse
}
Edit:
To all contributors, I only wish I could accept every answer as solution :)
Here's my stab. I think it's pretty readable. The intuition is: for each head of the list, apply the function to the head and every other member of the tail. Then recurse on the tail of the list.
def mapNext[U, T](list: List[U], fun: (U, U) => T): List[T] = list match {
case Nil => Nil
case (first :: Nil) => Nil
case (first :: rest) => rest.map(fun(first, _: U)) ++ mapNext(rest, fun)
}
Here's a sample run
scala> mapNext(List(1, 2, 3, 4), (x: Int, y: Int) => x + y)
res6: List[Int] = List(3, 4, 5, 5, 6, 7)
This one isn't explicitly tail recursive but an accumulator could be easily added to make it.
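For completeness, an accumulator-based variant might look like the sketch below. It is not the code above, just one possible way to add the accumulator: results are prepended to the accumulator and reversed once at the end, so no list concatenation is needed. The object name MapNextSketch is hypothetical, and the curried signature follows the one used in the question.

```scala
import scala.annotation.tailrec

object MapNextSketch {
  def mapNext[T, U](list: List[T])(f: (T, T) => U): List[U] = {
    @tailrec
    def loop(remaining: List[T], acc: List[U]): List[U] = remaining match {
      case Nil => acc.reverse // results were prepended, so restore the order once
      case head :: tail =>
        // Pair head with every later element, prepending each result to acc.
        loop(tail, tail.foldLeft(acc)((soFar, next) => f(head, next) :: soFar))
    }
    loop(list, Nil)
  }
}
```

For the sample run above, MapNextSketch.mapNext(List(1, 2, 3, 4))(_ + _) yields List(3, 4, 5, 5, 6, 7), the same result as the non-tail-recursive version.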
Recursion is certainly an option, but the standard library offers some alternatives that will achieve the same iteration pattern.
Here's a very simple setup for demonstration purposes.
val lst = List("a","b","c","d")
def doSomething(a:String, b:String) = a+b
And here's one way to get at what we're after.
val resA = lst.tails.toList.init.flatMap(tl=>tl.tail.map(doSomething(tl.head,_)))
// resA: List[String] = List(ab, ac, ad, bc, bd, cd)
This works but the fact that there's a map() within a flatMap() suggests that a for comprehension might be used to pretty it up.
val resB = for {
tl <- lst.tails
if tl.nonEmpty
h = tl.head
x <- tl.tail
} yield doSomething(h, x) // resB: Iterator[String] = non-empty iterator
resB.toList // List(ab, ac, ad, bc, bd, cd)
In both cases the toList cast is used to get us back to the original collection type, which might not actually be necessary depending on what further processing of the collection is required.
Comeback Attempt:
After deleting my first attempt to give an answer I put some more thought into it and came up with another, at least shorter solution.
def mapNext[T, U](list: List[T])(f: (T, T) => U): List[U] = {
@tailrec
def loop(in: List[T], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail => loop(tail, out ::: tail.map { f(head, _) } )
}
loop(list, Nil)
}
I would also like to recommend the enrich my library pattern for adding the mapNext function to the List api (or with some adjustments to any other collection).
object collection {
object Implicits {
implicit class RichList[A](private val underlying: List[A]) extends AnyVal {
def mapNext[U](f: (A, A) => U): List[U] = {
@tailrec
def loop(in: List[A], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail => loop(tail, out ::: tail.map { f(head, _) } )
}
loop(underlying, Nil)
}
}
}
}
Then you can use the function like:
list.mapNext(doSomething)
Again, there is a downside, as concatenating lists is relatively expensive.
However, variable assignments inside for comprehensions can be quite inefficient too, as the Dotty improvement task "Scala Wart: Convoluted de-sugaring of for-comprehensions" suggests.
UPDATE
Now that I'm into this, I simply cannot let go :(
Concerning 'Note that the +1 step is used here, but in general, it could be +n.'
I extended my proposal with some parameters to cover more situations:
object collection {
object Implicits {
implicit class RichList[A](private val underlying: List[A]) extends AnyVal {
def mapNext[U](f: (A, A) => U): List[U] = {
@tailrec
def loop(in: List[A], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail => loop(tail, out ::: tail.map { f(head, _) } )
}
loop(underlying, Nil)
}
def mapEvery[U](step: Int)(f: A => U) = {
@tailrec
def loop(in: List[A], out: List[U]): List[U] = {
in match {
case Nil => out.reverse
case head :: tail => loop(tail.drop(step), f(head) :: out)
}
}
loop(underlying, Nil)
}
def mapDrop[U](drop1: Int, drop2: Int, step: Int)(f: (A, A) => U): List[U] = {
@tailrec
def loop(in: List[A], out: List[U]): List[U] = in match {
case Nil => out
case head :: tail =>
loop(tail.drop(drop1), out ::: tail.drop(drop2).mapEvery(step) { f(head, _) } )
}
loop(underlying, Nil)
}
}
}
}
list // [a, b, c, d, ...]
.indices // [0, 1, 2, 3, ...]
.flatMap { i =>
val elem = list(i) // Don't redo access every iteration of the below map.
list.drop(i + 1) // Take only the inputs that come after the one we're working on
.map(doSomething(elem, _))
}
// Or with a for comprehension
for {
index <- list.indices
thisElem = list(index)
thatElem <- list.drop(index + 1)
} yield doSomething(thisElem, thatElem)
You start, not with the list, but with its indices. Then, you use flatMap, because each index goes to a list of elements. Use drop to take only the elements after the element we're working on, and map that list to actually run the computation. Note that this has terrible time complexity, because most operations here, indices/length, flatMap, map, are O(n) in the list size, and drop and apply are O(n) in the argument.
You can get better performance if you a) stop using a linked list (List is good for LIFO, sequential access, but Vector is better in the general case), and b) make this a tiny bit uglier:
val len = vector.length
(0 until len)
.flatMap { thisIdx =>
val thisElem = vector(thisIdx)
((thisIdx + 1) until len)
.map { thatIdx =>
doSomething(thisElem, vector(thatIdx))
}
}
// Or
val len = vector.length
for {
thisIdx <- 0 until len
thisElem = vector(thisIdx)
thatIdx <- (thisIdx + 1) until len
thatElem = vector(thatIdx)
} yield doSomething(thisElem, thatElem)
If you really need to, you can generalize either version of this code to all IndexedSeqs, by using some implicit CanBuildFrom parameters, but I won't cover that.

scalaz, Disjunction.sequence returning a list of lefts

In scalaz 7.2.6, I want to implement sequence on Disjunction, such that if there is one or more lefts, it returns a list of those, instead of taking only the first one (as in Disjunction.sequenceU):
import scalaz._, Scalaz._
List(1.right, 2.right, 3.right).sequence
res1: \/-(List(1, 2, 3))
List(1.right, "error2".left, "error3".left).sequence
res2: -\/(List(error2, error3))
I've implemented it as follows and it works, but it looks ugly. Is there a getRight method (as in Scala's Either class, Right[String, Int](3).right.get)? And how can I improve this code?
implicit class RichSequence[L, R](val l: List[\/[L, R]]) {
def getLeft(v: \/[L, R]):L = v match { case -\/(x) => x }
def getRight(v: \/[L, R]):R = v match { case \/-(x) => x }
def sequence: \/[List[L], List[R]] =
if (l.forall(_.isRight)) {
l.map(e => getRight(e)).right
} else {
l.filter(_.isLeft).map(e => getLeft(e)).left
}
}
Playing around, I implemented a recursive function for that, but the best option would be to use separate:
implicit class RichSequence[L, R](val l: List[\/[L, R]]) {
def sequence: \/[List[L], List[R]] = {
def seqLoop(left: List[L], right: List[R], list: List[\/[L, R]]): \/[List[L], List[R]] =
list match {
case (h :: t) =>
h match {
case -\/(e) => seqLoop(left :+ e, right, t)
case \/-(s) => seqLoop(left, right :+ s, t)
}
case Nil =>
if(left.isEmpty) \/-(right)
else -\/(left)
}
seqLoop(List(), List(), l)
}
def sequenceSeparate: \/[List[L], List[R]] = {
val (left, right) = l.separate[\/[L, R], L, R]
if(left.isEmpty) \/-(right)
else -\/(left)
}
}
The first one just collects the results and decides at the end what to do with them. The second does basically the same thing, except that separate replaces the hand-written recursion, so it is much simpler. I didn't think about performance here: I've used :+, which is O(n) on List, so if you care, prepend and reverse at the end or use some other collection.
You may also want to take a look at Validation and ValidationNel, which, unlike Disjunction, accumulate failures.
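The "collect everything, then decide" logic does not depend on scalaz. As a point of comparison only, here is a minimal sketch of the same idea with the standard library's Either, assuming Scala 2.13 for partitionMap. The names SequenceAllSketch and sequenceAll are hypothetical, and this is not a scalaz API.

```scala
object SequenceAllSketch {
  // Collect every Left if there is at least one, otherwise every Right.
  def sequenceAll[L, R](xs: List[Either[L, R]]): Either[List[L], List[R]] = {
    // partitionMap splits the list in one pass, preserving element order.
    val (lefts, rights) = xs.partitionMap(identity)
    if (lefts.isEmpty) Right(rights) else Left(lefts)
  }
}
```

This mirrors sequenceSeparate above, with partitionMap playing the role of separate.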

Scala - grouping on an ordered iterator lazily

I have an Iterator[Record] which is ordered on record.id this way:
record.id=1
record.id=1
...
record.id=1
record.id=2
record.id=2
..
record.id=2
Records of a specific ID could occur a large number of times, so I want to write a function that takes this iterator as input, and returns an Iterator[Iterator[Record]] output in a lazy manner.
I was able to come up with the following, but it fails with a StackOverflowError after 500K records or so:
def groupByIter[T, B](iterO: Iterator[T])(func: T => B): Iterator[Iterator[T]] = new Iterator[Iterator[T]] {
var iter = iterO
def hasNext = iter.hasNext
def next() = {
val first = iter.next()
val firstValue = func(first)
val (i1, i2) = iter.span(el => func(el) == firstValue)
iter = i2
Iterator(first) ++ i1
}
}
What am I doing wrong?
The trouble here is that each Iterator.span call creates another stacked closure for the trailing iterator, and without any trampolining it's very easy to overflow the stack.
Actually, I don't think there is an implementation that avoids memoizing elements of the prefix iterator, since the trailing iterator could be accessed before the prefix is drained.
Even in the .span implementation there is a Queue to memoize elements in the Leading definition.
So the easiest implementation I can imagine is the following, via Stream.
implicit class StreamChopOps[T](xs: Stream[T]) {
def chopBy[U](f: T => U): Stream[Stream[T]] = xs match {
case x #:: _ =>
def eq(e: T) = f(e) == f(x)
xs.takeWhile(eq) #:: xs.dropWhile(eq).chopBy(f)
case _ => Stream.empty
}
}
It may not be the most performant, though, as it memoizes a lot. But as long as you iterate over it properly, GC should take care of the excess intermediate streams.
You could use it as myIterator.toStream.chopBy(f)
A simple check confirms that the following code runs without a StackOverflowError:
Iterator.fill(10000000)(Iterator(1,1,2)).flatten //1,1,2,1,1,2,...
.toStream.chopBy(identity) //(1,1),(2),(1,1),(2),...
.map(xs => xs.sum * xs.size).sum //60000000
Inspired by the chopBy implemented by @Odomontois, here is a chopBy I implemented for Iterator. Of course, each group has to fit in memory. It doesn't look very elegant, but it seems to work :)
implicit class IteratorChopOps[A](toChopIter: Iterator[A]) {
def chopBy[U](f: A => U) = new Iterator[Traversable[A]] {
var next_el: Option[A] = None
@tailrec
private def accum(acc: List[A]): List[A] = {
next_el = None
val new_acc = hasNext match {
case true =>
val next = toChopIter.next()
acc match {
case Nil =>
acc :+ next
case _ MatchTail t if (f(t) == f(next)) =>
acc :+ next
case _ =>
next_el = Some(next)
acc
}
case false =>
next_el = None
return acc
}
next_el match{
case Some(_) =>
new_acc
case None => accum(new_acc)
}
}
def hasNext = {
toChopIter.hasNext || next_el.isDefined
}
def next: Traversable[A] = accum(next_el.toList)
}
}
And here is an extractor for matching tail:
object MatchTail {
def unapply[A] (l: Traversable[A]) = Some( (l.init, l.last) )
}
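The same idea can be written more compactly with a BufferedIterator, which lets you peek at the next element without consuming it. The following is a minimal sketch under the same assumption as above, namely that each group fits in memory. The names ChopBySketch and chopBy are hypothetical, and groups are returned as List rather than Traversable.

```scala
object ChopBySketch {
  // Groups consecutive elements with equal keys. The outer iterator is lazy;
  // only the group currently being requested is materialized.
  def chopBy[A, K](source: Iterator[A])(key: A => K): Iterator[List[A]] =
    new Iterator[List[A]] {
      private val buffered = source.buffered // allows peeking via head

      def hasNext: Boolean = buffered.hasNext

      def next(): List[A] = {
        val groupKey = key(buffered.head) // throws NoSuchElementException if exhausted
        val group = List.newBuilder[A]
        // A plain loop instead of recursion, so stack depth stays constant.
        while (buffered.hasNext && key(buffered.head) == groupKey)
          group += buffered.next()
        group.result()
      }
    }
}
```

For example, ChopBySketch.chopBy(Iterator(1, 1, 2, 3, 3, 3))(identity).toList gives List(List(1, 1), List(2), List(3, 3, 3)). Because nothing is nested or deferred between groups, the number of records does not affect stack depth.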