How to improve this "update" function? - scala

Suppose I've got case class A(x: Int, s: String) and need to update a List[A] using a Map[Int, String] like that:
def update(as: List[A], map: Map[Int, String]): List[A] = ???
val as = List(A(1, "a"), A(2, "b"), A(3, "c"), A(4, "d"))
val map = Map(2 -> "b1", 4 -> "d1", 5 -> "e", 6 -> "f")
update(as, map) // List(A(1, "a"), A(2, "b1"), A(3, "c"), A(4, "d1"))
I am writing update like that:
def update(as: List[A], map: Map[Int, String]): List[A] = {
  @annotation.tailrec
  def loop(acc: List[A], rest: List[A], map: Map[Int, String]): List[A] = rest match {
    case Nil => acc
    case as => as.span(a => !map.contains(a.x)) match {
      case (xs, Nil) => xs ++ acc
      case (xs, y :: ys) => loop((y.copy(s = map(y.x)) +: xs) ++ acc, ys, map - y.x)
    }
  }
  loop(Nil, as, map).reverse
}
This function works fine, but it's suboptimal because it keeps iterating over the input list even after the map has become empty. Besides, it looks overcomplicated. How would you suggest improving this update function?

If you cannot make any assumptions about the List and the Map, then the best approach is to iterate the former just once, in the simplest way possible; that is, using the map function.
list.map { a =>
  map
    .get(key = a.x)
    .fold(ifEmpty = a) { s =>
      a.copy(s = s)
    }
}
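Wrapped into the question's update signature and run against its sample data, this gives the expected result (a quick sanity check, not a benchmark):
def update(as: List[A], map: Map[Int, String]): List[A] =
  as.map(a => map.get(a.x).fold(a)(s => a.copy(s = s)))

update(as, map) // List(A(1, "a"), A(2, "b1"), A(3, "c"), A(4, "d1"))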
However, if and only if you can be sure that most of the time:
The List will be big.
The Map will be small.
The keys in the Map are a subset of the x values of the elements in the List.
And the elements to update sit closer to the head of the List than to the tail.
Then, you could use the following approach which should be more efficient in such cases.
def optimizedUpdate(data: List[A], updates: Map[Int, String]): List[A] = {
  @annotation.tailrec
  def loop(remaining: List[A], map: Map[Int, String], acc: List[A]): List[A] =
    if (map.isEmpty) acc reverse_::: remaining
    else remaining match {
      case a :: as =>
        map.get(key = a.x) match {
          case None =>
            loop(
              remaining = as,
              map,
              a :: acc
            )
          case Some(s) =>
            loop(
              remaining = as,
              map = map - a.x,
              a.copy(s = s) :: acc
            )
        }
      case Nil =>
        acc.reverse
    }
  loop(remaining = data, map = updates, acc = List.empty)
}
However, note that this code is not only longer and more difficult to understand; it is actually less efficient than the map solution if those conditions are not met. This is because the stdlib implementation "cheats" and constructs the List by mutating its tail, instead of building it backwards and then reversing it as we did.
In any case, as with anything performance-related, the only real answer is to benchmark.
But I would go with the map solution just for clarity, or with a mutable approach if you really need speed.
You can see the code running here.
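If you do reach for the mutable approach mentioned above, a rough sketch (my own illustration under the same assumptions, not benchmarked; mutableUpdate is a hypothetical name) could keep the single pass while building the result in order:
import scala.collection.mutable.ListBuffer

def mutableUpdate(data: List[A], updates: Map[Int, String]): List[A] = {
  val buf = ListBuffer.empty[A]  // builds the result in order, so no final reverse is needed
  var remaining = updates        // shrinks as keys are consumed
  data.foreach { a =>
    remaining.get(a.x) match {
      case Some(s) =>
        buf += a.copy(s = s)
        remaining -= a.x
      case None =>
        buf += a
    }
  }
  buf.toList
}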

How about
def update(as: List[A], map: Map[Int, String]): List[A] =
  as.foldLeft(List.empty[A]) { (agg, elem) =>
    val newA = map
      .get(elem.x)
      .map(a => elem.copy(s = a))
      .getOrElse(elem)
    newA :: agg
  }.reverse
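Against the question's sample as and map this also produces the expected output:
update(as, map) // List(A(1, "a"), A(2, "b1"), A(3, "c"), A(4, "d1"))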

Related

Concat sorted generic sequence

I need to concatenate two generic sequences. I tried to do it like this, but I understand that this is wrong. How should I do it the right way? (I need to get a new Seq which will be ordered too.)
object Main extends App {
  val strings = Seq("f", "d", "a")
  val numbers = Seq(1, 5, 4, 2)
  val strings2 = Seq("c", "b")
  val strings3 = strings2.concat(strings)
  println(strings3)
  println(numbers)
}
class Seq[T] private (initialElems: T*) {
  override def toString: String = initialElems.toString
  val elems = initialElems
  def concat(a: Seq[T]) = a.elems ++ this.elems
}
object Seq {
  def apply[T: Ordering](initialElems: T*): Seq[T] = new Seq(initialElems.sorted: _*)
}
You could create a function that walks through both lists, compares their heads, and adds the smaller head to the result list. Then it takes the next two heads and repeats until one list is exhausted.
Here's a tail-recursive example:
import scala.annotation.tailrec
def merge[A](a: List[A], b: List[A])(implicit ordering: Ordering[A]): List[A] = {
  @tailrec
  def go(a: List[A], b: List[A], acc: List[A] = Nil): List[A] = {
    (a, b) match {
      case (ax :: as, bx :: bs) =>
        if (ordering.compare(ax, bx) < 0) go(as, bx :: bs, ax :: acc)
        else go(ax :: as, bs, bx :: acc)
      case (Nil, bs) => acc.reverse ++ bs
      case (as, Nil) => acc.reverse ++ as
      case _ => acc.reverse
    }
  }
  go(a, b)
}
val strings = List("a", "d", "f")
val strings2 = List("b", "c")
merge(strings, strings2) // List(a, b, c, d, f)
I used List instead of Seq. You should rather not use Seq, which is a very general type, but use more specific collection types that suit your task best, like Vector, List, ArraySeq, etc.
You can't concatenate two sorted sequences with ++ and keep them ordered; ++ just sticks one sequence onto the end of the other.
You need to implement something like the merge operation from the merge sort algorithm and create a new Seq from the merged elems without re-sorting.
So, you need to do 3 things:
Implement merge (a rough sketch is given after this list):
def merge(a: Seq[T], b: Seq[T]): YourElemsType[T] = ???
Implement a new method in object Seq for creating a Seq instance without sorting:
def fromSorted(initialElems: T*): Seq[T] = new Seq(initialElems:_*)
Finally, your concat can be implemented as a composition of merge and fromSorted:
def concat(a:Seq[T]): Seq[T] = Seq.fromSorted(merge(this, a))
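Here is one possible sketch of that merge (my own illustration, not part of the original answer), written against the underlying elems collections to sidestep the name clash with the custom Seq, and assuming an Ordering[T] is available (e.g. via a context bound on concat):
// hypothetical sketch: merge two already-sorted element sequences into one sorted List
def merge[T](left: scala.collection.Seq[T], right: scala.collection.Seq[T])(implicit ord: Ordering[T]): List[T] = {
  @annotation.tailrec
  def go(xs: List[T], ys: List[T], acc: List[T]): List[T] = (xs, ys) match {
    case (x :: xt, y :: yt) =>
      if (ord.lteq(x, y)) go(xt, ys, x :: acc) else go(xs, yt, y :: acc)
    case (Nil, rest) => acc reverse_::: rest
    case (rest, Nil) => acc reverse_::: rest
  }
  go(left.toList, right.toList, Nil)
}
A real concat would then splat the merged elements into fromSorted with : _*.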
Read more about merge sort on its wiki page.

How to replace or append an item in/to a list?

Suppose I've got a list of case class A(id: Int, str: String) and an instance of A. I need to either replace an item from the list with the new instance or append the new instance to the list.
case class A(id: Int, str: String)
def replaceOrAppend(as: List[A], a: A): List[A] = ???
val as = List(A(1, "a1"), A(2, "a2"), A(3, "a3"))
replaceOrAppend(as, A(2, "xyz")) // List(A(1, "a1"), A(2, "xyz"), A(3, "a3"))
replaceOrAppend(as, A(5, "xyz")) // List(A(1, "a1"), A(2, "a2"), A(3, "a3"), A(5, "xyz"))
I can write replaceOrAppend like this:
def replaceOrAppend(as: List[A], a: A): List[A] =
if (as.exists(_.id == a.id)) as.map(x => if (x.id == a.id) a else x) else as :+ a
This implementation is a bit clumsy and obviously suboptimal since it passes over the input list twice. How can replaceOrAppend be implemented so that it passes over the input list just once?
If the order is not essential I would go with:
def replaceOrAppend(as: List[A], a: A): List[A] =
a::as.filterNot(_.id == a.id)
This would also work if the order is related to id or str:
def replaceOrAppend(as: List[A], a: A): List[A] =
(a::as.filterNot(_.id == a.id)).sortBy(_.id)
And if the order must be kept (as Micheal suggested - I couldn't find anything better):
def replaceOrAppend(as: List[A], a: A): List[A] =
as.span(_.id != a.id) match { case (xs, ys) => xs ++ (a :: ys.drop(1)) }
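To see what span does here, with the question's sample list in scope:
as.span(_.id != 2)
// (List(A(1, "a1")), List(A(2, "a2"), A(3, "a3")))
so xs ++ (a :: ys.drop(1)) swaps the matched element in place, and when nothing matches, ys is empty and a simply ends up appended.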
Here is another one:
def replaceOrAppend(as: List[A], a: A): List[A] = {
  as.find(_.id == a.id).map { _ =>
    as.map {
      case e if e.id == a.id => e.copy(str = a.str)
      case el => el
    }
  }.getOrElse((a :: as.reverse).reverse)
}
What about this? Still clumsy but only uses one iteration.
def replaceOrAppend(as: List[A], a: A): List[A] = {
  val (updatedList, itemToAppend) = as.foldLeft((List[A](), Option(a))) {
    case ((acc, Some(item)), l) =>
      if (item.id == l.id) (acc :+ item, None)
      else (acc :+ l, Some(item))
    case ((acc, None), l) => (acc :+ l, None)
  }
  itemToAppend match {
    case Some(item) => updatedList :+ item
    case None => updatedList
  }
}
I do not understand why people forget that the best way to handle a functional list is through pattern matching + tail-recursion.
IMHO, this looks cleaner and tries to be as efficient as possible.
final case class A(id: Int, str: String)
def replaceOrAppend(as: List[A], a: A): List[A] = {
  @annotation.tailrec
  def loop(remaining: List[A], acc: List[A]): List[A] =
    remaining match {
      case x :: xs if x.id == a.id =>
        acc reverse_::: (a :: xs)
      case x :: xs =>
        loop(remaining = xs, acc = x :: acc)
      case Nil =>
        (a :: acc).reverse
    }
  loop(remaining = as, acc = List.empty)
}
Technically speaking, this traverses the list twice in the worst case.
But it is always better to build a list by prepending to the head and reversing at the end than to do many appends.
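A quick sanity check against the question's own examples (reusing its as):
replaceOrAppend(as, A(2, "xyz")) // List(A(1, "a1"), A(2, "xyz"), A(3, "a3"))
replaceOrAppend(as, A(5, "xyz")) // List(A(1, "a1"), A(2, "a2"), A(3, "a3"), A(5, "xyz"))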

Reduce RDD[Map[T, V]] by merging maps

I have an RDD of maps, where the maps are certain to have intersecting key sets. Each map may have 10,000s of entries.
I need to merge the maps, such that those with intersecting key sets are merged, but others are left distinct.
Here's what I have. I haven't tested that it works, but I know that it's slow.
def mergeOverlapping(maps: RDD[Map[Int, Int]])(implicit sc: SparkContext): RDD[Map[Int, Int]] = {
  val in: RDD[List[Map[Int, Int]]] = maps.map(List(_))
  val z = List.empty[Map[Int, Int]]
  val t: List[Map[Int, Int]] = in.fold(z) { case (l, r) =>
    (l ::: r).foldLeft(List.empty[Map[Int, Int]]) { case (acc, next) =>
      val (overlapping, distinct) = acc.partition(_.keys.exists(next.contains))
      overlapping match {
        case Nil => next :: acc
        case xs => (next :: xs).reduceLeft(merge) :: distinct
      }
    }
  }
  sc.parallelize(t)
}

def merge(l: Map[Int, Int], r: Map[Int, Int]): Map[Int, Int] = {
  val keys = l.keySet ++ r.keySet
  keys.map { k =>
    (l.get(k), r.get(k)) match {
      case (Some(i), Some(j)) => k -> math.min(i, j)
      case (a, b) => k -> (a orElse b).get
    }
  }.toMap
}
The problem, as far as I can tell, is that RDD#fold is merging and re-merging maps many more times than it has to.
Is there a more efficient mechanism that I could use? Is there another way I can structure my data to make it efficient?

Transform a Scala Sequence in Pairs

I have a sequence like this:
val l = Seq(1,2,3,4)
which I want to transform to List(Seq(1,2), Seq(2,3), Seq(3,4))
Here is what I tried:
def getPairs(inter: Seq[(Int, Int)]): Seq[(Int, Int)] = l match {
case Nil => inter
case x :: xs => getPairs(inter :+ (x, xs.head))
}
This strangely seems not to work? Any suggestions?
You can also just use sliding:
l.sliding(2).toList
res1: List[Seq[Int]] = List(List(1, 2), List(2, 3), List(3, 4))
Ok I got to know about the zip method:
xs zip xs.tail
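For the sample l this yields the pairs as tuples; mapping them gives the shape asked for (a small illustration):
l zip l.tail // List((1,2), (2,3), (3,4))
(l zip l.tail).map { case (a, b) => Seq(a, b) } // List(List(1, 2), List(2, 3), List(3, 4))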
Using a for comprehension, for instance as follows,
for ( (a,b) <- l zip l.drop(1) ) yield Seq(a,b)
Note that l.drop(1) (in contrast to l.tail, which throws on an empty list) simply delivers an empty list if l is empty or has at most one item.
The already given answers describe well how to do this in a Scala way.
However, you might also want an explanation of why your code does not work, so here it comes:
Your getPairs function expects a list of tuples as input and returns a list of tuples. But you say you want to transform a list of single values into a list of tuples. So if you call getPairs(l) you will get a type mismatch compiler error. (Note also that the body pattern-matches on the outer l instead of its parameter inter, so even with matching types the recursion would never make progress.)
You would have to refactor your code to take a simple list:
import scala.annotation.tailrec

def pairs(in: Seq[Int]): Seq[(Int, Int)] = {
  @tailrec
  def recursive(remaining: Seq[Int], result: Seq[(Int, Int)]): Seq[(Int, Int)] = {
    remaining match {
      case Nil => result
      case last +: Nil => result
      case head +: next +: tail => recursive(next +: tail, (head, next) +: result)
    }
  }
  recursive(in, Nil).reverse
}
and from here it's a small step to a generic function:
def pairs2[A](in: Seq[A]): Seq[(A, A)] = {
  @tailrec
  def recursive(remaining: Seq[A], result: Seq[(A, A)]): Seq[(A, A)] = {
    remaining match {
      case Nil => result
      case last +: Nil => result
      case head +: next +: tail => recursive(next +: tail, (head, next) +: result)
    }
  }
  recursive(in, Nil).reverse
}
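A quick usage check on the question's sample (with the defs above in scope):
pairs2(Seq(1, 2, 3, 4)) // List((1,2), (2,3), (3,4))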

How to sum a list of tuples by keys

I wrote the following code to sum by keys :
// ("A", 1) ("A", 4)
// ("B", 2) --> ("B", 2)
// ("A", 3)
def sumByKeys[A](tuples: List[(A, Long)]) : List[(A, Long)] = {
tuples.groupBy(_._1).mapValues(_.map(_._2).sum).toList
}
Is there a better way?
Update: added .toList at the end.
I guess this is the simplest immutable form without using any additional framework on top of Scala.
UPD: Actually, I forgot about the final toList. It changes the picture completely in terms of performance, because of the view-like return type of mapValues.
You can try foldLeft, tailrec, or something mutable; they have better performance:
import annotation.tailrec

@tailrec
final def tailSum[A](tuples: List[(A, Long)], acc: Map[A, Long] = Map.empty[A, Long]): List[(A, Long)] = tuples match {
  case (k, v) :: tail => tailSum(tail, acc + (k -> (v + acc.get(k).getOrElse(0L))))
  case Nil => acc.toList
}

def foldLeftSum[A](tuples: List[(A, Long)]) = tuples.foldLeft(Map.empty[A, Long])({
  case (acc, (k, v)) => acc + (k -> (v + acc.get(k).getOrElse(0L)))
}).toList

def mutableSum[A](tuples: List[(A, Long)]) = {
  val m = scala.collection.mutable.Map.empty[A, Long].withDefault(_ => 0L)
  for ((k, v) <- tuples) m += (k -> (v + m(k)))
  m.toList
}
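For reference, all three give the same totals on the example from the question (the ordering of keys in the result may vary for the mutable map):
val tuples = List(("A", 1L), ("B", 2L), ("A", 3L))
tailSum(tuples)     // List((A,4), (B,2))
foldLeftSum(tuples) // List((A,4), (B,2))
mutableSum(tuples)  // List((A,4), (B,2))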
Updated performance testing is here: https://gist.github.com/baskakov/8437895. Briefly:
scala> avgTime("default", sumByKeys(tuples))
default avg time is 63 ms
scala> avgTime("tailrec", tailSum(tuples))
tailrec avg time is 48 ms
scala> avgTime("foldleft", foldLeftSum(tuples))
foldleft avg time is 45 ms
scala> avgTime("mutableSum", mutableSum(tuples))
mutableSum avg time is 41 ms
The best I can think of gets you slightly better performance and saves two characters:
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
  tuples.groupBy(_._1).mapValues(_.unzip._2.sum).toList
}
On my machine with Bask.ws' benchmark it took 11ms instead of 13ms without the unzip.
EDIT: In fact I think the performance must be the same... don't know where those 2ms come from
A solution very similar to yours:
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] =
tuples groupBy (_._1) map { case (k, v) => (k, v.map(_._2).sum) } toList
val l: List[(String, Long)] = List(("A", 1), ("B", 2), ("A", 3))
sumByKeys(l)
// result:
// List[(String, Long)] = List((A,4), (B,2))
What's interesting is that in your solution you use def mapValues[C](f: (B) ⇒ C): Map[A, C] which according to docs has "lazy" evaluation: "Transforms this map by applying a function to every retrieved value."
On the other hand def map[B](f: (A) ⇒ B): Map[B] will build a new collection: "Builds a new collection by applying a function to all elements of this immutable map."
So depending on your needs you could be lazily evaluating a large map, or eagerly evaluating a small one.
Using reduce,
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
  tuples groupBy (_._1) map { _._2 reduce { (a, b) => (a._1, a._2 + b._2) } } toList
}
which is short for
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
  tuples groupBy (_._1) map { case (k, v) => v reduce { (a, b) => (a._1, a._2 + b._2) } } toList
}