Parallel aggregate is not working on lists with .length > 8 - Scala

I'm writing a small exercise app that counts the occurrences of each letter (including Unicode letters) in a Seq of strings, and I'm using aggregate for it, as I want to run it in parallel.
Here's my code:
class Frequency(seq: Seq[String]) {
  type FreqMap = Map[Char, Int]

  def calculate() = {
    val freqMap: FreqMap = Map[Char, Int]()
    val pattern = "(\\p{L}+)".r
    val seqop: (FreqMap, String) => FreqMap = (fm, s) => {
      s.toLowerCase().foldLeft(freqMap) { (fm, c) =>
        c match {
          case pattern(char) => fm.get(char) match {
            case None    => fm + ((char, 1))
            case Some(i) => fm.updated(char, i + 1)
          }
          case _ => fm
        }
      }
    }
    val reduce: (FreqMap, FreqMap) => FreqMap =
      (m1, m2) => m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
    seq.par.aggregate(freqMap)(seqop, reduce)
  }
}
And here's the code that makes use of it:
object Frequency extends App {
  val text = List("abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc")

  def frequency(seq: Seq[String]): Map[Char, Int] =
    new Frequency(seq).calculate()

  Console println frequency(seq = text)
}
Though I supplied "abc" 9 times, the result is Map(a -> 8, b -> 8, c -> 8), as it is for any number of "abc"s greater than 8.
I've looked at this, and it seems like I'm using aggregate correctly
Any suggestions to make it work?

You're discarding the already collected results (the first fm) in your seqop. You need to merge them into the new results you're computing, e.g. like this:
def calculate() = {
  val freqMap: FreqMap = Map[Char, Int]()
  val pattern = "(\\p{L}+)".r
  val reduce: (FreqMap, FreqMap) => FreqMap =
    (m1, m2) => m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
  val seqop: (FreqMap, String) => FreqMap = (fm, s) => {
    val res = s.toLowerCase().foldLeft(freqMap) { (fm, c) =>
      c match {
        case pattern(char) => fm.get(char) match {
          case None    => fm + ((char, 1))
          case Some(i) => fm.updated(char, i + 1)
        }
        case _ => fm
      }
    }
    // I'm reusing your existing combinator function here:
    reduce(res, fm)
  }
  seq.par.aggregate(freqMap)(seqop, reduce)
}
Depending on how the parallel collections divide the work, you discard some of it. In your case (9x "abc") the work is divided into 8 parallel seqop operations, which means you discard exactly one result set. This varies with the numbers: if you run it with, say, 17x "abc", it runs in 13 parallel operations, discarding 4 result sets (on my machine, anyway; I'm not familiar with the underlying code and how it divides the work, but this probably depends on the ExecutionContext/thread pool used and consequently the number of CPUs/cores and so on).
Generally, parallel collections are a drop-in replacement for sequential collections, meaning that if you drop .par you should still get the same result, albeit usually more slowly. If you do this with your original code you get a count of 1 for each letter, which tells you that it's not a parallelization problem. This is a good way to test whether you're doing the right thing when using these.
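To make that test concrete, here is a stripped-down sketch of the buggy seqop (assuming Scala 2.12, where aggregate is still available on sequential collections; it was deprecated in 2.13). Run sequentially, the accumulator is threaded through every call, so each intermediate map gets thrown away and every count collapses to 1:

val buggySeqop: (Map[Char, Int], String) => Map[Char, Int] = (fm, s) =>
  s.foldLeft(Map.empty[Char, Int]) { (m, c) => // bug: starts from an empty map, ignoring fm
    m.updated(c, m.getOrElse(c, 0) + 1)
  }
val combine: (Map[Char, Int], Map[Char, Int]) => Map[Char, Int] =
  (m1, m2) => m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }

List("abc", "abc", "abc").aggregate(Map.empty[Char, Int])(buggySeqop, combine)
// => Map(a -> 1, b -> 1, c -> 1): every intermediate map was discarded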
And last but not least: this was harder to spot than usual for me because you use the same variable name twice, shadowing fm. Not doing that would make the code more readable and mistakes such as this easier to spot.
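For what it's worth, here is a sketch of the same fix without the shadowing: fold directly over the incoming accumulator (given a distinct name), which also makes the extra reduce call inside seqop unnecessary (pattern and FreqMap as defined in the question):

val seqop: (FreqMap, String) => FreqMap = (acc, s) =>
  s.toLowerCase().foldLeft(acc) { (m, c) => // fold over the incoming accumulator, not freqMap
    c match {
      case pattern(char) => m.updated(char, m.getOrElse(char, 0) + 1)
      case _             => m
    }
  }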


How to transform input data into the following format? - groupBy

What I have is the following input data for a function in a piece of Scala code I'm writing:
List(
  (1, SubscriptionState(CNN, ONLINE, Seq(12))),
  (1, SubscriptionState(SKY, ONLINE, Seq(12))),
  (1, SubscriptionState(FOX, ONLINE, Seq(12))),
  (2, SubscriptionState(CNN, ONLINE, Seq(12))),
  (2, SubscriptionState(SKY, ONLINE, Seq(12))),
  (2, SubscriptionState(FOX, ONLINE, Seq(12))),
  (2, SubscriptionState(CNN, OFFLINE, Seq(13))),
  (2, SubscriptionState(SKY, ONLINE, Seq(13))),
  (2, SubscriptionState(FOX, ONLINE, Seq(13))),
  (3, SubscriptionState(CNN, OFFLINE, Seq(13))),
  (3, SubscriptionState(SKY, ONLINE, Seq(13))),
  (3, SubscriptionState(FOX, ONLINE, Seq(13)))
)
SubscriptionState is just a case class here:
case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])
I want to transform it into this:
Map(
  1 -> Map(
    SubscriptionState(SKY, ONLINE, Seq(12)) -> 1,
    SubscriptionState(CNN, ONLINE, Seq(12)) -> 1,
    SubscriptionState(FOX, ONLINE, Seq(12)) -> 1),
  2 -> Map(
    SubscriptionState(SKY, ONLINE, Seq(12, 13)) -> 2,
    SubscriptionState(CNN, ONLINE, Seq(12)) -> 1,
    SubscriptionState(FOX, ONLINE, Seq(12, 13)) -> 2,
    SubscriptionState(CNN, OFFLINE, Seq(13)) -> 1),
  3 -> Map(
    SubscriptionState(SKY, ONLINE, Seq(13)) -> 1,
    SubscriptionState(FOX, ONLINE, Seq(13)) -> 1,
    SubscriptionState(CNN, OFFLINE, Seq(13)) -> 1)
)
How would I go about doing this in Scala?
Here is my approach to the problem. I think it may not be a perfect solution, but it works as you would expect.
val result: Map[Int, Map[SubscriptionState, Int]] = list
  .groupBy(_._1)
  .view
  .mapValues { statesById =>
    statesById
      .groupBy { case (_, subscriptionState) => (subscriptionState.channel, subscriptionState.state) }
      .map { case (_, groupedStatesById) =>
        val subscriptionState = groupedStatesById.head._2 // groupedStatesById should contain at least one element
        val allSubIds = groupedStatesById.flatMap(_._2.subIds)
        val updatedSubscriptionState = subscriptionState.copy(subIds = allSubIds)
        updatedSubscriptionState -> allSubIds.size
      }
  }
  .toMap
This is a "simple" solution using groupMap and groupMapReduce
list
  .groupMap(_._1)(_._2)
  .view
  .mapValues {
    _.groupMapReduce(ss => (ss.channel, ss.state))(_.subIds)(_ ++ _)
      .map { case (k, v) => SubscriptionState(k._1, k._2, v) -> v.length }
  }
  .toMap
The groupMap converts the data to a Map[Int, List[SubscriptionState]], and the mapValues converts each List to the appropriate Map. (The view and toMap wrappers make mapValues more efficient and safe.)
The groupMapReduce converts the List[SubscriptionState] into a Map[(Channel, ChannelState), Seq[Long]] of merged subIds.
The map on this inner Map juggles these values around to make the Map[SubscriptionState, Int] as required.
I'm not clear what the purpose of the inner Map is. The value is the length of the subIds field, so it could be obtained directly from the key rather than being looked up in the Map.
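As background, here is a toy illustration of the two building blocks used in this answer, on hypothetical data (Scala 2.13+, where groupMap and groupMapReduce were introduced):

// groupMap = groupBy plus a map over the grouped values, in one pass:
List(1 -> "a", 1 -> "b", 2 -> "c").groupMap(_._1)(_._2)
// => Map(1 -> List(a, b), 2 -> List(c))

// groupMapReduce additionally folds each group's values together:
List("apple", "avocado", "banana").groupMapReduce(_.head)(_ => 1)(_ + _)
// => Map(a -> 2, b -> 1)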
An attempt using foldLeft:
list.foldLeft(Map.empty[Int, Map[SubscriptionState, Int]]) { (acc, next) =>
  val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionState, Int])
  val channelSub = subMap.find { case (sub, _) => sub.channel == next._2.channel && sub.state == next._2.state }
  acc + (next._1 -> channelSub.fold(subMap + (next._2 -> next._2.subIds.length)) { case (sub, _) =>
    val subIds = sub.subIds ++ next._2.subIds
    (subMap - sub) + (sub.copy(subIds = subIds) -> subIds.length)
  })
}
I noticed that the count is not used while folding and can be calculated from subIds. Also, as subIds can vary, the inner Map is rather useless, as you have to use find instead of get to fetch values from it. So if you have control over your ADTs, you could use an intermediary ADT like:
case class SubscriptionStateWithoutIds(channel: Channel, state: ChannelState)
then you can rewrite your foldLeft as follows:
list.foldLeft(Map.empty[Int, Map[SubscriptionStateWithoutIds, Seq[Long]]]) { (acc, next) =>
  val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionStateWithoutIds, Seq[Long]])
  val withoutId = SubscriptionStateWithoutIds(next._2.channel, next._2.state)
  val channelSub = subMap.get(withoutId)
  acc + (next._1 -> (subMap + channelSub.fold(withoutId -> next._2.subIds) { seq => withoutId -> (seq ++ next._2.subIds) }))
}
The biggest advantage of the intermediary ADT is that you can have a cleaner groupMapReduce version:
list
  .groupMap(_._1)(sub => SubscriptionStateWithoutIds(sub._2.channel, sub._2.state) -> sub._2.subIds)
  .map { case (key, value) => key -> value.groupMapReduce(_._1)(_._2)(_ ++ _) }
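This version yields a Map[Int, Map[SubscriptionStateWithoutIds, Seq[Long]]]. If you still need the counts from the question's expected output, they can be recovered from the merged subIds, e.g. (a sketch building on the snippet above, assuming one subId per input row as in the question's data):

val grouped = list
  .groupMap(_._1)(sub => SubscriptionStateWithoutIds(sub._2.channel, sub._2.state) -> sub._2.subIds)
  .map { case (key, value) => key -> value.groupMapReduce(_._1)(_._2)(_ ++ _) }

// each count is just the size of the merged subIds for that (channel, state)
val counts = grouped.map { case (id, m) => id -> m.map { case (state, ids) => state -> ids.size } }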

Scala: using calculations from a pattern match's guard (if) in the body

I'm using pattern matching in Scala a lot. Many times I need to do some calculations in the guard part, and sometimes they are pretty expensive. Is there any way to bind calculated values to a separate value?
// I want to use the result of prettyExpensiveFunc in the body safely
people.collect {
  case ...
  case Some(Right((x, y))) if prettyExpensiveFunc(x, y) > 0 => prettyExpensiveFunc(x, y)
}

// ideally something like this would be helpful, but it doesn't compile:
people.collect {
  case ...
  case Some(Right((x, y))) if { val z = prettyExpensiveFunc(x, y); z > 0 } => z
}

// this solution works, but it isn't safe for some Seq types and is risky when more cases are used
var cache: Int = 0
people.collect {
  case ...
  case Some(Right((x, y))) if { cache = prettyExpensiveFunc(x, y); cache > 0 } => cache
}
Is there any better solution?
PS: The example is simplified, and I don't expect answers showing that I don't need pattern matching here.
You can use cats.Eval to make expensive calculations lazy and memoized: create Evals in .map and extract .value (calculated at most once, and only if needed) in .collect:
values.map { value =>
  val expensiveCheck1 = Eval.later { prettyExpensiveFunc(value) }
  val expensiveCheck2 = Eval.later { anotherExpensiveFunc(value) }
  (value, expensiveCheck1, expensiveCheck2)
}.collect {
  case (value, lazyResult1, _) if lazyResult1.value > 0 => ...
  case (value, _, lazyResult2) if lazyResult2.value > 0 => ...
  case (value, lazyResult1, lazyResult2) if lazyResult1.value > lazyResult2.value => ...
  ...
}
I don't see a way of doing what you want without some implementation of lazy evaluation, and if you have to use one, you might as well use an existing one instead of rolling your own.
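If pulling in cats just for this feels heavy, a minimal hand-rolled stand-in takes only a few lines (a sketch; the Lazy name is hypothetical, and unlike cats.Eval it offers no map/flatMap or Now/Always variants):

// A minimal memoizing lazy cell, standing in for cats.Eval.later.
final class Lazy[A](thunk: => A) {
  lazy val value: A = thunk // computed at most once, on first access
}
object Lazy {
  def later[A](a: => A): Lazy[A] = new Lazy(a)
}

// usage mirrors the Eval version above:
values.map(v => (v, Lazy.later(prettyExpensiveFunc(v)))).collect {
  case (v, r) if r.value > 0 => r.value
}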
EDIT: Just in case you haven't noticed, you aren't losing the ability to pattern match by using a tuple here:
values.map {
  // original value -> lazily evaluated, memoized expensive calculation
  case a @ Some(Right((x, y))) => a -> Some(Eval.later(prettyExpensiveFunc(x, y)))
  case a                       => a -> None
}.collect {
  // match on type and calculation
  ...
  case (Some(Right((x, y))), Some(lazyResult)) if lazyResult.value > 0 => ...
  ...
}
Why not run the function first for every element and then work with a tuple?
Seq(1, 2, 3, 4, 5).map(e => (e, prettyExpensiveFunc(e))).collect {
  case ...
  case (x, y) if y > 0 => y
}
I tried my own matchers and the effect is somewhat OK, but not perfect. My matcher is untyped, and it is a bit ugly to make it fully typed.
class Matcher[T, E](f: PartialFunction[T, E]) {
  def unapply(z: T): Option[E] = if (f.isDefinedAt(z)) Some(f(z)) else None
}

def newMatcherAny[E](f: PartialFunction[Any, E]) = new Matcher(f)
def newMatcher[T, E](f: PartialFunction[T, E]) = new Matcher(f)

def prettyExpensiveFunc(x: Int) = { println(s"-- prettyExpensiveFunc($x)"); x % 2 + x * x }

val x = Seq(
  Some(Right(22)),
  Some(Right(10)),
  Some(Left("Oh now")),
  None
)

val PersonAgeRank = newMatcherAny { case Some(Right(x: Int)) => (x, prettyExpensiveFunc(x)) }

x.collect {
  case PersonAgeRank(age, rank) if rank > 100 => println("age:" + age + " rank:" + rank)
}
https://scalafiddle.io/sf/hFbcAqH/3

Scala for/yield runs but doesn't complete

I'm trying to walk through two arrays of potentially different sizes and compose a new array of randomly selected elements from them (for crossover in a genetic algorithm; childGeneCount is just the length of the longer array).
In the following code snippet, each gene.toString is logged, but the last log statement never seems to execute. What dumb thing am I doing?
val genes = for (i <- 0 to childGeneCount) yield {
  val gene = if (Random.nextBoolean()) {
    if (i < p1genes.length) {
      p1genes(i)
    } else {
      p2genes(i)
    }
  } else {
    if (i < p2genes.length) {
      p2genes(i)
    } else {
      p1genes(i)
    }
  }
  Logger.debug(gene.toString)
  gene
}
Logger.debug("crossover finishing - never gets here??")
I'm new to Scala, and would be happy with a slap on the wrist accompanied by a "do it this completely different way instead" if appropriate.
You are right: the problem is that "to" should have been "until". I have also changed your code a bit to make it more Scala-like:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val genePair = p1genes.zipAll(p2genes, None, None)
val matchedGene = for (pair <- genePair) yield {
  pair match {
    case (p1Gene, None)   => p1Gene
    case (None, p2Gene)   => p2Gene
    case (p1Gene, p2Gene) => if (Random.nextBoolean()) p1Gene else p2Gene
  }
}
println(matchedGene)
The process is:
First, zip the two DNA sequences into one.
Fill the shorter sequence with None.
Then loop over the zipped sequence and populate the new sequence.
Reworked Tawkir's answer, with cleaner None handling:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val genePair = p1genes.map(Some.apply).zipAll(p2genes.map(Some.apply), None, None)
val matchedGene = genePair.map {
  case (Some(p1Gene), None)         => p1Gene
  case (None, Some(p2Gene))         => p2Gene
  case (Some(p1Gene), Some(p2Gene)) => if (Random.nextBoolean()) p1Gene else p2Gene
}
println(matchedGene)
If you want to avoid wrapping the sequences in Some, another solution is to use a character known not to appear in either sequence as a "none" marker:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val none = '-'
val genePair = p1genes.zipAll(p2genes, none, none)
val matchedGene = genePair.map {
  case (p1Gene, `none`) => p1Gene
  case (`none`, p2Gene) => p2Gene
  case (p1Gene, p2Gene) => if (Random.nextBoolean()) p1Gene else p2Gene
}
println(matchedGene)
Pretty sure harry0000's answer is correct: I was using "to" like "until", and I'm so used to exceptions being thrown loudly that I didn't think to look there!
I ended up switching from for/yield to List.tabulate(childGeneCount) { i => ... }, which fixed the error, probably for the same reason.
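For reference, that switch might look like this (a sketch; List.tabulate(n) passes indices 0 until n, which gives exactly the "until" semantics that fixed the bug):

val genes = List.tabulate(childGeneCount) { i => // indices 0 until childGeneCount
  if (Random.nextBoolean()) {
    if (i < p1genes.length) p1genes(i) else p2genes(i)
  } else {
    if (i < p2genes.length) p2genes(i) else p1genes(i)
  }
}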
Since you asked for possible style improvements, here are two suggested implementations. The first is less idiomatic but more performant; the second is prettier but does some extra work.
def crossover[E: ClassTag](a: Array[E], b: Array[E]): Array[E] = {
  val (larger, smaller) = if (a.length > b.length) (a, b) else (b, a)
  val result = Array.ofDim[E](larger.length)
  for (i <- smaller.indices)
    result(i) = if (Random.nextBoolean()) larger(i) else smaller(i)
  for (i <- smaller.length until larger.length)
    result(i) = larger(i)
  result
}

def crossoverIdiomatic[E: ClassTag](a: Array[E], b: Array[E]): Array[E] = {
  val randomPart = (a zip b).map { case (x, y) => if (Random.nextBoolean()) x else y }
  val (larger, smaller) = if (a.length > b.length) (a, b) else (b, a)
  randomPart ++ larger.drop(smaller.length)
}

val a = Array("1", "2", "3", "4", "5", "6")
val b = Array("one", "two", "three", "four")

// e.g. output: [one,2,three,4,5,6]
println(crossover(a, b).mkString("[", ",", "]"))
println(crossoverIdiomatic(a, b).mkString("[", ",", "]"))
Note that the E : ClassTag context bound is only there to make the compiler happy about using Array[E]; if you only need Int for your work, you can drop the fancy generics entirely.

Combining multiple Lists of arbitrary length

I am looking for an approach to join multiple Lists in the following manner:
ListA a b c
ListB 1 2 3 4
ListC + # * § %
..
..
..
Resulting List: a 1 + b 2 # c 3 * 4 § %
In words: the elements in sequential order, starting at the first list, combined into the resulting list. There can be an arbitrary number of input lists, varying in length.
I have tried multiple approaches with variants of zip and sliding iterators, but none worked, especially with the varying list lengths. There has to be an elegant way to do this in Scala ;)
val lists = List(ListA, ListB, ListC)
lists.flatMap(_.zipWithIndex).sortBy(_._2).map(_._1)
It's pretty self-explanatory: it zips each value with its position in its respective list, sorts by index, then pulls the values back out.
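Traced on the example lists from the question (names as given there):

val ListA = List("a", "b", "c")
val ListB = List(1, 2, 3, 4)
val ListC = List("+", "#", "*", "§", "%")
val lists = List(ListA, ListB, ListC)

// zipWithIndex tags each element with its position within its own list;
// sortBy is stable, so ties keep list order and the result is a round-robin
lists.flatMap(_.zipWithIndex).sortBy(_._2).map(_._1)
// => List(a, 1, +, b, 2, #, c, 3, *, 4, §, %)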
Here's how I would do it:
class ListTests extends FunSuite {
  test("The three lists from the example") {
    val l1 = List("a", "b", "c")
    val l2 = List(1, 2, 3, 4)
    val l3 = List("+", "#", "*", "§", "%")

    // All lists together
    val l = List(l1, l2, l3)

    // Max length of a list (to pad the shorter ones)
    val maxLen = l.map(_.size).max

    // Wrap the elements in Option and pad with None
    val padded = l.map { list => list.map(Some(_)) ++ Stream.continually(None).take(maxLen - list.size) }

    // Transpose
    val trans = padded.transpose

    // Flatten the lists, then flatten the Options
    val result = trans.flatten.flatten

    // Voilà
    assert(List("a", 1, "+", "b", 2, "#", "c", 3, "*", 4, "§", "%") === result)
  }
}
Here's an imperative solution if efficiency is paramount:
def combine[T](xss: List[List[T]]): List[T] = {
  val b = List.newBuilder[T]
  var its = xss.map(_.iterator)
  while (!its.isEmpty) {
    its = its.filter(_.hasNext)
    its.foreach(b += _.next)
  }
  b.result
}
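Checked against the question's example (strings throughout, so T is inferred as String):

combine(List(List("a", "b", "c"), List("1", "2", "3", "4"), List("+", "#", "*", "§", "%")))
// => List(a, 1, +, b, 2, #, c, 3, *, 4, §, %)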
You can use padTo, transpose, and flatten to good effect here:
lists.map(_.map(Some(_)).padTo(lists.map(_.length).max, None)).transpose.flatten.flatten
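The same one-liner unrolled step by step (a sketch, reusing lists from the accepted answer):

val maxLen      = lists.map(_.length).max
val padded      = lists.map(_.map(Some(_)).padTo(maxLen, None)) // pad the shorter lists with None
val interleaved = padded.transpose.flatten                      // first flatten: concatenate the rounds
val result      = interleaved.flatten                           // second flatten: drop the None padding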
Here's a small recursive solution.
def flatList(lists: List[List[Any]]) = {
  def loop(output: List[Any], xss: List[List[Any]]): List[Any] =
    (xss collect { case x :: xs => x }) match {
      case Nil   => output
      case heads => loop(output ::: heads, xss.collect { case x :: xs => xs })
    }
  loop(List[Any](), lists)
}
And here is a simple streams approach which can cope with an arbitrary sequence of sequences, each of potentially infinite length.
def flatSeqs[A](ssa: Seq[Seq[A]]): Stream[A] = {
  def seqs(xss: Seq[Seq[A]]): Stream[Seq[A]] = xss collect { case xs if !xs.isEmpty => xs.head } match {
    case Nil   => Stream.empty
    case heads => heads #:: seqs(xss collect { case xs if !xs.isEmpty => xs.tail })
  }
  seqs(ssa).flatten
}
Here's something short but not exceedingly efficient:
def heads[A](xss: List[List[A]]) = xss.map(_.splitAt(1)).unzip

def interleave[A](xss: List[List[A]]) = Iterator.
  iterate(heads(xss)) { case (_, tails) => heads(tails) }.
  map(_._1.flatten).
  takeWhile(!_.isEmpty).
  flatten.toList
Here's a recursive solution that's O(n). The accepted solution (using sort) is O(n log n). Some testing I've done suggests the second solution (using transpose) is also O(n log n), due to the implementation of transpose. The use of reverse below looks suspicious (since it's an O(n) operation itself), but convince yourself that it either can't be called too often or on too-large lists.
def intercalate[T](lists: List[List[T]]): List[T] = {
  def intercalateHelper(newLists: List[List[T]], oldLists: List[List[T]], merged: List[T]): List[T] =
    (newLists, oldLists) match {
      case (Nil, Nil)              => merged
      case (Nil, zss)              => intercalateHelper(zss.reverse, Nil, merged)
      case (Nil :: xss, zss)       => intercalateHelper(xss, zss, merged)
      case ((y :: ys) :: xss, zss) => intercalateHelper(xss, ys :: zss, y :: merged)
    }
  intercalateHelper(lists, List.empty, List.empty).reverse
}

Filtering out keys of a map but keeping all values in Scala

I'm trying to write a method with the following signature:
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] = {...}
Within the method I want to return a new map by applying the following pseudo-code to each (key: Int, value: Long) pair of mappes:
if (key + minInterval > nextKey) {
  value += nextValue
} else {
  // Forget previous key(s) and return current key with the sum of all previous values
  return (key, value)
}
Example: If I had the source Map ((10 -> 5000), (20 -> 5000), (25 -> 7000), (40 -> 13000)) and defined a minInterval of 10, I'd expect the resulting Map:
((10 -> 5000), (25 -> 12000), (40 -> 13000))
I found a lot of examples for transforming keys and values or filtering keys and values separately, but none so far for dropping keys while preserving their values.
This solution uses a List as an intermediate structure. It traverses the map from left to right and appends key-value pairs to the list if the interval is big enough; otherwise it replaces the head of the list with a new key-value pair. The list is accumulated in reverse, and the TreeMap factory method puts it back in key order at the end.
import collection.immutable._

def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]): SortedMap[Int, Long] =
  TreeMap(
    mappes.foldLeft[List[(Int, Long)]](Nil) {
      case (Nil, nextKV) => nextKV :: Nil
      case (acc @ (key, value) :: accTail, nextKV @ (nextKey, nextValue)) =>
        if (nextKey - key < minInterval)
          (nextKey -> (value + nextValue)) :: accTail
        else
          nextKV :: acc
    }: _*
  )
To answer the question: basically, there is no totally simple way of doing this, because the requirement isn't simple. You need to somehow iterate through the SortedMap while comparing adjacent elements, building a new map as you go. There are several ways to do it:
Use a fold / reduce / scan / groupBy higher-order function: generally the preferred way, and the most concise.
Recursion (see http://aperiodic.net/phil/scala/s-99/ for plenty of examples): what you resort to if using higher-order functions gets too complicated, or the exact function you need doesn't exist. May be faster than using higher-order functions.
Builders - a nice term for a brief foray into mutable-land. Best performance; often equivalent to the recursive version without the ceremony. A sketch of this option follows below.
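Here is that builder sketch (an illustration, not from the original thread; the buildSumMapWithBuilder name is hypothetical), using the question's merge rule that key + minInterval > nextKey means "fold nextValue into the running entry":

import scala.collection.immutable.SortedMap

def buildSumMapWithBuilder(minInterval: Int, mappes: SortedMap[Int, Long]): SortedMap[Int, Long] = {
  val b = SortedMap.newBuilder[Int, Long]
  val it = mappes.iterator
  if (it.hasNext) {
    var (key, value) = it.next()
    for ((nextKey, nextValue) <- it) {
      if (key + minInterval > nextKey) { // too close: merge into the running entry
        key = nextKey
        value += nextValue
      } else {                           // gap is large enough: emit and start fresh
        b += (key -> value)
        key = nextKey
        value = nextValue
      }
    }
    b += (key -> value)                  // emit the last running entry
  }
  b.result()
}

// buildSumMapWithBuilder(10, SortedMap(10 -> 5000L, 20 -> 5000L, 25 -> 7000L, 40 -> 13000L))
// => Map(10 -> 5000, 25 -> 12000, 40 -> 13000)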
Here's my attempt using scanLeft:
def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]) =
  SortedMap.empty[Int, Long] ++ mappes.toSeq.tail.scanLeft(mappes.head) {
    case ((k1, v1), (k2, v2)) => if (k2 - k1 > minInterval) (k2, v2) else (k1, v2)
  }.groupBy(_._1).mapValues(_.map(_._2).sum)
It looks complicated, but it isn't really, once you understand what scanLeft and groupBy do, which you can look up elsewhere. It basically scans the sequence from the left, comparing adjacent keys, reusing the key to the left if the gap is too small, and then groups the tuples together by key.
TL;DR: the key is to learn the built-in functions in the collections library, which takes some practice, but it's good fun.
import scala.collection.SortedMap

def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]): SortedMap[Int, Long] = {
  def _buildSumMap(map: List[(Int, Long)], buffer: List[(Int, Long)], result: SortedMap[Int, Long]): SortedMap[Int, Long] = {
    def mergeBufferWithResult = {
      val res = buffer.headOption.map { case (k, v) =>
        (k, buffer.map(_._2).sum)
      }
      res.map(result + _).getOrElse(result)
    }
    map match {
      case entry :: other =>
        if (buffer.headOption.exists(entry._1 - _._1 < minInterval)) {
          _buildSumMap(other, entry :: buffer, result)
        } else {
          _buildSumMap(other, entry :: Nil, mergeBufferWithResult)
        }
      case Nil =>
        mergeBufferWithResult
    }
  }
  _buildSumMap(mappes.toList, List.empty, SortedMap.empty)
}

val result = buildSumMap(10, SortedMap(10 -> 5000L, 20 -> 5000L, 25 -> 7000L, 40 -> 13000L))
println(result)
// Map(10 -> 5000, 25 -> 12000, 40 -> 13000)
I tried to split the algorithm into parts:
import scala.collection._

val myMap = SortedMap((10 -> 5000), (20 -> 5000), (25 -> 7000), (40 -> 13000)).mapValues(_.toLong)

def filterInterval(minInterval: Int, it: Iterable[Int]): List[Int] = {
  val list = it.toList
  val jumpMap = list.map(x => (x, list.filter(_ > x + minInterval))).toMap
    .filter(_._2.nonEmpty).mapValues(_.min)
  def jump(n: Int): Stream[Int] = jumpMap.get(n).map(j => Stream.cons(j, jump(j))).getOrElse(Stream.empty)
  list.min :: jump(list.min).toList
}

def buildSumMap(minInterval: Int, mappes: Map[Int, Long]): Map[Int, Long] = {
  val filteredKeys: List[Int] = filterInterval(minInterval, mappes.keys)
  val agg: List[(Int, Long)] = filteredKeys.map(finalKey =>
    (finalKey, mappes.filterKeys(_ <= finalKey).values.sum)
  ).sortWith(_._1 < _._1)
  agg.zip((filteredKeys.min, 0L) :: agg).map(st => (st._1._1, st._1._2 - st._2._2)).toMap
}

buildSumMap(10, myMap)
Here's another take:
def buildSumMap(map: SortedMap[Int, Int], diff: Int) =
  map.drop(1).foldLeft(map.take(1)) { case (m, (k, v)) =>
    val (_k, _v) = m.last
    if (k - _k < diff) (m - _k) + (k -> (v + _v))
    else m + (k -> v)
  }
A much cleaner (than my first attempt) solution, using Scalaz 7's State and a List to store the state of the computation. Using a List makes it efficient to inspect, and modify if necessary, the head of the list at each step.
def f2(minInterval: Int): ((Int, Int)) => State[List[(Int, Int)], Unit] = {
  case (k, v) => State {
    case (floor, acc) :: tail if (floor + minInterval) > k =>
      ((k, acc + v) :: tail) -> ()
    case state => ((k, v) :: state) -> ()
  }
}

scala> mappes.toList traverseS f2(10) execZero
res1: scalaz.Id.Id[List[(Int, Int)]] = List((40,13000), (25,12000), (10,5000))