Better way for aggregation on list of case classes - scala

I have a list of case class instances. The output requires aggregating on different fields of the case class, and I am looking for a more efficient way to do it.
Example:
case class Students(city: String, college: String, group: String,
name: String, fee: Int, age: Int)
object GroupByStudents {
val studentsList= List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
}
Now, to get details of students from a city, I need to first aggregate by city, then break those details down by college, then by group.
The output is a list of case class instances in the format below.
Students(Mumbai,,,,790,0) -- aggregate city wise
Students(Mumbai,College1,,,550,0) -- aggregate college wise
Students(Mumbai,College1,Social,,250,0)
Students(Mumbai,College1,Science,,300,0)
Students(Mumbai,College2,,,240,0)
Students(Mumbai,College2,Science,,240,0)
Students(Bangalore,,,,270,0)
Students(Bangalore,College3,,,270,0)
Students(Bangalore,College3,Science,,270,0)
Two methods to achieve this:
1) Loop over the list, create a map for each combination (3 combinations in the case above), aggregate the data, and append it to a new result list.
2) Use foldLeft:
studentsList.groupBy(d=>(d.city))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,"","","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,"","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college,d.group))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,c.group,"",r.fee+c.fee,0)))
In both cases the list is traversed more than once. Is there a way to achieve this in a single pass?

With GroupBy
The code looks a little nicer, but I don't think it is faster. With groupBy you always have two "loops": one to build the groups and one to aggregate each group.
studentsList.groupBy(d=>(d.city)).map { case (k,v) =>
Students(v.head.city,"","","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college)).map { case (k,v) =>
Students(v.head.city,v.head.college,"","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college,d.group)).map { case (k,v) =>
Students(v.head.city,v.head.college,v.head.group,"",v.map(_.fee).sum, 0)
}
You then get something like this:
List(Students(Bangalore,,,,270,0),
Students(Mumbai,,,,790,0))
List(Students(Mumbai,College2,,,240,0),
Students(Bangalore,College3,,,270,0),
Students(Mumbai,College1,,,550,0))
List(Students(Bangalore,College3,Science,,270,0),
Students(Mumbai,College2,Science,,240,0),
Students(Mumbai,College1,Social,,250,0),
Students(Mumbai,College1,Science,,300,0))
This is not in exactly the same order as in your example, but it has the desired form: lists of Students aggregates.
With a for comprehension
You can avoid the extra looping by doing the grouping yourself. Only the city example is shown; the others are analogous.
var m = Map[String, Students]()
for (v <- studentsList) {
m += v.city -> Students(v.city,"","","",v.fee + m.getOrElse(v.city, Students("","","","",0,0)).fee, 0)
}
m
Output
This matches the city-level rows of your expected output, but the list is traversed only once for each resulting Map[String,Students].
Map(Mumbai -> Students(Mumbai,,,,790,0), Bangalore -> Students(Bangalore,,,,270,0))
With Foldleft
Each fold makes a single pass over the complete list.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[String, Students]()) { case (m, v) =>
m + (v.city -> Students(v.city,"","","",
v.fee + m.getOrElse(v.city, emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college) -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college), emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
Output
These contain the same totals as your expected output, and the list is traversed only once for each resulting Map.
Map(Mumbai -> Students(Mumbai,,,,790,0),
Bangalore -> Students(Bangalore,,,,270,0))
Map((Mumbai,College1) -> Students(Mumbai,College1,,,550,0),
(Mumbai,College2) -> Students(Mumbai,College2,,,240,0),
(Bangalore,College3) -> Students(Bangalore,College3,,,270,0))
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
With FoldLeft One Loop
You can also generate one big Map that holds all three aggregation levels.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
{
var t = m + ((v.city,"","") -> Students(v.city,"","","",
v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
t = t + ((v.city,v.college,"") -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
t + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
}
Output
In this case you loop once and get the results for all three aggregations, but in a single Map. The same works with a for loop, too.
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0),
(Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
Each step of the fold creates a new immutable Map, which may cost some performance and memory. The same aggregation can also be written with a for loop over a var:
For Comprehension One Loop
This generates one Map with the 3 aggregate types.
val emptyStudent = Students("","","","",0,0);
var m = Map[(String,String,String), Students]()
for (v <- studentsList) {
m += ((v.city,"","") -> Students(v.city,"","","", v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
m += ((v.city,v.college,"") -> Students(v.city,v.college,"","", v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
m += ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"", v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
m
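Note that m is a var holding an immutable Map, so each += still builds a new Map (sharing structure with the old one), just as in the foldLeft example; to actually update in place you would need a scala.collection.mutable.Map.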
Output
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0),
(Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
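If avoiding the intermediate immutable maps matters, one option is a scala.collection.mutable.Map that is updated in place. The following is only a sketch under that assumption: the object name MutableAggregation and the helper aggregate are made up for illustration, and the map stores plain fee totals instead of Students rows.

```scala
import scala.collection.mutable

object MutableAggregation {
  case class Students(city: String, college: String, group: String,
                      name: String, fee: Int, age: Int)

  val studentsList = List(
    Students("Mumbai", "College1", "Science", "Jony", 100, 30),
    Students("Mumbai", "College1", "Science", "Tony", 200, 25),
    Students("Mumbai", "College1", "Social", "Bony", 250, 30),
    Students("Mumbai", "College2", "Science", "Gony", 240, 28),
    Students("Bangalore", "College3", "Science", "Hony", 270, 28))

  // Sums fees per (city, college, group) key; "" marks a level that is aggregated away.
  def aggregate(students: List[Students]): Map[(String, String, String), Int] = {
    val acc = mutable.Map.empty[(String, String, String), Int].withDefaultValue(0)
    for (s <- students) {
      // Three in-place updates per student: city, city + college, city + college + group.
      acc((s.city, "", "")) += s.fee
      acc((s.city, s.college, "")) += s.fee
      acc((s.city, s.college, s.group)) += s.fee
    }
    acc.toMap // immutable snapshot for callers
  }

  val totals: Map[(String, String, String), Int] = aggregate(studentsList)
}
```

The mutation stays local to aggregate, so callers still only see an immutable Map.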
In all cases you can shorten the code by giving the fields of your Students case class default values, because then you can write something like Students(city = v.city, fee = v.fee + m.getOrElse(v.city, emptyStudent).fee) during grouping.
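A minimal sketch of the default-value idea, assuming it is acceptable to change the case class definition. The object name DefaultParams is hypothetical, and only the city-level aggregation is shown.

```scala
object DefaultParams {
  // Same fields as in the question, but with defaults so aggregate rows need fewer arguments.
  case class Students(city: String = "", college: String = "", group: String = "",
                      name: String = "", fee: Int = 0, age: Int = 0)

  val studentsList = List(
    Students("Mumbai", "College1", "Science", "Jony", 100, 30),
    Students("Mumbai", "College1", "Science", "Tony", 200, 25),
    Students("Mumbai", "College1", "Social", "Bony", 250, 30),
    Students("Mumbai", "College2", "Science", "Gony", 240, 28),
    Students("Bangalore", "College3", "Science", "Hony", 270, 28))

  // Students() plays the role of emptyStudent; named arguments set only what differs.
  val byCity: Map[String, Students] =
    studentsList.foldLeft(Map.empty[String, Students]) { (m, v) =>
      val soFar = m.getOrElse(v.city, Students()).fee
      m + (v.city -> Students(city = v.city, fee = soFar + v.fee))
    }
}
```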

Use a foldLeft
First, let's define some type aliases to make the syntax easier
object GroupByStudents {
type City = String
type College = String
type Group = String
type Name = String
type Aggregate = Map[City, Map[College, Map[Group, List[Students]]]]
def emptyAggregate: Aggregate = Map.empty
case class Students(city: City, college: College, group: Group,
name: Name, fee: Int, age: Int)
}
You can aggregate the students list into an Aggregate map in a single foldLeft
object Test {
import GroupByStudents._
def main(args: Array[String]): Unit = {
val studentsList = List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
val aggregated = studentsList.foldLeft(emptyAggregate){(agg, students) =>
val cityBin = agg.getOrElse(students.city, Map.empty)
val collegeBin = cityBin.getOrElse(students.college, Map.empty)
val groupBin = collegeBin.getOrElse(students.group, List.empty)
val nextGroupBin = students :: groupBin
val nextCollegeBin= collegeBin + (students.group -> nextGroupBin)
val nextCityBin = cityBin + (students.college -> nextCollegeBin)
agg + (students.city -> nextCityBin)
}
}
}
aggregated can then be mapped over to calculate fees.
If you really want, you can calculate the fees in the foldLeft itself, but this would make the code harder to read.
Note that you can also try monocle's lenses to put the students value in the aggregated structure.
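As a rough illustration of mapping over the aggregated structure to calculate fees, the sketch below turns a nested map into one summary row per city, college, and group. The names FeesFromAggregate, toAggregate, and feeRows are made up here, and the nested map is built with groupBy only to keep the example short; it has the same shape as the Aggregate produced by the foldLeft.

```scala
object FeesFromAggregate {
  case class Students(city: String, college: String, group: String,
                      name: String, fee: Int, age: Int)

  type Aggregate = Map[String, Map[String, Map[String, List[Students]]]]

  val studentsList = List(
    Students("Mumbai", "College1", "Science", "Jony", 100, 30),
    Students("Mumbai", "College1", "Science", "Tony", 200, 25),
    Students("Mumbai", "College1", "Social", "Bony", 250, 30),
    Students("Mumbai", "College2", "Science", "Gony", 240, 28),
    Students("Bangalore", "College3", "Science", "Hony", 270, 28))

  // Builds city -> college -> group -> students (shortcut for the example).
  def toAggregate(students: List[Students]): Aggregate =
    students.groupBy(_.city).map { case (city, inCity) =>
      city -> inCity.groupBy(_.college).map { case (college, inCollege) =>
        college -> inCollege.groupBy(_.group)
      }
    }

  // Emits a city row, then for each college a college row followed by its group rows.
  def feeRows(agg: Aggregate): List[Students] =
    agg.toList.flatMap { case (city, colleges) =>
      val collegeRows = colleges.toList.flatMap { case (college, groups) =>
        val groupRows = groups.toList.map { case (group, members) =>
          Students(city, college, group, "", members.map(_.fee).sum, 0)
        }
        // The college total is the sum of its group totals.
        Students(city, college, "", "", groupRows.map(_.fee).sum, 0) :: groupRows
      }
      // The city total is summed from the underlying students, not from the summary rows.
      val cityFee = colleges.values.flatMap(_.values.flatten).map(_.fee).sum
      Students(city, "", "", "", cityFee, 0) :: collegeRows
    }

  val rows: List[Students] = feeRows(toAggregate(studentsList))
}
```

The order of cities, colleges, and groups follows the iteration order of the maps, so it may differ from the order in the question.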

Related

How to transform input data into following format? - groupby

What I have is the following input data for a function in a piece of scala code I'm writing:
List(
(1,SubScriptionState(CNN,ONLINE,Seq(12))),
(1,SubScriptionState(SKY,ONLINE,Seq(12))),
(1,SubScriptionState(FOX,ONLINE,Seq(12))),
(2,SubScriptionState(CNN,ONLINE,Seq(12))),
(2,SubScriptionState(SKY,ONLINE,Seq(12))),
(2,SubScriptionState(FOX,ONLINE,Seq(12))),
(2,SubScriptionState(CNN,OFFLINE,Seq(13))),
(2,SubScriptionState(SKY,ONLINE,Seq(13))),
(2,SubScriptionState(FOX,ONLINE,Seq(13))),
(3,SubScriptionState(CNN,OFFLINE,Seq(13))),
(3,SubScriptionState(SKY,ONLINE,Seq(13))),
(3,SubScriptionState(FOX,ONLINE,Seq(13)))
)
SubscriptionState is just a case class here:
case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])
I want to transform it into this:
Map(
1 -> Map(
SubScriptionState(SKY,ONLINE,Seq(12)) -> 1,
SubScriptionState(CNN,ONLINE,Seq(12)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(12)) -> 1),
2 -> Map(
SubScriptionState(SKY,ONLINE,Seq(12,13)) -> 2,
SubScriptionState(CNN,ONLINE,Seq(12)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(12,13)) -> 2,
SubScriptionState(CNN,OFFLINE,Seq(13)) -> 1),
3 -> Map(
SubScriptionState(SKY,ONLINE,Seq(13)) -> 1,
SubScriptionState(FOX,ONLINE,Seq(13)) -> 1,
SubScriptionState(CNN,OFFLINE,Seq(13)) -> 1)
)
How would I go about doing this in scala?
Here is my approach to the problem. I think it may not be a perfect solution, but it works as you would expect.
val result: Map[Int, Map[SubscriptionState, Int]] = list
.groupBy(_._1)
.view
.mapValues { statesById =>
statesById
.groupBy { case (_, subscriptionState) => (subscriptionState.channel, subscriptionState.state) }
.map { case (_, groupedStatesById) =>
val subscriptionState = groupedStatesById.head._2 // groupedStatesById should contain at least one element
val allSubIds = groupedStatesById.flatMap(_._2.subIds)
val updatedSubscriptionState = subscriptionState.copy(subIds = allSubIds)
updatedSubscriptionState -> allSubIds.size
}
}.toMap
This is a "simple" solution using groupMap and groupMapReduce (both available from Scala 2.13):
list
.groupMap(_._1)(_._2)
.view
.mapValues{
_.groupMapReduce(ss => (ss.channel, ss.state))(_.subIds)(_ ++ _)
.map{case (k,v) => SubscriptionState(k._1, k._2, v) -> v.length}
}
.toMap
The groupMap converts the data to a Map[Int, List[SubscriptionState]] and the mapValues converts each List to the appropriate Map. (The view avoids the deprecated mapValues on Map, and toMap materializes the result so the function is not re-evaluated on every lookup.)
The groupMapReduce converts the List[SubscriptionState] into a Map[(Channel, ChannelState), Seq[Long]].
The map on this inner Map rearranges these values to make the Map[SubscriptionState, Int] that is required.
I'm not clear what the purpose of the inner Map is. The value is the length of the subIds field, so it could be obtained directly from the key rather than by looking it up in the Map.
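To illustrate that last point, here is a sketch that drops the inner Map and keeps a plain list of merged states per id, with the count derived from subIds on demand. Channel and ChannelState are not shown in the question, so the sealed traits below are stand-ins, and the names SubscriptionMerge, merged, and count are hypothetical. It assumes Scala 2.13 or later.

```scala
object SubscriptionMerge {
  // Stand-ins for the question's Channel and ChannelState types (assumption).
  sealed trait Channel
  case object CNN extends Channel
  case object SKY extends Channel
  sealed trait ChannelState
  case object ONLINE extends ChannelState
  case object OFFLINE extends ChannelState

  case class SubscriptionState(channel: Channel, state: ChannelState, subIds: Seq[Long])

  val list: List[(Int, SubscriptionState)] = List(
    1 -> SubscriptionState(CNN, ONLINE, Seq(12L)),
    2 -> SubscriptionState(CNN, ONLINE, Seq(12L)),
    2 -> SubscriptionState(SKY, ONLINE, Seq(12L)),
    2 -> SubscriptionState(SKY, ONLINE, Seq(13L)))

  // Group by id, then merge the subIds of states that share channel and state.
  val merged: Map[Int, List[SubscriptionState]] =
    list.groupMap(_._1)(_._2).map { case (id, states) =>
      id -> states
        .groupMapReduce(s => (s.channel, s.state))(_.subIds)(_ ++ _)
        .map { case ((channel, state), ids) => SubscriptionState(channel, state, ids) }
        .toList
    }

  // The count from the question is just the number of merged ids.
  def count(s: SubscriptionState): Int = s.subIds.length
}
```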
An attempt using foldLeft:
list.foldLeft(Map.empty[Int, Map[SubscriptionState, Int]]) { (acc, next) =>
val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionState, Int])
val channelSub = subMap.find { case (sub, _) => sub.channel == next._2.channel && sub.state == next._2.state }
acc + (next._1 -> channelSub.fold(subMap + (next._2 -> next._2.subIds.length)) { case (sub, _) =>
val subIds = sub.subIds ++ next._2.subIds
(subMap - sub) + (sub.copy(subIds = subIds) -> subIds.length)
})
}
I noticed that the count is not used while folding and can be calculated from subIds. Also, as subIds can vary, the inner Map is rather useless, because you will have to use find instead of get to fetch values from it. So if you have control over your ADTs, you could use an intermediary ADT like:
case class SubscriptionStateWithoutIds(channel: Channel, state: ChannelState)
then you can rewrite your foldLeft as follows:
list.foldLeft(Map.empty[Int, Map[SubscriptionStateWithoutIds, Seq[Long]]]) { (acc, next) =>
val subMap = acc.getOrElse(next._1, Map.empty[SubscriptionStateWithoutIds, Seq[Long]])
val withoutId = SubscriptionStateWithoutIds(next._2.channel, next._2.state)
val channelSub = subMap.get(withoutId)
acc + (next._1 -> (subMap + channelSub.fold(withoutId -> next._2.subIds) { seq => withoutId -> (seq ++ next._2.subIds) }))
}
The biggest advantage of intermediary ADT is you can have a cleaner groupMapReduce version:
list.groupMap(_._1)(sub => SubscriptionStateWithoutIds(sub._2.channel, sub._2.state) -> sub._2.subIds)
.map { case (key, value) => key -> value.groupMapReduce(_._1)(_._2)(_ ++ _) }

Need to merge the Map and List with the same dates

I have a Map where key = LocalDateTime and value = Group
def someGroup(/.../): List[Group] = {
someCode.map {
/.../
}.map(group => (group.completedDt, group)).toMap
/.../
}
There is also a List[Group], where Group is (completedDt: LocalDateTime, cost: Int), in which cost is always 0.
An example of what I have:
map: [(2021-04-01T00:00:00.000, 500), (2021-04-03T00:00:00.000, 1000), (2021-04-05T00:00:00.000, 750)]
list: ((2021-04-01T00:00:00.000, 0),(2021-04-02T00:00:00.000, 0),(2021-04-03T00:00:00.000, 0),(2021-04-04T00:00:00.000, 0),(2021-04-05T00:00:00.000, 0))
The expected result is:
list ((2021-04-01T00:00:00.000, 500),(2021-04-02T00:00:00.000, 0),(2021-04-03T00:00:00.000, 1000),(2021-04-04T00:00:00.000, 0),(2021-04-05T00:00:00.000, 750))
Thanks in advance!
Assuming that if there's a time appearing in both that you want to combine the costs:
import java.time.LocalDateTime
type Group = (LocalDateTime, Int) // completedDt, cost
val groupMap: Map[LocalDateTime, Group] = ???
val groupList: List[Group] = ???
val combined =
groupList.foldLeft(groupMap) { (acc, group) =>
val completedDt = group._1
if (acc.contains(completedDt)) {
val nv = completedDt -> (acc(completedDt)._2 + group._2)
acc.updated(completedDt, nv)
} else acc + (completedDt -> group)
}.values.toList.sortBy(_._1) // You might need to define an Ordering[LocalDateTime]
The notation in your question leads me to think Group is just a pair, not a case class. It's also worth noting that I'm not sure what having the map be Map[LocalDateTime, Group] vs. Map[LocalDateTime, Int] (and thus by definition a collection of Group) buys you.
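For comparison, a sketch of the Map[LocalDateTime, Int] variant, assuming Group is a case class after all and that the list already contains every date of interest (dates that appear only in the map are dropped, unlike in the fold above). The names MergeCosts, day, costs, and placeholders are made up for the example.

```scala
import java.time.LocalDateTime

object MergeCosts {
  case class Group(completedDt: LocalDateTime, cost: Int)

  // Helper for building midnight timestamps in April 2021.
  def day(d: Int): LocalDateTime = LocalDateTime.of(2021, 4, d, 0, 0)

  val costs: Map[LocalDateTime, Int] = Map(day(1) -> 500, day(3) -> 1000, day(5) -> 750)
  val placeholders: List[Group] = (1 to 5).toList.map(d => Group(day(d), 0))

  // Keeps the order of the list and adds the cost from the map when the date is present.
  val merged: List[Group] =
    placeholders.map(g => g.copy(cost = g.cost + costs.getOrElse(g.completedDt, 0)))
}
```

Because the list order is kept, no Ordering[LocalDateTime] is needed here.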
EDIT: if you have a general collection of collections of Group, you can
val groupLists: List[List[Group]] = ???
groupLists.foldLeft(Map.empty[LocalDateTime, Group]) { (acc, lst) =>
lst.foldLeft(acc) { (m, group) =>
val completedDt = group._1
if (m.contains(completedDt)) {
val nv = completedDt -> (m(completedDt)._2 + group._2)
m.updated(completedDt, nv)
} else m + (completedDt -> group)
}
}.values.toList.sortBy(_._1)

Partially filter on a map to get key : Scala

I have a map:
val mapTest = Map("Haley" -> Map("Deran" -> 0.4, "Mike" -> 0.3), "Jack" -> Map("Deran" -> 0.3, "Mike" -> 0.3))
I want to retrieve the key based on a value. Given the value "Deran"-> 0.4 I should get "Haley".
I have tried using this:
mapTest.filter(_._2 == Map("Deran" -> 0.4))
but it doesn't work, because filter compares each nested map as a whole. That's the first question. My second question is: what if two keys satisfy the predicate, as is the case for "Jack" and "Haley" with "Mike" -> 0.3?
Maybe you want something like this:
val toSearch = List("Deran" -> 0.4, "Mike" -> 0.3)
mapTest.collectFirst {
case (key, values) if (toSearch.forall { case (k, v) => values.get(k).contains(v) }) => key
}
This could probably solve it:
def filter[K, NK, NV](m: Map[K, Map[NK, NV]])(p: ((NK, NV)) => Boolean): Vector[K] =
m.view.collect { case (k, v) if v.exists(p) => k }.toVector
Where NK is a generic type for a nested key and NV a generic type for a nested value.
This works as follows with the following inputs and outputs
val in1: (String, Double) = "Deran" -> 0.4
val out1: Vector[String] = Vector("Haley")
val in2: (String, Double) = "Mike" -> 0.3
val out2: Vector[String] = Vector("Haley", "Jack")
assert(filter(mapTest)(_ == in1) == out1)
assert(filter(mapTest)(_ == in2) == out2)
You can play around with this code here on Scastie.
Using a predicate you can be very generic but note that the complexity grows proportionally to the size of both the map and the nested maps contained therein.
If you can be less generic and simply check for equality, you can drop the predicate and use this to your advantage to make the nested check run in constant time:
def filter[K, NK, NV](m: Map[K, Map[NK, NV]])(p: (NK, NV)): Vector[K] =
m.view.collect { case (k, v) if v.get(p._1).contains(p._2) => k }.toVector
assert(filter(mapTest)(in1) == out1)
assert(filter(mapTest)(in2) == out2)
This variant is also available here on Scastie.
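If many equality lookups are made against the same map, another option is to invert it once up front. This is only a sketch of that idea and not part of the answers above; the names InvertedIndex, index, and lookup are hypothetical.

```scala
object InvertedIndex {
  val mapTest = Map(
    "Haley" -> Map("Deran" -> 0.4, "Mike" -> 0.3),
    "Jack" -> Map("Deran" -> 0.3, "Mike" -> 0.3))

  // Maps each (nestedKey, nestedValue) pair to all outer keys that contain it.
  val index: Map[(String, Double), Vector[String]] =
    mapTest.toVector
      .flatMap { case (outer, nested) => nested.toVector.map(entry => entry -> outer) }
      .groupBy(_._1)
      .map { case (entry, hits) => entry -> hits.map(_._2) }

  // Returns every outer key for the pair, or an empty Vector if there is none.
  def lookup(entry: (String, Double)): Vector[String] =
    index.getOrElse(entry, Vector.empty)
}
```

Building the index costs one pass over all nested entries; after that each lookup is a single map access.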

How to use Reduce on Scala

I am using Scala to implement an algorithm. I have a case where I need to handle the following scenario:
test = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
I need to sum the second member of the tuples in each group.
The desired result :
Map((t,2),(B,4),(D,1))
val resReduce = test.foldLeft(Map.empty[String, List[Map.empty[String, Int]]){(count, tup) => count + (tup -> (count.getOrElse(tup, 0) + 1))
I am trying to use "reduce": I have to go through every group I built and sum the second members. Any idea how to do that?
If you know that all lists are nonempty and start with the same key (e.g. they were produced by groupBy), then you can just
test.mapValues(_.map(_._2).sum).toMap
Alternatively, you might want an intermediate step that allows you to perform error-checking:
test.map{ case(k,xs) =>
val v = {
if (xs.exists(_._1 != k)) ??? // Handle key-mismatch case
else xs.reduceOption((l,r) => l.copy(_2 = l._2 + r._2))
}
v.getOrElse(??? /* Handle empty-list case */)
}
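If the map was indeed produced by groupBy on a flat list of pairs, the intermediate lists can be skipped entirely on Scala 2.13 or later. A minimal sketch, where the flat list pairs is an assumed input and SumByKey is a made-up name:

```scala
object SumByKey {
  // Hypothetical flat input that would have been grouped into `test`.
  val pairs: List[(String, Int)] = List("t" -> 2, "B" -> 3, "B" -> 1, "D" -> 1)

  // Groups by the first member, keeps the second member, and sums within each group.
  val sums: Map[String, Int] = pairs.groupMapReduce(_._1)(_._2)(_ + _)
}
```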
You could do something like this:
test collect{
case (key, many) => (key, many.map(_._2).sum)
}
wherein you do not have to assume that the list has any members. However, if you want to exclude empty lists, add a guard
case (key, many) if many.nonEmpty =>
like that.
scala> val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
test: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
scala> test.map{case (k,v) => (k, v.map(t => t._2).sum)}
res32: scala.collection.immutable.Map[String,Int] = Map(t -> 2, B -> 4, D -> 1)
Yet another approach, in essence quite similar to what has already been suggested,
implicit class mapAcc(val m: Map[String,List[(String,Int)]]) extends AnyVal {
def mapCount() = for ( (k,v) <- m ) yield { (k,v.map {_._2}.sum) }
}
Then for a given
val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
a call
test.mapCount()
delivers
Map(t -> 2, B -> 4, D -> 1)

Filtering out keys of a map but keeping all values in scala

I'm trying to write a method with the following signature:
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] = {...}
Within the method I want to return a new map by applying the following pseudo-code to each
(key:Int,value:Long)-pair of "mappes":
If(key + minInterval > nextKey) {
value += nextValue
}
else {
//Forget previous key(s) and return current key with sum of all previous values
return (key, value)
}
Example: If I had the source Map ((10 -> 5000), (20 -> 5000), (25 -> 7000), (40 -> 13000)) and defined a minInterval of 10, I'd expect the resulting Map:
((10 -> 5000), (25 -> 12000), (40 -> 13000))
I found a lot of examples for transforming keys and values or filtering keys and values separately, but none so far for dropping keys while preserving the values.
This solution uses a List as an intermediate structure. It traverses the map from left to right and prepends the key-value pair to the list if the interval is big enough; otherwise it replaces the head of the list with a new key-value pair carrying the summed value. The TreeMap factory method restores key order at the end.
import collection.immutable._
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] =
TreeMap(
mappes.foldLeft[List[(Int, Long)]] (Nil) {
case (Nil, nextKV) => nextKV :: Nil
case (acc @ (key, value) :: accTail, nextKV @ (nextKey, nextValue)) =>
if (nextKey - key < minInterval)
(nextKey -> (value + nextValue)) :: accTail
else
nextKV :: acc
} : _*
)
To answer the question, basically there is no totally simple way of doing this, because the requirement isn't simple. You need to somehow iterate through the SortedMap while comparing adjacent elements and build a new Map. There are several ways to do it:
Use a fold / reduce / scan / groupBy higher order functions: generally the preferred way, and most concise
Recursion (see http://aperiodic.net/phil/scala/s-99/ for plenty of examples): what you resort to if using higher order functions gets too complicated, or the exact function you need doesn't exist. May be faster than using functions.
Builders - a nice term for a brief foray into mutable-land. Best performance; often equivalent to the recursive version without the ceremony
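As an illustration of the builder option in the list above, here is a sketch that follows the pseudo-code from the question: a key is merged into the next one when key + minInterval > nextKey. The object name BuilderSumMap and the variable pending are made up for the example.

```scala
import scala.collection.immutable.SortedMap

object BuilderSumMap {
  def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]): SortedMap[Int, Long] = {
    val builder = SortedMap.newBuilder[Int, Long]
    // The entry currently being accumulated; it is emitted once the gap is large enough.
    var pending: Option[(Int, Long)] = None
    for ((key, value) <- mappes) {
      pending = pending match {
        case Some((prevKey, sum)) if prevKey + minInterval > key =>
          Some(key -> (sum + value)) // too close: forget prevKey, carry the sum forward
        case Some(done) =>
          builder += done            // gap is big enough: emit the finished entry
          Some(key -> value)
        case None =>
          Some(key -> value)         // first entry
      }
    }
    pending.foreach(builder += _)    // emit the last accumulated entry
    builder.result()
  }
}
```

For the example in the question, buildSumMap(10, SortedMap(10 -> 5000L, 20 -> 5000L, 25 -> 7000L, 40 -> 13000L)) gives 10 -> 5000, 25 -> 12000, 40 -> 13000.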
Here's my attempt using scanLeft:
def buildSumMap(minInterval: Int, mappes: SortedMap[Int, Long]) =
SortedMap.empty[Int, Long] ++ mappes.toSeq.tail.scanLeft(mappes.head){
case ((k1, v1), (k2, v2)) => if (k2 - k1 > minInterval) (k2,v2) else (k1,v2)
}.groupBy(_._1).mapValues(_.map(_._2).sum)
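It looks complicated but it isn't really, once you understand what scanLeft and groupBy do, which you can look up elsewhere. It basically scans the sequence from the left and compares the keys, using the key to the left if the gap is too small, then groups the tuples together according to the keys. Note that because it keeps the left key and treats a gap equal to minInterval as too small, for the example in the question it yields 10 -> 10000, 25 -> 7000, 40 -> 13000 rather than the expected map.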
TLDR: The key is to learn the built-in functions in the collections library, which takes some practice, but it's good fun.
import scala.collection.SortedMap
def buildSumMap(minInterval:Int, mappes:SortedMap[Int, Long]):SortedMap[Int, Long] = {
def _buildSumMap(map: List[(Int, Long)], buffer: List[(Int, Long)], result:SortedMap[Int, Long]): SortedMap[Int, Long] = {
def mergeBufferWithResult = {
val res = buffer.headOption.map { case (k, v) =>
(k, buffer.map(_._2).sum)
}
res.map(result + _).getOrElse(result)
}
map match {
case entry :: other =>
if(buffer.headOption.exists(entry._1 - _._1 < minInterval)) {
_buildSumMap(other, entry :: buffer, result)
} else {
_buildSumMap(other, entry :: Nil, mergeBufferWithResult)
}
case Nil =>
mergeBufferWithResult
}
}
_buildSumMap(mappes.toList, List.empty, SortedMap.empty)
}
val result = buildSumMap(10 , SortedMap(10 -> 5000L, 20 -> 5000L, 25 -> 7000L, 40 -> 13000L))
println(result)
//Map(10 -> 5000, 25 -> 12000, 40 -> 13000)
I tried to split the algorithm into parts:
import scala.collection._
val myMap = SortedMap((10 -> 5000), (20 -> 5000), (25 -> 7000), (40 -> 13000)).mapValues(_.toLong)
def filterInterval(minInterval: Int, it: Iterable[Int]):List[Int] = {
val list = it.toList
val jumpMap = list.map(x => (x, list.filter( _ > x + minInterval))).toMap.
filter(_._2.nonEmpty).mapValues(_.min)
def jump(n:Int): Stream[Int] = jumpMap.get(n).map(j => Stream.cons(j, jump(j))).getOrElse(Stream.empty)
list.min :: jump(list.min).toList
}
def buildSumMap(minInterval:Int, mappes:Map[Int, Long]):Map[Int,Long] = {
val filteredKeys: List[Int] = filterInterval(minInterval, mappes.keys)
val agg:List[(Int, Long)] = filteredKeys.map(finalkey =>
(finalkey,mappes.filterKeys(_ <= finalkey).values.sum)
).sortWith(_._1 < _._1)
agg.zip((filteredKeys.min, 0L) :: agg ).map(st => (st._1._1, st._1._2 - st._2._2)).toMap
}
buildSumMap(10, myMap)
Here's another take:
def buildSumMap(map: SortedMap[Int, Int], diff: Int) =
map.drop(1).foldLeft(map.take(1)) { case (m, (k, v)) =>
val (_k, _v) = m.last
if (k - _k < diff) (m - _k) + (k -> (v + _v))
else m + (k -> v)
}
A much cleaner (than my first attempt) solution using Scalaz 7's State, and a List to store the state of the computation. Using a List makes it efficient to inspect, and modify if necessary, the head of the list at each step.
def f2(minInterval: Int): ((Int, Int)) => State[List[(Int, Int)], Unit] = {
case (k, v) => State {
case (floor, acc) :: tail if (floor + minInterval) > k =>
((k, acc + v) :: tail) -> ()
case state => ((k, v) :: state) -> ()
}
}
scala> mappes.toList traverseS f2(10) execZero
res1: scalaz.Id.Id[List[(Int, Int)]] = List((40,13000), (25,12000), (10,5000))