How to traverse a list of custom class obj containing a map and build another map with aggregation - scala

I have a list of custom class objects, each of which can contain a map (String -> Integer). I want to aggregate all quantities from the list, grouped by the key. I don't need anything else from the list.
For example:
case class Student(name: String, standard: String, scores: List[Option[Map[String, java.lang.Integer]]])
val studentList = List(Student("John", "First", List(Option(Map("Sub1" -> 10)), Option(Map("Sub2" -> 20)), Option(Map("Sub3" -> 30)))),
Student("Jane", "First", List(Option(Map("Sub1" -> 10)), Option(Map("Sub2" -> 20)), Option(Map("Sub3" -> 30)))),
Student("Jimmy", "First", List(Option(Map("Sub1" -> 10)), Option(Map("Sub2" -> 20)), Option(Map("Sub3" -> 30)))))
val totalScoresByName = studentList.groupMapReduce(_.name)(_.scores)(_ + _)
Using + in groupMapReduce method shows error
value + is not a member of List[Option[Map[String, Integer]]]
How can I create a map of name -> totalScore out of this list, which looks like this:
John -> 60
Jane -> 60
Jimmy -> 60
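One possible way to get there, as a sketch rather than a definitive answer: the `+` fails because the value handed to the reduce step is still the whole List[Option[Map[...]]]. Collapsing each student's scores to a single Int first makes `_ + _` applicable. The `score` and `total` helpers below are hypothetical names introduced only for this illustration, and groupMapReduce assumes Scala 2.13 or later.

```scala
object TotalScores {
  case class Student(name: String, standard: String,
                     scores: List[Option[Map[String, java.lang.Integer]]])

  // Hypothetical helper: builds one score entry with an explicitly boxed Integer
  def score(subject: String, points: Int): Option[Map[String, java.lang.Integer]] =
    Option(Map(subject -> Integer.valueOf(points)))

  val studentList = List(
    Student("John", "First", List(score("Sub1", 10), score("Sub2", 20), score("Sub3", 30))),
    Student("Jane", "First", List(score("Sub1", 10), score("Sub2", 20), score("Sub3", 30))),
    Student("Jimmy", "First", List(score("Sub1", 10), score("Sub2", 20), score("Sub3", 30)))
  )

  // Collapse one student's scores to a single Int: drop the Nones,
  // collect the values of every map, unbox them, then sum
  def total(student: Student): Int =
    student.scores.flatten.flatMap(_.values).map(_.intValue()).sum

  // Group by name, map each student to their total, add totals for equal names
  val totalScoresByName: Map[String, Int] =
    studentList.groupMapReduce(_.name)(total)(_ + _)
}
```

If two students share a name, their totals are added together, which follows from grouping by name.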

Related

scala groupby by the value of the map in a list of maps

So I have a list of maps like this
val data = List(
Map[String, String]("name" -> "Bob", "food" -> "pizza", "day" -> "monday"),
Map[String, String]("name" -> "Ron", "food" -> "hotdog", "day" -> "tuesday"),
Map[String, String]("name" -> "Tim", "food" -> "pizza", "day" -> "wednesday"),
Map[String, String]("name" -> "Carl", "food" -> "hotdog", "day" -> "wednesday")
)
I want to make a Map like this from that List of maps
val result = Map("pizza" -> Map("name" -> ("Bob", "Tim"), "day" -> ("monday", "wednesday")),
"hotdog"-> Map("name" -> ("Ron", "Carl"), "day" -> ("tuesday", "wednesday")))
How can I achieve this result? Thanks
*ps I'm a beginner in Scala
Here is a preliminary solution. There is probably an easier way of doing this with a fold, but I would have to sketch that out separately.
data.groupMap(a => a("food"))(_.filter(_._1 != "food"))
  .map {
    case (a, b) =>
      (a, b.flatten.groupMapReduce(_._1)(a => List(a._2))(_ ++ _))
  }
You group the maps inside based on the value of food
This gives you:
Map(
hotdog -> List(
Map(name -> Ron, food -> hotdog, day -> tuesday),
Map(name -> Carl, food -> hotdog, day -> wednesday)),
pizza -> List(
Map(name -> Bob, food -> pizza, day -> monday),
Map(name -> Tim, food -> pizza, day -> wednesday))
)
You remove the key food from the inner maps
Map(
hotdog -> List(
Map(name -> Ron, day -> tuesday),
Map(name -> Carl, day -> wednesday)),
pizza -> List(
Map(name -> Bob, day -> monday),
Map(name -> Tim, day -> wednesday))
)
You "merge" the maps inside by using groupMapReduce which
a) groups by the inner key (i.e. name and day)
b) maps each value to a singleton list
c) concats the lists
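The steps above can also be written out one at a time, which may be easier to follow for a beginner. This is only a sketch: the intermediate names (`byFood`, `withoutFood`, `merged`) are made up for the illustration, and it assumes Scala 2.13 or later.

```scala
object GroupByFood {
  val data = List(
    Map("name" -> "Bob", "food" -> "pizza", "day" -> "monday"),
    Map("name" -> "Ron", "food" -> "hotdog", "day" -> "tuesday"),
    Map("name" -> "Tim", "food" -> "pizza", "day" -> "wednesday"),
    Map("name" -> "Carl", "food" -> "hotdog", "day" -> "wednesday")
  )

  // Step 1: group the maps by the value stored under "food"
  val byFood: Map[String, List[Map[String, String]]] =
    data.groupBy(_("food"))

  // Step 2: remove the "food" key from every inner map
  val withoutFood: Map[String, List[Map[String, String]]] =
    byFood.view.mapValues(_.map(_ - "food")).toMap

  // Step 3: merge the inner maps: group the entries by key,
  // wrap each value in a singleton list and concatenate the lists
  val merged: Map[String, Map[String, List[String]]] =
    withoutFood.map { case (food, maps) =>
      food -> maps.flatten.groupMapReduce(_._1)(kv => List(kv._2))(_ ++ _)
    }
}
```

The inner values come out as lists such as List(Bob, Tim) rather than the tuples written in the question, since tuples have a fixed size.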
Edit: Here is a single-pass solution using foldLeft, but I don't think I like it any better. All the key accesses are unsafe and will throw if an entry is missing the key. Ideally you would use .get to get back an Option and do a bunch of pattern matching.
data.foldLeft(Map[String, Map[String, List[String]]]())((b, a) => {
  val foodVal = a("food")
  b.get(foodVal) match {
    case None =>
      b + (foodVal ->
        List("name" -> List(a("name")), "day" -> List(a("day"))).toMap)
    // no type annotation on v: the type is already known, and a type pattern would be unchecked due to erasure
    case Some(v) =>
      b + (foodVal ->
        List("name" -> (v("name") :+ a("name")), "day" -> (v("day") :+ a("day"))).toMap)
  }
})
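A safer variant of the fold might look like the following sketch. It reads the keys with .get inside a for comprehension over Option and simply skips entries that lack any of the three keys; skipping is an assumption made here, and other policies (such as reporting an error) are equally possible. The names `SafeFold`, `Acc` and `groupSafely` are hypothetical.

```scala
object SafeFold {
  type Acc = Map[String, Map[String, List[String]]]

  def groupSafely(data: List[Map[String, String]]): Acc = {
    val empty: Acc = Map.empty
    data.foldLeft(empty) { (acc, entry) =>
      // All three lookups must succeed, otherwise the result is None
      val fields = for {
        food <- entry.get("food")
        name <- entry.get("name")
        day  <- entry.get("day")
      } yield (food, name, day)

      fields match {
        case Some((food, name, day)) =>
          val prev = acc.getOrElse(food, Map.empty[String, List[String]])
          acc + (food -> Map(
            "name" -> (prev.getOrElse("name", Nil) :+ name),
            "day"  -> (prev.getOrElse("day", Nil) :+ day)
          ))
        case None =>
          acc // assumption: incomplete entries are ignored
      }
    }
  }
}
```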

Scala group By First Occurrence

I have the following data:
List(Map("weight"->"1000","type"->"abc","match"->"first"),Map("weight"->"1000","type"->"abc","match"->"third"),Map("weight"->"182","type"->"bcd","match"->"second"),Map("weight"->"150","type"->"bcd","match"->"fourth"),Map("weight"->"40","type"->"aaa","match"->"fifth"))
After grouping by "type", I want the results in order of first occurrence: first "abc", then "bcd", then "aaa".
When I apply groupBy on "type", the resulting map has "aaa" as its first key, whereas I want the first key to be "abc", with all its corresponding values.
How can I achieve this?
As mentioned in the comments you need a sorted map, which is not created by a simple group by. What you could do is:
import scala.collection.immutable.SortedMap
val a = List(Map("weight"->"1000","type"->"abc","match"->"first"), Map("weight"->"1000","type"->"abc","match"->"third"), Map("weight"->"182","type"->"bcd","match"->"second"), Map("weight"->"150","type"->"bcd","match"->"fourth"), Map("weight"->"40","type"->"aaa","match"->"fifth"))
// the values are the grouped lists, so the value type is List[Map[String, String]], not String
val sortedGrouping = SortedMap.empty[String, List[Map[String, String]]] ++ a.groupBy { x => x("type") }
println(sortedGrouping)
What you get (printed) is:
Map(aaa -> List(Map(weight -> 40, type -> aaa, match -> fifth)), abc -> List(Map(weight -> 1000, type -> abc, match -> first), Map(weight -> 1000, type -> abc, match -> third)), bcd -> List(Map(weight -> 182, type -> bcd, match -> second), Map(weight -> 150, type -> bcd, match -> fourth)))
I don't think there is an out-of-the-box solution for what you are trying to achieve. A Map is used in two ways: to get a value by a key, and to iterate over it. The first is unaffected by sort order, so groupBy works perfectly well. The second can be achieved by creating a list of keys in the required order and then using that list to get the key-value pairs from the Map in that order. For example:
val xs = List(Map("weight"->"1000","type"->"abc","match"->"first"),
Map("weight"->"1000","type"->"abc","match"->"third"),
Map("weight"->"182","type"->"bcd","match"->"second"),
Map("weight"->"150","type"->"bcd","match"->"fourth"),
Map("weight"->"40","type"->"aaa","match"->"fifth"))
// List of types in the order of first appearance in your list
val sortedKeys = xs.map(_("type")).distinct
//> sortedKeys: List[String] = List(abc, bcd, aaa)
Now when you need to iterate you simply do:
val grouped = xs.groupBy(_("type"))
sortedKeys.map(k => (k, grouped(k)))
// List[(String, List[scala.collection.immutable.Map[String,String]])] =
// List(
// (abc,List(Map(weight -> 1000, type -> abc, match -> first),
// Map(weight -> 1000, type -> abc, match -> third))),
// (bcd,List(Map(weight -> 182, type -> bcd, match -> second),
// Map(weight -> 150, type -> bcd, match -> fourth))),
// (aaa,List(Map(weight -> 40, type -> aaa, match -> fifth)))
// )
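If a Map-like result is preferred over a list of pairs, the same idea can be wrapped in a small helper that returns a ListMap, which iterates in insertion order. This is a sketch under assumptions: `groupByOrdered` is a hypothetical name, not a standard library method, and ListMap.from assumes Scala 2.13 or later.

```scala
import scala.collection.immutable.ListMap

object OrderedGrouping {
  // Hypothetical helper: groups like groupBy, but keeps the keys
  // in the order in which they first appear in the input list
  def groupByOrdered[A, K](items: List[A])(key: A => K): ListMap[K, List[A]] = {
    val grouped = items.groupBy(key)
    val keysInOrder = items.map(key).distinct
    ListMap.from(keysInOrder.map(k => k -> grouped(k)))
  }

  val xs = List(
    Map("weight" -> "1000", "type" -> "abc", "match" -> "first"),
    Map("weight" -> "1000", "type" -> "abc", "match" -> "third"),
    Map("weight" -> "182", "type" -> "bcd", "match" -> "second"),
    Map("weight" -> "150", "type" -> "bcd", "match" -> "fourth"),
    Map("weight" -> "40", "type" -> "aaa", "match" -> "fifth")
  )

  val grouped: ListMap[String, List[Map[String, String]]] =
    groupByOrdered(xs)(_("type"))
}
```

ListMap lookups are linear in the number of keys, so this fits a small number of groups best.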

Scala Map object to be split on the basis of a value property

I have a map in my scala code, that has a string as a key and a userdefined object as the value. I want to split this map to three different map objects based on a property of the value.
Is this possible? Can someone share a way to do this? I have been trying to search but no example could be found. I am a novice at scala and appreciate any help...
Let's say you have a map of persons and you want to divide it into three maps based on each person's age.
case class Person(name: String, age: Int)
val map = Map(
"p1" -> Person("person_1", 15),
"p2" -> Person("person_2", 30),
"p3" -> Person("person_3", 40),
"p4" -> Person("person_4", 55),
"p5" -> Person("person_5", 65)
)
// map: scala.collection.immutable.Map[String,Person] = Map(p4 -> Person(person_4,55), p5 -> Person(person_5,65), p3 -> Person(person_3,40), p2 -> Person(person_2,30), p1 -> Person(person_1,15))
val dividedMaps = map.groupBy({ case (key, person) =>
if (person.age < 20 ) "teenager"
else if (person.age < 50) "adult"
else "old"
})
// dividedMaps: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Person]] = Map(old -> Map(p4 -> Person(person_4,55), p5 -> Person(person_5,65)), teenager -> Map(p1 -> Person(person_1,15)), adult -> Map(p3 -> Person(person_3,40), p2 -> Person(person_2,30)))
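To end up with three separate map objects, the grouped result can be unpacked by label. The sketch below assumes the same labels as above; `ageGroup`, `teenagers`, `adults` and `old` are names chosen for the illustration. getOrElse is used because groupBy creates no entry for a label that no person falls under.

```scala
object SplitByAge {
  case class Person(name: String, age: Int)

  val people = Map(
    "p1" -> Person("person_1", 15),
    "p2" -> Person("person_2", 30),
    "p3" -> Person("person_3", 40),
    "p4" -> Person("person_4", 55),
    "p5" -> Person("person_5", 65)
  )

  // Label each person by age range (same thresholds as in the answer)
  def ageGroup(person: Person): String =
    if (person.age < 20) "teenager"
    else if (person.age < 50) "adult"
    else "old"

  val divided: Map[String, Map[String, Person]] =
    people.groupBy { case (_, person) => ageGroup(person) }

  // A label without members has no key in `divided`, so fall back to an empty map
  val teenagers: Map[String, Person] = divided.getOrElse("teenager", Map.empty[String, Person])
  val adults: Map[String, Person]    = divided.getOrElse("adult", Map.empty[String, Person])
  val old: Map[String, Person]       = divided.getOrElse("old", Map.empty[String, Person])
}
```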

Better way for aggregation on list of case classes

I have a list of case classes. The output requires aggregation on different parameters of the case class. I am looking for a more optimized way to do it.
Example:
case class Students(city: String, college: String, group: String,
name: String, fee: Int, age: Int)
object GroupByStudents {
val studentsList= List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
}
Now, to get the details of students from a city, I need to first aggregate by city, then break those details down by college, then by group.
Output is list of case class in below format.
Students(Mumbai,,,,790,0) -- aggregate city wise
Students(Mumbai,College1,,,550,0) -- aggregate college wise
Students(Mumbai,College1,Social,,250,0)
Students(Mumbai,College1,Science,,300,0)
Students(Mumbai,College2,,,240,0)
Students(Mumbai,College2,Science,,240,0)
Students(Bangalore,,,,270,0)
Students(Bangalore,College3,,,270,0)
Students(Bangalore,College3,Science,,270,0)
Two methods to achieve this:
1) Loop over the whole list, create a map for each combination (3 combinations in the case above), aggregate the data, and append it to a new result list.
2) Use foldLeft:
studentsList.groupBy(d=>(d.city))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,"","","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,"","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college,d.group))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,c.group,"",r.fee+c.fee,0)))
In both cases, the list is traversed more than once. Is there a way to achieve this in a single pass, in an optimized way?
With GroupBy
The code looks a little nicer, but I don't think it is faster. With groupBy you always have two "loops": one to build the groups and one to aggregate each group.
studentsList.groupBy(d=>(d.city)).map { case (k,v) =>
Students(v.head.city,"","","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college)).map { case (k,v) =>
Students(v.head.city,v.head.college,"","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college,d.group)).map { case (k,v) =>
Students(v.head.city,v.head.college,v.head.group,"",v.map(_.fee).sum, 0)
}
You then get something like this:
List(Students(Bangalore,,,,270,0),
Students(Mumbai,,,,790,0))
List(Students(Mumbai,College2,,,240,0),
Students(Bangalore,College3,,,270,0),
Students(Mumbai,College1,,,550,0))
List(Students(Bangalore,College3,Science,,270,0),
Students(Mumbai,College2,Science,,240,0),
Students(Mumbai,College1,Social,,250,0),
Students(Mumbai,College1,Science,,300,0))
It is not exactly the same output as in your example, but it has the desired shape: a list of Students case class instances.
With a for comprehension
You can avoid the second loop by doing the grouping yourself. Only the city example is shown here; the others are analogous.
var m = Map[String, Students]()
for (v <- studentsList) {
  // getOrElse already returns a Students, so no cast is needed
  m += v.city -> Students(v.city,"","","",v.fee + m.getOrElse(v.city, Students("","","","",0,0)).fee, 0)
}
m
Output
The values match the requested output, and the list is traversed only once to build the resulting Map[String,Students].
Map(Mumbai -> Students(Mumbai,,,,790,0), Bangalore -> Students(Bangalore,,,,270,0))
With Foldleft
Each foldLeft goes over the complete list exactly once.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[String, Students]()) { case (m, v) =>
m + (v.city -> Students(v.city,"","","",
v.fee + m.getOrElse(v.city, emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college) -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college), emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
Output
The values match the requested output, and the list is traversed only once for each resulting Map.
Map(Mumbai -> Students(Mumbai,,,,790,0),
Bangalore -> Students(Bangalore,,,,270,0))
Map((Mumbai,College1) -> Students(Mumbai,College1,,,550,0),
(Mumbai,College2) -> Students(Mumbai,College2,,,240,0),
(Bangalore,College3) -> Students(Bangalore,College3,,,270,0))
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
With FoldLeft One Loop
You can also generate one big Map that holds all three aggregation levels.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
{
var t = m + ((v.city,"","") -> Students(v.city,"","","",
v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
t = t + ((v.city,v.college,"") -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
t + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
}
Output
In this case you loop once and get back the results for all aggregation levels, but in a single Map. This works with a for loop, too.
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0),
(Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
Every + on an immutable Map creates a new Map (sharing most of its structure with the previous one), so there is some allocation overhead per element. The same aggregation can also be written as a for loop:
For Comprehension One Loop
This generates one Map with the 3 aggregate types.
val emptyStudent = Students("","","","",0,0);
var m = Map[(String,String,String), Students]()
for (v <- studentsList) {
m += ((v.city,"","") -> Students(v.city,"","","", v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
m += ((v.city,v.college,"") -> Students(v.city,v.college,"","", v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
m += ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"", v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
m
Note that m is a var holding an immutable Map, so m += ... is shorthand for m = m + ... and allocates in the same way as the foldLeft example. To really avoid the intermediate maps you would need a mutable Map.
Output
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0),
(Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
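If the intermediate immutable maps are a concern, one option is to accumulate into a scala.collection.mutable.Map, which is updated in place, and convert it at the end. The following is a sketch, not a benchmarked recommendation: it keeps only the fee total per key instead of a full Students value, and the names `MutableAggregation` and `aggregate` are hypothetical.

```scala
import scala.collection.mutable

object MutableAggregation {
  case class Students(city: String, college: String, group: String,
                      name: String, fee: Int, age: Int)

  val studentsList = List(
    Students("Mumbai", "College1", "Science", "Jony", 100, 30),
    Students("Mumbai", "College1", "Science", "Tony", 200, 25),
    Students("Mumbai", "College1", "Social", "Bony", 250, 30),
    Students("Mumbai", "College2", "Science", "Gony", 240, 28),
    Students("Bangalore", "College3", "Science", "Hony", 270, 28)
  )

  // One pass over the list; each student contributes to three keys:
  // (city, "", ""), (city, college, "") and (city, college, group)
  def aggregate(students: List[Students]): Map[(String, String, String), Int] = {
    val totals = mutable.Map.empty[(String, String, String), Int]
    for {
      s <- students
      key <- List((s.city, "", ""), (s.city, s.college, ""), (s.city, s.college, s.group))
    } totals.update(key, totals.getOrElse(key, 0) + s.fee)
    totals.toMap // immutable snapshot for callers
  }

  val feeTotals: Map[(String, String, String), Int] = aggregate(studentsList)
}
```

The mutation stays local to `aggregate`, so callers still only see an immutable Map.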
In all cases you can shorten the code by giving the parameters of your Students case class default values, because then you can write something like Students(city=v.city,fee=v.fee+m.getOrElse(v.city,emptyStudent).fee) during grouping.
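A minimal sketch of that idea, assuming every field of Students gets a default value (the defaults chosen here, empty strings and 0, mirror the emptyStudent used above):

```scala
object DefaultParams {
  // Assumption: all fields default to "empty" values
  case class Students(city: String = "", college: String = "", group: String = "",
                      name: String = "", fee: Int = 0, age: Int = 0)

  val emptyStudent = Students()

  val studentsList = List(
    Students("Mumbai", "College1", "Science", "Jony", 100, 30),
    Students("Mumbai", "College1", "Science", "Tony", 200, 25),
    Students("Mumbai", "College1", "Social", "Bony", 250, 30),
    Students("Mumbai", "College2", "Science", "Gony", 240, 28),
    Students("Bangalore", "College3", "Science", "Hony", 270, 28)
  )

  // Only the fields that matter are named; the rest fall back to the defaults
  val byCity: Map[String, Students] =
    studentsList.foldLeft(Map.empty[String, Students]) { (m, v) =>
      m + (v.city -> Students(city = v.city,
                              fee = v.fee + m.getOrElse(v.city, emptyStudent).fee))
    }
}
```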
Use a foldLeft
First, let's define some type aliases to make the syntax easier
object GroupByStudents {
type City = String
type College = String
type Group = String
type Name = String
type Aggregate = Map[City, Map[College, Map[Group, List[Students]]]]
def emptyAggregate: Aggregate = Map.empty
case class Students(city: City, college: College, group: Group,
name: Name, fee: Int, age: Int)
}
You can aggregate the students list into an Aggregate map in a single foldLeft
object Test {
  import GroupByStudents._
  // explicit ": Unit =" instead of the deprecated procedure syntax
  def main(args: Array[String]): Unit = {
    val studentsList = List(
      Students("Mumbai","College1","Science","Jony",100,30),
      Students("Mumbai","College1","Science","Tony", 200, 25),
      Students("Mumbai","College1","Social","Bony",250,30),
      Students("Mumbai","College2","Science","Gony", 240, 28),
      Students("Bangalore","College3","Science","Hony", 270, 28))
    val aggregated = studentsList.foldLeft(emptyAggregate) { (agg, students) =>
      // explicit type parameters keep the inferred types of the bins precise
      val cityBin = agg.getOrElse(students.city, Map.empty[College, Map[Group, List[Students]]])
      val collegeBin = cityBin.getOrElse(students.college, Map.empty[Group, List[Students]])
      val groupBin = collegeBin.getOrElse(students.group, List.empty[Students])
      val nextGroupBin = students :: groupBin
      val nextCollegeBin = collegeBin + (students.group -> nextGroupBin)
      val nextCityBin = cityBin + (students.college -> nextCollegeBin)
      agg + (students.city -> nextCityBin)
    }
  }
}
aggregated can then be mapped over to calculate fees.
If you really want, you can calculate the fees in the foldLeft itself, but this would make the code harder to read.
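Mapping over the nested structure to get the fees could look like the sketch below. To keep the example self-contained, the nested map is built here with nested groupBy calls as a stand-in for the foldLeft above (an assumption; both yield the same shape), and the names `groupFees`, `collegeFees` and `cityFees` are hypothetical. groupMapReduce assumes Scala 2.13 or later.

```scala
object FeesFromAggregate {
  case class Students(city: String, college: String, group: String,
                      name: String, fee: Int, age: Int)
  type Aggregate = Map[String, Map[String, Map[String, List[Students]]]]

  val studentsList = List(
    Students("Mumbai", "College1", "Science", "Jony", 100, 30),
    Students("Mumbai", "College1", "Science", "Tony", 200, 25),
    Students("Mumbai", "College1", "Social", "Bony", 250, 30),
    Students("Mumbai", "College2", "Science", "Gony", 240, 28),
    Students("Bangalore", "College3", "Science", "Hony", 270, 28)
  )

  // Stand-in for the foldLeft result: city -> college -> group -> students
  val aggregated: Aggregate =
    studentsList.groupBy(_.city).map { case (city, inCity) =>
      city -> inCity.groupBy(_.college).map { case (college, inCollege) =>
        college -> inCollege.groupBy(_.group)
      }
    }

  // Innermost level: sum the fees of the students in each group
  val groupFees: Map[(String, String, String), Int] =
    for {
      (city, colleges) <- aggregated
      (college, groups) <- colleges
      (group, members) <- groups
    } yield (city, college, group) -> members.map(_.fee).sum

  // Roll the group totals up to college level, then to city level
  val collegeFees: Map[(String, String), Int] =
    groupFees.groupMapReduce(kv => (kv._1._1, kv._1._2))(_._2)(_ + _)

  val cityFees: Map[String, Int] =
    collegeFees.groupMapReduce(_._1._1)(_._2)(_ + _)
}
```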
Note that you can also try monocle's lenses to put the students value in the aggregated structure.

How to sort a list of maps in scala?

I query data from multiple tables, and each has a customized key. I put the data from these tables into a list of maps and want to sort it by the id value.
What I end up with is:
var g = groups.map(i => Map("id" -> i._1, "job" -> i._2))
var p = people.map(i => Map("id" -> i._1, "job" -> i._2))
var party = g ++ p
Which gives me:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1)
)
But I want to sort by id, so that I can populate a tree structure:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2)
)
How do I do this?
A minor refactoring of the associations in each Map by using case classes may simplify the subsequent coding; consider
case class Item(id: String, job: String)
and so by using (immutable) values,
val g = groups.map(i => Item(i._1, i._2))
val p = people.map(i => Item(i._1, i._2))
Then
(g ++ p).sortBy(_.id)
brings a list of items sorted by id.
If you wish to group jobs by id, consider
(g ++ p).groupBy(_.id)
which delivers a Map from ids onto lists of items with common id. From this Map you can use mapValues to extract the actual jobs.
As hinted above, party.sortBy(_("id")) should do it.
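Put together, the case class approach might look like this sketch. The contents of `groups` and `people` are assumed sample data in the shape implied by the question (pairs of id and job). Note that String ordering is lexicographic, so this relies on ids such as "1>1" sorting right after "1"; with multi-digit segments ("10" would sort before "2") a custom key would be needed.

```scala
object SortParty {
  case class Item(id: String, job: String)

  // Assumed sample data: (id, job) pairs as they might come from the two tables
  val groups = List(("1", "Group1"), ("2", "Group2"))
  val people = List(("2>1", "Person1Group2"), ("1>1", "Person1Group1"), ("1>2", "Person2Group1"))

  val g = groups.map { case (id, job) => Item(id, job) }
  val p = people.map { case (id, job) => Item(id, job) }

  // Lexicographic sort on the id string puts each group before its members
  val party: List[Item] = (g ++ p).sortBy(_.id)

  // Jobs collected per id (Scala 2.13+), as an alternative to groupBy plus mapValues
  val jobsById: Map[String, List[String]] = (g ++ p).groupMap(_.id)(_.job)
}
```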