I have a map like :
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl", ("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
with the same first element of key appearing multiple times.
How to regroup all the values corresponding to the same first element of key ? Which would turn this map into :
Map("functional" -> List("scala","perl"), "orientedObject" -> List("java","C++"))
UPDATE: This answer is based upon your original question. If you need the more complex Map definition, using a tuple as the key, then the other answers will address your requirements. You may still find this approach simpler.
As has been pointed out, you can't actually have multiple keys with the same value in a map. In the REPL, you'll note that your declaration becomes:
scala> val programming = Map("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: scala.collection.immutable.Map[String,String] = Map(functional -> perl, orientedObject -> C++)
So you end up missing some values. If you make this a List instead, you can get what you want as follows:
scala> val programming = List("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: List[(String, String)] = List((functional,scala), (functional,perl), (orientedObject,java), (orientedObject,C++))
scala> programming.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
res0: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Based on your edit, you have a data structure that looks something like this
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl",
("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
and you want to scrap the numerical indices and group by the string key. Fortunately, Scala provides a built-in that gets you close.
programming groupBy { case ((k, _), _) => k }
This will return a new map which contains submaps of the original, grouped by the key that we return from the "partial" function. But we want a map of lists, so let's ignore the keys in the submaps.
programming groupBy { case ((k, _), _) => k } mapValues { _.values }
This gets us a map of... some kind of Iterable. But we really want lists, so let's take the final step and convert to a list.
programming groupBy { case ((k, _), _) => k } mapValues { _.values.toList }
You should try the .groupBy method
programming.groupBy(_._1._1)
and you will get
scala> programming.groupBy(_._1._1)
res1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, Int),String]] = Map(functional -> Map((functional,1) -> scala, (functional,2) -> perl), orientedObject -> Map((orientedObject,1) -> java, (orientedObject,2) -> C++))
you can now "clean" by doing something like:
scala> res1.mapValues(m => m.values.toList)
res3: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Read the csv file and create a map that contains key and list of values.
val fileStream = getClass.getResourceAsStream("/keyvaluepair.csv")
val lines = Source.fromInputStream(fileStream).getLines
var mp = Seq[List[(String, String)]]();
var codeMap=List[(String, String)]();
var res = Map[String,List[String]]();
for(line <- lines )
{
val cols=line.split(",").map(_.trim())
codeMap ++= Map(cols(0)->cols(1))
}
res = codeMap.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
Since no one has put in the specific ordering he asked for:
programming.groupBy(_._1._1)
.mapValues(_.toSeq.map { case ((t, i), l) => (i, l) }.sortBy(_._1).map(_._2))
I have the following data:
List(Map("weight"->"1000","type"->"abc","match"->"first"),Map("weight"->"1000","type"->"abc","match"->"third"),Map("weight"->"182","type"->"bcd","match"->"second"),Map("weight"->"150","type"->"bcd","match"->"fourth"),Map("weight"->"40","type"->"aaa","match"->"fifth"))
After grouping by "type" i would want results of first "abc" then "bcd" then "aaa"
When I apply group by "type" the resulting map gives first key as aaa whereas I want the first key to be "abc" and all corresponding values.
how can I achieve this?
As mentioned in the comments you need a sorted map, which is not created by a simple group by. What you could do is:
val a = List(Map("weight"->"1000", "type"->"abc","match"->"first"), Map("weight"->"1000","type"->"abc","match"->"third"),Map("weight"->"182","type"->"bcd","match"->"second"), Map("weight"->"150","type"->"bcd","match"->"fourth"), Map("weight"->"40","type"->"aaa","match"->"fifth"))
val sortedGrouping = SortedMap.empty[String,String] ++ a.groupBy { x => x("type") }
println(sortedGrouping)
What you get (printed) is:
Map(aaa -> List(Map(weight -> 40, type -> aaa, match -> fifth)), abc -> List(Map(weight -> 1000, type -> abc, match -> first), Map(weight -> 1000, type -> abc, match -> third)), bcd -> List(Map(weight -> 150, type -> bcd, match -> fourth)), bcd -> List(Map(weight -> 182, type -> bcd, match -> second)))
I don't think there is out of the box solution for what you are trying to achieve. Map is used in two ways: get a value by a key, and iterate over it. The first part is unaffected by a sort order, so groupBy works perfectly well. The second part can be achieved by creating a list of keys in required order and then using that list to get key-value pairs from Map in a specific order. For example:
val xs = List(Map("weight"->"1000","type"->"abc","match"->"first"),
Map("weight"->"1000","type"->"abc","match"->"third"),
Map("weight"->"182","type"->"bcd","match"->"second"),
Map("weight"->"150","type"->"bcd","match"->"fourth"),
Map("weight"->"40","type"->"aaa","match"->"fifth"))
import collection.mutable.{ListBuffer, Set}
// List of types in the order of appearance in your list
val sortedKeys = xs.map(_("type")).distinct
//> sortedKeys: List[String] = List(abc, bcd, aaa)
Now when you need to iterate you simply do:
val grouped = xs.groupBy(_("type"))
sortedKeys.map(k => (k, grouped(k)))
// List[(String, List[scala.collection.immutable.Map[String,String]])] =
// List(
// (abc,List(Map(weight -> 1000, type -> abc, match -> first),
// Map(weight -> 1000, type -> abc, match -> third))),
// (bcd,List(Map(weight -> 182, type -> bcd, match -> second),
// Map(weight -> 150, type -> bcd, match -> fourth))),
// (aaa,List(Map(weight -> 40, type -> aaa, match -> fifth)))
// )
I have list of case classes. Output requires aggregation on different parameters of case class. Looking for more optimized way to do it.
Example:
case class Students(city: String, college: String, group: String,
name: String, fee: Int, age: Int)
object GroupByStudents {
val studentsList= List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
}
Now to get details of students from a City, i need to first aggregate by City, then break-up those details college wise, then group wise.
Output is list of case class in below format.
Students(Mumbai,,,,790,0) -- aggregate city wise
Students(Mumbai,College1,,,550,0) -- aggregate college wise
Students(Mumbai,College1,Social,,250,0)
Students(Mumbai,College1,Science,,300,0)
Students(Mumbai,College2,,,240,0)
Students(Mumbai,College2,Science,,240,0)
Students(Bangalore,,,,270,0)
Students(Bangalore,College3,,,270,0)
Students(Bangalore,College3,Science,,270,0)
Two methods to achieve this:
1) Loop all list, create a map for each combination (above case 3 combinations
), aggregate data and create new result list and append data to it.
2) Using foldLeft option
studentsList.groupBy(d=>(d.city))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,"","","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,"","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college,d.group))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,c.group,"",r.fee+c.fee,0)))
In both cases, looping on list more than once. Is there any way to achieve this with single pass and optimized way.
With GroupBy
Code looks a little bit nicer, but I think it isn't faster. With groupby you have always 2 "loops"
studentsList.groupBy(d=>(d.city)).map { case (k,v) =>
Students(v.head.city,"","","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college)).map { case (k,v) =>
Students(v.head.city,v.head.college,"","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college,d.group)).map { case (k,v) =>
Students(v.head.city,v.head.college,v.head.group,"",v.map(_.fee).sum, 0)
}
You get then Something like this
List(Students(Bangalore,College3,Science,Hony,270,0),
Students(Mumbai,College1,Science,Jony,790,0))
List(Students(Mumbai,College2,,,240,0),
Students(Bangalore,College3,,,270,0),
Students(Mumbai,College1,,,550,0))
List(Students(Bangalore,College3,Science,,270,0),
Students(Mumbai,College2,Science,,240,0),
Students(Mumbai,College1,Social,,250,0),
Students(Mumbai,College1,Science,,300,0))
It is not exactly the same output like in your example, but it is the desired output: a list of case class students.
With a for comprehension
You could avoid this looping if your grouping by yourself. Only have the city example the other are straight forward.
var m = Map[String, Students]()
for (v <- studentsList) {
m += v.city -> Students(v.city,"","","",v.fee + m.getOrElse(v.city, Students("","","","",0,0)).asInstanceOf[Students].fee, 0)
}
m
Output
It's the same Output like your studenList but I only loop one time, for every Map[String,Students] output.
Map(Mumbai -> Students(Mumbai,,,,790,0), Bangalore -> Students(Bangalore,,,,270,0))
With Foldleft
Just going in one loop over the complete list.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[String, Students]()) { case (m, v) =>
m + (v.city -> Students(v.city,"","","",
v.fee + m.getOrElse(v.city, emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college) -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college), emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
Output
It's the same Output like your studenList but I only loop one time, for every Map[String,Students] output.
Map(Mumbai -> Students(Mumbai,,,,790,0),
Bangalore -> Students(Bangalore,,,,270,0))
Map((Mumbai,College1) -> Students(Mumbai,College1,,,550,0),
(Mumbai,College2) -> Students(Mumbai,College2,,,240,0),
(Bangalore,College3) -> Students(Bangalore,College3,,,270,0))
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
With FoldLeft One Loop
You can just generate one Big Map with all the List.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
{
var t = m + ((v.city,"","") -> Students(v.city,"","","",
v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
t = t + ((v.city,v.college,"") -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
t + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
}
Output
In this case you loop one time and get back the results for all aggregating, but only in oneMap. This would work with for comprehension, too.
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0),
(Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
The Map is always copied, so it could have some performance and memory issues. To solve this use a for comprehension
For Comprehension One Loop
This generates one Map with the 3 aggregate types.
val emptyStudent = Students("","","","",0,0);
var m = Map[(String,String,String), Students]()
for (v <- studentsList) {
m += ((v.city,"","") -> Students(v.city,"","","", v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
m += ((v.city,v.college,"") -> Students(v.city,v.college,"","", v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
m += ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"", v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
m
This should be better in terms of memory consumption cause you aren't copy the maps like in the foldLeft example
Output
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0), (Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
In all cases you could just reduce the code if you make the parameter optional in your case class students, cause then you can just do something like Students(city=v.city,fee=v.fee+m.getOrElse(v.city,emptyStudent).fee during grouping
Use a foldLeft
First, let's define some type aliases to make the syntax easier
object GroupByStudents {
type City = String
type College = String
type Group = String
type Name = String
type Aggregate = Map[City, Map[College, Map[Group, List[Students]]]]
def emptyAggregate: Aggregate = Map.empty
case class Students(city: City, college: College, group: Group,
name: Name, fee: Int, age: Int)
}
You can aggregate the students list into an Aggregate map in a single foldLeft
object Test {
import GroupByStudents._
def main(args: Array[String]) {
val studentsList = List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
val aggregated = studentsList.foldLeft(emptyAggregate){(agg, students) =>
val cityBin = agg.getOrElse(students.city, Map.empty)
val collegeBin = cityBin.getOrElse(students.college, Map.empty)
val groupBin = collegeBin.getOrElse(students.group, List.empty)
val nextGroupBin = students :: groupBin
val nextCollegeBin= collegeBin + (students.group -> nextGroupBin)
val nextCityBin = cityBin + (students.college -> nextCollegeBin)
agg + (students.city -> nextCityBin)
}
}
}
aggregated can then be mapped over to calculate fees.
If you really want, you can calculate the fees in the foldLeft itself, but this would make the code harder to read.
Note that you can also try monocle's lenses to put the students value in the aggregated structure.