Scala map co-occurrence counts

I have a constantly updating mutable.HashMap[String, String] with a record of current user locations:
{user1 -> location1,
user2 -> location4,
user3 -> location4}
I want to keep track of the location co-occurrences between users, that is, how many times each pair of users has been in the same location. The format I have in mind is a mutable.HashMap[(String, String), Int]:
{(user1, user2) -> 0,
(user1, user3) -> 0,
(user2, user3) -> 1}
Each time the user location map updates, I want to re-examine which users are together and add 1 to their running count of co-occurrences.
The code below returns a map of {location -> Array(users)}, which feels like a good first step.
var users_by_location = user_locations.groupBy(_._2).mapValues{s => s.map{ case(user, location) => user}}
> {location1 -> Array(user1), location4 -> Array(user2, user3)}
I'm using Scala 2.11.8.

Use subsets(2) to get all pairs of keys, then compare the two users' locations to build a new map, like:
val m = Map("user1" -> "location1", "user2" -> "location2", "user3" -> "location2")
val result = m.keySet.subsets(2)
  .map(_.toList)
  .map(i => (i.head, i(1)))
  .map(i => if (m.get(i._1) == m.get(i._2)) (i, 1) else (i, 0))
  .toMap
println(result)
> Map((user1,user2) -> 0, (user1,user3) -> 0, (user2,user3) -> 1)
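The snapshot above only yields 0 or 1 for the current state. To keep the running totals the question asks for, one option is to regroup by location on every update and increment a count for each pair found together. A minimal sketch, assuming you call the update hook yourself (recordSnapshot is a hypothetical name) and that each pair is stored in alphabetical order so it has a single key:

```scala
import scala.collection.mutable

// Running co-occurrence counts. Each pair is stored in alphabetical order,
// so (user1, user2) and (user2, user1) always map to the same key.
val coOccurrences = mutable.HashMap.empty[(String, String), Int]

// Hypothetical hook: call this once after every update of the user -> location map.
def recordSnapshot(locations: collection.Map[String, String]): Unit = {
  // For each location, the sorted list of users currently there
  val usersByLocation = locations.groupBy(_._2).values.map(_.keys.toList.sorted)
  for {
    users <- usersByLocation
    pair  <- users.combinations(2) // every pair of users at the same location
  } {
    val key = (pair(0), pair(1))
    coOccurrences(key) = coOccurrences.getOrElse(key, 0) + 1
  }
}

val userLocations = mutable.HashMap(
  "user1" -> "location1", "user2" -> "location4", "user3" -> "location4")
recordSnapshot(userLocations) // user2 and user3 are together

userLocations("user1") = "location4"
recordSnapshot(userLocations) // now all three users are together
```

Pairs that have never shared a location simply have no entry; read them with coOccurrences.getOrElse(pair, 0) if you want the zeros shown in the question.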

Related

How to combine all the values with the same key in Scala?

I have a map like:
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl", ("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
with the same first element of the key appearing multiple times.
How can I regroup all the values that share the same first element of the key, turning this map into:
Map("functional" -> List("scala","perl"), "orientedObject" -> List("java","C++"))
UPDATE: This answer is based upon your original question. If you need the more complex Map definition, using a tuple as the key, then the other answers will address your requirements. You may still find this approach simpler.
As has been pointed out, you can't actually have multiple entries with the same key in a map. In the REPL, you'll note that your declaration becomes:
scala> val programming = Map("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: scala.collection.immutable.Map[String,String] = Map(functional -> perl, orientedObject -> C++)
So you end up missing some values. If you make this a List instead, you can get what you want as follows:
scala> val programming = List("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: List[(String, String)] = List((functional,scala), (functional,perl), (orientedObject,java), (orientedObject,C++))
scala> programming.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
res0: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Based on your edit, you have a data structure that looks something like this:
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl",
("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
and you want to scrap the numerical indices and group by the string key. Fortunately, Scala provides a built-in that gets you close.
programming groupBy { case ((k, _), _) => k }
This will return a new map which contains submaps of the original, grouped by the key that we return from the "partial" function. But we want a map of lists, so let's ignore the keys in the submaps.
programming groupBy { case ((k, _), _) => k } mapValues { _.values }
This gets us a map of... some kind of Iterable. But we really want lists, so let's take the final step and convert to a list.
programming groupBy { case ((k, _), _) => k } mapValues { _.values.toList }
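In Scala 2.11 and 2.12, mapValues returns a lazy view that re-applies the function every time a value is read, and Scala 2.13 deprecates it in favour of .view.mapValues. A sketch of a strict alternative using map, which also sorts each list by the numeric part of the key so the result does not depend on map iteration order:

```scala
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl",
  ("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")

// Group by the string half of the key, then build each value list eagerly.
// Sorting by the numeric half fixes the order within each list.
val grouped: Map[String, List[String]] =
  programming
    .groupBy { case ((k, _), _) => k }
    .map { case (k, sub) => k -> sub.toList.sortBy(_._1._2).map(_._2) }
```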
You can use the .groupBy method:
programming.groupBy(_._1._1)
and you will get:
scala> programming.groupBy(_._1._1)
res1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, Int),String]] = Map(functional -> Map((functional,1) -> scala, (functional,2) -> perl), orientedObject -> Map((orientedObject,1) -> java, (orientedObject,2) -> C++))
You can now "clean" it up with something like:
scala> res1.mapValues(m => m.values.toList)
res3: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
If the data comes from a CSV file, read it and build a map from each key to its list of values:
import scala.io.Source

val fileStream = getClass.getResourceAsStream("/keyvaluepair.csv")
val source = Source.fromInputStream(fileStream)
// Each line is expected to look like "key,value"
val codeMap: List[(String, String)] =
  try {
    source.getLines().map { line =>
      val cols = line.split(",").map(_.trim)
      cols(0) -> cols(1)
    }.toList
  } finally source.close()
// Group the pairs by key and keep only the values
val res: Map[String, List[String]] =
  codeMap.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
Since no one has addressed the specific ordering that was asked for:
programming.groupBy(_._1._1)
.mapValues(_.toSeq.map { case ((t, i), l) => (i, l) }.sortBy(_._1).map(_._2))

Scala group by first occurrence

I have the following data:
List(Map("weight"->"1000","type"->"abc","match"->"first"), Map("weight"->"1000","type"->"abc","match"->"third"), Map("weight"->"182","type"->"bcd","match"->"second"), Map("weight"->"150","type"->"bcd","match"->"fourth"), Map("weight"->"40","type"->"aaa","match"->"fifth"))
After grouping by "type", I want the results in order of first appearance: first "abc", then "bcd", then "aaa".
When I group by "type", the resulting map has "aaa" as its first key, whereas I want the first key to be "abc", together with all its corresponding values.
How can I achieve this?
As mentioned in the comments, you need a sorted map, which a simple groupBy does not produce. What you could do is:
import scala.collection.immutable.SortedMap
val a = List(Map("weight"->"1000", "type"->"abc", "match"->"first"), Map("weight"->"1000", "type"->"abc", "match"->"third"), Map("weight"->"182", "type"->"bcd", "match"->"second"), Map("weight"->"150", "type"->"bcd", "match"->"fourth"), Map("weight"->"40", "type"->"aaa", "match"->"fifth"))
val sortedGrouping = SortedMap.empty[String, List[Map[String, String]]] ++ a.groupBy { x => x("type") }
println(sortedGrouping)
What you get (printed) is shown below. Note that a SortedMap orders its keys alphabetically, so "aaa" comes first rather than the keys appearing in order of first occurrence:
Map(aaa -> List(Map(weight -> 40, type -> aaa, match -> fifth)), abc -> List(Map(weight -> 1000, type -> abc, match -> first), Map(weight -> 1000, type -> abc, match -> third)), bcd -> List(Map(weight -> 182, type -> bcd, match -> second), Map(weight -> 150, type -> bcd, match -> fourth)))
I don't think there is an out-of-the-box solution for what you are trying to achieve. A Map is used in two ways: looking up a value by key, and iterating over it. The first is unaffected by ordering, so groupBy works perfectly well. The second can be achieved by building a list of keys in the required order and then using that list to fetch the key-value pairs from the Map in that order. For example:
val xs = List(Map("weight"->"1000","type"->"abc","match"->"first"),
Map("weight"->"1000","type"->"abc","match"->"third"),
Map("weight"->"182","type"->"bcd","match"->"second"),
Map("weight"->"150","type"->"bcd","match"->"fourth"),
Map("weight"->"40","type"->"aaa","match"->"fifth"))
// List of types in the order of appearance in your list
val sortedKeys = xs.map(_("type")).distinct
//> sortedKeys: List[String] = List(abc, bcd, aaa)
Now when you need to iterate you simply do:
val grouped = xs.groupBy(_("type"))
sortedKeys.map(k => (k, grouped(k)))
// List[(String, List[scala.collection.immutable.Map[String,String]])] =
// List(
// (abc,List(Map(weight -> 1000, type -> abc, match -> first),
// Map(weight -> 1000, type -> abc, match -> third))),
// (bcd,List(Map(weight -> 182, type -> bcd, match -> second),
// Map(weight -> 150, type -> bcd, match -> fourth))),
// (aaa,List(Map(weight -> 40, type -> aaa, match -> fifth)))
// )
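If you need the result as a Map rather than a list of pairs, one option (a sketch, not part of the original answer) is to insert the same ordered keys into an immutable ListMap, which iterates in insertion order:

```scala
import scala.collection.immutable.ListMap

val xs = List(
  Map("weight" -> "1000", "type" -> "abc", "match" -> "first"),
  Map("weight" -> "1000", "type" -> "abc", "match" -> "third"),
  Map("weight" -> "182", "type" -> "bcd", "match" -> "second"),
  Map("weight" -> "150", "type" -> "bcd", "match" -> "fourth"),
  Map("weight" -> "40", "type" -> "aaa", "match" -> "fifth"))

val grouped = xs.groupBy(_("type"))
// Types in order of first appearance: abc, bcd, aaa
val sortedKeys = xs.map(_("type")).distinct

// ListMap iterates in insertion order, so inserting the keys in first-seen
// order yields a Map whose iteration order matches the input.
val orderedGroups: ListMap[String, List[Map[String, String]]] =
  ListMap(sortedKeys.map(k => k -> grouped(k)): _*)
```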

Scala Map object to be split on the basis of a value property

I have a map in my Scala code that has a string as the key and a user-defined object as the value. I want to split this map into three different maps based on a property of the value.
Is this possible? Can someone share a way to do this? I have searched but could not find an example. I am a novice at Scala and appreciate any help.
Let's say you have a map of persons and you want to divide it into three maps based on each person's age:
case class Person(name: String, age: Int)
val map = Map(
"p1" -> Person("person_1", 15),
"p2" -> Person("person_2", 30),
"p3" -> Person("person_3", 40),
"p4" -> Person("person_4", 55),
"p5" -> Person("person_5", 65)
)
// map: scala.collection.immutable.Map[String,Person] = Map(p4 -> Person(person_4,55), p5 -> Person(person_5,65), p3 -> Person(person_3,40), p2 -> Person(person_2,30), p1 -> Person(person_1,15))
val dividedMaps = map.groupBy({ case (key, person) =>
if (person.age < 20 ) "teenager"
else if (person.age < 50) "adult"
else "old"
})
// dividedMaps: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Person]] = Map(old -> Map(p4 -> Person(person_4,55), p5 -> Person(person_5,65)), teenager -> Map(p1 -> Person(person_1,15)), adult -> Map(p3 -> Person(person_3,40), p2 -> Person(person_2,30)))
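One thing to watch for: groupBy only creates keys for categories that actually occur, so with different data dividedMaps("old") could throw a NoSuchElementException. A sketch that always yields all three maps, using a smaller hypothetical input in which nobody is 50 or older:

```scala
case class Person(name: String, age: Int)

val people = Map(
  "p1" -> Person("person_1", 15),
  "p2" -> Person("person_2", 30),
  "p3" -> Person("person_3", 40))

def category(p: Person): String =
  if (p.age < 20) "teenager" else if (p.age < 50) "adult" else "old"

val grouped = people.groupBy { case (_, person) => category(person) }

// grouped has no "old" key here, so fall back to an empty map for missing categories.
val List(teenagers, adults, old) =
  List("teenager", "adult", "old").map(c => grouped.getOrElse(c, Map.empty[String, Person]))
```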

Better way for aggregation on list of case classes

I have a list of case class instances. The output requires aggregation on different fields of the case class. I'm looking for a more efficient way to do it.
Example:
case class Students(city: String, college: String, group: String,
name: String, fee: Int, age: Int)
object GroupByStudents {
val studentsList= List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
}
Now, to get details of students from a city, I need to first aggregate by city, then break those details down by college, then by group.
The output is a list of case class instances in the format below:
Students(Mumbai,,,,790,0) -- aggregate city wise
Students(Mumbai,College1,,,550,0) -- aggregate college wise
Students(Mumbai,College1,Social,,250,0)
Students(Mumbai,College1,Science,,300,0)
Students(Mumbai,College2,,,240,0)
Students(Mumbai,College2,Science,,240,0)
Students(Bangalore,,,,270,0)
Students(Bangalore,College3,,,270,0)
Students(Bangalore,College3,Science,,270,0)
There are two ways I can achieve this:
1) Loop over the whole list, create a map for each combination (three combinations in the case above), aggregate the data, and append it to a new result list.
2) Use foldLeft:
studentsList.groupBy(d=>(d.city))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,"","","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,"","",r.fee+c.fee,0)))
studentsList.groupBy(d=>(d.city,d.college,d.group))
.mapValues(_.foldLeft(Students("","","","",0,0))
((r,c) => Students(c.city,c.college,c.group,"",r.fee+c.fee,0)))
In both cases, the list is traversed more than once. Is there a way to do this efficiently in a single pass?
With GroupBy
The code looks a little nicer, but I don't think it is faster: with groupBy you always have two "loops".
studentsList.groupBy(d=>(d.city)).map { case (k,v) =>
Students(v.head.city,"","","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college)).map { case (k,v) =>
Students(v.head.city,v.head.college,"","",v.map(_.fee).sum, 0)
}
studentsList.groupBy(d=>(d.city,d.college,d.group)).map { case (k,v) =>
Students(v.head.city,v.head.college,v.head.group,"",v.map(_.fee).sum, 0)
}
You then get something like this:
List(Students(Bangalore,,,,270,0),
Students(Mumbai,,,,790,0))
List(Students(Mumbai,College2,,,240,0),
Students(Bangalore,College3,,,270,0),
Students(Mumbai,College1,,,550,0))
List(Students(Bangalore,College3,Science,,270,0),
Students(Mumbai,College2,Science,,240,0),
Students(Mumbai,College1,Social,,250,0),
Students(Mumbai,College1,Science,,300,0))
It is not exactly the same output as in your example, but it is the desired result: a list of Students case class instances.
With a for comprehension
You can avoid the extra looping if you do the grouping yourself. I only show the city example; the others are straightforward.
var m = Map[String, Students]()
for (v <- studentsList) {
m += v.city -> Students(v.city,"","","",v.fee + m.getOrElse(v.city, Students("","","","",0,0)).fee, 0)
}
m
Output
It's the same result as in your example, but the list is traversed only once.
Map(Mumbai -> Students(Mumbai,,,,790,0), Bangalore -> Students(Bangalore,,,,270,0))
With foldLeft
Each aggregation level takes a single pass over the complete list.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[String, Students]()) { case (m, v) =>
m + (v.city -> Students(v.city,"","","",
v.fee + m.getOrElse(v.city, emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college) -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college), emptyStudent).fee, 0))
}
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
m + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
Output
The results match your example, and each aggregation level takes only one pass over the list:
Map(Mumbai -> Students(Mumbai,,,,790,0),
Bangalore -> Students(Bangalore,,,,270,0))
Map((Mumbai,College1) -> Students(Mumbai,College1,,,550,0),
(Mumbai,College2) -> Students(Mumbai,College2,,,240,0),
(Bangalore,College3) -> Students(Bangalore,College3,,,270,0))
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
With foldLeft, One Loop
You can also generate one big map containing all the aggregation levels.
val emptyStudent = Students("","","","",0,0);
studentsList.foldLeft(Map[(String,String,String), Students]()) { case (m, v) =>
{
var t = m + ((v.city,"","") -> Students(v.city,"","","",
v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
t = t + ((v.city,v.college,"") -> Students(v.city,v.college,"","",
v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
t + ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"",
v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
}
Output
In this case you loop once and get back the results for all aggregation levels, but in a single map. This would work with a for loop, too.
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0),
(Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
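Each + returns a new immutable map. The new map shares most of its structure with the old one rather than copying it in full, but it still allocates on every step. The same one-pass logic can also be written with a for loop: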
For Comprehension One Loop
This generates one Map with the 3 aggregate types.
val emptyStudent = Students("","","","",0,0);
var m = Map[(String,String,String), Students]()
for (v <- studentsList) {
m += ((v.city,"","") -> Students(v.city,"","","", v.fee + m.getOrElse((v.city,"",""), emptyStudent).fee, 0))
m += ((v.city,v.college,"") -> Students(v.city,v.college,"","", v.fee + m.getOrElse((v.city,v.college,""), emptyStudent).fee, 0))
m += ((v.city,v.college,v.group) -> Students(v.city,v.college,v.group,"", v.fee + m.getOrElse((v.city,v.college,v.group), emptyStudent).fee, 0))
}
m
Note that because m is a var holding an immutable Map, m += x is shorthand for m = m + x, so this allocates in the same way as the foldLeft version; to avoid the intermediate maps you would need a mutable map.
Output
Map((Mumbai,College1,Science) -> Students(Mumbai,College1,Science,,300,0),
(Bangalore,,) -> Students(Bangalore,,,,270,0),
(Mumbai,College2,Science) -> Students(Mumbai,College2,Science,,240,0),
(Mumbai,College2,) -> Students(Mumbai,College2,,,240,0),
(Mumbai,College1,Social) -> Students(Mumbai,College1,Social,,250,0),
(Mumbai,,) -> Students(Mumbai,,,,790,0), (Bangalore,College3,) -> Students(Bangalore,College3,,,270,0),
(Mumbai,College1,) -> Students(Mumbai,College1,,,550,0),
(Bangalore,College3,Science) -> Students(Bangalore,College3,Science,,270,0))
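A sketch of the same one-pass aggregation using a scala.collection.mutable.HashMap, which is updated in place instead of producing a new immutable map per step. It keeps the answer's convention of "" for the coarser levels, stores only the fee totals, and converts back to Students at the end:

```scala
import scala.collection.mutable

case class Students(city: String, college: String, group: String,
                    name: String, fee: Int, age: Int)

val studentsList = List(
  Students("Mumbai", "College1", "Science", "Jony", 100, 30),
  Students("Mumbai", "College1", "Science", "Tony", 200, 25),
  Students("Mumbai", "College1", "Social", "Bony", 250, 30),
  Students("Mumbai", "College2", "Science", "Gony", 240, 28),
  Students("Bangalore", "College3", "Science", "Hony", 270, 28))

// Fee totals keyed by (city, college, group); "" stands for "all" at the coarser levels.
val totals = mutable.HashMap.empty[(String, String, String), Int]
for {
  s   <- studentsList
  key <- List((s.city, "", ""), (s.city, s.college, ""), (s.city, s.college, s.group))
} totals(key) = totals.getOrElse(key, 0) + s.fee

// Back to the Students shape from the question, sorted so each city's rows stay together.
val aggregated: List[Students] =
  totals.toList.sortBy(_._1).map { case ((city, college, group), fee) =>
    Students(city, college, group, "", fee, 0)
  }
```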
In all cases you could shorten the code by giving the case class parameters default values, because then you can write something like Students(city = v.city, fee = v.fee + m.getOrElse(v.city, emptyStudent).fee) during grouping.
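For illustration, here is the city-level foldLeft rewritten against a hypothetical variant of Students whose parameters all have default values, so only city and fee need to be named:

```scala
// Hypothetical variant of the question's case class with default values.
case class Students(city: String = "", college: String = "", group: String = "",
                    name: String = "", fee: Int = 0, age: Int = 0)

val emptyStudent = Students()

val studentsList = List(
  Students("Mumbai", "College1", "Science", "Jony", 100, 30),
  Students("Mumbai", "College1", "Science", "Tony", 200, 25),
  Students("Mumbai", "College1", "Social", "Bony", 250, 30),
  Students("Mumbai", "College2", "Science", "Gony", 240, 28),
  Students("Bangalore", "College3", "Science", "Hony", 270, 28))

// City-level totals; the omitted fields fall back to their defaults.
val byCity = studentsList.foldLeft(Map.empty[String, Students]) { (m, v) =>
  m + (v.city -> Students(city = v.city, fee = v.fee + m.getOrElse(v.city, emptyStudent).fee))
}
```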
Use a foldLeft
First, let's define some type aliases to make the syntax easier to read:
object GroupByStudents {
type City = String
type College = String
type Group = String
type Name = String
type Aggregate = Map[City, Map[College, Map[Group, List[Students]]]]
def emptyAggregate: Aggregate = Map.empty
case class Students(city: City, college: College, group: Group,
name: Name, fee: Int, age: Int)
}
You can aggregate the students list into an Aggregate map in a single foldLeft:
object Test {
import GroupByStudents._
def main(args: Array[String]) {
val studentsList = List(
Students("Mumbai","College1","Science","Jony",100,30),
Students("Mumbai","College1","Science","Tony", 200, 25),
Students("Mumbai","College1","Social","Bony",250,30),
Students("Mumbai","College2","Science","Gony", 240, 28),
Students("Bangalore","College3","Science","Hony", 270, 28))
val aggregated = studentsList.foldLeft(emptyAggregate){(agg, students) =>
val cityBin = agg.getOrElse(students.city, Map.empty)
val collegeBin = cityBin.getOrElse(students.college, Map.empty)
val groupBin = collegeBin.getOrElse(students.group, List.empty)
val nextGroupBin = students :: groupBin
val nextCollegeBin= collegeBin + (students.group -> nextGroupBin)
val nextCityBin = cityBin + (students.college -> nextCollegeBin)
agg + (students.city -> nextCityBin)
}
}
}
The aggregated map can then be mapped over to calculate the fees.
If you really want, you can calculate the fees in the foldLeft itself, but this would make the code harder to read.
Note that you could also use Monocle's lenses to insert each Students value into the aggregated structure.
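A sketch of the "mapped over to calculate fees" step, assuming the same nested Aggregate shape as above (redefined here so the example is self-contained, with the bin types written out explicitly). A for comprehension walks the three levels for the group totals, and coarser totals flatten the inner maps:

```scala
case class Students(city: String, college: String, group: String,
                    name: String, fee: Int, age: Int)

type Aggregate = Map[String, Map[String, Map[String, List[Students]]]]

val studentsList = List(
  Students("Mumbai", "College1", "Science", "Jony", 100, 30),
  Students("Mumbai", "College1", "Science", "Tony", 200, 25),
  Students("Mumbai", "College1", "Social", "Bony", 250, 30),
  Students("Mumbai", "College2", "Science", "Gony", 240, 28),
  Students("Bangalore", "College3", "Science", "Hony", 270, 28))

// Single-pass foldLeft building city -> college -> group -> students.
val aggregated: Aggregate = studentsList.foldLeft(Map.empty: Aggregate) { (agg, s) =>
  val cityBin    = agg.getOrElse(s.city, Map.empty[String, Map[String, List[Students]]])
  val collegeBin = cityBin.getOrElse(s.college, Map.empty[String, List[Students]])
  val groupBin   = collegeBin.getOrElse(s.group, List.empty[Students])
  agg + (s.city -> (cityBin + (s.college -> (collegeBin + (s.group -> (s :: groupBin))))))
}

// Fee total per (city, college, group)
val groupFees: Map[(String, String, String), Int] = for {
  (city, colleges)  <- aggregated
  (college, groups) <- colleges
  (group, members)  <- groups
} yield (city, college, group) -> members.map(_.fee).sum

// Fee total per city: flatten everything below the city level
val cityFees: Map[String, Int] = aggregated.map { case (city, colleges) =>
  city -> colleges.values.flatMap(_.values).flatten.map(_.fee).sum
}
```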

How to sort a list of maps in scala?

I query data from multiple tables, and each has a customized key. I put the data from these tables into a list of maps and want to sort it by the id value.
What I end up with is:
var g = groups.map(i => Map("id" -> i._1, "job" -> i._2))
var p = people.map(i => Map("id" -> i._1, "job" -> i._2))
var party = g ++ p
Which gives me:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1)
)
But I want to sort by id so that I can populate a tree structure:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2)
)
How do I do this?
A minor refactoring that replaces the associations in each Map with a case class may simplify the subsequent code; consider:
case class Item(id: String, job: String)
and so by using (immutable) values,
val g = groups.map(i => Item(i._1, i._2))
val p = people.map(i => Item(i._1, i._2))
Then
(g ++ p).sortBy(_.id)
brings a list of items sorted by id.
If you wish to group jobs by id, consider
(g ++ p).groupBy(_.id)
which delivers a Map from ids onto lists of items with common id. From this Map you can use mapValues to extract the actual jobs.
As hinted above, party.sortBy(_("id")) should do it.
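One caveat: sorting on a String id compares lexicographically, which works for the ids shown but would place "10" before "2" once there are ten or more groups. A sketch of a numeric, path-aware ordering, assuming every id is a ">"-separated list of integers (the "10" entry is hypothetical, added to show the difference):

```scala
case class Item(id: String, job: String)

val party = List(
  Item("1", "Group1"), Item("2", "Group2"), Item("2>1", "Person1Group2"),
  Item("1>1", "Person1Group1"), Item("1>2", "Person2Group1"), Item("10", "Group10"))

// Assumption: every id is a ">"-separated path of integers, e.g. "1>2".
def pathKey(id: String): List[Int] = id.split(">").map(_.toInt).toList

// Compare paths segment by segment, numerically; a parent (a prefix) sorts before its children.
def comparePaths(a: List[Int], b: List[Int]): Int = (a, b) match {
  case (Nil, Nil)         => 0
  case (Nil, _)           => -1
  case (_, Nil)           => 1
  case (x :: xs, y :: ys) => if (x != y) Integer.compare(x, y) else comparePaths(xs, ys)
}

val sorted = party.sortWith((l, r) => comparePaths(pathKey(l.id), pathKey(r.id)) < 0)
```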