Scala group By First Occurrence - scala

I have the following data:
List(Map("weight"->"1000","type"->"abc","match"->"first"),
  Map("weight"->"1000","type"->"abc","match"->"third"),
  Map("weight"->"182","type"->"bcd","match"->"second"),
  Map("weight"->"150","type"->"bcd","match"->"fourth"),
  Map("weight"->"40","type"->"aaa","match"->"fifth"))
After grouping by "type", I want the results in the order "abc", then "bcd", then "aaa".
When I apply groupBy on "type", the resulting map has "aaa" as its first key, whereas I want the first key to be "abc" with all of its corresponding values.
How can I achieve this?

As mentioned in the comments, a plain groupBy does not guarantee any key order. If a sorted map is acceptable, what you could do is:
import scala.collection.immutable.SortedMap
val a = List(Map("weight"->"1000","type"->"abc","match"->"first"), Map("weight"->"1000","type"->"abc","match"->"third"), Map("weight"->"182","type"->"bcd","match"->"second"), Map("weight"->"150","type"->"bcd","match"->"fourth"), Map("weight"->"40","type"->"aaa","match"->"fifth"))
val sortedGrouping = SortedMap.empty[String, List[Map[String, String]]] ++ a.groupBy { x => x("type") }
println(sortedGrouping)
What you get (printed) is:
Map(aaa -> List(Map(weight -> 40, type -> aaa, match -> fifth)), abc -> List(Map(weight -> 1000, type -> abc, match -> first), Map(weight -> 1000, type -> abc, match -> third)), bcd -> List(Map(weight -> 182, type -> bcd, match -> second), Map(weight -> 150, type -> bcd, match -> fourth)))
Note that a SortedMap orders its keys alphabetically, so aaa still comes first here; it gives a stable order, not the order of first occurrence.

I don't think there is an out-of-the-box solution for what you are trying to achieve. A Map is used in two ways: to look up a value by key, and to iterate over its entries. Lookup is unaffected by ordering, so groupBy works perfectly well for it. Ordered iteration can be achieved by creating a list of keys in the required order and then using that list to pull the key-value pairs out of the Map in that order. For example:
val xs = List(Map("weight"->"1000","type"->"abc","match"->"first"),
Map("weight"->"1000","type"->"abc","match"->"third"),
Map("weight"->"182","type"->"bcd","match"->"second"),
Map("weight"->"150","type"->"bcd","match"->"fourth"),
Map("weight"->"40","type"->"aaa","match"->"fifth"))
// List of types in the order of appearance in your list
val sortedKeys = xs.map(_("type")).distinct
//> sortedKeys: List[String] = List(abc, bcd, aaa)
Now when you need to iterate you simply do:
val grouped = xs.groupBy(_("type"))
sortedKeys.map(k => (k, grouped(k)))
// List[(String, List[scala.collection.immutable.Map[String,String]])] =
// List(
// (abc,List(Map(weight -> 1000, type -> abc, match -> first),
// Map(weight -> 1000, type -> abc, match -> third))),
// (bcd,List(Map(weight -> 182, type -> bcd, match -> second),
// Map(weight -> 150, type -> bcd, match -> fourth))),
// (aaa,List(Map(weight -> 40, type -> aaa, match -> fifth)))
// )
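If you would rather keep the grouping and the first-occurrence order in a single pass, a mutable LinkedHashMap can be used, since it iterates in insertion order. The following is only a sketch: the helper name groupByFirstOccurrence is made up for illustration (it is not a standard library method), and the sample data is a shortened version of the list from the question.

```scala
import scala.collection.mutable

// Hypothetical helper: groups xs by key(x), keeping the groups in the
// order in which each key first appears in xs.
def groupByFirstOccurrence[A, K](xs: Seq[A])(key: A => K): List[(K, List[A])] = {
  // LinkedHashMap iterates in insertion order, unlike the Map returned by groupBy.
  val acc = mutable.LinkedHashMap.empty[K, mutable.ListBuffer[A]]
  for (x <- xs) acc.getOrElseUpdate(key(x), mutable.ListBuffer.empty[A]) += x
  acc.iterator.map { case (k, buf) => (k, buf.toList) }.toList
}

// Shortened sample data (an assumption for this sketch).
val sample = List(
  Map("type" -> "abc", "match" -> "first"),
  Map("type" -> "bcd", "match" -> "second"),
  Map("type" -> "abc", "match" -> "third"),
  Map("type" -> "aaa", "match" -> "fifth")
)

val groupedInOrder = groupByFirstOccurrence(sample)(_("type"))
println(groupedInOrder.map(_._1)) // List(abc, bcd, aaa)
```

The result is a List of pairs rather than a Map, so the order cannot be lost by a later conversion; call .toMap on it only when lookup by key matters more than order.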

Related

Replace column values when matching keys in a Map

I have a dataframe with a column named source_system that has the values contained in the keys of this Map:
val convertSourceSystem = Map (
"HON_MUREX3FXFI" -> "MX3_FXFI",
"MAD_MUREX3FXFI" -> "MX3_FXFI",
"MEX_MUREX3FXFI" -> "MX3_LT",
"MX3BRASIL" -> "MX3_BR",
"MX3EUROPAEQ_MAD" -> "MX3_EQ",
"MX3EUROPAEQ_POL" -> "MX3_EQ",
"MXEUROPA_MAD" -> "MX2_EU",
"MXEUROPA_PT" -> "MX2_EU",
"MXEUROPA_UK" -> "MX2_EU",
"MXLATAM_CHI" -> "MX2_LT",
"MXLATAM_NEW" -> "MX2_LT",
"MXLATAM_SOV" -> "MX2_LT",
"POR_MUREX3FXFI" -> "MX3_FXFI",
"SHN_MUREX3FXFI" -> "MX3_FXFI",
"UK_MUREX3FXFI" -> "MX3_FXFI",
"SOV_MX3LATAM" -> "MX3_LT"
)
I need to replace them with the short codes, but using a foldLeft with withColumn gives me only null values, because it's replacing all the values on every step and the last source_system is not in the map:
val ssReplacedDf = irisToCreamSourceSystem.foldLeft(tempDf) { (acc, filter) =>
acc.withColumn("source_system", when( col("source_system").equalTo(lit(filter._1)),
lit(filter._2)))
}
I would suggest another solution: join against the translation table.
import org.apache.spark.sql.functions.broadcast
// toDF and the $"..." syntax assume import spark.implicits._ is in scope
// convert Map to a DataFrame
val convertSourceSystemDF = convertSourceSystem.toSeq.toDF("source_system", "source_system_short")
tempDf.join(broadcast(convertSourceSystemDF), Seq("source_system"), "left")
  // override column with short name, alternatively use withColumnRenamed
  .withColumn("source_system", $"source_system_short")
  .drop("source_system_short")
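The nulls in the foldLeft attempt come from when(...) being used without otherwise(...): rows that do not match the current key become null, and each step overwrites the column. As a sketch of that fix, the fold can fall back to the existing value. The local SparkSession, the two-entry map, and the sample rows below are assumptions made up for illustration, and the snippet needs Spark on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

// Local session only for this sketch (assumption).
val spark = SparkSession.builder().master("local[1]").appName("source-system-sketch").getOrCreate()
import spark.implicits._

// Two entries taken from the question's map; the third row is a made-up unmatched value.
val codes = Map("MX3BRASIL" -> "MX3_BR", "MXEUROPA_UK" -> "MX2_EU")
val df = Seq("MX3BRASIL", "MXEUROPA_UK", "UNKNOWN_SYSTEM").toDF("source_system")

val replaced = codes.foldLeft(df) { case (acc, (longName, shortName)) =>
  acc.withColumn(
    "source_system",
    // otherwise(...) keeps the current value instead of producing null
    when(col("source_system") === lit(longName), lit(shortName))
      .otherwise(col("source_system"))
  )
}

val out = replaced.as[String].collect().toList
println(out.sorted) // List(MX2_EU, MX3_BR, UNKNOWN_SYSTEM)
```

This adds one projection per map entry, so for a large map the join shown above is likely the better choice. It also assumes no short code is itself a key of the map, since a later step would then replace it again.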

Scala map co-occurrence counts

I have a constantly-updating mutable.HashMap[String, String] with a record of current user locations:
{user1 -> location1,
user2 -> location4,
user3 -> location4}
I want to keep track of the location co-occurrences between users, that is, how many times each pair of users has been in the same location. The format I have in mind is a mutable.HashMap[(String, String), Int]:
{(user1, user2) -> 0,
(user1, user3) -> 0,
(user2, user3) -> 1}
Each time the user location map updates, I want to re-examine which users are together and add 1 to their running count of co-occurrences.
The code below returns a map of {location -> Array(users)}, which feels like a good first step.
var users_by_location = user_locations.groupBy(_._2).mapValues{s => s.map{ case(user, location) => user}}
> {location1 -> Array(user1), location4 -> Array(user2, user3)}
I'm using scala 2.11.8.
Use subsets(2) to get every pair of users, then compare the two users' locations to build a new map, like:
val m = Map("user1" -> "location1", "user2" -> "location2", "user3" -> "location2")
val result = m.keySet.subsets(2)
  .map(_.toList.sorted) // sort so each pair has a deterministic order
  .map(p => (p.head, p(1)))
  .map(pair => pair -> (if (m(pair._1) == m(pair._2)) 1 else 0))
  .toMap
println(result)
> Map((user1,user2) -> 0, (user1,user3) -> 0, (user2,user3) -> 1)
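The snippet above computes the co-occurrences for one snapshot only. To keep the running count the question asks for, one possible sketch is to add 1 for every pair sharing a location each time the location map changes. The names coOccurrences and recordSnapshot are hypothetical, and pairs that have never shared a location are simply absent from the map (read them with getOrElse(pair, 0)).

```scala
import scala.collection.mutable

// Running counts, keyed by a pair of user names in alphabetical order.
val coOccurrences = mutable.HashMap.empty[(String, String), Int]

// Hypothetical helper: call once after every update of the location map.
def recordSnapshot(userLocations: scala.collection.Map[String, String]): Unit = {
  // location -> sorted list of the users currently there
  val usersByLocation = userLocations.groupBy(_._2).values.map(_.keys.toList.sorted)
  for {
    users <- usersByLocation
    pair  <- users.combinations(2) // every unordered pair at this location
  } {
    val key = (pair(0), pair(1))
    coOccurrences(key) = coOccurrences.getOrElse(key, 0) + 1
  }
}

val userLocations = mutable.HashMap(
  "user1" -> "location1",
  "user2" -> "location4",
  "user3" -> "location4"
)
recordSnapshot(userLocations)
println(coOccurrences.getOrElse(("user2", "user3"), 0)) // 1
```

Sorting the users before calling combinations(2) keeps each pair in one canonical order, so (user2, user3) and (user3, user2) never end up as separate keys.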

Scala Map object to be split on the basis of a value property

I have a map in my Scala code that has a String as the key and a user-defined object as the value. I want to split this map into three different map objects based on a property of the value.
Is this possible? Can someone share a way to do this? I have been trying to search but no example could be found. I am a novice at Scala and appreciate any help...
Let's say you had a map of persons and you wanted to divide it into three maps based on the age of each person.
case class Person(name: String, age: Int)
val map = Map(
"p1" -> Person("person_1", 15),
"p2" -> Person("person_2", 30),
"p3" -> Person("person_3", 40),
"p4" -> Person("person_4", 55),
"p5" -> Person("person_5", 65)
)
// map: scala.collection.immutable.Map[String,Person] = Map(p4 -> Person(person_4,55), p5 -> Person(person_5,65), p3 -> Person(person_3,40), p2 -> Person(person_2,30), p1 -> Person(person_1,15))
val dividedMaps = map.groupBy({ case (key, person) =>
if (person.age < 20 ) "teenager"
else if (person.age < 50) "adult"
else "old"
})
// dividedMaps: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Person]] = Map(old -> Map(p4 -> Person(person_4,55), p5 -> Person(person_5,65)), teenager -> Map(p1 -> Person(person_1,15)), adult -> Map(p3 -> Person(person_3,40), p2 -> Person(person_2,30)))
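groupBy returns a single map of maps, and a group with no members has no key at all. To end up with three separate map objects, as the question asks, each group can be pulled out with getOrElse so that an empty group becomes an empty map. This is a sketch with shortened sample data; the helper name ageGroup is made up for illustration.

```scala
case class Person(name: String, age: Int)

// Shortened sample data (an assumption for this sketch); note there is no "adult" here.
val people = Map(
  "p1" -> Person("person_1", 15),
  "p4" -> Person("person_4", 55),
  "p5" -> Person("person_5", 65)
)

// Hypothetical helper holding the same thresholds as the groupBy above.
def ageGroup(p: Person): String =
  if (p.age < 20) "teenager"
  else if (p.age < 50) "adult"
  else "old"

val byGroup = people.groupBy { case (_, person) => ageGroup(person) }

// A missing key means the group is empty, so fall back to an empty map.
val none      = Map.empty[String, Person]
val teenagers = byGroup.getOrElse("teenager", none)
val adults    = byGroup.getOrElse("adult", none)
val old       = byGroup.getOrElse("old", none)

println(teenagers.keySet) // Set(p1)
println(adults.isEmpty)   // true
```

Calling byGroup("adult") directly would throw a NoSuchElementException for this data, which is why the sketch uses getOrElse.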

How to sort a list of maps in scala?

I query data from multiple tables, and each has a customized key. I put the data from these tables into a list of maps and want to sort it by the id value.
What I end up with is:
var g = groups.map(i => Map("id" -> i._1, "job" -> i._2))
var p = people.map(i => Map("id" -> i._1, "job" -> i._2))
var party = g ++ p
Which gives me:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1)
)
But I want to sort by id, so that the list is in an order from which I can populate a tree structure:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2)
)
How do I do this?
A minor refactoring of the associations in each Map by using case classes may simplify the subsequent coding; consider
case class Item(id: String, job: String)
and so, by using (immutable) values,
val g = groups.map(i => Item(i._1, i._2))
val p = people.map(i => Item(i._1, i._2))
Then
(g ++ p).sortBy(_.id)
yields a list of items sorted by id.
If you wish to group jobs by id, consider
(g ++ p).groupBy(_.id)
which delivers a Map from ids onto lists of items with common id. From this Map you can use mapValues to extract the actual jobs.
As hinted above, party.sortBy(_("id")) should do it.
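One caveat with sorting the id as a plain String: the comparison is lexicographic, so an id such as "10" would sort before "2". If the ids can grow past one digit per level, a sketch like the following compares the numeric path segments instead. It assumes every segment between the ">" separators is an integer; pathKey and pathOrdering are hypothetical names, and the sample items are made up.

```scala
case class Item(id: String, job: String)

// Hypothetical helper: "1>2" becomes List(1, 2). Assumes all segments are integers.
def pathKey(id: String): List[Int] = id.split(">").map(_.trim.toInt).toList

// Compares paths segment by segment; a parent ("1") sorts before its children ("1>1").
val pathOrdering: Ordering[List[Int]] = new Ordering[List[Int]] {
  def compare(a: List[Int], b: List[Int]): Int = (a, b) match {
    case (Nil, Nil)         => 0
    case (Nil, _)           => -1
    case (_, Nil)           => 1
    case (x :: xs, y :: ys) => if (x != y) java.lang.Integer.compare(x, y) else compare(xs, ys)
  }
}

val items = List(
  Item("1", "Group1"),
  Item("10", "Group10"),
  Item("2", "Group2"),
  Item("2>1", "Person1Group2"),
  Item("1>1", "Person1Group1"),
  Item("1>2", "Person2Group1")
)

val sorted = items.sortBy(i => pathKey(i.id))(pathOrdering)
println(sorted.map(_.id)) // List(1, 1>1, 1>2, 2, 2>1, 10)
```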

Convert SortedMap Data Type

I have a nested map Data Structure of the following type:
SortedMap[Time: Long,SortedMap[Name: String, Value: Double]]
"Time" elements are of type Long and indicate the timestamp of the data.
"Name" elements are of type String and indicate the name of the element.
"Value" elements are of type Double and indicate the elements value for timestamp "Time".
The basic idea is that for each timestamp, we have several elements, each has a specific value for the current timestamp.
The result I want is an Array[Double] or List[Double] for each "Name" element. I don't need the "Time" value, except that I want the result to be ordered in the same way.
Example:
val dataType = SortedMap(1000L -> SortedMap("component1" -> 1.0,
"component2" -> 1.1), 2000L -> SortedMap("component1" -> 1.1),
3000L -> SortedMap("component1" -> 0.95))
The result I want is the following:
"component1" - 1.0, 1.1, 0.95
"component2" - 1.1
Can anyone please help?
I think the result type you want is a Map[String, Seq[Double]]. I will use a Vector for the sorted values, because it has an efficient append method :+.
First, you want to drop the time-keys of the outer map. You can use values or valuesIterator for that. Then perhaps the easiest is to perform a "fold-left" over these values, starting with an empty result map and updating it at each step. You need two nested folds, because you iterate first over the maps and then over the individual elements of each map.
import scala.collection.immutable.SortedMap
val dataType = SortedMap(
1000L -> SortedMap("component1" -> 1.0, "component2" -> 1.1),
2000L -> SortedMap("component1" -> 1.1),
3000L -> SortedMap("component1" -> 0.95)
)
(Map.empty[String, Vector[Double]] /: dataType.values) { case (res0, map) =>
  (res0 /: map) { case (res1, (key, value)) =>
    res1.updated(key, res1.getOrElse(key, Vector.empty) :+ value)
  }
}
// Map(component1 -> Vector(1.0, 1.1, 0.95), component2 -> Vector(1.1))
Note the fold-left operator call (init /: coll) …, which is deprecated since Scala 2.13. You can use the equivalent method coll.foldLeft(init) … instead, which yields slightly different syntax:
dataType.values.foldLeft(Map.empty[String, Vector[Double]]) { case (res0, map) …
You can swap Map for SortedMap in the result if you want the component names to remain sorted.
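Putting those two remarks together, a complete version written with foldLeft and a SortedMap result might look like the sketch below. The name byName is made up for illustration; the data is the example from the question.

```scala
import scala.collection.immutable.SortedMap

val dataType = SortedMap(
  1000L -> SortedMap("component1" -> 1.0, "component2" -> 1.1),
  2000L -> SortedMap("component1" -> 1.1),
  3000L -> SortedMap("component1" -> 0.95)
)

// Outer fold: walk the inner maps in timestamp order (values of a SortedMap
// are iterated in key order). Inner fold: append each value to its component.
val byName = dataType.values.foldLeft(SortedMap.empty[String, Vector[Double]]) { (res0, inner) =>
  inner.foldLeft(res0) { case (res1, (name, value)) =>
    res1.updated(name, res1.getOrElse(name, Vector.empty[Double]) :+ value)
  }
}

println(byName("component1")) // Vector(1.0, 1.1, 0.95)
println(byName("component2")) // Vector(1.1)
```

Because the outer map is sorted by timestamp, each Vector keeps the values in time order without the timestamps having to be stored in the result.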