Scala GroupBy elements inside a nested List - scala

Suppose I have some class Info with many attributes, one being categories: Option[List[String]]. I have a list of such Info objects, e.g. val bigList = List(Info1, Info2, Info3).
I would like to group bigList by each category of each element:
sortedList = List((category1,(Info1,Info2,Info3)),(category2,(Info4,Info4)),...)
and so on.
I've tried using groupBy like this:
val sortedList = bigList.groupBy(_.categories.getOrElse(""))
which doesn't work because some objects have multiple categories: it groups objects by their whole list of categories rather than by each individual category.
Therefore, is it possible to groupBy each category in the categories attribute?

Sounds like you want to construct a hash map with categories as keys, each entry containing all Info objects that contain that particular category.
You can achieve this by turning each element of bigList into one (category, info) tuple per category, then grouping by the first tuple element. To make things a bit cleaner, map over the resulting values to drop the category from each tuple; it is already present as the corresponding hash map key.
Code:
case class Info(categories: Option[List[String]], id: String)
val info1 = Info(Some(List("A")), "1")
val info2 = Info(Some(List("A", "B")), "2")
val info3 = Info(Some(List("B", "C")), "3")
val bigList = List(info1, info2, info3)
def getCategories(info: Info): List[(String, Info)] =
  info.categories.getOrElse(List()).map(_ -> info)
val result: Map[String, List[Info]] = bigList
  .flatMap(getCategories)
  .groupBy(_._1)
  .view.mapValues(_.map(_._2))
  .toMap
// HashMap(
// A -> List(Info(Some(List(A)),1), Info(Some(List(A, B)),2)),
// B -> List(Info(Some(List(A, B)),2), Info(Some(List(B, C)),3)),
// C -> List(Info(Some(List(B, C)),3))
// )
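The same idea can also be written as a for-comprehension. The sketch below assumes Scala 2.13+ (for groupMap) and uses made-up sample data; it also shows that an element whose categories is None simply contributes no pairs.

```scala
// Same shape as the Info class in the answer above.
case class Info(categories: Option[List[String]], id: String)

// Hypothetical sample data, for illustration only.
val infos = List(
  Info(Some(List("A")), "1"),
  Info(Some(List("A", "B")), "2"),
  Info(None, "3") // no categories: contributes no pairs
)

// One (category, info) pair per category of each element.
val pairs: List[(String, Info)] =
  for {
    info     <- infos
    category <- info.categories.getOrElse(Nil)
  } yield category -> info

// groupMap groups by the key and keeps only the Info values.
val byCategory: Map[String, List[Info]] = pairs.groupMap(_._1)(_._2)
```

Here byCategory has the keys A and B only, and the Info with id "3" does not appear under any key.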

Related

How can I group by the individual elements of a list of elements in Scala

Forgive me if I'm not naming things by their actual name, I've just started to learn Scala. I've been looking around for a while, but can not find a clear answer to my question.
Suppose I have a list of objects, each object has two fields: x: Int and l: List[String], where the Strings, in my case, represent categories.
The l lists can be of arbitrary length, so an object can belong to multiple categories. Furthermore, various objects can belong to the same category. My goal is to group the objects by the individual categories, where the categories are the keys. This means that if an object is linked to say "N" categories, it will occur in "N" of the key-value pairs.
So far I managed to groupBy the lists of categories through:
objectList.groupBy(x => x.l)
However, this obviously groups the objects by list of categories rather than by categories.
I'm trying to do this with immutable collections avoiding loops etc.
If anyone has some ideas that would be much appreciated!
Thanks
EDIT:
By request the actual case class and what I am trying.
case class Car(make: String, model: String, fuelCapacity: Option[Int], category:Option[List[String]])
Once again, a car can belong to multiple categories. Let's say List("SUV", "offroad", "family").
I want to group by category elements rather than by the whole list of categories, and have the fuelCapacity as the values, in order to be able to extract average fuelCapacity per category amongst other metrics.
Using your EDIT as a guide.
case class Car( make: String
, model: String
, fuelCapacity: Option[Int]
, category:Option[List[String]] )
val cars: List[Car] = ???
//all currently known category strings
val cats: Set[String] = cars.flatMap(_.category).flatten.toSet
//category -> list of cars in this category
val catMap: Map[String,List[Car]] =
  cats.map(cat => (cat, cars.filter(_.category.exists(_.contains(cat))))).toMap
//category -> average fuel capacity for cars in this category
//(-1 marks a category where no car has a known fuel capacity)
val fcAvg: Map[String,Double] =
  catMap.map{case (cat, cars) =>
    val fcaps: List[Int] = cars.flatMap(_.fuelCapacity)
    if (fcaps.lengthIs < 1) (cat, -1d)
    else (cat, fcaps.sum.toDouble / fcaps.length)
  }
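For a concrete run, here is a minimal sketch of the same computation done in a single pass. It assumes Scala 2.13+ (for groupMap); the sample cars are invented, and it returns Option[Double] instead of the -1 sentinel, which is a design choice of this sketch rather than part of the answer above.

```scala
case class Car(make: String, model: String, fuelCapacity: Option[Int], category: Option[List[String]])

// Hypothetical sample data, for illustration only.
val sampleCars = List(
  Car("MakeA", "One",   Some(60), Some(List("SUV", "family"))),
  Car("MakeB", "Two",   Some(80), Some(List("SUV", "offroad"))),
  Car("MakeC", "Three", None,     Some(List("offroad", "classic"))),
  Car("MakeD", "Four",  Some(40), None)
)

// category -> fuel capacities (possibly missing) of the cars in that category
val capsByCategory: Map[String, List[Option[Int]]] =
  sampleCars
    .flatMap(car => car.category.getOrElse(Nil).map(_ -> car.fuelCapacity))
    .groupMap(_._1)(_._2)

// category -> average over the known capacities, None when there are none
val avgByCategory: Map[String, Option[Double]] =
  capsByCategory.map { case (cat, caps) =>
    val known = caps.flatten
    cat -> (if (known.isEmpty) None else Some(known.sum.toDouble / known.size))
  }
```

With this data, SUV averages 70.0, offroad averages 80.0 (the missing capacity is ignored), and classic maps to None because its only car has no known capacity.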
Something like the following?
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupBy { case (c,_) => c } // Map[String,Seq[(String,YourType)]]
.mapValues { items => items.map { case (_, o) => o } } // Map[String, Seq[YourType]]
(Deliberately "heavy" to help you understand the idea behind it)
EDIT, or as of Scala 2.13 thanks to groupMap:
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupMap { case (c,_) => c } { case (_, o) => o } // Map[String,Seq[YourType]]
You are very close; you just need to split each list into its individual elements before grouping. Try something like this:
// I used a Set instead of a List,
// since I don't think the order of categories matters,
// and having the same category twice doesn't make sense either.
final case class MyObject(x: Int, categories: Set[String] = Set.empty) {
  def addCategory(category: String): MyObject =
    this.copy(categories = this.categories + category)
}
def groupByCategories(data: List[MyObject]): Map[String, List[Int]] =
  data
    .flatMap(o => o.categories.map(c => c -> o.x))
    .groupMap(_._1)(_._2)

Scala create immutable nested map

I have a situation here.
I have two strings:
val keyMap = "androidApp,key1;iosApp,key2;xyz,key3"
val tentMap = "androidApp,tenant1; iosApp,tenant1; xyz,tenant2"
I want to create a nested immutable map like this:
tenant1 -> (androidApp -> key1, iosApp -> key2),
tenant2 -> (xyz -> key3)
So basically I want to group by tenant and create a map of the keyMap entries.
Here is what I tried, but it uses a mutable map, which I don't want. Is there a way to create this using an immutable map?
case class TenantSetting() {
val requesterKeyMapping = new mutable.HashMap[String, String]()
}
val requesterKeyMapping = keyMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map(keyValuePair => (keyValuePair[0],keyValuePair[1]))
.toMap
}.flatten.toMap
val config = new mutable.HashMap[String, TenantSetting]
tentMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map { keyValuePair =>
val requester = keyValuePair[0]
val tenant = keyValuePair[1]
if (!config.contains(tenant)) config.put(tenant, new TenantSetting)
config.get(tenant).get.requesterKeyMapping.put(requester, requesterKeyMapping.get(requester).get)
}
}
The logic to break the strings into a map can be the same for both as it's the same syntax.
What you had for the first string was not quite right: the filter was applied to each string produced by the split, not to the resulting array. The same problem shows in your use of [] on keyValuePair, which is a String and not an Array[String] as I think you were expecting (and Scala indexes with (), not []). You also need a trim to cope with the spaces in the second string, and you may want to trim the key and value as well to avoid other whitespace issues.
Additionally, in this case the combination of map and filter can be written more succinctly with collect, as shown here:
How to convert an Array to a Tuple?
The pattern with two elements filters out anything whose length is not 2, as you wanted.
The iterator makes the combination of map and collect more efficient by requiring only one pass over the collection returned from the first split.
With both strings turned into maps, it just takes the right use of groupBy: group the first map by the value that the second map holds for the same key. Obviously this only works if every key of the first map is also present in the second map.
def toMap(str: String): Map[String, String] =
  str
    .split(";")
    .iterator
    .map(_.trim.split(','))
    .collect { case Array(key, value) => (key.trim, value.trim) }
    .toMap
val keyMap = toMap("androidApp,key1;iosApp,key2;xyz,key3")
val tentMap = toMap("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")
val finalMap = keyMap.groupBy { case (k, _) => tentMap(k) }
Printing out finalMap gives:
Map(tenant2 -> Map(xyz -> key3), tenant1 -> Map(androidApp -> key1, iosApp -> key2))
Which is what you wanted.
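If some key might be missing from the second map, tentMap(k) would throw a NoSuchElementException. A possible variant, sketched below under the assumption that such entries should simply be skipped, looks the tenant up with get instead. The names parsePairs, keys, tenants and the "orphan" entry are made up for this example, and groupMap needs Scala 2.13+.

```scala
// Same parsing logic as the toMap helper in the answer above.
def parsePairs(str: String): Map[String, String] =
  str.split(";").iterator
    .map(_.trim.split(','))
    .collect { case Array(key, value) => (key.trim, value.trim) }
    .toMap

// "orphan" has a key but no tenant (hypothetical entry for illustration).
val keys    = parsePairs("androidApp,key1;iosApp,key2;xyz,key3;orphan,key4")
val tenants = parsePairs("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")

// Pair each (app, key) with its tenant; apps without a tenant are dropped
// because tenants.get returns None for them.
val byTenant: Map[String, Map[String, String]] =
  keys.toList
    .flatMap { case (app, key) => tenants.get(app).map(tenant => (tenant, app -> key)) }
    .groupMap(_._1)(_._2)
    .view.mapValues(_.toMap).toMap
```

The result contains tenant1 and tenant2 as before, and the orphan entry is left out rather than causing an exception.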

Filter a List based on a parameter in scala

I want to filter the list of subjects inside each student in a list of students, based on a particular subject, i.e. "maths" in my case.
Below is the code which defines the Student and Subject classes.
case class Student(
name:String,
age:Int,
subjects:List[Subject]
)
case class Subject(name:String)
val sub1=Subject("maths")
val sub2=Subject("science")
val sub3=Subject("english")
val s1=Student("abc",20,List(sub1,sub2))
val s2=Student("def",20,List(sub3,sub1))
val sList=List(s1,s2)
The expected output is
the list of students (s1, s2) with filtered subjects, as explained below:
s1 contains Student("abc",20,List(sub1)) and s2 contains Student("def",20,List(sub1)), i.e. sub2 and sub3 are filtered out.
I tried the code below, but it did not work:
val filtered=sList.map(x=>x.subjects.filter(_.name=="maths"))
What you did doesn't work because you turn the list of students into a list of (lists of) subjects.
The code below keeps each student but modifies their list of subjects:
sList.map(student => student.copy(subjects = student.subjects.filter(_.name=="maths")))
If there are students in the list who didn't sign up for the subject in question then I assume you wouldn't want that student in the result list.
val s3=Student("xyz",20,List(sub2,sub3))
val sList=List(s1,s2,s3)
sList.flatMap{s =>
  if (s.subjects.contains(sub1))        // if sub1 is in the subjects list
    Some(s.copy(subjects = List(sub1))) // drop all others from the list
  else
    None                                // no sub1 in subjects list, skip this student
}
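The two steps can also be combined while filtering by subject name instead of by a specific Subject instance. This is only a sketch; keepSubject is a hypothetical helper name and the sample students mirror the ones in the question.

```scala
case class Subject(name: String)
case class Student(name: String, age: Int, subjects: List[Subject])

// Keep only the subjects called `subjectName`, then drop students left with none.
def keepSubject(students: List[Student], subjectName: String): List[Student] =
  students
    .map(s => s.copy(subjects = s.subjects.filter(_.name == subjectName)))
    .filter(_.subjects.nonEmpty)

val students = List(
  Student("abc", 20, List(Subject("maths"), Subject("science"))),
  Student("def", 20, List(Subject("english"), Subject("maths"))),
  Student("xyz", 20, List(Subject("science"), Subject("english")))
)

val mathsOnly = keepSubject(students, "maths")
```

Here mathsOnly holds "abc" and "def", each with only the maths subject, and "xyz" is dropped.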

scala get first key from seq of map

In Scala, I know that mySeq is an array of Map objects and that the array has only one element. I want to get the first key of this element. Why doesn't it work? It gives me the error: value keySet is not a member of (Int, String)
code:
val mySeq: Seq[(Int, String)] = ...
val aMap = mySeq(0)
val firstKey = aMap.keySet.head
That's actually a Seq of tuples:
val aTuple = mySeq(0)
val firstKey = aTuple._1
To declare a Seq of maps, you'd use:
val mySeq: Seq[Map[Int, String]] = ...
But note that it doesn't make much sense to get the first key of a map, since maps are usually unordered by design.
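If a well-defined "first" key is needed, one option is a map type with a documented iteration order. The sketch below assumes Scala 2.13+ (for SortedMap.from) and uses made-up data; it also uses headOption so that an empty Seq or an empty Map does not throw.

```scala
import scala.collection.immutable.{ListMap, SortedMap}

// Hypothetical sample data, for illustration only.
val seqOfMaps: Seq[Map[Int, String]] = Seq(Map(3 -> "c", 1 -> "a", 2 -> "b"))

// Some key of the first map; which one is not specified for a plain Map.
val someKey: Option[Int] = seqOfMaps.headOption.flatMap(_.keys.headOption)

// SortedMap iterates in key order, so its first key is the smallest key.
val smallestKey: Option[Int] =
  seqOfMaps.headOption.flatMap(m => SortedMap.from(m).keys.headOption)

// ListMap iterates in insertion order, so its first key is the first one added.
val firstInserted: Option[Int] = ListMap(3 -> "c", 1 -> "a").keys.headOption
```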

How to create links between vertices in RDD[(Long, Vertex)] based on a property?

I have a users: RDD[(Long, Vertex)] collection of users. I want to create links between my Vertex objects. The rule is: if two Vertex have the same value in a selected property - call it prop1, then a link exists.
My problem is how to check for every pair in the same collection. If I do:
val rels = users.map(
x => users.map(y => if(x._2.prop1 == y._2.prop1){(x._1, y._1)}))
I got back an RDD[RDD[Any]] and not a RDD[(Long, Long)] as expected for the Graph to work
First of all, you cannot start an action or a transformation from within another action or transformation, let alone create nested RDDs. So it is simply impossible that you get an RDD[RDD[Any]].
What you need here is most likely a simple join, roughly equivalent to the following, where T is the type of prop1:
val pairs: RDD[(T, Long)] = users.map{ case (id, v) => (v.prop1, id) }
val links: RDD[(Long, Long)] = pairs
  .join(pairs) // join by a common property, equivalent to INNER JOIN in SQL
  .values // drop properties
  .filter{ case (v1, v2) => v1 != v2 } // filter out self-links
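To see what this self-join produces without a Spark cluster, here is a sketch of the same logic on plain Scala collections. It is a local analogue, not Spark code: the Vertex class and the sample users are assumptions made for this example.

```scala
// Hypothetical stand-in for the Vertex type in the question.
case class Vertex(prop1: String)

val users: List[(Long, Vertex)] = List(
  1L -> Vertex("x"),
  2L -> Vertex("x"),
  3L -> Vertex("y")
)

// (property, id) pairs, as in the RDD version.
val pairs: List[(String, Long)] = users.map { case (id, v) => (v.prop1, id) }

// Local equivalent of pairs.join(pairs).values: every pair of ids sharing a property.
val joined: List[(Long, Long)] =
  for {
    (p1, id1) <- pairs
    (p2, id2) <- pairs
    if p1 == p2
  } yield (id1, id2)

// Drop self-links; note that both (1, 2) and (2, 1) remain.
val links: List[(Long, Long)] = joined.filter { case (a, b) => a != b }

// Keep one direction only, if the graph is meant to be undirected.
val undirected: List[(Long, Long)] = links.filter { case (a, b) => a < b }
```

The self-join yields each link in both directions, so a filter such as a < b may be worth adding when only one edge per pair is wanted.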