Find common elements in a map of sequences - scala

I have something like this:
val myMap: Map[Int, Seq[Int]] = Map(1 -> Seq(1, 2, 3), 2 -> Seq(2, 3, 4), 3 -> Seq(3, 4, 5), 4 -> Seq(4, 5, 6))
I am trying to find a way to relate all the keys and their common elements in the sequence they are mapped to.
For example:
1 and 2 share (2, 3)
1 and 3 share (3)
2 and 3 share (3, 4)
2 and 4 share (4)
3 and 4 share (4, 5)
I suspect I need to use intersect, but I am not sure how to go about it. I am brand new to Scala and functional programming and need a little help getting started. I know there are probably easier ways to do this with Spark; however, I am trying to stick to plain Scala.
Any help is greatly appreciated!

Here's one way using flatMap and collect to generate the shared values from every combination of the key pairs via intersect:
val myMap: Map[Int, List[Int]] = Map(
  1 -> List(1, 2, 3), 2 -> List(2, 3, 4), 3 -> List(3, 4, 5), 4 -> List(4, 5, 6)
)
val keys = myMap.keys.toList
keys.flatMap { i =>
  keys.collect {
    case j if j > i => (i, j, myMap(i) intersect myMap(j))
  }
}
// res1: List[(Int, Int, List[Int])] = List(
// (1,2,List(2, 3)),
// (1,3,List(3)),
// (1,4,List()),
// (2,3,List(3, 4)),
// (2,4,List(4)),
// (3,4,List(4, 5))
// )
The above is essentially the same as the following for comprehension:
for {
  i <- keys
  j <- keys
  if j > i
} yield (i, j, myMap(i) intersect myMap(j))

How do you want the results returned? Do you just want to print them to STDOUT?
myMap.keys.toList.combinations(2).foreach { case List(a, b) =>
  println(s"$a,$b --> ${myMap(a) intersect myMap(b)}")
}

Pretty similar to @jwvh's solution, but with fewer lookups in the map, in case it is big:
val myMap: Map[Int, Seq[Int]] = Map(1 -> Seq(1, 2, 3), 2 -> Seq(2, 3, 4), 3 -> Seq(3, 4, 5), 4 -> Seq(4, 5, 6))
myMap.toList.combinations(2).foreach {
  case List((i1, s1), (i2, s2)) =>
    val ints = s1.intersect(s2)
    if (ints.nonEmpty) {
      println(s"$i1 and $i2 share (${ints.mkString(", ")})")
    }
  case _ => ??? // never reached: combinations(2) only yields two-element lists
}
Code run at Scastie.
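If you would rather get the result back as a value than print it, the same idea can be sketched as below. The helper name sharedValues is made up for illustration, and sorting by key is an added assumption so that each pair comes out as (smaller key, larger key).

```scala
// Same data as in the question.
val myMap: Map[Int, Seq[Int]] =
  Map(1 -> Seq(1, 2, 3), 2 -> Seq(2, 3, 4), 3 -> Seq(3, 4, 5), 4 -> Seq(4, 5, 6))

// Hypothetical helper: every pair of keys whose sequences overlap, with the shared values.
def sharedValues(m: Map[Int, Seq[Int]]): List[(Int, Int, Seq[Int])] =
  m.toList
    .sortBy(_._1)       // fixed order, so pairs are emitted as (smaller key, larger key)
    .combinations(2)    // each unordered pair of entries exactly once
    .collect { case List((k1, s1), (k2, s2)) => (k1, k2, s1.intersect(s2)) }
    .filter { case (_, _, common) => common.nonEmpty }
    .toList

val shared = sharedValues(myMap)
// List((1,2,List(2, 3)), (1,3,List(3)), (2,3,List(3, 4)), (2,4,List(4)), (3,4,List(4, 5)))
```

The pair (1, 4) is dropped because its intersection is empty, which matches the example in the question.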

Related

Efficiently find common values in a map of lists - scala

I asked a similar question already here. However, I misjudged the scale of my specific case. In my example I gave, there were only 4 keys in the map. I am actually dealing with over 10,000 keys and they are mapped to lists of different sizes. So the solution given was correct, but I am now looking for a way that will do this in a more efficient manner.
Say I have:
val myMap: Map[Int, List[Int]] = Map(
1 -> List(1, 10, 12, 76, 105), 2 -> List(2, 5, 10), 3 -> List(10, 12, 76, 5), 4 -> List(2, 4, 5, 10),
... -> List(...)
)
Imagine the (...) go on for over 10,000 keys. I want to return a List of Lists containing a pair of keys and their shared values if the size of the intersection of their respective lists is >= 3.
For example:
res0: List[(Int, Int, List[Int])] = List(
(1, 3, List(10, 12, 76)),
(2, 4, List(2, 5, 10)),
(...),
(...),
)
I've been pretty stuck on this for a couple of days, so any help is genuinely appreciated. Thank you in advance!
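If space is not a concern, you can avoid comparing every pair of keys by building a reverse lookup first. The work is then close to O(N), where N is the total number of list elements, as long as each value appears under only a few keys; values shared by many keys push it back towards quadratic.
Algorithm:
1. Create a reverse lookup map from the input map. The reverse lookup maps each list element to the keys (IDs) whose lists contain it.
2. For each key of the input map:
   a. Create a temporary map of counts.
   b. Iterate over the key's list, look up each value in the reverse lookup, and increment the count of every ID found.
   c. Every ID (other than the key itself) that was counted 3 or more times forms a desired pair with the key.
Code: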
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
object Application extends App {
  val inputMap = Map(
    1 -> List(1, 2, 3, 4),
    2 -> List(2, 3, 4, 5),
    3 -> List(3, 5, 6, 7),
    4 -> List(1, 2, 3, 6, 7))
  /*
  Expected pairs
  | pair | common elements |
  ---------------------------
  (1, 2) -> 2, 3, 4
  (1, 4) -> 1, 2, 3
  (2, 1) -> 2, 3, 4
  (3, 4) -> 3, 6, 7
  (4, 1) -> 1, 2, 3
  (4, 3) -> 3, 6, 7
  */
  // Reverse lookup: list element -> ids of the keys whose lists contain it
  val reverseMap = mutable.Map[Int, ArrayBuffer[Int]]()
  inputMap.foreach {
    case (id, list) => list.foreach(
      o => if (reverseMap.contains(o)) reverseMap(o).append(id) else reverseMap.put(o, ArrayBuffer(id)))
  }
  val result = inputMap.map {
    case (id, list) =>
      // m counts, for every id (including this one), how many elements it shares with this id
      val m = mutable.Map[Int, Int]()
      list.foreach(o =>
        reverseMap(o).foreach(k => if (m.contains(k)) m.update(k, m(k)+1) else m.put(k, 1)))
      val res = m.toList.filter(o => o._2 >= 3 && o._1 != id).map(o => (id, o._1))
      res
  }.flatten
  println(result)
}
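The code above returns only the pairs of keys, while the question also asks for the shared values. A minimal immutable sketch of the same reverse-lookup idea that keeps those values could look like this. The name pairsSharingAtLeast and its minShared parameter are made up for illustration, and each pair is reported once as (smaller key, larger key) rather than in both directions.

```scala
// Hypothetical helper (not from the answer above): all key pairs that share at least
// `minShared` values, together with those values.
def pairsSharingAtLeast(input: Map[Int, List[Int]], minShared: Int): List[(Int, Int, List[Int])] = {
  // Reverse lookup: value -> keys whose list contains it.
  // `distinct` keeps a value repeated inside one list from being counted twice.
  val index: Map[Int, List[Int]] =
    input.toList
      .flatMap { case (id, values) => values.distinct.map(v => (v, id)) }
      .groupBy(_._1)
      .map { case (v, hits) => v -> hits.map(_._2) }

  // One ((smaller id, larger id), value) entry per value the two keys share.
  val cooccurrences: List[((Int, Int), Int)] =
    for {
      (v, ids) <- index.toList
      a <- ids
      b <- ids
      if a < b
    } yield ((a, b), v)

  cooccurrences
    .groupBy(_._1)
    .toList
    .map { case ((a, b), hits) => (a, b, hits.map(_._2).sorted) }
    .filter { case (_, _, common) => common.size >= minShared }
    .sortBy { case (a, b, _) => (a, b) }
}

// The four sample entries from the question.
val sample: Map[Int, List[Int]] = Map(
  1 -> List(1, 10, 12, 76, 105), 2 -> List(2, 5, 10), 3 -> List(10, 12, 76, 5), 4 -> List(2, 4, 5, 10)
)
val sharedPairs = pairsSharingAtLeast(sample, 3)
// List((1,3,List(10, 12, 76)), (2,4,List(2, 5, 10)))
```

The cost is driven by how many keys share each value: a value that appears under k keys contributes about k*k/2 entries, so this pays off when most values are rare and degrades when a few values appear under nearly every key.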

Scala grouping of Sequence of <Key, Value(Key)> to Map of <Key, Seq(Value)> [duplicate]

I have
val a = List((1,2), (1,3), (3,4), (3,5), (4,5))
I am using a.groupBy(_._1), which groups by the first element, but it gives me this output:
Map(1 -> List((1,2) , (1,3)) , 3 -> List((3,4), (3,5)), 4 -> List((4,5)))
But I want the answer to be
Map(1 -> List(2, 3), 3 -> List(4,5) , 4 -> List(5))
So, how can I do this?
You can do that by following up with mapValues (and a map over each value to extract the second element):
scala> a.groupBy(_._1).mapValues(_.map(_._2))
res2: scala.collection.immutable.Map[Int,List[Int]] = Map(4 -> List(5), 1 -> List(2, 3), 3 -> List(4, 5))
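On Scala 2.13 and later, calling mapValues directly on a Map is deprecated and produces a lazy view instead of a strict Map. A small sketch of the same approach for those versions, assuming you want a strict immutable Map back (the name grouped is only for illustration):

```scala
val a = List((1, 2), (1, 3), (3, 4), (3, 5), (4, 5))

// Scala 2.13+: mapValues lives on the view; toMap materialises a strict Map again.
val grouped: Map[Int, List[Int]] =
  a.groupBy(_._1).view.mapValues(_.map(_._2)).toMap
// Map(1 -> List(2, 3), 3 -> List(4, 5), 4 -> List(5))
```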
Make life easy with pattern match and Map#withDefaultValue:
scala> a.foldLeft(Map.empty[Int, List[Int]].withDefaultValue(Nil)){
case(r, (x, y)) => r.updated(x, r(x):+y)
}
res0: scala.collection.immutable.Map[Int,List[Int]] =
Map(1 -> List(2, 3), 3 -> List(4, 5), 4 -> List(5))
There are two points:
Map#withDefaultValue returns a map with the given default value, so you don't need to check whether the map contains a key.
Wherever Scala expects a function value (x1, x2, .., xn) => y, you can use a pattern-matching anonymous function { case (x1, x2, .., xn) => y } instead; the compiler translates it into a function automatically. See section 8.5, Pattern Matching Anonymous Functions, of the Scala Language Specification for more information.
Sorry for my poor english.
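To make the two points concrete, here is a tiny sketch. The values and names are made up for illustration.

```scala
val pairs = List((1, 2), (1, 3), (3, 4))

// An ordinary lambda has to unpack the tuple by hand ...
val sums1 = pairs.map(p => p._1 + p._2)
// ... while a pattern-matching anonymous function destructures it directly.
val sums2 = pairs.map { case (x, y) => x + y }
// both: List(3, 4, 7)

// withDefaultValue: looking up a missing key returns the default instead of throwing.
val counts = Map("a" -> 1).withDefaultValue(0)
val missing = counts("zzz")               // 0
val stillAbsent = counts.contains("zzz")  // false: the default is not stored in the map
```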
As of Scala 2.13 you can use groupMap, so you can write just:
// val list = List((1, 2), (1, 3), (3, 4), (3, 5), (4, 5))
list.groupMap(_._1)(_._2)
// Map(1 -> List(2, 3), 3 -> List(4, 5), 4 -> List(5))
As a variant (note that prepending with :: builds each list in reverse order):
a.foldLeft(Map[Int, List[Int]]()) {case (acc, (a,b)) => acc + (a -> (b::acc.getOrElse(a,List())))}
You can also do it with a foldLeft to have only one iteration.
a.foldLeft(Map.empty[Int, List[Int]])((map, t) =>
if(map.contains(t._1)) map + (t._1 -> (t._2 :: map(t._1)))
else map + (t._1 -> List(t._2)))
scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(3, 2), 3 ->
List(5, 4), 4 -> List(5))
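If the order of the elements in the lists matters, reverse each list once after the fold (reversing inside the fold on every step would scramble lists with more than two elements).
a.foldLeft(Map.empty[Int, List[Int]])((map, t) =>
  if(map.contains(t._1)) map + (t._1 -> (t._2 :: map(t._1)))
  else map + (t._1 -> List(t._2))
).map { case (k, vs) => k -> vs.reverse }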
scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(2, 3), 3 ->
List(4, 5), 4 -> List(5))

Scala -- List into List of Count and List of Element

Let's say I have a Scala list like this:
val mylist = List(4,2,5,6,4,4,2,6,5,6,6,2,5,4,4)
How can I transform it into a list of counts and a list of elements? For example, I want to convert mylist into:
val count = List(3,5,3,4)
val elements = List(2,4,5,6)
Which means, in mylist, I have three occurrences of 2, five occurrences of 4, etc.
In a procedural style this is easy, as I can just make two empty lists (for counts and elements) and fill them while iterating. However, I have no idea how to achieve this in Scala.
Arguably the shortest version:
val elements = mylist.distinct
val count = elements map (e => mylist.count(_ == e))
Use .groupBy(identity) to create a Map that groups each element with its occurrences:
scala> val mylist = List(4,2,5,6,4,4,2,6,5,6,6,2,5,4,4)
mylist: List[Int] = List(4, 2, 5, 6, 4, 4, 2, 6, 5, 6, 6, 2, 5, 4, 4)
scala> mylist.groupBy(identity)
res0: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(2, 2, 2), 5 -> List(5, 5, 5), 4 -> List(4, 4, 4, 4, 4), 6 -> List(6, 6, 6, 6))
Then you can use .mapValues(_.length) to change the 'value' part of the map to the size of the list:
scala> mylist.groupBy(identity).mapValues(_.length)
res1: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 5 -> 3, 4 -> 5, 6 -> 4)
If you want to get two lists out of this, you can use .unzip, which returns a tuple: the first part is the keys (i.e. the elements), the second is the values (i.e. the number of occurrences of each element in the original list):
scala> val (elements, counts) = mylist.groupBy(identity).mapValues(_.length).unzip
elements: scala.collection.immutable.Iterable[Int] = List(2, 5, 4, 6)
counts: scala.collection.immutable.Iterable[Int] = List(3, 3, 5, 4)
One way would be to use groupBy and then check the size of each "group":
val withSizes = mylist.groupBy(identity).toList.map { case (v, l) => (v, l.size) }
val count = withSizes.map(_._2)
val elements = withSizes.map(_._1)
You can also try this alternative, step-by-step way of doing the same thing.
Step 1 - define the list
scala> val mylist = List(4,2,5,6,4,4,2,6,5,6,6,2,5,4,4)
mylist: List[Int] = List(4, 2, 5, 6, 4, 4, 2, 6, 5, 6, 6, 2, 5, 4, 4)
Step 2 - groupBy(x => x) returns a Map[Int, List[Int]]
scala> mylist.groupBy(x => (x))
res0: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(2, 2, 2), 5 -> List(5, 5, 5), 4 -> List(4, 4, 4, 4, 4), 6 -> List(6, 6, 6, 6))
Step 3 - map each group to its size
scala> mylist.groupBy(x => (x)).map{case(num,times) =>(num,times.size)}.toList
res1: List[(Int, Int)] = List((2,3), (5,3), (4,5), (6,4))
Step 4 - sort by num
scala> mylist.groupBy(x => (x)).map{case(num,times) =>(num,times.size)}.toList.sortBy(_._1)
res2: List[(Int, Int)] = List((2,3), (4,5), (5,3), (6,4))
Step 5 - unzip to break the list of pairs into a tuple of two lists
scala> mylist.groupBy(x => (x)).map{case(num,times) =>(num,times.size)}.toList.sortBy(_._1).unzip
res3: (List[Int], List[Int]) = (List(2, 4, 5, 6),List(3, 5, 3, 4))
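On Scala 2.13 or later, the grouping and counting steps can be combined with groupMapReduce, which avoids building the intermediate lists of equal elements. A short sketch, assuming the elements should come out in ascending order as in the question (the name countsByElement is only for illustration):

```scala
val mylist = List(4, 2, 5, 6, 4, 4, 2, 6, 5, 6, 6, 2, 5, 4, 4)

// groupMapReduce (Scala 2.13+): key = the element itself,
// map each occurrence to 1, then sum the ones per key.
val countsByElement: Map[Int, Int] = mylist.groupMapReduce(identity)(_ => 1)(_ + _)

// Sort by element so both lists line up in ascending order, then split the pairs.
val (elements, count) = countsByElement.toList.sortBy(_._1).unzip
// elements: List(2, 4, 5, 6)
// count:    List(3, 5, 3, 4)
```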

How to concatenate lists that are values of map?

Given:
scala> var a = Map.empty[String, List[Int]]
a: scala.collection.immutable.Map[String,List[Int]] = Map()
scala> a += ("AAA" -> List[Int](1,3,4))
scala> a += ("BBB" -> List[Int](4,1,4))
scala> a
res0: scala.collection.immutable.Map[String,List[Int]] = Map(AAA -> List(1, 3, 4), BBB -> List(4, 1, 4))
How can I concatenate the values into a single iterable collection (to be sorted)?
List(1, 3, 4, 4, 1, 4)
How should I end this code?
a.values.[???].sorted
You should end it with:
a.values.flatten
Result:
scala> Map("AAA" -> List(1, 3, 4), "BBB" -> List(4, 1, 4))
res50: scala.collection.immutable.Map[String,List[Int]] = Map(AAA -> List(1, 3, 4), BBB -> List(4, 1, 4))
scala> res50.values.flatten
res51: Iterable[Int] = List(1, 3, 4, 4, 1, 4)
Updated:
For your specific case it's:
(for(vs <- a.asScala.values; v <- vs.asScala) yield v.asInstanceOf[TargetType]).sorted
This will work
a.values.flatten
//> res0: Iterable[Int] = List(1, 3, 4, 4, 1, 4)
Consider
a.flatMap(_._2)
which flattens the second element of each tuple (each value in the map).
Equivalent in this case is also
a.values.flatMap(identity)
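One detail for the sorting step in the question: all of the expressions above produce an Iterable[Int], and Iterable has no sorted method, so the result has to be converted to a sequence first. A minimal sketch with the data from the question (the name sortedValues is only for illustration):

```scala
val a: Map[String, List[Int]] = Map("AAA" -> List(1, 3, 4), "BBB" -> List(4, 1, 4))

// flatten yields an Iterable[Int]; toList turns it into a Seq, which can be sorted.
val sortedValues: List[Int] = a.values.flatten.toList.sorted
// List(1, 1, 3, 4, 4, 4)
```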
Thank you for all the answers; the good points in them finally led to working code. Below is the real code fragment. Here a is an org.apache.hadoop.hbase.client.Put, and that is where the devil in the details lies. I needed to convert an HBase Put into a list of its data cells (accessible from a Put through the org.apache.hadoop.hbase.Cell interface), while also relying on the fact that they are actually implemented as KeyValue (org.apache.hadoop.hbase.KeyValue).
val a : Put ...
a.getFamilyCellMap.asScala
.flatMap(
_._2.asScala.flatMap(
x => List[KeyValue](x.asInstanceOf[KeyValue]) )
).toList.sorted
Why so complex?
Put is a Java type that represents the content of a 'write' operation, and its cells are only reachable through a map of cell families whose values are lists. All of these are, of course, Java collections.
I only have access to the interface (Cell) but need the implementation (KeyValue), so a downcast is required. I have a guarantee that nothing else is present.
The funniest thing is that, after all of this, I decided to drop the standard Put and encapsulate the data in a different container (my own custom class) at an earlier stage, which made things much simpler.
So a more generic answer for this case, where a is a java.util.Map[?] whose values are java.util.List[?] and the list elements are of BaseType but you need TargetType, is probably:
a.asScala.flatMap(
_._2.asScala.flatMap(
x => List[TargetType](x.asInstanceOf[TargetType]) )
).toList.sorted
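For readers without HBase at hand, the same shape can be tried with plain JDK types. This is only a sketch under assumptions: Number stands in for BaseType, java.lang.Integer for TargetType, and scala.jdk.CollectionConverters requires Scala 2.13 or later. It uses collect with a type pattern instead of asInstanceOf, so an element of an unexpected type would be skipped rather than cause a ClassCastException.

```scala
import scala.jdk.CollectionConverters._ // Scala 2.13+

// Stand-in data: a Java map whose values are Java lists of the base type.
val javaMap: java.util.Map[String, java.util.List[Number]] = {
  val m = new java.util.HashMap[String, java.util.List[Number]]()
  m.put("AAA", java.util.Arrays.asList[Number](Integer.valueOf(1), Integer.valueOf(3), Integer.valueOf(4)))
  m.put("BBB", java.util.Arrays.asList[Number](Integer.valueOf(4), Integer.valueOf(1), Integer.valueOf(4)))
  m
}

// Flatten all the Java lists, keep only elements of the target type, and sort.
val flattened: List[Int] =
  javaMap.asScala.values
    .flatMap(_.asScala)
    .collect { case i: java.lang.Integer => i.intValue() }
    .toList
    .sorted
// List(1, 1, 3, 4, 4, 4)
```

If every element is guaranteed to be of the target type, as in the HBase case above, the downcast and the type pattern give the same result; the pattern is just the more defensive choice.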