Efficiently find common values in a map of lists - scala - scala

I asked a similar question already here. However, I misjudged the scale of my specific case. In my example I gave, there were only 4 keys in the map. I am actually dealing with over 10,000 keys and they are mapped to lists of different sizes. So the solution given was correct, but I am now looking for a way that will do this in a more efficient manner.
Say I have:
val myMap: Map[Int, List[Int]] = Map(
1 -> List(1, 10, 12, 76, 105), 2 -> List(2, 5, 10), 3 -> List(10, 12, 76, 5), 4 -> List(2, 4, 5, 10),
... -> List(...)
)
Imagine the (...) go on for over 10,000 keys. I want to return a List of Lists containing a pair of keys and their shared values if the size of the intersection of their respective lists is >= 3.
For example:
res0: List[(Int, Int, List[Int])] = List(
(1, 3, List(10, 12, 76)),
(2, 4, List(2, 5, 10)),
(...),
(...),
)
I've been pretty stuck on this for a couple of days, so any help is genuinely appreciated. Thank you in advance!

If space is not the concern then the problem can be solved in the O(N) where N is the number of elements in the list.
Algorithm:
Create a reverse lookup map out from the input map. Here reverse lookup maps the list element to the key (Id).
For each input map key
Create a temp map
Iterate over the list and look for value (Id) in the reverse lookup. Count the number of occurred for the fetched id.
All key which occurred equal or more than 3 times is the desired pair.
Code
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
object Application extends App {
val inputMap = Map(
1 -> List(1, 2, 3, 4),
2 -> List(2, 3, 4, 5),
3 -> List(3, 5, 6, 7),
4 -> List(1, 2, 3, 6, 7))
/*
Expected pairs
| pair | common elements |
---------------------------
(1, 2) -> 2, 3, 4
(1, 4) -> 2, 3, 4
(2, 1) -> 2, 3, 4
(3, 4) -> 3, 5, 6
(4, 1) -> 1, 2, 3
(4, 3) -> 3, 5, 6
*/
val reverseMap = mutable.Map[Int, ArrayBuffer[Int]]()
inputMap.foreach {
case (id, list) => list.foreach(
o => if (reverseMap.contains(o)) reverseMap(o).append(id) else reverseMap.put(o, ArrayBuffer(id)))
}
val result = inputMap.map {
case (id, list) =>
val m = mutable.Map[Int, Int]()
list.foreach(o =>
reverseMap(o).foreach(k => if (m.contains(k)) m.update(k, m(k)+1) else m.put(k, 1)))
val res = m.toList.filter(o => o._2 >= 3 && o._1 != id).map(o => (id, o._1))
res
}.flatten
println(result)
}

Related

Find common elements in a map of sequences - scala

I have something like this:
val myMap: Map[Int, Seq[Int]] = Map(1 -> (1, 2, 3), 2 -> (2, 3, 4), 3 -> (3, 4, 5), 4 -> (4, 5, 6))
I am trying to find a way to relate all the keys and their common elements in the sequence they are mapped to.
For example:
1 and 2 share (2, 3)
1 and 3 share (3)
2 and 3 share (3, 4)
2 and 4 share (4)
3 and 4 share (4, 5)
I suspect I need to use intersect but I am not sure how to go about the problem. I am brand new to scala and functional programming and need a little help getting started on this. I know there are probably easier ways to do this with spark, however, I am trying to stick just to scala.
Any help is greatly appreciated!
Here's one way using flatMap and collect to generate the shared values from every combination of the key pairs via intersect:
val myMap: Map[Int, List[Int]] = Map(
1 -> List(1, 2, 3), 2 -> List(2, 3, 4), 3 -> List(3, 4, 5), 4 -> List(4, 5, 6)
)
val keys = myMap.keys.toList
keys.flatMap{ i => keys.collect{
case j if j > i => (i, j, myMap(i) intersect myMap(j))
}
}
// res1: List[(Int, Int, List[Int])] = List(
// (1,2,List(2, 3)),
// (1,3,List(3)),
// (1,4,List()),
// (2,3,List(3, 4)),
// (2,4,List(4)),
// (3,4,List(4, 5))
// )
The above is essentially the same as the following for comprehension:
for {
i <- keys
j <- keys
if j > i
} yield (i, j, myMap(i) intersect myMap(j))
How do you want the results returned? Do you just want to print them to STDOUT?
myMap.keys.toList.combinations(2).foreach{ case List(a,b) =>
println(s"$a,$b --> ${myMap(a) intersect myMap(b)}")
}
Pretty similar to #jwvh solution, but with less lookups in the map, in case it is big:
val myMap: Map[Int, Seq[Int]] = Map(1 -> Seq(1, 2, 3), 2 -> Seq(2, 3, 4), 3 -> Seq(3, 4, 5), 4 -> Seq(4, 5, 6))
myMap.toList.combinations(2).foreach {
case List((i1, s1), (i2, s2)) =>
val ints = s1.intersect(s2)
if (ints.nonEmpty) {
println(s"$i1 and $i2 share (${ints.mkString(", ")})")
}
case _ => ???
}
Code run at Scastie.

How to find unique elements from list of list of string based on some elements using scala?

I want to filter the list of list based on few elements in it.
pritnln("databind = "datamap)
dataBind = List((List(3,60,90,T3,T6),List(3,90,89,T32,T5),List(3,60,90,T5,T6), List(3,120,89,T32,T5))
I want to filter this list[list[string]] based on unique first elements present in each list. If the first three elements are repeating, I don't want to get it.
My expected output
List((List(3,60,90,T3,T6),List(3,90,89,T32,T5),List(3,120,89,T32,T5))
When I checked in some question they have used for list of tuple
databind.groupBy(v => (v._1, v._2, v._3)).keys.ToList
How can I do this for the above mentioned list ?
Assuming your lists always contain more than 3 elements:
scala> val l = List(List(1,2,3,4), List(2,3,4,5), List(1,2,3,5))
l: List[List[Int]] = List(List(1, 2, 3, 4), List(2, 3, 4, 5), List(1, 2, 3, 5))
scala> l.groupBy(_.take(3))
res1: scala.collection.immutable.Map[List[Int],List[List[Int]]] = Map(List(1, 2, 3) -> List(List(1, 2, 3, 4), List(1, 2, 3, 5)), List(2, 3, 4) -> List(List(2, 3, 4, 5)))
It is then up to you what you do with the groups. For example if you only want the Lists with a unique first 3 elements then:
scala> res1.collect{ case (_, List(l)) => l}
res2: scala.collection.immutable.Iterable[List[Int]] = List(List(2, 3, 4, 5))
Since you want to group lists based on first few elements of every item, you can go about doing the following:
scala> val dataBind = List(List(3,60,90,"T3","T6"),List(3,90,89,"T32","T5"),List(3,60,90,"T5","T6"), List(3,120,89,"T32","T5"))
dataBind: List[List[Any]] = List(List(3, 60, 90, T3, T6), List(3, 90, 89, T32, T5), List(3, 60, 90, T5, T6), List(3, 120, 89, T32, T5))
scala> dataBind.groupBy(_.take(3)).mapValues(_.head).values.toList
res8: List[List[Any]] = List(List(3, 120, 89, T32, T5), List(3, 60, 90, T3, T6), List(3, 90, 89, T32, T5))
You can specify the transformation of your choice inside mapValues method to derive the desired result.

Scala -- List into List of Count and List of Element

Let's say I have Scala list like this:
val mylist = List(4,2,5,6,4,4,2,6,5,6,6,2,5,4,4)
How can I transform it into list of count and list of element? For example, I want to convert mylist into:
val count = List(3,5,3,4)
val elements = List(2,4,5,6)
Which means, in mylist, I have three occurrences of 2, five occurrences of 4, etc.
In procedural, this is easy as I can just make two empty lists (for count and elements) and fill them while doing iteration. However, I have no idea on how to achieve this in Scala.
Arguably a shortest version:
val elements = mylist.distinct
val count = elements map (e => mylist.count(_ == e))
Use .groupBy(identity) to create a Map regrouping elements with their occurences:
scala> val mylist = List(4,2,5,6,4,4,2,6,5,6,6,2,5,4,4)
mylist: List[Int] = List(4, 2, 5, 6, 4, 4, 2, 6, 5, 6, 6, 2, 5, 4, 4)
scala> mylist.groupBy(identity)
res0: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(2, 2, 2), 5 -> List(5, 5, 5), 4 -> List(4, 4, 4, 4, 4), 6 -> List(6, 6, 6, 6))
Then you can use .mapValues(_.length) to change the 'value' part of the map to the size of the list:
scala> mylist.groupBy(identity).mapValues(_.length)
res1: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 5 -> 3, 4 -> 5, 6 -> 4)
If you want to get 2 lists out of this you can use .unzip, which returns a tuple, the first part being the keys (ie the elements), the second being the values (ie the number of instances of the element in the original list):
scala> val (elements, counts) = mylist.groupBy(identity).mapValues(_.length).unzip
elements: scala.collection.immutable.Iterable[Int] = List(2, 5, 4, 6)
counts: scala.collection.immutable.Iterable[Int] = List(3, 3, 5, 4)
One way would be to use groupBy and then check the size of each "group":
val withSizes = mylist.groupBy(identity).toList.map { case (v, l) => (v, l.size) }
val count = withSizes.map(_._2)
val elements = withSizes.map(_._1)
You can try like this as well alternative way of doing the same.
Step - 1
scala> val mylist = List(4,2,5,6,4,4,2,6,5,6,6,2,5,4,4)
mylist: List[Int] = List(4, 2, 5, 6, 4, 4, 2, 6, 5, 6, 6, 2, 5, 4, 4)
// Use groupBy { x => x } returns a "Map[Int, List[Int]]"
step - 2
scala> mylist.groupBy(x => (x))
res0: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(2, 2, 2), 5 -> List(5, 5, 5), 4 -> List(4, 4, 4, 4, 4), 6 -> List(6, 6, 6, 6))
step - 3
scala> mylist.groupBy(x => (x)).map{case(num,times) =>(num,times.size)}.toList
res1: List[(Int, Int)] = List((2,3), (5,3), (4,5), (6,4))
step -4 - sort by num
scala> mylist.groupBy(x => (x)).map{case(num,times) =>(num,times.size)}.toList.sortBy(_._1)
res2: List[(Int, Int)] = List((2,3), (4,5), (5,3), (6,4))
step -5 - unzip to beak into to list it return tuple
scala> mylist.groupBy(x => (x)).map{case(num,times) =>(num,times.size)}.toList.sortBy(_._1).unzip
res3: (List[Int], List[Int]) = (List(2, 4, 5, 6),List(3, 5, 3, 4))

How do i concat the Int or Long to a List in Scala? [duplicate]

This question already has an answer here:
How to add data to a TrieMap[Long,List[Long]] in Scala
(1 answer)
Closed 5 years ago.
I have this:
val vertexIdListPartitions: TrieMap[Long, List[Long]]
I need to have something like this:
vertexIdListPartitions(0) -> List[2,3,4,5,etc..]
But when I add numbers in the list in this way:
for(c<- 0 to 10)
vertexIdListPartitions.update(0,List(c))
The result is List[10]
How can I concat them?
If I understand your question right, you don't need a for loop:
import scala.collection.concurrent.TrieMap
val vertexIdListPartitions = TrieMap[Long, List[Long]]()
vertexIdListPartitions.update(0, (0L to 10L).toList)
// res1: scala.collection.concurrent.TrieMap[Long,List[Long]] =
// TrieMap(0 -> List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
[UPDATE]
Below is a method for adding or concatenating a key-value tuple to the TrieMap accordingly:
def concatTM( tm: TrieMap[Long, List[Long]], kv: Tuple2[Long, List[Long]] ) =
tm += ( tm.get(kv._1) match {
case Some(l: List[Long]) => (kv._1 -> (l ::: kv._2))
case None => kv
} )
concatTM( vertexIdListPartitions, (1L, List(1L, 2L, 3L)) )
// res2: scala.collection.concurrent.TrieMap[Long,List[Long]] =
// TrieMap(0 -> List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 1 -> List(1, 2, 3))
concatTM( vertexIdListPartitions, (0L, List(11L, 12L)) )
// res61: scala.collection.concurrent.TrieMap[Long,List[Long]] =
// TrieMap(0 -> List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 1 -> List(1, 2, 3))

All possible permutations of values for such a map

Consider such a map:
Map("one" -> Iterable(1,2,3,4), "two" -> Iterable(3,4,5), "three" -> Iterable(1,2))
I want to get a list of all possible permutations of elements under Iterable, one element for each key. For this example, this would be something like:
// first element of "one", first element of "two", first element of "three"
// second element of "one", second element of "two", second element of "three"
// third element of "one", third element of "two", first element of "three"
// etc.
Seq(Iterable(1,3,1), Iterable(2,4,2), Iterable(3,5,1),...)
What would be a good way to accomplish that?
val m = Map("one" -> Iterable(1,2,3,4), "two" -> Iterable(5,6,7), "three" -> Iterable(8,9))
If you want every combination:
for (a <- m("one"); b <- m("two"); c <- m("three")) yield Iterable(a,b,c)
If you want each iterable to march up together, but stop when the shortest is exhuasted:
(m("one"), m("two"), m("three")).zipped.map((a,b,c) => Iterable(a,b,c))
If you want each iterable to wrap around but stop when the longest one has been exhausted:
val maxlen = m.values.map(_.size).max
def icf[A](i: Iterable[A]) = Iterator.continually(i).flatMap(identity).take(maxlen).toList
(icf(m("one")), icf(m("two")), icf(m("three"))).zipped.map((a,b,c) => Iterable(a,b,c))
Edit: If you want arbitrary numbers of input lists, then you're best off with recursive functions. For Cartesian products:
def cart[A](iia: Iterable[Iterable[A]]): List[List[A]] = {
if (iia.isEmpty) List()
else {
val h = iia.head
val t = iia.tail
if (t.isEmpty) h.map(a => List(a)).toList
else h.toList.map(a => cart(t).map(x => a :: x)).flatten
}
}
and to replace zipped you want something like:
def zipper[A](iia: Iterable[Iterable[A]]): List[List[A]] = {
def zipp(iia: Iterable[Iterator[A]], part: List[List[A]] = Nil): List[List[A]] = {
if (iia.isEmpty || !iia.forall(_.hasNext)) part
else zipp(iia, iia.map(_.next).toList :: part)
}
zipp(iia.map(_.iterator))
}
You can try these out with cart(m.values), zipper(m.values), and zipper(m.values.map(icf)).
If you are out for an cartesian product, I have a solution for lists of lists of something.
xproduct (List (List (1, 2, 3, 4), List (3, 4, 5), List (1, 2)))
res3: List[List[_]] = List(List(1, 3, 1), List(2, 3, 1), List(3, 3, 1), List(4, 3, 1), List(1, 3, 2), List(2, 3, 2), List(3, 3, 2), List(4, 3, 2), List(1, 4, 1), List(2, 4, 1), List(3, 4, 1), List(4, 4, 1), List(1, 4, 2), List(2, 4, 2), List(3, 4, 2), List(4, 4, 2), List(1, 5, 1), List(2, 5, 1), List(3, 5, 1), List(4, 5, 1), List(1, 5, 2), List(2, 5, 2), List(3, 5, 2), List(4, 5, 2))
Invoke it with Rex' m:
xproduct (List (m("one").toList, m("two").toList, m("three").toList))
Have a look at this answer. The question is about a fixed number of lists to combine, but some answers address the general case.