How to merge two LinkedHashMaps[Int, ListBuffer[Int]] in Scala?

I found this method:
def merge[K, V](maps: Seq[Map[K, V]])(f: (K, V, V) => V): Map[K, V] = {
maps.foldLeft(Map.empty[K, V]) { case (merged, m) =>
m.foldLeft(merged) { case (acc, (k, v)) =>
acc.get(k) match {
case Some(existing) => acc.updated(k, f(k, existing, v))
case None => acc.updated(k, v)
}
}
}
}
but it gives me a type mismatch error if I use it like this:
val mergeMsg = (map1: LinkedHashMap[Int, ListBuffer[Int]], map2: LinkedHashMap[Int, ListBuffer[Int]]) =>
{
val ms=Seq(map1, map2)
merge(ms.map(_.mapValues(List(_)))){(_, v1, v2) => v1 ++ v2}
}
The error says:
"Type mismatch, expected: mutable.Seq[Mutable.Map[NotInferedK, NotInferedV]], actual: mutable.Seq[Map[Int, List[ListBuffer[Int]]]]"
How can I solve this? I know it's something simple, but I'm new to Scala.

The problem is that you are passing in to merge a sequence of mutable LinkedHashMaps. The function requires a sequence of immutable Maps.
You need to convert your LinkedHashMaps to the correct type first. The simplest way to do this is probably to call .toMap before you perform the mapValues.
merge(ms.map(_.toMap.mapValues(List(_)))){(_, v1, v2) => v1 ++ v2}
Update
Alternatively, the signature of merge can be changed to use scala.collection.Map explicitly, which is the common supertype of the mutable and immutable maps. An unqualified Map refers to scala.collection.immutable.Map by default.
def merge[K, V](maps: Seq[scala.collection.Map[K, V]])(f: (K, V, V) => V): scala.collection.Map[K, V]

val mergeMsg = (map1: LinkedHashMap[Int, ListBuffer[Int]],
map2: LinkedHashMap[Int, ListBuffer[Int]]) => {
val ms = Seq (map1.toMap, map2.toMap)
merge (ms) ((_, lb1, lb2) => (lb1 ++ lb2))
}
So the maps just need to be converted to the immutable Map type.
The key k is not used when combining the values, so we use _ instead.
lb1 and lb2 stand for the two ListBuffers.
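As a minimal end-to-end sketch of the conversion described above (the sample maps map1 and map2 are made up for illustration, and merge is a slightly condensed version of the method from the question):

```scala
import scala.collection.mutable.{LinkedHashMap, ListBuffer}

// Same behaviour as the merge method from the question: combine values with f
// when a key is already present, otherwise just add the new entry.
def merge[K, V](maps: Seq[Map[K, V]])(f: (K, V, V) => V): Map[K, V] =
  maps.foldLeft(Map.empty[K, V]) { (merged, m) =>
    m.foldLeft(merged) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).fold(v)(existing => f(k, existing, v)))
    }
  }

// Hypothetical sample data.
val map1 = LinkedHashMap(1 -> ListBuffer(1, 2), 2 -> ListBuffer(3))
val map2 = LinkedHashMap(2 -> ListBuffer(4), 3 -> ListBuffer(5))

// toMap turns each mutable LinkedHashMap into an immutable Map.
val ms = Seq(map1.toMap, map2.toMap)
val merged: Map[Int, ListBuffer[Int]] = merge(ms)((_, lb1, lb2) => lb1 ++ lb2)
```

Note that the immutable Map returned by toMap does not guarantee the insertion order of the original LinkedHashMap, so this is only suitable if that order does not matter for the result.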

Related

GroupBy with list of keys -- Scala

I'm micro-optimising some code as a challenge.
I have a list of objects with a list of keys in each of them.
What's the most efficient way of grouping them by key, with each object being in every group for which it has a key?
This is what I have, but I have a feeling it can be improved.
I have many objects (100k+), each has ~2 keys, and there are fewer than 50 possible keys.
I've tried parallelising the list with listOfObjs.par, but there doesn't seem to be much of an improvement overall.
case class Obj(value: Option[Int], key: Option[List[String]])
listOfObjs
.filter(x => x.key.isDefined && x.value.isDefined)
.flatMap(x => x.key.get.map((_, x.value.get)))
.groupBy(_._1)
If you have that many objects, the logical next step would be to distribute the work by using a MapReduce framework. At the end of the day you still need to go over every single object to determine the groups it belongs to, and your worst case will be bottlenecked by that.
The best you can do here is to replace these 3 operations by a fold so you only iterate through the collection once.
Edit: Updated the order based on Luis' recommendation in the comments
listOfObjs.foldLeft(Map.empty[String, List[Int]]) { (acc, obj) =>
  (obj.key, obj.value) match {
    case (Some(k), Some(v)) =>
      // Prepend v to the group of every key this object carries.
      k.foldLeft(acc)((a, ky) => a + (ky -> (v +: a.getOrElse(ky, List.empty))))
    case _ => acc
  }
}
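To make the single-pass fold concrete, here is a small self-contained sketch run on made-up sample data (the four Obj values are assumptions for illustration only):

```scala
final case class Obj(value: Option[Int], key: Option[List[String]])

// Hypothetical sample input: two complete objects and two incomplete ones.
val listOfObjs = List(
  Obj(Some(1), Some(List("a", "b"))),
  Obj(Some(2), Some(List("b"))),
  Obj(None, Some(List("a"))),
  Obj(Some(3), None)
)

// One traversal: objects missing a value or a key list are skipped,
// every other object is prepended to the group of each of its keys.
val grouped: Map[String, List[Int]] =
  listOfObjs.foldLeft(Map.empty[String, List[Int]]) { (acc, obj) =>
    (obj.key, obj.value) match {
      case (Some(keys), Some(v)) =>
        keys.foldLeft(acc)((a, k) => a.updated(k, v :: a.getOrElse(k, Nil)))
      case _ => acc
    }
  }
```

Because values are prepended, each group ends up in reverse encounter order; reverse the lists at the end if the original order matters.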
I got the impression you are looking for a fast alternative; thus a little bit of encapsulated mutability can help.
So, what about something like this:
import scala.collection.{immutable, mutable}
def groupObjectsByKey(objects: List[Obj]): Map[String, List[Int]] = {
  // Lazily expand every object into (key, value) pairs.
  val iter =
    objects.iterator.flatMap {
      case Obj(Some(value), Some(keys)) =>
        keys.iterator.map(key => key -> value)
      case _ =>
        Iterator.empty[(String, Int)]
    }
  // One list builder per key, so the encounter order of each group is kept.
  val m =
    mutable
      .Map
      .empty[String, mutable.Builder[Int, List[Int]]]
  iter.foreach {
    case (k, v) =>
      m.get(key = k) match {
        case Some(builder) =>
          builder.addOne(v)
        case None =>
          m.update(key = k, value = List.newBuilder[Int].addOne(v))
      }
  }
  immutable
    .Map
    .mapFactory[String, List[Int]]
    .fromSpecific(m.view.mapValues(_.result()))
}
Or if you don't care about the order of the elements of each group we can simplify and speed up the code a lot:
import scala.collection.{immutable, mutable}
def groupObjectsByKey(objects: List[Obj]): Map[String, List[Int]] = {
  val iter = objects.iterator.flatMap {
    case Obj(Some(value), Some(keys)) =>
      keys.iterator.map(key => key -> value)
    case _ =>
      Iterator.empty[(String, Int)]
  }
  val m = mutable.Map.empty[String, List[Int]]
  iter.foreach {
    case (k, v) =>
      // updateWith takes the remapping function as a second argument list.
      m.updateWith(key = k) {
        case Some(list) =>
          Some(v :: list)
        case None =>
          Some(v :: Nil)
      }
  }
  m.to(immutable.Map)
}
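The updateWith step is the core of this variant. A tiny sketch on made-up pairs (the pairs list is an assumption for illustration) shows how it builds the groups:

```scala
import scala.collection.mutable

// Hypothetical (key, value) pairs, as produced by the iterator in the answer.
val pairs = List("a" -> 1, "b" -> 2, "a" -> 3)

val groups = mutable.Map.empty[String, List[Int]]
pairs.foreach { case (k, v) =>
  // The function receives the current value (if any) and returns the new one.
  groups.updateWith(k) {
    case Some(list) => Some(v :: list)
    case None       => Some(v :: Nil)
  }
}

val result: Map[String, List[Int]] = groups.toMap
```

Since each value is prepended, the groups come out in reverse encounter order, which is the trade-off this answer accepts.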

Get rid of Option[T] when getting values from Map[T, T]

I am writing my own Semigroup for Map[T, T]. The logic of the function is as follows:
if a key is present in both maps, the two values should be combined in the result map;
if a key is present in only one map, its value should be added to the result map as is.
I wrote this function, but I ran into a problem: the get(key) method returns not just T, but Option[T].
What are the ways to solve this problem?
trait Semigroup[T]:
  extension (left: T) def combine(right: T): T
given Semigroup[Int] with
  extension (left: Int) def combine(right: Int): Int = ???
given Semigroup[String] with
  extension (left: String) def combine(right: String): String = ???
given [T]: Semigroup[List[T]] with
  extension (left: List[T]) def combine(right: List[T]): List[T] = ???
given [T: Semigroup]: Semigroup[Map[String, T]] with
  extension (left: Map[String, T])
    def combine(right: Map[String, T]): Map[String, T] = {
      (left.keySet ++ right.keySet) map { key => key -> (summon[Semigroup[T]].combine(left.get(key)) { right.get(key)} ) } toMap
    }
You do not want to remove the Option: it is what tells you whether the key is present in both maps, in which case the values must be combined, or in only one of them, in which case the value is simply preserved.
(left.keySet | right.keySet).iterator.map { key =>
val value = (left.get(key), right.get(key)) match {
case (Some(v1), Some(v2)) => v1.combine(v2)
case (None, Some(v)) => v
case (Some(v), None) => v
case (None, None) => ??? // Should never happen.
}
key -> value
}.toMap
Option is a good thing :) It protects you from crashing when key is not in a map.
(left.keySet ++ right.keySet)
  .map { key => key -> (left.get(key), right.get(key)) }
  .flatMap { case (k, (l, r)) =>
    l.zip(r)
      .map { case (x, y) => summon[Semigroup[T]].combine(x)(y) }
      .orElse(l)
      .orElse(r)
      .map(k -> _)
  }.toMap
Or, maybe make a special semigroup for options (I am not sure about scala3 syntax, but something like this I think):
given [T: Semigroup]: Semigroup[Option[T]] with
  extension (left: Option[T]) def combine(right: Option[T]): Option[T] =
    left.zip(right).map { case (l, r) => summon[Semigroup[T]].combine(l)(r) }
      .orElse(left)
      .orElse(right)
Then you can just write
(left.keySet ++ right.keySet).map { key =>
  key -> summon[Semigroup[Option[T]]].combine(left.get(key))(right.get(key))
}
I just thought about a better option :) You should not have to deal with merging map keys explicitly ...
(left.toSeq ++ right.toSeq)
.groupMapReduce(_._1)(_._2)(summon[Semigroup[T]].combine(_)(_))
groupMapReduce feels a little like black magic, but it's basically a combination of groupBy(_._1), mapValues(_.map(_._2)) (dropping the key from the grouped lists), and reduce(combine) (combining the grouped values in each list).
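Putting the groupMapReduce idea together, a minimal Scala 3 sketch could look like the following. The Int instance using addition and the sample maps are assumptions for illustration, since the question leaves the instances as ???:

```scala
trait Semigroup[T]:
  extension (left: T) def combine(right: T): T

// Assumption: Int values are combined by addition.
given Semigroup[Int] with
  extension (left: Int) def combine(right: Int): Int = left + right

given [T: Semigroup]: Semigroup[Map[String, T]] with
  extension (left: Map[String, T])
    def combine(right: Map[String, T]): Map[String, T] =
      // Group all entries by key, keep only the values,
      // and reduce each group with the Semigroup of T.
      (left.toSeq ++ right.toSeq)
        .groupMapReduce(_._1)(_._2)(summon[Semigroup[T]].combine(_)(_))

// Hypothetical sample maps.
val combined = Map("a" -> 1, "b" -> 2).combine(Map("b" -> 3, "c" -> 4))
```

A key that appears in only one map forms a group of one element, so reduce returns it unchanged and no Option handling is needed.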

Scala - Join List of tuples by Key

I am looking for a way to join two lists of tuples in Scala so that I get the same result that Apache Spark gives me with its join function.
Example:
Having two lists of tuples such as:
val l1 = List((1,1),(1,2),(2,1),(2,2))
l1: List[(Int, Int)] = List((1,1), (1,2), (2,1), (2,2))
val l2 = List((1,(1,2)), (2,(2,3)))
l2: List[(Int, (Int, Int))] = List((1,(1,2)), (2,(2,3)))
What is the best way to join both lists by key to get the following result?
l3: List[(Int,(Int,(Int,Int)))] = List((1,(1,(1,2))),(1,(2,(1,2))),(2,(1,(2,3))),(2,(2,(2,3))))
You can use a for comprehension and take advantage of backticks in pattern matching. A backquoted name in a pattern does not bind a new variable; it matches only values equal to the existing one. So "`k`" means the key of the tuple from the second list must be equal to the value of k taken from the first list.
val res = for {
(k, v1) <- l1
(`k`, v2) <- l2
} yield (k, (v1, v2))
I hope you find this helpful.
You might want to do something like this:
val l3=l1.map(tup1 => l2.filter(tup2 => tup1._1==tup2._1).map(tup2 => (tup1._1, (tup1._2, tup2._2)))).flatten
It matches tuples with the same key, creates a sublist for each element of l1, and then combines the list of lists with flatten.
This results in:
List((1,(1,(1,2))), (1,(2,(1,2))), (2,(1,(2,3))), (2,(2,(2,3))))
Try something like this:
val l2Map = l2.toMap
val l3 = l1.flatMap { case (k, v1) => l2Map.get(k).map(v2 => (k, (v1, v2))) }
which can be rewritten in a more general form using implicits (this version targets Scala 2.12 and earlier, since TraversableLike and CanBuildFrom were removed in Scala 2.13):
package some.`package`
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
package object collection {
  implicit class PairTraversable[K, V, C[A] <: TraversableLike[A, C[A]]](val seq: C[(K, V)]) {
    def join[V2, C2[A] <: TraversableLike[A, C2[A]]](other: C2[(K, V2)])
        (implicit canBuildFrom: CanBuildFrom[C[(K, V)], (K, (V, V2)), C[(K, (V, V2))]]): C[(K, (V, V2))] = {
      // Index the second collection once, then look every key up in the map.
      val otherMap = other.toMap
      seq.flatMap { case (k, v1) => otherMap.get(k).map(v2 => (k, (v1, v2))) }
    }
  }
}
and then simply:
import some.`package`.collection.PairTraversable
val l3 = l1.join(l2)
This solution converts the second sequence to a map (so it consumes some additional memory), but it is much faster than the solutions in the other answers (compare them on large collections, e.g. 10000 elements: on my laptop it is 5 ms vs 2500 ms).
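One caveat of the lookup approach is that toMap keeps only the last value for a key that is repeated in the second list. As a hedged sketch, assuming Scala 2.13 or later, the second list can be indexed with groupMap instead, so that repeated keys on either side produce one row each, as an inner join would. The helper name innerJoin is hypothetical:

```scala
// Hypothetical helper: inner join of two lists of pairs by key.
def innerJoin[K, A, B](left: List[(K, A)], right: List[(K, B)]): List[(K, (A, B))] = {
  // Index the right-hand side once: key -> all values for that key.
  val index: Map[K, List[B]] = right.groupMap(_._1)(_._2)
  left.flatMap { case (k, a) =>
    index.getOrElse(k, Nil).map(b => (k, (a, b)))
  }
}

val l1 = List((1, 1), (1, 2), (2, 1), (2, 2))
val l2 = List((1, (1, 2)), (2, (2, 3)))
val l3 = innerJoin(l1, l2)
```

Keys of l1 that are missing from l2 are dropped, which matches inner join semantics.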
A little late. This solution keeps every element of l1 and returns None for keys that are missing in l2, i.e. it is a left join instead of an inner join.
val m2 = l2.map{ case(k,v) => (k -> v)}.toMap
val res2 = l1.map { case(k,v) =>
val v2 = m2.get(k)
(k, (v, v2))
}

A method that accepts both mutable and immutable map in Scala

How can I change the signature of this method to a method that accepts both mutable and immutable map?
def - [A <: BothType] (o: A): ResourceHashMap = {
o.forall {
case (k, v) => this.contains(k) && this(k) >= v
} match {
case true => map {
case (k, v) => k -> (v - o.getOrElse(k, 0))
}
case _ => null
}
}
I know I can use the Map trait, but it does not have the forall and getOrElse methods.
What you call BothType is actually scala.collection.Map, the common supertype of the mutable and immutable maps. Try importing it and using it in the signature.
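A minimal sketch of that idea follows. The function names covers and subtract, the String and Int type parameters, and the use of Option instead of null are all assumptions for illustration, not the asker's ResourceHashMap API:

```scala
import scala.collection.{Map => CMap}
import scala.collection.mutable

// scala.collection.Map is implemented by both mutable and immutable maps,
// and it provides forall and getOrElse.
def covers(available: CMap[String, Int], required: CMap[String, Int]): Boolean =
  required.forall { case (k, v) => available.getOrElse(k, 0) >= v }

// Returns the remaining resources, or None if the requirement is not covered.
def subtract(available: CMap[String, Int], required: CMap[String, Int]): Option[Map[String, Int]] =
  if (covers(available, required))
    Some(available.map { case (k, v) => k -> (v - required.getOrElse(k, 0)) }.toMap)
  else None

// Works with an immutable map on one side and a mutable one on the other.
val ok = subtract(Map("wood" -> 5, "ore" -> 2), mutable.Map("wood" -> 3))
val notEnough = subtract(mutable.Map("wood" -> 1), Map("wood" -> 3))
```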

Better way of converting a Map[K, Option[V]] to a Map[K,V]

I have some code that is producing a Map where the values are Option types, and I really of course want a map containing only the real values.
So I need to convert this, and what I've come up with in code is
def toMap[K,V](input: Map[K, Option[V]]): Map[K, V] = {
var result: Map[K, V] = Map()
input.foreach({
s: Tuple2[K, Option[V]] => {
s match {
case (key, Some(value)) => {
result += ((key, value))
}
case _ => {
// Don't add the None values
}
}
}
})
result
}
which works, but seems inelegant. I suspect there's something for this built into the collections library that I'm missing.
Is there something built in, or a more idiomatic way to accomplish this?
input.collect{case (k, Some(v)) => (k,v)}
input flatMap {case(k,ov) => ov map {v => (k, v)}}
for ((k, Some(v)) <- input) yield (k, v)
It's franza's answer from a later question, but it deserves a re-post here.
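For reference, a small sketch of the collect and flatMap variants on made-up input (the input map is an assumption for illustration):

```scala
// Hypothetical input with one missing value.
val input: Map[String, Option[Int]] =
  Map("a" -> Some(1), "b" -> None, "c" -> Some(3))

// collect takes a partial function: entries whose value is None do not match
// the pattern and are dropped, the others are unwrapped.
val viaCollect: Map[String, Int] = input.collect { case (k, Some(v)) => k -> v }

// flatMap achieves the same by turning each entry into zero or one pair.
val viaFlatMap: Map[String, Int] = input.flatMap { case (k, ov) => ov.map(v => k -> v) }
```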