Scala - Join List of tuples by Key - scala

I am looking for a way to join two lists of tuples in Scala to get the same result that Apache Spark gives me with its join function.
Example:
Given two lists of tuples such as:
val l1 = List((1,1),(1,2),(2,1),(2,2))
l1: List[(Int, Int)] = List((1,1), (1,2), (2,1), (2,2))
val l2 = List((1,(1,2)), (2,(2,3)))
l2: List[(Int, (Int, Int))] = List((1,(1,2)), (2,(2,3)))
What is the best way to join by key both list to get the following result?
l3: List[(Int,(Int,(Int,Int)))] = List((1,(1,(1,2))), (1,(2,(1,2))), (2,(1,(2,3))), (2,(2,(2,3))))

You can use a for comprehension and take advantage of backticks in pattern matching: a pattern written as `k` matches only values equal to the already-bound variable k, so the second generator keeps just the pairs from l2 whose key equals the key taken from l1.
val res = for {
  (k, v1) <- l1
  (`k`, v2) <- l2
} yield (k, (v1, v2))
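For the sample lists in the question this yields exactly the result asked for:
res: List[(Int, (Int, (Int, Int)))] = List((1,(1,(1,2))), (1,(2,(1,2))), (2,(1,(2,3))), (2,(2,(2,3))))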
I hope you find this helpful.

You might want to do something like this:
val l3 = l1.map(tup1 => l2.filter(tup2 => tup1._1 == tup2._1).map(tup2 => (tup1._1, (tup1._2, tup2._2)))).flatten
It matches on equal keys, builds the sublists, and then combines the list of lists with flatten.
This results in:
List((1,(1,(1,2))), (1,(2,(1,2))), (2,(1,(2,3))), (2,(2,(2,3))))

Try something like this:
val l2Map = l2.toMap
val l3 = l1.flatMap { case (k, v1) => l2Map.get(k).map(v2 => (k, (v1, v2))) }
which can be rewritten in a more general form using implicits:
package some.package

import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom

package object collection {
  implicit class PairTraversable[K, V, C[A] <: TraversableLike[A, C[A]]](val seq: C[(K, V)]) {
    def join[V2, C2[A] <: TraversableLike[A, C2[A]]](other: C2[(K, V2)])
        (implicit canBuildFrom: CanBuildFrom[C[(K, V)], (K, (V, V2)), C[(K, (V, V2))]]): C[(K, (V, V2))] = {
      val otherMap = other.toMap
      seq.flatMap { case (k, v1) => otherMap.get(k).map(v2 => (k, (v1, v2))) }
    }
  }
}
and then simply:
import some.package.collection.PairTraversable
val l3 = l1.join(l2)
This solution converts the second sequence to a map (so it consumes some additional memory), but it is much faster than the solutions in the other answers (compare them for large collections, e.g. 10000 elements; on my laptop it is 5 ms vs 2500 ms).

A little late. This solution preserves the original size of l1 and wraps the value from l2 in an Option, returning None for keys missing from l2 (a left join instead of an inner join).
val m2 = l2.map { case (k, v) => k -> v }.toMap
val res2 = l1.map { case (k, v) =>
  val v2 = m2.get(k)
  (k, (v, v2))
}
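For the sample lists this keeps all four entries of l1, with the value from l2 wrapped in an Option:
res2: List[(Int, (Int, Option[(Int, Int)]))] = List((1,(1,Some((1,2)))), (1,(2,Some((1,2)))), (2,(1,Some((2,3)))), (2,(2,Some((2,3)))))
A key absent from l2 would show up as None instead.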

Related

How do I efficiently find the root in a map which represents a tree in Scala

How do I find the root in a Map which represents a spanning tree in Scala?
Below is an example of a Map which contains a tree.
val l = List((1,List(2,3,4)), (2,List(5,6)), (3,List(7,8,9)))
val m1 = l.groupBy(_._1).map{ case (k, v) => (k, v.map(_._2))}
Here is a one-liner that returns a list of all the nodes that do not appear in any of the lists of leaves.
l.collect{ case (k, _) if !l.exists(_._2.contains(k)) => k }
A more efficient two-line version:
val leaves = l.flatMap(_._2).toSet
l.collect{ case (k, _) if !leaves.contains(k) => k }
Even more efficient:
val leaves: Set[Int] = l.flatMap(_._2)(collection.breakOut)
l.collect{ case (k, _) if !leaves.contains(k) => k }
All of these return List(1) for the sample data.
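If you prefer to work from the Map form (m1) rather than the original list, the same idea carries over. A minimal sketch, keeping in mind that m1's values are List[List[Int]] because of the groupBy, hence the double flatten:
val childSet: Set[Int] = m1.values.flatten.flatten.toSet
m1.keys.filterNot(childSet)   // contains only 1 for the sample data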

Scala: Collect values defined in hashmap passed by a list argument

Suppose I have the following variables:
val m = HashMap( ("1", "one"), ("2", "two"), ("3", "three") )
val l = List("1", "2")
I would like to extract the list List("one","two"), which corresponds to the values for each key in the list present in the map.
This is my solution, and it works like a charm. Still, I would like to know whether I'm reinventing the wheel and whether there is some idiomatic solution for what I intend to do:
class Mapper[T, V](val map: HashMap[T, V]) extends PartialFunction[T, V] {
  override def isDefinedAt(x: T): Boolean = map.contains(x)
  override def apply(x: T): V = map.get(x) match {
    case Some(v) => v
  }
}
val collected = l collect (new Mapper(m))
List("one", "two")
Yes, you are reinventing the wheel. Your code is equivalent to
l collect m
but with an additional layer of indirection that doesn't add anything to HashMap (which already implements PartialFunction; just expand the "Linear Supertypes" list to see that).
Alternatively, you can also use flatMap as follows:
l flatMap m.get
The implicit CanBuildFroms make sure that the result is actually a List.
You could do this, which seems a bit simpler:
val res = l.map(m.get(_)) // List(Some("one"), Some("two"))
.flatMap(_.toList)
Or even this, using a for-comprehension:
val res = for {
  key <- l
  value <- m.get(key)
} yield value
I would suggest something like this:
m.collect { case (k, v) if l.contains(k) => v }
Note that it:
- does not preserve the order from l
- does not handle the case of duplicates in l

How to merge two LinkedHashMaps[Int, ListBuffer[Int]] in Scala?

I found this method:
def merge[K, V](maps: Seq[Map[K, V]])(f: (K, V, V) => V): Map[K, V] = {
  maps.foldLeft(Map.empty[K, V]) { case (merged, m) =>
    m.foldLeft(merged) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(existing) => acc.updated(k, f(k, existing, v))
        case None => acc.updated(k, v)
      }
    }
  }
}
but it gives me a type mismatch error if I use it like this:
val mergeMsg = (map1: LinkedHashMap[Int, ListBuffer[Int]],
                map2: LinkedHashMap[Int, ListBuffer[Int]]) => {
  val ms = Seq(map1, map2)
  merge(ms.map(_.mapValues(List(_)))) { (_, v1, v2) => v1 ++ v2 }
}
The error says:
"Type mismatch, expected: mutable.Seq[Mutable.Map[NotInferedK, NotInferedV]], actual: mutable.Seq[Map[Int, List[ListBuffer[Int]]]]"
How can I solve this? I know it's something simple, but I'm new to Scala.
The problem is that you are passing merge a sequence of mutable LinkedHashMaps, while the function requires a sequence of immutable Maps.
You need to convert your LinkedHashMaps to the correct type first. The simplest way to do this is probably to call .toMap before you perform the mapValues.
merge(ms.map(_.toMap.mapValues(List(_)))){(_, v1, v2) => v1 ++ v2}
Update
Alternatively, the method signature of merge can be changed to use scala.collection.Map explicitly. By default, Map refers to scala.collection.immutable.Map.
def merge[K, V](maps: Seq[scala.collection.Map[K, V]])(f: (K, V, V) => V): scala.collection.Map[K, V]
val mergeMsg = (map1: LinkedHashMap[Int, ListBuffer[Int]],
                map2: LinkedHashMap[Int, ListBuffer[Int]]) => {
  val ms = Seq(map1.toMap, map2.toMap)
  merge(ms)((_, lb1, lb2) => lb1 ++ lb2)
}
So the type just needs to be converted to Map.
The key k is not used during the update, so we use _ instead.
The lb parameters are the ListBuffers.
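Putting the second variant together, here is a minimal, self-contained sketch (the sample maps are made up for illustration; the merge body is the one from the question, with the accumulator left as an immutable Map):
import scala.collection.mutable.{LinkedHashMap, ListBuffer}

def merge[K, V](maps: Seq[scala.collection.Map[K, V]])(f: (K, V, V) => V): scala.collection.Map[K, V] =
  maps.foldLeft(Map.empty[K, V]) { case (merged, m) =>
    m.foldLeft(merged) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(existing) => acc.updated(k, f(k, existing, v))
        case None           => acc.updated(k, v)
      }
    }
  }

// Hypothetical sample data.
val map1 = LinkedHashMap(1 -> ListBuffer(1, 2), 2 -> ListBuffer(3))
val map2 = LinkedHashMap(2 -> ListBuffer(4), 3 -> ListBuffer(5))

// .toMap turns the mutable LinkedHashMaps into immutable Maps; buffers for
// colliding keys are concatenated by the merge function.
val merged = merge(Seq(map1.toMap, map2.toMap))((_, lb1, lb2) => lb1 ++ lb2)
// contains 1 -> ListBuffer(1, 2), 2 -> ListBuffer(3, 4), 3 -> ListBuffer(5)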

Efficient and/or idiomatic way to turn Seq[Either[String, Int]] to (Seq[String], Seq[Int])

Slightly simplifying, my problem comes from a list of input strings that I want to parse with a function parse returning Either[String, Int].
Then list.map(parse) returns a list of Eithers. The next step in the program is to either format an error message summing up all the errors or pass on the list of parsed integers.
Let's call the solution I'm looking for partitionEithers.
Calling
partitionEithers(List(Left("foo"), Right(1), Left("bar")))
Would give
(List("foo", "bar"),List(1))
Finding something like this in the standard library would be best. Failing that, some kind of clean, idiomatic and efficient solution would be ideal; an efficient utility function I could just paste into my projects would also be fine.
I was quite confused by three earlier questions. As far as I can tell, none of them matches my case, but some of their answers do seem to contain valid answers to this question.
Scala collections offer a partition function:
val eithers: List[Either[String, Int]] = List(Left("foo"), Right(1), Left("bar"))
eithers.partition(_.isLeft) match {
  case (leftList, rightList) =>
    (leftList.map(_.left.get), rightList.map(_.right.get))
}
=> res0: (List[String], List[Int]) = (List(foo, bar),List(1))
UPDATE
If you want to wrap it in a (maybe even somewhat more type-safe) generic function:
import scala.reflect.ClassTag

def partitionEither[Left: ClassTag, Right: ClassTag](in: List[Either[Left, Right]]): (List[Left], List[Right]) =
  in.partition(_.isLeft) match {
    case (leftList, rightList) =>
      (leftList.collect { case Left(l: Left) => l }, rightList.collect { case Right(r: Right) => r })
  }
You could use separate from MonadPlus (scalaz) or MonadCombine (cats):
import scala.util.{Either, Left, Right}
import scalaz.std.list._
import scalaz.std.either._
import scalaz.syntax.monadPlus._
val l: List[Either[String, Int]] = List(Right(1), Left("error"), Right(2))
l.separate
// (List[String], List[Int]) = (List(error),List(1, 2))
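For completeness, a rough cats counterpart (assuming a cats version where the separate syntax comes in via cats.implicits; treat this as a sketch):
import cats.implicits._

val l: List[Either[String, Int]] = List(Right(1), Left("error"), Right(2))
l.separate
// (List(error), List(1, 2))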
I don't really get the amount of contortion in the other answers, so here is a one-liner:
scala> val es:List[Either[Int,String]] =
List(Left(1),Left(2),Right("A"),Right("B"),Left(3),Right("C"))
es: List[Either[Int,String]] = List(Left(1), Left(2), Right(A), Right(B), Left(3), Right(C))
scala> es.foldRight( (List[Int](), List[String]()) ) {
case ( e, (ls, rs) ) => e.fold( l => ( l :: ls, rs), r => ( ls, r :: rs ) )
}
res5: (List[Int], List[String]) = (List(1, 2, 3),List(A, B, C))
Here is an imperative implementation mimicking the style of the Scala collection internals.
I wonder if there should be something like this in there, since I run into this need from time to time.
import collection._
import generic._

def partitionEithers[L, R, E, I, CL, CR](lrs: I)
    (implicit evI: I <:< GenTraversableOnce[E],
              evE: E <:< Either[L, R],
              cbfl: CanBuildFrom[I, L, CL],
              cbfr: CanBuildFrom[I, R, CR]): (CL, CR) = {
  val ls = cbfl()
  val rs = cbfr()
  ls.sizeHint(lrs.size)
  rs.sizeHint(lrs.size)
  lrs.foreach { e =>
    evE(e) match {
      case Left(l)  => ls += l
      case Right(r) => rs += r
    }
  }
  (ls.result(), rs.result())
}
partitionEithers(List(Left("foo"), Right(1), Left("bar"))) == (List("foo", "bar"), List(1))
partitionEithers(Set(Left("foo"), Right(1), Left("bar"), Right(1))) == (Set("foo", "bar"), Set(1))
You can use foldRight.
def f(s: Seq[Either[String, Int]]): (Seq[String], Seq[Int]) = {
  s.foldRight((Seq[String](), Seq[Int]())) { case (c, r) =>
    c match {
      case Left(le)  => (le +: r._1, r._2)
      case Right(ri) => (r._1, ri +: r._2)
    }
  }
}
val eithers: List[Either[String, Int]] = List(Left("foo"), Right(1), Left("bar"))
scala> f(eithers)
res0: (Seq[String], Seq[Int]) = (List(foo, bar),List(1))
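If you are on Scala 2.13 or newer, the standard library covers this directly with partitionMap:
val eithers: List[Either[String, Int]] = List(Left("foo"), Right(1), Left("bar"))
eithers.partitionMap(identity)
// (List(foo, bar), List(1))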

subsets manipulation on vectors in spark scala

I have an RDD curRdd of the form
res10: org.apache.spark.rdd.RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] = ShuffledRDD[102]
with curRdd.collect() producing the following result.
Array((Vector((5,2)),1), (Vector((1,1)),2), (Vector((1,1), (5,2)),2))
Here the key is a vector of pairs of Ints and the value is a count.
Now, I want to convert it into another RDD of the same form RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] by percolating down the counts.
That is, (Vector((1,1), (5,2)),2) will contribute its count of 2 to any key that is a subset of it, so for example (Vector((5,2)),1) becomes (Vector((5,2)),3).
For the example above, our new RDD will have
(Vector((5,2)),3), (Vector((1,1)),4), (Vector((1,1), (5,2)),2)
How do I achieve this? Kindly help.
First you can introduce a subsets operation for Seq:
implicit class SubSetsOps[T](val elems: Seq[T]) extends AnyVal {
  def subsets: Vector[Seq[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest => {
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
    }
  }
}
The empty subset will always be the first element in the result vector, so you can omit it with .tail.
Now your task is a pretty obvious map-reduce, which is flatMap + reduceByKey in terms of RDDs:
val result = curRdd
.flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
.reduceByKey(_ + _)
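Traced on the sample data, the flatMap step emits (Seq((5,2)),1), (Seq((1,1)),2), (Seq((5,2)),2), (Seq((1,1)),2) and (Seq((1,1), (5,2)),2); reduceByKey then sums the counts per key, giving (Seq((5,2)),3), (Seq((1,1)),4) and (Seq((1,1), (5,2)),2), which matches the expected output.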
Update
This implementation can introduce new sets into the result. If you would like to keep only those that were present in the original collection, you can join the result with the original:
val result = curRdd
.flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
.reduceByKey(_ + _)
.join(curRdd map identity[(Seq[(Int, Int)], Int)])
.map { case (key, (v, _)) => (key, v) }
Note that the map identity step is needed to convert the key type from Vector[_] to Seq[_] in the original RDD. You can instead modify the SubSetsOps definition, substituting all occurrences of Seq[T] with Vector[T], or generalize the definition in the following way (using scala.collection directly):
import scala.collection.SeqLike
import scala.collection.generic.CanBuildFrom

implicit class SubSetsOps[T, F[e] <: SeqLike[e, F[e]]](val elems: F[T]) extends AnyVal {
  def subsets(implicit cbf: CanBuildFrom[F[T], T, F[T]]): Vector[F[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest => {
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
    }
  }
}
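With this generalized version, subsets on a Vector[(Int, Int)] yields Vector[Vector[(Int, Int)]], so the keys keep their original type and the join can be done against curRdd directly, dropping the map identity step. A sketch, not tested:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
  .join(curRdd)
  .map { case (key, (v, _)) => (key, v) }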