Scala: How to merge lists by the first element of the tuple - scala

Let's say I have a list:
[(A, a), (A, b), (A, c), (B, a), (B, d)]
How do I make that list into:
[(A, [a,b,c]), (B, [a,d])]
with a single function?
Thanks

The groupBy function allows you to achieve this:
scala> val list = List((1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'd'))
list: List[(Int, Char)] = List((1,a), (1,b), (1,c), (2,a), (2,d))
scala> list.groupBy(_._1) // grouping by the first item in the tuple
res0: scala.collection.immutable.Map[Int,List[(Int, Char)]] = Map(2 -> List((2,a), (2,d)), 1 -> List((1,a), (1,b), (1,c)))

groupBy alone won't give you the format you want, so I suggest writing a custom method for this.
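def groupTuples[A, B](seq: Seq[(A, B)]): List[(A, List[B])] =
  seq
    .groupBy(_._1) // Map from each key to the pairs that share it
    .map { case (key, pairs) => (key, pairs.map(_._2).toList) } // keep only the second components
    .toList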
Then invoke it to get the desired result:
val t = Seq((1,"I"),(1,"AM"),(1, "Koby"),(2,"UP"),(2,"UP"),(2,"AND"),(2,"AWAY"))
groupTuples[Int, String](t)
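As a quick check against the data from the question, here is a minimal sketch of the same idea. The strings "A", "B", "a" and so on are stand-ins for the question's placeholder values, and the names input, merged and sortedMerged are made up for this example.

```scala
// Hypothetical stand-in for the question's data: keys "A"/"B", values "a".."d".
val input = List(("A", "a"), ("A", "b"), ("A", "c"), ("B", "a"), ("B", "d"))

// Group by the first component, then keep only the second components.
val merged: List[(String, List[String])] =
  input
    .groupBy(_._1)
    .map { case (key, pairs) => (key, pairs.map(_._2)) }
    .toList

// groupBy returns a Map, so the order of keys in `merged` is unspecified;
// sort explicitly if a stable order matters.
val sortedMerged = merged.sortBy(_._1)
```

Within each group the values keep their original relative order; only the order of the keys needs the explicit sort.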

Related

spark iterate consecutive elements in list to rows

I have a spark rdd with a column like
List(1, 3, 4, 8)
List(2, 3)
List(1, 5, 6)
I would like to get a new RDD in which each pair of consecutive elements in each list becomes a row, like
(1, 3)
(3, 4)
(4, 8)
(2, 3)
(1, 5)
(5, 6)
How can I achieve this with scala?
Consider using a complementary (plain Scala) function with signature List[Int] => List[(Int, Int)] to produce the desired result for a single list, and passing this function to your RDD's flatMap method.
This complementary function may look like this:
def makeTuples(l: List[Int],
               acc: List[(Int, Int)] = List.empty): List[(Int, Int)] =
  l match {
    case Nil | _ :: Nil => acc.reverse
    case a :: b :: rest => makeTuples(b :: rest, (a, b) :: acc)
  }
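A shorter alternative to the recursive helper is to zip each list with itself shifted by one element. The sketch below is only an illustration: consecutivePairs is a made-up name, and a plain List of lists stands in for the RDD, whose flatMap would take the function in the same way.

```scala
// Pair each element with its successor; drop(1) is safe on an empty list.
def consecutivePairs(l: List[Int]): List[(Int, Int)] =
  l.zip(l.drop(1))

// Plain Scala collection standing in for the RDD's rows (an assumption of this sketch).
val rows = List(List(1, 3, 4, 8), List(2, 3), List(1, 5, 6))

// Flatten the per-list pairs into one sequence of rows.
val consecutive = rows.flatMap(consecutivePairs)
```

Lists with fewer than two elements contribute no pairs, which matches the Nil and single-element cases of makeTuples.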

scala combining multiple sequences

I have a couple of lists:
val aa = Seq(1,2,3,4)
val bb = Seq(Seq(2.0,3.0,4.0,5.0), Seq(1.0,2.0,3.0,4.0))
val cc = Seq("a", "B")
And want to combine them in the desired format of:
(1, 2.0, a), (2, 3.0, a), (3, 4.0, a), (4, 5.0, a), (1, 1.0, b), (2, 2.0, b), (3, 3.0, b), (4, 4.0, b)
but my combination of zip and flatMap
(aa, bb, cc).zipped.flatMap {
  case (a, b, c) => {
    b.map(b1 => (a, b1, c))
  }
}
is only producing
(1,2.0,a), (1,3.0,a), (1,4.0,a), (1,5.0,a), (2,1.0,B), (2,2.0,B), (2,3.0,B), (2,4.0,B)
In Java I would just iterate over bb and then, in a nested loop, iterate over the values.
What do I need to change to get the data in the desired format using neat functional scala?
How about this:
for {
  (bs, c) <- bb zip cc
  (a, b) <- aa zip bs
} yield (a, b, c)
Produces:
List(
  (1,2.0,a), (2,3.0,a), (3,4.0,a), (4,5.0,a),
  (1,1.0,B), (2,2.0,B), (3,3.0,B), (4,4.0,B)
)
I doubt this could be made any more neat & functional.
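For readers less used to for comprehensions, here is a self-contained sketch of the answer above next to roughly what it desugars to. The names viaFor and viaFlatMap are made up for this example; the input values are the ones from the question.

```scala
val aa = Seq(1, 2, 3, 4)
val bb = Seq(Seq(2.0, 3.0, 4.0, 5.0), Seq(1.0, 2.0, 3.0, 4.0))
val cc = Seq("a", "B")

// The for comprehension from the answer: pair each inner sequence with its label,
// then pair each element of aa with the matching element of that inner sequence.
val viaFor = for {
  (bs, c) <- bb zip cc
  (a, b)  <- aa zip bs
} yield (a, b, c)

// Roughly the same thing written with explicit combinators:
// flatMap for the outer generator, map for the inner one.
val viaFlatMap = bb.zip(cc).flatMap { case (bs, c) =>
  aa.zip(bs).map { case (a, b) => (a, b, c) }
}
```

The key difference from the attempt in the question is that aa is zipped with each inner sequence, not with bb itself.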
Not exactly pretty to read but here is an option:
bb
  .map(b => aa.zip(b)) // List(List((1,2.0), (2,3.0), (3,4.0), (4,5.0)), List((1,1.0), (2,2.0), (3,3.0), (4,4.0)))
  .zip(cc) // List((List((1,2.0), (2,3.0), (3,4.0), (4,5.0)),a), (List((1,1.0), (2,2.0), (3,3.0), (4,4.0)),B))
  .flatMap { case (l, c) => l.map(t => (t._1, t._2, c)) } // List((1,2.0,a), (2,3.0,a), (3,4.0,a), (4,5.0,a), (1,1.0,B), (2,2.0,B), (3,3.0,B), (4,4.0,B))
Another approach, using collect and map:
scala> val result = bb.zip(cc).collect {
  case bc => aa.zip(bc._1).map(e => (e._1, e._2, bc._2))
}.flatten
result: Seq[(Int, Double, String)] = List((1,2.0,a), (2,3.0,a), (3,4.0,a), (4,5.0,a), (1,1.0,B), (2,2.0,B), (3,3.0,B), (4,4.0,B))

How to merge two Seq[String], Seq[Double] to Seq[(String,Double)]

I have two Seqs.
One is a Seq[String] and the other is a Seq[Double]:
a -> ["a","b","c"] and
b-> [1,2,3]
I want to create output as
[("a",1),("b",2),("c",3)]
My current code, a.zip(b), creates a Seq of pairs of those elements instead of creating a map.
Can anyone suggest how to do that in scala?
You simply need .toMap, which transforms a List[(String, Int)] into a Map[String, Int]:
scala> val seq1 = List("a", "b", "c")
seq1: List[String] = List(a, b, c)
scala> val seq2 = List(1, 2, 3)
seq2: List[Int] = List(1, 2, 3)
scala> seq1.zip(seq2)
res0: List[(String, Int)] = List((a,1), (b,2), (c,3))
scala> seq1.zip(seq2).toMap
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
Also see:
How to convert a Seq[A] to a Map[Int, A] using a value of A as the key in the map?
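Two details of zip and toMap are worth keeping in mind when using the approach above; the following sketch illustrates them with Double values, as in the question's title. The names labels, numbers, paired, asMap, truncated and lastWins are made up for this example.

```scala
val labels  = List("a", "b", "c")
val numbers = List(1.0, 2.0, 3.0)

// zip pairs elements position by position.
val paired: List[(String, Double)] = labels.zip(numbers)

// toMap turns the pairs into key -> value entries.
val asMap: Map[String, Double] = paired.toMap

// zip stops at the shorter input, silently dropping the extra elements.
val truncated = labels.zip(List(1.0, 2.0))

// With duplicate keys, toMap keeps the last value seen for each key.
val lastWins = List("a" -> 1.0, "a" -> 9.0).toMap
```

If the Seq of pairs is all that is needed, as in the requested output, a.zip(b) alone is enough and toMap can be skipped.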

How to make a scala collection contain unique elements? ("unique" defined)

Say I have a list as follows:
val l = List( (1, 2, "hi"), (1, 3, "hello"), (2, 3, "world"), (1, 2, "hello") )
I want to make the elements of l distinct ignoring the 3rd element of the tuple. That is, two elements of l are considered same if their first two components are same.
So makeDistinct(l) should return
List( (1, 2, "hi"), (1, 3, "hello"), (2, 3, "world") )
What is the most Scala-like and generic way to implement makeDistinct?
EDIT: We are free to choose which to drop, and ordering need not be preserved.
If you want to do this with lists, use groupBy:
l.groupBy(x => (x._1, x._2)).map(kv => kv._2.head).toList
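If you really want to be generic for all collection types (note that this relies on CanBuildFrom, which was removed in the Scala 2.13 collections redesign, so it applies to 2.12 and earlier):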
scala> import scala.collection.generic.CanBuildFrom
import scala.collection.generic.CanBuildFrom
scala> def distinct[A, B, C, CC[X] <: Traversable[X]](xs: CC[(A, B, C)])(implicit cbf: CanBuildFrom[Nothing, (A, B, C), CC[(A, B, C)]]): CC[(A, B, C)] = xs.groupBy(x => (x._1, x._2)).map(kv => kv._2.head).to[CC]
warning: there were 1 feature warnings; re-run with -feature for details
distinct: [A, B, C, CC[X] <: Traversable[X]](xs: CC[(A, B, C)])(implicit cbf: scala.collection.generic.CanBuildFrom[Nothing,(A, B, C),CC[(A, B, C)]])CC[(A, B, C)]
scala> distinct(List((1, 2, "ok"), (1, 3, "ee"), (1, 2, "notok")))
res0: List[(Int, Int, String)] = List((1,3,ee), (1,2,ok))
You can use a SortedSet with a custom Ordering (this needs import scala.collection.immutable.SortedSet):
scala> SortedSet(l: _*)(Ordering[(Int, Int)].on(x => (x._1, x._2))).toList
res33: List[(Int, Int, String)] = List((1,2,hello), (1,3,hello), (2,3,world))
The only problem is that the last found element is preserved. For the first one you need to reverse the list:
scala> SortedSet(l.reverse: _*)(Ordering[(Int, Int)].on(x => (x._1, x._2))).toList
res34: List[(Int, Int, String)] = List((1,2,hi), (1,3,hello), (2,3,world))
The reverse is not optimal but maybe it is possible to create the list directly in reversed order, which would avoid the construction of an unnecessary intermediate list.
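If keeping the first occurrence and the original order does matter, one option is a single left fold that tracks the keys already seen. This is only a sketch under that assumption, not one of the approaches from the answers above; makeDistinctBy and its key parameter are made-up names.

```scala
// Keep the first element for each key, preserving the original order.
def makeDistinctBy[A, K](xs: List[A])(key: A => K): List[A] = {
  val (_, kept) = xs.foldLeft((Set.empty[K], List.empty[A])) {
    case ((seen, acc), x) =>
      val k = key(x)
      // Skip x if its key was already seen; otherwise record the key and keep x.
      if (seen(k)) (seen, acc) else (seen + k, x :: acc)
  }
  kept.reverse // elements were prepended, so restore the input order
}

val l = List((1, 2, "hi"), (1, 3, "hello"), (2, 3, "world"), (1, 2, "hello"))

// Two tuples count as equal when their first two components match.
val distinctByFirstTwo = makeDistinctBy(l)(x => (x._1, x._2))
```

The key function is passed in a second parameter list so that its argument type is inferred from the list.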

Scala: using foldl to add pairs from list to a map?

I am trying to add pairs from list to a map using foldl. I get the following error:
"missing arguments for method /: in trait TraversableOnce; follow this method with `_' if you want to treat it as a partially applied function"
code:
val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
def lstToMap(lst:List[(String,Int)], map: Map[String, Int] ) = {
  (map /: lst) addToMap ( _, _)
}
def addToMap(pair: (String, Int), map: Map[String, Int]): Map[String, Int] = {
  map + (pair._1 -> pair._2)
}
What is wrong?
scala> val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
pairs: List[(String, Int)] = List((a,1), (a,2), (c,3), (d,4))
scala> (Map.empty[String, Int] /: pairs)(_ + _)
res9: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
But you know, you could just do:
scala> pairs.toMap
res10: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
You need to swap the parameters of addToMap (accumulator first, element second) and pass it in parentheses for this to work:
def addToMap(map: Map[String, Int], pair: (String, Int)): Map[String, Int] = {
  map + (pair._1 -> pair._2)
}
def lstToMap(lst: List[(String, Int)], map: Map[String, Int]) = {
  (map /: lst)(addToMap)
}
missingfaktor's answer is much more concise, reusable, and scala-like.
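The /: operator is an alias for foldLeft, so the same fold can be written with the method name, which many readers find easier to follow. A minimal sketch, reusing the question's data; the name folded is made up for this example.

```scala
val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))

// Accumulator first, element second: the parameter order foldLeft (and /:) expects.
def addToMap(map: Map[String, Int], pair: (String, Int)): Map[String, Int] =
  map + pair

// Start from an empty map and add each pair in turn; later pairs overwrite earlier keys.
val folded = pairs.foldLeft(Map.empty[String, Int])(addToMap)
```

Because later entries replace earlier ones with the same key, "a" ends up mapped to 2, the same result as pairs.toMap.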
If you already have a collection of Tuple2s, you don't need to implement this yourself: there is already a toMap method, which is available only when the elements are tuples.
The full signature is:
def toMap[T, U](implicit ev: <:<[A, (T, U)]): Map[T, U]
It works by requiring an implicit A <:< (T, U) which is essentially a function that can take the element type A and cast/convert it to tuples of type (T, U). Another way of saying this is that it requires an implicit witness that A is-a (T, U). Therefore, this is completely type-safe.
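To make the evidence mechanism concrete, here is a small sketch of a method with the same kind of implicit parameter. toPairs is a hypothetical helper written for this illustration, not part of the standard library; it only mirrors the shape of toMap's signature.

```scala
// Hypothetical helper: compiles only when the element type A is known to be a pair (T, U).
def toPairs[A, T, U](xs: List[A])(implicit ev: A <:< (T, U)): List[(T, U)] =
  xs.map(ev) // the evidence is itself a function from A to (T, U)

// The compiler supplies the evidence automatically for a list of pairs.
val ok: List[(String, Int)] = toPairs(List(("a", 1), ("b", 2)))

// The built-in toMap relies on the same kind of evidence.
val viaToMap: Map[String, Int] = List(("a", 1), ("b", 2)).toMap

// toPairs(List(1, 2, 3)) would be rejected at compile time:
// there is no evidence that Int is a pair.
```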
Update: which is what @missingfaktor said.
This is not a direct answer to the question, which is about folding correctly on the map, but I deem it important to emphasize that
a Map can be treated as a generic Traversable of pairs
and you can easily combine the two!
scala> val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
pairs: List[(String, Int)] = List((a,1), (a,2), (c,3), (d,4))
scala> Map.empty[String, Int] ++ pairs
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
scala> pairs.toMap
res2: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)