How to merge two Seq[String], Seq[Double] to Seq[(String,Double)] - scala

I have two Seq.
1 has Seq[String] and another has Seq[(String,Double)]
a -> ["a","b","c"] and
b-> [1,2,3]
I want to create output as
[("a",1),("b",2),("c",3)]
I have a code
a.zip(b) is actually creating a seq of those two elements instead of creating a map
Can anyone suggest how to do that in scala?

you simply need .toMap so that you can transform List[Tuple[String, Int]] to Map[String, Int]
scala> val seq1 = List("a", "b", "c")
seq1: List[String] = List(a, b, c)
scala> val seq2 = List(1, 2, 3)
seq2: List[Int] = List(1, 2, 3)
scala> seq1.zip(seq2)
res0: List[(String, Int)] = List((a,1), (b,2), (c,3))
scala> seq1.zip(seq2).toMap
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
also see
How to convert a Seq[A] to a Map[Int, A] using a value of A as the key in the map?

Related

In Scala, what does x=> x._1._1 denotes

In the following snippet of code, I am aware that x._1 denotes the first element of the tuple, but I couldn't understand what x._1._1 represents.I am not so familiar with Scala, sorry if it is a relatively naive question, thank you!!
val a = b.groupBy(x=> x._1._1)
Here is a quick example in the REPL of a nested tuple
scala> val t = ((1, 2), 3)
t: ((Int, Int), Int) = ((1,2),3)
scala> t._1 // Get the first part of the tuple
res0: (Int, Int) = (1,2)
scala> t._2 // Get the second part of the tuple
res1: Int = 3
scala> t._1._1 // Get the first part of the first part
res2: Int = 1
And here is an example with a sequence to demonstrate the groupBy:
scala> val s = Seq(((1, 2), 3), ((1, 5), 6), ((2, 4), 32))
s: Seq[((Int, Int), Int)] = List(((1,2),3), ((1,5),6), ((2,4),32))
scala> s.groupBy
def groupBy[K](f: (((Int, Int), Int)) => K): scala.collection.immutable.Map[K,Seq[((Int, Int), Int)]]
scala> s.groupBy(x => x._1._1)
res3: scala.collection.immutable.Map[Int,Seq[((Int, Int), Int)]] = Map(2 -> List(((2,4),32)), 1 -> List(((1,2),3), ((1,5),6)))
In this case the first element of the first element are the target for the grouping. Here's the result in an easier to look at format:
Map(
2 -> List(
((2,4),32)),
1 -> List(
((1,2),3),
((1,5),6))
)
It means x._1 itself is a tuple.
Example:
val b = Seq((("subTuple_1", "subTuple_2"), "tuple_2"))
val a = b.groupBy(x=> x._1._1)
As you mentioned ._1 gives you the first column of your tuple, and if the result of first column is Tuple, you can do ._1.
eg.
scala> Map(("a" -> "b") -> 100, ("c" -> "d") -> 200).map(_._1)
res31: scala.collection.immutable.Map[String,String] = Map(a -> b, c -> d)
scala> Map(("a" -> "b") -> 100, ("c" -> "d") -> 200).map(_._1._1)
res32: scala.collection.immutable.Iterable[String] = List(a, c)
groupBy,
scala> Map(("a" -> "b") -> 100, ("a" -> "c") -> 200).groupBy(_._1._1)
res19: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, String),Int]] = Map(a -> Map((a,b) -> 100, (a,c) -> 200))

Error: type mismatch flatMap

I am new to spark programming and scala and i am not able to understand the difference between map and flatMap.
I tried below code as i was expecting both to work but got error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding flatmap make Rdd in to collection for String/Int Rdd.
I was thinking that in this case both should work without any error.Please let me know where i am making the mistake.
Thanks
You need to look at how the signatures defined these methods:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]] and you want to produce a RDD[List[Int]] you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: scala.collection.immutable.Set[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
Map operation return type is U where as flatMap return type is TraversableOnce[U](means collections)
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)

Add list of tuples of integers in Scala

I wish to add a list of tuples of integers i.e. given an input list of tuples of arity k, produce a tuple of arity k whose fields are sums of corresponding fields of the tuples in the list.
Input
List( (1,2,3), (2,3,-3), (1,1,1))
Output
(4, 6, 1)
I was trying to use foldLeft, but I am not able to get it to compile. Right now, I am using a for loop, but I was looking for a more concise solution.
This can be done type safely and very concisely using shapeless,
scala> import shapeless._, syntax.std.tuple._
import shapeless._
import syntax.std.tuple._
scala> val l = List((1, 2, 3), (2, 3, -1), (1, 1, 1))
l: List[(Int, Int, Int)] = List((1,2,3), (2,3,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 3)
Notice that unlike solutions which rely on casts, this approach is type safe, and any type errors are detected at compile time rather than at runtime,
scala> val l = List((1, 2, 3), (2, "foo", -1), (1, 1, 1))
l: List[(Int, Any, Int)] = List((1,2,3), (2,foo,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
<console>:15: error: could not find implicit value for parameter num: Numeric[Any]
l.map(_.toList).transpose.map(_.sum)
^
scala> val tuples = List( (1,2,3), (2,3,-3), (1,1,1))
tuples: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> tuples.map(t => t.productIterator.toList.map(_.asInstanceOf[Int])).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 1)
Type information is lost when calling productIterator on Tuple3 so you have to convert from Any back to an Int.
If the tuples are always going to contain the same type I would suggest using another collection such as List. The Tuple is better suited for disparate types. When you have the same types and don't lose the type information by using productIterator the solution is more elegant.
scala> val tuples = List(List(1,2,3), List(2,3,-3), List(1,1,1))
tuples: List[List[Int]] = List(List(1, 2, 3), List(2, 3, -3), List(1, 1, 1))
scala> tuples.transpose.map(_.sum)
res1: List[Int] = List(4, 6, 1)
scala> val list = List( (1,2,3), (2,3,-3), (1,1,1))
list: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> list.foldRight( (0, 0, 0) ){ case ((a, b, c), (a1, b1, c1)) => (a + a1, b + b1, c + c1) }
res0: (Int, Int, Int) = (4,6,1)

Scala: using foldl to add pairs from list to a map?

I am trying to add pairs from list to a map using foldl. I get the following error:
"missing arguments for method /: in trait TraversableOnce; follow this method with `_' if you want to treat it as a partially applied function"
code:
val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
def lstToMap(lst:List[(String,Int)], map: Map[String, Int] ) = {
(map /: lst) addToMap ( _, _)
}
def addToMap(pair: (String, Int), map: Map[String, Int]): Map[String, Int] = {
map + (pair._1 -> pair._2)
}
What is wrong?
scala> val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
pairs: List[(String, Int)] = List((a,1), (a,2), (c,3), (d,4))
scala> (Map.empty[String, Int] /: pairs)(_ + _)
res9: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
But you know, you could just do:
scala> pairs.toMap
res10: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
You need to swap the input values of addToMap and put it in parenthesis for this to work:
def addToMap( map: Map[String, Int], pair: (String, Int)): Map[String, Int] = {
map + (pair._1 -> pair._2)
}
def lstToMap(lst:List[(String,Int)], map: Map[String, Int] ) = {
(map /: lst)(addToMap)
}
missingfaktor's answer is much more concise, reusable, and scala-like.
If you already have a collection of Tuple2s, you don't need to implement this yourself, there is already a toMap method that only works if the elements are tuples!
The full signature is:
def toMap[T, U](implicit ev: <:<[A, (T, U)]): Map[T, U]
It works by requiring an implicit A <:< (T, U) which is essentially a function that can take the element type A and cast/convert it to tuples of type (T, U). Another way of saying this is that it requires an implicit witness that A is-a (T, U). Therefore, this is completely type-safe.
Update: which is what #missingfaktor said
This is not a direct answer to the question, which is about folding correctly on the map, but I deem it important to emphasize that
a Map can be treated as a generic Traversable of pairs
and you can easily combine the two!
scala> val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
pairs: List[(String, Int)] = List((a,1), (a,2), (c,3), (d,4))
scala> Map.empty[String, Int] ++ pairs
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
scala> pairs.toMap
res2: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)

Composing two maps

Is there a function in Scala to compose two maps or is flatMap a sensible approach?
scala> val caps: Map[String, Int] = Map(("A", 1), ("B", 2))
caps: Map[String,Int] = Map(A -> 1, B -> 2)
scala> val lower: Map[Int, String] = Map((1, "a"), (2, "b"))
lower: Map[Int,String] = Map(1 -> a, 2 -> b)
scala> caps.flatMap {
| case (cap, idx) => Map((cap, lower(idx)))
| }
res1: scala.collection.immutable.Map[String,String] = Map(A -> a, B -> b)
Some syntactic sugar would be great!
If you know lower will contain keys for all the values in caps, you can use mapValues:
scala> caps mapValues lower
res0: scala.collection.immutable.Map[String,String] = Map(A -> a, B -> b)
If you don't want or need a new collection, just a mapping, it's a little more idiomatic to use andThen:
scala> val composed = caps andThen lower
composed: PartialFunction[String,String] = <function1>
scala> composed("A")
res1: String = a
This also assumes there aren't values in caps that aren't mapped in lower.