I am new to spark programming and scala and i am not able to understand the difference between map and flatMap.
I tried below code as i was expecting both to work but got error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding flatmap make Rdd in to collection for String/Int Rdd.
I was thinking that in this case both should work without any error.Please let me know where i am making the mistake.
Thanks
You need to look at how the signatures defined these methods:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]] and you want to produce a RDD[List[Int]] you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: scala.collection.immutable.Set[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
Map operation return type is U where as flatMap return type is TraversableOnce[U](means collections)
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
Related
This might be pretty simple questions. I have a list named "List1" that contain list of integer pairs as below.
List1 = List((1,2), (3,4), (9,8), (9,10))
Output should be:
r1 = (1,3,9,9) //List((1,2), (3,4), (9,8), (9,10))
r2 = (2,4,8,10) //List((1,2), (3,4), (9,8), (9,10))
array r1(Array[int]) should contains set of all first integers of each pair in the list.
array r2(Array[int]) should contains set of all second integers of each pair
Just use unzip:
scala> List((1,2), (3,4), (9,8), (9,10)).unzip
res0: (List[Int], List[Int]) = (List(1, 3, 9, 9),List(2, 4, 8, 10))
Use foldLeft
val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
Scala REPL
scala> val list1 = List((1, 2), (3, 4), (5, 6))
list1: List[(Int, Int)] = List((1,2), (3,4), (5,6))
scala> val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
alist: List[Int] = List(1, 3, 5)
blist: List[Int] = List(2, 4, 6)
Want to merge val A = Option(Seq(1,2)) and val B = Option(Seq(3,4)) to yield a new option sequence
val C = Option(Seq(1,2,3,4))
This
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil)),
seems faster and more idiomatic than
val C = Option(A.toList.flatten ++ B.toList.flatten)
But is there a better way? And am I right that getOrElse is faster and lighter than toList.flatten?
What about a neat for comprehension:
val Empty = Some(Nil)
val C = for {
a <- A orElse Empty
b <- B orElse Empty
} yield a ++ b
Creates less intermediate options.
Or, you could just do a somewhat cumbersome pattern matching:
(A, B) match {
case (None, None) => Nil
case (None, sb#Some(b)) => sb
case (sa#Some(a), None) => sa
case (Some(a), Some(b)) => Some(a ++ b)
}
I think this at least creates less intermediate collections than the double flatten.
Your first case:
// In this case getOrElse is not needed as the option is clearly not `None`.
// So, you can replace the following:
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil))
// By this:
val C = Option(A.get ++ B.get) // A simple concatenation of two sequences.
C: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Your second case/option is wrong for multiple reasons.
val C = Option(A.toList.flatten ++ B.toList.flatten)
Option[List[Int]] = Some(List(1, 2, 3, 4))
It returns the incorrect type Option[List[Int]] instead of Option[Seq[Int]]
It needlessly invokes toList on A & B. You could simply add the options and invoke flatten on them.
It is not DRY and redundantly calls flatten on both A.toList & B.toList whereas it could call flatten on (A ++ B)
Instead of this, you could do this more efficiently:
val E = Option((A ++ B).flatten.toSeq)
E: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Using foldLeft
Seq(Some(List(1, 2)), None).foldLeft(List.empty[Int])(_ ++ _.getOrElse(List.empty[Int]))
result: List[Int] = List(1, 2)
Using flatten twice
Seq(Some(Seq(1, 2, 3)), Some(4, 5, 6), None).flatten.flatten
result: Seq(1, 2, 3, 4, 5, 6)
Scala REPL
scala> val a = Some(Seq(1, 2, 3))
a: Some[Seq[Int]] = Some(List(1, 2, 3))
scala> val b = Some(Seq(4, 5, 6))
b: Some[Seq[Int]] = Some(List(4, 5, 6))
scala> val c = None
c: None.type = None
scala> val d = Seq(a, b, c).flatten.flatten
d: Seq[Int] = List(1, 2, 3, 4, 5, 6)
I want to group elements of a list such as :
val lst = List(1,2,3,4,5)
On transformation it should return a new list as:
val newlst = List(List(1), List(1,2), List(1,2,3), List(1,2,3,4), Lis(1,2,3,4,5))
You can do it this way:
lst.inits.toList.reverse.tail
(1 to lst.size map lst.take).toList should do it.
Not as pretty or short as others, but gotta have some tail recursion for the soul:
def createFromElements(list: List[Int]): List[List[Int]] = {
#tailrec
def createFromElements(l: List[Int], p: List[List[Int]]): List[List[Int]] =
l match {
case x :: xs =>
createFromElements(xs, (p.headOption.getOrElse(List()) ++ List(x)) :: p)
case Nil => p.reverse
}
createFromElements(list, Nil)
}
And now:
scala> createFromElements(List(1,2,3,4,5))
res10: List[List[Int]] = List(List(1), List(1, 2), List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5))
Doing a foldLeft seems to be more efficient, though ugly:
(lst.foldLeft((List[List[Int]](), List[Int]()))((x,y) => {
val z = x._2 :+ y;
(x._1 :+ z, z)
}))._1
This insert function is taken from :
http://aperiodic.net/phil/scala/s-99/p21.scala
def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = ls.splitAt(n) match {
case (pre, post) => pre ::: e :: post
}
I want to insert an element at every second element of a List so I use :
val sl = List("1", "2", "3", "4", "5") //> sl : List[String] = List(1, 2, 3, 4, 5)
insertAt("'a", 2, insertAt("'a", 4, sl)) //> res0: List[String] = List(1, 2, 'a, 3, 4, 'a, 5)
This is a very basic implementation, I want to use one of the functional constructs. I think I need
to use a foldLeft ?
Group the list into Lists of size 2, then combine those into lists separated by the separation character:
val sl = List("1","2","3","4","5") //> sl : List[String] = List(1, 2, 3, 4, 5)
val grouped = sl grouped(2) toList //> grouped : List[List[String]] = List(List(1, 2), List(3, 4), List(5))
val separatedList = grouped flatMap (_ :+ "a") //> separatedList : <error> = List(1, 2, a, 3, 4, a, 5, a)
Edit
Just saw that my solution has a trailing token that isn't in the question. To get rid of that do a length check:
val separatedList2 = grouped flatMap (l => if(l.length == 2) l :+ "a" else l)
//> separatedList2 : <error> = List(1, 2, a, 3, 4, a, 5)
You could also use sliding:
val sl = List("1", "2", "3", "4", "5")
def insertEvery(n:Int, el:String, sl:List[String]) =
sl.sliding(2, 2).foldRight(List.empty[String])( (xs, acc) => if(xs.length == n)xs:::el::acc else xs:::acc)
insertEvery(2,"x",sl) // res1: List[String] = List(1, 2, x, 3, 4, x, 5)
Forget about insertAt, use pure foldLeft:
def insertAtEvery[A](e: A, n: Int, ls: List[A]): List[A] =
ls.foldLeft[(Int, List[A])]((0, List.empty)) {
case ((pos, result), elem) =>
((pos + 1) % n, if (pos == n - 1) e :: elem :: result else elem :: result)
}._2.reverse
Recursion and pattern matching are functional constructs. Insert the new elem by pattern matching on the output of splitAt then recurse with the remaining input. Seems easier to read but I'm not satisfied with the type signature for this one.
def insertEvery(xs: List[Any], n: Int, elem: String):List[Any] = xs.splitAt(n) match {
case (xs, List()) => if(xs.size >= n) xs ++ elem else xs
case (xs, ys) => xs ++ elem ++ insertEvery(ys, n, elem)
}
Sample runs.
scala> val xs = List("1","2","3","4","5")
xs: List[String] = List(1, 2, 3, 4, 5)
scala> insertEvery(xs, 1, "a")
res1: List[Any] = List(1, a, 2, a, 3, a, 4, a, 5, a)
scala> insertEvery(xs, 2, "a")
res2: List[Any] = List(1, 2, a, 3, 4, a, 5)
scala> insertEvery(xs, 3, "a")
res3: List[Any] = List(1, 2, 3, a, 4, 5)
An implementation using recursion:
Note n must smaller than the size of List, or else an Exception would be raised.
scala> def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = n match {
| case 0 => e :: ls
| case _ => ls.head :: insertAt(e, n-1, ls.tail)
| }
insertAt: [A](e: A, n: Int, ls: List[A])List[A]
scala> insertAt("'a", 2, List("1", "2", "3", "4"))
res0: List[String] = List(1, 2, 'a, 3, 4)
Consider indexing list positions with zipWithIndex, and so
sl.zipWithIndex.flatMap { case(v,i) => if (i % 2 == 0) List(v) else List(v,"a") }
I wish to add a list of tuples of integers i.e. given an input list of tuples of arity k, produce a tuple of arity k whose fields are sums of corresponding fields of the tuples in the list.
Input
List( (1,2,3), (2,3,-3), (1,1,1))
Output
(4, 6, 1)
I was trying to use foldLeft, but I am not able to get it to compile. Right now, I am using a for loop, but I was looking for a more concise solution.
This can be done type safely and very concisely using shapeless,
scala> import shapeless._, syntax.std.tuple._
import shapeless._
import syntax.std.tuple._
scala> val l = List((1, 2, 3), (2, 3, -1), (1, 1, 1))
l: List[(Int, Int, Int)] = List((1,2,3), (2,3,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 3)
Notice that unlike solutions which rely on casts, this approach is type safe, and any type errors are detected at compile time rather than at runtime,
scala> val l = List((1, 2, 3), (2, "foo", -1), (1, 1, 1))
l: List[(Int, Any, Int)] = List((1,2,3), (2,foo,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
<console>:15: error: could not find implicit value for parameter num: Numeric[Any]
l.map(_.toList).transpose.map(_.sum)
^
scala> val tuples = List( (1,2,3), (2,3,-3), (1,1,1))
tuples: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> tuples.map(t => t.productIterator.toList.map(_.asInstanceOf[Int])).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 1)
Type information is lost when calling productIterator on Tuple3 so you have to convert from Any back to an Int.
If the tuples are always going to contain the same type I would suggest using another collection such as List. The Tuple is better suited for disparate types. When you have the same types and don't lose the type information by using productIterator the solution is more elegant.
scala> val tuples = List(List(1,2,3), List(2,3,-3), List(1,1,1))
tuples: List[List[Int]] = List(List(1, 2, 3), List(2, 3, -3), List(1, 1, 1))
scala> tuples.transpose.map(_.sum)
res1: List[Int] = List(4, 6, 1)
scala> val list = List( (1,2,3), (2,3,-3), (1,1,1))
list: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> list.foldRight( (0, 0, 0) ){ case ((a, b, c), (a1, b1, c1)) => (a + a1, b + b1, c + c1) }
res0: (Int, Int, Int) = (4,6,1)