Two newbie questions.
It seems that for comprehension knows about Options and can skip automatically None and unwrap Some, e.g.
val x = Map("a" -> List(1,2,3), "b" -> List(4,5,6), "c" -> List(7,8,9))
val r = for {map_key <- List("WRONG_KEY", "a", "b", "c")
map_value <- x get map_key } yield map_value
outputs:
r: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
Where do the Options go? Can someone please shed some light on how does this work? Can we always rely on this behaviour?
The second things is why this does not compile?
val x = Map("a" -> List(1,2,3), "b" -> List(4,5,6), "c" -> List(7,8,9))
val r = for {map_key <- List("WRONG_KEY", "a", "b", "c")
map_value <- x get map_key
list_value <- map_value
} yield list_value
It gives
Error:(57, 26) type mismatch;
found : List[Int]
required: Option[?]
list_value <- map_value
^
Looking at the type of the first example, I am not sure why we need to have an Option here?
For comprehensions are converted into calls to sequence of map or flatMap calls. See here
Your for loop is equivalent to
List("WRONG_KEY", "a", "b", "c").flatMap(
map_key => x.get(map_key).flatMap(map_value => map_value)
)
flatMap in Option is defined as
#inline final def flatMap[B](f: A => Option[B]): Option[B]
So it is not allowed to pass List as argument as you are notified by compiler.
I think the difference is due to the way for comprehensions are expanded into map() and flatMap method calls within the Seq trait.
For conciseness, lets define some variables:
var keys = List("WRONG_KEY", a, b, c)
Your first case is equivalent to:
val r = keys.flatMap(x.get(_))
whereas your second case is equivalent to:
val r= keys.flatMap(x.get(_).flatMap{ case y => y })
I think the issue is that Option.flatMap() should return an Option[], which is fine in the first case, but is not consistent in the second case with what the x.get().flatMap is passed, which is a List[Int].
These for-comprehension translation rules are explained in further detail in chapter 7 of "Programming Scala" by Wampler & Payne.
Maybe this small difference, setting parenthesis and calling flatten, makes it clear:
val r = for {map_key <- List("WRONG_KEY", "a", "b", "c")
| } yield x get map_key
r: List[Option[List[Int]]] = List(None, Some(List(1, 2, 3)), Some(List(4, 5, 6)), Some(List(7, 8, 9)))
val r = (for {map_key <- List("WRONG_KEY", "a", "b", "c")
| } yield x get map_key).flatten
r: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
That's equivalent to:
scala> List("WRONG_KEY", "a", "b", "c").map (x get _)
res81: List[Option[List[Int]]] = List(None, Some(List(1, 2, 3)), Some(List(4, 5, 6)), Some(List(7, 8, 9)))
scala> List("WRONG_KEY", "a", "b", "c").map (x get _).flatten
res82: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
The intermediate value (map_key) vanished as _ in the second block.
You are mixing up two different monads (List and Option) inside the for statement. This sometimes works as expected, but not always. In any case, you can trasform options into lists yourself:
for {
map_key <- List("WRONG_KEY", "a", "b", "c")
list_value <- x get map_key getOrElse Nil
} yield list_value
Related
I am new to spark programming and scala and i am not able to understand the difference between map and flatMap.
I tried below code as i was expecting both to work but got error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding flatmap make Rdd in to collection for String/Int Rdd.
I was thinking that in this case both should work without any error.Please let me know where i am making the mistake.
Thanks
You need to look at how the signatures defined these methods:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]] and you want to produce a RDD[List[Int]] you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: scala.collection.immutable.Set[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
Map operation return type is U where as flatMap return type is TraversableOnce[U](means collections)
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
Want to merge val A = Option(Seq(1,2)) and val B = Option(Seq(3,4)) to yield a new option sequence
val C = Option(Seq(1,2,3,4))
This
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil)),
seems faster and more idiomatic than
val C = Option(A.toList.flatten ++ B.toList.flatten)
But is there a better way? And am I right that getOrElse is faster and lighter than toList.flatten?
What about a neat for comprehension:
val Empty = Some(Nil)
val C = for {
a <- A orElse Empty
b <- B orElse Empty
} yield a ++ b
Creates less intermediate options.
Or, you could just do a somewhat cumbersome pattern matching:
(A, B) match {
case (None, None) => Nil
case (None, sb#Some(b)) => sb
case (sa#Some(a), None) => sa
case (Some(a), Some(b)) => Some(a ++ b)
}
I think this at least creates less intermediate collections than the double flatten.
Your first case:
// In this case getOrElse is not needed as the option is clearly not `None`.
// So, you can replace the following:
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil))
// By this:
val C = Option(A.get ++ B.get) // A simple concatenation of two sequences.
C: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Your second case/option is wrong for multiple reasons.
val C = Option(A.toList.flatten ++ B.toList.flatten)
Option[List[Int]] = Some(List(1, 2, 3, 4))
It returns the incorrect type Option[List[Int]] instead of Option[Seq[Int]]
It needlessly invokes toList on A & B. You could simply add the options and invoke flatten on them.
It is not DRY and redundantly calls flatten on both A.toList & B.toList whereas it could call flatten on (A ++ B)
Instead of this, you could do this more efficiently:
val E = Option((A ++ B).flatten.toSeq)
E: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Using foldLeft
Seq(Some(List(1, 2)), None).foldLeft(List.empty[Int])(_ ++ _.getOrElse(List.empty[Int]))
result: List[Int] = List(1, 2)
Using flatten twice
Seq(Some(Seq(1, 2, 3)), Some(4, 5, 6), None).flatten.flatten
result: Seq(1, 2, 3, 4, 5, 6)
Scala REPL
scala> val a = Some(Seq(1, 2, 3))
a: Some[Seq[Int]] = Some(List(1, 2, 3))
scala> val b = Some(Seq(4, 5, 6))
b: Some[Seq[Int]] = Some(List(4, 5, 6))
scala> val c = None
c: None.type = None
scala> val d = Seq(a, b, c).flatten.flatten
d: Seq[Int] = List(1, 2, 3, 4, 5, 6)
I have an array, something like that:
val a = Array("a", "c", "c", "z", "c", "b", "a")
and I want to get a map with keys of all different values of this array and values with a collection of relevant indexes for each such group, i.e. for a given array the answer would be:
Map(
"a" -> Array(0, 6),
"b" -> Array(5),
"c" -> Array(1, 2, 4),
"z" -> Array(3)
)
Surprisingly, it proved to be somewhat more complicated that I've anticipated. The best I've came so far with is:
a.zipWithIndex.groupBy {
case(cnt, idx) => cnt
}.map {
case(cnt, arr) => (cnt, arr.map {
case(k, v) => v
}
}
which is not either concise or easy to understand. Any better ideas?
Your code can be rewritten as oneliner, but it looks ugly.
as.zipWithIndex.groupBy(_._1).mapValues(_.map(_._2))
Another way is to use mutable.MultiMap
import collection.mutable.{ HashMap, MultiMap, Set }
val as = Array("a", "c", "c", "z", "c", "b", "a")
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
and then just add every binding
as.zipWithIndex foreach (mm.addBinding _).tupled
//mm = Map(z -> Set(3), b -> Set(5), a -> Set(0, 6), c -> Set(1, 2, 4))
finally you can convert it mm.toMap if you want immutable version.
Here's a version with foldRight. I think it's reasonably clear.
val a = Array("a", "c", "c", "z", "c", "b", "a")
a
.zipWithIndex
.foldRight(Map[String, List[Int]]())
{case ((e,i), m)=> m updated (e, i::m.getOrElse(e, Nil))}
//> res0: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(0, 6)
//| , b -> List(5), c -> List(1, 2, 4), z -> List(3))
Another version using foldLeft and an immutable Map with default value:
val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.foldLeft(Map[String, List[Int]]().withDefaultValue(Nil))( (m, p) => m + ((p._1, p._2 +: m(p._1))))
// res6: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(6, 0), c -> List(4, 2, 1), z -> List(3), b -> List(5))
Starting in Scala 2.13, we can use the new groupMap which (as its name suggests) is a one-pass equivalent of a groupBy and a mapping over grouped items:
// val a = Array("a", "c", "c", "z", "c", "b", "a")
a.zipWithIndex.groupMap(_._1)(_._2)
// Map("z" -> Array(3), "b" -> Array(5), "a" -> Array(0, 6), "c" -> Array(1, 2, 4))
This:
zips each item with its index, giving (item, index) tuples
groups elements based on their first tuple part (_._1) (group part of groupMap)
maps grouped values to their second tuple part (_._2 i.e. their index) (map part of groupMap)
This insert function is taken from :
http://aperiodic.net/phil/scala/s-99/p21.scala
def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = ls.splitAt(n) match {
case (pre, post) => pre ::: e :: post
}
I want to insert an element at every second element of a List so I use :
val sl = List("1", "2", "3", "4", "5") //> sl : List[String] = List(1, 2, 3, 4, 5)
insertAt("'a", 2, insertAt("'a", 4, sl)) //> res0: List[String] = List(1, 2, 'a, 3, 4, 'a, 5)
This is a very basic implementation, I want to use one of the functional constructs. I think I need
to use a foldLeft ?
Group the list into Lists of size 2, then combine those into lists separated by the separation character:
val sl = List("1","2","3","4","5") //> sl : List[String] = List(1, 2, 3, 4, 5)
val grouped = sl grouped(2) toList //> grouped : List[List[String]] = List(List(1, 2), List(3, 4), List(5))
val separatedList = grouped flatMap (_ :+ "a") //> separatedList : <error> = List(1, 2, a, 3, 4, a, 5, a)
Edit
Just saw that my solution has a trailing token that isn't in the question. To get rid of that do a length check:
val separatedList2 = grouped flatMap (l => if(l.length == 2) l :+ "a" else l)
//> separatedList2 : <error> = List(1, 2, a, 3, 4, a, 5)
You could also use sliding:
val sl = List("1", "2", "3", "4", "5")
def insertEvery(n:Int, el:String, sl:List[String]) =
sl.sliding(2, 2).foldRight(List.empty[String])( (xs, acc) => if(xs.length == n)xs:::el::acc else xs:::acc)
insertEvery(2,"x",sl) // res1: List[String] = List(1, 2, x, 3, 4, x, 5)
Forget about insertAt, use pure foldLeft:
def insertAtEvery[A](e: A, n: Int, ls: List[A]): List[A] =
ls.foldLeft[(Int, List[A])]((0, List.empty)) {
case ((pos, result), elem) =>
((pos + 1) % n, if (pos == n - 1) e :: elem :: result else elem :: result)
}._2.reverse
Recursion and pattern matching are functional constructs. Insert the new elem by pattern matching on the output of splitAt then recurse with the remaining input. Seems easier to read but I'm not satisfied with the type signature for this one.
def insertEvery(xs: List[Any], n: Int, elem: String):List[Any] = xs.splitAt(n) match {
case (xs, List()) => if(xs.size >= n) xs ++ elem else xs
case (xs, ys) => xs ++ elem ++ insertEvery(ys, n, elem)
}
Sample runs.
scala> val xs = List("1","2","3","4","5")
xs: List[String] = List(1, 2, 3, 4, 5)
scala> insertEvery(xs, 1, "a")
res1: List[Any] = List(1, a, 2, a, 3, a, 4, a, 5, a)
scala> insertEvery(xs, 2, "a")
res2: List[Any] = List(1, 2, a, 3, 4, a, 5)
scala> insertEvery(xs, 3, "a")
res3: List[Any] = List(1, 2, 3, a, 4, 5)
An implementation using recursion:
Note n must smaller than the size of List, or else an Exception would be raised.
scala> def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = n match {
| case 0 => e :: ls
| case _ => ls.head :: insertAt(e, n-1, ls.tail)
| }
insertAt: [A](e: A, n: Int, ls: List[A])List[A]
scala> insertAt("'a", 2, List("1", "2", "3", "4"))
res0: List[String] = List(1, 2, 'a, 3, 4)
Consider indexing list positions with zipWithIndex, and so
sl.zipWithIndex.flatMap { case(v,i) => if (i % 2 == 0) List(v) else List(v,"a") }
If I have unknown number of Set[Int] (or List[Int]) as an input and want to combine
i don't know size of input List[Int] for which I need to produce these tuples as a final result, what's the best way to achieve this? My code looks like below.
Ok. Since combine(xs) yields a List[List[Any]] and you have a :: combine(xs) you just insert a into the the List of all combinations. You want to combine a with each element of the possible combinations. That lead me to this solution.
You can also generalize it to lists:List[List[T]] because when you combine from lists:List[List[Int]] you will get a List[List[Int]].
def combine[T](lists: List[List[T]]): List[List[T]] = lists match {
case Nil => lists
case x :: Nil => for(a <- x) yield List(a) //needed extra case because if comb(xs) is Nil in the for loop, it won't yield anything
case x :: xs => {
val comb = combine(xs) //since all further combinations are constant, you should keep it in a val
for{
a <- x
b <- comb
} yield a :: b
}
}
Tests:
val first = List(7, 3, 1)
val second = List(2, 8)
val third = List("a","b")
combine(List(first, second))
//yields List(List(7, 2), List(7, 8), List(3, 2), List(3, 8), List(1, 2), List(1, 8))
combine(List(first, second, third))
//yields List(List(7, 2, a), List(7, 2, b), List(7, 8, a), List(7, 8, b), List(3, 2, a), List(3, 2, b), List(3, 8, a), List(3, 8, b), List(1, 2, a), List(1, 2, b), List(1, 8, a), List(1, 8, b))
I think you can also generalize this to work with other collections than List, but then you can't use pattern-match this easily and you have to work via iterators.