The following for-expression seems intuitive to me. Take each item in List(1), then map over List("a"), and then return a List[(Int, String)].
scala> val x = for {
| a <- List(1)
| b <- List("a")
| } yield (a,b)
x: List[(Int, String)] = List((1,a))
Now, when converting it to flatMap, it seems less clear to me. If I understand correctly, I need to call flatMap first, since I'm taking the initial List(1) and then applying a function that goes from A => List[B].
scala> List(1).flatMap(a => List("a").map(b => (a,b) ))
res0: List[(Int, String)] = List((1,a))
After using the flatMap, it seemed necessary to use a map since I needed to go from A => B.
But as the number of generators in the for-expression increases (say from 2 to 3), how do I know whether to use map or flatMap when converting the for-expression into flatMap calls?
In a for comprehension, you always flatMap until the last value that you extract, which you map. So if you have three items:
for {
a <- List("a")
b <- List("b")
c <- List("c")
} yield (a, b, c)
It would be the same as:
List("a").flatMap(a => List("b").flatMap(b => List("c").map(c => (a, b, c))))
If you look at the signature of flatMap, the function it takes is A => M[B]. So as we add elements to the for comprehension, we need to flatMap them in, since each step continues to produce an M[B]. When we get to the last element, there's nothing left to add, so we use map, since we just want to go from A => B. Hope that makes sense; if not, you should watch some of the videos in the Reactive Programming class on Coursera, as they go over this quite a bit.
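To check the rule concretely, here is a small sketch (assuming Scala 2.13 or 3; the object and value names are illustrative only) that writes the three-generator comprehension next to its hand-desugared form, plus a two-generator case with more than one element per list to show that the nesting yields every combination:

```scala
object DesugarDemo {
  // Three generators: flatMap, flatMap, then map for the last one
  val viaFor: List[(String, String, String)] = for {
    a <- List("a")
    b <- List("b")
    c <- List("c")
  } yield (a, b, c)

  // The same expression written by hand
  val viaCalls: List[(String, String, String)] =
    List("a").flatMap(a => List("b").flatMap(b => List("c").map(c => (a, b, c))))

  // With several elements per list, the outer flatMap visits every inner map result
  val pairs: List[(Int, Char)] =
    List(1, 2).flatMap(n => List('x', 'y').map(ch => (n, ch)))
}
```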
I have a paired RDD that looks like
(a1, (a2, a3))
(b1, (b2, b3))
...
I want to flatten the values to obtain
(a1, a2, a3)
(b1, b2, b3)
...
Currently I'm doing
rddData.map(x => (x._1, x._2._1, x._2._2))
Is there a better way of performing the conversion? The above solution gets ugly if the value contains many elements instead of just 2.
When I'm trying to avoid all the ugly underscore-number stuff that comes with tuple manipulation, I like to use case notation:
rddData.map { case (a, (b, c)) => (a, b, c) }
You can also give your variables meaningful names to make your code self-documenting, and the use of curly braces means you have fewer nested parentheses.
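Spark isn't needed to see the pattern itself, so here is a self-contained sketch that uses an ordinary List in place of rddData (an assumption for illustration; the values mirror the question's example). It also shows a deeper nesting, which works the same way as long as the shape is known at compile time:

```scala
object CaseFlattenDemo {
  // A plain List stands in for the RDD; RDD.map accepts the same kind of function
  val pairs: List[(String, (String, String))] =
    List(("a1", ("a2", "a3")), ("b1", ("b2", "b3")))

  // Destructure the nested tuple with a pattern-matching anonymous function
  val flat: List[(String, String, String)] =
    pairs.map { case (key, (first, second)) => (key, first, second) }

  // Deeper nesting: each level of the pattern mirrors one level of the tuple
  val deeper: List[(Int, Int, Int, Int)] =
    List((1, (2, (3, 4)))).map { case (a, (b, (c, d))) => (a, b, c, d) }
}
```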
EDIT:
The map { case ... } pattern is pretty compact and can be used for surprisingly deeply nested tuples, as long as the structure is known at compile time. If you absolutely, positively cannot know the structure of the tuple at compile time, then here is some hacky, slow code that can probably flatten any arbitrarily nested tuple... as long as there are no more than 22 elements in total (scala.Tuple22 is the largest tuple class). It works by recursively converting each element of the tuple to a list, flatMap-ing them into a single list, then using scary reflection to convert the list back into a tuple, as seen here.
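// Recursively flattens nested Products (tuples, but also case classes, Options, Lists, etc.) into one flat List
def flatten(b: Product): List[Any] = {
  b.productIterator.toList.flatMap {
    case x: Product => flatten(x) // nested Product: flatten it recursively
    case y          => List(y)    // leaf value: keep it as-is
  }
}
// Rebuilds a TupleN from a list via reflection; fails for an empty list or more than 22 elements
def toTuple(as: List[Any]): Product = {
  val tupleClass = Class.forName("scala.Tuple" + as.size)
  tupleClass.getConstructors.apply(0).newInstance(as.map(_.asInstanceOf[AnyRef]): _*).asInstanceOf[Product]
}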
rddData.map(t => toTuple(flatten(t)))
There is no better way. The 1st answer is equivalent to:
val abc2 = xyz.map{ case (k, v) => (k, v._1, v._2) }
which is equivalent to your own example.
Why doesn't this work:
val m = Map( 1-> 2, 2-> 4, 3 ->6)
def h(k: Int, v: Int) = if (v > 2) Some(k->v) else None
m.flatMap { case(k,v) => h(k,v) }
m.flatMap { (k,v) => h(k,v) }
The one with the case statement gives me:
res1: scala.collection.immutable.Map[Int,Int] = Map(2 -> 4, 3 -> 6)
but the other one fails and says "Missing type parameter v", expected: Int, actual: (Int, Int)
The case keyword signifies pattern matching, so the Tuple2 (a Map is an Iterable of Tuple2 elements) that you are flatMapping "over" gets decomposed into k and v. (The fact that flatMap works when the h function is producing an Option rather than a Map or Iterable is the Scala collections library being perhaps overly permissive.)
Without the case keyword, you are providing a function that requires two arguments, but flatMap needs a function that accepts a single argument (a Tuple2). So the second version does not typecheck.
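A minimal sketch of both working forms, assuming Scala 2.13 (where Option is itself an IterableOnce, so passing a plain function value to Map.flatMap type-checks): the pattern-matching version from the question, and an alternative that eta-expands h into a two-argument function and then uses the standard `tupled` method to turn it into a function of one Tuple2. The names `FlatMapTupleDemo` and `hf` are illustrative only.

```scala
object FlatMapTupleDemo {
  val m: Map[Int, Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)
  def h(k: Int, v: Int): Option[(Int, Int)] = if (v > 2) Some(k -> v) else None

  // Pattern-matching anonymous function: takes one Tuple2 and destructures it
  val viaCase: Map[Int, Int] = m.flatMap { case (k, v) => h(k, v) }

  // Eta-expand h into a (Int, Int) => ... function, then adapt it to take a single Tuple2
  val hf: (Int, Int) => Option[(Int, Int)] = h
  val viaTupled: Map[Int, Int] = m.flatMap(hf.tupled)
}
```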
For the second one, if you don't want to use case, you can do this:
m.flatMap { x => h(x._1, x._2) } // x is a (key, value) pair (one element of the map), so the key and value are x._1 and x._2
Suppose that I use a sequence of various maps and/or flatMaps to generate a sequence of collections. Is it possible to access information about the "current" collection from within any of those methods? For example, without knowing anything specific about the functions used in the previous maps or flatMaps, and without using any intermediate declarations, how can I get the maximum value (or length, or first element, etc.) of the collection upon which the last map acts?
List(1, 2, 3)
.flatMap(x => f(x) /* some unknown function */)
.map(x => x + ??? /* what is the max element of the collection? */)
Edit for clarification:
In the example, I'm not looking for the max (or whatever) of the initial List. I'm looking for the max of the collection after the flatMap has been applied.
By "without using any intermediate declarations" I mean that I do not want to use any temporary collections en route to the final result. So, the example by Steve Waldman below, while giving the desired result, is not what I am seeking. (I include this condition mostly for aesthetic reasons.)
Edit for clarification, part 2:
The ideal solution would be some magic keyword or syntactic sugar that lets me reference the current collection:
List(1, 2, 3)
.flatMap(x => f(x))
.map(x => x + theCurrentList.max)
I'm prepared to accept the fact, however, that this simply is not possible.
Maybe just define the list as a val, so you can name it? I don't know of any facility built into map(...) or flatMap(...) that would help.
val myList = List(1, 2, 3)
myList
.flatMap(x => f(x) /* some unknown function */)
.map(x => x + myList.max /* what is the max element of the List? */)
Update: By this approach at least, if you have multiple transformations and want to see the transformed version, you'd have to name that. You could get away with
val myList = List(1, 2, 3).flatMap(x => f(x) /* some unknown function */)
myList.map(x => x + myList.max /* what is the max element of the List? */)
Or, if there will be multiple transformations, get in the habit of naming the stages.
val rawList = List(1, 2, 3)
val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified
Update 2: Watch it work in the REPL, even with heterogeneous types:
scala> def f( x : Int ) : Vector[Double] = Vector(x * math.random, x * math.random )
f: (x: Int)Vector[Double]
scala> val rawList = List(1, 2, 3)
rawList: List[Int] = List(1, 2, 3)
scala> val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
smordified: List[Double] = List(0.40730853571901315, 0.15151641399798665, 1.5305929709857609, 0.35211231420067435, 0.644241939254793, 0.15530230501048903)
scala> val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
scala> maxified
res3: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
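One detail worth noting: in `smordified.map(x => x + smordified.max)`, `max` is re-evaluated for every element, so that step is quadratic. Naming the stage also lets you compute the maximum just once. A sketch, where the deterministic `f` is a hypothetical stand-in for the unknown function:

```scala
object MaxOnceDemo {
  // Hypothetical stand-in for the question's unknown function f
  def f(x: Int): List[Int] = List(x, x * 10)

  val smordified: List[Int] = List(1, 2, 3).flatMap(f) // List(1, 10, 2, 20, 3, 30)
  val smordifiedMax: Int = smordified.max               // computed once, not per element
  val maxified: List[Int] = smordified.map(_ + smordifiedMax)
}
```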
It is possible, but not pretty, and not likely something you want if you are doing it for "aesthetic reasons."
import scala.math.max
def f(x: Int): Seq[Int] = ???
List(1, 2, 3).
flatMap(x => f(x) /* some unknown function */).
foldRight((List[Int](),List[Int]())) {
case (x, (xs, Nil)) => ((x :: xs), List.fill(xs.size + 1)(x))
case (x, (xs, xMax :: _)) => ((x :: xs), List.fill(xs.size + 1)(max(x, xMax)))
}.
zipped.
map {
case (x, xMax) => x + xMax
}
// Or alternately, a slightly more efficient version using Streams.
List(1, 2, 3).
flatMap(x => f(x) /* some unknown function */).
foldRight((List[Int](),Stream[Int]())) {
case (x, (xs, Stream())) =>
((x :: xs), Stream.continually(x))
case (x, (xs, curXMax #:: _)) =>
val newXMax = max(x, curXMax)
((x :: xs), Stream.continually(newXMax))
}.
zipped.
map {
case (x, xMax) => x + xMax
}
Seriously though, I just took this on to see if I could do it. While the code didn't turn out as bad as I expected, I still don't think it's particularly readable. I'd discourage using this over something similar to Steve Waldman's answer. Sometimes, it's simply better to just introduce a val, rather than being dogmatic about it.
You could define a mapWithSelf (resp. flatMapWithSelf) operation along these lines and add it as an implicit enrichment to the collection. For List it might look like:
// Scala 2.13 APIs
object Enrichments {
implicit class WithSelfOps[A](val lst: List[A]) extends AnyVal {
def mapWithSelf[B](f: (A, List[A]) => B): List[B] =
lst.map(f(_, lst))
def flatMapWithSelf[B](f: (A, List[A]) => IterableOnce[B]): List[B] =
lst.flatMap(f(_, lst))
}
}
The enrichment basically fixes the value of the collection before the operation and threads it through. It should be possible to generify this (at least for the strict collections), though it would look a little different in 2.12 vs. 2.13+.
Usage would look like
import Enrichments._
val someF: Int => IterableOnce[Int] = ???
List(1, 2, 3)
.flatMap(someF)
.mapWithSelf { (x, lst) =>
x + lst.max
}
So at the usage site, it's aesthetically pleasant. Note that if you're computing something which traverses the list, you'll be traversing the list every time (leading to a quadratic runtime). You can get around that with some mutability or by just saving the intermediate list after the flatMap.
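A variation on the same idea that avoids the quadratic traversal without mutability: take the summary as a separate argument, compute it once, and thread only the result into the mapping function. `mapWithAgg` and `AggEnrichments` are hypothetical names, not part of any library; this is a sketch for `List` only.

```scala
object AggEnrichments {
  implicit class WithAggOps[A](val lst: List[A]) extends AnyVal {
    // Hypothetical helper: compute a summary of the whole list once, then map with it
    def mapWithAgg[S, B](agg: List[A] => S)(f: (A, S) => B): List[B] = {
      val s = agg(lst) // single traversal for the summary
      lst.map(f(_, s))
    }
  }
}
```

Usage would be `List(1, 10, 2, 20).mapWithAgg(_.max)((x, m) => x + m)` after `import AggEnrichments._`; the separate parameter lists let the compiler infer `S` from the first argument before typing the second.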
One somewhat-simple way of referencing prior output within the current map/collect operation is to use a named reference outside the map, then reference it from within the map block:
var prevOutput = ... // starting value of whatever is referenced within the map
myValues.map { x =>
  prevOutput = ... // expression that references `x` and the prior `prevOutput`
  prevOutput // return the value computed above for the map to collect
}
This draws attention to the fact that we're referencing prior elements while building the new sequence.
This would be more messy, though, if you wanted to reference arbitrarily previous values, not just the previous one.
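A concrete instance of the pattern, assuming the "previous output" is a running sum (the values are made up for illustration). It relies on `List.map` being strict and evaluating elements in order; with a lazily re-evaluated view or a parallel collection the `var` would misbehave. For this particular case the standard `scanLeft` does the same threading without a `var`:

```scala
object PrevOutputDemo {
  val myValues: List[Int] = List(1, 2, 3, 4)

  // Mutable version: each output depends on the previous output
  var prevOutput = 0 // starting value
  val runningSums: List[Int] = myValues.map { x =>
    prevOutput = prevOutput + x // uses the prior prevOutput
    prevOutput                  // value collected by map
  }

  // Immutable alternative: scanLeft carries the previous result; tail drops the seed 0
  val viaScan: List[Int] = myValues.scanLeft(0)(_ + _).tail
}
```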
Is there a way to manipulate multiple values of a tuple without using a temporary variable and starting a new statement?
Say I have a method that returns a tuple and I want to do something with those values inline.
e.g. if I want to split a string at a certain point and reverse the pieces
def backToFront(s: String, n:Int) = s.splitAt(n)...
I can do
val (a, b) = s.splitAt(n)
b + a
(requires temporary variables and new statement) or
List(s.splitAt(n)).map(i => i._2 + i._1).head
(works, but seems a bit dirty, creating a single element List just for this) or
s.splitAt(n).swap.productIterator.mkString
(works for this particular example, but only because there happens to be a swap method that does what I want, so it's not very general).
The zipped method on tuples seems just to be for tuples of Lists.
As another example, how could you turn the tuple ('a, 'b, 'c) into ('b, 'a, 'c) in one statement?
Tuples are just a convenient return type; it is not assumed that you will make complicated manipulations with them. There was also a similar discussion on the Scala forums.
For the last example, I couldn't find anything better than pattern matching.
('a, 'b, 'c) match { case (a, b, c) => (b, a, c) }
Unfortunately, the built-in methods on tuples are pretty limited.
Maybe you want something like these in your personal library,
def fold2[A, B, C](x: (A, B))(f: (A, B) => C): C = f(x._1, x._2)
def fold3[A, B, C, D](x: (A, B, C))(f: (A, B, C) => D): D = f(x._1, x._2, x._3)
With the appropriate implicit conversions, you could do,
scala> "hello world".splitAt(5).swap.fold(_ + _)
res1: java.lang.String = " worldhello"
scala> (1, 2, 3).fold((a, b, c) => (b, c, a))
res2: (Int, Int, Int) = (2,3,1)
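The "appropriate implicit conversions" are not spelled out above; one way to supply them is a pair of implicit value classes. This is a sketch with hypothetical names (`TupleFolds`, `Tuple2Fold`, `Tuple3Fold`), not a standard library feature:

```scala
object TupleFolds {
  // Hypothetical enrichment: adds fold to 2-tuples
  implicit class Tuple2Fold[A, B](val t: (A, B)) extends AnyVal {
    def fold[C](f: (A, B) => C): C = f(t._1, t._2)
  }
  // Hypothetical enrichment: adds fold to 3-tuples
  implicit class Tuple3Fold[A, B, C](val t: (A, B, C)) extends AnyVal {
    def fold[D](f: (A, B, C) => D): D = f(t._1, t._2, t._3)
  }
}
```

With `import TupleFolds._` in scope, the two REPL lines above type-check: the tuple's element types fix the function's parameter types, so placeholder syntax such as `_ + _` needs no annotations.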
An alternative to the last expression would be the "pipe" operator |> (get it from Scalaz or here),
scala> ('a, 'b, 'c) |> (t => (t._2, t._3, t._1))
res3: (Symbol, Symbol, Symbol) = ('b,'c,'a)
This would be nice, if not for the required annotations,
scala> ("hello ", "world") |> (((_: String) + (_: String)).tupled)
res4: java.lang.String = hello world
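Since Scala 2.13 the standard library ships a similar operator, `pipe`, in `scala.util.chaining` (a swap-in for the Scalaz `|>` above, not the same thing). Combined with a pattern-matching function it avoids both the type annotations and the `_1`/`_2` accessors. A sketch, assuming Scala 2.13+; `PipeDemo` is an illustrative name and `backToFront` follows the question's signature:

```scala
import scala.util.chaining._

object PipeDemo {
  // Split, then reverse the pieces, in one expression
  def backToFront(s: String, n: Int): String =
    s.splitAt(n).pipe { case (front, back) => back + front }

  // Reorder a 3-tuple without a temporary val
  val reordered: (String, String, String) =
    ("a", "b", "c").pipe { case (a, b, c) => (b, a, c) }
}
```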
How about this?
s.splitAt(n) |> Function.tupled(_ + _)
[Edit: I just noticed that your function's arguments are reversed. In that case, you will have to give up placeholder syntax and instead go for: s.splitAt(n) |> Function.tupled((a, b) => b + a)]
For your last example, I can't think of anything better than a pattern match (as shown by #4e6).
With the current development version of shapeless, you can achieve this without unpacking the tuple:
import shapeless.syntax.std.tuple._
val s = "abcdefgh"
val n = 3
s.splitAt(n).rotateRight[shapeless.Nat._1].mkString("", "", "") // "defghabc"
I think you shouldn't have to wait too long (a matter of days, I'd say) before the syntax of the methods on the last line gets cleaned up, and you can simply write
s.splitAt(n).rotateRight(1).mkString
To create a new class that can be used in a Scala for comprehension, it seems that all you have to do is define a map function:
scala> class C[T](items: T*) {
| def map[U](f: (T) => U) = this.items.map(f)
| }
defined class C
scala> for (x <- new C(1 -> 2, 3 -> 4)) yield x
res0: Seq[(Int, Int)] = ArrayBuffer((1,2), (3,4))
But that only works for simple for loops where there is no pattern matching on the left hand side of <-. If you try to pattern match there, you get a complaint that the filter method is not defined:
scala> for ((k, v) <- new C(1 -> 2, 3 -> 4)) yield k -> v
<console>:7: error: value filter is not a member of C[(Int, Int)]
for ((k, v) <- new C(1 -> 2, 3 -> 4)) yield k -> v
Why is filter required to implement the pattern matching here? I would have thought Scala would just translate the above loop into the equivalent map call:
scala> new C(1 -> 2, 3 -> 4).map{case (k, v) => k -> v}
res2: Seq[(Int, Int)] = ArrayBuffer((1,2), (3,4))
But that seems to work fine, so the for loop must be translated into something else. What is it translated into that needs the filter method?
The short answer: according to the Scala specs, you shouldn't need to define a 'filter' method for the example you gave, but there is an open bug that means it is currently required.
The long answer: the desugaring algorithm applied to for comprehensions is described in the Scala language specification. Let's start with section 6.19 "For Comprehensions and For Loops" (I'm looking at version 2.9 of the specification):
In a first step, every generator p <- e, where p is not irrefutable (§8.1) for the type of e is replaced by p <- e.withFilter { case p => true; case _ => false }
The important point for your question is whether the pattern in the comprehension is "irrefutable" for the given expression or not. (The pattern is the bit before the '<-'; the expression is the bit afterwards.) If it is "irrefutable" then the withFilter will not be added, otherwise it will be needed.
Fine, but what does "irrefutable" mean? Skip ahead to section 8.1.14 of the spec ("Irrefutable Patterns"). Roughly speaking, if the compiler can prove that the pattern cannot fail when matching the expression then the pattern is irrefutable and the withFilter call will not be added.
Now your example that works as expected is the first type of irrefutable pattern from section 8.1.14, a variable pattern. So the first example is easy for the compiler to determine that withFilter is not required.
Your second example is potentially the third type of irrefutable pattern, a constructor pattern. Matching (k, v), which is a Tuple2[Any,Any], against a Tuple2[Int,Int] (see sections 8.1.6 and 8.1.7 of the specification) succeeds, since Int is irrefutable for Any. Therefore the second pattern is also irrefutable and doesn't (or at least shouldn't) need a withFilter method.
In Daniel's example, Tuple2[Any,Any] isn't irrefutable against Any, so the withFilter call gets added.
By the way, the error message talks about a filter method but the spec talks about withFilter - it was changed with Scala 2.8, see this question and answer for the gory details.
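If you control the class, the practical workaround is to give it a `withFilter` method. The sketch below extends the question's `C`; the `withFilter` body is an assumption (it simply filters eagerly and returns a new `C`, which is enough for the desugared `.map` that follows). It assumes Scala 2.13; in Scala 3 the tuple pattern is treated as irrefutable, so `withFilter` is not called but the code still compiles.

```scala
class C[T](items: T*) {
  def map[U](f: T => U): Seq[U] = items.map(f)

  // Used by the Scala 2 desugaring of patterns such as (k, v) <- ...
  def withFilter(p: T => Boolean): C[T] = new C(items.filter(p): _*)
}

object WithFilterDemo {
  // In Scala 2 this becomes roughly:
  // new C(...).withFilter { case (k, v) => true; case _ => false }.map { case (k, v) => k -> v }
  val pairs: Seq[(Int, Int)] = for ((k, v) <- new C(1 -> 2, 3 -> 4)) yield k -> v
}
```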
See the difference:
scala> for ((k, v) <- List(1 -> 2, 3 -> 4, 5)) yield k -> v
res22: List[(Any, Any)] = List((1,2), (3,4))
scala> List(1 -> 2, 3 -> 4, 5).map{case (k, v) => k -> v}
scala.MatchError: 5
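The same contrast as a self-contained sketch (assuming Scala 2.13; names are illustrative). The refutable pattern in the for comprehension inserts a `withFilter`, so non-tuples are silently dropped, while `map` with a pattern-matching function throws on `5`. `collect`, from the standard collections API, makes the filtering explicit:

```scala
import scala.util.Try

object RefutableDemo {
  val mixed: List[Any] = List(1 -> 2, 3 -> 4, 5)

  // Refutable pattern: withFilter is inserted, so 5 is skipped
  val viaFor: List[(Any, Any)] = for ((k, v) <- mixed) yield k -> v

  // No case matches 5, so map throws a MatchError (captured here as a Failure)
  val viaMap: Try[List[(Any, Any)]] = Try(mixed.map { case (k, v) => k -> v })

  // collect applies the partial function only where it is defined
  val viaCollect: List[(Any, Any)] = mixed.collect { case (k, v) => k -> v }
}
```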