Flatten an element in spark scala - scala

I have data like this in an RDD:
RDD[((Int, Int, Int), ((Int, Int), Int))]
as:
(((9,679,16),((2,274),1)), ((250,976,13),((2,218),1)))
I want output as :
((9,679,16,2,274,1),(250,976,13,2,218,1))
After Joining 2 rdds with:
val joinSale = salesTwo.join(saleFinal)
I got that result set. I tried the following code.
joinSale.flatMap(x => x).take(100).foreach(println)
I have tried map and flatMap but couldn't get it to work. Any ideas on how to implement this? Thanks in advance.

You can do this with pattern matching in Scala. Simply wrap your tuple-flattening logic in a map, similar to the following:
val mappedJoinSale = joinSale.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
Using your example, we have:
scala> val example = sc.parallelize(Array(((9,679,16),((2,274),1)), ((250,976,13),((2,218),1))))
example: org.apache.spark.rdd.RDD[((Int, Int, Int), ((Int, Int), Int))] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> val mapped = example.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
mapped: org.apache.spark.rdd.RDD[(Int, Int, Int, Int, Int, Int)] = MappedRDD[1] at map at <console>:14
scala> mapped.take(2).foreach(println)
...
(9,679,16,2,274,1)
(250,976,13,2,218,1)
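The original flatMap(x => x) attempt cannot work because flatMap expects the function to return a collection, and a tuple is not one. As a minimal sketch of the same pattern match without a Spark context, the following uses a plain Seq in place of the RDD (the object name FlattenJoinSketch and the Seq stand-in are assumptions for illustration only; on an RDD the map call has the same shape):

```scala
object FlattenJoinSketch {
  // Same element shape as the joined RDD in the question.
  type Joined = ((Int, Int, Int), ((Int, Int), Int))

  // Assumption: a local Seq stands in for the RDD so the sketch runs without Spark.
  val joined: Seq[Joined] =
    Seq(((9, 679, 16), ((2, 274), 1)), ((250, 976, 13), ((2, 218), 1)))

  // Destructure the nested tuple and rebuild it as one flat 6-tuple.
  val flat: Seq[(Int, Int, Int, Int, Int, Int)] =
    joined.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
}
```

Each element keeps its static type, so the result is a sequence of (Int, Int, Int, Int, Int, Int) rather than a sequence of untyped values.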

You could also create a generic tuple flattener using the marvelous shapeless library, as follows:
import shapeless._
import shapeless.ops.tuple
trait LowLevelFlatten extends Poly1 {
  implicit def anyFlat[T] = at[T](x => Tuple1(x))
}
object concat extends Poly2 {
  implicit def atTuples[T1, T2](implicit prepend: tuple.Prepend[T1, T2]): Case.Aux[T1, T2, prepend.Out] =
    at[T1, T2]((t1, t2) => prepend(t1, t2))
}
object flatten extends LowLevelFlatten {
  implicit def tupleFlat[T, M](implicit
    mapper: tuple.Mapper.Aux[T, flatten.type, M],
    reducer: tuple.LeftReducer[M, concat.type]
  ): Case.Aux[T, reducer.Out] =
    at[T](t => reducer(mapper(t)))
}
Now, in any code where import shapeless._ is in scope, you can use it as:
joinSale.map(flatten)
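For contrast, here is a rough, untyped sketch that flattens nested tuples using only the standard library's productIterator. It is not the shapeless approach above and it gives up static typing (the result is a List[Any]); the names UntypedFlatten and flattenTuple are made up for this illustration.

```scala
object UntypedFlatten {
  // Recursively flatten nested tuples into a flat List[Any].
  // Only scala.TupleN values are descended into; any other value is a leaf.
  def flattenTuple(x: Any): List[Any] = x match {
    case p: Product if p.getClass.getName.startsWith("scala.Tuple") =>
      p.productIterator.toList.flatMap(e => flattenTuple(e))
    case leaf =>
      List(leaf)
  }
}
```

The class-name check is a deliberate restriction in this sketch: without it, case classes and non-empty Lists (which are also Products) would be taken apart as well.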

Related

Flattening Future[EitherT[Future, A, B]]

As the title mentions.
I have many operations that use EitherT[Future, A, B]. Sometimes I want to map the left or the right side through another operation with the signature A => Future[C]. In another scenario, an EitherT[Future, A, B] is the result of mapping over a future, which yields Future[EitherT[Future, A, B]].
How can I elegantly flatten types like:
EitherT[Future, Future[A], Future[B]] and Future[EitherT[Future, A, B]]
Thank you in advance.
In all your cases you can use EitherT#flatMap (or EitherT#flatMapF), in combination with lifting some value to EitherT (or disjunction (\/) with flatMapF).
Mapping a B => F[C] over an EitherT[F, A, B] :
flatMap + lift
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz._, Scalaz._
def f(i: Int): Future[Double] = Future.successful(i.toDouble)
val r = EitherT.right[Future, String, Int](Future.successful(1))
r.flatMap(i => EitherT.right(f(i)))
// or
r.flatMapF(i => f(i).map(_.right))
Mapping an A => F[C] over an EitherT[F, A, B] :
swap + flatMap + lift
def g(s: String): Future[Int] = Future.successful(s.length)
val l = EitherT.left[Future, String, Int](Future.successful("error"))
l.swap.flatMap(s => EitherT.right(g(s))).swap
// or
l.swap.flatMap(s => EitherT.left[Future, Int, Int](g(s)))
// or
l.swap.flatMapF(s => g(s).map(_.left))
Mapping an A => EitherT[F, B, C] over an F[A] :
lift + flatMap
def h(i: Int): EitherT[Future, String, Int] =
EitherT.right(Future.successful(i + 1))
val fut = Future.successful(1)
// mapping gives us Future[EitherT[Future, String, Int]]
fut.map(h)
// lifting to EitherT and flatMap gives us EitherT[Future, String, Int]
EitherT.right(fut).flatMap(h)
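To make the "lift + flatMap" idea concrete without a Scalaz dependency, here is a minimal sketch built on a hand-rolled wrapper. FutEither, right, and flattenOuter are hypothetical names standing in for EitherT[Future, A, B] and its operations; they are not the Scalaz API, only an illustration of what the lifting and flatMap steps do underneath.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for scalaz's EitherT[Future, A, B]; NOT the library type.
final case class FutEither[A, B](run: Future[Either[A, B]]) {
  // Continue with f on a Right; pass a Left through untouched.
  def flatMap[C](f: B => FutEither[A, C]): FutEither[A, C] =
    FutEither(run.flatMap {
      case Right(b) => f(b).run
      case Left(a)  => Future.successful[Either[A, C]](Left(a))
    })
}

object FutEither {
  // Lift a plain Future into the right side.
  def right[A, B](fb: Future[B]): FutEither[A, B] =
    FutEither(fb.map(b => Right(b): Either[A, B]))

  // Collapse Future[FutEither[A, B]]: lift the outer Future, then flatMap.
  def flattenOuter[A, B](ffe: Future[FutEither[A, B]]): FutEither[A, B] =
    right[A, FutEither[A, B]](ffe).flatMap(fe => fe)
}

object FutEitherDemo {
  def h(i: Int): FutEither[String, Int] =
    FutEither.right(Future.successful(i + 1))

  // Mapping h over a Future nests the wrapper inside the Future.
  val nested: Future[FutEither[String, Int]] = Future.successful(1).map(h)

  // Blocking here is only for the demonstration.
  val result: Either[String, Int] =
    Await.result(FutEither.flattenOuter(nested).run, 5.seconds)
}
```

With the real EitherT, the same step is the EitherT.right(fut).flatMap(h) call shown in the answer.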

How to create a generic groupByIndex function?

I would like to have groupByIndex function that groups values based on their index (and not the value).
A concrete method definition for Vector[A] could look like the following:
def groupByIndex[A, K](vector: Vector[A], f: Int => K): immutable.Map[K, Vector[(A, Int)]] = {
vector.zipWithIndex.groupBy { case (elem, index) => f(index) }
}
Testing this function in the REPL gives indeed the correct result:
scala> val vec = Vector.tabulate(4)(i => s"string ${i+1}")
vec: scala.collection.immutable.Vector[String] = Vector(string 1, string 2, string 3, string 4)
scala> groupByIndex(vec, i => i%2)
res2: scala.collection.immutable.Map[Int,Vector[(String, Int)]] = Map(1 -> Vector((string 2,1), (string 4,3)), 0 -> Vector((string 1,0), (string 3,2)))
Now, I would like to apply the "enrich-my-library" pattern to give this method to all the classes that should support it, i.e. classes that implement zipWithIndex and groupBy. Those two methods are defined in GenIterableLike (zipWithIndex) and GenTraversableLike/TraversableLike (groupBy).
With all this in mind, I tried to mimic the method definitions of zipWithIndex (this is the problematic one) and groupBy to build my own groupByIndex:
implicit class GenIterableLikeOps[A, Repr](val iterable: GenIterableLike[A, Repr] with TraversableLike[A, Repr]) extends AnyVal {
def groupByIndex[K, A1 >: A, That <: TraversableLike[(A1, Int), OtherRepr], OtherRepr](f: Int => K)(implicit bf: CanBuildFrom[Repr, (A1, Int), That]): immutable.Map[K, OtherRepr] = {
val zipped = iterable.zipWithIndex
zipped.groupBy{ case (elem, index) => f(index) }
}
}
First, this seems way too complicated to me - is there a way to simplify this? For example, can we somehow drop the second OtherRepr? (I was not able to.)
Second, I am not able to call this function without explicitly specifying the generic parameters. Using the example from above I get the following error:
scala> vec.groupByIndex(i => i%2)
<console>:21: error: Cannot construct a collection of type scala.collection.TraversableLike[(String, Int),Nothing] with elements of type (String, Int) based on a collection of type scala.collection.immutable.Vector[String].
vec.groupByIndex(i => i%2)
^
scala> vec.groupByIndex[Int, String, Vector[(String, Int)], Vector[(String, Int)]](i => i%2)
res4: scala.collection.immutable.Map[Int,Vector[(String, Int)]] = Map(1 -> Vector((string 2,1), (string 4,3)), 0 -> Vector((string 1,0), (string 3,2)))
How do I a) simplify this method and b) make it work without having to specify the generic parameters?
You can replace the OtherRepr type parameter with That. That way you get rid of OtherRepr and no longer have to specify the generic type parameters: the compiler can resolve That by looking at the implicit value for CanBuildFrom[Repr, (A1, Int), That].
implicit class GenIterableLikeOps[A, Repr]
(val iterable: GenIterableLike[A, Repr] with TraversableLike[A, Repr])
extends AnyVal {
def groupByIndex
[K, A1 >: A, That <: TraversableLike[(A1, Int), That]]
(f: Int => K)(implicit bf: CanBuildFrom[Repr, (A1, Int), That])
: Map[K, That] = {
val zipped = iterable.zipWithIndex
zipped.groupBy{ case (elem, index) => f(index) }
}
}
This isn't as good as the other answer, but if you don't care about what you're building, one way to simplify and avoid building the zipped collection:
implicit class gbi[A](val as: Traversable[A]) extends AnyVal {
def groupByIndex[K](f: Int => K) = (as, (0 until Int.MaxValue)).zipped.groupBy { case (x, i) => f(i) }
}
The range is a benign way to avoid taking the size of the traversable.
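If preserving the source collection type is not a requirement, a simpler variant avoids CanBuildFrom altogether. The sketch below is an assumption-laden alternative, not either answer's method: it fixes the group type to Vector, drops the index from the result, and uses the made-up names GroupByIndexSyntax and GroupByIndexOps.

```scala
object GroupByIndexSyntax {
  // Enrich any Iterable with groupByIndex; groups are always built as Vectors.
  implicit class GroupByIndexOps[A](private val xs: Iterable[A]) {
    def groupByIndex[K](f: Int => K): Map[K, Vector[A]] =
      xs.iterator.zipWithIndex.foldLeft(Map.empty[K, Vector[A]]) {
        case (acc, (elem, index)) =>
          val key = f(index)
          // Append to the existing group, or start a new one for this key.
          acc.updated(key, acc.getOrElse(key, Vector.empty[A]) :+ elem)
      }
  }
}
```

After import GroupByIndexSyntax._, a call such as Vector("a", "b", "c", "d").groupByIndex(_ % 2) needs no explicit type parameters, because K is inferred from the function argument alone.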

Trouble with inferred types when using Shapeless

I'm having a bit of trouble getting the inferred types of some extension methods that use Shapeless to line up with what (I think that) they should be. I can assign the result of my expressions to a val with an explicit type after passing them through identity or by way of a val without an explicit type (aaa and bb below respectively), but not directly (c below).
Tested using Scala 2.10.3, Shapeless 1.2.4, Scalaz 7.0.6.
import shapeless._, Tuples._
import scalaz._, Scalaz._
trait FnOps[=>:[_, _], A, B] {
val F: Arrow[=>:]
val f: A =>: B
def &&&:[C](g: A =>: C): A =>: (C, B) =
F.combine(g, f)
}
trait FnProdOps[=>:[_, _], A, B <: Product] {
val F: Arrow[=>:]
val f: A =>: B
def &&&:[C, R <: HList](g: A =>: C)(implicit hlister: HListerAux[B, R], tupler: Tupler[C :: R]): A =>: tupler.Out =
F.mapsnd(F.combine(g, f)) {
case (c, b) => tupler(c +: hlister(b))
}
}
trait LowPriorityImplicits0 {
implicit def ToFnOps[=>:[_, _], A, B](fn: A =>: B)(implicit A: Arrow[=>:]): FnOps[=>:, A, B] =
new FnOps[=>:, A, B] {
val F = A
val f = fn
}
}
object Implicits extends LowPriorityImplicits0 {
implicit def ToFnProdOps[=>:[_, _], A, B <: Product](fn: A =>: B)(implicit A: Arrow[=>:]): FnProdOps[=>:, A, B] =
new FnProdOps[=>:, A, B] {
val F = A
val f = fn
}
}
import Implicits._
// These work
val a: Int => Int = _ * 2
val aa: Int => (Int, Int) = a &&&: a
val aaa: Int => (Int, Int, Int) = identity(a &&&: a &&&: a)
// This works too
val b = a &&&: a &&&: a
val bb: Int => (Int, Int, Int) = b
// This doesn't
val c: Int => (Int, Int, Int) = a &&&: a &&&: a
Is there a way to make assignments similar to c compile?

Using implicit def with composed types

Forgive me if this question is a duplicate; I'm having trouble finding anything because I don't know the right words to search. So, with implicit def, I can do things like this:
type CharsetMap = Map[Charset, Byte]
implicit def seqtup2CharsetMap(input: Seq[(String, Int)]): CharsetMap = {
Map.empty // placeholder
}
def somef(a: Int, b:Int, p: CharsetMap) = p
somef(1, 3, Seq(("hey", 2), ("there", 9)))
which lets me call somef with a Seq[(String, Int)] object as a parameter. The problem is that I have something like this...
def somef2(p: (CharsetMap) => Int) = p
and this does not work:
val p = (a: Seq[(String, Int)]) => 19
somef2(p)
How can I do this without doing an implicit def specifically for (Seq[(String, Int)]) => Int?
It looks like you want to implicitly convert some function A => B to a function that goes from C => B. You can do that with this generic implicit:
implicit def f2InputConverter[A, B, C](f: A => B)(implicit i: C => A): C => B = (c: C) => f(i(c))
Once you have that in scope, in your particular case, you'll need an implicit function which is the inverse of the one that you've defined in the question:
implicit def charsetMap2Seqtup(input: CharsetMap): Seq[(String, Int)] = {
Nil // placeholder
}
and then you should be able to call somef2 with p.
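What the generic implicit does can also be written out explicitly with Function1.compose, which may be easier to follow than relying on implicit resolution. This is only a sketch: the body of charsetMap2Seqtup is a hypothetical conversion replacing the placeholder, the function p sums the Int components purely for demonstration, and the object name ComposeSketch is made up.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object ComposeSketch {
  type CharsetMap = Map[Charset, Byte]

  def somef2(p: CharsetMap => Int): CharsetMap => Int = p

  // Hypothetical conversion, standing in for the placeholder body in the answer.
  val charsetMap2Seqtup: CharsetMap => Seq[(String, Int)] =
    m => m.toSeq.map { case (cs, b) => (cs.name(), b.toInt) }

  // A function on the Seq representation (demo logic: sum the Int parts).
  val p: Seq[(String, Int)] => Int = pairs => pairs.map(_._2).sum

  // Explicitly what the implicit would do: convert the input, then apply p.
  val adapted: CharsetMap => Int = somef2(p.compose(charsetMap2Seqtup))

  val total: Int =
    adapted(Map(StandardCharsets.UTF_8 -> 2.toByte, StandardCharsets.US_ASCII -> 3.toByte))
}
```

The direction of the conversion matters: to use a function on Seq[(String, Int)] where a function on CharsetMap is expected, the input must be converted from CharsetMap to Seq, which is why the inverse of the question's conversion is needed.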

Variable multi-assign via unzip on List[( (A,B),(C,D) )], or List[(A,B,C,D)]

In regard to this question I was able to multi-assign via unzip on a List[(A,B)]
However, now I'm finding a need to multi-assign on either a List[( (A,B),(C,D) )] or a List[(A,B,C,D)]
I see that there is an unzip for pairs, and an unzip3 for triplets, but how to destructure a pair of Tuple2 OR a single Tuple4 so as to multi-assign? I'll adapt the collection type below accordingly, but whichever one works for 1-step multi-assignment is fine.
// foo can be a List[(A,B,C,D)] OR List[( (A,B),(C,D) )]
val(a,b,c,d) = foo.unzip
This works but is a hack:
val (a, b, c_d) = foo.unzip3 // foo is a List[(A,B,(C,D))]
because I wind up having to write c_d._1 and c_d._2, the very notation I'm trying to avoid by multi-assigning variables.
Maybe this goes without saying, but there's a simple way to do this if you don't mind multiple steps:
val foo = List((1 -> "w", 'x -> 2.0), (101 -> "Y", 'Z -> 3.0))
val (p, q) = foo.unzip
val (a, b) = p.unzip
val (c, d) = q.unzip
If you really want a one-liner, you'll have to resort to something like Scalaz, which provides a Bifunctor instance for tuples that lets you write this, for example:
import scalaz._, Scalaz._
val ((a, b), (c, d)) = foo.unzip.bimap(_.unzip, _.unzip)
This is essentially the same as the version above, but having bimap lets us do everything in one line.
You don't actually need any implicit conversions here. The trick is to take advantage of custom extractor objects, like so:
object Unzipped4 {
def unapply[A, B, C, D](ts: List[(A, B, C, D)]): Some[(List[A], List[B], List[C], List[D])] =
Some((ts map _._1, ts map _._2, ts map _._3, ts map _._4))
}
You then use it like this:
val Unzipped4(as, bs, cs, ds) = foo
You could actually expand this to an arbitrary Product by using the dynamic access methods on that class, but you'd lose some type safety in the process.
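The same extractor trick can be applied to the nested-pair shape from the question. The following is a sketch under the assumption that the input is a List[((A, B), (C, D))]; the names UnzippedPairs and UnzippedPairsDemo are made up, and strings replace the symbol literals used elsewhere in this thread.

```scala
object UnzippedPairs {
  // Extractor for a list of nested pairs: unzip the outer pair, then each half.
  def unapply[A, B, C, D](
      ts: List[((A, B), (C, D))]
  ): Some[(List[A], List[B], List[C], List[D])] = {
    val (ab, cd) = ts.unzip
    val (as, bs) = ab.unzip
    val (cs, ds) = cd.unzip
    Some((as, bs, cs, ds))
  }
}

object UnzippedPairsDemo {
  val foo = List((1 -> "w", "x" -> 2.0), (101 -> "Y", "Z" -> 3.0))

  // One-step multi-assignment via the extractor.
  val UnzippedPairs(as, bs, cs, ds) = foo
}
```

Because unapply returns Some rather than Option, the pattern cannot fail at runtime for a well-typed input.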
As there are only unzip and unzip3, why don't you just write an extension for that? Something like this should work (2.10 code):
implicit class Unzip4[A,B,C,D](val xs: List[(A,B,C,D)]) extends AnyVal {
def unzip4: (List[A], List[B], List[C], List[D]) = xs.foldRight[(List[A], List[B], List[C], List[D])]((Nil,Nil,Nil,Nil)) { (x, res) =>
val (a,b,c,d) = x
(a :: res._1, b :: res._2, c :: res._3, d :: res._4)
}
}
You can add your own unzip4 method.
import scala.collection._
import generic._
class Unzipper[A, CC[X] <: GenTraversable[X]](s: GenericTraversableTemplate[A, CC]) {
def unzip4[A1, A2, A3, A4](implicit asQuad: A => (A1, A2, A3, A4)): (CC[A1], CC[A2], CC[A3], CC[A4]) = {
val b1 = s.genericBuilder[A1]
val b2 = s.genericBuilder[A2]
val b3 = s.genericBuilder[A3]
val b4 = s.genericBuilder[A4]
for (e <- s) {
val (a, b, c, d) = asQuad(e)
b1 += a
b2 += b
b3 += c
b4 += d
}
(b1.result, b2.result, b3.result, b4.result)
}
}
implicit def toUnzipper[A, CC[X] <: GenTraversable[X]](s: GenericTraversableTemplate[A, CC]) = new Unzipper(s)
implicit def t2t2Tot4[A1, A2, A3, A4](tt: ((A1, A2), (A3, A4))) = tt match { case ((a, b), (c, d)) => (a, b, c, d) }
implicit def t1t3Tot4[A1, A2, A3, A4](tt: (A1, (A2, A3, A4))) = tt match { case (a, (b, c, d)) => (a, b, c, d) }
implicit def t3t1Tot4[A1, A2, A3, A4](tt: ((A1, A2, A3), A4)) = tt match { case ((a, b, c), d) => (a, b, c, d) }
Usage:
scala> List((1, 2, 3, 4)).unzip4
res0: (List[Int], List[Int], List[Int], List[Int]) = (List(1),List(2),List(3),List(4))
scala> List((1, 2) -> (3, 4)).unzip4
res1: (List[Int], List[Int], List[Int], List[Int]) = (List(1),List(2),List(3),List(4))
scala> List(1 -> (2, 3, 4)).unzip4
res2: (List[Int], List[Int], List[Int], List[Int]) = (List(1),List(2),List(3),List(4))
scala> List((1, 2, 3) -> 4).unzip4
res3: (List[Int], List[Int], List[Int], List[Int]) = (List(1),List(2),List(3),List(4))
In addition to the other great answers, I played around with nested and arity-generic unzips. My approach uses type classes and loses arity and type safety, like productIterator on tuples. Perhaps someone can adapt it using HList from shapeless to the rescue. One would also have to apply the pimp-my-library pattern so that unzip on collections returns the same collection type it was called on and gets rid of Iterable, but I omitted this here to show only the idea of nested arity-generic unzips. Perhaps one can use some kind of LowerPriorityImplicits to implicitly convert any A to Unzippable[A,A] if there isn't a concrete implicit conversion to Unzippable for a given type.
trait Unzippable[T, +Super] {
def unzip(t: T): Iterable[Super]
}
implicit object IntUnzippable extends Unzippable[Int, Int] { def unzip(i: Int) = Seq(i) }
implicit object BooleanUnzippable extends Unzippable[Boolean, Boolean] { def unzip(b: Boolean) = Seq(b) }
implicit object StringUnzippable extends Unzippable[String, String] { def unzip(s: String) = Seq(s) }
implicit def Tuple2Unzippable[Super, A <: Super, B <: Super, S, S1 <: S, S2 <: S](implicit ev1: Unzippable[A, S1], ev2: Unzippable[B, S2]) = new Unzippable[(A, B), S] {
def unzip(t: (A, B)): Iterable[S] = ev1.unzip(t._1) ++ ev2.unzip(t._2)
}
def unzip[A, Super](i: Iterable[A])(implicit ev: Unzippable[A, Super]): Iterable[Iterable[Super]] = i.map(ev.unzip).transpose
object MyTuple3 {
  // Matches only iterables with exactly three elements.
  def unapply[X](i: Iterable[X]): Option[(X, X, X)] = if (i.size != 3) None else Some((i.head, i.drop(1).head, i.last))
}
val list = (1, ("A", true)) :: (2, ("B", false)) :: (3, ("C", true)) :: Nil
val MyTuple3(nums, letters, bools) = unzip(list)
println((nums, letters, bools)) // (List(1, 2, 3),List(A, B, C),List(true, false, true))