How to create a generic groupByIndex function? - scala

I would like to have groupByIndex function that groups values based on their index (and not the value).
A concrete method definition for Vector[A] could look like the following:
def groupByIndex[A, K](vector: Vector[A], f: Int => K): immutable.Map[K, Vector[(A, Int)]] = {
vector.zipWithIndex.groupBy { case (elem, index) => f(index) }
}
Testing this function in the REPL gives indeed the correct result:
scala> val vec = Vector.tabulate(4)(i => s"string ${i+1}")
vec: scala.collection.immutable.Vector[String] = Vector(string 1, string 2, string 3, string 4)
scala> groupByIndex(vec, i => i%2)
res2: scala.collection.immutable.Map[Int,Vector[(String, Int)]] = Map(1 -> Vector((string 2,1), (string 4,3)), 0 -> Vector((string 1,0), (string 3,2)))
Now, I would like to apply the "enrich-my-library" pattern to give this method to all the classes that should support it, i.e. classes that implement zipWithIndex and groupBy. Those two methods are defined in GenIterableLike (zipWithIndex) and GenTraversableLike/TraversableLike (groupBy).
With all this in mind, I tried to mimic the method definitions of zipWithIndex (this is the problematic) and groupBy to build my own groupByIndex:
implicit class GenIterableLikeOps[A, Repr](val iterable: GenIterableLike[A, Repr] with TraversableLike[A, Repr]) extends AnyVal {
def groupByIndex[K, A1 >: A, That <: TraversableLike[(A1, Int), OtherRepr], OtherRepr](f: Int => K)(implicit bf: CanBuildFrom[Repr, (A1, Int), That]): immutable.Map[K, OtherRepr] = {
val zipped = iterable.zipWithIndex
zipped.groupBy{ case (elem, index) => f(index) }
}
}
First, this seems way too complicated to me - is there a way to simplify this? For example, can we somehow drop the second OtherRepr? (I was not able to.)
Second, I am not able to call this function without explicitly specifying the generic parameters. Using the example from above I get the following error:
scala> vec.groupByIndex(i => i%2)
<console>:21: error: Cannot construct a collection of type scala.collection.TraversableLike[(String, Int),Nothing] with elements of type (String, Int) based on a collection of type scala.collection.immutable.Vector[String].
vec.groupByIndex(i => i%2)
^
scala> vec.groupByIndex[Int, String, Vector[(String, Int)], Vector[(String, Int)]](i => i%2)
res4: scala.collection.immutable.Map[Int,Vector[(String, Int)]] = Map(1 -> Vector((string 2,1), (string 4,3)), 0 -> Vector((string 1,0), (string 3,2)))
How do I a) simplify this method and b) make it work without having to specify the generic parameters?

You can substitute the OtherThat type parameter by That. That way you get rid of OtherThat and solve the problem of having to specify the generic type parameters. The compiler is then able to resolve That by looking at the implicit value for CanBuildFrom[Repr, (A1, Int), That].
implicit class GenIterableLikeOps[A, Repr]
(val iterable: GenIterableLike[A, Repr] with TraversableLike[A, Repr])
extends AnyVal {
def groupByIndex
[K, A1 >: A, That <: TraversableLike[(A1, Int), That]]
(f: Int => K)(implicit bf: CanBuildFrom[Repr, (A1, Int), That])
: Map[K, That] = {
val zipped = iterable.zipWithIndex
zipped.groupBy{ case (elem, index) => f(index) }
}
}

This isn't as good as the other answer, but if you don't care about what you're building, one way to simplify and avoid building the zipped collection:
implicit class gbi[A](val as: Traversable[A]) extends AnyVal {
def groupByIndex[K](f: Int => K) = (as, (0 until Int.MaxValue)).zipped.groupBy { case (x, i) => f(i) }
}
The range is a benign way to avoid taking the size of the traversable.

Related

Flatten an element in spark scala

I have data like this in an RDD:
RDD[((Int, Int, Int), ((Int, Int), Int))]
as:
(((9,679,16),((2,274),1)), ((250,976,13),((2,218),1)))
I want output as :
((9,679,16,2,274,1),(250,976,13,2,218,1))
After Joining 2 rdds with:
val joinSale = salesTwo.join(saleFinal)
I got that result set. I tried the following code.
joinSale.flatMap(x => x).take(100).foreach(println)
I have tried map/flatMap but couldn't do it. Any ideas how to implement a scenario like this ? Thanks in advance ..
You can do this with pattern matching in scala. Simply wrap your tuple modification logic within a map similar to the below:
val mappedJoinSale = joinSale.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
Using your example, we have:
scala> val example = sc.parallelize(Array(((9,679,16),((2,274),1)), ((250,976,13),((2,218),1))))
example: org.apache.spark.rdd.RDD[((Int, Int, Int), ((Int, Int), Int))] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> val mapped = example.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
mapped: org.apache.spark.rdd.RDD[(Int, Int, Int, Int, Int, Int)] = MappedRDD[1] at map at <console>:14
scala> mapped.take(2).foreach(println)
...
(9,679,16,2,274,1)
(250,976,13,2,218,1)
You could also create generic tuple flattener using marvelous shapeless library as follows:
import shapeless._
import shapeless.ops.tuple
trait LowLevelFlatten extends Poly1 {
implicit def anyFlat[T] = at[T](x => Tuple1(x))
}
object concat extends Poly2 {
implicit def atTuples[T1, T2](implicit prepend: tuple.Prepend[T1, T2]): Case.Aux[T1, T2, prepend.Out] =
at[T1,T2]((t1,t2) => prepend(t1,t2))
}
object flatten extends LowLevelFlatten {
implicit def tupleFlat[T, M](implicit
mapper: tuple.Mapper.Aux[T, flatten.type, M],
reducer: tuple.LeftReducer[M, concat.type]
): Case.Aux[T, reducer.Out] =
at[T](t => reducer(mapper(t)))
}
Now in any code where import shapeless._ exists you could use it as
joinSale.map(flatten)

abstract over scala functions without losing types

I'm making a function that takes a lambda, and uses .tupled on it if possible (arity 2+). For the compiler to allow using that, it needs to know if the lambda is really a Function2 (~ Function22). However, pattern-matching for Function2[Any,Any,Any] means I'm left with a (Any, Any) => Any rather than its original types. Not even knowing the arity isn't helping either. I tried matching like case f[A <: Any, B <: Any, C <: Any]: scala.Function2[A, B, C] => f.tupled in an attempt to preserve types, but it doesn't really allow type parameters on a case.
The code:
val add = (a: Int, b: Int) => a + b
add: (Int, Int) => Int = <function2>
val tpl = (fn: Any) => {
fn match {
case f: Function0[Any] => f
case f: Function1[Any,Any] => f
case f: Function2[Any,Any,Any] => f.tupled
// case f: Function3[Any,Any,Any,Any] => f.tupled
// ...
// case _ => { throw new Exception("huh") }
}
}
// actual result:
tpl(add)
res0: Any = <function1>
// desired result is like this one:
scala> add.tupled
res3: ((Int, Int)) => Int = <function1>
Bonus points if I won't need pattern-matching cases for each possible level of arity...
The answer is ugly, as you may have expected. Using a pattern-match on a function as a val won't work. When you start out with Any you've already lost a ton of type information. And the standard library isn't really helping either, as there is no abstraction over arity of functions. That means that we can't even really use reflection to try to grab the type parameters, because we don't even know how many there are. You can figure out what FunctionN you have, but not it's contained types, as they are already lost at this point.
Another possibility is to make tpl a method and overload it for each FunctionN.
def tpl[A](f: Function0[A]): Function0[A] = f
def tpl[A, R](f: Function1[A, R]): Function1[A, R] = f
def tpl[A1, A2, R](f: Function2[A1, A2, R]): Function1[(A1, A2), R] = f.tupled
def tpl[A1, A2, A3, R](f: Function3[A1, A2, A3, R]): Function1[(A1, A2, A3), R] = f.tupled
// ... and so on
scala> val add = (a: Int, b: Int) => a + b
add: (Int, Int) => Int = <function2>
scala> tpl(add)
res0: ((Int, Int)) => Int = <function1>
It's not pretty, but it's at least type-safe. I don't think it would be too difficult to create a macro to generate the overloads 1-22.

Using implicit def with composed types

Forgive me if this question is a duplicate; I'm having trouble finding anything because I don't know the right words to search. So, with implicit def, I can do things like this:
type CharsetMap = Map[Charset, Byte]
implicit def seqtup2CharsetMap(input: Seq[(String, Int)]): CharsetMap = {
Map.empty // placeholder
}
def somef(a: Int, b:Int, p: CharsetMap) = p
somef(1, 3, Seq(("hey", 2), ("there", 9)))
which lets me call somef with a Seq[(String, Int)] object as a parameter. The problem is that I have something like this...
def somef2(p: (CharsetMap) => Int) = p
and this does not work:
val p = (a: Seq[(String, Int)]) => 19
somef2(p)
How can I do this without doing an implicit def specifically for (Seq[(String, Int)]) => Int?
It looks like you want to implicitly convert some function A => B to a function that goes from C => B. You can do that with this generic implicit:
implicit def f2InputConverter[A, B, C](f: A => B)(implicit i: C => A): C => B = (c: C) => f(i(c))
Once you have that in scope, in your particular case, you'll need an implicit function which is the inverse of the one that you've defined in the question:
implicit def charsetMap2Seqtup(input: CharsetMap): Seq[(String, Int)] = {
Nil // placeholder
}
and then you should be able to call somef2 with p

General 'map' function for Scala tuples?

I would like to map the elements of a Scala tuple (or triple, ...) using a single function returning type R. The result should be a tuple (or triple, ...) with elements of type R.
OK, if the elements of the tuple are from the same type, the mapping is not a problem:
scala> implicit def t2mapper[A](t: (A,A)) = new { def map[R](f: A => R) = (f(t._1),f(t._2)) }
t2mapper: [A](t: (A, A))java.lang.Object{def map[R](f: (A) => R): (R, R)}
scala> (1,2) map (_ + 1)
res0: (Int, Int) = (2,3)
But is it also possible to make this solution generic, i.e. to map tuples that contain elements of different types in the same manner?
Example:
class Super(i: Int)
object Sub1 extends Super(1)
object Sub2 extends Super(2)
(Sub1, Sub2) map (_.i)
should return
(1,2): (Int, Int)
But I could not find a solution so that the mapping function determines the super type of Sub1 and Sub2. I tried to use type boundaries, but my idea failed:
scala> implicit def t2mapper[A,B](t: (A,B)) = new { def map[X >: A, X >: B, R](f: X => R) = (f(t._1),f(t._2)) }
<console>:8: error: X is already defined as type X
implicit def t2mapper[A,B](t: (A,B)) = new { def map[X >: A, X >: B, R](f: X => R) = (f(t._1),f(t._2)) }
^
<console>:8: error: type mismatch;
found : A
required: X
Note: implicit method t2mapper is not applicable here because it comes after the application point and it lacks an explicit result type
implicit def t2mapper[A,B](t: (A,B)) = new { def map[X >: A, X >: B, R](f: X => R) = (f(t._1),f(t._2)) }
Here X >: B seems to override X >: A. Does Scala not support type boundaries regarding multiple types? If yes, why not?
I think this is what you're looking for:
implicit def t2mapper[X, A <: X, B <: X](t: (A,B)) = new {
def map[R](f: X => R) = (f(t._1), f(t._2))
}
scala> (Sub1, Sub2) map (_.i)
res6: (Int, Int) = (1,2)
A more "functional" way to do this would be with 2 separate functions:
implicit def t2mapper[A, B](t: (A, B)) = new {
def map[R](f: A => R, g: B => R) = (f(t._1), g(t._2))
}
scala> (1, "hello") map (_ + 1, _.length)
res1: (Int, Int) = (2,5)
I’m not a scala type genius but maybe this works:
implicit def t2mapper[X, A<:X, B<:X](t: (A,B)) = new { def map[A, B, R](f: X => R) = (f(t._1),f(t._2)) }
This can easily be achieved using shapeless, although you'll have to define the mapping function first before doing the map:
object fun extends Poly1 {
implicit def value[S <: Super] = at[S](_.i)
}
(Sub1, Sub2) map fun // typed as (Int, Int), and indeed equal to (1, 2)
(I had to add a val in front of i in the definition of Super, this way: class Super(val i: Int), so that it can be accessed outside)
The deeper question here is "why are you using a Tuple for this?"
Tuples are hetrogenous by design, and can contain an assortment of very different types. If you want a collection of related things, then you should be using ...drum roll... a collection!
A Set or Sequence will have no impact on performance, and would be a much better fit for this kind of work. After all, that's what they're designed for.
For the case when the two functions to be applied are not the same
scala> Some((1, "hello")).map((((_: Int) + 1 -> (_: String).length)).tupled).get
res112: (Int, Int) = (2,5)
The main reason I have supplied this answer is it works for lists of tuples (just change Some to List and remove the get).

How to write a zipWith method that returns the same type of collection as those passed to it?

I have reached this far:
implicit def collectionExtras[A](xs: Iterable[A]) = new {
def zipWith[B, C, That](ys: Iterable[B])(f: (A, B) => C)(implicit cbf: CanBuildFrom[Iterable[A], C, That]) = {
val builder = cbf(xs.repr)
val (i, j) = (xs.iterator, ys.iterator)
while(i.hasNext && j.hasNext) {
builder += f(i.next, j.next)
}
builder.result
}
}
// collectionExtras: [A](xs: Iterable[A])java.lang.Object{def zipWith[B,C,That](ys: Iterable[B])(f: (A, B) => C)(implicit cbf: scala.collection.generic.CanBuildFrom[Iterable[A],C,That]): That}
Vector(2, 2, 2).zipWith(Vector(4, 4, 4))(_ * _)
// res3: Iterable[Int] = Vector(8, 8, 8)
Now the problem is that above method always returns an Iterable. How do I make it return the collection of type as those passed to it? (in this case, Vector) Thanks.
You got close enough. Just a minor change in two lines:
implicit def collectionExtras[A, CC[A] <: IterableLike[A, CC[A]]](xs: CC[A]) = new {
def zipWith[B, C, That](ys: Iterable[B])(f: (A, B) => C)(implicit cbf: CanBuildFrom[CC[A], C, That]) = {
val builder = cbf(xs.repr)
val (i, j) = (xs.iterator, ys.iterator)
while(i.hasNext && j.hasNext) {
builder += f(i.next, j.next)
}
builder.result
}
}
First, you need to get the collection type being passed, so I added CC[A] as a type parameter. Also, that collection needs to be able to "reproduce" itself -- that is guaranteed by the second type parameter of IterableLike -- so CC[A] <: IterableLike[A, CC[A]]. Note that this second parameter of IterableLike is Repr, precisely the type of xs.repr.
Naturally, CanBuildFrom needs to receive CC[A] instead of Iterable[A]. And that's all there is to it.
And the result:
scala> Vector(2, 2, 2).zipWith(Vector(4, 4, 4))(_ * _)
res0: scala.collection.immutable.Vector[Int] = Vector(8, 8, 8)
The problem above is that your implicit conversion collectionExtras causes the obtained object to lose type information. In particular, in the solution above, the concrete collection type is lost because you're passing it an object of type Iterable[A] - from this point on, the compiler no longer knows the real type of xs. Although the builder factory CanBuildFrom programatically ensures that the dynamic type of the collection is correct (you really get a Vector), statically, the compiler knows only that zipWith returns something that is an Iterable.
To solve this problem, instead of having the implicit conversion take an Iterable[A], let it take an IterableLike[A, Repr]. Why?
Iterable[A] is usually declared as something like:
Iterable[A] extends IterableLike[A, Iterable[A]]
The difference with Iterable is that this IterableLike[A, Repr] keeps the concrete collection type as Repr. Most concrete collections, in addition to mixing in Iterable[A], also mix in the trait IterableLike[A, Repr], replacing the Repr with their concrete type, like below:
Vector[A] extends Iterable[A] with IterableLike[A, Vector[A]]
They can do this because type parameter Repr is declared as covariant.
Long story short, using IterableLike causes you implicit conversion to keep the concrete collection type information (that is Repr) around and use it when you define zipWith - note that the builder factory CanBuildFrom will now contain Repr instead of Iterable[A] for the first type parameter, causing the appropriate implicit object to be resolved:
import collection._
import collection.generic._
implicit def collectionExtras[A, Repr](xs: IterableLike[A, Repr]) = new {
def zipWith[B, C, That](ys: Iterable[B])(f: (A, B) => C)(implicit cbf: CanBuildFrom[Repr, C, That]) = {
val builder = cbf(xs.repr)
val (i, j) = (xs.iterator, ys.iterator)
while(i.hasNext && j.hasNext) {
builder += f(i.next, j.next)
}
builder.result
}
}
Reading your question formulation more carefully ("How to write a zipWith method that returns the same type of collection as those passed to it?"), it seems to me that you want to have the same type of collection as those passed to zipWith, not to the implicit conversion, that is the same type asys.
Same reasons as before, see solution below:
import collection._
import collection.generic._
implicit def collectionExtras[A](xs: Iterable[A]) = new {
def zipWith[B, C, That, Repr](ys: IterableLike[B, Repr])(f: (A, B) => C)(implicit cbf: CanBuildFrom[Repr, C, That]) = {
val builder = cbf(ys.repr)
val (i, j) = (xs.iterator, ys.iterator)
while(i.hasNext && j.hasNext) {
builder += f(i.next, j.next)
}
builder.result
}
}
With results:
scala> immutable.Vector(2, 2, 2).zipWith(mutable.ArrayBuffer(4, 4, 4))(_ * _)
res1: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(8, 8, 8)
To be honest I'm not sure how that really works:
implicit def collectionExtras[CC[X] <: Iterable[X], A](xs: CC[A]) = new {
import collection.generic.CanBuildFrom
def zipWith[B, C](ys: Iterable[B])(f: (A, B) => C)
(implicit cbf:CanBuildFrom[Nothing, C, CC[C]]): CC[C] = {
xs.zip(ys).map(f.tupled)(collection.breakOut)
}
}
scala> Vector(2, 2, 2).zipWith(Vector(4, 4, 4))(_ * _)
res1: scala.collection.immutable.Vector[Int] = Vector(8, 8, 8)
I sort of monkey patched this answer from retronym until it worked!
Basically, I want to use the CC[X] type constructor to indicate that zipWith should return the collection type of xs but with C as the type parameter (CC[C]). And I want to use breakOut to get the right result type. I sort of hoped that there was a CanBuildFrom implicit in scope but then got this error message:
required: scala.collection.generic.CanBuildFrom[Iterable[(A, B)],C,CC[C]]
The trick was then to use Nothing instead of Iterable[(A, B)]. I guess that implicit is defined somewhere...
Also, I like to think of your zipWith as zip and then map, so I changed the implementation. Here is with your implementation:
implicit def collectionExtras[CC[X] <: Iterable[X], A](xs: CC[A]) = new {
import collection.generic.CanBuildFrom
def zipWith[B, C](ys: Iterable[B])(f: (A, B) => C)
(implicit cbf:CanBuildFrom[Nothing, C, CC[C]]) : CC[C] = {
val builder = cbf()
val (i, j) = (xs.iterator, ys.iterator)
while(i.hasNext && j.hasNext) {
builder += f(i.next, j.next)
}
builder.result
}
}
Note this article provides some background on the type constructor pattern.