Spark reduceByKey with generic types (Scala)

I am trying to create some simple custom aggregate operators in Spark using Scala.
I have created a simple hierarchy of operators, with the following super-class:
sealed abstract class Aggregator(val name: String) {
  type Key = Row // org.apache.spark.sql.Row
  type Value
  ...
}
I also have a companion object, which constructs the appropriate aggregator each time. Observe that each operator is allowed to specify the Value type it wants.
Now the problem is when I try to call reduceByKey:
val agg = Aggregator("SUM")
val res = rdd
  .map(agg.mapper)
  .reduceByKey(agg.reducer(_: agg.Value, _: agg.Value))
The error is:
value reduceByKey is not a member of org.apache.spark.rdd.RDD[(agg.Key, agg.Value)]
For my needs, Value can be either a numeric type or a tuple, hence the lack of bounds in its definition. If I replace the Value type declaration with:
type Value = Double
in the Aggregator class, then everything works fine. Therefore, I suppose that the error is related to reduceByKey not knowing the exact Value type at compile time.
Any ideas on how to get around this?

Your RDD cannot be implicitly converted into PairRDDFunctions, because all the implicit ClassTags for keys and values are missing.
You might want to include the class tags as implicit parameters in your Aggregator:
import scala.reflect.ClassTag

sealed abstract class Aggregator[K: ClassTag, V: ClassTag](name: String) {
  implicit val keyClassTag: ClassTag[K] = implicitly
  implicit val valueClassTag: ClassTag[V] = implicitly
}
or maybe:
sealed abstract class Aggregator[K, V](name: String)(implicit kt: ClassTag[K], vt: ClassTag[V]) {
  implicit val keyClassTag: ClassTag[K] = kt
  implicit val valueClassTag: ClassTag[V] = vt
}
or maybe even:
sealed abstract class Aggregator(name: String) {
  type K
  type V
  implicit def keyClassTag: ClassTag[K]
  implicit def valueClassTag: ClassTag[V]
}
The last variant would shift the responsibility for providing the ClassTags to the implementor of the abstract class.
Now, when using an aggregator agg of type Aggregator[K, V] in a reduceByKey, you would have to make sure that those implicitly provided class tags are in the current implicit scope:
val agg = Aggregator("SUM")
import agg._ // now the implicits should be visible
val res = rdd
  .map(agg.mapper)
  .reduceByKey(agg.reducer(_: agg.Value, _: agg.Value))
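For concreteness, here is a minimal sketch of the last variant applied to your setup (Key fixed to Row, Value left abstract). The SumAggregator subclass, the mapper/reducer signatures, and the companion apply are my own assumptions, since the original post elides them:
import scala.reflect.ClassTag
import org.apache.spark.sql.Row

// Serializable because agg is captured by the closures shipped to the executors.
sealed abstract class Aggregator(val name: String) extends Serializable {
  type Key = Row
  type Value
  implicit def valueClassTag: ClassTag[Value]
  def mapper(row: Row): (Key, Value)
  def reducer(a: Value, b: Value): Value
}

// Hypothetical concrete operator: keys by the whole Row and sums the first column.
final class SumAggregator extends Aggregator("SUM") {
  type Value = Double
  implicit val valueClassTag: ClassTag[Double] = ClassTag.Double
  def mapper(row: Row): (Key, Value) = (row, row.getDouble(0))
  def reducer(a: Value, b: Value): Value = a + b
}

object Aggregator {
  def apply(name: String): Aggregator = name match {
    case "SUM"  => new SumAggregator
    case other  => throw new IllegalArgumentException(s"Unknown aggregator: $other")
  }
}

// At the call site the imported valueClassTag (plus the compiler-materialized
// ClassTag[Row]) lets the RDD be converted to PairRDDFunctions:
//   val agg = Aggregator("SUM")
//   import agg._
//   rdd.map(agg.mapper).reduceByKey(agg.reducer(_: agg.Value, _: agg.Value))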

Related

Difference between Scala 2 implicits and Scala 3 given/using

What is the difference between the implicit keyword in Scala 2 and given+using in Scala 3? Is it just that implicit has been split up into two keywords, or are the semantics also different, and if so, how?
For the most part, they are the same. However, implicit is no longer used for multiple different concepts. The docs go into more detail, but here's a summary of them:
Using
When declaring parameters, using is just the same as implicit. However, when explicitly passing an implicit argument, you must use using:
def foo(using bar: Bar) = ???
foo(using Bar()) //Cannot use just foo(Bar()) as you would in Scala 2
You can also have implicit by-name parameters in Scala 3.
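A small sketch of such a by-name context parameter (Codec and dump are made-up names):
trait Codec[T] { def write(x: T): String }

given Codec[Int] = new Codec[Int] { def write(x: Int) = x.toString }

// The => makes the context parameter by-name: it is only evaluated when used,
// which is the usual trick for recursive or expensive given instances.
def dump[T](x: T)(using codec: => Codec[T]): String = codec.write(x)

val s = dump(42) // "42"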
Given
Givens are also pretty similar to implicit vals/objects/methods.
One nice thing about them is that they can be anonymous, and the compiler will generate a name for them, which looks something like given_F_X_Y if the type of the given were F[X, Y]. More details here.
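For example, this anonymous given gets a compiler-generated name along the lines of given_Ordering_Int:
// No name supplied; the compiler synthesizes one from the type.
given Ordering[Int] = Ordering.Int.reverse

val xs = List(3, 1, 2).sorted // List(3, 2, 1), picked up via the anonymous given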
Another change is that the type of a given must be written explicitly - it cannot be inferred like for an implicit in Scala 2.
A given without parameters maps to an implicit object. given foo: Foo with {...} becomes just implicit object foo extends Foo {...}.
A given with parameters is akin to an implicit def that itself takes only implicit parameters.
given listOrd[T](using ord: Ord[T]): Ord[List[T]] with { ... }
//^^ this maps to this vv
class listOrd[T](implicit ord: Ord[T]) extends Ord[List[T]] { ... }
final implicit def listOrd[T](implicit ord: Ord[T]): listOrd[T] = new listOrd[T]
A given that is merely an alias becomes an implicit def if it is just a reference, or an implicit lazy val otherwise.
val foo: Foo
given Foo = foo
would become final implicit def given_Foo = foo (note the compiler-generated name), but
given foo: Foo = new Foo()
would turn into final implicit lazy val foo: Foo = new Foo() because new Foo() shouldn't be computed unnecessarily.
Instead of using an implicit def for an implicit conversion from A to B, you can now define a given Conversion[A, B] instance.
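For instance (Meters is a made-up type; the language import silences the implicit-conversion feature warning):
import scala.language.implicitConversions

case class Meters(value: Double)

// Scala 3 counterpart of `implicit def doubleToMeters(d: Double): Meters = Meters(d)`
given Conversion[Double, Meters] = Meters(_)

val height: Meters = 1.85 // converted through the given Conversion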
You can also still use implicit classes in Dotty, but you can directly define extension methods. While methods inside extensions cannot take their own type parameters, they are easier to use than implicit classes.
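A quick sketch of an extension method replacing a "RichString"-style implicit class (words is a made-up method):
// Scala 3 extension method; no wrapper object is allocated at the call site.
extension (s: String)
  def words: Array[String] = s.split("\\s+")

val ws = "hello scala world".words // Array(hello, scala, world)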
An additional change in Scala 3 - summon is a method like implicitly, but it can return a type more specific than the one being requested.
The semantics also differ in some cases. In Scala 2, Not can be defined with the ambiguity trick:
trait Not[A]
object Not {
  implicit def default[A]: Not[A] = null
  implicit def ambig[A](implicit a: A): Not[A] = null
}
implicitly[Not[Int]] // compiles
implicit val s: String = null
// implicitly[Not[String]] // doesn't compile
But in Scala 3 this doesn't work, because the ambiguity error is not propagated:
trait Not[A]
object Not {
  given [A]: Not[A] = null
  given [A](using a: A): Not[A] = null
  // given ambig[A](using a: A): Not[A] = null
}
summon[Not[Int]] // compiles
given String = null
summon[Not[String]] // compiles
One should use scala.util.NotGiven instead:
import scala.util.NotGiven

summon[NotGiven[Int]] // compiles
given String = null
// summon[NotGiven[String]] // doesn't compile
(Tested in 3.0.0-M3-bin-20201211-dbc1186-NIGHTLY)
http://dotty.epfl.ch/docs/reference/contextual/givens.html#negated-givens
http://dotty.epfl.ch/docs/reference/changed-features/implicit-resolution.html

How can I create an Option type at runtime (reflection)?

Using reflection, I have determined the runtime type of a thing, t: Type. Now I want to create a new Type of Option[t]. How can I do that?
val t: Type = ...
val optT: Type = ??? // Option of whatever t is
Why I want this: I have a handler function that operates on a Type. At compile time I have something like this:
trait Thing { def name: String }
case class BigThing(name: String) extends Thing
case class Stuff[T <: Thing]( id: Int, maybeThing: Option[T] ) // contrived
def handler( t: Type ): Output = {...}
I can reflect that if I have a class of type Stuff, it has a member maybeThing of type Option[T] or even Option[Thing]. At runtime, let's say I can determine that a specific object has T = BigThing, so I want to pass Option[BigThing], not Option[T] or Option[Thing], to handler(). That's why I'm trying to create a runtime type of Option[BigThing].
I did try the following but Scala didn't like it:
val newType = staticClass(s"Option[${runtimeTypeTAsString}]")
According to the reflection tutorial, there are three ways to instantiate a Type:
via method typeOf on scala.reflect.api.TypeTags, which is mixed into Universe (simplest and most common).
Standard Types, such as Int, Boolean, Any, or Unit are accessible through the available universe.
Manual instantiation using factory methods such as typeRef or polyType on scala.reflect.api.Types (not recommended).
Using the third way,
import scala.reflect.runtime.universe._
class MyClass
val t: Type = typeOf[MyClass] //pckg.App.MyClass
val mirror = runtimeMirror(ClassLoader.getSystemClassLoader)
val optT: Type = mirror.universe.internal.typeRef(
  definitions.PredefModule.typeSignature,
  definitions.OptionClass,
  List(t)
) // Option[pckg.App.MyClass]
val optT1: Type = typeOf[Option[MyClass]]
optT =:= optT1 // true
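As an aside (not part of the original answer), the same type can usually be built more directly with appliedType, which applies a type constructor to its argument types:
import scala.reflect.runtime.universe._

class MyClass

val t: Type = typeOf[MyClass]
// Apply the Option type constructor to the runtime-determined argument type.
val optT: Type = appliedType(typeOf[Option[Any]].typeConstructor, t)

optT =:= typeOf[Option[MyClass]] // true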

scala: Parameterize by Type Union

I needed a type union to restrict the allowed types, so as per the answer here, I defined my union as:
sealed trait Value[T]
object Value {
  implicit object NumberWitness extends Value[Int]
  implicit object StringWitness extends Value[String]
}
Now, how do I create a list or class parameterized by this type union? Is it possible to do so? I tried the following syntax in the REPL, but without any luck:
scala> import Value._
import Value._
scala> def list[V: Value] = List("hello", 1)
list: [V](implicit evidence$1: Value[V])List[Any]
scala> list
<console>:18: error: ambiguous implicit values:
both object NumberWitness in object Value of type Value.NumberWitness.type
and object StringWitness in object Value of type Value.StringWitness.type
match expected type Value[V]
list
^
Or is it possible to do so with advanced FP libraries like Scalaz or Cats?
This is called a type class, not a type union. Type classes are intended to allow you to write methods which work either with Int or with String, e.g.
def listOfValues[V: Value](x: V) = List(x)
listOfValues(1) // works
listOfValues("") // works
listOfValues(0.0) // doesn't work
listOfValues(1, "") // doesn't work
not to allow mixing different types.
You can do it using existential types, e.g.
case class WithValue[V: Value](x: V)
object WithValue {
  implicit def withValue[V: Value](x: V): WithValue[V] = WithValue(x)
}
def list = List[WithValue[_]]("hello", 1)
but I would not recommend actually doing that. There is quite likely a better way to solve your problem.
In particular, consider using simply
// not sealed if you need to add other types elsewhere
// can be Value[T] instead
sealed trait Value
case class IntValue(x: Int) extends Value
case class StringValue(x: String) extends Value
// add implicit conversions to IntValue and StringValue if desired
List(StringValue("hello"), IntValue(1))
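For illustration, here is a sketch of the implicit conversions that comment alludes to (the ADT is repeated so the snippet stands alone; the conversion names are mine):
import scala.language.implicitConversions

sealed trait Value
case class IntValue(x: Int) extends Value
case class StringValue(x: String) extends Value

object Value {
  // Conversions in the companion are found without an import,
  // because Value is the expected type at the call site.
  implicit def intToValue(x: Int): Value = IntValue(x)
  implicit def stringToValue(x: String): Value = StringValue(x)
}

val xs: List[Value] = List("hello", 1) // List(StringValue(hello), IntValue(1))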

Scala: generic parser for Enumeration values

I thought it should be possible to write a generic function that works for all Enumeration values. I tried a simple parser first but I failed:
object Weekday extends Enumeration {
  type Weekday = Value
  val MONDAY = Value("MONDAY")
  val OTHER = Value("OTHER")
  implicit def valueToWeekday(v: Value): Weekday = v.asInstanceOf[Weekday]
  implicit def stringToWeekday(s: String): Weekday = Weekday.withName(s)
}
object Enumerations {
  import Weekday._
  println("Welcome to the Scala worksheet")
  def parseEnumeration[T <: Enumeration](s: String)(implicit ev: T): T#Value = {
    ev.withName(s)
  }
  val test = parseEnumeration[Weekday]("MONDAY")
}
So how can I write a generic function taking an enumeration type as a parameter and returning a Value of that type? I'm a bit confused here by the object and the inner type having the same name.
Firstly, your implicit method valueToWeekday doesn't really do anything, as Weekday is simply an alias for Value in this context.
Secondly, your implicit method stringToWeekday is a working, albeit non-generic conversion from a string to its enumeration value.
However, it is not hard to make stringToWeekday generic. You simply need to pass the enumeration to the function, just like you do in parseEnumeration. Since you made the evidence in parseEnumeration implicit, all you need to do is put an appropriate implicit value in the context. Alternatively, you can pass the evidence explicitly.
So you can remove those implicit conversions (and the type alias, since the name-clash is slightly misleading).
object Weekday extends Enumeration {
  val Monday = Value("MONDAY")
  val Other = Value("OTHER")
}
The implicit way:
def parseEnumeration[T <: Enumeration](s: String)(implicit ev: T): T#Value = ev.withName(s)
implicit val evidence = Weekday
val weekday = parseEnumeration("MONDAY") // results in the value Weekday.Monday
The explicit way:
def parseEnumeration[T <: Enumeration](s: String, enumeration: T): T#Value = enumeration.withName(s)
val weekday = parseEnumeration("MONDAY", Weekday) // results in the value Weekday.Monday
A third option would be to use a ClassTag as evidence, which is put in the context by the compiler through the generic parameters. However, this requires reflection to actually call the method withName and I would discourage going this way.

Why is there no Tuple1 Literal for single element tuples in Scala?

Python has (1,) for a single element tuple. In Scala, (1,2) works for Tuple2(1,2), but we must use Tuple1(1) to get a single element tuple. This may seem like a small issue, but designing APIs that expect a Product is a pain for users passing single elements, since they have to write Tuple1(1).
Maybe this is a small issue, but a major selling point of Scala is more typing with less typing. But in this case it seems it's more typing with more typing.
Please tell me:
1) I've missed this and it exists in another form, or
2) It will be added to a future version of the language (and they'll accept patches).
You can define an implicit conversion:
implicit def value2tuple[T](x: T): Tuple1[T] = Tuple1(x)
The implicit conversion will only apply if the argument's static type does not already conform to the method parameter's type. Assuming your method takes a Product argument
def m(v: Product) = // ...
the conversion will apply to a non-product value but will not apply to a Tuple2, for example. Warning: all case classes extend the Product trait, so the conversion will not apply to them either. Instead, the product elements will be the constructor parameters of the case class.
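A short sketch of that behavior (arity is a made-up helper; the conversion from above is repeated so the snippet stands alone):
implicit def value2tuple[T](x: T): Tuple1[T] = Tuple1(x)

def arity(p: Product): Int = p.productArity

case class Person(first: String, last: String)

arity(42)               // 1: 42 is wrapped as Tuple1(42)
arity((1, 2))           // 2: already a Product, no conversion happens
arity(Person("a", "n")) // 2: case classes are Products, so no conversion either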
Product is the least upper bound of the TupleX classes, but you can use a type class if you want to apply the implicit Tuple1 conversion to all non-tuples:
// given a Tupleable[T], you can call apply to convert T to a Product
sealed abstract class Tupleable[T] extends (T => Product)

sealed class ValueTupler[T] extends Tupleable[T] {
  def apply(x: T) = Tuple1(x)
}

sealed class TupleTupler[T <: Product] extends Tupleable[T] {
  def apply(x: T) = x
}

// implicit conversions
trait LowPriorityTuple {
  // this provides a Tupleable[T] for any type T, but is the
  // lowest priority conversion
  implicit def anyIsTupleable[T]: Tupleable[T] = new ValueTupler
}

object Tupleable extends LowPriorityTuple {
  implicit def tuple2isTuple[T1, T2]: Tupleable[Tuple2[T1, T2]] = new TupleTupler
  implicit def tuple3isTuple[T1, T2, T3]: Tupleable[Tuple3[T1, T2, T3]] = new TupleTupler
  // ... etc ...
}
You can use this type class in your API as follows:
def m[T: Tupleable](v: T) = {
  val p = implicitly[Tupleable[T]](v)
  // ... do something with p
}
If you have your method return the product, you can see how the conversions are being applied:
scala> def m[T: Tupleable](v: T) = implicitly[Tupleable[T]](v)
m: [T](v: T)(implicit evidence$1: Tupleable[T])Product
scala> m("asdf") // as Tuple1
res12: Product = (asdf,)
scala> m(Person("a", "n")) // also as Tuple1, *not* as (String, String)
res13: Product = (Person(a,n),)
scala> m((1,2)) // as Tuple2
res14: Product = (1,2)
You could, of course, add an implicit conversion to your API:
implicit def value2tuple[A](x: A) = Tuple1(x)
I do find it odd that Tuple1.toString includes the trailing comma:
scala> Tuple1(1)
res0: (Int,) = (1,)
Python is not statically typed, so tuples there act more like fixed-size collections. That is not true of Scala, where each element of a tuple has a distinct type. Tuples in Scala don't have the same uses as in Python.