Understanding covariance in my Scala code - scala

I am working through the correct syntax and structure for the following problem.
I have two datasets with two separate schemas--call them ClientEvent and ServerEvent--stored on disk. The codebase I am working on has defined a class, Reader[T :< Asset] where ClientEvent and ServerEvent are subtypes of Asset. Asset is a trait.
I am writing a function:
def getPathAndReader(config): (String, Reader[Asset]) = {
if (config.readClient) {
return getClientPathAndReader(config)
} else {
return getServerPathAndReader(config)
}
}
This does not compile in my Scala code. From my understanding, T must be a subtype of Asset, which both ServerEvent and ClientEvent are, therefore Reader[ServerEvent] <: Reader[Asset]. But since functions are covariant in their inputs, the function I wrote cannot just return this lower type, I'd have to cast it to a supertype? Does that lose too much information?
load is a function on the trait Asset
trait Reader[T <: Asset] {
def load(raw: DataFrame): Dataset[T]
}
What would be an alternative way to structure this code?
The code's intent is to take the file path returned, and call Reader::load(filePath: String) to get data back. The subtyped readers have some internal logic to clean the data that it retrieves from disk before it's returned as a Dataframe. This means it relies on the type that it passes in. I come from a C++/C# background so my thinking is that if you have a generic Reader[Asset] but call Reader::load(path: String) it will know what to do based on the type it actually is, similar to Base* ptr and calling a derived method.

Your claim that
"From my understanding, T must be a subtype of Asset, which both ServerEvent and ClientEvent are, therefore Reader[ServerEvent] <: Reader[Asset]." is not correct. Generally if A and B are usual types such as A <: B and G[T] is a generic type, then all 3 cases are possible:
Co-variant case G[A] <: G[B] - typical example is some read-only collection like Iterator
Contra-variant case G[A] :> G[B] - typical example is some kind of a consumer like a function T => ()
Invariant case where G[A] and G[B] are not related. The most typical case when some uses of the T are co-variant and some a contravariant. For example, a simple mapping function T => T is invariant. Also most of the mutable collections are invariant as well because the both "produce" and "consume" objects.
Unfortunately for you Dataset[T] is invariant (rather than covariant Dataset[+T] or contravariant Dataset[-T]). This effectively makes your Reader also invariant. As to how to work this around, it is hard to advice without understanding a larger context. For example, why your getClientPathAndReader and getServerPathAndReader do not return Dataset[Asset]? If you really then use specific ServerEvent and ClientEvent, then your design is not type-safe anyway. If you use only Asset, then changing your readers to return Dataset[Asset] seems the easiest solution.

Related

Scala: Casting results of groupBy(_.getClass)

In this hypothetical, I have a list of operations to be executed. Some of the operations in that list will be more efficient if they can be batched together (eg, lookup up different rows from the same table in a database).
trait Result
trait BatchableOp[T <: BatchableOp[T]] {
def resolve(batch: Vector[T]): Vector[Result]
}
Here we use F-bounded Polymorphism to allow the implementation of the operation to refer to its own type, which is highly convenient.
However, this poses a problem when it comes time to execute:
def execute(operations: Vector[BatchableOp[_]]): Vector[Result] = {
def helper[T <: BatchableOp[T]](clazz: Class[T], batch: Vector[T]): Vector[Result] =
batch.head.resolve(batch)
operations
.groupBy(_.getClass)
.toVector
.flatMap { case (clazz, batch) => helper(clazz, batch)}
}
This results in a compiler error stating inferred type arguments [BatchableOp[_]] do not conform to method helper's type parameter bounds [T <: BatchableOp[T]].
How can the Scala compiler be convinced that the group is all of the same type (which is a subclass of BatchableOp)?
One workaround is to specify the type explicitly, but in this case the type is unknown.
Another workaround relies on enumerating the child types, but I'd like to not have to update the execute method after implementing a new BatchableOp type.
I would like to approach the question systematically, so that the same solution strategy can be applied in similar cases.
First, an obvious remark: you want to work with a vector. The content of the vector can be of different types. The length of the vector is not limited. The number of types of entries of the vector is not limited. Therefore, the compiler cannot prove everything at compile time: you will have to use something like asInstanceOf at some point.
Now to the solution of the actual question:
This here compiles under 2.12.4:
import scala.language.existentials
trait Result
type BOX = BatchableOp[X] forSome { type X <: BatchableOp[X] }
trait BatchableOp[C <: BatchableOp[C]] {
def resolve(batch: Vector[C]): Vector[Result]
// not abstract, needed only once!
def collectSameClassInstances(batch: Vector[BOX]): Vector[C] = {
for (b <- batch if this.getClass.isAssignableFrom(b.getClass))
yield b.asInstanceOf[C]
}
// not abstract either, no additional hassle for subclasses!
def collectAndResolve(batch: Vector[BOX]): Vector[Result] =
resolve(collectSameClassInstances(batch))
}
def execute(operations: Vector[BOX]): Vector[Result] = {
operations
.groupBy(_.getClass)
.toVector
.flatMap{ case (_, batch) =>
batch.head.collectAndResolve(batch)
}
}
The main problem that I see here is that in Scala (unlike in some experimental dependently typed languages) there is no simple way to write down complex computations "under the assumption of existence of a type".
Therefore, it seems difficult / impossible to transform
Vector[BatchOp[T] forSome T]
into a
Vector[BatchOp[T]] forSome T
Here, the first type says: "it's a vector of batchOps, their types are unknown, and can be all different", whereas the second type says: "it's a vector of batchOps of unknown type T, but at least we know that they are all the same".
What you want is something like the following hypothetical language construct:
val vec1: Vector[BatchOp[T] forSome T] = ???
val vec2: Vector[BatchOp[T]] forSome T =
assumingExistsSomeType[C <: BatchOp[C]] yield {
/* `C` now available inside this scope `S` */
vec1.map(_.asInstanceOf[C])
}
Unfortunately, we don't have anything like it for existential types, we can't introduce a helper type C in some scope S such that when C is eliminated, we are left with an existential (at least I don't see a general way to do it).
Therefore, the only interesting question that is to be answered here is:
Given a Vector[BatchOp[X] forSome X] for which I know that there is one common type C such that they all are actually Vector[C], where is the scope in which this C is present as a usable type variable?
It turns out that BatchableOp[C] itself has a type variable C in scope. Therefore, I can add a method collectSameClassInstances to BachableOp[C], and this method will actually have some type C available that it can use in the return type. Then I can immediately pass the result of collectSameClassInstances to the resolve method, and then I get a completely benign Vector[Result] type as output.
Final remark: If you decide to write any code with F-bounded polymorphisms and existentials, at least make sure that you have documented very clearly what exactly you are doing there, and how you will ensure that this combination does not escape in any other parts of the codebase. It doesn't feel like a good idea to expose such interfaces to the users. Keep it localized, make sure these abstractions do not leak anywhere.
Andrey's answer has a key insight that the only scope with the appropriate type variable is on the BatchableOp itself. Here's a reduced version that doesn't rely on importing existentials:
trait Result
trait BatchableOp[T <: BatchableOp[T]] {
def resolve(batch: Vector[T]): Vector[Result]
def unsafeResolve(batch: Vector[BatchableOp[_]]): Vector[Result] = {
resolve(batch.asInstanceOf[Vector[T]])
}
}
def execute(operations: Vector[BatchableOp[_]]): Vector[Result] = {
operations
.groupBy(_.getClass)
.toVector
.flatMap{ case (_, batch) =>
batch.head.unsafeResolve(batch)
}
}

Issue with existential in Scala

I have an issue working with existentials in Scala. My problem started when creating a mini workflow engine. I started on the idea that it was a directed graph, implemented the model for the latter first and then modeled the Workflow like this:
case class Workflow private(states: List[StateDef], transitions: List[_, _], override val edges: Map[String, List[StateDef]]) extends Digraph[String, StateDef, Transition[_, _]](states, edges) { ... }
In this case class, the first two fields are a list of states which behave as node, transitions which behave as edges.
The Transition parameter types are for the input and output parameters, as this should behave as an executable piece in the workflow, like a function of some sort:
case class Transition[-P, +R](tailState: StateDef, headState: StateDef, action: Action[P, R], condition: Option[Condition[P]] = None) extends Edge[String, StateDef](tailState, headState) {
def execute(param: P): Try[State[R]] = ...
}
I realized soon enough that dealing with a list of transitions in the Workflow object was giving me troubles with its type parameters. I tried to use parameters with [[Any]] and [[Nothing]], but I couldn't make it work (gist [1]).
If I'd do Java, I'd use a wildcard ? and use its 'less type safe and more dynamic' property and Java would have to believe me. But Scala is stricter and with variance and covariance of the Transition parameter types, it's hard to define wildcards and handle these properly. For example, using forSome notation and having a method in Workflow, I would get this error (gist [2]):
Error:(55, 24) type mismatch;
found : List[A$A27.this.Transition[A$A27.this.CreateImage,A$A27.this.Image]]
required: List[A$A27.this.Transition[P forSome { type P },R forSome { type R }]]
lazy val w = Workflow(transitions)
^
Hence then I created an existential type based on a trait (gist [3]), as explained in this article.
trait Transitions {
type Param
type Return
val transition: Transition[Param, Return]
val evidenceParam: StateValue[Param]
val evidenceReturn: StateValue[Return]
}
So now I could plug this existential in my Workflow class like this:
case class Workflow private(states: List[StateDef], transitions: List[Transitions], override val edges: Map[String, List[StateDef]])
extends Digraph[String, StateDef, Transitions](states, edges)
Working in a small file proved to be working (gist [3]). But when I moved on to the real code, my Digraph parent class does not like this Transitions existential. The former needs an Edge[ID, V] type, which Transition complies with but not the Transitions existential of course.
How in Scala does one resolve this situation? It seems troublesome to work with parameter types to get generics in Scala. Is there an easier solution that I haven't tried? Or a magic trick to specify the correct compatible parameter type between Digraph which need an Edge[ID, V] type and not an existential type that basically erase type traces?
I am sorry as this is convoluted, I will try my best to update the question if necessary.
Here are the Gist references for some of my trials and errors:
https://gist.github.com/jimleroyer/943efd00c764880b8119786d9dd6c3a2
https://gist.github.com/jimleroyer/1ce238b3934882ddc02a09485f52f407
https://gist.github.com/jimleroyer/17227b7e334d020a21deb36086b9b978
EDIT-1
Based on #HTNW answer, I've modified the scope of the existentials using forSome and updated the solution: https://gist.github.com/jimleroyer/2cb4ccbec13620585d21d53b4431ce22
I still have an issue though to properly bind the generics with the matchTransition & getTransition methods and without an explicit cast using asInstanceOf. I'll open another question specific to that one issue.
You scoped your existential quantifiers wrong.
R forSome { type R }
is equal to Any, because every single type is a type, so every single type is a subtype of that existential type, and that is the distinguishing feature of Any. Therefore
Transition[P forSome { type P }, R forSome { type R }]
is really
Transition[Any, Any]
and the Transitions end up needing to take Anys as parameter, and you lose all information about the type of the return. Make it
List[Transition[P, R] forSome { type P; type R }] // if every transition can have different types
List[Transition[P, R]] forSome { type P; type R } // if all the transitions need similar types
// The first can also be sugared to
List[Transition[_, _]]
// _ scopes so the forSome is placed outside the nearest enclosing grouping
Also, I don't get where you got the idea that Java's ? is "less safe". Code using it has a higher chance of being unsafe, sure, because ? is limited, but on its own it is perfectly sound (modulo null).

What does `SomeType[_]` mean in scala?

Going through Play Form source code right now and encountered this
def bindFromRequest()(implicit request: play.api.mvc.Request[_]): Form[T] = {
I am guessing it takes request as an implicit parameter (you don't have to call bindFromRequet(request)) of type play.api.mvc.Request[_] and return a generic T type wrapped in Form class. But what does [_] mean.
The notation Foo[_] is a short hand for an existential type:
Foo[A] forSome {type A}
So it differs from a normal type parameter by being existentially quantified. There has to be some type so your code type checks where as if you would use a type parameter for the method or trait it would have to type check for every type A.
For example this is fine:
val list = List("asd");
def doSomething() {
val l: List[_] = list
}
This would not typecheck:
def doSomething[A]() {
val l: List[A] = list
}
So existential types are useful in situations where you get some parameterized type from somewhere but you do not know and care about the parameter (or only about some bounds of it etc.)
In general, you should avoid existential types though, because they get complicated fast. A lot of instances (especially the uses known from Java (called wildcard types there)) can be avoided be using variance annotations when designing your class hierarchy.
The method doesn't take a play.api.mvc.Request, it takes a play.api.mvc.Request parameterized with another type. You could give the type parameter a name:
def bindFromRequest()(implicit request: play.api.mvc.Request[TypeParameter]): Form[T] = {
But since you're not referring to TypeParameter anywhere else it's conventional to use an underscore instead. I believe the underscore is special cased as a 'black hole' here, rather than being a regular name that you could refer to as _ elsewhere in the type signature.

What's the difference between "Generic type" and "Higher-kinded type"?

I found myself really can't understand the difference between "Generic type" and "higher-kinded type".
Scala code:
trait Box[T]
I defined a trait whose name is Box, which is a type constructor that accepts a parameter type T. (Is this sentence correct?)
Can I also say:
Box is a generic type
Box is a higher-kinded type
None of above is correct
When I discuss the code with my colleagues, I often struggle between the word "generic" and "higher-kinde" to express it.
It's probably too late to answer now, and you probably know the difference by now, but I'm going to answer just to offer an alternate perspective, since I'm not so sure that what Greg says is right. Generics is more general than higher kinded types. Lots of languages, such as Java and C# have generics, but few have higher-kinded types.
To answer your specific question, yes, Box is a type constructor with a type parameter T. You can also say that it is a generic type, although it is not a higher kinded type. Below is a broader answer.
This is the Wikipedia definition of generic programming:
Generic programming is a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by ML in 1973,1 permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication.
Let's say you define Box like this. It holds an element of some type, and has a few special methods. It also defines a map function, something like Iterable and Option, so you can take a box holding an integer and turn it into a box holding a string, without losing all those special methods that Box has.
case class Box(elem: Any) {
..some special methods
def map(f: Any => Any): Box = Box(f(elem))
}
val boxedNum: Box = Box(1)
val extractedNum: Int = boxedString.elem.asInstanceOf[Int]
val boxedString: Box = boxedNum.map(_.toString)
val extractedString: String = boxedString.elem.asInstanceOf[String]
If Box is defined like this, your code would get really ugly because of all the calls to asInstanceOf, but more importantly, it's not typesafe, because everything is an Any.
This is where generics can be useful. Let's say we define Box like this instead:
case class Box[A](elem: A) {
def map[B](f: A => B): Box[B] = Box(f(elem))
}
Then we can use our map function for all kinds of stuff, like changing the object inside the Box while still making sure it's inside a Box. Here, there's no need for asInstanceOf since the compiler knows the type of your Boxes and what they hold (even the type annotations and type arguments are not necessary).
val boxedNum: Box[Int] = Box(1)
val extractedNum: Int = boxedNum.elem
val boxedString: Box[String] = boxedNum.map[String](_.toString)
val extractedString: String = boxedString.elem
Generics basically lets you abstract over different types, letting you use Box[Int] and Box[String] as different types even though you only have to create one Box class.
However, let's say that you don't have control over this Box class, and it's defined merely as
case class Box[A](elem: A) {
//some special methods, but no map function
}
Let's say this API you're using also defines its own Option and List classes (both accepting a single type parameter representing the type of the elements). Now you want to be able to map over all these types, but since you can't modify them yourself, you'll have to define an implicit class to create an extension method for them. Let's add an implicit class Mappable for the extension method and a typeclass Mapper.
trait Mapper[C[_]] {
def map[A, B](context: C[A])(f: A => B): C[B]
}
implicit class Mappable[C[_], A](context: C[A])(implicit mapper: Mapper[C]) {
def map[B](f: A => B): C[B] = mapper.map(context)(f)
}
You could define implicit Mappers like so
implicit object BoxMapper extends Mapper[Box] {
def map[B](box: Box[A])(f: A => B): Box[B] = Box(f(box.elem))
}
implicit object OptionMapper extends Mapper[Option] {
def map[B](opt: Option[A])(f: A => B): Option[B] = ???
}
implicit object ListMapper extends Mapper[List] {
def map[B](list: List[A])(f: A => B): List[B] = ???
}
//and so on
and use it as if Box, Option, List, etc. have always had map methods.
Here, Mappable and Mapper are higher-kinded types, whereas Box, Option, and List are first-order types. All of them are generic types and type constructors. Int and String, however, are proper types. Here are their kinds, (kinds are to types as types are to values).
//To check the kind of a type, you can use :kind in the REPL
Kind of Int and String: *
Kind of Box, Option, and List: * -> *
Kind of Mappable and Mapper: (* -> *) -> *
Type constructors are somewhat analogous to functions (which are sometimes called value constructors). A proper type (kind *) is analogous to a simple value. It's a concrete type that you can use for return types, as the types of your variables, etc. You can just directly say val x: Int without passing Int any type parameters.
A first-order type (kind * -> *) is like a function that looks like Any => Any. Instead of taking a value and giving you a value, it takes a type and gives you another type. You can't use first-order types directly (val x: List won't work) without giving them type parameters (val x: List[Int] works). This is what generics does - it lets you abstract over types and create new types (the JVM just erases that information at runtime, but languages like C++ literally generate new classes and functions). The type parameter C in Mapper is also of this kind. The underscore type parameter (you could also use something else, like x) lets the compiler know that C is of kind * -> *.
A higher-kinded type/higher-order type is like a higher-order function - it takes another type constructor as a parameter. You can't use a Mapper[Int] above, because C is supposed to be of kind * -> * (so that you can do C[A] and C[B]), whereas Int is merely *. It's only in languages like Scala and Haskell with higher-kinded types that you can create types like Mapper above and other things beyond languages with more limited type systems, like Java.
This answer (and others) on a similar question may also help.
Edit: I've stolen this very helpful image from that same answer:
There is no difference between 'Higher-Kinded Types' and 'Generics'.
Box is a 'structure' or 'context' and T can be any type.
So T is generic in the English sense... we don't know what it will be and we don't care because we aren't going to be operating on T directly.
C# also refers to these as Generics. I suspect they chose this language because of its simplicity (to not scare people away).

Scala: Type parameters and inheritance

I'm seeing something I do not understand. I have a hierarchy of (say) Vehicles, a corresponding hierarchy of VehicalReaders, and a VehicleReader object with apply methods:
abstract class VehicleReader[T <: Vehicle] {
...
object VehicleReader {
def apply[T <: Vehicle](vehicleId: Int): VehicleReader[T] = apply(vehicleType(vehicleId))
def apply[T <: Vehicle](vehicleType VehicleType): VehicleReader[T] = vehicleType match {
case VehicleType.Car => new CarReader().asInstanceOf[VehicleReader[T]]
...
Note that when you have more than one apply method, you must specify the return type. I have no issues when there is no need to specify the return type.
The cast (.asInstanceOf[VehicleReader[T]]) is the reason for the question - without it the result is compile errors like:
type mismatch;
found : CarReader
required: VehicleReader[T]
case VehicleType.Car => new CarReader()
^
Related questions:
Why cannot the compiler see a CarReader as a VehicleReader[T]?
What is the proper type parameter and return type to use in this situation?
I suspect the root cause here is that VehicleReader is invariant on its type parameter, but making it covariant does not change the result.
I feel like this should be rather simple (i.e., this is easy to accomplish in Java with wildcards).
The problem has a very simple cause and really doesn't have anything to do with variance. Consider even more simple example:
object Example {
def gimmeAListOf[T]: List[T] = List[Int](10)
}
This snippet captures the main idea of your code. But it is incorrect:
val list = Example.gimmeAListOf[String]
What will be the type of list? We asked gimmeAListOf method specifically for List[String], however, it always returns List[Int](10). Clearly, this is an error.
So, to put it in words, when the method has a signature like method[T]: Example[T] it really declares: "for any type T you give me I will return an instance of Example[T]". Such types are sometimes called 'universally quantified', or simply 'universal'.
However, this is not your case: your function returns specific instances of VehicleReader[T] depending on the value of its parameter, e.g. CarReader (which, I presume, extends VehicleReader[Car]). Suppose I wrote something like:
class House extends Vehicle
val reader = VehicleReader[House](VehicleType.Car)
val house: House = reader.read() // Assuming there is a method VehicleReader[T].read(): T
The compiler will happily compile this, but I will get ClassCastException when this code is executed.
There are two possible fixes for this situation available. First, you can use existential (or existentially quantified) type, which can be though as a more powerful version of Java wildcards:
def apply(vehicleType: VehicleType): VehicleReader[_] = ...
Signature for this function basically reads "you give me a VehicleType and I return to you an instance of VehicleReader for some type". You will have an object of type VehicleReader[_]; you cannot say anything about type of its parameter except that this type exists, that's why such types are called existential.
def apply(vehicleType: VehicleType): VehicleReader[T] forSome {type T} = ...
This is an equivalent definition and it is probably more clear from it why these types have such properties - T type is hidden inside parameter, so you don't know anything about it but that it does exist.
But due to this property of existentials you cannot really obtain any information about real type parameters. You cannot get, say, VehicleReader[Car] out of VehicleReader[_] except via direct cast with asInstanceOf, which is dangerous, unless you store a TypeTag/ClassTag for type parameter in VehicleReader and check it before the cast. This is sometimes (in fact, most of time) unwieldy.
That's where the second option comes to the rescue. There is a clear correspondence between VehicleType and VehicleReader[T] in your code, i.e. when you have specific instance of VehicleType you definitely know concrete T in VehicleReader[T] signature:
VehicleType.Car -> CarReader (<: VehicleReader[Car])
VehicleType.Truck -> TruckReader (<: VehicleReader[Truck])
and so on.
Because of this it makes sense to add type parameter to VehicleType. In this case your method will look like
def apply[T <: Vehicle](vehicleType: VehicleType[T]): VehicleReader[T] = ...
Now input type and output type are directly connected, and the user of this method will be forced to provide a correct instance of VehicleType[T] for that T he wants. This rules out the runtime error I have mentioned earlier.
You will still need asInstanceOf cast though. To avoid casting completely you will have to move VehicleReader instantiation code (e.g. yours new CarReader()) to VehicleType, because the only place where you know real value of VehicleType[T] type parameter is where instances of this type are constructed:
sealed trait VehicleType[T <: Vehicle] {
def newReader: VehicleReader[T]
}
object VehicleType {
case object Car extends VehicleType[Car] {
def newReader = new CarReader
}
// ... and so on
}
Then VehicleReader factory method will then look very clean and be completely typesafe:
object VehicleReader {
def apply[T <: Vehicle](vehicleType: VehicleType[T]) = vehicleType.newReader
}