Scala: Casting results of groupBy(_.getClass)

In this hypothetical, I have a list of operations to be executed. Some of the operations in that list will be more efficient if they can be batched together (e.g., looking up different rows from the same table in a database).
trait Result
trait BatchableOp[T <: BatchableOp[T]] {
  def resolve(batch: Vector[T]): Vector[Result]
}
Here we use F-bounded polymorphism to allow the implementation of the operation to refer to its own type, which is highly convenient.
However, this poses a problem when it comes time to execute:
def execute(operations: Vector[BatchableOp[_]]): Vector[Result] = {
  def helper[T <: BatchableOp[T]](clazz: Class[T], batch: Vector[T]): Vector[Result] =
    batch.head.resolve(batch)
  operations
    .groupBy(_.getClass)
    .toVector
    .flatMap { case (clazz, batch) => helper(clazz, batch) }
}
This results in a compiler error stating inferred type arguments [BatchableOp[_]] do not conform to method helper's type parameter bounds [T <: BatchableOp[T]].
How can the Scala compiler be convinced that the group is all of the same type (which is a subclass of BatchableOp)?
One workaround is to specify the type explicitly, but in this case the type is unknown.
Another workaround relies on enumerating the child types, but I'd rather not have to update the execute method after implementing a new BatchableOp type.

I would like to approach the question systematically, so that the same solution strategy can be applied in similar cases.
First, an obvious remark: you want to work with a vector whose entries can be of different types. Neither the length of the vector nor the number of distinct entry types is bounded. Therefore, the compiler cannot prove everything at compile time: you will have to use something like asInstanceOf at some point.
Now to the solution of the actual question:
The following compiles under Scala 2.12.4:
import scala.language.existentials
trait Result
type BOX = BatchableOp[X] forSome { type X <: BatchableOp[X] }
trait BatchableOp[C <: BatchableOp[C]] {
  def resolve(batch: Vector[C]): Vector[Result]
  // not abstract, needed only once!
  def collectSameClassInstances(batch: Vector[BOX]): Vector[C] = {
    for (b <- batch if this.getClass.isAssignableFrom(b.getClass))
      yield b.asInstanceOf[C]
  }
  // not abstract either, no additional hassle for subclasses!
  def collectAndResolve(batch: Vector[BOX]): Vector[Result] =
    resolve(collectSameClassInstances(batch))
}
def execute(operations: Vector[BOX]): Vector[Result] = {
  operations
    .groupBy(_.getClass)
    .toVector
    .flatMap { case (_, batch) =>
      batch.head.collectAndResolve(batch)
    }
}
The main problem that I see here is that in Scala (unlike in some experimental dependently typed languages) there is no simple way to write down complex computations "under the assumption of existence of a type".
Therefore, it seems difficult, if not impossible, to transform
Vector[BatchableOp[T] forSome { type T }]
into a
Vector[BatchableOp[T]] forSome { type T }
Here, the first type says: "it's a vector of BatchableOps whose type arguments are unknown and can all be different", whereas the second type says: "it's a vector of BatchableOps of some unknown type T, but at least we know that they all share the same T".
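To make the difference in quantifier placement concrete, here is a small sketch. It assumes Scala 2 (explicit forSome types were dropped in Scala 3), and Op, Lookup and Write are hypothetical stand-ins; Op is deliberately not F-bounded to keep it short. Placing forSome inside the element type lets each element choose its own T, while placing it around the whole Vector forces one shared T.

```scala
import scala.language.existentials

// Scala 2 only: Op, Lookup and Write are hypothetical stand-ins.
object ExistentialScope {
  trait Op[T]
  final case class Lookup() extends Op[Lookup]
  final case class Write() extends Op[Write]

  // forSome inside the element type: each element may pick its own T.
  val mixed: Vector[Op[T] forSome { type T }] = Vector(Lookup(), Write())

  // forSome around the whole vector: a single T shared by all elements.
  val uniform: Vector[Op[T]] forSome { type T } = Vector(Lookup(), Lookup())

  // Does not type-check: no single T makes both Lookup and Write an Op[T].
  // val broken: Vector[Op[T]] forSome { type T } = Vector(Lookup(), Write())
}
```

The second form is exactly the "all the same type" knowledge the question wants, but nothing in groupBy produces it, which is why a cast is still needed somewhere.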
What you want is something like the following hypothetical language construct:
val vec1: Vector[BatchableOp[T] forSome { type T }] = ???
val vec2: Vector[BatchableOp[T]] forSome { type T } =
  assumingExistsSomeType[C <: BatchableOp[C]] yield {
    /* `C` now available inside this scope `S` */
    vec1.map(_.asInstanceOf[C])
  }
Unfortunately, Scala has nothing like this for existential types: we cannot introduce a helper type C in some scope S such that, when C is eliminated, we are left with an existential (at least I don't see a general way to do it).
Therefore, the only interesting question to answer here is:
Given a Vector[BatchableOp[X] forSome { type X }] for which I know that there is one common type C such that all elements are actually of type C, where is the scope in which this C is present as a usable type variable?
It turns out that BatchableOp[C] itself has a type variable C in scope. Therefore, I can add a method collectSameClassInstances to BatchableOp[C], and this method will actually have some type C available that it can use in the return type. Then I can immediately pass the result of collectSameClassInstances to the resolve method, and I get a completely benign Vector[Result] type as output.
Final remark: if you decide to write any code that combines F-bounded polymorphism and existentials, at least document very clearly what exactly you are doing there, and how you will ensure that this combination does not escape into any other parts of the codebase. It doesn't feel like a good idea to expose such interfaces to users. Keep it localized, and make sure these abstractions do not leak anywhere.

Andrey's answer has a key insight that the only scope with the appropriate type variable is on the BatchableOp itself. Here's a reduced version that doesn't rely on importing existentials:
trait Result
trait BatchableOp[T <: BatchableOp[T]] {
  def resolve(batch: Vector[T]): Vector[Result]
  def unsafeResolve(batch: Vector[BatchableOp[_]]): Vector[Result] = {
    resolve(batch.asInstanceOf[Vector[T]])
  }
}
def execute(operations: Vector[BatchableOp[_]]): Vector[Result] = {
  operations
    .groupBy(_.getClass)
    .toVector
    .flatMap { case (_, batch) =>
      batch.head.unsafeResolve(batch)
    }
}
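For a runnable check of the unsafeResolve approach, the sketch below adds two hypothetical concrete operations, UserLookup and OrderLookup, plus a hypothetical Row result that records the size of the batch it was resolved in. It is written for Scala 2.13. The cast is only safe because groupBy(_.getClass) puts objects of exactly one runtime class into each group, and it also assumes every operation class extends BatchableOp with itself as the type argument, which the F-bound encourages but does not enforce.

```scala
// Written for Scala 2.13; Row, UserLookup and OrderLookup are hypothetical.
object BatchDemo {
  trait Result
  // Records which table was hit and how many ops shared the batch.
  final case class Row(table: String, id: Int, batchSize: Int) extends Result

  trait BatchableOp[T <: BatchableOp[T]] {
    def resolve(batch: Vector[T]): Vector[Result]
    // Unchecked cast: only safe if every element has the same runtime class as `this`
    // and that class extends BatchableOp with itself as the type argument.
    def unsafeResolve(batch: Vector[BatchableOp[_]]): Vector[Result] =
      resolve(batch.asInstanceOf[Vector[T]])
  }

  final case class UserLookup(id: Int) extends BatchableOp[UserLookup] {
    def resolve(batch: Vector[UserLookup]): Vector[Result] =
      batch.map(op => Row("users", op.id, batch.size)) // one "query" per batch
  }

  final case class OrderLookup(id: Int) extends BatchableOp[OrderLookup] {
    def resolve(batch: Vector[OrderLookup]): Vector[Result] =
      batch.map(op => Row("orders", op.id, batch.size))
  }

  def execute(operations: Vector[BatchableOp[_]]): Vector[Result] =
    operations
      .groupBy(_.getClass) // one group per concrete operation class
      .toVector
      .flatMap { case (_, batch) => batch.head.unsafeResolve(batch) }
}
```

Because groupBy returns a Map, the order of the batches in the result is not specified; only the order of elements within each batch is preserved.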

Related

Understanding covariance in my Scala code

I am working through the correct syntax and structure for the following problem.
I have two datasets with two separate schemas, call them ClientEvent and ServerEvent, stored on disk. The codebase I am working on has defined a trait Reader[T <: Asset], where ClientEvent and ServerEvent are subtypes of Asset. Asset is a trait.
I am writing a function:
def getPathAndReader(config): (String, Reader[Asset]) = {
  if (config.readClient) {
    return getClientPathAndReader(config)
  } else {
    return getServerPathAndReader(config)
  }
}
This does not compile in my Scala code. From my understanding, T must be a subtype of Asset, which both ServerEvent and ClientEvent are, therefore Reader[ServerEvent] <: Reader[Asset]. But since functions are covariant in their inputs, the function I wrote cannot just return this lower type, I'd have to cast it to a supertype? Does that lose too much information?
load is a method on the trait Reader:
trait Reader[T <: Asset] {
  def load(raw: DataFrame): Dataset[T]
}
What would be an alternative way to structure this code?
The code's intent is to take the file path returned, and call Reader::load(filePath: String) to get data back. The subtyped readers have some internal logic to clean the data that it retrieves from disk before it's returned as a Dataframe. This means it relies on the type that it passes in. I come from a C++/C# background so my thinking is that if you have a generic Reader[Asset] but call Reader::load(path: String) it will know what to do based on the type it actually is, similar to Base* ptr and calling a derived method.
Your claim that "From my understanding, T must be a subtype of Asset, which both ServerEvent and ClientEvent are, therefore Reader[ServerEvent] <: Reader[Asset]" is not correct. In general, if A and B are ordinary types such that A <: B, and G[T] is a generic type, then all three cases are possible:
Covariant case, G[A] <: G[B]: a typical example is a read-only collection such as Iterator.
Contravariant case, G[A] :> G[B]: a typical example is some kind of consumer, such as a function T => Unit.
Invariant case, where G[A] and G[B] are not related: the most typical situation is when some uses of T are covariant and others are contravariant. For example, a simple mapping function T => T is invariant in T. Most mutable collections are invariant as well, because they both "produce" and "consume" objects.
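The three cases can be sketched with hypothetical Producer, Consumer and Cell classes (they are not part of the question's codebase). The invariant assignment is left as a comment because it does not compile.

```scala
object VarianceCases {
  class Animal
  class Cat extends Animal

  // Covariant: T only appears in output positions.
  final class Producer[+T](val get: () => T)
  // Contravariant: T only appears in input positions.
  final class Consumer[-T](val accept: T => Boolean)
  // Invariant: T is both read (getter) and written (setter).
  final class Cell[T](var value: T)

  // A producer of Cats can be used where a producer of Animals is expected.
  val p: Producer[Animal] = new Producer[Cat](() => new Cat)
  // A consumer of Animals can be used where a consumer of Cats is expected.
  val c: Consumer[Cat] = new Consumer[Animal](a => a != null)
  // val bad: Cell[Animal] = new Cell[Cat](new Cat) // does not compile: Cell is invariant
}
```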
Unfortunately for you, Dataset[T] is invariant (rather than covariant Dataset[+T] or contravariant Dataset[-T]). This effectively makes your Reader invariant too. As to how to work around this, it is hard to advise without understanding the larger context. For example, why don't your getClientPathAndReader and getServerPathAndReader return Dataset[Asset]? If you then really use the specific ServerEvent and ClientEvent types, your design is not type-safe anyway. If you only use Asset, changing your readers to return Dataset[Asset] seems the easiest solution.
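Below is a minimal sketch of the situation with a hypothetical invariant Data[T] standing in for Spark's Dataset[T], so that it runs without Spark; ClientReader, ServerReader and getReader are likewise illustrative names. Besides the answer's suggestion of producing Dataset[Asset], another option (not mentioned in the answer) is to return the existential Reader[_ <: Asset], meaning "a Reader of some subtype of Asset". Calling load on the result still dispatches to the concrete reader at runtime, much like a virtual call through a Base* in C++.

```scala
object ReaderFix {
  trait Asset
  final case class ClientEvent(id: Int) extends Asset
  final case class ServerEvent(id: Int) extends Asset

  // Hypothetical invariant container standing in for Spark's Dataset[T].
  final case class Data[T](rows: Vector[T])

  trait Reader[T <: Asset] {
    def load(raw: Vector[Int]): Data[T]
  }

  object ClientReader extends Reader[ClientEvent] {
    def load(raw: Vector[Int]): Data[ClientEvent] = Data(raw.map(ClientEvent(_)))
  }

  object ServerReader extends Reader[ServerEvent] {
    def load(raw: Vector[Int]): Data[ServerEvent] = Data(raw.map(ServerEvent(_)))
  }

  // def getReader(readClient: Boolean): Reader[Asset] = ... // does not compile: Reader is invariant
  // Existential return type: the element type is hidden but known to be <: Asset.
  def getReader(readClient: Boolean): Reader[_ <: Asset] =
    if (readClient) ClientReader else ServerReader
}
```

The trade-off is that callers only know the loaded rows are some subtype of Asset; if they need ClientEvent specifically, they have to branch on readClient themselves.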

Issue with existential in Scala

I have an issue working with existentials in Scala. My problem started when creating a mini workflow engine. I started on the idea that it was a directed graph, implemented the model for the latter first and then modeled the Workflow like this:
case class Workflow private(states: List[StateDef], transitions: List[Transition[_, _]], override val edges: Map[String, List[StateDef]])
  extends Digraph[String, StateDef, Transition[_, _]](states, edges) { ... }
In this case class, the first two fields are a list of states, which behave as nodes, and a list of transitions, which behave as edges.
The Transition parameter types are for the input and output parameters, as this should behave as an executable piece in the workflow, like a function of some sort:
case class Transition[-P, +R](tailState: StateDef, headState: StateDef, action: Action[P, R], condition: Option[Condition[P]] = None) extends Edge[String, StateDef](tailState, headState) {
  def execute(param: P): Try[State[R]] = ...
}
I soon realized that dealing with a list of transitions in the Workflow object was causing trouble with its type parameters. I tried to use Any and Nothing as type arguments, but I couldn't make it work (gist [1]).
If I were using Java, I'd use a wildcard ? and rely on its 'less type safe and more dynamic' property, and Java would have to believe me. But Scala is stricter, and with the variance of the Transition type parameters (contravariant P, covariant R), it's hard to define wildcards and handle them properly. For example, using forSome notation and having a method in Workflow, I would get this error (gist [2]):
Error:(55, 24) type mismatch;
found : List[A$A27.this.Transition[A$A27.this.CreateImage,A$A27.this.Image]]
required: List[A$A27.this.Transition[P forSome { type P },R forSome { type R }]]
lazy val w = Workflow(transitions)
^
So I then created an existential type based on a trait (gist [3]), as explained in this article.
trait Transitions {
  type Param
  type Return
  val transition: Transition[Param, Return]
  val evidenceParam: StateValue[Param]
  val evidenceReturn: StateValue[Return]
}
So now I could plug this existential in my Workflow class like this:
case class Workflow private(states: List[StateDef], transitions: List[Transitions], override val edges: Map[String, List[StateDef]])
extends Digraph[String, StateDef, Transitions](states, edges)
This worked in a small file (gist [3]). But when I moved on to the real code, my Digraph parent class did not accept this Transitions existential. Digraph needs an Edge[ID, V] type, which Transition complies with but, of course, the Transitions existential does not.
How does one resolve this situation in Scala? Working with type parameters to get generics seems troublesome in Scala. Is there an easier solution that I haven't tried? Or a magic trick to specify a parameter type that is compatible with Digraph, which needs an Edge[ID, V] type, rather than an existential type that basically erases type information?
Sorry that this is convoluted; I will do my best to update the question if necessary.
Here are the Gist references for some of my trials and errors:
https://gist.github.com/jimleroyer/943efd00c764880b8119786d9dd6c3a2
https://gist.github.com/jimleroyer/1ce238b3934882ddc02a09485f52f407
https://gist.github.com/jimleroyer/17227b7e334d020a21deb36086b9b978
EDIT-1
Based on HTNW's answer, I've modified the scope of the existentials using forSome and updated the solution: https://gist.github.com/jimleroyer/2cb4ccbec13620585d21d53b4431ce22
I still have an issue though to properly bind the generics with the matchTransition & getTransition methods and without an explicit cast using asInstanceOf. I'll open another question specific to that one issue.
You scoped your existential quantifiers wrong.
R forSome { type R }
is equal to Any, because every single type is a type, so every single type is a subtype of that existential type, and that is the distinguishing feature of Any. Therefore
Transition[P forSome { type P }, R forSome { type R }]
is really
Transition[Any, Any]
and the transitions end up having to accept Any as their parameter, and you lose all information about the return type. Make it
List[Transition[P, R] forSome { type P; type R }] // if every transition can have different types
List[Transition[P, R]] forSome { type P; type R } // if all the transitions need similar types
// The first can also be sugared to
List[Transition[_, _]]
// `_` places the forSome just outside the nearest enclosing type application (here, Transition[_, _])
Also, I don't see where you got the idea that Java's ? is "less safe". Code using it is more likely to be unsafe, sure, because ? is limited, but on its own it is perfectly sound (modulo null).
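The effect of quantifier placement can be checked with a reduced, hypothetical Trans[-P, +R] that keeps the question's variances but drops StateDef, Action and Condition. This assumes Scala 2, since forSome does not exist in Scala 3. Note that with a contravariant P, the common supertype of all transitions is Trans[Nothing, Any], not Trans[Any, Any].

```scala
import scala.language.existentials

object TransitionScope {
  // Hypothetical reduction of the question's Transition: same variances, fewer fields.
  final class Trans[-P, +R](val name: String, val run: P => R)

  val toText  = new Trans[Int, String]("toText", i => i.toString)
  val measure = new Trans[String, Int]("measure", s => s.length)

  // Quantifier inside the element type: every element keeps its own P and R.
  val perElement: List[Trans[P, R] forSome { type P; type R }] = List(toText, measure)

  // The same type written with wildcards.
  val sugared: List[Trans[_, _]] = perElement

  // Mis-scoped: Trans[P forSome { type P }, R forSome { type R }] is Trans[Any, Any],
  // and Trans[Int, String] is not a subtype of it because P is contravariant:
  // val wrong: List[Trans[P forSome { type P }, R forSome { type R }]] = List(toText)

  // With these variances, Trans[Nothing, Any] is a supertype of every Trans.
  val widened: List[Trans[Nothing, Any]] = List(toText, measure)

  // The concrete values still compose with full static types.
  val composed: Int => Int = toText.run andThen measure.run
}
```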

scala type 'extraction'

This might not be the most correct terminology, but by a boxed type I mean Box[T] for some type T. So Option[Int] is a boxed Int.
How might one go about extracting these types? My naive attempt:
//extractor
type X[Box[E]] = E //doesn't compile. E not found
//boxed
type boxed = Option[Int]
//unboxed
type parameter = X[boxed] //this is the syntax I would like to achieve
implicitly[parameter =:= Int] //this should compile
Is there any way to do this? Apart from the Apocalisp blog, I am having a hard time finding instructions on type-level metaprogramming in Scala.
I can only imagine two situations. Either you use type parameters, in which case, if you use such a higher-kinded type (e.g. as an argument to a method), its type parameter gets duplicated in the method's generics:
trait Box[E]
def doSomething[X](b: Box[X]): Unit = { ... } // parameter re-stated as `X`
or you use type members, in which case you can refer to them per instance:
trait Box { type E }
def doSomething(b: Box): Unit = { type X = b.E }
...or more generally
def doSomething(x: Box#E): Unit = { ... }
So I think you need to rewrite your question in terms of what you actually want to achieve.
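Both situations can be sketched as follows, assuming Scala 2 (the version this thread targets). Box, IntBox, MBox and unbox are hypothetical names. With a type parameter, the element type is only available by re-stating it as a method type parameter; with a type member, a refinement such as MBox { type E = Int } lets you project the member out at the type level, which is close to the X[boxed] syntax the question asks for.

```scala
object BoxExtraction {
  // Type parameter: the element type must be re-stated as the method's own X.
  trait Box[E] { def value: E }
  final case class IntBox(value: Int) extends Box[Int]
  def unbox[X](b: Box[X]): X = b.value // X is inferred from the argument

  // Type member: the element type can be projected out of a refinement.
  trait MBox { type E }
  type Boxed = MBox { type E = Int }
  type Parameter = Boxed#E

  // Compiles only if Parameter and Int are the same type.
  implicitly[Parameter =:= Int]
}
```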

Map from Class[T] to T without casting

I want to map from class tokens to instances along the lines of the following code:
trait Instances {
  def put[T](key: Class[T], value: T): Unit
  def get[T](key: Class[T]): T
}
Can this be done without resorting to casts in the get method?
Update:
How could this be done for the more general case with some Foo[T] instead of Class[T]?
You can try retrieving the object from your map as an Any, then using your Class[T] to “cast reflectively”:
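trait Instances {
  private val map = collection.mutable.Map[Class[_], Any]()
  def put[T](key: Class[T], value: T): Unit = { map += (key -> value) }
  // Note: for primitive keys such as classOf[Int], Class.cast throws, because the stored value is boxed
  def get[T](key: Class[T]): T = key.cast(map(key))
}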
With the help of a friend, we defined the map with Manifest keys instead of Class keys, which gives a nicer API at the call site.
I didn't understand your updated question about the "general case with some Foo[T] instead of Class[T]", but this should work for the cases you specified.
object Instances {
  private val map = collection.mutable.Map[Manifest[_], Any]()
  def put[T: Manifest](value: T) = map += manifest[T] -> value
  def get[T: Manifest]: T = map(manifest[T]).asInstanceOf[T]
  def main(args: Array[String]): Unit = {
    put(1)
    put("2")
    println(get[Int])
    println(get[String])
  }
}
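A variant of the same idea can use scala.reflect.ClassTag instead of Manifest. The sketch below is not the answer's code: TypedInstances is a hypothetical name, get returns an Option, and the runtime type test is delegated to ClassTag.unapply instead of calling asInstanceOf directly. Keys are erased runtime classes, so, for example, List[Int] and List[String] would share one slot.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical ClassTag-based variant of the Manifest answer.
final class TypedInstances {
  // Keys are erased runtime classes, e.g. int or java.lang.String.
  private val map = mutable.Map.empty[Class[_], Any]

  def put[T](value: T)(implicit tag: ClassTag[T]): Unit =
    map(tag.runtimeClass) = value

  // ClassTag.unapply performs the runtime type test, including for primitives such as Int.
  def get[T](implicit tag: ClassTag[T]): Option[T] =
    map.get(tag.runtimeClass).flatMap(tag.unapply)
}
```

Usage mirrors the Manifest version: after put(1) and put("2"), get[Int] yields Some(1), and asking for a type that was never stored yields None instead of throwing.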
If you want to do this without any casting (even within get), then you will need to write a heterogeneous map. For reasons that should be obvious, this is tricky. :-) The easiest way would probably be to use an HList-like structure and build a find function. However, that's not trivial, since you need to define some way of checking type equality for two arbitrary types.
I attempted to get a little tricky with tuples and existential types. However, Scala doesn't provide a unification mechanism (pattern matching doesn't work). Also, subtyping ties the whole thing in knots and basically eliminates any sort of safety it might have provided:
val xs: List[(Class[A], A) forSome { type A }] = List(
  classOf[String] -> "foo", classOf[Int] -> 42)
val search = classOf[String]
val finalResult = xs.collect { case (`search`, result) => result }.headOption
In this example, finalResult will be of type Any. This is actually rightly so, since subtyping means that we don't really know anything about A. It's not why the compiler is choosing that type, but it is a correct choice. Take for example:
val xs: List[(Class[A], A) forSome { type A }] = List(classOf[Boolean] -> 'bippy)
This is totally legal! Subtyping means that A in this case will be chosen as Any. It's hardly what we want, but it is what you will get. Thus, in order to express this constraint without tracking all of the types individually (using an HMap), Scala would need to be able to express the constraint that a type is a specific type and nothing else. Unfortunately, Scala does not have this ability, and so we're basically stuck on the generic constraint front.
Update: actually, it's not legal. I just tried it and the compiler rejected it. I think it is rejected only because Class is invariant in its type parameter. So, if Foo is a definite type that is invariant, you should be safe from this case. It still doesn't solve the unification problem, but at least it's sound. Unfortunately, type constructors are assumed to be in a magical superposition between co-, contra- and invariance, so if it's truly an arbitrary type Foo of kind * => *, then you're still sunk on the existential front.
In summary: it should be possible, but only if you fully encode Instances as a HMap. Personally, I would just cast inside get. Much simpler!

Type parameters versus member types in Scala

I'd like to know how do the member types work in Scala, and how should I associate types.
One approach is to make the associated type a type parameter. The advantages of this approach are that I can prescribe the variance of the type, and I can be sure that a subtype doesn't change the type. The disadvantage is that I cannot infer the type parameter from the type in a function.
The second approach is to make the associated type a member of the second type, which has the problem that I can't prescribe bounds on the subtypes' associated types and therefore, I can't use the type in function parameters (when x : X, X#T might not be in any relation with x.T)
A concrete example would be:
I have a trait for DFAs (could be without the type parameter)
trait DFA[S] { /* S is the type of the symbols in the alphabet */
  trait State { def next(x : S); }
  /* final type Sigma = S */
}
and I want to create a function for running this DFA over an input sequence, and I want
the function must take anything <% Seq[alphabet-type-of-the-dfa] as input sequence type
the function caller needn't specify the type parameters, all must be inferred
I'd like the function to be called with the concrete DFA type (but if there is a solution where the function would not have a type parameter for the DFA, it's OK)
the alphabet types must be unconstrained (i.e. there must be a DFA for Char as well as for a yet-unknown user-defined class)
the DFAs with different alphabet types are not subtypes
I tried this:
def runDFA[S, D <: DFA[S], SQ <% Seq[S]](d : D)(seq : SQ) = ....
this works, except the type S is not inferred here, so I have to write the whole type parameter list on each call site.
def runDFA[D <: DFA[S] forSome { type S }, SQ <% Seq[D#Sigma]]( ... same as above
this didn't work (invalid circular reference to type D??? (what is it?))
I also deleted the type parameter, created an abstract type Sigma and tried binding that type in the concrete classes. runDFA would look like
def runDFA[D <: DFA, SQ <% Seq[D#Sigma]]( ... same as above
but this inevitably runs into problems like "type mismatch: expected dfa.Sigma, got D#Sigma"
Any ideas? Pointers?
Edit:
As the answers indicate that there is no simple way of doing this, could somebody elaborate on why it is impossible and what would have to change for it to work?
The reason I want runDFA to be a free function (not a method) is that I want other similar functions, like automaton minimization, regular language operations, NFA-to-DFA conversion, language factorization, etc., and having all of this inside one class goes against almost every principle of OO design.
First off, you don't need the type parameter SQ <% Seq[S]. Write the method parameter as Seq[S]. If SQ <% Seq[S], then any instance of SQ is implicitly convertible to Seq[S] (that's what <% means), so when it is passed as a Seq[S], the compiler will automatically insert the conversion.
Additionally, what Jorge said about the type parameter on D and about making runDFA a method on DFA holds. Because of the way inner classes work in Scala, I would strongly advise putting runDFA on DFA. Until path-dependent typing works more smoothly, dealing with inner classes of some external class can be a bit of a pain.
So now you have
trait DFA[S] {
  ...
  def runDFA(seq: Seq[S]) = ...
}
And runDFA is all of a sudden rather easy to infer type parameters for: It doesn't have any.
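Putting the two suggestions together, here is a small runnable sketch. The DFA representation (integer states with start, step and accepting) and the EvenAs automaton are hypothetical, chosen only to make runDFA executable. Because S is fixed by the enclosing DFA[S], runDFA needs no type parameters, and a String is accepted as a Seq[Char] through the standard implicit conversion, which is why the SQ <% Seq[S] view bound is unnecessary.

```scala
// Hypothetical minimal DFA encoding: Int states, S is the alphabet type.
object DfaAsMethod {
  trait DFA[S] {
    def start: Int
    def step(state: Int, symbol: S): Int
    def accepting(state: Int): Boolean

    // S is fixed by the enclosing DFA[S], so callers pass no type arguments.
    def runDFA(seq: Seq[S]): Boolean =
      accepting(seq.foldLeft(start)(step))
  }

  // Accepts inputs containing an even number of 'a' characters.
  object EvenAs extends DFA[Char] {
    val start = 0
    def step(state: Int, c: Char): Int = if (c == 'a') 1 - state else state
    def accepting(state: Int): Boolean = state == 0
  }
}
```

A call such as DfaAsMethod.EvenAs.runDFA("abca") compiles without any type arguments; the String is implicitly wrapped as a Seq[Char].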
Scala's type inference sometimes leaves much to be desired.
Is there any reason why you can't have the method inside your DFA trait?
def run[SQ <% Seq[S]](seq: SQ)
If you don't need the D param later, you can also try defining your method without it:
def runDFA[S, SQ <% Seq[S]](d: DFA[S])(seq: SQ) = ...
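If runDFA must stay a free function, as the question's edit requests, the same idea works as long as S is fixed by an earlier parameter list: Scala 2 infers type parameters one parameter list at a time, so S is solved from d before seq is checked. The sketch below reuses the hypothetical integer-state DFA and takes a plain Seq[S] instead of the view bound (view bounds are deprecated in Scala 2.13).

```scala
// Hypothetical integer-state DFA, kept as plain data with no method that runs it.
object DfaFreeFunction {
  trait DFA[S] {
    def start: Int
    def step(state: Int, symbol: S): Int
    def accepting(state: Int): Boolean
  }

  // S is inferred from `d` (first parameter list) before `seq` is type-checked.
  def runDFA[S](d: DFA[S])(seq: Seq[S]): Boolean =
    d.accepting(seq.foldLeft(d.start)(d.step))

  // Accepts inputs containing an even number of 'a' characters.
  object EvenAs extends DFA[Char] {
    val start = 0
    def step(state: Int, c: Char): Int = if (c == 'a') 1 - state else state
    def accepting(state: Int): Boolean = state == 0
  }
}
```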
Some useful info on how the two differ:
From the shapeless guide:
Without type members you cannot make dependent types, for example:
trait Generic[A] {
  type Repr
  def to(value: A): Repr
  def from(value: Repr): A
}
import shapeless.Generic
def getRepr[A](value: A)(implicit gen: Generic[A]) =
gen.to(value)
Here the type returned by to depends on the input type A (because the supplied implicit depends on A):
case class Vec(x: Int, y: Int)
case class Rect(origin: Vec, size: Vec)
getRepr(Vec(1, 2))
// res1: shapeless.::[Int,shapeless.::[Int,shapeless.HNil]] = 1 :: 2 :: HNil
getRepr(Rect(Vec(0, 0), Vec(5, 5)))
// res2: shapeless.::[Vec,shapeless.::[Vec,shapeless.HNil]] = Vec(0,0) :: Vec(5,5) :: HNil
Without type members this would be impossible:
trait Generic2[A, Repr]
def getRepr2[A, R](value: A)(implicit generic: Generic2[A, R]): R = ???
We would have had to pass the desired value of Repr to getRepr as a type parameter, effectively making getRepr useless. The intuitive take-away from this is that type parameters are useful as “inputs” and type members are useful as “outputs”.
Please see the shapeless guide for details.
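The "type members as outputs" idea does not depend on shapeless. In the sketch below, Unpack, Wrapped and unpack are hypothetical names: the caller supplies only the input type A, and the result type u.Out is computed by whichever implicit instance is found, using the same Aux pattern that shapeless uses for Generic.

```scala
object MembersAsOutputs {
  final case class Wrapped[X](value: X)

  // A is the "input" (type parameter), Out is the "output" (type member).
  trait Unpack[A] {
    type Out
    def apply(a: A): Out
  }

  object Unpack {
    // Aux exposes the output member as a type parameter when it must be named.
    type Aux[A, O] = Unpack[A] { type Out = O }

    implicit def wrapped[X]: Aux[Wrapped[X], X] = new Unpack[Wrapped[X]] {
      type Out = X
      def apply(a: Wrapped[X]): X = a.value
    }

    implicit def pair[X, Y]: Aux[(X, Y), Y] = new Unpack[(X, Y)] {
      type Out = Y
      def apply(a: (X, Y)): Y = a._2
    }
  }

  // The result type depends on the implicit found for A.
  def unpack[A](a: A)(implicit u: Unpack[A]): u.Out = u(a)
}
```

Here unpack(Wrapped(3)) is statically typed as Int and unpack((1, "one")) as String, without the caller ever naming the output type.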