About scala variance position: What is "the enclosing parameter clause"?

Why is the type position of a method marked as negative?
In the question linked above, n.m. answered:
The variance position of a method parameter is the opposite of the variance position of the enclosing parameter clause.
The variance position of a type parameter is the opposite of the
variance position of the enclosing type parameter clause.
I don't know what the enclosing parameter clause or the enclosing type parameter clause is.
Can you give an example to explain it?

I don't know what the enclosing parameter clause or the enclosing type
parameter clause is.
The specification defines one important axiom before stating those lines about variance:
Let the opposite of covariance be contravariance, and the opposite of
invariance be itself. The top-level of the type or template is always
in covariant position. The variance position changes at the following
constructs.
So we begin with the fact that the initial allowed variance for a type parameter is covariant, and then we flip the variance back and forth depending on specific contexts. Here are some of those contexts (there are more):
1. Method parameters (from covariant to contravariant)
2. Type parameter clauses of methods
3. Lower bounds of type parameters
4. Type parameters of parameterized classes, if the corresponding formal parameter is contravariant.
Now let's look at these statements again:
The variance position of a method parameter is the opposite of the
variance position of the enclosing parameter clause.
This basically means that if we have a generic method parameter, we flip the variance for it:
def m(param: T)
The enclosing parameter clause is everything defined after the method name m and inside the parentheses, which in our case includes param: T. T is in a contravariant position because we had to flip it (remember, all top-level positions begin as covariant), due to the rules above (rule 1).
The variance position of a type parameter is the opposite of the
variance position of the enclosing type parameter clause.
Let's define a method with a type parameter:
def m[T >: U]()
The enclosing type parameter clause is referring to the square brackets [T >: U]. Again, the variance flips because of the rules, thus U is now in a contravariant position (rule 2).
You can think about it as a game. You have a starting state (covariant, or positive), and then a set of rules which makes positions switch (covariant -> contravariant, contravariant -> covariant, invariant -> invariant). At the end of the game, you have a selected state (position) which is applied to the type parameter.
This blog post explains things in a way which one can reason about.
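As a small runnable sketch of this variance "game" (the Box/Sink names here are made up for illustration): a type parameter used only in result position can be declared covariant, while one used only in parameter position ends up contravariant, exactly because of the flips above:

```scala
trait Animal { def name: String }
class Dog extends Animal { def name = "dog" }

// T appears only in result (covariant) position, so +T is accepted:
trait Box[+T] { def get: T }
// T appears only in parameter position; the flip makes it contravariant, so -T:
trait Sink[-T] { def put(t: T): Unit }

val dogBox: Box[Dog] = new Box[Dog] { def get = new Dog }
val animalBox: Box[Animal] = dogBox   // covariance: Box[Dog] <: Box[Animal]

val animalSink: Sink[Animal] = new Sink[Animal] { def put(t: Animal) = () }
val dogSink: Sink[Dog] = animalSink   // contravariance: Sink[Animal] <: Sink[Dog]
```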

I think I know what he meant and will try to illustrate.
Given the following preconditions:
trait A
trait B extends A
trait C extends B
val bs: List[B] = List(new B{}, new B{})
val b2b: B => B = identity
A List[+T] is covariant in its type argument (the enclosing type parameter clause), but the type of the variable it's assigned to (the enclosing type clause) behaves contravariantly:
val as: List[A] = bs // this is valid
val cs: List[C] = bs // ...and this is not
Another example involves functions. A Function1[-T, +R] is contravariant in its argument and covariant in its return type, but when assigning to variables the situation is reversed:
val c2a: C => A = b2b // this compiles
val a2c: A => C = b2b // ...and this does not
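The valid assignments above can be collected into one self-contained snippet (the rejected lines are kept as comments):

```scala
trait A
trait B extends A
trait C extends B

val bs: List[B] = List(new B {}, new B {})
val b2b: B => B = identity

val as: List[A] = bs     // List is covariant: List[B] <: List[A]
// val cs: List[C] = bs  // does not compile
val c2a: C => A = b2b    // Function1[-T, +R]: narrow the argument, widen the result
// val a2c: A => C = b2b // does not compile
```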

Related

Skolemization of existentially typed expressions

In Scala, the following expression raises a type error:
val pair: (A => String, A) forSome { type A } = ( { a: Int => a.toString }, 19 )
pair._1(pair._2)
As mentioned in SI-9899 and this answer, this is correct according to the spec:
I think this is working as designed as per SLS 6.1: "The following
skolemization rule is applied universally for every expression: If the
type of an expression would be an existential type T, then the type of
the expression is assumed instead to be a skolemization of T."
However, I do not fully understand this. At which point is this rule applied? Does it apply in the first line (i.e., the type of pair is a different one than given by the type annotation), or in the second line (but applying the rule to the second line as a whole would not lead to a type error)?
Let's assume SLS 6.1 applies to the first line. It is supposed to skolemize existential types. We can make the type in the first line non-existential by putting the existential inside a type parameter:
case class Wrap[T](x:T)
val wrap = Wrap(( { a: Int => a.toString }, 19 ) : (A => String, A) forSome { type A })
wrap.x._1(wrap.x._2)
It works! (No type error.) So that means, the existential type got "lost" when we defined pair? No:
val wrap2 = Wrap(pair)
wrap2.x._1(wrap2.x._2)
This typechecks! If it had been the "fault" of the assignment to pair, this should not work. Thus, the reason why it does not work lies in the second line. If that's the case, what's the difference between the wrap and the pair examples?
To wrap up, here is one more pair of examples:
val Wrap((a2,b2)) = wrap
a2(b2)
val (a3,b3) = pair
a3(b3)
Both don't work, but by analogy to the fact that wrap.x._1(wrap.x._2) did typecheck, I would have thought that a2(b2) might typecheck, too.
I believe I figured out most of the process how the expressions above are typed.
First, what does this mean:
The following skolemization rule is applied universally for every
expression: If the type of an expression would be an existential type
T, then the type of the expression is assumed instead to be a
skolemization of T. [SLS 6.1]
It means that whenever an expression or subexpression is determined to have type T[A] forSome {type A}, then a fresh type name A1 is chosen, and the expression is given type T[A1]. This makes sense since T[A] forSome {type A} intuitively means that there is some type A such that the expression has type T[A]. (What name is chosen depends on the compiler implementation. I use A1 to distinguish it from the bound type variable A)
We look at the first line of code:
val pair: (A => String, A) forSome { type A } = ({ a: Int => a.toString }, 19)
Here the skolemization rule is actually not yet used.
({ a: Int => a.toString }, 19) has type (Int=>String, Int). This is a subtype of (A => String, A) forSome { type A } since there exists an A (namely Int) such that the rhs is of type (A=>String,A).
The value pair now has type (A => String, A) forSome { type A }.
The next line is
pair._1(pair._2)
Now the typer assigns types to subexpressions from the inside out. First, the first occurrence of pair is given a type. Recall that pair had type (A => String, A) forSome { type A }. Since the skolemization rule applies to every subexpression, we apply it to the first pair. We pick a fresh type name A1 and type pair as (A1 => String, A1). Then we assign a type to the second occurrence of pair. Again the skolemization rule applies; we pick another fresh type name A2, and the second occurrence of pair is typed as (A2=>String,A2).
Then pair._1 has type A1=>String and pair._2 has type A2, thus pair._1(pair._2) is not well-typed.
Note that it is not the skolemization rule's "fault" that typing fails. If we did not have the skolemization rule, pair._1 would type as (A=>String) forSome {type A} and pair._2 would type as A forSome {type A}, which is the same as Any. And then pair._1(pair._2) would still not be well-typed. (The skolemization rule is actually helpful in making things type, as we will see below.)
So why does Scala refuse to understand that the two occurrences of pair are actually of type (A=>String,A) for the same A? I do not know a good reason in the case of a val pair, but if, for example, we had a var pair of the same type, the compiler must not skolemize several occurrences of it with the same A1. Why? Imagine that within an expression, the content of pair changes: first it contains an (Int=>String, Int), and then, towards the end of the evaluation of the expression, it contains a (Bool=>String,Bool). This is OK if the type of pair is (A=>String,A) forSome {type A}. But if the compiler gave both occurrences of pair the same skolemized type (A1=>String,A1), the typing would not be correct.
Similarly, if pair were a def pair, it could return different results on different invocations, and thus must not be skolemized with the same A1. For a val pair, this argument does not hold (since a val pair cannot change), but I assume that the type system would get too complicated if it tried to treat a val pair differently from a var pair. (Also, there are situations where a val can change content, namely from uninitialized to initialized. But I don't know whether that can lead to problems in this context.)
However, we can use the skolemization rule to make pair._1(pair._2) well-typed. A first try would be:
val pair2 = pair
pair2._1(pair2._2)
Why should this work? pair types as (A=>String,A) forSome {type A}. Thus its type becomes (A3=>String,A3) for some fresh A3. So the new val pair2 should be given type (A3=>String,A3) (the type of the rhs). And if pair2 has type (A3=>String,A3), then pair2._1(pair2._2) will be well-typed. (No existentials are involved any more.)
Unfortunately, this will actually not work, because of another rule in the spec:
If the value definition is not recursive, the type T may be omitted,
in which case the packed type of expression e is assumed. [SLS 4.1]
The packed type is the opposite of skolemization. That means, all the fresh variables that have been introduced inside the expression due to the skolemization rule are now transformed back into existential types. That is, T[A1] becomes T[A] forSome {type A}.
Thus, in
val pair2 = pair
pair2 will actually be given type (A=>String,A) forSome {type A} even though the rhs was given type (A3=>String,A3). Then pair2._1(pair2._2) will not type, as explained above.
But we can use another trick to achieve the desired result:
pair match { case pair2 =>
pair2._1(pair2._2) }
At first glance, this is a pointless pattern match, since pair2 is just bound to pair, so why not just use pair? The reason is that the rule from SLS 4.1 applies only to vals and vars. Variable patterns (like pair2 here) are not affected. Thus pair is typed as (A4=>String,A4) and pair2 is given the same type (not the packed type). Then pair2._1 is typed A4=>String and pair2._2 is typed A4, and all is well-typed.
So a code fragment of the form x match { case x2 => can be used to "upgrade" x to a new "pseudo-value" x2 that can make some expressions well-typed that would not be well-typed using x. (I don't know why the spec does not simply allow the same thing to happen when we write val x2 = x. It would certainly be nicer to read since we do not get an extra level of indentation.)
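Here is the trick as a minimal runnable sketch (Scala 2 only, since forSome was removed from Scala 3):

```scala
import scala.language.existentials

val pair: (A => String, A) forSome { type A } = ({ a: Int => a.toString }, 19)

// pair._1(pair._2)      // rejected: the two occurrences of pair are
//                       // skolemized independently (A1 vs A2)
val applied = pair match {
  case p => p._1(p._2)   // p is skolemized once, so this typechecks
}
```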
After this excursion, let us go through the typing of the remaining expressions from the question:
val wrap = Wrap(({ a: Int => a.toString }, 19) : (A => String, A) forSome { type A })
Here the expression ({ a: Int => a.toString }, 19) types as (Int=>String,Int). The type cast makes this into an expression of type (A => String, A) forSome { type A }. Then the skolemization rule is applied, so the expression (the argument of Wrap, that is) gets type (A5=>String,A5) for a fresh A5. We apply Wrap to it, so the rhs has type Wrap[(A5=>String,A5)]. To get the type of wrap, we need to apply the rule from SLS 4.1 again: we compute the packed type of Wrap[(A5=>String,A5)], which is Wrap[(A=>String,A)] forSome {type A}. So wrap has type Wrap[(A=>String,A)] forSome {type A} (and not Wrap[(A=>String,A) forSome {type A}] as one might expect at first glance!) Note that we can confirm that wrap has this type by running the compiler with the option -Xprint:typer.
We now type
wrap.x._1(wrap.x._2)
Here the skolemization rule applies to both occurrences of wrap, and they get typed as Wrap[(A6=>String,A6)] and Wrap[(A7=>String,A7)], respectively. Then wrap.x._1 has type A6=>String, and wrap.x._2 has type A7. Thus wrap.x._1(wrap.x._2) is not well-typed.
But the compiler disagrees and accepts wrap.x._1(wrap.x._2)! I do not know why. Either there is some additional rule in the Scala type system that I don't know about, or it is simply a compiler bug. Running the compiler with -Xprint:typer does not give extra insight, either, since it does not annotate the subexpressions in wrap.x._1(wrap.x._2).
Next is:
val wrap2 = Wrap(pair)
Here pair has type (A=>String,A) forSome {type A} and skolemizes to (A8=>String,A8). Then Wrap(pair) has type Wrap[(A8=>String,A8)] and wrap2 gets the packed type Wrap[(A=>String,A)] forSome {type A}. I.e., wrap2 has the same type as wrap.
wrap2.x._1(wrap2.x._2)
As with wrap.x._1(wrap.x._2), this should not type but does.
val Wrap((a2,b2)) = wrap
Here we see a new rule: [SLS 4.1] (not the part quoted above) explains that such a pattern match val statement is expanded to:
val tmp = wrap match { case Wrap((a2,b2)) => (a2,b2) }
val a2 = tmp._1
val b2 = tmp._2
Now we can see that (a2,b2) gets type (A9=>String,A9) for fresh A9, and tmp gets type (A=>String,A) forSome {type A} due to the packed type rule. Then tmp._1 gets type A10=>String using the skolemization rule, and val a2 gets type (A=>String) forSome {type A} by the packed type rule. And tmp._2 gets type A11 using the skolemization rule, and val b2 gets type A forSome {type A} by the packed type rule (this is the same as Any).
Thus
a2(b2)
is not well-typed, because a2 gets type A12=>String and b2 gets type A13 from the skolemization rule.
Similarly,
val (a3,b3) = pair
expands to
val tmp2 = pair match { case (a3,b3) => (a3,b3) }
val a3 = tmp2._1
val b3 = tmp2._2
Then tmp2 gets type (A=>String,A) forSome {type A} by the packed type rule, and val a3 and val b3 get type (A=>String) forSome {type A} and A forSome {type A} (a.k.a. Any), respectively.
Then
a3(b3)
is not well-typed for the same reasons as a2(b2) wasn't.

Spark Scala Method Signature in DataFrame API

Hi, I am trying to understand Scala better, and I think I am a little lost with this method signature.
explode[A <: Product](input: Column*)(f: (Row) ⇒ TraversableOnce[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
First off, what is the "<:" supposed to mean in the square brackets? Are A and B supposed to be parameter types? But Column is the argument type.
Secondly, it looks like it takes a lambda function from Row to TraversableOnce[A], but I haven't seen a lambda yet where the left-side argument doesn't appear on the right side at least once.
Also, I'm not 100 percent sure why it has the implicit arg0: piece.
Thanks in advance!
what is the "<:" supposed to mean in the square brackets?
<: means subtype in Scala, so here it means that A is a subtype of Product. It acts as an upper bound, which limits the types that can be passed here to subtypes of Product.
Are A and B supposed to be parameter types? But Column is the argument
type
A is not a parameter type; it is a parameter by itself, called a type parameter. It is a little bit confusing, but basically it means you can pass any type that is a subtype of Product in this position and use the type parameter inside the function. This makes the function more generic, because it can handle different types at the same time and you don't have to write separate functions for different types;
it looks like it does a lambda function from (Row) to Traversable[A]
f: (Row) => TraversableOnce[A] is another parameter, which in this case is of a function type: it takes a Row and returns a TraversableOnce[A]. By this definition, explode can accept a function as a parameter, which here is a lambda expression;
To illustrate the last case:
def sum(x: Int, y: Int)(f: Int => Int) = f(x) + f(y)
sum: (x: Int, y: Int)(f: Int => Int)Int
sum(2,3)(x => 2*x)
res2: Int = 10
In conclusion, the function explode takes three parameters in total. The first one, A, is a type parameter. The second and third are real arguments of the function, with input being of type Column* as you have noticed, and f being of type (Row) => TraversableOnce[A], which is a function type.
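To see all three pieces together, here is a hypothetical function (pairUp is made up for illustration) with the same shape as explode: a bounded type parameter, an ordinary parameter list, and a function-typed parameter list:

```scala
// A must be a subtype of Product; tuples qualify, since TupleN extends Product.
def pairUp[A <: Product](xs: List[Int])(f: Int => A): List[A] = xs.map(f)

// The lambda goes in its own parameter list, just as with explode:
val pairs = pairUp(List(1, 2))(n => (n, n * 2))
```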

Type inference inconsistency between toList and toBuffer

As per the example below, calling xs.toList.map(_.toBuffer) succeeds, but xs.toBuffer.map(_.toBuffer) fails. But when the steps in the latter are performed using an intermediate result, it succeeds. What causes this inconsistency?
scala> "ab-cd".split("-").toBuffer
res0: scala.collection.mutable.Buffer[String] = ArrayBuffer(ab, cd)
scala> res0.map(_.toBuffer)
res1: scala.collection.mutable.Buffer[scala.collection.mutable.Buffer[Char]] = ArrayBuffer(ArrayBuffer(a, b), ArrayBuffer(c, d))
scala> "ab-cd".split("-").toBuffer.map(_.toBuffer)
<console>:8: error: missing parameter type for expanded function ((x$1) => x$1.toBuffer)
"ab-cd".split("-").toBuffer.map(_.toBuffer)
^
scala> "ab-cd".split("-").toList.map(_.toBuffer)
res3: List[scala.collection.mutable.Buffer[Char]] = List(ArrayBuffer(a, b), ArrayBuffer(c, d))
Look at the definitions of toBuffer and toList:
def toBuffer[A1 >: A]: Buffer[A1]
def toList: List[A]
As you can see, toBuffer is generic, while toList is not.
The reason for this difference is - I believe - that Buffer is invariant, while List is covariant.
Let's say that we have the following classes:
class Foo
class Bar extends Foo
Because List is covariant, you can call toList on an instance of Iterable[Bar] and treat the result as a List[Foo].
If List were invariant, this would not be the case.
Buffer being invariant, if toBuffer was defined as def toBuffer: Buffer[A] you would similarly not be able to treat the result
of toBuffer (on an instance of Iterable[Bar]) as an instance of Buffer[Foo] (as Buffer[Bar] is not a sub-type of Buffer[Foo], unlike for lists).
But by declaring toBuffer as def toBuffer[A1 >: A] (notice the added type parameter A1), we get back the possibility to have toBuffer return an instance of Buffer[Foo] :
all we need is to explicitly set A1 to Foo, or let the compiler infer it (if toBuffer is called at a site where a Buffer[Foo] is expected).
I think this explains the reason why toList and toBuffer are defined differently.
Now the problem with this is that toBuffer is generic, and this can badly affect inference.
When you do this:
"ab-cd".split("-").toBuffer
You never explicitly say that A1 is String, but because "ab-cd".split("-") has unambiguously the type Array[String], the compiler knows that A is String.
It also knows that A1 >: A (in toBuffer), and without any further constraint, it will infer A1 to be exactly A, in other words String.
So in the end the whole expression returns a Buffer[String].
But here's the thing: in Scala, type inference happens on an expression as a whole.
When you have something like a.b.c, you might expect that Scala will infer an exact type for a, then from that infer an exact type for a.b, and finally for a.b.c. Not so.
Type inference is deferred to the whole expression a.b.c (see the Scala specification, section 6.26.4 "Local Type Inference", "case 1: selections").
So, going back to your problem: in the expression "ab-cd".split("-").toBuffer.map(_.toBuffer), the sub-expression "ab-cd".split("-").toBuffer is not typed Buffer[String]; instead it stays typed as something like Buffer[A1] forSome { type A1 >: String }. In other words, A1 is not fixed; we just carry the constraint A1 >: String to the next step of inference.
This next step is map(_.toBuffer), where map is defined as map[C](f: (B) ⇒ C): Buffer[C]. Here B is actually the same as A1, but at this point A1 is still not fully known; we only know that A1 >: String.
Here lies our problem. The compiler needs to know the exact type of the anonymous function (_.toBuffer), simply because instantiating a Function1[A,R] requires knowing the exact types of A and R, just like for any generic type.
So you need to tell it explicitly somehow, since it was not able to infer the type exactly.
This means you need to do either:
"ab-cd".split("-").toBuffer[String].map(_.toBuffer)
Or:
"ab-cd".split("-").toBuffer.map((_:String).toBuffer)
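For completeness, both workarounds compile and produce the same result:

```scala
// Fix A1 explicitly on toBuffer:
val viaTypeArg = "ab-cd".split("-").toBuffer[String].map(_.toBuffer)
// ...or annotate the lambda's parameter so Function1's types are known:
val viaAnnotation = "ab-cd".split("-").toBuffer.map((_: String).toBuffer)
```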

Why is the type position of a method marked as negative?

Sorry, I have asked some questions like this one before, but I still can't get a clear answer; maybe my bad English and unclear expression puzzled the kind people.
When I read the "Type Parameterization" in this article: http://www.artima.com/pins1ed/type-parameterization.html, I see there are some explanation about the type positions:
As a somewhat contrived example, consider the following class definition, where the variance of several positions is annotated with ^+ (for positive) or ^- (for negative):
abstract class Cat[-T, +U] {
  def meow[W^-](volume: T^-, listener: Cat[U^+, T^-]^-): Cat[Cat[U^+, T^-]^-, U^+]^+
}
I can understand most of this class, except the W position. I don't understand why it is marked as negative, and there is no explanation in the whole document.
It also says:
Type parameters annotated with + may only be used in positive positions, while type parameters annotated with - may only be used in negative positions.
How can I find a type with - annotation in position W to fit this negative position?
The language reference says:
The variance position of a method parameter is the opposite of the variance position of the enclosing parameter clause.
The variance position of a type parameter is the opposite of the variance position of the enclosing type parameter clause.
The variance position of the lower bound of a type declaration or type parameter is the opposite of the variance position of the type declaration or parameter.
OK what does it mean for a type parameter to have a variance position?
class Moo[+A, -B] {
def foo[X] (bar : Y) ...
So Y is in a contravariant position; this is clear. We can put B in its position, but not A.
But what does it mean for X to be in a contravariant position? We cannot substitute A or B or anything there, it's just a formal parameter!
That's true, but this thing can have subordinate positions which are types, and those have variance. So we need to count the position of X when tracking how many times we flip variance. There are no subordinate clauses of X here, but consider this:
class Moo[+A, -B] {
def foo[X >: Z] (bar : B) ...
We probably can replace Z with either A or B, but which is correct? Well, the position of Z is the opposite of that of X, and the position of X is the opposite of that of the top-level, which is covariant, so Z must be covariant too. Let's check:
abstract class Moo[+A, -B] {
def foo[X >: A] (bar : B)
}
defined class Moo
Looks like we are right!
There is a familiar example in the spec:
http://www.scala-lang.org/files/archive/spec/2.11/04-basic-declarations-and-definitions.html#variance-annotations
Sequence.append is example 4.5.2 in the pdf, but the markdown isn't numbered at the moment.
abstract class Sequence[+A] {
def append[B >: A](x: Sequence[B]): Sequence[B]
}
In real life, see the doc for Seq.++, ignoring the "use case" and clicking on the "full signature" to show the lower bound.
This is the same pattern as in other widening operations like Option.getOrElse, where you're getting back a possibly wider type than you started with.
Here's an example of how it makes sense in terms of substitution:
Given a Seq[Fruit], I can append a Seq[Orange]. Since Apple <: Fruit, I can also append the oranges to a Seq[Apple] and get fruits back.
That's why the B type parameter wants to be bound by a covariant parameter. The variance position of B is classified as negative, for purposes of variance checking, but B itself is not annotated.
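A runnable sketch of that substitution argument (Fruit, Apple, and Orange are made-up types for illustration); Seq.++ has the same lower-bounded shape as append:

```scala
trait Fruit { def name: String }
class Apple extends Fruit { def name = "apple" }
class Orange extends Fruit { def name = "orange" }

val apples: Seq[Apple] = Seq(new Apple)
val oranges: Seq[Orange] = Seq(new Orange)
// B is inferred as Fruit, the least upper bound of Apple and Orange:
val fruits: Seq[Fruit] = apples ++ oranges
```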
Funny thing is that this parses:
scala> trait X { def append[-](): Unit }
defined trait X

Weird behavior of & function in Set

Set is defined as Set[A]: it takes an invariant type parameter. The below gives an error, as expected, since we are passing a covariant (subtype) argument:
scala> val a = Set(new Object)
a: scala.collection.immutable.Set[Object] = Set(java.lang.Object@118c38f)
scala> val b = Set("hi")
b: scala.collection.immutable.Set[String] = Set(hi)
scala> a & b
<console>:10: error: type mismatch;
found : scala.collection.immutable.Set[String]
required: scala.collection.GenSet[Object]
Note: String <: Object, but trait GenSet is invariant in type A.
You may wish to investigate a wildcard type such as `_ <: Object`. (SLS 3.2.10)
a & b
But the below works:
scala> Set(new Object) & Set("hi")
res1: scala.collection.immutable.Set[Object] = Set()
As I see it, in the above the Scala compiler converts Set("hi") to type Set[Object], and hence it works.
What is the type inference doing here? Can someone please link to the specification explaining this behavior and when it happens in general? Shouldn't it throw a compile-time error in such cases, given that the same operation produces two different outcomes?
Not sure, but I think what you're looking for is described in the language spec under "Local Type Inference" (at this time of writing, section 6.26.4 on page 100).
Local type inference infers type arguments to be passed to expressions of polymorphic type. Say e is of type [ a1 >: L1 <: U1, ..., an >: Ln <: Un ] T and no explicit type
parameters are given.
Local type inference converts this expression to a type application e [ T1, ..., Tn ]. The choice of the type arguments T1, ..., Tn depends on the context in which the expression appears and on the expected type pt. There are three cases.
[ ... ]
If the expression e appears as a value without being applied to value arguments, the type arguments are inferred by solving a constraint system which relates the expression's type T with the expected type pt. Without loss of generality we can assume that T is a value type; if it is a method type we apply eta-expansion to convert it to a function type. Solving means finding a substitution σ of types Ti for the type parameters ai such that
None of the inferred types Ti is a singleton type
All type parameter bounds are respected, i.e. σ Li <: σ ai and σ ai <: σ Ui for i = 1, ..., n.
The expression's type conforms to the expected type, i.e. σ T <: σ pt.
It is a compile time error if no such substitution exists. If several substitutions exist, local-type inference will choose for each type variable ai a minimal or maximal type Ti of the solution space. A maximal type Ti will be chosen if the type parameter ai appears contravariantly in the type T of the expression. A minimal type Ti will be chosen in all other situations, i.e. if the variable appears covariantly, nonvariantly or not at all in the type T. We call such a substitution an optimal solution of the given constraint system for the type T.
In short: Scalac has to choose values for the generic types that you omitted, and it picks the most specific choices possible, under the constraint that the result compiles.
The expression Set("hi") can be either a scala.collection.immutable.Set[String] or a scala.collection.immutable.Set[Object], depending on what the context requires. (A String is a valid Object, of course.) When you write this:
Set(new Object) & Set("hi")
the context requires Set[Object], so that's the type that's inferred; but when you write this:
val b = Set("hi")
the context doesn't specify, so the more-specific type Set[String] is chosen, which (as you expected) then makes a & b be ill-typed.
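The two inference outcomes can be seen side by side in one snippet:

```scala
val inline = Set(new Object) & Set("hi") // Set("hi") is typed Set[Object]:
                                         // the context requires it
val standalone = Set("hi")               // typed Set[String]: the most specific
                                         // solution, with no outer constraint
// Set(new Object) & standalone          // does not compile: Set is invariant
val widened: Set[Object] = Set("hi")     // an expected type also drives inference
val alsoEmpty = Set(new Object) & widened
```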