Set sequencing type puzzle - scala

Last night in responding to this question, I noticed the following:
scala> val foo: Option[Set[Int]] = Some(Set(1, 2, 3))
foo: Option[Set[Int]] = Some(Set(1, 2, 3))
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> foo.sequenceU
res0: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
That is, if foo is an optional set of integers, sequencing it returns a set of integers.
This isn't what I expected at first, since sequencing a F[G[A]] should return a G[F[A]] (assuming that F is traversable and G is an applicative functor). In this case, though, the Option layer just disappears.
I know this probably has something to do with some interaction between one of the supertypes of Set and the Unapply machinery that makes sequenceU work, and when I can find a few minutes I'm planning to work through the types and write up a description of what's going on.
It seems like a potentially interesting little puzzle, though, and I thought I'd post it here in case someone can beat me to an answer.

wow, yeah. Here's what I can surmise is happening. since Set doesn't have an Applicative of its own, we are getting the Monoid#applicative instance instead:
scala> implicitly[Unapply[Applicative, Set[Int]]].TC
res0: scalaz.Applicative[_1.M] forSome { val _1: scalaz.Unapply[scalaz.Applicative,Set[Int]] } = scalaz.Monoid$$anon$1#7f5d0856
Since Monoid is defined for types of kind * and applicative is defined for types of kind * -> *, the definition of Applicative in Monoid sorta wedges in an ignored type parameter using a type lambda:
final def applicative: Applicative[({type λ[α]=F})#λ] = new Applicative[({type λ[α]=F})#λ] with SemigroupApply...
Notice there that the type parameter α of λ is thrown away, so when Applicative#point is called, which becomes Monoid#zero, instead of it being a Monoid[Set[Option[Int]]] it is a Monoid[Set[Int]].
larsh points out that this has the interesting side-effect of alllowing sequenceU to be (ab)used as sum:
scala> List(1,2,3).sequenceU
res3: Int = 6

Related

Using Monoid operators in scalaz: |+| not a member of Some[Double]

What's missing to get this working?
import scalaz._
import Scalaz._
val r = Some(1.0) |+| None
val r1 = None[Double] |+| Some(1.0)
I am getting the following:
Error:(4, 25) value |+| is not a member of Some[Double] lazy val r =
Some(1.0) |+| None
^
The problem is that a Monoid over Double is not lawful as the associativity rule can be invalidated in some cases by floating point arithmetic approximation. For this reason scalaz left that instance out of the main project and included it instead in the scalaz-outlaws one.
Include that library if you need to have an instance for Double but remember that there is a reason for this and consider your use case (e.g. if you are processing money transactions using floating point arithmetic you are probably doing something wrong).

Why the variation in operators?

Long time lurker, first time poster.
In Scala, I'm looking for advantages as to why it was preferred to vary operators depending on type. For example, why was this:
Vector(1, 2, 3) :+ 4
determined to be an advantage over:
Vector(1, 2, 3) + 4
Or:
4 +: Vector(1,2,3)
over:
Vector(4) + Vector(1,2,3)
Or:
Vector(1,2,3) ++ Vector(4,5,6)
over:
Vector(1,2,3) + Vector(4,5,6)
So, here we have :+, +:, and ++ when + alone could have sufficed. I'm new at Scala, and I'll succumb. But, this seems unnecessary and obfuscated for a language that tries to be clean with its syntax.
I've done quite a few google and stack overflow searches and have only found questions about specific operators, and operator overloading in general. But, no background on why it was necessary to split +, for example, into multiple variations.
FWIW, I could overload the operators using implicit classes, such as below, but I imagine that would only cause confusion (and tisk tisks) from experienced Scala programmers using/reading my code.
object AddVectorDemo {
implicit class AddVector(vector : Vector[Any]) {
def +(that : Vector[Any]) = vector ++ that
def +(that : Any) = vector :+ that
}
def main(args : Array[String]) : Unit = {
val u = Vector(1,2,3)
val v = Vector(4,5,6)
println(u + v)
println(u + v + 7)
}
}
Outputs:
Vector(1, 2, 3, 4, 5, 6)
Vector(1, 2, 3, 4, 5, 6, 7)
The answer requires a surprisingly long detour through variance. I'll try to make it as short as possible.
First, note that you can add anything to an existing Vector:
scala> Vector(1)
res0: scala.collection.immutable.Vector[Int] = Vector(1)
scala> res0 :+ "fish"
res1: scala.collection.immutable.Vector[Any] = Vector(1, fish)
Why can you do this? Well, if B extends A and we want to be able to use Vector[B] where Vector[A] is called for, we need to allow Vector[B] to add the same sorts of things that Vector[A] can add. But everything extends Any, so we need to allow addition of anything that Vector[Any] can add, which is everything.
Making Vector and most other non-Set collections covariant is a design decision, but it's what most people expect.
Now, let's try adding a vector to a vector.
scala> res0 :+ Vector("fish")
res2: scala.collection.immutable.Vector[Any] = Vector(1, Vector(fish))
scala> res0 ++ Vector("fish")
res3: scala.collection.immutable.Vector[Any] = Vector(1, fish)
If we only had one operation, +, we wouldn't be able to specify which one of these things we meant. And we really might mean to do either. They're both perfectly sensible things to try. We could try to guess based on types, but in practice it's better to just ask the programmer to explicitly say what they mean. And since there are two different things to mean, there need to be two ways to ask.
Does this come up in practice? With collections of collections, yes, all the time. For example, using your +:
scala> Vector(Vector(1), Vector(2))
res4: Vector[Vector[Int]] = Vector(Vector(1), Vector(2))
scala> res4 + Vector(3)
res5: Vector[Any] = Vector(Vector(1), Vector(2), 3)
That's probably not what I wanted.
It's a fair question, and I think it has a lot to do with legacy code and Java compatibility. Scala copied Java's + for String concatenation, which has complicated things.
This + allows us to do:
(new Object) + "foobar" //"java.lang.Object#5bb90b89foobar"
So what should we expect if we had + for List and we did List(1) + "foobar"? One might expect List(1, "foobar") (of type List[Any]), just like we get if we use :+, but the Java-inspired String-concatenation overload would complicate this, since the compiler would fail to resolve the overload.
Odersky even once commented:
One should never have a + method on collections that are covariant in their element type. Sets and maps are non-variant, that's why they can have a + method. It's all rather delicate and messy. We'd be better off if we did not try to duplicate Java's + for String concatenation. But when Scala got designed the idea was to keep essentially all of Java's expression syntax, including String +. And it's too late to change that now.
There is some discussion (although in a different context) on the answers to this similar question.

What is the Scala equivalent of C++ typeid?

For example, if I do
scala> val a = Set(1,2,3)
a: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
in the REPL, I want to see the most refined type of "a" in order to know whether it's really a HashSet. In C++, typeid(a).name() would do it. What is the Scala equivalent?
scala> val a = Set(1,2,3)
a: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> a.getClass.getName
res0: java.lang.String = scala.collection.immutable.Set$Set3
(Yes, it really is an instance of an inner class called Set3--it's a set specialized for 3 elements. If you make it a little larger, it'll be a HashTrieSet.)
Edit: #pst also pointed out that the type information [Int] was erased; this is how JVM generics work. However, the REPL keeps track since the compiler still knows the type. If you want to get the type that the compiler knows, you can
def manifester[A: ClassManifest](a: A) = implicitly[ClassManifest[A]]
and then you'll get something whose toString is the same as what the REPL reports. Between the two of these, you'll get as much type information as there is to be had. Of course, since the REPL already does this for you, you normally don't need to use this. But if for some reason you want to, the erased types are available from .typeArguments from the manifest.

In Scala 2, type inference fails on Set made with .toSet?

Why is type inference failing here?
scala> val xs = List(1, 2, 3, 3)
xs: List[Int] = List(1, 2, 3, 3)
scala> xs.toSet map(_*2)
<console>:9: error: missing parameter type for expanded function ((x$1) => x$1.$times(2))
xs.toSet map(_*2)
However, if xs.toSet is assigned, it compiles.
scala> xs.toSet
res42: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> res42 map (_*2)
res43: scala.collection.immutable.Set[Int] = Set(2, 4, 6)
Also, going the other way, converting to Set from List, and mapping on List complies.
scala> Set(5, 6, 7)
res44: scala.collection.immutable.Set[Int] = Set(5, 6, 7)
scala> res44.toList map(_*2)
res45: List[Int] = List(10, 12, 14)
Q: Why doesn't toSet do what I want?
A: That would be too easy.
Q: But why doesn't this compile? List(1).toSet.map(x => ...)
A: The Scala compiler is unable to infer that x is an Int.
Q: What, is it stupid?
A: Well, List[A].toSet doesn't return an immutable.Set[A]. It returns an immutable.Set[B] for some unknown B >: A.
Q: How was I supposed to know that?
A: From the Scaladoc.
Q: But why is toSet defined that way?
A: You might be assuming immutable.Set is covariant, but it isn't. It's invariant. And the return type of toSet is in covariant position, so the return type can't be allowed to be invariant.
Q: What do you mean, "covariant position"?
A: Let me Wikipedia that for you: http://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science) . See also chapter 19 of Odersky, Venners & Spoon.
Q: I understand now. But why is immutable.Set invariant?
A: Let me Stack Overflow that for you: Why is Scala's immutable Set not covariant in its type?
Q: I surrender. How do I fix my original code?
A: This works: List(1).toSet[Int].map(x => ...). So does this: List(1).toSet.map((x: Int) => ...)
(with apologies to Friedman & Felleisen. thx to paulp & ijuma for assistance)
EDIT: There is valuable additional information in Adriaan's answer and in the discussion in the comments both there and here.
The type inference does not work properly as the signature of List#toSet is
def toSet[B >: A] => scala.collection.immutable.Set[B]
and the compiler would need to infer the types in two places in your call. An alternative to annotating the parameter in your function would be to invoke toSet with an explicit type argument:
xs.toSet[Int] map (_*2)
UPDATE:
Regarding your question why the compiler can infer it in two steps, let's just look at what happens when you type the lines one by one:
scala> val xs = List(1,2,3)
xs: List[Int] = List(1, 2, 3)
scala> val ys = xs.toSet
ys: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
Here the compiler will infer the most specific type for ys which is Set[Int] in this case. This type is known now, so the type of the function passed to map can be inferred.
If you filled in all possible type parameters in your example the call would be written as:
xs.toSet[Int].map[Int,Set[Int]](_*2)
where the second type parameter is used to specify the type of the returned collection (for details look at how Scala collections are implemented). This means I even underestimated the number of types the compiler has to infer.
In this case it may seem easy to infer Int but there are cases where it is not (given the other features of Scala like implicit conversions, singleton types, traits as mixins etc.). I don't say it cannot be done - it's just that the Scala compiler does not do it.
I agree it would be nice to infer "the only possible" type, even when calls are chained, but there are technical limitations.
You can think of inference as a breadth-first sweep over the expression, collecting constraints (which arise from subtype bounds and required implicit arguments) on type variables, followed by solving those constraints. This approach allows, e.g., implicits to guide type inference. In your example, even though there is a single solution if you only look at the xs.toSet subexpression, later chained calls could introduce constraints that make the system unsatisfiable. The downside of leaving the type variables unsolved is that type inference for closures requires the target type to be known, and will thus fail (it needs something concrete to go on -- the required type of the closure and the type of its argument types must not both be unknown).
Now, when delaying solving the constraints makes inference fail, we could backtrack, solve all the type variables, and retry, but this is tricky to implement (and probably quite inefficient).

When is one Set less than another in Scala?

I wanted to compare the cardinality of two sets in Scala. Since stuff sometimes "just work" in Scala, I tried using < between the sets. It seems to go through, but I can't make any sense out of the result.
Example:
scala> Set(1,2,3) < Set(1,4)
res20: Boolean = true
What does it return?
Where can I read about this method in the API?
Why isn't it listed anywhere under scala.collection.immutable.Set?
Update: Even the order(??) of the elements in the sets seem to matter:
scala> Set(2,3,1) < Set(1,3)
res24: Boolean = false
scala> Set(1,2,3) < Set(1,3)
res25: Boolean = true
This doesn't work with 2.8. On Scala 2.7, what happens is this:
scala.Predef.iterable2ordered(Set(1, 2, 3): Iterable[Int]) < (Set(1, 3, 2): Iterable[Int])
In other words, there's an implicit conversion defined on scala.Predef, which is "imported" for all Scala code, from an Iterable[A] to an Ordered[Iterable[A]], provided there's an implicit A => Ordered[A] available.
Given that the order of an iterable for sets is undefined, you can't really predict much about it. If you add elements to make the set size bigger than four, for instance, you'll get entirely different results.
If you want to compare the cardinality, just do so directly:
scala> Set(1, 2, 3).size < Set(2, 3, 4, 5).size
res0: Boolean = true
My knowledge of Scala is not extensive, but doing some test, I get the following:
scala> Set(1,2) <
<console>:5: error: missing arguments for method < in trait Ordered;
follow this method with `_' if you want to treat it as a partially applied function
Set(1,2) <
^
That tells me that < comes from the trait Ordered. More hints:
scala> Set(1,2) < _
res4: (Iterable[Int]) => Boolean = <function>
That is, the Set is evaluated into an Iterable, because maybe there is some implicit conversion from Iterable[A] to Ordered[Iterable[A]], but I'm not sure anymore... Tests are not consistent. For example, these two might suggest a kind of lexicographical compare:
scala> Set(1,2,3) < Set(1,2,4)
res5: Boolean = true
1 is equal, 2 is equal, 3 is less than 4.
scala> Set(1,2,4) < Set(1,2,3)
res6: Boolean = false
But these ones don't:
scala> Set(2,1) < Set(2,4)
res11: Boolean = true
scala> Set(2,1) < Set(2,2)
res12: Boolean = false
I think the correct answer is that found in the Ordered trait proper: There is no implementation for < between sets more than comparing their hashCode:
It is important that the hashCode method for an instance of Ordered[A] be consistent with the compare method. However, it is not possible to provide a sensible default implementation. Therefore, if you need to be able compute the hash of an instance of Ordered[A] you must provide it yourself either when inheiriting or instantiating.