What are the differences and similarities of Scala and Haskell type systems?


How to explain Scala's type system to a Haskell expert?
What examples show Scala's advantages?
How to explain Haskell's type system to an advanced Scala practitioner?
What can be done in Haskell that can't be done in Scala?

Scala to a Haskell programmer:
Scala is a strict and impure language with first-class modules. Data types are declared as "classes" or "traits" with subtle differences, and modules or "objects" are values of those types. Scala supports type constructors taking universally quantified type parameters. Objects/classes/traits have members which consist of values, mutable variables, and functions (called "methods", to which the module is implicitly passed as a variable called this). Modules may have type members which can also take parameters. Type members are existentially quantified and type parameters can be higher-kinded. Because types can be members of first-class values, Scala provides a flavour of dependent typing called path-dependent types.
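To make the path-dependent typing above concrete, here is a minimal sketch. The names Monoid, IntSum, StringConcat and fold are made up for illustration and are not taken from any library. The point is only that the type m.T in the signature of fold depends on which module value m is passed in.

```scala
object PathDependentDemo {
  // A module type with an abstract type member.
  trait Monoid {
    type T
    def empty: T
    def combine(a: T, b: T): T
  }

  // Modules are ordinary values: objects implementing the trait.
  object IntSum extends Monoid {
    type T = Int
    val empty: Int = 0
    def combine(a: Int, b: Int): Int = a + b
  }

  object StringConcat extends Monoid {
    type T = String
    val empty: String = ""
    def combine(a: String, b: String): String = a + b
  }

  // The parameter and result types mention m.T, a type selected from the
  // value m: this is a path-dependent type.
  def fold(m: Monoid)(xs: List[m.T]): m.T =
    xs.foldLeft(m.empty)(m.combine)

  def main(args: Array[String]): Unit = {
    println(fold(IntSum)(List(1, 2, 3)))
    println(fold(StringConcat)(List("a", "b")))
  }
}
```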
First-class functions are also modules. A function is a module with a method named apply. A method is not first-class, but a syntax is provided to wrap a method in a first-class function. Unfortunately, a module requires all of its type parameters up front, hence a partially applied first-class function is not allowed to be universally quantified. More generally, Scala completely lacks a direct mechanism for types of rank higher than 1, but modules parameterized on higher-kinded types can be exploited to simulate rank-n types.
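A small sketch of the "a function is a module with an apply method" idea, and of wrapping a method into a function value. Increment, double and doubleFn are hypothetical names used only for this example.

```scala
object FunctionModuleDemo {
  // A first-class function is just an object with an apply method.
  object Increment extends Function1[Int, Int] {
    def apply(x: Int): Int = x + 1
  }

  // A method is not a value by itself...
  def double(x: Int): Int = x * 2

  // ...but it can be wrapped into a Function1 value.
  val doubleFn: Int => Int = double(_)

  def main(args: Array[String]): Unit = {
    println(Increment(1))                    // sugar for Increment.apply(1)
    println(List(1, 2, 3).map(Increment))    // usable wherever Int => Int is expected
    println((Increment andThen doubleFn)(3)) // composed like any other function
  }
}
```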
Instead of type classes with global scope, Scala lets you declare an implicit value of any given type. This includes function types, which provides implicit conversion, and therefore type extension. In addition to implicit conversions, type extension is provided by the "extends" mechanism which lets you declare a subtype/supertype relation among modules. This mechanism can be used to simulate algebraic datatypes where the supertype can be seen as the type on the left-hand side of a data declaration, and its subtypes as the value constructors on the right-hand side. Scala has extensive pattern-matching capabilities using a virtualized pattern matcher with first-class patterns.
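As a rough illustration of both points (implicit values standing in for type class instances, and "extends" simulating an algebraic datatype), consider the sketch below. Shape, Circle, Rect, Show and describe are invented for the example.

```scala
object ImplicitsAndAdtDemo {
  // The sealed supertype plays the role of the left-hand side of a Haskell
  // data declaration; its subtypes play the role of the value constructors.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rect(width: Double, height: Double) extends Shape

  // A type class is an ordinary trait...
  trait Show[A] {
    def show(a: A): String
  }

  // ...and an instance is an implicit value of that type.
  implicit val showShape: Show[Shape] = new Show[Shape] {
    def show(s: Shape): String = s match {
      case Circle(r)  => s"Circle($r)"
      case Rect(w, h) => s"Rect($w x $h)"
    }
  }

  // The instance is found implicitly, but it is still a normal parameter.
  def describe[A](a: A)(implicit ev: Show[A]): String = ev.show(a)

  def main(args: Array[String]): Unit = {
    val s: Shape = Rect(2.0, 3.0)
    println(describe(s))
  }
}
```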
Scala supports subtyping, and this limits type inference considerably. But type inference has improved over time. Inference of higher kinded types is supported. However, Scala lacks any meaningful kind system, and therefore has no kind inference and no kind unification. If a type variable is introduced, it is of kind * unless annotated otherwise. Certain types like Any (the supertype of all types) and Nothing (a subtype of every type) are technically of every kind although they cannot be applied to type arguments.
Haskell to a Scala programmer:
Haskell is a purely functional language. This means that functions are not allowed to have any side-effects at all. For example, a Haskell program doesn't print to the screen as such, but is a function that returns a value of the IO[_] datatype which describes a sequence of actions for the IO subsystem to perform.
Whereas Scala is strict by default and provides "by-name" annotation for nonstrict function arguments, Haskell is lazy by default using "by-need" semantics, and provides annotation for strict arguments.
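The difference between by-name and by-need can be seen from the Scala side with a small counting experiment. This is only a sketch: twiceByName, twiceByNeed and the run helpers are hypothetical, and caching in a lazy val is used here as an approximation of Haskell's by-need behaviour, not as a claim about how GHC implements it.

```scala
object EvaluationDemo {
  // By-name parameter: the argument expression is re-evaluated on every use.
  def twiceByName(x: => Int): Int = x + x

  // Caching the argument in a lazy val: evaluated at most once, and only if used.
  def twiceByNeed(x: => Int): Int = {
    lazy val cached = x
    cached + cached
  }

  // Each helper returns (result, number of times the argument was evaluated).
  def runByName(): (Int, Int) = {
    var count = 0
    val result = twiceByName { count += 1; 42 }
    (result, count)
  }

  def runByNeed(): (Int, Int) = {
    var count = 0
    val result = twiceByNeed { count += 1; 42 }
    (result, count)
  }

  def main(args: Array[String]): Unit = {
    println(runByName()) // argument evaluated twice
    println(runByNeed()) // argument evaluated once
  }
}
```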
Type inference in Haskell is more complete than in Scala, having full inference. This means that type annotation is almost never necessary.
Recent extensions to the GHC compiler allow advanced type system features that have no equivalent in Scala, such as rank-n types, type families, and kind polymorphism.
In Haskell, a module is a collection of types and functions, but modules are not first-class entities. Implicits are provided by type classes, but these are globally scoped once declared, and they cannot be passed explicitly as in Scala. Multiple instances of a type class for a given type are resolved by wrapping with a newtype to disambiguate, whereas in Scala this would be solved simply by scoping or by passing instances explicitly.
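For a Scala reader, the contrast with newtype wrapping might look like the following sketch. The Monoid trait and both instances are defined locally for illustration. In Haskell the two instances for Int would need wrappers such as Sum and Product from Data.Monoid, whereas here they are just two values.

```scala
object ExplicitInstanceDemo {
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  // Two instances for the same type Int, distinguished only by name.
  val additive: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }
  val multiplicative: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 1
    def combine(x: Int, y: Int): Int = x * y
  }

  def combineAll[A](xs: List[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.empty)(m.combine)

  // Selecting an instance by scoping: only `additive` is implicit in here.
  def sumAll(xs: List[Int]): Int = {
    implicit val chosen: Monoid[Int] = additive
    combineAll(xs)
  }

  def main(args: Array[String]): Unit = {
    println(combineAll(List(1, 2, 3, 4))(multiplicative)) // passed explicitly
    println(sumAll(List(1, 2, 3, 4)))                     // resolved by scope
  }
}
```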
Since Haskell isn't "object-oriented", there's no method/function dichotomy. Every function is first class and every function is curried by default (no Function1, Function2, etc).
Haskell has no subtype mechanism, but type classes can have a subclass relationship.

I don't believe anyone has systematically compared Haskell's type system (as exemplified by GHC) with Scala's. The main points of difference are the degree of type inference and the support for higher-rank types. But a full treatment of the differences would be a publishable result.

Related

What types are special to the Scala compiler?

Scala makes a big deal about how what seem to be language features are implemented as library features.
Is there a list of types that are treated specially by the language?
Either in the specification or as an implementation detail?
That would include, for example, optimizing away matches on tuples.
What about special conventions related to pattern matching, for comprehensions, try-catch blocks and other language constructs?
Is String somehow special to the compiler? I see that String enhancement is just a library implicit conversion, and that String concatenation is supported by Predef, but is that somehow special-cased by the language?
Similarly, I see questions about <:< and classOf and asInstanceOf, and it's not clear what is a magical intrinsic. Is there a way to tell the difference, either with a compiler option or by looking at byte code?
I would like to understand if a feature is supported uniformly by implementations such as Scala.JS and Scala-native, or if a feature might actually prove to be implementation-dependent, depending on the library implementation.

There is an incredible number of types that are "known" to the compiler, and are special to varying degrees. You can find a complete list in scalac's Definitions.scala.
We can probably classify them according to the degree of specialness they bear.
Disclaimer: I have probably forgotten a few more.
Special for the type system
The following types are crucial to Scala's type system. They have an influence on how type checking itself is performed. All these types are mentioned in the specification (or at least, they definitely should be).
Any, AnyRef, AnyVal, Null, Nothing: the five types that sit at the top and bottom of Scala's type system.
scala.FunctionN, the (canonical) type given to anonymous functions (including eta-expansion). Even in 2.12, which has the SAM treatment of anonymous functions, FunctionN remains special in some cases (notably in overloading resolution).
scala.PartialFunction (has an impact on how type inference works)
Unit
All types with literal notation: Int, Long, Float, Double, Char, Boolean, String, Symbol, java.lang.Class
All numeric primitive types and Chars, for weak conformance (together, these two bullets cover all primitive types)
Option and tuples (for pattern matching and auto-tupling)
java.lang.Throwable
scala.Dynamic
scala.Singleton
Most of scala.reflect.*, in particular ClassTag, TypeTag, etc.
scala.annotation.{,ClassFile,Static}Annotation
Almost all of the annotations in scala.annotation.* (e.g., unchecked)
scala.language.*
scala.math.ScalaNumber (for obscure reasons)
Known to the compiler as the desugaring of some language features
The following types are not crucial to the type system. They do not have an influence on type checking. However, the Scala language does feature a number of constructs which desugar into expressions of those types.
These types would also be mentioned in the specification.
scala.collection.Seq, Nil and WrappedArray, which are used for varargs parameters.
TupleN types
Product and Serializable (for case classes)
MatchError, generated by pattern matching constructs
scala.xml.*
scala.DelayedInit
List (the compiler performs some trivial optimizations on those, such as rewriting List() as Nil)
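A few of these desugarings can be observed from ordinary code. The sketch below uses made-up names (Point, count, onlyOne) and only checks behaviour that is visible at run time. It does not show what the compiler emits internally.

```scala
object DesugaringDemo {
  final case class Point(x: Int, y: Int)

  // Inside the method, a repeated parameter is seen as a Seq.
  def count(xs: Int*): Int = {
    val asSeq: scala.collection.Seq[Int] = xs
    asSeq.length
  }

  // A match with no applicable case throws scala.MatchError at run time.
  def onlyOne(n: Int): String = n match {
    case 1 => "one"
  }

  def main(args: Array[String]): Unit = {
    val p = Point(1, 2)
    println(p.isInstanceOf[Product])              // case classes extend Product
    println(p.isInstanceOf[java.io.Serializable]) // ...and Serializable
    println(count(1, 2, 3))
    println((1, "a") == Tuple2(1, "a"))           // tuple syntax is a Tuple2
    val outcome = try onlyOne(2) catch { case _: MatchError => "MatchError" }
    println(outcome)
  }
}
```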
Known to the implementation of the language
This is probably the list that you care most about, given that you said you were interested in knowing what could go differently on different back-ends. The previous categories are handled by early (front-end) phases of the compiler, and are therefore shared by Scala/JVM, Scala.js and Scala Native. The types in this category are typically known to the compiler back-end, and so potentially receive different treatment. Note that both Scala.js and Scala Native do try to mimic the semantics of Scala/JVM to a reasonable degree.
Those types might not be mentioned in the language specification per se, at least not all of them.
Here are those where the back-ends agree (re Scala Native, to the best of my knowledge):
All primitive types: Boolean, Char, Byte, Short, Int, Long, Float, Double, Unit.
scala.Array.
Cloneable (currently not supported in Scala Native, see #334)
String and StringBuilder (mostly for string concatenation)
Object, for virtually all its methods
And here are those where they disagree:
Boxed versions of primitive types (such as java.lang.Integer)
Serializable
java.rmi.Remote and java.rmi.RemoteException
Some of the annotations in scala.annotation.* (e.g., strictfp)
Some stuff in java.lang.reflect.*, used by Scala/JVM to implement structural types
Also, although they are not types per se, a long list of primitive methods is also handled specifically by the back-ends.
Platform-specific types
In addition to the types mentioned above, which are available on all platforms, non-JVM platforms add their own special types for interoperability purposes.
Scala.js-specific types
See JSDefinitions.scala
js.Any: conceptually a third subtype of Any, besides AnyVal and AnyRef. They have JavaScript semantics instead of Scala semantics.
String and the boxed versions of all primitive types (heavily rewritten--so-called "hijacked"--by the compiler)
js.ThisFunctionN: their apply methods behave differently than that of other JavaScript types (the first actual argument becomes the thisArgument of the called function)
js.UndefOr and js.| (they behave as JS types even though they do not extend js.Any)
js.Object (new js.Object() is special-cased as an empty JS object literal {})
js.JavaScriptException (behaves very specially in throw and catch)
js.WrappedArray (used by the desugaring of varargs)
js.ConstructorTag (similar to ClassTag)
The annotation js.native, and all annotations in js.annotation.*
Plus, a dozen more primitive methods.
Scala Native-specific types
See NirDefinitions.scala
Unsigned integers: UByte, UShort, UInt and ULong
Ptr, the pointer type
FunctionPtrN, the function pointer types
The annotations in native.*
A number of extra primitive methods in scala.scalanative.runtime

Encoding Standard ML modules in OO

The ML module system stands as a high-water mark of programming language support for data abstraction. However, superficially, it seems that it can easily be encoded in an object-oriented language that supports abstract type members. For example, we can encode the elements of SML module system in Scala as follows:
SML signatures: Scala traits with no concrete members
SML structures with given signatures: Scala objects extending given traits
SML functors parameterised by given signatures: Scala classes taking objects extending given traits as constructor arguments
Are there any significant features such an encoding would miss? Anything that can be expressed in SML modules that encoding can't express? Any guarantees that SML makes that this encoding would not be able to make?
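The encoding described above might be sketched as follows. All names (Ordered, IntOrdered, SortedList) are hypothetical. One assumption goes beyond the three bullets: the functor class takes a type parameter A plus a refinement { type T = A }, so that the element type stays visible after the functor is applied. Without it, the result would only know an abstract ord.T.

```scala
object MlModulesDemo {
  // Signature: a trait with no concrete members.
  trait Ordered {
    type T
    def lessOrEqual(a: T, b: T): Boolean
  }

  // Structure: an object extending the trait.
  object IntOrdered extends Ordered {
    type T = Int
    def lessOrEqual(a: Int, b: Int): Boolean = a <= b
  }

  // Functor: a class taking a structure as a constructor argument.
  // The parameter A and the refinement are this sketch's own addition.
  class SortedList[A](val ord: Ordered { type T = A }) {
    def insert(x: A, xs: List[A]): List[A] = xs match {
      case Nil => List(x)
      case h :: t =>
        if (ord.lessOrEqual(x, h)) x :: xs else h :: insert(x, t)
    }

    // Insertion sort built from the structure's ordering.
    def sort(xs: List[A]): List[A] =
      xs.foldRight(List.empty[A])(insert)
  }

  // Functor application.
  val IntSortedList = new SortedList[Int](IntOrdered)

  def main(args: Array[String]): Unit =
    println(IntSortedList.sort(List(3, 1, 2)))
}
```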

There are a few fundamental differences that you cannot overcome easily:
ML signatures are structural types, Scala traits are nominal: an ML signature can be matched by any appropriate module after the fact, for Scala objects you need to declare the relation at definition time. Likewise, subtyping between ML signatures is fully structural. Scala refinements are closer to structural types, but have some rather severe limitations (e.g., they cannot reference their own local type definitions, nor contain free references to abstract types outside their scope).
ML signatures can be composed structurally using include and where. The resulting signature is equivalent to the inline expansion of the respective signature expression or type equation. Scala's mixin composition, while more powerful in many ways, again is nominal, and creates an inequivalent type. Even the order of composition matters for type equivalence.
ML functors are parameterised by structures, and thereby by both types and values, Scala's generic classes are only parameterised by types. To encode a functor, you would need to turn it into a generic function, that takes the types and the values separately. In general, this transformation -- called phase-splitting in the ML module literature -- cannot be limited to just definitions and uses of functors, because at their call-sites it has to be applied recursively to nested structure arguments; this ultimately requires that all structures are consistently phase-split, which is not a style you want to program in manually. (Neither is it possible to map functors to plain functions in Scala, since functions cannot express the necessary type dependencies between parameter and result types. Edit: since 2.10, Scala has support for dependent methods, which can encode some examples of SML's first-order generative functors, although it does not seem possible in the general case.)
ML has a general theory of refining and propagating "translucent" type information. Scala uses a weaker equational theory of "path-dependent" types, where paths denote objects. Scala thereby trades ML's more expressive type equivalences for the ability to use objects (with type members) as first-class values. You cannot easily have both without quickly running into decidability or soundness issues.
Edit: ML can naturally express abstract type constructors (i.e., types of higher kind), which often arise with functors. For Scala, higher kinds have to be activated explicitly; they are more challenging for its type system, and apparently lead to undecidable type checking.
The differences become even more interesting when you move beyond SML, to higher-order, first-class, or recursive modules. We briefly discuss a few issues in Section 10.1 of our MixML paper.

Disadvantages of Scala type system versus Haskell?

I have read that Scala's type system is weakened by Java interoperability and therefore cannot perform some of the same powers as Haskell's type system. Is this true? Is the weakness because of type erasure, or am I wrong in every way? Is this difference the reason that Scala has no typeclasses?

The big difference is that Scala doesn't have Hindley-Milner global type inference and instead uses a form of local type inference, requiring you to specify types for method parameters and the return type for overloaded or recursive functions.
This isn't driven by type erasure or by other requirements of the JVM. All possible difficulties here can be overcome, and have been; just consider Jaskell - http://docs.codehaus.org/display/JASKELL/Home
H-M inference doesn't work in an object-oriented context, specifically when subtype polymorphism is used (as opposed to the ad-hoc polymorphism of type classes). Subtype polymorphism is crucial for strong interop with other Java libraries, and (to a lesser extent) to get the best possible optimisation from the JVM.
It's not really valid to state that either Haskell or Scala has a stronger type system, just that they are different. Both languages are pushing the boundaries for type-based programming in different directions, and each language has unique strengths that are hard to duplicate in the other.

Scala's type system is different from Haskell's, although Scala's concepts are sometimes directly inspired by Haskell's strengths and its knowledgeable community of researchers and professionals.
Of course, running on a VM not primarily intended for functional programming in the first place creates some compatibility concerns with existing languages targeting this platform.
Because most of the reasoning about types happens at compile time, the limitations of Java (as a language and as a platform) at runtime are nothing to be concerned about (except Type Erasure, although exactly this bug seems to make the integration into the Java ecosystem more seamless).
As far as I know the only "compromise" on the type system level with Java is a special syntax to handle Raw Types. While Scala doesn't even allow Raw Types anymore, it accepts older Java class files with that bug.
Maybe you have seen code like List[_] (or the longer equivalent List[T] forSome { type T }). This is a compatibility feature with Java, but is treated as an existential type internally too and doesn't weaken the type system.
Scala's type system does support type classes, although in a more verbose way than Haskell. I suggest reading this paper, which might create a different impression on the relative strength of Scala's type system (the table on page 17 serves as a nice list of very powerful type system concepts).
Not necessarily related to the power of the type system is the approach Scala's and Haskell's compilers use to infer types, although it has some impact on the way people write code.
Having a powerful type inference algorithm can make it worthwhile to write more abstract code (you can decide yourself if that is a good thing in all cases).
In the end Scala's and Haskell's type system are driven by the desire to provide their users with the best tools to solve their problems, but have taken different paths to that goal.

Another interesting point to consider is that Scala directly supports the classical OO style, which means there are subtype relations (e.g. List is a subclass of Seq), and this makes type inference trickier. Add to this the fact that you can mix in traits in Scala, which means that a given type can have multiple supertype relations, making it trickier still.

Scala does not have rank-n types, although it may be possible to work around this limitation in certain cases.
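One common shape of such a workaround is to move the quantifier onto a polymorphic apply method of a trait. The sketch below is only an illustration under that assumption. ListToOption, headOption and useTwice are made-up names, and the trait loosely corresponds to the Haskell type forall a. [a] -> Maybe a.

```scala
object RankTwoDemo {
  // The universal quantifier lives on the method, not on the trait.
  trait ListToOption {
    def apply[A](xs: List[A]): Option[A]
  }

  val headOption: ListToOption = new ListToOption {
    def apply[A](xs: List[A]): Option[A] = xs.headOption
  }

  // f is used at two different element types inside one body, which a
  // plain List[A] => Option[A] parameter (A fixed by the caller) cannot do.
  def useTwice(f: ListToOption): (Option[Int], Option[String]) =
    (f(List(1, 2, 3)), f(List("a", "b")))

  def main(args: Array[String]): Unit =
    println(useTwice(headOption))
}
```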

I have only a little experience with Haskell, but the most obvious difference I notice between Scala's type system and Haskell's is type inference.
In Scala, there is no global type inference; you must explicitly state the types of function arguments.
For example, in Scala you need to write this:
def add (x: Int, y: Int) = x + y
instead of
add x y = x + y
This may cause problems when you need a generic version of the add function that works with every type that has a "+" method. There is a workaround for this, but it is more verbose.
In real use, though, I have found Scala's type system powerful enough for daily work, and I almost never use that workaround for generic code; maybe this is because I come from the Java world.
And having to declare argument types explicitly is not necessarily a bad thing; you need to document them anyway.
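One possible form of the workaround for a generic add, sketched with the standard library's Numeric type class (the names addInt and add are just for this example):

```scala
object GenericAddDemo {
  // Monomorphic version: parameter types must be written out.
  def addInt(x: Int, y: Int): Int = x + y

  // Generic version: the "+" capability is supplied by an implicit
  // Numeric[A] instance, requested here through a context bound.
  def add[A: Numeric](x: A, y: A): A = implicitly[Numeric[A]].plus(x, y)

  def main(args: Array[String]): Unit = {
    println(add(1, 2))
    println(add(1.5, 2.5))
    println(add(BigInt(10), BigInt(20)))
  }
}
```

It is more verbose than the Haskell one-liner, but the call sites look the same as for the monomorphic version.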

Well are they Turing reducible?
See Oleg Kiselyov's page http://okmij.org/ftp/
...
One can implement the lambda calculus in Haskell's type system. If Scala can do that, then in a sense Haskell's type system and Scala's type system compute the same types. The questions are: How natural is one over the other? How elegant is one over the other?

Advantages of Scala's type system

I am exploring the Scala language. One claim I often hear is that Scala has a stronger type system than Java. By this I think what people mean is that:
scalac rejects certain buggy programs which javac will compile happily, only to cause a runtime error.
Certain invariants can be encoded in a Scala program such that the compiler won't let the programmer write code that violates the condition.
Am I right in thinking so?

The main advantage of the Scala Type system is not so much being stronger but rather being far richer (see "The Scala Type System").
(Java can define some of them, and implement others, but Scala has them built-in).
See also The Myth Makers 1: Scala's "Type Types", commenting Steve Yegge's blog post, where he "disses" Scala as "Frankenstein's Monster" because "there are type types, and type type types".
Value type classes (useful for reasonably small data structures that have value semantics) used instead of primitive types (Int, Double, ...), with implicit conversion to "Rich" classes for additional methods.
Nonnullable type
Monad types
Trait types (and the mixin composition that comes with it)
Singleton object types (just define an 'object' and you have one),
Compound types (intersections of object types, to express that the type of an object is a subtype of several other types),
Functional types ((type1, …)=>returnType syntax),
Case classes (regular classes which export their constructor parameters and which provide a recursive decomposition mechanism via pattern matching),
Path-dependent types (Languages that let you nest types provide ways to refer to those type paths),
Anonymous types (for defining anonymous functions),
Self types (can be used for instance in Trait),
Type aliases, along with:
package object (introduced in 2.8)
Generic types (like Java), with a type parameter annotation mechanism to control the subtyping behavior of generic types,
Covariant generic types: The annotation +T declares type T to be used only in covariant positions. Stack[T] is a subtype of Stack[S] if T is a subtype of S.
Contravariant generic types: -T would declare T to be used only in contravariant positions.
Bounded generic types (even though Java supports some part of it),
Higher kinded types, which allow one to express more advanced type relationships than is possible with Java Generics,
Abstract types (the alternative to generic type),
Existential types (used in Scala like the Java wildcard type),
Implicit types (see "The awesomeness of Scala is implicit"),
View bounded types, and
Structural types, for specifying a type by specifying characteristics of the desired type (duck typing).
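To illustrate the variance annotations from the list above, here is a small sketch. Animal, Cat, Dog, Printer and this particular Stack are hypothetical definitions written for the example.

```scala
object VarianceDemo {
  sealed trait Animal { def name: String }
  final case class Cat(name: String) extends Animal
  final case class Dog(name: String) extends Animal

  // +T: Stack[Cat] is a subtype of Stack[Animal].
  final case class Stack[+T](items: List[T]) {
    // push must widen to a supertype U; taking a T directly would put T
    // in a contravariant position and be rejected by the compiler.
    def push[U >: T](x: U): Stack[U] = Stack(x :: items)
  }

  // -T: a Printer[Animal] can be used where a Printer[Cat] is expected.
  trait Printer[-T] { def print(x: T): String }

  val animalPrinter: Printer[Animal] = new Printer[Animal] {
    def print(x: Animal): String = "animal " + x.name
  }

  def main(args: Array[String]): Unit = {
    val cats: Stack[Cat] = Stack(List(Cat("Tom")))
    val animals: Stack[Animal] = cats                   // covariance
    val mixed: Stack[Animal] = animals.push(Dog("Rex"))
    val catPrinter: Printer[Cat] = animalPrinter        // contravariance
    println(mixed.items.map(_.name))
    println(catPrinter.print(Cat("Tom")))
  }
}
```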

The main safety problem with Java relates to variance. Basically, a programmer can use incorrect variance declarations that may result in exceptions being thrown at run-time in Java, while Scala will not allow it.
In fact, the very fact that Java's Array is co-variant is already a problem, since it allows incorrect code to be generated. For instance, as exemplified by sepp2k:
String[] strings = {"foo"};
Object[] objects = strings;
objects[0] = new Object();
Then, of course, there are raw types in Java, which allow all sorts of things.
Also, though Scala has it as well, there's casting. Java API is rich in type casts, and there's no idiom like Scala's case x: X => // x is now safely cast. Sure, one can use instanceof to accomplish that, but there's no incentive to do it. In fact, Scala's asInstanceOf is intentionally verbose.
These are the things that make Scala's type system stronger. It is also much richer, as VonC shows.
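The case x: X idiom mentioned above looks roughly like this in practice (describe and unsafeLength are illustrative names):

```scala
object SafeCastDemo {
  // A type pattern tests and casts in one step: on the right-hand side,
  // s is statically a String and n is statically an Int.
  def describe(x: Any): String = x match {
    case s: String => "string of length " + s.length
    case n: Int    => "int plus one is " + (n + 1)
    case _         => "something else"
  }

  // The unchecked alternative is deliberately wordy, and throws
  // ClassCastException if x is not a String.
  def unsafeLength(x: Any): Int = x.asInstanceOf[String].length

  def main(args: Array[String]): Unit = {
    println(describe("foo"))
    println(describe(41))
    println(describe(2.5))
  }
}
```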

What compromises Scala made to run on JVM?

Scala is a wonderful language, but I wonder how it could be improved if it had its own runtime.
I.e. what design choices were made because of JVM choice?

The two most important compromises I know about are:
type erasure ("reflecting on Type"): Scala has to manage a Manifest to work around the erasure performed by Java compilation (a choice independent of the JVM itself, made for backward-compatibility reasons).
collections of primitive types, e.g. arrays:
the new scheme for handling arrays in Scala 2.8: instead of boxing/unboxing and other compiler magic, the scheme relies on implicit conversions and manifests to integrate arrays.
Those are the two main JVM limitations when it comes to managing generic types (with bounds): the JVM does not keep the exact type used in a generic object, and it has "primitive" types.
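A sketch of how the erasure workaround looks in user code. It uses ClassTag, which took over this role from Manifest in later Scala versions (2.10 onwards). mkArray and runtimeClassOf are made-up helper names.

```scala
import scala.reflect.ClassTag

object ClassTagDemo {
  // Without the ClassTag context bound this would not compile: after
  // erasure there is no way to know which kind of array to allocate.
  def mkArray[T: ClassTag](xs: T*): Array[T] = xs.toArray

  // The tag carries the erased class as an ordinary run-time value.
  def runtimeClassOf[T](implicit tag: ClassTag[T]): Class[_] = tag.runtimeClass

  def main(args: Array[String]): Unit = {
    val ints = mkArray(1, 2, 3)
    println(ints.getClass.getComponentType) // a primitive array on the JVM
    println(runtimeClassOf[String])
  }
}
```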
But you could also consider:
Tail-call optimization is not yet fully supported by the JVM, and was hard to do anyway (and yet Scala 2.8 introduces the @tailrec annotation)
UAP (Uniform Access Principle) needs to be emulated (not supported by Java), and will soon be completed for Value Holder (@proxy)
the whole mix-in mechanism also needs to be emulated
more generally, the huge number of static types introduced by Scala need (for most of them) to be generated in Java:
In order to cover as many possibilities as possible, Scala provides:
Conventional class types,
Value class types,
Nonnullable types,
Monad types,
Trait types,
Singleton object types (procedural modules, utility classes, etc.),
Compound types,
Functional types,
Case classes,
Path-dependent types,
Anonymous types,
Self types,
Type aliases,
Generic types,
Covariant generic types,
Contravariant generic types,
Bounded generic types,
Abstract types,
Existential types,
Implicit types,
Augmented types,
View bounded types, and
Structural types which allow a form of duck typing when all else fails
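Going back to the tail-call point above: the scala.annotation.tailrec annotation does not add tail calls to the JVM. It only asks the compiler to reject a method whose self-recursive call cannot be compiled into a loop. A minimal sketch, with sumTo as a made-up example method:

```scala
import scala.annotation.tailrec

object TailRecDemo {
  // Compilation fails if the recursive call is not in tail position.
  @tailrec
  def sumTo(n: Long, acc: Long = 0L): Long =
    if (n <= 0) acc else sumTo(n - 1, acc + n)

  def main(args: Array[String]): Unit =
    println(sumTo(1000000L)) // a million iterations without growing the stack
}
```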

This article is a discussion with Martin Odersky (Scala's creator) and includes the compromises that were made in Scala for compatibility with Java. The article mentions:
Static overloading of methods
Having both traits and classes
Inclusion of null pointers.

Less an issue with the runtime than a cultural hangover: universal equality, hashing, toString.
More deeply tied to the VM: strict by default evaluation, impure functions, exceptions.