Advantages of Scala's type system - scala

I am exploring the Scala language. One claim I often hear is that Scala has a stronger type system than Java. By this I think what people mean is that:
scalac rejects certain buggy programs which javac will compile happily, only to cause a runtime error.
Certain invariants can be encoded in a Scala program such that the compiler won't let the programmer write code that violates the condition.
Am I right in thinking so?

The main advantage of the Scala type system is not so much being stronger but rather being far richer (see "The Scala Type System").
(Java can define some of them, and implement others, but Scala has them built-in.)
See also The Myth Makers 1: Scala's "Type Types", commenting on Steve Yegge's blog post, where he "disses" Scala as "Frankenstein's Monster" because "there are type types, and type type types".
Value types (useful for reasonably small data structures that have value semantics), used instead of primitive types (Int, Double, ...), with implicit conversion to "Rich" classes for additional methods,
Nonnullable types,
Monad types
Trait types (and the mixin composition that comes with it)
Singleton object types (just define an 'object' and you have one),
Compound types (intersections of object types, to express that the type of an object is a subtype of several other types),
Functional types ((type1, …)=>returnType syntax),
Case classes (regular classes which export their constructor parameters and which provide a recursive decomposition mechanism via pattern matching),
Path-dependent types (Languages that let you nest types provide ways to refer to those type paths),
Anonymous types (for defining anonymous functions),
Self types (can be used for instance in Trait),
Type aliases, along with:
package object (introduced in 2.8)
Generic types (like Java), with a type parameter annotation mechanism to control the subtyping behavior of generic types,
Covariant generic types: The annotation +T declares type T to be used only in covariant positions. Stack[T] is a subtype of Stack[S] if T is a subtype of S.
Contravariant generic types: -T would declare T to be used only in contravariant positions.
Bounded generic types (even though Java supports some part of it),
Higher kinded types, which allow one to express more advanced type relationships than is possible with Java Generics,
Abstract types (the alternative to generic type),
Existential types (used in Scala like the Java wildcard type),
Implicit types (see "The awesomeness of Scala is implicit"),
View bounded types, and
Structural types, for specifying a type by the characteristics it must have (duck typing).
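As a concrete illustration of the last point, here is a minimal structural-typing sketch (the names are made up for the example; member access on a structural type goes through reflection at runtime):

```scala
object StructuralDemo {
  import scala.language.reflectiveCalls

  // Any value with a `close(): Unit` method conforms to this type,
  // regardless of its declared supertypes (duck typing).
  type Closeable = { def close(): Unit }

  var closed = false
  class Resource { def close(): Unit = closed = true }

  // The call r.close() is dispatched via reflection at runtime.
  def shutDown(r: Closeable): Unit = r.close()
}
```

Usage: `StructuralDemo.shutDown(new StructuralDemo.Resource)` works even though `Resource` never declares that it implements any interface.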

The main safety problem with Java relates to variance. Basically, a programmer can use incorrect variance declarations that may result in exceptions being thrown at run-time in Java, while Scala will not allow it.
In fact, the very fact that Java's arrays are covariant is already a problem, since it allows incorrect code to compile. For instance, as exemplified by sepp2k:
String[] strings = {"foo"};
Object[] objects = strings;
objects[0] = new Object(); // throws ArrayStoreException at run time
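Scala's Array, by contrast, is invariant, so the analogous assignment is rejected at compile time; covariance remains available where it is safe, such as immutable collections. A minimal sketch:

```scala
object VarianceDemo {
  val strings: Array[String] = Array("foo")
  // val objects: Array[Any] = strings  // does not compile: Array[T] is invariant

  // Covariance where it is safe: immutable List is declared List[+A].
  val anys: List[Any] = List("foo")
}
```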
Then, of course, there are raw types in Java, which allows all sort of things.
Also, though Scala has it as well, there's casting. The Java API is rich in type casts, and there's no idiom like Scala's case x: X => // x is now safely cast. Sure, one can use instanceof to accomplish that, but there's no incentive to do it. In fact, Scala's asInstanceOf is intentionally verbose.
These are the things that make Scala's type system stronger. It is also much richer, as VonC shows.

Related

What types are special to the Scala compiler?

Scala makes a big deal about how what seem to be language features are implemented as library features.
Is there a list of types that are treated specially by the language?
Either in the specification or as an implementation detail?
That would include, for example, optimizing away matches on tuples.
What about special conventions related to pattern matching, for comprehensions, try-catch blocks and other language constructs?
Is String somehow special to the compiler? I see that String enhancement is just a library implicit conversion, and that String concatenation is supported by Predef, but is that somehow special-cased by the language?
Similarly, I see questions about <:< and classOf and asInstanceOf, and it's not clear what is a magical intrinsic. Is there a way to tell the difference, either with a compiler option or by looking at byte code?
I would like to understand if a feature is supported uniformly by implementations such as Scala.JS and Scala-native, or if a feature might actually prove to be implementation-dependent, depending on the library implementation.
There is an incredible number of types that are "known" to the compiler, and they are special to varying degrees. You can find a complete list in scalac's Definitions.scala.
We can probably classify them according to the degree of specialness they bear.
Disclaimer: I have probably forgotten a few more.
Special for the type system
The following types are crucial to Scala's type system. They have an influence on how type checking itself is performed. All these types are mentioned in the specification (or at least, they definitely should be).
Any, AnyRef, AnyVal, Null, Nothing: the five types that sit at the top and bottom of Scala's type system.
scala.FunctionN, the (canonical) type given to anonymous functions (including eta-expansion). Even in 2.12, which has the SAM treatment of anonymous functions, FunctionN remains special in some cases (notably in overloading resolution).
scala.PartialFunction (has an impact on how type inference works)
Unit
All types with literal notation: Int, Long, Float, Double, Char, Boolean, String, Symbol, java.lang.Class
All numeric primitive types and Chars, for weak conformance (together, these two bullets cover all primitive types)
Option and tuples (for pattern matching and auto-tupling)
java.lang.Throwable
scala.Dynamic
scala.Singleton
Most of scala.reflect.*, in particular ClassTag, TypeTag, etc.
scala.annotation.{,ClassFile,Static}Annotation
Almost all of the annotations in scala.annotation.* (e.g., unchecked)
scala.language.*
scala.math.ScalaNumber (for obscure reasons)
Known to the compiler as the desugaring of some language features
The following types are not crucial to the type system. They do not have an influence on type checking. However, the Scala language does feature a number of constructs which desugar into expressions of those types.
These types would also be mentioned in the specification.
scala.collection.Seq, Nil and WrappedArray, which are used for varargs parameters.
TupleN types
Product and Serializable (for case classes)
MatchError, generated by pattern matching constructs
scala.xml.*
scala.DelayedInit
List (the compiler performs some trivial optimizations on those, such as rewriting List() as Nil)
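For example, the varargs desugaring mentioned above can be observed directly (a small sketch):

```scala
object DesugarDemo {
  // A varargs parameter is typed as a scala.collection.Seq inside the
  // method, which is why Seq, Nil and WrappedArray are known to the compiler.
  def sumAll(xs: Int*): Int = xs.sum

  val direct = sumAll(1, 2, 3)           // the compiler wraps the args in a Seq
  val splat  = sumAll(Seq(1, 2, 3): _*)  // an existing Seq, passed with : _*
}
```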
Known to the implementation of the language
This is probably the list that you care most about, given that you said you were interested in knowing what could go differently on different back-ends. The previous categories are handled by early (front-end) phases of the compiler, and are therefore shared by Scala/JVM, Scala.js and Scala Native. This category is typically known to the compiler back-end, and so these types can potentially receive different treatments. Note that both Scala.js and Scala Native do try to mimic the semantics of Scala/JVM to a reasonable degree.
Those types might not be mentioned in the language specification per se, at least not all of them.
Here are those where the back-ends agree (re Scala Native, to the best of my knowledge):
All primitive types: Boolean, Char, Byte, Short, Int, Long, Float, Double, Unit.
scala.Array.
Cloneable (currently not supported in Scala Native, see #334)
String and StringBuilder (mostly for string concatenation)
Object, for virtually all its methods
And here are those where they disagree:
Boxed versions of primitive types (such as java.lang.Integer)
Serializable
java.rmi.Remote and java.rmi.RemoteException
Some of the annotations in scala.annotation.* (e.g., strictfp)
Some stuff in java.lang.reflect.*, used by Scala/JVM to implement structural types
Also, although not types per se, but a long list of primitive methods are also handled specifically by the back-ends.
Platform-specific types
In addition to the types mentioned above, which are available on all platforms, non-JVM platforms add their own special types for interoperability purposes.
Scala.js-specific types
See JSDefinitions.scala
js.Any: conceptually a third subtype of Any, besides AnyVal and AnyRef. Its values have JavaScript semantics instead of Scala semantics.
String and the boxed versions of all primitive types (heavily rewritten--so-called "hijacked"--by the compiler)
js.ThisFunctionN: their apply methods behave differently than that of other JavaScript types (the first actual argument becomes the thisArgument of the called function)
js.UndefOr and js.| (they behave as JS types even though they do not extend js.Any)
js.Object (new js.Object() is special-cased as an empty JS object literal {})
js.JavaScriptException (behaves very specially in throw and catch)
js.WrappedArray (used by the desugaring of varargs)
js.ConstructorTag (similar to ClassTag)
The annotation js.native, and all annotations in js.annotation.*
Plus, a dozen more primitive methods.
Scala Native-specific types
See NirDefinitions.scala
Unsigned integers: UByte, UShort, UInt and ULong
Ptr, the pointer type
FunctionPtrN, the function pointer types
The annotations in native.*
A number of extra primitive methods in scala.scalanative.runtime

Type alias vs lambda type

Can anybody explain the pros/cons of
type VNel[+A] = ValidationNel[String, A]
x.sequence[VNel, ....
vs
x.sequence[({ type l[a] = ValidationNel[String, a] })#l, ....
From what I understand, using structural types incurs a runtime performance hit of having to use reflection.
A type lambda is a way to express a complex type inline.
A type alias is a way to create an identifier for a type. It can be a complex type or as simple as type UserId = Int. It is useful when you start needing a complex type more than once, or when you want to simplify a complex signature by breaking it into parts.
Neither type lambdas nor type aliases are structural typing; they are simply ways to express types.
For more details on type lambdas:
https://stackoverflow.com/a/8737611/547564
They are mostly equivalent - use whichever you find clearer. IMO the type alias is usually more readable. In the context of a trait (or class) written to be extended, the type lambda can be clearer as it prevents overriding the type, but this is very much an edge case.
Accessing values defined in a structural type in ordinary code would indeed incur the cost of using reflection. But in a type lambda the structural type is only used as a generic type parameter, which will be erased at runtime. So there will be no runtime performance impact.
If you're making extensive use of type lambdas, you might like to consider the kind-projector plugin, which provides a more convenient syntax (and avoids the misleading visual resemblance to a structural type).
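To make the comparison concrete, here is the same shape expressed both ways, using the standard library's Either in place of scalaz's ValidationNel (the Functor trait below is a minimal stand-in defined for the example; Either's right-biased map requires Scala 2.12+):

```scala
object AliasVsLambda {
  // Minimal type class expecting a one-parameter type constructor.
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  // Option 1: a named type alias fixes Either's first parameter.
  type StringOr[A] = Either[String, A]
  val viaAlias: Functor[StringOr] = new Functor[StringOr] {
    def map[A, B](fa: StringOr[A])(f: A => B): StringOr[B] = fa.map(f)
  }

  // Option 2: an inline type lambda does the same without a named alias.
  val viaLambda: Functor[({ type L[A] = Either[String, A] })#L] =
    new Functor[({ type L[A] = Either[String, A] })#L] {
      def map[A, B](fa: Either[String, A])(f: A => B): Either[String, B] =
        fa.map(f)
    }
}
```

With the kind-projector plugin, the second form can be written much more compactly, avoiding the structural-type-like syntax entirely.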

Encoding Standard ML modules in OO

The ML module system stands as a high-water mark of programming language support for
data abstraction. However, superficially, it seems that it can easily be encoded in an object-oriented language that supports abstract type members. For example, we can encode the elements of SML module system in Scala as follows:
SML signatures: Scala traits with no concrete members
SML structures with given signatures: Scala objects extending given traits
SML functors parameterised by given signatures: Scala classes taking objects extending given traits as constructor arguments
Are there any significant features such an encoding would miss? Anything that can be expressed in SML modules that encoding can't express? Any guarantees that SML makes that this encoding would not be able to make?
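The encoding described above can be sketched as follows (illustrative names; the dependent method at the end requires Scala 2.10+):

```scala
object ModuleEncoding {
  // SML signature => Scala trait with abstract type and value members
  trait ORD {
    type T
    def compare(a: T, b: T): Int
  }

  // SML structure => Scala object extending the trait
  object IntOrd extends ORD {
    type T = Int
    def compare(a: Int, b: Int): Int = a.compare(b)
  }

  // A functor-like dependent method: the parameter and result types
  // depend on the module value passed in (ord.T is path-dependent).
  def maxOf(ord: ORD)(a: ord.T, b: ord.T): ord.T =
    if (ord.compare(a, b) >= 0) a else b
}
```

Because IntOrd is a stable path, `maxOf(IntOrd)` refines `ord.T` to Int at the call site.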
There are a few fundamental differences that you cannot overcome easily:
ML signatures are structural types, Scala traits are nominal: an ML signature can be matched by any appropriate module after the fact, for Scala objects you need to declare the relation at definition time. Likewise, subtyping between ML signatures is fully structural. Scala refinements are closer to structural types, but have some rather severe limitations (e.g., they cannot reference their own local type definitions, nor contain free references to abstract types outside their scope).
ML signatures can be composed structurally using include and where. The resulting signature is equivalent to the inline expansion of the respective signature expression or type equation. Scala's mixin composition, while more powerful in many ways, again is nominal, and creates an inequivalent type. Even the order of composition matters for type equivalence.
ML functors are parameterised by structures, and thereby by both types and values, Scala's generic classes are only parameterised by types. To encode a functor, you would need to turn it into a generic function, that takes the types and the values separately. In general, this transformation -- called phase-splitting in the ML module literature -- cannot be limited to just definitions and uses of functors, because at their call-sites it has to be applied recursively to nested structure arguments; this ultimately requires that all structures are consistently phase-split, which is not a style you want to program in manually. (Neither is it possible to map functors to plain functions in Scala, since functions cannot express the necessary type dependencies between parameter and result types. Edit: since 2.10, Scala has support for dependent methods, which can encode some examples of SML's first-order generative functors, although it does not seem possible in the general case.)
ML has a general theory of refining and propagating "translucent" type information. Scala uses a weaker equational theory of "path-dependent" types, where paths denote objects. Scala thereby trades ML's more expressive type equivalences for the ability to use objects (with type members) as first-class values. You cannot easily have both without quickly running into decidability or soundness issues.
Edit: ML can naturally express abstract type constructors (i.e., types of higher kind), which often arise with functors. For Scala, higher kinds have to be activated explicitly, they are more challenging for its type system, and they apparently lead to undecidable type checking.
The differences become even more interesting when you move beyond SML, to higher-order, first-class, or recursive modules. We briefly discuss a few issues in Section 10.1 of our MixML paper.

What compromises Scala made to run on JVM?

Scala is a wonderful language, but I wonder how it could be improved if it had its own runtime?
I.e., what design choices were made because the JVM was chosen?
The two most important compromises I know about are:
type erasure ("reflecting on Type"): Scala has to manage Manifests to get around erasure, a property of Java compilation (independent of the JVM itself, kept for backward-compatibility reasons).
collections of primitive types, e.g. arrays:
Scala 2.8 introduced a new scheme for handling arrays. Instead of boxing/unboxing and other compiler magic, the scheme relies on implicit conversions and manifests to integrate arrays.
Those are the two main JVM limitations when it comes to managing generic types (with bounds): the JVM does not keep the exact type used in a generic object, and it has "primitive" types.
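A minimal sketch of the erasure workaround (ClassTag is the modern descendant of Manifest):

```scala
import scala.reflect.ClassTag

object ErasureDemo {
  // T is erased at runtime, so the array could not be created on its own;
  // the implicit ClassTag[T] carries the element type to runtime.
  def pair[T: ClassTag](x: T): Array[T] = Array(x, x)
}
```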
But you could also consider:
Tail-call optimization is not yet fully supported by the JVM, and was hard to do anyway (yet Scala 2.8 introduces the @tailrec annotation);
the UAP (Uniform Access Principle) needs to be emulated (it is not supported by Java), and will soon be completed for value holders (@proxy);
the whole mix-in mechanism also needs to be emulated;
more generally, the huge number of static types introduced by Scala need (for most of them) to be encoded in terms of JVM constructs:
In order to cover as many possibilities as possible, Scala provides:
Conventional class types,
Value class types,
Nonnullable types,
Monad types,
Trait types,
Singleton object types (procedural modules, utility classes, etc.),
Compound types,
Functional types,
Case classes,
Path-dependent types,
Anonymous types,
Self types,
Type aliases,
Generic types,
Covariant generic types,
Contravariant generic types,
Bounded generic types,
Abstract types,
Existential types,
Implicit types,
Augmented types,
View bounded types, and
Structural types which allow a form of duck typing when all else fails
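The tail-call point above can be made concrete: scalac rewrites direct self-recursion into a loop (the JVM itself offers no tail calls), and @tailrec turns a failed rewrite into a compile error. A sketch:

```scala
import scala.annotation.tailrec

object TailRecDemo {
  // Compiled to a loop; without the rewrite this would overflow the
  // stack for large inputs. @tailrec makes compilation fail if the
  // recursive call is not in tail position.
  @tailrec
  def sum(xs: List[Int], acc: Int = 0): Int = xs match {
    case Nil    => acc
    case h :: t => sum(t, acc + h)
  }
}
```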
This article is a discussion with Martin Odersky (Scala's creator) and includes the compromises that were made in Scala for compatibility with Java. The article mentions:
Static overloading of methods
Having both traits and classes
Inclusion of null pointers.
Less an issue with the runtime than a cultural hangover: universal equality, hashing, toString.
More deeply tied to the VM: strict by default evaluation, impure functions, exceptions.

What are the differences and similarities of Scala and Haskell type systems?

How to explain Scala's type system to a Haskell expert?
What examples show Scala's advantages?
How to explain Haskell's type system to an advanced Scala practitioner?
What can be done in Haskell that can't be done in Scala?
Scala To a Haskell programmer:
Scala is a strict and impure language with first-class modules. Data types are declared as "classes" or "traits" with subtle differences, and modules or "objects" are values of those types. Scala supports type constructors taking universally quantified type parameters. Objects/classes/traits have members which consist of values, mutable variables, and functions (called "methods", to which the module is implicitly passed as a variable called this). Modules may have type members which can also take parameters. Type members are existentially quantified and type parameters can be higher-kinded. Because types can be members of first-class values, Scala provides a flavour of dependent typing called path-dependent types.
First-class functions are also modules. A function is a module with a method named apply. A method is not first-class, but a syntax is provided to wrap a method in a first-class function. Unfortunately, a module requires all of its type parameters up front, hence a partially applied first-class function is not allowed to be universally quantified. More generally, Scala completely lacks a direct mechanism for types of rank higher than 1, but modules parameterized on higher-kinded types can be exploited to simulate rank-n types.
Instead of type classes with global scope, Scala lets you declare an implicit value of any given type. This includes function types, which provides implicit conversion, and therefore type extension. In addition to implicit conversions, type extension is provided by the "extends" mechanism which lets you declare a subtype/supertype relation among modules. This mechanism can be used to simulate algebraic datatypes where the supertype can be seen as the type on the left-hand side of a data declaration, and its subtypes as the value constructors on the right-hand side. Scala has extensive pattern-matching capabilities using a virtualized pattern matcher with first-class patterns.
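The ADT simulation described above looks like this in practice (a sketch with made-up types):

```scala
// "data Shape = Circle Double | Rect Double Double" encoded with subtyping:
sealed trait Shape                                    // the declared type
final case class Circle(r: Double) extends Shape      // value constructors
final case class Rect(w: Double, h: Double) extends Shape

object ShapeOps {
  // sealed + case classes give exhaustiveness-checked pattern matching.
  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }
}
```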
Scala supports subtyping, and this limits type inference considerably. But type inference has improved over time. Inference of higher kinded types is supported. However, Scala lacks any meaningful kind system, and therefore has no kind inference and no kind unification. If a type variable is introduced, it is of kind * unless annotated otherwise. Certain types like Any (the supertype of all types) and Nothing (a subtype of every type) are technically of every kind although they cannot be applied to type arguments.
Haskell to a Scala programmer:
Haskell is a purely functional language. This means that functions are not allowed to have any side-effects at all. For example, a Haskell program doesn't print to the screen as such, but is a function that returns a value of the IO[_] datatype which describes a sequence of actions for the IO subsystem to perform.
Whereas Scala is strict by default and provides "by-name" annotation for nonstrict function arguments, Haskell is lazy by default using "by-need" semantics, and provides annotation for strict arguments.
Type inference in Haskell is more complete than in Scala, having full inference. This means that type annotation is almost never necessary.
Recent extensions to the GHC compiler allow advanced type system features that have no equivalent in Scala, such as rank-n types, type families, and kind polymorphism.
In Haskell, a module is a collection of types and functions, but modules are not first-class entities. Implicits are provided by type classes, but these are globally scoped once declared, and they cannot be passed explicitly as in Scala. Multiple instances of a type class for a given type are resolved by wrapping with a newtype to disambiguate, whereas in Scala this would be solved simply by scoping or by passing instances explicitly.
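The contrast can be sketched from the Scala side: an implicit value plays the role of a type-class instance, but it is resolved by scope and can also be passed explicitly (the Show type class below is a made-up example):

```scala
object TypeClassDemo {
  // A type class encoded as a trait plus implicit instance values.
  trait Show[A] { def show(a: A): String }

  implicit val intShow: Show[Int] =
    new Show[Int] { def show(a: Int): String = a.toString }

  // The instance is resolved implicitly from scope...
  def display[A](a: A)(implicit s: Show[A]): String = s.show(a)

  // ...but an alternative instance can simply be passed explicitly,
  // which Haskell type classes do not allow.
  val quoted: Show[Int] =
    new Show[Int] { def show(a: Int): String = "\"" + a + "\"" }
}
```

Usage: `TypeClassDemo.display(42)` picks up intShow implicitly, while `TypeClassDemo.display(42)(TypeClassDemo.quoted)` overrides it.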
Since Haskell isn't "object-oriented", there's no method/function dichotomy. Every function is first class and every function is curried by default (no Function1, Function2, etc).
Haskell has no subtype mechanism, but type classes can have a subclass relationship.
I don't believe anyone has systematically compared Haskell's type system (as exemplified by GHC) with Scala's. The main points of difference are the degree of type inference and support for higher-rank types. But a full treatment of the differences would be a publishable result.