What is the proper academic name for the type system used in PureScript? I am looking for papers about that and proofs that it is sound.
In particular, as the type system allows to solve the exception as hidden communication channel problem when one wants to throw an instance of private exception type and pass it through effectful computations to the catch handler without the need to expose the type to the computation and possibility to catch it there, it would be nice to confirm that this is sound.
The PureScript type checker isn't based on any one type system in particular. I took inspiration from several papers as I implemented it, including:
"Complete and Easy Bidirectional Typechecking
for Higher-Rank Polymorphism" by Joshua Dunfield and Neelakantan R. Krishnaswami
"HMF: Simple Type Inference for First-Class Polymorphism" by
Daan Leijen
"Koka: Programming with Row-polymorphic Effect Types" by Daan Leijen
There are no soundness proofs. At some point, I would be interested in going back and reimplementing the type checker based on some system with soundness guarantees, but the initial goal was to produce a practical type system with the features that I wanted: row polymorphism, type classes and rank N types.
Related
In OOP it is good practice to talk to interfaces not to implementations. So, e.g., you write something like this (by Seq I mean scala.collection.immutable.Seq :)):
// talk to the interface - good OOP practice
doSomething[A](xs: Seq[A]) = ???
not something like the following:
// talk to the implementation - bad OOP practice
doSomething[A](xs: List[A]) = ???
However, in pure functional programming languages, such as Haskell, you don't have subtype polymorphism and use, instead, ad hoc polymorphism through type classes. So, for example, you have the list data type and a monadic instance for list. You don't need to worry about using an interface/abstract class because you don't have such a concept.
In hybrid languages, such as Scala, you have both type classes (through a pattern, actually, and not first-class citizens as in Haskell, but I digress) and subtype polymorphism. In scalaz, cats and so on you have monadic instances for concrete types, not for the abstract ones, of course.
Finally the question: given this hybridism of Scala do you still respect the OOP rule to talk to interfaces or just talk to concrete types to take advantage of functors, monads and so on directly without having to convert to a concrete type whenever you need to use them? Put differently, is in Scala still good practice to talk to interfaces even if you want to embrace FP instead of OOP? If not, what if you chose to use List and, later on, you realized that a Vector would have been a better choice?
P.S.: In my examples I used a simple method, but the same reasoning applies to user defined types. E.g.:
case class Foo(bars: Seq[Bar], ...)
What I would attack here is your "concrete vs. interface" concept. Look at it this way: every type has an interface, in the general sense of the term "interface." A "concrete" type is just a limiting case.
So let's look at Haskell lists from this angle. What's the interface of a list? Well, lists are an algebraic data type, and all such data types have the same general form of interface and contract:
You can construct instances of the type using its constructors according to their arities and argument types;
You can observe instances of the type by matching against their constructors according to their arities and argument types;
Construction and observation are inverses—when you pattern match against a value, what you get out is exactly what was put into it.
If you look at it in these terms, I think the following rule works pretty well in either paradigm:
Choose types whose interfaces and contracts match exactly with your requirements.
If their contract is weaker than your requirements, then they won't maintain invariants that you need;
If their contracts are stronger than your requirements, you may unintentionally couple yourself to the "extra" details and limit your ability to change the program later on.
So you no longer ask whether a type is "concrete" or "abstract"—just whether it fits your requirements.
These are my two cents on this subject. In Haskell you have data types (ADTs). You have both lists (linked lists) and vectors (int-indexed arrays) but they don't share a common supertype. If your function takes a list you cannot pass it a vector.
In Scala, being it a hybrid OOP-FP language, you have subtype polymorphism too so you may not care if the client code passes a List or a Vector, just require a Seq (possibly immutable) and you're done.
I guess to answer to this question you have to ask yourself another question: "Do I want to embrace FP in toto?". If the answer is yes then you shouldn't use Seq or any other abstract superclass in the OOP sense. Of course, the exception to this rule is the use of a trait/abstract class when defining ADTs in Scala. For example:
sealed trait Tree[+A]
case object Empty extends Tree[Nothing]
case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
In this case one would require Tree[A] as a type, of course, and then use, e.g., pattern matching to determine if it's either Empty or Node[A].
I guess my feeling about this subject is confirmed by the red book (Functional Programming in Scala). There they never use Seq, but List, Vector and so on. Also, haskellers, don't care about these problems and use lists whenever they need linked-list semantic and vectors whenever they need int-indexed-array semantic.
If, on the other hand, you want to embrace OOP and use Scala as a better Java then OK, you should follow the OOP best practice to talk to interfaces not to implementations.
If you're thinking: "I'd rather opt for mostly functional" then you should read Erik Meijer's The Curse of the Excluded Middle.
The ML module system stands as a high-water mark of programming language support for
data abstraction. However, superficially, it seems that it can easily be encoded in an object-oriented language that supports abstract type members. For example, we can encode the elements of SML module system in Scala as follows:
SML signatures: Scala traits with no concrete members
SML structures with given signatures: Scala objects extending given traits
SML functors parameterised by given signatures: Scala classes taking objects extending given traits as constructor arguments
Are there any significant features such an encoding would miss? Anything that can be expressed in SML modules that encoding can't express? Any guarantees that SML makes that this encoding would not be able to make?
There are a few fundamental differences that you cannot overcome easily:
ML signatures are structural types, Scala traits are nominal: an ML signature can be matched by any appropriate module after the fact, for Scala objects you need to declare the relation at definition time. Likewise, subtyping between ML signatures is fully structural. Scala refinements are closer to structural types, but have some rather severe limitations (e.g., they cannot reference their own local type definitions, nor contain free references to abstract types outside their scope).
ML signatures can be composed structurally using include and where. The resulting signature is equivalent to the inline expansion of the respective signature expression or type equation. Scala's mixin composition, while more powerful in many ways, again is nominal, and creates an inequivalent type. Even the order of composition matters for type equivalence.
ML functors are parameterised by structures, and thereby by both types and values, Scala's generic classes are only parameterised by types. To encode a functor, you would need to turn it into a generic function, that takes the types and the values separately. In general, this transformation -- called phase-splitting in the ML module literature -- cannot be limited to just definitions and uses of functors, because at their call-sites it has to be applied recursively to nested structure arguments; this ultimately requires that all structures are consistently phase-split, which is not a style you want to program in manually. (Neither is it possible to map functors to plain functions in Scala, since functions cannot express the necessary type dependencies between parameter and result types. Edit: since 2.10, Scala has support for dependent methods, which can encode some examples of SML's first-order generative functors, although it does not seem possible in the general case.)
ML has a general theory of refining and propagating "translucent" type information. Scala uses a weaker equational theory of "path-dependent" types, where paths denote objects. Scala thereby trades ML's more expressive type equivalences for the ability to use objects (with type members) as first-class values. You cannot easily have both without quickly running into decidability or soundness issues.
Edit: ML can naturally express abstract type constructors (i.e., types of higher kind), which often arise with functors. For Scala, higher kinds have to be activated explicitly, which are more challenging for its type system, and apparently lead to undecidable type checking.
The differences become even more interesting when you move beyond SML, to higher-order, first-class, or recursive modules. We briefly discuss a few issues in Section 10.1 of our MixML paper.
I've been doing some research on path-dependent types. The best description I could find for it was:
If L is a type label, then x.L and y.L are the same type iff x and y can be shown to refer to the same object.
This sometimes isn't the subtyping behaviour one would expect. I would expect that if L in the above example was indeed identical then that would be enough to make x.L and y.L indentical.
Is there any particular reason why Scala was designed this way?
The Scalable Component Abstractions paper has a good explanation on path dependent types and also a good example in Section 3: "Case study: subject/observer".
This paper explains it nicely. Basically, they're used to support abstract data type based programming and modularization.
Think about L as about type argument of generic class. Scala boasts about its type members but underlying JVM still has the same limitations.
Apparently, Alexander Stepanov has stated the following in an interview:
“I find OOP [object-oriented programming] technically unsound. It attempts to decompose the world in terms of interfaces that vary on a single type. To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types.” [Emphasis added.]
Ignoring his statement regarding OOP for a moment, what are "multisorted algebras", beyond his terse definition, and can you give a practical example of how they are used (in the language of your choice)?
I believe he was talking about generic programming (he coined the term), whether meant in the context of this talk about the STL, or 'at large', in the sense of:
programming against a sort of interface that describes something that could fit all (and hopefully several) types (hence multi-sorted), ...
... provided they have some properties, often something about the nature of some operations on elements of that type (hence algebras).
To do (1), you need to have a way to specify a program that takes a type as a parameter, i.e. polymorphism, and to do (2), you need a way to say that you also want that type to carry specific operations (and, provided you can express them, properties). In effect, you're parametrizing your program by the structure of the data it manipulates. The paradigm is called in some places bounded polymorphism, datatype-generic programming, ... which reflects that languages have different notions of how to implement that idea — hence the italicized 'sort of' above.
For C++, it seems that —to Stepanov at least— this corresponds to templates (though ideas on how to do this best are still evolving).
For OO languages (Generic Java, C#), constraints on type parameters are typically expressed using subtype bounds ('bounded wildcards' ...).
For Haskell or Scala, you have (respectively, and similarly) type classes or implicits.
The ML family of languages prefers to do this using modules.
note that a number of proof assistants (which can express 'honest-to-god' properties as types) have developed a flavor of type classes : Isabelle, Coq, Matita are such examples
Note that Stepanov just co-wrote an entire book giving an exhaustive development of a library that embodies exactly what (I think) he means. So if you want examples in C++, this is definitely where you should look. Note also that this is much more evolved than the now-common advice of coding against an interface, rather than an object.
By 'practical example', I don't know if you mean 'how' or 'why' does one uses it. To give a caricaturally quick answer to the 'why', genericity is nice because, a bit like run-of-the-mill polymorphism, it lets you reuse code. But, more importantly:
polymorphic code that has to work with every single type often can't do anything interesting, whereas having a constrained interface to play with allows you to write richer programs
by specifying how that interface fits some your data, you have a type-safe way to select just those elements that suit your needs. For example, you probably know that the reduction operator (the reduce of Python & Hadoop, fold of a bunch of functional languages) is parallelizable only if the order in which you apply your reduction function doesn't matter (+, x, min, and work, but set difference doesn't). If you have a notion of 'type equipped with an associative operation', you know that you will be able to call a parallel reduction on it.
any overhead incurred by genericity occurs at compile time. For example, templates are legendarily fast
If you have seen some generic Java, look at say, the Comparable generic interface. It defines just one operation, but the contract it makes, though basic, is very much of algebraic flavor. I quote:
For the mathematically inclined, the relation that defines the natural ordering on a given class C is:
{(x, y) such that x.compareTo((Object)y) <= 0}.
The quotient for this total order is:
{(x, y) such that x.compareTo((Object)y) == 0}.
It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, > and that the natural ordering is a total order on C.
Now, I can write a method that selects the minimum, once, and use it for any type that fits this interface:
public static <T extends Comparable<T>> T min (T x, T y) {
if (x.compare(y) < 0) x; else y;
}
Naturally, since the way programmative constructs implement that notion varies wildly, what you will get in terms of usability & expressivity will also vary. Perhaps you should not judge data-generic programming just by OO languages like C++ or Java — but I've written too much already to start with module ascription or the automatic instance generation of type classes.
I'm too late, but maybe it will be helpful for you. User huitseeker wrote an excellent answer from the viewpoint of software design. I want to answer your question from the viewpoint of mathematics. Before diving into software world Alex Stepanov was a mathematician and studied abstract and universal algebra. And he often tried to bring rigorous mathematical foundations into the world of software and algorithm design. In his books From Mathematics to Generic Programming and Elements of Programming he advocates this design practice. His ideas about mixing concepts of algebraic structures and software design were realised in the notion of generic programming. And now let's talk about his quote:
To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types
In my opinion there are two main concepts he wanted to mention here: the idea of abstract data type (ADT) and algebraic structure. First concept: ADT. ADT - is a mathematical model for a data types where a data type is defined only by it's semantic. Stepanov contrasted the idea of ADT to the idea of object in the OOP sense. Objects contains data and state whilst ADTs - not. ADT - is a behavioural abstraction, an operation cluster which describes interaction with data. Behavioural abstraction is entirely described by means of algebraic specification of abstract data type. You can read about this more in the original Liskov and Zilles paper, also I recommend you a paper Object-Oriented Programming Versus Abstract Data Types by William R. Cook.
(Discalimer: you can skip this paragraph, because it is more "mathematical and not so important") At first I want to clarify some terminology. When I talk about the algebraic structure it is the same as algebra. The word algebra is often also used for an algebraic structure. To be more precise when we talk about algebraic structures (algebras) we usually mean algebra over an algebraic theory. There is a concept of the variety of algebras, because there are several notions of an algebraic structure on an object of some category. By definition, an algebraic theory (algebra over it) consists of a specification of operations and laws that these operations must satisfy: this is a working definition of the algebraic structure we will use, and this definition ,I think, Stepanov implicitly mentioned in the quote.
Second concept which Stepanov wanted to mention is the most interesting property of ADTs: they can be formally modelled directly as many-sorted algebraic structures. Let's talk about it more formally. An algebraic structure - is a carrier set with one or more finitary operations defined on it. These operations are usually defined not over one set but over the multiple ones. E.g. let's define and algebra which models string concatenation. This algebra will be defined not over one set of strings but over two sets: strings set S and natural numbers set N, because we can define an operation which can concatenate a string with itself some finite number of times. So, this operation will take two operands, which belongs to different underlying (carrier) sets: S and N. Set which define these different operands (their types) in algebra called a set of sorts. Sort is an algebraical analog of the type. Algebra with multiple sorts called a multi-sorted algebra. In universal algebra, a signature lists the operations that characterize an algebraic structure. A many-sorted algebraic structure can have an arbitrary number of domains. The sorts are part of the signature, and they play the role of names for the different domains. Many-sorted signatures also prescribe on which sorts the functions and relations of a many-sorted algebraic structure are defined. For a one-sorted variety of algebras a signature is a set, whose elements are called operations, to each of which is assigned a cardinal number (0,1,2,…) called its arity. A signature of multi-sorted algebra can be defined as Σ = (S,OP,A), where S – set of sort names (types), OP - set of operation names and A - arities as before, except that now an arity is a list (sequence or more generally free monoid) of input sorts instead of merely a natural number (the length of the list) together with one output sort. Now we can create an algebraic specification of an abstract data type ADT as a triple:
ADT = (N, Σ, E)
, where N - name of abstract data type, Σ = (S,OP,A) - signature of multi-sorted algebraic structure, E = {e1, e2, …,en} - is a finite collection of equalities in the signature. As can you see now we have a rigorous mathematical description of ADT. In mathematics many-sorted algebraic structures are often used as a convenient tool even when they could be avoided with a little effort. Many-sorted algebraic structures are rarely defined in a rigorous way, because it is straightforward to carry out the generalization explicitly. That's why theory of many-sorted algebras can be successfully applied to software design.
So, Alex Stepanov wanted to say that he prefer ADTs and generic programming to OOP, because thus we can create programs with rigorous mathematical/algebraical foundations. I appreciate his efforts a lot. We all know that algebraical design is always correct, rigorous, beautiful, simple and gives us better abstractions.
Not that I am an expert with the theory of any of those but let's take a look at the quote so that I can try to give my practical understanding to add to the discussion.
To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types.
From my readings, I think families of interfaces that span multiple types sounds a lot like type classes from Haskell, which is similar to concepts from C++. Take a type class like Foldable it actually is a type parametrized interface, ie. a familiy of interfaces that span multiple types. So about your question of how to solve problems with multisorted algebras, generic programming is all about that if you take it to mean type classes or concepts.
I have read that Scala's type system is weakened by Java interoperability and therefore cannot perform some of the same powers as Haskell's type system. Is this true? Is the weakness because of type erasure, or am I wrong in every way? Is this difference the reason that Scala has no typeclasses?
The big difference is that Scala doesn't have Hindley-Milner global type inference and instead uses a form of local type inference, requiring you to specify types for method parameters and the return type for overloaded or recursive functions.
This isn't driven by type erasure or by other requirements of the JVM. All possible difficulties here can be overcome, and have been, just consider Jaskell - http://docs.codehaus.org/display/JASKELL/Home
H-M inference doesn't work in an object-oriented context. Specifically, when type-polymorphism is used (as opposed to the ad-hoc polymorphism of type classes). This is crucial for strong interop with other Java libraries, and (to a lesser extent) to get the best possible optimisation from the JVM.
It's not really valid to state that either Haskell or Scala has a stronger type system, just that they are different. Both languages are pushing the boundaries for type-based programming in different directions, and each language has unique strengths that are hard to duplicate in the other.
Scala's type system is different from Haskell's, although Scala's concepts are sometimes directly inspired by Haskell's strengths and its knowledgeable community of researchers and professionals.
Of course, running on a VM not primarily intended for functional programming in the first place creates some compatibility concerns with existing languages targeting this platform.
Because most of the reasoning about types happens at compile time, the limitations of Java (as a language and as a platform) at runtime are nothing to be concerned about (except Type Erasure, although exactly this bug seems to make the integration into the Java ecosystem more seamless).
As far as I know the only "compromise" on the type system level with Java is a special syntax to handle Raw Types. While Scala doesn't even allow Raw Types anymore, it accepts older Java class files with that bug.
Maybe you have seen code like List[_] (or the longer equivalent List[T] forSome { type T }). This is a compatibility feature with Java, but is treated as an existential type internally too and doesn't weaken the type system.
Scala's type system does support type classes, although in a more verbose way than Haskell. I suggest reading this paper, which might create a different impression on the relative strength of Scala's type system (the table on page 17 serves as a nice list of very powerful type system concepts).
Not necessarily related to the power of the type system is the approach Scala's and Haskell's compilers use to infer types, although it has some impact on the way people write code.
Having a powerful type inference algorithm can make it worthwhile to write more abstract code (you can decide yourself if that is a good thing in all cases).
In the end Scala's and Haskell's type system are driven by the desire to provide their users with the best tools to solve their problems, but have taken different paths to that goal.
another interesting point to consider is that Scala directly supports the classical OO-style. Which means, there are subtype relations (e.g. List is a subclass of Seq). And this makes type inference more tricky. Add to this the fact that you can mix in traits in Scala, which means that a given type can have multiple supertype relations (making it yet more tricky)
Scala does not have rank-n types, although it may be possible to work around this limitation in certain cases.
I only have little experenice with Haskell, but the most obvious thing I note that Scala type system different from Haskell is the type inference.
In Scala, there is no global type inference, you must explicit tell the type of function arguments.
For example, in Scala you need to write this:
def add (x: Int, y: Int) = x + y
instead of
add x y = x + y
This may cause problem when you need generic version of add function that work with all kinds of type has the "+" method. There is a workaround for this, but it will get more verbose.
But in real use, I found Scala's type system is powerful enough for daily usage, and I almost never use those workaround for generic, maybe this is because I come from Java world.
And the limitation of explicit declare the type of arguments is not necessary a bad thing, you need document it anyway.
Well are they Turing reducible?
See Oleg Kiselyov's page http://okmij.org/ftp/
...
One can implement the lambda calculus in Haskell's type system. If Scala can do that, then in a sense Haskell's type system and Scala's type system compute the same types. The questions are: How natural is one over the other? How elegant is one over the other?