The object-functional impedance mismatch - Scala

In OOP it is good practice to talk to interfaces, not to implementations. So, e.g., you write something like this (by Seq I mean scala.collection.immutable.Seq):
// talk to the interface - good OOP practice
def doSomething[A](xs: Seq[A]) = ???
not something like the following:
// talk to the implementation - bad OOP practice
def doSomething[A](xs: List[A]) = ???
However, in pure functional programming languages, such as Haskell, you don't have subtype polymorphism and use, instead, ad hoc polymorphism through type classes. So, for example, you have the list data type and a monadic instance for list. You don't need to worry about using an interface/abstract class because you don't have such a concept.
In hybrid languages, such as Scala, you have both type classes (through a pattern, actually, and not as first-class citizens as in Haskell, but I digress) and subtype polymorphism. In Scalaz, Cats and so on you have monadic instances for concrete types, not for the abstract ones, of course.
Finally the question: given this hybridism of Scala, do you still respect the OOP rule to talk to interfaces, or do you just talk to concrete types to take advantage of functors, monads and so on directly, without having to convert to a concrete type whenever you need to use them? Put differently, is it still good practice in Scala to talk to interfaces even if you want to embrace FP instead of OOP? If not, what if you chose to use List and, later on, you realized that a Vector would have been a better choice?
P.S.: In my examples I used a simple method, but the same reasoning applies to user defined types. E.g.:
case class Foo(bars: Seq[Bar], ...)

What I would attack here is your "concrete vs. interface" concept. Look at it this way: every type has an interface, in the general sense of the term "interface." A "concrete" type is just a limiting case.
So let's look at Haskell lists from this angle. What's the interface of a list? Well, lists are an algebraic data type, and all such data types have the same general form of interface and contract:
You can construct instances of the type using its constructors according to their arities and argument types;
You can observe instances of the type by matching against their constructors according to their arities and argument types;
Construction and observation are inverses: when you pattern match against a value, what you get out is exactly what was put into it (see the sketch below).
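To make this concrete, here is a minimal Scala sketch; the Shape type and its cases are illustrative names, not from the question:
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rect(width: Double, height: Double) extends Shape

// Observation by pattern matching: the radius we get out of Circle(2.0)
// is exactly the 2.0 that was put in at construction.
def area(s: Shape): Double = s match {
  case Circle(r)  => math.Pi * r * r
  case Rect(w, h) => w * h
}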
If you look at it in these terms, I think the following rule works pretty well in either paradigm:
Choose types whose interfaces and contracts match exactly with your requirements.
If their contract is weaker than your requirements, then they won't maintain invariants that you need;
If their contracts are stronger than your requirements, you may unintentionally couple yourself to the "extra" details and limit your ability to change the program later on.
So you no longer ask whether a type is "concrete" or "abstract"—just whether it fits your requirements.

These are my two cents on this subject. In Haskell you have data types (ADTs). You have both lists (linked lists) and vectors (int-indexed arrays) but they don't share a common supertype. If your function takes a list you cannot pass it a vector.
In Scala, being a hybrid OOP-FP language, you have subtype polymorphism too, so you may not care whether the client code passes a List or a Vector: just require a Seq (possibly immutable) and you're done.
I guess to answer this question you have to ask yourself another one: "Do I want to embrace FP in toto?". If the answer is yes then you shouldn't use Seq or any other abstract superclass in the OOP sense. Of course, the exception to this rule is the use of a trait/abstract class when defining ADTs in Scala. For example:
sealed trait Tree[+A]
case object Empty extends Tree[Nothing]
case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
In this case one would require Tree[A] as a type, of course, and then use, e.g., pattern matching to determine whether it's Empty or a Node[A], as in the sketch below.
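For instance, a minimal sketch of such client code, which depends only on the Tree[A] ADT and observes it through its constructors:
// Counts the nodes of a tree by pattern matching on its constructors.
def size[A](t: Tree[A]): Int = t match {
  case Empty         => 0
  case Node(_, l, r) => 1 + size(l) + size(r)
}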
I guess my feeling about this subject is confirmed by the red book (Functional Programming in Scala). There they never use Seq, but List, Vector and so on. Also, Haskellers don't care about these problems: they use lists whenever they need linked-list semantics and vectors whenever they need int-indexed-array semantics.
If, on the other hand, you want to embrace OOP and use Scala as a better Java then OK, you should follow the OOP best practice to talk to interfaces not to implementations.
If you're thinking: "I'd rather opt for mostly functional" then you should read Erik Meijer's The Curse of the Excluded Middle.

Related

Use of the word Canonical

This might be a bit ambiguous but I'm struggling to get a tangible explanation of the use of the word canonical when reading about Scala and FP in general.
Some statements I have read:
Vector is the canonical concrete type for Seq
What is the canonical way to do this in Scala?
My understanding of the canonical form (in computing) is that it represents -
The default, unique representation, where more than one representation is possible.
Is it acceptable to say that asking what is the canonical way of doing something is really just the same as asking what is the idiomatic way of doing something?
Is there a way to discover what the canonical type is for any abstract type in the Scala hierarchy in a particular context?
As @SarveshKumarSingh points out, generally in Scala the word canonical means the same as it does elsewhere in English. So yes as to point 1, you can use canonical like that.
Point 2 is more interesting because the Scala standard library strongly suggests that certain concrete collections are the canonical implementations of abstract traits by having the apply method on the companion object return a specific concrete class that is just upcast to the abstract type.
To tell which concrete class is actually created when you do say Seq(1, 2, 3), if you have a little bit of familiarity with the collections hierarchy, you can just take a look at the source. In this case you'll see the Builder for a Seq is a ListBuffer, which means Seq.apply will give you back a List (not a Vector, you get that from IndexedSeq).
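A quick way to check this yourself (a sketch; the exact runtime classes vary between Scala versions):
println(Seq(1, 2, 3).getClass)        // e.g. class scala.collection.immutable.$colon$colon (a List cons cell)
println(IndexedSeq(1, 2, 3).getClass) // e.g. class scala.collection.immutable.Vector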

Scalaz Bind[Seq] typeclass

I'm currently porting some code from traditional Scala to Scalaz style.
It's fairly common throughout most of my code to use the Seq trait in my exposed API signatures rather than a concrete type (e.g. List, Vector) directly. However, this poses a problem with Scalaz, since it doesn't provide an implementation of a Bind[Seq] typeclass.
i.e. This will work correctly.
List(1,2,3,4) >>= bindOperation
But this will not
Seq(1,2,3,4) >>= bindOperation
failing with the error could not find implicit value for parameter F0: scalaz.Bind[Seq]
I assume this is an intentional design decision in Scalaz, however I am unsure about the intended/best practice on how to proceed.
Should I instead write my code directly to List/Vector as appropriate instead of using the more flexible Seq interface? Or should I simply define my own Bind[Seq] typeclass?
The collections library does backflips to accommodate subtyping: when you use map on a specific collection type (list, map, etc.), you'll (usually) get the same type back. It manages this through the use of an extremely complex inheritance hierarchy together with type classes like CanBuildFrom. It gets the job done (at least arguably), but the complexity doesn't feel very principled. It's a mess. Lots of people hate it.
The complexity is generally pretty easy to avoid as a library user, but for a library designer it's a nightmare. If I provide a monad instance for Seq, that means all of my users' types get bumped up the hierarchy to Seq every time they use a monadic operation.
Scalaz folks tend not to like subtyping very much, anyway, so for the most part Scalaz stays around the leaves of the hierarchy—List, Vector, etc. You can see some discussion of this decision on the mailing list, for example.
When I first started using Scalaz I wrote a lot of utility code that tried to provide instances for Seq, etc. and make them usable with CanBuildFrom. Then I stopped, and now I tend to follow Scalaz in only ever using List, Vector, Map, and Set in my own code. If you're committed to "Scalaz style", you should do that as well (or even adopt Scalaz's own IList, ISet, ==>>, etc.). You're not going to find clear agreement on best practices more generally, though, and both approaches can be made to work, so you'll just need to experiment to find which you prefer.
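For completeness, if you do decide to provide your own instance anyway (with the hierarchy-bumping caveat described above), a minimal sketch against Scalaz 7's Bind, assuming its usual map/bind signatures, could look like this:
import scalaz._
import scalaz.syntax.bind._

// Delegates to the standard library's map/flatMap on Seq.
implicit val seqBind: Bind[Seq] = new Bind[Seq] {
  def map[A, B](fa: Seq[A])(f: A => B): Seq[B] = fa.map(f)
  def bind[A, B](fa: Seq[A])(f: A => Seq[B]): Seq[B] = fa.flatMap(f)
}

// Now the example from the question compiles:
// Seq(1, 2, 3, 4) >>= bindOperation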

Practical uses for Structural Types?

Structural types are one of those "wow, cool!" features of Scala. However, for every example I can think of where they might help, implicit conversions and dynamic mixin composition often seem like better matches. What are some common uses for them and/or advice on when they are appropriate?
Aside from the rare case of classes which provide the same method but are neither related nor implement a common interface (for example, the close() method -- Source, for one, does not extend Closeable), I find no use for structural types with their present restriction. If they were more flexible, however, I could well write something like this:
def add[T: { def +(x: T): T }](a: T, b: T) = a + b
which would neatly handle numeric types. Every time I think structural types might help me with something, I hit that particular wall.
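Within the present restriction, though, the close() case mentioned above does work. A minimal sketch (withResource is a hypothetical helper name):
import scala.language.reflectiveCalls

// Works for any value with a close() method, e.g. scala.io.Source,
// which does not extend java.io.Closeable.
def withResource[R <: { def close(): Unit }, A](resource: R)(body: R => A): A =
  try body(resource) finally resource.close()

// Hypothetical usage:
// val firstLine = withResource(scala.io.Source.fromFile("data.txt"))(_.getLines().next())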
EDIT
However little use I find for structural types myself, the compiler does use them to handle anonymous classes. For example:
implicit def toTimes(count: Int) = new {
  def times(block: => Unit) = 1 to count foreach { _ => block }
}
5 times { println("This uses structural types!") }
The object resulting from (the implicit) toTimes(5) is of type { def times(block: => Unit) }, i.e., a structural type.
I don't know if Scala does that for every anonymous class -- perhaps it does. Alas, that is one reason why doing pimp my library that way is slow, as structural types use reflection to invoke the methods. Instead of an anonymous class, one should use a real class to avoid performance issues in pimp my library.
Structural types are very cool constructs in Scala. I've used them to represent multiple unrelated types that share an attribute upon which I want to perform a common operation without a new level of abstraction.
I have heard one argument against structural types from people who are strict about an application's architecture. They feel it is dangerous to apply a common operation across types without an associated trait or parent type, because you then leave open-ended the rule of which types the method should apply to. Daniel's close() example is spot on, but what if you have another type that requires different behavior? Someone who doesn't understand the architecture might use it and cause problems in the system.
I think structural types are one of these features that you don't need that often, but when you need it, it helps you a lot. One area where structural types really shine is "retrofitting", e.g. when you need to glue together several pieces of software you have no source code for and which were not intended for reuse. But if you find yourself using structural types a lot, you're probably doing it wrong.
[Edit]
Of course implicits are often the way to go, but there are cases when you can't: imagine you have a mutable object you can modify with methods, but which hides important parts of its state, a kind of "black box". Then you have to work with this object somehow.
Another use case for structural types is when code relies on naming conventions without a common interface, e.g. in machine-generated code. In the JDK we can find such things as well, like the StringBuffer / StringBuilder pair (where the common interfaces Appendable and CharSequence are way too general).
Structural types give some of the benefits of dynamic languages to a statically typed language, specifically loose coupling. If you want a method foo() to call instance methods of class Bar, you don't need an interface or base class common to both foo() and Bar. You can define a structural type that foo() accepts, of whose existence Bar has no clue. As long as Bar contains methods that match the structural type's signatures, foo() will be able to call them.
It's great because you can put foo() and Bar in distinct, completely unrelated libraries, that is, with no common referenced contract. This reduces linkage requirements and thus further contributes to loose coupling. A minimal sketch of the idea follows below.
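Here is that sketch; foo and Bar are illustrative names:
import scala.language.reflectiveCalls

// Bar is defined with no knowledge of foo or of any shared interface.
class Bar {
  def greet(name: String): String = s"Hello, $name"
}

// foo accepts anything whose structure includes a matching greet method.
def foo(b: { def greet(name: String): String }): Unit =
  println(b.greet("world"))

foo(new Bar) // compiles and runs without any common supertype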
In some situations, a structural type can be used as an alternative to the Adapter pattern, because it offers the following advantages:
Object identity is preserved (there is no separate object for the adapter instance, at least at the semantic level).
You don't need to instantiate an adapter - just pass a Bar instance to foo().
You don't need to implement wrapper methods - just declare the required signatures in the structural type.
The structural type doesn't need to know the actual instance class or interface, while the adapter must know Bar so it can call its methods. This way, a single structural type can be used for many actual types, whereas with adapter it's necessary to code multiple classes - one for each actual type.
The only drawback of structural types compared to adapters is that a structural type can't be used to translate method signatures. So, when signatures don't match, you must use adapters that contain some translation logic. I particularly don't like to code "intelligent" adapters because many times they are more than just adapters and cause increased complexity. If a class's client needs some additional method, I prefer to simply add such a method, since it usually doesn't affect the footprint.

What is a "multisorted algebra", and how do I use it to solve "real problems"?

Apparently, Alexander Stepanov has stated the following in an interview:
“I find OOP [object-oriented programming] technically unsound. It attempts to decompose the world in terms of interfaces that vary on a single type. To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types.” [Emphasis added.]
Ignoring his statement regarding OOP for a moment, what are "multisorted algebras", beyond his terse definition, and can you give a practical example of how they are used (in the language of your choice)?
I believe he was talking about generic programming (he coined the term), whether meant in the context of this talk about the STL, or 'at large', in the sense of:
programming against a sort of interface that describes something that could fit all (and hopefully several) types (hence multi-sorted), ...
... provided they have some properties, often something about the nature of some operations on elements of that type (hence algebras).
To do (1), you need to have a way to specify a program that takes a type as a parameter, i.e. polymorphism, and to do (2), you need a way to say that you also want that type to carry specific operations (and, provided you can express them, properties). In effect, you're parametrizing your program by the structure of the data it manipulates. The paradigm is called in some places bounded polymorphism, datatype-generic programming, ... which reflects that languages have different notions of how to implement that idea — hence the italicized 'sort of' above.
For C++, it seems that —to Stepanov at least— this corresponds to templates (though ideas on how to do this best are still evolving).
For OO languages (Generic Java, C#), constraints on type parameters are typically expressed using subtype bounds ('bounded wildcards' ...).
For Haskell or Scala, you have (respectively, and similarly) type classes or implicits.
The ML family of languages prefers to do this using modules.
Note that a number of proof assistants (which can express 'honest-to-god' properties as types) have developed a flavor of type classes: Isabelle, Coq, and Matita are such examples.
Note that Stepanov just co-wrote an entire book giving an exhaustive development of a library that embodies exactly what (I think) he means. So if you want examples in C++, this is definitely where you should look. Note also that this is much more evolved than the now-common advice of coding against an interface, rather than an object.
By 'practical example', I don't know if you mean 'how' or 'why' one uses it. To give a caricaturally quick answer to the 'why': genericity is nice because, a bit like run-of-the-mill polymorphism, it lets you reuse code. But, more importantly:
polymorphic code that has to work with every single type often can't do anything interesting, whereas having a constrained interface to play with allows you to write richer programs
by specifying how that interface fits your data, you have a type-safe way to select just those elements that suit your needs. For example, you probably know that the reduction operator (the reduce of Python & Hadoop, the fold of a bunch of functional languages) is parallelizable only if the order in which you apply your reduction function doesn't matter (+, ×, min, and max work, but set difference doesn't). If you have a notion of 'type equipped with an associative operation', you know that you will be able to call a parallel reduction on it (see the sketch after this list).
any overhead incurred by genericity occurs at compile time. For example, templates are legendarily fast
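As a hedged Scala sketch of that second point (the names Semigroup and reduceAll are illustrative, not from the answer):
// A type equipped with an operation assumed to be associative.
trait Semigroup[A] {
  def combine(x: A, y: A): A
}

// Because grouping doesn't matter for an associative operation, this
// reduction could safely be parallelized.
def reduceAll[A](xs: Seq[A])(implicit S: Semigroup[A]): A =
  xs.reduce(S.combine)

implicit val intSum: Semigroup[Int] = new Semigroup[Int] {
  def combine(x: Int, y: Int): Int = x + y
}

reduceAll(Seq(1, 2, 3, 4)) // 10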
If you have seen some generic Java, look at, say, the Comparable generic interface. It defines just one operation, but the contract it makes, though basic, is very much of an algebraic flavor. I quote:
For the mathematically inclined, the relation that defines the natural ordering on a given class C is:
{(x, y) such that x.compareTo((Object)y) <= 0}.
The quotient for this total order is:
{(x, y) such that x.compareTo((Object)y) == 0}.
It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, and that the natural ordering is a total order on C.
Now, I can write a method that selects the minimum, once, and use it for any type that fits this interface:
public static <T extends Comparable<T>> T min(T x, T y) {
    return x.compareTo(y) < 0 ? x : y;
}
Naturally, since the way programming constructs implement that notion varies wildly, what you will get in terms of usability & expressivity will also vary. Perhaps you should not judge data-generic programming just by OO languages like C++ or Java, but I've written too much already to start on module ascription or the automatic instance generation of type classes.
I'm too late, but maybe it will be helpful for you. User huitseeker wrote an excellent answer from the viewpoint of software design. I want to answer your question from the viewpoint of mathematics. Before diving into the software world, Alex Stepanov was a mathematician and studied abstract and universal algebra. And he often tried to bring rigorous mathematical foundations into the world of software and algorithm design. In his books From Mathematics to Generic Programming and Elements of Programming he advocates this design practice. His ideas about mixing the concepts of algebraic structures and software design were realised in the notion of generic programming. And now let's talk about his quote:
To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types
In my opinion there are two main concepts he wanted to mention here: the idea of an abstract data type (ADT) and that of an algebraic structure. First concept: ADT. An ADT is a mathematical model for data types, where a data type is defined only by its semantics. Stepanov contrasted the idea of the ADT with the idea of the object in the OOP sense. Objects contain data and state whilst ADTs do not. An ADT is a behavioural abstraction, a cluster of operations which describes interaction with data. Behavioural abstraction is entirely described by means of an algebraic specification of the abstract data type. You can read more about this in the original Liskov and Zilles paper; I also recommend the paper Object-Oriented Programming Versus Abstract Data Types by William R. Cook.
(Disclaimer: you can skip this paragraph, because it is more mathematical and not so important.) First I want to clarify some terminology. When I talk about an algebraic structure it is the same as an algebra; the word algebra is often used for an algebraic structure. To be more precise, when we talk about algebraic structures (algebras) we usually mean an algebra over an algebraic theory. There is a concept of a variety of algebras, because there are several notions of an algebraic structure on an object of some category. By definition, an algebraic theory (an algebra over it) consists of a specification of operations and laws that these operations must satisfy: this is the working definition of algebraic structure we will use, and this definition, I think, is the one Stepanov implicitly referenced in the quote.
The second concept which Stepanov wanted to mention is the most interesting property of ADTs: they can be formally modelled directly as many-sorted algebraic structures. Let's talk about it more formally. An algebraic structure is a carrier set with one or more finitary operations defined on it. These operations are usually defined not over one set but over multiple ones.
E.g., let's define an algebra which models string concatenation. This algebra will be defined not over one set of strings but over two sets: a set of strings S and the set of natural numbers N, because we can define an operation which concatenates a string with itself some finite number of times. So this operation takes two operands belonging to different underlying (carrier) sets: S and N. The set which names the types of these operands in an algebra is called the set of sorts. A sort is the algebraic analogue of a type, and an algebra with multiple sorts is called a multi-sorted algebra.
In universal algebra, a signature lists the operations that characterize an algebraic structure. A many-sorted algebraic structure can have an arbitrary number of domains; the sorts are part of the signature and play the role of names for the different domains. Many-sorted signatures also prescribe on which sorts the functions and relations of a many-sorted algebraic structure are defined. For a one-sorted variety of algebras a signature is a set whose elements are called operations, to each of which is assigned a cardinal number (0, 1, 2, …) called its arity. The signature of a multi-sorted algebra can be defined as Σ = (S, OP, A), where S is the set of sort names (types), OP is the set of operation names, and A gives the arities as before, except that now an arity is a list (a sequence, or more generally a free monoid) of input sorts rather than merely a natural number (the length of the list), together with one output sort. Now we can give an algebraic specification of an abstract data type as a triple:
ADT = (N, Σ, E)
where N is the name of the abstract data type, Σ = (S, OP, A) is the signature of the multi-sorted algebraic structure, and E = {e1, e2, …, en} is a finite collection of equalities in the signature. As you can see, we now have a rigorous mathematical description of an ADT. In mathematics, many-sorted algebraic structures are often used as a convenient tool even when they could be avoided with a little effort. They are rarely defined in a rigorous way, because it is straightforward to carry out the generalization explicitly. That's why the theory of many-sorted algebras can be successfully applied to software design.
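To connect this back to code, here is a hedged Scala sketch of the two-sorted string algebra described above (the trait and method names are illustrative):
// Two carrier sets: strings (sort S) and natural numbers (sort N).
trait StringAlgebra {
  def concat(a: String, b: String): String // arity: S x S -> S
  def repeat(s: String, n: Int): String    // arity: S x N -> S (operands of two different sorts)
}

object StringConcat extends StringAlgebra {
  def concat(a: String, b: String): String = a + b
  def repeat(s: String, n: Int): String = s * n // StringOps provides *
}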
So, Alex Stepanov wanted to say that he prefers ADTs and generic programming to OOP, because that way we can create programs with rigorous mathematical/algebraic foundations. I appreciate his efforts a lot. We all know that algebraic design is always correct, rigorous, beautiful, simple and gives us better abstractions.
Not that I am an expert with the theory of any of those but let's take a look at the quote so that I can try to give my practical understanding to add to the discussion.
To deal with the real problems you need multisorted algebras - families of interfaces that span multiple types.
From my readings, I think families of interfaces that span multiple types sounds a lot like type classes from Haskell, which are similar to concepts from C++. Take a type class like Foldable: it actually is a type-parametrized interface, i.e. a family of interfaces that span multiple types (a sketch follows below). So regarding your question of how to solve problems with multisorted algebras: generic programming is all about that, if you take it to mean type classes or concepts.
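Here is that sketch: a hedged, simplified Scala rendition of Foldable (the real Haskell and Cats versions are richer):
// Parametrized by the container F[_] and, per method, by the element and
// result types A and B, so one definition spans many types.
trait Foldable[F[_]] {
  def foldLeft[A, B](fa: F[A], z: B)(op: (B, A) => B): B
}

implicit val listFoldable: Foldable[List] = new Foldable[List] {
  def foldLeft[A, B](fa: List[A], z: B)(op: (B, A) => B): B = fa.foldLeft(z)(op)
}

def sum[F[_]](xs: F[Int])(implicit F: Foldable[F]): Int =
  F.foldLeft(xs, 0)(_ + _)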

Disadvantages of Scala type system versus Haskell?

I have read that Scala's type system is weakened by Java interoperability and therefore cannot perform some of the same powers as Haskell's type system. Is this true? Is the weakness because of type erasure, or am I wrong in every way? Is this difference the reason that Scala has no typeclasses?
The big difference is that Scala doesn't have Hindley-Milner global type inference and instead uses a form of local type inference, requiring you to specify types for method parameters and the return type for overloaded or recursive functions.
This isn't driven by type erasure or by other requirements of the JVM. All possible difficulties here can be overcome, and have been, just consider Jaskell - http://docs.codehaus.org/display/JASKELL/Home
H-M inference doesn't work in an object-oriented context, specifically when subtype polymorphism is used (as opposed to the ad-hoc polymorphism of type classes). Subtyping is crucial for strong interop with other Java libraries, and (to a lesser extent) for getting the best possible optimisation from the JVM.
It's not really valid to state that either Haskell or Scala has a stronger type system, just that they are different. Both languages are pushing the boundaries for type-based programming in different directions, and each language has unique strengths that are hard to duplicate in the other.
Scala's type system is different from Haskell's, although Scala's concepts are sometimes directly inspired by Haskell's strengths and its knowledgeable community of researchers and professionals.
Of course, running on a VM not primarily intended for functional programming in the first place creates some compatibility concerns with existing languages targeting this platform.
Because most of the reasoning about types happens at compile time, the limitations of Java (as a language and as a platform) at runtime are nothing to be concerned about (except type erasure, although arguably it is exactly erasure that makes integration into the Java ecosystem more seamless).
As far as I know the only "compromise" on the type system level with Java is a special syntax to handle Raw Types. While Scala doesn't even allow Raw Types anymore, it accepts older Java class files with that bug.
Maybe you have seen code like List[_] (or the longer equivalent List[T] forSome { type T }). This is a compatibility feature with Java, but is treated as an existential type internally too and doesn't weaken the type system.
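A small sketch of how such an existential type is used; nothing here weakens type safety:
// Accepts a List of any element type; the body can only use operations
// that don't depend on what that element type is.
def size(xs: List[_]): Int = xs.length

size(List(1, 2, 3))  // 3
size(List("a", "b")) // 2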
Scala's type system does support type classes, although in a more verbose way than Haskell. I suggest reading this paper, which might create a different impression on the relative strength of Scala's type system (the table on page 17 serves as a nice list of very powerful type system concepts).
Not necessarily related to the power of the type system is the approach Scala's and Haskell's compilers use to infer types, although it has some impact on the way people write code.
Having a powerful type inference algorithm can make it worthwhile to write more abstract code (you can decide yourself if that is a good thing in all cases).
In the end Scala's and Haskell's type system are driven by the desire to provide their users with the best tools to solve their problems, but have taken different paths to that goal.
Another interesting point to consider is that Scala directly supports the classical OO style. That means there are subtype relations (e.g. List is a subtype of Seq), and this makes type inference more tricky. Add to this the fact that you can mix in traits in Scala, which means that a given type can have multiple supertype relations, making it trickier still.
Scala does not have rank-n types, although it may be possible to work around this limitation in certain cases.
I have only a little experience with Haskell, but the most obvious way in which I notice Scala's type system differing from Haskell's is type inference.
In Scala there is no global type inference; you must explicitly give the types of function arguments.
For example, in Scala you need to write this:
def add (x: Int, y: Int) = x + y
instead of
add x y = x + y
This may cause problems when you need a generic version of the add function that works with every type that has a + method. There is a workaround for this (sketched below), but it gets more verbose.
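One such workaround, sketched with the standard Numeric type class (the implicit evidence supplies the + operation for T):
def add[T](x: T, y: T)(implicit num: Numeric[T]): T = num.plus(x, y)

add(1, 2)     // 3
add(1.5, 2.5) // 4.0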
But in real use, I find Scala's type system powerful enough for daily usage, and I almost never need those generic workarounds; maybe this is because I come from the Java world.
And the limitation of explicitly declaring argument types is not necessarily a bad thing: you need to document them anyway.
Well, are they Turing reducible?
See Oleg Kiselyov's page http://okmij.org/ftp/
...
One can implement the lambda calculus in Haskell's type system. If Scala can do that, then in a sense Haskell's type system and Scala's type system compute the same types. The questions are: How natural is one over the other? How elegant is one over the other?