When are higher kinded types useful? - scala

I've been doing dev in F# for a while and I like it. However one buzzword I know doesn't exist in F# is higher-kinded types. I've read material on higher-kinded types, and I think I understand their definition. I'm just not sure why they're useful. Can someone provide some examples of what higher-kinded types make easy in Scala or Haskell, that require workarounds in F#? Also for these examples, what would the workarounds be without higher-kinded types (or vice-versa in F#)? Maybe I'm just so used to working around it that I don't notice the absence of that feature.
(I think) I get that instead of myList |> List.map f or myList |> Seq.map f |> Seq.toList higher kinded types allow you to simply write myList |> map f and it'll return a List. That's great (assuming it's correct), but seems kind of petty? (And couldn't it be done simply by allowing function overloading?) I usually convert to Seq anyway and then I can convert to whatever I want afterwards. Again, maybe I'm just too used to working around it. But is there any example where higher-kinded types really saves you either in keystrokes or in type safety?

So the kind of a type is its simple type. For instance Int has kind * which means it's a base type and can be instantiated by values. By some loose definition of higher-kinded type (and I'm not sure where F# draws the line, so let's just include it) polymorphic containers are a great example of a higher-kinded type.
data List a = Cons a (List a) | Nil
The type constructor List has kind * -> * which means that it must be passed a concrete type in order to result in a concrete type: List Int can have inhabitants like [1,2,3] but List itself cannot.
I'm going to assume that the benefits of polymorphic containers are obvious, but more useful kind * -> * types exist than just the containers. For instance, the relations
data Rel a = Rel (a -> a -> Bool)
or parsers
data Parser a = Parser (String -> [(a, String)])
both also have kind * -> *.
We can take this further in Haskell, however, by having types with even higher-order kinds. For instance we could look for a type with kind (* -> *) -> *. A simple example of this might be Shape which tries to fill a container of kind * -> *.
data Shape f = Shape (f ())
Shape [(), (), ()] :: Shape []
This is useful for characterizing Traversables in Haskell, for instance, as they can always be divided into their shape and contents.
split :: Traversable t => t a -> (Shape t, [a])
As another example, let's consider a tree that's parameterized on the kind of branch it has. For instance, a normal tree might be
data Tree a = Branch (Tree a) a (Tree a) | Leaf
But we can see that the branch type contains a Pair of Tree as and so we can extract that piece out of the type parametrically
data TreeG f a = Branch a (f (TreeG f a)) | Leaf
data Pair a = Pair a a
type Tree a = TreeG Pair a
This TreeG type constructor has kind (* -> *) -> * -> *. We can use it to make interesting other variations like a RoseTree
type RoseTree a = TreeG [] a
rose :: RoseTree Int
rose = Branch 3 [Branch 2 [Leaf, Leaf], Leaf, Branch 4 [Branch 4 []]]
Or pathological ones like a MaybeTree
data Empty a = Empty
type MaybeTree a = TreeG Empty a
nothing :: MaybeTree a
nothing = Leaf
just :: a -> MaybeTree a
just a = Branch a Empty
Or a TreeTree
type TreeTree a = TreeG Tree a
treetree :: TreeTree Int
treetree = Branch 3 (Branch Leaf (Pair Leaf Leaf))
Another place this shows up is in "algebras of functors". If we drop a few layers of abstractness this might be better considered as a fold, such as sum :: [Int] -> Int. Algebras are parameterized over the functor and the carrier. The functor has kind * -> * and the carrier kind * so altogether
data Alg f a = Alg (f a -> a)
has kind (* -> *) -> * -> *. Alg useful because of its relation to datatypes and recursion schemes built atop them.
-- | The "single-layer of an expression" functor has kind `(* -> *)`
data ExpF x = Lit Int
| Add x x
| Sub x x
| Mult x x
-- | The fixed point of a functor has kind `(* -> *) -> *`
data Fix f = Fix (f (Fix f))
type Exp = Fix ExpF
exp :: Exp
exp = Fix (Add (Fix (Lit 3)) (Fix (Lit 4))) -- 3 + 4
fold :: Functor f => Alg f a -> Fix f -> a
fold (Alg phi) (Fix f) = phi (fmap (fold (Alg phi)) f)
Finally, though they're theoretically possible, I've never see an even higher-kinded type constructor. We sometimes see functions of that type such as mask :: ((forall a. IO a -> IO a) -> IO b) -> IO b, but I think you'll have to dig into type prolog or dependently typed literature to see that level of complexity in types.

Consider the Functor type class in Haskell, where f is a higher-kinded type variable:
class Functor f where
fmap :: (a -> b) -> f a -> f b
What this type signature says is that fmap changes the type parameter of an f from a to b, but leaves f as it was. So if you use fmap over a list you get a list, if you use it over a parser you get a parser, and so on. And these are static, compile-time guarantees.
I don't know F#, but let's consider what happens if we try to express the Functor abstraction in a language like Java or C#, with inheritance and generics, but no higher-kinded generics. First try:
interface Functor<A> {
Functor<B> map(Function<A, B> f);
}
The problem with this first try is that an implementation of the interface is allowed to return any class that implements Functor. Somebody could write a FunnyList<A> implements Functor<A> whose map method returns a different kind of collection, or even something else that's not a collection at all but is still a Functor. Also, when you use the map method you can't invoke any subtype-specific methods on the result unless you downcast it to the type that you're actually expecting. So we have two problems:
The type system doesn't allow us to express the invariant that the map method always returns the same Functor subclass as the receiver.
Therefore, there's no statically type-safe manner to invoke a non-Functor method on the result of map.
There are other, more complicated ways you can try, but none of them really works. For example, you could try augment the first try by defining subtypes of Functor that restrict the result type:
interface Collection<A> extends Functor<A> {
Collection<B> map(Function<A, B> f);
}
interface List<A> extends Collection<A> {
List<B> map(Function<A, B> f);
}
interface Set<A> extends Collection<A> {
Set<B> map(Function<A, B> f);
}
interface Parser<A> extends Functor<A> {
Parser<B> map(Function<A, B> f);
}
// …
This helps to forbid implementers of those narrower interfaces from returning the wrong type of Functor from the map method, but since there is no limit to how many Functor implementations you can have, there is no limit to how many narrower interfaces you'll need.
(EDIT: And note that this only works because Functor<B> appears as the result type, and so the child interfaces can narrow it. So AFAIK we can't narrow both uses of Monad<B> in the following interface:
interface Monad<A> {
<B> Monad<B> flatMap(Function<? super A, ? extends Monad<? extends B>> f);
}
In Haskell, with higher-rank type variables, this is (>>=) :: Monad m => m a -> (a -> m b) -> m b.)
Yet another try is to use recursive generics to try and have the interface restrict the result type of the subtype to the subtype itself. Toy example:
/**
* A semigroup is a type with a binary associative operation. Law:
*
* > x.append(y).append(z) = x.append(y.append(z))
*/
interface Semigroup<T extends Semigroup<T>> {
T append(T arg);
}
class Foo implements Semigroup<Foo> {
// Since this implements Semigroup<Foo>, now this method must accept
// a Foo argument and return a Foo result.
Foo append(Foo arg);
}
class Bar implements Semigroup<Bar> {
// Any of these is a compilation error:
Semigroup<Bar> append(Semigroup<Bar> arg);
Semigroup<Foo> append(Bar arg);
Semigroup append(Bar arg);
Foo append(Bar arg);
}
But this sort of technique (which is rather arcane to your run-of-the-mill OOP developer, heck to your run-of-the-mill functional developer as well) still can't express the desired Functor constraint either:
interface Functor<FA extends Functor<FA, A>, A> {
<FB extends Functor<FB, B>, B> FB map(Function<A, B> f);
}
The problem here is this doesn't restrict FB to have the same F as FA—so that when you declare a type List<A> implements Functor<List<A>, A>, the map method can still return a NotAList<B> implements Functor<NotAList<B>, B>.
Final try, in Java, using raw types (unparametrized containers):
interface FunctorStrategy<F> {
F map(Function f, F arg);
}
Here F will get instantiated to unparametrized types like just List or Map. This guarantees that a FunctorStrategy<List> can only return a List—but you've abandoned the use of type variables to track the element types of the lists.
The heart of the problem here is that languages like Java and C# don't allow type parameters to have parameters. In Java, if T is a type variable, you can write T and List<T>, but not T<String>. Higher-kinded types remove this restriction, so that you could have something like this (not fully thought out):
interface Functor<F, A> {
<B> F<B> map(Function<A, B> f);
}
class List<A> implements Functor<List, A> {
// Since F := List, F<B> := List<B>
<B> List<B> map(Function<A, B> f) {
// ...
}
}
And addressing this bit in particular:
(I think) I get that instead of myList |> List.map f or myList |> Seq.map f |> Seq.toList higher kinded types allow you to simply write myList |> map f and it'll return a List. That's great (assuming it's correct), but seems kind of petty? (And couldn't it be done simply by allowing function overloading?) I usually convert to Seq anyway and then I can convert to whatever I want afterwards.
There are many languages that generalize the idea of the map function this way, by modeling it as if, at heart, mapping is about sequences. This remark of yours is in that spirit: if you have a type that supports conversion to and from Seq, you get the map operation "for free" by reusing Seq.map.
In Haskell, however, the Functor class is more general than that; it isn't tied to the notion of sequences. You can implement fmap for types that have no good mapping to sequences, like IO actions, parser combinators, functions, etc.:
instance Functor IO where
fmap f action =
do x <- action
return (f x)
-- This declaration is just to make things easier to read for non-Haskellers
newtype Function a b = Function (a -> b)
instance Functor (Function a) where
fmap f (Function g) = Function (f . g) -- `.` is function composition
The concept of "mapping" really isn't tied to sequences. It's best to understand the functor laws:
(1) fmap id xs == xs
(2) fmap f (fmap g xs) = fmap (f . g) xs
Very informally:
The first law says that mapping with an identity/noop function is the same as doing nothing.
The second law says that any result that you can produce by mapping twice, you can also produce by mapping once.
This is why you want fmap to preserve the type—because as soon as you get map operations that produce a different result type, it becomes much, much harder to make guarantees like this.

I don't want to repeat information in some excellent answers already here, but there's a key point I'd like to add.
You usually don't need higher-kinded types to implement any one particular monad, or functor (or applicative functor, or arrow, or ...). But doing so is mostly missing the point.
In general I've found that when people don't see the usefulness of functors/monads/whatevers, it's often because they're thinking of these things one at a time. Functor/monad/etc operations really add nothing to any one instance (instead of calling bind, fmap, etc I could just call whatever operations I used to implement bind, fmap, etc). What you really want these abstractions for is so you can have code that works generically with any functor/monad/etc.
In a context where such generic code is widely used, this means any time you write a new monad instance your type immediately gains access to a large number of useful operations that have already been written for you. That's the point of seeing monads (and functors, and ...) everywhere; not so that I can use bind rather than concat and map to implement myFunkyListOperation (which gains me nothing in itself), but rather so that when I come to need myFunkyParserOperation and myFunkyIOOperation I can re-use the code I originally saw in terms of lists because it's actually monad-generic.
But to abstract across a parameterised type like a monad with type safety, you need higher-kinded types (as well explained in other answers here).

For a more .NET-specific perspective, I wrote a blog post about this a while back. The crux of it is, with higher-kinded types, you could potentially reuse the same LINQ blocks between IEnumerables and IObservables, but without higher-kinded types this is impossible.
The closest you could get (I figured out after posting the blog) is to make your own IEnumerable<T> and IObservable<T> and extended them both from an IMonad<T>. This would allow you to reuse your LINQ blocks if they're denoted IMonad<T>, but then it's no longer typesafe because it allows you to mix-and-match IObservables and IEnumerables within the same block, which while it may sound intriguing to enable this, you'd basically just get some undefined behavior.
I wrote a later post on how Haskell makes this easy. (A no-op, really--restricting a block to a certain kind of monad requires code; enabling reuse is the default).

The most-used example of higher-kinded type polymorphism in Haskell is the Monad interface. Functor and Applicative are higher-kinded in the same way, so I'll show Functor in order to show something concise.
class Functor f where
fmap :: (a -> b) -> f a -> f b
Now, examine that definition, looking at how the type variable f is used. You'll see that f can't mean a type that has value. You can identify values in that type signature because they're arguments to and results of a functions. So the type variables a and b are types that can have values. So are the type expressions f a and f b. But not f itself. f is an example of a higher-kinded type variable. Given that * is the kind of types that can have values, f must have the kind * -> *. That is, it takes a type that can have values, because we know from previous examination that a and b must have values. And we also know that f a and f b must have values, so it returns a type that must have values.
This makes the f used in the definition of Functor a higher-kinded type variable.
The Applicative and Monad interfaces add more, but they're compatible. This means that they work on type variables with kind * -> * as well.
Working on higher-kinded types introduces an additional level of abstraction - you aren't restricted to just creating abstractions over basic types. You can also create abstractions over types that modify other types.

Why might you care about Applicative? Because of traversals.
class (Functor t, Foldable t) => Traversable t where
traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
type Traversal s t a b = forall f. Applicative f => (a -> f b) -> s -> f t
Once you've written a Traversable instance, or a Traversal for some type, you can use it for an arbitrary Applicative.
Why might you care about Monad? One reason is streaming systems like pipes, conduit, and streaming. These are entirely non-trivial systems for working with effectful streams. With the Monad class, we can reuse all that machinery for whatever we like, rather than having to rewrite it from scratch each time.
Why else might you care about Monad? Monad transformers. We can layer monad transformers however we like to express different ideas. The uniformity of Monad is what makes this all work.
What are some other interesting higher-kinded types? Let's say ... Coyoneda. Want to make repeated mapping fast? Use
data Coyoneda f a = forall x. Coyoneda (x -> a) (f x)
This works or any functor f passed to it. No higher-kinded types? You'll need a custom version of this for each functor. This is a pretty simple example, but there are much trickier ones you might not want to have to rewrite every time.

Recently stated learning a bit about higher kinded types. Although it's an interesting idea, to be able to have a generic that needs another generic but apart from library developers, i do not see any practical use in any real app. I use scala in business app, i have also seen and studied the code of some nicely designed sgstems and libraries like kafka, akka and some financial apps. Nowhere I found any higher kinded type in use.
It seems like they are nice for academia or similar but the market doesn't need it or hasn't reached to a point where HKT has any practical uses or proves to be better than other existing techniques. To me it's something that you can use to impress others or write blog posts but nothing more than that. It's like multiverse or string theory. Looks nice on paper, gives you hours to speak about but nothing real
(sorry if you don't have any interest in theoratical physiss). One prove is that all the answers above, they all brilliantly describe the mechanics fail to cite one true real case where we would need it despite being the fact it's been 6+ years since OP posted it.

Related

Why is FunctionK not the same as a Natural Transformation

The cats documentation on FunctionK contains:
Thus natural transformation can be implemented in terms of FunctionK. This is why a parametric polymorphic function FunctionK[F, G] is sometimes referred as a natural transformation. However, they are two different concepts that are not isomorphic.
What's an example of a FunctionK which isn't a Natural Transformation (or vice versa)?
It's not clear to me whether this statement is implying F and G need to have Functor instances, for a FunctionK to be a Natural Transformation.
The Wikipedia article on Natural Transformations says that the Commutative Diagram can be written with a Contravariant Functor instead of a Covariant Functor, which to me implies that a Functor instance isn't required?
Alternatively, the statement could be refering to impure FunctionKs, although I'd kind of expect analogies to category theory breaking down in the presence of impurity to be a given; and not need explicitly stating?
When you have some objects (from CT) in one category and some objects in another category, and you are able to come up with a way show that each object and arrow between objects has a correspondence in later then you can say that there is a functor from one to another. In less strict language: you can say that there is a functor from A to B if you can find a "subgraph" in B that has the same shape as A.
In such case you can "zoom out": draw a point, call it object representing category A, draw another, call it object representing B, and draw an arrow and call it functor.
If there are many ways you can do it, you can have multiple functors. And with more categories you can combine these functors like you compose arrows. Which in this "zoomed out" world look like normal objects and arrows. And you can form categories with them. If you can find a mapping between these categories, a functor on functors, then this is a natural transformation.
When it comes to functional programming you don't work in such generic framework. Usually, you assume that:
object is a type
arrow is a function
to define a category you almost always would have to use a generic type, or else it would be too specific to be useful as a general purpose library (isomorphism between e.g. possible transitions of one enum into transition states of another enum could be a functor, but that wouldn't necessarily fit some generic interface)
since programming languages cannot let you define a generic mapping between two arbitrary types, functors that you'll see will be almost exclusively Id ~> F: you can lift a function A => B into List[A] => List[B], Future[A] => Future[B] and so one easily (this proves existence of F[A] -> F[B] arrow for given A -> B arrow, and if A and B are generic you provided a proof for all arrows from Id), but try finding something more complex than "given A, add a wrapper around it to get F[A]" and it's a challenge
similarly the only natural transformations you'll see will be from Id ~> F into Id ~> G that is "given A, change the wrapper type from F[A] to G[A]", because you have a guarantee that there is the same A hidden somehow in both F and G and you don't have to deal with modifying it (only with modifying the wrapper)
The latter being exactly a FunctionK or just a polymorphic function in Scala 3: [A] => F[A] => G[A]. A concept from type theory rather than from CT (although in mathematics a lot of concepts map into each other, like here it FunctionK maps to natural transformation where objects represent types and arrows functions between them).
Category theory isn't so restrictive. As a matter of the fact, isn't even rooted in computer science and was not created with functional programmers in mind. Let's create non-FunctionK natural transformation which models some operation on "state machines" implementation:
object is a state in state machine of sort (let's say enum value)
arrow is a legal transition from one state to another (let say you can model it as a pair of enum values, and it somehow incorporates transitivity)
when you have 2 models of state machines with different enums, BUT you can take one model: one enum and allowed pairs and translate it to another model: another enum and its pair, you have a functor. one that doesn't implement cats.Functor but still
let's say you would model it with some class Translate[Enum1, Enum2] {...} interface
let's say that this translation also extends the model with some functionality, so it's actually Translate[Enum1, Enum2 | ExtensionX]
now, build another extension Translate[Enum1, Enum2 | ExtensionY]
if you can somehow convert Translate[A, B | ExtensionX] into Translate[A, B | ExtensionY] for a whole category of enums (described as A and B) then this would be a natural transformation
notice that it would not fit FunctionK just like Translate doesn't fit Functor
It's a bit stretched example, also hardly anyone implementing an isomorphisms between state machines would reach for functor as a way to describe it, but it should show that natural transformation is not FunctionK. And why it's so rare to see functors and natural transformations different than Id ~> F lifts and (Id ~> F) ~> (Id ~> G) rewrappings.
PS. When it comes to Scala, you can also meet CT used as: object is a type, arrow is a subtyping relationship, thus covariant functors and contravariant functors appear in type parameters. Since "A is a subtype of B" translates to "A can be always upcasted to B", then these 2 interpretations of functors will often be mashed together - something along "don't define map if you cannot upcast and don't define contramap if you cannot downcast parameter" (see also narrow and widen operations in Cats).
PS2. Authors might be defending against hardcore-CT fans: from the point of view of CT Kleisli and ReaderT are 2 different things, yet in Cats they are combined together into a single Kleisli type for convenience. I saw some people complaining about it so maybe authors of the documentation felt that they need this disclaimer.
You can write down FunctionK-instances for things that aren't functors at all, neither covariant nor contravariant.
For example, given
type F[X] = X => X
you could implement a FunctionK[F, F] by
new FunctionK[F, F] {
def apply[X](f: F[X]): F[X] = f andThen f
}
Here, the F cannot be considered to be a functor, because X appears with both variances. Thus, you get something that's certainly a FunctionK, but the question whether it's a natural transformation isn't even valid to begin with.
Note that this example does not depend on whether you take the general CT-definition or the narrow FP-definition of what a "functor" is, the mapping F is simply not functorial.

What does it mean for a value to "have a functor"?

I'm new to this, and I may be missing something important. I've read part one of Category Theory for Programmers, but the most abstract math I did in university was Group Theory so I'm having to read very slowly.
I would ultimately like to understand the theory to ground my use of the techniques, so whenever I feel like I've made some progress I return to the fantasy-land spec to test myself. This time I felt like I knew where to start: I started with Functor.
From the fantasy-land spec:
Functor
u.map(a => a) is equivalent to u (identity)
u.map(x => f(g(x))) is equivalent to u.map(g).map(f) (composition)
map method
map :: Functor f => f a ~> (a -> b) -> f b
A value which has a Functor must provide a map method. The map method takes one argument:
u.map(f)
f must be a function,
i. If f is not a function, the behaviour of map is unspecified.
ii.f can return any value.
iii. No parts of f's return value should be checked.
map must return a value of the same Functor
I don't understand the phrase "a value which has a functor".
What does it mean for a value to "have a functor"? I don't think "is a functor" or "belongs to a functor" would make any more sense here.
What is the documentation trying to say about u?
My understanding is that a functor is a mapping between categories (or in this case from a category to itself). We are working in the category of types, in which objects are types and morphisms are families of functions that take a type to another type, or a type to itself.
As far as I can tell, a functor maps a to Functor a. It has some kind of constructor like,
Functor :: a -> F a
and a map function like,
map :: Functor f => (a -> b) -> f a -> f b
The a refers to any type, and the family of functions (a -> b) refers to all morphisms pointing from any particular type a to any other particular type b. So it doesn't even really make sense to me to distinguish between values which do or do not "have a functor" because if at least one functor exists, then it exists "for" every type... so every value can be mapped by a function that belongs to a morphism between two types that has been lifted by a functor. What I mean is: please show me an example of a value which "does not have a functor".
The functor is the mapping; it isn't the type F a, and it isn't values of the type F a. So in the docs, the value u is not a functor, and I don't think it "has a" functor... it is a value of type F a.
As you say, a functor in programming is a mapping between types, but JavaScript has not types, so you have to imagine them. Here, imagine that the function f goes from type a to type b. The value u is the result of some functor F acting on a. The result of u.map(f) is of the type F b, so you have a mapping from F a (that's the type of u) to F b. Here, map is treated as a method of u.
That's as close as you can get to a functor in JavaScript. (It doesn't help that the same letter f is alternatively used for the function and the functor in the documentation.)

Sets, Functors and Eq confusion

A discussion came up at work recently about Sets, which in Scala support the zip method and how this can lead to bugs, e.g.
scala> val words = Set("one", "two", "three")
scala> words zip (words map (_.length))
res1: Set[(java.lang.String, Int)] = Set((one,3), (two,5))
I think it's pretty clear that Sets shouldn't support a zip operation, since the elements are not ordered. However, it was suggested that the problem is that Set isn't really a functor, and shouldn't have a map method. Certainly, you can get yourself into trouble by mapping over a set. Switching to Haskell now,
data AlwaysEqual a = Wrap { unWrap :: a }
instance Eq (AlwaysEqual a) where
_ == _ = True
instance Ord (AlwaysEqual a) where
compare _ _ = EQ
and now in ghci
ghci> import Data.Set as Set
ghci> let nums = Set.fromList [1, 2, 3]
ghci> Set.map unWrap $ Set.map Wrap $ nums
fromList [3]
ghci> Set.map (unWrap . Wrap) nums
fromList [1, 2, 3]
So Set fails to satisfy the functor law
fmap f . fmap g = fmap (f . g)
It can be argued that this is not a failing of the map operation on Sets, but a failing of the Eq instance that we defined, because it doesn't respect the substitution law, namely that for two instances of Eq on A and B and a mapping f : A -> B then
if x == y (on A) then f x == f y (on B)
which doesn't hold for AlwaysEqual (e.g. consider f = unWrap).
Is the substition law a sensible law for the Eq type that we should try to respect? Certainly, other equality laws are respected by our AlwaysEqual type (symmetry, transitivity and reflexivity are trivially satisfied) so substitution is the only place that we can get into trouble.
To me, substition seems like a very desirable property for the Eq class. On the other hand, some comments on a recent Reddit discussion include
"Substitution seems stronger than necessary, and is basically quotienting the type, putting requirements on every function using the type."
-- godofpumpkins
"I also really don't want substitution/congruence since there are many legitimate uses for values which we want to equate but are somehow distinguishable."
-- sclv
"Substitution only holds for structural equality, but nothing insists Eq is structural."
-- edwardkmett
These three are all pretty well known in the Haskell community, so I'd be hesitant to go against them and insist on substitability for my Eq types!
Another argument against Set being a Functor - it is widely accepted that being a Functor allows you to transform the "elements" of a "collection" while preserving the shape. For example, this quote on the Haskell wiki (note that Traversable is a generalization of Functor)
"Where Foldable gives you the ability to go through the structure processing the elements but throwing away the shape, Traversable allows you to do that whilst preserving the shape and, e.g., putting new values in."
"Traversable is about preserving the structure exactly as-is."
and in Real World Haskell
"...[A] functor must preserve shape. The structure of a collection should not be affected by a functor; only the values that it contains should change."
Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.
But it seems as though Sets really should be functors (ignoring the Ord requirement for the moment - I see that as an artificial restriction imposed by our desire to work efficiently with sets, not an absolute requirement for any set. For example, sets of functions are a perfectly sensible thing to consider. In any case, Oleg has shown how to write efficient Functor and Monad instances for Set that don't require an Ord constraint). There are just too many nice uses for them (the same is true for the non-existant Monad instance).
Can anyone clear up this mess? Should Set be a Functor? If so, what does one do about the potential for breaking the Functor laws? What should the laws for Eq be, and how do they interact with the laws for Functor and the Set instance in particular?
Another argument against Set being a Functor - it is widely accepted that being a Functor allows you to transform the "elements" of a "collection" while preserving the shape. [...] Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.
I'm afraid that this is a case of taking the "shape" analogy as a defining condition when it is not. Mathematically speaking, there is such a thing as the power set functor. From Wikipedia:
Power sets: The power set functor P : Set → Set maps each set to its power set and each function f : X → Y to the map which sends U ⊆ X to its image f(U) ⊆ Y.
The function P(f) (fmap f in the power set functor) does not preserve the size of its argument set, yet this is nonetheless a functor.
If you want an ill-considered intuitive analogy, we could say this: in a structure like a list, each element "cares" about its relationship to the other elements, and would be "offended" if a false functor were to break that relationship. But a set is the limiting case: a structure whose elements are indifferent to each other, so there is very little you can do to "offend" them; the only thing is if a false functor were to map a set that contains that element to a result that doesn't include its "voice."
(Ok, I'll shut up now...)
EDIT: I truncated the following bits when I quoted you at the top of my answer:
For example, this quote on the Haskell wiki (note that Traversable is a generalization of Functor)
"Where Foldable gives you the ability to go through the structure processing the elements but throwing away the shape, Traversable allows you to do that whilst preserving the shape and, e.g., putting new values in."
"Traversable is about preserving the structure exactly as-is."
Here's I'd remark that Traversable is a kind of specialized Functor, not a "generalization" of it. One of the key facts about any Traversable (or, actually, about Foldable, which Traversable extends) is that it requires that the elements of any structure have a linear order—you can turn any Traversable into a list of its elements (with Foldable.toList).
Another, less obvious fact about Traversable is that the following functions exist (adapted from Gibbons & Oliveira, "The Essence of the Iterator Pattern"):
-- | A "shape" is a Traversable structure with "no content,"
-- i.e., () at all locations.
type Shape t = t ()
-- | "Contents" without a shape are lists of elements.
type Contents a = [a]
shape :: Traversable t => t a -> Shape t
shape = fmap (const ())
contents :: Traversable t => t a -> Contents a
contents = Foldable.toList
-- | This function reconstructs any Traversable from its Shape and
-- Contents. Law:
--
-- > reassemble (shape xs) (contents xs) == Just xs
--
-- See Gibbons & Oliveira for implementation. Or do it as an exercise.
-- Hint: use the State monad...
--
reassemble :: Traversable t => Shape t -> Contents a -> Maybe (t a)
A Traversable instance for sets would violate the proposed law, because all non-empty sets would have the same Shape—the set whose Contents is [()]. From this it should be easy to prove that whenever you try to reassemble a set you would only ever get the empty set or a singleton back.
Lesson? Traversable "preserves shape" in a very specific, stronger sense than Functor does.
Set is "just" a functor (not a Functor) from the subcategory of Hask where Eq is "nice" (i.e. the subcategory where congruence, substitution, holds). If constraint kinds were around from way back then perhaps set would be a Functor of some kind.
Well, Set can be treated as a covariant functor, and as a contravariant functor; usually it's a covariant functor. And for it to behave regarding equality one has to make sure that whatever the implementation, it does.
Regarding Set.zip - it is nonsense. As well as Set.head (you have it in Scala). It should not exist.

Multiple flatMap methods for a single monad?

Does it make sense to define multiple flatMap (or >>= / bind in Haskell) methods in a Monad?
The very few monads I actually use (Option, Try, Either projections) only define one flatMap method.
For exemple, could it make sense to define a flatMap method on Option which would take a function producing a Try? So that Option[Try[User]] would be flattened as Option[User] for exemple? (Considering loosing the exception is not a problem ...)
Or a monad should just define one flatMap method, taking a function which produces the same kind of monad? I guess in this case the Either projections wouldn't be monads? Are they?
I have once seriously thought about this. As it turns out, such a construct (aside from losing all monadic capabilities) is not really interesting, since it is sufficient to provide a conversion from the inner to the outer container:
joinWith :: (Functor m, Monad m) => (n a -> m a) -> m (n a) -> m a
joinWith i = join . (fmap i)
bindWith :: (Functor m, Monad m) => (n a -> m a) -> m a -> (a -> n a) -> m a
bindWith i x f = joinWith i $ fmap f x
*Main> let maybeToList = (\x -> case x of Nothing -> []; (Just y) -> [y])
*Main> bindWith maybeToList [1..9] (\x -> if even x then Just x else Nothing)
[2,4,6,8]
It depends what "make sense" means.
If you mean is it consistent with the monad laws, then it's not exactly clear to me the question entirely makes sense. I'd have to see a concrete proposal to tell. If you do it the way I think you suggest, you'll probably end up violating composition at least in some corner cases.
If you mean is it useful, sure, you can always find cases where such things are useful. The problem is that if you start violating monad laws, you have left traps in your code for the unwary functional (category theory) reasoner. Better to make things that look kind of like monads actually be monads (and just one at a time, though you can provide an explicit way to switch a la Either--but you're right that as written LeftProjection and RightProjection are not, strictly speaking, monads). Or write really clear docs explaining that it isn't what it looks like. Otherwise someone will merrily go along assuming the laws hold, and *splat*.
flatMap or (>>=) doesn't typecheck for your Option[Try[ ]] example. In pseudo-Haskell notation
type OptionTry x = Option (Try x)
instance Monad OptionTry where
(>>=) :: OptionTry a -> (a -> OptionTry b) -> OptionTry b
...
We need bind/flatMap to return a value wrapped in the same context as the input value.
We can also see this by looking at the equivalent return/join implementation of a Monad. For OptionTry, join has the specialized type
instance Monad OptionTry where
join :: OptionTry (OptionTry a) -> OptionTry a
...
It should be clear with a bit of squinting that the "flat" part of flatMap is join (or concat for lists where the name derives from).
Now, it's possible for a single datatype to have multiple different binds. Mathematically, a Monad is actually the data type (or, really, the set of values that the monad consists of) along with the particular bind and return operations. Different operations lead to different (mathematical) Monads.
It doesn't make sense, for a specific data type, as far as I know, you can only have one definition for bind.
In haskell a monad is the following type class,
instance Monad m where
return :: a -> m a
bind :: m a -> (a -> m b) -> m b
Concretely For list Monad we have,
instance Monad [] where
return :: a -> [] a
(>>=) :: [] a -> (a -> [] b) -> [] b
Now let's consider a monadic function as.
actOnList :: a -> [] b
....
A Use case to illustrate,
$ [1,2,3] >>= actOnList
On the function actOnList we see that a list is a polymorphic type constraint by another type (here []). Then when we speak about the bind operator for list monad we speak about the bind operator defined by [] a -> (a -> [] b) -> [] b.
What you want to achieve is a bind operator defined like [] Maybe a -> (a -> [] b) -> [] b, this not a specialize version of the first one but another function and regarding it type signature i really doubt that it can be the bind operator of any kind of monad as you do not return what you have consumed. You surely go from one monad to another using a function but this function is definitely not another version of the bind operator of list.
Which is why I've said, It doesn't make sense, for a specific data type, as far as I know, you can only have one definition for bind.

Kind vs Rank in type theory

I'm having a hard time understanding Higher Kind vs Higher Rank types. Kind is pretty simple (thanks Haskell literature for that) and I used to think rank is like kind when talking about types but apparently not! I read the Wikipedia article to no avail. So can someone please explain what is a Rank? and what is meant by Higher Rank? Higher Rank Polymorphism? how that comes to Kinds (if any) ? Comparing Scala and Haskell would be awesome too.
The concept of rank is not really related to the concept of kinds.
The rank of a polymorphic type system describes where foralls may appear in types. In a rank-1 type system foralls may only appear at the outermost level, in a rank-2 type system they may appear at one level of nesting and so on.
So for example forall a. Show a => (a -> String) -> a -> String would be a rank-1 type and forall a. Show a => (forall b. Show b => b -> String) -> a -> String would be a rank-2 type. The difference between those two types is that in the first case, the first argument to the function can be any function that takes one showable argument and returns a String. So a function of type Int -> String would be a valid first argument (like a hypothetical function intToString), so would a function of type forall a. Show a => a -> String (like show). In the second case only a function of type forall a. Show a => a -> String would be a valid argument, i.e. show would be okay, but intToString wouldn't be. As a consequence the following function would be a legal function of the second type, but not the first (where ++ is supposed to represent string concatenation):
higherRankedFunction(f, x) = f("hello") ++ f(x) ++ f(42)
Note that here the function f is applied to (potentially) three different types of arguments. So if f were the function intToString this would not work.
Both Haskell and Scala are Rank-1 (so the above function can not be written in those languages) by default. But GHC contains a language extension to enable Rank-2 polymorphism and another one to enable Rank-n polymorphism for arbitrary n.