I understand the advantage of the IO monad and the List monad over a Functor; however, I don't understand the advantage of the Option/Maybe monad over a Functor.
Is it simply a matter of how the type integrates with the language?
Or:
What is the advantage of the Option/Maybe monad over a Functor in a specific use?
PS. Asking for the advantage in a specific use is not opinion-based: if there is one, it can be pointed out without any subjective aspect.
PS.PS. Some members here are eager to insist, repeatedly, that
Option is both a functor and a monad?
should be the answer, or that this question is a duplicate of it, but it actually is not.
I already know the basics, such as
Every monad is an applicative functor and every applicative functor is a functor
from the accepted answer there, and that is not what I'm asking here.
There is an excellent answer here that is not included in the previous Q&A.
The aspect, detail, and resolution of each question are quite different, so please avoid bundling different things together in a rough manner.
Let's look at the types.
fmap :: Functor f => (a -> b) -> (f a -> f b)
(<*>) :: Applicative f => f (a -> b) -> (f a -> f b)
flip (>>=) :: Monad f => (a -> f b) -> (f a -> f b)
For a functor, we can apply an ordinary function a -> b to a value of type f a to get a value of type f b. The function never gets a say in what happens to the f part, only the inside. Thinking of functors as sort of box-like things, the function in fmap never sees the box itself, just the inside, and the value gets taken out and put back into the exact same box it started in.
An applicative functor is slightly more powerful. Now, we have a function which is also in a box. So the function gets a say in the f part, in a limited sense. The function and the input each have f parts (which are independent of each other) which get combined into the result.
A monad is even more powerful. Now, the function does not, a priori, have an f part. The function takes a value and produces a new box. Whereas in the Applicative case, our function's box and our value's box were independent, in the Monad case, the function's box can depend on the input value.
Now, what's all this mean? You've asked me to focus on Maybe, so let's talk about Maybe in the concrete.
fmap :: (a -> b) -> (Maybe a -> Maybe b)
(<*>) :: Maybe (a -> b) -> (Maybe a -> Maybe b)
flip (>>=) :: (a -> Maybe b) -> (Maybe a -> Maybe b)
As a reminder, Maybe looks like this.
data Maybe a = Nothing | Just a
A Maybe a is a value which may or may not exist. From a functor perspective, we'll generally think of Nothing as some form of failure and Just a as a successful result of type a.
Starting with fmap, the Functor instance for Maybe allows us to apply a function to the inside of the Maybe, if one exists. The function gets no say over the success or failure of the operation: A failed Maybe (i.e. a Nothing) must remain failed, and a successful one must remain successful (obviously, we're glossing over undefined and other denotational semantic issues here; I'm assuming that the only way a function can fail is with Nothing).
Now (<*>), the applicative operator, takes a Maybe (a -> b) and a Maybe a. Either of those two might have failed. If either of them did, then the result is Nothing, and only if the two both succeeded do we get a Just as our result. This allows us to curry operations. Concretely, if I have a function of the form g :: a -> b -> c and I have values ma :: Maybe a and mb :: Maybe b, then we might want to apply g to ma and mb. But when we start to do that, we have a problem.
fmap g ma :: Maybe (b -> c)
Now we've got a function that may or may not exist. We can't fmap that over mb, because a nonexistent function (a Nothing) can't be an argument to fmap. The problem is that we have two independent Maybe values (ma and mb in our example) which are fighting, in some sense, for control. The result should only exist if both are Just. Otherwise, the result should be Nothing. It's sort of a Boolean "and" operation, in that if any of the intermediates fail, then the whole calculation fails. (Note: If you're looking for a Boolean "or", where any individual success can recover from prior failure, then you're looking for Alternative)
So we write
(fmap g ma) <*> mb :: Maybe c
or, using the more convenient synonym Haskell provides for this purpose,
g <$> ma <*> mb :: Maybe c
Now, the key word in the above situation is independent. ma and mb have no say over the other's success or failure. This is good in many cases, because code like this can often be parallelized (there are very efficient command line argument parsing libraries that exploit just this property of Applicative). But, obviously, it's not always what we want.
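To make the failure propagation concrete, here is a hypothetical GHCi session (safeDiv is an invented helper, not a standard function):
safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)

ghci> (+) <$> Just 1 <*> Just 2
Just 3
ghci> (+) <$> Nothing <*> Just 2
Nothing
ghci> (+) <$> safeDiv 1 2 <*> safeDiv 1 0
Nothing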
Enter Monad. In the Maybe monad, the provided function produces a value of type Maybe b based on the input a. The Maybe part of the a and of the b are no longer independent: the latter can depend directly on the former.
For example, take the classic example of Maybe: a square root function. We can't take a square root of a negative number (let's assume we're not working with complex numbers here), so our hypothetical square root looks like
sqrt :: Double -> Maybe Double
sqrt x | x < 0     = Nothing
       | otherwise = Just (Prelude.sqrt x)
Now, suppose we've got some number r. But r isn't just a number. It came from earlier in our computation, and our computation might have failed. Maybe it did a square root earlier, or tried to divide by zero, or something else entirely, but it did something that has some chance of producing a Nothing. So r is Maybe Double, and we want to take its square root.
Obviously, if r is already Nothing, then its square root is Nothing; we can't possibly take a square root if we've already failed to compute everything else. On the other hand, if r is a negative number, then sqrt is going to fail and produce Nothing despite the fact that r is itself Just. So what we really want is
case r of
  Nothing -> Nothing
  Just r' -> sqrt r'
And this is exactly what the Monad instance for Maybe does. That code is equivalent to
r >>= sqrt
The result of this entire computation (namely, whether it is Nothing or Just) depends not just on whether r is Nothing but also on r's actual value. Two different Just values of r can produce success or failure depending on what sqrt does. We can't do that with just a Functor; we can't even do that with Applicative. It takes a Monad.
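And putting the whole example together, here is a minimal runnable sketch (we hide the Prelude's sqrt so the names don't clash; fourthRoot is an invented name for demonstration):
import Prelude hiding (sqrt)
import qualified Prelude

sqrt :: Double -> Maybe Double
sqrt x | x < 0     = Nothing
       | otherwise = Just (Prelude.sqrt x)

-- Each step can fail on its own, and any failure short-circuits the rest.
fourthRoot :: Double -> Maybe Double
fourthRoot x = sqrt x >>= sqrt

main :: IO ()
main = do
  print (fourthRoot 16)     -- Just 2.0
  print (fourthRoot (-16))  -- Nothing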
I am working on an NP search problem and was told I can speed up the search process by using said package. Since memoisation is a new concept to me, I find it hard to wrap my head around anything other than the 'standard' memoised Fibonacci sequence.
In order to make a data type a an instance of Memoizable, I need to define a function memoize :: (a -> v) -> a -> v for it.
I have a data type data Formula which is in the classes (Eq, Ord, Show). I will have to write my own instance declaration, but I don't know what function is expected.
What exactly is this function supposed to do for memoisation to work? The package description doesn't elaborate on this, and I doubt plain function application (which fits the type signature) will speed anything up.
You should read about typeclasses. Here is how I understand the package.
The following definition is given:
class Memoizable a where
  memoize ∷ (a → v) → a → v
You should think of the memoize function as something like:
memoize :: Memoizable a => (a → v) → a → v
I.e., you can apply it to a function from a to v if and only if an instance of Memoizable a is declared. The package declares instances for some basic types like Int.
So if you wish to memoize compute :: Int -> WidgetData, you should use memoize compute, which has the same type, without doing anything more.
If you wish to memoize a function which takes as input a type without a Memoizable instance, you will have to declare one yourself. More likely, you should rely on the Template Haskell function deriveMemoizable to do that for you:
{-# LANGUAGE TemplateHaskell #-} -- put this at the top of the module
import Data.Function.Memoize

deriveMemoizable ''T
I doubt function application (which fits the type signature) will
speed anything up.
It depends on the problem at hand. If compute is expensive and you call it twice with the same input, memoisation stores the result and avoids computing it twice. If that is not the case, you will only increase the memory usage of your program without any gain.
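For instance, here is roughly what the memoised Fibonacci looks like with this package (a sketch; memoFix is, if I recall the API correctly, the package's fixed-point helper, which routes the recursive calls through the memo table):
import Data.Function.Memoize (memoFix)

-- Naively this recursion is exponential; with memoFix each subproblem
-- is computed at most once.
fib :: Int -> Integer
fib = memoFix $ \rec n ->
  if n < 2 then fromIntegral n
           else rec (n - 1) + rec (n - 2)

main :: IO ()
main = print (fib 80)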
I've been doing dev in F# for a while and I like it. However one buzzword I know doesn't exist in F# is higher-kinded types. I've read material on higher-kinded types, and I think I understand their definition. I'm just not sure why they're useful. Can someone provide some examples of what higher-kinded types make easy in Scala or Haskell, that require workarounds in F#? Also for these examples, what would the workarounds be without higher-kinded types (or vice-versa in F#)? Maybe I'm just so used to working around it that I don't notice the absence of that feature.
(I think) I get that instead of myList |> List.map f or myList |> Seq.map f |> Seq.toList higher kinded types allow you to simply write myList |> map f and it'll return a List. That's great (assuming it's correct), but seems kind of petty? (And couldn't it be done simply by allowing function overloading?) I usually convert to Seq anyway and then I can convert to whatever I want afterwards. Again, maybe I'm just too used to working around it. But is there any example where higher-kinded types really saves you either in keystrokes or in type safety?
So the kind of a type is its simple type. For instance Int has kind * which means it's a base type and can be instantiated by values. By some loose definition of higher-kinded type (and I'm not sure where F# draws the line, so let's just include it) polymorphic containers are a great example of a higher-kinded type.
data List a = Cons a (List a) | Nil
The type constructor List has kind * -> * which means that it must be passed a concrete type in order to result in a concrete type: List Int can have inhabitants like [1,2,3] but List itself cannot.
I'm going to assume that the benefits of polymorphic containers are obvious, but there are useful types of kind * -> * beyond just the containers. For instance, the relations
data Rel a = Rel (a -> a -> Bool)
or parsers
data Parser a = Parser (String -> [(a, String)])
both also have kind * -> *.
We can take this further in Haskell, however, by having types with even higher-order kinds. For instance we could look for a type with kind (* -> *) -> *. A simple example of this might be Shape which tries to fill a container of kind * -> *.
data Shape f = Shape (f ())
Shape [(), (), ()] :: Shape []
This is useful for characterizing Traversables in Haskell, for instance, as they can always be divided into their shape and contents.
split :: Traversable t => t a -> (Shape t, [a])
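For what it's worth, split can be written directly from the pieces above (a sketch, using Data.Foldable.toList, which linearizes the contents of any Traversable):
import qualified Data.Foldable as Foldable

split :: Traversable t => t a -> (Shape t, [a])
split t = (Shape (fmap (const ()) t), Foldable.toList t)

-- e.g. split [1, 2, 3] gives (Shape [(), (), ()], [1, 2, 3])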
As another example, let's consider a tree that's parameterized on the kind of branch it has. For instance, a normal tree might be
data Tree a = Branch (Tree a) a (Tree a) | Leaf
But we can see that the branch type contains a Pair of Tree as and so we can extract that piece out of the type parametrically
data TreeG f a = Branch a (f (TreeG f a)) | Leaf
data Pair a = Pair a a
type Tree a = TreeG Pair a
This TreeG type constructor has kind (* -> *) -> * -> *. We can use it to make interesting other variations like a RoseTree
type RoseTree a = TreeG [] a
rose :: RoseTree Int
rose = Branch 3 [Branch 2 [Leaf, Leaf], Leaf, Branch 4 [Branch 4 []]]
Or pathological ones like a MaybeTree
data Empty a = Empty
type MaybeTree a = TreeG Empty a
nothing :: MaybeTree a
nothing = Leaf
just :: a -> MaybeTree a
just a = Branch a Empty
Or a TreeTree
-- Type synonyms like Tree can't be partially applied, so we expand it here
type TreeTree a = TreeG (TreeG Pair) a
treetree :: TreeTree Int
treetree = Branch 3 (Branch Leaf (Pair Leaf Leaf))
Another place this shows up is in "algebras of functors". If we drop a few layers of abstractness this might be better considered as a fold, such as sum :: [Int] -> Int. Algebras are parameterized over the functor and the carrier. The functor has kind * -> * and the carrier kind * so altogether
data Alg f a = Alg (f a -> a)
has kind (* -> *) -> * -> *. Alg is useful because of its relation to datatypes and recursion schemes built atop them.
-- | The "single-layer of an expression" functor has kind `(* -> *)`
data ExpF x = Lit Int
| Add x x
| Sub x x
| Mult x x
-- | The fixed point of a functor has kind `(* -> *) -> *`
data Fix f = Fix (f (Fix f))
type Exp = Fix ExpF
exp :: Exp
exp = Fix (Add (Fix (Lit 3)) (Fix (Lit 4))) -- 3 + 4
fold :: Functor f => Alg f a -> Fix f -> a
fold (Alg phi) (Fix f) = phi (fmap (fold (Alg phi)) f)
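To see fold in action, here is a hypothetical evaluation algebra for ExpF (evalAlg is an invented name for illustration; it relies on the derived Functor instance above):
evalAlg :: Alg ExpF Int
evalAlg = Alg phi
  where
    phi (Lit n)    = n
    phi (Add x y)  = x + y
    phi (Sub x y)  = x - y
    phi (Mult x y) = x * y

-- fold evalAlg exp == 7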
Finally, though they're theoretically possible, I've never seen an even higher-kinded type constructor. We sometimes see functions of that type, such as mask :: ((forall a. IO a -> IO a) -> IO b) -> IO b, but I think you'll have to dig into type prolog or the dependently typed literature to see that level of complexity in types.
Consider the Functor type class in Haskell, where f is a higher-kinded type variable:
class Functor f where
  fmap :: (a -> b) -> f a -> f b
What this type signature says is that fmap changes the type parameter of an f from a to b, but leaves f as it was. So if you use fmap over a list you get a list, if you use it over a parser you get a parser, and so on. And these are static, compile-time guarantees.
I don't know F#, but let's consider what happens if we try to express the Functor abstraction in a language like Java or C#, with inheritance and generics, but no higher-kinded generics. First try:
interface Functor<A> {
    <B> Functor<B> map(Function<A, B> f);
}
The problem with this first try is that an implementation of the interface is allowed to return any class that implements Functor. Somebody could write a FunnyList<A> implements Functor<A> whose map method returns a different kind of collection, or even something else that's not a collection at all but is still a Functor. Also, when you use the map method you can't invoke any subtype-specific methods on the result unless you downcast it to the type that you're actually expecting. So we have two problems:
The type system doesn't allow us to express the invariant that the map method always returns the same Functor subclass as the receiver.
Therefore, there's no statically type-safe manner to invoke a non-Functor method on the result of map.
There are other, more complicated ways you can try, but none of them really works. For example, you could try to augment the first try by defining subtypes of Functor that restrict the result type:
interface Collection<A> extends Functor<A> {
    <B> Collection<B> map(Function<A, B> f);
}

interface List<A> extends Collection<A> {
    <B> List<B> map(Function<A, B> f);
}

interface Set<A> extends Collection<A> {
    <B> Set<B> map(Function<A, B> f);
}

interface Parser<A> extends Functor<A> {
    <B> Parser<B> map(Function<A, B> f);
}
// …
This helps to forbid implementers of those narrower interfaces from returning the wrong type of Functor from the map method, but since there is no limit to how many Functor implementations you can have, there is no limit to how many narrower interfaces you'll need.
(EDIT: And note that this only works because Functor<B> appears as the result type, and so the child interfaces can narrow it. So AFAIK we can't narrow both uses of Monad<B> in the following interface:
interface Monad<A> {
    <B> Monad<B> flatMap(Function<? super A, ? extends Monad<? extends B>> f);
}
In Haskell, with higher-kinded type variables, this is (>>=) :: Monad m => m a -> (a -> m b) -> m b.)
Yet another try is to use recursive generics to try and have the interface restrict the result type of the subtype to the subtype itself. Toy example:
/**
 * A semigroup is a type with a binary associative operation. Law:
 *
 * > x.append(y).append(z) = x.append(y.append(z))
 */
interface Semigroup<T extends Semigroup<T>> {
    T append(T arg);
}

class Foo implements Semigroup<Foo> {
    // Since this implements Semigroup<Foo>, now this method must accept
    // a Foo argument and return a Foo result.
    Foo append(Foo arg);
}

class Bar implements Semigroup<Bar> {
    // Any of these is a compilation error:
    Semigroup<Bar> append(Semigroup<Bar> arg);
    Semigroup<Foo> append(Bar arg);
    Semigroup append(Bar arg);
    Foo append(Bar arg);
}
But this sort of technique (which is rather arcane to your run-of-the-mill OOP developer, heck to your run-of-the-mill functional developer as well) still can't express the desired Functor constraint either:
interface Functor<FA extends Functor<FA, A>, A> {
    <FB extends Functor<FB, B>, B> FB map(Function<A, B> f);
}
The problem here is this doesn't restrict FB to have the same F as FA—so that when you declare a type List<A> implements Functor<List<A>, A>, the map method can still return a NotAList<B> implements Functor<NotAList<B>, B>.
Final try, in Java, using raw types (unparametrized containers):
interface FunctorStrategy<F> {
    F map(Function f, F arg);
}
Here F will get instantiated to unparametrized types like just List or Map. This guarantees that a FunctorStrategy<List> can only return a List—but you've abandoned the use of type variables to track the element types of the lists.
The heart of the problem here is that languages like Java and C# don't allow type parameters to have parameters. In Java, if T is a type variable, you can write T and List<T>, but not T<String>. Higher-kinded types remove this restriction, so that you could have something like this (not fully thought out):
interface Functor<F, A> {
    <B> F<B> map(Function<A, B> f);
}

class List<A> implements Functor<List, A> {
    // Since F := List, F<B> := List<B>
    <B> List<B> map(Function<A, B> f) {
        // ...
    }
}
And addressing this bit in particular:
(I think) I get that instead of myList |> List.map f or myList |> Seq.map f |> Seq.toList higher kinded types allow you to simply write myList |> map f and it'll return a List. That's great (assuming it's correct), but seems kind of petty? (And couldn't it be done simply by allowing function overloading?) I usually convert to Seq anyway and then I can convert to whatever I want afterwards.
There are many languages that generalize the idea of the map function this way, by modeling it as if, at heart, mapping is about sequences. This remark of yours is in that spirit: if you have a type that supports conversion to and from Seq, you get the map operation "for free" by reusing Seq.map.
In Haskell, however, the Functor class is more general than that; it isn't tied to the notion of sequences. You can implement fmap for types that have no good mapping to sequences, like IO actions, parser combinators, functions, etc.:
instance Functor IO where
  fmap f action =
    do x <- action
       return (f x)

-- This declaration is just to make things easier to read for non-Haskellers
newtype Function a b = Function (a -> b)

instance Functor (Function a) where
  fmap f (Function g) = Function (f . g) -- `.` is function composition
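As a quick sanity check on that last instance, mapping over a "function container" is just post-composition (a hypothetical GHCi session):
ghci> let Function h = fmap (+ 1) (Function (* 2))
ghci> h 10
21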
The concept of "mapping" really isn't tied to sequences. It's best to understand the functor laws:
(1) fmap id xs == xs
(2) fmap f (fmap g xs) = fmap (f . g) xs
Very informally:
The first law says that mapping with an identity/noop function is the same as doing nothing.
The second law says that any result that you can produce by mapping twice, you can also produce by mapping once.
This is why you want fmap to preserve the type—because as soon as you get map operations that produce a different result type, it becomes much, much harder to make guarantees like this.
I don't want to repeat information in some excellent answers already here, but there's a key point I'd like to add.
You usually don't need higher-kinded types to implement any one particular monad, or functor (or applicative functor, or arrow, or ...). But doing so is mostly missing the point.
In general I've found that when people don't see the usefulness of functors/monads/whatevers, it's often because they're thinking of these things one at a time. Functor/monad/etc operations really add nothing to any one instance (instead of calling bind, fmap, etc I could just call whatever operations I used to implement bind, fmap, etc). What you really want these abstractions for is so you can have code that works generically with any functor/monad/etc.
In a context where such generic code is widely used, this means any time you write a new monad instance your type immediately gains access to a large number of useful operations that have already been written for you. That's the point of seeing monads (and functors, and ...) everywhere; not so that I can use bind rather than concat and map to implement myFunkyListOperation (which gains me nothing in itself), but rather so that when I come to need myFunkyParserOperation and myFunkyIOOperation I can re-use the code I originally saw in terms of lists because it's actually monad-generic.
But to abstract across a parameterised type like a monad with type safety, you need higher-kinded types (as well explained in other answers here).
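As a tiny illustration of such monad-generic code in Haskell: a helper like the one below is written once and immediately works for lists, Maybe, IO, parsers, and every other monad (pairM is an invented name):
pairM :: Monad m => m a -> m b -> m (a, b)
pairM ma mb = do
  a <- ma
  b <- mb
  return (a, b)

-- pairM [1, 2] "ab"         == [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
-- pairM (Just 1) (Just 'x') == Just (1, 'x')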
For a more .NET-specific perspective, I wrote a blog post about this a while back. The crux of it is, with higher-kinded types, you could potentially reuse the same LINQ blocks between IEnumerables and IObservables, but without higher-kinded types this is impossible.
The closest you could get (I figured out after posting the blog) is to make your own IEnumerable&lt;T&gt; and IObservable&lt;T&gt; and extend them both from an IMonad&lt;T&gt;. This would allow you to reuse your LINQ blocks if they're typed against IMonad&lt;T&gt;, but then it's no longer typesafe, because it lets you mix and match IObservables and IEnumerables within the same block. While enabling that may sound intriguing, you'd basically just get undefined behavior.
I wrote a later post on how Haskell makes this easy. (A no-op, really--restricting a block to a certain kind of monad requires code; enabling reuse is the default).
The most-used example of higher-kinded type polymorphism in Haskell is the Monad interface. Functor and Applicative are higher-kinded in the same way, so I'll show Functor in order to show something concise.
class Functor f where
  fmap :: (a -> b) -> f a -> f b
Now, examine that definition, looking at how the type variable f is used. You'll see that f can't stand for a type that has values. You can identify values in that type signature because they're arguments to and results of functions. So the type variables a and b are types that can have values. So are the type expressions f a and f b. But not f itself. f is an example of a higher-kinded type variable. Given that * is the kind of types that can have values, f must have the kind * -> *. That is, it takes a type that can have values, because we know from previous examination that a and b must have values. And we also know that f a and f b must have values, so it returns a type that must have values.
This makes the f used in the definition of Functor a higher-kinded type variable.
The Applicative and Monad interfaces add more, but they're compatible. This means that they work on type variables with kind * -> * as well.
Working on higher-kinded types introduces an additional level of abstraction - you aren't restricted to just creating abstractions over basic types. You can also create abstractions over types that modify other types.
Why might you care about Applicative? Because of traversals.
class (Functor t, Foldable t) => Traversable t where
  traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
type Traversal s t a b = forall f. Applicative f => (a -> f b) -> s -> f t
Once you've written a Traversable instance, or a Traversal for some type, you can use it for an arbitrary Applicative.
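For example, instantiating traverse at the Maybe applicative instantly gives "all-or-nothing" mapping over a list (a hypothetical GHCi session):
ghci> traverse (\x -> if x > 0 then Just x else Nothing) [1, 2, 3]
Just [1,2,3]
ghci> traverse (\x -> if x > 0 then Just x else Nothing) [1, -2, 3]
Nothing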
Why might you care about Monad? One reason is streaming systems like pipes, conduit, and streaming. These are entirely non-trivial systems for working with effectful streams. With the Monad class, we can reuse all that machinery for whatever we like, rather than having to rewrite it from scratch each time.
Why else might you care about Monad? Monad transformers. We can layer monad transformers however we like to express different ideas. The uniformity of Monad is what makes this all work.
What are some other interesting higher-kinded types? Let's say ... Coyoneda. Want to make repeated mapping fast? Use
data Coyoneda f a = forall x. Coyoneda (x -> a) (f x)
This works for any functor f passed to it. No higher-kinded types? You'll need a custom version of this for each functor. This is a pretty simple example, but there are much trickier ones you might not want to have to rewrite every time.
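A minimal sketch of why this works: the Functor instance only composes the stored function, deferring all real work until you convert back at the end (lowerCoyoneda is written out here for illustration):
{-# LANGUAGE ExistentialQuantification #-}

instance Functor (Coyoneda f) where
  fmap g (Coyoneda h fx) = Coyoneda (g . h) fx -- O(1): just function composition

lowerCoyoneda :: Functor f => Coyoneda f a -> f a
lowerCoyoneda (Coyoneda h fx) = fmap h fx -- a single real fmap at the end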
I recently started learning a bit about higher-kinded types. Although it's an interesting idea to be able to have a generic that needs another generic, apart from library developers I do not see any practical use in any real app. I use Scala in business apps, and I have also seen and studied the code of some nicely designed systems and libraries like Kafka, Akka, and some financial apps. Nowhere did I find any higher-kinded type in use.
It seems like they are nice for academia or similar, but the market doesn't need them, or hasn't reached a point where HKT has any practical use or proves to be better than other existing techniques. To me it's something you can use to impress others or write blog posts about, but nothing more than that. It's like the multiverse or string theory: looks nice on paper, gives you hours to speak about, but nothing real
(sorry if you don't have any interest in theoretical physics). One piece of evidence is that all the answers above, while they all brilliantly describe the mechanics, fail to cite one true real-world case where we would need it, despite the fact that it's been 6+ years since the OP posted the question.
A discussion came up at work recently about Sets, which in Scala support the zip method and how this can lead to bugs, e.g.
scala> val words = Set("one", "two", "three")
scala> words zip (words map (_.length))
res1: Set[(java.lang.String, Int)] = Set((one,3), (two,5))
I think it's pretty clear that Sets shouldn't support a zip operation, since the elements are not ordered. However, it was suggested that the problem is that Set isn't really a functor, and shouldn't have a map method. Certainly, you can get yourself into trouble by mapping over a set. Switching to Haskell now,
data AlwaysEqual a = Wrap { unWrap :: a }

instance Eq (AlwaysEqual a) where
  _ == _ = True

instance Ord (AlwaysEqual a) where
  compare _ _ = EQ
and now in ghci
ghci> import Data.Set as Set
ghci> let nums = Set.fromList [1, 2, 3]
ghci> Set.map unWrap $ Set.map Wrap $ nums
fromList [3]
ghci> Set.map (unWrap . Wrap) nums
fromList [1, 2, 3]
So Set fails to satisfy the functor law
fmap f . fmap g = fmap (f . g)
It can be argued that this is not a failing of the map operation on Sets, but a failing of the Eq instance that we defined, because it doesn't respect the substitution law, namely that for two instances of Eq on A and B and a mapping f : A -> B then
if x == y (on A) then f x == f y (on B)
which doesn't hold for AlwaysEqual (e.g. consider f = unWrap).
Is the substitution law a sensible law for the Eq type that we should try to respect? Certainly, other equality laws are respected by our AlwaysEqual type (symmetry, transitivity and reflexivity are trivially satisfied), so substitution is the only place where we can get into trouble.
To me, substitution seems like a very desirable property for the Eq class. On the other hand, some comments on a recent Reddit discussion include
"Substitution seems stronger than necessary, and is basically quotienting the type, putting requirements on every function using the type."
-- godofpumpkins
"I also really don't want substitution/congruence since there are many legitimate uses for values which we want to equate but are somehow distinguishable."
-- sclv
"Substitution only holds for structural equality, but nothing insists Eq is structural."
-- edwardkmett
These three are all pretty well known in the Haskell community, so I'd be hesitant to go against them and insist on substitability for my Eq types!
Another argument against Set being a Functor - it is widely accepted that being a Functor allows you to transform the "elements" of a "collection" while preserving the shape. For example, this quote on the Haskell wiki (note that Traversable is a generalization of Functor)
"Where Foldable gives you the ability to go through the structure processing the elements but throwing away the shape, Traversable allows you to do that whilst preserving the shape and, e.g., putting new values in."
"Traversable is about preserving the structure exactly as-is."
and in Real World Haskell
"...[A] functor must preserve shape. The structure of a collection should not be affected by a functor; only the values that it contains should change."
Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.
But it seems as though Sets really should be functors (ignoring the Ord requirement for the moment - I see that as an artificial restriction imposed by our desire to work efficiently with sets, not an absolute requirement for any set. For example, sets of functions are a perfectly sensible thing to consider. In any case, Oleg has shown how to write efficient Functor and Monad instances for Set that don't require an Ord constraint). There are just too many nice uses for them (the same is true for the non-existent Monad instance).
Can anyone clear up this mess? Should Set be a Functor? If so, what does one do about the potential for breaking the Functor laws? What should the laws for Eq be, and how do they interact with the laws for Functor and the Set instance in particular?
Another argument against Set being a Functor - it is widely accepted that being a Functor allows you to transform the "elements" of a "collection" while preserving the shape. [...] Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.
I'm afraid that this is a case of taking the "shape" analogy as a defining condition when it is not. Mathematically speaking, there is such a thing as the power set functor. From Wikipedia:
Power sets: The power set functor P : Set → Set maps each set to its power set and each function f : X → Y to the map which sends U ⊆ X to its image f(U) ⊆ Y.
The function P(f) (fmap f in the power set functor) does not preserve the size of its argument set, yet this is nonetheless a functor.
If you want an ill-considered intuitive analogy, we could say this: in a structure like a list, each element "cares" about its relationship to the other elements, and would be "offended" if a false functor were to break that relationship. But a set is the limiting case: a structure whose elements are indifferent to each other, so there is very little you can do to "offend" them; the only offense is if a false functor maps a set containing some element to a result that doesn't include that element's "voice" at all.
(Ok, I'll shut up now...)
EDIT: I truncated the following bits when I quoted you at the top of my answer:
For example, this quote on the Haskell wiki (note that Traversable is a generalization of Functor)
"Where Foldable gives you the ability to go through the structure processing the elements but throwing away the shape, Traversable allows you to do that whilst preserving the shape and, e.g., putting new values in."
"Traversable is about preserving the structure exactly as-is."
Here I'd remark that Traversable is a kind of specialized Functor, not a "generalization" of it. One of the key facts about any Traversable (or, actually, about Foldable, which Traversable extends) is that it requires that the elements of any structure have a linear order—you can turn any Traversable into a list of its elements (with Foldable.toList).
Another, less obvious fact about Traversable is that the following functions exist (adapted from Gibbons & Oliveira, "The Essence of the Iterator Pattern"):
-- | A "shape" is a Traversable structure with "no content,"
-- i.e., () at all locations.
type Shape t = t ()
-- | "Contents" without a shape are lists of elements.
type Contents a = [a]
shape :: Traversable t => t a -> Shape t
shape = fmap (const ())
contents :: Traversable t => t a -> Contents a
contents = Foldable.toList
-- | This function reconstructs any Traversable from its Shape and
-- Contents. Law:
--
-- > reassemble (shape xs) (contents xs) == Just xs
--
-- See Gibbons & Oliveira for implementation. Or do it as an exercise.
-- Hint: use the State monad...
--
reassemble :: Traversable t => Shape t -> Contents a -> Maybe (t a)
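Since the implementation is left as an exercise above, here is one way it can go, following the State-monad hint (a sketch, not necessarily the paper's exact code):
import Control.Monad.State

reassemble :: Traversable t => Shape t -> Contents a -> Maybe (t a)
reassemble sh xs = case runState (traverse refill sh) xs of
    (t, []) -> sequenceA t -- every hole must have been filled...
    _       -> Nothing     -- ...and no contents may be left over
  where
    refill () = state $ \supply -> case supply of
      (y:ys) -> (Just y, ys)
      []     -> (Nothing, [])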
A Traversable instance for sets would violate the proposed law, because all non-empty sets would have the same Shape—the set whose Contents is [()]. From this it should be easy to prove that whenever you try to reassemble a set you would only ever get the empty set or a singleton back.
Lesson? Traversable "preserves shape" in a very specific, stronger sense than Functor does.
Set is "just" a functor (not a Functor) from the subcategory of Hask where Eq is "nice" (i.e. the subcategory where congruence, substitution, holds). If constraint kinds were around from way back then perhaps set would be a Functor of some kind.
Well, Set can be treated as a covariant functor, and also as a contravariant functor; usually it is treated as a covariant functor. And for it to behave well with respect to equality, one has to make sure that, whatever the implementation, it does.
Regarding Set.zip - it is nonsense, as is Set.head (which you have in Scala). Neither should exist.
I'm having a hard time understanding higher-kinded vs higher-rank types. Kind is pretty simple (thanks to the Haskell literature for that), and I used to think rank was like kind when talking about types, but apparently not! I read the Wikipedia article to no avail. So can someone please explain: what is a rank? What is meant by higher rank? By higher-rank polymorphism? And how does that relate to kinds (if at all)? Comparing Scala and Haskell would be awesome too.
The concept of rank is not really related to the concept of kinds.
The rank of a polymorphic type system describes where foralls may appear in types. In a rank-1 type system foralls may only appear at the outermost level, in a rank-2 type system they may appear at one level of nesting and so on.
So for example forall a. Show a => (a -> String) -> a -> String would be a rank-1 type and forall a. Show a => (forall b. Show b => b -> String) -> a -> String would be a rank-2 type. The difference between those two types is that in the first case, the first argument to the function can be any function that takes one showable argument and returns a String. So a function of type Int -> String would be a valid first argument (like a hypothetical function intToString), so would a function of type forall a. Show a => a -> String (like show). In the second case only a function of type forall a. Show a => a -> String would be a valid argument, i.e. show would be okay, but intToString wouldn't be. As a consequence the following function would be a legal function of the second type, but not the first (where ++ is supposed to represent string concatenation):
higherRankedFunction(f, x) = f("hello") ++ f(x) ++ f(42)
Note that here the function f is applied to (potentially) three different types of arguments. So if f were the function intToString this would not work.
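In GHC-flavoured Haskell the example looks like this (a sketch; higherRankedFunction is the name from the pseudocode above, and the forall in argument position is what makes the type rank-2):
{-# LANGUAGE RankNTypes #-}

higherRankedFunction :: Show a => (forall b. Show b => b -> String) -> a -> String
higherRankedFunction f x = f "hello" ++ f x ++ f (42 :: Int)

-- higherRankedFunction show True == "\"hello\"True42"
-- (show quotes the string literal, hence the escaped quotes)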
Both Haskell and Scala are rank-1 by default (so the above function cannot be written directly in those languages). But GHC contains a language extension to enable rank-2 polymorphism and another one to enable rank-n polymorphism for arbitrary n.