In what way is Scala's Option fold a catamorphism?

The answer to this question suggests that the fold method on Option in Scala is a catamorphism. According to Wikipedia, a catamorphism is "the unique homomorphism from an initial algebra into some other algebra. The concept has been applied to functional programming as folds". That seems fair, but it leads me to an initial algebra, which is the initial object in the category of F-algebras.
So if the fold on Option really is a catamorphism, there needs to be some functor F to create the category of F-algebras in which Option would be the initial object. I can't figure out what this functor would be.
For Lists of type A the functor F is F[X] = 1 + A * X. This makes sense because List is a recursive data type, so if X is List[A] then the above reads that a list of type A is either the empty list (1), or (+) a pair (*) of an A and a List[A]. But Option isn't recursive. Option[A] would just be 1 + A (Nothing or an A). So I don't see where the functor is.
Just to be clear, I realize that Option is already a functor, in that it takes A to Option[A], but what is done for lists is different: the A is fixed and the functor is used to describe how to recursively construct the data type.
On a related note, if it is not a catamorphism it probably shouldn't be called a fold, as that leads to some confusion.

Well, the comments are on the right track. I'm just a beginner so I probably have some misconceptions. Yes, the whole point is to be able to model recursive types, but I think nothing precludes a "non-recursive" F-algebra, since the initial algebra is the "least fixed point" solution to the equation X ~= F X. In the case of Option, the solution is trivial, as there's no recursion involved :)
Other examples of initial algebras:
List[X] = 1 + A * X to represent list = Nil | Cons a list
Tree[X] = A + A * X * X to represent tree = Leaf a | Node a tree tree
In the same way:
Option[X] = 1 + A to represent option = None | Some a
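To make that concrete, here is a minimal Scala sketch (the name cata is mine, not a standard library method): an algebra for F[X] = 1 + A is just a default value plus a function, and Option's fold is exactly the map such an algebra induces.
def cata[A, B](onNone: => B)(onSome: A => B)(oa: Option[A]): B =
  oa match {
    case None    => onNone    // the "1" case of F[X] = 1 + A
    case Some(a) => onSome(a) // the "A" case
  }

// The standard library's fold has the same shape:
//   Some(3).fold(0)(_ + 1)             == cata(0)((n: Int) => n + 1)(Some(3)) == 4
//   (None: Option[Int]).fold(0)(_ + 1) == 0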
The justification for the existence of a "constant" functor is pretty easy: how do you represent a tree's node?
In fact, to algebraically model (simple) recursive datatypes you need only the following functors:
U (Unit, represents empty)
K (Constant, captures a value)
I (Identity, represent the recursive position)
* (product)
+ (coproduct)
A good reference I found is Functional Generic Programming
Shameless plug: I'm playing with those concepts in code in scala-reggen

Related

How does aggregate generalise fold and fold generalise reduce?

As far as I understand aggregate is a generalisation of fold which in turn is a generalisation of reduce.
Similarly, combineByKey is a generalisation of aggregateByKey, which in turn is a generalisation of foldByKey, which in turn is a generalisation of reduceByKey.
However, I have trouble finding simple examples for each of those seven methods which can only be expressed with them and not with their less general versions. For example I found http://blog.madhukaraphatak.com/spark-rdd-fold/ giving an example for fold, but I have been able to use reduce in the same situation as well.
What I found out so far:
I read that the more generalised methods can be more efficient, but that would be a non-functional requirement and I would like to get examples which can not be implemented with the more specific method.
I also read that e.g. the function passed to fold only has to be associative, while the one for reduce has to be commutative additionally: https://stackoverflow.com/a/25158790/4533188 (However, I still don't know any good simple example.) whereas in https://stackoverflow.com/a/26635928/4533188 I read that fold needs both properties to hold...
We could think of the zero value as a feature (e.g. for fold over reduce) as in "add all elements and add 3" and using 3 as the zero value, but that would be misleading, because 3 would be added for each partition, not just once. Also this is simply not the purpose of fold as far as I understood - it wasn't meant as a feature, but as a necessity to implement it to be able to take non-commutative functions.
What would simple examples for those seven methods be?
Let's work through what is actually needed logically.
First, note that if your collection is unordered, any set of (binary) operations on it need to be both commutative and associative, or you'll get different answers depending on which (arbitrary) order you choose each time. Since reduce, fold, and aggregate all use binary operations, if you use these things on a collection that is unordered (or is viewed as unordered), everything must be commutative and associative.
reduce is an implementation of the idea that if you can take two things and turn them into one thing, you can collapse an arbitrarily long collection into a single element. Associativity is exactly the property that it doesn't matter how you pair things up as long as you eventually pair them all and keep the left-to-right order unchanged, so that's exactly what you need.
a b c d               a b c d                 a b c d
a # b c d             a # b c d               a b # c d
(a#b) c # d           (a#b) # c d             a (b#c) d
(a#b) # (c#d)         ((a#b)#c) # d           a # ((b#c)#d)
All of the above are the same as long as the operation (here called #) is associative. There is no reason to swap around which things go on the left and which go on the right, so the operation does not need to be commutative (addition is: a+b == b+a; concat is not: ab != ba).
reduce is mathematically simple and requires only an associative operation
Reduce is limited, though, in that it doesn't work on empty collections, and in that you can't change the type. If you're working sequentially, you can use a function that takes a new type and the old type, and produces something with the new type. This is a sequential fold (left-fold if the new type goes on the left, right-fold if it goes on the right). There is no choice about the order of operations here, so commutativity and associativity and everything are irrelevant. There's exactly one way to work through your list sequentially. (If you want your left-fold and right-fold to always be the same, then the operation must be associative and commutative, but since left- and right-folds don't generally get accidentally swapped, this isn't very important to ensure.)
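As a small illustrative sketch (plain Scala collections, not Spark): a sequential fold can change the result type, while reduce cannot, and reduce fails on an empty collection.
val xs = List(1, 2, 3)
val asString: String = xs.foldLeft("")((acc, x) => acc + x.toString)  // "123": new type on the left
val sum: Int = xs.reduce(_ + _)                                       // 6: result type = element type
// List.empty[Int].reduce(_ + _)                                      // throws UnsupportedOperationException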
The problem comes when you want to work in parallel. You can't sequentially go through your collection; that's not parallel by definition! So you have to insert the new type at multiple places! Let's call our fold operation #, and we'll say that the new type goes on the left. Furthermore, we'll say that we always start with the same element, Z. Now we could do any of the following (and more):
a b c d               a b c d                   a b c d
Z#a b c d             Z#a b   Z#c d             Z#a Z#b Z#c Z#d
(Z#a) # b c d         (Z#a) # b   (Z#c) # d
((Z#a)#b) # c d
(((Z#a)#b)#c) # d
Now we have a collection of one or more things of the new type. (If the original collection was empty, we just take Z.) We know what to do with that! Reduce! So we make a reduce operation for our new type (let's call it $, and remember it has to be associative), and then we have aggregate:
a b c d               a b c d                    a b c d
Z#a b c d             Z#a b   Z#c d              Z#a Z#b Z#c Z#d
(Z#a) # b c d         (Z#a) # b   (Z#c) # d      Z#a $ Z#b   Z#c $ Z#d
((Z#a)#b) # c d       ((Z#a)#b) $ ((Z#c)#d)      ((Z#a)$(Z#b)) $ ((Z#c)$(Z#d))
(((Z#a)#b)#c) # d
Now, these things all look really different. How can we make sure that they end up to be the same? There is no single concept that describes this, but the Z# operation has to be zero-like and $ and # have to be homomorphic, in that we need (Z#a)#b == (Z#a)$(Z#b). That's the actual relationship that you need (and it is technically very similar to a semigroup homomorphism). There are all sorts of ways to pick badly even if everything is associative and commutative. For example, if Z is the double value 0.0 and # is actually +, then Z is zero-like and # is associative and commutative. But if $ is actually *, which is also associative and commutative, everything goes wrong:
(0.0+2) * (0.0+3) == 2.0 * 3.0 == 6.0
((0.0+2) + 3) == 2.0 + 3 == 5.0
One example of a non-trivial aggregate is building a collection, where # is the "append an element" operator and $ is the "concat two collections" operation.
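In Spark that example might look like the following sketch (assuming a SparkContext named sc is in scope):
val rdd = sc.parallelize(Seq(1, 2, 3, 4))
val asList: List[Int] = rdd.aggregate(List.empty[Int])(
  (acc, x) => x :: acc,   // #: fold one element into the partition-local List (here by prepending)
  (l, r)  => l ::: r      // $: concat the accumulators of two partitions
)
// reduce could not do this: its result type must be the element type (Int).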
aggregate is tricky and requires an associative reduce operation, plus a zero-like value and a fold-like operation that is homomorphic to the reduce
The bottom line is that aggregate is not simply a generalization of reduce.
But there is a simplification (a less general form) if you're not actually changing the type. If Z is actually z and is an actual zero, we can just stick it in wherever we want and use reduce. Again, we don't need commutativity conceptually; we just stick in one or more z's and reduce, and our # and $ operations can be the same thing, namely the original # we used on the reduce:
a  b  c  d              ()  <- empty
z#a   z#b               z
z#a   (z#b)#c
z#a   ((z#b)#c)#d
(z#a)#((z#b)#c)#d
If we just delete the z's from here, it works perfectly well, and in fact is equivalent to if (empty) z else reduce. But there's another way it could work too. If the operation # is also commutative, and z is not actually a zero but just occupies a fixed point of # (meaning z#z == z but z#a is not necessarily just a), then you can run the same thing, and since commutativity lets you switch the order around, you conceptually can reorder all the z's together at the beginning and then merge them all together.
And this is a parallel fold, which is really a rather different beast than a sequential fold.
(Note that neither fold nor aggregate are strictly generalizations of reduce even for unordered collections where operations have to be associative and commutative, as some operations do not have a sensible zero! For instance, reducing strings by shortest length has as its "zero" the longest possible string, which conceptually doesn't exist, and practically is an absurd waste of memory.)
fold requires an associative reduce operation plus either a zero value or a reduce operation that's commutative plus a fixed-point value
Now, when would you ever use a parallel fold that wasn't just a reduceOrElse(zero)? Probably never, actually, though they can exist. For example, if you have a ring, you often have fixed points of the type we need. For instance, 10 % 45 == (10*10) % 45, and * is both associative and commutative in integers mod 45. Thus, if our collection is numbers mod 45, we can fold with a "zero" of 10 and an operation of *, and parallelize however we please while still getting the same result. Pretty weird.
However, note that you can just plug the zero and operation of fold into aggregate and get exactly the same result, so aggregate is a proper generalization of fold.
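A quick sketch of that last point in Spark terms (again assuming a SparkContext sc): fold is aggregate with the same operation used for both # and $, and no type change.
val nums = sc.parallelize(Seq(1, 2, 3, 4))
val foldResult = nums.fold(0)(_ + _)               // 10
val aggResult  = nums.aggregate(0)(_ + _, _ + _)   // 10: same zero, same operation twice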
So, bottom line:
Reduce requires only an associative merge operation, but doesn't change the type, and doesn't work on empty collections.
Parallel fold tries to extend reduce but requires a true zero, or a fixed point and the merge operation must be commutative.
Aggregate changes the type by (conceptually) running sequential folds followed by a (parallel) reduce, but there are complex relationships between the reduce operation and the fold operation--basically they have to be doing "the same thing".
An unordered collection (e.g. a set) always requires an associative and commutative operation for any of the above.
With regard to the byKey stuff: it's just the same as this, except it only applies it to the collection of values associated with a (potentially repeated) key.
If Spark actually requires commutativity where the above analysis does not suggest it's needed, one could reasonably consider that a bug (or at least an unnecessary limitation of the implementation, given that operations like map and filter preserve order on ordered RDDs).
the function passed to fold only has to be associative, while the one for reduce has to be commutative additionally.
It is not correct. fold on RDDs requires the function to be commutative as well. It is not the same operation as fold on Iterable, which is pretty well described in the official documentation:
This behaves somewhat differently from fold operations implemented for non-distributed collections in functional languages like Scala. This fold operation may be applied to partitions individually, and then fold those results into the final result, rather than apply the fold to each element sequentially in some defined ordering. For functions that are not commutative, the result may differ from that of a fold applied to a non-distributed collection.
As you can see, the order of merging partial values is not part of the contract, hence the function which is used for fold has to be commutative.
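A hedged illustration of why that matters (assuming a SparkContext sc): string concatenation is associative but not commutative, so RDD.fold may give different results from run to run, while the fold on a local collection is deterministic.
val words = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 4)
words.fold("")(_ + _)                    // might be "abcd", might be "cdab", ...
Seq("a", "b", "c", "d").fold("")(_ + _)  // always "abcd"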
I read that the more generalised methods can be more efficient
Technically speaking there should be no significant difference. For fold vs reduce you can check my answers to reduce() vs. fold() in Apache Spark and Why is the fold action necessary in Spark?
Regarding the *byKey methods, all are implemented using the same basic construct, combineByKeyWithClassTag, and can be reduced to three simple operations:
createCombiner - create "zero" value for a given partition
mergeValue - merge values into accumulator
mergeCombiners - merge accumulators created for each partition.
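For example, aggregateByKey can be expressed with those three pieces via combineByKey; a rough sketch (assuming a SparkContext sc):
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

val viaAggregateByKey = pairs.aggregateByKey(List.empty[Int])(
  (acc, v) => v :: acc,                     // mergeValue: fold a value into the accumulator
  (l, r)  => l ::: r                        // mergeCombiners: merge per-partition accumulators
)

val viaCombineByKey = pairs.combineByKey(
  (v: Int) => List(v),                      // createCombiner: accumulator from the first value
  (acc: List[Int], v: Int) => v :: acc,     // mergeValue
  (l: List[Int], r: List[Int]) => l ::: r   // mergeCombiners
)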

Is there any connection between the contravariance of the Hom functor and Scala's Function1?

The Hom functor Hom(-,-) is contravariant in the first argument and covariant in the second.
Can this fact somehow offer another explanation why Scala's Function1[-T1, +R] has the same property?
I have seen this claim made for example here, but at the point where the connection between the two concepts was supposed to be explained, there was so much hand waving it blew me away.
There are two categories of Scala types.
One is the usual types-and-functions category, where types are objects and arrows are functions.
The other one is the types-and-subtyping category, where types are objects and subtyping relationships are arrows. This category is a poset.
Covariance and contravariance in Scala is precisely covariance and contravariance of endofunctors in this latter category.
Now the second category happens to be a subcategory of the first one, due to projection arrows that map subtypes to supertypes. These arrows of the first category are exactly (all) the arrows of the second category. So every covariant endofunctor of the first category is naturally (that is, via a natural transformation) a covariant endofunctor of the second category.
Indeed, if a functor F maps A to A' and B to B' and every arrow f: A -> B to an arrow f': A' -> B', and if A is a subtype of B, then the projection arrow prj_A,B is mapped to a projection arrow prj_A',B', and if one exists, then A' is a subtype of B'. Same thing about contravariant functors.
Now it only remains to see that Function1 is in a sense the Hom functor. Indeed if we see a Scala type as a set of its values, then Function1[A,B] is a set of morphisms (Scala functions) from A to B. The arrow mapping is given by composition. And since it's covariant (contravariant) in the first category, it must be also covariant (contravariant) in the second category.
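As a concrete Scala illustration of the variance half of this (the types Animal and Cat below are just hypothetical examples, with Cat a subtype of Animal):
class Animal
class Cat extends Animal

// Function1[-T1, +R] is contravariant in the argument ...
val describeAnimal: Animal => String = _ => "some animal"
val describeCat: Cat => String = describeAnimal   // ok: Function1[Animal, String] <: Function1[Cat, String]

// ... and covariant in the result.
val makeCat: String => Cat = _ => new Cat
val makeAnimal: String => Animal = makeCat        // ok: Function1[String, Cat] <: Function1[String, Animal]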
Edit: corrected subtype/supertype confusion.
Disclaimer: I've never studied category theory. I may or may not know what I'm talking about.
Probably not if you're careful about the setup. Function1 is very analogous to (the object part of) the Hom functor -- except that its target is not quite the same category. The target of the Function1 mapping is (a subcategory of) the scala category that has Scala types as objects and functions as arrows; while the target of the Hom functor is (a subcategory of) SET. Their images are probably isomorphic, but it's not clear that combining the two functors and the isomorphism preserves the structure in the way you need for variance to be preserved across the whole chain.

Are side effects everything that cannot be found in a pure function?

Is it safe to say that the following dichotomy holds:
Each given function is
either pure
or has side effects
If so, side effects (of a function) are anything that can't be found in a pure function.
This very much depends on the definitions that you choose. It is definitely fair to say that a function is pure or impure. A pure function always returns the same result and does not modify the environment. An impure function can return different results when it is executed repeatedly (which can be caused by doing something to the environment).
Are all impurities side-effects? I would not say so - a function can depend on something in the environment in which it executes. This could be reading some configuration, GPS location or reading data from the internet. These are not really "side-effects" because the function does not do anything to the world.
I think there are two different kinds of impurities:
Output impurity is when a function does something to the world. In Haskell, this is modelled using monads - an impure function a -> b is actually a function a -> M b where M captures the other things that it does to the world.
Input impurity is when a function requires something from the environment. An impure function a -> b can be modelled as a function C a -> b where the type C captures other things from the environment that the function may access.
Monads and output impurities are certainly better known, but I think input impurities are equally important. I wrote my PhD thesis about input impurities, which I call coeffects, so this might be a biased answer though.
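A rough Scala rendering of the two shapes (the names Env and the log type here are illustrative, not a standard API):
// Output impurity: a => M[b]; here M pairs the result with what the function
// did to the world (a crude Writer).
def increment(x: Int): (Int, List[String]) =
  (x + 1, List(s"incremented $x"))

// Input impurity: C[a] => b; here C pairs the argument with what the function
// needs from the environment (a crude Reader, i.e. a coeffect).
final case class Env(config: Map[String, String])
def timeout(env: Env)(key: String): Int =
  env.config.getOrElse(key, "30").toInt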
For a function to be pure it must:
not be affected by side-effects (i.e. always return same value for same arguments)
not cause side-effects
But, you see, this defines functional purity in terms of the absence of side-effects. You are trying to apply backwards logic to deduce the definition of side-effects using pure functions, which logically should work, but practically the definition of a side-effect has nothing to do with functional purity.
I don't see problems with the definition of a pure function: a pure function is a function. I.e. it has a domain, a codomain and maps the elements of the former to the elements of the latter. It's defined on all inputs. It doesn't do anything to the environment, because "the environment" at this point doesn't exist: there are no machines that can execute (for some definition of "execute") a given function. There is just a total mapping from something to something.
Then some capitalist decides to invade the world of well-defined functions and enslave such pure creatures, but their fair essence can't survive in our cruel reality, functions become dirty and start to make the CPU warmer.
So it's the environment that is responsible for making the CPU warmer, and it makes perfect sense to talk about purity before its owner was abused and executed. In the same way referential transparency is a property of a language — it doesn't hold in the environment in general, because there can be a bug in the compiler or a meteorite can fall upon your head and your program will stop producing the same result.
But there are other creatures: the dark inhabitants of the underworld. They look like functions, but they are aware of the environment and can interact with it: read variables, send messages and launch missiles. We call these fallen relatives of functions "impure" or "effectful" and avoid them as much as possible, because their nature is so dark that it's impossible to reason about them.
So there is clearly a big difference between those functions which can interact with the outside and those which don't. However, the definition of "outside" can vary too. The State monad is modeled using only pure tools, but we think about f : Int -> State Int Int as an effectful computation. Moreover, non-termination and exceptions (error "...") are effects, but Haskellers usually don't consider them so.
Summarizing, a pure function is a well-defined mathematical concept, but we usually consider functions in programming languages and what is pure there depends on your point of view, so it doesn't make much sense to talk about dichotomies when involved concepts are not well-defined.
A way to define purity of a function f is ∀x∀y x = y ⇒ f x = f y, i.e. given the same argument the function returns the same result, or it preserves equality.
This isn't what people usually mean when they talk about "pure functions"; they usually mean "pure" as "does not have side effects". I haven't figured out how to qualify a "side effect" (comments welcome!) so I don't have anything to say about it.
Nonetheless, I'll explore this concept of purity because it might offer some related insight. I'm no expert here; this is mostly me just rambling. I do however hope it sparks some insightful (and corrective!) comments.
To understand purity we have to know what equality we are talking about. What does x = y mean, and what does f x = f y mean?
One choice is the Haskell semantic equality. That is, equality of the semantics Haskell assigns to its terms. As far as I know there are no official denotational semantics for Haskell, but Wikibooks Haskell Denotational Semantics offers a reasonable standard that I think the community more or less agrees to ad-hoc. When Haskell says its functions are pure this is the equality it refers to.
Another choice is a user-defined equality (i.e. (==)) by deriving the Eq class. This is relevant when using denotational design — that is, we are assigning our own semantics to terms. With this choice we can accidentally write functions which are impure; Haskell is not concerned with our semantics.
I will refer to the Haskell semantic equality as = and the user-defined equality as ==. Also I assume that == is an equality relation — this does not hold for some instances of == such as for Float.
When I use x == y as a proposition what I really mean is x == y = True ∨ x == y = ⊥, because x == y :: Bool and ⊥ :: Bool. In other words, when I say x == y is true, I mean that if it computes to something other than ⊥ then it computes to True.
If x and y are equal according to Haskell's semantics then they are equal according to any other semantic we may choose.
Proof: if x = y then x == y ≡ x == x and x == x is true because == is pure (according to =) and reflexive.
Similarly we can prove ∀f∀x∀y x = y ⇒ f x == f y. If x = y then f x = f y (because f is pure), therefore f x == f y ≡ f x == f x and f x == f x is true because == is pure and reflexive.
Here is a silly example of how we can break purity for a user-defined equality.
data Pair a = Pair a a

instance (Eq a) => Eq (Pair a) where
  Pair x _ == Pair y _ = x == y

swap :: Pair a -> Pair a
swap (Pair x y) = Pair y x
Now we have:
Pair 0 1 == Pair 0 2
But:
swap (Pair 0 1) /= swap (Pair 0 2)
Note: ¬(Pair 0 1 = Pair 0 2) so we were not guaranteed that our definition of (==) would be okay.
A more compelling example is to consider Data.Set. If x, y, z :: Set A then you would hope this holds, for example:
x == y ⇒ (Set.union z) x == (Set.union z) y
Especially when Set.fromList [1,2,3] and Set.fromList [3,2,1] denote the same set but probably have different (hidden) representations (not equivalent by Haskell's semantics). That is to say we want to be sure that ∀z Set.union z is pure according to (==) for Set.
Here is a type I have played with:
import Data.List (group)

newtype Spry a = Spry [a]

instance (Eq a) => Eq (Spry a) where
  Spry xs == Spry ys = fmap head (group xs) == fmap head (group ys)
A Spry is a list which has non-equal adjacent elements. Examples:
Spry [] == Spry []
Spry [1,1] == Spry [1]
Spry [1,2,2,2,1,1,2] == Spry [1,2,1,2]
Given this, what is a pure implementation (according to == for Spry) for flatten :: Spry (Spry a) -> Spry a such that if x is an element of a sub-spry it is also an element of the flattened spry (i.e. something like ∀x∀xs∀i x ∈ xs[i] ⇒ x ∈ flatten xs)? Exercise for the reader.
It is also worth noting that the functions we've been talking about are across the same domain, so they have type A → A. That is except for when we proved ∀f∀x∀y x = y ⇒ f x == f y which crosses from Haskell's semantic domain to our own. This might be a homomorphism of some sorts… maybe a category theorist could weigh in here (and please do!).
Side effects are part of the definition of the language. In the expression
f e
the side effects of e are all the parts of e's behavior that are 'moved out' and become part of the behavior of the application expression, rather than being passed into f as part of the value of e.
For a concrete example, consider this program:
f x = x; x
f (print 3)
where conceptually the syntax x; x means 'run x, then run it again and return the result'.
In a language where print writes to stdout as a side effect, this writes
3
because the output is part of the semantics of the application expression.
In a language where the output of print is not a side effect, this writes
3
3
because the output is part of the semantics of the x variable inside the definition of f.
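A loose Scala analogy of the same distinction, using by-value versus by-name parameters to control whether the printing happens at the application expression or at each use of x:
def byValue(x: Unit): Unit = { x; x }     // x is evaluated once, at the call site
def byName(x: => Unit): Unit = { x; x }   // x is re-evaluated at every use

byValue(print(3))   // prints "3": the effect belongs to the application expression
byName(print(3))    // prints "33": the effect belongs to each occurrence of x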

Scala constraint based types and literals

I was thinking whether it would be possible in Scala to define a type like NegativeNumber. This type would represent a negative number and it would be checked by the compiler similarly to Ints, Strings etc.
val x: NegativeNumber = -34
val y: NegativeNumber = 34 // should not compile
Likewise:
val s: ContainsHello = "hello world"
val s: ContainsHello = "foo bar" // this should not compile either
I could use these types just like other types, eg:
def myFunc(x: ContainsHello): Unit = println(s"$x contains hello")
These constrained types could be backed by casual types (Int, String).
Is it possible to implement these types (maybe with macros)?
How about custom literals?
val neg = -34n //neg is of type NegativeNumber because of the suffix
val pos = 34n // compile error
Unfortunately, no this isn't something you could easily check at compile time. Well - at least not if you aren't restricting the operations on your type. If your goal is simply to check that a number literal is non-zero, you could easily write a macro that checks this property. However, I do not see any benefit in proving that a negative literal is indeed negative.
The problem isn't a limitation of Scala - which has a very strong type system - but the fact that (in a reasonably complex program) you can't statically know every possible state. You can however try to overapproximate the set of all possible states.
Let us consider the example of introducing a type NegativeNumber that only ever represents a negative number. For simplicity, we define only one operation: plus.
Say you would only allow addition of multiple NegativeNumbers; then the type system could be used to guarantee that each NegativeNumber is indeed a negative number. But this seems really restrictive, so a useful example would certainly allow us to add at least a NegativeNumber and a general Int.
What if you had an expression val z: NegativeNumber = plus(x, y) where you don't know the value of x and y statically (maybe they are returned by a function)? How do you know (statically) that z is indeed a negative number?
An approach to solve the problem would be to introduce Abstract Interpretation which must be run on a representation of your program (Source Code, Abstract Syntax Tree, ...).
For example, you could define a Lattice on the numbers with the following elements:
Top: all numbers
+: all positive numbers
0: the number 0
-: all negative numbers
Bottom: not a number - only introduced so that each pair of elements has a greatest lower bound
with the ordering Top > (+, 0, -) > Bottom.
Then you'd need to define semantics for your operations. Taking the commutative method plus from our example:
plus(Bottom, something) is always Bottom, as you cannot calculate something using invalid numbers
plus(Top, x), x != Bottom is always Top, because adding an arbitrary number to any number is always an arbitrary number
plus(+, +) is +, because adding two positive numbers will always yield a positive number
plus(-, -) is -, because adding two negative numbers will always yield a negative number
plus(0, x), x != Bottom is x, because 0 is the identity of the addition.
The problem is that plus(-, +) will be Top, because you don't know if the result is a positive or negative number.
So to be statically safe, you'd have to take the conservative approach and disallow such an operation.
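To make that concrete, here is a small Scala sketch of the sign lattice and the transfer function for plus (just to illustrate the idea of abstract interpretation, not a real analysis):
sealed trait Sign
case object Top    extends Sign  // all numbers
case object Pos    extends Sign  // all positive numbers
case object Zero   extends Sign  // the number 0
case object Neg    extends Sign  // all negative numbers
case object Bottom extends Sign  // not a number

def plus(a: Sign, b: Sign): Sign = (a, b) match {
  case (Bottom, _) | (_, Bottom) => Bottom // invalid input stays invalid
  case (Zero, x)                 => x      // 0 is the identity of addition
  case (x, Zero)                 => x
  case (Pos, Pos)                => Pos
  case (Neg, Neg)                => Neg
  case _                         => Top    // e.g. plus(Neg, Pos): sign unknown, so overapproximate
}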
There are more sophisticated numerical domains but ultimately, they all suffer from the same problem: They represent an overapproximation to the actual program state.
I'd say the problem is similar to integer overflow/underflow: Generally, you don't know statically whether an operation exhibits an overflow - you only know this at runtime.
It could be possible if SIP-23 were implemented, using implicit parameters as a form of refinement types. However, it would be of questionable value, as the Scala compiler and type system are not really well equipped for proving interesting things about, for example, integers. For that it would be much nicer to use a language with dependent types (Idris etc.) or refinement types checked by an SMT solver (LiquidHaskell etc.).

What forms of goal in Coq are considered to be "true"?

When I prove some theorem, my goal evolves as I apply more and more tactics. Generally speaking, the goal tends to split into subgoals, where the subgoals are simpler. At some final point Coq decides that the goal is proven. What can this "proven" goal look like? These goals seem to be fine:
a = a. (* Any object is identical to itself (?) *)
myFunc x y = myFunc x y. (* Result of the same function with the same params is always the same (?) *)
What else can be here or can it be that examples are fundamentally wrong?
In other words, when I finally apply reflexivity, Coq just says ** Got it ** without any explanation. Is there any way to get more details on what it actually did or why it decided that the goal is proven?
You're actually facing a very general notion that seems not so general because Coq has some user-friendly facility for reasoning with equality in particular.
In general, Coq accepts a goal as solved as soon as it receives a term whose type is the type of the goal: it has been convinced the proposition is true because it has been convinced the type that this proposition describes is inhabited, and what convinced it is the actual witness you helped build along your proof.
For the particular case of inductive datatypes, the two ways you are going to be able to prove the proposition P a b c are:
by constructing a term of type P a b c, using the constructors of the inductive type P, and providing all the necessary arguments.
or by reusing an existing proof or an axiom in the environment whose type you can get to match P a b c.
For the even more particular case of equality proofs (equality is just an inductive datatype in Coq), the same two ways I list above degenerate to this:
the only constructor of equality is eq_refl, and to apply it you need to show that the two sides are judgementally equal. For most purposes, this corresponds to goals that look like T a b c = T a b c, but it is actually a slightly broader notion of equality (see below). For these, all you have to do is apply the eq_refl constructor. In a nutshell, that is what reflexivity does!
the second case consists in proving that the equality holds because you have other equalities in your context, nothing special here.
Now one part of your question was: when does Coq accept that two sides of an equality are equal by reflexivity?
If I am not mistaken, the answer is when the two sides of the equality are αβδιζ-convertible.
What this grossly means is that there is a way to make them syntactically equal by repeated applications of:
α : sane renaming of non-free variables
β : computing reducible expressions
δ : unfolding definitions
ι : simplifying matches
ζ : expanding let-bound expressions
[someone please correct me if more rules apply or if I got one wrong]
For instance some of the things that are not captured by these rules are:
equality of functions that do more or less the same thing in different ways:
(fun x => 0 + x) = (fun x => x + 0)
quicksort = mergesort
equality of terms that are stuck reducing but would be equal:
forall n, 0 + n = n + 0