I'm working in Coq and trying to figure out how to do the following: given a list of natural numbers and a number n, I want to split the list into the pieces that come before and after each occurrence of n. For example, given the list [1; 2; 0; 3; 4; 0; 9] and n = 0, I want the three lists [1;2], [3;4] and [9] as output. The main problem is that I don't know how to return several elements from a Fixpoint. I think I need to nest Fixpoints, but I just don't see how. As a very rough idea, with more than a few issues, I have:
Fixpoint SubLists (A : list nat)(m : nat) :=
match A with
|[] => []
|n::A0 => if n =? m then (SubLists L) else n :: (SubLists L)
end.
I would very much appreciate your input on how to do this, and how to navigate having an output of several elements.
You can do this by combining a few fixpoints:
Require Import Coq.Arith.Arith.
Require Import Coq.Lists.List.
Import ListNotations.
Fixpoint prefix n l :=
match l with
| [] => []
| m :: l' => if beq_nat n m then []
else m :: prefix n l'
end.
Fixpoint suffix n l :=
match l with
| [] => l
| m :: l' => if beq_nat n m then l'
else suffix n l'
end.
Fixpoint split_at n l :=
match l with
| [] => []
| m :: l' => prefix n (m :: l') :: split_at n (suffix n (m :: l'))
end.
Notice that Coq's termination checker accepts the recursive call to split_at, even though its argument is not syntactically a subterm of l. The reason is that the checker is able to detect that suffix only outputs subterms of its argument. For this to work, however, suffix must return l, and not [], in its first branch (try changing it to see what happens!).
In addition to Arthur's solution, you can use an accumulator, which is typical of Functional Programming style:
Require Import Coq.Arith.Arith.
Require Import Coq.Lists.List.
Import ListNotations.
Definition add_acc m (s : list (list nat)) :=
match s with
| [] => [[m]]
| s :: ss => (m :: s) :: ss
end.
Fixpoint split_seq n l acc :=
match l with
| [] => map (@rev _) (rev acc)
| m :: l' => if beq_nat n m then
split_seq n l' ([] :: acc)
else
split_seq n l' (add_acc m acc)
end.
Compute (split_seq 0 [1; 2; 0; 3; 4; 0; 9] []).
Note that the result is reversed so you need to use rev. A bonus exercise is to improve this.
EDIT: Provided second variant that doesn't add [] for repeated separators.
Definition reset_acc (s : list (list nat)) :=
match s with
| [] :: ss => [] :: ss
| ss => [] :: ss
end.
Fixpoint split_seq_nodup n l acc :=
match l with
| [] => map (@rev _) (rev acc)
| m :: l' => if beq_nat n m then
split_seq_nodup n l' (reset_acc acc)
else
split_seq_nodup n l' (add_acc m acc)
end.
Compute (split_seq_nodup 0 [1; 2; 0; 3; 4; 0; 9] []).
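One possible take on the bonus exercise mentioned above is to fold from the right, so that no final rev is needed. This is only a sketch: split_step and split_fold are hypothetical names that do not appear in the answers, and the function behaves slightly differently from split_seq at the edges (for instance, it returns [[]] rather than [] for the empty list).

```coq
Require Import Coq.Arith.Arith.
Require Import Coq.Lists.List.
Import ListNotations.

(* One step of the right fold: a separator opens a new (empty) sublist,
   any other element is pushed onto the front of the current sublist. *)
Definition split_step (n m : nat) (acc : list (list nat)) : list (list nat) :=
  if Nat.eqb n m then [] :: acc
  else match acc with
       | [] => [[m]]
       | s :: ss => (m :: s) :: ss
       end.

(* Hypothetical name; folds from the right starting with one empty sublist. *)
Definition split_fold (n : nat) (l : list nat) : list (list nat) :=
  fold_right (split_step n) [[]] l.

(* The example from the question. *)
Example split_fold_ex :
  split_fold 0 [1; 2; 0; 3; 4; 0; 9] = [[1; 2]; [3; 4]; [9]].
Proof. reflexivity. Qed.
```

Because elements are consed onto sublists in the order in which fold_right visits them (last to first), each sublist comes out already in the right order.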
An alternative way to tackle this issue is to formally describe the problem you are trying to solve and then either write a dependently-typed function proving that this problem can indeed be solved, or use tactics to slowly build up your proof.
This is, if I am not mistaken, a relation describing the relationship between the inputs n and ns you want to pass to your function and the output mss you want to get back.
The (* ------- *) lines are simple comments used to suggest that these constructors should be seen as inference rules: whatever is under one such line is the conclusion one can make based on the assumptions above it.
Inductive SubListsRel (n : nat) : forall (ns : list nat)
(mss : list (list nat)), Prop :=
| base : SubListsRel n nil (nil :: nil)
| consEq : forall ns m mss,
n = m -> SubListsRel n ns mss ->
(* ----------------------------- *)
SubListsRel n (m :: ns) (nil :: mss)
| consNotEq : forall ns m ms mss,
(n <> m) -> SubListsRel n ns (ms :: mss) ->
(* ------------------------------------------------- *)
SubListsRel n (m :: ns) ((m :: ms) :: mss)
.
We can then express your SubLists problem as being, given inputs n and ns, the existence of an output mss such that SubListsRel n ns mss holds:
Definition SubLists (n : nat) (ns : list nat) : Set :=
{ mss | SubListsRel n ns mss }.
Using tactics we can readily generate such SubLists for concrete examples in order to sanity-check our specification. We can for instance take the example you had in your original post:
Example example1 : SubLists 0 (1 :: 2 :: 0 :: 3 :: 4 :: 0 :: 9 :: nil).
Proof.
eexists ; repeat econstructor ; intro Hf; inversion Hf.
Defined.
And check that the output is indeed the list you were expecting:
Check (eq_refl : proj1_sig example1
= ((1 :: 2 :: nil) :: (3 :: 4 :: nil) :: (9 :: nil) :: nil)).
Now comes the main part of this post: the proof that forall n ns, SubLists n ns. Given that the premise of consNotEq assumes that mss is non-empty, we will actually prove a strengthened statement in order to make our life easier:
Definition Strenghtened_SubLists (n : nat) (ns : list nat) : Set :=
{ mss | SubListsRel n ns mss /\ mss <> nil }.
And given that oftentimes we will have goals of the shape something_absurd -> False, I define a simple tactic to handle these things. It introduces the absurd assumption and inverts it immediately to make the goal disappear:
Ltac dismiss := intro Hf; inversion Hf.
We can now prove the main statement by proving the strengthened version by induction and deducing it. I guess that here it's better for you to step through it in Coq rather than me trying to explain what happens. The key steps are the cut (proving a stronger statement), induction and the case analysis on eq_nat_dec.
Lemma subLists : forall n ns, SubLists n ns.
Proof.
intros n ns; cut (Strenghtened_SubLists n ns).
- intros [mss [Hmss _]]; eexists; eassumption.
- induction ns.
+ eexists; split; [econstructor | dismiss].
+ destruct IHns as [mss [Hmss mssNotNil]];
destruct (eq_nat_dec n a).
* eexists; split; [eapply consEq ; eassumption| dismiss].
* destruct mss; [apply False_rect, mssNotNil; reflexivity |].
eexists; split; [eapply consNotEq; eassumption| dismiss].
Defined.
Once we have this function, we can come back to our example and generate the appropriate SubLists, this time not by calling tactics but by running the function subLists we just defined.
Example example2 : SubLists 0 (1 :: 2 :: 0 :: 3 :: 4 :: 0 :: 9 :: nil) :=
subLists _ _.
And we can Check that the computed list is indeed the same as the one obtained in example1:
Check (eq_refl : proj1_sig example1 = proj1_sig example2).
Nota Bene: It is paramount here that our proofs are ended with Defined rather than Qed in order for them to be unfolded when computing with them (which is what we want to do here: they give us the list (list nat) we are looking for!).
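To see the Defined versus Qed difference in isolation, here is a minimal sketch. The names two_transparent and two_opaque are hypothetical and unrelated to the development above; the two definitions differ only in how the proof is closed.

```coq
(* Two proofs of the same statement, one transparent and one opaque. *)
Definition two_transparent : { n : nat | n = 2 }.
Proof. exists 2. reflexivity. Defined.

Definition two_opaque : { n : nat | n = 2 }.
Proof. exists 2. reflexivity. Qed.

(* The transparent definition can be unfolded, so its witness computes to 2. *)
Check (eq_refl : proj1_sig two_transparent = 2).

(* The opaque one is never unfolded, so the very same check is rejected. *)
Fail Check (eq_refl : proj1_sig two_opaque = 2).
```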
A gist with all the code and the right imports.
Here is another take, based on the standard library function List.fold_left.
It works by maintaining an accumulator, which is a pair of the overall reversed result (a list of lists) and a current sublist (also reversed while accumulating). Once we reach a delimiter, we reverse the current sublist and put it into the resulting list of sublists. After executing fold_left, we reverse the result in the outermost match expression.
Require Import Coq.Arith.Arith.
Require Import Coq.Lists.List.
Import ListNotations.
Definition split_skip_dup_delims (m : nat) (xs : list nat) :=
match fold_left
(fun (acctup: _ * _) x => let (acc, rev_subl) := acctup in
if beq_nat x m
then match rev_subl with (* a delimiter found *)
| [] => (acc, []) (* do not insert empty sublist *)
| _ => (rev rev_subl :: acc, []) end
else (acc, x :: rev_subl)) (* keep adding to the current sublist *)
xs
([],[]) with
| (acc, []) => rev acc (* list ends with a delimiter *)
| (acc, rev_subl) => rev (rev rev_subl :: acc) (* no delimiter at the end *)
end.
Eval compute in split_skip_dup_delims 0 [1; 2; 0; 0; 0; 3; 4; 0; 9].
(* = [[1; 2]; [3; 4]; [9]]
: list (list nat) *)
I am trying to formalize some intuitionistic notions. One of these is the continuity principle. In Coq I defined this as:
(* Infinite sequences *)
Definition N := nat -> nat.
(* The first n elements of a and b coincide. *)
Definition con (a b : N) n := forall i, i < n -> a i = b i.
(* Brouwers Continuity Principle *)
Axiom BCP :
forall (R : N -> nat -> Prop),
(forall a, exists n, R a n) ->
(forall a, exists m n, forall b, con a b m -> R b n).
I want to generalize this to so-called spreads. A spread is a subset of the Baire space that can be thought of as a tree with only infinite branches. A decider o (called the spread law) takes a finite starting sequence and returns 0 if it should be in the spread. When a sequence s is in the spread, at least one extension n :: s must also be in the spread. The empty sequence must be accepted so that the spread is inhabited. I defined this as follows:
(* Spread law *)
Definition Spr_Law (o : list nat -> nat) :=
o [] = 0 /\ forall s, o s = 0 <-> exists n, o (n :: s) = 0.
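As a quick sanity check of this definition, the sketch below shows that the constant-0 decider, which accepts every finite sequence (so the spread is the whole Baire space), satisfies Spr_Law. The name full_spread is hypothetical, and Spr_Law is repeated only so that the snippet stands alone.

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* Spread law, as defined above. *)
Definition Spr_Law (o : list nat -> nat) :=
  o [] = 0 /\ forall s, o s = 0 <-> exists n, o (n :: s) = 0.

(* The decider that accepts everything is a spread law. *)
Example full_spread : Spr_Law (fun _ => 0).
Proof.
  unfold Spr_Law. split.
  - reflexivity.
  - intros s. split.
    + intros _. exists 0. reflexivity.
    + intros _. reflexivity.
Qed.
```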
One way to prove that the continuity principle generalizes to arbitrary spreads is to define a function that 'retracts' N onto a spread defined by such a decider o. This is where I get stuck because I simply do not know enough about Coq to define this well. First of all, I inserted a picture of this definition from the course notes.
The trouble is that this definition includes a 'smallest m such that o accepts m :: s'. This is not a terminating procedure in general and I do not know how to use Function to prove that this search would terminate for our purposes (it will since a spread law must accept at least one extension).
I found that I can use the Coq.Logic.ConstructiveEpsilon library to get a witness when I have an exists statement. I could pass the condition that at least one extension exists to the function. Based on this I created the following code (this is only the first part of the definition, which maps finite sequences onto the spread):
Definition find_extension o s (w : exists n, o (n :: s) = 0) : nat :=
constructive_ground_epsilon_nat (fun n => o (n :: s) = 0) (decider_dec o s) w.
(* Compute retraction for finite start sequences. *)
Fixpoint rho o (w : forall s, o s = 0 -> exists n, o (n :: s) = 0)
(s : list nat) : list nat :=
match s with
| [] => []
| n :: s => let t := rho o w s in
if o (n :: t) =? 0
then n :: t
else (find_extension o t (w t {?????})) :: t
end.
Now I encounter the real problem. The {?????} part is where I need to insert a proof that o t = 0. This holds since rho only ever returns sequences that are accepted by the decider o. Perhaps I can let rho return a tuple containing the new sequence together with a proof that this sequence is accepted (such that I can feed it into w after recursion), but I do not know how. Note that this is especially tricky for the else branch since the proof that this value is accepted holds because the witness is valid.
Of course alternative ideas for defining spreads are also welcome. I do feel that this is achievable though (there are no logical inconsistencies as far as I can see).
I seem to have figured something out:
(* Only sequences that are accepted by o *)
Inductive spr (o : decider) :=
| spr_s s : o s = 0 -> spr o.
(* Return smallest n such that o accepts n :: s. *)
Definition find_extension o s (witness : exists n, o (n :: s) = 0) : spr o :=
let P := (fun n => o (n :: s) = 0) in
let D := (decider_dec o s) in
spr_s o
((constructive_ground_epsilon_nat P D witness) :: s)
(constructive_ground_epsilon_spec_nat P D witness).
(*
To generalize BCP to spreads we first define a function that retracts the Baire
space onto an arbitrary spread given its spread law. This happens in two steps.
*)
(* Compute retraction for finite start sequences. *)
Fixpoint rho o
(Hnil : o [] = 0)
(Hcons : forall s, o s = 0 -> exists n, o (n :: s) = 0)
(s : list nat) : spr o :=
match s with
| [] => spr_s o [] Hnil
| n :: s =>
match rho o Hnil Hcons s with
| spr_s _ t Ht =>
match eq_dec (o (n :: t)) 0 with
| left Heq => spr_s o (n :: t) Heq
| right _ => find_extension o t (Hcons t Ht)
end
end
end.
(* Retraction of N onto F_o *)
Definition retract o
(Hnil : o [] = 0)
(Hcons : forall s, o s = 0 -> exists n, o (n :: s) = 0)
: N -> N :=
fun a => fun n =>
match rho o Hnil Hcons (get (n + 1) a) with
| spr_s _ [] _ => 0 (* not reachable *)
| spr_s _ (rho_n :: _) _ => rho_n
end.
Say I have the following Program Fixpoint:
From Coq Require Import List Program.
Import ListNotations.
Program Fixpoint f l {measure (length l)}: list nat :=
let f_rec := (f (tl l) ) in
match hd_error l with
| Some n => n :: f_rec
| None => []
end.
(This example basically returns l in a very stupid way, for the sake of having a simple example).
Here, I have a recursive call to f (stored in f_rec) which is only used if l contains an element, which ensures that when I use f_rec, length (tl l) is indeed smaller than length l.
However, when I want to solve the obligation
Next Obligation.
I don't have the hypothesis hd_error l = Some n which I need.
(Somehow, I have the impression that it is understood as "compute f (tl l) at the let in place", and not "delay the computation until it is actually used").
To illustrate the difference, if I "inline" the let ... in statement:
Program Fixpoint f l {measure (length l)}: list nat :=
match hd_error l with
| Some n => n :: (f (tl l) )
| None => []
end.
Next Obligation.
destruct l.
Here I have Heq_anonymous : Some n = hd_error [] in the environment.
My question is the following:
is it possible to have the hypothesis I need, i.e. to have the hypothesis generated by the match ... with statement ?
N.B.: Moving the let is a solution, but I am curious to know whether this is possible without doing so. For instance, it might be useful in the case f_rec is used in various contexts, to avoid duplicating f (tl l).
One trick is to explicitly ask for the hypothesis you need (I recently saw it in this answer by Joachim Breitner):
let f_rec := fun pf : length (tl l) < length l => f (tl l) in
This way you will be able to use f_rec only when it makes sense.
Program Fixpoint f l {measure (length l)}: list nat :=
let f_rec := fun pf : length (tl l) < length l => f (tl l) in
match hd_error l with
| Some n => n :: f_rec _
| None => []
end.
Next Obligation. destruct l; [discriminate | auto]. Qed.
I'm attempting to define and prove correct in Coq a function that efficiently diffs two sorted lists. As it does not always recurse on a structurally smaller term (either the first or second list is smaller), Fixpoint won't accept it, so I'm attempting to use Program Fixpoint instead.
When attempting to prove a property of the function using the tactic simpl or program_simpl, Coq spends minutes computing and then produces a giant term, hundreds of lines long. I was wondering if I'm using Program Fixpoint the wrong way, or alternatively if there are other tactics that should be used instead of simplification when reasoning about it?
I also wondered if it's good practice to include the required properties for correctness in params like this, or would it be better to have a separate wrapper function that takes the correctness properties as params, and make this function just take the two lists to be diffed?
Note that I did try defining a simpler version of make_diff, which only took l1 and l2 as parameters and fixed the type A and relation R, but this still produced a gigantic term when the program_simpl or simpl tactics were applied.
*Edit: my includes are (although they may not all be required here):
Require Import Coq.Sorting.Sorted.
Require Import Coq.Lists.List.
Require Import Coq.Relations.Relation_Definitions.
Require Import Recdef.
Require Import Coq.Program.Wf.
Require Import Coq.Program.Tactics.
The code:
Definition is_decidable (A : Type) (R : relation A) := forall x y, {R x y} + {~(R x y)}.
Definition eq_decidable (A : Type) := forall (x y : A), { x = y } + { ~ (x = y) }.
Inductive diff (X: Type) : Type :=
| add : X -> diff X
| remove : X -> diff X
| update : X -> X -> diff X.
Program Fixpoint make_diff (A : Type)
(R : relation A)
(dec : is_decidable A R)
(eq_dec : eq_decidable A)
(trans : transitive A R)
(lt_neq : (forall x y, R x y -> x <> y))
(l1 l2 : list A)
{measure (length l1 + length l2) } : list (diff A) :=
match l1, l2 with
| nil, nil => nil
| nil, (new_h::new_t) => (add A new_h) :: (make_diff A R dec eq_dec trans lt_neq nil new_t)
| (old_h::old_t), nil => (remove A old_h) :: (make_diff A R dec eq_dec trans lt_neq old_t nil)
| (old_h::old_t) as old_l, (new_h::new_t) as new_l =>
if dec old_h new_h
then (remove A old_h) :: make_diff A R dec eq_dec trans lt_neq old_t new_l
else if eq_dec old_h new_h
then (update A old_h new_h) :: make_diff A R dec eq_dec trans lt_neq old_t new_t
else (add A new_h) :: make_diff A R dec eq_dec trans lt_neq old_l new_t
end.
Next Obligation.
Proof.
simpl.
generalize dependent (length new_t).
generalize dependent (length old_t).
auto with arith.
Defined.
Next Obligation.
Proof.
simpl.
generalize dependent (length new_t).
generalize dependent (length old_t).
auto with arith.
Defined.
In this particular case we can get rid of Program Fixpoint and use plain simple Fixpoint. Since at each recursive call we invoke make_diff either on the tail of the first list or on the tail of the second list, we can nest two fixed-point functions as follows. (I have used the Section mechanism here to avoid passing too many identical arguments)
Require Import Coq.Lists.List.
Import ListNotations.
Require Import Coq.Relations.Relations.
Section Make_diff.
Variable A : Type.
Variable R : relation A.
Variable dec : is_decidable A R.
Variable eq_dec : eq_decidable A.
Variable trans : transitive A R.
Variable lt_neq : forall x y, R x y -> x <> y.
Fixpoint make_diff (l1 l2 : list A) : list (diff A) :=
let fix make_diff2 l2 :=
match l1, l2 with
| nil, nil => nil
| nil, new_h::new_t => (add A new_h) :: make_diff2 new_t
| old_h::old_t, nil => (remove A old_h) :: make_diff old_t nil
| old_h::old_t, new_h::new_t =>
if dec old_h new_h
then (remove A old_h) :: make_diff old_t l2
else if eq_dec old_h new_h
then (update A old_h new_h) :: make_diff old_t new_t
else (add A new_h) :: make_diff2 new_t
end
in make_diff2 l2.
End Make_diff.
Observe that the Section mechanism won't include unused parameters in the resulting signature. Here is a naive test:
(* make the first 2 arguments implicit *)
Arguments make_diff [A R] _ _ _ _.
Require Import Coq.Arith.Arith.
Compute make_diff lt_dec Nat.eq_dec [1;2;3] [4;5;6].
(* = [remove nat 1; remove nat 2; remove nat 3; add nat 4; add nat 5; add nat 6]
: list (diff nat) *)
For anyone who comes across this, a better alternative now is the Equations plugin, which will eventually replace Function and Program Fixpoint.
We have a function that inserts an element into a specific index of a list.
Fixpoint inject_into {A} (x : A) (l : list A) (n : nat) : option (list A) :=
match n, l with
| 0, _ => Some (x :: l)
| S k, [] => None
| S k, h :: t => let kwa := inject_into x t k
in match kwa with
| None => None
| Some l' => Some (h :: l')
end
end.
The following property of the aforementioned function is of relevance to the problem (proof omitted, straightforward induction on l with n not being fixed):
Theorem inject_correct_index : forall A x (l : list A) n,
n <= length l -> exists l', inject_into x l n = Some l'.
And we have a computational definition of permutations, with iota k being a list of nats [0...k]:
Fixpoint permute {A} (l : list A) : list (list A) :=
match l with
| [] => [[]]
| h :: t => flat_map (
fun x => map (
fun y => match inject_into h x y with
| None => []
| Some permutations => permutations
end
) (iota (length t))) (permute t)
end.
The theorem we're trying to prove:
Theorem num_permutations : forall A (l : list A) k,
length l = k -> length (permute l) = factorial k.
By induction on l we can (eventually) get to the following goal: length (permute (a :: l)) = S (length l) * length (permute l). If we now simply cbn, the resulting goal is stated as follows:
length
(flat_map
(fun x : list A =>
map
(fun y : nat =>
match inject_into a x y with
| Some permutations => permutations
| None => []
end) (iota (length l))) (permute l)) =
length (permute l) + length l * length (permute l)
Here I would like to proceed by destruct (inject_into a x y), which is impossible considering x and y are lambda arguments. Please note that we will never get the None branch as a result of the lemma inject_correct_index.
How does one proceed from this proof state? (Please do note that I am not trying to simply complete the proof of the theorem, that's completely irrelevant.)
There is a way to rewrite under binders: the setoid_rewrite tactic (see §27.3.1 of the Coq Reference manual).
However, direct rewriting under lambdas is not possible without assuming an axiom as powerful as the axiom of functional extensionality (functional_extensionality).
Otherwise, we could have proved:
(* classical example *)
Goal (fun n => n + 0) = (fun n => n).
Fail setoid_rewrite <- plus_n_O.
Abort.
See here for more detail.
Nevertheless, if you are willing to accept such an axiom, then you can use the approach described by Matthieu Sozeau in this Coq Club post to rewrite under lambdas like so:
Require Import Coq.Logic.FunctionalExtensionality.
Require Import Coq.Setoids.Setoid.
Require Import Coq.Classes.Morphisms.
Generalizable All Variables.
Instance pointwise_eq_ext {A B : Type} `(sb : subrelation B RB eq)
: subrelation (pointwise_relation A RB) eq.
Proof. intros f g Hfg. apply functional_extensionality. intro x; apply sb, (Hfg x). Qed.
Goal (fun n => n + 0) = (fun n => n).
setoid_rewrite <- plus_n_O.
reflexivity.
Qed.
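When the lambda in question is the argument of List.map, as in the goal from the question, the axiom can be avoided by using an extensionality lemma that is specific to map. The sketch below uses map_ext from the standard library on a toy goal; it is a different technique from the setoid_rewrite instance above, and whether it fits the permutation proof is an assumption (map_ext_in is the variant that also provides the fact that the element belongs to the list).

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* No functional extensionality needed: map_ext turns an equation between
   two maps into a pointwise equation between the mapped functions. *)
Goal forall l : list nat, map (fun n => n + 0) l = map (fun n => n) l.
Proof.
  intros l.
  apply map_ext.
  intros a. cbv beta.
  symmetry. apply plus_n_O.
Qed.
```

After apply map_ext the bound variable has become an ordinary hypothesis-level variable a, so tactics such as destruct or rewrite can be used on terms that mention it.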
Following the example given in the GeneralRec chapter of Chlipala's book, I'm trying to write the mergesort algorithm.
Here is my code
Require Import Nat.
Fixpoint insert (x:nat) (l: list nat) : list nat :=
match l with
| nil => x::nil
| y::l' => if leb x y then
x::l
else
y::(insert x l')
end.
Fixpoint merge (l1 l2 : list nat) : list nat :=
match l1 with
| nil => l2
| x::l1' => insert x (merge l1' l2)
end.
Fixpoint split (l : list nat) : list nat * list nat :=
match l with
| nil => (nil,nil)
| x::nil => (x::nil,nil)
| x::y::l' =>
let (ll,lr) := split l' in
(x::ll,y::lr)
end.
Definition lengthOrder (l1 l2 : list nat) :=
length l1 < length l2.
Theorem lengthOrder_wf : well_founded lengthOrder.
Admitted.
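The Admitted above is not essential. The standard library's well_founded_ltof (from Coq.Arith.Wf_nat) gives well-foundedness of any relation of the form f a < f b, which covers lengthOrder. A minimal sketch, stated on the unfolded relation so that it stands alone (length_lt_wf is a hypothetical name); it is closed with Defined on the assumption that one later wants to compute with a Fix-based definition, which needs a transparent well-foundedness proof.

```coq
Require Import Coq.Arith.Wf_nat.

(* Comparing lists by length is well-founded: it is the inverse image
   of < on nat under the length function. *)
Theorem length_lt_wf :
  well_founded (fun l1 l2 : list nat => length l1 < length l2).
Proof.
  exact (@well_founded_ltof (list nat) (@length nat)).
Defined.
```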
The problem is that it is not possible to write the mergeSort function with the command Fixpoint since the function is not structurally decreasing :
Fixpoint mergeSort (l: list nat) : list nat :=
if leb (length l) 1 then l
else
let (ll,lr) := split l in
merge (mergeSort ll) (mergeSort lr).
Instead, one can use the command Program Fixpoint or Definition with the term Fix (as in Chlipala's book).
However, if I'm writing this
Definition mergeSort : list nat -> list nat.
refine (Fix lengthOrder_wf (fun (l: list nat) => list nat)
(fun (l : list nat) => (fun mergeSort : (forall ls : list nat, lengthOrder ls l -> list nat )=>
if leb 2 (length l) then
let (ll,lr) := split l in
merge (mergeSort ll _) (mergeSort lr _)
else
l))).
I'm getting impossible goals :
2 subgoals, subgoal 1 (ID 65)
l : list nat
mergeSort : forall ls : list nat, lengthOrder ls l -> list nat
ll, lr : list nat
============================
lengthOrder ll l
subgoal 2 (ID 66) is:
lengthOrder lr l
That is why Chlipala suggests to change the definition of mergeSort this way:
Definition mergeSort : list nat -> list nat.
refine (Fix lengthOrder_wf (fun _ => list nat)
(fun (ls : list nat)
(mergeSort : forall ls' : list nat, lengthOrder ls' ls -> list nat) =>
if Compare_dec.le_lt_dec 2 (length ls)
then let lss := split ls in
merge (mergeSort (fst lss) _) (mergeSort (snd lss) _)
else ls)).
that generates the following goals:
2 subgoals, subgoal 1 (ID 68)
ls : list nat
mergeSort : forall ls' : list nat, lengthOrder ls' ls -> list nat
l : 2 <= length ls
lss := split ls : list nat * list nat
============================
lengthOrder (fst lss) ls
subgoal 2 (ID 69) is:
lengthOrder (snd lss) ls
This new definition sounds like magic to me. So I wonder:
From the first definition, is it still possible to prove the well-foundedness of the function?
Otherwise, why can't the first definition work?
How can a basic user go from the first definition to the second easily?
It's easy to see that you need to make two changes in order to get to A. Chlipala's solution.
1) When doing split you somehow need to remember that ll and lr came from split, otherwise they would be some arbitrary lists, which cannot possibly be shorter than the original list l.
The following piece of code fails to save that kind of information:
let (ll,lr) := split l in
merge (mergeSort ll _) (mergeSort lr _)
and, thus, needs to be replaced with
let lss := split ls in
merge (mergeSort (fst lss) _) (mergeSort (snd lss) _)
which keeps what we need.
The failure happens due to Coq's inability to remember that ll and lr come from split l and that happens because let (ll,lr) is just match in disguise (see the manual, §2.2.3).
Recall that the aim of pattern-matching is to (loosely speaking)
unpack the components of some value of an inductive datatype and bind them to some names (we'll need this in the 2nd part of my answer) and
replace the original definition with its special cases in the corresponding pattern-match branches.
Now, observe that split l does not occur anywhere in the goal or context before we pattern-match on it. We just arbitrarily introduce it into the definition. That's why pattern-matching doesn't give us anything -- we can't replace split l with its "special case" ((ll,lr)) in the goal or context, because there is no split l anywhere.
There is an alternative way of doing this by using logical equality (=):
(let (ll, lr) as s return (s = split l -> list nat) := split l in
fun split_eq => merge (mergeSort ll _) (mergeSort lr _)) eq_refl
This is analogous to using the remember tactic. We've got rid of fst and snd, but it is a huge overkill and I wouldn't recommend it.
2) Another thing we need to prove is the fact that ll and lr are shorter than l when 2 <= length l.
Since an if-expression is a match in disguise as well (it works for any inductive datatype with exactly two constructors), we need some mechanism to remember that leb 2 (length l) = true in the then branch. Again, since we don't have leb anywhere, this information gets lost.
There are at least two possible solutions to the problem:
either we remember leb 2 (length l) as an equation (just as we did in the 1st part), or
we can use some comparison function with a result type behaving like bool (so it can represent two alternatives), but which also remembers some additional information we need. Then we could pattern-match on the comparison result and extract the information, which, of course, in this case has to be a proof of 2 <= length l.
What we need is a type which is able to carry a proof of m <= n in the case when leb m n returns true and a proof of, say, m > n otherwise.
There is a type in the standard library that does exactly that! It's called sumbool:
Inductive sumbool (A B : Prop) : Set :=
left : A -> {A} + {B} | right : B -> {A} + {B}
{A} + {B} is just a notation (syntactic sugar) for sumbool A B.
Just as bool, it has two constructors, but in addition it remembers a proof of either of two propositions A and B. Its advantage over bool shows up when you do case analysis on it with if: you get a proof of A in the then branch and a proof of B in the else branch. In other words, you get to use context you saved beforehand, whereas bool doesn't carry any context (only in the mind of the programmer).
And we need exactly that! Well, not in the else branch, but we would like to get 2 <= length l in our then branch. So, let us ask Coq if it already has a comparison function with the return type like that:
Search (_ -> _ -> {_ <= _} + {_}).
(*
output:
le_lt_dec: forall n m : nat, {n <= m} + {m < n}
le_le_S_dec: forall n m : nat, {n <= m} + {S m <= n}
le_ge_dec: forall n m : nat, {n <= m} + {n >= m}
le_gt_dec: forall n m : nat, {n <= m} + {n > m}
le_dec: forall n m : nat, {n <= m} + {~ n <= m}
*)
Any of the five results would do, because we need a proof only in one case.
Hence, we can replace if leb 2 (length l) then ... with if le_lt_dec 2 (length l) then ... and get 2 <= length l in the proof context, which will let us finish the proof.
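To make the difference between bool and sumbool concrete, here is a small sketch that is independent of mergeSort. The function clamp_to_two is a hypothetical example, not something from the book or the question: it matches on le_lt_dec and reuses the proof it gets in the left branch to build a value of a subset type.

```coq
Require Import Coq.Arith.Compare_dec.

(* Returns n itself when 2 <= n, and 2 otherwise, packaged with a proof
   that the result is at least 2. *)
Definition clamp_to_two (n : nat) : { m : nat | 2 <= m } :=
  match le_lt_dec 2 n with
  | left H => exist _ n H          (* H : 2 <= n, handed to us by the match *)
  | right _ => exist _ 2 (le_n 2)  (* the ignored proof has type n < 2 *)
  end.

Eval compute in proj1_sig (clamp_to_two 5).
(* = 5 : nat *)
```

With leb 2 n in place of le_lt_dec 2 n there would be no H to pass to exist in the first branch, which is the same situation as in the first mergeSort attempt.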