Coq Program Fixpoint vs equations as far as best way to get reduction lemmas? - coq

I am trying to prove that particular implementations of how to calculate the edit distance between two strings are correct and yield identical results. I went with the most natural way to define edit distance recursively as a single function (see below). This caused coq to complain that it couldn't determine the decreasing argument. After some searching, it seems that using the Program Fixpoint mechanism and providing a measure function is one way around this problem. However, this led to the next problem that the tactic simpl no longer works as expected. I found this question which has a similar problem, but I am getting stuck because I don't understand the role the Fix_sub function is playing in the code generated by coq for my edit distance function which looks more complicated than in the simple example in the previous question.
Questions:
For a function like edit distance, could the Equations package be easier to use than Program Fixpoint (get reduction lemmas automatically)? The previous question on this front is from 2016, so I am curious if the best practices on this front have evolved since then.
I came across this coq program involving edit_distance that using an inductively defined prop instead of a function. Maybe this is me still trying to wrap my head around the Curry-Howard Correspondence, but why is Coq willing to accept the inductive proposition definition for edit_distance without termination/measure complaints but not the function driven approach? Does this mean there is an angle using a creatively defined inductive type that could be passed to edit_distance that contains both strings that wrapped as a pair and a number and process on that coq would more easily accept as structural recursion?
Is there an easier way using Program Fixpoint to get reductions?
Fixpoint min_helper (best :nat) (l : list nat) : nat :=
match l with
| nil => best
| h::t => if h<?best then min_helper h t else min_helper best t
end.
Program Fixpoint edit_distance (s1 s2 : string) {measure (length s1+ length s2)} : nat :=
match s1, s2 with
| EmptyString , EmptyString => O
| String char rest , EmptyString => length s1
| EmptyString , String char rest => length s2
| String char1 rest1 , String char2 rest2 =>
let choices : list nat := S ( edit_distance rest1 s2) :: S (edit_distance s1 rest2) :: nil in
if (Ascii.eqb char1 char2)
then min_helper (edit_distance rest1 rest2 ) choices
else min_helper (S (edit_distance rest1 rest2)) choices
end.
Next Obligation.
intros. simpl. rewrite <- plus_n_Sm. apply Lt.le_lt_n_Sm. reflexivity. Qed.
Next Obligation.
simpl. rewrite <- plus_n_Sm. apply Lt.le_lt_n_Sm. apply PeanoNat.Nat.le_succ_diag_r. Qed.
Next Obligation.
simpl. rewrite <- plus_n_Sm. apply Lt.le_lt_n_Sm. apply PeanoNat.Nat.le_succ_diag_r. Qed.
Theorem simpl_edit : forall (s1 s2: string), edit_distance s1 s2 = match s1, s2 with
| EmptyString , EmptyString => O
| String char rest , EmptyString => length s1
| EmptyString , String char rest => length s2
| String char1 rest1 , String char2 rest2 =>
let choices : list nat := S ( edit_distance rest1 s2) :: S (edit_distance s1 rest2) :: nil in
if (Ascii.eqb char1 char2)
then min_helper (edit_distance rest1 rest2 ) choices
else min_helper (S (edit_distance rest1 rest2)) choices
end.
Proof. intros. induction s1.
- induction s2.
-- reflexivity.
-- reflexivity.
- induction s2.
-- reflexivity.
-- remember (a =? a0)%char as test. destruct test.
--- (*Stuck??? Normally I would unfold edit_distance but the definition coq creates after unfold edit_distance ; unfold edit_distance_func is hard for me to reason about*)

You can instead use Function, which comes with Coq and produces a reduction lemma for you (this will actually also generate a graph as Inductive R_edit_distance in the vein of the alternative development you mention, but here it's quite gnarly—that might be because of my edits for concision)
Require Import String.
Require Import List.
Require Import PeanoNat.
Import ListNotations.
Require Import FunInd.
Require Recdef.
Fixpoint min_helper (best : nat) (l : list nat) : nat :=
match l with
| [] => best
| h :: t => if h <? best then min_helper h t else min_helper best t
end.
Function edit_distance
(ss : string * string) (* unfortunately, Function only supports one decreasing argument *)
{measure (fun '(s1, s2) => String.length s1 + String.length s2) ss} : nat :=
match ss with
| (String char1 rest1 as s1, String char2 rest2 as s2) =>
let choices := [S (edit_distance (rest1, s2)); S (edit_distance (s1, rest2))] in
if Ascii.eqb char1 char2
then min_helper (edit_distance (rest1, rest2)) choices
else min_helper (S (edit_distance (rest1, rest2))) choices
| (EmptyString, s) | (s, EmptyString) => String.length s
end.
all: intros; simpl; rewrite Nat.add_succ_r; repeat constructor.
Qed.
Check edit_distance_equation. (* : forall ss : string * string, edit_distance ss = ... *)
Print R_edit_distance. (* Inductive R_edit_distance : string * string -> nat -> Prop := ... *)
The reason the graph Inductive definition (either then nice one you cited or the messy one generated here) doesn't require assurances of termination is that terms of Inductive type have to be finite already. A term of R_edit_distance ss n (which represents edit_distance ss = n) should be seen as a record or log of the steps in the computation of edit_distance. Though a generally recursive function could possibly get stuck in an infinite computation, the corresponding Inductive type excludes that infinite log: if edit_distance ss were to diverge, R_edit_distance ss n would simply be uninhabited for all n, so nothing blows up. In turn, you don't have the ability to actually compute, given ss, what edit_distance ss is or a term in {n | R_edit_distance ss n}, until you complete some termination proof. (E.g. proving forall ss, {n | R_edit_distance ss n} is a form of termination proof for edit_distance.)
Your idea to use structural recursion over some auxiliary type is exactly right (that's the only form of recursion that is available anyway; both Program and Function just build on it), but it doesn't really have anything to do with the graph inductive...
Fixpoint edit_distance
(s1 s2 : string) (n : nat) (prf : n = String.length s1 + String.length s2)
{struct n}
: nat := _.
Something along the lines of the above should work, but it'll be messy. (You could recurse over an instance of the graph inductive instead of the nat here, but, again, that just kicks the bucket to building instances of the graph inductive.)

There is a common trick to this kind of recursion over two arguments, which is to write two nested functions, each recursing over one of the two arguments.
This can also be understood from the perspective of dynamic programming, where the edit distance is computed by traversing a matrix. More generally, the edit distance function edit xs ys can be viewed as a matrix of nat with rows indexed by xs and columns indexed by ys. The outer recursion iterates over rows xs, and for each of those rows, when xs = x :: xs', the inner recursion iterates over its columns ys to generates the entries of that row from another row with a smaller index xs'.
From Coq Require Import String Ascii.
Local Open Scope string_scope.
Infix "::" := String.
(* structural recursion on xs *)
Fixpoint edit (xs ys : string) : nat :=
match xs with
| "" => String.length ys
| x :: xs' =>
(* structural recursion on ys *)
let fix edit_xs ys :=
match ys with
| "" => String.length xs
| y :: ys' =>
let orth := min (edit xs' ys) (edit_xs ys') in
if (x =? y)%char then
min (edit xs' ys') orth
else
orth
end in
edit_xs ys
end.

Related

Improving dependently typed reverse function

Here is the code I have thus far:
Section ilist.
Variable A: Set.
Inductive ilist : nat -> Set :=
| Nil : ilist O
| Cons : forall n, A -> ilist n -> ilist (S n).
(* not sure how to use in irev_aux *)
Lemma same_length: forall n i2, ilist (n + S i2) = ilist (S n + i2).
Proof.
intros.
rewrite Nat.add_succ_comm.
reflexivity.
Defined.
Definition same_length' n i2 (l: ilist (n + S i2)): ilist (S n + i2).
rewrite Nat.add_succ_comm.
assumption.
Defined.
Fixpoint irev_aux i1 i2 (ls:ilist i1): ilist i2 -> ilist (i1+i2) :=
match ls in (ilist i1') return (ilist i2 -> ilist (i1'+i2)) with
| Nil => fun rev => rev
| Cons h t => fun rev =>
same_length' _ _ ((irev_aux t) (Cons h rev))
end.
Definition same_length'' n (l:ilist (n+0)): ilist n.
Proof.
rewrite plus_0_r in l.
assumption.
Defined.
Definition irev n (ls:ilist n): ilist n :=
same_length'' n (irev_aux ls Nil).
End ilist.
This works! Which is an improvement from my previous attempts :) But there are a couple less than desirable aspects that I'd like to try and refine.
First, having a bunch of proofs hanging out to munge equivalent types seems...annoying. Basically, same_length, same_length', same_length''. Perhaps this is an issue with how I defined irev_aux, but I tried some definitions and others required a type level match which seemed about as annoying.
I tried using refine, but got a type error...is there a way to invoke refine, where you then have to prove that the types are in fact equivalent? In a sense that's what I did, but who wants those lemmas hanging out.
Beyond that, I realize there are probably ways to get rid of the equality lemmas, but I'm curious how I can make use of same_length. I've seen cases of matching on equality proofs before to get the typer to unify types...I tried that here but it didn't seem to work.
There is no way of programming irev without a cast. This is one of the many reasons why Coq users generally avoid indexed data types like this :)
Here is one possibility for writing this function:
Require Import Coq.Arith.Arith.
Set Implicit Arguments.
Unset Strict Implicit.
Definition cast {A B : Set} (e : A = B) : A -> B :=
match e with eq_refl => fun x => x end.
Section ilist.
Variable A: Set.
Inductive ilist : nat -> Set :=
| Nil : ilist O
| Cons : forall n, A -> ilist n -> ilist (S n).
Fixpoint irev_length i1 i2 : nat :=
match i1 with
| 0 => i2
| S i1 => irev_length i1 (S i2)
end.
Fixpoint irev_aux i1 i2 (ls:ilist i1): ilist i2 -> ilist (irev_length i1 i2) :=
match ls with
| Nil => fun rev => rev
| Cons h t => fun rev => irev_aux t (Cons h rev)
end.
Fixpoint irev_length_plus i1 i2 : irev_length i1 i2 = i1 + i2 :=
match i1 with
| 0 => eq_refl
| S i1 => eq_trans (irev_length_plus i1 (S i2)) (Nat.add_succ_r _ _)
end.
Definition irev i (l : ilist i) :=
cast (f_equal ilist (eq_trans (irev_length_plus i 0) (Nat.add_comm _ _)))
(irev_aux l Nil).
End ilist.
If you want to prove anything about irev, the easiest is probably to convert the indexed lists to normal lists and reasoning about reversal of normal lists. Fun exercise: try showing that irev is its own inverse.

Teach coq to check termination

Coq, unlike many others, accepts an optional explicit parameter,which can be used to indicate the decreasing structure of a fixpoint definition.
From Gallina specification, 1.3.4,
Fixpoint ident params {struct ident0 } : type0 := term0
defines the syntax. but from it, we've known that it must be an identifier, instead of a general measure.
However, in general, there are recursive functions, that the termination is not quite obvious,or it in fact is, but just difficult for the termination checker to find a decreasing structure. For example, following program interleaves two lists,
Fixpoint interleave (A : Set) (l1 l2 : list A) : list A :=
match l1 with
| [] => []
| h :: t => h :: interleave l2 t
end
This function clearly terminates, while Coq just couldn't figure it out. The reason is neither l1 nor l2 are decreasing every cycle. But what if we consider a measure, defined to be length l1 + length l2? Then this measure clearly decreases every recursion.
So my question is, in the case of sophisticated situation, where code is not straightforward to be organized in a termination checkable way, how do you educate coq and convince it to accept the fixpoint definition?
You have multiple options and all of them boil down to structural recursion in the end.
Preamble
From Coq Require Import List.
Import ListNotations.
Set Implicit Arguments.
Structural recursion
Sometimes you can reformulate your algorithm in a structurally recursive way:
Fixpoint interleave1 {A} (l1 l2 : list A) {struct l1} : list A :=
match l1, l2 with
| [], _ => l2
| _, [] => l1
| h1 :: t1, h2 :: t2 => h1 :: h2 :: interleave1 t1 t2
end.
Incidentally, in some cases you can use a trick with nested fixes -- see this definition of Ackermann function (it wouldn't work with just Fixpoint).
Program Fixpoint
You can use Program Fixpoint mechanism which lets you write your program naturally and later prove that it always terminates.
From Coq Require Import Program Arith.
Program Fixpoint interleave2 {A} (l1 l2 : list A)
{measure (length l1 + length l2)} : list A :=
match l1 with
| [] => l2
| h :: t => h :: interleave2 l2 t
end.
Next Obligation. simpl; rewrite Nat.add_comm; trivial with arith. Qed.
Function
Another option is to use the Function command which can be somewhat limited compared to Program Fixpoint. You can find out more about their differences here.
From Coq Require Recdef.
Definition sum_len {A} (ls : (list A * list A)) : nat :=
length (fst ls) + length (snd ls).
Function interleave3 {A} (ls : (list A * list A))
{measure sum_len ls} : list A :=
match ls with
| ([], _) => []
| (h :: t, l2) => h :: interleave3 (l2, t)
end.
Proof.
intros A ls l1 l2 h t -> ->; unfold sum_len; simpl; rewrite Nat.add_comm; trivial with arith.
Defined.
Equations plugin
This is an external plugin which addresses many issues with defining functions in Coq, including dependent types and termination.
From Equations Require Import Equations.
Equations interleave4 {A} (l1 l2 : list A) : list A :=
interleave4 l1 l2 by rec (length l1 + length l2) lt :=
interleave4 nil l2 := l2;
interleave4 (cons h t) l2 := cons h (interleave4 l2 t).
Next Obligation. rewrite Nat.add_comm; trivial with arith. Qed.
The code above works if you apply this fix.
Fix / Fix_F_2 combinators
You can learn more about this (manual) approach if you follow the links from this question about mergeSort function. By the way, the mergeSort function can be defined without using Fix if you apply the nested fix trick I mentioned earlier. Here is a solution which uses Fix_F_2 combinator since we have two arguments and not one like mergeSort:
Definition ordering {A} (l1 l2 : list A * list A) : Prop :=
length (fst l1) + length (snd l1) < length (fst l2) + length (snd l2).
Lemma ordering_wf' {A} : forall (m : nat) (p : list A * list A),
length (fst p) + length (snd p) <= m -> Acc (#ordering A) p.
Proof.
unfold ordering; induction m; intros p H; constructor; intros p'.
- apply Nat.le_0_r, Nat.eq_add_0 in H as [-> ->].
intros contra%Nat.nlt_0_r; contradiction.
- intros H'; eapply IHm, Nat.lt_succ_r, Nat.lt_le_trans; eauto.
Defined.
Lemma ordering_wf {A} : well_founded (#ordering A).
Proof. now red; intro ; eapply ordering_wf'. Defined.
(* it's in the stdlib but unfortunately opaque -- this blocks evaluation *)
Lemma destruct_list {A} (l : list A) :
{ x:A & {tl:list A | l = x::tl} } + { l = [] }.
Proof.
induction l as [|h tl]; [right | left]; trivial.
exists h, tl; reflexivity.
Defined.
Definition interleave5 {A} (xs ys : list A) : list A.
refine (Fix_F_2 (fun _ _ => list A)
(fun (l1 l2 : list A)
(interleave : (forall l1' l2', ordering (l1', l2') (l1, l2) -> list A)) =>
match destruct_list l1 with
| inright _ => l2
| inleft pf => let '(existT _ h (exist _ tl eq)) := pf in
h :: interleave l2 tl _
end) (ordering_wf (xs,ys))).
Proof. unfold ordering; rewrite eq, Nat.add_comm; auto.
Defined.
Evaluation tests
Check eq_refl : interleave1 [1;2;3] [4;5;6] = [1;4;2;5;3;6].
Check eq_refl : interleave2 [1;2;3] [4;5;6] = [1;4;2;5;3;6].
Check eq_refl : interleave3 ([1;2;3], [4;5;6]) = [1;4;2;5;3;6].
Fail Check eq_refl : interleave4 [1;2;3] [4;5;6] = [1;4;2;5;3;6]. (* Equations plugin *)
Check eq_refl : interleave5 [1;2;3] [4;5;6] = [1;4;2;5;3;6].
Exercise: what happens with this last check if you comment out destruct_list lemma?
You can use something called a measure instead of a structural argument for termination. For this, I believe you have to use the Program Fixpoint mechanism, which is a little involved and will make your proofs look uglier (because it generates a structural recursion out of the proof that you provide, so that the function you will actually use is not quite the function you wrote).
Details here:
https://coq.inria.fr/refman/program.html
It also seems like something called Equations can deal with measures?
cf. http://mattam82.github.io/Coq-Equations/examples/RoseTree.html
https://www.irif.fr/~sozeau/research/coq/equations.en.html

Program Fixpoint: recursive call in `let` and hypothesis of the obligation

Say I have the following Program Fixpoint:
From Coq Require Import List Program.
Import ListNotations.
Program Fixpoint f l {measure (length l)}: list nat :=
let f_rec := (f (tl l) ) in
match hd_error l with
| Some n => n :: f_rec
| None => []
end.
(This example basically returns l in a very stupid way, for the sake of having a simple example).
Here, I have a recursive call to f (stored in f_rec) which is only used if l contains an element, which ensures that when I use f_rec, length (tl l) is indeed smaller than length l.
However, when I want to solve the obligation
Next Obligation.
I don't have the hypothesis hd_error l = Some n which I need.
(Somehow, I have the impression that it is understood as "compute f (tl l) at the let in place", and not "delay the computation until it is actually used").
To illustrate the difference, if I "inline" the let ... in statement:
Program Fixpoint f l {measure (length l)}: list nat :=
match hd_error l with
| Some n => n :: (f (tl l) )
| None => []
end.
Next Obligation.
destruct l.
Here I have Heq_anonymous : Some n = hd_error [] in the environment.
My question is the following:
is it possible to have the hypothesis I need, i.e. to have the hypothesis generated by the match ... with statement ?
N.B.: Moving the let is a solution, but I am curious to know whether this is possible without doing so. For instance, it might be useful in the case f_rec is used in various contexts, to avoid duplicating f (tl l).
One trick is to explicitly ask for the hypothesis you need (I recently saw it in this answer by Joachim Breitner):
let f_rec := fun pf : length (tl l) < length l => f (tl l) in
This way you will be able to use f_rec only when it makes sense.
Program Fixpoint f l {measure (length l)}: list nat :=
let f_rec := fun pf : length (tl l) < length l => f (tl l) in
match hd_error l with
| Some n => n :: f_rec _
| None => []
end.
Next Obligation. destruct l; [discriminate | auto]. Qed.

Proof involving unfolding two recursive functions in COQ

I've began learning Coq, and am trying to prove something that seems fairly simple: if a list contains x, then the number of instances of x in that list will be > 0.
I've defined the contains and count functions as follows:
Fixpoint contains (n: nat) (l: list nat) : Prop :=
match l with
| nil => False
| h :: t => if beq_nat h n then True else contains n t
end.
Fixpoint count (n acc: nat) (l: list nat) : nat :=
match l with
| nil => acc
| h :: t => if beq_nat h n then count n (acc + 1) t else count n acc t
end.
I'm trying to prove:
Lemma contains_count_ge1 : forall (n: nat) (l: list nat), contains n l -> (count n 0 l > 0).
I understand the proof will involve unfolding the definitions of count and contains, but then I'd like to say "the list cannot be nil, as contains is true, so there must be an element x in l such that beq_nat h x is true", and I've played around a bit but can't figure out how to use tactics to do this. Any guidance would be greatly appreciated.
ejgallego already gave a great solution to your problem in his answer. I would still like to single out an important point that he left out: in Coq, you must always argue from first principles, and be very pedantic and precise about your proofs.
You argued that the proof should proceed as follows:
The list cannot be nil, as contains is true, so there must be an element x in l such that beq_nat h x is true.
Even though this makes intuitive sense for humans, it is not precise enough for Coq to understand. The problem, as ejgallego's answer shows, is that your informal reasoning conceals a use of induction. Indeed, it is useful to try to expand out your argument in more details even before translating it into tactics. We could proceed like this, for instance:
Let us prove that, for every n : nat and ns : list nat, contains n ns implies count n 0 ns > 0. We proceed by induction on the list ns. If ns = nil, the definition of contains implies that False holds; a contradiction. We are thus left with the case ns = n' :: ns', where we can use the following induction hypothesis: contains n ns' -> count n 0 ns' > 0. There are two sub-cases to consider: whether beq_nat n n' is true or not.
If beq_nat n n' is true, by the definition of count, we see that we just have to show that count n (0 + 1) ns' > 0. Note there isn't a direct way to proceed here. This is because you wrote count tail-recursively, using an accumulator. While this is perfectly reasonable in functional programming, it can making proving properties about count more difficult. In this case, we would need the following auxiliary lemma, also proved by induction: forall n acc ns, count n acc ns = acc + count n 0 ns. I'll let you figure out how to prove this one. But assuming that we have already established it, the goal would reduce to showing that 1 + count n 0 ns' > 0. This is true by simple arithmetic. (There is an even simpler way that does not require an auxiliary lemma, but it would require slightly generalizing the statement you're proving.)
If beq_nat n n' is false, by the definitions of contains and count, we would need to show that contains n ns' implies count n 0 ns' > 0. This is exactly what the induction hypothesis gives us, and we are done.
There are two lessons to be learned here. The first one is that doing formal proofs often requires translating your intuition in formal terms that the system can understand. We know intuitively what it means to have some element occur inside of a list. But if we were to explain what that means more formally, we would resort to some kind of recursive traversal of the list, which would probably turn out to be the very definition of count that you wrote in Coq. And in order to reason about recursion, we need induction. The second lesson is that the way you define things in Coq has important consequences for the proofs you write. ejgallego's solution did not require any auxiliary lemmas beyond those in the standard library, precisely because his definition of count was not tail-recursive.
Well, you pose many questions about basic Coq beyond what is IMO possible to address here. For this particular problem, I would proceed this way (in reality I would use the already provided lemmas in MathComp):
From Coq Require Import PeanoNat Bool List.
Fixpoint contains (n: nat) (l: list nat) : bool :=
match l with
| nil => false
| h :: t => if Nat.eqb h n then true else contains n t
end.
Fixpoint count (n : nat) (l: list nat) : nat :=
match l with
| nil => 0
| h :: t => if Nat.eqb h n then S (count n t) else count n t
end.
Lemma contains_count_ge1 n l : contains n l = true -> count n l > 0.
Proof.
induction l as [|x l IHl]; simpl; [now congruence|].
now destruct (Nat.eqb_spec x n); auto with arith.
Qed.
My "standard" solution:
Lemma test n (l : list nat) : n \in l -> 0 < count_mem n l.
Proof. by rewrite lt0n => /count_memPn/eqP. Qed.
and different definitions of count and contains that may prove useful:
Fixpoint contains (n: nat) (l: list nat) : bool :=
match l with
| nil => false
| h :: t => Nat.eqb h n || contains n t
end.
Fixpoint count (n : nat) (l: list nat) : nat :=
match l with
| nil => 0
| h :: t => Nat.b2n (Nat.eqb h n) + (count n t)
end.

Reasoning about lists in Coq

I'm try to solve some theorems, based on Pierce's "Software Foundations".
First of all I create a couple of useful functions:
Inductive natlist: Type :=
| nil: natlist
| cons : nat -> natlist -> natlist.
Notation "x :: l" := (cons x l) (at level 60, right associativity).
Fixpoint repeat (n count: nat): natlist :=
match count with
| O => nil
| S count' => n :: (repeat n count')
end.
Fixpoint length (l: natlist): nat :=
match l with
| nil => O
| h :: t => S (length t)
end.
Theorem count_repeat: forall n: nat, length (repeat n n) = n.
Proof.
intros n. induction n as [| n'].
simpl. reflexivity.
simpl. (* and here I can't continue... *)
I want to follow Pierce's advice:
Note that, since this problem is somewhat open-ended, it's possible
that you may come up with a theorem which is true, but whose proof
requires techniques you haven't learned yet. Feel free to ask for help
if you get stuck!
So, could you please advice some proof techniques for me?
As #eponier said, you should try to prove a more general lemma, like
Theorem count_repeat_gen: forall m n: nat, length (repeat n m) = m.
Using repeat n n creates an implicit link between the value of the element and the size of the list which makes your statement impossible to prove directly. Once you proved count_repeat_gen, you'll be able to prove your theorem.