Compute with a recursive function defined by well-founded induction - Coq

When I use Function to define a non-structurally recursive function in Coq, the resulting object behaves strangely when a specific computation is requested. Instead of directly giving the result, the Eval compute in ... directive returns a rather long (typically 170,000 lines) expression. It seems that Coq cannot evaluate everything, and therefore returns a simplified (but long) expression instead of just a value.
The problem seems to come from the way I prove the obligations generated by Function. At first, I thought the problem came from the opaque terms I used, so I converted all the lemmas to transparent constants. By the way, is there a way to list the opaque terms appearing in a term? Or any other way to turn opaque lemmas into transparent ones?
I then noticed that, more precisely, the problem comes from the proof that the order used is well-founded. But I got strange results.
For example, I define log2 on the natural numbers by repeatedly applying div2. Here is the definition:
Function log2 n {wf lt n} :=
match n with
| 0 => 0
| 1 => 0
| n => S (log2 (Nat.div2 n))
end.
I get two proof obligations. The first one checks that the argument of the recursive call decreases with respect to lt, and it can be proved easily.
forall n n0 n1 : nat, n0 = S n1 -> n = S (S n1) -> Nat.div2 (S (S n1)) < S (S n1)
intros. apply Nat.lt_div2. apply le_n_S. apply le_0_n.
The second one checks that lt is a well-founded order. This is already proved in the standard library. The corresponding lemma is Coq.Arith.Wf_nat.lt_wf.
If I use this proof, the resulting function behaves normally: Eval compute in log2 10. returns 3.
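For reference, a complete script along these lines might look like the following sketch (the obligations appear in the order Function presented them to me, which may vary slightly across Coq versions):
Require Import Arith.
Require Import Coq.Arith.Wf_nat.
Require Import Recdef.

Function log2 n {wf lt n} :=
match n with
| 0 => 0
| 1 => 0
| n => S (log2 (Nat.div2 n))
end.
Proof.
(* the argument of the recursive call decreases w.r.t. lt *)
- intros. apply Nat.lt_div2. apply le_n_S. apply le_0_n.
(* lt is well-founded: reuse the standard-library proof *)
- exact lt_wf.
Defined.

Eval compute in log2 10. (* = 3 : nat *)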
But if I want to do the proof myself, I do not always get this behaviour. First, if I end the proof with Qed instead of Defined, the result of the computation (even on small numbers) is a complex expression and not a single number. So I use Defined and try to use only transparent lemmas.
Lemma lt_wf2 : well_founded lt.
Proof.
unfold well_founded. intros n.
apply (lemma1 n). clear n.
intros. constructor. apply H.
Defined.
Here, lemma1 is a proof of the well-founded induction principle on the natural numbers. Here again, I can use existing lemmas, such as lt_wf_ind, lt_wf_rec, lt_wf_rec1 (located in Coq.Arith.Wf_nat), or even well_founded_ind lt_wf. The first one does not work, apparently because it is opaque. The other three work.
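(As for the side question about listing opaque terms: if I recall correctly, reasonably recent Coq versions have a vernacular command for exactly this, e.g.:
Print Opaque Dependencies lt_wf_ind.
(* lists the opaque constants (and axioms) that the proof of lt_wf_ind
depends on; Print Transparent Dependencies is its counterpart *)
There is, however, no way to make a Qed-ended constant transparent after the fact; it has to be re-defined with Defined.)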
I tried to prove it directly using the standard induction on the natural numbers, nat_ind. This gives:
Lemma lemma1 : forall n (P:nat -> Prop),
(forall n, (forall p, p < n -> P p) -> P n) -> P n.
Proof.
intros n P H. pose proof (nat_ind (fun n => forall p, p < n -> P p)).
simpl in H0. apply H0 with (n:=S n).
- intros. inversion H1.
- intros. inversion H2.
+ apply H. exact H1.
+ apply H1. assumption.
- apply le_n.
Defined.
With this proof (and some variants of it), log2 has the same strange behaviour. And this proof seems to use only transparent objects, so maybe the problem is not there.
How can I define a Function that returns understandable results on specific values?

I've managed to pinpoint the place that causes trouble: it's the inversion H2. in lemma1. It turns out we don't need that case analysis, and intuition can finish the proof (it doesn't pattern-match on H2):
Lemma lemma1 : forall n (P:nat -> Prop),
(forall n, (forall p, p < n -> P p) -> P n) -> P n.
Proof.
intros n P H. pose proof (nat_ind (fun n => forall p, p < n -> P p)).
simpl in H0. apply H0 with (n:=S n).
- intros. inversion H1.
- intros. intuition.
- apply le_n.
Defined.
If we use lemma1 with this proof, the computation of log2 10 results in 3.
By the way, here is my version of lt_wf2 (it lets us compute as well):
Lemma lt_wf2 : well_founded lt.
Proof.
unfold well_founded; intros n.
induction n; constructor; intros k Hk.
- inversion Hk.
- constructor; intros m Hm.
apply IHn; omega.
(* OR: apply IHn, Nat.lt_le_trans with (m := k); auto with arith. *)
Defined.
I believe the "Using Coq's evaluation mechanisms in anger" blog post by Xavier Leroy explains this kind of behavior:
it eliminates the proof of equality between the heads before recursing over the tails and finally deciding whether to produce a left or a right. This makes the left/right data part of the final result dependent on a proof term, which in general does not reduce!
In our case we eliminate the proof of inequality (inversion H2.) in the proof of lemma1 and the Function mechanism makes our computations depend on a proof term. Hence, the evaluator can't proceed when n > 1.
And the reason the inversion H1. in the body of the lemma doesn't influence computations is that for n = 0 and n = 1, log2 n is defined as a base case within the match expression.
To illustrate this point, let me show how we can prevent evaluation of log2 on any two values n and n + 1 of our choice, where n > 1, and nowhere else!
Lemma lt_wf2' : well_founded lt.
Proof.
unfold well_founded; intros n.
induction n; constructor; intros k Hk.
- inversion Hk. (* n = 0 *)
- destruct n. intuition. (* n = 1 *)
destruct n. intuition. (* n = 2 *)
destruct n. intuition. (* n = 3 *)
destruct n. inversion Hk; intuition. (* n = 4 and n = 5 - won't evaluate *)
(* n > 5 *)
constructor; intros m Hm; apply IHn; omega.
Defined.
If you use this modified lemma in the definition of log2, you'll see that it computes everywhere except at n = 4 and n = 5. Well, almost everywhere: computations with large nats can result in stack overflow or segmentation fault, as Coq warns us:
Warning: Stack overflow or segmentation fault happens when working with
large numbers in nat (observed threshold may vary from 5000 to 70000
depending on your system limits and on the command executed).
And to make log2 work for n = 4 and n = 5 even with the above "flawed" proof, we could amend log2 like this:
Function log2 n {wf lt n} :=
match n with
| 0 => 0
| 1 => 0
| 4 => 2
| 5 => 2
| n => S (log2 (Nat.div2 n))
end.
adding the necessary proofs at the end.
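The decreasing-call obligations for the extra branches all have the same shape, so the concluding script might look like this sketch (it uses the all: goal selector from Coq >= 8.5; the exact number of obligations depends on how Function compiles the match):
Proof.
(* every decreasing-call goal reduces to a statement of the form
Nat.div2 (S (S _)) < S (S _) *)
all: try (intros; apply Nat.lt_div2; apply le_n_S; apply le_0_n).
(* the one remaining goal is the well-foundedness of lt *)
exact lt_wf.
Defined.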
The "well-founded" proof must be transparent and can't rely on pattern-matching on proof objects because the Function mechanism actually uses the lt_wf lemma to compute the decreasing termination guard. If we look at the term produced by Eval (in a case where evaluation fails to produce a nat), we'll see something along these lines:
fix Ffix (x : nat) (x0 : Acc (fun x0 x1 : nat => S x0 <= x1) x) {struct x0}
It's easy to see that the type of x0 lives in Prop, so it gets erased when extracting the functional program log2 into, say, OCaml, but Coq's internal evaluation mechanism has to use it to ensure termination.
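Incidentally, the erasure is easy to observe through extraction (a sketch; the Require line is needed in Coq >= 8.7, while earlier versions load extraction by default):
Require Extraction.
Extraction log2.
(* the printed OCaml function recurses on the nat argument alone;
no trace of the Acc argument remains, since its type lives in Prop *)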

The reduction behavior of functions defined by well-founded recursion in Coq is generally not very good, even when you declare your proofs to be transparent. The reason is that well-foundedness arguments usually involve complicated proof terms. Since these proof terms end up appearing in well-founded recursive definitions, "simplifying" your function will make all of those proof terms appear, as you noticed.
It is easier to rely on custom tactics and lemmas to reduce functions defined this way. First, I would recommend favoring Program Fixpoint over Function, because the latter is much older and (I think) less well maintained. Thus, you would end up with a definition like this:
Require Import Coq.Numbers.Natural.Peano.NPeano.
Require Import Coq.Program.Wf.
Require Import Coq.Program.Tactics.
Program Fixpoint log2 n {wf lt n} :=
match n with
| 0 => 0
| 1 => 0
| n => S (log2 (Nat.div2 n))
end.
Next Obligation.
Admitted. (* the decreasing-argument obligation is left unproved in this answer *)
Now, you just need to use the program_simpl tactic to simplify calls to log2. Here's an example:
Lemma foo : log2 4 = 2.
Proof.
program_simpl.
Qed.


Back to Coq proof checker after 5-8 years away: how to prove that (forall n, m : N, (n - (S m)) = pred(n - m))?

Somewhere between 5 and 8 years ago (probably 6 or 7) I wrote a full formalization of Bubble Sort in Coq. One of the earliest lemmas proven was the one in the title, which I called "sub_succ_r" (or maybe that's its standard name?):
forall n m : nat, (n - (S m))%nat = pred(n - m)
Now, back then, this was the very simple proof for that Lemma:
intros n m.
induction n m using n_double_ind.
simpl.
auto.
apply sub_0_r.
Qed.
"sub_0_r" is the Lemma that asserts that
forall n : nat, (n - 0)%nat = n
Now, attentive readers who are also familiar with modern Coq might have already spotted the problem with the old proof: that second line.
induction n m
i.e. simultaneous induction on 2 terms, doesn't work any more, not even when specifying that one is using "n_double_ind".
I've been wracking my head over how to prove this using first induction over n and then induction over m, but I just can't figure it out.
Any help, great or small, would be much appreciated.
If you want to apply induction to two variables simultaneously, you need to separate them with a comma; otherwise Coq parses f t as f applied to t, so when you write n m it thinks n is a function and you want to apply it to m. Instead, use a comma like this:
induction n, m.
This generates four subgoals. Two of them can be proved using auto, so you can ask Coq to run the auto tactic on every subgoal generated by induction, using a semicolon like this:
induction n, m; auto.
The remaining subgoals are easy to prove using the lemma you mentioned and the induction hypotheses, so the whole script is as follows:
Lemma sub_0_r : forall n : nat, (n - 0) = n.
Admitted.
Theorem sub_succ_r: forall n m : nat, (n - (S m)) = pred(n - m).
Proof.
induction n, m; auto.
- apply sub_0_r.
- apply IHn.
Qed.
Also note that the %nat annotations are not necessary.
But as you can see, we only used IHn, which means we never used an induction hypothesis for m. So we do not need induction on m; a mere destruct does the job. A better proof is:
induction n; destruct m; auto.
- apply sub_0_r.
- apply IHn.
which is minimal and elegant.

How should the excluded middle be used in the proof of the pigeonhole principle?

Lemma remove {A} (x : A) xs (p : In x xs) :
exists xs', (forall x', x' <> x -> In x' xs -> In x' xs') /\ (length xs = S (length xs')).
Proof.
induction xs.
- inversion p.
- destruct p.
+ subst x0.
exists xs.
split.
* intros x' neq pin.
destruct pin.
-- contradict neq. symmetry. assumption.
-- assumption.
* reflexivity.
+ destruct (IHxs H) as [xs' pxs']. clear IHxs.
destruct pxs' as [p1 plen]. rename x0 into x'.
exists (x' :: xs').
split.
* intros x'' neq pin.
destruct pin.
-- subst x'. left. reflexivity.
-- right. apply p1. assumption. assumption.
* simpl.
rewrite -> plen.
reflexivity.
Qed.
Theorem pigeonhole_principle: forall (X:Type) (l1 l2:list X),
excluded_middle ->
AllIn l1 l2 ->
length l2 < length l1 ->
repeats l1.
Proof.
induction l1; simpl; intros l2 ex_mid Hin Hlen.
- inversion Hlen.
- apply repeats_rest.
destruct (remove x l2) as [l2' Hl2].
+ apply Hin. left. reflexivity.
+ destruct Hl2 as [Hmap Hlen'].
rewrite Hlen' in Hlen.
clear Hlen'.
apply (IHl1 l2').
1 : { assumption. }
2 : { revert Hlen. unfold lt. intros. omega. }
clear Hlen IHl1.
revert Hin.
unfold AllIn.
intros.
apply Hmap.
2 : { apply Hin. right. assumption. }
At this point, I am left with the following goal:
1 subgoal
X : Type
x : X
l1, l2 : list X
ex_mid : excluded_middle
l2' : list X
Hmap : forall x' : X, x' <> x -> In x' l2 -> In x' l2'
Hin : forall u : X, In u (x :: l1) -> In u l2
u : X
H : In u l1
______________________________________(1/1)
u <> x
I found various solutions for the pigeonhole principle floating around on the net. The above is adapted from the one by Kovacs, but his proof establishes the absence of duplicates rather than the presence of repetitions, as in the SF exercise.
The marked difference is that I cannot prove the u <> x goal here, because there is less information available when the problem is stated in this form.
Since this problem is both hard and optional, there are existing solutions floating around, and I've already been working on it for two days, could somebody describe a high-level plan of exactly what I need to do to finish this proof?
I am not looking for a full solution, but I am hoping that the one with the excluded middle turns out to be elegant, because the Coq proof without the excluded middle is just a mess of rewriting, and knowing the source code of a program is far from understanding what it does. Most explanations of the principle just describe what it is, which is not enough for me to bridge the intuition gap.
I've never seen the classical laws in action - it does not seem like knowing something is decidable would gain me much, and I find it hard to see what the point of them is. This is especially so in this situation, so I am that much more interested to see what their purpose will turn out to be.
I came across this question while searching for answers as I worked through SF (Software Foundations), but managed to prove it myself. I'll provide a sketch for the SF version of the pigeonhole principle, using excluded_middle. The statement to prove is that if all elements of list l1 are in l2, and length l2 is less than length l1, then l1 contains repeated elements.
The proof, as SF suggests, begins by induction on l1. I'll omit the straightforward empty case. In the inductive case, destruct l2. The case where l2 is empty is straightforward; we consider the other case.
Now, because of the induction on l1 and the destructuring of l2, your statement to prove is about lists with a first member; let's call them x1::l1 and x2::l2. Your membership hypothesis now reads: all elements of x1::l1 are in x2::l2.
But first, let us use the excluded middle to state that either x1 is in l1 or x1 is not in l1. If x1 is in l1, then we can trivially prove that x1::l1 has repeats. So moving forward, we may assume that x1 is not in l1.
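To see why the classical step is genuinely needed, note that In x1 l1 is not decidable for an arbitrary element type, so the case split has to come from the classical hypothesis. Here is a minimal, self-contained sketch of just that step (using the SF-style definition of excluded_middle):
Require Import List.

Definition excluded_middle := forall P : Prop, P \/ ~ P.

(* the only classical step of the whole proof, isolated: a case split
on membership in a list over an arbitrary type *)
Example in_or_not (ex_mid : excluded_middle)
(X : Type) (x : X) (l : list X) : In x l \/ ~ In x l.
Proof. apply ex_mid. Qed.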
To match the inductive hypothesis, it then suffices to find an l2' of the same length as l2 such that all elements of l1 are in l2'.
Now consider the membership hypothesis on x1::l1, with the universally quantified variable introduced as x.
By hypothesis, we know that x1 is in x2::l2. Now consider l2', where l2' is x2::l2 with one instance of x1 removed (use in_split). Because x1 is not in l1, we can conclude that all members of l1 are also in l2', none of them being the removed element. This satisfies the membership hypothesis of the induction, and some wrangling with lengths gives us length l2 = length l2', so the length hypothesis is satisfied too. We thus conclude that l1 contains repeats, and therefore so does x1::l1.
Edit: previously, I also did a case analysis on whether or not x1 = x2 or x1 is in l2, and whether or not x = x2 or x is in l2, and solved the special cases in a more straightforward manner. That is not needed; the general case covers them as well.

Well-founded induction in Coq

Let's say that I know certain natural numbers are good. I know 1 is good; if n is good then 3n is; and if n is good then n+5 is; and those are the only ways of constructing good numbers. It seems to me that the adequate formalization of this in Coq is
Inductive good : nat -> Prop :=
| g1 : good 1
| g3 : forall n, good n -> good (n * 3)
| g5 : forall n, good n -> good (n + 5).
However, despite being obvious, the fact that 0 is not good does not seem to be provable using this definition (because when I invert, in the case of g3 I only get the same thing back in the hypothesis).
Now, it isn't so obvious what exactly the good numbers are. And it really seems that I don't need to characterize them completely in order to know that 0 is not good. For example, I can tell that 2 is not good just by doing a few inversions.
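For instance (a sketch, assuming the definition of good above, and using omega for the arithmetic side conditions rather than further inversions):
Require Import Omega.

Example not_good_2 : ~ good 2.
Proof.
intros contra.
(* inversion discharges the g1 case; the g3 and g5 cases leave the
unsatisfiable equations n * 3 = 2 and n + 5 = 2, which omega refutes *)
inversion contra; omega.
Qed.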
Indeed, g3 can be applied an unbounded number of times when trying to disprove good 0. That is why we can expect this proof to require induction (and indeed the auxiliary lemma needed in @AntonTrunov's solution uses induction). The same idea is used in theorem loop_never_stop of http://www.cis.upenn.edu/~bcpierce/sf/current/Imp.html#lab428.
Require Import Omega.
Example not_good_0 : ~ good 0.
Proof.
intros contra. remember 0 as n. induction contra.
discriminate. apply IHcontra. omega. omega.
Qed.
This problem needs induction. And induction needs some predicate P : nat -> Prop to work with. A primitive (constant) predicate like (fun n => ~good 0) doesn't give you much: you won't be able to prove the base case for 1 (which corresponds to the constructor g1), because the predicate "forgets" its argument.
So you need to prove some logically equivalent (or stronger) statement which readily will give you the necessary predicate.
An example of such an equivalent statement is forall n, good n -> n > 0, which you can later use to disprove good 0. The corresponding predicate P is (fun n => n > 0).
Require Import Coq.Arith.Arith.
Require Import Coq.omega.Omega.
Inductive good : nat -> Prop :=
| g1 : good 1
| g3 : forall n, good n -> good (n * 3)
| g5 : forall n, good n -> good (n + 5).
Lemma good_gt_O: forall n, good n -> n > 0.
Proof.
intros n H. induction H; omega.
Qed.
Goal ~ good 0.
intro H. now apply good_gt_O in H.
Qed.
Here is a proof of the aforementioned equivalence:
Lemma not_good0_gt_zero_equiv_not_good0 :
(forall n, good n -> n > 0) <-> ~ good 0.
Proof.
split; intros; firstorder.
destruct n; [tauto | omega].
Qed.
And it's easy to show that forall n, n = 0 -> ~ good n, which implicitly appears in @eponier's answer, is equivalent to ~ good 0 too.
Lemma not_good0_eq_zero_equiv_not_good0 :
(forall n, n = 0 -> ~ good n) <-> ~ good 0.
Proof.
split; intros; subst; auto.
Qed.
Now, the corresponding predicate used to prove forall n, n = 0 -> ~ good n is fun n => n = 0 -> False. This can be shown by manually applying the good_ind induction principle, automatically generated by Coq:
Example not_good_0_manual : forall n, n = 0 -> ~ good n.
Proof.
intros n Eq contra.
generalize Eq.
refine (good_ind (fun n => n = 0 -> False) _ _ _ _ _);
try eassumption; intros; omega.
Qed.
generalize Eq. introduces n = 0 as a premise of the current goal. Without it, the goal to prove would be False and the corresponding predicate would be the boring fun n => False again.

Using `dependent induction` tactic to keep information while doing induction

I have just run into the issue of Coq's induction tactic discarding information about constructed terms, while reading a proof from here.
The authors used something like:
remember (WHILE b DO c END) as cw eqn:Heqcw.
to rewrite a hypothesis H before the actual induction H. I really don't like the idea of having to introduce a trivial equality; it looks like black magic.
Some searching here on SO shows that the remember trick is indeed necessary. One answer here, however, points out that the newer dependent induction tactic can be used to avoid the remember trick. This is nice, but dependent induction itself now seems a bit magical.
I have a hard time understanding how dependent induction works. The documentation gives an example where dependent induction is required:
Lemma le_minus : forall n:nat, n < 1 -> n = 0.
I can verify how induction fails and dependent induction works in this case. But I can't use the remember trick to replicate the dependent induction result.
What I tried so far to mimic the remember trick is:
Require Import Coq.Program.Equality.
Lemma le_minus : forall n:nat, n < 1 -> n = 0.
intros n H. (* dependent induction H works*)
remember (n < 1) as H0. induction H.
But this doesn't work. Can anyone explain how dependent induction works here, in terms of the remember-ing?
You can do
Require Import Coq.Program.Equality.
Lemma le_minus : forall n:nat, n < 1 -> n = 0.
Proof.
intros n H.
remember 1 as m in H. induction H.
- inversion Heqm. reflexivity.
- inversion Heqm. subst m.
inversion H.
Qed.
As stated here, the problem is that Coq cannot keep track of the shape of terms that appear in the type of the thing you are doing induction on. In other words, doing induction over the "less than" relation instructs Coq to try to prove something about a generic upper bound, as opposed to the specific one you're considering (1).
Notice that it is always possible to prove such goals without remember or dependent induction, by generalizing your result a little bit:
Lemma le_minus_aux :
forall n m, n < m ->
match m with
| 1 => n = 0
| _ => True
end.
Proof.
intros n m H. destruct H.
- destruct n; trivial.
- destruct H; trivial.
Qed.
Lemma le_minus : forall n, n < 1 -> n = 0.
Proof.
intros n H.
apply (le_minus_aux n 1 H).
Qed.
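For comparison, here is what the dependent induction version of the same lemma looks like (a sketch, along the lines of the standard example from the documentation):
Require Import Coq.Program.Equality.

Lemma le_minus' : forall n : nat, n < 1 -> n = 0.
Proof.
intros n H.
dependent induction H.
- (* le_n: S n was unified with 1, so n = 0 *) reflexivity.
- (* le_S: the remaining hypothesis S n <= 0 is impossible *) inversion H.
Qed.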

Proof by case analysis in Coq

I am trying to prove a Proposition about the following function:
Program Fixpoint division (m:nat) (n:nat) {measure m} : nat :=
match lt_nat 0 n with
| false => 0
| true => match leq_nat n m with
| false => 0
| true => S (division (menos m n) n)
end
end.
menos is natural subtraction.
I am trying to prove some fact involving division. I wrote down an informal proof where I first do a case analysis on lt_nat 0 n, and then, in the case where lt_nat 0 n is true, a further case analysis on leq_nat n m. This is in order to reduce the definition of division.
However, I cannot find how to express this case analysis in Coq. I tried destruct (leq_nat n m), but it does nothing. I am expecting Coq to generate two subgoals: one where I have to prove my proposition assuming leq_nat n m = false, and one assuming leq_nat n m = true.
Furthermore, I cannot unfold the definition of division in my proof! When I try unfold division, I get: division_func (existT (fun _ : nat => nat) m n).
How can I perform case analysis on leq_nat n m? Why can I not unfold the definition of division as I usually do with other functions?
Thank you.
Everything is more complicated than usual because of Program Fixpoint, which does not define your function as you would expect from a classic Fixpoint, since it needs to find a structurally recursive way of defining it. The real definition of division is hidden inside division_func.
Therefore, to manipulate your function, you need to prove basic lemmas, including the one stating that your function can be replaced by its body.
Lemma division_eq : forall m n, division m n = match lt_nat 0 n with
| false => 0
| true => match leq_nat n m with
| false => 0
| true => S (division (menos m n) n)
end
end.
Now, the question is how to prove this result. Here is the only solution I know, which I consider really unsatisfying.
I use the tactic fix_sub_eq located in Program.Wf, or fix_sub_eq_ext in Program.Wf.WfExtensionality.
This gives something like:
Proof.
intros.
unfold division. unfold division_func at 1.
rewrite fix_sub_eq; repeat fold division_func.
- simpl. destruct (lt_nat 0 n) eqn:H.
destruct (leq_nat n m) eqn:H0. reflexivity.
reflexivity. reflexivity.
But the second generated goal is quite complicated. The easy and general way of solving it is to use the axioms proof_irrelevance and functional_extensionality. It should be possible to prove this particular subgoal without any axioms, but I have not found the right way to do it. Instead of applying the axioms manually, you can use the second tactic, fix_sub_eq_ext, which calls them directly, leaving you a single goal:
Proof.
intros.
unfold division. unfold division_func at 1.
rewrite fix_sub_eq_ext; repeat fold division_func.
simpl. destruct (lt_nat 0 n) eqn:H.
destruct (leq_nat n m) eqn:H0. reflexivity.
reflexivity. reflexivity.
Qed.
I have not found a better way to use Program Fixpoint; that's why I prefer Function, which has other drawbacks, but directly generates the equation lemma.
Require Recdef.
Function division (m:nat) (n:nat) {measure (fun n => n) m} : nat :=
match lt_nat 0 n with
| false => 0
| true => match leq_nat n m with
| false => 0
| true => S (division (menos m n) n)
end
end.
Proof.
intros m n. revert m. induction n; intros.
- discriminate teq.
- destruct m. discriminate teq0.
simpl. destruct n. destruct m; apply le_n.
transitivity m. apply IHn. reflexivity. assumption. apply le_n.
Qed.
Check division_equation.
Now that you have the equation lemma, you can rewrite with it and reason as usual.
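For example, a concrete computation can now be carried out step by step (a sketch, assuming lt_nat, leq_nat and menos are plain fixpoints that simpl can fully compute on closed terms):
Example division_5_2 : division 5 2 = 2.
Proof.
rewrite division_equation; simpl. (* goal is now about division 3 2 *)
rewrite division_equation; simpl. (* goal is now about division 1 2 *)
rewrite division_equation; simpl. (* the recursion has bottomed out *)
reflexivity.
Qed.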
About your problem with destruct: destruct does not unfold definitions. Therefore, if the term you're destructing occurs neither in your goal nor in any of the hypotheses, destruct will not do anything interesting, unless you save the equation it produces. You can use destruct ... eqn:H for this purpose. I did not know case_eq, but it seems to do the same thing.
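As a minimal illustration of destruct ... eqn:, with the standard library's Nat.leb standing in for leq_nat:
Require Import PeanoNat.

Goal forall n m : nat, Nat.leb n m = true \/ Nat.leb n m = false.
Proof.
intros n m.
destruct (Nat.leb n m) eqn:H.
- (* H : Nat.leb n m = true *) left; reflexivity.
- (* H : Nat.leb n m = false *) right; reflexivity.
Qed.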