What are the main blocks of a proof assistant?
I am just interested in knowing the internal logic of proof checking. For example, topics about graphical user interfaces of such assistants do not interest me.
A similar question to mine has been asked for compilers:
https://softwareengineering.stackexchange.com/questions/165543/how-to-write-a-very-basic-compiler
My concern is the same but for proof checking systems.
I'm hardly an expert on the matter (I'm only a user of these systems; I don't worry too much about their internals) and this will probably only be a vague partial answer, but the two main approaches that I know of are:
Dependently-typed systems (e.g. Coq, Lean, Agda) that use the Curry–Howard isomorphism. Statements are just types, and proofs are terms that have that type, so checking the validity of a proof is essentially just a special case of type checking a term. I don't want to say too much about this approach because I don't know too much about it and am afraid I'll get something wrong. Théo Winterhalter linked something in the comments above that may provide more context on this approach.
LCF-style theorems provers (e.g. Isabelle, HOL Light, HOL 4). Here a theorem is (roughly speaking) an opaque value of type thm in the implementation language. Only the comparatively small ‘proof kernel’ can create these thm values and all other parts of the system interact with this proof kernel. The kernel offers an interface consisting of various small functions that implement small inference steps such as modus ponens (if you have a theorem A ⟹ B and a theorem A, you can get the theorem B) or ∀-introduction (if you have the theorem P x for a fixed variable x, you can get the theorem ∀x. P x) etc. The kernel also offers an interface for defining new constants. In principle, as long as you can trust that these functions faithfully implement the basic inference steps of the underlying logic, you can trust that any thm value you can produce really corresponds to a theorem in your logic. For LCF-style provers, the answer of what the actual proof is is a bit more difficult to answer because they usually don't build proof terms (e.g. Isabelle has them, but they are disabled by default and not widely used). I think one could say that the history of how the kernel primitives are called constitute the proof, and if one were to record it, it could – in principle – be replayed and checked in another system.
In both cases the idea is that you have a kernel (the type checker in the former case and the inference kernel in the latter) that you have to trust, and then you have a large ecosystem of additional procedures around this to provide more convenience layers. Since they have to interact with the kernel in order to actually produce theorems, however, you do not have to trust that code.
All these different systems have various trade-offs about what parts of the system are in the kernel and what parts are not. In general, I think it is fair to say that the dependently-typed systems tend to have considerably larger kernels than the LCF-based ones (e.g. HOL Light has a particularly small and simple kernel).
There are also other systems that I believe do not fit into these two categories (e.g. Mizar, ACL2, PVS, Metamath, NuPRL) but I don't know anything about how these are implemented.
In the case of LCF, HOL and Isabelle, you'll find an extensive answer to your question in the journal article "From LCF to Isabelle/HOL". (It's open access.)
Most dependently typed systems, such as Coq, are also LCF-style theorem provers, as described in the article and in Eberl's answer. One significant difference is that such calculi incorporate full proof objects, so that one of the objectives of the LCF approach — to save space by not storing proofs — is abandoned. However, the objective of soundness is still met.
Related
There is a programming "style" (or maybe paradigm, i'm not sure what to call it) which is as follows:
First, you write a specification: a formal description of what your (whole, or part of) program is to do. This is done within the programming system; it is not a separate artifact.
Then, you write the program, but - and this is the key distinction between this programming style and others - every step of this writing task is guided in some way by the specification you've written in the previous step. How exactly this guidance happens varies wildly; in Coq you have a metaprogramming language (Ltac) which lets you "refine" the specification while building the actual program behind the scenes, whereas in Agda you compose a program by filling "holes" (i'm not actually sure how it goes in Agda, as i'm mostly used to Coq).
This isn't exactly everyone's favorite style of programming, but i'd like to try practicing it in general-purpose, popular programming languages. At least in Coq i've found it to be fairly addictive!
...but how would i even search for ways to do it outside proof assistants? Which leads us to the question: I'm looking for a name for this programming style, so that i can try looking up tools that let me program like that in other programming languages.
Mind you, of course a more proper question would be directly asking for examples of such tools, but AFAIK questions asking for lists of answers aren't appropriate for Stack Exchange sites.
And to be clear, i'm not all that hopeful i'm really going to find much; these are mostly academic pastimes, and your typical programming language isn't really amenable to this style of programming (for example, the specification language might end up being impossibly complex). But it's worth a shot!
It is called proof-driven development (or type-driven development). However, there is very little information about it.
This process you mention about slowly creating your program by means of ltac (in the case of coq) or holes (in the case of Agda and Idris) is called refinement. So you will also find reference in the literature for this style as proof by refinement or programming by refinement.
Now the most important thing to realize is that this style of programming is intrinsic to more complex type system that will allow you to extract as much information as possible the current environment. So it is natural to find attached with dependent types, although it is not necessarily the case.
As mentioned in another response you're also going to find references to it as Type-Driven Development, there is an idris book about it.
You may be interested in looking into some other projects such as Lean, Isabelle, Idris, Agda, Cedille, and maybe Liquid Haskell, TLA+ and SAW.
As pointed out by the two previous answers, a possible name for the program style you mention certainly is: type-driven development.
From the Coq viewpoint, you might be interested in the following two references:
Certified Programming with Dependent Types (CPDT, by Adam Chlipala): a Coq textbook that teaches advanced techniques to develop dependently-typed Coq theories and automate related proofs.
Experience Report: Type-Driven Development of Certified Tree Algorithms in Coq (by Reynald Affeldt, Jacques Garrigue, Xuanrui Qi, Kazunari Tanaka), published at the Coq Workshop 2019 (slides, extended abstract):
The authors also use the acronym TDD, which interestingly enough, also has another acceptation in the software engineering community: test-driven development (this widely used methodology naturally leads to high-quality test suites).
Actually, both acceptations of TDD share a common idea: one systematically starts by writing the specification (of the considered unit), then only after that, writing some code that fulfills the spec (make the unit tests pass), then we loop and incrementally specify+implement(+refactor) other code units.
Last but not least, there are some extra pointers in this discussion from the Discourse OCaml forum.
I'm assuming Coq at some point moved to an LCF approach. In the past, I wondered about the foundations of the kernel in Isabelle. And I found some nice description of Isabelle/Pure in a master thesis summarizing somehow the existing literature.
I was wondering if there is a description of Coq's kernel covering the logical and implementation aspects of it.
I think your questions is similar to How does one implement Coq?.
At least I'm tempted to give a similar answer.
I think MetaCoq is the state-of-the-art effort to specify and (partially) verify the Coq kernel: https://github.com/MetaCoq/metacoq.
It is initially a library for meta-programming in Coq and as such implements a representation of the kernel inside Coq. It has evolved a lot and now contains the typing rules of (a subset of) Coq as well as formalisation of several meta-theoretical properties, a type-checker and an erasure mechanism.
Now understanding your question:
The Coq reference manual already offers some sort of specification of the Calculus of Inductive Constructions, which should always be up to date with the latest version of Coq.
The MetaCoq Project paper also attempts a specification of the predicative calculus of cumulative inductive constructions (PCUIC).
You seem to think that this somehow might have less value than a paper specification when done in the proof assistant itself, obviously I do not exactly think so (but I'm one of the authors, I'm biased). This is a fair concern, but at least as far as the specification is concerned, it only makes it much more precise than could be done on paper. The Coq reference manual can be imprecise at times. Our work also forces us to explicit invariants of representations that are not enforced in ocaml. Also we separate implementation and specification (the Coq reference manual is pretty implementation oriented). Arguably more works need to be done on this separation.
Otherwise, usually people treat subsets of these calculi, espcially regarding inductive types which are rather painful to lay out entirely.
I see a couple of different research groups, and at least one book, that talk about using Coq for designing certified programs. Is there are consensus on what the definition of certified program is? From what I can tell, all it really means is that the program was proved total and type correct. Now, the program's type may be something really exotic such as a list with a proof that it's nonempty, sorted, with all elements >= 5, etc. However, ultimately, is a certified program just one that Coq shows is total and type safe, where all the interesting questions boil down to what was included in the final type?
Edit 1
Based on wjedynak's answer, I had a look at Xavier Leroy's paper "Formal Verification of a Realistic Compiler", which is linked in the answers below. I think this contains some good information, but I think the more informative information in this sequence of research can be found in the paper Mechanized Semantics for the Clight Subset of the C Language by Sandrine Blazy and Xavier Leroy. This is the language that the "Formal Verification" paper adds optimizations to. In it, Blazy and Leroy present the syntax and semantics of the Clight language and then discuss the validation of these semantics in section 5. In section 5, there's a list of different strategies used for validating the compiler, which in some sense provides an overview of different strategies for creating a certified program. These are:
Manual reviews
Proving properties of the semantics
Verified translations
Testing executable semantics
Equivalence with alternate semantics
In any case, there are probably points that could be added and I'd certainly like to hear about more.
Going back to my original question of what the definition is of a certified program, it's still a little unclear to me. Wjedynak sort of provides an answer, but really the work by Leroy involved creating a compiler in Coq and then, in some sense, certifying it. In theory, it makes it possible to now prove things about the C programs since we can now go C->Coq->proof. In that sense, it seems like there's this work flow where we could
Write a program in X language
Form of a model of the program from step 1 in Coq or some other proof assistant tool. This could involve creating a model of the programming language in Coq or it could involve creating a model of the program directly (i.e. rewriting the program itself in Coq).
Prove some property about the model. Maybe it's a proof about the values. Maybe it's the proof of the equivalence of statements (stuff like 3=1+2 or f(x,y)=f(y,x), whatever.)
Then, based on these proofs, call the original program certified.
Alternatively, we could create a specification of a program in a proof assistant tool and then prove properties about the specification, but not the program itself.
In any case, I'm still interested in hearing alternative definitions if anyone has them.
I agree that the notion seems vague, but in my understanding a certified program is a program equipped/together with the proof of correctness. Now, by using rich and expressive type signatures you can make it so there is no need for a separate proof, but this is often only a matter of convenience. The real issue is what do we mean by correctness: this a matter of specification. You can take a look at e.g. Xavier Leroy. Formal verification of a realistic compiler.
First note that the phrase "certified" has a slightly French bias: elsewhere the expression "verified" or "proven" is often used.
In any case it is important to ask what that actually means. X. Leroy and CompCert is a very good starting point: it is a big project about C compiler verification, and Leroy is always keen to explain to his audience why verification matters. Especially when talking to people from "certification agencies" who usually mean testing, not proving.
Another big verification project is L4.verified which uses Isabelle/HOL. This part of the exposition explains a bit what is actually stated and proven, and what are the consequences. Unfortunately, the actual proof is top secret, so it cannot be checked publicly.
A certified program is a program that is paired with a proof that the program satisfies its specification, i.e., a certificate. The key is that there exists a proof object that can be checked independently of the tool that produced the proof.
A verified program has undergone verification, which in this context may typically mean that its specification has been formalized and proven correct in a system like Coq, but the proof is not necessarily certified by an external tool.
This distinction is well attested in the scientific literature and is not specific to Francophones. Xavier Leroy describes it very clearly in Section 2.2 of A formally verified compiler back-end.
My understanding is that "certified" in this sense is, as Makarius pointed out, an English word chosen by Francophones where native speakers might instead have used "formally verified". Coq was developed in France, and has many Francophone developers and users.
As to what "formal verification" means, Wikipedia notes (license: CC BY-SA 3.0) that it:
is the act of proving ... the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
(I realise you would like a much more precise definition than this. I hope to update this answer in future, if I find one.)
Wikipedia especially notes the difference between verification and validation:
Validation: "Are we trying to make the right thing?", i.e., is the product specified to the user's actual needs?
Verification: "Have we made what we were trying to make?", i.e., does the product conform to the specifications?
The landmark paper seL4: Formal Verification of an OS Kernel (Klein, et al., 2009) corroborates this interpretation:
A cynic might say that an implementation proof only shows that the
implementation has precisely the same bugs that the specification
contains. This is true: the proof does not guarantee that the
specification describes the behaviour the user expects. The
difference [in a verified approach compared to a non-verified one]
is the degree of abstraction and the absence of whole classes of bugs.
Which classes of bugs are those? The Agda tutorial gives some idea:
no runtime errors (inevitable errors like I/O errors are handled; others are excluded by design).
no non-productive infinite loops.
It may means free of runtime error (numeric overflow, invalid references …), which is already good compared to most developed software, while still weak. The other meaning is proved to be correct according to a domain formalization; that is, it does not only have to be formally free of runtime errors, it also has to be proved to do what it's expected to do (which must have been precisely defined).
Could anyone give me more information about the first-class module in Coq? I know that module in Coq is not a first-class. I would like to know the reason why? and is it possible that in the future module in Coq can be first-class?
Thank you very much
I’m not certain, but as I understand it it comes from essentially two points:
Coq is conservative. Because some of its main applications are in verification, Coq mostly restricts itself to constructs whose semantics are reasonably well-understood.
In the dependent-types setting, first-class modules are rather subtle and not fully understood. In particular, how much of the computational/reduction behaviour of definitions do you want visible outside a module? If none at all, then this is already available, as record types. But if some or all of the reduction behaviour is visible, then it’s hard to quantify exactly how much is, so it’s quite hard to analyse the semantics of the resulting modules.
I’m not an expert on the relevant literature, so I may well be wrong about 2, but I’ve gotten the impression that this is the basic situation.
I am reading Coders at Work.
I came across this paragraph in Donald Knuth's interview.
Seibel: It seems a lot of the people I’ve talked to had direct access to a machine when they were starting out. Yet Dijkstra has a paper I’m sure you’re familiar with, where he basically says we shouldn’t let computer-science students touch a machine for the first few years of their training; they should spend all their time manipulating symbols.
I want link to that paper. Which one paper is that? (He wrote too many :-)
Maybe this one?
Excerpt, from near the end:
Before we part, I would like to invite you to consider the following way of doing justice to computing's radical novelty in an introductory programming course.
On the one hand, we teach what looks like the predicate calculus, but we do it very differently from the philosophers. In order to train the novice programmer in the manipulation of uninterpreted formulae, we teach it more as boolean algebra, familiarizing the student with all algebraic properties of the logical connectives. To further sever the links to intuition, we rename the values {true, false} of the boolean domain as {black, white}.
On the other hand, we teach a simple, clean, imperative programming language, with a skip and a multiple assignment as basic statements, with a block structure for local variables, the semicolon as operator for statement composition, a nice alternative construct, a nice repetition and, if so desired, a procedure call. To this we add a minimum of data types, say booleans, integers, characters and strings. The essential thing is that, for whatever we introduce, the corresponding semantics is defined by the proof rules that go with it.
Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. While designing proofs and programs hand in hand, the student gets ample opportunity to perfect his manipulative agility with the predicate calculus. Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs. And this concludes the sketch of my proposal for an introductory programming course for freshmen.
I found a manuscript of Dijkstra's "Cruelty" lecture.