What is the name of the programming style enabled by dependent types (think Coq or Agda)? - coq

There is a programming "style" (or maybe paradigm, i'm not sure what to call it) which is as follows:
First, you write a specification: a formal description of what your (whole, or part of) program is to do. This is done within the programming system; it is not a separate artifact.
Then, you write the program, but - and this is the key distinction between this programming style and others - every step of this writing task is guided in some way by the specification you've written in the previous step. How exactly this guidance happens varies wildly; in Coq you have a metaprogramming language (Ltac) which lets you "refine" the specification while building the actual program behind the scenes, whereas in Agda you compose a program by filling "holes" (i'm not actually sure how it goes in Agda, as i'm mostly used to Coq).
This isn't exactly everyone's favorite style of programming, but i'd like to try practicing it in general-purpose, popular programming languages. At least in Coq i've found it to be fairly addictive!
...but how would i even search for ways to do it outside proof assistants? Which leads us to the question: I'm looking for a name for this programming style, so that i can try looking up tools that let me program like that in other programming languages.
Mind you, of course a more proper question would be directly asking for examples of such tools, but AFAIK questions asking for lists of answers aren't appropriate for Stack Exchange sites.
And to be clear, i'm not all that hopeful i'm really going to find much; these are mostly academic pastimes, and your typical programming language isn't really amenable to this style of programming (for example, the specification language might end up being impossibly complex). But it's worth a shot!

It is called proof-driven development (or type-driven development). However, there is very little information about it.

This process you mention about slowly creating your program by means of ltac (in the case of coq) or holes (in the case of Agda and Idris) is called refinement. So you will also find reference in the literature for this style as proof by refinement or programming by refinement.
Now the most important thing to realize is that this style of programming is intrinsic to more complex type system that will allow you to extract as much information as possible the current environment. So it is natural to find attached with dependent types, although it is not necessarily the case.
As mentioned in another response you're also going to find references to it as Type-Driven Development, there is an idris book about it.
You may be interested in looking into some other projects such as Lean, Isabelle, Idris, Agda, Cedille, and maybe Liquid Haskell, TLA+ and SAW.

As pointed out by the two previous answers, a possible name for the program style you mention certainly is: type-driven development.
From the Coq viewpoint, you might be interested in the following two references:
Certified Programming with Dependent Types (CPDT, by Adam Chlipala): a Coq textbook that teaches advanced techniques to develop dependently-typed Coq theories and automate related proofs.
Experience Report: Type-Driven Development of Certified Tree Algorithms in Coq (by Reynald Affeldt, Jacques Garrigue, Xuanrui Qi, Kazunari Tanaka), published at the Coq Workshop 2019 (slides, extended abstract):
The authors also use the acronym TDD, which interestingly enough, also has another acceptation in the software engineering community: test-driven development (this widely used methodology naturally leads to high-quality test suites).
Actually, both acceptations of TDD share a common idea: one systematically starts by writing the specification (of the considered unit), then only after that, writing some code that fulfills the spec (make the unit tests pass), then we loop and incrementally specify+implement(+refactor) other code units.
Last but not least, there are some extra pointers in this discussion from the Discourse OCaml forum.

Related

Definition of a certified program

I see a couple of different research groups, and at least one book, that talk about using Coq for designing certified programs. Is there are consensus on what the definition of certified program is? From what I can tell, all it really means is that the program was proved total and type correct. Now, the program's type may be something really exotic such as a list with a proof that it's nonempty, sorted, with all elements >= 5, etc. However, ultimately, is a certified program just one that Coq shows is total and type safe, where all the interesting questions boil down to what was included in the final type?
Edit 1
Based on wjedynak's answer, I had a look at Xavier Leroy's paper "Formal Verification of a Realistic Compiler", which is linked in the answers below. I think this contains some good information, but I think the more informative information in this sequence of research can be found in the paper Mechanized Semantics for the Clight Subset of the C Language by Sandrine Blazy and Xavier Leroy. This is the language that the "Formal Verification" paper adds optimizations to. In it, Blazy and Leroy present the syntax and semantics of the Clight language and then discuss the validation of these semantics in section 5. In section 5, there's a list of different strategies used for validating the compiler, which in some sense provides an overview of different strategies for creating a certified program. These are:
Manual reviews
Proving properties of the semantics
Verified translations
Testing executable semantics
Equivalence with alternate semantics
In any case, there are probably points that could be added and I'd certainly like to hear about more.
Going back to my original question of what the definition is of a certified program, it's still a little unclear to me. Wjedynak sort of provides an answer, but really the work by Leroy involved creating a compiler in Coq and then, in some sense, certifying it. In theory, it makes it possible to now prove things about the C programs since we can now go C->Coq->proof. In that sense, it seems like there's this work flow where we could
Write a program in X language
Form of a model of the program from step 1 in Coq or some other proof assistant tool. This could involve creating a model of the programming language in Coq or it could involve creating a model of the program directly (i.e. rewriting the program itself in Coq).
Prove some property about the model. Maybe it's a proof about the values. Maybe it's the proof of the equivalence of statements (stuff like 3=1+2 or f(x,y)=f(y,x), whatever.)
Then, based on these proofs, call the original program certified.
Alternatively, we could create a specification of a program in a proof assistant tool and then prove properties about the specification, but not the program itself.
In any case, I'm still interested in hearing alternative definitions if anyone has them.
I agree that the notion seems vague, but in my understanding a certified program is a program equipped/together with the proof of correctness. Now, by using rich and expressive type signatures you can make it so there is no need for a separate proof, but this is often only a matter of convenience. The real issue is what do we mean by correctness: this a matter of specification. You can take a look at e.g. Xavier Leroy. Formal verification of a realistic compiler.
First note that the phrase "certified" has a slightly French bias: elsewhere the expression "verified" or "proven" is often used.
In any case it is important to ask what that actually means. X. Leroy and CompCert is a very good starting point: it is a big project about C compiler verification, and Leroy is always keen to explain to his audience why verification matters. Especially when talking to people from "certification agencies" who usually mean testing, not proving.
Another big verification project is L4.verified which uses Isabelle/HOL. This part of the exposition explains a bit what is actually stated and proven, and what are the consequences. Unfortunately, the actual proof is top secret, so it cannot be checked publicly.
A certified program is a program that is paired with a proof that the program satisfies its specification, i.e., a certificate. The key is that there exists a proof object that can be checked independently of the tool that produced the proof.
A verified program has undergone verification, which in this context may typically mean that its specification has been formalized and proven correct in a system like Coq, but the proof is not necessarily certified by an external tool.
This distinction is well attested in the scientific literature and is not specific to Francophones. Xavier Leroy describes it very clearly in Section 2.2 of A formally verified compiler back-end.
My understanding is that "certified" in this sense is, as Makarius pointed out, an English word chosen by Francophones where native speakers might instead have used "formally verified". Coq was developed in France, and has many Francophone developers and users.
As to what "formal verification" means, Wikipedia notes (license: CC BY-SA 3.0) that it:
is the act of proving ... the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
(I realise you would like a much more precise definition than this. I hope to update this answer in future, if I find one.)
Wikipedia especially notes the difference between verification and validation:
Validation: "Are we trying to make the right thing?", i.e., is the product specified to the user's actual needs?
Verification: "Have we made what we were trying to make?", i.e., does the product conform to the specifications?
The landmark paper seL4: Formal Verification of an OS Kernel (Klein, et al., 2009) corroborates this interpretation:
A cynic might say that an implementation proof only shows that the
implementation has precisely the same bugs that the specification
contains. This is true: the proof does not guarantee that the
specification describes the behaviour the user expects. The
difference [in a verified approach compared to a non-verified one]
is the degree of abstraction and the absence of whole classes of bugs.
Which classes of bugs are those? The Agda tutorial gives some idea:
no runtime errors (inevitable errors like I/O errors are handled; others are excluded by design).
no non-productive infinite loops.
It may means free of runtime error (numeric overflow, invalid references …), which is already good compared to most developed software, while still weak. The other meaning is proved to be correct according to a domain formalization; that is, it does not only have to be formally free of runtime errors, it also has to be proved to do what it's expected to do (which must have been precisely defined).

Dynamic typing and programming distributed systems

Coming from Scala (and Akka), I recently began looking at other languages that were designed with distributed computing in mind, namely Erlang (and a tiny bit of Oz and Bloom). Both Erlang and Oz are dynamically typed, and if I remember correctly (will try to find link) people have tried to add types to Erlang and managed to type a good portion of it, but could not successfully coerce the system to make it fit the last bit?
Oz, while a research language, is certainly interesting to me, but that is dynamically typed as well.
Bloom's current implementation is in Ruby, and is consequently dynamically typed.
To my knowledge, Scala (and I suppose Haskell, though I believe that was built initially more as an exploration into pure lazy functional languages as opposed to distributed systems) is the only language that is statically typed and offer language-level abstractions (for lack of a better term) in distributed computing.
I am just wondering if there are inherent advantages of dynamic typing over static typing, specifically in the context of providing language level abstractions for programming distributed systems.
Not really. For example, the same group that invented Oz later did some work on Alice ML, a project whose mission statement was to rethink Oz as a typed, functional language. And although it remained a research project, I'd argue that it was enough proof of concept to demonstrate that the same basic functionality can be supported in such a setting.
(Full disclosure: I was a PhD student in that group at the time, and the type system of Alice ML was my thesis.)
Edit: The problem with adding types to Erlang isn't distribution, it simply is an instance of the general problem that adding types to a language after the fact never works out well. On the other hand, there still is Dialyzer for Erlang.
Edit 2: I should mention that there were other interesting research projects for typed distributed languages, e.g. Acute, which had a scope similar to Alice ML, or ML5, which used modal types to enable stronger checking of mobility characteristics. But they have only survived in the form of papers.
There are no inherent advantages of dynamic typing over static typing for distributed systems. Both have their own advantages and disadvantages in general.
Erlang (Akka is inspired from Erlang Actor Model) is dynamically typed. Dynamic typing in Erlang was historically chosen for simple reasons; those who implemented Erlang at first mostly came from dynamically typed languages particularly Prolog, and as such, having Erlang dynamic was the most natural option to them. Erlang was built with failure in mind.
Static typing helps in catching many errors during compilation time itself rather than at runtime as in case of dynamic typing. Static Typing was tried in Erlang and it was a failure. But Dynamic typing helps in faster prototyping. Check this link for reference which talks a lot about the difference.
Subjectively, I would rather think about the solution/ algorithm of a problem rather than thinking about the type of each of the variable that I use in the algorithm. It also helps in quick development.
These are few links which might help
BenefitsOfDynamicTyping
static-typing-vs-dynamic-typing
BizarroStaticTypingDebate
Cloud Haskell is maturing quickly, statically-typed, and awesome. The only thing it doesn't feature is Erlang-style hot code swapping - that's the real "killer feature" of dynamically-typed distributed systems (the "last bit" that made Erlang difficult to statically type).

What are "not so well defined problems" that LISP is supposed to solve?

Most people agree that LISP helps to solve problems that are not well defined, or that are not fully understood at the beginning of the project.
"Not fully understood"" might indicate that we don't know what problem we are trying to solve, so the developer refines the problem domain continuously. But isn't this process language independent?
All this refinement does not take away the need for, say, developing algorithms/solutions for the final problem that does need to be solved. And that is the actual work.
So, I'm not sure what advantage LISP provides if the developer has no idea where he's going i.e. solving a problem that is not finalised yet.
Lisp (not "LISP") has a number of advantages when you're facing problems that are not well-defined. First of all, you have a REPL where you can quickly experiment with -- that helps in sketching out quick functions and trying to play with them, leading to a very rapid development cycle. Second, having a dynamically typed language is working well in this context too: with a statically typed language you need to "design more" before you begin, and changing the design leads to changing more code -- in contrast, with Lisps you just write the code and the data it operates on can change as needed. In addition to these, there's the usual benefits of a functional language -- one with first class lambda functions, etc (eg, garbage collection).
In general, these advantage have been finding their way into other languages. For example, Javascript has everything that I listed so far. But there is one more advantage for Lisps that is still not present in other languages -- macros. This is an important tool to use when your problem calls for a domain specific language. Basically, in Lisp you can extend the language with constructs that are specific to your problem -- even if these constructs lead to a completely different language.
Finally, you need to plan ahead for what happens when the code becomes more than a quick experiment. In this case you want your language to cope with "growing scripts into applications" -- for example, having a module system means that you can get a more "serious"
application. For example, in Racket you can get your solution separated into such modules, where each can be written in its own language -- it even has a statically typed language which makes it possible to start with a dynamically typed development cycle and once the code becomes more stable and/or big enough that maintenance becomes difficult, you can switch some modules into the static language and get the usual benefits from that. Racket is actually unique among Lisps and Schemes in this kind of support, but even with others the situation is still far more advanced than in non-Lisp languages.
In AI (Artificial Intelligence) historically Lisp was seen as the AI assembly language. It was used to build higher-level languages which help to work with the problem domain in a more direct way. Many of these domains need a lot of 'knowledge' for finding usable answers.
A typical example is an expert system for, say, oil exploration. The expert system gets as inputs (geological) observations and gives information about the chances to find oil, what kind of oil, in what depths, etc. To do that it needs 'expert knowledge' how to interpret the data. When you start such a project to develop such an expert system it is typically not clear what kind of inferences are needed, what kind of 'knowledge' experts can provide and how this 'knowledge' can be written down for a computer.
In this case one typically develops new languages on top of Lisp and you are not working with a fixed predefined language.
As an example see this old paper about Dipmeter Advisor, a Lisp-based expert system developed by Schlumberger in the 1980s.
So, Lisp does not solve any problems. But it was originally used to solve problems that are complex to program, by providing new language layers which should make it easier to express the domain 'knowledge', rules, constraints, etc. to find solutions which are not straight forward to compute.
The "big" win with a language that allows for incremental development is that you (typically) has a read-eval-print loop (or "listener" or "console") that you interact with, plus you tend to not need to lose state when you compile and load new code.
The ability to keep state around from test run to test run means that lengthy computations that are untouched by your changes can simply be kept around instead of being re-computed.
This allows you to experiment and iterate faster. Being able to iterate faster means that exploration is less of a hassle. Very useful for exploratory programming, something that is typical with dealing with less well-defined problems.

Language requirements for AI development [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why is Lisp used for AI?
What makes a language suitable for Artificial Intelligence development?
I've heard that LISP and Prolog are widely used in this field. What features make them suitable for AI?
Overall I would say the main thing I see about languages "preferred" for AI is that they have high order programming along with many tools for abstraction.
It is high order programming (aka functions as first class objects) that tends to be a defining characteristic of most AI languages http://en.wikipedia.org/wiki/Higher-order_programming that I can see. That article is a stub and it leaves out Prolog http://en.wikipedia.org/wiki/Prolog which allows high order "predicates".
But basically high order programming is the idea that you can pass a function around like a variable. Surprisingly a lot of the scripting languages have functions as first class objects as well. LISP/Prolog are a given as AI languages. But some of the others might be surprising. I have seen several AI books for Python. One of them is http://www.nltk.org/book. Also I have seen some for Ruby and Perl. If you study more about LISP you will recognize a lot of its features are similar to modern scripting languages. However LISP came out in 1958...so it really was ahead of its time.
There are AI libraries for Java. And in Java you can sort of hack functions as first class objects using methods on classes, it is harder/less convenient than LISP but possible. In C and C++ you have function pointers, although again they are much more of a bother than LISP.
Once you have functions as first class objects, you can program much more generically than is otherwise possible. Without functions as first class objects, you might have to construct sum(array), product(array) to perform the different operations. But with functions as first class objects you could compute accumulate(array, +) and accumulate(array, *). You could even do accumulate(array, getDataElement, operation). Since AI is so ill defined that type of flexibility is a great help. Now you can build much more generic code that is much easier to extend in ways that were not originally even conceived.
And Lambda (now finding its way all over the place) becomes a way to save typing so that you don't have to define every function. In the previous example, instead of having to make getDataElement(arrayelement) { return arrayelement.GPA } somewhere you can just say accumulate(array, lambda element: return element.GPA, +). So you don't have to pollute your namespace with tons of functions to only be called once or twice.
If you go back in time to 1958, basically your choices were LISP, Fortran, or Assembly. Compared to Fortran LISP was much more flexible (unfortunately also less efficient) and offered much better means of abstraction. In addition to functions as first class objects, it also had dynamic typing, garbage collection, etc. (stuff any scripting language has today). Now there are more choices to use as a language, although LISP benefited from being first and becoming the language that everyone happened to use for AI. Now look at Ruby/Python/Perl/JavaScript/Java/C#/and even the latest proposed standard for C you start to see features from LISP sneaking in (map/reduce, lambdas, garbage collection, etc.). LISP was way ahead of its time in the 1950's.
Even now LISP still maintains a few aces in the hole over most of the competition. The macro systems in LISP are really advanced. In C you can go and extend the language with library calls or simple macros (basically a text substitution). In LISP you can define new language elements (think your own if statement, now think your own custom language for defining GUIs). Overall LISP languages still offer ways of abstraction that the mainstream languages still haven't caught up with. Sure you can define your own custom compiler for C and add all the language constructs you want, but no one does that really. In LISP the programmer can do that easily via Macros. Also LISP is compiled and per the programming language shootout, it is more efficient than Perl, Python, and Ruby in general.
Prolog basically is a logic language made for representing facts and rules. What are expert systems but collections of rules and facts. Since it is very convenient to represent a bunch of rules in Prolog, there is an obvious synergy there with expert systems.
Now I think using LISP/Prolog for every AI problem is not a given. In fact just look at the multitude of Machine Learning/Data Mining libraries available for Java. However when you are prototyping a new system or are experimenting because you don't know what you are doing, it is way easier to do it with a scripting language than a statically typed one. LISP was the earliest languages to have all these features we take for granted. Basically there was no competition at all at first.
Also in general academia seems to like functional languages a lot. So it doesn't hurt that LISP is functional. Although now you have ML, Haskell, OCaml, etc. on that front as well (some of these languages support multiple paradigms...).
The main calling card of both Lisp and Prolog in this particular field is that they support metaprogramming concepts like lambdas. The reason that is important is that it helps when you want to roll your own programming language within a programming language, like you will commonly want to do for writing expert system rules.
To do this well in a lower-level imperative language like C, it is generally best to just create a separate compiler or language library for your new (expert system rule) language, so you can write your rules in the new language and your actions in C. This is the principle behind things like CLIPS.
The two main things you want are the ability to do experimental programming and the ability to do unconventional programming.
When you're doing AI, you by definition don't really know what you're doing. (If you did, it wouldn't be AI, would it?) This means you want a language where you can quickly try things and change them. I haven't found any language I like better than Common Lisp for that, personally.
Similarly, you're doing something not quite conventional. Prolog is already an unconventional language, and Lisp has macros that can transform the language tremendously.
What do you mean by "AI"? The field is so broad as to make this question unanswerable. What applications are you looking at?
LISP was used because it was better than FORTRAN. Prolog was used, too, but no one remembers that. This was when people believed that symbol-based approaches were the way to go, before it was understood how hard the sensing and expression layers are.
But modern "AI" (machine vision, planners, hell, Google's uncanny ability to know what you 'meant') is done in more efficient programming languages that are more sustainable for a large team to develop in. This usually means C++ these days--but it's not like anyone thinks of C++ as a good language for AI.
Hell, you can do a lot of what was called "AI" in the 70s in MATLAB. No one's ever called MATLAB "a good language for AI" before, have they?
Functional programming languages are easier to parallelise due to their stateless nature. There seems to already be a subject about it with some good answers here: Advantages of stateless programming?
As said, its also generally simpler to build programs that generate programs in LISP due to the simplicity of the language, but this is only relevant to certain areas of AI such as evolutionary computation.
Edit:
Ok, I'll try and explain a bit about why parallelism is important to AI using Symbolic AI as an example, as its probably the area of AI that I understand best. Basically its what everyone was using back in the day when LISP was invented, and the Physical Symbol Hypothesis on which it is based is more or less the same way you would go about calculating and modelling stuff in LISP code. This link explains a bit about it:
http://www.cs.st-andrews.ac.uk/~mkw/IC_Group/What_is_Symbolic_AI_full.html
So basically the idea is that you create a model of your environment, then searching through it to find a solution. One of the simplest to algorithms to implement is a breadth first search, which is an exhaustive search of all possible states. While producing an optimal result, it is usually prohibitively time consuming. One way to optimise this is by using a heuristic (A* being an example), another is to divide the work between CPUs.
Due to statelessness, in theory, any node you expand in your search could be ran in a separate thread without the complexity or overhead involved in locking shared data. In general, assuming the hardware can support it, then the more highly you can parallelise a task the faster you will get your result. An example of this could be the folding#home project, which distributes work over many GPUs to find optimal protein folding configurations (that may not have anything to do with LISP, but is relevant to parallelism).
As far as I know from LISP is that is a Functional Programming Language, and with it you are able to make "programs that make programs. I don't know if my answer suits your needs, see above links for more information.
Pattern matching constructs with instantiation (or the ability to easily construct pattern matching code) are a big plus. Pattern matching is not totally necessary to do A.I., but it can sure simplify the code for many A.I. tasks. I'm finding this also makes F# a convenient language for A.I.
Languages per se (without libraries) are suitable/comfortable for specific areas of research/investigation and/or learning/studying ("how to do the simplest things in the hardest way").
Suitability for commercial development is determined by availability of frameworks, libraries, development tools, communities of developers, adoption by companies. For ex., in internet you shall find support for any, even the most exotic issue/areas (including, of course, AI areas), for ex., in C# because it is mainstream.
BTW, what specifically is context of question? AI is so broad term.
Update:
Oooops, I really did not expect to draw attention and discussion to my answer.
Under ("how to do the simplest things in the hardest way"), I mean that studying and learning, as well as academic R&D objectives/techniques/approaches/methodology do not coincide with objectives of (commercial) development.
In student (or even academic) projects one can write tons of code which would probably require one line of code in commercial RAD (using of component/service/feature of framework or library).
Because..! oooh!
Because, there is no sense to entangle/develop any discussion without first agreeing on common definitions of terms... which are subjective and depend on context... and are not so easy to be formulate in general/abstract context.
And this is inter-disciplinary matter of whole areas of different sciences
The question is broad (philosophical) and evasively formulated... without beginning and end... having no definitive answers without of context and definitions...
Are we going to develop here some spec proposal?

About Dijkstra's paper

I am reading Coders at Work.
I came across this paragraph in Donald Knuth's interview.
Seibel: It seems a lot of the people I’ve talked to had direct access to a machine when they were starting out. Yet Dijkstra has a paper I’m sure you’re familiar with, where he basically says we shouldn’t let computer-science students touch a machine for the first few years of their training; they should spend all their time manipulating symbols.
I want link to that paper. Which one paper is that? (He wrote too many :-)
Maybe this one?
Excerpt, from near the end:
Before we part, I would like to invite you to consider the following way of doing justice to computing's radical novelty in an introductory programming course.
On the one hand, we teach what looks like the predicate calculus, but we do it very differently from the philosophers. In order to train the novice programmer in the manipulation of uninterpreted formulae, we teach it more as boolean algebra, familiarizing the student with all algebraic properties of the logical connectives. To further sever the links to intuition, we rename the values {true, false} of the boolean domain as {black, white}.
On the other hand, we teach a simple, clean, imperative programming language, with a skip and a multiple assignment as basic statements, with a block structure for local variables, the semicolon as operator for statement composition, a nice alternative construct, a nice repetition and, if so desired, a procedure call. To this we add a minimum of data types, say booleans, integers, characters and strings. The essential thing is that, for whatever we introduce, the corresponding semantics is defined by the proof rules that go with it.
Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. While designing proofs and programs hand in hand, the student gets ample opportunity to perfect his manipulative agility with the predicate calculus. Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs. And this concludes the sketch of my proposal for an introductory programming course for freshmen.
I found a manuscript of Dijkstra's "Cruelty" lecture.