What is the metasyntax used in the Scala spec? - scala

I'm reading the Scala spec, link to 2.13. What is the exact metasyntax (I think that's the term) that's used to express the Spec? I believe it's an extension of the Backus-Naur Form but I can't quite figure out every component.
Note: I'm not an expert on programming theory, so I hope you apologize if I use some terms incorrectly, my goal is simply to understand how the spec is defined.

I believe it's an extension of the Backus-Naur Form
I believe that "an extension of the Backus-Naur Form" is the closest you can get.
Various extensions, augmentations, variations, and modifications of BNF are in so widespread use in Programming Language Specifications that most authors do not bother specifying them, since everybody is just so used to reading them all the time that they understand them intuitively.
Chapter 13 – Syntax Summary, section 13.1 – Lexical Syntax says the following [bold emphasis mine]:
The lexical syntax of Scala is given by the following grammar in EBNF form
But that is still ambiguous. Originally, EBNF was designed by Niklaus Wirth, and it would make sense that Martin Odersky uses it, since he studied under Wirth.
However, there are many slight variations of Wirth's EBNF in widespread use, plus, other "extended versions" of BNF are also referred to as "EBNF", so that does not really tell you anything either.

Related

What is the name of the programming style enabled by dependent types (think Coq or Agda)?

There is a programming "style" (or maybe paradigm, i'm not sure what to call it) which is as follows:
First, you write a specification: a formal description of what your (whole, or part of) program is to do. This is done within the programming system; it is not a separate artifact.
Then, you write the program, but - and this is the key distinction between this programming style and others - every step of this writing task is guided in some way by the specification you've written in the previous step. How exactly this guidance happens varies wildly; in Coq you have a metaprogramming language (Ltac) which lets you "refine" the specification while building the actual program behind the scenes, whereas in Agda you compose a program by filling "holes" (i'm not actually sure how it goes in Agda, as i'm mostly used to Coq).
This isn't exactly everyone's favorite style of programming, but i'd like to try practicing it in general-purpose, popular programming languages. At least in Coq i've found it to be fairly addictive!
...but how would i even search for ways to do it outside proof assistants? Which leads us to the question: I'm looking for a name for this programming style, so that i can try looking up tools that let me program like that in other programming languages.
Mind you, of course a more proper question would be directly asking for examples of such tools, but AFAIK questions asking for lists of answers aren't appropriate for Stack Exchange sites.
And to be clear, i'm not all that hopeful i'm really going to find much; these are mostly academic pastimes, and your typical programming language isn't really amenable to this style of programming (for example, the specification language might end up being impossibly complex). But it's worth a shot!
It is called proof-driven development (or type-driven development). However, there is very little information about it.
This process you mention about slowly creating your program by means of ltac (in the case of coq) or holes (in the case of Agda and Idris) is called refinement. So you will also find reference in the literature for this style as proof by refinement or programming by refinement.
Now the most important thing to realize is that this style of programming is intrinsic to more complex type system that will allow you to extract as much information as possible the current environment. So it is natural to find attached with dependent types, although it is not necessarily the case.
As mentioned in another response you're also going to find references to it as Type-Driven Development, there is an idris book about it.
You may be interested in looking into some other projects such as Lean, Isabelle, Idris, Agda, Cedille, and maybe Liquid Haskell, TLA+ and SAW.
As pointed out by the two previous answers, a possible name for the program style you mention certainly is: type-driven development.
From the Coq viewpoint, you might be interested in the following two references:
Certified Programming with Dependent Types (CPDT, by Adam Chlipala): a Coq textbook that teaches advanced techniques to develop dependently-typed Coq theories and automate related proofs.
Experience Report: Type-Driven Development of Certified Tree Algorithms in Coq (by Reynald Affeldt, Jacques Garrigue, Xuanrui Qi, Kazunari Tanaka), published at the Coq Workshop 2019 (slides, extended abstract):
The authors also use the acronym TDD, which interestingly enough, also has another acceptation in the software engineering community: test-driven development (this widely used methodology naturally leads to high-quality test suites).
Actually, both acceptations of TDD share a common idea: one systematically starts by writing the specification (of the considered unit), then only after that, writing some code that fulfills the spec (make the unit tests pass), then we loop and incrementally specify+implement(+refactor) other code units.
Last but not least, there are some extra pointers in this discussion from the Discourse OCaml forum.

How to avoid writing confusing DSLs in Scala

I've read comments stating that Scala's flexibility makes it easy for developers to write DSLs that are difficult to understand and reason about.
DSLs are possible because
we can sometimes omit . and parentheses (e.g. List(1) map println)
we can sometimes interchange () and {}
we have implicit values, parameters, and classes (also conversions, which are now discouraged)
there is a relatively small number of reserved symbols in the language (e.g. I can define + for my class)
and possibly other language features.
How can I avoid writing confusing DSLs ... what are the common antipatterns? Where is a DSL not appropriate?
Whenever you create DSL of your own you're embedding new language into Scala, which is not standard, so it doesn't follow standard code guides, conventions, etc.
I would say it's nothing wrong with adding new DSL as long you add proper documentation, explain the purpose of creating it and add examples of usage. If you feel adding new DSL would increase readability of your code, just go for it, but remember that whenever anyone encounters your DSL and it won't be documented enough, they will be very confused.
A good example of well-documented and serving good purpose DSL would be matchers of scalatest or Scala duration.

How will Dotty change pure functional programming in Scala?

In this question from 2013, Mr. Odersky notes that "it's too early to tell" whether libraries like Scalaz will be able to exist (at least in their current state) under Dotty, due to the castration of higher-kinded and existential types.
In the time passed, has Dotty's implications for Scalaz & Cats been elucidated? Will proposed features like built-in Effects and Records change the scope of these projects?
I understand that Dotty is still a ways off from replacing scalac, but as I am considering investing time applying purely functional constructs and methodologies to my work, I believe it important to consider the future of its flagship libraries.
One example of the latest on Dotty is "Scaling Scala" By Chris McKinlay (December 15, 2016) (the same article also mention the Scalaz and Cats situation)
Martin Odersky has been leading work on Dotty, a novel research compiler based on the Dependent Object Types (DOT) calculus (basically a simplified version of Scala) and ideas from the functional programming (FP) database community.
The team working on Dotty development has shown some remarkable improvements over the state of the art, most notably with respect to compilation times. I asked Odersky what he thought was novel about the Dotty architecture and would help end users. Here’s what he said:
Two things come to mind:
first, it's closely related to formal foundations, giving us better guidance on how to design a sound typesystem. This will lead to fewer surprises for users down the road.
Second, it has an essentially functional architecture. This makes it easier to extend, easier to get correct, and will lead to more robust APIs where the compiler is used as a service for IDEs and meta programming.
Although Dotty opens up a number of interesting language possibilities (notably full-spectrum dependent types, a la Agda and Idris), Odersky has chosen to prioritize making it immediately useful to the community. Language differences are fairly small, and most of them are in order to either simplify the language (like removing procedure syntax) or fix bugs (unsound pattern matching) or both (early initializers).
Still, I couldn’t resist asking him if there is any chance of full-spectrum dependent types ending up in Scala at some point. Here is what he said:
Never say never :-). In fact, we are currently working with Viktor Kuncak on integrating the Leon program prover with Scala, which demands richer dependent types than we have now. But it's currently strictly research, with a completely open outcome.
The Scala and Dotty teams are working closely toward convergence for Scala 2.x and Dotty, and they’ve indicated that they take continuity very seriously. Scala 2.12 and 2.13 have language flags that unlock features being incubated in Dotty (e.g., existential types), and the Dotty compiler has a Scala 2 compatibility mode. There’s even a migration tool.

Definition of a certified program

I see a couple of different research groups, and at least one book, that talk about using Coq for designing certified programs. Is there are consensus on what the definition of certified program is? From what I can tell, all it really means is that the program was proved total and type correct. Now, the program's type may be something really exotic such as a list with a proof that it's nonempty, sorted, with all elements >= 5, etc. However, ultimately, is a certified program just one that Coq shows is total and type safe, where all the interesting questions boil down to what was included in the final type?
Edit 1
Based on wjedynak's answer, I had a look at Xavier Leroy's paper "Formal Verification of a Realistic Compiler", which is linked in the answers below. I think this contains some good information, but I think the more informative information in this sequence of research can be found in the paper Mechanized Semantics for the Clight Subset of the C Language by Sandrine Blazy and Xavier Leroy. This is the language that the "Formal Verification" paper adds optimizations to. In it, Blazy and Leroy present the syntax and semantics of the Clight language and then discuss the validation of these semantics in section 5. In section 5, there's a list of different strategies used for validating the compiler, which in some sense provides an overview of different strategies for creating a certified program. These are:
Manual reviews
Proving properties of the semantics
Verified translations
Testing executable semantics
Equivalence with alternate semantics
In any case, there are probably points that could be added and I'd certainly like to hear about more.
Going back to my original question of what the definition is of a certified program, it's still a little unclear to me. Wjedynak sort of provides an answer, but really the work by Leroy involved creating a compiler in Coq and then, in some sense, certifying it. In theory, it makes it possible to now prove things about the C programs since we can now go C->Coq->proof. In that sense, it seems like there's this work flow where we could
Write a program in X language
Form of a model of the program from step 1 in Coq or some other proof assistant tool. This could involve creating a model of the programming language in Coq or it could involve creating a model of the program directly (i.e. rewriting the program itself in Coq).
Prove some property about the model. Maybe it's a proof about the values. Maybe it's the proof of the equivalence of statements (stuff like 3=1+2 or f(x,y)=f(y,x), whatever.)
Then, based on these proofs, call the original program certified.
Alternatively, we could create a specification of a program in a proof assistant tool and then prove properties about the specification, but not the program itself.
In any case, I'm still interested in hearing alternative definitions if anyone has them.
I agree that the notion seems vague, but in my understanding a certified program is a program equipped/together with the proof of correctness. Now, by using rich and expressive type signatures you can make it so there is no need for a separate proof, but this is often only a matter of convenience. The real issue is what do we mean by correctness: this a matter of specification. You can take a look at e.g. Xavier Leroy. Formal verification of a realistic compiler.
First note that the phrase "certified" has a slightly French bias: elsewhere the expression "verified" or "proven" is often used.
In any case it is important to ask what that actually means. X. Leroy and CompCert is a very good starting point: it is a big project about C compiler verification, and Leroy is always keen to explain to his audience why verification matters. Especially when talking to people from "certification agencies" who usually mean testing, not proving.
Another big verification project is L4.verified which uses Isabelle/HOL. This part of the exposition explains a bit what is actually stated and proven, and what are the consequences. Unfortunately, the actual proof is top secret, so it cannot be checked publicly.
A certified program is a program that is paired with a proof that the program satisfies its specification, i.e., a certificate. The key is that there exists a proof object that can be checked independently of the tool that produced the proof.
A verified program has undergone verification, which in this context may typically mean that its specification has been formalized and proven correct in a system like Coq, but the proof is not necessarily certified by an external tool.
This distinction is well attested in the scientific literature and is not specific to Francophones. Xavier Leroy describes it very clearly in Section 2.2 of A formally verified compiler back-end.
My understanding is that "certified" in this sense is, as Makarius pointed out, an English word chosen by Francophones where native speakers might instead have used "formally verified". Coq was developed in France, and has many Francophone developers and users.
As to what "formal verification" means, Wikipedia notes (license: CC BY-SA 3.0) that it:
is the act of proving ... the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
(I realise you would like a much more precise definition than this. I hope to update this answer in future, if I find one.)
Wikipedia especially notes the difference between verification and validation:
Validation: "Are we trying to make the right thing?", i.e., is the product specified to the user's actual needs?
Verification: "Have we made what we were trying to make?", i.e., does the product conform to the specifications?
The landmark paper seL4: Formal Verification of an OS Kernel (Klein, et al., 2009) corroborates this interpretation:
A cynic might say that an implementation proof only shows that the
implementation has precisely the same bugs that the specification
contains. This is true: the proof does not guarantee that the
specification describes the behaviour the user expects. The
difference [in a verified approach compared to a non-verified one]
is the degree of abstraction and the absence of whole classes of bugs.
Which classes of bugs are those? The Agda tutorial gives some idea:
no runtime errors (inevitable errors like I/O errors are handled; others are excluded by design).
no non-productive infinite loops.
It may means free of runtime error (numeric overflow, invalid references …), which is already good compared to most developed software, while still weak. The other meaning is proved to be correct according to a domain formalization; that is, it does not only have to be formally free of runtime errors, it also has to be proved to do what it's expected to do (which must have been precisely defined).

Writing programs in dynamic languages that go beyond what the specification allows

With the growth of dynamically typed languages, as they give us more flexibility, there is the very likely probability that people will write programs that go beyond what the specification allows.
My thinking was influenced by this question, when I read the answer by bobince:
A question about JavaScript's slice and splice methods
The basic thought is that splice, in Javascript, is specified to be used in only certain situations, but, it can be used in others, and there is nothing that the language can do to stop it, as the language is designed to be extremely flexible.
Unless someone reads through the specification, and decides to adhere to it, I am fairly certain that there are many such violations occuring.
Is this a problem, or a natural extension of writing such flexible languages? Or should we expect tools like JSLint to help be the specification police?
I liked one answer in this question, that the implementation of python is the specification. I am curious if that is actually closer to the truth for these types of languages, that basically, if the language allows you to do something then it is in the specification.
Is there a Python language specification?
UPDATE:
After reading a couple of comments, I thought I would check the splice method in the spec and this is what I found, at the bottom of pg 104, http://www.mozilla.org/js/language/E262-3.pdf, so it appears that I can use splice on the array of children without violating the spec. I just don't want people to get bogged down in my example, but hopefully to consider the question.
The splice function is intentionally generic; it does not require that its this value be an Array object.
Therefore it can be transferred to other kinds of objects for use as a method. Whether the splice function
can be applied successfully to a host object is implementation-dependent.
UPDATE 2:
I am not interested in this being about javascript, but language flexibility and specs. For example, I expect that the Java spec specifies you can't put code into an interface, but using AspectJ I do that frequently. This is probably a violation, but the writers didn't predict AOP and the tool was flexible enough to be bent for this use, just as the JVM is also flexible enough for Scala and Clojure.
Whether a language is statically or dynamically typed is really a tiny part of the issue here: a statically typed one may make it marginally easier for code to enforce its specs, but marginally is the key word here. Only "design by contract" -- a language letting you explicitly state preconditions, postconditions and invariants, and enforcing them -- can help ward you against users of your libraries empirically discovering what exactly the library will let them get away with, and taking advantage of those discoveries to go beyond your design intentions (possibly constraining your future freedom in changing the design or its implementation). And "design by contract" is not supported in mainstream languages -- Eiffel is the closest to that, and few would call it "mainstream" nowadays -- presumably because its costs (mostly, inevitably, at runtime) don't appear to be justified by its advantages. "Argument x must be a prime number", "method A must have been previously called before method B can be called", "method C cannot be called any more once method D has been called", and so on -- the typical kinds of constraints you'd like to state (and have enforced implicitly, without having to spend substantial programming time and energy checking for them yourself) just don't lend themselves well to be framed in the context of what little a statically typed language's compiler can enforce.
I think that this sort of flexibility is an advantage as long as your methods are designed around well defined interfaces rather than some artificial external "type" metadata. Most of the array functions only expect an object with a length property. The fact that they can all be applied generically to lots of different kinds of objects is a boon for code reuse.
The goal of any high level language design should be to reduce the amount of code that needs to be written in order to get stuff done- without harming readability too much. The more code that has to be written, the more bugs get introduced. Restrictive type systems can be, (if not well designed), a pervasive lie at worst, a premature optimisation at best. I don't think overly restrictive type systems aid in writing correct programs. The reason being that the type is merely an assertion, not necessarily based on evidence.
By contrast, the array methods examine their input values to determine whether they have what they need to perform their function. This is duck typing, and I believe that this is more scientific and "correct", and it results in more reusable code, which is what you want. You don't want a method rejecting your inputs because they don't have their papers in order. That's communism.
I do not think your question really has much to do with dynamic vs. static typing. Really, I can see two cases: on one hand, there are things like Duff's device that martin clayton mentioned; that usage is extremely surprising the first time you see it, but it is explicitly allowed by the semantics of the language. If there is a standard, that kind of idiom may appear in later editions of the standard as a specific example. There is nothing wrong with these; in fact, they can (unless overused) be a great productivity boost.
The other case is that of programming to the implementation. Such a case would be an actual abuse, coming from either ignorance of a standard, or lack of a standard, or having a single implementation, or multiple implementations that have varying semantics. The problem is that code written in this way is at best non-portable between implementations and at worst limits the future development of the language, for fear that adding an optimization or feature would break a major application.
It seems to me that the original question is a bit of a non-sequitor. If the specification explicitly allows a particular behavior (as MUST, MAY, SHALL or SHOULD) then anything compiler/interpreter that allows/implements the behavior is, by definition, compliant with the language. This would seem to be the situation proposed by the OP in the comments section - the JavaScript specification supposedly* says that the function in question MAY be used in different situations, and thus it is explicitly allowed.
If, on the other hand, a compiler/interpreter implements or allows behavior that is expressly forbidden by a specification, then the compiler/interpreter is, by definition, operating outside the specification.
There is yet a third scenario, and an associated, well defined, term for those situations where the specification does not define a behavior: undefined. If the specification does not actually specify a behavior given a particular situation, then the behavior is undefined, and may be handled either intentionally or unintentionally by the compiler/interpreter. It is then the responsibility of the developer to realize that the behavior is not part of the specification, and, should s/he choose to leverage the behavior, the developer's application is thereby dependent upon the particular implementation. The interpreter/compiler providing that implementation is under no obligation to maintain the officially undefined behavior beyond backwards compatibility and whatever commitments the producer may make. Furthermore, a later iteration of the language specification may define the previously undefined behavior, making the compiler/interpreter either (a) non-compliant with the new iteration, or (b) come out with a new patch/version to become compliant, thereby breaking older versions.
* "supposedly" because I have not seen the spec, myself. I go by the statements made, above.