What did John McCarthy mean by *pornographic programming*?

In the History of Lisp, McCarthy writes:
The unexpected appearance of an interpreter tended to freeze the form of the language, and some of the decisions made rather lightheartedly for the "Recursive functions ..." paper later proved unfortunate. These included the COND notation for conditional expressions which leads to an unnecessary depth of parentheses, and the use of the number zero to denote the empty list NIL and the truth value false. Besides encouraging pornographic programming, giving a special interpretation to the address 0 has caused difficulties in all subsequent implementations.
What's he talking about?

... zero to denote the empty list ...
because 0==() has been the emoticon for pornography since 1958.
Now you know.

He meant that too many implementation details were leaking into the higher level of the language, i.e. showing up where they shouldn't.

The original Fortran III spec document, a technical paper disseminated in the winter of 1958, describes some very explicit additions to the Fortran II language, including ... inline assembly.
[The answer linked the spec PDF, with a tantalizing description of the "additions" and a sample of the taboo code.]
Mysteriously, Fortran III was never released to the public (see section 5 of the spec), but was disseminated in limited fashion before quietly fading away.

I think it is about mixing numeric and logical values, which can still be seen in popular constructs like while (1), probably originating in Fortran. There are a lot of "clever" C algorithms that rely on the fact that 0 is false and every other value isn't.
The same applies broadly to API calls, as in POSIX or the Linux kernel, some of which signal failure with 0 or NULL while others use -1 (there's a rule of thumb for when to use which, but it is just folklore, so it is often broken). Considering that at McCarthy's time those conventions hadn't been developed yet, you can see his "prophetic" power even here.
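To make the hazard concrete, here is a small Python sketch I'm adding (an illustration, not anything from McCarthy): subprocess.call follows the POSIX convention of returning 0 on success, so using its result directly as a truth value reads backwards.

import subprocess

# POSIX convention: a return status of 0 means success, so treating
# the status as a boolean inverts its natural reading.
status = subprocess.call(["ls", "/nonexistent-path"])

if status:  # truthy means nonzero, which means FAILURE here
    print("command failed with status", status)

# The explicit comparison says what it means:
if status != 0:
    print("command failed with status", status)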

Perhaps it was his way of talking about null references: the billion-dollar mistake (Tony Hoare).

Related

Why do numbers outside of variable assignments compile?

Why, in Swift, can I type numbers without assigning them to a var or let, and the code still compiles fine? What is this good for?
import Foundation
55
25
23+8
print("Hello, World!")
4
11
-45
What is this good for?
It isn't good for anything, in the sense that it doesn't cause anything to happen with regard to the flow of the program. It's just something you're doing for fun, if you see what I mean. The numeric value is not being captured, so in effect it is immediately thrown away, like a virtual particle that flashes into existence one moment and back out of existence the next.
The reason it is legal is that a Swift statement is an evaluatable expression. A numeric literal is an evaluatable expression, so it's legal — though useless — as an independent statement.
You can see the same thing in many other ways. This is legal:
let n = 23
n
n is an evaluatable expression, so it's legal as a separate statement. But it is useless.
EDIT In answer to your comment below: I see no reason why a useless statement should prevent a program from compiling. But in a case like this, I would agree that it might be helpful if the compiler would warn that you're doing something useless, and in fact, for at least one case of this sort of thing, I have filed a bug report requesting this.
Swift is a language that has side effects, meaning that some operations can result in a mutation of the global state of the program. The implication is that the compiler cannot simply eliminate a stand-alone statement without making extra sure it can do so without affecting the execution of the program. Given the complexity of state in a program, this is generally not a trivial problem, so in order not to penalize users who wish to invoke functions that have side effects, a line must be drawn: Swift has chosen to let users write any kind of stand-alone statement, even ones that are obviously free of side effects (constants or constant expressions), rather than spend a lot of effort trying to differentiate among the various possibilities.
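A rough illustration in Python (my analogy, not Swift): a bare expression statement may or may not have side effects, which is why an implementation cannot simply discard "useless-looking" statements without analyzing them.

counter = 0

def bump():
    # An expression with a side effect: it mutates global state.
    global counter
    counter += 1
    return counter

# Both of the following are bare expression statements. The first is
# pure and useless; the second mutates `counter`, so dropping
# stand-alone statements blindly would change the program.
55
bump()

print(counter)  # prints 1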
There could be a compile-time warning that flags lines of code that have no effect. I don't know enough about Swift to tell you whether there is one, but if there is, you should be able to find it in the documentation.
It has to do with the fact that a number, or an expression, is a valid statement, so it can exist on its own. Some other languages have the same or a similar feature.
As in any language, you can utter sentences of no interest ... 55 is such a Swift sentence. You may think it would be easy to eliminate useless sentences, but in general it is impossible, so why not just eliminate (at compile time) the obvious ones? Because people generally don't write them except as an exercise, and it would not be worth the effort. It would be easy to forbid 55, but what about n = n? Easy? Or n = 2*n - n? (That one is much trickier, because you would need to implement mathematical properties of expressions inside the compiler, along with optimisation tricks for them.)

What's the point of using "map()" for two elements in perl?

I've seen code where there are just two rather static elements to be mapped such as time intervals with start and end dates, yet map() is being used rather than explicit code for mapping, e.g.
{ map { ... } qw(start end) } # vs.
{ start => ..., end => ... }
Which way is preferable, and why?
The map form may be less concise but looks more functional (as in functional programming), so I guess that's why it may be preferred over explicit code, and it is perhaps more DRY.
However, it looks less legible to me because there is more logic going on behind it, and mapping should also be less efficient because it involves function calls and consists of more atomic operations.
EDIT
There is a conflicting goal in programming: KISS (keep it { pick 2 from: small, simple, stupid }). Using map slightly complicates code.
Assuming you're not just setting both items to the same constant or something similarly trivial, I would expect the map version to be more concise.
IMO, the main point in favor of the map version is that you know the same process will be used to produce both values. Not only for the sake of DRY, but also because it eliminates any concern that one might have a subtle change which the other doesn't.
As for the performance concern... If your use case is sufficiently performance-sensitive for any potential difference to matter, then you shouldn't be using Perl in the first place. Switching to well-written C (not C#, not C++, not Objective C - just plain C) will have a far greater performance impact than micro-optimizing whether you assign two values individually vs. using a loop to set them. But the odds of your use case being that sensitive are approximately zero anyhow.
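To make the comparison concrete, here is a rough Python analogue I'm adding (the question is about Perl, but the trade-off is the same); interval_bound is a hypothetical helper.

def interval_bound(which):
    # Hypothetical helper: compute one bound of a time interval.
    return "2024-01-01" if which == "start" else "2024-01-02"

# map-style: one expression produces every entry, so both values are
# guaranteed to be built by the same process (the DRY argument above).
bounds = {key: interval_bound(key) for key in ("start", "end")}

# explicit style: more direct to read, but the two lines can drift
# apart under maintenance.
bounds_explicit = {
    "start": interval_bound("start"),
    "end": interval_bound("end"),
}

assert bounds == bounds_explicit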
There is a principle of coding known as DRY. Don't Repeat Yourself.
It asserts that:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
And that can be interpreted as condensing duplicate typing with (things like) map/for.
I use idioms like the one you've quoted when I'm trying to expand some text - for example:
my @defs = map { "DEF:$_=$source_file:$_:MAX" } qw ( read write );
This generates some DEF lines for rrdtool.
I'm doing it this way because in some cases I've got considerably longer lists of 'things I want to define' and want to be consistent. (Sometimes I have, say, 10 similar lines that differ only by a single word.)
But also because:
my @defs = ( "DEF:read=$source_file:read:MAX",
             "DEF:write=$source_file:write:MAX" );
There's not much in it for two elements, and I'd suggest it's as much a matter of style as anything. However, if you've got more than that, it quickly becomes very beneficial because you can change the single line - say you've got a different file location? Want to swap MAX for AVERAGE?
It's also quite shockingly easy to go 'punctuation blind' when looking at a long sequence of similar statements where someone has typo'd and added a , where it should be a . or similar.
And ... you probably don't lose a great deal in terms of readability. But I will acknowledge that's something of a style point, because whilst map is pretty amazing, it can make for some rather hard-to-read code if you're not careful.
Also to specifically address:
mapping should also be less efficient because it involves function calls and consists of more atomic operations.
A wise man once said:
premature optimization is the root of all evil
Don't think about the efficiency of a statement - look at the legibility/readability. Compilers are pretty clever. Most "obvious" optimisations, they already make for you. Processors are also pretty fast. Your limiting factor in most code isn't the amount of CPU cycles you need, it's IO throughput and memory footprint. So don't worry about it - write clear code.
And if there's a performance-critical demand on your code, you should be using a code profiler to look at where you gain the most efficiency for your refactoring effort. You may end up with less clear code in doing so (sometimes), but that's a clearer tradeoff.

Definition of a certified program

I see a couple of different research groups, and at least one book, that talk about using Coq for designing certified programs. Is there a consensus on what the definition of a certified program is? From what I can tell, all it really means is that the program was proved total and type-correct. Now, the program's type may be something really exotic, such as a list with a proof that it's nonempty, sorted, with all elements >= 5, etc. However, ultimately, is a certified program just one that Coq shows is total and type-safe, where all the interesting questions boil down to what was included in the final type?
Edit 1
Based on wjedynak's answer, I had a look at Xavier Leroy's paper "Formal Verification of a Realistic Compiler", which is linked in the answers below. I think this contains some good information, but the more informative material in this line of research can be found in the paper "Mechanized Semantics for the Clight Subset of the C Language" by Sandrine Blazy and Xavier Leroy. This is the language that the "Formal Verification" paper adds optimizations to. In it, Blazy and Leroy present the syntax and semantics of the Clight language and then discuss the validation of these semantics in section 5. There, they list the different strategies used for validating the compiler, which in some sense provides an overview of strategies for creating a certified program. These are:
Manual reviews
Proving properties of the semantics
Verified translations
Testing executable semantics
Equivalence with alternate semantics
In any case, there are probably points that could be added and I'd certainly like to hear about more.
Going back to my original question of what the definition of a certified program is, it's still a little unclear to me. Wjedynak sort of provides an answer, but really the work by Leroy involved creating a compiler in Coq and then, in some sense, certifying it. In theory, this makes it possible to prove things about C programs, since we can now go C->Coq->proof. In that sense, it seems like there's a workflow where we could:
Write a program in X language
Form a model of the program from step 1 in Coq or some other proof-assistant tool. This could involve creating a model of the programming language in Coq, or it could involve creating a model of the program directly (i.e., rewriting the program itself in Coq).
Prove some property about the model. Maybe it's a proof about the values. Maybe it's the proof of the equivalence of statements (stuff like 3=1+2 or f(x,y)=f(y,x), whatever.)
Then, based on these proofs, call the original program certified.
Alternatively, we could create a specification of a program in a proof assistant tool and then prove properties about the specification, but not the program itself.
In any case, I'm still interested in hearing alternative definitions if anyone has them.
I agree that the notion seems vague, but in my understanding a certified program is a program equipped with a proof of correctness. Now, by using rich and expressive type signatures you can make it so there is no need for a separate proof, but this is often only a matter of convenience. The real issue is what we mean by correctness: this is a matter of specification. You can take a look at, e.g., Xavier Leroy, Formal Verification of a Realistic Compiler.
First note that the phrase "certified" has a slightly French bias: elsewhere the expression "verified" or "proven" is often used.
In any case it is important to ask what that actually means. X. Leroy and CompCert is a very good starting point: it is a big project about C compiler verification, and Leroy is always keen to explain to his audience why verification matters. Especially when talking to people from "certification agencies" who usually mean testing, not proving.
Another big verification project is L4.verified, which uses Isabelle/HOL. This part of the exposition explains a bit of what is actually stated and proven, and what the consequences are. Unfortunately, the actual proof is top secret, so it cannot be checked publicly.
A certified program is a program that is paired with a proof that the program satisfies its specification, i.e., a certificate. The key is that there exists a proof object that can be checked independently of the tool that produced the proof.
A verified program has undergone verification, which in this context may typically mean that its specification has been formalized and proven correct in a system like Coq, but the proof is not necessarily certified by an external tool.
This distinction is well attested in the scientific literature and is not specific to Francophones. Xavier Leroy describes it very clearly in Section 2.2 of A formally verified compiler back-end.
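As a loose illustration of that "independently checkable certificate" idea, here is a Python sketch I'm adding (an analogy of my own, not Leroy's method): the algorithm emits, alongside its result, evidence that a small trusted checker can verify without trusting the algorithm itself.

def sort_with_certificate(xs):
    # A certifying sorter: returns the sorted result plus a certificate,
    # here the permutation mapping output positions back to input ones.
    perm = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in perm], perm

def check_certificate(xs, ys, perm):
    # Independent checker: inspects only the certificate, never the
    # internals of the sorter that produced it.
    is_permutation = (sorted(perm) == list(range(len(xs)))
                      and all(ys[k] == xs[perm[k]] for k in range(len(xs))))
    is_sorted = all(ys[k] <= ys[k + 1] for k in range(len(ys) - 1))
    return is_permutation and is_sorted

data = [3, 1, 2]
result, certificate = sort_with_certificate(data)
assert check_certificate(data, result, certificate)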
My understanding is that "certified" in this sense is, as Makarius pointed out, an English word chosen by Francophones where native speakers might instead have used "formally verified". Coq was developed in France, and has many Francophone developers and users.
As to what "formal verification" means, Wikipedia notes (license: CC BY-SA 3.0) that it:
is the act of proving ... the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
(I realise you would like a much more precise definition than this. I hope to update this answer in future, if I find one.)
Wikipedia especially notes the difference between verification and validation:
Validation: "Are we trying to make the right thing?", i.e., is the product specified to the user's actual needs?
Verification: "Have we made what we were trying to make?", i.e., does the product conform to the specifications?
The landmark paper seL4: Formal Verification of an OS Kernel (Klein, et al., 2009) corroborates this interpretation:
A cynic might say that an implementation proof only shows that the implementation has precisely the same bugs that the specification contains. This is true: the proof does not guarantee that the specification describes the behaviour the user expects. The difference [in a verified approach compared to a non-verified one] is the degree of abstraction and the absence of whole classes of bugs.
Which classes of bugs are those? The Agda tutorial gives some idea:
no runtime errors (inevitable errors like I/O errors are handled; others are excluded by design).
no non-productive infinite loops.
It may mean free of runtime errors (numeric overflow, invalid references, …), which is already good compared to most software as developed, while still weak. The other meaning is proved correct according to a domain formalization; that is, it not only has to be formally free of runtime errors, it also has to be proved to do what it is expected to do (which must have been precisely defined).

How to write an x86_64 _assembler_?

Goal: I want to write an X86_64 assembler. Note: marked as community wiki
Background: I'm familiar with C. I've written MIPS assembly before. I've written some x86 assembly. However, I want to write an x86_64 assembler -- it should output machine code that I can jump to and start executing (like in a JIT).
Question is: what is the best way to approach this? I realize this problem looks kind of large to tackle. I want to start out with a basic minimum set:
Load into register
Arithmetic ops on registers (just integers is fine, no need to mess with the FPU yet)
Conditionals
Jumps
Just a basic set to make it Turing complete. Anyone done this? Suggestions / resources?
An assembler, like any other "compiler", is best written as a lexical analyser feeding into a language grammar processor.
Assembly language is usually easier than the regular compiled languages since you don't need to worry about constructs crossing line boundaries and the format is usually fixed.
I wrote an assembler for a (fictional) CPU some two years ago for educational purposes and it basically treated each line as:
optional label (e.g., :loop).
operation (e.g., mov).
operands (e.g., ax,$1).
The easiest way to do it is to ensure that tokens are easily distinguishable.
That's why I made the rule that labels had to begin with : - it made the analysis of the line so much easier. The process for handling a line was:
strip off comments (first ; outside a string to end of line).
extract label if present.
first word is then the operation.
rest are the operands.
You can easily insist that different operands have special markers as well, to make your life easier. All this is assuming you have control over the input format. If you're required to use Intel or AT&T format, it's a little more difficult.
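A minimal Python sketch of that per-line split (my own illustration, following the answer's :label and ; comment conventions; handling of ; inside strings is omitted for brevity):

def parse_line(line):
    # Split one assembly line into (label, operation, operands).
    code = line.split(";", 1)[0].strip()  # strip off comments
    label = None
    if code.startswith(":"):              # extract label if present
        label, _, code = code.partition(" ")
        code = code.strip()
    op, _, rest = code.partition(" ")     # first word is the operation
    operands = [o.strip() for o in rest.split(",") if o.strip()]
    return label, op or None, operands

print(parse_line(":loop mov ax, $1 ; top of loop"))
# (':loop', 'mov', ['ax', '$1'])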
The way I approached it is that there was a simple per-operation function that got called (e.g., doJmp, doCall, doRet) and that function decided on what was allowed in the operands.
For example, doCall only allows a numeric or label, doRet allows nothing.
For example, here's a code segment from the encInstr function:
private static MultiRet encInstr(
    boolean ignoreVars,
    String opcode,
    String operands)
{
    if (opcode.length() == 0)   return hlprNone(ignoreVars);

    if (opcode.equals("defb"))  return hlprByte(ignoreVars, operands);
    if (opcode.equals("defbr")) return hlprByteR(ignoreVars, operands);
    if (opcode.equals("defs"))  return hlprString(ignoreVars, operands);
    if (opcode.equals("defw"))  return hlprWord(ignoreVars, operands);
    if (opcode.equals("defwr")) return hlprWordR(ignoreVars, operands);
    if (opcode.equals("equ"))   return hlprNone(ignoreVars);
    if (opcode.equals("org"))   return hlprNone(ignoreVars);

    if (opcode.equals("adc"))   return hlprTwoReg(ignoreVars, 0x0a, operands);
    if (opcode.equals("add"))   return hlprTwoReg(ignoreVars, 0x09, operands);
    if (opcode.equals("and"))   return hlprTwoReg(ignoreVars, 0x0d, operands);
The hlpr... functions simply took the operands and returned a byte array containing the instructions. They're useful when many operations have similar operand requirements, such as adc, add and and, which all require two register operands in the above case (the second parameter controlled which opcode was returned for the instruction).
By making the types of operands easily distinguishable, you can check what operands are provided, whether they are legal and which byte sequences to generate. The separation of operations into their own functions provides for a nice logical structure.
In addition, most CPUs follow a reasonably logical translation from opcode to operation (to make the chip designers' lives easier), so there will be very similar calculations on all opcodes that allow, for example, indexed addressing.
To properly create code for a CPU that allows variable-length instructions, you're best off doing it in two passes.
In the first pass, don't generate code, just generate the lengths of instructions. This allows you to assign values to all labels as you encounter them. The second pass will generate the code and can fill in references to those labels since their values are known. The ignoreVars in that code segment above was used for this purpose (byte sequences of code were returned so we could know the length but any references to symbols just used 0).
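Here is a toy two-pass sketch in Python (my illustration of the scheme above, using a fictional fixed-size encoding where every instruction is two bytes and jmp is opcode 0xE0 followed by the target address):

def assemble(lines):
    labels, addr = {}, 0
    for line in lines:                # pass 1: only compute sizes/addresses
        if line.startswith(":"):
            labels[line[1:]] = addr   # label takes the current address
        else:
            addr += 2                 # fictional: every instruction is 2 bytes
    code = bytearray()
    for line in lines:                # pass 2: emit bytes; labels now known
        if line.startswith(":"):
            continue
        op, _, arg = line.partition(" ")
        if op == "jmp":
            code += bytes([0xE0, labels[arg.strip()]])
        elif op == "nop":
            code += bytes([0x00, 0x00])
    return bytes(code)

print(assemble(["nop", ":loop", "nop", "jmp loop"]).hex())  # 00000000e002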
Not to discourage you, but there are already many assemblers with various bells and whistles. Please consider contributing to an existing open source project like elftoolchain.

Writing programs in dynamic languages that go beyond what the specification allows

With the growth of dynamically typed languages, which give us more flexibility, it is very likely that people will write programs that go beyond what the specification allows.
My thinking was influenced by this question, when I read the answer by bobince:
A question about JavaScript's slice and splice methods
The basic thought is that splice, in JavaScript, is specified to be used only in certain situations, but it can be used in others, and there is nothing the language can do to stop it, as the language is designed to be extremely flexible.
Unless someone reads through the specification and decides to adhere to it, I am fairly certain that there are many such violations occurring.
Is this a problem, or a natural consequence of writing such flexible languages? Or should we expect tools like JSLint to act as the specification police?
I liked one answer to this question: that the implementation of Python is the specification. I am curious whether that is actually closer to the truth for these types of languages; basically, if the language allows you to do something, then it is in the specification.
Is there a Python language specification?
UPDATE:
After reading a couple of comments, I thought I would check the splice method in the spec, and this is what I found at the bottom of p. 104 of http://www.mozilla.org/js/language/E262-3.pdf; so it appears that I can use splice on the array of children without violating the spec. I just don't want people to get bogged down in my example, but rather to consider the question.
The splice function is intentionally generic; it does not require that its this value be an Array object. Therefore it can be transferred to other kinds of objects for use as a method. Whether the splice function can be applied successfully to a host object is implementation-dependent.
UPDATE 2:
I am not interested in this being about JavaScript, but about language flexibility and specs. For example, I expect that the Java spec specifies that you can't put code into an interface, but using AspectJ I do that frequently. This is probably a violation, but the writers didn't predict AOP, and the tool was flexible enough to be bent to this use, just as the JVM is also flexible enough to host Scala and Clojure.
Whether a language is statically or dynamically typed is really a tiny part of the issue here: a statically typed one may make it marginally easier for code to enforce its specs, but marginally is the key word.
Only "design by contract" -- a language letting you explicitly state preconditions, postconditions and invariants, and enforcing them -- can help ward off users of your libraries empirically discovering exactly what the library will let them get away with, and taking advantage of those discoveries to go beyond your design intentions (possibly constraining your future freedom to change the design or its implementation). And "design by contract" is not supported in mainstream languages -- Eiffel is the closest, and few would call it "mainstream" nowadays -- presumably because its costs (mostly, inevitably, at runtime) don't appear to be justified by its advantages.
"Argument x must be a prime number", "method A must have been called before method B can be called", "method C cannot be called any more once method D has been called", and so on -- the typical kinds of constraints you'd like to state (and have enforced implicitly, without spending substantial programming time and energy checking for them yourself) just don't lend themselves well to being framed in the context of what little a statically typed language's compiler can enforce.
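A small Python sketch of the kind of contract mentioned above ("method A must have been called before method B"), enforced by hand since mainstream languages lack built-in design-by-contract (my illustration; the Session class and its methods are hypothetical):

class Session:
    # A protocol contract: open() before send(), nothing after close().
    def __init__(self):
        self._state = "new"

    def open(self):
        assert self._state == "new", "precondition: open() exactly once"
        self._state = "open"

    def send(self, data):
        assert self._state == "open", "precondition: open() before send()"
        print("sending", data)

    def close(self):
        assert self._state == "open", "precondition: close() an open session"
        self._state = "closed"

s = Session()
s.open()
s.send("hello")
s.close()
# s.send("again")  # would violate the contract: session already closed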
I think that this sort of flexibility is an advantage as long as your methods are designed around well defined interfaces rather than some artificial external "type" metadata. Most of the array functions only expect an object with a length property. The fact that they can all be applied generically to lots of different kinds of objects is a boon for code reuse.
The goal of any high-level language design should be to reduce the amount of code that needs to be written in order to get stuff done, without harming readability too much. The more code that has to be written, the more bugs get introduced. Restrictive type systems can be (if not well designed) a pervasive lie at worst, a premature optimisation at best. I don't think overly restrictive type systems aid in writing correct programs. The reason is that a type is merely an assertion, not necessarily based on evidence.
By contrast, the array methods examine their input values to determine whether they have what they need to perform their function. This is duck typing, and I believe that this is more scientific and "correct", and it results in more reusable code, which is what you want. You don't want a method rejecting your inputs because they don't have their papers in order. That's communism.
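The same duck-typing style, sketched in Python rather than JavaScript (my analogy): the function checks only for the capabilities it needs, not for a particular declared type.

def first_and_last(seq):
    # Duck-typed: works on anything with len() and indexing --
    # lists, strings, tuples -- no declared type required.
    if len(seq) == 0:
        raise ValueError("need a non-empty sequence")
    return seq[0], seq[-1]

print(first_and_last([3, 1, 2]))  # (3, 2)
print(first_and_last("splice"))   # ('s', 'e')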
I do not think your question really has much to do with dynamic vs. static typing. Really, I can see two cases: on one hand, there are things like Duff's device that martin clayton mentioned; that usage is extremely surprising the first time you see it, but it is explicitly allowed by the semantics of the language. If there is a standard, that kind of idiom may appear in later editions of the standard as a specific example. There is nothing wrong with these; in fact, they can (unless overused) be a great productivity boost.
The other case is that of programming to the implementation. Such a case would be an actual abuse, coming from either ignorance of a standard, or lack of a standard, or having a single implementation, or multiple implementations that have varying semantics. The problem is that code written in this way is at best non-portable between implementations and at worst limits the future development of the language, for fear that adding an optimization or feature would break a major application.
It seems to me that the original question is a bit of a non sequitur. If the specification explicitly allows a particular behavior (as MUST, MAY, SHALL or SHOULD), then any compiler/interpreter that allows/implements the behavior is, by definition, compliant with the language. This would seem to be the situation proposed by the OP in the comments section: the JavaScript specification supposedly* says that the function in question MAY be used in different situations, and thus it is explicitly allowed.
If, on the other hand, a compiler/interpreter implements or allows behavior that is expressly forbidden by a specification, then the compiler/interpreter is, by definition, operating outside the specification.
There is yet a third scenario, and an associated, well-defined term for those situations where the specification does not define a behavior: undefined. If the specification does not actually specify a behavior in a particular situation, then the behavior is undefined and may be handled, intentionally or unintentionally, by the compiler/interpreter. It is then the responsibility of the developer to realize that the behavior is not part of the specification and that, should they choose to leverage it, their application thereby depends on the particular implementation. The interpreter/compiler providing that implementation is under no obligation to maintain the officially undefined behavior beyond backwards compatibility and whatever commitments the producer may make.
Furthermore, a later iteration of the language specification may define the previously undefined behavior, making the compiler/interpreter either (a) non-compliant with the new iteration, or (b) obliged to release a new patch/version to become compliant, thereby breaking older versions.
* "supposedly" because I have not seen the spec, myself. I go by the statements made, above.