Is there a name for the SAT-solving scenario where part of the formula is static (forming a propositional "theory") and serves as a fixed context for testing the satisfiability of relatively small sentences?
Many such tests need to be performed with different sentences, so evaluating the conjunction of each small formula with the static part anew every time is suboptimal.
In contrast to incremental SAT, satisfiable sentences are not appended to the theory, but discarded after testing.
Is there a tool that could be adapted for such a case?
Not sure if this has an official name, but in SMT-LIB parlance it is known as check-sat-assuming. See Section 4.2.5 (page 64) of http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf
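At the plain SAT level the same idea is usually called solving under assumptions (the MiniSat-style solve(assumptions) interface): the static part is loaded as clauses once, and each small sentence is passed as assumption literals that are forgotten after the call, while the clause database and anything learned from it are kept. A minimal sketch using the Sat4j Java library (the class and method names are Sat4j's as I remember them; treat the exact signatures as an assumption to verify against its docs):

```java
import org.sat4j.core.VecInt;
import org.sat4j.minisat.SolverFactory;
import org.sat4j.specs.ISolver;

public class AssumptionDemo {
    public static void main(String[] args) throws Exception {
        ISolver solver = SolverFactory.newDefault();
        // The static "theory", loaded once: (x1 or x2) and (not x1 or x3).
        solver.addClause(new VecInt(new int[] { 1, 2 }));
        solver.addClause(new VecInt(new int[] { -1, 3 }));

        // Each small sentence is passed as assumption literals; they are
        // discarded after the call, unlike clauses added with addClause.
        System.out.println(solver.isSatisfiable(new VecInt(new int[] { -2, -3 }))); // false
        System.out.println(solver.isSatisfiable(new VecInt(new int[] { 1 })));      // true
    }
}
```

The first query is unsatisfiable because assuming x2 and x3 false forces x1 true via the first clause and x3 true via the second; neither assumption pollutes the second query.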
I have some test simulations that do not follow the exact same solution path when run in Dymola 2021x and Dymola 2022. As a result, my system that checks whether my test simulations are running properly reports that they are not. The simulations do appear to stay within +/-2*tolerance of each other. I am trying to understand why the solutions are not numerically identical if the inputs and models are identical. Nothing in the release notes indicated any changes to the solution methods.
From the Dymola 2022 Release Notes, one possible explanation could be in the different Modelica Language Specifications (3.4 in Dymola 2021x, 3.5 in Dymola 2022), either affecting the models directly (MSL 4.0.0 is compliant with the Language Specification 3.4, for example), or the way the models are handled during translation.
You mention you have a system in place that checks your simulation. Is it implemented in Modelica, that is, might it be affected by the different Language Specifications?
Personally, I tend to expect minor changes/improvements to the translation algorithms and to the solvers in each release, even though they might not be mentioned in the release notes. Perhaps this attitude only works because my models are rather coarse, so I cannot really say whether such a change in the computed solution (+/-2*tolerance, with tolerance=0.1% in my case) brings me closer to or farther away from the real solution.
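A check like the one described is usually a pointwise comparison of the two result trajectories against a mixed absolute/relative band. A minimal sketch (the 2*tolerance band is taken from the discussion above; the method name and the scaling choice are assumptions for illustration):

```java
/** Returns true if every sample of two trajectories agrees within +/-2*tolerance.
 *  Assumes both arrays were sampled at the same time points. */
static boolean withinBand(double[] reference, double[] candidate, double tolerance) {
    if (reference.length != candidate.length) return false;
    for (int i = 0; i < reference.length; i++) {
        // Mixed criterion: relative where the signal is large, absolute near zero.
        double scale = Math.max(1.0, Math.abs(reference[i]));
        if (Math.abs(reference[i] - candidate[i]) > 2.0 * tolerance * scale) {
            return false;
        }
    }
    return true;
}
```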
I used FEST-Assert and moved to AssertJ after FEST stopped development.
Recently I was pointed to a Google repository with another assertion library, Truth (http://google.github.io/truth/).
Reading the examples, I cannot find any advantage to starting to use it over AssertJ, so it seems to be just a matter of taste which one to use. But maybe I missed the point, did I?
From one of their comments on GitHub:
The core difference is that the design of Truth includes two specific areas of extensibility. One is a strategy for proposition failure, such that a "subject" for Integers, or a subject for Strings, can be re-used in the context of completely different results for failure. A notable example is the distinction between JUnit's use of AssertionError and its AssumptionViolatedException. Truth lets you use the same proposition classes for both.
The other area of flexibility is the ability to create new assertion/proposition types and hook them in without declaring possibly conflicting static methods to import. This can be for new types (say, adding protobufs) or for new uses of existing types (say, Strings that are treated as Uris). This is the assertAbout() feature.
Other than that, Truth is very similar to AssertJ, since it was inspired by FEST, of which AssertJ is a fork of the 2.0 development line.
To sum up, Truth is designed to be a bit more extensible and flexible, but AssertJ will be great (possibly the greatest) for assertions on standard types.
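To make the assertAbout() point concrete, here is a hedged sketch of a custom subject for Strings treated as URIs (UriSubject is a hypothetical type invented for illustration; the Subject/FailureMetadata/Factory names follow the Truth 1.x API, so treat the exact signatures as an assumption):

```java
import static com.google.common.truth.Truth.assertAbout;

import com.google.common.truth.FailureMetadata;
import com.google.common.truth.Subject;

// Hypothetical subject for Strings that are treated as URIs.
final class UriSubject extends Subject {
    private final String actual;

    private UriSubject(FailureMetadata metadata, String actual) {
        super(metadata, actual);
        this.actual = actual;
    }

    // Hooked in via assertAbout() rather than yet another static assertThat import.
    static Factory<UriSubject, String> uris() {
        return UriSubject::new;
    }

    void hasScheme(String scheme) {
        int colon = actual.indexOf(':');
        check("scheme").that(colon < 0 ? "" : actual.substring(0, colon)).isEqualTo(scheme);
    }
}

class UriSubjectDemo {
    public static void main(String[] args) {
        // Reads like a built-in assertion, with no conflicting static methods declared.
        assertAbout(UriSubject.uris()).that("https://example.com").hasScheme("https");
    }
}
```

The AssertJ analogue would be a custom AbstractAssert subclass plus its own static assertThat entry point, which is exactly the import-conflict situation the quoted comment mentions.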
I see a couple of different research groups, and at least one book, that talk about using Coq for designing certified programs. Is there a consensus on what the definition of a certified program is? From what I can tell, all it really means is that the program was proved total and type-correct. Now, the program's type may be something really exotic, such as a list with a proof that it's nonempty, sorted, with all elements >= 5, etc. However, ultimately, is a certified program just one that Coq shows is total and type-safe, where all the interesting questions boil down to what was included in the final type?
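For concreteness, the kind of "exotic" type mentioned above can be written down directly in a dependently-typed proof assistant. A hedged sketch in Lean 4 (the question's setting is Coq, but the idea carries over; the structure name is made up, and the proof scripts may need tweaking between versions):

```lean
-- A "certified" list in the sense above: the list itself, plus proofs
-- that it is nonempty, sorted, and has all elements >= 5.
structure CertifiedList where
  xs       : List Nat
  nonempty : xs ≠ []
  sorted   : xs.Pairwise (· ≤ ·)
  bounded  : ∀ x ∈ xs, 5 ≤ x

-- A witness: the proof obligations are discharged by decidability
-- (decide) and by the explicit constructors of List.Pairwise.
example : CertifiedList :=
  { xs := [5, 7, 9],
    nonempty := by decide,
    sorted := .cons (by decide) (.cons (by decide) (.cons (by decide) .nil)),
    bounded := by decide }
```

Under this reading, "certified" means nothing more and nothing less than: a value of this type exists, so all the interesting content is indeed in what the type says.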
Edit 1
Based on wjedynak's answer, I had a look at Xavier Leroy's paper "Formal Verification of a Realistic Compiler", which is linked in the answers below. I think this contains some good information, but I think the more informative material in this line of research can be found in the paper Mechanized Semantics for the Clight Subset of the C Language by Sandrine Blazy and Xavier Leroy. This is the language that the "Formal Verification" paper adds optimizations to. In it, Blazy and Leroy present the syntax and semantics of the Clight language and then discuss the validation of these semantics in section 5. There they list the different strategies used for validating the compiler, which in some sense provides an overview of strategies for creating a certified program. These are:
Manual reviews
Proving properties of the semantics
Verified translations
Testing executable semantics
Equivalence with alternate semantics
In any case, there are probably points that could be added and I'd certainly like to hear about more.
Going back to my original question of what the definition of a certified program is, it's still a little unclear to me. Wjedynak sort of provides an answer, but really the work by Leroy involved creating a compiler in Coq and then, in some sense, certifying it. In theory, this makes it possible to prove things about C programs, since we can now go C -> Coq -> proof. In that sense, it seems like there's a workflow where we could:
1. Write a program in language X.
2. Form a model of the program from step 1 in Coq or some other proof-assistant tool. This could involve creating a model of the programming language in Coq, or it could involve modeling the program directly (i.e., rewriting the program itself in Coq).
3. Prove some property about the model. Maybe it's a proof about the values. Maybe it's a proof of the equivalence of statements (stuff like 3=1+2 or f(x,y)=f(y,x), whatever).
4. Then, based on these proofs, call the original program certified.
Alternatively, we could create a specification of a program in a proof assistant tool and then prove properties about the specification, but not the program itself.
In any case, I'm still interested in hearing alternative definitions if anyone has them.
I agree that the notion seems vague, but in my understanding a certified program is a program equipped with (or accompanied by) a proof of correctness. Now, by using rich and expressive type signatures you can arrange it so there is no need for a separate proof, but this is often only a matter of convenience. The real issue is what we mean by correctness: this is a matter of specification. You can take a look at, e.g., Xavier Leroy, "Formal Verification of a Realistic Compiler".
First note that the phrase "certified" has a slightly French bias: elsewhere the expression "verified" or "proven" is often used.
In any case it is important to ask what that actually means. X. Leroy and CompCert is a very good starting point: it is a big project about C compiler verification, and Leroy is always keen to explain to his audience why verification matters. Especially when talking to people from "certification agencies" who usually mean testing, not proving.
Another big verification project is L4.verified, which uses Isabelle/HOL. This part of the exposition explains a bit of what is actually stated and proven, and what the consequences are. Unfortunately, the actual proof is top secret, so it cannot be checked publicly.
A certified program is a program that is paired with a proof that the program satisfies its specification, i.e., a certificate. The key is that there exists a proof object that can be checked independently of the tool that produced the proof.
A verified program has undergone verification, which in this context may typically mean that its specification has been formalized and proven correct in a system like Coq, but the proof is not necessarily certified by an external tool.
This distinction is well attested in the scientific literature and is not specific to Francophones. Xavier Leroy describes it very clearly in Section 2.2 of A formally verified compiler back-end.
My understanding is that "certified" in this sense is, as Makarius pointed out, an English word chosen by Francophones where native speakers might instead have used "formally verified". Coq was developed in France, and has many Francophone developers and users.
As to what "formal verification" means, Wikipedia notes (license: CC BY-SA 3.0) that it:
is the act of proving ... the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
(I realise you would like a much more precise definition than this. I hope to update this answer in future, if I find one.)
Wikipedia especially notes the difference between verification and validation:
Validation: "Are we trying to make the right thing?", i.e., is the product specified to the user's actual needs?
Verification: "Have we made what we were trying to make?", i.e., does the product conform to the specifications?
The landmark paper seL4: Formal Verification of an OS Kernel (Klein, et al., 2009) corroborates this interpretation:
A cynic might say that an implementation proof only shows that the
implementation has precisely the same bugs that the specification
contains. This is true: the proof does not guarantee that the
specification describes the behaviour the user expects. The
difference [in a verified approach compared to a non-verified one]
is the degree of abstraction and the absence of whole classes of bugs.
Which classes of bugs are those? The Agda tutorial gives some idea:
no runtime errors (inevitable errors like I/O errors are handled; others are excluded by design).
no non-productive infinite loops.
It may mean free of runtime errors (numeric overflow, invalid references, …), which is already good compared to most developed software, while still weak. The other meaning is proved correct with respect to a domain formalization; that is, it does not only have to be formally free of runtime errors, it also has to be proved to do what it is expected to do (which must have been precisely defined).
In the History of Lisp, McCarthy writes:
The unexpected appearance of an interpreter tended to freeze the form of the language, and some of the decisions made rather lightheartedly for the "Recursive functions ..." paper later proved unfortunate. These included the COND notation for conditional expressions which leads to an unnecessary depth of parentheses, and the use of the number zero to denote the empty list NIL and the truth value false. Besides encouraging pornographic programming, giving a special interpretation to the address 0 has caused difficulties in all subsequent implementations.
What's he talking about?
... zero to denote the empty list ...
because 0==() has been the emoticon for pornography since 1958.
Now you know.
The fact that too many implementation details were leaking at a higher level, i.e., showing up too much.
The original Fortran III spec document, a technical paper disseminated in the winter of 1958, describes some very explicit additions to the Fortran II language, including ... inline assembly.
The PDF is here
A tantalizing description of the "additions" follows :
Some taboo code is:
Mysteriously, Fortran III was never released to the public (see section 5), but was disseminated in limited fashion before quietly fading away.
I think it is about mixing numerical and logical values, which can still be seen in popular constructs that probably originated in Fortran, like while (1). There are a lot of "clever" C algorithms that rely on the fact that 0 is false and every other value isn't.
The same applies at large to API calls, as in POSIX or the Linux kernel, some of which return 0 on failure while others return -1 (there is a rule of thumb for when to apply which, but it is just folklore, so it is often broken). Considering that at McCarthy's time those things hadn't been developed yet, you can see his "prophetic" power even here.
Perhaps it was his way of talking about null references: the billion dollar mistake (T. Hoare).
Goal: I want to write an x86_64 assembler. Note: marked as community wiki.
Background: I'm familiar with C. I've written MIPS assembly before. I've written some x86 assembly. However, I want to write an x86_64 assembler -- it should output machine code that I can jump to and start executing (like in a JIT).
Question is: what is the best way to approach this? I realize this problem looks kind of large to tackle. I want to start out with a basic minimum set:
Load into register
Arithmetic ops on registers (just integers is fine, no need to mess with the FPU yet)
Conditionals
Jumps
Just a basic set to make it Turing complete. Anyone done this? Suggestions / resources?
An assembler, like any other "compiler", is best written as a lexical analyser feeding into a language grammar processor.
Assembly language is usually easier than the regular compiled languages since you don't need to worry about constructs crossing line boundaries and the format is usually fixed.
I wrote an assembler for a (fictional) CPU some two years ago for educational purposes and it basically treated each line as:
optional label (e.g., :loop).
operation (e.g., mov).
operands (e.g., ax,$1).
The easiest way to do it is to ensure that tokens are easily distinguishable.
That's why I made the rule that labels had to begin with : - it made the analysis of the line so much easier. The process for handling a line was:
strip off comments (first ; outside a string to end of line).
extract label if present.
first word is then the operation.
rest are the operands.
You can easily insist that different operands have special markers as well, to make your life easier. All this is assuming you have control over the input format. If you're required to use Intel or AT&T format, it's a little more difficult.
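For illustration, here is a hedged sketch of that per-line split in Java (a hypothetical helper, not the answer's actual code; the :label and ; comment conventions follow the rules above):

```java
/** Hedged sketch of the four-step line split described above. */
final class Line {
    String label = "";      // without the leading ':'
    String operation = "";  // e.g. "mov"
    String[] operands = {}; // e.g. {"ax", "$1"}

    static Line parse(String raw) {
        Line line = new Line();
        // 1. Strip comments (everything from the first ';' to end of line;
        //    a real assembler must first check the ';' is not inside a string).
        int semi = raw.indexOf(';');
        String text = (semi >= 0 ? raw.substring(0, semi) : raw).trim();
        if (text.isEmpty()) return line;
        // 2. Extract the label if present.
        if (text.startsWith(":")) {
            int space = text.indexOf(' ');
            line.label = (space >= 0 ? text.substring(1, space) : text.substring(1));
            text = (space >= 0 ? text.substring(space + 1).trim() : "");
            if (text.isEmpty()) return line;
        }
        // 3. The first word is the operation; 4. the rest are the operands.
        String[] parts = text.split("\\s+", 2);
        line.operation = parts[0];
        if (parts.length > 1) {
            line.operands = parts[1].split("\\s*,\\s*");
        }
        return line;
    }
}
```

Parsing ":loop mov ax,$1" with this sketch yields label "loop", operation "mov", and operands {"ax", "$1"}.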
The way I approached it is that there was a simple per-operation function that got called (e.g., doJmp, doCall, doRet) and that function decided on what was allowed in the operands.
For example, doCall only allows a numeric or label, doRet allows nothing.
For example, here's a code segment from the encInstr function:
private static MultiRet encInstr(
        boolean ignoreVars,
        String opcode,
        String operands)
{
    if (opcode.length() == 0)   return hlprNone(ignoreVars);
    if (opcode.equals("defb"))  return hlprByte(ignoreVars, operands);
    if (opcode.equals("defbr")) return hlprByteR(ignoreVars, operands);
    if (opcode.equals("defs"))  return hlprString(ignoreVars, operands);
    if (opcode.equals("defw"))  return hlprWord(ignoreVars, operands);
    if (opcode.equals("defwr")) return hlprWordR(ignoreVars, operands);
    if (opcode.equals("equ"))   return hlprNone(ignoreVars);
    if (opcode.equals("org"))   return hlprNone(ignoreVars);
    if (opcode.equals("adc"))   return hlprTwoReg(ignoreVars, 0x0a, operands);
    if (opcode.equals("add"))   return hlprTwoReg(ignoreVars, 0x09, operands);
    if (opcode.equals("and"))   return hlprTwoReg(ignoreVars, 0x0d, operands);
The hlpr... functions simply took the operands and returned a byte array containing the instructions. They're useful when many operations have similar operand requirements, such as adc, add and and in the above case, which all require two register operands (the second parameter controlled which opcode was returned for the instruction).
By making the types of operands easily distinguishable, you can check what operands are provided, whether they are legal and which byte sequences to generate. The separation of operations into their own functions provides for a nice logical structure.
In addition, most CPUs follow a reasonably logical translation from opcode to operation (to make the chip designers' lives easier), so there will be very similar calculations for all opcodes that allow, for example, indexed addressing.
In order to properly create code for a CPU that allows variable-length instructions, you're best off doing it in two passes.
In the first pass, don't generate code, just generate the lengths of instructions. This allows you to assign values to all labels as you encounter them. The second pass will generate the code and can fill in references to those labels since their values are known. The ignoreVars in that code segment above was used for this purpose (byte sequences of code were returned so we could know the length but any references to symbols just used 0).
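A hedged sketch of that two-pass idea on a toy instruction set (the opcodes and the absolute-address encoding are illustrative only; real x86 jumps use relative displacements):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy two-pass assembler core: only "nop" (1 byte) and "jmp <label>" (3 bytes). */
final class TwoPassDemo {
    public static void main(String[] args) {
        // Each line: { label-or-empty, operation, operands... }.
        List<String[]> prog = List.of(
            new String[] { "",     "nop" },
            new String[] { "loop", "nop" },
            new String[] { "",     "jmp", "loop" });

        // Pass 1: compute only instruction lengths, so every label gets an address.
        Map<String, Integer> labels = new HashMap<>();
        int address = 0;
        for (String[] line : prog) {
            if (!line[0].isEmpty()) labels.put(line[0], address);
            address += length(line);
        }

        // Pass 2: emit the real bytes; every label reference is now resolvable.
        List<Byte> out = new ArrayList<>();
        for (String[] line : prog) {
            switch (line[1]) {
                case "nop" -> out.add((byte) 0x90);
                case "jmp" -> {
                    int target = labels.get(line[2]);
                    out.add((byte) 0xE9);                  // opcode (illustrative)
                    out.add((byte) (target & 0xFF));       // little-endian address,
                    out.add((byte) ((target >> 8) & 0xFF)); // absolute for simplicity
                }
            }
        }
        System.out.println(out); // [-112, -112, -23, 1, 0] (bytes printed signed)
    }

    static int length(String[] line) {
        return line[1].equals("jmp") ? 3 : 1;
    }
}
```

The key property is that pass 1 never needs label values, only instruction lengths, which is exactly what the ignoreVars flag above provides.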
Not to discourage you, but there are already many assemblers with various bells and whistles. Please consider contributing to an existing open source project like elftoolchain.