Converting Problems to CIRCUIT-SAT

I am interested in converting Partial Weighted Max SAT to SAT. I have been recommended to go through CIRCUIT SAT.
Partial Weighted Max SAT consists of a set of hard clauses and a set of weighted soft clauses. We seek an assignment which satisfies all hard clauses and achieves at least a weight of k from the soft clauses.
How would I encode this as a boolean combinational circuit?
I can see how I can easily encode the hard clauses. But how would I encode the soft clauses, and associate weights with them, and ensure that a weight of at least k is achieved by a satisfying assignment?
Thanks

You need to encode your Partial Weighted Max SAT as a pseudo-boolean problem.
Just consider hard clauses as weighted clauses with a high weight and adapt the target value (the sum).
To encode it as a SAT formula, you can use techniques embedded in pseudo-Boolean and SMT solvers, for example:
MiniSAT+
Z3 by Microsoft Research
... (see others here: https://en.wikipedia.org/wiki/Satisfiability_modulo_theories#SMT_solvers)
To understand how, here is an article, Translating Pseudo-Boolean Constraints into SAT, from the authors of MiniSAT+, which will help you understand the translation.
And to go between SAT and Circuit-SAT, note that a CNF formula is already a combinational circuit (an AND of ORs); the Tseitin transformation handles the reverse direction, turning an arbitrary circuit back into CNF. With that, your problem will be solved :).
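To make the soft-clause part concrete, here is a minimal sketch of one standard encoding (slightly different from the high-weight trick above): keep the hard clauses as plain clauses, add a fresh indicator variable r_i with r_i -> C_i for each soft clause C_i, and translate the pseudo-Boolean constraint sum(w_i * r_i) >= k into CNF. It assumes the python-sat (PySAT) package and its pysat.pb encoder; any other PB-to-CNF encoder plays the same role.

# Partial Weighted Max-SAT -> SAT via indicator variables and a
# pseudo-Boolean "at least k" constraint (assumes python-sat with pysat.pb).
from pysat.formula import CNF
from pysat.pb import PBEnc

hard = [[1, 2], [-1, 3]]           # hard clauses, kept as-is
soft = [([2, -3], 4), ([-2], 1)]   # (clause, weight) pairs
k = 4                              # required total soft weight

cnf = CNF()
for clause in hard:
    cnf.append(clause)

top = max(abs(l) for c in hard + [cl for cl, _ in soft] for l in c)
indicators, weights = [], []
for clause, w in soft:
    top += 1
    r = top                        # fresh indicator: r -> clause, i.e. (-r OR clause)
    cnf.append([-r] + clause)
    indicators.append(r)
    weights.append(w)

# sum(w_i * r_i) >= k, translated into CNF clauses by the PB encoder.
pb = PBEnc.atleast(lits=indicators, weights=weights, bound=k, top_id=top)
cnf.extend(pb.clauses)
# Any model of 'cnf' satisfies all hard clauses plus soft clauses of total
# weight at least k; reading this CNF as an AND of ORs gives the circuit.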

Related

Why is MAX-SAT a generalisation of the SAT problem?

According to Wikipedia, the maximum satisfiability problem (MAX-SAT) is the problem of determining the maximum number of clauses, of a given Boolean formula in conjunctive normal form, that can be made true by an assignment of truth values to the variables of the formula. It is a generalisation of the Boolean satisfiability problem, which asks whether there exists a truth assignment that makes all clauses true.
I do not understand the 2nd sentence on how MAX-SAT is a generalisation of SAT. According to Wikipedia, SAT asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE.
The reason why I am asking this is because of the paper 'Semidefinite Optimization Approaches for Satisfiability and Maximum-Satisfiability Problems', where I would like to try Semidefinite optimisation techniques to solve some SAT problems I have at hand.
Imagine turning each of your clauses into an implication, by adding p -> q where p is a fresh variable for each clause q in your original problem. Then a satisfying assignment to this modified problem is a candidate solution to the MAX-SAT problem for the original: the satisfied clauses are those whose corresponding p the solver assigned true. This gives you a MAX-SAT solver, albeit a crappy one.
Now imagine you have a system that makes as many of those p's true as possible. That combination gives you a real MAX-SAT solver, i.e., one that optimizes the number of p's that are true. In other words, you can reduce MAX-SAT to SAT, provided you have something that maximizes the number of p's assigned true by the translation.
@PatrickTrentin can probably explain this much better! The vZ paper (the MaxSAT/optimization engine associated with z3) is also a very nice and simple read on this topic: https://backend.orbit.dtu.dk/ws/portalfiles/portal/110977246/Bj_rner_Phan_Fleckenstein_Unknown_Z_An_Optimizing_SMT_Solver_1.pdf
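A small sketch of that reduction, assuming the python-sat package (the linear search over the bound is my own naive driver, and is exactly the "crappy" part that real MaxSAT engines replace with something smarter):

# Clause-selector reduction: add a fresh selector p_i with p_i -> clause_i,
# then look for the largest number of selectors that can be true at once.
from pysat.formula import CNF
from pysat.card import CardEnc
from pysat.solvers import Minisat22

clauses = [[1], [-1], [1, 2], [-2]]      # original (unsatisfiable) CNF
n_vars = 2

cnf = CNF()
selectors = []
for i, cl in enumerate(clauses, start=1):
    p = n_vars + i                       # fresh selector variable p_i
    selectors.append(p)
    cnf.append([-p] + cl)                # p_i -> clause_i

best = 0
for t in range(len(clauses), 0, -1):     # naive linear search on the bound
    card = CardEnc.atleast(lits=selectors, bound=t,
                           top_id=n_vars + len(clauses))
    with Minisat22(bootstrap_with=cnf.clauses + card.clauses) as solver:
        if solver.solve():
            best = t
            break
print("maximum number of satisfiable clauses:", best)    # prints 3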

Boolean expression simplification

I am trying to simplify a Boolean expression with exactly 39 inputs and roughly 500-800 million terms (i.e., that many AND/NOT/OR operations).
A perfect simplification is not needed, but a good one would be nice.
I am aware of K-maps, Quine–McCluskey, and the Espresso algorithm. However, from what I have read, these methods would take far too long to simplify a circuit of this size.
I would need to simplify this expression as much as possible within a 24-hour period.
After searching Google, I find it difficult to find any resources on simplifying an expression of quite this magnitude! Are there any resources or libraries out there that can at least simplify this to some extent within a 24-hour period?
A greedy heuristic called Simplify is described in the somewhat dated book
Robert K. Brayton, Gary D. Hachtel, C. McMullen, Alberto Sangiovanni-Vincentelli
Logic Minimization Algorithms for VLSI Synthesis
You can find the chapter online.
Simplify is based on the unate paradigm. In divide-and-conquer style, it recursively applies Shannon's expansion theorem to split the function into smaller sub-functions. The heuristic rule is to split by the most binate variable first, i.e. the variable which separates the largest number of terms.
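A toy sketch of that unate-recursive split (nowhere near the real Simplify, and certainly not something to run on hundreds of millions of terms, but it shows the divide-and-conquer structure). A cover is a list of cubes, each cube a dict mapping a variable to 0 or 1:

from collections import Counter

def most_binate(cover):
    """A variable occurring in both polarities, in the most cubes."""
    pos, neg = Counter(), Counter()
    for cube in cover:
        for v, val in cube.items():
            (pos if val else neg)[v] += 1
    candidates = set(pos) & set(neg)
    if not candidates:
        return None                       # cover is unate: stop splitting
    return max(candidates, key=lambda v: pos[v] + neg[v])

def cofactor(cover, var, val):
    """Cubes consistent with var = val, with var removed."""
    return [{v: b for v, b in cube.items() if v != var}
            for cube in cover if cube.get(var, val) == val]

def simplify(cover):
    var = most_binate(cover)
    if var is None:
        # Unate leaf: single-cube containment is a cheap, sound reduction.
        return [c for c in cover
                if not any(o != c and o.items() <= c.items() for o in cover)]
    f1 = simplify(cofactor(cover, var, 1))
    f0 = simplify(cofactor(cover, var, 0))
    common = [c for c in f1 if c in f0]   # cubes that do not depend on var
    return (common +
            [{**c, var: 1} for c in f1 if c not in common] +
            [{**c, var: 0} for c in f0 if c not in common])

# f = a·b + a·¬b + ¬a·c  simplifies toward  a + ¬a·c
print(simplify([{'a': 1, 'b': 1}, {'a': 1, 'b': 0}, {'a': 0, 'c': 1}]))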
A second approach could be to use graph partitioning tools like METIS to split the terms into independent (or at least loosely related) subsets. But I am not aware that this has been tried successfully for logic synthesis tasks. My favorite search engine is sceptical and does not return any hits.
A more recent algorithm based on Binary Decision Diagrams was published in
Olivier Coudert: Doing Two-Level Logic Minimization 100 Times Faster
The paper lists examples with a very high number of terms, similar to your task at hand.
A somewhat related simplification technique is BDD sweeping, as described in A Study of Sweeping Algorithms in the Context of Model Checking.
This is a duplicate question. See https://stackoverflow.com/a/60535990/1531728 for resources about logic optimization, or the simplification of Boolean expressions.

improve hashing using genetic programming/algorithm

I'm writing a program intended to significantly lessen the number of collisions that occur when using hash functions like 'key mod table_size'. For this I would like to use genetic programming or a genetic algorithm, but I don't know much about it. Even after reading many articles and examples, I don't know what, in my case (i.e., for this program), the fitness function and target would be (the target is usually the required result), what would serve as the population/individuals and parents, etc.
Please help me identify the above, with a few code/pseudo-code snippets if possible, as this is my project.
It's not necessary to use genetic programming/algorithms; it can be anything using evolutionary programming/algorithms.
Thanks.
My advice would be: don't do this that way. The literature on hash functions is vast and we more or less understand what makes a good hash function. We know enough mathematics not to look for them blindly.
If you need a hash function to use, there is plenty to choose from.
However, if this is your uni project and you cannot possibly change the subject or steer it in a more manageable direction, then, as you noticed, there will be complex issues in getting the fitness function and mutation operators right. As far as I can tell off the top of my head, there are no obvious candidates.
You may look up e.g. 'strict avalanche criterion' and try to see if you can reason about it in terms of fitness and mutations.
Another question is how do you want to represent your function? Just a Boolean expression? Something built from word operations like AND, XOR, NOT, ROT?
Depending on your constraints (or rather, assumptions) the question of fitness and mutation will be different.
Broadly, the fitness is clearly to minimize the number of collisions in your 'hash modulo table-size' model.
The obvious part is to take a suitably large and (very important) representative distribution of keys and chuck them through your 'candidate' function.
Then you might pass them through 'hash modulo table-size' for one or more values of table-size and evaluate some measure of 'niceness' of the arising distribution(s).
So what that boils down to is what table-sizes to try and what niceness measure to apply.
Niceness is context dependent.
You might measure 'fullest bucket' as a measure of 'worst case' insert/find time.
You might measure the sum of squares of the bucket sizes as a measure of 'average' insert/find time, assuming look-ups are uniformly distributed amongst the keys.
Finally you would need to decide what table-size (or sizes) to test at.
Conventional wisdom often uses primes because hash modulo a prime tends to be nicely sensitive to all the bits in the hash, whereas something like hash modulo 2^n only involves the lower n bits.
To keep computation down you might consider the series of the next prime larger than each power of two: 5 (> 2^2), 11 (> 2^3), 17 (> 2^4), etc., up to and including the first power of two greater than your 'sample' size.
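Putting the sample, the niceness measures, and the prime table sizes together, a fitness evaluation might be sketched like this (the names, the weighting of the two measures, and the brute-force next_prime are my own choices):

# Score a candidate hash by bucketing a representative key sample
# modulo several prime table sizes (next prime above each power of two).
def next_prime(n):
    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True
    while not is_prime(n):
        n += 1
    return n

def table_sizes(sample_size):
    sizes, n = [], 4
    while n <= sample_size:
        sizes.append(next_prime(n + 1))
        n *= 2
    return sizes                       # e.g. 5, 11, 17, 37, ...

def fitness(candidate_hash, keys):
    score = 0.0
    for m in table_sizes(len(keys)):
        buckets = [0] * m
        for k in keys:
            buckets[candidate_hash(k) % m] += 1
        # sum of squares ~ average probe cost; max bucket ~ worst case
        score += sum(b * b for b in buckets) / len(keys) + max(buckets)
    return score                       # lower is better

keys = [i * 4096 for i in range(10_000)]                       # clustered sample
print(fitness(lambda x: x, keys))                              # identity 'hash'
print(fitness(lambda x: (x * 2654435761) & 0xFFFFFFFF, keys))  # Knuth multiplicative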
There are other ways of considering fitness but without a practical application the question is (of course) ill-defined.
If the functions in your 'space' of potential hash functions don't all have the same execution time, you should also factor in 'cost'.
It's fairly easy to define very good hash functions but execution time can be a significant factor.

finding good hash function for languages accepted by finite state automata

I'm working on a project in Java (though I think it doesn't depend on the language) where I'm generating small (4 states max) nondeterministic finite automata over a binary alphabet, and I have to quickly check each generated automaton for equivalence with the previous ones. Therefore, I need a good hash function to avoid comparing against too many automata.
My first thought was to do a DFS on the transitions, find all accepted words up to length 5, and then map the set of accepted words to a 64-bit long (the number of binary words of length at most 5). But this seems to produce too many collisions on NFAs with 4 states, and increasing the length makes computing the hash code too slow for practical use.
Another approach was to have a fixed set of words and test which of them the automaton accepts, but finding the right words, I think, isn't that trivial.
Do you have any idea how to improve the hash function to avoid too many collisions without a significant loss of speed?
Thanks in advance
I was thinking further (thanks @justhalf and @templatetypedef) and I have an idea: an injective function from any NFA (or more precisely, from the language it accepts) to the integers. Take an NFA A and construct the minimal DFA A_min with a complete transition function that accepts the same language. As a consequence of the Myhill-Nerode theorem, this automaton is unique up to isomorphism. Do a BFS from the initial state, giving priority to the edges (transitions) based on some fixed order of the characters in the alphabet (for example 0 first, then 1), and renumber the states in order of visiting. Now we have a canonical minimal DFA, and we can map its transition (incidence) matrix to an integer and append an enumeration of the final states (or better, make a tuple, to avoid collisions). This integer can then be used for deciding equivalence of two NFAs. Do you think this is OK, or do you have another idea?
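Sketched in Python for brevity (the project is in Java, but the idea carries over directly): once you have the minimal complete DFA, a BFS renumbering with a fixed alphabet order gives a canonical transition table, and the pair (table, accepting states) is injective on languages, so hashing it is safe. The minimisation step itself is assumed to be done elsewhere.

from collections import deque

def canonical_key(start, delta, accepting):
    """delta[state] = (next_on_0, next_on_1) of a minimal complete DFA."""
    order = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s]:                    # alphabet order: 0 before 1
            if t not in order:
                order[t] = len(order)
                queue.append(t)
    # Rebuild the transition table under the canonical BFS numbering.
    table = [None] * len(order)
    for s, idx in order.items():
        table[idx] = tuple(order[t] for t in delta[s])
    finals = frozenset(order[s] for s in accepting)
    return (tuple(table), finals)             # hash(...) of this is the hash code

# Two isomorphic minimal DFAs yield the same key:
d1 = {0: (1, 0), 1: (0, 1)}
d2 = {'a': ('b', 'a'), 'b': ('a', 'b')}
print(canonical_key(0, d1, {1}) == canonical_key('a', d2, {'b'}))   # True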

Efficient Function to Map (or Hash) Integers and Integer Ranges into Index

We are looking for the computationally simplest function that will enable an indexed look-up of a function, driven by a high-frequency input stream of widely distributed integers and ranges of integers.
It is OK if the hash/map function selection itself varies based on the specific integer and range requirements, and the performance associated with the part of the code that selects this algorithm is not critical. The number of integers/ranges of interest in most cases will be small (zero to a few thousand). The performance critical portion is in processing the incoming stream and selecting the appropriate function.
As a simple example, please consider the following pseudo-code:
switch (highFrequencyIntegerStream)
case(2) : func1();
case(3) : func2();
case(8) : func3();
case(33-122) : func4();
...
case(10,000) : func40();
In a typical example, there would be only a few thousand of the "cases" shown above, which could include a full range of 32-bit integer values and ranges. (In the pseudo code above 33-122 represents all integers from 33 to 122.) There will be a large number of objects containing these "switch statements."
(Note that the actual implementation will not include switch statements. It will instead be a jump table (which is an array of function pointers) or maybe a combination of the Command and Observer patterns, etc. The implementation details are tangential to the request, but provided to help with visualization.)
Many of the objects will contain "switch statements" with only a few entries. The values of interest are subject to real time change, but performance associated with managing these changes is not critical. Hash/map algorithms can be re-generated slowly with each update based on the specific integers and ranges of interest (for a given object at a given time).
We have searched around the internet, looking at Bloom filters, various hash functions listed on Wikipedia's "hash function" page and elsewhere, quite a few Stack Overflow questions, abstract algebra (mostly Galois theory which is attractive for its computationally simple operands), various ciphers, etc., but have not found a solution that appears to be targeted to this problem. (We could not even find a hash or map function that considered these types of ranges as inputs, much less a highly efficient one. Perhaps we are not looking in the right places or using the correct vernacular.)
The current plan is to create a custom algorithm that preprocesses the list of interesting integers and ranges (for a given object at a given time), looking for shifts and masks that can be applied to the input stream to help delineate the ranges. Note that most of the incoming integers will be uninteresting, and it is of critical importance to make a very quick decision for as large a percentage of that portion of the stream as possible, which is why Bloom filters looked interesting at first, before we started thinking that their implementation required more computational complexity than other solutions.
Because the first decision is so important, we are also considering having multiple tables, the first of which would be inverse masks (masks to select uninteresting numbers) for the easy-to-find large ranges of data not included in a given "switch statement", to be followed by subsequent tables that would expand the smaller ranges. We are thinking this will, for most input streams, yield something quite a bit faster than a binary search on the bounds of the ranges.
Note that the input stream can be considered to be randomly distributed.
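To make that first-decision idea concrete, here is a rough sketch (illustrative only, not the final shift/mask tables): a small bitmap indexed by the top 16 bits of the input rejects most uninteresting integers with one shift and one table read, and only the survivors fall through to the exact, slower range lookup.

def build_prefilter(ranges):
    """ranges: iterable of inclusive (lo, hi) pairs over 32-bit integers."""
    coarse = bytearray(1 << 16)               # 64 KiB, indexed by top 16 bits
    for lo, hi in ranges:
        for chunk in range(lo >> 16, (hi >> 16) + 1):
            coarse[chunk] = 1
    return coarse

def maybe_interesting(coarse, x):
    return bool(coarse[x >> 16])              # False means definitely uninteresting

coarse = build_prefilter([(2, 2), (3, 3), (8, 8), (33, 122), (10_000, 10_000)])
print(maybe_interesting(coarse, 50))          # True: falls through to exact lookup
print(maybe_interesting(coarse, 7_000_000))   # False: rejected immediately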
There is a pretty extensive theory of minimal perfect hash functions that I think will meet your requirement. The idea of a minimal perfect hash is that a set of distinct inputs is mapped to a dense set of integers in 1-1 fashion. In your case, a set of N 32-bit integers and ranges would each be mapped to a unique integer in a range whose size is a small multiple of N. GNU has a perfect hash function generator called gperf that is meant for strings but might possibly work on your data. I'd definitely give it a try. Just add a length byte so that integers are 5-byte strings and ranges are 9-byte strings. There are some formal references on the Wikipedia page. A literature search in the ACM and IEEE literature will certainly turn up more.
I just ran across this library I had not seen before.
Addition
I see now that you are trying to map all integers in the ranges to the same function value. As I said in the comment, this is not very compatible with hashing, because hash functions deliberately try to "erase" the magnitude information carried by a bit's position, so that values with similar magnitude are unlikely to map to the same hash value.
Consequently, I think that you will not do better than an optimal binary search tree, or equivalently a code generator that produces an optimal "tree" of "if else" statements.
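For example, with the ranges sorted by lower bound, a single binary search plus an upper-bound check dispatches each input in O(log N); this gives a balanced rather than a truly optimal tree (an optimal one would also weight the comparisons by access frequency):

import bisect

def build_dispatch(entries):
    """entries: list of ((lo, hi), func) with disjoint inclusive ranges."""
    entries = sorted(entries, key=lambda e: e[0][0])
    lows  = [lo for (lo, _), _ in entries]
    highs = [hi for (_, hi), _ in entries]
    funcs = [f for _, f in entries]
    def dispatch(x):
        i = bisect.bisect_right(lows, x) - 1   # rightmost range starting <= x
        if i >= 0 and x <= highs[i]:
            return funcs[i]
        return None                            # uninteresting input
    return dispatch

# Mirrors the pseudo-code "switch" from the question:
dispatch = build_dispatch([((2, 2), "func1"), ((3, 3), "func2"),
                           ((8, 8), "func3"), ((33, 122), "func4")])
print(dispatch(50), dispatch(7))               # func4 None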
If we wanted to construct a function of the type you are asking for, we could try using real numbers where individual domain values map to consecutive integers in the co-domain and ranges map to unit intervals in the co-domain. So a simple floor operation will give you the jump table indices you're looking for.
In the example you provided you'd have the following mapping:
2 -> 0.0
3 -> 1.0
8 -> 2.0
33 -> 3.0
122 -> 3.99999
...
10000 -> 42.0 (for example)
The trick is to find a monotonically increasing polynomial that interpolates these points. This is certainly possible, but with thousands of points I'm certain you'd end up with something much slower to evaluate than the optimal search would be.
Perhaps our thoughts on hashing integers can help a little bit. You will also find there a hashing library (hashlib.zip) based on Bob Jenkins' work which deals with integer numbers in a smart way.
I would propose dealing with the larger ranges after the single cases have been rejected by the hashing mechanism.