Incremental SAT Solving: save solving instance - change model between runs - sat

From my understanding incremental SAT solving helps to evaluate different models who are quite close to each other.
I want to use this to evaluate a model and if I change it later reevaluate it again using the previous solution for a faster result. However after looking into various SAT Solvers (Sat4J, Minisat, mathsat5) it seems like they are only able to solve incrementally when all models are presented within one run.
I'm quite new to SAT solving so I might be overlooking something. Is there no way to save a solving instance for later use? Is all learning lost on closing the instance?

In incremental mode, you can feed the solver with new constraints.
Depending on the settings, the solver may or may not forget previous learned clauses and heuristics.
To fully take advantage of the incremental mode and discard previously entered constraints in the system, you need to use "assumptions", i.e. specific variables which will activate or disable the constraints in the solver.
See e.g. this discussion in minisat newsgroup: https://groups.google.com/forum/#!topic/minisat/ffXxBpqKh90

SAT4J provides a mechanism, which allows you feed the solver and then remove parts of the clauses and add new ones clauses for the next check for satisfiability. Clauses to be removed need to be added to a ConstGroup. Unfortunately, it is slighly more complicated, as unit clauses need special handling. It works roughly like this:
solver = initialize it with clauses which are not to be removed
boolean satisfiable;
ConstrGroup group = new ConstrGroup();
IVecInt unit = new VecInt();
try {
for (all clauses to be added and removed) {
if (unit clause) {
unit.push(variable from clause);
} else {
group.add(solver.addClause(clause));
}
satisfiable = solver.isSatisfiable(unit);
} catch (ContradictionException e) {
satisfiable = false;
} finally {
group.removeFrom(solver);
}
Unfortunately, the removal of clauses is implemented in a rather naive way and requires quadratic effort in the number of clauses to be removed.
While this solution works in FeatureIDE (see isSatisfiable(Node node) in https://github.com/FeatureIDE/FeatureIDE/blob/develop/plugins/de.ovgu.featureide.fm.core/src/org/prop4j/SatSolver.java), it is likely that there are way more performant solutions out there.
The other solution with assumptions does not work in our case, as we have millions of queries to a single SAT solver instance with up-to 20,000 variables. Assumptions would increate the number of variables from 20 thousand to a million, which is unlikely to help.

Related

Implementing a priority queue in matlab in order to solve optimization problems using BRANCH AND BOUND

I'm trying to code a priority queue in MATLAB, I know there is the SIMULINK toolbox for priority queue, but I'm trying to code it in MATLAB. I have a pseudo code that uses priority queue for a method called BEST First Search with Branch and Bound. The branch and bound algorithm design strategy is a state space tree and it is used to solve optimization problems. simple explanation of what is branch and bound
I have read chapter 5: Branch and Bound from a book called 'FOUNDATIONS OF ALGORITHMS', it's the 4th edition by Richard Neapolitan and Kumarss Naimipour , and the text is about designing algorithms, complexity analysis of algorithms, and computational complexity (analysis of problems), very interesting book, and I came across this pseudocode:
Void BeFS( state_space_tree T, number& best)
{
priority _queue-of_node PQ;
node(u,v);
initialize (PQ) % initialize PQ to be empty
u=root of T;
best=value(v);
insert(PQ,v) insert(PQ,v) is a procedure that adds v to the priority queue PQ
while(!empty(PQ){ % remove node with best bound
remove(PQ,v);
remove(PQ,v) is a procedure that removes the node with the best bound and it assigns its value to v
if(bound(v) is better than best) % check if node is still promising
for (each child of u of v){
if (value (u) is better than best)
(best=value(u);
if (bound(u) is better than best)
insert(PQ,u)
}
}
}
I don't know how to code it in matlab, and branch and bound is an interesting general algorithm for finding optimal solutions of various optimization problems, especially in discrete and combinatorial optimization, instead of using heuristics to find an optimal solution, since branch and bound reduces calculation time and finds the optimal solution faster.
EDIT:
I have checked everywhere whether a solution already has been implemented , before posting a question here. And I came here to get ideas of how I can get started to implement this code
I have included this in your post so people can know better what you expect of them. However, 'ideas to get started to implement' is still not much more specific than 'how to write code in matlab'.
However, I will still try to answer:
Make the structure of the code, write the basic loops and fill them with comments of what you want to do
Pick (the easiest or first) one of those comments, and see whether you can make it happen in a few lines, you can test it by generating some dummy input for that piece of code
Keep repeating step 2 untill all comments have the required code
If you get stuck in one of the blocks, and have searched but not found the answer to a specific question. Then this is not a bad place to ask.

Which is better in PHP: suppress warnings with '#' or run extra checks with isset()?

For example, if I implement some simple object caching, which method is faster?
1. return isset($cache[$cls]) ? $cache[$cls] : $cache[$cls] = new $cls;
2. return #$cache[$cls] ?: $cache[$cls] = new $cls;
I read somewhere # takes significant time to execute (and I wonder why), especially when warnings/notices are actually being issued and suppressed. isset() on the other hand means an extra hash lookup. So which is better and why?
I do want to keep E_NOTICE on globally, both on dev and production servers.
I wouldn't worry about which method is FASTER. That is a micro-optimization. I would worry more about which is more readable code and better coding practice.
I would certainly prefer your first option over the second, as your intent is much clearer. Also, best to keep away edge condition problems by always explicitly testing variables to make sure you are getting what you are expecting to get. For example, what if the class stored in $cache[$cls] is not of type $cls?
Personally, if I typically would not expect the index on $cache to be unset, then I would also put error handling in there rather than using ternary operations. If I could reasonably expect that that index would be unset on a regular basis, then I would make class $cls behave as a singleton and have your code be something like
return $cls::get_instance();
The isset() approach is better. It is code that explicitly states the index may be undefined. Suppressing the error is sloppy coding.
According to this article 10 Performance Tips to Speed Up PHP, warnings take additional execution time and also claims the # operator is "expensive."
Cleaning up warnings and errors beforehand can also keep you from
using # error suppression, which is expensive.
Additionally, the # will not suppress the errors with respect to custom error handlers:
http://www.php.net/manual/en/language.operators.errorcontrol.php
If you have set a custom error handler function with
set_error_handler() then it will still get called, but this custom
error handler can (and should) call error_reporting() which will
return 0 when the call that triggered the error was preceded by an #.
If the track_errors feature is enabled, any error message generated by
the expression will be saved in the variable $php_errormsg. This
variable will be overwritten on each error, so check early if you want
to use it.
# temporarily changes the error_reporting state, that's why it is said to take time.
If you expect a certain value, the first thing to do to validate it, is to check that it is defined. If you have notices, it's probably because you're missing something. Using isset() is, in my opinion, a good practice.
I ran timing tests for both cases, using hash keys of various lengths, also using various hit/miss ratios for the hash table, plus with and without E_NOTICE.
The results were: with error_reporting(E_ALL) the isset() variant was faster than the # by some 20-30%. Platform used: command line PHP 5.4.7 on OS X 10.8.
However, with error_reporting(E_ALL & ~E_NOTICE) the difference was within 1-2% for short hash keys, and up 10% for longer ones (16 chars).
Note that the first variant executes 2 hash table lookups, whereas the variant with # does only one lookup.
Thus, # is inferior in all scenarios and I wonder if there are any plans to optimize it.
I think you have your priorities a little mixed up here.
First of all, if you want to get a real world test of which is faster - load test them. As stated though suppressing will probably be slower.
The problem here is if you have performance issues with regular code, you should be upgrading your hardware, or optimize the grand logic of your code rather than preventing proper execution and error checking.
Suppressing errors to steal the tiniest fraction of a speed gain won't do you any favours in the long run. Especially if you think that this error may keep happening time and time again, and cause your app to run more slowly than if the error was caught and fixed.

Implementing snapshot in FRP

I'm implementing an FRP framework in Scala and I seem to have run into a problem. Motivated by some thinking, this question I decided to restrict the public interface of my framework so Behaviours could only be evaluated in the 'present' i.e.:
behaviour.at(now)
This also falls in line with Conal's assumption in the Fran paper that Behaviours are only ever evaluated/sampled at increasing times. It does restrict transformations on Behaviours but otherwise we find ourselves in huge problems with Behaviours that represent some input:
val slider = Stepper(0, sliderChangeEvent)
With this Behaviour, evaluating future values would be incorrect and evaluating past values would require an unbounded amount of memory (all occurrences used in the 'slider' event would have to be stored).
I am having trouble with the specification for the 'snapshot' operation on Behaviours given this restriction. My problem is best explained with an example (using the slider mentioned above):
val event = mouseB // an event that occurs when the mouse is pressed
val sampler = slider.snapshot(event)
val stepper = Stepper(0, sampler)
My problem here is that if the 'mouseB' Event has occurred when this code is executed then the current value of 'stepper' will be the last 'sample' of 'slider' (the value at the time the last occurrence occurred). If the time of the last occurrence is in the past then we will consequently end up evaluating 'slider' using a past time which breaks the rule set above (and your original assumption). I can see a couple of ways to solve this:
We 'record' the past (keep hold of all past occurrences in an Event) allowing evaluation of Behaviours with past times (using an unbounded amount of memory)
We modify 'snapshot' to take a time argument ("sample after this time") and enforce that that time >= now
In a more wacky move, we could restrict creation of FRP objects to the initial setup of a program somehow and only start processing events/input after this setup is complete
I could also simply not implement 'sample' or remove 'stepper'/'switcher' (but I don't really want to do either of these things). Has anyone any thoughts on this? Have I misunderstood anything here?
Oh I see what you mean now.
Your "you can only sample at 'now'" restriction isn't tight enough, I think. It needs to be a bit stronger to avoid looking into the past. Since you are using an environmental conception of now, I would define the behavior construction functions in terms of it (so long as now cannot advance by the mere execution of definitions, which, per my last answer, would get messy). For example:
Stepper(i,e) is a behavior with the value i in the interval [now,e1] (where e1 is the
time of first occurrence of e after now), and the value of the most recent occurrence of e afterward.
With this semantics, your prediction about the value of stepper that got you into this conundrum is dismantled, and the stepper will now have the value 0. I don't know whether this semantics is desirable to you, but it seems natural enough to me.
From what I can tell, you are worried about a race condition: what happens if an event occurs while the code is executing.
Purely functional code does not like to have to know that it gets executed. Functional techniques are at their finest in the pure setting, such that it does not matter in what order code is executed. A way out of this dilemma is to pretend that every change happened in one sensitive (internal, probably) piece of imperative code; pretend that any functional declarations in the FRP framework happen in 0 time so it is impossible for something to change during their declaration.
Nobody should ever sleep, or really do anything time sensitive, in a section of code that is declaring behaviors and things. Essentially, code that works with FRP objects ought to be pure, then you don't have any problems.
This does not necessarily preclude running it on multiple threads, but to support that you might need to reorganize your internal representations. Welcome to the world of FRP library implementation -- I suspect your internal representation will fluctuate many times during this process. :-)
I'm confused about your confusion. The way I see is that Stepper will "set" the behavior to a new value whenever the event occurs. So, what happens is the following:
The instant in which the event mouseB occurs, the value of the slider behavior will be read (snapshot). This value will be "set" into the behavior stepper.
So, it is true that the Stepper will "remember" values from the past; the point is that it only remembers the latest value from the past, not everything.
Semantically, it is best to model Stepper as a function like luqui proposes.

Performance of immutable set implementations in Scala

I have recently been diving into Scala and (perhaps predictably) have spent quite a bit of time studying the immutable collections API in the Scala standard library.
I am writing an application that necessarily does many +/- operations on large sets. For this reason, I want to ensure that the implementation I choose is a so-called "persistent" data structure so that I avoid doing copy-on-write. I saw this answer by Martin Odersky, but it didn't really clear up the issue for me.
I wrote the following test code to compare the performance of ListSet and HashSet for add operations:
import scala.collection.immutable._
object TestListSet extends App {
var set = new ListSet[Int]
for(i <- 0 to 100000) {
set += i
}
}
object TestHashSet extends App {
var set = new HashSet[Int]
for(i <- 0 to 100000) {
set += i
}
}
Here is a rough runtime measurement of the HashSet:
$ time scala TestHashSet
real 0m0.955s
user 0m1.192s
sys 0m0.147s
And ListSet:
$ time scala TestListSet
real 0m30.516s
user 0m30.612s
sys 0m0.168s
Cons on a singly linked list is a constant-time operation, but this performance looks linear or worse. Is this performance hit related to the need to check each element of the set for object equality to conform to the no-duplicates invariant of Set? If this is the case, I realize it's not related to "persistence".
As for official documentation, all I could find was the following page, but it seems incomplete: Scala 2.8 Collections API -- Performance Characteristics. Since ListSet seems initially to be a good choice for its memory footprint, maybe there should be some information about its performance in the API docs.
An old question but also a good example of conclusions being drawn on the wrong foundation.
Connor, basically you're trying to do a microbenchmark. That is generally not recommended and damn hard to do properly.
Why? Because the JVM is doing many other things than executing the code in your examples. It's loading classes, doing garbage collection, compiling bytecode to native code, etc. All dynamically and based on different metrics sampled at runtime.
So you cannot conclude anything about the performance of the two collections with the above test code. For example, what you could actually be measuring could be the compilation time of the += method of HashSet and garbage collection times of ListSet. So it's a comparison between apples and pears.
To do a micro benchmark properly, you should:
Warm up the JVM: Load all classes, ensure all code paths in the benchmark are run and hot spots in the code are compiled (e.g. the += method).
Run the benchmark and ensure neither the GC or the compiler runs during the test (use the JVM flags -XX:-PrintCompilation and -XX:-PrintGC). If either runs during the test, discard the result.
Repeat step 2 and sample 10-15 good measurements. Calculate variance and standard deviation.
Evaluate: If the mean of each benchmark +/- 3 std do not overlap, then you can draw a conclusion about which is faster. Otherwise, it's a blurry result (depending on the amount of overlap).
I can recommend reading Oracle's recommendations for doing micro benchmarks and a great article about benchmark pitfalls by Brian Goetz.
Also, if you want to use a good tool, which does all the above for you, try Caliper by Google.
The key line from the ListSet source is (within subclass Node):
override def +(e: A): ListSet[A] = if (contains(e)) this else new Node(e)
where you can see that an item is added only if it is not already contained. So adding to the set is O(n). You can generally assume that XMap has similar performance characteristics to XSet, and ListMap is listed as linear time all the way along. This is why, and it is how a set is supposed to behave.
P.S. In the TestHashSet case you're measuring startup time. It's way more than 30x faster.
Since a set has to have with no duplicates, before adding an element, a Set must check to see if it already contains the element. This search in a list that has no guarantee of an element's position will be O(N) linear time. The same general idea applies to its remove operation.
With a HashSet, the class defines a function that picks a location for any element in O(1), which makes the contains(element) method much quicker, at the expense of taking up more space to decrease the chance of element location collisions.

Genetic Algorithm optimization - using the -O3 flag

Working on a problem that requires a GA. I have all of that working and spent quite a bit of time trimming the fat and optimizing the code before resorting to compiler optimization. Because the GA runs as a result of user input it has to find a solution within a reasonable period of time, otherwise the UI stalls and it just won't play well at all. I got this binary GA solving a 27 variable problem in about 0.1s on an iPhone 3GS.
In order to achieve this level of performance the entire GA was coded in C, not Objective-C.
In a search for a further reduction of run time I was considering the idea of using the "-O3" optimization switch only for the solver module. I tried it and it cut run time nearly in half.
Should I be concerned about any gotcha's by setting optimization to "-O3"? Keep in mind that I am doing this at the file level and not for the entire project.
-O3 flag will make the code work in the same way as before (only faster), as long as you don't do any tricky stuff that is unsafe or otherwise dependant on what the compiler does to it.
Also, as suggested in the comments, it might be a good idea to let the computation run in a separate thread to prevent the UI from locking up. That also gives you the flexibility to make the computation more expensive, or to display a progress bar, or whatever.
Tricky stuff
Optimisation will produce unexpected results if you try to access stuff on the stack directly, or move the stack pointer somewhere else, or if you do something inherently illegal, like forget to initialise a variable (some compilers (MinGW) will set them to 0).
E.g.
int main() {
int array[10];
array[-2] = 13; // some negative value might get the return address
}
Some other tricky stuff involves the optimiser stuffing up by itself. Here's an example of when -O3 completely breaks the code.