The subset-sum problem and the solvability of NP-complete problems - lisp

I was reading about the subset-sum problem when I came up with what appears to be a general-purpose algorithm for solving it:
(defun subset-contains-sum (set sum)
  (let ((subsets '()))
    (dolist (element set)
      ;; Extend every subset seen so far with this element.
      (dolist (subset-sum subsets)
        (let ((new-subset (cons element (car subset-sum)))
              (new-sum (+ element (cdr subset-sum))))
          (when (= new-sum sum)
            (return-from subset-contains-sum new-subset))
          (push (cons new-subset new-sum) subsets)))
      ;; Also check and record the singleton subset (element).
      (when (= element sum)
        (return-from subset-contains-sum (list element)))
      (push (cons (list element) element) subsets))))
"set" is a list not containing duplicates and "sum" is the sum to search subsets for. "subsets" is a list of cons cells where the "car" is a subset list and the "cdr" is the sum of that subset. New subsets are created from old ones in O(1) time by just cons'ing the element to the front.
I am not sure what its runtime complexity is, but it appears that with each element the set grows by, the size of "subsets" doubles, plus one, so it appears to me to be at least quadratic.
I am posting this because my impression before was that NP-complete problems tend to be intractable and that the best one can usually hope for is a heuristic, but this appears to be a general-purpose solution that will, assuming you have the CPU cycles, always give you the correct answer. How many other NP-complete problems can be solved like this one?

NP-complete problems are solvable, just not in polynomial time (as far as we know). That is, an NP-complete problem may have an O(n*2^n) algorithm that could solve it, but it won't have, for example, an O(n^3) algorithm to solve it.
Interestingly, if a quick (polynomial) algorithm was found for any NP-complete problem, then every problem in NP could be solved in polynomial time. This is what P=NP is about.
If I understand your algorithm correctly (and this is based more on your comments than on the code), then it is equivalent to the naive O(n*2^n) algorithm. There are 2^n subsets, and since you also need to sum each subset, the algorithm is O(n*2^n).
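To make the counting concrete, here is a minimal brute-force sketch (my own code, not the poster's): every subset of an n-element set corresponds to an n-bit mask, and summing the selected elements costs O(n) per mask.

(defun subset-sum-brute (set sum)
  ;; Walk all 2^n bitmasks; bit i of MASK selects the i-th element.
  (let* ((n (length set))
         (v (coerce set 'vector)))
    (loop for mask from 1 below (ash 1 n)
          when (= sum (loop for i below n
                            when (logbitp i mask)
                              sum (aref v i)))
            return (loop for i below n
                         when (logbitp i mask)
                           collect (aref v i)))))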
One more thing about complexity - the O(whatever) only indicates how well a particular algorithm scales. You cannot compare two algorithms and say that one is faster than the other based on this alone. Big-O notation doesn't care about implementation details and optimisations - it is possible to write two implementations of the same algorithm with one being much faster than the other, even though they might both be O(n^2). One woman making babies is an O(n) operation, but the chances are that this is going to take a lot longer than most O(n*log(n)) sorts you perform. All you can say based on this is that sorting will be slower for very large values of n.

All of the NP-complete problems have solutions, as long as you're willing to spend the time to compute them. Just because there's no known efficient algorithm doesn't mean there's no algorithm at all. For example, you could just iterate over every potential solution, and you'll eventually find one. These problems are used all over the place in real-world computing. You just need to be careful about how big a problem you set for yourself if you're going to need exponential time (or worse!) to solve it.

I am not sure what its runtime complexity is, but it appears that with each element the set grows by, the size of "subsets" doubles, plus one, so it appears to me to be at least quadratic.
If the run-time doubles with each additional element, you're looking at an O(2^N) algorithm: the size of "subsets" after k elements satisfies s(k) = 2*s(k-1) + 1 = 2^k - 1, which is exponential, not quadratic. That's also what I'd expect from visiting all subsets of a set (or all members of the powerset of a set), as that's exactly 2^N members (if you include the empty set).
The fact that adding or not adding an element to all hitherto-seen sets is fast doesn't mean that the total processing is fast.

What is going on here could be expressed much more simply using recursion:
(defun subset-sum (set sum &optional subset)
  (when set
    (destructuring-bind (head . tail) set
      (or (and (= head sum) (cons head subset))
          (subset-sum tail sum subset)
          (subset-sum tail (- sum head) (cons head subset))))))
The two recursive calls at the end clearly show we are traversing a binary tree of depth n, the size of the given set. The number of nodes in the binary tree is O(2^n), as expected.
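For instance, a quick check (the subset comes back in the order the recursion discovered it):

(subset-sum '(1 3 4) 7)
;; => (4 3), since 3 + 4 = 7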

There is also a pseudo-polynomial angle. The decision version of subset sum (to which the search version Karp-reduces) has a classic dynamic-programming algorithm that runs in O(n*M) time, where M is the target sum. This does not contradict NP-completeness, because M can be exponential in the number of bits needed to write it down (log M bits encode M), so O(n*M) is not polynomial in the input size.
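A minimal sketch of that dynamic program (my own code), assuming positive integer elements and target:

(defun subset-sum-dp (set target)
  ;; reachable[s] is true when some subset of the elements seen so far sums to s.
  (let ((reachable (make-array (1+ target) :initial-element nil)))
    (setf (aref reachable 0) t)  ; the empty subset sums to 0
    (dolist (element set)
      ;; Walk downwards so each element is used at most once.
      (loop for s from target downto element
            when (aref reachable (- s element))
              do (setf (aref reachable s) t)))
    (aref reachable target)))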

Related

Lisp interactive emacs multiplying incorrectly

I'm running the following code on Emacs Lisp Interaction:
(defun square (x) (* x x))
(square (square (square 1001)))
which is giving me 1114476179152563777. However, ((1001^2)^2)^2 is actually 1008028056070056028008001.
How is this possible?
Barmar's answer is accurate for Emacs versions < 27.
In Emacs 27 bignum support has been added. NEWS says:
** Emacs Lisp integers can now be of arbitrary size.
Emacs uses the GNU Multiple Precision (GMP) library to support
integers whose size is too large to support natively. The integers
supported natively are known as "fixnums", while the larger ones are
"bignums". The new predicates 'bignump' and 'fixnump' can be used to
distinguish between these two types of integers.
All the arithmetic, comparison, and logical (a.k.a. "bitwise")
operations where bignums make sense now support both fixnums and
bignums. However, note that unlike fixnums, bignums will not compare
equal with 'eq', you must use 'eql' instead. (Numerical comparison
with '=' works on both, of course.)
Since large bignums consume a lot of memory, Emacs limits the size of
the largest bignum a Lisp program is allowed to create. The
nonnegative value of the new variable 'integer-width' specifies the
maximum number of bits allowed in a bignum. Emacs signals an integer
overflow error if this limit is exceeded.
Several primitive functions formerly returned floats or lists of
integers to represent integers that did not fit into fixnums. These
functions now simply return integers instead. Affected functions
include functions like 'encode-char' that compute code-points, functions
like 'file-attributes' that compute file sizes and other attributes,
functions like 'process-id' that compute process IDs, and functions like
'user-uid' and 'group-gid' that compute user and group IDs.
and indeed using my 27.0.50 build:
(defun square (x) (* x x))
square
(square (square (square 1001)))
1008028056070056028008001
Emacs Lisp doesn't implement bignums; it uses the machine's integer type. The range of integers it supports is between most-negative-fixnum and most-positive-fixnum. On a 64-bit system, most-positive-fixnum will be 2^61 - 1, which has about 19 decimal digits.
See Integer Basics in the Elisp manual.
The correct result of your calculation is 25 digits, which is much larger than this. The calculation overflows and wraps around. It should be correct modulo 2^62.
You could use floating point instead. It has a much larger range, although very large numbers lose precision.
(square (square (square 1001.0)))
1.008028056070056e+24

What happens behind MATLAB's factor() function?

Mainly, why is it so fast (for big numbers)? The documentation only tells me how to use it. For example, it needs at most one second to find the largest prime factor of 1234567890987654, which, to me, seems insane.
>>max(factor(1234567890987654))
ans =
69444443
The largest factor to be tried is sqrt(N), or 35136418 in this case. Also, even the most elementary optimizations would skip all even numbers > 2, leaving only 17568209 candidates to be tested. Once the candidate 17777778 (and its cofactor 69444443) is found, the algorithm would be wise enough to stop.
This can be somewhat easily improved further by a modified sieve to skip multiples of small primes 2,3,5[,7].
Basically even the sqrt(N) optimization is enough for the noted performance, unless you are working on an exceptionally old CPU (8086).
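A minimal trial-division sketch of that idea in Common Lisp (my own code; divide out 2, then try only odd candidates up to the square root of what remains):

(defun largest-prime-factor (n)
  (let ((largest 1))
    ;; Divide out all factors of 2 first.
    (loop while (evenp n) do (setf largest 2 n (/ n 2)))
    ;; Only odd candidates remain; stop once candidate^2 > n.
    (loop for candidate from 3 by 2
          while (<= (* candidate candidate) n)
          do (loop while (zerop (mod n candidate))
                   do (setf largest candidate n (/ n candidate))))
    ;; Whatever is left greater than 1 is itself prime.
    (if (> n 1) n largest)))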
It's interesting to look at the source code of the factor and primes functions.
factor(N) essentially calls primes to find all primes up to sqrt(N). Once they have been identified, it tests them one by one to see if they divide N.
primes(n) uses Eratosthenes' sieve: for each identified prime, remove all its multiples, exploiting sqrt again to reduce complexity.
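The same two-step strategy in a short Common Lisp sketch (function names are my own), assuming N fits comfortably in a fixnum:

(defun primes-upto (n)
  ;; Sieve of Eratosthenes: cross out multiples of each prime found,
  ;; only sieving with primes up to sqrt(n).
  (let ((sieve (make-array (1+ n) :initial-element t)))
    (loop for i from 2 to (isqrt n)
          when (aref sieve i)
            do (loop for j from (* i i) to n by i
                     do (setf (aref sieve j) nil)))
    (loop for i from 2 to n
          when (aref sieve i) collect i)))

(defun factorize (n)
  ;; Trial-divide by each prime up to sqrt(N), as factor(N) does.
  (let ((factors '()))
    (dolist (p (primes-upto (isqrt n)))
      (loop while (zerop (mod n p))
            do (push p factors)
               (setf n (/ n p))))
    (when (> n 1) (push n factors))  ; the leftover cofactor is prime
    (nreverse factors)))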

Length of circular list in lisp

From 'A gentle introduction to lisp':
If given a circular list such as #1=(A B C . #1#) as input, LENGTH may not return a value at all. In most implementations it will go into an infinite loop.
Is this still true? Was/is it a bug? Why not check the nature of the list first?
Modern standards like R7RS Scheme and Common Lisp do acknowledge circular lists, but to reduce overhead CL has both length, which might hang on a circular list, and list-length, which returns nil if a cycle is detected.
There is no simple way to see the nature of a list when all you see is one cons at a time. What you do is advance one variable one step at a time and a second variable two steps at a time, starting at the second element. If those two are ever the same object, there is a circular reference. That is called the tortoise and hare algorithm.
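A minimal sketch of that check (circular-p is my own name; LIST-LENGTH does essentially this internally):

(defun circular-p (list)
  ;; Floyd's tortoise and hare: SLOW advances by one cons, FAST by two.
  (loop for slow = list then (cdr slow)
        for fast = (cdr list) then (cddr fast)
        while (and (consp fast) (consp (cdr fast)))
        when (eq slow fast) return t))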
length is meant to work on sequences in general; the circularity issue is relevant for lists but not for, say, strings or arrays. list-length is specialized on lists and works as expected for proper lists, but returns nil for circular lists.
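A quick demonstration, building the circular structure by hand:

;; LIST-LENGTH returns NIL for a circular list instead of looping:
(let ((l (list 1 2 3)))
  (setf (cdddr l) l)   ; tie the tail back to the head
  (list-length l))     ; => NIL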

Does heuristics in constraint satisfaction problems ensure no backtracking? (when there exists a solution)

I'm doing a map-coloring problem with Scheme, and I used minimum remaining values (select the vertex with the fewest legal colors) and the degree heuristic (select the vertex that has the largest number of neighbors). If there exists a solution for a certain configuration, will these heuristics ensure that it won't need to backtrack?
Let's do a simple theoretical analysis:
1. Graph coloring is NP-complete for general graphs (deciding whether 3 colors suffice is already NP-complete; 2-coloring is polynomial). This means there exists no known polynomial-time algorithm.
2. Your heuristics are computable in polynomial time.
3. Assuming you need no backtracking, you make n steps, each of which requires polynomial time (n is the number of vertices). Thus you can solve the problem in polynomial time.
4. Either you have proven P=NP or your assumption is wrong.
I leave it up to you to decide which option in point (4) is more plausible.
In general: no, MRV and your other heuristic will not guarantee a straight walk to the goal. (I imagine they might if your problem has some very specific structure, but don't count on it until you've seen the theorem.)
Heuristics prune the search space, or change the order of the search to make an early termination more likely. This is not the same thing as backtracking.
But it's a related concept.
We prune some spaces because we are confident that the solution does not lie in those branches of the search tree, or change the order because we have some reason to believe that it will be quicker if we look in some subtrees before others.
We also cut ourselves off from backtracking because we are confident that the solution is in the branch of the space we are in now (so that if we don't find it in this subtree, we can declare failure and don't bother).
Both kinds of strategies are ultimately about searching less of the space somehow and getting to the answer (positive or negative) without searching everything.
MRV and the degree heuristic are about reordering the sub-searches, not about avoiding backtracking. Heuristics can be right and make a short search, but that's not the same thing as eliminating backtracking (e.g. the "cut" operator in Prolog). When you find what you're looking for, you can declare success, and of course that eliminates further backtracking. But real backtracking elimination means deciding not to backtrack no matter what, before the search completes.
E.g. if you're doing a depth-first search, and you find what you're looking for by dumb luck without backtracking, we cannot say that dumb luck is a fence operation that eliminates backtracking. :)

CLISP overflow after multiplication

I'm trying to get a first Lisp program to work using the CLISP implementation, by typing
(print (mod (+ (* 28433 (expt 2 7830457)) 1) (expt 10 10)))
in the REPL.
but it gives me *** - overflow during multiplication of large numbers. I thought Lisp features arbitrary size/precision. How could that ever happen, then?
Lisp's bignums may hold really large numbers, but they too have their limits.
In your case, you can combine exponentiation and modulus into a single procedure, e.g. as in http://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method.
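A sketch of that right-to-left binary method (mod-expt* is my own name, to avoid clashing with CLISP's EXT:mod-expt mentioned below); intermediate products never exceed modulus^2:

(defun mod-expt* (base exponent modulus)
  ;; Square-and-multiply, reducing mod MODULUS at every step.
  (let ((result 1)
        (b (mod base modulus)))
    (loop while (> exponent 0)
          do (when (oddp exponent)
               (setf result (mod (* result b) modulus)))
             (setf exponent (ash exponent -1))
             (setf b (mod (* b b) modulus)))
    result))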
According to http://clisp.cons.org/impnotes/num-concepts.html the maximum size for a bignum is (2^2097088 - 1) and your 2^7830457 is much larger than that.
Perhaps you can look at breaking down that number - perhaps separate out a number of smaller 2^X factors...
Chances are there's a better way to solve the problem. I haven't made it that far on PE, but I know the few that I've done so far tend to have "aha!" solutions to problems that seem out of a computer program's range.
This one especially - 2^7830457 is a huge number -- try (format t "~r" (expt 2 160)). You might try to look at the problem in a new light and see if there's a way to look at it that you haven't thought of.
Lisp is a family of languages with dozens of dialects and hundreds of different implementations.
Computers have finite memory. Programs under some operating systems may have limitations about the memory size. Different Common Lisp implementations use different numeric libraries.
You may want to consult your CLISP manual for its limitations of its various data types.
CLISP provides the function mod-expt (or EXT:mod-expt):
[1]> (mod-expt 2 1000000 59)
53
which is pretty fast. And for your purpose that works.
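Applied to the original expression (the last ten digits of 28433 * 2^7830457 + 1):

(mod (+ (* 28433 (mod-expt 2 7830457 (expt 10 10))) 1)
     (expt 10 10))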