Subset sum where the size of the subset is `k` is NPC? - reduction

I have a variation of Subset-Sum problem where the size of the subset is k and all the integers are positive (not zero).
As can be seen online, this question can be fairly solved using dynamic programming in pseudo-polynomial time.
I need to decide wether this problem is NPC, or in P (while assuming P!=NP).
I've tried to reduce from subset-sum problem, but had a problem with the constraint that all integers must be greater than zero. Since otherwise I would have just padded the input with k zero integers.
Formal definition of the problem:
L={<S1,S2,...,Sn,T,k>|There exists a subset I of S1,...,Sn of size m which sums up to T}

The problem is in NPC.
If you could find polynomial time solutions to your problem then you could have polynomial time solutions to Subset Sum problem with upper bound
Time of Subset Sum = k * (Your Problem's Time)

That problem is NPC. ​ In fact, when k is in n1-Ω(1), even the combination of
the numbers all being in nO(k)
with
the promise that there is at most one solution
is not suspected to put it in ​ coNTIME(n^(o(k))) / 2o(k*n*log(n)) ​ infinitely-often,
since this paper's Proof of Theorem 5.1 gives a reduction
that works for such k and preserves the number of solutions.

Related

Maple unable to evalute function in whole range of plot

Maple helpfully can work out the solution to Laplace's equation in a square region and give me the answer in closed form (in terms of an infinite sum). If I try to plot the function of two variables as a 3d plot it gives me most of the surface but not all of it:
Here is the Maple code which produces the solution and turns it into an expression suitable for plotting
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0;
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100;
sol1:=pdsolve({lapeq,bcs});
vxy:=eval(v(x,y),sol1);
the result of which is
All good so far. Plotting it via
plot3d(vxy,x=0..1,y=0..1);
gives a result which is fine for x in the full range (0<x<1) but only for y between 0 and around 0.9:
I have tried to evalf some point in the unknown region and Maple can't tell me numerical values there. Is there any way to get Maple to "try a bit harder" to evaluate those numbers?
You could try setting the number of terms in the sum
Compare
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0;
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100;
sol1:=pdsolve({lapeq,bcs});
vxy:=subs(infinity=100,sol1);
plot3d(rhs(vxy),x=0..1,y=0..1);
With
restart;
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0;
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100;
sol1:=pdsolve({lapeq,bcs});
vxy:=eval(v(x,y),sol1);
plot3d(vxy,x=0..1,y=0..1);
I'm not a huge fan of chopping the infinite sum at some value of the upper bound for n, without at least demonstrating either symbolically or numerically that it is justified. Ie, that chopping does not provide a false idea of convergence.
So, you asked how to make it work "harder". I'll take that to mean that you too might prefer to let evalf/Sum itself decide whether each infinite numeric sum converges -- rather than manually truncate it yourself at some finite value for the upper value of the range for n.
For fun, and caution, I also divide both numerator and denominator of K by the potentially large exp call (potentially much larger than 1). That may not be necessary here.
restart;
lapeq:=diff(v(x,y),x$2)+diff(v(x,y),y$2)=0:
bcs:=v(x,0)=0,v(0,y)=0,v(1,y)=0,v(x,1)=100:
sol1:=pdsolve({lapeq,bcs}):
vxy:=eval(v(x,y),sol1):
K:=op(1,vxy):
J:=simplify(combine(numer(K)/exp(2*Pi*n)))
/simplify(combine(denom(K)/exp(2*Pi*n))):
F:=subs(__d=J,
proc(x,y) local k, m, n, r;
if y<0.8 then
r:=Sum(__d,n=1..infinity);
else
UseHardwareFloats:=false;
m := ceil(1*abs(y/0.80)^16);
r:=add(Sum(eval(__d,n=m*n-k),n=1..infinity),
k=0..m-1);
end if;
evalf(r);
end proc):
plot3d( F, 0..1, 0..0.99 );
Naturally this is slower than mere chopping of terms to obtain a finite sum. And you might be satisfied with some technique that establishes that the excluded terms' sums are negligible.

Proof of NP Completeness of set-partition problem

I have reduced subset sum problem to set partition problem but do not know whether it is correct and so I need your help.
MY METHOD:
In subset sum problem we have to find a subset S1 of set S so that it sums to a number t and in set partition problem we need to find a subset X1 of set X such that summation of numbers in set X1 is half of that in X.
So let us take instance of subset sum problem where t = sum of numbers in X / 2. If we can solve the set partition problem than we solved the subset sum problem too. But we know that subset sum id NP Complete so subset sum problem is also NP Complete( I know how to prove it is NP).
I am having doubt whether we can make a choice of t like that or not. Please help.
Your logic is sound, that is a valid reduction.
We know this is valid because the proof is for the known problem to the unknown problem. You need to prove that EVERY instance of the known problem can be reduced into SOME instance of the unknown problem. So putting restrictions on your unknown problem is perfectly acceptable.
Some notes: Your description is not sufficient for a proper proof. You noted that you knew this but for any readers here: to prove a problem is NP-Complete, you first prove it is in NP, and then you prove it is NP-Hard. This question only addresses a small portion of what an NP-Hard proof should contain.

Partitioning a number into a number of almost equal partitions

I would like to partition a number into an almost equal number of values in each partition. The only criteria is that each partition must be in between 60 to 80.
For example, if I have a value = 300, this means that 75 * 4 = 300.
I would like to know a method to get this 4 and 75 in the above example. In some cases, all partitions don't need to be of equal value, but they should be in between 60 and 80. Any constraints can be used (addition, subtraction, etc..). However, the outputs must not be floating point.
Also it's not that the total must be exactly 300 as in this case, but they can be up to a maximum of +40 of the total, and so for the case of 300, the numbers can sum up to 340 if required.
Assuming only addition, you can formulate this problem into a linear programming problem. You would choose an objective function that would maximize the sum of all of the factors chosen to generate that number for you. Therefore, your objective function would be:
(source: codecogs.com)
.
In this case, n would be the number of factors you are using to try and decompose your number into. Each x_i is a particular factor in the overall sum of the value you want to decompose. I'm also going to assume that none of the factors can be floating point, and can only be integer. As such, you need to use a special case of linear programming called integer programming where the constraints and the actual solution to your problem are all in integers. In general, the integer programming problem is formulated thusly:
You are actually trying to minimize this objective function, such that you produce a parameter vector of x that are subject to all of these constraints. In our case, x would be a vector of numbers where each element forms part of the sum to the value you are trying to decompose (300 in your case).
You have inequalities, equalities and also boundaries of x that each parameter in your solution must respect. You also need to make sure that each parameter of x is an integer. As such, MATLAB has a function called intlinprog that will perform this for you. However, this function assumes that you are minimizing the objective function, and so if you want to maximize, simply minimize on the negative. f is a vector of weights to be applied to each value in your parameter vector, and with our objective function, you just need to set all of these to -1.
Therefore, to formulate your problem in an integer programming framework, you are actually doing:
(source: codecogs.com)
V would be the value you are trying to decompose (so 300 in your example).
The standard way to call intlinprog is in the following way:
x = intlinprog(f,intcon,A,b,Aeq,beq,lb,ub);
f is the vector that weights each parameter of the solution you want to solve, intcon denotes which of your parameters need to be integer. In this case, you want all of them to be integer so you would have to supply an increasing vector from 1 to n, where n is the number of factors you want to decompose the number V into (same as before). A and b are matrices and vectors that define your inequality constraints. Because you want equality, you'd set this to empty ([]). Aeq and beq are the same as A and b, but for equality. Because you only have one constraint here, you would simply create a matrix of 1 row, where each value is set to 1. beq would be a single value which denotes the number you are trying to factorize. lb and ub are the lower and upper bounds for each value in the parameter set that you are bounding with, so this would be 60 and 80 respectively, and you'd have to specify a vector to ensure that each value of the parameters are bounded between these two ranges.
Now, because you don't know how many factors will evenly decompose your value, you'll have to loop over a given set of factors (like between 1 to 10, or 1 to 20, etc.), place your results in a cell array, then you have to manually examine yourself whether or not an integer decomposition was successful.
num_factors = 20; %// Number of factors to try and decompose your value
V = 300;
results = cell(1, num_factors);
%// Try to solve the problem for a number of different factors
for n = 1 : num_factors
x = intlinprog(-ones(n,1),1:n,[],[],ones(1,n),V,60*ones(n,1),80*ones(n,1));
results{n} = x;
end
You can then go through results and see which value of n was successful in decomposing your number into that said number of factors.
One small problem here is that we also don't know how many factors we should check up to. That unfortunately I don't have an answer to, and so you'll have to play with this value until you get good results. This is also an unconstrained parameter, and I'll talk about this more later in this post.
However, intlinprog was only released in recent versions of MATLAB. If you want to do the same thing without it, you can use linprog, which is the floating point version of integer programming... actually, it's just the core linear programming framework itself. You would call linprog this way:
x = linprog(f,A,b,Aeq,beq,lb,ub);
All of the variables are the same, except that intcon is not used here... which makes sense as linprog may generate floating point numbers as part of its solution. Due to the fact that linprog can generate floating point solutions, what you can do is if you want to ensure that for a given value of n, you could loop over your results, take the floor of the result and subtract with the final result, and sum over the result. If you get a value of 0, this means that you had a completely integer result. Therefore, you'd have to do something like:
num_factors = 20; %// Number of factors to try and decompose your value
V = 300;
results = cell(1, num_factors);
%// Try to solve the problem for a number of different factors
for n = 1 : num_factors
x = linprog(-ones(n,1),[],[],ones(1,n),V,60*ones(n,1),80*ones(n,1));
results{n} = x;
end
%// Loop through and determine which decompositions were successful integer ones
out = cellfun(#(x) sum(abs(floor(x) - x)), results);
%// Determine which values of n were successful in the integer composition.
final_factors = find(~out);
final_factors will contain which number of factors you specified that was successful in an integer decomposition. Now, if final_factors is empty, this means that it wasn't successful in finding anything that would be able to decompose the value into integer factors. Noting your problem description, you said you can allow for tolerances, so perhaps scan through results and determine which overall sum best matches the value, then choose whatever number of factors that gave you that result as the final answer.
Now, noting from my comments, you'll see that this problem is very unconstrained. You don't know how many factors are required to get an integer decomposition of your value, which is why we had to semi-brute-force it. In fact, this is a more general case of the subset sum problem. This problem is NP-complete. Basically, what this means is that it is not known whether there is a polynomial-time algorithm that can be used to solve this kind of problem and that the only way to get a valid solution is to brute-force each possible solution and check if it works with the specified problem. Usually, brute-forcing solutions requires exponential time, which is very intractable for large problems. Another interesting fact is that modern cryptography algorithms use NP-Complete intractability as part of their ciphertext and encrypting. Basically, they're banking on the fact that the only way for you to determine the right key that was used to encrypt your plain text is to check all possible keys, which is an intractable problem... especially if you use 128-bit encryption! This means you would have to check 2^128 possibilities, and assuming a moderately fast computer, the worst-case time to find the right key will take more than the current age of the universe. Check out this cool Wikipedia post for more details in intractability with regards to key breaking in cryptography.
In fact, NP-complete problems are very popular and there have been many attempts to determine whether there is or there isn't a polynomial-time algorithm to solve such problems. An interesting property is that if you can find a polynomial-time algorithm that will solve one problem, you will have found an algorithm to solve them all.
The Clay Mathematics Institute has what are known as Millennium Problems where if you solve any problem listed on their website, you get a million dollars.
Also, that's for each problem, so one problem solved == 1 million dollars!
(source: quickmeme.com)
The NP problem is amongst one of the seven problems up for solving. If I recall correctly, only one problem has been solved so far, and these problems were first released to the public in the year 2000 (hence millennium...). So... it has been about 14 years and only one problem has been solved. Don't let that discourage you though! If you want to invest some time and try to solve one of the problems, please do!
Hopefully this will be enough to get you started. Good luck!

What is this code doing? Machine Learning

I'm just learning matlab and I have a snippet of code which I don't understand the syntax of. The x is an n x 1 vector.
Code is below
p = (min(x):(max(x)/300):max(x))';
The p vector is used a few lines later to plot the function
plot(p,pp*model,'r');
It generates an arithmetic progression.
An arithmetic progression is a sequence of numbers where the next number is equal to the previous number plus a constant. In an arithmetic progression, this constant must stay the same value.
In your code,
min(x) is the initial value of the sequence
max(x) / 300 is the increment amount
max(x) is the stopping criteria. When the result of incrementation exceeds this stopping criteria, no more items are generated for the sequence.
I cannot comment on this particular choice of initial value and increment amount, without seeing the surrounding code where it was used.
However, from a naive perspective, MATLAB has a linspace command which does something similar, but not exactly the same.
Certainly looks to me like an odd thing to be doing. Basically, it's creating a vector of values p that range from the smallest to the largest values of x, which is fine, but it's using steps between successive values of max(x)/300.
If min(x)=300 and max(x)=300.5 then this would only give 1 point for p.
On the other hand, if min(x)=-1000 and max(x)=0.3 then p would have thousands of elements.
In fact, it's even worse. If max(x) is negative, then you would get an error as p would start from min(x), some negative number below max(x), and then each element would be smaller than the last.
I think p must be used to create pp or model somehow as well so that the plot works, and without knowing how I can't suggest how to fix this, but I can't think of a good reason why it would be done like this. using linspace(min(x),max(x),300) or setting the step to (max(x)-min(x))/299 would make more sense to me.
This code examines an array named x, and finds its minimum value min(x) and its maximum value max(x). It takes the maximum value and divides it by the constant 300.
It doesn't explicitly name any variable, setting it equal to max(x)/300, but for the sake of explanation, I'm naming it "incr", short for increment.
And, it creates a vector named p. p looks something like this:
p = [min(x), min(x) + incr, min(x) + 2*incr, ..., min(x) + 299*incr, max(x)];

Predicting an overflow when counting combinations

We have the following formula for determining how many combinations C we can pick of size k out of a set of n:
I have written an algorithm which will always give an answer if, of course, the answer falls within the range of the datatype (ulong, in my case), by factorising and cancelling terms on the numerator and denominator during evaluation.
Even though it's quite fast to try to compute C and detect an overflow if the result is too large, it would be better if I could put n and k into a preliminary function which estimates whether the answer will be larger than what ulong can hold. It doesn't have to be exact. If it estimates that a given n and k will not overflow but it does, that's fine - but it should never say this it will overflow if it won't. Ideally this function should be very fast otherwise there is no point in having it - I may as well try and compute C directly and let it overflow.
I was plotting the curve of the nCk for various n's as a function of k to see if I can find a curve which grows at least as fast as C(n, k) but doesn't diverge too far in the range I'm interested in (0..2^64-1) and is computationally easy to evaluate.
I didn't have any luck. Any ideas?
Without seeing the actual code for your algorithm, I can't give you a 100% solution, but your best bet is to develop a heuristic function. By simply finding the smallest value of r for which the final answer to nCr overflows for a variety of n values, you should then be able to analyze the relationship between something like n and the ratio between n and r (n/r), and find a quick to calculate function which would let you know if overflow would occur via regression.
I found that for any n < 68, you should never overflow on the final answer, as 67C33 = 67C34 ~ 1.42x1019 is the largest possible answer, and a ulong holds ~1.84x1019. Similarly, when n > 5000, any r > 5 or n-r < n-5 will certainly overflow. You can tune these cutoffs to your liking, and for all the n values in between them, just calculate n/r and use the regression formula to decide if it will overflow or not.
This might be too much work, but it should at least get you started on the right path.