Multiple Knapsacks with Fungible Items - or-tools

I am using cp_model to solve a problem very similar to the multiple-knapsack problem (https://developers.google.com/optimization/bin/multiple_knapsack). Just like in the example code, I use some boolean variables to encode membership:
# Variables
# x[i, j] = 1 if item i is packed in bin j.
x = {}
for i in data['items']:
for j in data['bins']:
x[(i, j)] = solver.IntVar(0, 1, 'x_%i_%i' % (i, j))
What is specific to my problem is that there are a large number of fungible items. There may be 5 items of type 1 and 10 items of type 2. Any item is exchangeable with items of the same type. Using the boolean variables to encode the problem implicitly assumes that the order of the assignment for the same type of items matter. But in fact, the order does not matter and only takes up unnecessary computation time.
I am wondering if there is any way to design the model so that it accurately expresses that we are allocating from fungible pools of items to save computation.

Instead of creating 5 Boolean variables for 5 items of type 'i' in bin 'b', just create an integer variable 'count' from 0 to 5 of items 'i' in bin 'b'. Then sum over b (count[i][b]) == #item b

Related

Few minizinc questions on constraints

A little bit of background. I'm trying to make a model for clustering a Design Structure Matrix(DSM). I made a draft model and have a couple of questions. Most of them are not directly related to DSM per se.
include "globals.mzn";
int: dsmSize = 7;
int: maxClusterSize = 7;
int: maxClusters = 4;
int: powcc = 2;
enum dsmElements = {A, B, C, D, E, F,G};
array[dsmElements, dsmElements] of int: dsm =
[|1,1,0,0,1,1,0
|0,1,0,1,0,0,1
|0,1,1,1,0,0,1
|0,1,1,1,1,0,1
|0,0,0,1,1,1,0
|1,0,0,0,1,1,0
|0,1,1,1,0,0,1|];
array[1..maxClusters] of var set of dsmElements: clusters;
array[1..maxClusters] of var int: clusterCard;
constraint forall(i in 1..maxClusters)(
clusterCard[i] = pow(card(clusters[i]), powcc)
);
% #1
% constraint forall(i, j in clusters where i != j)(card(i intersect j) == 0);
% #2
constraint forall(i, j in 1..maxClusters where i != j)(
card(clusters[i] intersect clusters[j]) == 0
);
% #3
% constraint all_different([i | i in clusters]);
constraint (clusters[1] union clusters[2] union clusters[3] union clusters[4]) = dsmElements;
var int: intraCost = sum(i in 1..maxClusters, j, k in clusters[i] where k != j)(
(dsm[j,k] + dsm[k,j]) * clusterCard[i]
) ;
var int: extraCost = sum(el in dsmElements,
c in clusters where card(c intersect {el}) = 0,
k,j in c)(
(dsm[j,k] + dsm[k,j]) * pow(card(dsmElements), powcc)
);
var int: TCC = trace("\(intraCost), \(extraCost)\n", intraCost+extraCost);
solve maximize TCC;
Question 1
I was under the impression, that constraints #1 and #2 are the same. However, seems like they are not. The question here is why? What is the difference?
Question 2
How can I replace constraint #2 with all_different? Does it make sense?
Question 3
Why the trace("\(intraCost), \(extraCost)\n", intraCost+extraCost); shows nothing in the output? The output I see using gecode is:
Running dsm.mzn
intraCost, extraCost
clusters = array1d(1..4, [{A, B, C, D, E, F, G}, {}, {}, {}]);
clusterCard = array1d(1..4, [49, 0, 0, 0]);
----------
<sipped to save space>
----------
clusters = array1d(1..4, [{B, C, D, G}, {A, E, F}, {}, {}]);
clusterCard = array1d(1..4, [16, 9, 0, 0]);
----------
==========
Finished in 5s 419msec
Question 4
The expression constraint (clusters[1] union clusters[2] union clusters[3] union clusters[4]) = dsmElements;, here I wanted to say that the union of all clusters should match the set of all nodes. Unfortunately, I did not find a way to make this big union more dynamic, so for now I just manually provide all clusters. Is there a way to make this expression return union of all sets from the array of sets?
Question 5
Basically, if I understand it correctly, for example from here, the Intra-cluster cost is the sum of all interactions within a cluster multiplied by the size of the cluster in some power, basically the cardinality of the set of nodes, that represents the cluster.
The Extra-cluster cost is a sum of interactions between some random element that does not belong to a cluster and all elements of that cluster multiplied by the cardinality of the whole space of nodes to some power.
The main question here is are the intraCost and extraCost I the model correct (they seem to be but still), and is there a better way to express these sums?
Thanks!
(Perhaps you would get more answers if you separate this into multiple questions.)
Question 3:
Here's an answer on the trace question:
When running the model, the trace actually shows this:
intraCost, extraCost
which is not what you expect, of course. Trace is in effect when creating the model, but at that stage there is no value of these two decision values and MiniZinc shows only the variable names. They got some values to show after the (first) solution is reached, and can then be shown in the output section.
trace is mostly used to see what's happening in loops where one can trace the (fixed) loop variables etc.
If you trace an array of decision variables then they will be represented in a different fashion, the array x will be shown as X_INTRODUCED_0_ etc.
And you can also use trace for domain reflection, e.g. using lb and ub to get the lower/upper value of the domain of a variable ("safe approximation of the bounds" as the documentation states it: https://www.minizinc.org/doc-2.5.5/en/predicates.html?highlight=ub_array). Here's an example which shows the domain of the intraCost variable:
constraint
trace("intraCost: \(lb(intraCost))..\(ub(intraCost))\n")
;
which shows
intraCost: -infinity..infinity
You can read a little more about trace here https://www.minizinc.org/doc-2.5.5/en/efficient.html?highlight=trace .
Update Answer to question 1, 2 and 4.
The constraint #1 and #2 means the same thing, i.e. that the elements in clusters should be disjoint. The #1 constraint is a little different in that it loops over decision variables while the #2 constraint use plain indices. One can guess that #2 is faster since #1 use the where i != j which must be translated to some extra constraints. (And using i < j instead should be a little faster.)
The all_different constraint states about the same and depending on the underlying solver it might be faster if it's translated to an efficient algorithm in the solver.
In the model there is also the following constraint which states that all elements must be used:
constraint (clusters[1] union clusters[2] union clusters[3] union clusters[4]) = dsmElements;
Apart from efficiency, all these constraints above can be replaced with one single constraint: partition_set which ensure that all elements in dsmElements must be used in clusters.
constraint partition_set(clusters,dsmElements);
It might be faster to also combine with the all_different constraint, but that has to be tested.

Printing part of an array in MiniZinc

I have a MiniZinc model for wolf-goat-cabbage in which I store the locations of each entity in its own array, e.g., array[1..max] of Loc: wolf where Loc is defined as an enum: enum Loc = {left, rght}; and max is the maximum possible number of steps needed, e.g., 20..
To find a shortest plan I define a variable var 1..max: len; and constrain the end state to occur at step len.
constraint farmer[len] == left /\ wolf[len] == left /\ goat[len] == left /\ cabbage[len] == left
Then I ask for
solve minimize len
I get all the right answers.
I'd like to display the arrays from 1..len, but I can't find a way to do it. When I try, for example, to include in the output:
[ "\(wolf[n]), " | n in 1..max where n <= len ]
I get an error message saying that I can't display an array of opt string.
Is there a way to display only an initial portion of an array, where the length of the initial portion is determined by the model?
Thanks.
Did you try to fix the len variable in the output statement like n <= fix(len)?. See also What is the use of minizinc fix function?

Why are products called minterms and sums called maxterms?

Do they have a reason for doing so? I mean, in the sum of minterms, you look for the terms with the output 1; I don't get why they call it "minterms." Why not maxterms because 1 is well bigger than 0?
Is there a reason behind this that I don't know? Or should I just accept it without asking why?
The convention for calling these terms "minterms" and "maxterms" does not correspond to 1 being greater than 0. I think the best way to answer is with an example:
Say that you have a circuit and it is described by X̄YZ̄ + XȲZ.
"This form is composed of two groups of three. Each group of three is a 'minterm'. What the expression minterm is intended to imply it that each of the groups of three in the expression takes on a value of 1 only for one of the eight possible combinations of X, Y and Z and their inverses." http://www.facstaff.bucknell.edu/mastascu/elessonshtml/Logic/Logic2.html
So what the "min" refers to is the fact that these terms are the "minimal" terms you need in order to build a certain function. If you would like more information, the example above is explained in more context in the link provided.
Edit: The "reason they used MIN for ANDs, and MAX for ORs" is that:
In Sum of Products (what you call ANDs) only one of the minterms must be true for the expression to be true.
In Product of Sums (what you call ORs) all the maxterms must be true for the expression to be true.
min(0,0) = 0
min(0,1) = 0
min(1,0) = 0
min(1,1) = 1
So minimum is pretty much like logical AND.
max(0,0) = 0
max(0,1) = 1
max(1,0) = 1
max(1,1) = 1
So maximum is pretty much like logical OR.
In Sum Of Products (SOP), each term of the SOP expression is called a "minterm" because,
say, an SOP expression is given as:
F(X,Y,Z) = X'.Y'.Z + X.Y'.Z' + X.Y'.Z + X.Y.Z
for this SOP expression to be "1" or true (being a positive logic),
ANY of the term of the expression should be 1.
thus the word "minterm".
i.e, any of the term (X'Y'Z) , (XY'Z') , (XY'Z) or (XYZ) being 1, results in F(X,Y,Z) to be 1!!
Thus they are called "minterms".
On the other hand,
In Product Of Sum (POS), each term of the POS expression is called a "maxterm" because,
say an POS expression is given as: F(X,Y,Z) = (X+Y+Z).(X+Y'+Z).(X+Y'+Z').(X'+Y'+Z)
for this POS expression to be "0" (because POS is considered as a negative logic and we consider 0 terms), ALL of the terms of the expression should be 0. thus the word "max term"!!
i.e for F(X,Y,Z) to be 0,
each of the terms (X+Y+Z), (X+Y'+Z), (X+Y'+Z') and (X'+Y'+Z) should be equal to "0", otherwise F won't be zero!!
Thus each of the terms in POS expression is called a MAXTERM (maximum all the terms!) because all terms should be zero for F to
be zero, whereas any of the terms in POS being one results in F to be
one. Thus it is known as MINTERM (minimum one term!)
I believe that AB is called a minterm is because it occupies the minimum area on a Venn diagram; while A+B is called a MAXTERM because it occupies a maximum area in a Venn diagram. Draw the two diagrams and the meanings will become obvious
Ed Brumgnach
Here is another way to think about it.
A product is called a minterm because it has minimum-satisfiability where as a sum is called a maxterm because it has maximum-satisfiability among all practically interesting boolean functions.
They are called terms because they are used as the building-blocks of various canonical representations of arbitrary boolean functions.
Details:
Note that '0' and '1' are the trivial boolean functions.
Assume a set of boolean variables x1,x2,...,xk and a non-trivial boolean function f(x1,x2,...,xk).
Conventionally, an input is said to satisfy the boolean function f, whenever f holds a value of 1 for that input.
Note that there are exactly 2^k inputs possible, and any non-trivial boolean-function can satisfy a minimum of 1 input to a maximum of 2^k -1 inputs.
Now consider the two simple boolean functions of interest: sum of all variables S, and product of all variables P (variables may/may-not appear as complements). S is one boolean function that has maximum-satisfiability hence called as maxterm, where as P is the one having minimum-satisfiability hence called a minterm.

Change the class of columns in a data frame

First of all, excuse me if I do any mistakes, but English is not a language I use very often.
I have a data frame with numbers. A small part of the data frame is this:
nominal 2
2
2
2
ordinal
2
1
1
2
So, I want to use the gower distance function on these numbers.
Here ( http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/StatMatch/man/gower.dist.html ) says that in order to use gower.dist, all nominal variables must be of class "factor" and all ordinal variables of class "ordered".
By default, all the columns are of class "integer" and mode "numeric". In order to change the class of the columns, i use these commands:
DF=read.table("clipboard",header=TRUE,sep="\t")
# I select all the cells and I copy them to the clipboard.
#Then R, with this command, reads the data from there.
MyHeader=names(DF) # I save the headers of the data frame to a temp matrix
for (i in 1:length(DF)) {
if (MyHeader[[i]]=="nominal") DF[[i]]=as.factor(DF[[i]])
}
for (i in 1:length(DF)) {
if (MyHeader[[i]]=="ordinal") DF[[i]]=as.ordered(DF[[i]])
}
The first for/if loop changes the class from integer to factor, which is what I want, but the second changes the class of ordinal variables to: "ordered" "factor".
I need to change all the columns with the header "ordinal" to "ordered", as the gower.dist function says.
Thanks in advance,
B.T.
What you are doing is fine --- if perhaps a little inelegantly.
With your ordered factor, you have something like:
> foo <- as.ordered(1:10)
> foo
[1] 1 2 3 4 5 6 7 8 9 10
Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10
> class(foo)
[1] "ordered" "factor"
Notice that it has two classes, indicating that it is an ordered factor and that is is a factor:
> is.ordered(as.ordered(1:10))
[1] TRUE
> is.factor(as.ordered(1:10))
[1] TRUE
In some senses, you might like to think that foo is an ordered factor but also inherits from the factor class too. Alternatively, if there isn't a specific method that handles ordered factors, but there is a method for factors, R will use the factor method. As far as R is concerned, an ordered factor is an object with classes "ordered" and "factor". This is what your function for Gower's distance will require.
You could easily do this with:
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)
which gives you a dataframe with the correct structure. If you work with data frames, please stay away from [[]] unless you know very well what you're doing. Take Dirks advice, and check Owen's R Guide as well. You definitely need it.
If i do the conversion as I showed above, gower.dist() works perfectly fine. On a sidenote, the gowers distance can easily be calculated using the daisy() function as well:
DF <- data.frame(
ordinal= c(1,2,3,1,2,1),
nominal= c(2,2,2,2,2,2)
)
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)
library(cluster)
daisy(DF,metric="gower")
library(StatMatch)
gower.dist(DF)

Generate a hash sum for several integers

I am facing the problem of having several integers, and I have to generate one using them. For example.
Int 1: 14
Int 2: 4
Int 3: 8
Int 4: 4
Hash Sum: 43
I have some restriction in the values, the maximum value that and attribute can have is 30, the addition of all of them is always 30. And the attributes are always positive.
The key is that I want to generate the same hash sum for similar integers, for example if I have the integers, 14, 4, 10, 2 then I want to generate the same hash sum, in the case above 43. But of course if the integers are very different (4, 4, 2, 20) then I should have a different hash sum. Also it needs to be fast.
Ideally I would like that the output of the hash sum is between 0 and 512, and it should evenly distributed. With my restrictions I can have around 5K different possibilities, so what I would like to have is around 10 per bucket.
I am sure there are many algorithms that do this, but I could not find a way of googling this thing. Can anyone please post an algorithm to do this?.
Some more information
The whole thing with this is that those integers are attributes for a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize between similar attributes.
The reason why 10, 5, 15 are totally different from 5, 10, 15, it is because if you imagine this in 3d then both points are a totally different point
Some more information 2
Some answers try to solve the problem using hashing. But I do not think this is so complex. Thanks to one of the comments I have realized that this is a clustering algorithm problem. If we have only 3 attributes and we imagine the problem in 3d, what I just need is divide the space in blocks.
In fact this can be solved with rules of this type
if (att[0] < 5 && att[1] < 5 && att[2] < 5 && att[3] < 5)
Block = 21
if ( (5 < att[0] < 10) && (5 < att[1] < 10) && (5 < att[2] < 10) && (5 < att[3] < 10))
Block = 45
The problem is that I need a fast and a general way to generate those ifs I cannot write all the possibilities.
The simple solution:
Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).
If you really want to roll-your-own, I would do something like:
Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)
To generate hash, calculate: sum(a[i] * x[i]) mod P
Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce an number in the range of 0 to 255 (8 bits).
bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)
Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.
0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255
Trivially tested with:
for (int a = 0; a <= 30; a++)
for (int b = 0; b <= 30; b++)
for (int c = 0; c <= 30; c++)
for (int d = 0; d <= 30; d++) {
int bucket = ((a & 0x18) << 3) |
((b & 0x18) << 1) |
((c & 0x18) >> 1) |
((d & 0x18) >> 3);
printf("%d, %d, %d, %d -> %d\n",
a, b, c, d, bucket);
}
You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:
long hash = 13;
for (int i = 0; i < array.length; i++) {
hash = hash * 37 + array[i] / 5;
}
This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace
hash = hash * 37 + array[i] / 5;
with
hash += array[i] / 5;
EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.
To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.
An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table and on the number of and expected maximum value of the input vector values. If you can have negative values as "coordinate" values then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.
You might want to try using vectors to describe each number set as the hash value.
EDIT:
Since you're not describing why you want to not run the function itself, I'm guessing it's long running. Since you haven't described the breadth of the argument set.
If every value is expected then a full lookup table in a database might be faster.
If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing so only the first run for a argument set is expensive, and each additional request is fast, with less memory usage.
You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.
One approach would be to normalize your input and then generate a hash from the results.
Generating the same hash sum is called a collision, and is a bad thing for a hash to have. It makes it less useful.
If you want similar values to give the same output, you can divide the input by however close you want them to count. If the order makes a difference, use a different divisor for each number. The following function does what you describe:
int SqueezedSum( int a, int b, int c, int d )
{
return (a/11) + (b/7) + (c/5) + (d/3);
}
This is not a hash, but does what you describe.
You want to look into geometric hashing. In "standard" hashing you want
a short key
inverse resistance
collision resistance
With geometric hashing you susbtitute number 3 with something whihch is almost opposite; namely close initial values give close hash values.
Another way to view my problem is using the multidimesional scaling (MS). In MS we start with a matrix of items and what we want is assign a location of each item to an N dimensional space. Reducing in this way the number of dimensions.
http://en.wikipedia.org/wiki/Multidimensional_scaling