2D bin packing using OR-Tools: AddNoOverlap2D and OnlyEnforceIf gives MODEL_INVALID

I am playing with a 2D bin packing model. I tried modeling the pairwise no-overlap constraints like this:
for j in range(n):
    for i in range(j):
        model.Add(b[i] == b[j]).OnlyEnforceIf(b2[(i,j)])  # not needed?
        model.Add(b[i] != b[j]).OnlyEnforceIf(b2[(i,j)].Not())
        model.AddNoOverlap2D([xival[i],xival[j]],[yival[i],yival[j]]).OnlyEnforceIf(b2[(i,j)])
The purpose here is to only enforce the no-overlap constraint if items i and j are assigned to the same bin. The combination of AddNoOverlap2D and OnlyEnforceIf seems to give the status:
MODEL_INVALID
If I remove OnlyEnforceIf(b2[(i,j)]) the (now incorrect) model solves fine.
Am I correct to conclude that this is just not supported in OR-Tools (yet)?
I guess I can reformulate things into a more MIP-like approach.
A reproducible example is below. I used version 8.1.8487.
from ortools.sat.python import cp_model
#---------------------------------------------------
# data
#---------------------------------------------------
# bin width and height
H = 60
W = 40
# h,w for each item
h = [7,7]
w = [12,12]
n = len(h) # number of items
m = 2 # number of bins
#---------------------------------------------------
# or-tools model
#---------------------------------------------------
model = cp_model.CpModel()
#
# variables
#
# x1,x2 and y1,y2 are start and end
x1 = [model.NewIntVar(0,W-w[i],'x1{}'.format(i)) for i in range(n)]
x2 = [model.NewIntVar(w[i],W,'x2{}'.format(i)) for i in range(n)]
y1 = [model.NewIntVar(0,H-h[i],'y1{}'.format(i)) for i in range(n)]
y2 = [model.NewIntVar(h[i],H,'y2{}'.format(i)) for i in range(n)]
# interval variables
xival = [model.NewIntervalVar(x1[i],w[i],x2[i],'xival{}'.format(i)) for i in range(n)]
yival = [model.NewIntervalVar(y1[i],h[i],y2[i],'yival{}'.format(i)) for i in range(n)]  # size of the y interval is h[i]
# bin numbers
b = [model.NewIntVar(0,m-1,'b{}'.format(i)) for i in range(n)]
# b2[(i,j)] = true if b[i]=b[j] for i<j
b2 = {(i,j):model.NewBoolVar('b2{}.{}'.format(i,j)) for j in range(n) for i in range(j)}
#
# constraints
#
for j in range(n):
    for i in range(j):
        model.Add(b[i] == b[j]).OnlyEnforceIf(b2[(i,j)])  # not needed?
        model.Add(b[i] != b[j]).OnlyEnforceIf(b2[(i,j)].Not())
        model.AddNoOverlap2D([xival[i],xival[j]],[yival[i],yival[j]]).OnlyEnforceIf(b2[(i,j)])
        # model.AddNoOverlap2D([xival[i],xival[j]],[yival[i],yival[j]])  # this one works
#
# solve model
#
solver = cp_model.CpSolver()
rc = solver.Solve(model)
print(rc)
print(solver.StatusName())
Notes:
As indicated in the answer below, this is just not supported.
A different formulation for this 2D bin packing problem is shown here; that seems to work quite well.
It is further noted that the pairwise NoOverlap2D formulation may not be a good approach anyway.

If you set solver.parameters.log_search_progress = True you'll see:
Parameters: log_search_progress: true
Enforcement literal not supported in constraint: enforcement_literal: 11 no_overlap_2d { x_intervals: 0 x_intervals: 1 y_intervals: 2 y_intervals: 3 }
...
So yeah, it isn't supported; maybe you can open a feature request as documented here.
But I think you can also solve it using OptionalIntervalVar if you encode the bin assignment with Booleans, as sketched below.
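Here is a minimal sketch of that idea (my own code and variable names, not the original poster's): one NoOverlap2D per bin, over optional intervals whose presence literal says "item i is in bin k".

from ortools.sat.python import cp_model

# toy data from the question: bin size and item sizes
H, W = 60, 40
h = [7, 7]
w = [12, 12]
n, m = len(h), 2

model = cp_model.CpModel()

# coordinates as in the question
x1 = [model.NewIntVar(0, W - w[i], 'x1{}'.format(i)) for i in range(n)]
x2 = [model.NewIntVar(w[i], W, 'x2{}'.format(i)) for i in range(n)]
y1 = [model.NewIntVar(0, H - h[i], 'y1{}'.format(i)) for i in range(n)]
y2 = [model.NewIntVar(h[i], H, 'y2{}'.format(i)) for i in range(n)]

# lit[i][k] is true iff item i is placed in bin k
lit = [[model.NewBoolVar('item{}bin{}'.format(i, k)) for k in range(m)] for i in range(n)]
for i in range(n):
    model.Add(sum(lit[i]) == 1)  # each item goes into exactly one bin

# one NoOverlap2D per bin, over intervals that are only present
# when the item is assigned to that bin
for k in range(m):
    xk = [model.NewOptionalIntervalVar(x1[i], w[i], x2[i], lit[i][k], 'x{}b{}'.format(i, k)) for i in range(n)]
    yk = [model.NewOptionalIntervalVar(y1[i], h[i], y2[i], lit[i][k], 'y{}b{}'.format(i, k)) for i in range(n)]
    model.AddNoOverlap2D(xk, yk)

solver = cp_model.CpSolver()
print(solver.StatusName(solver.Solve(model)))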

Related

GPFlow multiple independent realizations of same GP, irregular sampling times/lengths

In GPflow I have multiple time series whose sampling times are not aligned, and the time series may have different lengths (longitudinal data). I assume that they are independent realizations from the same GP. What is the right way to handle this with SVGP, and more generally with GPflow? Do I need to use coregionalization? The coregionalization notebook assumes correlated trajectories, while I want a shared mean/kernel but independent realizations.
Yes, the Coregion kernel implemented in GPflow is what you can use for your problem.
Let's set up some data from the generative model you describe, with different lengths for the timeseries:
import numpy as np
import gpflow
import matplotlib.pyplot as plt
Ns = [80, 90, 100] # number of observations for three different realizations
Xs = [np.random.uniform(0, 10, size=N) for N in Ns] # observation locations
# three different draws from the same GP:
k = gpflow.kernels.Matern52(variance=2.0, lengthscales=0.5) # kernel
Ks = [k(X[:, None]) for X in Xs]
Ls = [np.linalg.cholesky(K) for K in Ks]
vs = [np.random.randn(N, 1) for N in Ns]
fs = [(L @ v).squeeze(axis=-1) for L, v in zip(Ls, vs)]
To actually set up the training data for the gpflow GP model:
# output indicator for the observations: which timeseries is this?
os = [o * np.ones(N) for o, N in enumerate(Ns)] # [0 ... 0, 1 ... 1, 2 ... 2]
# now assemble the three timeseries in single data set:
allX = np.concatenate(Xs)
allo = np.concatenate(os)
allf = np.concatenate(fs)
X = np.c_[allX, allo]
Y = allf[:, None]
assert X.shape == (sum(Ns), 2)
assert Y.shape == (sum(Ns), 1)
# now let's set up a copy of the original kernel:
k2 = gpflow.kernels.Matern52(active_dims=[0]) # the same as k above, but with different hyperparameters
# and a Coregionalization kernel that effectively says they are all independent:
kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[1])
kc.W.assign(np.zeros(kc.W.shape))
kc.kappa.assign(np.ones(kc.kappa.shape))
gpflow.set_trainable(kc, False) # we want W and kappa fixed
The Coregion kernel defines a covariance matrix B = W Wᵀ + diag(kappa), so by setting W=0 we prescribe zero correlations (independent realizations) and kappa=1 (actually the default) ensures that the variance hyperparameter of the copy of the original kernel remains interpretable.
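As a quick sanity check (a minimal sketch, not part of the original answer), you can compute B for this choice of W and kappa and see that it is the identity, i.e. zero cross-output correlation:

import numpy as np

W_coreg = np.zeros((3, 1))                 # output_dim = 3, rank = 1
kappa = np.ones(3)
B = W_coreg @ W_coreg.T + np.diag(kappa)   # B = W W^T + diag(kappa)
print(B)                                   # identity matrix -> independent outputs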
Now construct the actual model and optimize hyperparameters:
k2c = k2 * kc
m = gpflow.models.GPR((X, Y), k2c, noise_variance=1e-5)
opt = gpflow.optimizers.Scipy()
opt.minimize(m.training_loss, m.trainable_variables, compile=False)
which recovers the initial variance and lengthscale hyperparameters pretty well.
If you want to predict, you have to provide the extra "output" column in the Xnew argument to m.predict_f(), e.g. as follows:
Xtest = np.linspace(0, 10, 100)
Xtest_augmented = np.c_[Xtest, np.zeros_like(Xtest)]
f_mean, f_var = m.predict_f(Xtest_augmented)
(whether you set the output column to 0, 1, or 2 does not matter, as we set them all to be the same with our choice of W and kappa).
If your input were more than one-dimensional, you could set active_dims=list(range(X.shape[1] - 1)) for the data kernel(s) and active_dims=[X.shape[1] - 1] for the Coregion kernel.
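A minimal sketch of that setup (my own variable names; it assumes the output-index column has already been appended as the last column of X, as above):

D = X.shape[1] - 1  # number of actual input dimensions
k_data = gpflow.kernels.Matern52(active_dims=list(range(D)))
k_coreg = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[D])
kernel = k_data * k_coreg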

Kalman Filter (pykalman): Value for obs_covariance and model without intercept

I am looking at the KalmanFilter from pykalman shown in examples:
pykalman documentation
Example 1
Example 2
and I am wondering
observation_covariance=100,
vs
observation_covariance=1,
the documentation states
observation_covariance R: e(t)^2 ~ Gaussian (0, R)
How should the value be set here correctly?
Additionally, is it possible to apply the Kalman filter without intercept in the above module?
The observation covariance shows how much error you assume to be in your input data. The Kalman filter works fine on normally distributed data. Under this assumption you can use the 3-sigma rule to calculate the covariance (in this case the variance) of your observation from the maximum error you expect in the observation.
The values in your question can be interpreted as follows:
Example 1
observation_covariance = 100
sigma = sqrt(observation_covariance) = 10
max_error = 3*sigma = 30
Example 2
observation_covariance = 1
sigma = sqrt(observation_covariance) = 1
max_error = 3*sigma = 3
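In code, the same rule applied the other way around (a minimal sketch; the max_error value is just an assumed example):

# assume the largest observation error you expect is about 30 units
max_error = 30
sigma = max_error / 3.0               # 3-sigma rule
observation_covariance = sigma ** 2   # = 100, as in Example 1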
So you need to choose the value based on your observation data. The more accurate the observation, the smaller the observation covariance.
Another point: you can tune your filter by manipulating the covariance, but I think it's not a good idea. The higher the observation covariance, the weaker the impact a new observation has on the filter state.
Sorry, I did not understand the second part of your question (about the Kalman Filter without intercept). Could you please explain what you mean?
You are trying to use a regression model and both intercept and slope belong to it.
---------------------------
UPDATE
I prepared some code and plots to answer your questions in details. I used EWC and EWA historical data to stay close to the original article.
First of all, here is the code (pretty much the same as in the examples above, but with different notation):
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# both slope and intercept have to be estimated
# transition_matrix
F = np.eye(2) # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k 1]
H = np.vstack([np.matrix(EWA), np.ones((1, n))]).T[:, np.newaxis]
# transition_covariance
Q = [[1e-4,    0],
     [   0, 1e-4]]
# observation_covariance
R = 1  # max error = 3
# initial_state_mean
X0 = [0,
      0]
# initial_state_covariance
P0 = [[1, 0],
      [0, 1]]
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                  transition_matrices = F,
                  observation_matrices = H,
                  transition_covariance = Q,
                  observation_covariance = R,
                  initial_state_mean = X0,
                  initial_state_covariance = P0)
# Filtering
state_means, state_covs = kf.filter(EWC)
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + state_means[:, 1]
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(state_means[:, 1], label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()
I could not retrieve data using pandas, so I downloaded them and read from the file.
Here you can see the estimated slope and intercept:
To test the estimated data I restored the EWC value from the EWA using the estimated parameters:
About the observation covariance value
By varying the observation covariance value you tell the Filter how accurate the input data is (normally you just describe your confidence in the observation using some datasheets or your knowledge about the system).
Here are estimated parameters and the restored EWC values using different observation covariance values:
You can see that the filter follows the original function better with higher confidence in the observations (smaller R). If the confidence is low (bigger R), the filter leaves the initial estimate (slope = 0, intercept = 0) only very slowly and the restored function is far away from the original one.
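If you want to reproduce such a comparison yourself, here is a minimal sketch (it reuses F, H, Q, X0, P0 and the EWA/EWC data defined above; the R values are arbitrary examples):

# sweep over several observation covariances and compare the restored EWC curves
for R_test in [0.1, 1, 10, 100]:
    kf_r = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                        transition_matrices=F, observation_matrices=H,
                        transition_covariance=Q, observation_covariance=R_test,
                        initial_state_mean=X0, initial_state_covariance=P0)
    means_r, _ = kf_r.filter(EWC)
    plt.plot(np.multiply(EWA, means_r[:, 0]) + means_r[:, 1],
             label='EWC restored, R={}'.format(R_test))
plt.plot(EWC, label='EWC original')
plt.legend(loc='upper left')
plt.grid()
plt.show()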
About the frozen intercept
If you want to freeze the intercept for some reason, you need to change the whole model and all filter parameters.
In the normal case we had:
x = [slope; intercept] #estimation state
H = [EWA 1] #observation matrix
z = [EWC] #observation
Now we have:
x = [slope] #estimation state
H = [EWA] #observation matrix
z = [EWC-const_intercept] #observation
Results:
Here is the code:
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# only slope has to be estimated (it will be manipulated by the constant intercept) - mathematically incorrect!
const_intercept = 10
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# transition_matrix
F = 1 # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k]
H = np.matrix(EWA).T[:, np.newaxis]
# transition_covariance
Q = 1e-4
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = 0
# initial_state_covariance
P0 = 1
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
                  transition_matrices = F,
                  observation_matrices = H,
                  transition_covariance = Q,
                  observation_covariance = R,
                  initial_state_mean = X0,
                  initial_state_covariance = P0)
# Creating the observation based on EWC and the constant intercept
z = EWC[:] # copy the list (not just assign the reference!)
z[:] = [x - const_intercept for x in z]
# Filtering
state_means, state_covs = kf.filter(z) # the estimation for the EWC data minus constant intercept
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + const_intercept
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(const_intercept*np.ones((n, 1)), label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()

CNTK Neural network with not one-hot-vector output (multi-class classifier)

Thank you for the CNTK tool; the examples run pretty fast. For a few days now I have been trying to set up a simple network, but I don't get it. I need a network with 2 inputs and 3 outputs, for example:
|features 0.3 0.5 |labels 0.2 0.7 0.9
The output is not a one-hot vector; the network has to learn the label values 0.2 0.7 0.9. But most examples use a one-hot vector as output, so it is not clear to me how to solve this. I tried to change the 3-class classification tutorial, but it does not work; the network does not learn the output correctly. The network I have tried is:
BrainScriptNetworkBuilder = {
    SDim = 2    # feature dimension
    H1Dim = 50  # hidden dimension
    H2Dim = 50  # hidden dimension
    LDim = 3    # number of classes (labels)
    model (features) = {
        W0 = ParameterTensor {(H1Dim:SDim)} ; b0 = ParameterTensor {H1Dim}
        W1 = ParameterTensor {(H2Dim:H1Dim)} ; b1 = ParameterTensor {H2Dim}
        W2 = ParameterTensor {(LDim:H2Dim)} ; b2 = ParameterTensor {LDim}
        r1 = ReLU(W0 * features + b0)  # hidden layer 1
        r2 = ReLU(W1 * r1 + b1)        # hidden layer 2
        z = ReLU(W2 * r2 + b2)
    }.z
    # define inputs
    features = Input {SDim, sparse = false}
    labels   = Input {LDim, sparse = false}
    # apply model to features
    z = model (features)
    # define criteria and output(s)
    ce  = SquareError(labels, z)  # criterion (loss)
    err = SquareError(labels, z)  # additional metric
    # connect to the system. These five variables must be named exactly like this.
    featureNodes    = (features)
    inputNodes      = (labels)
    criterionNodes  = (ce)
    evaluationNodes = (err)
    outputNodes     = (z)
}
So my question is: How to set up a network in CNTK, so that the output is not a one hot vector?
Thank you for help.
When your label is not a one-hot vector, SquareError is a good loss function to minimize. If some examples have a one-hot label, you can still use SquareError. So I think you are doing everything right; you might just have to tune the learning rate to get it to work well.

Theano ANN "TypeError: randint() takes at least 1 positional argument (0 given)"

This is the error that I'm receiving:
File "mtrand.pyx", line 1192, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:14128)
TypeError: randint() takes at least 1 positional argument (0 given)
I am somewhat new to coding, but I really want to get started with simple ANNs, so I decided to start this project.
# -*- coding: utf-8 -*-
"""
Created on Sun Sep 18 14:56:44 2016
#author: Jamoonie
"""
##theano practice
import numpy as np
import theano
import theano.tensor as T
from sklearn.datasets import load_digits
digits=load_digits()
print (digits.data.shape)
train_x = list(digits.data)
#print train_x.count
train_x = np.array(train_x)
#print train_x
train_y = list(digits.target)
#print train_y.count
train_y = np.array(train_y)
#print train_y
#q = T.matrix('q') checking how matrix dot products work, and how the row,col of the W0 should be set up
#q = np.zeros([5,10])
#print q
#p = T.matrix('p')
#p = np.zeros([10,5])
#
#print np.dot(q,p)
nn_input_dim = train_x.shape[1] ## if shape[0] it yields 1797, which is the number of rows
print(nn_input_dim)  ## shows 64; shape[1] yields 1 row, thus 64 columns, which are the layers of data we want to apply
nn_hdim0 = 10
nn_output_dim = len(train_y)
#nn_hdim0 = np.transpose(np.zeros(digits.data.shape))
#print nn_hdim0
epsilon = 0.008
batch_size = 100 ## how much data input per iteration
X = T.matrix('X')
y = T.lvector('y')
## set weight shapeswith random values
#W0 = np.transpose(np.zeros(digits.data.shape))
W0 = theano.shared(np.random.randn(nn_input_dim,nn_hdim0),name='W0') ##the shape of W0 should be row=input_dim, col=# hidden nodes
b0 = theano.shared(np.zeros(nn_hdim0),name='b0')
W1 = theano.shared(np.random.randn(nn_hdim0,nn_output_dim),name='W1') ## shape of W1 should have row=#hidden nodes, col = output dimension
b1 = theano.shared(np.zeros(nn_output_dim),name='b1')
z0 = X.dot(W0)+b0
a0 = T.nnet.softmax(z0) ## first hidden layer result
z1 = a0.dot(W1)+b1
a1 = T.nnet.softmax(z1) ## final result or prediction
loss = T.nnet.categorical_crossentropy(a1,y).mean() ## howmuch the prediction differrs from the real result
prediction = T.argmax(a1,axis=1) ## the maximum values of a1, presented in index posn 1
fwd_propagation = theano.function([X],a1) ## forward propagation function dpeneding on the array of X values and final prediction
calc_loss = theano.function([X,y],loss)
predict= theano.function([X],prediction)
accuracy = theano.function([X],T.sum(T.eq(prediction,train_y))) ## T.eq is elementwise. so this does an elementwise sum of prediction and train_y
dW0 = T.grad(loss,W0)
dW1 = T.grad(loss,W1)
db0=T.grad(loss,b0)
db1=T.grad(loss,b1)
np.random.randint()
gradient_step = theano.function(
    [X, y],  ## for each set of X,y values
    updates=((W1, W1-epsilon*dW1),  ## updates W1 by deltaW1(error)*learning rate and subtracting from original W1
             (W0, W0-epsilon*dW0),
             (b1, b1-epsilon*db1),
             (b0, b0-epsilon*db0)))
def build(iterations = 80000):
    W1.set_value(np.random.randn(nn_hdim0,nn_output_dim)/np.sqrt(nn_input_dim))  ## why dividing by the sqrt of nn_input_dim, i'm not sure, but they're meant to be random anyway.
    W0.set_value(np.random.randn(nn_input_dim,nn_hdim0)/np.sqrt(nn_input_dim))
    b1.set_value(np.zeros(nn_output_dim))
    b0.set_value(np.zeros(nn_hdim0))
    for i in range(0, iterations):
        batch_indicies = np.random.randint(0, 17, size=100)
        batch_x, batch_y = train_x[batch_indicies], train_y[batch_indicies]
        gradient_step(batch_x, batch_y)
        ## so we're providing the values now for the weights, biases and input output values
        if i % 2000 == 0:
            print("loss after iteration %r: %r" % (i, calc_loss(train_x,train_y)))
            print(accuracy(train_x))
        if i == 80000:
            print(W0, b0, W1, b1)
build()
As per the documentation, you need to at least specify the lowest value of the integer range to draw from. If you want a random number less than 213 (to be exact, between 0 and 213) you would do r = np.random.randint(213), and if you want a random number in some range, say 213 to 537, you would do r = np.random.randint(213, 537). Also, you are calling randint(..) without storing the result in a variable (or passing it to any function), which is useless. I would suggest going through basic Theano tutorials to get started; start from here.
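For illustration, a minimal sketch of the calls mentioned above (the last line mirrors what the build() loop in the question actually needs):

import numpy as np

r1 = np.random.randint(213)                          # one integer in [0, 213)
r2 = np.random.randint(213, 537)                     # one integer in [213, 537)
batch_indices = np.random.randint(0, 17, size=100)   # an array of 100 integers in [0, 17)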

Matlab fourier descriptors what's wrong?

I am using Gonzalez's frdescp function to get the Fourier descriptors of a boundary. I use the code below, and I get two totally different sets of numbers describing two shapes that are identical except for scale.
So what is wrong?
im = imread('c:\classes\a1.png');
im = im2bw(im);
b = bwboundaries(im);
f = frdescp(b{1}); % fourier descriptors for the boundary of the first object (my pic only contains one object anyway)
% Normalization
f = f(2:20); % getting the first 20 & deleting the dc component
f = abs(f);
f = f/f(1);
Why do I get different descriptors for two identical circles that differ only in scale?
The problem is that the frdescp code (I used this code, which should be the same one you refer to) is also written to center the Fourier descriptors.
If you want to describe your shape correctly, it is mandatory to keep descriptors that are symmetric with respect to the one representing the DC component.
The following image summarizes the concept:
In order to solve your problem (and others like yours), I wrote the following two functions:
function descriptors = fourierdescriptor( boundary )
    % I assume that the boundary is a N x 2 matrix
    % Also, N must be an even number
    np = size(boundary, 1);
    s = boundary(:, 1) + i*boundary(:, 2);
    descriptors = fft(s);
    descriptors = [descriptors((1+(np/2)):end); descriptors(1:np/2)];
end

function significativedescriptors = getsignificativedescriptors( alldescriptors, num )
    % num is the number of significative descriptors (in your example, it was 20)
    % In the following, I assume that num and size(alldescriptors,1) are even numbers
    dim = size(alldescriptors, 1);
    if num >= dim
        significativedescriptors = alldescriptors;
    else
        a = (dim/2 - num/2) + 1;
        b = dim/2 + num/2;
        significativedescriptors = alldescriptors(a : b);
    end
end
Now, you can use the above functions as follows:
im = imread('test.jpg');
im = im2bw(im);
b = bwboundaries(im);
b = b{1};
%force the number of boundary points to be even
if mod(size(b,1), 2) ~= 0
    b = [b; b(end, :)];
end
%define the number of significative descriptors I want to extract (it must be even)
numdescr = 20;
%Now, you can extract all fourier descriptors...
f = fourierdescriptor(b);
%...and get only the most significative:
f_sign = getsignificativedescriptors(f, numdescr);
I just went through the same problem as you.
According to this link, if you want invariance to scaling, make the comparison ratio-like, for example by dividing every Fourier coefficient by the DC coefficient: f*(1) = f(1)/f(0), f*(2) = f(2)/f(0), and so on. Thus you need the DC coefficient, and the f(1) in your code is not the actual DC coefficient after your step "f = f(2:20); % getting the first 20 & deleting the dc component". I think the problem can be solved by keeping the value of the DC coefficient first; the adjusted code should look like this:
% Normalization
DC = f(1);
f = f(2:20); % getting the first 20 & deleting the dc component
f = abs(f) ; % use magnitudes to be invariant to translation & rotation
f = f/DC; % divide the Fourier coefficients by the DC-coefficient to be invariant to scale