scipy.optimize.linprog seems to solve the task but doesn't return the x? - scipy

I'm trying to solve a very simple linear program using scipy.optimize.linprog, and it seems the function does what I want it to do, but somehow it doesn't return the 'x' (it does return the correct minimal function value).
Just as a simple example (in Matlab notation), I have a 2-D vector a = [a1; a2] and the linear constraint [1, 2] * a = 1, and I want to minimize the L1 norm of a. The optimum should be a = [0, 0.5].
As far as I understand, I can formulate this in standard form by introducing an extra vector b such that b >= abs(a) (i.e. a - b <= 0 and -a - b <= 0), and minimizing sum(b) subject to these constraints and the original equality constraint [1, 2] * a = 1.
So I define x = [a; b], plug it into scipy's linprog, it returns with success, and I get the correct answer: the optimal value of sum(b) is 0.5. However, the x that it returns is full of NaNs instead of [0; 0.5; 0; 0.5].
Here's the code:
import numpy as np
import scipy.optimize

A = np.array([1, 2]).reshape([1, 2])
b_eq = np.array([1])
ones = np.ones([2, ])
zeros = np.zeros([2, ])
zerosm = np.zeros([1, 2])
eye = np.eye(2)
c = np.hstack([zeros, ones])                                         # minimize sum(b)
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye])])  # a - b <= 0 and -a - b <= 0
b_ub = np.hstack([zeros, zeros])
A_eq = np.hstack([A, zerosm])                                        # [1, 2] * a = 1
res = scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None),
                             A_eq=A_eq, b_eq=b_eq)
The result:
success: True
status: 0
fun: 0.5
x: array([ nan, nan, nan, nan])
nit: 3
slack: array([ 0., 0., 0., 1.])
message: 'Optimization terminated successfully.'
That is, x is all NaNs instead of the solution. The function value is correct (0.5), and the slacks seem fine: according to the scipy docs, a slack of 0 means the constraint is active, so the 1st and 3rd zeros mean a1 = b1 = 0, and the 2nd zero means a2 = b2 with both nonzero (otherwise the 4th slack would also be 0). This again matches [0, 0.5] being the solution.
What am I doing wrong? Is this a bug? (using scipy 0.15.1)
Thanks!

Apparently you are running into a bug that has been fixed since version 0.15.1.
When I run your code with scipy 0.18.0, I get:
In [3]: import scipy.optimize
In [4]: %paste
A = np.array([1,2]).reshape([1,2])
b_eq = np.array([1])
ones = np.ones([2,])
zeros = np.zeros([2,])
zerosm = np.zeros([1, 2])
eye = np.eye(2)
c = np.hstack([zeros, ones])
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye])])
b_ub = np.hstack([zeros, zeros])
A_eq = np.hstack([A, zerosm])
res = scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None),
A_eq=A_eq, b_eq=b_eq)
## -- End pasted text --
In [5]: res
Out[5]:
fun: 0.5
message: 'Optimization terminated successfully.'
nit: 4
slack: array([ 0., 0., 0., 1.])
status: 0
success: True
x: array([ 0. , 0.5, 0. , 0.5])
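(If you are unsure which scipy version you have before upgrading, a quick check:)
import scipy
print(scipy.__version__)   # anything >= 0.18.0 should return the actual x instead of NaNs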

I tried to solve your problem with scipy 1.3.1 and it works fine:
con: array([ 7.07545134e-12])
fun: 0.49999999999882094
message: 'Optimization terminated successfully.'
nit: 4
slack: array([ 1.07107656e-11, -4.48552306e-12, -4.75552930e-12,
1.00000000e+00])
status: 0
success: True
x: array([ -7.73314746e-12, 5.00000000e-01, 2.97761815e-12,
5.00000000e-01])
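The +/-1e-12 entries in con, slack and x are just numerical noise from the solver's tolerances; if you want the clean values, rounding the reported solution (reusing res and np from the code above) recovers them:
a = np.round(res.x[:2], decimals=9)   # -> array([0. , 0.5]), the expected optimum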

Related

GPflow - GP classification with 1-dim Linear kernel fits poorly for 2 dimension data

Following issue #1435, I have an additional question about how to use GPflow.
I replicate the issue in an additional kernel: https://github.com/avalonhse/BayesNotebook/blob/master/Issue_2_GPFlow_Linear_Classification.ipynb
My goal is to fit an additive kernel to 2-dimensional data (squared exponential in dimension 1 and linear kernel in dimension 2). Following the instructions in #1435, I successfully fit the model with the kernel gpflow.kernels.Linear(variance=0.1).
[figure: Linear kernel fit]
However, when I use the kernel gpflow.kernels.Linear(active_dims=1, variance=0.01) as I originally planned, the model is not fitted. I used GPy with the same kernel as a reference, and its result looks reasonable.
[figure: 1-dim GPflow kernel fit]
import numpy as np
X = np.array([[ 9.96578428, 60.],[ 9.96578428, 40.],[ 9.96578428, 20.],
[10.96578428, 30.],[11.96578428, 40.],[12.96578428, 50.],
[12.96578428, 70.],[8.96578428, 30. ],[ 7.96578428, 40.],
[ 6.96578428, 50.],[ 6.96578428, 30.],[ 6.96578428, 10.],
[11.4655664 , 71.],[ 8.56605404, 63.],[12.41574177, 69.],
[10.61562964, 48.],[ 7.61470984, 51.],[ 9.31514956, 45.]])
Y = np.array([[1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.]]).T
# plotting
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
def plot(X, Y):
    mask = Y[:, 0] == 1
    plt.figure(figsize=(6, 6))
    plt.plot(X[mask, 0], X[mask, 1], "oC0", mew=0, alpha=0.5)
    plt.ylim(-10, 100)
    plt.xlim(5, 15)
    _ = plt.plot(X[np.logical_not(mask), 0], X[np.logical_not(mask), 1], "oC1", mew=0, alpha=0.5)

plot(X, Y)
# Evaluate real function and the predicted probability
res = 500
xx, yy = np.meshgrid(np.linspace(5, 15, res),
np.linspace(- 10, 120, res))
Xplot = np.vstack((xx.flatten(), yy.flatten())).T
# Code followed the Notebook : https://gpflow.readthedocs.io/en/develop/notebooks/basics/classification.html
import tensorflow as tf
import tensorflow_probability as tfp
import gpflow
from gpflow.utilities import print_summary, set_trainable, to_default_float
gpflow.config.set_default_summary_fmt("notebook")
def testGPFlow(k):
    m = gpflow.models.VGP(
        (X, Y),
        kernel=k,
        likelihood=gpflow.likelihoods.Bernoulli()
    )
    print("\n ########### Model before optimization ########### \n")
    print_summary(m)
    print("\n ########### Model after optimization ########### \n")
    opt = gpflow.optimizers.Scipy()
    res = opt.minimize(
        m.training_loss, variables=m.trainable_variables, options=dict(maxiter=2500), method="L-BFGS-B"
    )
    print(' Message: ' + str(res.message) + '\n Status = ' + str(res.status) + '\n Number of iterations = ' + str(res.nit))
    print_summary(m)
    means, _ = m.predict_y(Xplot)  # here we only care about the mean
    y_prob = means.numpy().reshape(*xx.shape)
    print("Fitting model using GPFlow")
    plot(X, Y)
    _ = plt.contour(
        xx,
        yy,
        y_prob,
        [0.5],  # plot the p=0.5 contour line only
        colors="k",
        linewidths=1.8,
        zorder=100,
    )

k = gpflow.kernels.Linear(active_dims=[1], variance=0.01)
testGPFlow(k)
k = gpflow.kernels.Linear(variance=1)
testGPFlow(k)
The GPy code is for reference only, to suggest how a fitted model should look. I am aware that GPy and GPflow use different methods. My question is why the GPflow model does not fit when I specify the Linear kernel on one dimension.
Thanks for posting this question, Hoang, and for using GPflow.
When you specify input_dim in GPy, you are telling the algorithm to act on two dimensions. active_dims in GPflow behaves differently: it specifies which dimensions you want the kernel to act on. active_dims=1 tells GPflow to apply your linear kernel to only the y dimension.
Since you want your kernel to act on both the x and y dimensions, you should specify active_dims=[0, 1] rather than just active_dims=1. When I run your code with this fix, I get a result identical to GPy's result (plot omitted).
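For completeness, here is the corrected call as a minimal sketch, reusing the testGPFlow helper defined in the question:
# Apply the Linear kernel to both input dimensions, as described above.
k = gpflow.kernels.Linear(active_dims=[0, 1], variance=0.01)
testGPFlow(k)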

BVP4c solve for unknown boundary

I am trying to use bvp4c to solve a system of 4 ODEs. The issue is that one of the boundaries is unknown.
Can bvp4c handle this? In my code, L is the unknown I am solving for.
I get the error message printed below.
function mat4bvp
L = 8;
solinit = bvpinit(linspace(0,L,100),@mat4init);
sol = bvp4c(@mat4ode,@mat4bc,solinit);
sint = linspace(0,L);
Sxint = deval(sol,sint);
end
% ------------------------------------------------------------
function dtdpdxdy = mat4ode(s,y,L)
Lambda = 0.3536;
dtdpdxdy = [ y(2)
             -sin(y(1)) + Lambda*(L-s)*cos(y(1))
             cos(y(1))
             sin(y(1)) ];
end
% ------------------------------------------------------------
function res = mat4bc(ya,yb,L)
res = [ ya(1)
        ya(2)
        ya(3)
        ya(4)
        yb(1) ];
end
% ------------------------------------------------------------
function yinit = mat4init(s)
yinit = [ cos(s)
          0
          0
          0 ];
end
Unfortunately I get the following error message:
>> mat4bvp
Not enough input arguments.
Error in mat4bvp>mat4ode (line 13)
-sin(y(1)) + Lambda*(L-s)*cos(y(1))
Error in bvparguments (line 105)
testODE = ode(x1,y1,odeExtras{:});
Error in bvp4c (line 130)
bvparguments(solver_name,ode,bc,solinit,options,varargin);
Error in mat4bvp (line 4)
sol = bvp4c(@mat4ode,@mat4bc,solinit);
One trick to transform a variable end point into a fixed one is to rescale time. If x'(t) = f(t,x(t)) is the differential equation, set t = L*s with s running from 0 to 1, and compute the associated differential equation for y(s) = x(L*s):
y'(s) = L*x'(L*s) = L*f(L*s, y(s))
The next trick is to turn the global variable L into part of the differential equation by treating it as a constant function. The new system is
[ y'(s), L'(s) ] = [ L(s)*f(L(s)*s, y(s)), 0 ]
and the value of L appears as an additional free boundary value, increasing the number of variables (the dimension of the state vector) to match the number of boundary conditions.
I do not have Matlab readily available; in Python, with the tools in scipy, this can be implemented as:
from math import sin, cos
import numpy as np
from scipy.integrate import solve_bvp, odeint
import matplotlib.pyplot as plt

# The original function with the interval length L as a parameter
def fun0(t, y, L):
    Lambda = 0.3536
    return np.array([y[1], -np.sin(y[0]) + Lambda*(L-t)*np.cos(y[0]), np.cos(y[0]), np.sin(y[0])])

# Wrapper function applying both tricks: rescale time to [0, 1] and carry L
# along as an extra state variable with derivative zero.
def fun1(s, y):
    L = y[-1]
    dydt = np.zeros_like(y)
    dydt[:-1] = L*fun0(L*s, y[:-1], L)
    return dydt

# Evaluation of the boundary condition residuals:
def bc(ya, yb):
    return [ya[0], ya[1], ya[2], ya[3], yb[0]]

# Define the initial mesh with 3 nodes:
x = np.linspace(0, 1, 3)

# This problem has multiple solutions. Try two initial guesses.
L_a = 8
L_b = 9
y_a = odeint(lambda y, t: fun1(t, y), [0, 0, 0, 0, L_a], x)
y_b = odeint(lambda y, t: fun1(t, y), [0, 0, 0, 0, L_b], x)

# Now we are ready to run the solver.
res_a = solve_bvp(fun1, bc, x, y_a.T)
res_b = solve_bvp(fun1, bc, x, y_b.T)
L_a = res_a.sol(0)[-1]
L_b = res_b.sol(0)[-1]
print("L_a=%.8f, L_b=%.8f" % (L_a, L_b))

# Plot the two found solutions. The solutions are in spline form; use this for a smooth plot.
x_plot = np.linspace(0, 1, 100)
y_plot_a = res_a.sol(x_plot)[0]
y_plot_b = res_b.sol(x_plot)[0]
plt.plot(L_a*x_plot, y_plot_a, label='L=%.8f' % L_a)
plt.plot(L_b*x_plot, y_plot_b, label='L=%.8f' % L_b)
plt.legend()
plt.xlabel("t")
plt.ylabel("y")
plt.grid(); plt.show()
which produces a plot of the two solution curves (figure omitted).
Trying different initial values for L finds other solutions on quite different scales (a small scan loop is sketched after this list), among them
L=0.03195111
L=0.05256775
L=0.05846539
L=0.06888907
L=0.08231966
L=4.50411522
L=6.84868060
L=20.01725616
L=22.53189063
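The scan loop mentioned above is just a restart of the solver from different guesses for L; a minimal sketch, reusing fun1, bc and x from the code above:
# Try several initial guesses for L and report where the BVP solver converges.
for L_guess in [0.03, 0.05, 0.07, 4.5, 7.0, 20.0, 22.5]:
    y_guess = odeint(lambda y, t: fun1(t, y), [0, 0, 0, 0, L_guess], x)
    res = solve_bvp(fun1, bc, x, y_guess.T)
    if res.success:
        print("start L=%.2f  ->  converged L=%.8f" % (L_guess, res.sol(0)[-1]))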

Understanding the Jacobian output of scipy.optimize.minimize

I'm working with scipy.optimize.minimize to find the minimum of the RSS for a custom nonlinear function. I'll provide a simple linear example to illustrate what I am doing:
import numpy as np
from scipy import optimize

def response(X, b0, b1, b2):
    return b2 * X[1]**2 + b1 * X[0] + b0

def obj_rss(model_params, y_true, X):
    return np.sum((y_true - response(X, *model_params))**2)

x = np.array([np.arange(0, 10), np.arange(10, 20)])
r = 15. * x[1]**2 - 32. * x[0] + 10.
init_guess = np.array([0., 50., 10.])
res = optimize.minimize(obj_rss, init_guess, args=(r, x))
print(res)
This yields the results:
fun: 3.0218799331864133e-08
hess_inv: array([[ 7.50606278e+00, 2.38939463e+00, -8.33333575e-02],
[ 2.38939463e+00, 8.02462363e-01, -2.74621294e-02],
[ -8.33333575e-02, -2.74621294e-02, 9.46969972e-04]])
jac: array([ -3.31359843e-07, -5.42022462e-08, 2.34304025e-08])
message: 'Optimization terminated successfully.'
nfev: 45
nit: 6
njev: 9
status: 0
success: True
x: array([ 10.00066577, -31.99978062, 14.99999243])
And we see that the fitted parameters 10, -32, and 15 match those used to generate the actual data. That's great. Now my question:
I have the understanding that the Jacobian should be an m x n matrix where m is the number of records from the X input and n is the number of parameters. Clearly I don't have that in the results object. The results object yields an array that is referred to as the Jacobian in the documentation (1 and 2), but is only one-dimensional with a number of elements equal to the number of parameters.
Further confusing the matter, when I use method='SLSQP', the Jacobian that is returned has one more element than that returned by other minimization algorithms.
. . .
My larger goal here is to be able to calculate either confidence intervals or standard errors, t-, and p-values for the fitted parameters, so if you think I'm way off track here, please let me know.
EDIT:
The following is intended to show how the SLSQP minimization algorithm yields a different Jacobian than the default minimization algorithm, which is one of BFGS, L-BFGS-B, or SLSQP, depending on whether the problem has constraints or bounds (as mentioned in the documentation). The SLSQP solver is intended for use with constraints.
import numpy as np
from scipy import optimize

def response(X, b0, b1, b2):
    return b2 * X[1]**2 + b1 * X[0] + b0

def obj_rss(model_params, y_true, X):
    return np.sum((y_true - response(X, *model_params))**2)

x = np.array([np.arange(0, 10), np.arange(10, 20)])
r = 15. * x[1]**2 - 32. * x[0] + 10.
init_guess = np.array([0., 50., 10.])
res = optimize.minimize(obj_rss, init_guess, method='SLSQP', args=(r, x))
print(res)
r_pred = response(x, *res.x)
Yields results:
fun: 7.5269461938291697e-10
jac: array([ 2.94677643e-05, 5.52844499e-04, 2.59870917e-02,
0.00000000e+00])
message: 'Optimization terminated successfully.'
nfev: 58
nit: 10
njev: 10
status: 0
success: True
x: array([ 10.00004495, -31.9999794 , 14.99999938])
One can see that there is an extra element in the Jacobian array that is returned from the SLSQP solver. I am confused where this comes from.

scipy optimize minimize: hess_inv strongly depends on initial guess

I am using scipy.optimize.minimize to minimize a simple log likelihood function. The Hessian matrix doesn't seem to behave well.
import numpy as np
import scipy.optimize as op

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*np.log(mu) - mu - 0.5*(scb2+scf2)

nll = lambda *args: -lnlike(*args)
myargs = (21.0, 20.0, 0.5, 6.0, 0.1)
If the initial guess is at the minimum, the iteration doesn't go anywhere. That is fine in terms of the parameter values, but it doesn't touch the Hessian either (it stays the identity), so I cannot use it for uncertainty estimation.
x0 = [2.0, 20.0, 0.5]  # initial guess is at the minimum
result = op.minimize(nll, x0, args=myargs)
print(result)
status: 0
success: True
njev: 1
nfev: 5
hess_inv: array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
fun: -42.934971192191881
x: array([ 2. , 20. , 0.5])
message: 'Optimization terminated successfully.'
jac: array([ 0.00000000e+00, 0.00000000e+00, 9.53674316e-07])
If I change the initial guess a little bit, it seems to return a sensible hess_inv.
x0 = [2.01, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0,0]))
status: 0
success: True
njev: 15
nfev: 75
hess_inv: array([[ 2.16004477e+02, -7.60588367e+01, -2.94846112e-02],
[ -7.60588367e+01, 3.55748024e+01, 2.74064505e-03],
[ -2.94846112e-02, 2.74064505e-03, 9.98030944e-03]])
fun: -42.934971191969964
x: array([ 1.99984604, 19.9999814 , 0.5000001 ])
message: 'Optimization terminated successfully.'
jac: array([ -2.38418579e-06, -5.24520874e-06, 1.90734863e-06])
14.697090757
However, hess_inv is very sensitive to the initial guess.
x0 = [2.02, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0,0]))
status: 0
success: True
njev: 16
nfev: 80
hess_inv: array([[ 1.82153214e+02, -6.03482772e+01, -2.97458789e-02],
[ -6.03482772e+01, 3.30771459e+01, -2.53811809e-03],
[ -2.97458789e-02, -2.53811809e-03, 9.99052952e-03]])
fun: -42.934971192188634
x: array([ 1.9999702 , 20.00000354, 0.50000001])
message: 'Optimization terminated successfully.'
jac: array([ -9.53674316e-07, -4.76837158e-07, -4.76837158e-07])
13.4964148462
Change the initial guess a bit more
x0 = [2.03, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0,0]))
status: 0
success: True
njev: 14
nfev: 70
hess_inv: array([[ 2.30479371e+02, -7.36087027e+01, -3.79639119e-02],
[ -7.36087027e+01, 3.55785937e+01, 3.54182478e-03],
[ -3.79639119e-02, 3.54182478e-03, 9.97664441e-03]])
fun: -42.93497119204827
x: array([ 1.99975148, 20.00006366, 0.50000009])
message: 'Optimization terminated successfully.'
jac: array([ -9.53674316e-07, -9.53674316e-07, 4.29153442e-06])
15.1815470484
Did I miss something? Is this a bug or a feature?
The way I understand the optimizers, the Hessian is approximated by finite differences. In your case, that does not seem to be the best idea. Perhaps using SymPy (in IPython) will produce more usable results:
import sympy as sy
import numpy as np
import scipy.optimize as sopt
from IPython.display import display # nice printing
sy.init_printing() # LaTeX like printing for IPython
def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*sy.log(mu) - mu - (scb2+scf2) / 2
# declare symbols:
th_S, th_b, th_f = sy.symbols("theta_S, theta_b, theta_f", real=True)
theta = (th_S, th_b, th_f)
n, bhat, fhat = sy.symbols("n, \hat{b}, \hat{f}", real=True )
sigb, sigf = sy.symbols("sigma_b, sigma_d", real=True )
# symbolic optimization function:
lf = -lnlike(theta, n, bhat, fhat, sigb, sigf)
# Gradient:
dlf = sy.Matrix([lf.diff(th) for th in theta])
# Hessian:
Hlf = sy.Matrix([dlf.T.diff(th) for th in theta])
print("Symbolic Hessian:")
display(Hlf)
# Make numpy functions:
margs = {n:21, bhat:20, fhat:.5, sigb:6, sigf:.1} # parameters
lf_a, dlf_a, Hlf_a = lf.subs(margs), dlf.subs(margs), Hlf.subs(margs)
lf_lam = sy.lambdify(theta, lf_a, modules="numpy")
dlf_lam = sy.lambdify(theta, dlf_a, modules="numpy")
Hlf_lam = sy.lambdify(theta, Hlf_a, modules="numpy")
nlf = lambda xx: np.array(lf_lam(xx[0], xx[1], xx[2])) # function
ndlf = lambda xx: np.array(dlf_lam(xx[0], xx[1], xx[2])).flatten() # gradient
nHlf = lambda xx: np.array(Hlf_lam(xx[0], xx[1], xx[2])) # Hessian
x0 = [2.02, 20.0, 0.5]
rs = sopt.minimize(nlf, x0, jac=ndlf, hess=nHlf, method='Newton-CG')
print(rs)
print("Hessian:")
print(nHlf(rs.x))
If you're using a quasi-Newton method, which from the documentation it appears you are:
Quasi-Newton methods build up a guess at the Hessian inverse by applying a sequence of low-rank updates to a completely naive guess (typically a multiple of the identity). The low-rank updates used are in some sense the "least-change" updates that make a given equation hold, and the meaning of "least-change" varies with the quasi-Newton method chosen. If you start at, or very close to, the minimiser, the optimiser will figure this out very quickly and it won't build up much information in its approximation to the Hessian inverse.
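If you need an uncertainty estimate even when BFGS starts at (or immediately converges to) the minimum and therefore leaves hess_inv at the identity, one workaround is to finite-difference the Hessian of nll yourself at the solution and invert it. A minimal central-difference sketch (the helper below is not part of scipy; nll, result and myargs are the objects from the question):
import numpy as np

def numerical_hessian(f, x, args=(), eps=1e-5):
    # Central-difference Hessian of the scalar function f at the point x.
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x + ei + ej, *args) - f(x + ei - ej, *args)
                       - f(x - ei + ej, *args) + f(x - ei - ej, *args)) / (4 * eps**2)
    return H

# Usage with the question's setup:
# H = numerical_hessian(nll, result.x, args=myargs)
# cov = np.linalg.inv(H)           # covariance estimate of the parameters
# err = np.sqrt(np.diag(cov))      # standard errors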

Positive directional derivative for linesearch

What does the smode of scipy.optimize 'Positive directional derivative for linesearch' mean?
for example in fmin_slsqp
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_slsqp.html
These optimization algorithms typically work by choosing a descent direction, and then performing a line search to that direction. I think this message means that the optimizer got into a position where it did not manage to find a direction where the value of the objective function decreases (fast enough), but could also not verify that the current position is a minimum.
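For intuition, the kind of line search referred to here looks roughly like the following backtracking (Armijo) sketch; this illustrates the general idea only and is not SLSQP's actual routine:
import numpy as np

def backtracking_line_search(f, grad, x, d, alpha=1.0, rho=0.5, c=1e-4):
    # Shrink the step alpha along direction d until f decreases sufficiently.
    fx = f(x)
    slope = grad(x).dot(d)   # directional derivative; negative for a descent direction
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= rho
        if alpha < 1e-12:
            # No acceptable step found: the kind of dead end the SLSQP message hints at.
            return None
    return alpha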
I still don't know exactly what the message means, but I found a way to work around it: basically, the function being optimized needs to return a smaller value, e.g. by rescaling it:
def F(x):
    ...
    return value / 10000000
To avoid changing your function, you can also try experimenting with the ftol and eps parameters. Increasing ftol has an effect similar to scaling the function down to smaller values.
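With scipy.optimize.minimize(method='SLSQP') these knobs are exposed through the options dict (in fmin_slsqp they are called acc and epsilon); a sketch, where F and x0 stand in for your own objective and starting point:
from scipy import optimize

# Loosen the convergence tolerance and adjust the finite-difference step
# instead of rescaling the objective itself.
res = optimize.minimize(F, x0, method='SLSQP',
                        options={'ftol': 1e-6, 'eps': 1.5e-8, 'maxiter': 500})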
One situation in which you receive this error is when x0 lies outside the valid range you defined in bounds and the unconstrained optimum is attained for values outside those bounds.
I will set up a hypothetical optimization problem, run it with two different initial values and print the output of scipy.optimize:
import numpy as np
from scipy import optimize

H = np.array([[2., 0.],
              [0., 8.]])
c = np.array([0, -32])
x0 = np.array([0.5, 0.5])  # valid initial value
x1 = np.array([-1, 1.1])   # invalid initial value

def loss(x, sign=1.):
    return sign * (0.5 * np.dot(x.T, np.dot(H, x)) + np.dot(c, x))

def jac(x, sign=1.):
    return sign * (np.dot(x.T, H) + c)

bounds = [(0, 1), (0, 1)]
Now that the loss function, gradient, x0 and bounds are in place, we can solve the problem:
def solve(start):
    res = optimize.minimize(fun=loss,
                            x0=start,
                            jac=jac,
                            bounds=bounds,
                            method='SLSQP')
    return res
solve(x0) # valid initial value
# fun: -27.999999999963507
# jac: array([ 2.90878432e-14, -2.40000000e+01])
# message: 'Optimization terminated successfully.'
# ...
# status: 0
# success: True
# x: array([1.45439216e-14, 1.00000000e+00])
solve(x1) # invalid initial value:
# fun: -29.534653465326528
# jac: array([ -1.16831683, -23.36633663])
# message: 'Positive directional derivative for linesearch'
# ...
# status: 8
# success: False
# x: array([-0.58415842, 1.07920792])
As @pv. pointed out in the accepted answer, the algorithm can't verify that this is a minimum:
I think this message means that the optimizer got into a position where it did not manage to find a direction where the value of the objective function decreases (fast enough), but could also not verify that the current position is a minimum.
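The straightforward fix in this situation is to make sure the starting point actually lies inside bounds, for example by clipping it first (a small sketch reusing x1, bounds and solve from above):
lower = [b[0] for b in bounds]
upper = [b[1] for b in bounds]
x1_clipped = np.clip(x1, lower, upper)   # [-1, 1.1] -> [0., 1.]
solve(x1_clipped)                        # should now terminate with status 0 / success True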
It's not a complete answer, but you can see the source code that generates the smode here:
https://github.com/scipy/scipy/blob/master/scipy/optimize/slsqp/slsqp_optmz.f
Assignments of mode = 8 (the "Positive directional derivative for linesearch" you are asking about) can be found in lines 412 and 486. If you can figure out why they are assigned in the code, you've got your answer.