scipy optimize minimize: hess_inv strongly depends on initial guess

I am using scipy.optimize.minimize to minimize a simple log likelihood function. The Hessian matrix doesn't seem to behave well.
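import numpy as np
import scipy.optimize as op

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b  # Poisson mean
    scb2 = ((b-bhat)/sigb)**2  # Gaussian constraint on b
    scf2 = ((f-fhat)/sigf)**2  # Gaussian constraint on f
    return n*np.log(mu) - mu - 0.5*(scb2+scf2)

nll = lambda *args: -lnlike(*args)  # negative log likelihood to minimize
myargs = (21.0, 20.0, 0.5, 6.0, 0.1)  # n, bhat, fhat, sigb, sigf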
If the initial guess is at the minimum, the iteration doesn't go anywhere. That is fine for the parameter values, but hess_inv is never updated either (it stays at the identity), so I cannot use it for uncertainty estimation.
x0 = [2.0, 20.0, 0.5] # initial guess is at the minimum
result = op.minimize(nll, x0, args=myargs)
print(result)
status: 0
success: True
njev: 1
nfev: 5
hess_inv: array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
fun: -42.934971192191881
x: array([ 2. , 20. , 0.5])
message: 'Optimization terminated successfully.'
jac: array([ 0.00000000e+00, 0.00000000e+00, 9.53674316e-07])
If I change the initial guess a little bit, it seems to return a sensible hess_inv.
x0 = [2.01, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0,0]))
status: 0
success: True
njev: 15
nfev: 75
hess_inv: array([[ 2.16004477e+02, -7.60588367e+01, -2.94846112e-02],
[ -7.60588367e+01, 3.55748024e+01, 2.74064505e-03],
[ -2.94846112e-02, 2.74064505e-03, 9.98030944e-03]])
fun: -42.934971191969964
x: array([ 1.99984604, 19.9999814 , 0.5000001 ])
message: 'Optimization terminated successfully.'
jac: array([ -2.38418579e-06, -5.24520874e-06, 1.90734863e-06])
14.697090757
However, hess_inv is very sensitive to the initial guess.
x0 = [2.02, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0,0]))
status: 0
success: True
njev: 16
nfev: 80
hess_inv: array([[ 1.82153214e+02, -6.03482772e+01, -2.97458789e-02],
[ -6.03482772e+01, 3.30771459e+01, -2.53811809e-03],
[ -2.97458789e-02, -2.53811809e-03, 9.99052952e-03]])
fun: -42.934971192188634
x: array([ 1.9999702 , 20.00000354, 0.50000001])
message: 'Optimization terminated successfully.'
jac: array([ -9.53674316e-07, -4.76837158e-07, -4.76837158e-07])
13.4964148462
Change the initial guess a bit more
x0 = [2.03, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print(result)
print(np.sqrt(result.hess_inv[0,0]))
status: 0
success: True
njev: 14
nfev: 70
hess_inv: array([[ 2.30479371e+02, -7.36087027e+01, -3.79639119e-02],
[ -7.36087027e+01, 3.55785937e+01, 3.54182478e-03],
[ -3.79639119e-02, 3.54182478e-03, 9.97664441e-03]])
fun: -42.93497119204827
x: array([ 1.99975148, 20.00006366, 0.50000009])
message: 'Optimization terminated successfully.'
jac: array([ -9.53674316e-07, -9.53674316e-07, 4.29153442e-06])
15.1815470484
Did I miss something? Is this a bug or a feature?

As I understand the optimizers, the Hessian is only approximated numerically, which does not seem to be the best idea in your case. Using SymPy (here in IPython) to derive the exact gradient and Hessian may produce more usable results:
import sympy as sy
import numpy as np
import scipy.optimize as sopt
from IPython.display import display # nice printing
sy.init_printing() # LaTeX like printing for IPython
def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*sy.log(mu) - mu - (scb2+scf2) / 2
# declare symbols:
th_S, th_b, th_f = sy.symbols("theta_S, theta_b, theta_f", real=True)
theta = (th_S, th_b, th_f)
n, bhat, fhat = sy.symbols(r"n, \hat{b}, \hat{f}", real=True)  # raw string keeps the LaTeX backslashes
sigb, sigf = sy.symbols("sigma_b, sigma_f", real=True)
# symbolic objective function (negative log likelihood):
lf = -lnlike(theta, n, bhat, fhat, sigb, sigf)
# Gradient:
dlf = sy.Matrix([lf.diff(th) for th in theta])
# Hessian:
Hlf = sy.Matrix([dlf.T.diff(th) for th in theta])
print("Symbolic Hessian:")
display(Hlf)
# Make numpy functions:
margs = {n:21, bhat:20, fhat:.5, sigb:6, sigf:.1} # parameters
lf_a, dlf_a, Hlf_a = lf.subs(margs), dlf.subs(margs), Hlf.subs(margs)
lf_lam = sy.lambdify(theta, lf_a, modules="numpy")
dlf_lam = sy.lambdify(theta, dlf_a, modules="numpy")
Hlf_lam = sy.lambdify(theta, Hlf_a, modules="numpy")
nlf = lambda xx: np.array(lf_lam(xx[0], xx[1], xx[2])) # function
ndlf = lambda xx: np.array(dlf_lam(xx[0], xx[1], xx[2])).flatten() # gradient
nHlf = lambda xx: np.array(Hlf_lam(xx[0], xx[1], xx[2])) # Hessian
x0 = [2.02, 20.0, 0.5]
rs = sopt.minimize(nlf, x0, jac=ndlf, hess=nHlf, method='Newton-CG')
print(rs)
print("Hessian:")
print(nHlf(rs.x))

If you're using a quasi-Newton method, which from the documentation it appears you are:
Quasi-Newton methods build up a guess at the Hessian inverse by applying a sequence of low-rank updates to a completely naive guess (typically a multiple of the identity). The low-rank updates used are in some sense the "least-change" updates that make a given equation hold, and the meaning of "least-change" varies with the quasi-Newton method chosen. If you start at, or very close to, the minimiser, the optimiser will figure this out very quickly and it won't build up much information in its approximation to the Hessian inverse.
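One way to get an uncertainty estimate that does not depend on the path the optimizer took is to evaluate the Hessian directly at the minimum and invert it. The following is a minimal sketch rather than anything scipy provides: `numerical_hessian` is a hypothetical helper based on central finite differences, the step size of 1e-3 is an assumption, and the minimum [2.0, 20.0, 0.5] is taken from the question.

```python
import numpy as np

def nll(theta, n, bhat, fhat, sigb, sigf):
    """Negative log likelihood from the question."""
    S, b, f = theta
    mu = f * S + b
    penalty = ((b - bhat) / sigb) ** 2 + ((f - fhat) / sigf) ** 2
    return -(n * np.log(mu) - mu - 0.5 * penalty)

def numerical_hessian(fun, x, args=(), step=1e-3):
    """Central finite-difference Hessian (hypothetical helper, assumed step)."""
    x = np.asarray(x, dtype=float)
    k = x.size
    hess = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i] = step
            ej[j] = step
            hess[i, j] = (fun(x + ei + ej, *args) - fun(x + ei - ej, *args)
                          - fun(x - ei + ej, *args) + fun(x - ei - ej, *args)
                          ) / (4.0 * step ** 2)
    return hess

myargs = (21.0, 20.0, 0.5, 6.0, 0.1)
x_min = np.array([2.0, 20.0, 0.5])  # minimum stated in the question

hess = numerical_hessian(nll, x_min, myargs)
cov = np.linalg.inv(hess)           # covariance estimate = inverse Hessian of the nll
sigma_S = np.sqrt(cov[0, 0])
print(cov)
print(sigma_S)
```

For these inputs cov[0, 0] comes out close to 228, i.e. an uncertainty on S of about 15.1, which lies within the spread of the BFGS values quoted in the question.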

Related

GPflow - GP classification with 1-dim Linear kernel fits poorly for 2 dimension data

Following issue #1435, I have an additional question about how to use GPflow.
I replicated the issue in an additional notebook: https://github.com/avalonhse/BayesNotebook/blob/master/Issue_2_GPFlow_Linear_Classification.ipynb
My goal is to fit an additive kernel to 2-dimensional data (squared exponential in dimension 1 and a linear kernel in dimension 2). Following the instructions in #1435, I have successfully fitted the model with the kernel gpflow.kernels.Linear(variance= 0.1).
[Figure: Linear kernel]
However, when I use the kernel gpflow.kernels.Linear(active_dims=1,variance= 0.01) as I originally planned, the model is not fitted. I used GPy with the same kernel as a reference, and there the result looks reasonable.
[Figure: 1-dim GPFlow kernel]
import numpy as np
X = np.array([[ 9.96578428, 60.],[ 9.96578428, 40.],[ 9.96578428, 20.],
[10.96578428, 30.],[11.96578428, 40.],[12.96578428, 50.],
[12.96578428, 70.],[8.96578428, 30. ],[ 7.96578428, 40.],
[ 6.96578428, 50.],[ 6.96578428, 30.],[ 6.96578428, 10.],
[11.4655664 , 71.],[ 8.56605404, 63.],[12.41574177, 69.],
[10.61562964, 48.],[ 7.61470984, 51.],[ 9.31514956, 45.]])
Y = np.array([[1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.]]).T
# plotting
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
def plot(X, Y):
    mask = Y[:, 0] == 1
    plt.figure(figsize=(6, 6))
    plt.plot(X[mask, 0], X[mask, 1], "oC0", mew=0, alpha=0.5)
    plt.ylim(-10, 100)
    plt.xlim(5, 15)
    _ = plt.plot(X[np.logical_not(mask), 0], X[np.logical_not(mask), 1], "oC1", mew=0, alpha=0.5)
plot(X,Y)
# Evaluate real function and the predicted probability
res = 500
xx, yy = np.meshgrid(np.linspace(5, 15, res),
np.linspace(- 10, 120, res))
Xplot = np.vstack((xx.flatten(), yy.flatten())).T
# Code followed the Notebook : https://gpflow.readthedocs.io/en/develop/notebooks/basics/classification.html
import tensorflow as tf
import tensorflow_probability as tfp
import gpflow
from gpflow.utilities import print_summary, set_trainable, to_default_float
gpflow.config.set_default_summary_fmt("notebook")
def testGPFlow(k):
    m = gpflow.models.VGP(
        (X, Y),
        kernel=k,
        likelihood=gpflow.likelihoods.Bernoulli()
    )
    print("\n ########### Model before optimization ########### \n")
    print_summary(m)
    print("\n ########### Model after optimization ########### \n")
    opt = gpflow.optimizers.Scipy()
    res = opt.minimize(
        m.training_loss, variables=m.trainable_variables, options=dict(maxiter=2500), method="L-BFGS-B"
    )
    print(' Message: ' + str(res.message) + '\n Status = ' + str(res.status) + '\n Number of iterations = ' + str(res.nit))
    print_summary(m)
    means, _ = m.predict_y(Xplot)  # here we only care about the mean
    y_prob = means.numpy().reshape(*xx.shape)
    print("Fitting model using GPFlow")
    plot(X, Y)
    _ = plt.contour(
        xx,
        yy,
        y_prob,
        [0.5],  # plot the p=0.5 contour line only
        colors="k",
        linewidths=1.8,
        zorder=100,
    )
k = gpflow.kernels.Linear(active_dims=[1],variance= 0.01)
testGPFlow(k)
k = gpflow.kernels.Linear(variance= 1)
testGPFlow(k)
The GPy code is for reference only, to suggest how a fitted model should look. I am aware that GPy and GPflow use different methods. My question is why the GPflow model does not fit when I specify the Linear kernel in 1 dimension.
Thanks for posting this question, Hoang, and for using GPflow.
When you specify input_dim in GPy, you are telling the algorithm to act on two dimensions. active_dims in GPflow behaves differently: it specifies which dimensions (columns of X) the kernel acts on. 'active_dims = 1' tells GPflow to apply your linear kernel to the y dimension only.
Since you want your kernel to act on both the x and y dimensions, you should specify active_dims = [0,1] rather than just 'active_dims = 1'. When I run your code with this fix, I get a result identical to GPy's result.
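To make the column-selection idea concrete, here is a small plain-NumPy illustration. It is only a sketch of the concept and not GPflow's implementation: `linear_kernel` is a hypothetical function, and the three sample points are made up to resemble the data in the question.

```python
import numpy as np

def linear_kernel(X, X2, variance=1.0, active_dims=None):
    """Linear kernel k(x, x') = variance * <x, x'> on the selected columns.

    Plain-NumPy illustration only; this is not GPflow's implementation.
    """
    if active_dims is not None:
        X = X[:, active_dims]
        X2 = X2[:, active_dims]
    return variance * X @ X2.T

# Three made-up points with the same layout as the question: column 0 = x, column 1 = y
X = np.array([[9.97, 60.0], [10.97, 30.0], [6.97, 10.0]])

K_second = linear_kernel(X, X, variance=0.01, active_dims=[1])    # y dimension only
K_both = linear_kernel(X, X, variance=0.01, active_dims=[0, 1])   # both dimensions
print(K_second)
print(K_both)
```

With active_dims=[1] the first column never enters the kernel matrix, so the model cannot use the x coordinate at all. With [0, 1] both columns contribute.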

Kalman Filter (pykalman): Value for obs_covariance and model without intercept

I am looking at the KalmanFilter from pykalman as used in the examples (pykalman documentation, Example 1, Example 2), and I am wondering about the difference between
observation_covariance=100,
vs
observation_covariance=1,
the documentation states
observation_covariance R: e(t)^2 ~ Gaussian (0, R)
How should the value be set here correctly?
Additionally, is it possible to apply the Kalman filter without intercept in the above module?
The observation covariance describes how much error you assume in your input data. The Kalman filter works well when that error is normally distributed. Under this assumption you can use the 3-sigma rule to derive the covariance (in this case a scalar variance) of your observation from the maximum error you expect in the observation.
The values in your question can be interpreted as follows:
Example 1
observation_covariance = 100
sigma = sqrt(observation_covariance) = 10
max_error = 3*sigma = 30
Example 2
observation_covariance = 1
sigma = sqrt(observation_covariance) = 1
max_error = 3*sigma = 3
So you need to choose the value based on your observation data. The more accurate the observation, the smaller the observation covariance.
Another point: you can tune your filter by manipulating the covariance, but I don't think that is a good idea. The higher the observation covariance value, the weaker the impact a new observation has on the filter state.
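The two points above can be sketched numerically. This is a simplified illustration under stated assumptions, not pykalman code: both helper functions are hypothetical, the state is a scalar observed directly (observation matrix H = 1), and the prior variance P = 1 is chosen only for the example.

```python
def variance_from_max_error(max_error):
    """3-sigma rule: treat the largest expected error as 3 standard deviations."""
    sigma = max_error / 3.0
    return sigma ** 2

def scalar_kalman_gain(prior_variance, observation_variance):
    """Kalman gain for a scalar state observed directly (H = 1)."""
    return prior_variance / (prior_variance + observation_variance)

R_example1 = variance_from_max_error(30.0)  # max error 30 -> sigma 10
R_example2 = variance_from_max_error(3.0)   # max error 3  -> sigma 1

P = 1.0  # assumed prior state variance, for illustration only
gain_large_R = scalar_kalman_gain(P, R_example1)
gain_small_R = scalar_kalman_gain(P, R_example2)
print(R_example1, R_example2)
print(gain_large_R, gain_small_R)
```

The gain is the fraction of the difference between the new observation and the prediction that is added to the state estimate, so the larger R gives the smaller correction per observation.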
Sorry, I did not understand the second part of your question (about the Kalman Filter without intercept). Could you please explain what you mean?
You are trying to use a regression model and both intercept and slope belong to it.
---------------------------
UPDATE
I prepared some code and plots to answer your questions in detail. I used EWC and EWA historical data to stay close to the original article.
First of all, here is the code (much the same as in the examples above, but with a different notation)
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# both slope and intercept have to be estimated
# transition_matrix
F = np.eye(2) # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k 1]
H = np.vstack([EWA, np.ones(n)]).T[:, np.newaxis]  # plain arrays, shape (n, 1, 2)
# transition_covariance
Q = [[1e-4, 0],
[ 0, 1e-4]]
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = [0,
0]
# initial_state_covariance
P0 = [[ 1, 0],
[ 0, 1]]
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Filtering
state_means, state_covs = kf.filter(EWC)
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + state_means[:, 1]
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(state_means[:, 1], label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()
I could not retrieve data using pandas, so I downloaded them and read from the file.
Here you can see the estimated slope and intercept:
To test the estimated data I restored the EWC value from the EWA using the estimated parameters:
About the observation covariance value
By varying the observation covariance value you tell the Filter how accurate the input data is (normally you just describe your confidence in the observation using some datasheets or your knowledge about the system).
Here are estimated parameters and the restored EWC values using different observation covariance values:
You can see the filter follows the original function better with a bigger confidence in observation (smaller R). If the confidence is low (bigger R) the filter leaves the initial estimate (slope = 0, intercept = 0) very slowly and the restored function is far away from the original one.
About the frozen intercept
If you want to freeze the intercept for some reason, you need to change the whole model and all filter parameters.
In the normal case we had:
x = [slope; intercept] #estimation state
H = [EWA 1] #observation matrix
z = [EWC] #observation
Now we have:
x = [slope] #estimation state
H = [EWA] #observation matrix
z = [EWC-const_intercept] #observation
Results:
Here is the code:
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# only slope has to be estimated (it will be manipulated by the constant intercept) - mathematically incorrect!
const_intercept = 10
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# transition_matrix
F = 1 # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k]
H = np.asarray(EWA).reshape(n, 1, 1)  # plain array, shape (n, 1, 1)
# transition_covariance
Q = 1e-4
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = 0
# initial_state_covariance
P0 = 1
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Creating the observation based on EWC and the constant intercept
z = EWC[:] # copy the list (not just assign the reference!)
z[:] = [x - const_intercept for x in z]
# Filtering
state_means, state_covs = kf.filter(z) # the estimation for the EWC data minus constant intercept
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + const_intercept
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(const_intercept*np.ones((n, 1)), label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()

BVP4c solve for unknown boundary

I am trying to use bvp4c to solve a system of 4 ODEs. The issue is that one of the boundaries is unknown.
Can bvp4c handle this? In my code L is the unknown I am solving for.
I get an error message printed below.
function mat4bvp
L = 8;
solinit = bvpinit(linspace(0,L,100),@mat4init);
sol = bvp4c(@mat4ode,@mat4bc,solinit);
sint = linspace(0,L);
Sxint = deval(sol,sint);
end
% ------------------------------------------------------------
function dtdpdxdy = mat4ode(s,y,L)
Lambda = 0.3536;
dtdpdxdy = [y(2)
-sin(y(1)) + Lambda*(L-s)*cos(y(1))
cos(y(1))
sin(y(1))];
end
% ------------------------------------------------------------
function res = mat4bc(ya,yb,L)
res = [ ya(1)
ya(2)
ya(3)
ya(4)
yb(1)];
end
% ------------------------------------------------------------
function yinit = mat4init(s)
yinit = [ cos(s)
0
0
0
];
end
Unfortunately I get the following error message:
>> mat4bvp
Not enough input arguments.
Error in mat4bvp>mat4ode (line 13)
-sin(y(1)) + Lambda*(L-s)*cos(y(1))
Error in bvparguments (line 105)
testODE = ode(x1,y1,odeExtras{:});
Error in bvp4c (line 130)
bvparguments(solver_name,ode,bc,solinit,options,varargin);
Error in mat4bvp (line 4)
sol = bvp4c(@mat4ode,@mat4bc,solinit);
One trick to transform a variable end point into a fixed one is to change the time scale. If x'(t)=f(t,x(t)) is the differential equation, set t=L*s, s from 0 to 1, and compute the associated differential equation for y(s)=x(L*s)
y'(s)=L*x'(L*s)=L*f(L*s,y(s))
The next trick is to turn the unknown parameter L into part of the differential equation by treating it as a constant function, i.e. an extra state component with zero derivative. So the new system is
[ y'(s), L'(s) ] = [ L(s)*f(L(s)*s,y(s)), 0 ]
and the value of L appears as an additional free boundary value (left or right). This raises the dimension of the state vector to match the number of boundary conditions.
I do not have Matlab readily available. In Python, with the tools in scipy, this can be implemented as
import numpy as np
from scipy.integrate import solve_bvp, odeint
import matplotlib.pyplot as plt

# The original function with the interval length as parameter
def fun0(t, y, L):
    Lambda = 0.3536
    return np.array([y[1], -np.sin(y[0]) + Lambda*(L-t)*np.cos(y[0]), np.cos(y[0]), np.sin(y[0])])

# Wrapper function to apply both tricks to transform variable interval length to a fixed interval.
def fun1(s, y):
    L = y[-1]  # the unknown length is carried as the last state component
    dydt = np.zeros_like(y)
    dydt[:-1] = L*fun0(L*s, y[:-1], L)  # the last entry stays 0, i.e. L' = 0
    return dydt

# Implement evaluation of the boundary condition residuals:
def bc(ya, yb):
    return [ya[0], ya[1], ya[2], ya[3], yb[0]]

# Define the initial mesh with 3 nodes:
x = np.linspace(0, 1, 3)
# This problem has multiple solutions. Try two initial guesses.
L_a=8
L_b=9
y_a = odeint(lambda y,t: fun1(t,y), [0,0,0,0,L_a], x)
y_b = odeint(lambda y,t: fun1(t,y), [0,0,0,0,L_b], x)
# Now we are ready to run the solver.
res_a = solve_bvp(fun1, bc, x, y_a.T)
res_b = solve_bvp(fun1, bc, x, y_b.T)
L_a = res_a.sol(0)[-1]
L_b = res_b.sol(0)[-1]
print("L_a=%.8f, L_b=%.8f" % (L_a, L_b))
# Plot the two found solutions. The solution are in a spline form, use this to produce a smooth plot.
x_plot = np.linspace(0, 1, 100)
y_plot_a = res_a.sol(x_plot)[0]
y_plot_b = res_b.sol(x_plot)[0]
plt.plot(L_a*x_plot, y_plot_a, label='L=%.8f'%L_a)
plt.plot(L_b*x_plot, y_plot_b, label='L=%.8f'%L_b)
plt.legend()
plt.xlabel("t")
plt.ylabel("y")
plt.grid(); plt.show()
which produces
Trying different initial values for L finds other solutions on quite different scales, among them
L=0.03195111
L=0.05256775
L=0.05846539
L=0.06888907
L=0.08231966
L=4.50411522
L=6.84868060
L=20.01725616
L=22.53189063

Understanding the Jacobian output of scipy.optimize.minimize

I'm working with scipy.optimize.minimize to find the minimum of the RSS for a custom nonlinear function. I'll provide a simple linear example to illustrate what I am doing:
import numpy as np
from scipy import optimize

def response(X, b0, b1, b2):
    return b2 * X[1]**2 + b1 * X[0] + b0

def obj_rss(model_params, y_true, X):
    return np.sum((y_true - response(X, *model_params))**2)

x = np.array([np.arange(0, 10), np.arange(10, 20)])
r = 15. * x[1]**2 - 32. * x[0] + 10.
init_guess = np.array([0., 50., 10.])
res = optimize.minimize(obj_rss, init_guess, args=(r, x))
print(res)
This yields the results:
fun: 3.0218799331864133e-08
hess_inv: array([[ 7.50606278e+00, 2.38939463e+00, -8.33333575e-02],
[ 2.38939463e+00, 8.02462363e-01, -2.74621294e-02],
[ -8.33333575e-02, -2.74621294e-02, 9.46969972e-04]])
jac: array([ -3.31359843e-07, -5.42022462e-08, 2.34304025e-08])
message: 'Optimization terminated successfully.'
nfev: 45
nit: 6
njev: 9
status: 0
success: True
x: array([ 10.00066577, -31.99978062, 14.99999243])
And we see that the fitted parameters 10, -32, and 15 match those used to generate the data. That's great. Now my question:
My understanding is that the Jacobian should be an m x n matrix, where m is the number of records from the X input and n is the number of parameters. Clearly I don't have that in the results object. The results object yields an array that is referred to as the Jacobian in the documentation, but it is only one-dimensional, with a number of elements equal to the number of parameters.
Further confusing the matter, when I use method='SLSQP', the Jacobian that is returned has one more element than that returned by other minimization algorithms.
. . .
My larger goal here is to be able to calculate either confidence intervals or standard errors, t-, and p-values for the fitted parameters, so if you think I'm way off track here, please let me know.
EDIT:
The following is intended to show how the SLSQP minimization algorithm yields different results in the Jacobian than the default minimization algorithm, which is one of BFGS, L-BFGS-B, or SLSQP, depending on if the problem has constraints (as mentioned in the documentation). The SLSQP solver is intended for use with constraints.
import numpy as np
from scipy import optimize

def response(X, b0, b1, b2):
    return b2 * X[1]**2 + b1 * X[0] + b0

def obj_rss(model_params, y_true, X):
    return np.sum((y_true - response(X, *model_params))**2)

x = np.array([np.arange(0, 10), np.arange(10, 20)])
r = 15. * x[1]**2 - 32. * x[0] + 10.
init_guess = np.array([0., 50., 10.])
res = optimize.minimize(obj_rss, init_guess, method='SLSQP', args=(r, x))
print(res)
r_pred = response(x, *res.x)
Yields results:
fun: 7.5269461938291697e-10
jac: array([ 2.94677643e-05, 5.52844499e-04, 2.59870917e-02,
0.00000000e+00])
message: 'Optimization terminated successfully.'
nfev: 58
nit: 10
njev: 10
status: 0
success: True
x: array([ 10.00004495, -31.9999794 , 14.99999938])
One can see that there is an extra element in the Jacobian array that is returned from the SLSQP solver. I am confused where this comes from.
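On the shape question: the objective passed to minimize is a scalar (the RSS), so its "Jacobian" is simply the gradient, with one entry per parameter. The m x n matrix belongs to the model response, not to the objective. The sketch below relates the two for the example above; `model_jacobian` is a hypothetical helper written for this illustration, the finite-difference step of 1e-6 is an assumption, and the sketch does not address the extra element returned by SLSQP.

```python
import numpy as np
from scipy import optimize

def response(X, b0, b1, b2):
    return b2 * X[1]**2 + b1 * X[0] + b0

def obj_rss(model_params, y_true, X):
    return np.sum((y_true - response(X, *model_params))**2)

def model_jacobian(X):
    """m x n matrix of d(response_i)/d(b_j) for parameters (b0, b1, b2)."""
    return np.column_stack([np.ones(X.shape[1]), X[0], X[1]**2])

x = np.array([np.arange(0, 10), np.arange(10, 20)], dtype=float)
r = 15. * x[1]**2 - 32. * x[0] + 10.
params = np.array([0., 50., 10.])  # the initial guess from the question

J = model_jacobian(x)                   # shape (10, 3): one row per record
residuals = r - response(x, *params)    # shape (10,)
grad_from_J = -2.0 * J.T @ residuals    # shape (3,): gradient of the RSS
grad_numeric = optimize.approx_fprime(params, obj_rss, 1e-6, r, x)
print(J.shape, grad_from_J.shape)
print(grad_from_J)
print(grad_numeric)
```

The two gradients agree up to finite-difference error. The one-dimensional array in the result object corresponds to this gradient evaluated at the returned x, which is why it is close to zero at the minimum.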

scipy.optimize.linprog seems to solve the task but doesn't return the x?

I'm trying to solve a very simple linear program using scipy.optimize.linprog, and it seems the function does what I want it to do, but somehow it doesn't return the 'x' (it does return the correct minimal function value)
Just for a simple example (in matlab notation), I have a 2-D a=[a1; a2] and simple linear constraint [1, 2] * a = 1, and want to minimize the L1 norm of a. The optimum should be a=[0, 0.5].
As far as I understand, I can formulate this in standard form by using an extra variable b, such that b>=abs(a) (i.e. a-b<=0 and -a-b<=0) and minimize sum(b) subject to these constraints and the original equality constraint [1, 2] * a = 1.
So I define x= [a; b], plug it into scipy's linprog, it returns with a success, and I get the correct answer: optimal value of sum(b) is 0.5. However, the x that it returns is full of nan's instead of [0; 0.5; 0; 0.5]
Here's the code:
import numpy as np
import scipy.optimize

A = np.array([1,2]).reshape([1,2])
b_eq = np.array([1])
ones = np.ones([2,])
zeros = np.zeros([2,])
zerosm = np.zeros([1, 2])
eye = np.eye(2)
c = np.hstack([zeros, ones])
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye])])
b_ub = np.hstack([zeros, zeros])
A_eq = np.hstack([A, zerosm])
res = scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None),
A_eq=A_eq, b_eq=b_eq)
The result:
success: True
status: 0
fun: 0.5
x: array([ nan, nan, nan, nan])
nit: 3
slack: array([ 0., 0., 0., 1.])
message: 'Optimization terminated successfully.'
I.e. x is nan's instead of the solution. The function value is correct (0.5), and slacks seem fine - according to the scipy docs slack of 0 means that the constraint is active, so 1st and 3rd zero mean that a1=b1=0, and the 2nd zero means a2=b2 and they are not zero (otherwise the 4th slack would also be 0). This is again expected as [0, 0.5] is the solution.
What am I doing wrong? Is this a bug? (using scipy 0.15.1)
Thanks!
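The reasoning about the slacks can be checked by hand without calling the solver. The sketch below only plugs the expected solution [0, 0.5, 0, 0.5] into the constraint matrices from the question. The names `x_expected`, `slack`, and `eq_residual` are introduced here for illustration, and the slack is computed as b_ub - A_ub x.

```python
import numpy as np

A = np.array([[1.0, 2.0]])
eye = np.eye(2)
c = np.hstack([np.zeros(2), np.ones(2)])        # objective: sum(b)
A_ub = np.vstack([np.hstack([eye, -eye]),        #  a - b <= 0
                  np.hstack([-eye, -eye])])      # -a - b <= 0
b_ub = np.zeros(4)
A_eq = np.hstack([A, np.zeros((1, 2))])          # [1, 2] * a = 1
b_eq = np.array([1.0])

x_expected = np.array([0.0, 0.5, 0.0, 0.5])      # [a1, a2, b1, b2]

objective = c @ x_expected                 # value of sum(b)
slack = b_ub - A_ub @ x_expected           # one entry per inequality row
eq_residual = A_eq @ x_expected - b_eq     # zero if the equality holds
print(objective, slack, eq_residual)
```

The expected point reproduces the reported objective value of 0.5 and the reported slack vector [0, 0, 0, 1], so the solver output is consistent with this x even though x itself came back as nan.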
Apparently you are running into a bug that has been fixed since version 0.15.1.
When I run your code with scipy 0.18.0, I get:
In [3]: import scipy.optimize
In [4]: %paste
A = np.array([1,2]).reshape([1,2])
b_eq = np.array([1])
ones = np.ones([2,])
zeros = np.zeros([2,])
zerosm = np.zeros([1, 2])
eye = np.eye(2)
c = np.hstack([zeros, ones])
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye])])
b_ub = np.hstack([zeros, zeros])
A_eq = np.hstack([A, zerosm])
res = scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None),
A_eq=A_eq, b_eq=b_eq)
## -- End pasted text --
In [5]: res
Out[5]:
fun: 0.5
message: 'Optimization terminated successfully.'
nit: 4
slack: array([ 0., 0., 0., 1.])
status: 0
success: True
x: array([ 0. , 0.5, 0. , 0.5])
I tried solving your problem with scipy 1.3.1 and it works well:
con: array([ 7.07545134e-12])
fun: 0.49999999999882094
message: 'Optimization terminated successfully.'
nit: 4
slack: array([ 1.07107656e-11, -4.48552306e-12, -4.75552930e-12,
1.00000000e+00])
status: 0
success: True
x: array([ -7.73314746e-12, 5.00000000e-01, 2.97761815e-12,
5.00000000e-01])