scipy.optimize.minimize: success is True, but L-BFGS-B prints a worrying warning - scipy

I use scipy.optimize.minimize() for univariate linear regression where I want the cost function to be something other than quadratic (and thus can't use scipy.stats.linregress()).
Because I specify bounds (a and b need to be positive in my y = a*x + b), SciPy chooses L-BFGS-B as a solver. I trust SciPy on that, plus I don't change any of its default parameter values.
The result of fitting is:
Warning: more than 10 function and gradient
evaluations in the last line search. Termination
may possibly be caused by a bad search direction.
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
jac: array([4.42014425e+00, 1.81898925e-04])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 105
nit: 7
njev: 35
status: 0
success: True
So, on one hand, success is True (great), but the warning suggests that the algorithm may have terminated in a bad spot because of some other stop condition. The question is then: what should I do with this? I can't even detect the situation automatically, as the warning is just printed to the screen. Although the d dict mentioned here could be helpful, it is not returned by the wrapping function.
In short: is it enough to check the 'success' flag? Should I fight this by tweaking parameters, or is it fine as is?
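For completeness, here is a minimal sketch of the kind of call I am making (the data and the L1 cost below are hypothetical stand-ins for my real setup). It also shows the structured fields of the result, which, unlike the printed warning, can be checked programmatically:
import numpy as np
from scipy.optimize import minimize

# hypothetical data for y = a*x + b with noise
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + np.random.default_rng(0).normal(size=x.size)

def cost(params):
    a, b = params
    # a non-quadratic cost, e.g. the sum of absolute residuals
    return np.sum(np.abs(y - (a * x + b)))

# bounds force a >= 0 and b >= 0, so SciPy picks L-BFGS-B by default
res = minimize(cost, x0=[1.0, 0.0], bounds=[(0.0, None), (0.0, None)])

# the line-search warning is printed by the underlying code and never
# returned, but these structured fields are available:
print(res.success, res.status, res.message)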

Related

In OpenMDAO, is there a way to ensure that the constraints are respected before proceeding with a computation?

I have a constrained nonlinear optimization problem, "A". Inside the computation is an om.Group which I'll call "B" that requires a nonlinear solve. Whether "B" finds a solution or crashes seems to depend on its initial conditions. So far I've found that some of the initial conditions given to "B" are inconsistent with the constraints on "A", and that this seems to be contributing to its propensity for crashing. The constraints on "A" can be computed before "B".
If the objective of "A" could be computed before "B" then I would put "A" in its own group and have it pass its known-good solution to "B". However, the objective of "A" can only be computed as a result of the converged solution of "B". Is there a way to tell OpenMDAO or the optimizer (right now I'm using ScipyOptimizerDriver and the SLSQP method) that when it chooses a new point in design-variable space, it should check that the constraints of "A" hold before proceeding to "B"?
A slightly simpler example (without the complication of an initial guess) might be:
There are two design variables 0 < x1 < 1, 0 < x2 < 1.
There is a constraint that x2 >= x1.
Minimize f(sqrt(x2 - x1), x1) where f crashes if given imaginary inputs. How can I make sure that the driver explores the design space without giving f a bad input?
I have two proposed solutions; which one is best is highly problem-dependent. You can either raise an AnalysisError or use numerical clipping.
import numpy as np
import openmdao.api as om

class SafeComponent(om.ExplicitComponent):
    def setup(self):
        self.add_input('x1')
        self.add_input('x2')
        self.add_output('y')

    def compute(self, inputs, outputs):
        x1 = inputs['x1']
        x2 = inputs['x2']
        diff = x2 - x1  # must be non-negative for sqrt() to be real

        ######################################################
        # option 1: raise an error, which causes the
        # optimizer line search to backtrack
        ######################################################
        # if diff < 0:
        #     raise om.AnalysisError('invalid inputs: x1 > x2')

        ######################################################
        # option 2: use numerical clipping
        ######################################################
        if diff < 0:
            diff = 0.
        outputs['y'] = np.sqrt(diff)

# build the model
prob = om.Problem()
prob.model.add_subsystem('sc', SafeComponent(), promotes=['*'])
prob.setup()

prob['x1'] = 10
prob['x2'] = 20

prob.run_model()
print(prob['y'])
Option 1: raise an AnalysisError
Some optimizers are set up to handle this well. Others are not.
As of V3.7.0, the OpenMDAO wrappers for SLSQP from scipy and pyoptsparse, and the SNOPT/IPOPT wrappers from pyoptsparse all handle AnalysisErrors gracefully.
When the error is raised, execution stops and the optimizer registers a failed case. It backtracks along the line search a bit to try to get out of the situation. It will usually try a few steps backwards, but at some point it will give up. So the success of this approach depends a bit on why you ended up in the bad part of the space and how strongly the gradients are pushing you back into it.
This solution works very well with fully analytic derivatives, because (most) gradient-based optimizers will only ever ask for function evaluations along a line search. That means that, as long as a clean point is found, you are always able to compute derivatives at that point as well.
If you're using finite differences, however, you could end a line search right near the error condition without violating it (e.g. x1 = 1, x2 = 1.0000001). Then, during the FD step to compute derivatives, you might end up tripping the error condition and raising the error. The optimizer is not going to be able to recover from this condition: errors during FD steps will effectively kill the whole optimization.
So, for this reason, I never recommend the AnalysisError approach if you're using FD.
Option 2: Numerical Clipping
If your optimizer wrapper does not have the ability to handle an AnalysisError, you can try some numerical clipping instead. You can add a filter in your calcs to keep the values numerically safe. However, you obviously need to use this very carefully. You should at least add an additional constraint that forces the optimizer to keep away from the error condition when converged (e.g. x2 >= x1).
One important note: if you provide analytic derivatives, include the clipping in them!
Sometimes the optimizer just wants to pass through this bad region on its way to the answer. In that case, the simple clipping I show here is probably fine. Other times it wants to ride the constraint (be sure you add that constraint!!!), and then you probably want a more smoothly varying type of clipping. In other words, don't use a simple if-condition: smooth out the corner a bit, and maybe make the value asymptotically approach 0 from a very small value. This way you have a C1-continuous function, and the derivatives won't go to exactly 0 for these inputs. A sketch of that idea follows.
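Here is a minimal sketch of one way to write that smoother clipping (the function names and the eps value are my own choices, not anything OpenMDAO provides):
import numpy as np

def smooth_clip(diff, eps=1e-8):
    # softplus-like replacement for max(diff, 0): C1 continuous,
    # always positive, and approximately diff for diff >> sqrt(eps)
    return 0.5 * (diff + np.sqrt(diff**2 + eps))

def d_smooth_clip(diff, eps=1e-8):
    # derivative of smooth_clip; use this in analytic partials so the
    # clipping is included in the derivatives as well
    return 0.5 * (1.0 + diff / np.sqrt(diff**2 + eps))

# inside compute(): outputs['y'] = np.sqrt(smooth_clip(x2 - x1))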

Divide-by-zero encountered: rhok assumed large error using scipy.optimize

I used scipy.optimize.fmin_bfgs to minimize the hinge loss (SVM). However, I get this error:
Divide-by-zero encountered: rhok assumed large.
Somebody said that "it had to do with the training data set"; does anybody know how to deal with this problem?
From the source code of scipy, rhok is,
rhok = 1.0 / (numpy.dot(yk, sk))
where both yk and sk depend on the input array x0.
A possible cause of this error is a bad choice of initial condition x0 that tends toward a singularity in your function f. I would suggest plotting your function and making sure the initial conditions are always away from possible divergent values. If this is part of a larger training routine, you could wrap the call in a try/except and, on catching a ZeroDivisionError, retry with the initial condition shifted by some amount. You may also find that a different minimisation method from scipy.optimize.minimize is more robust.
If you add the full_output option to scipy.optimize.fmin_bfgs, it should give you more information about your particular case.
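As a rough sketch of how that retry idea could look (the objective here is a simple placeholder, not your hinge loss, and the shift size is arbitrary):
import numpy as np
from scipy.optimize import fmin_bfgs

def loss(w):
    # placeholder objective standing in for the real hinge loss
    return np.sum(np.maximum(0.0, 1.0 - w)) + 0.01 * np.dot(w, w)

x0 = np.zeros(3)
for attempt in range(5):
    out = fmin_bfgs(loss, x0, full_output=True, disp=False)
    xopt, fopt, gopt, Bopt, func_calls, grad_calls, warnflag = out
    if warnflag == 0:   # 0 means it converged cleanly
        break
    # warnflag 1: max iterations; 2: precision loss (e.g. after rhok trouble);
    # retry from a perturbed initial condition
    x0 = x0 + 0.1 * np.random.default_rng(attempt).normal(size=x0.size)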

Maple - Integration returns undefined for very simple and possible integrations

This is a question about Maple producing undefined results.
The code below should give the result 0, but instead Maple labels it "undefined".
ni := 0; nj := 0;
(nj*(nj-1))*(int(N^(ni+nj-2), N = -1..1));
Since nj = 0, you can see quite clearly that even before the integral, the answer is 0 × (integral).
The integral is possible to do, and doing it by hand gives you (-1/N) evaluated between 1 and -1,
so substituting in, (-1/1) - (-1/-1), which is -1 - 1 = -2.
The overall answer is then given by 0 × (-2), which is 0.
Maple returns undefined.
However, if you take a subsection of that code (just the integral),
(int(N^(ni+nj-2), N = -1..1)) or even (int(N^(-2), N = -1..1)),
then Maple returns infinity.
Neither of these results is correct.
Can anyone explain to me why this happens?
I think others are likely to come across a similar issue, because it is such a simple Maple procedure, yet it gives a confusing result.
As was already shared in the comments, 0 times infinity is undefined; see e.g. Why is infinity multiplied by zero not an easy zero answer. Note also that the bare integral diverges: the integrand N^(-2) has a non-integrable singularity at N = 0, so the antiderivative evaluation that gave you -2 is not valid across that point, and infinity is what Maple reports for the integral on its own.
To still keep your Maple sheet as intact as possible, you can always include if-statements in the code, which is really easy:
if nj = 0 then
#do something
end if;
However, you should always check if you are doing the right thing mathematically, as Maple does output Undefined for a reason!
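If you want to sanity-check the mathematics outside Maple, the same two facts can be reproduced in Python with SymPy (this is just an illustration and not part of the Maple workflow):
import sympy as sp

N = sp.Symbol('N')
ni, nj = 0, 0

# the bare integral diverges because of the singularity at N = 0
print(sp.integrate(N**(ni + nj - 2), (N, -1, 1)))  # oo

# and zero times infinity is indeterminate
print(sp.Integer(nj*(nj - 1)) * sp.oo)             # nan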

Kevin Murphy's HMM MATLAB toolbox assertion error

I am working on a project that needs to use hidden Markov models. I downloaded Kevin Murphy's toolbox. I have some questions about its usage. On the toolbox webpage, he says that the first input of dhmm_em and dhmm_logprob is the symbol sequence data. In the examples, row vectors are given as data. So, when I give my symbol sequence as a row vector, I get this error:
??? Error using ==> assert at 9
assertion violated:
Error in ==> fwdback at 105
assert(approxeq(sum(alpha(:,t)),1))
Error in ==> dhmm_logprob at 17
[alpha, beta, gamma, ll] = fwdback(prior,
transmat, obslik, 'fwd_only', 1);
Error in ==> mainCourseProject at 110
loglik(train_act) =
dhmm_logprob(orderedSymbols,
hmm{train_act}.prior,
hmm{train_act}.trans,
hmm{act}.emiss);
However, before giving this error, the code works for some symbol vectors. When I give my data as a column vector, the functions work fine, with no errors. So why exactly am I getting this error?
You might say that I should be giving not single vectors but sets of vectors; I also tried collecting my feature vectors in a struct and passing row vectors that way, but nothing changed, and I still get the assertion error.
By the way, my symbol sequence does not have any zeros, and I am doing everything almost the same as in the examples, so I would be grateful if anyone could help me.
I'm not sure, but from the function call stack shown above, shouldn't the last line be hmm{train_act}.emiss instead of hmm{act}.emiss?
In other words, when you compute the log-probability of a sequence, you should pass components that belong to the same HMM model (transition matrix, emission matrix, and prior probabilities).
By the way, the ASSERT in the code is a sanity check that a vector of probabilities sums to 1. Oftentimes, when working with very small values (log-probabilities), numerical stability issues can creep in. You could edit the APPROXEQ function to relax the comparison a bit, by giving it a bigger margin of error.
This error message and the code it refers to are human-readable. An assertion is a guard put in by the programmer to ensure that certain conditions are met. In this case, what is the condition? approxeq(sum(alpha(:,t)),1). I'd venture to say that approxeq wants the values to be approximately equal, so this boils down to: sum(alpha(:,t)) should be approximately 1.
Without knowing anything about the code, I'd also guess that these refer to probabilities. The probabilities of a node's edges must sum to one. Hopefully this starts you down a productive debugging path. If you can't figure out what's wrong with your input that produces this condition, start wading into the code a bit to see where this alpha vector comes from, and how it ended up invalid.
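The check the assertion performs is simple enough to reproduce for debugging. Here is the same idea written out in Python/NumPy (the toolbox itself is MATLAB, so this is only an illustration of the check, with an arbitrary tolerance):
import numpy as np

def check_alpha(alpha, tol=1e-6):
    # each column of alpha holds the (scaled) forward probabilities at
    # one time step, so each column should sum to approximately 1
    col_sums = alpha.sum(axis=0)
    bad = np.where(~np.isclose(col_sums, 1.0, atol=tol))[0]
    if bad.size:
        raise AssertionError(f"columns {bad} sum to {col_sums[bad]}")

# example: a valid 2-state alpha over 3 time steps passes the check
check_alpha(np.array([[0.7, 0.4, 0.9],
                      [0.3, 0.6, 0.1]]))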

What is the difference between xtol and ftol when using fmin() from scipy.optimize?

I'm starting to use the function fmin with a very simple example, and I am trying to get the values of a vector that minimize the product of its elements:
from numpy import array
from scipy.optimize import fmin

def prueba(x, y):
    print("valor1:", x[0], "\n")
    print("valor2:", x[1], "\n")
    print("valor3:", x[2], "\n")
    print("valor4:", x[3], "\n")
    min = x[0]*x[1]*x[2]*x[3]
    print(min)
    return min

sal = fmin(prueba, x0=array([1, 2, 3, 4]), args="1", retall=1, xtol=0.5, ftol=0.5)  # also tried maxfun=1, maxiter=1
but if I don't define xtol and ftol, this appears:
"Warning: Maximum number of function evaluations has been exceeded."
For this reason I have defined the convergence of the algorithm using the parameters xtol and ftol, but I still don't understand the difference between them. They look the same to me, yet if I delete either one I get the warning again.
What exactly is the difference between xtol and ftol? Which should I use in this case?
I have read the documentation:
Other Parameters
xtol : number
acceptable relative error in xopt for convergence.
ftol : number
acceptable relative error in func(xopt) for convergence.
I still do not understand the difference.
Here's my understanding. It's similar to the MathWorks function fminsearch, which defines these values:
TolFun: Termination tolerance on the function value
TolX: Termination tolerance on x
As the search proceeds iteratively, the difference in the values of x from one iteration to the next becomes smaller and smaller, until it doesn't matter anymore and you might as well be done. The same goes for the function tolerance: in your example, prueba is evaluated, and the difference between its return value from iteration to iteration gets smaller and smaller, until that doesn't matter either. You asked which you should use; this can be a bit of an experimental question. In the past I have often used:
xtol = 1e-6;
ftol = 1e-6;
These seem to scale well to many problems and are a good place to start. You will likely find that if one needs to be tweaked, it will be obvious: horrid convergence times, poor goodness of fit in the data, etc. Hope this helps.
I know this question is a bit old, but I still want to point out that user1264127's example simply does not converge. Depending on the specific details of the fmin algorithm, the example may be diverging exponentially. If you want to see the differences between xtol and ftol, try a convergent example like this:
from scipy import optimize

def myFun(x):
    return (x[0] - 1.2)**2 + (x[1] + 3.7)**2

optimize.fmin(myFun, [0, 0])
The output when I run with default parameters:
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 79
Function evaluations: 150
Out[4]: array([ 1.19997121, -3.70003115])
The output when I run with xtol=1e-12:
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 138
Function evaluations: 263
Out[5]: array([ 1.2, -3.7])
And with xtol=1, ftol=1e-3:
Optimization terminated successfully.
Current function value: 0.000099
Iterations: 61
Function evaluations: 116
Out[17]: array([ 1.20348989, -3.69068928])