I am working on a third-party software optimization problem using SciPy's optimize.minimize with constraints and bounds (using the SLSQP method). Specifically, I pass inputs to a very complex function (which I can't write here) that launches my software and returns the single output I need to minimize.
def func_to_minimize(param):
    launch_software(param)
    return software_output()  # retrieve the output once the software finishes its computation
While working on it, I noticed that during the optimization the algorithm does not always respect the constraints.
However, the software I am trying to optimize cannot be run with certain input values (physical laws must not be violated), so I wrote these relations as constraints in my code. For example, the output flow rate cannot be greater than the input flow rate.
I would like to know whether it is possible to have these constraints respected even during the optimization, i.e., at every trial point and not just at the final solution.
One approach you could try is to intercept param and check whether it's feasible before sending it to launch_software; if it's not feasible, return np.inf. I'm not sure this will work with SLSQP, but give it a go.
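For example (a minimal sketch; launch_software and software_output are the placeholders from your question, and is_feasible stands in for your actual constraint equations):

import numpy as np

def is_feasible(param):
    # Hypothetical stand-in for the physical constraints, e.g.
    # "output flow rate can't be greater than input flow rate".
    in_flow, out_flow = param[0], param[1]
    return out_flow <= in_flow

def func_to_minimize(param):
    if not is_feasible(param):
        return np.inf  # skip the expensive run entirely for infeasible inputs
    launch_software(param)
    return software_output()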
One approach that will work is optimize.differential_evolution. That function checks whether your constraints are feasible before calculating the objective; if they're not, your objective function isn't called. However, differential_evolution requires orders of magnitude more function evaluations, so if your objective function is expensive that could be an issue. One mitigation would be vectorisation or parallel computation.
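A sketch of that approach, reusing func_to_minimize from the question and the same hypothetical two-parameter flow constraint (the bounds are made up):

import numpy as np
from scipy.optimize import differential_evolution, NonlinearConstraint

# Feasibility expressed as a constraint object: input flow minus output
# flow must stay >= 0. differential_evolution checks this before calling
# the objective, so the software is never launched with infeasible inputs.
flow_constraint = NonlinearConstraint(lambda p: p[0] - p[1], 0, np.inf)

bounds = [(0.0, 10.0), (0.0, 10.0)]  # made-up bounds for the two parameters
result = differential_evolution(
    func_to_minimize,            # the expensive objective from the question
    bounds,
    constraints=(flow_constraint,),
    workers=-1,                  # parallel evaluation, if your objective allows it
)
print(result.x, result.fun)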
I was wondering if there exists a systematic way to choose initial parameters for this kind of problem (as they can take virtually any form). My question arises from the fact that my solution depends somewhat on the initial parameters (as usual). My fit consists of 10 parameters and approximately 5120 data points (x, y, z) and has nonlinear constraints. I have been doing this by brute force, that is, trying parameters randomly and looking for a pattern, but it has led me nowhere.
I have also tried MATLAB's Genetic Algorithm (to find a global optimum), but with no success, as my function seems to have a ton of local minima.
For the purposes of my problem, I need to justify in some manner the choice of initial parameters.
Without any insight into the model and the likely values of the parameters, the search space is too large for anything feasible. Consider that trying just ten values for each of the ten parameters already corresponds to 10^10, i.e. ten billion, combinations.
There is no magical black box.
You can try Bayesian optimization to find a global optimum of an expensive black-box function. MATLAB describes its implementation, bayesopt, as
Select optimal machine learning hyperparameters using Bayesian optimization
but you can use it to optimize any function. Bayesian Optimization works by updating a prior belief over a distribution of functions with the observed data.
To speed up the optimization I would recommend adding your existing data via the InitialX and InitialObjective input arguments.
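The answer above is about MATLAB's bayesopt; as a rough Python analogue (a sketch assuming the scikit-optimize package is available), gp_minimize takes x0/y0 seed points that play the same role as InitialX/InitialObjective:

from skopt import gp_minimize

def objective(params):
    # Hypothetical expensive black-box model with two parameters.
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2

# Previously evaluated points and their objective values,
# analogous to bayesopt's InitialX / InitialObjective arguments.
x0 = [[0.0, 0.0], [2.0, -1.0]]
y0 = [5.0, 2.0]

result = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0), (-5.0, 5.0)],  # search bounds per parameter
    x0=x0,
    y0=y0,
    n_calls=30,   # overall evaluation budget for the optimizer
)
print(result.x, result.fun)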
In kernelized ELM, they (www.ntu.edu.sg/home/egbhuang/pdf/ELM-Unified-Learning.pdf) mention that a kernel should satisfy Mercer's condition. I didn't find a specific reason for this. Please explain the reason.
The reason is explained here.
Let me quote it:
"Finally, what happens if one uses a kernel which does not satisfy Mercer's condition? In general, there may exist data such that the Hessian is indefinite, and for which the quadratic programming problem will have no solution (the dual objective function can become arbitrarily large). However, even for kernels that do not satisfy Mercer's condition, one might still find that a given training set results in a positive semidefinite Hessian, in which case the training will converge perfectly well. In this case, however, the geometrical interpretation described above is lacking."
Burges (1998)
So with a kernel that does not satisfy Mercer's condition, you lose at least some convergence guarantees (and possibly more, e.g. convergence speed or approximation accuracy when early-stopping)!
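As a quick numerical illustration (my own sketch, not from the paper): Mercer's condition implies that every Gram matrix built from the kernel is positive semidefinite, which you can spot-check via eigenvalues. The sigmoid (tanh) kernel is a well-known example that can violate it:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # 20 random samples, 3 features

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def sigmoid_kernel(X, a=1.0, c=-1.0):
    return np.tanh(a * X @ X.T + c)

for name, K in [("RBF", rbf_kernel(X)), ("sigmoid", sigmoid_kernel(X))]:
    min_eig = np.linalg.eigvalsh(K).min()
    print(f"{name}: smallest Gram-matrix eigenvalue = {min_eig:.4f}")

# The RBF kernel's smallest eigenvalue is >= 0 (up to rounding), while the
# sigmoid kernel's can be clearly negative: the Gram matrix, and hence the
# Hessian of the dual QP, is then indefinite, which is exactly the failure
# mode the quoted passage describes.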
I have a program using the PSO algorithm with a penalty function for constraint satisfaction. But when I run the program, the output of the algorithm at every iteration is:
"Iteration 1: Best Cost = Inf"
.
Does anyone know why I always get an Inf answer?
There could be many reasons for that, and none of these guesses can be accurate unless you provide an MWE with the code you have already tried, or some context for the function you are analysing.
For instance, while studying the PSO algorithm you might first use it on functions which have analytical solutions. By doing this you can study the behaviour of the algorithm before applying it to a similar problem, and fine-tune its parameters.
My guess is that you are not providing either the right function (I have done that before; getting a sign wrong is easy!) or the right constraints (same logic applies), or that your weights for the penalty function and velocity update are way off.
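To make the penalty point concrete, here is a hedged sketch (the objective and constraint are hypothetical stand-ins for yours): if a penalty returns Inf for any violation and no particle in the initial swarm happens to be feasible, the best cost stays Inf forever. A finite, graded penalty instead gives infeasible particles something to improve:

import numpy as np

def constraint_violation(x):
    # Hypothetical constraint g(x) <= 0; replace with your own.
    g = x[0] + x[1] - 1.0
    return max(0.0, g)

def penalized_cost(x, w=1000.0):
    base = np.sum(x ** 2)           # hypothetical objective
    v = constraint_violation(x)
    # Quadratic penalty: finite and growing with the violation, so the
    # swarm can rank infeasible particles and move toward feasibility.
    return base + w * v ** 2

print(penalized_cost(np.array([0.8, 0.9])))  # violates g by 0.7 -> large but finite cost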
When running the GlobalSearch solver on a nonlinear constrained optimization problem I have, I often get very different solutions on each run. For the cases where I have an analytical solution, the numerical results are less dispersed than in the non-analytical cases, but they still differ from run to run. It would be nice to get the same results at least for these analytical cases, so that I know the optimization routine is working properly. Is there a good explanation of this in the Global Optimization Toolbox User Guide that I missed?
Also, why does GlobalSearch use a different number of local solver runs each time?
Thanks!
A full description of how the GlobalSearch algorithm works can be found here.
In summary, the GlobalSearch method iteratively performs local (convex) optimization runs. Basically, it starts by using fmincon to search for a local minimum near the initial conditions you provided. Then a batch of "trial points", whose placement depends on how good that initial result was, is generated using the "scatter search" algorithm. Then there is some more local optimization and a rating of "how good" the minima around these points are.
There are a couple of things that can cause the algorithm to give you different answers:
1. Changing the initial conditions you give it
2. The scatter search algorithm itself
The fact that you are getting different answers each time likely means that your function is highly non-convex. The best thing I know of in this scenario is to run the optimization from several different initial conditions and see which result you get back most frequently.
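In code, that multistart idea looks roughly like this (a sketch in SciPy terms with a made-up objective; MATLAB's GlobalSearch automates essentially the same loop):

import numpy as np
from collections import Counter
from scipy.optimize import minimize

def f(x):
    # Hypothetical non-convex objective with many local minima.
    return np.sin(5 * x[0]) * np.cos(3 * x[1]) + 0.1 * np.sum(x ** 2)

rng = np.random.default_rng(42)
results = []
for _ in range(20):
    x0 = rng.uniform(-3, 3, size=2)   # a fresh random initial condition
    res = minimize(f, x0)
    if res.success:
        results.append((round(res.fun, 3), tuple(np.round(res.x, 2))))

# The local minimum found most often (and with the lowest value)
# is a reasonable candidate for the global minimum.
print(Counter(results).most_common(5))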
It also looks like there is a 'PlotFcns' option which would let you get a better idea of what the functions the solver generates for you look like.
You can use the ga or gamultiobj functions from the same Global Optimization Toolbox; I would recommend this. Convex optimizers won't reliably solve a non-convex problem. Even then, genetic algorithms don't guarantee the global solution. But if you run ga and then use its final minimum as the start of your fmincon search, it should give the same answer consistently. There may be better approaches, but if the search space is unknown you may never know.
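The same "global search, then polish with a local solver" pattern, sketched in SciPy terms with a standard test function (the answer above describes the MATLAB ga + fmincon version):

import numpy as np
from scipy.optimize import differential_evolution, minimize

def f(x):
    # Himmelblau's function: a standard multimodal test problem.
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

bounds = [(-5, 5), (-5, 5)]

# Stage 1: evolutionary global search (the role ga plays above).
coarse = differential_evolution(f, bounds, seed=1, polish=False)

# Stage 2: local refinement from the global result (the role of fmincon).
refined = minimize(f, coarse.x, bounds=bounds)
print(refined.x, refined.fun)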
I am using the Genetic Algorithm in MATLAB to optimize a computationally expensive fitness function which also has constraints. I am currently imposing the constraints as a penalty in the objective function, since constraint violation can only be calculated at the end of the function evaluation. I would rather use nonlcon to satisfy the constraints.
But my problem is that the fitness function evaluation is expensive and I can't afford to repeat it just to check constraint violations. I have seen some nested-function formulations where, using an output function, I can accumulate all the individuals' values for every generation.
What I was thinking is this: would it be possible to have a sort of matrix in which I store all the individuals' values at the beginning of a generation, update that matrix during my fitness evaluations, and then, when nonlcon is called for constraint evaluation, look up the updated matrix for the constraint violations? While trying to implement this, I have some doubts.
1) I remember reading in some forum that the output function for the genetic algorithm can be called either at the beginning of a generation or at the end; by default, it is at the end. I won't be able to execute my method if it is called at the end. Sadly, I have not been able to find how to make the output function run at the beginning of a generation rather than at the end.
2) Since my fitness function is computationally expensive, I am using parallel evaluation. Would it be possible to implement the above idea while using the parallel option in MATLAB, or would that create difficulties?
Are you still looking for an answer? I had a similar problem and solved it here. I use two anonymous functions, fitnessFunction and nonlconFunction, in ga, which both point to my switchOutput function; they just pass an additional flag indicating which output is requested. In switchOutput, the expensive calculation is done on the first call with a specific input set, and the results are stored. If there is another call with the same input set, the stored results are returned.
With this setup it doesn't matter in which order your fitness function and your constraint function are called: for the first call with a new input set the results are calculated, and for any subsequent calls with the same inputs the saved results are returned!
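The linked answer is in MATLAB, but the caching pattern itself is small; here is a minimal Python sketch of it, where expensive_simulation is a hypothetical stand-in for the costly evaluation that produces both the fitness and the constraint values:

import numpy as np

_cache = {}

def _evaluate(x):
    # Run the expensive calculation once per unique input; store both outputs.
    key = tuple(np.asarray(x).ravel())
    if key not in _cache:
        fitness, constraints = expensive_simulation(x)  # hypothetical costly call
        _cache[key] = (fitness, constraints)
    return _cache[key]

def fitness_function(x):
    return _evaluate(x)[0]

def nonlcon_function(x):
    return _evaluate(x)[1]

Whichever of fitness_function or nonlcon_function is called first for a given x pays the full cost, and the other gets the stored result. One caution on the parallel question: each worker process typically keeps its own cache, so the saving only holds if the objective and constraint for the same individual land on the same worker.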