Solving non-convex optimization with global optimization algorithm using MATLAB - matlab

I have a simple unconstrained non-convex optimization problem. Since problems of these type have multiple local minima, I am looking for global optimization algorithm that yields a unique/global minimum. In the internet I came across global optimization algorithms like genetic algorithms, simulated annealing, etc but for solving a simple one variable unconstrained non-convex optimization problem, I think using these high level algorithms doesn't seem to be a good idea. Could anyone recommend me a simple global algorithm for solving such simple one variable unconstrained non-convex optimization problem? I would highly appreciate ideas on this.

"Since problems of these type have multiple local minima". It's not true, the real situation is the following:
Maybe you have one local minimum
Maybe you have infinite set of local miminums
Maybe you have finite number of local minimums
Maybe minimum is not attained
Maybe problem is unbounded below
Also big picture is that there are really true methods which really solve problems (numerically and they slow), but there is a slang to call method which is not nessesary find minumum value of function also call as "solve".
In fact M^n~M for any finite n and any infinite set M. So the fact that you problem has one dimension is nothing. It is still hard as problem with 1000000 parameters which are drawn from the set M from theoretical point of view.
If you interesting how approximately solve problem with known precision epsilon in domain - then split you domain into 1/espsilon regions, sample value(evalute function) at middle point, and select minimum
Method which I will describe below is precise method, and other methods: particle estimation, sequent.convex.programming, alternative direction, particle swarm, Neidler-Mead simplex method, mutlistart gradient/subgradient descend or any descend algorithm like Newton Method or coordinate descend, they all has no gurantess for non-convex problems and some of them even can no be applied if function is nonconvex.
If you interesting in really solve with some precision on function value then then you can take attention into method, which is called branch-and-bound and which truly found minimum, algorithms which you described I don't think so that they solve problem and find minimum in strong sense:
Basic idea of branch and bound - partition domain into convex sets and improve lower/upper bound, in your case it is intervals.
You should have a routine to find upper bound of optimal (min.) value: you can do it e.g. just by sampling subdomain and take smallest or use local optimization method start from random point.
But also you should have lower bound of optimal (min.) value by some principle and this is hard part:
convex relaxation of integer variables to make them real variables
use Lagrange Dual function
use Lipshitc constant on function, etc.
This is sophisticaed step.
If this two values are near - we're done in other case partion or refine partition.
Get info about lower and upper bound of child subproblems and then take min. of upper bounds and min. of lower bounds of children. If child returns more worse lower bound it can be upgraded by parent.
References:
For more great explanation please look into:
EE364B, Lecture 18, prof. Stephen Boyd, Stanford University. It's available on youtube and in ITunes University. If you new to this area I recommend you to look EE263, EE364A, EE364B courses of Stephen P. Boyd. You will love it

Since this is a one dimensional problem, things are easier.
A simple steepest descend procedure may be used as follows.
Suppose the interval of search is a<x<b.
Start the SD from a minimizing your function say f(x). You recover the first minimum Xm1. You should use a fine step, not too large.
Shift this point by adding a positive small constant Xm1+ε. Then maximize f or minimize -f, starting from this point. You get a max of f, you distort it by ε and start from there a minimization, and so on so forth.

Related

What does Default Solver Iteration Means?

I'm trying to understand Unity Physics engine (PhysX), Can somebody explain that what exactly Default Solver Iterations and Default Solver Velocity Iterations are?
This is from Unity documentation :
Default Solver Iterations: Define how many solver processes Unity runs
on every physics frame. Solvers are small physics engine tasks which
determine a number of physics interactions, such as the movements of
joints or managing contact between overlapping Rigidbody components.
This affects the quality of the solver output and it’s advisable to
change the property in case non-default Time.fixedDeltaTime is used,
or the configuration is extra demanding. Typically, it’s used to
reduce the jitter resulting from joints or contacts.
Please provide some example of how it works and how does increase or decreasing it affects the final result?
I asked this question on Unity Forum and Hyblademin answered it:
In mathematics, an iterative solution method is any algorithm which
approximately solves a system of unknown values like [x1, x2, x3 ...
xn] by repeating a set of steps (iterating). Often, the system of
interest is a set of linear equations exactly like those seen in
algebra class but with a prohibitively high number of unknowns.
Starting with a guess for the solution to each unknown, which could be
based on a similar, known system or could be from a common starting
point like [1, 1, 1 ... 1], a procedure is carried out which gives an
approximate solution which will be closer to the exact values. After
only one iteration, the approximation won't be a very good one unless
the initial guess was already close. But the procedure can be repeated
with the first approximation as the new input, which will give a
closer approximation.
After repeating a few more times, we can expect a reliable
approximation. It still isn't exact, which we could confirm by just
plugging in our answers into our original system and seeing that it
isn't quite right (after simplifying, we would end up with things like
10=10.001 or something to that effect). That said, if the
approximation is close enough for our application, we stop iterating
and use it.
These lecture notes courtesy of a Notre Dame course give a nice
example of this in action using the well-known Jacobi method. Carrying
out an iteration of an iterative method outputs an approximation that
is better than the input because the methods are defined in a way that
causes this to happen, and this is a property called convergence. When
looking at why any given method converges, things get abstract pretty
quickly. I think this is outside the scope of your question,
especially since I don't know what method(s) Unity uses anyway.
When physics is calculated in Unity, we end up with a lot of systems
of equations. We could draw a free-body diagram to show forces and
torques during a collision for a given FixedUpdate in a Unity runtime
to show this. We could try to solve them "directly", which means to
use logical relationships to determine the exact results of the values
(like solving for x in algebra class), but even if the systems are on
the simple side, doing a lot of them will slow the execution to a
crawl. Luckily, iterative, "indirect" methods can be used to get a
pretty good approximation at a fraction of the computing cost.
Increasing the number of iterations will lead to more precise
approximate solutions. There is a point where increasing the number of
iterations gives an increase in precision that is not at all worth the
processing overhead of doing another iteration. But the number of
iterations for this point depends on what you need your project to do.
Sometimes a given arrangement of physics objects will result in jitter
with the default settings that might be improved with more solver
iterations, which is mentioned in the manual entry. There isn't a
great way to determine if changing solver iteration counts will
improve behavior or performance in the way that you need, except for
just trial and error (use the Profiler for a more-objective indication
of performance impact).
https://forum.unity.com/threads/what-does-default-solver-iteration-means.673912/#post-4512004

Don't understand the need of "grad" in lrCostFunction.m

Coding the lrCostFunction.m in Octave for the course Machine Learning in Coursera (Neural Networks) "ex3". I don't get why we need to obtain "grad". Anybody has a clue?
Thx in advance
Grad refers to the 'gradient' of the cost function.
Your objective is to minimize the cost function. In order to do that, most optimisation algorithms also need to know the equation that gives its gradient at each point, so that they can use it to move the next search in a direction that makes it more likely that the cost function will be at a lower value.
Specifically, since the gradient at a point is defined as the direction of maximal rate of 'increase' in the underlying function, typically optimisation algorithms use the current point and take a small step in the reverse direction to that indicated by the gradient.
In any case, since you're asking an abstract optimisation algorithm to optimise parameters such that a cost function is minimized by making use of its gradient at each step, you need to provide all of those inputs to the algorithm. Hence why you need to calculate 'grad' value as well as the value of the cost function itself at each point.

Matlab optimization - Variable bounds have "donut hole"

I'm trying to solve a problem using Matlab's genetic algorithm and fmincon functions where the variables' values do not have single upper and lower bounds. Instead, the variables should be allowed to take a value of x=0 or be lb<=x<=ub. This is a turbine allocation problem, where the turbine can either be turned off (x=0) or be within the lower and upper cavitation limits (lb and ub). Of course I can trick the problem by creating a constraint which will violate for values in between 0 and lb, but I'm finding that the problem is having a hard time converging like this. Is there an easier way to do this, which will trim down the search space?
If the number of variables is small enough (say, like 10 or 15 or less) then you can try every subset of variables that are set to be non-zero, and see which subset gives you the optimal value. If you can't make assumptions about the structure of your optimization problem (e.g. you have penalties for non-zero variables but your main objective function is "exotic"), this is essentially the best that you can do. If you are willing to settle for an approximate solution, you can add a so-called "L1" penalty to your objective function which is the sum of a constant times the absolute values of the variables. This will encourage some variables to be zero, and if your main objective function is convex then the resulting objective function will be convex because negative absolute value is convex. It's much easier to optimize convex functions (taking the minimum) because strictly convex functions always have a global minimum that you can reach using any number of optimization routines (including the ones that are implemented in matlab.)

Why doesn't k-means give the global minima?

I read that the k-means algorithm only converges to a local minima and not to a global minima. Why is this? I can logically think of how initialization could affect the final clustering and there is a possibility of sub-optimum clustering, but I did not find anything that will mathematically prove that.
Also, why is k-means an iterative process?
Can't we just partially differentiate the objective function w.r.t. to the centroids, equate it to zero to find the centroids that minimizes this function? Why do we have to use gradient descent to reach the minimum step by step?
Consider:
. c .
. c .
Where c is a cluster centroid. The algorithm will stop, but a better solution is:
. .
c c
. .
With regards to a proof - You don't require a mathematical proof to prove that something isn't always true, you just need a single counter-example, as provided above. You can probably convert the above into a mathematical proof, but this is unnecessary and generally requires a lot of work; even in academia it is accepted to merely give a counter-example to disprove something.
The k-means algorithm is by definition an iterative process, it's simply the way it works. The problem of clustering is NP-hard, thus using an exact algorithm to calculate the centroids would take immensely long.
Don't mix the problem and the algorithm.
The k-means problem is finding the least-squares assignment to centroids.
There are multiple algorithms for finding a solution.
There is an obvious approach to find the global optimum: enumerating all k^n possible assignments - that will yield a global minimum, but in exponential runtime.
Much more attention was put to finding an approximate solution in faster time.
The Lloyd/Forgy algorithm is an EM-style iterative model refinement approach, that is guaranteed to converge to a local minimum simply because there is a finite number of states, and the objective function must decrease in every step. This algorithm runs in O(n*k*i) where i << n usually, but it may find a local minimum only.
The MacQueens method is technically not iterative. It's a single-pass, one-element-at-a-time algorithm that will not even find a local minimum in the Lloyd sense. (You can however run it multiple passes over the data set, until convergence, to get a local minimum too!) If you do a single pass, its in O(n*k), for multiple passes add i. It may or may not take more passes than Lloyd.
Then there is Hartigan and Wong. I don't remember the details, IIRC it was a clever, more lazy, variant of Lloyd/Forgy, so probably in O(n*k*i), too (although probably not recomputing all n*k distances for later iterations?)
You could also do a randomized alogrithm that just tests l random assignments. It probably won't find a minimum at all, but run in "linear" time O(n*l).
Oh, and you can try different random initializations, to improve your chances of finding the global minimum. Add a factor t for the number of trials...

Linear regression, with limits

I have a set of points, (x, y), where each y has an error range y.low to y.high. Assume a linear regression is appropriate (in some cases the data may originally have followed a power law, but has been transformed [log, log] to be linear).
Calculating a best fit line is easy, but I need to make sure the line stays within the error range for every point. If the regressed line goes outside the ranges, and I simply push it up or down to stay between, is this the best fit available, or might the slope need changed as well?
I realize that in some cases, a lower bound of 1 point and an upper bound of another point may require a different slope, in which case presumably just touching those 2 bounds is the best fit.
The constrained problem as stated can have both a different intercept and a different slope compared to the unconstrained problem.
Consider the following example (the solid line shows the OLS fit):
Now if you imagine very tight [y.low; y.high] bounds around the first two points and extremely loose bounds over the last one. The constrained fit would be close to the dotted line. Clearly, the two fits have different slopes and different intercepts.
Your problem is essentially the least squares with linear inequality constraints. The relevant algorithms are treated, for example, in "Solving least squares problems" by Charles L. Lawson and Richard J. Hanson.
Here is a direct link to the relevant chapter (I hope the link works). Your problem can be trivially transformed to Problem LSI (by multiplying your y.high constraints by -1).
As far as coding this up, I'd suggest taking a look at LAPACK: there may already be a function there that solves this problem (I haven't checked).
I know MATLAB has an optimization library that can do constrained SQP (sequential quadratic programming) and also lots of other methods for solving quadratic minimization problems with inequality constraints. The cost function you want to minimize will be the sum of the squared errors between your fit and the data. The constraints are those you mentioned. I'm sure there are free libraries that do the same thing too.