I would like to find an open-source implementation of deterministic annealing. It can be in almost any language: C, C++, MATLAB/Octave, Fortran. I have already found MATLAB code for simulated annealing, so MATLAB would be best. Here is a paper that describes the algorithm.
Deterministic annealing is an optimization technique that attempts to find a global minimum of a cost function. The technique is designed to be able to explore a large portion of the cost surface using randomness, while still performing optimization using local information. The procedure starts with changing the cost function to introduce a notion of randomness, allowing a large area to be explored. Each iteration the amount of randomness (measured by Shannon Entropy [2]) is constrained, and a local optimization is performed. Gradually, the amount of imposed randomness is lowered so that upon termination the algorithm optimizes over the original cost function, yielding a solution to the original problem.
The figures in the paper you link to look like MATLAB figures. I suggest you contact the authors and ask whether they're willing to share their code with you.
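In the meantime, the procedure in the quoted abstract can be sketched on a toy problem. The following is a minimal sketch, not the code from the linked paper: it assumes a one-dimensional clustering cost (squared distance to the nearest centroid), uses a temperature `t` to control how random the soft assignments are, and lowers it geometrically. The function name, the cooling schedule, and all default values are illustrative assumptions.

```python
import math

def deterministic_annealing(data, n_clusters, t_start=10.0, t_end=1e-3,
                            cooling=0.9, inner_iters=50):
    """Soft clustering of 1-D data at a gradually falling temperature."""
    mean = sum(data) / len(data)
    # All centroids start near the data mean; the tiny offsets break the
    # symmetry so that centroids can separate once the temperature is low enough.
    centroids = [mean + 1e-3 * (j - (n_clusters - 1) / 2)
                 for j in range(n_clusters)]
    t = t_start
    while t > t_end:
        # Local optimization at fixed temperature (fixed-point iteration).
        for _ in range(inner_iters):
            sums = [0.0] * n_clusters
            weights = [0.0] * n_clusters
            for x in data:
                d = [(x - c) ** 2 for c in centroids]
                d_min = min(d)
                # Gibbs assignment probabilities; high t means near-uniform
                # (high entropy), low t means nearly hard assignments.
                p = [math.exp(-(dj - d_min) / t) for dj in d]
                z = sum(p)
                for j in range(n_clusters):
                    w = p[j] / z
                    sums[j] += w * x
                    weights[j] += w
            centroids = [sums[j] / weights[j] if weights[j] > 0 else centroids[j]
                         for j in range(n_clusters)]
        t *= cooling  # lower the imposed randomness
    return sorted(centroids)

# Two well-separated groups of points.
print(deterministic_annealing([0.0, 0.1, -0.1, 5.0, 5.1, 4.9], 2))
```

At the final, very low temperature the assignments are effectively hard, so the last iterations optimize the original clustering cost, which mirrors the last sentence of the abstract.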
I'm using recursive least squares (RLS) to identify system parameters for a dynamical system. The RLS algorithm is implemented in discrete time, while the real system is continuous. In practice this is easily done, but how can I simulate these two together? A sequential solution doesn't help, since I want to use the RLS estimate to influence the system input.
The built-in event triggering can only stop integration, if I got that right. Thus, I'd have to stop at each sampling point of the RLS algorithm and then solve the ODE between samples. -> How is this implemented in Simulink?
The only real solution I found was to implement my own RK45 with adaptive step size. It is designed to take discrete and continuous systems (ODEs and difference equations) and integrates with adaptive step size until a new sample has to be taken. This method works like a charm: with slow dynamics and sufficiently small sampling times, only the discrete sample points are evaluated, while fast dynamics yield small integration step sizes, as expected!
Also, the implementation was far less effort than expected and compares surprisingly well to MATLAB's ode45, i.e. lower computational cost, higher accuracy, and fewer oscillations after discrete jumps in the ODE!
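The structure of such a mixed simulation can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described above: a fixed-step RK4 stands in for the adaptive RK45, the plant is a hypothetical first-order system dx/dt = -a*x + b*u with zero-order hold on the input, and the feedback law is chosen only to show the estimate influencing the input. All names and constants are illustrative.

```python
import math

def rk4_step(f, t, x, u, dt):
    """One classical Runge-Kutta step with the input u held constant."""
    k1 = f(t, x, u)
    k2 = f(t + dt / 2, x + dt / 2 * k1, u)
    k3 = f(t + dt / 2, x + dt / 2 * k2, u)
    k4 = f(t + dt, x + dt * k3, u)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rls_update(theta, P, phi, y, lam=1.0):
    """Two-parameter recursive least squares, written out without a matrix library."""
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    K = [Pphi[0] / denom, Pphi[1] / denom]
    err = y - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P = [[(P[i][j] - K[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

def simulate(a=1.0, b=2.0, h=0.1, n_samples=200, substeps=10):
    """Integrate the ODE between sampling instants; update RLS at each sample."""
    f = lambda t, x, u: -a * x + b * u
    x, t = 1.0, 0.0
    theta = [0.0, 0.0]                  # estimate of x[k+1] = th0*x[k] + th1*u[k]
    P = [[1e4, 0.0], [0.0, 1e4]]
    dt = h / substeps
    for k in range(n_samples):
        r = 1.0 if (k // 10) % 2 == 0 else -1.0   # square-wave excitation
        u = r - 0.5 * theta[0] * x                # input depends on the estimate
        x_prev = x
        for _ in range(substeps):                 # continuous part
            x = rk4_step(f, t, x, u, dt)
            t += dt
        theta, P = rls_update(theta, P, [x_prev, u], x)   # discrete part
    return theta

print(simulate(), [math.exp(-0.1), 2.0 * (1.0 - math.exp(-0.1))])
```

The second list printed is the exact zero-order-hold discretization of the assumed plant, which the estimate should approach. Replacing the inner loop with an adaptive integrator that is forced to stop at each sampling instant gives the scheme described in the answer.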
The QP problem is convex. According to Wikipedia, the problem can be solved in polynomial time.
But what exactly is the order?
That is an interesting question with (in my opinion) no clear answer. I am going to assume your problem is convex and that you are interested in run-time complexity (as opposed to iteration complexity).
As you may know, quadprog is not one algorithm but rather a generic name for a solver of quadratic programs. Underneath, it uses one of a set of algorithms, namely Interior Point (the default), Trust-Region, and Active-Set. Source.
Depending upon what you choose, each of these algorithms will have its own complexity analysis. For Trust-Region and Active-Set methods, the complexity analysis is extremely hard. In fact, Active-Set methods are not polynomial in the worst case. Counterexamples exist where Active-Set methods take exponential "time" to converge (this is also true of the Simplex Method for Linear Programs). Source.
Now, assuming that you choose Interior Point methods, the answer is still not straightforward because there are various flavours of these methods. When Karmarkar first proposed this method, it was the first known polynomial algorithm for solving Linear Programs and it had a complexity of O(n^3.5). Source. These bounds were improved quite a lot later. However, this is for Linear Programs.
Finally, to answer your question, Ye and Tse proved in 1989 that we can have an Interior Point method with complexity O(n^3). However, whether MATLAB uses this exact flavor of Interior Point method is a little tricky to know but O(n^3) would be my best guess.
Of course, my answer is rather theoretical; if you want to empirically test it out, you can do so by gradually increasing the number of variables and plotting the CPU time required to get an estimate.
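The empirical test suggested above amounts to fitting a straight line to log(time) against log(n); the slope is the estimated exponent. The following is a minimal sketch of that idea. The function names and the `solver` callback are hypothetical placeholders, not part of any toolbox; `solver(n)` stands for whatever builds and solves a problem with n variables.

```python
import math
import time

def fit_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def time_solver(solver, sizes, repeats=3):
    """Best-of-`repeats` wall-clock time of solver(n) for each problem size n."""
    results = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            solver(n)
            best = min(best, time.perf_counter() - start)
        results.append(best)
    return results

# Synthetic timings that grow exactly like n^3 give a slope of about 3.
sizes = [100, 200, 400, 800]
print(fit_exponent(sizes, [2e-9 * n ** 3 for n in sizes]))
```

With real measurements, `fit_exponent(sizes, time_solver(solver, sizes))` gives the estimate. Small problem sizes are usually dominated by overhead, so the fit is more meaningful when restricted to the larger sizes.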
I wrote a basic O(n^2) algorithm for a nearest neighbor search. As usual Matlab 2013a's knnsearch(..) method works a lot faster.
Can someone tell me what kind of optimization they used in their implementation?
I am okay with reading any documentation or paper that you may point me to.
PS: I understand the documentation on the site mentions the paper on kd trees as a reference. But as far as I understand kd trees are the default option when column number is less than 10. Mine is 21. Correct me if I'm wrong about it.
The biggest optimization MathWorks have made in implementing nearest-neighbors search is that all the hard stuff is implemented in a MEX file, as compiled C, rather than MATLAB.
With an algorithm such as kNN that (in my limited understanding) is quite recursive and difficult to vectorize, that's likely to give such an improvement that the O() analysis will only be relevant at pretty high n.
In more detail, under the hood the knnsearch command uses createns to create a NeighborSearcher object. By default, when X has 10 or fewer columns, this will be a KDTreeSearcher object, and when X has more than 10 columns it will be an ExhaustiveSearcher object (both KDTreeSearcher and ExhaustiveSearcher are subclasses of NeighborSearcher).
All objects of class NeighborSearcher have a method knnsearch (which you would rarely call directly, using the convenience command knnsearch instead). The knnsearch method of KDTreeSearcher calls straight out to a MEX file for all the hard work. This lives in matlabroot\toolbox\stats\stats\@KDTreeSearcher\private\knnsearchmex.mexw64.
As far as I know, this MEX file performs pretty much the algorithm described in the paper by Friedman, Bentley, and Finkel referenced in the documentation page, with no structural changes. As the title of the paper suggests, this algorithm takes expected O(log(n)) time per query, so searching for the neighbors of all n points costs roughly O(n log(n)) rather than O(n^2). Unfortunately, the contents of the MEX file are not available for inspection to confirm that.
The code builds a KD-tree space-partitioning structure to speed up nearest neighbor search. Think of it like building the indexes commonly used in RDBMSs to speed up lookup operations.
In addition to nearest-neighbor searches, this structure also speeds up range searches, which find all points that are within a distance r of a query point.
As pointed out by @SamRoberts, the core of the code is implemented in C/C++ as a MEX-function.
Note that knnsearch chooses to build a KD-tree only under certain conditions, and falls back to an exhaustive search otherwise (by naively searching all points for the nearest one).
Keep in mind that in cases of very high-dimensional data (and few instances), the algorithm degenerates and is no better than an exhaustive search. In general, once the dimension grows beyond about d>30, searching a KD-tree ends up visiting almost all the points, and could even become worse than a brute-force search due to the overhead involved in building the tree.
There are other variations of the algorithm that deal with high dimensions, such as ball trees, which partition the data into a series of nested hyper-spheres (as opposed to partitioning the data along Cartesian axes like KD-trees). Unfortunately those are not implemented in the official Statistics toolbox. If you are interested, here is a paper which presents a survey of available kNN algorithms.
(The above is an illustration of searching a kd-tree partitioned 2d space, borrowed from the docs)
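To make the difference between the two strategies concrete, here is a minimal sketch of a KD-tree nearest-neighbor query next to an exhaustive search. It is an illustration of the general technique only, not the contents of the MEX file; the function names and the tuple-based tree layout are assumptions made for this example.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, depth=0):
    """Recursively split on the median, cycling through the coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Descend towards the query, then backtrack only where a closer point could hide."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or sqdist(point, query) < sqdist(best, query):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # The far side can only help if the splitting plane is closer than the best match.
    if diff ** 2 < sqdist(best, query):
        best = nearest(far, query, best)
    return best

def nearest_brute(points, query):
    """Exhaustive search: check every point."""
    return min(points, key=lambda p: sqdist(p, query))

random.seed(0)
points = [tuple(random.random() for _ in range(3)) for _ in range(200)]
tree = build_kdtree(points)
query = (0.5, 0.5, 0.5)
print(nearest(tree, query), nearest_brute(points, query))
```

The pruning test in `nearest` is what degrades in high dimensions: the splitting plane is almost always closer than the current best match, so nearly every branch gets visited.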
I'm using the genetic algorithm from the MATLAB Global Optimization Toolbox with SimEvents, in order to implement a mixed integer optimization making use of simulation outputs to evaluate the fitness function. My model is pretty similar to the one described in this video from MathWorks website:
http://www.mathworks.it/videos/optimizing-manufacturing-production-processes-68961.html
Reading the documentation, I found that ga can solve constrained problems only if such constraints are linear inequalities. The constraints are supposed to be written as functions of the problem's variables, that in this case are the number of resources used during the simulation.
I would like, instead, to set a constraint that takes into account another simulation output (e.g. the drain utilization), i.e. minimize
objfun = backlog*10000 + cost
where backlog is a simulation output (obtained using simOut.get), considering the following constraint:
drain_utilization > 0.7
where drain_utilization is another simulation output (again, obtained using simOut.get).
Is this possible, or is this feature not supported by the Global Optimization Toolbox?
Thank you in advance and forgive me for any improper term, but I'm new to the Global Optimization Toolbox.
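Independently of what the toolbox supports natively, a constraint on a simulation output can be folded into the fitness value as a penalty term. The following is a minimal sketch of that generic technique, written in Python for illustration only. The function name, the `penalty_weight` value, and the assumption that all three quantities are already available as plain numbers after a simulation run are hypothetical.

```python
def penalized_fitness(backlog, cost, drain_utilization,
                      min_utilization=0.7, penalty_weight=1e6):
    """Objective from the question plus a penalty for drain_utilization below 0.7."""
    objective = backlog * 10000 + cost
    # Zero when the constraint holds, growing linearly with the shortfall otherwise.
    violation = max(0.0, min_utilization - drain_utilization)
    return objective + penalty_weight * violation

print(penalized_fitness(backlog=2, cost=500, drain_utilization=0.8))
print(penalized_fitness(backlog=2, cost=500, drain_utilization=0.6))
```

The penalty weight has to be large relative to the typical range of the objective, otherwise the optimizer may prefer a slightly infeasible solution with a lower backlog.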
After profiling my neural network code, I've realized that the method which computes the weight changes for each arc in the network (-rate*gradient + momentum*previous_delta - decay*rate*weight), given an already computed gradient, is the bottleneck (55% of inclusive samples).
Is there any trick to compute these values in an efficient manner?
This is normal behaviour. I am assuming that you are using an iterative process to solve the weights at each evolution step (such as backpropagation?). If the number of neurons is large and the training (back-testing) algorithm is short, then it is normal that weight mutation such as this will consume a larger fraction of compute time during training of the neural network.
Did you get this result using a simple XOR problem or similar? If so, you will probably find that once you start to solve more complex problems (such as pattern detection in multidimensional arrays, image processing, etc.), those functions will consume an insignificant fraction of compute time.
If you are profiling, I would suggest you profile with a problem that is closer to the purpose for which the neural network is designed (I am guessing you didn't design it to solve XOR or play tic tac toe) and you will probably find that optimising code such as -rate*gradient + momentum*previous_delta - decay*rate*weight is more or less a waste of time, at least this is my experience.
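For reference, the update rule quoted above can be computed for all weights in a single pass. The sketch below is an illustration only: the function name and the flat-list layout of the weights are assumptions, and the only change to the arithmetic is factoring out the shared `rate`, which is algebraically identical to the quoted expression.

```python
def weight_deltas(gradients, previous_deltas, weights, rate, momentum, decay):
    """Return -rate*gradient + momentum*previous_delta - decay*rate*weight per weight."""
    # -rate*g - decay*rate*w is rewritten as -rate*(g + decay*w): one multiply fewer.
    return [-rate * (g + decay * w) + momentum * d
            for g, d, w in zip(gradients, previous_deltas, weights)]

print(weight_deltas([0.5, -0.2], [0.1, 0.0], [2.0, -1.0],
                    rate=0.1, momentum=0.9, decay=0.01))
```

Saving one multiplication per weight is a small gain, which is consistent with the point above that this line is rarely worth optimising by itself.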
If you do find that this code is compute-intensive in real-world applications, then I would suggest trying to reduce the number of times this line of code is executed via structural changes. Neural network optimization is a rich field and I can't possibly give you useful advice from such a broad question, but I will say that if your program is unusually slow, you're unlikely to see significant improvements by tinkering with such low-level code. I will however suggest the following from my own experience:
Consider parallelisation. Many search algorithms, such as those implemented in back-propagation techniques, are amenable to parallel attempts to improve convergence. As weight adjustments are identical in terms of computational demand for a given network, think static loops in OpenMP.
Modify the convergence criterion (the critical convergence rate before you stop adjusting the weights) to perform fewer of these calculations.
Consider an alternative to deterministic solutions such as back-propagation, which are slightly more prone to getting stuck in local optima anyway. Consider gaussian mutation (all things being equal, gaussian mutation will 1) reduce time spent on mutation relative to backtesting, 2) increase convergence time, and 3) be less prone to getting caught in local minima of the error search space).
Please note that this is a non-technical answer to what I have interpreted as a non-technical question.