Computing the Analytic Center of a Polytope in Python - scipy

I have a polytope described by a set of inequalities and equalities that are given as numpy arrays. These arrays will usually be quite large, as they describe the LP-relaxation of an MIPLIB problem. I want to compute the analytic center of this polytope. If possible, I'd like to do this directly from my Python code. However, I am also open to solutions that (for instance) write the polytope description to an MPS file and use another tool to compute the analytic center.
So far, I have tried implementing a primal-dual Newton algorithm as described in "Interior Point Algorithms: Theory and Analysis" by Yinyu Ye. However, this method is much too slow.
Another approach I tried was using a path-following interior-point method with a vanishing objective. To this end, I used linprog from scipy. However, evaluation on simple examples shows that method="interior-point" gives an interior point that is different from the analytic center, even when I set options={"presolve": False, "rr": False, "autoscale": False} to prevent modifications to the polytope description. Using method="highs-ipm" will give a vertex solution.
I'd be happy to hear about any ideas.

If your polytope is described as { x | a_i^T x <= b_i }, then finding the analytic center amounts to minimizing the function f(x) = - sum_i log(b_i - a_i^T x).
Function f is convex and self-concordant. I would recommend applying damped Newton steps, which should converge in a few iterations. The main computational effort will be solving the Newton linear system.
For a description of Newton's method with damped steps, see section 9.5.2 in https://web.stanford.edu/~boyd/cvxbook/ (Convex Optimization, Boyd and Vandenberghe) or https://francisbach.com/self-concordant-analysis-newton/.
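For what it's worth, here is a minimal NumPy sketch of that damped Newton iteration for the inequality-only description (it assumes a strictly feasible starting point x0 with A @ x0 < b, and uses the 1/(1+lambda) damped step from the self-concordant analysis; equality constraints would instead need an equality-constrained Newton step, i.e. a KKT system, in place of the plain solve):

import numpy as np

def analytic_center(A, b, x0, tol=1e-8, max_iter=100):
    # Damped Newton on f(x) = -sum(log(b - A @ x)); x0 must satisfy A @ x0 < b.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = b - A @ x                              # slacks, must stay positive
        g = A.T @ (1.0 / s)                        # gradient of the log barrier
        H = A.T @ ((1.0 / s**2)[:, None] * A)      # Hessian: A^T diag(1/s^2) A
        dx = np.linalg.solve(H, -g)                # Newton direction
        lam = np.sqrt(dx @ (H @ dx))               # Newton decrement
        if lam**2 / 2 < tol:
            break
        x = x + dx / (1.0 + lam)                   # damped step, no line search needed
    return x

For MIPLIB-sized instances A will be sparse, so the dense np.linalg.solve above should be replaced by a sparse factorization (e.g. scipy.sparse.linalg.spsolve applied to H).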

Related

about backpropagation and sigmoid function

I have been reading this ebook about ANN: https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf
and got a doubt about the effect of the sigmoid function when calculating the errorB. The text says that if I have a threshold neuron I can use:
Target-Output
but because I have a sigmoid function involved I should add:
Output(1-Output)
and end up with:
ErrorB=OutputB(1-OutputB)(TargetB-OutputB)
I mean, why should I add the O(1-O) part? I have tried it with different values, but I really do not get the intuition for why it should be that way.
Any help?
Thanks
As Kelu stated, that part of the equation is based on the derivative of your transfer function (in this case the sigmoid). To understand why you need derivatives, you need to understand how the delta rule works (*):
Your overall goal is to minimize the error in the network's output using gradient descent. Gradient descent itself tries to find a minimum of the error function (E) by taking steps proportional to the negative of the gradient. A gradient is simply the derivative, and the reason you're working with derivatives mathematically is that gradients point in the direction of the greatest rate of increase of the (error) function. Conclusion: since you want to minimize the error, you go in the opposite direction of the gradient.
This is the intuitive reason for using gradients. If you want the mathematical derivation, you should check this basic wiki article (an additional note, since it's not mentioned anywhere there: the g'(x) in the article is the first derivative of g(x)).
Other transfer functions can be used, e.g. linear (in this case there is no g'(x) term as the derivative is simply a constant) or hyperbolic tangent in which case the derivative is something different again.
(*) The equation is derived by starting from the squared error of the output, E = 1/2 (Target - Output)^2, and minimizing it with respect to the weights; applying the chain rule brings in the derivative of the sigmoid, which is where the Output(1 - Output) factor comes from.
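A quick numerical check (a toy snippet, not from the referenced ebook) that Output(1-Output) really is the derivative of the sigmoid, and how the ErrorB term is formed:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

net = 0.3                 # made-up net input to the output unit B
output = sigmoid(net)
target = 1.0

# d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), so written in terms of the
# unit's output the derivative is simply Output * (1 - Output):
error_b = output * (1 - output) * (target - output)
print(error_b)            # the ErrorB term from the question

# sanity check against a numerical derivative of the sigmoid
eps = 1e-6
numeric = (sigmoid(net + eps) - sigmoid(net - eps)) / (2 * eps)
print(np.isclose(numeric, output * (1 - output)))   # True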
It is like that because Output(1-Output) is the derivative of the sigmoid function (simplified). In general, this part is based on derivatives: you can try a different transfer function (other than the sigmoid), but then you have to use its derivative as well for the learning to work properly.
If you want, you can take a look at my implementation (it's far from perfect, but maybe you will get some ideas from it ;)); it's a simple project I made at my university - https://github.com/kelostrada/neuron-network

How calculating hessian works for Neural Network learning

Can anyone explain to me, in an easy and less mathematical way, what a Hessian is and how it works in practice when optimizing the learning process for a neural network?
To understand the Hessian you first need to understand the Jacobian, and to understand the Jacobian you need to understand the derivative.
The derivative is a measure of how fast the function value changes with a change of the argument. So if you have the function f(x)=x^2 you can compute its derivative and know how fast f(x+t) changes for small enough t. This gives you knowledge about the basic dynamics of the function.
The gradient shows you, for multidimensional functions, the direction of the biggest value change (which is based on the directional derivatives). So given a function, e.g. g(x,y)=-x+y^2, you know that it is better to minimize the value of x while strongly maximizing the value of y. This is the basis of gradient-based methods, like the steepest descent technique (used in traditional backpropagation).
The Jacobian is yet another generalization, for the case where your function has many output values, like g(x,y)=(x+1, xy, x-y): you now have 2*3=6 partial derivatives, one gradient per output value (one for each of the 3 components), which together form a 3x2 matrix.
Now, the derivative shows you the dynamics of the function itself. But you can go one step further: if you can use this dynamics to find the optimum of the function, maybe you can do even better by finding out the dynamics of this dynamics, i.e. by computing second-order derivatives. This is exactly what the Hessian is: it is the matrix of second-order derivatives of your function. It captures the dynamics of the derivatives, so how fast (and in what direction) the change itself changes. It may seem a bit complex at first sight, but if you think about it for a while it becomes quite clear. You want to go in the direction of the gradient, but you do not know "how far" (what the correct step size is). And so you define a new, smaller optimization problem, where you are asking "OK, I have this gradient, how far should I go?" and solve it analogously, using derivatives (and the derivatives of the derivatives form the Hessian).
You may also look at this in a geometrical way - gradient-based optimization approximates your function with a line. You simply try to find the line which is closest to your function at the current point, and it defines the direction of change. Now, lines are quite primitive; maybe we could use some more complex shapes like... parabolas? Second-derivative, Hessian-based methods try to fit a parabola (a quadratic function, f(x)=ax^2+bx+c) to your current position, and based on this approximation, choose a valid step.
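A tiny toy comparison (made-up quadratic function, not from the answer) of a plain gradient step with a Newton step that uses the Hessian:

import numpy as np

# toy function: f(x, y) = (x - 1)**2 + 10*(y + 2)**2, a stretched bowl with minimum at (1, -2)
def grad(p):
    x, y = p
    return np.array([2 * (x - 1), 20 * (y + 2)])

def hessian(p):
    return np.array([[2.0, 0.0],
                     [0.0, 20.0]])   # constant, since f is exactly quadratic

p = np.array([5.0, 5.0])

# gradient descent: we know the direction, but the step size is a guess
gd_step = p - 0.02 * grad(p)

# Newton: fit the local parabola (quadratic model) and jump straight to its minimum
newton_step = p - np.linalg.solve(hessian(p), grad(p))

print(gd_step)       # [4.84, 2.2] -- moved a little, many such steps still needed
print(newton_step)   # [1., -2.] -- lands on the minimum in one step, since f is quadratic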

Scipy/Python indirect spline interpolation

I need to fit data in quite an indirect way. The original data to be recovered in the fit is some linear function with small oscillations and drifts on it, which I would like to identify. Let's call this f(t). We cannot record this parameter in the experiment directly, but only indirectly, let's say as g(f) = sin(a f(t)). (The real transfer function is more complex, but it should not play a role here.)
So when f(t) changes direction near the turning points of the sin function, it is difficult to identify, and I tried an alternative approach to recovering f(t), rather than just applying the inverse of g and making guesses to keep the data continuous:
I create a model function fm(t), pass it through the same, known transfer function g(), and fit g(fm(t)) to the data. As the dataset is huge, I do this piecewise for successive chunks of data, guaranteeing the continuity of fm across the whole set.
A first try was to use linear functions using the optimize.leastsq, where the error estimate is derived from g(fm). It is not completely satisfactory, and I think it would be far better to fit a spline to the data to get fspline(t) as a model for f(t), guaranteeing the continuity of the data and of its derivative.
The problem with it is, that spline fitting from the interpolate package works on the data directly, so I can not wrap the spline using g(fspline) and do the spline interpolation on this. Is there a way this can be done in scipy?
Any other ideas?
I tried quadratic functions, fixing the offset and slope to match those of the preceding fitted chunk of data, so there is only one fitting parameter, the curvature, which very quickly starts to deviate.
Thanks
What you would need is a matrix of spline basis functions, b(t), so you can approximate f(t) as a linear combination of spline basis functions:
f(t) = np.dot(b(t), coefs)
and then estimate the coefficients, coefs, by optimize.leastsq.
However, spline basis functions are not readily available in python, as far as I know (unless you borrow experimental scripts or search through the code of some packages).
Instead you could also use polynomials, for example
b(t) = np.polynomial.chebyshev.chebvander(t, order)
and use a polynomial approximation instead of the splines.
The structure of this problem is very similar to generalized linear models where g is your known link function and similar to index problems in econometrics.
It would be possible to use the scipy splines in an indirect way if you create artificial data
y_i = f(t_i)
where f(t_i) are scipy.interpolate splines, and the y_i are the parameters to be estimated in the least squares optimization. (Loosely based on a script that I saw some time ago that used this for creating a different kind of smoothing splines than the scipy version. I don't remember where I saw this.)
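A rough sketch of the polynomial variant on toy data (the sin link and a_known below are stand-ins for the real transfer function g, and the starting guess plays the role of a crude model of f):

import numpy as np
from scipy import optimize

# toy data: f(t) is a slow drift, observed only through g(f) = sin(a*f)
a_known = 3.0
t = np.linspace(0, 10, 500)
f_true = 0.2 * t + 0.05 * np.sin(2 * np.pi * 0.3 * t)
data = np.sin(a_known * f_true) + 0.01 * np.random.randn(t.size)

deg = 7
B = np.polynomial.chebyshev.chebvander(2 * t / t.max() - 1, deg)   # basis on [-1, 1]

def residuals(coefs):
    f_model = B @ coefs                          # f(t) as a linear combination of basis columns
    return np.sin(a_known * f_model) - data      # compare to the data through the link g

# starting guess from a crude model of f (in practice: the linear model of the previous chunk)
coefs0 = np.linalg.lstsq(B, 0.2 * t, rcond=None)[0]
coefs = optimize.leastsq(residuals, coefs0)[0]
f_fit = B @ coefs

Note the sin link makes this non-convex, so the quality of the starting guess (the chunk-wise continuity described in the question) matters a lot.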
Thank you for these comments. I tried out the polynomial basis suggested above, but polynomials are not an option for my needs, as they tend to create ringing, which is difficult to condition.
The solution on using splines I now found is quite simple and straightforward, and I think it is what you meant by "using the splines in an indirect way".
The fitting function f(t) is obtained with the interpolate.splev(x, (t,c,k)) function, but with the spline coefficients c provided by the optimize.leastsq routine. In this way, f(t) is not a direct spline fit (as one would usually obtain with the splrep(x, y) function) but is optimized indirectly in the fit, and therefore it is possible to apply the link function g to it. The initial guess for c can be obtained by one evaluation of splrep(xinit, yinit, t=knots) on model data.
One trick is to restrict the number of knots for the spline to well below the number of data points, by specifying them explicitly in the call to splrep() and passing this reduced set to splev() for the evaluation.
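A minimal sketch of this approach on toy data (the sin(a*f) link and a_known are stand-ins for the real transfer function; the initial coefficients come from one splrep call on model data, as described above):

import numpy as np
from scipy import interpolate, optimize

# toy problem: f(t) is observed only through g(f) = sin(a*f)
a_known = 3.0
x = np.linspace(0, 10, 2000)
f_true = 0.2 * x + 0.05 * np.sin(2 * np.pi * 0.3 * x)
data = np.sin(a_known * f_true) + 0.01 * np.random.randn(x.size)

# far fewer knots than data points, specified explicitly
knots = np.linspace(x[0], x[-1], 12)[1:-1]

# one splrep call on model data (here: a crude guess of f) to get the knot vector,
# the degree k and an initial coefficient vector c0
f_model0 = 0.2 * x
t_knots, c0, k = interpolate.splrep(x, f_model0, t=knots)
n_free = len(t_knots) - k - 1            # splev ignores the trailing padding coefficients

def residuals(c_free):
    c = np.r_[c_free, np.zeros(k + 1)]
    f_spline = interpolate.splev(x, (t_knots, c, k))
    return np.sin(a_known * f_spline) - data      # compare through the link g

c_fit = optimize.leastsq(residuals, c0[:n_free])[0]
f_recovered = interpolate.splev(x, (t_knots, np.r_[c_fit, np.zeros(k + 1)], k))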

Linear least-squares fit with constraint - any ideas?

I have a problem where I am fitting a high-order polynomial to (not very) noisy data using linear least squares. Currently I'm using polynomial orders around 15 - 25, which work surprisingly well: the dependence is very nearly linear, but the accuracy of modelling the 'very nearly' is critical. I'm using Matlab's polyfit() function, and (obviously) normalising the x-data. This generally works fine, but I have come across an issue with some recent datasets. The fitted polynomial has extrema within the x-data interval. For the application I'm working on this is a no-no. The polynomial model must have no stationary points over the x-interval.
So I need to add a constraint to the least-squares problem: the derivative of the fitted polynomial must be strictly positive over a known x-range (or strictly negative - this depends on the data but a simple linear fit will quickly tell me which it is.) I have had a quick look at the available optimisation toolbox functions, but I admit I'm at a loss to know how to go about this. Does anyone have any suggestions?
[I appreciate there are probably better models than polynomials for this data, but in the short term it isn't feasible to change the form of the model]
[A closing note: I have finally got the go-ahead to replace this awful polynomial model! I am going to adopt a nonparametric approach, spline smoothing, using the excellent SPLINEFIT code by Jonas Lundgren. This has the advantage that I'm already using a spline model in the end-user application, so I already have C# code available to evaluate a spline model]
You could use cftool and use the exclude data points option.
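The question is about MATLAB, but to make the constrained formulation concrete, here is a rough Python/SciPy sketch of the same idea: minimise the usual sum of squares subject to the fitted polynomial's derivative being positive on a grid over the x-range (the toy data, degree, grid and margin below are all made up). In MATLAB itself, lsqlin from the Optimization Toolbox handles exactly this kind of linearly constrained least-squares problem.

import numpy as np
from scipy.optimize import minimize, LinearConstraint

# toy data: very nearly linear, with x already normalised as in the question
x = np.linspace(-1, 1, 200)
y = x + 0.01 * np.sin(3 * x) + 1e-3 * np.random.randn(x.size)

deg = 15
V = np.polynomial.polynomial.polyvander(x, deg)                   # design matrix of the fit
Dpart = np.polynomial.polynomial.polyvander(x, deg - 1) * np.arange(1, deg + 1)
D = np.hstack([np.zeros((x.size, 1)), Dpart])                     # D @ c = p'(x) on the grid

def sse(c):
    r = V @ c - y
    return r @ r

def sse_grad(c):
    return 2 * V.T @ (V @ c - y)

c0 = np.polynomial.polynomial.polyfit(x, y, deg)                  # unconstrained start
cons = LinearConstraint(D, 1e-6, np.inf)                          # p'(x) >= small positive margin
res = minimize(sse, c0, jac=sse_grad, constraints=[cons], method='trust-constr')
coefs = res.x                                                     # constrained polynomial coefficients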

Integration with matlab

I want to solve this problem:
(the integral is shown in this image: http://img265.imageshack.us/img265/6598/greenshot20100727091025.png)
I don't want to use "int"; I want to use the "quad" family (quad, dblquad, triplequad), but I can't get it to work.
Can you help me?
I assume that your real problem is more complex than this trivial one. The best solution is just to use a symbolic integral. Why is numerical integration difficult?
Numerical integration in ONE dimension typically requires on the order of say 100 function evaluations. (The exact number will be very dependent on the accuracy required, the limits, etc.) This makes a 2-d integral typically require on the order of 100^2 = 10000 function evals. So an adaptive, 5-d integral will require on the order of 100^5 = 1e10 function evaluations. (This is only a very rough order of magnitude estimate here.) My point is, you simply don't want to do that!
Better is to reduce the problem in complexity. If your integral is separable (as is this one) then do so! Reduce a 5-d problem into multiple 1-d problems.
Also, in many cases I see people wanting to do a numerical integration of a Gaussian PDF. See that this is easily solved using a call to erf or erfc, coupled with a transformation. The point is that in many cases special functions are defined to greatly reduce the complexity of a problem.
I should add that in many cases, the key to solving a difficult problem in mathematics is to use mathematics to reduce the problem to something simpler. If you can find a way to reduce the dimensionality of your problem just a bit, it will become much more tractable.
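To illustrate the separability and erf points numerically (in Python/SciPy here purely for illustration; the same idea carries over to MATLAB's quad and erf):

import numpy as np
from scipy import integrate
from scipy.special import erf

# the integrand exp(-x**2) * exp(-y**2) is separable, so the 2-d integral over a box
# is just a product of two 1-d integrals -- and each of those is really just erf
ax, bx, ay, by = 0.0, 1.0, -1.0, 2.0

brute = integrate.dblquad(lambda y, x: np.exp(-x**2) * np.exp(-y**2),
                          ax, bx, lambda x: ay, lambda x: by)[0]

ix = integrate.quad(lambda x: np.exp(-x**2), ax, bx)[0]
iy = integrate.quad(lambda y: np.exp(-y**2), ay, by)[0]

closed_form = (np.sqrt(np.pi) / 2)**2 * (erf(bx) - erf(ax)) * (erf(by) - erf(ay))

print(brute, ix * iy, closed_form)   # all three agree; the last needs no quadrature at all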
The integral you show is:
- analytically solvable: always do analytically what you can;
- equal to a number: constant expressions should be eliminated from numerical calculations;
- not easy to compute in MATLAB (or at least not very accurately).
You can use cumtrapz to integrate over each variable alone, and call trapz for the final integration. Remember that this will blow up the error on any problem that is more complicated than a simple sum of linear functions.
Mathematica is more suited to nD integrations, if you have access to that.
MATLAB can do symbolic integration:
>> x = sym('x'); y = sym('y'); z = sym('z'); u = sym('u'); v = sym('v');
>> int(int(int(int(int(x+y+z+u+v,1,5),-2,3),0,1),-1,1),0,1)
ans =
180
Just noticed you want to do numeric, not symbolic, integration.
If you look at the source of dblquad and triplequad
>> edit dblquad
you will see that they just call the lower-dimensional versions.
It should be possible for you to add a quadquad and a quintquad (or, recursively, an n-quad).
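For what it's worth, here is the same recursive idea written in Python/SciPy just to show the structure, nesting 1-d quad calls the way the answer above describes; as noted earlier in the thread, the cost grows exponentially with the dimension, so this is only practical for low dimensions or cheap integrands:

from scipy.integrate import quad

def nested_quad(f, bounds):
    # integrate f(x1, ..., xn) over the box bounds = [(a1, b1), ..., (an, bn)]
    # by recursively nesting 1-d quad calls
    if len(bounds) == 1:
        return quad(f, *bounds[0])[0]
    return nested_quad(
        lambda *rest: quad(lambda x0: f(x0, *rest), *bounds[0])[0],
        bounds[1:],
    )

# small check on a 3-d cousin of the integrand above, using the first three limits
print(nested_quad(lambda x, y, z: x + y + z, [(1, 5), (-2, 3), (0, 1)]))   # 80.0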