Spark (Scala) - how to optimize objective function parameters

I have a function f going from R^2 to R: it takes two parameters (a and b) and returns a scalar.
I would like to use an optimizer to estimate the values of a and b for which the value returned by f is maximized (or minimized, I can work with -f).
I have looked into the LBFGS optimizer from mllib, see:
the doc at https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS and https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS$
an example for logistic regression at https://spark.apache.org/docs/2.1.0/mllib-optimization.html
My issue is that I am not sure I fully understand how this optimizer works.
The optimizers I have seen before in Python and R usually expect the following: an implementation of the objective function, a set of initial values for its parameters and, optionally, additional arguments for the objective function, boundaries for the domain within which the parameters should be searched, and so on.
Usually, the optimizer iteratively invokes the function starting from the initial parameters provided by the user, computing a gradient, until the value returned by the objective function (the loss) has converged. It then returns the best set of parameters and the corresponding value of the objective function. Pretty standard stuff.
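To make that loop concrete, here is a minimal, library-free sketch in Scala (the step size, tolerance and all names are made up for illustration, not any particular library's API):

// Generic optimizer pattern: start from user-supplied parameters,
// step along the negative gradient, stop once the objective stops changing.
def minimize(
    f: Array[Double] => Double,           // objective function
    grad: Array[Double] => Array[Double], // its gradient
    init: Array[Double],                  // initial parameter values
    lr: Double = 0.1,                     // step size (arbitrary)
    tol: Double = 1e-9,
    maxIter: Int = 1000): Array[Double] = {
  var x = init.clone()
  var prev = f(x)
  var converged = false
  var iter = 0
  while (iter < maxIter && !converged) {
    val g = grad(x)
    x = x.zip(g).map { case (xi, gi) => xi - lr * gi } // descend
    val cur = f(x)
    converged = math.abs(prev - cur) < tol             // loss has converged
    prev = cur
    iter += 1
  }
  x
}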
In this case, I see org.apache.spark.mllib.optimization.LBFGS.runLBFGS expects to be given an RDD of labeled data and a gradient.
What is this data RDD argument the optimizer is expecting?
Is the argument gradient an implementation of the gradient of the objective function?
If I am to code my own gradient for my own objective function, how should the loss be calculated (as the ratio of the values returned by the objective function at iterations n and n-1)?
What is the argument initialWeights? Is it an array containing the initial values of the parameters to be optimized?
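For reference, the full signature in the linked 2.1.0 scaladoc is as follows (the per-argument notes are my reading of the docs):

def runLBFGS(
    data: RDD[(Double, Vector)], // (label, features) pairs, i.e. a training set
    gradient: Gradient,          // computes (gradient, loss) for one example at given weights
    updater: Updater,            // applies the step, including any regularization
    numCorrections: Int,         // history size of the L-BFGS approximation
    convergenceTol: Double,
    maxNumIterations: Int,
    regParam: Double,
    initialWeights: Vector       // the initial values of the parameters being optimized
): (Vector, Array[Double])       // (final weights, loss at each iteration)

Note that the API is framed around labeled training data, which is why it does not directly fit a free-standing function of (a, b).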
Ideally, would you be able to provide a very simple code example showing how a simple objective function can be optimized using org.apache.spark.mllib.optimization.LBFGS.runLBFGS?
Finally, could Breeze be an interesting alternative? https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/package.scala
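Breeze's LBFGS can minimize a plain function directly, with no RDD involved; a minimal sketch, assuming Breeze is on the classpath (the objective and the starting point below are made up):

import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, LBFGS}

// Made-up objective: f(a, b) = (a - 3)^2 + (b + 1)^2, minimum at (3, -1).
// To maximize instead, return (-value, -gradient).
val f = new DiffFunction[DenseVector[Double]] {
  def calculate(x: DenseVector[Double]): (Double, DenseVector[Double]) = {
    val (a, b) = (x(0), x(1))
    val value = (a - 3) * (a - 3) + (b + 1) * (b + 1)
    val gradient = DenseVector(2 * (a - 3), 2 * (b + 1))
    (value, gradient)
  }
}
val optimizer = new LBFGS[DenseVector[Double]](maxIter = 100, m = 7)
val optimum = optimizer.minimize(f, DenseVector(0.0, 0.0)) // initial guess
// optimum should be close to DenseVector(3.0, -1.0)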
Thanks!

Related

How do I pass a polynomial variable into a MATLAB function?

I am just getting started with MATLAB and I have written a function to produce the binomial expansion of (x-a)^n when given x, a and n. As far as I can tell my code should work, but I do not seem to be using the function variables correctly.
function f = expand(a,n,x)
    % binomial expansion of (x-a)^n
    f = 0;
    for k = 0:1:n
        f = f + nchoosek(n,k).*x.^(n-k).*(-a).^k;
    end
end
I need to be able to call the function and have it output f as the expanded polynomial in x, for example calling expand(1,3,x) should return x^3-3*x^2+3*x-1, but instead, calling it gives this error:
Unrecognized function or variable 'x'. It seems to want me to call the function with x being a number, but I need x to remain usable as the variable of the polynomial.
I know in Maple I would specify the variable type in the function to be x::name so I'm assuming there's something similar in MATLAB that I don't yet know.
Thanks for any help.
There are two ways to go about this:
Create x as a symbolic variable. For example, syms x; expand(a,n,x). This gives you the power of symbolic toolbox features like simplify(), but comes with a bit of a performance penalty; avoid using the symbolic toolbox in intensive calculations.
Return an anonymous function, f = @(X) sum(arrayfun(@(k) nchoosek(n,k).*X.^(n-k).*(-a).^k, 0:n)). This performs better and does not require double(subs(...)) when you want an actual numeric value, but may be too difficult to implement for a beginner.

scipy minimization: How to code a jacobian/hessian for objective function using max value

I'm using scipy.optimize.minimize with the Newton-CG (Newton Conjugate Gradient) method since I have an objective function for which I know the analytical Jacobian and Hessian. However, I need to add a regularization term R=exp(max(s)) based on the maximum value inside the array parameter "s" that is being fit. It isn't entirely obvious to me how to implement derivatives for R. Letting the minimization algorithm do numeric derivatives for the whole objective function isn't an option, by the way, because it is far too complex. Any thoughts, oh wise people of the web?
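One point that may help (a standard subgradient argument, not anything scipy-specific): away from ties, max(s) is locally just the single component s_k with k = argmax(s), so R = exp(max(s)) is differentiable almost everywhere, with

dR/ds_i = exp(max(s)) if i = argmax(s), and 0 otherwise
d2R/(ds_i ds_j) = exp(max(s)) if i = j = argmax(s), and 0 otherwise

At a tie, max is not differentiable; any tied index then gives a valid subgradient, and picking one arbitrarily is the usual pragmatic choice. In code, the contribution of R to the gradient is a one-hot vector scaled by exp(max(s)).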

2-Dimensional Minimization without Derivatives and Ignoring certain Input Parameters on the go

I have a function V which depends on two variables v1 and v2 and on a parameter array p containing 15 parameters.
I want to minimize V with respect to v1 and v2, but there is no closed-form expression for it, so I can't build and use its derivatives.
The problem is the following: to calculate the value of V I need the eigenvalues of two 4x4 matrices (which should be symmetric and real by construction, but sometimes the eigensolver does not return real eigenvalues). I compute these eigenvalues with the Eigen package. The entries of the matrices are given by v1, v2 and p.
There are certain input sets for which some of these eigenvalues become negative. I want to ignore these input sets in my calculation, as they lead to a complex function value and my function is only allowed to take real values.
Is there a way to include this? My first attempt was a Nelder-Mead simplex algorithm using the GSL library, returning a way-too-high output value for the function whenever one of the eigenvalues becomes negative, but this doesn't work.
Thanks for any suggestions.
For the Nelder-Mead simplex, you could reject candidate points as simplex vertices unless they have the desired properties.
Your method of artificially increasing the function value at forbidden points is known as a penalty (or barrier) function. You might want to redesign your penalty function, as in the sketch below.
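A penalty that grows with the size of the constraint violation gives the simplex a slope to follow back into the feasible region, whereas a single flat huge value does not. A minimal sketch (in Scala for concreteness; eigenvalues() and objective() are dummy stand-ins for your Eigen/GSL code):

object PenaltySketch {
  // Dummy stand-ins for the real computation:
  def eigenvalues(v1: Double, v2: Double): Seq[Double] =
    Seq(v1 * v1 - 1.0, v2 + 2.0, 1.0, 1.0)
  def objective(v1: Double, v2: Double): Double =
    (v1 - 1.0) * (v1 - 1.0) + (v2 - 2.0) * (v2 - 2.0)

  // Graded penalty: proportional to how negative the eigenvalues are,
  // rather than one flat "way too high" constant.
  def penalized(v1: Double, v2: Double): Double = {
    val violation = eigenvalues(v1, v2).filter(_ < 0.0).map(-_).sum
    if (violation > 0.0) 1e6 * (1.0 + violation)
    else objective(v1, v2)
  }
}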
Another optimization method without derivatives is the Simulated Annealing method. Again, you could modify the method to avoid forbidden points.
What do you mean by "doesn't work"? Does it take too long? Are the resulting function values too high?
Depending on the cost of a function evaluation, it might be an option to simply scan a 2D interval: evaluate all width x height function values and drill down into the tile with the lowest values.
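A coarse-to-fine version of that scan might look like this (a sketch in Scala; grid size and shrink factor are arbitrary, and forbidden points can simply return Double.PositiveInfinity from f so they are never selected):

def gridSearch(f: (Double, Double) => Double,
               lo1: Double, hi1: Double,
               lo2: Double, hi2: Double,
               n: Int = 20, levels: Int = 5): (Double, Double) = {
  var (a1, b1, a2, b2) = (lo1, hi1, lo2, hi2)
  var best = (a1, a2)
  for (_ <- 0 until levels) {
    val step1 = (b1 - a1) / n
    val step2 = (b2 - a2) / n
    // Evaluate the full (n+1) x (n+1) grid of candidates.
    val candidates = for { i <- 0 to n; j <- 0 to n }
      yield (a1 + i * step1, a2 + j * step2)
    best = candidates.minBy { case (x, y) => f(x, y) }
    // Shrink the search window around the best tile and repeat.
    a1 = best._1 - step1; b1 = best._1 + step1
    a2 = best._2 - step2; b2 = best._2 + step2
  }
  best
}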

MATLAB, SCIP and the Opti Toolbox

I am using the Opti Toolbox, a free optimization toolbox for MATLAB, to solve a mixed-integer nonlinear program (MINLP). Inside the Opti Toolbox, the MINLP solver used is SCIP.
I define my own objective as a separate function (the fun argument in Opti), and this function needs to call other MATLAB functions which take double arguments.
The problem is that whenever Opti invokes my function to evaluate the objective, it first calls it with a vector of 'scipvar' objects and then calls it again with a vector of 'double' objects. My objective function does not work with the scipvar objects; it returns an error.
Just for testing, I tried setting the output of my function to something fixed when the input type is 'scipvar' and to the actual expression when the type is 'double', but this doesn't work: changing the fixed value actually changes the final optimal value.
I basically need to convert a scipvar object to double, is this possible? Or is there any other alternative?
Thank you.
OK, so after enlightenment by J. Currie, an Opti Toolbox developer, I understood the cause of the problem above.
The first call to the objective with a vector of scipvar variables is actually a parser sweeping the objective function to see whether it can be mapped to something SCIP can handle. I reimplemented the objective function to use only the methods allowed for scipvar objects, obtained by typing methods(scipvar) in MATLAB:
abs, display, dot, exp, log, log10, minus, mpower, mrdivide, mtimes, norm, plus, power, prod, rdivide, scipvar, sqrt, sum, times, uminus
Once the objective could be parsed by SCIP, my problem worked fine.

How to pass a function to pdepe initial condition function

I have created a function from sets of points using the Curve Fitting Toolbox. I used its 'generate code' feature, which produced a function createFit(a,b), where a and b are the sets of points used for interpolation. createFit returns my interpolated function.
Now I want to use this function as u0 (the initial condition) of my PDE (I am using pdepe to solve it). In the function where I set up u0 I need to invoke createFit, which is not a problem, I have access to it. The problem is that I cannot pass a and b as parameters to this function. I tried making them global but it did not work. How can I do that?
From the pdepe documentation:
Parameterizing Functions explains how to provide additional parameters to the functions pdefun, icfun, or bcfun, if necessary.
Essentially, use nested functions or anonymous functions to handle the extra parameters: for example, capture the fitted object in a closure with fitobj = createFit(a,b); icfun = @(x) fitobj(x); and pass icfun to pdepe, so a and b are baked in when the anonymous function is created.