Nelder-Mead optimization with equality constraints - matlab

I would like to use MATLAB's fminsearch function to search for the best hyperparameters of my SVM classifier with a weighted RBF kernel. fminsearch uses the Nelder-Mead simplex method.
Let's say I have the following hyperparameters: C, gamma, w1, ..., wn, where the wi are the weights of the kernel.
Additionally, I have the constraint that sum(wi) = 1, i.e. all weights must sum to one.
Is it possible to use Nelder-Mead with this equality constraint? I know that there is the fminsearchbnd method for Matlab, but I think it can handle only bound (inequality) constraints.
Edit: I'm using an SVM classifier and the weights are used in a weighted RBF kernel (one weight for each feature). The parameters to estimate are thus C, gamma and the weights. The cost function is the accuracy.

Can you substitute out one of the w(i)? That means, replace e.g. w1 by 1-w2-w3-... (and drop the constraint). Otherwise have a look at fmincon which allows explicit constraints. In addition you may need 0 <= w(i) <= 1.
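The substitution can be sketched in Python, with SciPy's Nelder-Mead implementation (scipy.optimize.minimize with method="Nelder-Mead") playing the role of fminsearch. This is only an illustration under assumptions: toy_cost and its target weights are hypothetical stand-ins for the real cross-validated SVM error, C and gamma are left out, and the large penalty is one crude way to keep every weight inside [0, 1].

```python
import numpy as np
from scipy.optimize import minimize


def expand_weights(free_w):
    """Rebuild the full weight vector; the first weight is 1 - sum(others)."""
    free_w = np.asarray(free_w, dtype=float)
    return np.concatenate(([1.0 - free_w.sum()], free_w))


def toy_cost(w):
    # Hypothetical stand-in for the SVM error: squared distance to made-up weights.
    target = np.array([0.5, 0.3, 0.2])
    return float(np.sum((w - target) ** 2))


def penalized_cost(free_w):
    w = expand_weights(free_w)
    if np.any(w < 0.0) or np.any(w > 1.0):
        return 1e6  # crude penalty that rejects weights outside [0, 1]
    return toy_cost(w)


# Nelder-Mead only sees the n-1 free weights, so sum(w) = 1 holds by construction.
result = minimize(penalized_cost, x0=[1.0 / 3.0, 1.0 / 3.0], method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-12})
weights = expand_weights(result.x)
print(weights, weights.sum())
```

The search space shrinks by one dimension, and no equality constraint ever has to be passed to the optimizer.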

Related

How to specify non linear regression model in python

I am taking an Econometrics course, and have been trying to use Python rather than the proprietary STATA and EVIEWS they set the assignments in.
In one of the questions, I have consumption data over time. I am asked to compute it in two ways.
The first way is to fit a model of the form consumption = A exp(B t); the second is to take logs of both sides and run OLS on log(consumption) = alpha + B t.
I know how to do the second way. However, when I try the first way it goes wrong. Using statsmodels, I can exponentiate the time data (after normalising), but this fits a regression of the form consumption = A exp(t) + B, which is not what I want (I want to specify where the parameters go). In sklearn I could find polynomial regression, but not exponential.
Then I found scipy.optimize.curve_fit. However, this seems to have two problems:
(1) It seems to rely on initial guesses for the parameters, which means my output may end up differing from the proprietary software (whereas output for things like OLS is the same). I assume initial guesses mean that some iterative solution is computed, which is helpful for very weird and wonderful functions, but I would expect fairly standard results to hold for exponential regression.
(2) Every time I try to implement it, it just returns the guess parameters.
Here is my code:
`import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

consumption_data = pd.read_csv(r"......\consumption.csv")

def func(x, a, b):
    return a * np.exp(b * x)

xdata = consumption_data.YEAR
ydata = consumption_data.CONSUMPTION
ydata = (ydata - 1948)/100
popt, pcov = curve_fit(func, xdata, ydata, (1, 1))
print(popt)
plt.plot(xdata, func(xdata, *popt), 'g--')
`
The scipy.optimize code is basically just copy-pasted from their tutorial
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
Short answer: use statsmodels GLM.
statsmodels does not have nonlinear least squares. The best Python library for that is lmfit https://pypi.org/project/lmfit/
curve_fit, lmfit and nonlinear least squares algorithms in general find an iterative solution to the optimization problem. Even though we have to provide starting values, the solution is in many cases the same across packages up to the convergence tolerance, e.g. 1e-5 or 1e-6.
Many standard models in statistics and econometrics have a single global maximum with well behaved data. However, in other cases like mixture models, there might be many local optima and the estimation might converge to one of them.
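As an illustration of why starting values need not be arbitrary, here is a minimal sketch that takes them from the log-linear OLS fit and then runs scipy.optimize.curve_fit on a rescaled time variable. The data are synthetic and the true values A = 3, B = 2 are assumptions made up for the example; the rescaling (year - 1948) / 100 is likewise an assumption, chosen to keep exp(B t) well inside floating-point range.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(t, a, b):
    return a * np.exp(b * t)


# Synthetic stand-in for the consumption series (hypothetical values).
rng = np.random.default_rng(0)
years = np.arange(1948, 1998)
t = (years - 1948) / 100.0  # rescale time so exp(b * t) stays moderate
consumption = 3.0 * np.exp(2.0 * t) * np.exp(rng.normal(0.0, 0.01, t.size))

# Starting values from OLS on log(consumption) = alpha + B t.
slope, intercept = np.polyfit(t, np.log(consumption), 1)
start = (np.exp(intercept), slope)

popt, pcov = curve_fit(exp_model, t, consumption, p0=start)
print(start, popt)
```

With raw years such as 1948 as the regressor and a starting value of 1 for B, exp(B * x) overflows, which is one plausible reason an optimizer would fail to move away from its initial guess.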
To the specific case:
consumption = A exp(B t)
can be rewritten as
consumption = exp(a + B t), with a = log(A)
So this is just a single-index model, or a generalized linear model with an exponential mean function.
The general version has the expectation of the dependent variable as a nonlinear function of a linear combination of the explanatory variables:
E(y | x) = g(x b)
This can be estimated in statsmodels using GLM with the Gaussian family and a log link.
Aside: In econometrics, there is a literature to use Poisson quasi-likelihood as an estimator for exp models instead of taking the log of the dependent variable.
Poisson usually uses the log-link function as in the above.
However, using GLM allows us to use log-link, i.e. exponential mean function, with any of the supported distribution families. The main difference is in the underlying variance assumption. Gaussian assumes constant variance, Poisson assumes that the variance is proportional to the mean and Gamma assumes that the variance is quadratic in the mean.
If we use a robust sandwich covariance estimator for parameter inference, then standard errors and inference are correct even if the variance function is misspecified.

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of the peaks. I want to approximate the curve by fitting a Gaussian at each peak. How should I go about finding the optimal Gaussian parameters? I would also like to know if there is a built-in function that would make my task simpler.
Edit
I have fixed the means of the Gaussians and tried to optimize over sigma using lsqcurvefit() in Matlab. The MSE is low. However, I have an additional hard constraint that the value of the approximate curve should equal the original function at the peaks. This constraint is not satisfied by my model. I am pasting my current working code here. I would like a solution that obeys the hard constraint at the peaks and approximately fits the curve at other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.
fun = @(x,xdata) myFun(x,xdata,pks,locs); % pks, locs are the peak amplitudes and locations already available
x0 = w(1:6)*0.25; % my initial guess based on domain knowledge
[sigma, resnorm] = lsqcurvefit(fun,x0,xdata,ydata); % xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure; plot(ydata,'r'); hold on; plot(recons);

function f = myFun(sigma,xdata,a,c)
    % a holds the fixed amplitudes, c the means of the individual Gaussians
    f = zeros(size(xdata));
    for i = 1:6 % use 6 Gaussians to approximate the function
        f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));
    end
end
If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.
Say you have x and y, which are samples from the curve you want to approximate.
First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.
Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error--how badly the given approximation (using sigma) differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:
L(sigma) = sum((y - g(sigma)) .^ 2)
The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.
If you have the Mathworks optimization toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The curve fitting toolbox is probably an alternative. Otherwise, you can use an open source optimization routine (check out cvxopt).
A couple things to note. You need to impose the constraint that all values in sigma are greater than zero. You can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. It may be the case (when the loss function is nonconvex) that the final solution is different, depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to just try with multiple different initial guesses, and pick the best result.
Edited to add:
In python, you can use optimization routines in the scipy.optimize module, e.g. curve_fit().
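A minimal Python sketch of the recipe described above, using scipy.optimize.least_squares (which accepts bound constraints, so sigma > 0 can be stated directly). The curve is synthetic, and the amplitudes, centers and true widths are hypothetical values chosen for the example; gaussian_sum plays the role of g().

```python
import numpy as np
from scipy.optimize import least_squares


def gaussian_sum(sigma, x, amps, centers):
    """g(sigma): sum of Gaussians with fixed amplitudes and centers."""
    x = x[:, None]
    return np.sum(amps * np.exp(-(x - centers) ** 2 / (2.0 * sigma ** 2)), axis=1)


def residuals(sigma, x, y, amps, centers):
    # least_squares minimizes the sum of squared residuals, i.e. L(sigma).
    return gaussian_sum(sigma, x, amps, centers) - y


# Synthetic curve with known peaks (hypothetical values).
x = np.linspace(0.0, 10.0, 400)
amps = np.array([1.0, 0.6, 0.8])
centers = np.array([2.0, 5.0, 8.0])
true_sigma = np.array([0.4, 0.7, 0.5])
y = gaussian_sum(true_sigma, x, amps, centers)

sigma0 = np.full(3, 1.0)  # initial guess for every width
fit = least_squares(residuals, sigma0, bounds=(1e-6, np.inf),
                    args=(x, y, amps, centers))
print(fit.x)
```

To guard against local minima, the same call can be repeated from several values of sigma0, keeping the fit with the smallest fit.cost.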
Edit 2 (response to edited question):
If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.
If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().
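As an alternative to handing the equality constraints to a general constrained solver, the constraint can also be eliminated: for any given widths, the weights that make the summed curve pass exactly through the known peak values solve a small linear system, so only the widths remain to be optimized. The sketch below is in Python with synthetic overlapping peaks; all numbers and helper names are hypothetical, and this is a swapped-in technique, not what fmincon does internally.

```python
import numpy as np
from scipy.optimize import least_squares


def basis(sigma, x, centers):
    """Matrix whose column j is the unit-height Gaussian centred at centers[j]."""
    return np.exp(-(x[:, None] - centers) ** 2 / (2.0 * sigma ** 2))


def weights_matching_peaks(sigma, centers, peak_values):
    """Weights for which the summed curve equals peak_values at every centre."""
    return np.linalg.solve(basis(sigma, centers, centers), peak_values)


def model(sigma, x, centers, peak_values):
    return basis(sigma, x, centers) @ weights_matching_peaks(sigma, centers, peak_values)


def residuals(sigma, x, y, centers, peak_values):
    return model(sigma, x, centers, peak_values) - y


# Synthetic overlapping peaks (hypothetical values).
x = np.linspace(0.0, 8.0, 400)
centers = np.array([2.0, 3.5, 5.5])
true_sigma = np.array([0.6, 0.9, 0.7])
true_weights = np.array([1.0, 0.7, 0.9])
y = basis(true_sigma, x, centers) @ true_weights
peak_values = basis(true_sigma, centers, centers) @ true_weights  # curve height at each peak

fit = least_squares(residuals, np.full(3, 0.5), bounds=(0.1, 2.0),
                    args=(x, y, centers, peak_values))
violation = np.max(np.abs(model(fit.x, centers, centers, peak_values) - peak_values))
print(fit.x, violation)
```

The peak constraint holds for every sigma the optimizer tries, not only at the solution, because the weights are recomputed inside the model.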
You can use the Expectation–Maximization algorithm to fit a mixture of Gaussians to your data. It works regardless of the data dimension.
In the MATLAB documentation, look up gmdistribution.fit or fitgmdist.

efficient inversion of known CDF in MATLAB

I need to compute efficiently and in a numerically stable way the inverse CDF F^-1(y) (cumulative distribution function) of a probability function, assuming that both the PDF f(x) and the CDF F(x) are known analytically but the inverse CDF is not. I am doing this in MATLAB.
This is a root-finding problem for F(x)-y and I could use fzero:
invcdf = @(y, x0) fzero(@(x) cdf(x) - y, x0);
However, fzero is for a generic nonlinear function.
I wonder if there is some function, or I can write some algorithm that uses the explicit information that F(x) is a cdf (for example, we know that it is monotonically non-decreasing and we have its derivative, f(x)).
FYI, the shape of the PDFs I am working with is generic mixtures of Gaussian distributions multiplied by a polynomial of arbitrary degree (the CDF can be computed analytically in this case, although it's not pretty and it becomes expensive for polynomials with many terms). Note that I need to compute the inverse CDF for millions of CDFs within this class; a lookup table is not feasible.
For more mathematical details see also this related question on Math Exchange (here I am asking specifically for a MATLAB solution).
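One way to use both properties mentioned above (monotonicity and a known derivative) is Newton's method guarded by a shrinking bracket. The following is a rough sketch in Python rather than MATLAB; the standard normal stands in for the actual mixture CDF, and all function names are hypothetical.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def inverse_cdf(y, cdf, pdf, lo, hi, tol=1e-12, max_iter=100):
    """Solve cdf(x) = y on [lo, hi] with Newton steps guarded by bisection."""
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        err = cdf(x) - y
        if abs(err) < tol:
            break
        # Monotonicity tells us on which side of x the root lies.
        if err > 0.0:
            hi = x
        else:
            lo = x
        slope = pdf(x)  # derivative of the CDF
        x_new = x - err / slope if slope > 0.0 else math.nan
        if not (lo < x_new < hi):
            x_new = 0.5 * (lo + hi)  # Newton step left the bracket: bisect instead
        x = x_new
    return x


x975 = inverse_cdf(0.975, normal_cdf, normal_pdf, lo=-10.0, hi=10.0)
print(x975)
```

The bracket guarantees progress in the flat tails where the PDF is tiny and a pure Newton step would overshoot, while near the root the iteration keeps Newton's fast convergence.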

Boolean least squares

For a spectrum estimation algorithm I need to find the best fitting linear combination of vectors to fit a target spectral distribution. So far, this works relatively well using the lsqlin optimizer in MATLAB.
However, for the final application I would like to approximate/solve this problem for exclusively zeros and ones, meaning Ax=b solved for Boolean x.
Is there any way to parametrize lsqlin or another optimizer function for this purpose?
If the problem is just:
Solve Ax=b for x in {0,1}
then you can use a MIP solver (e.g. Matlab intlinprog). If the problem is over-constrained and you want a least squares solution:
Min w'w
S.t. Ax - b = w
x in {0,1} (binary variable)
w free variable
then you have an MIQP (Mixed Integer Quadratic Programming) problem. There are good solvers for this such as Cplex and Gurobi (callable from Matlab). Also Matlab has a discussion about an approximation scheme using intlinprog. Another idea is to replace the quadratic objective by a sum of absolute values. This can be formulated as a linear MIP model.
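The sum-of-absolute-values variant can be sketched in Python with scipy.optimize.milp (available in SciPy 1.9 and later), which plays a role similar to intlinprog. Each residual |(Ax - b)_i| is bounded by an auxiliary variable t_i and the sum of the t_i is minimized. The matrix A and vector b below are made-up illustration data.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical over-determined system; b is close to A @ [1, 0, 1].
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
b = np.array([1.1, 2.9, 3.0, 2.1])
m, n = A.shape

# Decision vector z = [x (n binaries), t (m residual bounds)]; minimize sum(t).
cost = np.concatenate([np.zeros(n), np.ones(m)])
integrality = np.concatenate([np.ones(n), np.zeros(m)])
bounds = Bounds(np.zeros(n + m),
                np.concatenate([np.ones(n), np.full(m, np.inf)]))

# |A x - b| <= t written as two blocks of linear inequalities.
eye = np.eye(m)
no_lower = np.full(m, -np.inf)
constraints = [
    LinearConstraint(np.hstack([A, -eye]), no_lower, b),    #  A x - t <=  b
    LinearConstraint(np.hstack([-A, -eye]), no_lower, -b),  # -A x - t <= -b
]

res = milp(cost, integrality=integrality, bounds=bounds, constraints=constraints)
x_binary = np.round(res.x[:n]).astype(int)
print(x_binary, res.fun)
```

The optimal objective res.fun is the L1 norm of the residual, so the result generally differs from the least-squares (MIQP) solution when the two norms disagree about the best binary vector.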

How to use symbolic-math of Matlab to obtain Gradient of a complex equation

I am solving a huge optimization problem that takes a long time to converge. This is because Matlab uses finite differences to calculate the gradients of the objective function and nonlinear constraints, and also to construct the Hessian matrix. However, the fmincon solver has an option that allows you to supply the analytic derivatives of the objective and constraints.
For this reason I would like to know how to calculate the gradient of the function given below, both mathematically and with the Symbolic Math Toolbox. Note that I want the gradient of the objective in vector form (not by splitting Eq1 into 5 separate equations).
Let's assume we have these optimization variables:
Pd=[x1 x2 x3 x4]
Now we define these 2 variables based on the optimization vector, i.e. Pd:
Pdn=[Pd(1);mo;Pd(2);0;Pd(4)]
Pgn=[Pd(2);Pd(1);m1;Pd(4);Pd(1)]
Now this is the equation whose gradient I want to take:
Eq1=sin(Pdn)+Pdn+Pgn.^2