I have been fitting some data generated in the lab. The best fit is by piecewise polynomial
I would now need to turn this into a function that I can then plug in a model (where I repeat this function with different time delays and then sum them up). What is the best way to do this? If I generate a fit wth the FIT function, I am then unable to use that object as a normal function. Any advice?
Related
If I know right- for trying to fit a model; some kind of iterative algorithm is used where the goal is to minimize a cost function (e.g. OLS, MSE, RMSE, MMSE).
I know the robustfit() method do the fitting for a regression model using OLS (Ordinary least squares) cost function and then performs an additional weighted regression to provide the final model. Also, I think fitlm() uses RMSE as the cost function.
My first query is: in Matlab, whether the cost function and weight function are same or not.
Also, how to provide my custom cost function (e.g. MSE) while letting MATLAB do the fitting?
I came to know that, robustfit() can take additional/custom weight function. But again I am confused will that be treated as cost function? Or I need to use some other kind of argument for providing my custom cost function?
Just like you said, the cost function is the function you want to minimize to find the correct coefficients of your predictor. For example, a cost function could be cost(a) = sqrt(mean((a).^2));, a usually being y - y_est.
In another side, the weight function, regarding to Robust Regression, is a way to give less influence or to remove accidental measurements for the algorithm. If you look at the shape of a weight function:
You will see that the further residuals are from zero, the less they are taken in account (sometimes even completely removed) . This is a way to avoid mistakes due to outliers.
The way the function estimates error is not open to custom, only the weight function is.
I want to minimize some function J using gradient information. I found two functions in scipy that may do the job, scipy.optimize.fmin_tnc (here) and scipy.optimize.minimize (here), and I implemented them. However, now I need the stepwise output of the function evaluations at each step of the (e.g. Newton) algorithm to plot its convergence. Is it possible to get this vector somehow out of these functions? It is not part of default return values as it seems.
There are many curve fitting and interpolation tools like polyfit (or even this nice logfit toolbox I found here), but I can't seem to find anything that will fit a sigmoid function to my x-y data.
Does such a tool exist or do I need to make my own?
If you have the Statistics Toolbox installed, you can use nonlinear regression with nlinfit:
sigfunc = #(A, x)(A(1) ./ (A(2) + exp(-x)));
A0 = ones(size(A)); %// Initial values fed into the iterative algorithm
A_fit = nlinfit(x, y, sigfunc, A0);
Here sigfunc is just an example for a sigmoid function, and A is the vector of the fitting coefficients.
nlinfit, and especially gatool, are big hammers for this problem. A sigmoid is not a specific function. Most commonly it is taken to be the same as the logistic function (also often the most efficient to calculate):
y = 1./(1+exp(-x));
or a generalized logistic. But all manner of curves can have sigmoidal shapes. If you know if your data corresponds to one in particular, fitting can be improved and more efficient methods can be applied. For example, the error function (erf) has a sigmoidal shape and shows up in the CDF of the normal distribution. If you know that your data is the result of a Gaussian process (i.e., the data is the CDF) and you have the Stats toolbox, you can use the normfit function. This function is based on maximum likelihood estimation (MLE). If you end up needing to write a custom fitting function - say, for performance reasons - I'd investigate MLE techniques for the particular form of sigmoid that you'd like to fit.
I would suggest you use MATLAB's Global Optimization Toolbox, and in particular the Genetic Algorithm Solver, which you can use for your problem by optimizing (= finding the best fit for your data) the sigmoid function's parameters through genetic algorithm. It has a GUI that is easy to use.
The Genetic Algorithm Solver's GUI, which you can call using gatool:
I need to fit data in quite an indirect way. The original data to be recovered in the fit is some linear function with small oscillations and drifts on it, that I would like to identify. Let's call this f(t). We can not record this parameter in the experiment directly, but only indirectly, let's say as g(f) = sin(a f(t)). (The real transfer funcion is more complex, but it should not play a role in here)
So if f(t) changes direction towards the turning points of the sin function, it is difficult to identify and I tried an alternative approach to recover f(t) than just the inverse function of g and some data continuing guesses:
I create a model function fm(t) which undergoes the same and known transfer function g() and fit g(fm(t)) to the data. As the dataset is huge, I do this piecewise for successive chunks of data guaranteeing the continuity of fm across the whole set.
A first try was to use linear functions using the optimize.leastsq, where the error estimate is derived from g(fm). It is not completely satisfactory, and I think it would be far better to fit a spline to the data to get fspline(t) as a model for f(t), guaranteeing the continuity of the data and of its derivative.
The problem with it is, that spline fitting from the interpolate package works on the data directly, so I can not wrap the spline using g(fspline) and do the spline interpolation on this. Is there a way this can be done in scipy?
Any other ideas?
I tried quadratic functions and fixing the offset and slope such to match the ones of the preceeding fitted chunk of data, so there is only one fitting parameter, the curvature, which very quickly starts to deviate
Thanks
What you would need is a matrix of spline basis functions, b(t), so you can approximate f(t) as a linear combination of spline basis function
f(t) = np.dot(b(t), coefs)
and then estimate the coefficients, coefs, by optimize.leastsq.
However, spline basis functions are not readily available in python, as far as I know (unless you borrow experimental scripts or search through the code of some packages).
Instead you could also use polynomials, for example
b(t) = np.polynomial.chebvander(t, order)
and use a polynomial approximation instead of the splines.
The structure of this problem is very similar to generalized linear models where g is your known link function and similar to index problems in econometrics.
It would be possible to use the scipy splines in an indirect way if you create artificial data
y_i = f(t_i)
where f(t_i) are scipy.interpolate splines, and the y_i are the parameters to be estimated in the least squares optimization. (Loosely based on a script that I saw some time ago that used this for creating a different kind of smoothing splines than the scipy version. I don't remember where I saw this.)
Thank you for these comments. I tried out the polynomial basis suggested above, but polynomials are no option for my needs, ads they tend to create ringing, which is difficult to condition.
The solution on using splines I now found is quite simple and straightforward, and I think it is what you meant by "using the splines in an indirect way".
The fitting function f(t) is obtained by the interpolate.splev(x, (t,c,k)) function, but providing the spline coefficients c by the omptimize.leastsq function. In this way, f(t) is no direct spline fit (as one would usually obtain with the splrep(x, y) function) but indirectly optimized in the fit, and therefore it is possible to use the link function g on it. The initial guess for c might be obtained by one evaluation of splrep(xinit, yinit, t=knots) on model data.
One trick is to restrict the number of knots for the spline to below the number of datapoints by explicitly specifying them during the function call of splrep() and giving this reduced set during the evaluation using splev().
I am looking for numerical integration with matlab. I know that there is a trapz function in matlab but the precision is not good enough. By searching it online, I found there is a quad function there it seems only accept symbolic expression as input. My data is all discrete and one-dimensional. Is that any way to use quad on my data? Thanks.
An answer to your question would be no. The only way to perform numerical integration for data with no expression in Matlab is by using the trapz function. If it's not accurate enough for you, try writing your own quad function as Li-aung said, it's very simple, this may help.
Another method you may try is to use the powerful Curve Fitting Tool cftool to make a fit then use the integrate function which can operate on cfit objects (it has a weird convention, the upper limit is the first argument!). I don't think you will get much accurate answers than trapz, it depends on the fit.
Use the spline function in MATLAB to interpolate your data, then integrate this data. This is the standard method for integrating data in discrete form.
You can use quadl() to integrate your data if you first create a function in which you interpolate them.
function f = int_fun(x,xdata,ydata)
f = interp1(xdata,ydata,x);
And then feed it to the quadl() function:
integral = quadl(#int_fun,A,B,[],[],x,y) % syntax to pass extra arguments
% to the function
Integration of a function of one variable is the computation of the area under the curve of the graph of the function. For this answer I'll leave aside the nasty functions and the corner cases and all the twists and turns that trip up writers of numerical integration routines, most of which are probably not relevant here.
Simpson's rule is an approach to the numerical integration of a function for which you have a code to evaluate the function at points within its domain. That's irrelevant here.
Let's suppose that your data represents a time series of values collected at regular intervals. Then you can plot your data as a histogram with bars of equal width. The integrand you seek is the sum of the areas of the bars in the histogram between the limits you are interested in.
You should be able to apply this approach to data sets where the x-axis (ie the width of the bars in the histogram) does not show time, to the situation where the bars are not of equal width, to the situation where the data crosses the x-axis, and most reasonable data sets, quite easily.
The discretisation of your data establishes a limit to the accuracy of the result you can get. If, for example, your time series is sampled at 1sec intervals you can't integrate over an interval which is not a whole number of seconds by this approach. But then, you don't really have the data on which to compute a figure with any more accuracy by any approach. Sure, you can use Matlab (or anything else) to generate extra digits of precision but they don't carry any meaning.