Extrapolation using Gaussian process regression or kriging

Is there any way to extrapolate using kriging or Gaussian process regression?
Gaussian processes work very well for interpolating scattered data; however, I need to extrapolate a time series of a variable forward in time.
How can I extrapolate x(n+1)
using the history of x, i.e. x_i, i = n, n-1, ...?
For example, in Python: scikit-learn.org/stable/modules/gaussian_process.html

Extrapolation works the same way as interpolation, both in theory and in practice.
In theory, when you fit a Gaussian process regression model, you model your data as a Gaussian process: you choose its mean function and its covariance function and estimate their parameters. To interpolate (or extrapolate), you compute the mean of this Gaussian process at a new point, conditional on the training points.
In practice, for both interpolation and extrapolation, you just call a prediction function (predict in the R package DiceKriging and in scikit-learn in Python).
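A minimal scikit-learn sketch of the one-step-ahead case, assuming the series is modelled as a function of the time index. The synthetic series, the kernel and its starting hyperparameters are illustrative assumptions, not part of the original answer. Extrapolating to x(n+1) is the same predict call as interpolation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# Hypothetical series x observed at time steps 0..n (here n = 39)
rng = np.random.default_rng(0)
t = np.arange(40, dtype=float).reshape(-1, 1)   # inputs must be 2-D: (n_samples, n_features)
x = np.sin(t.ravel() / 4.0) + 0.05 * rng.standard_normal(40)

# Assumed kernel: smooth signal plus observation noise; hyperparameters are optimized in fit()
kernel = ConstantKernel(1.0) * RBF(length_scale=5.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(t, x)

# Same call as for interpolation; the new input is simply t = n + 1
t_next = np.array([[40.0]])
mean, std = gp.predict(t_next, return_std=True)
print(f"x(n+1) ~ {mean[0]:.3f} +/- {2 * std[0]:.3f}")
```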
However, be aware that Gaussian process regression (like many regression techniques [citation needed]) works quite badly for extrapolation. Away from the training points, the Gaussian process mean quickly "returns" to the mean function you have defined. Gaussian process regression used for extrapolation therefore reduces to parametric regression, whose model is the one you chose for the mean function.
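The reversion to the mean function can be seen directly. This sketch assumes a straight-line mean function fitted beforehand with np.polyfit and a zero-mean GP with a fixed RBF kernel on the residuals; scikit-learn's GaussianProcessRegressor does not accept an arbitrary mean function, so detrending is used here as a stand-in for one. The series and kernel settings are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

t = np.arange(50, dtype=float)
x = 0.1 * t + np.sin(t / 3.0)          # hypothetical series: trend plus oscillation

# Assumed parametric mean function: a straight line fitted by least squares
slope, intercept = np.polyfit(t, x, deg=1)

def trend(s):
    return slope * s + intercept

# Zero-mean GP on the residuals; fixed kernel (optimizer=None) to keep the picture simple
gp = GaussianProcessRegressor(kernel=RBF(length_scale=3.0), alpha=1e-4, optimizer=None)
gp.fit(t.reshape(-1, 1), x - trend(t))

t_new = np.array([50.0, 51.0, 500.0])
resid_mean, resid_std = gp.predict(t_new.reshape(-1, 1), return_std=True)
x_pred = trend(t_new) + resid_mean

# Far from the data (t = 500) the GP contribution vanishes: the forecast is just
# the fitted line, and the predictive std is back at the prior value of the kernel.
print(x_pred[2] - trend(500.0), resid_std[2])
```

Near the data (t = 50, 51) the residual GP still adjusts the line; a few length scales away it no longer contributes anything.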

Related

In Bayesian simulation, why use fixed value for the parameters, which has a prior, in the model?

When simulating a Bayesian model, are we supposed to treat the parameters as random variables (with a prior) rather than using fixed values?
For example, take a Bayesian linear model y = X\beta + \epsilon. When simulating it, the literature usually: 1. sets the regression coefficients at fixed values, e.g. (0, 3, -2, 1, 0, ...); 2. simulates the predictors many times; 3. simulates the error term, usually standard normal; 4. generates the response.
If the regression coefficients have a prior (assume exchangeable priors), and thus posterior distributions, why would we simulate only one set of coefficient values? Having a posterior distribution suggests that we don't believe in any single fixed value, while the truth used in the simulation is a fixed value. Even though the posterior mean is supposed to converge to the OLS estimate under suitable conditions, this still feels difficult to understand.
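The four steps described in the question can be sketched in NumPy as follows. The sample size, the number of predictors and the N(0, 2^2) prior in the second variant are hypothetical choices for illustration. The first variant fixes one "true" beta, as in the literature; the second draws beta from the assumed prior for each replicate. OLS on the first data set should land near the fixed beta:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 6

def simulate_data(beta, rng):
    """Steps 2-4: simulate predictors, standard-normal errors and the response."""
    X = rng.standard_normal((n, p))
    eps = rng.standard_normal(n)
    return X, X @ beta + eps

# Step 1 as described: a single fixed coefficient vector
beta_fixed = np.array([0.0, 3.0, -2.0, 1.0, 0.0, 0.0])
X, y = simulate_data(beta_fixed, rng)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Variant: draw the coefficients from a hypothetical exchangeable N(0, 2^2) prior
replicates = []
for _ in range(100):
    beta_draw = rng.normal(0.0, 2.0, size=p)
    replicates.append((beta_draw, *simulate_data(beta_draw, rng)))
```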

Is non-linearity added to neural networks because of its derivatives?

I have a question:
I always assumed that non-linearity was applied in a neural network in order to calculate the minimum of an error surface.
If the function is f(x) = mx + b, the derivative is always the constant f'(x) = m.
Is this one of the reasons why non-linearity (e.g., through sigmoid functions, whose derivative is f'(x) = f(x)(1 - f(x))) is applied?
Thank you very much.
The neural network is a model of your problem, making predictions
for inputs. The loss function is a measure of the accuracy of
predictions with respect to the observed results.
"Linearity" typically refers to the model. A linear model is a very
simple one: many interesting problems can be approximated by linear
functions, but often you need a more sophisticated model.
Since the sequential composition of linear functions is still linear,
the expressiveness of deep networks comes from inserting
non-linear activation functions that modulate the output of artificial
neurons (approximating a thresholding filter). These non-linear functions
must be differentiable to work with the backpropagation algorithm.
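A quick NumPy check of the composition argument, with arbitrary random weights (purely illustrative): two stacked linear layers collapse to a single linear layer with W = W2 W1 and b = W2 b1 + b2, whereas inserting a sigmoid between them breaks this.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

def with_sigmoid(x):
    # Same weights, but with a non-linear activation between the layers
    return W2 @ sigmoid(W1 @ x + b1) + b2

# The equivalent single linear layer
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.standard_normal(3)
print(np.allclose(two_linear_layers(x), W @ x + b))   # True
```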
Independently of the model, the loss function can be "linear" (L1),
such as the sum of absolute deviations, or non-linear, such as
the mean squared residuals (L2) or other loss functions. Again,
the loss function must be differentiable too.
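As a small illustration (the function names are hypothetical), here are both losses with their gradients with respect to the predictions. The L2 gradient scales with the residual, while the absolute value is not differentiable at a zero residual, so in practice a subgradient such as the sign is used there:

```python
import numpy as np

def l1_loss(pred, target):
    return np.sum(np.abs(pred - target))

def l1_grad(pred, target):
    # Subgradient: sign of the residual (np.sign returns 0 for a zero residual)
    return np.sign(pred - target)

def l2_loss(pred, target):
    return np.mean((pred - target) ** 2)

def l2_grad(pred, target):
    return 2.0 * (pred - target) / pred.size

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.5, 2.0, 2.0])
print(l1_loss(pred, target), l1_grad(pred, target))   # 1.5 [-1.  0.  1.]
```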
See, for instance, this lecture by Hinton et al.
for a discussion of a simple linear model with an L2 loss function
(then extended with a sigmoid activation function).
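A minimal sketch in that spirit (not the lecture's code): a single sigmoid neuron trained by gradient descent on an L2 loss, where the chain rule uses exactly the derivative f'(z) = f(z)(1 - f(z)) mentioned in the question. The data, learning rate and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 100)
y = sigmoid(2.0 * x - 1.0)          # hypothetical targets from a known sigmoid

w, b, lr = 0.0, 0.0, 0.5
losses = []
for _ in range(2000):
    out = sigmoid(w * x + b)
    resid = out - y
    losses.append(np.mean(resid ** 2))              # L2 (mean squared) loss
    # Chain rule: dL/dz = 2 * resid / N * f(z) * (1 - f(z))
    dz = 2.0 * resid * out * (1.0 - out) / x.size
    w -= lr * np.sum(dz * x)                        # dL/dw = sum(dL/dz * x)
    b -= lr * np.sum(dz)                            # dL/db = sum(dL/dz)
```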

Non-linear classification vs regression with FFANN

I am trying to distinguish between two classes of data for forecasting. Basically, the dependent variables are features of a signal that I want to forecast. I want to predict whether the signal will have a positive or negative slope in the near future (1 time step ahead). I have tried different time series analyses, such as Fourier analysis, fitting with neural networks, autoregressive models, and classification with neural nets (using patternnet in MATLAB).
The function is continuous, so the most logical choice would be a regression tool to determine what is going to happen. However, since I only care whether the slope is going to be positive or negative, I converted the signal to a binary signal (1 if the slope is positive, -1 if the slope is 0 or negative).
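The conversion described above can be sketched in NumPy as follows (the function name is hypothetical). Label i says whether the signal rises between samples i and i + 1, with a flat step counted as -1, as in the question:

```python
import numpy as np

def slope_labels(signal):
    """+1 where the next step rises, -1 where it is flat or falls."""
    d = np.diff(np.asarray(signal, dtype=float))
    return np.where(d > 0, 1, -1)

print(slope_labels([0.0, 1.0, 1.0, 0.5, 2.0]))   # [ 1 -1 -1  1]
```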
This gave by far the best results I have gotten! However, for some unknown reason a neural net designed for classification did not work (the confusion matrix showed a precision of only around 50%). So I decided to try a regular feedforward neural net...
Since the neural network outputs continuous data, I didn't know what to do... But then I remembered logistic regression: since its transfer function is the logistic function (bounded by 0 and 1), its output can be interpreted as a probability. So I basically did the same and defined a threshold (e.g. above 0 is 1, below 0 is -1), and voilà! The precision skyrocketed: I am getting a precision of around 70-80%.
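A sketch of the thresholding and of the precision computation (function names and numbers are hypothetical). Since tanh(z) = 2σ(2z) - 1, thresholding an output in (-1, 1) at 0 is equivalent to thresholding a logistic output at 0.5. Precision counts only the cases predicted as +1, while accuracy counts all cases:

```python
import numpy as np

def to_labels(outputs, threshold=0.0):
    return np.where(np.asarray(outputs) > threshold, 1, -1)

def precision_and_accuracy(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    predicted_pos = y_pred == positive
    tp = np.sum(predicted_pos & (y_true == positive))
    precision = tp / predicted_pos.sum() if predicted_pos.any() else float("nan")
    accuracy = np.mean(y_true == y_pred)
    return precision, accuracy

outputs = np.array([0.8, -0.3, 0.1, -0.9, 0.4, -0.05])   # hypothetical network outputs in (-1, 1)
y_true = np.array([1, -1, -1, -1, 1, 1])
y_pred = to_labels(outputs)                               # [ 1 -1  1 -1  1 -1]
p, a = precision_and_accuracy(y_true, y_pred)
```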
Since I am using a sigmoid transfer function, the neural network will have a continuous output just like logistic regression (but in this case between -1 and 1), so I assume my approach is technically still regression and not classification. My question is: which is better? For my specific problem, where fitting did not give really good results and I had to convert it to a binary problem, which should give better results, classification or regression?
Should I try a different neural net configuration (with a different transfer function), or a support vector machine or another classification algorithm? Or should I stick with regression and define a threshold myself, just as I would with logistic regression?

Fitting sigmoid to data

There are many curve fitting and interpolation tools like polyfit (or even this nice logfit toolbox I found here), but I can't seem to find anything that will fit a sigmoid function to my x-y data.
Does such a tool exist or do I need to make my own?
If you have the Statistics Toolbox installed, you can use nonlinear regression with nlinfit:
sigfunc = @(A, x)(A(1) ./ (A(2) + exp(-x)));
A0 = ones(1, 2); % initial values for the two coefficients, fed into the iterative algorithm
A_fit = nlinfit(x, y, sigfunc, A0);
Here sigfunc is just an example of a sigmoid function, and A is the vector of fitting coefficients.
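For readers working in Python rather than MATLAB, a rough counterpart of the nlinfit call above is scipy.optimize.curve_fit (nonlinear least squares). The sketch uses the same example model on synthetic, noise-free data generated with assumed coefficients (2, 0.5):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigfunc(x, a1, a2):
    # Same example model as above: A(1) ./ (A(2) + exp(-x))
    return a1 / (a2 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 50)
y = sigfunc(x, 2.0, 0.5)            # synthetic, noise-free data

A0 = np.ones(2)                     # initial values for the iterative algorithm
A_fit, A_cov = curve_fit(sigfunc, x, y, p0=A0)
```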
nlinfit, and especially gatool, are big hammers for this problem. A sigmoid is not a specific function. Most commonly it is taken to be the same as the logistic function (also often the most efficient to calculate):
y = 1./(1+exp(-x));
or a generalized logistic. But all manner of curves can have sigmoidal shapes. If you know that your data corresponds to one in particular, fitting can be improved and more efficient methods can be applied. For example, the error function (erf) has a sigmoidal shape and appears in the CDF of the normal distribution. If you know that your data come from a Gaussian (normal) distribution (i.e., the data trace its CDF) and you have the Statistics Toolbox, you can use the normfit function, which is based on maximum likelihood estimation (MLE). If you end up needing to write a custom fitting function, say for performance reasons, I'd investigate MLE techniques for the particular form of sigmoid that you want to fit.
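A Python/SciPy sketch of the two situations (not the MATLAB functions named above). If your x-y values trace a normal CDF, a least-squares curve fit recovers μ and σ; if you instead have the raw samples, scipy.stats.norm.fit returns the maximum likelihood estimates, which for the normal distribution are the sample mean and the (biased) sample standard deviation. The synthetic data and starting values are assumptions:

```python
import numpy as np
from scipy import optimize, stats

# Case 1: x-y data tracing a normal CDF (an erf-shaped sigmoid), least-squares fit
x = np.linspace(-6.0, 8.0, 80)
y = stats.norm.cdf(x, loc=1.0, scale=2.0)           # synthetic, noise-free

def normal_cdf(x, mu, sigma):
    return stats.norm.cdf(x, loc=mu, scale=sigma)

# Keep sigma positive during the iterations
(mu_ls, sigma_ls), _ = optimize.curve_fit(
    normal_cdf, x, y, p0=[0.0, 1.0], bounds=([-np.inf, 1e-6], [np.inf, np.inf])
)

# Case 2: raw samples from the distribution, maximum likelihood fit
rng = np.random.default_rng(0)
samples = rng.normal(1.0, 2.0, size=1000)
mu_ml, sigma_ml = stats.norm.fit(samples)
```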
I would suggest using MATLAB's Global Optimization Toolbox, in particular the Genetic Algorithm Solver, which can solve your problem by optimizing the sigmoid function's parameters (i.e., finding the best fit to your data) with a genetic algorithm. It has an easy-to-use GUI, which you can open by calling gatool.
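SciPy does not ship a genetic algorithm solver; as a stand-in, scipy.optimize.differential_evolution is a related population-based evolutionary global optimizer (not a genetic algorithm in the strict sense). This sketch minimizes the sum of squared errors of an assumed three-parameter logistic L / (1 + exp(-k (x - x0))) over box bounds; the model, bounds and data are illustrative:

```python
import numpy as np
from scipy.optimize import differential_evolution

def logistic(x, L, k, x0):
    return L / (1.0 + np.exp(-k * (x - x0)))

x = np.linspace(-5.0, 5.0, 60)
y = logistic(x, 3.0, 1.5, 0.5)      # synthetic, noise-free data

def sse(params):
    # Objective: sum of squared errors between model and data
    return np.sum((logistic(x, *params) - y) ** 2)

bounds = [(0.1, 10.0), (0.1, 10.0), (-5.0, 5.0)]   # search ranges for L, k, x0
result = differential_evolution(sse, bounds, seed=0, tol=1e-7)
L_fit, k_fit, x0_fit = result.x
```

The population search needs only bounds, not a starting point, which is the main practical difference from nlinfit or curve_fit.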

Scipy/Python indirect spline interpolation

I need to fit data in quite an indirect way. The original data to be recovered in the fit is some linear function with small oscillations and drifts on it, which I would like to identify. Let's call this f(t). We cannot record this parameter in the experiment directly, only indirectly, say as g(f) = sin(a f(t)). (The real transfer function is more complex, but that should not matter here.)
When f(t) approaches the turning points of the sine function, its changes of direction are difficult to identify, so instead of just inverting g and guessing how the data continue, I tried an alternative approach to recover f(t):
I create a model function fm(t), pass it through the same known transfer function g(), and fit g(fm(t)) to the data. As the dataset is huge, I do this piecewise for successive chunks of data, ensuring the continuity of fm across the whole set.
A first try used linear functions fitted with optimize.leastsq, where the residuals are computed from g(fm). This is not completely satisfactory, and I think it would be far better to fit a spline to the data to get fspline(t) as a model for f(t), guaranteeing continuity of the function and of its derivative.
The problem is that spline fitting in the interpolate package works on the data directly, so I cannot wrap the spline as g(fspline) and fit that to the data. Is there a way to do this in scipy?
Any other ideas?
I also tried quadratic functions, fixing the offset and slope to match those of the preceding fitted chunk of data so that the curvature is the only fitting parameter, but the fit very quickly starts to deviate.
Thanks
What you would need is a matrix of spline basis functions, b(t), so that you can approximate f(t) as a linear combination of spline basis functions:
f(t) = np.dot(b(t), coefs)
and then estimate the coefficients, coefs, by optimize.leastsq.
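A sketch of this approach, assuming SciPy's BSpline class: evaluating a BSpline whose coefficient array is the identity matrix returns all basis functions at once, one per column, which gives the matrix b(t). The transfer function g(f) = sin(f) (i.e., a = 1), the synthetic f(t) and the knot placement are illustrative assumptions, chosen so that a·f stays on the monotone part of the sine:

```python
import numpy as np
from scipy import optimize
from scipy.interpolate import BSpline

def bspline_basis(x, interior_knots, k=3):
    """Matrix B with B[i, j] = j-th B-spline basis function evaluated at x[i]."""
    lo, hi = x.min(), x.max()
    knots = np.r_[[lo] * (k + 1), interior_knots, [hi] * (k + 1)]   # clamped knot vector
    n_coef = len(knots) - k - 1
    # Identity coefficients: column j of the result is basis function j
    return BSpline(knots, np.eye(n_coef), k)(x)

def g(f, a=1.0):
    return np.sin(a * f)            # assumed (simplified) transfer function

t = np.linspace(0.0, 3.0, 300)
f_true = 0.3 * t + 0.05 * np.sin(2.0 * t)   # hypothetical f(t)
data = g(f_true)

B = bspline_basis(t, np.linspace(0.0, 3.0, 10)[1:-1])

def residuals(coefs):
    return g(B @ coefs) - data

coefs, ier = optimize.leastsq(residuals, x0=np.zeros(B.shape[1]))
f_fit = B @ coefs
```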
However, as far as I know, spline basis functions are not readily available in Python (unless you borrow experimental scripts or search through the code of some packages). Newer SciPy releases do provide scipy.interpolate.BSpline.design_matrix for this.
Instead you could also use polynomials, for example
b(t) = np.polynomial.chebyshev.chebvander(t, order)
and use a polynomial approximation instead of the splines.
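The polynomial variant looks like this, under the same kind of illustrative assumptions (g(f) = sin(f), a synthetic f(t), degree 7). The time axis is first mapped to [-1, 1], where Chebyshev polynomials are well conditioned:

```python
import numpy as np
from numpy.polynomial import chebyshev
from scipy import optimize

def g(f, a=1.0):
    return np.sin(a * f)            # assumed (simplified) transfer function

t = np.linspace(0.0, 3.0, 300)
f_true = 0.3 * t + 0.05 * np.sin(2.0 * t)   # hypothetical f(t)
data = g(f_true)

# Map t to [-1, 1] before building the Chebyshev design matrix
s = 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0
b = chebyshev.chebvander(s, 7)              # shape (300, 8): one column per degree

coefs, ier = optimize.leastsq(lambda c: g(b @ c) - data, x0=np.zeros(b.shape[1]))
f_fit = b @ coefs
```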
The structure of this problem is very similar to generalized linear models, where g plays the role of the known (inverse) link function, and to single-index models in econometrics.
It would also be possible to use the scipy splines indirectly by creating artificial data
y_i = f(t_i)
where f is a scipy.interpolate spline through the points (t_i, y_i), and the y_i are the parameters to be estimated in the least-squares optimization. (This is loosely based on a script I saw some time ago that used this trick to create a different kind of smoothing spline than the scipy version; I don't remember where I saw it.)
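A sketch of that indirect use, assuming scipy.interpolate.make_interp_spline for the interpolating spline. The node positions t_i are fixed, and only the node values y_i are optimized; g, f(t) and the node count are illustrative assumptions:

```python
import numpy as np
from scipy import optimize
from scipy.interpolate import make_interp_spline

def g(f, a=1.0):
    return np.sin(a * f)            # assumed (simplified) transfer function

t = np.linspace(0.0, 3.0, 300)
f_true = 0.3 * t + 0.05 * np.sin(2.0 * t)   # hypothetical f(t)
data = g(f_true)

nodes = np.linspace(0.0, 3.0, 10)           # t_i: fixed, far fewer than data points

def f_model(y_nodes):
    # Cubic spline interpolating the artificial points (t_i, y_i), evaluated on t
    return make_interp_spline(nodes, y_nodes, k=3)(t)

y_fit, ier = optimize.leastsq(lambda y: g(f_model(y)) - data, x0=np.zeros(nodes.size))
f_fit = f_model(y_fit)
```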
Thank you for these comments. I tried the polynomial basis suggested above, but polynomials are not an option for my needs, as they tend to create ringing, which is difficult to keep under control.
The spline-based solution I have now found is quite simple and straightforward, and I think it is what you meant by "using the splines in an indirect way".
The fitting function f(t) is obtained with the interpolate.splev(x, (t, c, k)) function, but the spline coefficients c are supplied by the optimize.leastsq function. In this way, f(t) is not a direct spline fit (as one would usually obtain with splrep(x, y)) but is optimized indirectly in the fit, so it is possible to apply the link function g to it. The initial guess for c can be obtained from one evaluation of splrep(xinit, yinit, t=knots) on model data.
One trick is to keep the number of spline knots well below the number of data points by specifying them explicitly in the call to splrep() and passing this reduced set to splev() during evaluation.
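A minimal sketch of this solution, assuming g(f) = sin(f) and a synthetic f(t) (both illustrative). splrep with an explicit interior knot vector provides the initial coefficients; FITPACK pads c with k + 1 trailing zeros that splev ignores, so only the leading len(t) - k - 1 coefficients are passed to leastsq:

```python
import numpy as np
from scipy import interpolate, optimize

def g(f, a=1.0):
    return np.sin(a * f)            # assumed (simplified) transfer function

x = np.linspace(0.0, 3.0, 300)
f_true = 0.3 * x + 0.05 * np.sin(2.0 * x)   # hypothetical f(t)
data = g(f_true)

k = 3
interior_knots = np.linspace(0.0, 3.0, 10)[1:-1]    # far fewer knots than data points
# Initial guess: one ordinary spline fit (splrep) to rough model data
t_full, c0, _ = interpolate.splrep(x, 0.25 * x, t=interior_knots, k=k)
n_coef = len(t_full) - k - 1

def f_model(c_free):
    c = np.r_[c_free, np.zeros(k + 1)]             # restore FITPACK's zero padding
    return interpolate.splev(x, (t_full, c, k))

c_fit, ier = optimize.leastsq(lambda c: g(f_model(c)) - data, c0[:n_coef])
f_fit = f_model(c_fit)
```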