Can Scipy.optimize.differential_evolution compute SD or SE of estimated parameters? - confidence-interval

I used scipy.optimize.differential_evolution to estimate the parameters of my models; however, unlike curve_fit, it does not return a covariance matrix. How can I calculate the uncertainties of my parameters?

You could start with differential_evolution to find the minimum, then follow up with leastsq or curve_fit, started from that minimum, to estimate the uncertainties. Another way of estimating the uncertainties is to calculate the Hessian matrix of the objective at the minimum, e.g. by finite differences, and derive the covariance matrix from its inverse.
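A minimal sketch of both suggestions, assuming a hypothetical exponential-decay model `model(x, a, b, c)` and synthetic data (neither comes from the question). differential_evolution minimises the sum of squared residuals within the given bounds; curve_fit then refines from `res.x` and returns `pcov`. As the alternative, a hand-written central-difference Hessian of the same sum of squares (`numerical_hessian` is a hypothetical helper, not a SciPy function) is turned into a covariance estimate via 2 * s2 * inv(H), with s2 = SSE / (n - p).

```python
import numpy as np
from scipy.optimize import curve_fit, differential_evolution

# Hypothetical model and synthetic data, for illustration only.
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = model(x, 2.5, 1.3, 0.5) + rng.normal(scale=0.05, size=x.size)

def sse(theta):
    """Sum of squared residuals: the objective minimised by differential_evolution."""
    return np.sum((y - model(x, *theta)) ** 2)

# 1) Global search for the minimum within the bounds.
res = differential_evolution(sse, bounds=[(0, 10), (0, 5), (-2, 2)], seed=1)

# 2) Local refinement with curve_fit starting at the DE solution; it returns pcov.
popt, pcov = curve_fit(model, x, y, p0=res.x)
perr_cf = np.sqrt(np.diag(pcov))

# 3) Alternative: central-difference Hessian of the SSE at the optimum.
def numerical_hessian(f, theta, h=1e-4):
    n = len(theta)
    steps = np.eye(n) * h          # row i is a step of size h along parameter i
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = steps[i], steps[j]
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h * h)
    return H

H = numerical_hessian(sse, popt)
s2 = sse(popt) / (x.size - len(popt))   # residual variance with n - p degrees of freedom
cov_hess = 2.0 * s2 * np.linalg.inv(H)  # near the optimum H is approximately 2 J^T J
perr_hess = np.sqrt(np.diag(cov_hess))

print("DE estimate:     ", res.x)
print("curve_fit SE:    ", perr_cf)
print("Hessian-based SE:", perr_hess)
```

The two sets of standard errors should roughly agree but are not identical: the Hessian of the sum of squares also contains second-derivative terms of the model that the J^T J approximation behind curve_fit's `pcov` drops. Both rely on the model being correct and the noise being independent with constant variance.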

Related

Difference between scipy.optimize.curve_fit and linear least squares

I am struggling to find information on what exactly the scipy.optimize.curve_fit function does to fit (for example) exponential data, and how does this method differ from just linearizing the data and directly computing the linear fit using the general formulas for a weighted linear least-squares fit?
It's Levenberg-Marquardt nonlinear fitting for unbounded problems and a trust-region variant (Trust Region Reflective) when bounds are given. See the references in the docstring of least_squares.
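To make the difference from linearizing concrete, here is a sketch with a hypothetical model y = a*exp(b*x) plus additive Gaussian noise (synthetic data, not from the question). It compares an ordinary linear fit of log(y), a weighted linear fit using weights y (a first-order correction, assuming the standard deviation of log(y) is roughly sigma / y), and curve_fit on the original scale. Only curve_fit minimises the squared error of y itself; the log transform changes the error structure, so the linearized estimates generally differ, and curve_fit started from them can only lower the original-scale residual sum of squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)

# Hypothetical data: exponential decay with additive Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 3.0, 40)
y = model(x, 3.0, -0.7) + rng.normal(scale=0.05, size=x.size)

def sse(a, b):
    """Residual sum of squares on the original (not log) scale."""
    return np.sum((y - model(x, a, b)) ** 2)

# Linearized fit: log(y) = log(a) + b*x, ordinary linear least squares (needs y > 0).
b_lin, log_a_lin = np.polyfit(x, np.log(y), 1)
a_lin = np.exp(log_a_lin)

# Weighted linearized fit: sd(log y) is roughly sigma / y, so weight each residual by y.
b_w, log_a_w = np.polyfit(x, np.log(y), 1, w=y)
a_w = np.exp(log_a_w)

# Nonlinear least squares on the original scale (Levenberg-Marquardt, since no bounds).
(a_nl, b_nl), _ = curve_fit(model, x, y, p0=[a_lin, b_lin])

for label, (a, b) in [("linearized", (a_lin, b_lin)),
                      ("weighted linearized", (a_w, b_w)),
                      ("curve_fit", (a_nl, b_nl))]:
    print(f"{label:20s} a = {a:.4f}  b = {b:.4f}  SSE = {sse(a, b):.6f}")
```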

MATLAB computing Bayesian Information Criterion with the fit.m results

I'm trying to compute the Bayesian Information Criterion (BIC) with results from fit.m.
According to Wikipedia, the log-likelihood can be approximated (when the noise is ~N(0, sigma^2)) as:
$$L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\mathrm{rss}}{2\sigma^2}$$
with n as the number of samples, k as the number of free parameters, and rss as the residual sum of squares. And BIC is defined as:
$$\mathrm{BIC} = -2L + k\log(n)$$
But this is a bit different from the fitglm.m result even for simple polynomial models and the discrepancy seems to increase when higher order terms are used.
Because I want to fit Gaussian models and compute their BICs, I cannot just use fitglm.m. Or is there a way to write a Gaussian model in Wilkinson notation? I'm not familiar with the notation, so I don't know if it's possible.
I'm not 100% sure this is your issue, but I think your definition of BIC may be misunderstood.
The Bayesian Information Criterion (BIC) is an approximation to the log of the evidence (up to a factor of -2), and is defined as:
$$\mathrm{BIC} = -2\ln p(D \mid \hat{\theta}_{\mathrm{MAP}}) + k\ln n$$
where $D$ is the data, $k$ is the number of adaptive parameters of your model, $n$ is the data size, and, most importantly, $\hat{\theta}_{\mathrm{MAP}}$ is the maximum a posteriori estimate for your model / parameter set.
Compare, for instance, with the much simpler Akaike Information Criterion (AIC):
$$\mathrm{AIC} = -2\ln p(D \mid \hat{\theta}_{\mathrm{ML}}) + 2k$$
which instead relies on the usually simpler-to-obtain maximum likelihood estimate $\hat{\theta}_{\mathrm{ML}}$ of the model.
Your $\sigma$ is simply a parameter, which is subject to estimation. If the $\sigma$ you're using here is derived from the sample variance, for instance, then that simply corresponds to the $\hat{\theta}_{\mathrm{ML}}$ estimate, and not the $\hat{\theta}_{\mathrm{MAP}}$ one.
So, your discrepancy may simply derive from the built-in function using the 'correct' estimate while you use the wrong one in your by-hand calculation of the BIC.
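The effect of the plugged-in variance can be checked directly. A sketch using the question's Gaussian log-likelihood with hypothetical values for n, k and rss: plugging in sigma^2 = rss/n (the maximum-likelihood estimate) versus the unbiased rss/(n - k) gives different log-likelihoods and therefore different BIC values, which is one possible source of a mismatch with a built-in function. At the ML estimate the log-likelihood reduces to -(n/2)(ln(2*pi*rss/n) + 1).

```python
import math

def gaussian_log_likelihood(rss, n, sigma2):
    """L = -(n/2) log(2 pi sigma^2) - rss / (2 sigma^2), i.i.d. N(0, sigma^2) noise."""
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)

def bic(log_lik, k, n):
    """BIC = -2 L + k log(n)."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical fit summary: n samples, k free parameters, residual sum of squares.
n, k, rss = 50, 3, 12.5

sigma2_ml = rss / n               # maximum-likelihood estimate of sigma^2
sigma2_unbiased = rss / (n - k)   # unbiased estimate (rss / degrees of freedom)

for label, s2 in [("ML", sigma2_ml), ("unbiased", sigma2_unbiased)]:
    L = gaussian_log_likelihood(rss, n, s2)
    print(f"{label:9s} sigma^2 = {s2:.4f}  L = {L:.4f}  BIC = {bic(L, k, n):.4f}")
```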

Extrapolation using Gaussian process regression or kriging

Is there any way to extrapolate using kriging or Gaussian process regression?
Gaussian processes work very well for interpolating scattered data; however, I need to extrapolate a time series of a variable.
How can I extrapolate x(n+1) using the history of the variable x, i.e. x_i, i = n, n-1, ...?
For example, in Python: scikit-learn.org/stable/modules/gaussian_process.html
Extrapolation works the same way as interpolation, both in theory and in practice.
In theory, when you learn a Gaussian process regression model, you model your data as a Gaussian process: you select its mean function and its covariance function and estimate their parameters. To interpolate (or extrapolate), you compute the mean of this Gaussian process at a new point, conditioned on the learning points.
In practice, for both interpolation and extrapolation, you just have to call a prediction function (called predict in the R package DiceKriging and in scikit-learn in Python).
However, you must know that Gaussian process regression (like many regression techniques [citation needed]) works quite badly for extrapolation. Away from the data, the Gaussian process mean quickly "returns" to the mean function you have defined. So, in extrapolation, Gaussian process regression is just parametric regression whose model is the one you have chosen for the mean function.
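A bare NumPy sketch of this behaviour (not a call to DiceKriging or scikit-learn): a GP with a squared-exponential kernel and a constant mean function `mean_const`, implemented in a hypothetical helper `gp_predict`, fitted to a synthetic series. The length scale of 3 and the noise of 1e-6 are arbitrary choices. One step past the data the prediction still draws on the recent history, but a few length scales out the posterior mean falls back to `mean_const` and the posterior standard deviation to the prior one.

```python
import numpy as np

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(t_train, y_train, t_new, mean_const=0.0, noise=1e-6, **kern):
    """Posterior mean and std of a GP with constant mean function `mean_const`."""
    K = rbf(t_train, t_train, **kern) + noise * np.eye(len(t_train))
    K_s = rbf(t_new, t_train, **kern)
    alpha = np.linalg.solve(K, y_train - mean_const)
    mean = mean_const + K_s @ alpha
    v = np.linalg.solve(K, K_s.T)
    var = np.diag(rbf(t_new, t_new, **kern)) - np.sum(K_s * v.T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical time series x_i observed at t = 0, 1, ..., 19.
t = np.arange(20.0)
x = 2.0 + np.sin(0.3 * t)

t_new = np.array([20.0, 21.0, 25.0, 60.0])   # one step ahead ... far ahead
for m in (0.0, x.mean()):
    mean, std = gp_predict(t, x, t_new, mean_const=m, length_scale=3.0)
    print(f"mean function = {m:.2f}:", np.round(mean, 3), np.round(std, 3))
```

Running it with a mean function of 0 versus the sample mean shows that the far-ahead forecast is simply whichever constant was chosen, which is the "parametric regression on the mean function" point above.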

Hyper-parameters of Gaussian Processes for Regression

I know a Gaussian Process Regression model is mainly specified by its covariance matrix, and the free hyper-parameters act as the 'weights' of the model. But could anyone explain what the two hyper-parameters (length-scale and amplitude) in the covariance matrix represent (since they are not 'real' parameters)? I'm a little confused about the 'actual' meaning of these two parameters.
Thank you for your help in advance. :)
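First off, I would like to point out that there are an infinite number of kernels that could be used in a Gaussian process. One of the most common, however, is the RBF (also referred to as squared exponential, exponentiated quadratic, etc.). This kernel has the following form:
$$k(x, x') = \sigma^2 \exp\left(-\frac{(x - x')^2}{2l^2}\right)$$
The above equation is of course for the simple 1D case. Here $l$ is the length scale and $\sigma^2$ is the variance parameter (note that they go under different names depending on the source). Effectively, the length scale controls how similar two points appear to be, since it simply rescales the distance between $x$ and $x'$: points much closer than $l$ are strongly correlated, points many length scales apart are nearly independent, so a small $l$ gives a rapidly varying function and a large $l$ a smooth one. The variance parameter controls the amplitude, i.e. how far the function typically strays from its mean; it rescales the function vertically without changing how smooth it is.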
The Kernel Cookbook gives a nice little description and compares RBF kernels to other commonly used kernels.
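A sketch of what the two hyper-parameters do, using the 1D RBF kernel above and drawing functions from the zero-mean GP prior with NumPy. The grid, the seeds, the jitter and the `mean_zero_crossings` "wiggliness" measure are illustrative choices, not part of any library. With the same random seed, changing sigma only rescales the draws vertically, while shrinking l makes the draws cross zero more often.

```python
import numpy as np

def rbf_kernel(x, length_scale, sigma):
    """1D squared-exponential kernel: sigma^2 * exp(-(x - x')^2 / (2 l^2))."""
    d = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length_scale) ** 2)

def sample_prior(x, length_scale, sigma, n_samples=1, seed=0, jitter=1e-8):
    """Draw functions from a zero-mean GP prior evaluated on the grid x."""
    K = rbf_kernel(x, length_scale, sigma)
    # Jitter is scaled by sigma^2 so that changing sigma only rescales the draws.
    L = np.linalg.cholesky(K + jitter * sigma**2 * np.eye(len(x)))
    z = np.random.default_rng(seed).standard_normal((len(x), n_samples))
    return L @ z    # one sampled function per column

def mean_zero_crossings(samples):
    """Average number of sign changes per sampled function (a rough wiggliness measure)."""
    s = np.sign(samples)
    return np.mean(np.sum(s[1:] != s[:-1], axis=0))

x = np.linspace(0.0, 10.0, 100)
for ell in (0.5, 2.0):
    for sigma in (1.0, 2.0):
        f = sample_prior(x, ell, sigma, n_samples=200)
        print(f"l={ell}, sigma={sigma}: crossings={mean_zero_crossings(f):.1f}, "
              f"std={f.std():.2f}")
```

For a zero-mean squared-exponential GP, Rice's formula gives an expected 1/(pi*l) zero crossings per unit length, so over [0, 10] roughly 6.4 for l = 0.5 and 1.6 for l = 2, independent of sigma; the printed averages should land in that neighbourhood, while the printed std scales with sigma.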

How can I efficiently model the sum of Bernoulli random variables?

I am using Perl to model a random variable (Y) which is the sum of some ~15-40k independent Bernoulli random variables (X_i), each with a different success probability (p_i). Formally, Y=Sum{X_i} where Pr(X_i=1)=p_i and Pr(X_i=0)=1-p_i.
I am interested in quickly answering queries such as Pr(Y<=k) (where k is given).
Currently, I use random simulations to answer such queries. I randomly draw each X_i according to its p_i, then sum all X_i values to get Y'. I repeat this process a few thousand times and return the fraction of runs in which Y' <= k.
Obviously, this is not totally accurate, although accuracy greatly increases as the number of simulations I use increases.
Can you think of a reasonable way to get the exact probability?
First, I would avoid using the rand built-in for this purpose which is too dependent on the underlying C library implementation to be reliable (see, for example, my blog post pointing out that the range of rand on Windows has cardinality 32,768).
To use the Monte Carlo approach, I would start with a known good random number generator, such as Rand::MersenneTwister, or just use one of Random.org's services, and pre-compute a CDF for Y, assuming the distribution of Y (i.e. the p_i) is pretty stable. If each Y is only used once, pre-computing the CDF is obviously pointless.
To quote Wikipedia:
In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials.
In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no experiments with success probabilities p1, …, pn. (emphasis mine)
Closed-Form Expression for the Poisson-Binomial Probability Density Function might be of interest. The article is behind a paywall:
and we discuss several of its advantages regarding computing speed and implementation and in simplifying analysis, with examples of the latter including the computation of moments and the development of new trigonometric identities for the binomial coefficient and the binomial cumulative distribution function (cdf).
As far as I recall, shouldn't this end up asymptotically as a normal distribution? See also this newsgroup thread: http://newsgroups.derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html
If so, you can use Statistics::Distrib::Normal.
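As a sketch of this suggestion, in Python rather than Perl and computing the normal CDF with `math.erf` instead of the Perl module: Y has mean sum(p_i) and variance sum(p_i * (1 - p_i)), and a continuity correction of 0.5 is added because Y is integer-valued. This is an approximation, not the exact probability asked for; it is reasonable when the variance is large and becomes less reliable far in the tails. The p_i below are hypothetical.

```python
import math
import random

def normal_approx_cdf(ps, k):
    """Approximate Pr(Y <= k) for Y = sum of independent Bernoulli(p_i)."""
    mu = sum(ps)
    var = sum(p * (1.0 - p) for p in ps)
    z = (k + 0.5 - mu) / math.sqrt(var)   # +0.5: continuity correction
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical success probabilities, roughly the size mentioned in the question.
rng = random.Random(1)
ps = [rng.uniform(0.0, 0.1) for _ in range(20000)]
print(normal_approx_cdf(ps, 1000))
```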
To obtain the exact solution, you can exploit the fact that the probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. Convolution is a bit expensive, but it only needs to be recomputed when the p_i change.
Once you have the probability distribution, you can easily obtain the CDF by calculating the cumulative sum of the probabilities.
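A minimal sketch of the exact approach in Python (the question uses Perl, so this translates the idea rather than providing drop-in code): start from the distribution of an empty sum and convolve in one two-point distribution [1 - p_i, p_i] per variable; the cumulative sum then gives Pr(Y <= k) for every k at once. The p_i below are hypothetical.

```python
import numpy as np

def poisson_binomial_pmf(ps):
    """Exact pmf of Y = sum of independent Bernoulli(p_i), by repeated convolution."""
    pmf = np.array([1.0])                      # distribution of an empty sum: Y = 0
    for p in ps:
        pmf = np.convolve(pmf, [1.0 - p, p])   # add one Bernoulli(p) term
    return pmf                                 # pmf[j] = Pr(Y = j), j = 0..len(ps)

def poisson_binomial_cdf(ps):
    """cdf[k] = Pr(Y <= k)."""
    return np.cumsum(poisson_binomial_pmf(ps))

ps = np.random.default_rng(0).uniform(0.0, 0.1, size=2000)   # hypothetical p_i
cdf = poisson_binomial_cdf(ps)
k = 100
print(f"Pr(Y <= {k}) = {cdf[k]:.6f}")
```

Each convolution costs time proportional to the current length of the pmf, so building it for n variables takes on the order of n^2/2 multiply-adds (about 8 x 10^8 for n = 40,000), done once per set of p_i; after that, every query Pr(Y <= k) is a single array lookup.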