Fitting a statistical distribution using scipy

I am trying to use Scipy to fit a theoretical distribution to my empirical data using code similar to:
import scipy.stats

dist_names = ['gamma', 'beta', 'rayleigh', 'norm', 'pareto']
for dist_name in dist_names:
    dist = getattr(scipy.stats, dist_name)
    param = dist.fit(data)  # returns (shape parameters..., loc, scale)
    pdf_fitted = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1])  # * size
The fit() function expects a data series of raw observations. My problem is that I already have the empirical distribution in the form of an x-array with the bin values (0, 1, 2, 3, ..., n) and a corresponding y-array with the probability for each bin. Can anyone advise me how to use my x/y arrays in the fitting process rather than the original series?
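One possible way to work directly from the x/y arrays is sketched below. It treats y as weights on x, computes the weighted mean and variance, and converts those to parameters by the method of moments. Note that this is a substitute technique, not the maximum-likelihood fit that dist.fit performs, and the bin values, probabilities and helper names are invented for illustration.

```python
import math

def binned_moments(x, y):
    """Weighted mean and variance of bin values x with bin probabilities y."""
    total = sum(y)
    p = [yi / total for yi in y]  # renormalise in case y does not sum to 1
    mean = sum(xi * pi for xi, pi in zip(x, p))
    var = sum(pi * (xi - mean) ** 2 for xi, pi in zip(x, p))
    return mean, var

def gamma_from_moments(mean, var):
    """Method-of-moments gamma parameters with loc fixed at 0: (shape, scale)."""
    return mean ** 2 / var, var / mean

# Hypothetical empirical distribution: bin values and their probabilities.
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.05, 0.20, 0.30, 0.22, 0.13, 0.07, 0.03]

mean, var = binned_moments(x, y)
shape, scale = gamma_from_moments(mean, var)
norm_params = (mean, math.sqrt(var))  # (loc, scale) for a normal distribution
print(shape, scale, norm_params)
```

The resulting values could be passed to scipy.stats.gamma.pdf(x, shape, loc=0, scale=scale) for plotting. If a least-squares fit of the density to the y values is preferred, scipy.optimize.curve_fit applied to the distribution's pdf is another option.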

Related

How to obtain an equation for a line fitted to data

I am trying to obtain an equation for a function fitted to some histogram data. I was thinking of fitting a rational function, as the data doesn't resemble any distribution I recognise.
The data is experimental, and I want to be able to generate a random number according to its distribution. Hence I am hoping to be able to fit it to some sort of PDF from which I can obtain a CDF, which can be rearranged to a function into which a uniformly distributed random number between 0 and 1 can be substituted in order to obtain the desired result.
I have attempted to use the histfit function, which has worked but I couldn't figure out how to obtain an equation for the curve it fitted. Is there something stupid I have missed?
Update: I have discovered the function rationalfit, however I am struggling to figure out what the inputs need to be.
Further Update: Upon exploring the histfit command further I have discovered the option to fit a kernel, the figure for which looks promising. However, I am only able to obtain a set of x and y values for the curve, not its equation as I wanted.
From the documentation on histfit:
Algorithms
histfit uses fitdist to fit a distribution to data. Use fitdist
to obtain parameters used in fitting.
So the answer to your question is to use fitdist to get the parameters you're after. Here's the example from the documentation:
rng default; % For reproducibility
r = normrnd(10,1,100,1);
histfit(r)
pd = fitdist(r,'Normal')
pd =
  NormalDistribution

  Normal distribution
       mu = 10.1231   [9.89244, 10.3537]
    sigma =  1.1624   [1.02059, 1.35033]
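If no named distribution fits and the real goal is just to draw random numbers that follow the histogram, the inverse-CDF idea from the question can be applied to the histogram itself. The sketch below is in Python rather than MATLAB and assumes the density is flat within each bin. The bin edges, counts and the make_sampler name are hypothetical.

```python
import bisect
import random

def make_sampler(edges, counts, rng):
    """Return a function that draws from a piecewise-uniform histogram density."""
    total = float(sum(counts))
    cdf = []
    running = 0.0
    for c in counts:
        running += c / total
        cdf.append(running)
    cdf[-1] = 1.0  # guard against floating-point rounding

    def sample():
        u = rng.random()                  # uniform in [0, 1)
        i = bisect.bisect_right(cdf, u)   # first bin whose cumulative probability exceeds u
        lo_cdf = cdf[i - 1] if i > 0 else 0.0
        frac = (u - lo_cdf) / (cdf[i] - lo_cdf)  # position inside the chosen bin
        return edges[i] + frac * (edges[i + 1] - edges[i])

    return sample

# Hypothetical histogram: three bins with edges 0, 1, 2, 4.
edges = [0.0, 1.0, 2.0, 4.0]
counts = [10, 30, 60]
draw = make_sampler(edges, counts, random.Random(0))
samples = [draw() for _ in range(20000)]
print(sum(1 for s in samples if s >= 2.0) / len(samples))
```

This avoids needing a closed-form equation at all; a fitted distribution from fitdist is still the better choice when a parametric form is wanted.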

Fit data to other formulations of Gumbel and Weibull models in Matlab

I need to fit Extreme Value distributions to wind speed data, and I'm using Matlab to do this. It may not be evident to a user that there are formulations of the Gumbel and Weibull models other than those built into Matlab's commands evfit and wblfit. The definitions implemented are:
Gumbel (suited for minima)
However, there's another version of the Gumbel to which I need to fit the data:
Weibull
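The same comment applies to the Weibull models in Matlab. Earlier versions implemented the Weibull in the command weibfit (no longer available), which was later replaced by wblfit. The Weibull is currently defined (wblfit) as:
$$f(x \mid a, b) = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1} e^{-(x/a)^{b}}$$
and previously (weibfit) as:
$$f(x \mid a, b) = a\,b\,x^{b-1} e^{-a x^{b}}$$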
My question is: how can the data be fitted to these other definitions of the Gumbel and Weibull models in Matlab?
Thanks,
You can estimate the parameters of a custom distribution using the function mle.
Example with your custom Weibull PDF:
data = wblrnd(1,1,1000,1); %random weibull data
custompdf = @(x,a,b) (b*a).*x.^(b-1).*exp(-a*x.^b); %your custom PDF function
opt = statset('MaxIter',1e5,'MaxFunEvals',1e5,'FunValCheck','off'); %Iteration's option
[param,ci]= mle(data,'pdf',custompdf,'start',[1 1],'Options',opt,'LowerBound',[0 0],'UpperBound',[Inf Inf])
If the function doesn't converge, you can adjust the starting point with some better suited value.
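For anyone who wants to cross-check the result outside MATLAB, here is a rough Python sketch of the same maximum-likelihood fit for the PDF a*b*x^(b-1)*exp(-a*x^b). It is not how mle works internally. For this particular PDF, setting the derivative of the log-likelihood with respect to a to zero gives a = n / sum(x^b), which leaves a one-dimensional equation in b that bisection can solve. The function name and the simulated data are assumptions for illustration.

```python
import math
import random

def fit_weibull_ab(data, tol=1e-10):
    """MLE of (a, b) for f(x) = a*b*x**(b-1)*exp(-a*x**b), x > 0."""
    xs = [x for x in data if x > 0]
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n

    def score(b):
        # Profile score in b after substituting a = n / sum(x**b).
        pw = [x ** b for x in xs]
        return 1.0 / b + mean_log - sum(p * l for p, l in zip(pw, logs)) / sum(pw)

    lo, hi = 1e-6, 1.0
    while score(hi) > 0:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:   # bisection works because score decreases in b
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = n / sum(x ** b for x in xs)
    return a, b

rng = random.Random(1)
# weibullvariate(scale, shape): scale 1 and shape 1.5 correspond to a = 1, b = 1.5.
data = [rng.weibullvariate(1.0, 1.5) for _ in range(5000)]
a_hat, b_hat = fit_weibull_ab(data)
print(a_hat, b_hat)
```

Because the equation in b has a single root, this version needs no starting point, unlike a general-purpose optimiser.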

How to propagate error when using scipy quad on a spline of data with measurement error?

I have a data set with N points which I fit a spline to and integrate using scipy.integrate.quad. I would like to use the N associated measurement errors to put an error estimate on the final integral value.
I originally tried to use the uncertainties package but the x+/-stddev objects did not work with scipy.
def integrand(w_point, x, y):
    # call spline function to get data at arbitrary points
    f_i = spline_flux_full(x, y, w_point)
    # use spline for normalizing data at arbitrary points
    f_i_continuum = coef_continuum(w_point)
    # this is the integrand evaluated at w_point
    W_i = 1. - (f_i / f_i_continuum)
    return W_i
Have any ideas?
Synthetic datasets. You have your data points with errors. Now generate 1000 datasets, with each point drawn from a normal distribution centered on the measured value and with standard deviation equal to the error at that point. Fit each dataset. Integrate. Repeat. Now you have 1000 values of the integral. Compute the mean and standard deviation of these values.
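A minimal sketch of that recipe is below. To keep it self-contained it uses a plain trapezoidal rule as a stand-in for the spline fit plus scipy.integrate.quad. Swap your own fit-and-integrate step in for the integrate argument. The data, the errors and the function names are made up.

```python
import random
import statistics

def trapezoid(x, y):
    """Trapezoidal rule; a stand-in for the spline + quad step."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def monte_carlo_integral(x, y, sigma, integrate, n_draws=1000, seed=0):
    """Mean and standard deviation of the integral over perturbed copies of y."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        # Draw each point from a normal centred on the measurement.
        y_draw = [rng.gauss(yi, si) for yi, si in zip(y, sigma)]
        values.append(integrate(x, y_draw))
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical data: y = x sampled at 11 points, each with error 0.05.
x = [i / 10 for i in range(11)]
y = list(x)
sigma = [0.05] * len(x)
mean_val, std_val = monte_carlo_integral(x, y, sigma, trapezoid)
print(mean_val, std_val)
```

The standard deviation of the collected values is the error estimate on the integral. This treats the measurement errors as independent and Gaussian, which is an assumption you should check for your data.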

How to apply Maximum likelihood estimator on RGB values in MATLAB using predefined function?

I have a text file that consists of 3 columns and N rows of RGB values [See Text File] from a particular region of an image. I am trying to use the mle function that is predefined in MATLAB but have not been able to get it to work.
These are the possible methods that can be used as shown on matlab website but I don't know how and which one to implement with RGB values.
phat = mle(data)
phat = mle(data,'distribution',dist)
phat = mle(data,'pdf',pdf,'start',start)
phat = mle(data,'pdf',pdf,'start',start,'cdf',cdf)
phat = mle(data,'logpdf',logpdf,'start',start)
phat = mle(data,'logpdf',logpdf,'start',start,'logsf',logsf)
phat = mle(data,'nloglf',nloglf,'start',start)
phat = mle(___,Name,Value)
[phat,pci] = mle(___)
Help would be appreciated. Thanks.
I cannot comment so writing as an answer.
You can read your rgb data into an nx3 matrix.
Since the mle function needs a vector of one-dimensional values, from what I understand, one way to go ahead would be to average the r, g and b values of each row and get a mean vector of size nx1. (You will have to evaluate whether this serves your purpose.)
Then pass it to the mle function, as
phat = mle(data);
if you think your data will be fit by an unskewed normal distribution function. Otherwise you can use the form
phat = mle(data,'distribution',dist);
and provide the distribution yourself or look up and choose from some of matlab's provided distributions, for example, 'burr'.
You can also probably view the distribution using a histogram to see what the distribution looks like first.
http://in.mathworks.com/help/stats/mle.html
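To make the averaging idea concrete, here is a small Python sketch (not MATLAB) of the textbook normal maximum-likelihood estimates applied to the per-row channel average. I have not checked that these match the exact numbers mle reports. The RGB rows are invented, and fitting each channel separately is shown only as one alternative you might consider.

```python
import math

def normal_mle(values):
    """Normal MLE: the mean and the 1/n (not 1/(n-1)) standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

# Hypothetical RGB rows standing in for the contents of the text file.
rgb = [(120, 64, 30), (130, 70, 34), (125, 60, 28), (140, 75, 40)]

# Option from the answer: average r, g and b in each row, then fit one normal.
gray = [sum(row) / 3 for row in rgb]
mu, sigma = normal_mle(gray)

# Alternative: fit each channel on its own, giving three (mu, sigma) pairs.
per_channel = [normal_mle([row[k] for row in rgb]) for k in range(3)]
print(mu, sigma, per_channel)
```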

How To Fit Multivariate Normal Distribution To Data In MATLAB?

I'm trying to fit a multivariate normal distribution to data that I collected, in order to take samples from it.
I know how to fit a (univariate) normal distribution, using the fitdist function (with the 'Normal' option).
How can I do something similar for a multivariate normal distribution?
Doesn't using fitdist on every dimension separately assume the variables are uncorrelated?
There isn't any need for a specialized fitting function; the maximum likelihood estimates for the mean and covariance of the distribution are just the sample mean and the sample covariance (strictly, the version normalized by n rather than n-1, which makes little difference for large samples). I.e., compute the sample mean and sample covariance and you're done.
Estimate the mean with mean and the variance-covariance matrix with cov.
Then you can generate random numbers with mvnrnd.
It is also possible to use fitmgdist, but for just a multivariate normal distribution mean and cov are enough.
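For reference, the mean / cov / mvnrnd steps amount to the following, sketched here in plain Python. The function names are hypothetical and the data are simulated. Sampling uses a Cholesky factor of the covariance, which is one standard way to draw from a multivariate normal and not necessarily what mvnrnd does internally.

```python
import math
import random

def fit_mvn(rows):
    """Sample mean vector and covariance matrix (1/(n-1) normalisation)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive-definite matrix a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_mvn(mu, cov, rng):
    """One draw from N(mu, cov) as mu + L z, with z standard normal."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]

rng = random.Random(0)
# Hypothetical correlated data: the second column is the first plus noise.
data = []
for _ in range(2000):
    u = rng.gauss(0.0, 1.0)
    data.append([u, u + rng.gauss(0.0, 0.5)])

mu, cov = fit_mvn(data)
draws = [sample_mvn(mu, cov, rng) for _ in range(2000)]
mu2, cov2 = fit_mvn(draws)  # refit the draws as a sanity check
print(cov, cov2)
```

The off-diagonal entries of the covariance carry the correlation between variables, which is exactly what fitting each dimension separately would lose.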
Yes, using fitdist on every dimension separately assumes the variables are uncorrelated and it's not what you want.
You can use the [sigma,mu] = robustcov(X) function, where X is your multivariate data, i.e. X = [x1 x2 ... xn] and each xi is a column vector of data.
Then you can use Y = mvnpdf(X,mu,sigma) to get the values of the estimated normal probability density function.
https://www.mathworks.com/help/stats/normfit.html
https://www.mathworks.com/help/stats/mvnpdf.html