What is the parameter that one obtains with stats.expon.fit()? - scipy

The scipy.stats.expon.fit(data) function is supposed to fit data to the exponential distribution
$\lambda e^{-\lambda x}$
However, the same distribution is sometimes written as
$\frac{1}{\beta} e^{-x/\beta}$
I reviewed the SciPy documentation, but I am not sure whether the method returns lambda or beta. Please help.

scipy.stats.expon has two parameters, the location loc and the scale scale. (In fact, all the univariate continuous distributions in SciPy include these two parameters.) Neither of the two versions that you show includes the location parameter loc. You can think of them as having fixed loc=0; when you use the fit method, you can enforce this by passing the argument floc=0. See Generating random numbers given required distribution and empirical sampling for an example.
SciPy's scale corresponds to β in your second version of the distribution, or 1/λ in the first version.
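For example, here is a minimal sketch of the correspondence (the simulated rate of 2.5 and the sample size are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_lambda = 2.5
data = rng.exponential(scale=1/true_lambda, size=10_000)  # NumPy's scale is beta = 1/lambda

# Fix the location at 0 so that only the scale is estimated.
loc, scale = stats.expon.fit(data, floc=0)

beta_hat = scale         # scale is beta in the second parameterization
lambda_hat = 1 / scale   # the rate lambda in the first parameterization
print(loc, beta_hat, lambda_hat)
```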

Related

dfittool results interpretation

Does anyone know how to tell the difference between distributions (i.e. their goodness of fit) using the dfittool in MATLAB? In a class I took forever ago, we learned about the log-likelihood parameter and how to compare a pdf fitted to a Gaussian vs. a gamma, etc. But right now, all the MATLAB help files online are like "it means something." Any assistance would be appreciated. Basically, I need to interpret the "results" in "edit fit" of the dfittool. I want to be able to compare my fits to each other from the results, so I can pick the best fit for my analysis. I don't know what the difference is between a log likelihood of -111 vs. -105.
Example below:
Distribution: Normal
Log likelihood: -110.954
Domain: -Inf < y < Inf
Mean: 101.443
Variance: 436.332
Parameter Estimate Std. Err.
mu 101.443 4.17771
sigma 20.8886 3.04691
Estimated covariance of parameter estimates:
mu sigma
mu 17.4533 6.59643e-15
sigma 6.59643e-15 9.28366
Thank you!
(Log) likelihood is a measure of the fit of a distribution to data, so the simple answer is: the distribution with the largest likelihood is the one that fits best. However, what you get here as output is the maximized likelihood, i.e. the likelihood at those parameter values where it is maximal. Different families of distributions can be more or less "flexible", so that with some of them it is generally easier to reach a large likelihood; this limits comparability, especially if you compare families with different numbers of parameters. A fix for this is formal model comparison, e.g. using the Bayes factor, which however is considerably more complex mathematically, or its approximation, the Bayesian information criterion (BIC).
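The output above comes from MATLAB's dfittool, but the comparison itself is easy to reproduce; here is a minimal Python/SciPy sketch (the simulated gamma data and the two candidate families are arbitrary choices) that computes the maximized log-likelihood and the BIC for each fitted family:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(shape=5.0, scale=20.0, size=200)  # positive-valued example data

def loglik_and_bic(dist, data):
    params = dist.fit(data)                        # maximum-likelihood fit
    loglik = np.sum(dist.logpdf(data, *params))    # maximized log-likelihood
    k = len(params)                                # number of fitted parameters
    bic = k * np.log(len(data)) - 2 * loglik
    return loglik, bic

print("normal:", loglik_and_bic(stats.norm, data))
print("gamma: ", loglik_and_bic(stats.gamma, data))
```

The family with the larger maximized log-likelihood (or, accounting for the number of parameters, the smaller BIC) fits best in this sense.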
More generally speaking, however, it is seldom a good idea to just pick distributions at random and see how well they fit. It would be better to have an at least partially theoretically motivated idea of why a distribution is a candidate. On the most basic level this means considering its support: the normal distribution is defined on the whole real line, the gamma distribution only for nonnegative real numbers. This way it should be possible to rule one of them out based on basic properties of your data.

Creating a stochastic time-series with given parameters

I would like to create a tool for generating a stochastic time series for which I can provide the parameters: the mean and standard deviation (as for a normal distribution), plus the skewness and kurtosis. There is a similar question here using R, but I am not able to interpret it and port it to MATLAB.
Does anyone know of something that can do this already? (I haven't been able to find anything.)
If not, what would be some good advice for starting something of my own? Any known useful functions? I would also like to be able to build upon it afterwards, for example by adding outliers, volatility clustering, or heteroscedasticity.
I realise that saying 'stochastic' and 'given parameters' in the same sentence may seem odd, but it isn't: I want each time point to be random, but the parameters to describe, say, 10,000 time points.
If you're looking for the equivalent of the solution in R, MATLAB's Statistics Toolbox has limited support for the Johnson and Pearson distribution systems. In particular, the johnsrnd function produces random variates from the Johnson system. The pearsrnd function for the Pearson system, however, takes moments directly.
A big caveat: using moments to describe, fit, or produce random variates (often referred to as moment matching) is not robust and is poorly regarded by statisticians. Moments are not guaranteed to uniquely define a distribution unless you have the entire moment generating function.
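If a quick Python prototype is useful: SciPy has no direct four-moment analogue of pearsrnd, but its Pearson type III distribution lets you match the mean, standard deviation and skewness (kurtosis is not controlled). A minimal sketch, with arbitrary target values:

```python
import numpy as np
from scipy import stats

# Arbitrary target moments; kurtosis cannot be specified with pearson3.
mu, sigma, skew = 0.001, 0.02, -0.5
n_points = 10_000

series = stats.pearson3.rvs(skew, loc=mu, scale=sigma, size=n_points,
                            random_state=np.random.default_rng(42))

print(series.mean(), series.std(), stats.skew(series))
```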

matlab - pdf for multivariate uniform distribution

The pdf for the multivariate normal distribution in MATLAB is mvnpdf(...). What about the case where multiple variables are uniformly distributed: Is there a function to describe their joint distribution analogous to the multivariate normal distribution? If there is no such function, is there a trick to handle this case?
The simplest way for several variables to be uniformly distributed is for them to be mutually independent; in that case you simply have a uniform distribution over a hypercube (or hyperrectangle) in the space spanned by the variables. To get samples from this distribution, you just generate samples for each of the variables separately.
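For instance, in Python/NumPy (the bounds, dimension and number of samples are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
low = np.array([0.0, -1.0, 2.0])    # lower bound of each variable
high = np.array([1.0, 1.0, 5.0])    # upper bound of each variable

# 1000 samples from the uniform distribution on the box [low, high]
samples = rng.uniform(low, high, size=(1000, 3))
```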
The point where a "trick" might be necessary is if you have dependencies between the variables even though the marginal distribution for each of them is still uniform. In this case you have to describe the dependency structure, and I'm not aware of any standard way to do this (the way dependencies between normally distributed variables are described by a correlation matrix).
Of course such distributions exist. For two dimensions, one possibility is a joint distribution that looks like a solution to the "eight rooks" problem: exactly one cell in each row and each column carries probability mass, so both marginals are uniform. Another one derives from the introductory MATLAB example, the magic square: since all rows and columns sum to the same value, normalizing the entries gives a joint distribution with uniform marginals.
Both of these examples are discrete distributions, but can be produced at arbitrary granularity, or simply interpreted as piecewise constant continuous distributions.
As you can see, there are many possibilities for a multivariate distribution each of whose marginal distributions is uniform. The question you have to answer for yourself is what kind of dependencies, if any, you are interested in.
If I'm understanding the question properly, we want to calculate the pdf of a multivariate uniform distribution. By definition, the pdf is constant for all values in the support of the distribution. Thus, to calculate the pdf, all that is required is the normalizing constant, which is the inverse of the measure of the support. That is to say, the pdf is given by
f(x) = 1 / integral(A)
where integral(A) denotes the measure (area, volume, ...) of the support set A, and x is an element of A. If an analytic expression for integral(A) is not available, a numerical integrator can be employed.
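As an illustration, here is a minimal Python/SciPy sketch (the disk-shaped support is an arbitrary choice) that computes the normalizing constant numerically and evaluates the resulting pdf:

```python
import numpy as np
from scipy import integrate

r = 2.0  # support A: disk of radius r centred at the origin

# integral(A): integrate the constant 1 over the disk to get its area.
area, abserr = integrate.dblquad(
    lambda y, x: 1.0,                   # integrand
    -r, r,                              # x ranges over [-r, r]
    lambda x: -np.sqrt(r**2 - x**2),    # lower y bound for a given x
    lambda x: np.sqrt(r**2 - x**2),     # upper y bound for a given x
)

def uniform_pdf(x, y):
    """pdf of the uniform distribution on the disk: 1/area inside, 0 outside."""
    return 1.0 / area if x**2 + y**2 <= r**2 else 0.0

print(area, np.pi * r**2, uniform_pdf(0.5, 0.5))
```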

Accuracy of numerical integration in Matlab

I am trying to integrate an analytic function (a composite of sqrt and trig functions) over a rectangular area. It has no singularities in the area and seems to be a perfect candidate for dblquad. My question is how to evaluate the accuracy of the numerical value that MATLAB gives me. Without knowing the exact value of the integral, how can we justify the number of significant digits? When you are required to give a value with a certain number of digits of precision, you should be able to justify it. Is it possible to do this when the value is calculated using MATLAB?
Unless you set it otherwise, dblquad uses a default tolerance (1e-6 in the latest releases) for the absolute quadrature error. The approximation of the integral will then be within an error no larger than the specified tolerance.
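For comparison, SciPy's dblquad accepts the same kind of tolerances and also returns an estimate of the absolute error; a minimal sketch with an arbitrary smooth integrand:

```python
import numpy as np
from scipy import integrate

# Arbitrary smooth integrand over the rectangle [0, 1] x [0, 2].
f = lambda y, x: np.sqrt(1.0 + x * y) * np.cos(x + y)

result, abserr = integrate.dblquad(f, 0.0, 1.0,
                                   lambda x: 0.0, lambda x: 2.0,
                                   epsabs=1e-10, epsrel=1e-10)
print(result, abserr)  # abserr is the estimated absolute error of the result
```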
You could have a peek at the source code for dblquad; somewhere it will be using a certain number of 'steps'. I guess you could make a new m-file with the important bits that get the integral working and play around with the number of steps until it takes the computer a long time and no longer changes the result. Personally, I use a custom Simpson's rule for numerical integration and just change N (the number of steps) to some large number.
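In the same spirit, here is a sketch of the "increase N until the answer stops changing" check, written in Python with a composite Simpson's rule (the integrand, rectangle and grid sizes are arbitrary choices):

```python
import numpy as np
from scipy.integrate import simpson

# Arbitrary smooth integrand over the rectangle [0, 1] x [0, 2].
f = lambda x, y: np.sqrt(1.0 + x * y) * np.cos(x + y)

def simpson_2d(n):
    """Composite Simpson's rule on an n-by-n grid over [0, 1] x [0, 2]."""
    x = np.linspace(0.0, 1.0, n)
    y = np.linspace(0.0, 2.0, n)
    X, Y = np.meshgrid(x, y, indexing="ij")
    inner = simpson(f(X, Y), x=y, axis=1)   # integrate over y for each x
    return simpson(inner, x=x)              # then integrate over x

prev = simpson_2d(65)
for n in (129, 257, 513, 1025):             # keep doubling the resolution
    cur = simpson_2d(n)
    print(n, cur, abs(cur - prev))          # the difference indicates the converged digits
    prev = cur
```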

Using matlab to calculate the properties of a polygon defined as a list of points

Does MATLAB have a built-in function to find general properties like center of mass & moments of inertia for a polygon defined as a list of (non-integer valued) points?
regionprops performs this task for integer-valued points, on the assumption that these represent indices of pixels in an image. But the only functions I can find that handle non-integer point lists are polyarea and inpolygon.
My kludge for now is to create a bwconncomp structure with all the points multiplied by some large value (like 10,000) and feed it into regionprops, but I wondered if there is a more elegant solution.
You should check out the submission POLYGEOM by H.J. Sommer on the MathWorks File Exchange. It looks like it has all the property measurements you want, and nice documentation describing the formulae used in the code.
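For reference, here is a minimal Python sketch of the standard Green's-theorem (shoelace-type) formulas for area, centroid and second moments of area that tools like POLYGEOM implement; the vertices are assumed to be listed counter-clockwise without repeating the first point:

```python
import numpy as np

def polygon_properties(xy):
    """Area, centroid and second moments of area of a simple polygon.

    xy: (n, 2) array of vertices in counter-clockwise order.
    """
    x, y = xy[:, 0], xy[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)   # next vertex, wrapping around
    cross = x * yn - xn * y                   # shoelace terms

    area = 0.5 * np.sum(cross)
    cx = np.sum((x + xn) * cross) / (6.0 * area)
    cy = np.sum((y + yn) * cross) / (6.0 * area)

    # Second moments of area about the coordinate axes.
    ixx = np.sum((y**2 + y * yn + yn**2) * cross) / 12.0
    iyy = np.sum((x**2 + x * xn + xn**2) * cross) / 12.0
    return area, (cx, cy), (ixx, iyy)

# Example: a 2-by-1 rectangle with one corner at the origin.
print(polygon_properties(np.array([[0, 0], [2, 0], [2, 1], [0, 1]], float)))
```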
I don't know of a function in MATLAB that would do this for you.
However, poly2mask might be of use for you to create the pixel masks to feed into regionprops. I also suggest that, should you decide to go this route, you carefully test how much the discretization affects the results, so that you don't create crazy large arrays (and waste time) for no real gain in accuracy.
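If a Python analogue helps to gauge that discretization error, here is a hedged sketch of the same rasterize-then-measure idea using scikit-image (the triangle vertices and the scale factor of 1000 are arbitrary choices):

```python
import numpy as np
from skimage.draw import polygon
from skimage.measure import label, regionprops

verts = np.array([[0.25, 0.1], [1.75, 0.3], [1.2, 1.4]])  # non-integer (row, col) vertices
scale = 1000                                               # discretization factor

rows, cols = (verts * scale).T
mask = np.zeros((int(rows.max()) + 2, int(cols.max()) + 2), dtype=np.uint8)
rr, cc = polygon(rows, cols, shape=mask.shape)             # rasterize the polygon
mask[rr, cc] = 1

props = regionprops(label(mask))[0]
print(np.array(props.centroid) / scale,   # centroid back in the original units
      props.area / scale**2)              # area back in the original units
```

Increasing (or decreasing) the scale factor and watching how much the results move gives a direct feel for the discretization error.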
One possibility is to farm out the calculations to the Java Topology Suite. I don't know about "moments of inertia", but it does at least have a centroid method.