Parameter estimation (MLE) of a truncated Pareto distribution - matlab

I'm new here and fairly desperate, so I really hope one of you can help me.
I have a sample of random data x_1, ..., x_n and I want to fit a truncated Pareto distribution to it. Fitting a generalized Pareto distribution was easy and I have already done that: I calculated the shape and scale parameters with a MATLAB routine.
But for the truncated Pareto distribution I can't seem to find a routine that calculates the parameters I need.
Does anybody have an idea how to do it?
Thanks in advance!

You can use Markov chain Monte Carlo simulations to do Bayesian inference and obtain the most probable parameters of your truncated Pareto distribution for the given data. Or you can stay with the maximum likelihood method. Your problem can be solved in many ways. But if you want to apply MLE, you just need to search for the maximum of the likelihood. You could do it with fminsearch(), which searches for a minimum, so you would hand it the negative log-likelihood:
http://de.mathworks.com/help/optim/ug/fminsearch.html
For this you just need to define another function, in a separate m-file, which computes the negative log-likelihood of your truncated Pareto distribution for a given set of parameters. fminsearch then returns the parameters that minimise this function, i.e. the ones that maximise the likelihood. Is this the kind of routine you are looking for?
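As a rough sketch of that idea, the snippet below fits a truncated Pareto by minimising the negative log-likelihood with fminsearch. It assumes the parameterisation f(x) = a*L^a*x^(-a-1) / (1 - (L/H)^a) on L <= x <= H, which the question does not specify, and it uses synthetic data in place of the real sample. For a fixed shape a the likelihood is largest at L = min(x) and H = max(x), so only a is searched for. An anonymous function stands in for the separate m-file; the variable names are illustrative.

```matlab
% Sketch: MLE of a truncated Pareto, assumed density
%   f(x) = a * L^a * x^(-a-1) / (1 - (L/H)^a),   L <= x <= H
rng(1);                                   % reproducible synthetic data
L = 1; H = 50; aTrue = 1.5; n = 5000;     % hypothetical "true" parameters
u = rand(n, 1);
x = L * (1 - u * (1 - (L/H)^aTrue)).^(-1/aTrue);   % inverse-CDF sampling

% Endpoints: the sample extremes maximise the likelihood for any fixed a
Lhat = min(x);
Hhat = max(x);

% Negative log-likelihood as a function of the shape a only
negLogLik = @(a) -( n*log(a) + n*a*log(Lhat) ...
                    - (a + 1)*sum(log(x)) ...
                    - n*log(1 - (Lhat/Hhat)^a) );

% Search over t = log(a) so that a = exp(t) stays positive
tHat = fminsearch(@(t) negLogLik(exp(t)), 0);
aHat = exp(tHat);

fprintf('L = %.4f, H = %.4f, a = %.4f\n', Lhat, Hhat, aHat);
```

With real data you would replace the sampling lines with your own vector x. If your truncated Pareto is defined differently, only negLogLik needs to change.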

Related

Advice on Speeding up SciPy Custom Distribution Sampling & Fitting

I am trying to fit a custom distribution to a large (~O(500,000) measurements) dataset using scipy. I have derived a theoretical PDF based on some other factors, but both by hand and using symbolic integration software I cannot find an exact form of the CDF.
Currently, simply drawing 1000 random samples from my custom distribution is expensive, which I believe is due to the need to invert an unknown CDF. If I cannot find an explicit form of the CDF and its inverse, is there anything else I can do to speed up usage of this distribution?
I've used Maple, MATLAB and SymPy to try to determine a CDF, yet none gives a result. I also tried down-sampling my data while still retaining the tail attributes, but this still required so much data that doing anything with the distribution was slow.
My distribution is a sub-class of SciPy's rv_continuous class.
Thanks for any advice.
This sounds like you want to sample from a kernel density estimate of the probability distribution. While SciPy does offer a Gaussian kernel density estimator (scipy.stats.gaussian_kde), for that many measurements you would probably be much better off using scikit-learn's implementation (sklearn.neighbors.KernelDensity). A good resource with code examples can be found on Jake VanderPlas's blog.

Matlab: Curve Fitting with Start Value

I'm working with the MATLAB Curve Fitting tool for the very first time and I have a question. My fit is exponential with two terms and it looks pretty good. The problem is that it won't start from P(0,0), although my first measurement is at that point.
Is it possible to force a start value onto my fit? Also, how does R-squared work? Is it safe to rely on?
Thank you so much
See a thorough description of this process here
In short, the most common method of fitting a polynomial in MATLAB, polyfit, does not allow forcing the fit through zero (or through any other point), so a different function is required, for example lsqlin.
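A minimal sketch of the polynomial case: rather than lsqlin (which needs the Optimization Toolbox), this uses the backslash operator on a design matrix with no constant column, so the fitted polynomial passes through the origin by construction. The data and the quadratic degree are made up for illustration.

```matlab
% Sketch: least-squares quadratic forced through (0,0)
x = (0:0.5:5)';
y = 2*x + 0.3*x.^2;          % synthetic, noise-free data through the origin

A = [x.^2, x];               % design matrix WITHOUT a column of ones
c = A \ y;                   % least-squares coefficients [c2; c1]

p = [c', 0];                 % polyval-style coefficients, constant fixed at 0
yFit = polyval(p, x);        % fitted values; polyval(p, 0) is exactly 0
```

The same trick applies to other linear-in-parameters models. A constraint at a point other than the origin is where an equality-constrained solver such as lsqlin becomes the more natural tool.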

Obtaining distribution from histogram

I have an array of values, and from those values I plotted a histogram. I want to know the distribution that corresponds to the histogram. How is this possible?
Could you please explain the steps for obtaining an appropriate probability distribution from a histogram?
You would do better to ask this question on stats.stackexchange.com, as it is more about the method than the programming. However, one thing you can do is fit a parametric distribution (using moment matching or maximum likelihood, for example) and then compare the fitted distribution to the normalized histogram using the KL divergence or the Bhattacharyya distance.
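The comparison described above could look roughly like this. The sketch assumes a normal candidate distribution fitted by moment matching and a synthetic sample. The bin count of 40 is arbitrary, and the normal CDF is written with erfc so that no toolbox is needed.

```matlab
% Sketch: moment-matched normal fit compared to a histogram via KL divergence
rng(0);
x = 3 + 2*randn(20000, 1);            % hypothetical sample

mu = mean(x);                         % moment matching for a normal model
sigma = std(x);

% Empirical bin probabilities (normalized histogram)
[p, edges] = histcounts(x, 40, 'Normalization', 'probability');

% Model bin probabilities from the fitted normal CDF
Phi = @(t) 0.5 * erfc(-(t - mu) / (sigma * sqrt(2)));
q = diff(Phi(edges));
q = q / sum(q);                       % renormalize over the binned range

keep = p > 0;                         % empty bins contribute nothing
kl = sum(p(keep) .* log(p(keep) ./ q(keep)));
fprintf('KL(histogram || normal fit) = %.5f\n', kl);
```

Repeating this for several candidate families and keeping the one with the smallest divergence is one way to turn the comparison into a selection rule. The value depends on the binning, so it is only comparable between candidates that share the same bins.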
One option might be to use the "Distribution Fitting App" in the Statistics and Machine Learning Toolbox. That should help you evaluate if your data seems like it might have been drawn from some common distributions. You may never know for sure, since multiple distributions could account for the data, but if you have a lot of data it might help you narrow it down.
I think that in many cases an eyeball comparison is enough. With a reasonable amount of data, it is usually easy to distinguish between a Gaussian, a Weibull, and so on.
I would use fitdist or histfit to eyeball different distributions.
If you have no idea at all about the distribution and you want to know whether two datasets are distributed differently, it could be useful to compare their distributions by estimating them with the 'kernel' option.
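A short sketch of that kernel comparison, assuming the Statistics and Machine Learning Toolbox is available. The two datasets are synthetic stand-ins.

```matlab
% Sketch: compare two datasets through kernel density fits
rng(2);
a = randn(500, 1);                    % hypothetical dataset 1
b = 1 + 1.5*randn(500, 1);            % hypothetical dataset 2

pdA = fitdist(a, 'Kernel');           % nonparametric kernel fits
pdB = fitdist(b, 'Kernel');

xi = linspace(min([a; b]) - 2, max([a; b]) + 2, 400);
fA = pdf(pdA, xi);                    % estimated densities on a common grid
fB = pdf(pdB, xi);

plot(xi, fA, xi, fB);
legend('dataset 1', 'dataset 2');
```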

Numerical integration using Simpson's Rule on discrete data

I am looking for a way to do numerical integration in MATLAB. I know that there is a trapz function, but its precision is not good enough. Searching online, I found the quad function, but it seems to accept only a symbolic expression as input. My data are all discrete and one-dimensional. Is there any way to use quad on my data? Thanks.
The answer to your question is no. The built-in way to perform numerical integration on data with no expression in MATLAB is the trapz function. If it's not accurate enough for you, try writing your own quad function as Li-aung said; it's very simple, and this may help.
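One way to write such a routine yourself is the composite Simpson's rule applied directly to the samples. The sketch below assumes equally spaced points and an even number of intervals, and uses sin(x) on [0, pi] (exact integral 2) as stand-in data.

```matlab
% Sketch: composite Simpson's rule on equally spaced samples
x = linspace(0, pi, 21);      % 21 points -> 20 intervals
y = sin(x);                   % stand-in for measured data

n = numel(x) - 1;             % number of intervals
assert(mod(n, 2) == 0, 'Simpson''s rule needs an even number of intervals');
h = x(2) - x(1);              % common spacing

% Weights 1, 4, 2, 4, ..., 2, 4, 1
Qsimp = h/3 * ( y(1) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)) + y(end) );
Qtrap = trapz(x, y);          % trapezoidal rule for comparison

fprintf('Simpson: %.8f   trapz: %.8f\n', Qsimp, Qtrap);
```

For smooth data this is usually noticeably closer to the true value than trapz on the same samples. For noisy data the advantage largely disappears.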
Another method you may try is to use the powerful Curve Fitting Tool, cftool, to make a fit and then use the integrate function, which can operate on cfit objects (it has a weird convention: the upper limit is the first argument!). I don't think you will get much more accurate answers than with trapz; it depends on the fit.
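From the command line, the same route might look like the sketch below. It assumes the Curve Fitting Toolbox. The smoothing-spline fit type and the sin(x) test data are arbitrary choices for illustration.

```matlab
% Sketch: integrate a cfit object (Curve Fitting Toolbox assumed)
x = linspace(0, pi, 21)';            % fit expects column vectors
y = sin(x);                          % stand-in for measured data

f = fit(x, y, 'smoothingspline');    % any suitable fit type could be used
Q = integrate(f, pi, 0);             % integral from 0 to pi: upper limit first
```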
Use the spline function in MATLAB to interpolate your data, then integrate the interpolant. This is a common method for integrating data in discrete form.
You can use quadl() to integrate your data if you first create a function that interpolates them:
function f = int_fun(x, xdata, ydata)
% Interpolate the samples (xdata, ydata) at the query points x
f = interp1(xdata, ydata, x);
end
And then feed it to the quadl() function:
Q = quadl(@int_fun, A, B, [], [], xdata, ydata) % the extra arguments are passed on to int_fun
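In more recent MATLAB versions the same idea can be sketched with an anonymous function and integral, which avoids both the separate file and the extra-argument syntax. The data here are a sin(x) stand-in, and the 'spline' option is an assumption that the data are smooth enough for cubic-spline interpolation.

```matlab
% Sketch: integrate interpolated data with an anonymous function
xdata = linspace(0, pi, 21);
ydata = sin(xdata);                              % stand-in for measured data

f = @(t) interp1(xdata, ydata, t, 'spline');     % interpolant as a function handle
Q = integral(f, xdata(1), xdata(end));           % integrate over the data range

fprintf('Q = %.8f\n', Q);
```

Keeping the limits inside the range of xdata avoids relying on extrapolation.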
Integration of a function of one variable is the computation of the area under the curve of the graph of the function. For this answer I'll leave aside the nasty functions and the corner cases and all the twists and turns that trip up writers of numerical integration routines, most of which are probably not relevant here.
Simpson's rule is an approach to the numerical integration of a function for which you have code to evaluate the function at points within its domain. That's irrelevant here.
Let's suppose that your data represent a time series of values collected at regular intervals. Then you can plot your data as a histogram with bars of equal width. The integral you seek is the sum of the areas of the bars in the histogram between the limits you are interested in.
You should be able to apply this approach quite easily to data sets where the x-axis (i.e. the width of the bars in the histogram) does not show time, where the bars are not of equal width, where the data cross the x-axis, and to most other reasonable data sets.
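As a small sketch of the bar-summing idea, the snippet below uses made-up samples with unequal spacing and a sign change. Each bar takes the value at its left edge. trapz on the same samples is shown only for comparison.

```matlab
% Sketch: integral as a sum of bar areas (left-edge bars)
t = [0 1 2 4 7 8];            % sample positions, unequal spacing
y = [1 3 -2 0.5 4 2];         % sampled values, crossing zero

w = diff(t);                  % bar widths
Qbars = sum(y(1:end-1) .* w); % signed sum of bar areas
Qtrap = trapz(t, y);          % trapezoids over the same samples

fprintf('bars: %.2f   trapz: %.2f\n', Qbars, Qtrap);
```

The gap between the two numbers on such coarse data illustrates how much the result depends on what is assumed between the samples.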
The discretisation of your data establishes a limit to the accuracy of the result you can get. If, for example, your time series is sampled at 1-second intervals, you can't integrate over an interval which is not a whole number of seconds by this approach. But then, you don't really have the data on which to compute a figure with any more accuracy by any approach. Sure, you can use MATLAB (or anything else) to generate extra digits of precision, but they don't carry any meaning.

Fit data (measurements) with numerical datasets

I have some data for which I have a set of numerically determined model curves. Now I would like to find the one with the smallest least-squares deviation. I only need to vary one parameter, which is the amplitude of these model curves.
So far I have only done fitting with analytic functions, and I did not find a way to handle such a problem.
Is there any solution?
Thanks a lot!
One of the optimize functions should do the trick. You can also read the section on optimization in the manual. Without any specifics on the data or the model you wish to match, it's hard to recommend anything more specific. For example, if your cost function has many maxima and minima or is not differentiable, you'll have to choose some of the more expensive routines.
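If the amplitude really is the only free parameter, no iterative optimizer is needed: for each model curve m the best amplitude has the closed form (m'*y)/(m'*m). The sketch below assumes the curves are sampled at the same points as the data and stored as columns. The curves and the data are synthetic placeholders.

```matlab
% Sketch: pick the model curve with the smallest least-squares deviation
x = linspace(0, 1, 50)';
models = [sin(2*pi*x), exp(-3*x), x.^2];   % hypothetical curves, one per column
y = 2.5 * exp(-3*x);                       % synthetic "measurement"

% Best amplitude for each curve (closed-form least squares)
amp = (models' * y) ./ sum(models.^2, 1)';

% Residual sum of squares for each scaled curve
res = sum((y - models .* amp').^2, 1);

[~, best] = min(res);
fprintf('best curve: %d, amplitude: %.4f\n', best, amp(best));
```

If the curves live on a different grid than the data, interpolating them onto the data points first (for example with interp1) would be a natural extra step.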