I am trying to learn kernel density estimation from the basics. If anyone has a simple routine for 1-D KDE, that would be very helpful. Thanks.
If you have the Statistics Toolbox in MATLAB, you can use ksdensity to estimate the pdf/cdf using kernel smoothing. Here's an example:
data = [randn(2000,1); 4+randn(2000,1)]; %# sample from a bimodal mixture of two Gaussians
x = linspace(-4,8,1e4); %# points at which to evaluate the density
pF = ksdensity(data,x,'function','pdf'); %# estimated pdf evaluated at the points in x
If you plot pF against x, you should see two peaks, near 0 and 4.
You can also get the cumulative distribution or the inverse cumulative or change the kernel that is used. You can look up the list of options from the link provided. This should help you get started :)
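Since the question asks for a routine built from the basics, here is a minimal from-scratch sketch in Python (standard library only) rather than a toolbox call. It places a Gaussian bump on every data point and averages them. The names gaussian_kde_1d and rule_of_thumb_bandwidth are made up for this illustration, and the bandwidth rule (1.06 * std * n^(-1/5), the normal-reference rule of thumb) is an assumption; it is not necessarily what ksdensity uses and it tends to oversmooth bimodal data.

```python
import math
import random

def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth: 1.06 * sample std * n^(-1/5) (an assumption)."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def gaussian_kde_1d(data, grid, bandwidth):
    """Evaluate the Gaussian kernel density estimate of `data` at each grid point."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        # Sum of Gaussian kernels centred on every observation.
        total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        density.append(norm * total)
    return density

# Same kind of bimodal sample as the MATLAB example, but smaller.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(4.0, 1.0) for _ in range(500)]
grid = [-4.0 + 0.05 * i for i in range(241)]  # -4 to 8 in steps of 0.05
h = rule_of_thumb_bandwidth(data)
density = gaussian_kde_1d(data, grid, h)
```

The cost is one kernel evaluation per (grid point, data point) pair, which is fine for learning purposes but slow for large samples; library routines use faster schemes.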
Related
I am trying to fit a custom distribution to a large (~O(500,000) measurements) dataset using scipy. I have derived a theoretical PDF based on some other factors, but both by hand and using symbolic integration software I cannot find an exact form of the CDF.
Currently, simply evaluating 1000 random samples from my custom distribution is expensive, which I believe is due to the need to invert an unknown CDF. If I cannot find an explicit form of the CDF and its inverse, is there anything else I can do to speed up usage of this distribution?
I've used Maple, MATLAB and SymPy to try to determine a CDF, yet none gives a result. I also tried down-sampling my data whilst still retaining the tail attributes, but this still required so much data that doing anything with the distribution was slow.
My distribution is a sub-class of SciPy's rv_continuous class.
Thanks for any advice.
This sounds like you want to sample from a kernel density estimate (KDE) of the probability distribution. While SciPy does offer a Gaussian KDE (scipy.stats.gaussian_kde), for that many measurements you would be much better off using sklearn's implementation. A good resource with code examples can be found on Jake VanderPlas's blog.
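To show why sampling from a Gaussian KDE avoids inverting a CDF, here is a minimal sketch in plain Python. Drawing from the estimate amounts to picking one observation at random and adding Gaussian noise with standard deviation equal to the bandwidth. The function name sample_gaussian_kde and the toy data are assumptions for illustration only; in practice scipy.stats.gaussian_kde.resample or sklearn's KernelDensity.sample would be the library routes.

```python
import random

def sample_gaussian_kde(data, bandwidth, size, rng):
    """Draw `size` samples from the Gaussian KDE built on `data`.

    Each draw is a randomly chosen observation plus N(0, bandwidth^2) noise,
    so the cost per sample does not depend on any CDF inversion.
    """
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(size)]

rng = random.Random(1)
toy_data = [0.0, 10.0]  # hypothetical measurements
draws = sample_gaussian_kde(toy_data, bandwidth=0.1, size=2000, rng=rng)
```

Note that this samples from the smoothed empirical distribution, not from the theoretical PDF, so it only helps if the KDE is an acceptable stand-in for the custom distribution.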
I need to calculate some cumulative probabilities of the chi-square distribution with 1000 degrees of freedom. I know there is a function chi2cdf(x,n) in the Statistics Toolbox in MATLAB. However, I don't have this toolbox. Can anyone help me with that?
It would be great if you could help me with that. Thanks!
You can look at this submission from the File Exchange which seems to do what you are looking for.
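If a download is not an option, the chi-square CDF can also be computed directly, because it equals the regularized lower incomplete gamma function P(n/2, x/2). The following is a sketch in plain Python, not the File Exchange code: it uses the textbook series expansion for small arguments and a continued fraction otherwise, and works in log space so that 1000 degrees of freedom does not overflow. The function names are illustrative assumptions.

```python
import math

def _lower_gamma_series(a, x, eps=1e-14, max_iter=100000):
    """P(a, x) by series expansion; intended for x < a + 1."""
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(max_iter):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _upper_gamma_cf(a, x, eps=1e-14, max_iter=100000):
    """Q(a, x) = 1 - P(a, x) by continued fraction (modified Lentz); for x >= a + 1."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, dof):
    """CDF of the chi-square distribution with `dof` degrees of freedom."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    if z < a + 1.0:
        return _lower_gamma_series(a, z)
    return 1.0 - _upper_gamma_cf(a, z)

p = chi2_cdf(1000.0, 1000)  # probability that a chi-square(1000) variable is <= 1000
```

The same two-branch scheme translates line for line to MATLAB; there, the built-in gammainc(x/2, n/2) already computes this quantity without any toolbox.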
Hi I'm trying to estimate the data distribution using Matlab.
For one dimensional data, I can definitely use ksdensity.
However, my problem is that I need the multi-dimensional joint distribution and the conditional distribution.
I've tried the KDE tools from UCI. They do not work in my case and I cannot figure out why, so I'm asking whether there is another tool I could use.
Edit
The toolbox is not working and gives extreme results. I used 1e5 points, and it might be because the points are too dense.
KDE toolbox result
ksdensity result
Is it possible to define your own probability density function in MATLAB or Octave and use it for generating random numbers?
MATLAB and Octave have built-in functions like rand and randn to draw points at random from a uniform or normal distribution, but there seems to be no documentation on how to define my very own probability density function.
Sampling from an arbitrary random distribution is not always trivial. For well-known distributions there are tricks to implement them, and most of them are implemented in the stats toolbox, as Oli said.
If your distribution of interest has a difficult form, there are many sampling algorithms that may help you, such as rejection sampling, slice sampling, and the Metropolis–Hastings algorithm.
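As an illustration of the simplest of these, here is a minimal rejection sampling sketch in plain Python. It assumes the density lives on a bounded interval and that an upper bound on its height is known, and it uses a uniform proposal; the function name rejection_sample and the example target 6x(1-x) are made up for this example.

```python
import random

def rejection_sample(pdf, lo, hi, pdf_max, size, rng):
    """Draw `size` samples from `pdf` on [lo, hi] using a uniform proposal.

    `pdf` may be unnormalized; `pdf_max` must be >= max of pdf on [lo, hi].
    """
    samples = []
    while len(samples) < size:
        x = rng.uniform(lo, hi)        # candidate point
        u = rng.uniform(0.0, pdf_max)  # height under the bounding box
        if u <= pdf(x):                # keep points that fall under the curve
            samples.append(x)
    return samples

def target(x):
    # Example target: 6 x (1 - x) on [0, 1], which peaks at 1.5 for x = 0.5.
    return 6.0 * x * (1.0 - x)

rng = random.Random(2)
samples = rejection_sample(target, 0.0, 1.0, 1.5, 5000, rng)
```

The acceptance rate is the area under the curve divided by the area of the bounding box, so a loose bound or a sharply peaked density makes this approach wasteful.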
If your distribution is discrete, or can be approximated fairly well by a discrete distribution, then you can just do multinomial sampling using randsample.
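The discrete approximation can be sketched in plain Python as follows: evaluate the density at bin centres and draw centres with probability proportional to those values. Here random.choices plays the role that weighted sampling with randsample would play in MATLAB; the function name and the example density 2x on [0, 1] are assumptions for illustration.

```python
import random

def sample_discretized(pdf, lo, hi, n_bins, size, rng):
    """Approximate sampling from `pdf` on [lo, hi] by weighted draws of bin centres."""
    width = (hi - lo) / n_bins
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    weights = [pdf(c) for c in centres]  # need not sum to 1
    return rng.choices(centres, weights=weights, k=size)

rng = random.Random(3)
draws = sample_discretized(lambda x: 2.0 * x, 0.0, 1.0, 1000, 5000, rng)
```

The samples can only take the bin-centre values, so the number of bins controls how closely the result resembles the continuous distribution.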
If you have the stats toolbox, you can use random(), as it has a lot of useful PDFs built-in.
I've had to do that a few times recently, and it's not exactly an easy thing to accomplish. My favorite technique was to use Inverse transform sampling.
The idea is quite simple:
1. Create the CDF of your distribution.
2. Draw a number u from a uniform random number generator on [0, 1].
3. Find the value x whose CDF equals u; that x is your sample.
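These three steps can be sketched in plain Python for the case where no closed-form CDF exists: tabulate the CDF numerically with the trapezoid rule, then invert it by table lookup and linear interpolation. The function names, the grid size and the example density exp(-x) on [0, 20] are assumptions chosen for illustration.

```python
import bisect
import math
import random

def build_cdf_table(pdf, lo, hi, n_points=2001):
    """Step 1: tabulate the CDF of a (possibly unnormalized) pdf on [lo, hi]."""
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    cdf = [0.0]
    for i in range(1, n_points):
        # Trapezoid rule for the running integral of the pdf.
        cdf.append(cdf[-1] + 0.5 * step * (pdf(xs[i - 1]) + pdf(xs[i])))
    total = cdf[-1]
    return xs, [c / total for c in cdf]  # normalize so the last value is 1

def inverse_transform_sample(xs, cdf, size, rng):
    """Steps 2 and 3: draw uniforms and map them back through the tabulated CDF."""
    out = []
    for _ in range(size):
        u = rng.random()
        j = bisect.bisect_left(cdf, u)  # first index with cdf[j] >= u
        j = min(max(j, 1), len(cdf) - 1)
        c0, c1 = cdf[j - 1], cdf[j]
        t = 0.0 if c1 == c0 else (u - c0) / (c1 - c0)
        out.append(xs[j - 1] + t * (xs[j] - xs[j - 1]))
    return out

xs, cdf = build_cdf_table(lambda x: math.exp(-x), 0.0, 20.0)
rng = random.Random(4)
samples = inverse_transform_sample(xs, cdf, 5000, rng)
```

The table is built once, after which every sample costs only a binary search, which is why this approach is attractive when each direct inversion is expensive.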
Is there any possibility to fit a curve to that histogram above in Matlab?
The histogram is not normalized or anything like that.
I know that there is a function called histfit, but can I use it here?
Try this FileExchange submission:
ALLFITDIST - Fit all valid parametric probability distributions to data.
--- UPDATE ---
ALLFITDIST is no longer available on the MATLAB File Exchange.
You can try this instead:
FITMETHIS - finds best-fitting distribution to data vector, including non-parametric.
If you know the underlying distribution (e.g. a skewed Gaussian), you can manually compute a maximum likelihood estimate of the distribution's parameters and then plot the resulting distribution on top of your histogram. However, you need to normalize your histogram so that you see empirical probability densities instead of raw counts.
I think what you want is to fit a distribution, not just any curve, which might not have finite area under it. The data looks like it's censored on the right tail, but overall it may fit a log-normal or Gamma distribution pretty well. If you have the stats toolbox, try gamfit or lognfit for a start.
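A minimal sketch of that workflow in plain Python, assuming for illustration that the data are log-normal: rescale the histogram counts by n times the bin width so the bars integrate to 1, compute the closed-form maximum likelihood estimates, and evaluate the fitted density for overlaying. All names and the synthetic data are hypothetical.

```python
import math
import random

def histogram_density(data, n_bins):
    """Histogram rescaled so the bar areas sum to 1 (counts / (n * bin width))."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        k = min(int((v - lo) / width), n_bins - 1)  # put the maximum in the last bin
        counts[k] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    density = [c / (len(data) * width) for c in counts]
    return edges, density

def lognormal_mle(data):
    """Maximum likelihood estimates: mean and std (1/n form) of log(data)."""
    logs = [math.log(v) for v in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((t - mu) ** 2 for t in logs) / len(logs))
    return mu, sigma

def lognormal_pdf(x, mu, sigma):
    """Log-normal density, to be plotted on top of the normalized histogram."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(5)
data = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]  # synthetic stand-in data
edges, density = histogram_density(data, 40)
mu_hat, sigma_hat = lognormal_mle(data)
fitted = [lognormal_pdf(0.5 * (edges[i] + edges[i + 1]), mu_hat, sigma_hat) for i in range(40)]
```

For distributions without closed-form estimates, the same structure applies, with the parameter step replaced by a numerical maximization of the log-likelihood.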
See also Kernel density estimation
http://en.wikipedia.org/wiki/Kernel_density