According to what I read here, the kurtosis of a normal distribution should be around 3. However, I could not verify this with MATLAB's kurtosis function:
data1 = randn(1,20000);
v1 = kurtosis(data1)
The result suggests that the kurtosis of a normal distribution is around 0. What is wrong here? Thanks!
EDIT
I am using MATLAB 2012b.
If kurtosis returned a value near 0, that would strongly suggest it was computing excess kurtosis, which is defined as kurtosis minus three.
However, my MATLAB doesn't actually do that:
MATLAB>> data1 = randn(1,20000);
MATLAB>> kurtosis(data1)
ans =
2.9825
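To see the two conventions side by side, here is a minimal standard-library Python sketch (not MATLAB's implementation). It computes the moment-based kurtosis m4/m2^2, which is the uncorrected estimator MATLAB's kurtosis uses by default (flag = 1), and the excess version obtained by subtracting 3. The helper names kurtosis and excess_kurtosis are hypothetical.

```python
import random


def kurtosis(xs):
    """Moment-based (uncorrected) kurtosis m4 / m2**2; normal data gives about 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2


def excess_kurtosis(xs):
    """Kurtosis minus three, so normal data gives about 0."""
    return kurtosis(xs) - 3.0


rng = random.Random(0)  # fixed seed for reproducibility
data1 = [rng.gauss(0.0, 1.0) for _ in range(20000)]

print(f"kurtosis:        {kurtosis(data1):.4f}")         # close to 3
print(f"excess kurtosis: {excess_kurtosis(data1):.4f}")  # close to 0
```

If a MATLAB session does return a value near 0, one possible cause worth ruling out is another function named kurtosis shadowing the toolbox one; which kurtosis -all lists every copy on the path.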
I am trying to obtain an equation for a function fitted to some histogram data. I was thinking of fitting a rational function, as the data doesn't resemble any distribution I recognise.
The data is experimental, and I want to generate random numbers according to its distribution. So I am hoping to fit it to some PDF, obtain the corresponding CDF, and invert that CDF into a function into which a uniformly distributed random number between 0 and 1 can be substituted to obtain the desired result.
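The procedure described here is inverse transform sampling. If a closed-form equation is not strictly needed, a minimal sketch, assuming the raw experimental samples are available, is to invert the empirical CDF directly with linear interpolation. This is a standard-library Python illustration; make_inverse_cdf is a hypothetical helper and the data values are placeholders.

```python
import bisect
import random


def make_inverse_cdf(samples):
    """Hypothetical helper: piecewise-linear inverse of the empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    # CDF level assigned to each sorted sample: 0, 1/(n-1), ..., 1
    ps = [i / (n - 1) for i in range(n)]

    def inverse_cdf(u):
        if not 0.0 <= u <= 1.0:
            raise ValueError("u must lie in [0, 1]")
        j = bisect.bisect_left(ps, u)  # index of first level >= u
        if j == 0:
            return xs[0]
        # Interpolate linearly between neighbouring sorted samples
        t = (u - ps[j - 1]) / (ps[j] - ps[j - 1])
        return xs[j - 1] + t * (xs[j] - xs[j - 1])

    return inverse_cdf


# Substitute uniform random numbers in [0, 1) to draw new values
experimental = [1.0, 2.0, 4.0]  # placeholder data, not from the question
inv = make_inverse_cdf(experimental)
rng = random.Random(42)
draws = [inv(rng.random()) for _ in range(5)]
print(draws)
```

One design consequence: this never produces values outside the observed range, which a fitted parametric PDF would.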
I have tried the histfit function, which works, but I couldn't figure out how to obtain an equation for the curve it fits. Is there something obvious I have missed?
Update: I have found the function rationalfit, but I am struggling to figure out what its inputs need to be.
Further update: Exploring histfit further, I found the option to fit a kernel. The resulting figure looks promising, but I can only obtain a set of x and y values for the curve, not its equation as I wanted.
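A kernel fit does have an equation; it is a sum with one term per data point. With a Gaussian kernel and bandwidth h, f(x) = 1/(n*h) * sum(phi((x - x_i)/h)), where phi is the standard normal PDF. Below is a minimal standard-library Python sketch, assuming a Gaussian kernel and a bandwidth you supply; to reproduce MATLAB's curve, h would need to match the bandwidth reported by the object that fitdist(data,'Kernel') returns. kde_pdf, the data and h are hypothetical placeholders.

```python
import math


def kde_pdf(x, data, h):
    """Gaussian kernel density estimate at x (hypothetical helper).

    f(x) = 1/(n*h) * sum(phi((x - xi) / h)), phi = standard normal PDF.
    """
    n = len(data)
    total = 0.0
    for xi in data:
        z = (x - xi) / h
        total += math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return total / (n * h)


data = [0.0, 1.0, 3.0]  # placeholder data
h = 0.5                 # assumed bandwidth
for x in (0.0, 1.0, 2.0):
    print(x, kde_pdf(x, data, h))
```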
From the documentation on histfit:
Algorithms
histfit uses fitdist to fit a distribution to data. Use fitdist to obtain parameters used in fitting.
So the answer to your question is to use fitdist to get the parameters you're after. Here's the example from the documentation:
rng default; % For reproducibility
r = normrnd(10,1,100,1);
histfit(r)
pd = fitdist(r,'Normal')
pd =
NormalDistribution
Normal distribution
mu = 10.1231 [9.89244, 10.3537]
sigma = 1.1624 [1.02059, 1.35033]
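For the 'Normal' case, the equation behind the fitted curve is the normal PDF with the fitted parameters plugged in; the CDF the question asks about follows from the error function. Using the values printed above (note that histfit may additionally scale the plotted curve to match the histogram's bar heights):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right],
\qquad
\mu \approx 10.1231,\quad \sigma \approx 1.1624
```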
y = gauss(x,s,m)
Y = normpdf(X,mu,sigma)
R = normrnd(mu,sigma)
What are the basic differences between these three functions?
Y = normpdf(X,mu,sigma) is the probability density function for a normal distribution with mean mu and standard deviation sigma. Use it if you want the relative likelihood at a point X.
R = normrnd(mu,sigma) takes random samples from the same distribution as above. So use this function if you want to simulate something based on the normal distribution.
y = gauss(x,s,m) at first glance looks like the exact same function as normpdf(). But there is a slight difference: Its calculation is
Y = EXP(-(X-M).^2./S.^2)./(sqrt(2*pi).*S)
while normpdf() uses
Y = EXP(-(X-M).^2./(2*S.^2))./(sqrt(2*pi).*S)
This means that the integral of gauss() from -inf to inf is 1/sqrt(2), not 1. It therefore isn't a valid PDF, and I have no idea where one would use something like this.
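The 1/sqrt(2) claim is easy to check numerically. A minimal standard-library Python sketch: gauss and normpdf below are transcriptions of the two quoted formulas (not the original third-party or toolbox code), integrated with a simple midpoint rule; s and m are arbitrary example values. It also checks that gauss(x,s,m) equals normpdf(x,m,s/sqrt(2))/sqrt(2), i.e. a normal PDF with standard deviation s/sqrt(2) scaled down by sqrt(2).

```python
import math


def gauss(x, s, m):
    """Transcription of the quoted gauss() formula: note s**2, not 2*s**2."""
    return math.exp(-((x - m) ** 2) / s ** 2) / (math.sqrt(2 * math.pi) * s)


def normpdf(x, mu, sigma):
    """Normal probability density, as in the quoted normpdf() formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (k + 0.5) * dx) for k in range(steps)) * dx


s, m = 2.0, 1.0  # arbitrary example parameters
area_gauss = integrate(lambda x: gauss(x, s, m), m - 20 * s, m + 20 * s)
area_normpdf = integrate(lambda x: normpdf(x, m, s), m - 20 * s, m + 20 * s)
print(area_gauss, 1 / math.sqrt(2))  # both about 0.7071
print(area_normpdf)                  # about 1.0
```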
For completeness, we should also mention p = normcdf(x,mu,sigma), the normal cumulative distribution function. It gives the probability that a value drawn from the distribution lies between -inf and x.
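Without the toolbox, the same CDF can be written with the error function, 0.5*(1 + erf((x - mu)/(sigma*sqrt(2)))), and the probability of landing in an interval [a, b] is normcdf(b) - normcdf(a). A minimal standard-library Python sketch; normcdf here is a hypothetical re-implementation, cross-checked against statistics.NormalDist.

```python
import math
from statistics import NormalDist


def normcdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma**2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Probability that a standard normal value falls between -1.96 and 1.96
p = normcdf(1.96) - normcdf(-1.96)
print(f"{p:.4f}")  # about 0.95

# Cross-check against the standard library implementation
print(normcdf(1.3, 2.0, 0.5), NormalDist(2.0, 0.5).cdf(1.3))
```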
A few more insights to add to Leander's good answer:
When comparing functions, it is worth checking their source or toolbox. gauss is not a function written by MathWorks, so it may duplicate a function that comes with Matlab.
Also, both normpdf and normrnd are part of the Statistics and Machine Learning Toolbox, so users without that toolbox cannot use them. Generating random numbers from a normal distribution is a common task, though, so it should be possible with core Matlab alone. Hence core Matlab has randn, which overlaps with normrnd: randn draws from the standard normal distribution, and mu + sigma*randn gives the same distribution as normrnd(mu,sigma).
So I've been playing around with Julia, and I've discovered that the function to calculate the kurtosis of a probability distribution is implemented differently between Julia and MATLAB.
In Julia, do:
using Distributions
dist = Beta(3, 5)
x = rand(dist, 10000)
kurtosis(x) # gives approximately -0.42
In MATLAB do:
x = betarnd(3, 5, [1, 10000]);
kurtosis(x) % gives approximately 2.60
What's happening here? Why is the kurtosis different between the two languages?
As explained here: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
Excess kurtosis (kurtosis - 3) is often used so that the (excess) kurtosis of a normal distribution is zero. As the Distributions.jl docs show, that is what kurtosis(x) computes in Julia.
Matlab does not use the excess measure (there is even a note in the docs that mentions this potential issue).
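Both numbers are consistent with the same distribution. As a check, a minimal standard-library Python sketch computing the theoretical excess kurtosis of Beta(a, b) from the standard closed form 6[(a - b)^2 (a + b + 1) - ab(a + b + 2)] / [ab(a + b + 2)(a + b + 3)], then adding 3 to get the plain kurtosis that MATLAB reports. The function names are hypothetical.

```python
def beta_excess_kurtosis(a, b):
    """Excess kurtosis of Beta(a, b): the Julia convention."""
    num = 6.0 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den


def beta_kurtosis(a, b):
    """Plain kurtosis (excess + 3): the MATLAB kurtosis() convention."""
    return beta_excess_kurtosis(a, b) + 3.0


print(f"{beta_excess_kurtosis(3, 5):.4f}")  # -0.4145
print(f"{beta_kurtosis(3, 5):.4f}")         # 2.5855
```

These agree with the sampled values in the question (about -0.42 and 2.60) up to sampling error, and differ from each other by exactly 3.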
I guess this is an easy question, but I have been struggling with it. Is it possible to create a normal distribution in Matlab purely from the mean and standard deviation? I don't know what the x values are, so I am unable to use the normpdf() function.
Thanks
The randn function can do that for you.
The documentation gives this example:
Generate values from a normal distribution with mean 1 and standard deviation 2:
r = 1 + 2.*randn(100,1);
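The same shift-and-scale idea, sketched in standard-library Python (mu = 1 and sigma = 2 come from the documentation example). Since the question also mentions normpdf: the x values are not something you need to know in advance. You choose a grid, typically a few standard deviations either side of the mean, and evaluate the density on it; the mu +/- 4*sigma grid below is an arbitrary choice.

```python
import math
import random
import statistics

mu, sigma = 1.0, 2.0
rng = random.Random(1)  # fixed seed for reproducibility

# Equivalent of r = 1 + 2.*randn(100,1): shift and scale standard normal draws
r = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(100)]

# For plotting the PDF, pick the x values yourself: here mu +/- 4*sigma
xs = [mu - 4 * sigma + k * (8 * sigma) / 200 for k in range(201)]
ys = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
      for x in xs]

print(statistics.mean(r), statistics.stdev(r))  # roughly 1 and 2
print(max(ys))  # peak height 1/(sigma*sqrt(2*pi)), reached at x = mu
```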
If I estimate the entropy of a vector of standard normal random variables using the Matlab entropy() function, I get an answer somewhere around 4, whereas the actual (differential) entropy should be 0.5 * log(2*pi*e*sigma^2), which for sigma = 1 is approximately 1.4 nats.
Does anyone know where the discrepancy is coming from?
Note: to save time, here is the Matlab code:
X = randn(1, 1000);   % 1000 standard normal samples
disp('The entropy of X is')
entropy(X)
Please read the help (help entropy) or documentation for entropy. You'll see that it's designed for images and uses a histogram technique rather than calculating it analytically. You'll need to create your own function if you want the formula from Wikipedia, but as the formula is so simple, that should be no problem.
I believe the reason you're getting such divergent answers is that entropy scales the bins of the histogram by the number of elements. If you want to use such an estimation technique, you'll want to use hist and scale the bins by area. See this StackOverflow question.
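A minimal standard-library Python sketch of both suggestions: the closed-form differential entropy of a normal distribution, 0.5*ln(2*pi*e*sigma^2) in nats, and a histogram estimate that accounts for bin width, -sum(p_i*ln(p_i)) + ln(width), where p_i is the fraction of samples in bin i. The bin count of 100 is an arbitrary choice. Without the ln(width) term the result is a discrete entropy that changes with the number of bins, which is one way a histogram-based function can disagree with the formula.

```python
import math
import random


def normal_entropy(sigma):
    """Differential entropy of N(mu, sigma**2) in nats."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def histogram_entropy(xs, bins=100):
    """Histogram estimate of differential entropy in nats.

    Uses -sum(p_i * ln p_i) + ln(width), i.e. the bin probabilities
    are converted to densities by dividing by the bin width.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        i = min(int((x - lo) / width), bins - 1)  # put max(xs) in last bin
        counts[i] += 1
    n = len(xs)
    h = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return h + math.log(width)


rng = random.Random(0)
X = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

print(f"analytic:  {normal_entropy(1.0):.4f} nats")   # 1.4189
print(f"histogram: {histogram_entropy(X):.4f} nats")  # close to the analytic value
```

Dividing by math.log(2) converts nats to bits (about 2.047 bits for sigma = 1).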