So I've been playing around with Julia, and I've discovered that the function to calculate the kurtosis of a probability distribution is implemented differently between Julia and MATLAB.
In Julia, do:
using Distributions
dist = Beta(3, 5)
x = rand(dist, 10000)
kurtosis(x) # gives approximately -0.42
In MATLAB do:
x = betarnd(3, 5, [1, 10000]);
kurtosis(x) % gives approximately 2.60
What's happening here? Why is the kurtosis different between the two languages?
As explained here: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
We often use excess kurtosis (kurtosis - 3) so that the excess kurtosis of a normal distribution is zero. As shown in the Distributions.jl docs, that is what kurtosis(x) returns in Julia.
MATLAB does not use the excess measure (there is even a note in the docs that mentions this potential issue).
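If it helps to see the two conventions side by side without any toolbox, here is a minimal sketch in core MATLAB. It assumes the Beta(3, 5) sample can be drawn as the 3rd smallest of 7 uniform draws (a standard order-statistic fact for integer parameters), and the variable names are mine, not from either library.

```matlab
% Sketch: raw vs. excess kurtosis of a Beta(3,5) sample, core MATLAB only.
rng(0);                          % fixed seed so the run is repeatable
n = 100000;
u = sort(rand(7, n), 1);         % sort each column of 7 uniforms
x = u(3, :);                     % 3rd smallest of 7 uniforms ~ Beta(3, 5)

d = x - mean(x);
rawKurt    = mean(d.^4) / mean(d.^2)^2;   % MATLAB's convention (normal -> 3)
excessKurt = rawKurt - 3;                 % Julia's convention  (normal -> 0)

fprintf('raw = %.3f, excess = %.3f\n', rawKurt, excessKurt);
```

The theoretical values for Beta(3, 5) are about 2.585 and -0.415, so to compare the two languages you subtract 3 from MATLAB's result or add 3 to Julia's.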
y = gauss(x,s,m)
Y = normpdf(X,mu,sigma)
R = normrnd(mu,sigma)
What are the basic differences between these three functions?
Y = normpdf(X,mu,sigma) is the probability density function for a normal distribution with mean mu and stdev sigma. Use this if you want to know the relative likelihood at a point X.
R = normrnd(mu,sigma) takes random samples from the same distribution as above. So use this function if you want to simulate something based on the normal distribution.
y = gauss(x,s,m) at first glance looks like exactly the same function as normpdf(). But there is a slight difference: its calculation is
Y = exp(-(X-M).^2./S.^2)./(sqrt(2*pi).*S)
while normpdf() uses
Y = exp(-(X-M).^2./(2*S.^2))./(sqrt(2*pi).*S)
This means that the integral of gauss() from -inf to inf is 1/sqrt(2). Therefore it isn't a legit PDF and I have no clue where one could use something like this.
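For what it's worth, the 1/sqrt(2) figure follows from the standard Gaussian integral (the integral of exp(-(x-m)^2/a^2) over the real line is a*sqrt(pi)), writing m and s for the M and S in the formulas above:

```latex
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,s}\, e^{-(x-m)^2/s^2}\,dx
  = \frac{1}{\sqrt{2\pi}\,s}\cdot s\sqrt{\pi}
  = \frac{1}{\sqrt{2}},
\qquad
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,s}\, e^{-(x-m)^2/(2s^2)}\,dx
  = \frac{1}{\sqrt{2\pi}\,s}\cdot s\sqrt{2\pi}
  = 1 .
```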
For completeness we also have to mention p = normcdf(x,mu,sigma). This is the normal cumulative distribution function. It gives the probability that a value is between -inf and x.
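To make the relation between the density and the cumulative function concrete, here is a small sketch that uses only core MATLAB. The names myNormPdf and myNormCdf are hypothetical helpers of mine, not toolbox functions; the cdf is written with erfc, which is the usual closed form.

```matlab
% Sketch: normal pdf and cdf written with core MATLAB functions only.
% myNormPdf and myNormCdf are hypothetical names, not toolbox functions.
myNormPdf = @(x, mu, sigma) exp(-(x - mu).^2 ./ (2*sigma.^2)) ./ (sqrt(2*pi).*sigma);
myNormCdf = @(x, mu, sigma) 0.5 * erfc(-(x - mu) ./ (sigma*sqrt(2)));

mu = 1; sigma = 2; x0 = 2.5;

% The cdf at x0 is the area under the pdf from -Inf to x0.
areaToX0 = integral(@(t) myNormPdf(t, mu, sigma), -Inf, x0);
cdfAtX0  = myNormCdf(x0, mu, sigma);

fprintf('integral of pdf = %.6f, cdf = %.6f\n', areaToX0, cdfAtX0);
```

The two printed numbers should agree to within the integration tolerance.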
A few more insights to add to Leander's good answer:
When comparing functions, it is good to look at their source or toolbox. gauss is not a function written by MathWorks, so it may duplicate a function that comes with MATLAB.
Also, both normpdf and normrnd are part of the Statistics and Machine Learning Toolbox, so users without it cannot use them. However, generating random numbers from a normal distribution is quite a common task, so it should be accessible to users who have only core MATLAB. Hence randn, which is part of core MATLAB, overlaps with normrnd.
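As a minimal sketch of how randn can stand in for normrnd(mu,sigma), you shift and scale standard normal draws. The values of mu and sigma and the sample size here are arbitrary choices for illustration.

```matlab
% Sketch: normal samples with mean mu and standard deviation sigma,
% using only randn from core MATLAB.
rng(1);                               % fixed seed for repeatability
mu = 5; sigma = 2;
R = mu + sigma .* randn(1, 100000);   % shift and scale standard normal draws

fprintf('sample mean = %.3f, sample std = %.3f\n', mean(R), std(R));
```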
Has anyone tried plotting a sine function for large values in MATLAB?
For example:
x = 0:1000:100000;
plot(x,sin(2*pi*x))
I was just wondering why the amplitude is changing for this periodic function? As per what I expect, for any value of x, the function has a period of 2*pi. Why is it not?
Does anyone know? Is there a way to get it right? Also, is this a bug and is it already known?
That's actually not the amplitude changing. That is due to the numerical imprecisions of floating point arithmetic. Bear in mind that you are specifying an integer sequence from 0 to 100000 in steps of 1000. If you recall from trigonometry, sin(n*x*pi) = 0 when x and n are integers, and so theoretically you should be obtaining an output of all zeroes. In your case, n = 2, and x is a number from 0 to 100000 that is a multiple of 1000.
However, this is what I get when I use the above code in your post:
Take a look at the scale of that graph. It's 10^{-11}. Do you know how small that is? As further evidence, here's what the max and min values are of that sequence:
>> min(sin(2*pi*x))
ans =
-7.8397e-11
>> max(sin(2*pi*x))
ans =
2.9190e-11
The values are so small that they might as well be zero. What you are seeing in the graph is numerical imprecision. As I mentioned before, sin(n*x*pi) = 0 when n and x are integers, under the assumption that we have all of the decimal places of pi available. However, because we only have 64 bits in total to represent pi numerically, you will certainly not get a result of exactly zero. In addition, be advised that the sin function is very likely to use numerical approximation algorithms (Taylor / Maclaurin series, for example), so that could also contribute to the result not being exactly 0.
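To put a rough number on this: the double closest to pi is slightly below the true value, and sin(pi) in double precision returns roughly that gap (about 1.22e-16). Multiplying by 2*x amplifies the gap, and rounding the product 2*pi*x adds up to half a unit in the last place. Here is a sketch of that bound, with variable names of my own choosing; treat it as an estimate, not an exact error analysis.

```matlab
% Sketch: the size of sin(2*pi*x) for integer x is explained by rounding.
x = 0:1000:100000;
y = sin(2*pi*x);

gapInPi  = sin(pi);                 % roughly (true pi) - (double pi), ~1.22e-16
roundOff = eps(2*pi*x) / 2;         % worst-case rounding of the product 2*pi*x
bound    = 2*x*gapInPi + roundOff;  % rough bound on |sin(2*pi*x)|

fprintf('max |y| = %.3g, max bound = %.3g\n', max(abs(y)), max(bound));
```

Both numbers should come out around 1e-10 or smaller, which is the scale seen in the plot.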
There are, of course, workarounds, such as using the Symbolic Math Toolbox (see @yoh.lej's answer). In this case, you will get zero, but I won't focus on that here. Your post is questioning the accuracy of the sin function in MATLAB when it works on numeric inputs. Theoretically, since your input into sin is an integer sequence, every value of x should make sin(n*x*pi) = 0.
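If you want exact zeros while staying with ordinary doubles, two numeric workarounds come to mind. This is only a sketch: the first relies on sin(2*pi*x) having period 1 in x, and the second assumes your release ships sinpi (I believe R2018b or newer), so it is guarded with a fallback.

```matlab
% Sketch: two ways to get exact zeros for integer x without symbolic math.
x = 0:1000:100000;

% 1) sin(2*pi*x) has period 1 in x, so drop the integer part before scaling.
yMod = sin(2*pi*mod(x, 1));

% 2) sinpi(t) computes sin(t*pi) without forming t*pi in floating point.
%    Assumption: your release has sinpi (I believe R2018b or newer).
if exist('sinpi') ~= 0
    yPi = sinpi(2*x);
else
    yPi = yMod;   % fall back on older releases
end

fprintf('max |yMod| = %g, max |yPi| = %g\n', max(abs(yMod)), max(abs(yPi)));
```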
BTW, this article is good reading. This is what every programmer needs to know about floating point arithmetic and functions: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html. A simpler overview can be found here: http://floating-point-gui.de/
Because what is the exact value of pi?
This apparent error is due to the limit of floating point accuracy. If you really need/want to get around that you can do symbolic computation with matlab, see the difference between:
>> sin(2*pi*10)
ans =
-2.4493e-15
and
>> sin(sym(2*pi*10))
ans =
0
If I estimate the entropy of a vector of standard normal random variables using the Matlab entropy() function, I get an answer somewhere in the region of 4, whereas the actual entropy should be 0.5 * log(2*pi*e*sigma^2) which is approximately equal to 1.4.
Does anyone know where the discrepancy is coming from?
Note: To save time here is the Matlab code
X = randn(1, 1000); % 1000 standard normal samples
disp('The entropy of X is')
disp(entropy(X))
Please read the help (help entropy) or documentation for entropy. You'll see that it's designed for images and uses a histogram technique rather than calculating the entropy analytically. You'll need to create your own function if you want the formula from Wikipedia, but as the formula is so simple, that should be no problem.
I believe the reason you're getting such divergent answers is that entropy scales the bins of the histogram by the number of elements. If you want to use such an estimation technique, you'll want to use hist and scale the bins by area. See this StackOverflow question.
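Here is a minimal sketch of a histogram estimate with the bins scaled by area. It uses histcounts with 'Normalization','pdf' in place of hist (an assumption that your release has histcounts, R2014b or newer), and the bin count of 50 and the sample size are arbitrary choices of mine.

```matlab
% Sketch: histogram estimate of differential entropy, bins scaled by area.
rng(2);                               % fixed seed for repeatability
X = randn(1, 100000);                 % more samples than the question, to cut noise

[p, edges] = histcounts(X, 50, 'Normalization', 'pdf');   % p is a density estimate
w = diff(edges);                      % bin widths
keep = p > 0;                         % skip empty bins, since 0*log(0) -> 0

hEst   = -sum(p(keep) .* log(p(keep)) .* w(keep));   % in nats
hExact = 0.5 * log(2*pi*exp(1));                     % sigma = 1, about 1.4189

fprintf('estimated = %.3f, exact = %.3f\n', hEst, hExact);
```

The estimate depends on the bin width and sample size, so expect it to land near the analytic value rather than exactly on it.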
According to what I read from here, the kurtosis of a normal distribution should be around 3. However, when I use the kurtosis function provided by MATLAB, I could not verify it:
data1 = randn(1,20000);
v1 = kurtosis(data1)
It seems that the kurtosis of a normal distribution is around 0. I was wondering what's wrong with it. Thanks!
EDIT
I am using MATLAB 2012b.
If it did that, this would be a strong indication that it was computing excess kurtosis, which is defined to be kurtosis minus three.
However, my MATLAB doesn't actually do that:
MATLAB>> data1 = randn(1,20000);
MATLAB>> kurtosis(data1)
ans =
2.9825
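For reference, these are the two conventions in play, written for a random variable X with mean \mu and standard deviation \sigma:

```latex
\operatorname{Kurt}[X] = \frac{\mathbb{E}\!\left[(X-\mu)^4\right]}{\sigma^4},
\qquad
\operatorname{Kurt}_{\text{excess}}[X] = \operatorname{Kurt}[X] - 3 .
```

For a normal distribution the fourth central moment is 3\sigma^4, so the first quantity is 3 and the second is 0.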
I did a calculation and got the following numbers:
0.739128438976901 0.739128438976900
I want MATLAB to treat them as equal, but MATLAB considers the first one greater than the second. How can I make MATLAB treat them as equal?
Thanks
x = 42;
y = 42.00001;
tolerance = 1e-4; % largest difference you still treat as equal
if abs(x-y) < tolerance
% do something
end
The setting for tolerance is up to you.
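Applied to the two numbers from the question, a sketch with a relative tolerance might look like this. The value of relTol is an assumption on my part; pick one that reflects how much rounding error your calculation accumulates.

```matlab
% Sketch: compare the two values from the question with a relative tolerance.
a = 0.739128438976901;
b = 0.739128438976900;

relTol  = 1e-12;   % assumed tolerance, pick one that suits your calculation
isClose = abs(a - b) <= relTol * max(abs(a), abs(b));

fprintf('a == b: %d, within tolerance: %d\n', a == b, isClose);
```

Scaling the tolerance by the magnitude of the inputs keeps the test meaningful for both very large and very small values.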
I don't know a whole lot about Matlab (I'm more of a Mathematica guy myself), but it seems there is a roundn(x,n) function which rounds an element x to the nearest multiple of 10^n. Perhaps this could be used here.
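A sketch of the rounding idea in core MATLAB, assuming a release where round accepts a digits argument (R2014b or newer, as far as I know). The choice of 12 digits is mine.

```matlab
% Sketch: compare after rounding to a fixed number of decimal places.
% round(x, n) with a digits argument is core MATLAB in recent releases
% (assumption: R2014b or newer); its n counts digits after the decimal point,
% so it has the opposite sign to roundn's exponent.
a = 0.739128438976901;
b = 0.739128438976900;

nDigits = 12;      % assumed precision, adjust to your data
sameAfterRounding = round(a, nDigits) == round(b, nDigits);

fprintf('equal after rounding to %d digits: %d\n', nDigits, sameAfterRounding);
```

One caveat: two nearly equal numbers that sit on either side of a rounding boundary will still compare unequal, so a tolerance check on the difference is usually the more robust option.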