Draw random numbers from the Gumbel distribution in Matlab - matlab

Question: I would like your help to draw random numbers from the Gumbel distribution with location mu and scale beta in Matlab.
I want to use the definition of the Gumbel distribution provided by Wikipedia (see the PDF and CDF definitions on the right of the page).
Notice: The function evrnd in Matlab, described here, cannot be used directly (or maybe it can be used with some modifications?) because it uses flipped signs.
Let me explain this last point better.
Let's fix the location to 0 and the scale to 1.
Now, following Wikipedia and other textbooks (for example, here p.42) the Gumbel PDF is
exp(-x)*exp(-exp(-x))
In Matlab though it seems that evrnd considers random draws from the following PDF:
exp(x)*exp(-exp(x))
You can see that in Matlab -x is replaced with x.
Any idea on what is the best way to proceed?

According to Wikipedia, the inverse cumulative distribution function (quantile function) is
Q(p) = mu - beta*log(-log(p))
With this function, the inverse transform method can be applied: evaluate Q at uniform random numbers on (0,1). Thus
sz = [1 1e6]; % desired size for result array
mu = 1; % location parameter
beta = 2.5; % scale parameter
result = mu - beta*log(-log(rand(sz)));
gives result, an array of i.i.d. Gumbel-distributed numbers. The histogram for these example values can be plotted with
>> histogram(result, 51)
If you want to use the evrnd function (Statistics Toolbox), you need to change the sign of the output and of the location parameter you pass in. According to the documentation,
R = evrnd(mu,sigma,[m,n,...])
The version used here is suitable for modeling minima; the mirror image of this distribution can be used to model maxima by negating R.
Negating R also negates the location, so -evrnd(mu, ...) is centered at -mu. To get location mu, use
result = -evrnd(-mu, beta, sz);
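As a sanity check on the inverse transform approach, the draws can be compared with the analytic Gumbel CDF and mean given on Wikipedia. This is a minimal sketch using core Matlab only; the test point t0, the seed, and the sample size are arbitrary choices made for illustration.

```matlab
rng(0);                          % fixed seed so the check is repeatable
mu = 1;                          % location parameter
beta = 2.5;                      % scale parameter
n = 1e6;                         % number of draws (arbitrary)

% Inverse transform sampling: Q(p) = mu - beta*log(-log(p))
x = mu - beta*log(-log(rand(n, 1)));

% Analytic Gumbel CDF: F(t) = exp(-exp(-(t - mu)/beta))
gumbelCdf = @(t) exp(-exp(-(t - mu)/beta));

t0 = 3;                                  % arbitrary test point
empCdf = mean(x <= t0);                  % empirical CDF at t0
theoMean = mu + beta*0.5772156649;       % mu + beta * Euler-Mascheroni constant

fprintf('CDF at %g: empirical %.4f, analytic %.4f\n', t0, empCdf, gumbelCdf(t0));
fprintf('Mean: empirical %.4f, theoretical %.4f\n', mean(x), theoMean);
```

If both pairs of numbers agree to a couple of decimal places, the sampler follows the maxima (Wikipedia) convention rather than the minima convention.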

Related

Inverse cumulative distribution function in MATLAB given an empirical PDF

Is it possible to generate random numbers with a distribution that depends on empirical probability data? I imagine this is possible by taking the inverse cumulative distribution function. I have seen some examples where this is done in MATLAB (the software that I'm using) but all of those examples have an underlying analytic form for the probability. Here I have only the PDF. For instance, I have data of probabilities for a particular event. Most of the probabilities are zero and hence not unique, but not all.
My goal is to generate the random numbers and then figure out what the distribution is. I'd really appreciate if people can help clear up my thinking here.
EDIT:
I think I want something like:
cdf=cumsum(pdf); % calculate cdf from empirical pdf
M=length(cdf);
xq=linspace(0,1,M);
invcdf=interp1(cdf,xq,xq); % calculate inverse cdf, i.e., x
but how do I take into account that a lot of the values of the pdf are zero and not unique? Is this even the right approach?
I am basing my answer on Inverse empirical cumulative distribution function from the MathWorks File Exchange. See that link for other suggestions to solving your problem.
% y - input: data set
% q - input: desired quantile (can be a scalar or a vector)
% xq - output: ICDF at specified quantile
[f, x] = ecdf(y);
xq = zeros(size(q));
for ii = 1:length(q)
    xq(ii) = min(x(q(ii) <= f));
end
I'd eliminate the for loop if you're only using scalars. Also, there may be a more efficient way to vectorize the for loop, but this should at least get you started.
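For the case in the question, where only a table of probabilities is available and many entries are zero, one possible loop-free lookup is sketched below. The support points vals and probabilities p are made-up illustration data, and the comparison u > cdfv relies on implicit expansion (R2016b or later; bsxfun would be needed on older releases). Entries with zero probability add no width to the CDF, so they are never selected.

```matlab
vals = [10 20 30 40 50];        % hypothetical support points
p    = [0 0.2 0 0.5 0.3];       % hypothetical empirical probabilities (with zeros)

cdfv = cumsum(p) / sum(p);      % normalized cumulative probabilities
cdfv(end) = 1;                  % guard against round-off in the last entry

rng(1);                         % fixed seed for repeatability
u = rand(1e5, 1);               % uniform draws on (0,1)

% For each draw, count the CDF entries strictly below it; adding 1 gives
% the first index whose cumulative probability reaches the draw.
idx = sum(u > cdfv, 2) + 1;
samples = vals(idx);

% Compare observed frequencies with the input probabilities
freq = accumarray(idx, 1, [numel(vals) 1])' / numel(u);
disp([p; freq]);
```

The same index computation applies to the (f, x) pair returned by ecdf, since it also finds the smallest x whose cumulative value is at least q.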

How to obtain an equation for a line fitted to data

I am trying to obtain an equation for a function fitted to some histogram data. I was thinking of doing this by fitting a rational function, as the data doesn't resemble any distribution I recognise.
The data is experimental, and I want to be able to generate a random number according to its distribution. Hence I am hoping to be able to fit it to some sort of PDF from which I can obtain a CDF, which can be rearranged to a function into which a uniformly distributed random number between 0 and 1 can be substituted in order to obtain the desired result.
I have attempted to use the histfit function, which has worked but I couldn't figure out how to obtain an equation for the curve it fitted. Is there something stupid I have missed?
Update: I have discovered the function rationalfit, however I am struggling to figure out what the inputs need to be.
Further Update: Upon exploring the histfit command further I have discovered the option to fit a kernel, the figure for which looks promising. However, I am only able to obtain a set of x and y values for the curve, not its equation as I wanted.
From the documentation on histfit:
Algorithms
histfit uses fitdist to fit a distribution to data. Use fitdist
to obtain parameters used in fitting.
So the answer to your question is to use fitdist to get the parameters you're after. Here's the example from the documentation:
rng default; % For reproducibility
r = normrnd(10,1,100,1);
histfit(r)
pd = fitdist(r,'Normal')
pd =
NormalDistribution
Normal distribution
mu = 10.1231 [9.89244, 10.3537]
sigma = 1.1624 [1.02059, 1.35033]
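Since the stated goal is to generate random numbers from the fitted distribution, the object returned by fitdist can be used directly. The sketch below assumes the Statistics and Machine Learning Toolbox and uses the documentation's normal sample as a stand-in for the experimental data; the 'Kernel' fit has no closed-form equation, but it can still be evaluated and sampled.

```matlab
rng default;                            % for reproducibility
r = normrnd(10, 1, 100, 1);             % stand-in for the experimental data

pdNormal = fitdist(r, 'Normal');        % parametric fit: the normal PDF with fitted mu, sigma
pdKernel = fitdist(r, 'Kernel');        % nonparametric kernel density fit

% Draw new values directly from the fitted objects
xKernel = random(pdKernel, 1000, 1);

% Or go through the inverse CDF explicitly with uniform random numbers
xNormal = icdf(pdNormal, rand(1000, 1));

% Evaluate both fitted densities on a grid for plotting
xGrid = linspace(min(r), max(r), 200);
plot(xGrid, pdf(pdNormal, xGrid), xGrid, pdf(pdKernel, xGrid));
legend('Normal fit', 'Kernel fit');
```

For the parametric fit, the equation is the normal density with pdNormal.mu and pdNormal.sigma substituted in.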

What are the differences between different gaussian functions in Matlab?

y = gauss(x,s,m)
Y = normpdf(X,mu,sigma)
R = normrnd(mu,sigma)
What are the basic differences between these three functions?
Y = normpdf(X,mu,sigma) is the probability density function for a normal distribution with mean mu and stdev sigma. Use this if you want to know the relative likelihood at a point X.
R = normrnd(mu,sigma) takes random samples from the same distribution as above. So use this function if you want to simulate something based on the normal distribution.
y = gauss(x,s,m) at first glance looks like the exact same function as normpdf(). But there is a slight difference: Its calculation is
Y = EXP(-(X-M).^2./S.^2)./(sqrt(2*pi).*S)
while normpdf() uses
Y = EXP(-(X-M).^2./(2*S.^2))./(sqrt(2*pi).*S)
This means that the integral of gauss() from -inf to inf is 1/sqrt(2). Therefore it isn't a legit PDF and I have no clue where one could use something like this.
For completeness we also have to mention p = normcdf(x,mu,sigma). This is the normal cumulative distribution function. It gives the probability that a value is between -inf and x.
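The claim about the integral can be checked numerically with core Matlab. In this sketch the two formulas quoted above are written as anonymous functions; the values of m and s are arbitrary.

```matlab
m = 0;      % mean (arbitrary)
s = 2;      % width parameter (arbitrary)

% Formula quoted above for gauss(): no factor 2 in the exponent's denominator
gaussFn = @(x) exp(-(x - m).^2 ./ s.^2) ./ (sqrt(2*pi) .* s);

% Formula quoted above for normpdf()
normFn = @(x) exp(-(x - m).^2 ./ (2*s.^2)) ./ (sqrt(2*pi) .* s);

Igauss = integral(gaussFn, -Inf, Inf);
Inorm  = integral(normFn,  -Inf, Inf);

fprintf('gauss formula integrates to %.6f (1/sqrt(2) = %.6f)\n', Igauss, 1/sqrt(2));
fprintf('normal PDF integrates to    %.6f\n', Inorm);
```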
A few more insights to add to Leander's good answer:
When comparing between functions it is good to look at their source or toolbox. gauss is not a function written by Mathworks, so it may be redundant to a function that comes with Matlab.
Also, both normpdf and normrnd are part of the Statistics and Machine Learning Toolbox, so users without it cannot use them. However, generating random numbers from a normal distribution is quite a common task, so it should be accessible to users who have only core Matlab. Hence core Matlab offers randn, which draws standard normal samples and can be scaled and shifted to stand in for normrnd.
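A minimal sketch of the randn route, using core Matlab only: a standard normal draw is scaled by sigma and shifted by mu. The parameter values, seed, and size are arbitrary.

```matlab
rng(0);                      % fixed seed for repeatability
mu = 10;                     % desired mean (arbitrary)
sigma = 2;                   % desired standard deviation (arbitrary)
sz = [1e5 1];                % desired output size

% randn draws from N(0,1); scale and shift to get N(mu, sigma^2)
R = mu + sigma*randn(sz);

fprintf('sample mean %.3f, sample std %.3f\n', mean(R), std(R));
```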

The result doesn't match what I expect when I use the log-normal PDF in Matlab

I'm studying a paper.
The paper presents a figure that shows the CDF of building heights,
and the paper also gives these details about the figure:
Building height statistics: The present model uses the statistics of
building heights in typical built-up areas as input data. A suitable
form was sought by comparing with geographical data for the city of
Guildford, United Kingdom. The probability density function that was
selected to fit the data was the log-normal distribution with unknown
parameters: mean value p and standard deviation t. As can be noted
from Fig. 3, it was found to be a good fit to the geographical data
values with parameters p = 7.3m, t= 0.26.
It says the mean value is 7.3 and the standard deviation is 0.26, right?
However, when I try them in Matlab with the following code
x=0:0.01:20;
meanValue = 7.3;
standardDeviation = 0.26;
y1 = logncdf(x,meanValue,standardDeviation);
plot(x,y1);
the result is different from Figure 3.
I re-read the paper to make sure the parameters are correct,
and checked the Matlab documentation on how to use this function.
Everything seems all right except the simulation result.
Please help me fix it! Thanks.
As mentioned in the comments, the parameters mu and sigma are the mean and standard deviation of the associated normal distribution, not of the log-normal distribution. The details, especially the connection between the two, are explained in the Wikipedia article.
To calculate mu and sigma from the mean and variance, use the formulas given in the Wikipedia article, or here in Matlab syntax:
m=7.3;
t=0.26;
v=t.^2;
%A lognormal distribution with mean m and variance v has parameters
mu = log((m^2)/sqrt(v+m^2));
sigma = sqrt(log(v/(m^2)+1));
%finally your code:
x=0:0.01:20;
y1 = logncdf(x,mu,sigma);
plot(x,y1);
This is much closer to the graph in your question, but the graph in your question seems to be the CDF for a much higher standard deviation. Visually guessing the parameters from your plot, I would say it's roughly t=5.
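The conversion can be verified by mapping mu and sigma back to the mean and standard deviation of the log-normal distribution with the standard moment formulas. The sketch below uses core Matlab only; cdfManual is a hand-written log-normal CDF based on erfc, included as an assumption-free stand-in for logncdf when the toolbox is not available.

```matlab
m = 7.3;                 % mean of the log-normal distribution
t = 0.26;                % standard deviation of the log-normal distribution
v = t^2;                 % variance

% Parameters of the associated normal distribution
mu = log(m^2 / sqrt(v + m^2));
sigma = sqrt(log(v/m^2 + 1));

% Map back: mean = exp(mu + sigma^2/2), var = (exp(sigma^2) - 1)*exp(2*mu + sigma^2)
meanBack = exp(mu + sigma^2/2);
sdBack = sqrt((exp(sigma^2) - 1) * exp(2*mu + sigma^2));
fprintf('recovered mean %.4f, recovered sd %.4f\n', meanBack, sdBack);

% Log-normal CDF written with erfc (x > 0 only)
x = 0.01:0.01:20;
cdfManual = 0.5*erfc(-(log(x) - mu) / (sigma*sqrt(2)));
plot(x, cdfManual);
```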

How to generate random numbers from cut-off log-normal distribution in Matlab?

The radius r is drawn from a cut-off log-normal distribution, which has the following probability density function:
pdf=((sqrt(2).*exp(-0.5*((log(r/rch)).^2)))./((sqrt(pi.*(sigma_nd.^2))...
.*r).*(erf((log(rmax/rch))./sqrt(2.*(sigma_nd.^2)))-erf((log(rmin/rch))./sqrt(2.*(sigma_nd.^2))))));
rch, sigma_nd, rmax, and rmin are all constants.
I found the explanation on the net, but it seems difficult to find its integral and then invert it in Matlab.
I checked, but my first instinct is that it looks like log(r/rch) is a truncated normal distribution with limits of log(rmin/rch) and log(rmax/rch). So you can generate the appropriate truncated normal random variable, say y, then r = rch * exp(y).
You can generate truncated normal random variables by generating the untruncated values and redrawing those that fall outside the limits. Alternatively, you can do it using the CDF, as described by @PengOne, which you can find on the Wikipedia page.
I'm (still) not sure that your p.d.f. is completely correct, but what's most important here is the distribution.
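The truncated normal route can be sketched with core Matlab only, using erfc and erfcinv for the normal CDF and its inverse. This assumes that log(r/rch) is normal with mean 0 and standard deviation sigma_nd, truncated to [log(rmin/rch), log(rmax/rch)], which is an interpretation of the posted PDF rather than something confirmed by it. The constant values are hypothetical.

```matlab
% Hypothetical constants, chosen only for illustration
rch = 1.0;  sigma_nd = 0.5;  rmin = 0.2;  rmax = 5.0;
n = 1e5;                                   % number of samples
rng(3);                                    % fixed seed for repeatability

Phi    = @(z) 0.5*erfc(-z/sqrt(2));        % standard normal CDF
PhiInv = @(p) -sqrt(2)*erfcinv(2*p);       % standard normal inverse CDF

a = log(rmin/rch)/sigma_nd;                % standardized lower limit
b = log(rmax/rch)/sigma_nd;                % standardized upper limit

% Uniform draws restricted to [Phi(a), Phi(b)], then inverted
u = Phi(a) + (Phi(b) - Phi(a))*rand(n, 1);
y = sigma_nd*PhiInv(u);                    % truncated normal with sd sigma_nd
r = rch*exp(y);                            % back to the radius scale

fprintf('min %.4f, max %.4f\n', min(r), max(r));
```

Unlike redrawing out-of-range values, this needs exactly one uniform number per sample, at the cost of some precision when the limits lie far in the tails.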
If your PDF is continuous, then you can integrate to get a CDF, then find the inverse of the CDF and evaluate that at the random value.
If your PDF is not continuous, then you can get a discrete CDF using cumsum, and use that as your initial X value in interp1(), with the initial Y value being the points the PDF was sampled at, and ask it to interpolate at your array of rand() numbers.
It's not clear what your variable is, but I'm assuming it's r.
The simplest way to do this is, as Chris noted, to first get the CDF (note that if r starts at 0, pdf(1) is NaN; change it to 0):
cdf = cumtrapz(pdf);
cdf = cdf / cdf(end);
then draw uniform random numbers (size_dist is the number of elements):
y = rand (size_dist,1);
followed by a method to map the uniform draws onto the CDF. Any technique will work, but here is the simplest (albeit inelegant) one:
x = zeros(size_dist,1);
for i = 1:size_dist
    x(i) = find( y(i)<= cdf,1);
end
and finally, map the indices back to the original variable using Matlab numeric indexing. Note: index into r, not pdf:
pdfHist = r(x);
hist (pdfHist,50)
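The find loop can be replaced by interpolating the inverse CDF with interp1, which also returns values between grid points. This is a sketch under assumptions: the constants are hypothetical, the density is a cut-off log-normal with sigma_nd in the exponent (normalization is irrelevant here because the CDF is rescaled), and cumtrapz is given r so that a non-uniform grid would also be handled.

```matlab
% Hypothetical constants, chosen only for illustration
rch = 1.0;  sigma_nd = 0.5;  rmin = 0.2;  rmax = 5.0;

r = linspace(rmin, rmax, 2000);                        % grid over the allowed radii
pdfVals = exp(-0.5*(log(r/rch)/sigma_nd).^2) ./ r;     % unnormalized density on the grid

cdfVals = cumtrapz(r, pdfVals);                        % numerical CDF
cdfVals = cdfVals / cdfVals(end);                      % rescale so it ends at 1

% interp1 needs distinct sample points, so drop repeated CDF values
[cdfU, iu] = unique(cdfVals);

rng(4);                                                % fixed seed for repeatability
u = rand(1e5, 1);                                      % uniform draws on (0,1)
samples = interp1(cdfU, r(iu), u);                     % inverse CDF by interpolation

histogram(samples, 50)
```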
This is probably overkill for your distribution, but you can always write a Metropolis sampler.
On the other hand, the implementation is straightforward, so you'd have your sampler very quickly.
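A random-walk Metropolis sampler for this kind of target could look like the sketch below. The constants, step size, chain length, and the log-density (a cut-off log-normal with sigma_nd in the exponent) are all assumptions made for illustration. Proposals outside [rmin, rmax] are rejected, and successive samples are correlated, so thinning or a burn-in period may be wanted in practice.

```matlab
% Hypothetical constants, chosen only for illustration
rch = 1.0;  sigma_nd = 0.5;  rmin = 0.2;  rmax = 5.0;

% Log of the unnormalized target density, valid for rmin <= r <= rmax
logTarget = @(r) -0.5*(log(r/rch)/sigma_nd).^2 - log(r);

rng(2);                        % fixed seed for repeatability
nSamples = 2e4;                % chain length (arbitrary)
step = 0.5;                    % proposal standard deviation (tuning parameter)

chain = zeros(nSamples, 1);
rCur = rch;                    % start inside the allowed range
nAccept = 0;

for k = 1:nSamples
    rProp = rCur + step*randn;                 % symmetric random-walk proposal
    % Accept only in-range proposals, with probability min(1, target ratio)
    if rProp >= rmin && rProp <= rmax && ...
            log(rand) < logTarget(rProp) - logTarget(rCur)
        rCur = rProp;
        nAccept = nAccept + 1;
    end
    chain(k) = rCur;                           % record the current state
end

fprintf('acceptance rate %.2f\n', nAccept/nSamples);
histogram(chain, 50)
```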