How do I compute the inverse Gaussian distribution from a given CDF? - matlab

I want to compute the parameters mu and lambda of the inverse Gaussian distribution given the CDF.
By 'given the CDF' I mean that I have the data AND the (estimated) quantiles for the data, i.e.
Quantile - Value
0.01 - 10
0.5 - 12
0.7 - 13
Now I want to find the inverse Gaussian distribution for this data so that I can, e.g., look up the quantile for value 11 based on my distribution.
How can I find out the values mu and lambda?
The only solution I can think of is using Gradient descent to find the best mu and lambda using RMSE as an error measure.
Isn't there a better solution?
Comment: MATLAB's MLE algorithm is not an option, since it does not use the quantile data.

As all you really want to do is estimate the quantiles of the distribution at unknown values, and you have a lot of data points, you can simply interpolate the values you want to look up.
quantile_estimate = interp1(values, quantiles, value_of_interest);
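For example, with the data given in the question (a minimal sketch; note that interp1 expects the sample points, here the values, to be sorted and unique):
quantiles = [0.01 0.5 0.7];
values    = [10   12  13];
quantile_estimate = interp1(values, quantiles, 11)   % approximate CDF value at 11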

According to @mpiktas here, I implemented a gradient descent algorithm for estimating my mu and lambda:
Make an initial guess using MLE.
Learn mu and lambda using gradient descent with RMSE as the error measure (a sketch of this step is given below).
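A minimal sketch of the fitting step, assuming the Statistics and Machine Learning Toolbox is available for icdf; here fminsearch stands in for a hand-written gradient descent, but it minimizes the same RMSE objective:
p = [0.01 0.5 0.7];            % given quantile levels
x = [10   12  13];             % corresponding values
% Optimize over log(mu) and log(lambda) so both parameters stay positive.
rmse  = @(t) sqrt(mean((icdf('InverseGaussian', p, exp(t(1)), exp(t(2))) - x).^2));
t0    = log([mean(x) 1]);      % rough initial guess (replace with the MLE from step 1)
t_hat = fminsearch(rmse, t0);
mu     = exp(t_hat(1))
lambda = exp(t_hat(2))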

The following article explains in detail how to compute quantiles (the inverse CDF) for the inverse Gaussian distribution:
Giner, G., and Smyth, G. K. (2016). statmod: probability calculations for the inverse Gaussian distribution. The R Journal. http://arxiv.org/abs/1603.06687
Code for the R language is contained in the R package statmod available from CRAN. For example:
> library(statmod)
> qinvgauss(0.01, lower.tail=FALSE)
[1] 4.98
computes the 0.01 upper tail quantile of the standard IG distribution.
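For MATLAB users, a counterpart (an assumption on my part, using the Statistics and Machine Learning Toolbox's 'InverseGaussian' support) is to ask icdf for the 0.99 lower-tail quantile of the standard IG distribution (mu = 1, lambda = 1), which should approximately reproduce the 4.98 quoted above:
% 0.01 upper-tail quantile = 0.99 lower-tail quantile (assumes Statistics Toolbox)
q = icdf('InverseGaussian', 0.99, 1, 1)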

Related

This line of code is supposed to generate exponential service times, but I am not able to get the logic behind it

This line of code is supposed to generate exponential service times, but I am not able to get the logic behind it.
% Exponential service time with rate 1
mean = 1;
dt = -mean * log(1 - rand());
This is the source link, but MATLAB is needed to open the example.
I was also wondering whether exprnd(1) gives the same result, i.e. generates numbers from the exponential distribution with mean 1?
You are right!
First, note that MATLAB parameterizes the Exponential distribution by the mean, not the rate, so exprnd(5) would have a rate lambda = 1/5.
This line of code is another way to do the same thing:
-mean * log(1 - rand());
This is the inverse transform for the Exponential distribution.
If X follows an Exponential distribution with rate lambda, then its CDF is
F(x) = 1 - exp(-lambda*x), x >= 0.
Rewriting the CDF with U = F(X), where U ~ Uniform(0,1), and solving for X gives the inverse transform:
X = -(1/lambda)*log(1 - U) = -(1/lambda)*log(U).
Note the last equality is because 1-U and U are equal in distribution. In other words, 1-U ~ Uniform(0,1) and U ~ Uniform(0,1).
You can test this yourself with the example code below, which uses multiple approaches.
% MATLAB R2018b
rate = 1;                  % mean = 1/rate = 1
NumSamples = 1000;
% Approach 1: inverse transform
X1 = (-1/rate)*log(1-rand(NumSamples,1));
% Approach 2: built-in exponential generator (parameterized by the mean)
X2 = exprnd(1/rate,NumSamples,1);
% Approach 3: probability distribution object
pd = makedist('Exponential','mu',1/rate);
X3 = random(pd,NumSamples,1);
EDIT: The OP asked if there was a reason to generate from the CDF rather than from the probability density function (PDF). This is my attempt to answer that.
The inverse transform method uses the CDF to take advantage of the fact that the CDF is itself a probability, so it must lie on the interval [0, 1]. It is very easy to generate good (pseudo) random numbers on that interval. The CDF is sufficient to uniquely define the distribution, and inverting it maps the uniformly distributed numbers on [0, 1] to a non-uniform shape in the domain that follows the probability density function (PDF).
You can see the CDF performing this nonlinear mapping in this figure.
One use of the PDF would be acceptance-rejection methods, which can be useful for some distributions, including custom PDFs (thanks to @pjs for jogging my memory).
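For completeness, here is a rough acceptance-rejection sketch for a hypothetical custom PDF; the density f(x) = 6*x*(1-x) on [0,1] and the bound c = 1.5 are purely illustrative choices, not anything from the original post:
f = @(x) 6*x.*(1-x);           % hypothetical custom PDF on [0,1]
c = 1.5;                       % upper bound on f over [0,1] (max at x = 0.5)
NumSamples = 1000;
samples = zeros(NumSamples,1);
k = 0;
while k < NumSamples
    x = rand();                % proposal drawn uniformly on [0,1]
    if rand() <= f(x)/c        % accept with probability f(x)/c
        k = k + 1;
        samples(k) = x;
    end
end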

Sample multinomial distribution in Matlab without using mnrnd

I know, for a random variable x, the value of P(x=i) for each i=1,2,...,100. How can I sample x from this multinomial distribution, based on the given P(x=i), in MATLAB?
I am allowed to use the Matlab built-in commands rand and randi, but not mnrnd.
In general, you can sample numbers from any one-dimensional probability distribution X using a uniform random number generator and the inverse cumulative distribution function of X. This is known as inverse transform sampling.
random_x = xcdf_inverse(rand())
How does this apply here? If you have your vector p of probabilities defining your multinomial distribution, F = cumsum(p) gives you a vector that defines the CDF. You can then generate a uniform random number on [0,1] using temp = rand() and find the first entry of F greater than temp. This is basically using the inverse CDF of the multinomial distribution.
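A minimal sketch of that procedure, assuming p is a vector of 100 probabilities summing to 1 (the uniform p used here is just placeholder data):
p = ones(1,100)/100;           % placeholder probabilities P(x=i)
F = cumsum(p);                 % CDF of the discrete distribution
F(end) = 1;                    % guard against floating-point round-off
NumSamples = 10000;
samples = zeros(NumSamples,1);
for k = 1:NumSamples
    samples(k) = find(F >= rand(), 1, 'first');  % first index where the CDF reaches U
end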
Be aware though that for some distributions (e.g. the gamma distribution), this turns out to be an inefficient way to generate random draws, because evaluating the inverse CDF is slow (if the CDF cannot be expressed analytically, slower numerical methods must be used).

how to calculate the spectral density of a matrix of data using matlab

I am not doing signal processing, but in my area I need to use the spectral density of a matrix of data, and I get quite confused at a very detailed level.
% matrix H is given
corr = xcorr2(H);   % get the correlation
spec = fft2(corr);  % Wiener-Khinchin theorem
In MATLAB, xcorr2 will calculate the correlation function of this matrix. The lag will range from -N+1 to N-1, so if the size of matrix H is N by N, then the size of corr will be (2N-1) by (2N-1). For discretized data, should I use corr or half of corr?
Another problem is that I think the Wiener-Khinchin theorem is basically for continuous functions. I have always thought that the discretized FT is an approximation to the continuous FT, or you could say it is a tool to calculate the continuous FT. If you use MATLAB's built-in function fft, you should divide the final result by \delta x.
Is there any kind soul who knows this area well and could share some MATLAB code with me?
Basically, approximating a continuous FT by a discretized FT is the same as approximating an integral by a finite sum.
We will first discuss the 1D case, then we'll discuss the 2D case.
Let's look at the Wiener-Khinchin theorem (see, for example, here).
It states that :
"For the discrete-time case, the power spectral density of the function with discrete values x[n], is :
where
Is the autocorrelation function of x[n]."
1) You can see already that the sum is taken from -infty to +infty in the calculation of S(f).
2) Now considering the Matlab fft - You can see (command 'edit fft' in Matlab), that it is defined as :
X(k) = sum_{n=1}^N x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
which is exactly what you want to be done in order to calculate the power spectral density for a frequency f.
Note that, for continuous functions, S(f) will be a continuous function; for a discretized function, the fft yields S(f) at discrete frequencies.
Now that we know all that, it can easily be extended to the 2D case. Indeed, the structure of fft2 matches the structure of the right-hand side of the Wiener-Khinchin theorem for the 2D case.
However, it will be necessary to divide your result by N*M, where N is the number of sample points in x and M is the number of sample points in y.
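A minimal sketch of the 2D computation described above (the random H is just placeholder data; the N*M normalization follows the remark in this answer):
H = randn(64,64);              % placeholder data matrix
[N, M] = size(H);
r = xcorr2(H);                 % 2D autocorrelation, size (2N-1)-by-(2M-1)
S = fft2(r) / (N*M);           % Wiener-Khinchin: spectral density estimate
S = abs(fftshift(S));          % center the zero frequency for display
imagesc(S); colorbar;          % visualize the spectral density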

How To Fit Multivariate Normal Distribution To Data In MATLAB?

I'm trying to fit a multivariate normal distribution to data that I collected, in order to take samples from it.
I know how to fit a (univariate) normal distribution, using the fitdist function (with the 'Normal' option).
How can I do something similar for a multivariate normal distribution?
Doesn't using fitdist on every dimension separately assume the variables are uncorrelated?
There isn't any need for a specialized fitting function; the maximum likelihood estimates of the mean vector and the covariance matrix are essentially just the sample mean and the sample covariance. I.e., compute the sample mean and sample covariance and you're done.
Estimate the mean with mean and the variance-covariance matrix with cov.
Then you can generate random numbers with mvnrnd.
It is also possible to use fitgmdist (with a single component), but for just a multivariate normal distribution mean and cov are enough (see the sketch below).
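A minimal sketch, assuming X is an n-by-d data matrix with one observation per row (the mvnrnd call at the top only creates placeholder data):
X = mvnrnd([0 0], [1 0.8; 0.8 1], 500);       % placeholder correlated data
mu_hat    = mean(X);                          % sample mean (row vector)
Sigma_hat = cov(X);                           % sample variance-covariance matrix
newSamples = mvnrnd(mu_hat, Sigma_hat, 1000); % draw from the fitted distribution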
Yes, using fitdist on every dimension separately assumes the variables are uncorrelated and it's not what you want.
You can use the [sigma,mu] = robustcov(X) function, where X is your multivariate data, i.e. X = [x1 x2 ... xn] and each xi is a column vector of data.
Then you can use Y = mvnpdf(X,mu,sigma) to get the values of the estimated normal probability density function.
https://www.mathworks.com/help/stats/normfit.html
https://www.mathworks.com/help/stats/mvnpdf.html

empirical quantiles in matlab

Does anyone know how to calculate the empirical quantiles of a distribution in MATLAB? Specifically, I have issues working with the empiricalQuantiles() function and need to calculate empirical quantiles of a rolling population (a matrix that is, say, 49x1025 for every 100 points).
If you can also give information on how to calculate the inverse of the empirical distribution (which should give approximately the same answer), that would be great.
% Simulate empirical data
empiricalData = randn(50000,1);
% Quantile evaluation, for instance the median:
y = quantile(empiricalData, 0.50);
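For the inverse of the empirical distribution mentioned in the question, a minimal sketch using ecdf and interp1, which should give approximately the same answer as quantile above:
[F, xvals] = ecdf(empiricalData);              % empirical CDF
% Drop the duplicated first point so the CDF values are strictly increasing.
y_inv = interp1(F(2:end), xvals(2:end), 0.50); % invert the empirical CDF at 0.50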