How To Fit Multivariate Normal Distribution To Data In MATLAB? - matlab

I'm trying to fit a multivariate normal distribution to data that I collected, in order to take samples from it.
I know how to fit a (univariate) normal distribution, using the fitdist function (with the 'Normal' option).
How can I do something similar for a multivariate normal distribution?
Doesn't using fitdist on every dimension separately assume the variables are uncorrelated?

There isn't any need for a specialized fitting function; the maximum likelihood estimates for the mean and covariance matrix of the distribution are just the sample mean and the sample covariance matrix (with 1/n normalization). I.e., compute the sample mean and sample covariance and you're done.

Estimate the mean with mean and the variance-covariance matrix with cov.
Then you can generate random numbers with mvnrnd.
It is also possible to use fitgmdist, but for just a multivariate normal distribution mean and cov are enough.
Yes, using fitdist on every dimension separately assumes the variables are uncorrelated and it's not what you want.
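The mean / cov / mvnrnd recipe above can be sketched as follows. This is only a minimal illustration: the matrix X is placeholder data generated for the example (rows are observations, columns are variables), and the Statistics and Machine Learning Toolbox is assumed to be available for mvnrnd.

```matlab
% Minimal sketch: fit a multivariate normal to data and draw new samples.
rng(0);                                    % reproducible example
X = mvnrnd([1 2], [4 1.5; 1.5 1], 5000);   % placeholder data, replace with your own

mu    = mean(X);    % 1-by-d sample mean
Sigma = cov(X);     % d-by-d sample covariance (normalised by n-1)

samples = mvnrnd(mu, Sigma, 1000);   % 1000 draws from the fitted distribution
```

If the strict maximum likelihood estimate is wanted, cov(X,1) normalises by n instead of n-1; for large samples the difference is negligible.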

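You can use the [sigma,mu] = robustcov(X) function, where X is your multivariate data, i.e. X = [x1 x2 ... xn] and each xi is a column vector of data. Note that robustcov returns a robust (outlier-resistant) estimate rather than the plain sample mean and covariance.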
Then you can use Y = mvnpdf(X,mu,sigma) to get the values of the estimated normal probability density function.
https://www.mathworks.com/help/stats/normfit.html
https://www.mathworks.com/help/stats/mvnpdf.html

Related

Is there a way to get the probability from the probability density in multivariate kernel estimation?

I have a question about multivariate kernel density in matlab, which is my first time using it.
I have a 3-dimensional sample data (x, y, z in axes) and want to find a probability of being in a certain volume using kernel density estimation. So, I used the mvksdensity function in matlab and got the probability density (estimated function values) for the points I decided.
What I originally wanted to do was to (if I could find the function) triple-integrate the multivariate density over a given volume. But the mvksdensity function only returns the density estimates at the requested points and does not return the function itself. I thought there would be an easy way to compute the probability from the density, but I'm stuck. Does anyone have any useful information for this? Thanks in advance.
I thought about fitdist function to find the distribution, but it only works for univariate kernel distribution.
I also tried to use mvncdf, which is a function that returns the cdf of the multivariate normal distribution for the row of the sample data after setting the mean and the std. But then I have to calculate the probability for a given volume for every normal distribution in each data point and then add it, which will be inefficient for a large amount of data and I don't know if it's a correct way.
I can suggest the following Monte-Carlo approach. You find a master volume that contains the entire mass of the estimated probability density. This should be as small as possible for the sake of efficiency. Then you generate a large number of test points in the master volume, either on a grid or randomly according to a uniform distribution. The probability content of a specific volume V can be estimated by the sum of the density values of the test points in V over the sum of the density values of all test points. I am afraid, however, that in 3D you would need at least 1E6 test points, probably more. If you give me access to your sample, I would be pleased to try out my suggestion. It should also be fairly easy to work out an estimate of the standard error of the estimated probability content of V.
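A rough sketch of this Monte-Carlo idea is given below. Everything specific in it is an assumption made for illustration: the sample `data` is synthetic, the bandwidth uses a rule-of-thumb formula for three dimensions, the master volume is the padded bounding box of the data, and the target volume V is the box [-1,1]^3. The number of test points is kept small so the example runs quickly; a real estimate would need far more.

```matlab
% Sketch: probability content of a volume V from a 3-D kernel density estimate.
rng(0);
data = randn(500, 3);                            % placeholder sample, replace with your own
n    = size(data, 1);
bw   = std(data) * (4 / (5*n))^(1/7);            % rule-of-thumb bandwidth for d = 3

% Master volume: bounding box of the data, padded by a few bandwidths
lo = min(data) - 4*bw;
hi = max(data) + 4*bw;

% Uniform random test points in the master volume (implicit expansion, R2016b+)
nTest = 2e4;
pts   = lo + rand(nTest, 3) .* (hi - lo);

% Density estimate at every test point
f = mvksdensity(data, pts, 'Bandwidth', bw);

% Target volume V (hypothetical choice): the box [-1,1]^3
inV = all(pts >= -1 & pts <= 1, 2);

% Density mass of the test points in V over the mass of all test points
pV = sum(f(inV)) / sum(f);
```

Repeating the calculation with several independent sets of test points gives a simple empirical handle on the standard error of pV.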

Why is my Gaussian function giving values out of range?

I am trying to plot Gaussian using Matlab. My code is like this.
a=1/(0.1*sqrt(2*3.14))
y1=a*exp(-1*(((X1-Mu).^2)./(2*(Sigma^2)) ))
plot(X1,y1)
My graph looks like the image on link
It is showing the correct shape, but the values on the y axis go up to 4. As far as I know, the Gaussian is a probability distribution function and thus must always return a value between 0 and 1. Thus I am apprehensive about whether my implementation is correct.
Yes, it is a probability distribution function, but it is not required to return a value between 0 and 1 every time. As you can see from the picture below, the Gaussian's graph depends on the variance and mean.
Your implementation is correct. The Gaussian is a probability DENSITY function, which is different from a (cumulative) probability distribution function. A density only has to be greater than or equal to zero everywhere and must integrate to 1 over the entire range of possible X1; its individual values may exceed 1.
Cumulative distribution functions are the ones whose values must be less than or equal to 1.
As a side note, Matlab has both the Gaussian probability density and distribution functions built in, as normpdf and normcdf respectively.
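A quick numerical check makes the distinction concrete. The values Mu = 0 and Sigma = 0.1 are assumptions (the 0.1 is suggested by the constant in the question's code); only base Matlab functions are used.

```matlab
% Sketch: a narrow Gaussian density exceeds 1 at its peak but still integrates to 1.
Mu    = 0;      % assumed mean
Sigma = 0.1;    % assumed standard deviation

X1 = linspace(Mu - 1, Mu + 1, 2001);    % +/- 10 standard deviations
y1 = exp(-(X1 - Mu).^2 ./ (2*Sigma^2)) ./ (Sigma*sqrt(2*pi));

peak = max(y1);          % close to 1/(Sigma*sqrt(2*pi)), i.e. about 3.99
area = trapz(X1, y1);    % close to 1
```

The smaller Sigma is, the taller the peak becomes, while the area under the curve stays at 1.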

Sample multinomial distribution in Matlab without using mnrnd

I know for a random variable x that P(x=i) for each i=1,2,...,100. Then how may I sample x by a multinomial distribution, based on the given P(x=i) in Matlab?
I am allowed to use the Matlab built-in commands rand and randi, but not mnrnd.
In general, you can sample numbers from any 1 dimensional probability distribution X using a uniform random number generator and the inverse cumulative distribution function of X. This is known as inverse transform sampling.
random_x = xcdf_inverse(rand())
How does this apply here? If you have your vector p of probabilities defining your multinomial distribution, F = cumsum(p) gives you a vector that defines the CDF. You can then generate a uniform random number on [0,1] using temp = rand() and then find the first row in F greater than temp. This is basically using the inverse CDF of the multinomial distribution.
Be aware, though, that for some distributions (e.g. the gamma distribution) this turns out to be an inefficient way to generate random draws, because evaluating the inverse CDF is slow (if the inverse CDF cannot be expressed analytically, slower numerical methods must be used).
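The cumsum-and-compare step described above could look like the sketch below. It uses only rand, cumsum and find; the probability vector p is a random placeholder standing in for the known P(x=i), and the names F, u and x are chosen for the example.

```matlab
% Sketch: draw from a discrete distribution on 1..100 by inverse transform sampling.
rng(0);
p = rand(100, 1);
p = p / sum(p);          % placeholder probabilities P(x = i), i = 1..100

F = cumsum(p);           % discrete CDF
F(end) = 1;              % guard against round-off in the last entry

N = 5e4;                 % number of draws
u = rand(N, 1);          % uniform random numbers on (0,1)
x = zeros(N, 1);
for k = 1:N
    x(k) = find(F >= u(k), 1);   % first index whose CDF value reaches u(k)
end
```

Each x(k) is one draw of the random variable; counting how often each value occurs (for example with accumarray) gives the corresponding multinomial counts.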

Generate random samples from arbitrary discrete probability density function in Matlab

I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probability density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
%(interp1 needs distinct sample points, so this assumes A has no zero entries)
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
I hope that this helps!
Chip
I don't believe Matlab has built-in functionality for generating multivariate random variables with an arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can easily be generated by inverting the cumulative distribution function, the CDF of a multivariate distribution cannot be inverted in the same way, so generating such numbers is much messier (the main problem is that 2 or more variables can be correlated). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as your mesh), then you can get along without having to do any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and arg must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.
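The same double-trapz pattern extends to second moments. The sketch below is only an illustration: it uses a single Gaussian hump as a placeholder density, and the helper name integ is made up for the example (it does the same job as trapz2).

```matlab
% Sketch: means, variances and covariance of a gridded 2-D density via trapz.
xv = linspace(-50, 50, 101);
yv = linspace(-30, 30, 100);
[x, y] = meshgrid(xv, yv);

A = exp(-((x - 10).^2 + y.^2) / 100);    % placeholder density (unnormalised)
A = A / trapz(xv, trapz(yv, A));         % normalise so the double integral is 1

% Hypothetical helper: integrate A.*g over the mesh (y first, then x)
integ = @(g) trapz(xv, trapz(yv, A .* g));

mean_x = integ(x);
mean_y = integ(y);
var_x  = integ((x - mean_x).^2);
var_y  = integ((y - mean_y).^2);
cov_xy = integ((x - mean_x) .* (y - mean_y));
```

Higher moments follow the same pattern by changing the integrand, e.g. integ((x - mean_x).^3) for the third central moment in x.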

Determine Covariance for multivariate normal distribution in MATLAB

I am trying to create a bivariate normal distribution of random numbers in Matlab that is symmetrical. I know the standard deviation of the gaussian (15 for example) and that it is the same in both directions. How do I use this standard deviation information to get the covariance in a form that Matlab will accept for the mvnrnd command? Thanks, I would really appreciate any advice.
First of all, you need to know the correlation between the two normal variables. Like @Luis said, the diagonal entries will be the variances, 15^2 = 225 each, but for the off-diagonal covariance you need to know the correlation between the two.
They are related by this equation:
cov(x,y) = correlation(x,y)*std(x)*std(y)
But if you do not know the correlation, then you can calculate the sample covariance.
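Formula for sample covariance:
cov(x,y) = (1/n) * sum_i (x_i - mean(x)) * (y_i - mean(y))
To calculate in Matlab (x and y are vectors of length n with the same orientation; Matlab's own cov divides by n-1 instead of n):
c = (1/n)*sum((x-mean(x)).*(y-mean(y)))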
With reference to:http://www.cogsci.ucsd.edu/~desa/109/trieschmarksslides.pdf
If the random variables are independent, the off-diagonal elements of the covariance matrix are zero. The diagonal holds the variances, so the matrix will be diag([std1^2 std2^2]), where std1 and std2 are the standard deviations of your two variables. In your example you would use diag([15^2 15^2]).
If the random variables are not independent, you need to specify all four elements of the covariance matrix.
You can use the command cov in Matlab:
SIGMA = cov([x y]);
HTH
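Putting the pieces together, a covariance matrix for mvnrnd can be built from the standard deviation and a correlation as in the sketch below. The zero mean, the value rho = 0 and the sample size are assumptions made for the example; mvnrnd requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: symmetric bivariate normal with standard deviation 15 in both directions.
rng(0);
s   = 15;    % standard deviation of each variable
rho = 0;     % assumed correlation; 0 gives a circularly symmetric cloud

% Covariance matrix: variances on the diagonal, rho*s*s off the diagonal
SIGMA = [s^2,      rho*s*s;
         rho*s*s,  s^2    ];

R = mvnrnd([0 0], SIGMA, 20000);   % 20000 draws, one per row
```

Checking std(R) afterwards should give values near 15 in both columns, which confirms that mvnrnd expects variances (not standard deviations) on the diagonal.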