I am trying to create a bivariate normal distribution of random numbers in Matlab that is symmetrical. I know the standard deviation of the gaussian (15 for example) and that it is the same in both directions. How do I use this standard deviation information to get the covariance in a form that Matlab will accept for the mvnrnd command? Thanks, I would really appreciate any advice.
First of all, you need to know the correlation between the two normal variables. As @Luis said, the diagonal is fixed by the standard deviations (each entry is the variance, 15^2 = 225), but for the off-diagonal covariance you need to know the correlation between the two.
They are related by this equation:
cov(x,y) = correlation(x,y)*std(x)*std(y)
But if you do not know the correlation, then you can calculate the sample covariance.
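Formula for sample covariance:
cov(x,y) = (1/n) * sum_{i=1..n} (x_i - mean(x)) * (y_i - mean(y))
To calculate in Matlab (x and y are 1-by-n row vectors; the result is stored in c so that the built-in cov is not shadowed; note that the built-in cov divides by n-1 rather than n):
c = (1/n)*(x-mean(x))*(y-mean(y))'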
With reference to: http://www.cogsci.ucsd.edu/~desa/109/trieschmarksslides.pdf
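If the random variables are independent, the off-diagonal elements of the covariance matrix are zero. So that matrix will be diag([std1^2, std2^2]), where std1 and std2 are the standard deviations of your two variables (the diagonal holds variances, not standard deviations). In your example you would use diag([15^2, 15^2]), i.e. 225*eye(2). Note that diag(15,15) would not work: with two scalar arguments, diag places the value 15 on the 15th superdiagonal of a 16-by-16 matrix.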
If the random variables are not independent, you need to specify all four elements of the covariance matrix.
You can use the command cov in Matlab:
SIGMA = cov([x y]);
HTH
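As a minimal sketch of the answers above: the covariance matrix can be built from the standard deviation and a correlation, then passed to mvnrnd. The zero mean, the sample count, and the value rho = 0 (independent components) are assumptions for illustration; mvnrnd requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: symmetric bivariate normal with standard deviation 15 in both directions.
rng(5);                          % fixed seed, for reproducibility only
s   = 15;                        % standard deviation from the question
rho = 0;                         % assumed correlation (0 = independent)

% cov(x,y) = rho*std(x)*std(y); the diagonal holds the variances s^2.
SIGMA = [s^2,     rho*s*s;
         rho*s*s, s^2];          % equals 225*eye(2) when rho = 0

r = mvnrnd([0 0], SIGMA, 10000); % 10000 samples, one per row (mean assumed 0)
disp(std(r));                    % both columns should be close to 15
```

Changing rho to a non-zero value in (-1, 1) tilts the cloud while leaving the marginal standard deviations at 15.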
I have a correlation matrix for N random variables. Each of them is uniformly distributed within [0,1]. I am trying to simulate these random variables. How can I do that? Note that N > 2. I was trying to use the Cholesky decomposition, and these are my steps:
get the lower triangle of the correlation matrix (L=N*N)
independently sample 10000 times for each of the N uniformly distributed random variables (S=N*10000)
multiply the two: L*S, and this gives me correlated samples but the range of them is not within [0,1] anymore.
How can I solve the problem?
I know that if I only have 2 random variables I can do something like:
rho*x1+sqrt(1-rho^2)*y1
to get my correlated sample y. But with more than two correlated variables, I am not sure what I should do.
You can get approximate solutions by generating correlated normals using the Cholesky factorization, then converting them to U(0,1)'s using the normal CDF. The solution is approximate because the normals have the desired correlation, but converting to uniforms is a non-linear transformation and only linear transformations preserve correlation.
There's a transformation available which will give exact solutions if the transformed Var/Cov matrix is positive semidefinite, but that's not always the case. See the abstract at https://www.tandfonline.com/doi/abs/10.1080/03610919908813578.
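The approach described above can be sketched as follows. The target matrix R and the sample count are hypothetical. The adjustment Rz = 2*sin(pi*R/6) is the standard relation between the correlation of bivariate normals and the correlation of the uniforms obtained through the normal CDF; it is offered as an assumption about what an exact correction looks like, not necessarily the method of the linked paper, and chol will fail if Rz is not positive definite. The normal CDF is written with erfc so that no toolbox is needed.

```matlab
% Sketch: correlated U(0,1) samples via correlated normals and the normal CDF.
rng(0);                            % fixed seed, for reproducibility only
R = [1.0 0.5 0.3;                  % hypothetical target correlation matrix
     0.5 1.0 0.2;
     0.3 0.2 1.0];
n = 10000;                         % number of samples

Rz = 2*sin(pi*R/6);                % correlation to impose on the normals
                                   % (diagonal stays 1 because 2*sin(pi/6) = 1)
C  = chol(Rz);                     % upper triangular factor, C'*C = Rz
Z  = randn(n, size(R,1)) * C;      % rows are correlated standard normals
U  = 0.5*erfc(-Z/sqrt(2));         % standard normal CDF, each column is U(0,1)

disp(corrcoef(U));                 % should be close to R
```

Skipping the adjustment (using chol(R) directly) gives the approximate version: the uniforms stay in [0,1] but their correlations come out slightly smaller in magnitude than the targets.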
I'm trying to fit a multivariate normal distribution to data that I collected, in order to take samples from it.
I know how to fit a (univariate) normal distribution, using the fitdist function (with the 'Normal' option).
How can I do something similar for a multivariate normal distribution?
Doesn't using fitdist on every dimension separately assume the variables are uncorrelated?
There isn't any need for a specialized fitting function; the maximum likelihood estimates for the mean and variance of the distribution are just the sample mean and sample variance. I.e., compute the sample mean and sample variance and you're done.
Estimate the mean with mean and the variance-covariance matrix with cov.
Then you can generate random numbers with mvnrnd.
It is also possible to use fitmgdist, but for just a multivariate normal distribution mean and cov are enough.
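A minimal sketch of that recipe, using made-up data in place of the collected measurements (the matrix X, its size, and the number of new samples are assumptions for illustration; mvnrnd requires the Statistics and Machine Learning Toolbox):

```matlab
% Sketch: fit a multivariate normal with mean/cov, then sample from it.
rng(1);                                     % fixed seed, for reproducibility only

% Hypothetical stand-in for collected data: 2000 observations, 2 correlated variables.
X = randn(2000, 2) * [2 0.8; 0 1] + repmat([5 -3], 2000, 1);

mu_hat    = mean(X, 1);                     % 1-by-2 sample mean
Sigma_hat = cov(X);                         % 2-by-2 sample covariance (divides by n-1)

samples = mvnrnd(mu_hat, Sigma_hat, 1000);  % 1000 new draws, one per row

disp(mu_hat);
disp(Sigma_hat);
```

Because cov(X) estimates the full matrix, the off-diagonal entries carry the correlation that per-dimension fitdist calls would ignore.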
Yes, using fitdist on every dimension separately assumes the variables are uncorrelated and it's not what you want.
You can use the [sigma,mu] = robustcov(X) function, where X is your multivariate data, i.e. X = [x1 x2 ... xn] and each xi is a column vector of data.
Then you can use Y = mvnpdf(X,mu,sigma) to get the values of the estimated normal probability density function.
https://www.mathworks.com/help/stats/normfit.html
https://www.mathworks.com/help/stats/mvnpdf.html
I want to generate a random vector in MATLAB with the distribution N(0,σ^2*I_dxd)
d (the dimension) can be any number. How can I do this? Thanks in advance.
If the variance/covariance matrix is σ^2*I, then the normals are independent. Generate d independent N(0,σ^2), or d standard normals and multiply them by σ.
I think you want randn(d,1) * sigma, where randn() draws standard normal random numbers and sigma is the standard deviation σ in your problem statement.
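For illustration, a short sketch of that one-liner with hypothetical values for d and sigma, plus a batch of draws to check the scale empirically:

```matlab
% Sketch: draws from N(0, sigma^2 * I) in d dimensions.
rng(2);                     % fixed seed, for reproducibility only
d     = 5;                  % assumed dimension
sigma = 15;                 % assumed standard deviation (not the variance)
n     = 20000;              % number of draws used for the check

x = sigma * randn(d, 1);    % a single d-by-1 random vector
X = sigma * randn(n, d);    % n draws, one per row

disp(std(X));               % every column should be close to sigma
```

The multiplier is the standard deviation sigma, not the variance sigma^2, because scaling a standard normal by c gives variance c^2.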
You're just talking about the generation of d independent identically distributed random variables each with a normal distribution, right?
The command you need is randn so if you type help randn you should be able to figure it out.
Assuming your I is zero outside the diagonal:
randn(length(σ^2*I_dxd),1).*sqrt(diag(σ^2*I_dxd))
If I is not zero outside the diagonal it gets a bit more complex.
I'm probably being a little dense but I'm not very mathsy and can't seem to understand the covariance element of creating multivariate data.
I'm after two columns of random data (representing two correlated variables).
I think I am right in needing to use the mvnrnd function and I understand that 'mu' must be a column of my mean vectors. As I need 4 distinct classes within my data these are going to be (1, 1) (-1 1) (1 -1) and (-1 -1). I assume I will have to do the function 4x with a different column of mean vectors each time and then combine them to get my full data set.
I don't understand what I should put for SIGMA - Matlab help tells me that it must be 'a d-by-d symmetric positive semi-definite matrix, or a d-by-d-by-n array' i.e. a covariance matrix. I don't understand how I create a covariance matrix for numbers that I am yet to generate.
Any advice would be greatly appreciated!
Assuming that I understood your case properly, I would go this way:
data = [normrnd(0,1,5000,1),normrnd(0,1,5000,1)]; % your starting data series
MU = mean(data,1);
SIGMA = cov(data);
Now, it should be possible to feed mvnrnd with MU and SIGMA:
r = mvnrnd(MU,SIGMA,5000);
plot(r(:,1),r(:,2),'+') % in case you want to plot the results
I hope this helps.
I think your aim is to generate simulated data from a multivariate Gaussian distribution. For example, I use
k = 6; % feature dimension
mu = rand(1,k);
sigma = 10*eye(k,k);
Ten times the identity matrix is symmetric and positive definite, so it is a valid covariance matrix. Because all variances are equal and all covariances are zero, the resulting Gaussian has circular (spherical) contours, unlike other choices of sigma.
Then you can pass it to mvnrnd as in the example above and look at the plot.
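To connect this back to the question, here is a sketch of how the four classes might be generated without any prior data: the covariance matrix is written down from a chosen standard deviation and correlation rather than estimated. The values of s, rho and nPer are hypothetical, and mvnrnd requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: four Gaussian classes centred at (+-1, +-1) sharing one covariance.
rng(3);                                   % fixed seed, for reproducibility only
s    = 0.3;                               % assumed std of each variable
rho  = 0.5;                               % assumed correlation between the two
nPer = 500;                               % assumed samples per class

% Covariance chosen up front: variances on the diagonal, rho*s*s off it.
SIGMA = [s^2,     rho*s*s;
         rho*s*s, s^2];

centers = [1 1; -1 1; 1 -1; -1 -1];       % one mean vector per row
data    = zeros(4*nPer, 2);
labels  = zeros(4*nPer, 1);

for k = 1:4
    rows = (k-1)*nPer + (1:nPer);         % block of rows for class k
    data(rows, :) = mvnrnd(centers(k, :), SIGMA, nPer);
    labels(rows)  = k;
end

plot(data(:,1), data(:,2), '.');          % four tilted clouds
```

Setting rho to 0 gives round clouds; a larger s makes the classes overlap more.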
I have a data matrix A (with dependencies between columns) of which I estimate the covariance matrix S. I now want to use this covariance matrix to simulate a new matrix A_sim. Since I assume that the underlying data generator of A was gaussian, I can simply sample from a gaussian specified by S. I do that in matlab as follows:
A_sim = randn(size(A))*chol(S);
However, the values in A_sim are way larger than in A. If I scale down S by a factor of 100, A_sim looks much better. I am now looking for a principled way to determine this scaling factor. Can anyone give advice or suggest literature that might be helpful?
Matlab has the function mvnrnd which generates multivariate random variables for you.
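A sketch of both routes on made-up data (the matrix A below is a hypothetical stand-in for the real data). One thing worth noting: randn(size(A))*chol(S) produces zero-mean samples with covariance S, so if A has a non-zero mean it has to be added back; mvnrnd takes the mean as its first argument and needs no extra scaling factor.

```matlab
% Sketch: simulate A_sim with the same mean and covariance as A.
rng(4);                                          % fixed seed, for reproducibility only

% Hypothetical data with dependent columns and a non-zero mean.
A = randn(5000, 3) * [1 0.5 0.2; 0 1 0.3; 0 0 0.5] ...
    + repmat([10 20 30], 5000, 1);

mu = mean(A, 1);                                 % 1-by-3 column means
S  = cov(A);                                     % 3-by-3 sample covariance

% Route 1: let mvnrnd handle mean and covariance (Statistics Toolbox).
A_sim = mvnrnd(mu, S, size(A, 1));

% Route 2: the Cholesky approach from the question, with the mean added back.
% chol(S) is upper triangular with chol(S)'*chol(S) = S.
A_sim2 = randn(size(A)) * chol(S) + repmat(mu, size(A, 1), 1);

disp(cov(A_sim2) - S);                           % entries should be near zero
```

If the simulated values still look far too large, checking that S was estimated with observations in rows (as cov expects) is a reasonable next step.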