Cholesky decomposition for simulating correlated random variables - simulation

I have a correlation matrix for N random variables, each uniformly distributed on [0,1]. I am trying to simulate these random variables. How can I do that? Note that N > 2. I was trying to use the Cholesky decomposition, and below are my steps:
compute the lower-triangular Cholesky factor of the correlation matrix (L is N-by-N)
independently sample 10000 times for each of the N uniformly distributed random variables (S is N-by-10000)
multiply the two: L*S. This gives me correlated samples, but their range is no longer within [0,1].
How can I solve the problem?
I know that if I only have 2 random variables, I can do something like:
rho*x1+sqrt(1-rho^2)*y1
to get my correlated sample y. But with more than two correlated variables, I am not sure what to do.

You can get approximate solutions by generating correlated normals using the Cholesky factorization, then converting them to U(0,1)'s using the normal CDF. The solution is approximate because the normals have the desired correlation, but converting to uniforms is a non-linear transformation, and only linear transformations preserve correlation.
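A minimal MATLAB sketch of this approach, assuming an illustrative 3-by-3 target matrix R and 10000 samples (both made up for the example). The normal CDF is written with erfc so that no toolbox is needed; normcdf would do the same job if the Statistics Toolbox is available.

```matlab
% Illustrative target correlation matrix for N = 3 uniforms (assumed values).
R = [1.0 0.6 0.3;
     0.6 1.0 0.5;
     0.3 0.5 1.0];
N = size(R, 1);
n = 10000;                      % samples per variable

rng(0);                         % fixed seed so the run is repeatable
L = chol(R, 'lower');           % lower-triangular factor, R = L*L'
Z = L * randn(N, n);            % correlated standard normals, N-by-n

% Standard normal CDF via erfc (toolbox-free equivalent of normcdf(Z)).
U = 0.5 * erfc(-Z / sqrt(2));   % each row is now U(0,1)

R_hat = corrcoef(U');           % sample correlation of the uniforms
disp(R_hat)
```

Every entry of U stays inside (0,1), and R_hat should come out close to R but slightly smaller in magnitude off the diagonal, which is the approximation error described above.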
There's a transformation available which will give exact solutions if the transformed Var/Cov matrix is positive semidefinite, but that's not always the case. See the abstract at https://www.tandfonline.com/doi/abs/10.1080/03610919908813578.
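One adjustment of this kind that is known in closed form for uniform marginals is to impose 2*sin(pi*r/6) on the normals when the uniforms should have correlation r. I cannot confirm that this is the exact transformation in the linked abstract, so treat the sketch below as an assumption. The target matrix R_u and the sample size are illustrative.

```matlab
% Illustrative target correlation of the uniforms (assumed values).
R_u = [1.0 0.6 0.3;
       0.6 1.0 0.5;
       0.3 0.5 1.0];

% Correlation to impose on the normals so the uniforms end up with R_u.
R_z = 2 * sin(pi * R_u / 6);

% The adjusted matrix is not guaranteed to be positive definite.
[L, flag] = chol(R_z, 'lower'); % flag > 0 means the factorization failed
if flag ~= 0
    error('Adjusted matrix is not positive definite; this route gives no exact solution.');
end

rng(0);                         % fixed seed so the run is repeatable
n = 200000;
Z = L * randn(size(R_u, 1), n); % correlated standard normals
U = 0.5 * erfc(-Z / sqrt(2));   % standard normal CDF without a toolbox
R_hat = corrcoef(U');           % should be close to R_u up to sampling noise
disp(R_hat)
```

The chol flag is the practical test for the positive (semi)definiteness caveat: if the adjusted matrix fails it, the approximate method is the fallback.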

Related

How can I extract features from a set of matrices and vectors to be used in machine learning in MATLAB

I have a task where I need to train a machine learning model to predict a set of outputs from multiple inputs. My inputs are 1000 iterations of a set of 3x1 vectors, a set of 3x3 covariance matrices and a set of scalars, while my output is just a set of scalars. I cannot use the Regression Learner app because these inputs need to have the same dimensions. Any idea how to unify them?
One possible way to solve this is to flatten the covariance matrix into a vector. Once you have done that, you can construct a 1000xN matrix, where 1000 is the number of samples in your dataset and N is the number of features. For example, if your features consist of a 3x1 vector, a 3x3 covariance matrix and, let's say, 5 other scalars, N would be 3+3*3+5=17. You then use this matrix to train an arbitrary model, such as a linear regressor or something more advanced like a tree.
When training machine learning models it is important to understand your data and exploit its structure to help the learning algorithms. For example we could use the fact that a covariance matrix is symmetric and positive semi-definite and thus lives in a closed convex cone. Symmetry of the matrix implies that it lives in a subspace of the set of all 3x3 matrices. In fact the dimension of the space of 3x3 symmetric matrices is only 6. You can use that knowledge to reduce redundancy in your data.
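Both ideas can be sketched in a few lines. The data below are random stand-ins (V, C and s are hypothetical names, and the five scalars per sample are just the number used in the example above); only the reshaping and masking steps matter.

```matlab
nSamples = 1000;
rng(0);                               % fixed seed so the run is repeatable

% Hypothetical stand-ins for the real data.
V = randn(3, nSamples);               % 3x1 vectors, one per column
C = zeros(3, 3, nSamples);            % 3x3 covariance matrices
for k = 1:nSamples
    B = randn(3);
    C(:, :, k) = B * B';              % symmetric positive semi-definite
end
s = randn(nSamples, 5);               % five scalars per sample

% Option 1: flatten each 3x3 matrix into 9 features (column-major order).
Cflat = reshape(C, 9, nSamples)';     % nSamples-by-9
X_full = [V', Cflat, s];              % 1000-by-17 feature matrix

% Option 2: keep only the 6 unique entries (upper triangle incl. diagonal).
mask = triu(true(3));                 % logical mask of the upper triangle
Cupper = Cflat(:, mask(:));           % nSamples-by-6
X_reduced = [V', Cupper, s];          % 1000-by-14 feature matrix
```

Either matrix can then be passed, together with the output vector, to whatever model you choose; X_reduced simply drops the three duplicated off-diagonal entries.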

Sample multinomial distribution in Matlab without using mnrnd

I know for a random variable x that P(x=i) for each i=1,2,...,100. Then how may I sample x by a multinomial distribution, based on the given P(x=i) in Matlab?
I am allowed to use the Matlab built-in commands rand and randi, but not mnrnd.
In general, you can sample numbers from any 1 dimensional probability distribution X using a uniform random number generator and the inverse cumulative distribution function of X. This is known as inverse transform sampling.
random_x = xcdf_inverse(rand())
How does this apply here? If you have your vector p of probabilities defining your multinomial distribution, F = cumsum(p) gives you a vector that defines the CDF. You can then generate a uniform random number on [0,1] using temp = rand() and then find the first element of F that is greater than temp. The index of that element is your sample. This is basically using the inverse CDF of the multinomial distribution.
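A short sketch of that recipe using only rand, cumsum and find. The probability vector p is illustrative (four outcomes instead of 100), and the vectorized variant for many draws is an addition of this sketch rather than something required by the method.

```matlab
p = [0.1 0.2 0.3 0.4];      % illustrative probabilities; must sum to 1
F = cumsum(p);              % discrete CDF
F(end) = 1;                 % guard against round-off in the last entry

rng(0);                     % fixed seed so the run is repeatable

% One draw: index of the first CDF entry greater than the uniform number.
temp = rand();
x = find(F > temp, 1, 'first');

% Many draws at once: count the CDF entries that each uniform has reached,
% then add 1. This gives the same index as the find() call above.
n = 100000;
u = rand(n, 1);
samples = sum(bsxfun(@ge, u, F), 2) + 1;   % n-by-1, values in 1..numel(p)

% Empirical frequencies, for comparison with p.
freq = accumarray(samples, 1, [numel(p) 1])' / n;
disp(freq)
```

With enough draws, freq should settle close to p.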
Be aware, though, that for some distributions (e.g. the gamma distribution), this turns out to be an inefficient way to generate random draws because evaluating the inverse CDF is so slow (if the CDF cannot be expressed analytically, slower numerical methods must be used).

Determine Covariance for multivariate normal distribution in MATLAB

I am trying to create a bivariate normal distribution of random numbers in Matlab that is symmetrical. I know the standard deviation of the gaussian (15 for example) and that it is the same in both directions. How do I use this standard deviation information to get the covariance in a form that Matlab will accept for the mvnrnd command? Thanks, I would really appreciate any advice.
First of all, you need to know the correlation between the two normal variables. As @Luis said, the diagonal entries are fixed by the standard deviation (they are the variances, 15^2 = 225 each), but for the off-diagonal covariance you need to know the correlation between the two variables.
They are related by this equation:
cov(x,y) = correlation(x,y)*std(x)*std(y)
But if you do not know the correlation, then you can calculate the sample covariance.
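Formula for sample covariance:
$$\operatorname{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
To calculate in Matlab (x and y are row vectors of length n; the result is named c so that it does not shadow the built-in cov):
c = (1/n)*(x-mean(x))*(y-mean(y))'
With reference to: http://www.cogsci.ucsd.edu/~desa/109/trieschmarksslides.pdf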
If the random variables are independent, the off-diagonal elements of the covariance matrix are zero and the diagonal holds the variances. So that matrix will be diag([std1^2, std2^2]), where std1 and std2 are the standard deviations of your two variables. In your example you would use diag([15^2, 15^2]), i.e. 225*eye(2).
If the random variables are not independent, you need to specify all four elements of the covariance matrix.
You can use the command cov in Matlab:
SIGMA = cov([x y]);
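Putting the pieces together, here is a sketch of building the covariance matrix from a standard deviation and a correlation and then sampling from it. The correlation rho = 0 and the zero mean are assumptions for the example; the mvnrnd call is shown as a comment because it needs the Statistics Toolbox, and the Cholesky line is a toolbox-free equivalent.

```matlab
sigma = 15;                 % standard deviation in both directions
rho = 0;                    % assumed correlation; 0 gives a symmetric cloud
mu = [0 0];                 % assumed mean

% Covariance matrix: variances on the diagonal, rho*std*std off it.
SIGMA = [sigma^2,         rho*sigma*sigma;
         rho*sigma*sigma, sigma^2];

rng(0);                     % fixed seed so the run is repeatable
n = 50000;

% With the Statistics Toolbox: X = mvnrnd(mu, SIGMA, n);
% Toolbox-free equivalent using the upper Cholesky factor (SIGMA = R'*R):
X = bsxfun(@plus, randn(n, 2) * chol(SIGMA), mu);

S_hat = cov(X);             % should be close to SIGMA
disp(S_hat)
```

Note that the diagonal of SIGMA holds 225, not 15, because mvnrnd expects variances and covariances rather than standard deviations.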
HTH

Creating a 1D Second derivative of gaussian Window

In MATLAB I need to generate a second derivative of a Gaussian window to apply to a vector representing the height of a curve. I need the second derivative in order to determine the locations of the inflection points and maxima along the curve. The vector representing the curve may be quite noisy, hence the use of the Gaussian window.
What is the best way to generate this window?
Is it best to use the gausswin function to generate the gaussian window then take the second derivative of that?
Or to generate the window manually using the equation for the second derivative of the gaussian?
Or even is it best to apply the gaussian window to the data, then take the second derivative of it all? (I know these last two are mathematically the same, however with the discrete data points I do not know which will be more accurate)
The maximum length of the height vector is going to be around 100-200 elements.
Thanks
Chris
I would create a linear filter composed of the weights generated by the second derivative of a Gaussian function and convolve this with your vector.
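The weights of a second derivative of a Gaussian are given by:
$$G''(t) = C \left( \frac{(t-\tau)^2}{\sigma^4} - \frac{1}{\sigma^2} \right) \exp\left( -\frac{(t-\tau)^2}{2\sigma^2} \right)$$
Where: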
Tau is the time shift for the filter. If you are generating weights for a discrete filter of length T with an odd number of samples, set tau to zero and let t run over the integers from -(T-1)/2 to (T-1)/2
sigma - varies the scale of your operator. Set sigma to a value around T/6. If you are concerned about long filter length, the filter can be shortened so that sigma is T/4
C is the normalising factor. This can be derived algebraically but in practice I always do this numerically after calculating the filter weights. For unity gain when smoothing periodic signals, I will set C = 1 / sum(G'').
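A sketch of building and applying such a filter, assuming tau = 0, an illustrative length T = 31 and sigma = T/6. The normalisation used here is an assumption of this sketch and differs from the 1/sum rule: the weights are shifted to sum to zero and scaled so that the filter returns exactly 2 for the input t^2, which makes it exact on polynomials up to cubic. The cubic test curve is made up so that the inflection point is known to be at x = 100.

```matlab
T = 31;                               % odd filter length (illustrative)
sigma = T / 6;
t = (-(T-1)/2 : (T-1)/2)';            % integer taps, tau = 0

% Unnormalised second derivative of a Gaussian.
g2 = (t.^2 / sigma^4 - 1 / sigma^2) .* exp(-t.^2 / (2 * sigma^2));

% Assumed normalisation: zero response to constants, response 2 to t^2.
g2 = g2 - mean(g2);
C = 2 / sum(g2 .* t.^2);
w = C * g2;

% Made-up test curve with a known inflection point at x = 100.
x = (1:200)';
y = (x - 100).^3 / 1e4;

d2 = conv(y, w, 'same');              % filtered second derivative
half = (T - 1) / 2;
valid = (half + 1):(numel(y) - half); % drop samples affected by the edges

% Inflection point: first sign change of the second derivative.
s = sign(d2(valid));
k = find(s(1:end-1) .* s(2:end) <= 0, 1);
x_inflect = x(valid(k));
disp(x_inflect)
```

On noisy data the zero crossings of d2 play the same role, with sigma controlling how much noise is suppressed before the crossing is located.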
In terms of your comment on the equivalence of smoothing first and taking a derivative later, I would say it is more involved than that. Which derivative operator would you use in the second step? A simple central difference would not yield the same results.
You can get an equivalent (but approximate) response to a second derivative of a Gaussian by filtering the data with two Gaussians of different scales and then taking the point-wise differences between the two resulting vectors. See Difference of Gaussians for that approach.
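A sketch of the difference-of-Gaussians route, assuming two nearby scales sigma1 = 4 and sigma2 = 5 (illustrative values). The scaling 2/(sigma2^2 - sigma1^2) comes from the fact that the derivative of a Gaussian with respect to sigma^2 is half its second derivative, so it only holds approximately and only when the two scales are close. The quadratic test curve is made up; its true second derivative is 1 everywhere.

```matlab
sigma1 = 4;                           % smaller scale (illustrative)
sigma2 = 5;                           % larger scale (illustrative)
half = ceil(5 * sigma2);              % kernel half-width, about 5 sigma
t = (-half:half)';

% Two unit-gain Gaussian smoothing kernels.
g1 = exp(-t.^2 / (2 * sigma1^2));  g1 = g1 / sum(g1);
g2 = exp(-t.^2 / (2 * sigma2^2));  g2 = g2 / sum(g2);

% Made-up test curve whose second derivative is 1 everywhere.
x = (1:300)';
y = 0.5 * (x - 150).^2;

% Point-wise difference of the two smoothed curves.
dog = conv(y, g2, 'same') - conv(y, g1, 'same');

% Rescale so the result approximates the second derivative.
d2_approx = 2 * dog / (sigma2^2 - sigma1^2);

valid = (half + 1):(numel(y) - half); % drop samples affected by the edges
disp(mean(d2_approx(valid)))
```

Away from the edges, d2_approx should sit close to 1 for this curve.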

scaling when sampling from multivariate gaussian

I have a data matrix A (with dependencies between columns) of which I estimate the covariance matrix S. I now want to use this covariance matrix to simulate a new matrix A_sim. Since I assume that the underlying data generator of A was gaussian, I can simply sample from a gaussian specified by S. I do that in matlab as follows:
A_sim = randn(size(A))*chol(S);
However, the values in A_sim are way larger than in A. If I scale down S by a factor of 100, A_sim looks much better. I am now looking for a principled way to determine this scaling factor. Can anyone give advice or suggest literature that might be helpful?
Matlab has the function mvnrnd, which generates multivariate normal random vectors for you.
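A sketch of how that could look, with a made-up data matrix A standing in for the real one. It assumes S is the sample covariance from cov(A) and that the column means of A are added back. Whether either point explains the scaling problem in the question is not something the question confirms, but both are worth checking (an S computed as A'*A without dividing by the number of rows, for example, would inflate the samples). mvnrnd needs the Statistics Toolbox, so a Cholesky-based fallback is included.

```matlab
rng(0);                               % fixed seed so the run is repeatable

% Hypothetical stand-in for the real data matrix A (rows = observations).
M = [2 0.5 0; 0 1 0.3; 0 0 0.5];      % made-up mixing matrix
A = bsxfun(@plus, randn(500, 3) * M, [10 -5 3]);

mu = mean(A, 1);                      % column means
S = cov(A);                           % sample covariance (normalised by n-1)

n = size(A, 1);
if exist('mvnrnd', 'file')
    A_sim = mvnrnd(mu, S, n);         % Statistics Toolbox
else
    % Toolbox-free equivalent: chol(S) is upper triangular with R'*R = S.
    A_sim = bsxfun(@plus, randn(n, size(A, 2)) * chol(S), mu);
end

disp(cov(A_sim))                      % should resemble S
```

If A_sim generated this way matches A in scale, no extra scaling factor is needed.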