I am trying to investigate the statistical variance of the eigenvalues of sample covariance matrices using Matlab. To clarify, each sample covariance matrix is constructed from a finite number of vector snapshots (afflicted with random white Gaussian noise). Then, over a large number of trials, a large number of such matrices are generated and eigendecomposed in order to estimate the theoretical statistics of the eigenvalues.
According to several sources (see, for example, [1, Eq.3] and [2, Eq.11]), the variance of each sample eigenvalue should be equal to that theoretical eigenvalue squared, divided by the number of vector snapshots used for each covariance matrix. However, the results I get from Matlab aren't even close.
Is this an issue with my code? With Matlab? (I've never had such trouble working on similar problems).
Here's a very simple example:
% Data vector length
Lvec = 5;
% Number of snapshots per sample covariance matrix
N = 200;
% Number of simulation trials
Ntrials = 10000;
% Noise variance
sigma2 = 10;
% Theoretical covariance matrix
Rnn_th = sigma2*eye(Lvec);
% Theoretical eigenvalues (should all be sigma2)
lambda_th = sort(eig(Rnn_th),'descend');
lambda = zeros(Lvec,Ntrials);
for trial = 1:Ntrials
% Generate new (complex) white Gaussian noise data
n = sqrt(sigma2/2)*(randn(Lvec,N) + 1j*randn(Lvec,N));
% Sample covariance matrix
Rnn = n*n'/N;
% Save sample eigenvalues
lambda(:,trial) = sort(eig(Rnn),'descend');
end
% Estimated eigenvalue covariance matrix
b = lambda - lambda_th(:,ones(1,Ntrials));
Rbb = b*b'/Ntrials
% Predicted (approximate) theoretical result
Rbb_th_approx = diag(lambda_th.^2/N)
References:
[1] Friedlander, B.; Weiss, A.J.; , "On the second-order statistics of the eigenvectors of sample covariance matrices," Signal Processing, IEEE Transactions on , vol.46, no.11, pp.3136-3139, Nov 1998
[2] Kaveh, M.; Barabell, A.; , "The statistical performance of the MUSIC and the minimum-norm algorithms in resolving plane waves in noise," Acoustics, Speech and Signal Processing, IEEE Transactions on , vol.34, no.2, pp. 331- 341, Apr 1986
According to the abstract of from your first reference:
"Formulas for the second-order statistics of the eigenvectors have been derived in the statistical literature and are widely used. We point out a discrepancy between the statistics observed in numerical simulations and the theoretical formulas, due to the nonuniqueness of the definition of eigenvectors. We present two ways to resolve this discrepancy. The first involves modifying the theoretical formulas to match the computational results. The second involved a simple modification of the computations to make them match existing formulas."
Sounds like there is a discrepancy, and it also sounds like the two 'solutions' are hacks, but without access to the actual paper, it's kind of hard to help.
Related
Consider the following draws for a 2x1 vector in Matlab with a probability distribution that is a mixture of two Gaussian components.
P=10^3; %number draws
v=1;
%First component
mu_a = [0,0.5];
sigma_a = [v,0;0,v];
%Second component
mu_b = [0,8.2];
sigma_b = [v,0;0,v];
%Combine
MU = [mu_a;mu_b];
SIGMA = cat(3,sigma_a,sigma_b);
w = ones(1,2)/2; %equal weight 0.5
obj = gmdistribution(MU,SIGMA,w);
%Draws
RV_temp = random(obj,P);%Px2
% Transform each component of RV_temp into a uniform in [0,1] by estimating the cdf.
RV1=ksdensity(RV_temp(:,1), RV_temp(:,1),'function', 'cdf');
RV2=ksdensity(RV_temp(:,2), RV_temp(:,2),'function', 'cdf');
Now, if we check whether RV1 and RV2 are uniformly distributed on [0,1] by doing
ecdf(RV1)
ecdf(RV2)
we can see that RV1 is uniformly distributed on [0,1] (the empirical cdf is close to the 45 degree line) while RV2 is not.
I don't understand why. It seems that the more distant are mu_a(2)and mu_b(2), the worse the job done by ksdensity with a reasonable number of draws. Why?
When you have a mixture of N(0.5,v) and N(8.2,v) then the range of the generated data is larger than if you had expectation which were closer, like N(0,v) and N(0,v), as you have in the other dimension. Then you ask ksdensity to approximate a function using P points inside this range.
Like in standard linear interpolation, the denser the points the better approximation of the function (inside the range), this is the same case here. Thus in the N(0.5,v) and N(8.2,v) where the points are "sparse" (or sparser, is that a word?) the approximation is worse than in the N(0,v) and N(0,v) where the points are denser.
As a small side note, are there any reason that you do not apply ksdensity directly on the bivariate data? Also I cannot reproduce your comment where you say that 5e2points are also good. Final comment, 1e3 is typically prefered over 10^3.
I think this is simply about the number of samples you're using. For the first example, the means of the two Gaussians are relatively close, hence a thousand samples are enough to obtain a cdf really close the the U[0,1] cdf. On the second vector though, you have a higher difference, and need more samples. With 100000 samples, I obtained the following result:
With 1000 I obtained this:
Which is clearly farther from the Uniform cdf function. Try to increase the number of samples to a million and check if the result is again getting closer.
I am currently fiddling with multivariate kernel density estimations for estimating the probability density functions (PDF) of hydrological data sets using Matlab. I am most familiar with kernel density estimation using Gaussian kernels as outlined in Sharma (2000 and 2014) (where the kernel bandwidths are set using the Gaussian Reference Rule (GRR)). The GRR is written as follows (Sharma, 2000):
where lambda_ref = GRR bandwidth of kernel, n is the sample size, and d is the dimension of the data set we are using for density estimation. To estimate the multivariate density of our data set X we use the following formula (Sharma, 2000):
where lamda is the same as lamda_ref above, S is the sample covariance of X and det() stands for determinant.
My question is: I understand that there are many "fast" methods for calculating the Gaussian kernel function represented by the term exp() such as the method proposed here (using Matlab): http://mrmartin.net/?p=218. Since I will be working with data sets that are quite large in sample size (1000-10,000) I am looking for a fast code. Is anyone aware how I can write a fast code for the second equation that takes into account the inverse of the sample covariance matrix (S^-1)?
I greatly appreciate any help that can be provided on this issue. Thank you!
Note(s):
I understand that there is a Matlab code for calculating the second equation, found as a sub-function in: http://www.mathworks.com/matlabcentral/fileexchange/29039-mutual-information-2-variablle/content/MutualInfo.m. However this code has a bottleneck in how it calculates the kernel matrix.
References:
1 A. Sharma, Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3 — A nonparametric probabilistic forecast model, Journal of Hydrology, Volume 239, Issues 1–4, 20 December 2000, Pages 249-258, ISSN 0022-1694, http://dx.doi.org/10.1016/S0022-1694(00)00348-6.
2 Sharma, A., and R. Mehrotra (2014), An information theoretic alternative to model a natural system using observational information alone, Water Resour. Res., 50, 650–660, doi:10.1002/2013WR013845.
I have found a code that I am able to modify for my purposes. The original code is listed at the following link: http://www.kernel-methods.net/matlab/kernels/rbf.m.
Code
function K = rbf(coord,sig)
%function K = rbf(coord,sig)
%
% Computes an rbf kernel matrix from the input coordinates
%
%INPUTS
% coord = a matrix containing all samples as rows
% sig = sigma, the kernel width; squared distances are divided by
% squared sig in the exponent
%
%OUTPUTS
% K = the rbf kernel matrix ( = exp(-1/(2*sigma^2)*(coord*coord')^2) )
%
%
% For more info, see www.kernel-methods.net
%
%Author: Tijl De Bie, february 2003. Adapted: october 2004 (for speedup).
n=size(coord,1);
K=coord*coord'/sig^2;
d=diag(K);
K=K-ones(n,1)*d'/2;
K=K-d*ones(1,n)/2;
K=exp(K);
Modified Code incorporating sample covariance scaling:
xcov = cov(x.'); % sample covariance of the data
invxc = pinv(xcov); % inversion of data sample covariance
coord = x.';
sig = sigma; % kernel bandwidth
n = size(coord,1);
K = coord*invxc*coord'/sig^2;
d = diag(K);
K = K-ones(n,1)*d'/2;
K = K-d*ones(1,n)/2;
K = exp(K); % kernel matrix
I hope this helps someone else looking into the same problem.
I would like to perform conditional simulations for Gaussian process (GP) models in Matlab. I have found a tutorial by Martin Kolář (http://mrmartin.net/?p=223).
sigma_f = 1.1251; %parameter of the squared exponential kernel
l = 0.90441; %parameter of the squared exponential kernel
kernel_function = #(x,x2) sigma_f^2*exp((x-x2)^2/(-2*l^2));
%This is one of many popular kernel functions, the squared exponential
%kernel. It favors smooth functions. (Here, it is defined here as an anonymous
%function handle)
% we can also define an error function, which models the observation noise
sigma_n = 0.1; %known noise on observed data
error_function = #(x,x2) sigma_n^2*(x==x2);
%this is just iid gaussian noise with mean 0 and variance sigma_n^2s
%kernel functions can be added together. Here, we add the error kernel to
%the squared exponential kernel)
k = #(x,x2) kernel_function(x,x2)+error_function(x,x2);
X_o = [-1.5 -1 -0.75 -0.4 -0.3 0]';
Y_o = [-1.6 -1.3 -0.5 0 0.3 0.6]';
prediction_x=-2:0.01:1;
K = zeros(length(X_o));
for i=1:length(X_o)
for j=1:length(X_o)
K(i,j)=k(X_o(i),X_o(j));
end
end
%% Demo #5.2 Sample from the Gaussian Process posterior
clearvars -except k prediction_x K X_o Y_o
%We can also sample from this posterior, the same way as we sampled before:
K_ss=zeros(length(prediction_x),length(prediction_x));
for i=1:length(prediction_x)
for j=i:length(prediction_x)%We only calculate the top half of the matrix. This an unnecessary speedup trick
K_ss(i,j)=k(prediction_x(i),prediction_x(j));
end
end
K_ss=K_ss+triu(K_ss,1)'; % We can use the upper half of the matrix and copy it to the
K_s=zeros(length(prediction_x),length(X_o));
for i=1:length(prediction_x)
for j=1:length(X_o)
K_s(i,j)=k(prediction_x(i),X_o(j));
end
end
[V,D]=eig(K_ss-K_s/K*K_s');
A=real(V*(D.^(1/2)));
for i=1:7
standard_random_vector = randn(length(A),1);
gaussian_process_sample(:,i) = A * standard_random_vector+K_s/K*Y_o;
end
hold on
plot(prediction_x,real(gaussian_process_sample))
set(plot(X_o,Y_o,'r.'),'MarkerSize',20)
The tutorial generates the conditional simulations using a direct simulation method based on covariance matrix decomposition. It is my understanding that there are several methods of generating conditional simulations that may be better when the number of simulation points is large such as conditioning by Kriging using a local neighborhood. I have found information regarding several methods in J.-P. Chilès and P. Delfiner, “Chapter 7 - Conditional Simulations,” in Geostatistics: Modeling Spatial Uncertainty, Second Edition, John Wiley & Sons, Inc., 2012, pp. 478–628.
Is there an existing Matlab toolbox that can be used for conditional simulations? I am aware of DACE, GPML, and mGstat (http://mgstat.sourceforge.net/). I believe only mGstat offers the capability to perform conditional simulations. However, mGstat also seems to be limited to only 3D models and I am interested in higher dimensional models.
Can anybody offer any advice on getting started performing conditional simulations with an existing toolbox such as GPML?
===================================================================
EDIT
I have found a few more Matlab toolboxes: STK, ScalaGauss, ooDACE
It appears STK is capable of conditional simulations using covariance matrix decomposition. However, is limited to a moderate number (maybe a few thousand?) of simulation points due to the Cholesky factorization.
I used the STK toolbox and I recommend it for others:
http://kriging.sourceforge.net/htmldoc/
I found that if you need conditional simulations at a large number of points then you might consider generating a conditional simulation at the points in a large design of experiment (DoE) and then simply relying on the mean prediction conditional on that DoE.
I need to create a diagonal matrix containing the Fourier coefficients of the Gaussian wavelet function, but I'm unsure of what to do.
Currently I'm using this function to generate the Haar Wavelet matrix
http://www.mathworks.co.uk/matlabcentral/fileexchange/33625-haar-wavelet-transformation-matrix-implementation/content/ConstructHaarWaveletTransformationMatrix.m
and taking the rows at dyadic scales (2,4,8,16) as the transform:
M= 256
H = ConstructHaarWaveletTransformationMatrix(M);
fi = conj(dftmtx(M))/M;
H = fi*H;
H = H(4,:);
H = diag(H);
etc
How do I repeat this for Gaussian wavelets? Is there a built in Matlab function which will do this for me?
For reference I'm implementing the algorithm in section 4 of this paper:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04218361
I maybe would not being answering the question, but i will try to help you advance.
As far as i know, the Matlab Wavelet Toolbox only deal with wavelet operations and coefficients, increase or decrease resolution levels, and similar operations, but do not exposes the internal matrices serving to doing the transformations from signals and coefficients.
Hence i fear the answer to this question is no. Some time ago, i did this for some of the Hart Class wavelet, and i actually build the matrix from the scratch, and then i compared the coefficients obtained with the Built-in Matlab Wavelet Toolbox, hence ensuring your matrices are good enough for your algorithm. In my case, recursive parameter estimation for time varying models.
For the function ConstructHaarWaveletTransformationMatrix it is really simple to create the matrix, because the Hart Class could be really simple expressed as Kronecker products.
The Gaussian Wavelet case as i fear should be done from the scratch too...
THe steps i suggest would be;
Although MATLAB dont include explicitely the matrices, you can use the Matlab built-in functions to recover the Gaussian Wavelets, and thus compose the matrix for your algorithm.
Build every column of the matrix with every Gaussian Wavelet, for every resolution levels you are requiring (the dyadic scales). Use the Matlab Wavelets toolbox for recover the shapes.
After this, compare the coefficients obtained by you, with the coefficients of the toolbox. This way you will correct the order of the Matrix row.
Numerically, being fj the signal projection over Vj (the PHI signals space, scaling functions) at resolution level j, and gj the signal projection over Wj (the PSI signals space, mother functions) at resolution level j, we can write:
f=fj0+sum_{j0}^{j1-1}{gj}
Hence, both fj0 and gj will induce two matrices, lets call them PHIj and PSIj matrices:
f=PHIj0*cj0+sum_{j0}^{j1-1}{PSIj*dj}
The PHIj columns contain the scaled and shifted scaling wavelet signal (one, for j0 only) for the approximation projection (the Vj0 space), and the PSIj columns contain the scaled and shifted mother wavelet signals (several, from j0 to j1-1) for the detail projection (onto the Wj0 to Wj1-1 spaces).
Hence, the Matrix you need is:
PHI=[PHIj0 PSIj0... PSIj1]
Thus you can express you original signal as:
f=PHI*C
where C is a vector of approximation and detail coefficients, for the levels:
C=[cj0' dj0'...dj1']'
The first part, for addressing the PHI build can be achieved by writing:
function PHI=MakePhi(l,str,Jmin,Jmax)
% [PHI]=MakePhi(l,str,Jmin,Jmax)
%
% Build full PHI Wavelet Matrix for obtaining wavelet coefficients
% (extract)
%FILTER
[LO_R,HI_R] = wfilters(str,'r');
lf=length(LO_R);
%PHI BUILD
PHI=[];
laux=l([end-Jmax end-Jmax:end]);
PHI=[PHI MakeWMatrix('a',str,laux)];
for j=Jmax:-1:Jmin
laux=l([end-j end-j:end]);
PHI=[PHI MakeWMatrix('d',str,laux)];
end
the wfilters is a MATLAB built in function, giving the required signal for the approximation and or detail wavelet signals.
The MakeWMatrix function is:
function M=MakeWMatrix(typestr,str,laux)
% M=MakeWMatrix(typestr,str,laux)
%
% Build Wavelet Matrix for obtaining wavelet coefficients
% for a single level vector.
% (extract)
[LO_R,HI_R] = wfilters(str,'r');
if typestr=='a'
F_R=LO_R';
else
F_R=HI_R';
end
la=length(laux);
lin=laux(2); lout=laux(3);
M=MakeCMatrix(F_R,lin,lout);
for i=3:la-1
lin=laux(i); lout=laux(i+1);
Mi=MakeCMatrix(LO_R',lin,lout);
M=Mi*M;
end
and finally the MakeCMatrix is:
function [M]=MakeCMatrix(F_R,lin,lout)
% Convolucion Matrix
% (extract)
lf=length(F_R);
M=[];
for i=1:lin
M(:,i)=[zeros(2*(i-1),1) ;F_R ;zeros(2*(lin-i),1)];
end
M=[zeros(1,lin); M ;zeros(1,lin)];
[ltot,lin]=size(M);
lmin=floor((ltot-lout)/2)+1;
lmax=floor((ltot-lout)/2)+lout;
M=M(lmin:lmax,:);
This last matrix should include some interpolation routine for having better general results in each case.
I expect this solve part of your problem.....
Hyp
I'm probably being a little dense but I'm not very mathsy and can't seem to understand the covariance element of creating multivariate data.
I'm after two columns of random data (representing two correlated variables).
I think I am right in needing to use the mvnrnd function and I understand that 'mu' must be a column of my mean vectors. As I need 4 distinct classes within my data these are going to be (1, 1) (-1 1) (1 -1) and (-1 -1). I assume I will have to do the function 4x with a different column of mean vectors each time and then combine them to get my full data set.
I don't understand what I should put for SIGMA - Matlab help tells me that it must be 'a d-by-d symmetric positive semi-definite matrix, or a d-by-d-by-n array' i.e. a covariance matrix. I don't understand how I create a covariance matrix for numbers that I am yet to generate.
Any advice would be greatly appreciated!
Assuming that I understood your case properly, I would go this way:
data = [normrnd(0,1,5000,1),normrnd(0,1,5000,1)]; %% your starting data series
MU = mean(data,1);
SIGMA = cov(data);
Now, it should be possible to feed mvnrnd with MU and SIGMA:
r = mvnrnd(MU,SIGMA,5000);
plot(r(:,1),r(:,2),'+') %% in case you wanna plot the results
I hope this helps.
I think your aim is to generate the simulated multivariate gaussian distributed data. For example, I use
k = 6; % feature dimension
mu = rand(1,k);
sigma = 10*eye(k,k);
unit matrix by 10 times is a symmetric positive semi-definite matrix. And the gaussian distribution will be more round than other type of sigma.
then you can use it as the above example of mvnrnd function and see the plot.