I compute the multinomial Gaussian density for some huge number of times in a project where I update the covariance matrix by rank-1. Instead of computing the covariance from scratch, I used the cholupdate function to add a new sample to the covariance and remove a new sample to the covariance. By this way, the update is told to be in $O(n^2)$ as opposed to $O(n^3)$ Cholesky factorization of the covariance matrix.
persistent R
if (initialize) % or isempty(R)
% compute covariance V
R = chol(V);
else
R = cholupdate(R,xAdded);
detVar = prod(diag(R))^2;
Rt = R';
coeff = 1/sqrt((2*pi)^dimension*detVar);
y = Rt\x;
logp = log(coeff) - 1/2 * norm(y)^2;
Actually the code is quite complicated but I simplified it here. I wonder if there is a faster way to compute the inverse (the Rt\x part in the code) of an upper triangular matrix in MATLAB. Do you have any ideas to do it more efficiently in MATLAB.
Note that computing the determinant is also faster this way. So the new method will also not bad for the computation of the determinant.
The mldivide function is smart enough to check for triangular matrices, in which case it uses a forward/backward substitution method to efficiently solve the linear system:
AX=B <--> X=inv(A)*B <--> X=A\B
(compute x1, substitute it in second equation and compute x2, substitute in third ...)
Related
I have a question about solving linear equation Ax=b, in which x is unknown, A is square matrix NxN and non-singular matrix.
The vector x can be solved by
x=inv(A)*b
or
x=A\b
In Matlab, the ‘\’ command invokes an algorithm which depends upon the structure of the matrix A and includes checks (small overhead) on properties of A. Hence, It highly depends on A structure. However, A structure is unknown (i.e random matrix). I want to measure complexity of above equation. Hence, to fairly comparison, I need to fixed the method which I used. In this case, I choose Gaussian Elimination (GE) with complexity O(N^3) My question is how can I choose/fix the method (i.e. GE) to solve above equation?
One way would be to compute the LU factorisation (assuming A is not symmetric)
[L,U] = lu(A)
where L is a permutation of a lower-triangular matrix with unit diagonal and U is upper triangular. This is equivalent to the Gaussian elimination.
Then, when you solve Ax = b, you actually perform first Ly = b and then Ux = y.
The important thing is that solving these linear system is essentially O(n^2), while computing the factorisation is O(n^3). So if n is big, you can just measure the time taken to compute the LU factorisation.
For random matrices, you can see the complexity of the LU factorization like that
nn = round(logspace(2, 4, 20)) ;
time = zeros(size(nn)) ;
for i = 1:numel(nn)
A = rand(nn(i),nn(i)) ;
tic() ; [L,U] = lu(A) ;
time(i) = toc() ;
end
loglog(nn,time) ;
(change the "4" to something bigger or smaller, depending on you pc).
On my laptop, I have this result
where the slop of 3 (hence, the O(n^3) complexity) is fairly clear.
i have (256*1) vectors of feature come from (16*16) of gray images. number of vectors is 550
when i compute Sample covariance of this vectors and compute covariance matrix determinant
answer is inf
it is possible determinant of finite matrix with finite range (0:255) value be infinite or i mistake some where?
in fact i want classification with bayesian estimation , my distribution is gaussian and when
i compute determinant be inf and ultimate Answer(likelihood) is zero .
some part of my code:
Mean = mean(dataSet,2);
MeanMatrix = Mean*ones(1,NoC);
Xc = double(dataSet)-MeanMatrix; % transform data to the origine
Sigma = (1/NoC) *Xc*Xc'; % calculate sample covariance matrix
Parameters(i).M = Mean';
Parameters(i).C = Sigma;
likelihoods(i) = (1/(2*pi*sqrt(det(params(i).C)))) * (exp(-0.5 * (double(X)-params(i).M)' * inv(params(i).C) * (double(X)-params(i).M)));
variable i show my classes;
variable X show my feature vector;
Can the determinant of such matrix be infinite? No it cannot.
Can it evaluate as infinite? Yes definitely.
Here is an example of a matrix with a finite amount of elements, that are not too big, yet the determinant will rarely evaluate as a finite number:
det(rand(255)*255)
In your case, probably what is happening is that you have too few datapoints to produce a full-rank covariance matrix.
For instance, if you have N examples, each with dimension d, and N<d, then your d x d covariance matrix will not be full rank and will have a determinant of zero.
In this case, a matrix inverse (precision matrix) does not exist. However, attempting to compute the determinant of the inverse (by taking 1/|X'*X|=1/0 -> \infty) will produce an infinite value.
One way to get around this problem is to set the covariance to X'*X+eps*eye(d), where eps is a small value. This technique corresponds to placing a weak prior distribution on elements of X.
no it is not possible. it may be singular but taking elements a large value has will have a determinant value.
I have to solve in MATLAB a linear system of equations A*x=B where A is symmetric and its elements depend on the difference of the indices: Aij=f(i-j).
I use iterative solvers because the size of A is say 40000x40000. The iterative solvers require to determine the product A*x where x is the test solution. The evaluation of this product turns out to be a convolution and therefore can be done dy means of fast fourier transforms (cputime ~ Nlog(N) instead of N^2). I have the following questions to this problem:
is this convolution circular? Because if it is circular I think that I have to use a specific indexing for the new matrices to take the fft. Is that right?
I find difficult to program the routine for the fft because I cannot understand the indexing I should use. Is there any ready routine which I can use to evaluate by fft directly the product A*x and not the convolution? Actually, the matrix A is constructed of 3x3 blocks and is symmetric. A ready routine for the product A*x would be the best solution for me.
In case that there is no ready routine, could you give me an idea by example how I could construct this routine to evaluate a matrix-vector product by fft?
Thank you in advance,
Panos
Very good and interesting question! :)
For certain special matrix structures, the Ax = b problem can be solved very quickly.
Circulant matrices.
Matrices corresponding to cyclic convolution Ax = h*x (* - is convolution symbol) are diagonalized in
the Fourier domain, and can be solved by:
x = ifft(fft(b)./fft(h));
Triangular and banded.
Triangular matrices and diagonally-dominant banded matrices are solved
efficiently by sparse LU factorization:
[L,U] = lu(sparse(A)); x = U\(L\b);
Poisson problem.
If A is a finite difference approximation of the Laplacian, the problem is efficiently solved by multigrid methods (e.g., web search for "matlab multigrid").
Interesting question!
The convolution is not circular in your case, unless you impose additional conditions. For example, A(1,3) should equal A(2,1), etc.
You could do it with conv (retaining only the non-zero-padded part with option valid), which probably is also N*log(N). For example, let
A = [a b c d
e a b c
f e a b
g f e a];
Then A*x is the same as
conv(fliplr([g f e a b c d]),x,'valid').'
Or more generally, A*x is the same as
conv(fliplr([A(end,1:end-1) A(1,:)]),x,'valid').'
I'd like to add some comments on Pio_Koon's answer.
First of all, I wouldn't advise to follow the suggestion for triangular and banded matrices. The time taken by a call to Matlab's lu() procedure on a large sparse matrix massively overshadows any benefits gained by solving the linear system as x=U\(L\b).
Second, in the Poisson problem you end up with a circulant matrix, therefore you can solve it using the FFT as described. In this specific case, your convolution mask h is a Laplacian, i.e., h=[0 -0.25 0; -0.25 1 -0.25; 0 -0.25 0].
I am wondering how to draw samples in matlab, where I have precision matrix and mean as the input argument.
I know mvnrnd is a typical way to do so, but it requires the covariance matrix (i.e inverse of precision)) as the argument.
I only have precision matrix, and due to the computational issue, I can't invert my precision matrix, since it will take too long (my dimension is about 2000*2000)
Good question. Note that you can generate samples from a multivariant normal distribution using samples from the standard normal distribution by way of the procedure described in the relevant Wikipedia article.
Basically, this boils down to evaluating A*z + mu where z is a vector of independent random variables sampled from the standard normal distribution, mu is a vector of means, and A*A' = Sigma is the covariance matrix. Since you have the inverse of the latter quantity, i.e. inv(Sigma), you can probably do a Cholesky decomposition (see chol) to determine the inverse of A. You then need to evaluate A * z. If you only know inv(A) this can still be done without performing a matrix inverse by instead solving a linear system (e.g. via the backslash operator).
The Cholesky decomposition might still be problematic for you, but I hope this helps.
If you want to sample from N(μ,Q-1) and only Q is available, you can take the Cholesky factorization of Q, L, such that LLT=Q. Next take the inverse of LT, L-T, and sample Z from a standard normal distribution N(0, I).
Considering that L-T is an upper triangular dxd matrix and Z is a d-dimensional column vector,
μ + L-TZ will be distributed as N(μ, Q-1).
If you wish to avoid taking the inverse of L, you can instead solve the triangular system of equations LTv=Z by back substitution. μ+v will then be distributed as N(μ, Q-1).
Some illustrative matlab code:
% make a 2x2 covariance matrix and a mean vector
covm = [3 0.4*(sqrt(3*7)); 0.4*(sqrt(3*7)) 7];
mu = [100; 2];
% Get the precision matrix
Q = inv(covm);
%take the Cholesky decomposition of Q (chol in matlab already returns the upper triangular factor)
L = chol(Q);
%draw 2000 samples from a standard bivariate normal distribution
Z = normrnd(0,1, [2, 2000]);
%solve the system and add the mean
X = repmat(mu, 1, 2000)+L\Z;
%check the result
mean(X')
var(X')
corrcoef(X')
% compare to the sampling from the covariance matrix
Y=mvnrnd(mu,covm, 2000)';
mean(Y')
var(Y')
corrcoef(Y')
scatter(X(1,:), X(2,:),'b')
hold on
scatter(Y(1,:), Y(2,:), 'r')
For more efficiency, I guess you can search for some package that efficiently solves triangular systems.
I am not sure if this is a programming or statistics question, but I am %99 sure that there should be a numerical problem. So maybe a programmatic solution can be proposed.
I am using MATLAB's mvnpdf function to calculate multi-variate Gaussian PDF of some observations. Frequently I get "SIGMA must be symmetric and positive definite" errors.
However, I am obtaining the covarince matrix from the data, so the data should be legal. A code to regenerate the problem is:
err_cnt = 0;
for i = 1:1000
try
a = rand(3);
c = cov(a);
m = mean(a);
mvnpdf(a, m, c);
catch me
err_cnt = err_cnt + 1;
end
end
I get ~500-600 errors each time I run.
P.S. I do not generate random data in my case, just generated here to demonstrate.
This is a linear algebra problem rather than a programming one. Recall the formula for the PDF of a k-dimensional multivariate normal distribution:
When your matrix is not strictly positive definite (i.e., it is singular), the determinant in the denominator is zero and the inverse in the exponent is not defined, which is why you're getting the errors.
However, it is a common misconception that covariance matrices must be positive definite. This is not true — covariance matrices only need to be positive semidefinite. It is perfectly possible for your data to have a covariance matrix that is singular. Also, since what you're forming is the sample covariance matrix of your observed data, you can have singularities arising from not having sufficient observations.
This happens if the diagonal values of the covariance matrix are (very close to) zero. A simple fix is add a very small constant number to c.
err_cnt = 0;
for i = 1:1000
try
a = rand(3);
c = cov(a) + .0001 * eye(3);
m = mean(a);
mvnpdf(a, m, c);
catch me
err_cnt = err_cnt + 1;
end
end
Results in 0 errors.
When your data lives in a subspace (singular covariance matrix), the probability density is singular in the full space. Loosely speaking, this means that your density is infinite at each point which is not very useful. Therefore, if this is the case, and it is NOT numerical, then you may want to consider the probability density in the subspace for which the data spans. Here the density is well defined. Adding a diagonal value as #Junuxx results in very different values in this case.