i have (256*1) vectors of feature come from (16*16) of gray images. number of vectors is 550
when i compute Sample covariance of this vectors and compute covariance matrix determinant
answer is inf
it is possible determinant of finite matrix with finite range (0:255) value be infinite or i mistake some where?
in fact i want classification with bayesian estimation , my distribution is gaussian and when
i compute determinant be inf and ultimate Answer(likelihood) is zero .
some part of my code:
Mean = mean(dataSet,2);
MeanMatrix = Mean*ones(1,NoC);
Xc = double(dataSet)-MeanMatrix; % transform data to the origine
Sigma = (1/NoC) *Xc*Xc'; % calculate sample covariance matrix
Parameters(i).M = Mean';
Parameters(i).C = Sigma;
likelihoods(i) = (1/(2*pi*sqrt(det(params(i).C)))) * (exp(-0.5 * (double(X)-params(i).M)' * inv(params(i).C) * (double(X)-params(i).M)));
variable i show my classes;
variable X show my feature vector;
Can the determinant of such matrix be infinite? No it cannot.
Can it evaluate as infinite? Yes definitely.
Here is an example of a matrix with a finite amount of elements, that are not too big, yet the determinant will rarely evaluate as a finite number:
det(rand(255)*255)
In your case, probably what is happening is that you have too few datapoints to produce a full-rank covariance matrix.
For instance, if you have N examples, each with dimension d, and N<d, then your d x d covariance matrix will not be full rank and will have a determinant of zero.
In this case, a matrix inverse (precision matrix) does not exist. However, attempting to compute the determinant of the inverse (by taking 1/|X'*X|=1/0 -> \infty) will produce an infinite value.
One way to get around this problem is to set the covariance to X'*X+eps*eye(d), where eps is a small value. This technique corresponds to placing a weak prior distribution on elements of X.
no it is not possible. it may be singular but taking elements a large value has will have a determinant value.
Related
If X is a multivariate t random variable with mean=[1,2,3,4,5] and a covariance matrix C, how to simulate points in matlab? I try mvtrnd in matlab, but clearly the sample mean does not give mean close to [1,2,3,4,5]. Also, when I test three simple examples, say X1 with mean 0 and C1=[1,0.3;0.3,1], X2 with mean 0 and C2=[0.5,0.15;0.15,0.5] and X3 with mean 0 and C3=[0.4,0.12;0.12,0.4] and use mvtrnd(C1,3,1000000), mvtrnd(C2,3,1000000) amd mvtrnd(C2,3,1000000) respectively, I find the sample points in each case give nearly the correlation matrix [1,0.3;0.3,1] but the sample covariance computed all give near [3,1;1,3]. Why and how to fix it?
The Mean
The t distribution has a zero mean unless you shift it. In the documentation for mvtrnd:
the distribution of t is that of a vector having a multivariate normal
distribution with mean 0, variance 1, and covariance matrix C, divided
by an independent chi-square random value having df degrees of
freedom.
Indeed, mean(X) will approach [0 0] for X = mvtrnd(C,df,n); as n gets larger.
The Correlation
Matching the correlation is straightforward as it addresses a part of the relationship between the two dimensions of X.
% MATLAB 2018b
df = 5; % degrees of freedom
C = [0.44 0.25; 0.25 0.44]; % covariance matrix
numSamples = 1000;
R = corrcov(C); % Convert covariance to correlation matrix
X = mvtrnd(R,df,numSamples); % X ~ multivariate t distribution
You can compare how well you matched the correlation matrix R using corrcoef or corr().
corrcoef(X) % Alternatively, use corr(X)
The Covariance
Matching the covariance is another matter. Admittedly, calling cov(X) will reveal that this is lacking. Recall that the diagonal of the covariance is the variance for the two components of X. My intuition is that we fixed the degrees of freedom df, so there is no way to match the desired variance (& covariance).
A useful function is corrcov which converts a covariance matrix into a correlation matrix.
Notice that this is unnecessary as the documentation for mvtrnd indicates
C must be a square, symmetric and positive definite matrix. If its
diagonal elements are not all 1 (that is, if C is a covariance matrix
rather than a correlation matrix), mvtrnd rescales C to transform it
to a correlation matrix before generating the random numbers.
I have a certain problem while implementing multivariate Gaussian distribution for anomaly detection.
I have referred the formula from Andrew Ng notes
http://www.holehouse.org/mlclass/15_Anomaly_Detection.html
below is the problem I face
Suppose I have a data set with 2 features and m number of training set i.e n=2 and wants to determine my multivariate Gaussian probability p(x;mu;sigma) which should be a [m*1] matrix because it produces estimated Gaussian value by feature correlation.
The problem I face is I am unable to use the formula to produce the matrix [m*1].
I am using Octave as IDE to develop the algorithm.
Below is a snapshot showcasing my problem
Considering the multiplication of the Red boundary equation because the LHS of the red boundary is just a real number
PLEASE HELP ME UNDERSTAND WHERE AM I GOING WRONG
Thanks
I think you got the dimensions wrong.
Let's assume you have a 2-dimensional (n=2) data of m instances. We can store this data as a n-by-m matrix in MATLAB (columns are data instances, rows represent features/dimensions). In this case we have:
X the data matrix of size nxm, each instance x = X(:,i) is a vector of size nx1 (column vector in our convention).
mu is the mean vector (mu = mean(X,2)). This is also a column vector of same size as an instance nx1.
sigma is the covariance matrix (sigma = cov(X.')). It has size nxn (it describes how each dimensions co-vary with each other dimension).
So the part that you highlighted in red involves expressions of the following sizes:
= ([nx1] - [nx1])' * [nxn] * ([nx1] - [nx1])
= [1xn] * [nxn] * [nx1]
= 1x1
I am trying to understand principal component analysis in Matlab,
There seems to be at least 3 different functions that do it.
I have some questions re the code below:
Am I creating approximate x values using only one eigenvector (the one corresponding to the largest eigenvalue) correctly? I think so??
Why are PC and V which are both meant to be the loadings for (x'x) presented differently? The column order is reversed because eig does not order the eigenvalues with the largest value first but why are they the negative of each other?
Why are the eig values not in ordered with the eigenvector corresponding to the largest eigenvalue in the first column?
Using the code below I get back to the input matrix x when using svd and eig, but the results from princomp seem to be totally different? What so I have to do to make princomp match the other two functions?
Code:
x=[1 2;3 4;5 6;7 8 ]
econFlag=0;
[U,sigma,V] = svd(x,econFlag);%[U,sigma,coeff] = svd(z,econFlag);
U1=U(:,1);
V1=V(:,1);
sigma_partial=sigma(1,1);
score1=U*sigma;
test1=score1*V';
score_partial=U1*sigma_partial;
test1_partial=score_partial*V1';
[PC, D] = eig(x'*x)
score2=x*PC;
test2=score2*PC';
PC1=PC(:,2);
score2_partial=x*PC1;
test2_partial=score2_partial*PC1';
[o1 o2 o3]=princomp(x);
Yes. According to the documentation of svd, diagonal elements of the output S are in decreasing order. There is no such guarantee for the the output D of eig though.
Eigenvectors and singular vectors have no defined sign. If a is an eigenvector, so is -a.
I've often wondered the same. Laziness on the part of TMW? Optimization, because sorting would be an additional step and not everybody needs 'em sorted?
princomp centers the input data before computing the principal components. This makes sense as normally the PCA is computed with respect to the covariance matrix, and the eigenvectors of x' * x are only identical to those of the covariance matrix if x is mean-free.
I would compute the PCA by transforming to the basis of the eigenvectors of the covariance matrix (centered data), but apply this transform to the original (uncentered) data. This allows to capture a maximum of variance with as few principal components as possible, but still to recover the orginal data from all of them:
[V, D] = eig(cov(x));
score = x * V;
test = score * V';
test is identical to x, up to numerical error.
In order to easily pick the components with the most variance, let's fix that lack of sorting ourselves:
[V, D] = eig(cov(x));
[D, ind] = sort(diag(D), 'descend');
V = V(:, ind);
score = x * V;
test = score * V';
Reconstruct the signal using the strongest principal component only:
test_partial = score(:, 1) * V(:, 1)';
In response to Amro's comments: It is of course also possible to first remove the means from the input data, and transform these "centered" data. In that case, for perfect reconstruction of the original data it would be necessary to add the means again. The way to compute the PCA given above is the one described by Neil H. Timm, Applied Multivariate Analysis, Springer 2002, page 446:
Given an observation vector Y with mean mu and covariance matrix Sigma of full rank p, the goal of PCA is to create a new set of variables called principal components (PCs) or principal variates. The principal components are linear combinations of the variables of the vector Y that are uncorrelated such that the variance of the jth component is maximal.
Timm later defines "standardized components" as those which have been computed from centered data and are then divided by the square root of the eigenvalues (i.e. variances), i.e. "standardized principal components" have mean 0 and variance 1.
I am wondering how to draw samples in matlab, where I have precision matrix and mean as the input argument.
I know mvnrnd is a typical way to do so, but it requires the covariance matrix (i.e inverse of precision)) as the argument.
I only have precision matrix, and due to the computational issue, I can't invert my precision matrix, since it will take too long (my dimension is about 2000*2000)
Good question. Note that you can generate samples from a multivariant normal distribution using samples from the standard normal distribution by way of the procedure described in the relevant Wikipedia article.
Basically, this boils down to evaluating A*z + mu where z is a vector of independent random variables sampled from the standard normal distribution, mu is a vector of means, and A*A' = Sigma is the covariance matrix. Since you have the inverse of the latter quantity, i.e. inv(Sigma), you can probably do a Cholesky decomposition (see chol) to determine the inverse of A. You then need to evaluate A * z. If you only know inv(A) this can still be done without performing a matrix inverse by instead solving a linear system (e.g. via the backslash operator).
The Cholesky decomposition might still be problematic for you, but I hope this helps.
If you want to sample from N(μ,Q-1) and only Q is available, you can take the Cholesky factorization of Q, L, such that LLT=Q. Next take the inverse of LT, L-T, and sample Z from a standard normal distribution N(0, I).
Considering that L-T is an upper triangular dxd matrix and Z is a d-dimensional column vector,
μ + L-TZ will be distributed as N(μ, Q-1).
If you wish to avoid taking the inverse of L, you can instead solve the triangular system of equations LTv=Z by back substitution. μ+v will then be distributed as N(μ, Q-1).
Some illustrative matlab code:
% make a 2x2 covariance matrix and a mean vector
covm = [3 0.4*(sqrt(3*7)); 0.4*(sqrt(3*7)) 7];
mu = [100; 2];
% Get the precision matrix
Q = inv(covm);
%take the Cholesky decomposition of Q (chol in matlab already returns the upper triangular factor)
L = chol(Q);
%draw 2000 samples from a standard bivariate normal distribution
Z = normrnd(0,1, [2, 2000]);
%solve the system and add the mean
X = repmat(mu, 1, 2000)+L\Z;
%check the result
mean(X')
var(X')
corrcoef(X')
% compare to the sampling from the covariance matrix
Y=mvnrnd(mu,covm, 2000)';
mean(Y')
var(Y')
corrcoef(Y')
scatter(X(1,:), X(2,:),'b')
hold on
scatter(Y(1,:), Y(2,:), 'r')
For more efficiency, I guess you can search for some package that efficiently solves triangular systems.
Here is my testing function:
function diff = svdtester()
y = rand(500,20);
[U,S,V] = svd(y);
%{
y = sprand(500,20,.1);
[U,S,V] = svds(y);
%}
diff_mat = y - U*S*V';
diff = mean(abs(diff_mat(:)));
end
There are two very similar parts: one finds the SVD of a random matrix, the other finds the SVD of a random sparse matrix. Regardless of which one you choose to comment (right now the second one is commented-out), we compute the difference between the original matrix and the product of its SVD components and return that average absolute difference.
When using rand/svd, the typical return (mean error) value is around 8.8e-16, basically zero. When using sprand/svds, the typical return values is around 0.07, which is fairly terrible considering the sparse matrix is 90% 0's to start with.
Am I misunderstanding how SVD should work for sparse matrices, or is something wrong with these functions?
Yes, the behavior of svds is a little bit different from svd. According to MATLAB's documentation:
[U,S,V] = svds(A,...) returns three output arguments, and if A is m-by-n:
U is m-by-k with orthonormal columns
S is k-by-k diagonal
V is n-by-k with orthonormal columns
U*S*V' is the closest rank k approximation to A
In fact, usually k will be somethings about 6, so you will get rather "rude" approximation. To get more exact approximation specify k to be min(size(y)):
[U, S, V] = svds(y, min(size(y)))
and you will get error of the same order of magnitude as in case of svd.
P.S. Also, MATLAB's documentations says:
Note svds is best used to find a few singular values of a large, sparse matrix. To find all the singular values of such a matrix, svd(full(A)) will usually perform better than svds(A,min(size(A))).