I run
Y_testing_obtained = classify(X_testing, X_training, Y_training);
and the error I get is
Error using ==> classify at 246
The pooled covariance matrix of TRAINING must be positive definite.
X_training is 1550 x 5 matrix. Can you please tell me what this error means, i.e. why is it appearing, and how to work around it?
Thanks
Explanation: When you run the function classify without specifying the type of discriminant function (as you did), Matlab uses Linear Discriminant Analysis (LDA). Without going into too much details on LDA, the algorithms needs to calculate the covariance matrix of X_testing in order to solve an optimisation problem, and this matrix has to be positive definite (see Wikipedia: Positive-definite matrix). The underlying assumption is that your data is represented by a multivariate probability distribution, which always has a positive definite covariance matrix unless one or more variables are exact linear combinations of the others.
To solve your problem: It is possible that one of your variables is a linear combination of the others. You can try selecting a sensible subset of your variables, or perform Principal Component Analysis (PCA) on the training data and then classify using the first few principal components. Or, you could specify the type of discriminant function and choose one of the two naive Bayes classifiers, for example:
Y_testing_obtained = classify(X_testing, X_training, Y_training, 'diaglinear');
As a side note, you also need to have more observations (rows) than variables (columns), but in your case this is not the problem as you seem to have 1550 observations and 5 variables.
Finally, you can also have a look at the answers posted to a similar question on the Matlab forum.
Try regularizing the data using cvshrink function in Matlab
Related
I am trying to run a standard Kalman Filter algorithm to calculate likelihoods, but I keep getting a problema of a non positive definite variance matrix when calculating normal densities.
I've researched a little and seen that there may be in fact some numerical instabitlity; tried some numerical ways to avoid a non-positive definite matrix, using both choleski decomposition and its variant LDL' decomposition.
I am using MatLab.
Does anyone suggest anything?
Thanks.
I have encountered what might be the same problem before when I needed to run a Kalman filter for long periods but over time my covariance matrix would degenerate. It might just be a problem of losing symmetry due to numerical error. One simple way to enforce your covariance matrix (let's call it P) to remain symmetric is to do:
P = (P + P')/2 # where P' is transpose(P)
right after estimating P.
post your code.
As a rule of thumb, if the model is not accurate and the regularization (i.e. the model noise matrix Q) is not sufficiently "large" an underfitting will occur and the covariance matrix of the estimator will be ill-conditioned. Try fine tuning your Q matrix.
The Kalman Filter implemented using the Joseph Form is known to be numerically unstable, as any old timer who once worked with single precision implementation of the filter can tell. This problem was discovered zillions of years ago and prompt a lot of research in implementing the filter in a stable manner. Probably the best well-known implementation is the UD, where the Covariance matrix is factorized as UDU' and the two factors are updated and propagated using special formulas (see Thoronton and Bierman). U is an upper diagonal matrix with "1" in its diagonal, and D is a diagonal matrix.
When solving the log likelihood expression for autoregressive models, I cam across the variance covariance matrix Tau given under slide 9 Parameter estimation of time series tutorial. Now, in order to use
fminsearch
to maximize the likelihood function expression, I need to express the likelihood function where the variance covariance matrix arises. Can somebody please show with an example how I can implement (determinant of Gamma)^-1/2 ? Any other example apart from autoregressive model will also do.
How about sqrt(det(Gamma)) for the sqrt-determinant and inv(Gamma) for inverse?
But if you do not want to implement it yourself you can look at yulewalkerarestimator
UPD: For estimation of autocovariance matrix use xcov
also, this topic is a bit more explained here
I performed PCA on a 63*2308 matrix and obtained a score and a co-efficient matrix. The score matrix is 63*2308 and the co-efficient matrix is 2308*2308 in dimensions.
How do i extract the column names for the top 100 features which are most important so that i can perform regression on them?
PCA should give you both a set of eigenvectors (your co-efficient matrix) and a vector of eigenvalues (1*2308) often referred to as lambda). You might been to use a different PCA function in matlab to get them.
The eigenvalues indicate how much of your data each eigenvector explains. A simple method for selecting features would be to select the 100 features with the highest eigen values. This gives you a set of feature which explain most of the variance in the data.
If you need to justify your approach for a write up you can actually calculate the amount of variance explained per eigenvector and cut of at, for example, 95% variance explained.
Bear in mind that selecting based solely on eigenvalue, might not correspond to the set of features most important to your regression, so if you don't get the performance you expect you might want to try a different feature selection method such as recursive feature selection. I would suggest using google scholar to find a couple of papers doing something similar and see what methods they use.
A quick matlab example of taking the top 100 principle components using PCA.
[eigenvectors, projected_data, eigenvalues] = princomp(X);
[foo, feature_idx] = sort(eigenvalues, 'descend');
selected_projected_data = projected(:, feature_idx(1:100));
Have you tried with
B = sort(your_matrix,2,'descend');
C = B(:,1:100);
Be careful!
With just 63 observations and 2308 variables, your PCA result will be meaningless because the data is underspecified. You should have at least (rule of thumb) dimensions*3 observations.
With 63 observations, you can at most define a 62 dimensional hyperspace!
I'm not too familiar with MATLAB or computational mathematics so I was wondering how I might solve an equation involving the sum of squares, where each term involves two vectors- one known and one unknown. This formula is supposed to represent the error and I need to minimize the error. I think I'm supposed to use least squares but I don't know too much about it and I'm wondering what function is best for doing that and what arguments would represent my equation. My teacher also mentioned something about taking derivatives and he formed a matrix using derivatives which confused me even more- am I required to take derivatives?
The problem that you must be trying to solve is
Min u'u = min \sum_i u_i^2, u=y-Xbeta, where u is the error, y is the vector of dependent variables you are trying to explain, X is a matrix of independent variables and beta is the vector you want to estimate.
Since sum u_i^2 is diferentiable (and convex), you can evaluate the minimal of this expression calculating its derivative and making it equal to zero.
If you do that, you find that beta=inv(X'X)X'y. This maybe calculated using the matlab function regress http://www.mathworks.com/help/stats/regress.html or writing this formula in Matlab. However, you should be careful how to evaluate the inverse (X'X) see Most efficient matrix inversion in MATLAB
This question has already confused me several days. While I referred to senior students, they also cannot give a reply.
We have ten ODEs, into which each a noise term should be added. The noise is defined as follows. since I always find that I cannot upload a picture, the formula below maybe not very clear. In order to understand, you can either read my explanation or go the this address: Plos one. You could find the description of the equations directly above the Support Information in this address
The white noise term epislon_i(t) is assumed with Gaussian distribution. epislon_i(t) means that for equation i, and at t timepoint, the value of the noise.
the auto-correlation of noise are given:
(EQ.1)
where delta(t) is the Dirac delta function and the diffusion matrix D is defined by
(EQ.2)
Our problem focuses on how to explain the Dirac delta function in the diffusion matrix. Since the property of Dirac delta function is delta(0) = Inf and delta(t) = 0 if t neq 0, we don't know how to calculate the epislonif we try to sqrt of 2D(x, t)delta(t-t'). So we simply assume that delta(0) = 1 and delta(t) = 0 if t neq 0; But we don't know whether or not this is right. Could you please tell me how to use Delta function of diffusion equation in MATLAB?
This question associates with the stochastic process in MATLAB. So we review different stochastic process to inspire our ideas. In MATLAB, the Wienner process is often defined as a = sqrt(dt) * rand(1, N). N is the number of steps, dt is the length of the steps. Correspondingly, the Brownian motion can be defined as: b = cumsum(a); All of these associate with stochastic process. However, they doesn't related to the white noise process which has a constraints on the matrix of auto-correlation, noted by D.
Then we consider that, we may simply use randn(1, 10) to generate a vector representing the noise. However, since the definition of the noise must satisfy the equation (2), this cannot enable noise term in different equation have the predefined partial correlation (D_ij). Then we try to use mvnrnd to generate a multiple variable normal distribution at each time step. Unfortunately, the function mvnrnd in MATLAB return a matrix. But we need to return a vector of length 10.
We are rather confused, so could you please give me just a light? Thanks so much!
NOTE: I see two hazy questions in here: 1) how to deal with a stochastic term in a DE and 2) how to deal with a delta function in a DE. Both of these are math related questions and http://www.math.stackexchange.com will be a better place for this. If you had a question pertaining to MATLAB, I haven't been able to pin it down, and you should perhaps add code examples to better illustrate your point. That said, I'll answer the two questions briefly, just to put you on the right track.
What you have here are not ODEs, but Stochastic differential equations (SDE). I'm not sure how you're using MATLAB to work with this, but routines like ode45 or ode23 will not be of any help. For SDEs, your usual mathematical tools of separation of variables/method of characteristics etc don't work and you'll need to use Itô calculus and Itô integrals to work with them. The solutions, as you might have guessed, will be stochastic. To learn more about SDEs and working with them, you can consider Stochastic Differential Equations: An Introduction with Applications by Bernt Øksendal and for numerical solutions, Numerical Solution of Stochastic Differential Equations by Peter E. Kloeden and Eckhard Platen.
Coming to the delta function part, you can easily deal with it by taking the Fourier transform of the ODE. Recall that the Fourier transform of a delta function is 1. This greatly simplifies the DE and you can take an inverse transform in the very end to return to the original domain.