Could you please help explain why? I tested a pair of variables for covariance and got a positive result, but when I put them together into a linear regression, the p-value is not significant.
Thank you very much
I am trying to run a linear regression in Excel and the output summary is not significant, but when I pair the data, the covariance is positive.
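A positive covariance only gives the direction of the association; its size depends on the units of the data and says nothing about significance. The regression p-value depends on the t-statistic of the slope, which for a single predictor equals r * sqrt((n - 2) / (1 - r^2)), where r is the correlation. Here is a minimal sketch in plain Python with made-up numbers (the data and the function name are illustrative assumptions, not the asker's Excel sheet):

```python
import math

def covariance_and_slope_t(x, y):
    """Return (sample covariance, correlation r, t-statistic of the slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    cov = sxy / (n - 1)                      # sign tells direction only
    r = sxy / math.sqrt(sxx * syy)           # unit-free strength of association
    t = r * math.sqrt((n - 2) / (1 - r * r)) # what the regression p-value is based on
    return cov, r, t

# Made-up data: a weak upward trend buried in a lot of scatter.
x = [1, 2, 3, 4, 5, 6]
y = [50, 10, 60, 20, 70, 30]

cov, r, t = covariance_and_slope_t(x, y)
print("covariance:", cov)
print("correlation:", r)
print("t-statistic:", t)
```

With 6 points there are 4 degrees of freedom, and the two-sided 5% critical value of t is about 2.776. The covariance here is positive, but the t-statistic is far below that threshold, so the slope is not significant.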
I have a problem with classification (LDA classifier).
I have 80 samples of training data (80x100) and 15 samples of testing data (15x100). The classify function returns: The covariance matrix of each group in TRAINING must be positive definite.
Without knowing what your data looks like, all I can do is suggest a few things that may solve your problem. A non-positive-definite covariance matrix can be caused by several different factors:
linear dependence between two or more columns (remove columns until the remaining ones are linearly independent)
non-stationary data (in this case, you can use differences instead of levels, because differencing usually makes the series stationary)
columns with highly mismatched magnitudes, for example a column with very big values and another one with very small values (rescale your columns so that all of them have approximately the same magnitude).
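The first cause in the list above can be checked directly. The following is a minimal sketch in plain Python (the helper names and the toy data are assumptions for illustration): it builds a sample covariance matrix and attempts a Cholesky factorization, which succeeds only for a positive definite matrix. It also shows a related case: with no more rows than columns (as with 80 samples and 100 variables), the sample covariance matrix is always singular.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (one row each)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def is_positive_definite(S, tol=1e-10):
    """Try a Cholesky factorization; a pivot <= tol means S is not (numerically) PD.

    tol is an absolute threshold chosen for this small example."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for j in range(p):
        pivot = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            return False
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, p):
            L[i][j] = (S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True

independent = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
# Third column is the sum of the first two: an exact linear dependence.
dependent = [[a, b, a + b] for a, b in independent]
# Three observations of three variables: rank is at most n - 1 = 2.
too_few_rows = [[1, 2, 0], [2, 1, 5], [3, 5, 1]]

print(is_positive_definite(covariance_matrix(independent)))
print(is_positive_definite(covariance_matrix(dependent)))
print(is_positive_definite(covariance_matrix(too_few_rows)))
```

Only the first matrix passes the check. Dropping the dependent column, or reducing the number of variables below the number of observations per group, restores positive definiteness in these toy cases.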
I have been looking into Spark's documentation but still couldn't find how to get the covariance matrix after doing a linear regression.
Given input training data, I did a very simple linear regression similar to this:
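import org.apache.spark.ml.regression.LinearRegression
val lr = new LinearRegression()
// training is a DataFrame with "features" and "label" columns
val fit = lr.fit(training)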
Getting the regression parameters is as easy as fit.coefficients, but there seems to be no information on how to get the covariance matrix.
And just to clarify, I am looking for a function similar to vcov in R. With this, I should be able to do something like vcov(fit) to get the covariance matrix. Any other methods that can help to achieve this are okay too.
EDIT
How to get the covariance matrix from a linear regression is discussed in detail here. The error variance is easy to get, as it can be estimated from fit.summary.meanSquaredError. However, the matrix (X'X)^-1 is hard to get. It would be interesting to see if this can be used to somehow calculate the covariance matrix.
Although the whole covariance matrix is collected on the driver, it is not possible to obtain it without writing your own solver. You can do that by copying WLS and adding extra "getters".
The closest you can get without digging into the code is lrModel.summary.coefficientStandardErrors, which is computed from the diagonal of the inverse of (A^T * W * A), a matrix that is itself stored in upper triangular form (covariance).
I don't think that is enough so sorry about that.
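For reference, the quantity the question asks for is sigma^2 * (X'X)^-1, and the standard errors mentioned above are the square roots of its diagonal. The following is a minimal sketch in plain Python for one predictor plus an intercept. It does not use Spark's API, and the function name and toy data are assumptions for illustration. Note that the unbiased variance estimate divides the residual sum of squares by n - p, so check how any mean squared error you plug in is normalized.

```python
def ols_vcov(x, y):
    """Simple linear regression y = b0 + b1 * x.

    Returns (beta, vcov) where vcov = sigma^2 * (X'X)^-1,
    the same quantity R reports with vcov(lm(y ~ x))."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))

    # X'X = [[n, sx], [sx, sxx]]; invert the 2x2 matrix in closed form.
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det],
               [-sx / det, n / det]]

    # beta = (X'X)^-1 X'y
    beta = [xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy,
            xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy]

    # Unbiased error variance: residual sum of squares over (n - 2).
    rss = sum((b - (beta[0] + beta[1] * a)) ** 2 for a, b in zip(x, y))
    sigma2 = rss / (n - 2)

    vcov = [[sigma2 * v for v in row] for row in xtx_inv]
    return beta, vcov

beta, vcov = ols_vcov([0, 1, 2, 3], [1, 3, 2, 5])
std_errors = [vcov[0][0] ** 0.5, vcov[1][1] ** 0.5]
print(beta)
print(vcov)
print(std_errors)
```

For a distributed data set the same idea applies as long as the number of features is small: X'X is only p x p, so it can be accumulated across the cluster and inverted locally.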
In Matlab, while trying to do PCA, is there a difference when using the princomp command for positive vs. negative data values? Is it a non-negative definite command? Thank you!
The condition of non-negative definiteness in PCA does not refer to the data (that wouldn't make sense) but to the covariance matrix estimated from the data. princomp estimates the covariance matrix internally from your data, and such an estimate is always guaranteed to be non-negative definite, no matter what your data look like.
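This can be illustrated with a small sketch in plain Python (not MATLAB's princomp; the helper names and data are assumptions for illustration). The variances of the principal components are the eigenvalues of the covariance matrix, and for two variables they have a closed form. Negating or shifting the data leaves the covariance matrix, and hence the eigenvalues, unchanged.

```python
import math

def covariance_2d(points):
    """Return (var_x, cov_xy, var_y) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return var_x, cov_xy, var_y

def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    center = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return center + radius, center - radius

positive = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
negative = [(-x, -y) for x, y in positive]         # all values negative
shifted = [(x - 10, y - 10) for x, y in positive]  # same shape, moved

eig_pos = eigenvalues_sym_2x2(*covariance_2d(positive))
eig_neg = eigenvalues_sym_2x2(*covariance_2d(negative))
eig_shift = eigenvalues_sym_2x2(*covariance_2d(shifted))
print(eig_pos, eig_neg, eig_shift)
```

All three data sets give the same non-negative eigenvalues; only the sign of the component scores can differ.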
Can someone help me out with this problem? I have been trying to figure this out for a long time.
I have a training_Set: <1530*270400 double>
and Test_Set: <4794*270400 double>
I am using Linear discriminant analysis method
class = classify(Test_Set,Training_Set,train_label,'linear')
Error using classify (line 228)
The pooled covariance matrix of TRAINING must be positive definite.
In order for the covariance matrix of TRAINING to be positive definite, you must at the very least have more observations than variables in Training_Set. In your case, you have many more variables (270400) than observations (1530). You can try dimension reduction before classifying.
I have answered a very similar question here: Matlab bug with linear discriminant analysis
I have a data matrix A (with dependencies between columns) from which I estimate the covariance matrix S. I now want to use this covariance matrix to simulate a new matrix A_sim. Since I assume that the underlying data generator of A was Gaussian, I can simply sample from a Gaussian specified by S. I do that in Matlab as follows:
A_sim = randn(size(A))*chol(S);
However, the values in A_sim are way larger than in A. If I scale down S by a factor of 100, A_sim looks much better. I am now looking for a way to determine this scaling factor in a principled way. Can anyone give advice or suggest literature that might be helpful?
Matlab has the function mvnrnd which generates multivariate random variables for you.
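No scaling factor should be needed. In Matlab, chol(S) returns an upper triangular R with R'*R = S, so randn(size(A))*chol(S) already produces samples whose covariance is S. If the simulated values are far too large, one thing to check (an assumption, since the question does not show how S was computed) is whether S is a properly normalized covariance, for example cov(A), rather than an unnormalized product such as A'*A. The column means of A also need to be added back to the samples. The sketch below, in plain Python with illustrative names and numbers, draws samples through a Cholesky factor and confirms that their covariance matches S without rescaling.

```python
import math
import random

def cholesky_lower(S):
    """Lower triangular L with L * L' = S (S must be positive definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for j in range(p):
        pivot = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(pivot)  # fails if S is not positive definite
        for i in range(j + 1, p):
            L[i][j] = (S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def sample_gaussian(mean, S, n, rng):
    """Draw n samples x = mean + L z, with z standard normal, so cov(x) = S."""
    L = cholesky_lower(S)
    p = len(mean)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        samples.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                        for i in range(p)])
    return samples

def sample_covariance(rows):
    """Sample covariance matrix, normalized by n - 1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

S = [[4.0, 2.0], [2.0, 3.0]]  # illustrative target covariance
samples = sample_gaussian([0.0, 0.0], S, 20000, random.Random(42))
estimated = sample_covariance(samples)
print(estimated)
```

The estimated covariance is close to S up to sampling noise. The mvnrnd function mentioned above takes the mean and covariance directly, which avoids having to handle the factorization and the mean by hand.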