SciPy pearsonr p-value confusion when using the 'less than 0' alternative hypothesis

So I understand that scipy.stats.pearsonr() produces a p-value and correlation coefficient for the given observations. By default it performs a two-tailed test, but when I change this to one-tailed I get a wildly different result, especially when testing the 'less than 0' tail.
Input:
scipy.stats.pearsonr(x, y)
scipy.stats.pearsonr(x, y, alternative='less')
scipy.stats.pearsonr(x, y, alternative='greater')
Output:
PearsonRResult(statistic=-0.199716724929402, pvalue=1.8904897940408397e-05)
PearsonRResult(statistic=-0.199716724929402, pvalue=0.9999905475510298)
PearsonRResult(statistic=-0.199716724929402, pvalue=9.452448970204198e-06)
So the Pearson correlation coefficient between the arrays is negative and significantly different from 0 in the two-tailed test, yet it is not significantly less than 0 but is significantly greater than 0?
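For reference, the three p-values reported by pearsonr are tied together: the two one-sided p-values are complementary, and the two-sided p-value is twice the smaller of them (which matches the numbers shown above). A minimal sketch with synthetic data; the arrays below are made up, not the original ones, and accessing .pvalue assumes SciPy >= 1.9, as suggested by the PearsonRResult output:
import numpy as np
from scipy import stats

# Made-up, weakly negatively correlated data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = -0.2 * x + rng.normal(size=1000)

two = stats.pearsonr(x, y, alternative='two-sided')
less = stats.pearsonr(x, y, alternative='less')
grtr = stats.pearsonr(x, y, alternative='greater')

# The one-sided p-values sum to ~1, and the two-sided p-value is
# ~2 * min(one-sided p-values).
print(less.pvalue + grtr.pvalue)
print(two.pvalue, 2 * min(less.pvalue, grtr.pvalue))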

Related

How can I interpret correlation results?

I have some doubts about Pearson correlation in Matlab, especially with regard to the concept of the p-value.
I have 2 vectors (A and B) and I computed the Pearson correlation using the corrcoef function.
I have the following results:
Correlation
1 0.1219
0.1219 1
and the corresponding p-values
1 0.3042
0.3042 1
What can I say about these 2 vectors?
I would say that they have low correlation for sure. But what about the p-value? (it's greater than 0.05)
The p-value is telling you that the correlation between the two variables measured in vectors A and B is not significantly different from 0: it would only be declared significant at a level of 0.3042 or higher.
What this p-value means is: if the true (unknown) correlation between the variables were 0, the probability of observing a sample correlation at least this far from 0 would be 0.3042... which is usually considered a large probability.
That's why, normally, such "high" p-values suggest that the analyst should NOT reject the hypothesis being tested (in this case the hypothesis is: "the correlation between the two analyzed variables is 0").
The underlying null hypothesis here is that there is no linear relationship between A and B (i.e. the correlation between A and B is 0). A p-value of 0.3042, as you pointed out, is greater than 0.05. This means at the significance level of 0.05, we fail to reject the null hypothesis (i.e. there is no evidence to suggest that the correlation between A and B is significantly different from 0). This is expected considering that the correlation between A and B is quite low at 0.1219.
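To make the link between r, the sample size, and the p-value concrete, here is a small SciPy sketch using the standard t-test for a correlation coefficient. The sample size n below is a made-up value, since the question does not state it:
import numpy as np
from scipy import stats

r = 0.1219   # correlation from the question
n = 72       # hypothetical sample size (not given in the question)

# Under H0 (true correlation = 0), t = r*sqrt(n-2)/sqrt(1-r^2) follows a
# t-distribution with n-2 degrees of freedom.
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_two_sided)   # p is large when |r| is small and/or n is small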

Interpreting Min-Max Scaled Regression Coefficients

Hi, I am running a linear regression, i.e. a regression of y on x. Both variables are measured in their original units. I have scaled both variables to lie between 0 and 1; in other words, Min-Max scaling is applied to y and x. I am getting a significant coefficient value of 0.5. The question is, what is the appropriate way to interpret 0.5:
1) A 0.5 unit increase in y due to a 0.1 unit increase in x?
OR
2) Since both y and x are between 0 and 1, can we interpret it in percentage terms, i.e. a 0.5% increase in y due to a 1% increase in x?
Thanks for your comments and feedback.
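One way to ground the interpretation is to check how min-max scaling rescales the slope: if both y and x are mapped to [0, 1], the new slope equals the raw slope times range(x)/range(y), so the scaled coefficient says how much of y's range you move per full range of x. A small numpy sketch with made-up data (all numbers below are illustrative assumptions):
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 50, size=200)              # made-up raw regressor
y = 3.0 * x + rng.normal(0, 5, size=200)       # made-up raw response

def minmax(v):
    return (v - v.min()) / (v.max() - v.min())

def slope(xv, yv):
    return np.polyfit(xv, yv, 1)[0]            # OLS slope

b_raw = slope(x, y)
b_scaled = slope(minmax(x), minmax(y))

# The scaled slope equals the raw slope rescaled by the ratio of ranges.
print(b_scaled, b_raw * (x.max() - x.min()) / (y.max() - y.min()))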

Determinant is showing infinity instead of zero! Why?

This is the MATLAB code I wrote for a homework problem. After multiplying A by its transpose, the resulting square matrix should have determinant zero; all my classmates' (different) codes gave them zero. Why is my code not giving zero for the determinants of c and d, and returning infinity instead?
A = rand(500,1500);
b = rand(500,1);
c = (A.')*A;
detc = det(c);
cinv = inv((A.')*A);
d = A*(A.');
detd = det(d);
dinv = inv(A*(A.'));
x1 = (inv((A.')*A))*((A.')*b);
x2 = A.'*((inv(A*(A.')))*b);
This behavior is explained in the Limitations section of det's documentation and exemplified in the Find Determinant of Singular Matrix subsection, where it is stated:
The determinant of A is quite large despite the fact that A is singular. In fact, the determinant of A should be exactly zero! The inaccuracy of d is due to an aggregation of round-off errors in the MATLAB® implementation of the LU decomposition, which det uses to calculate the determinant.
That said, in this instance, you can produce your desired result by using the m-code implementation given on that same page, but sorting the diagonal elements of U in an ascending manner. Consider the sample script:
clc();
clear();
A = rand(500,1500);
b = rand(500,1);
c = (A.')*A;
[L,U] = lu(c);
% Since det(L) is always (+/-)1, it doesn't impact anything
diagU = diag(U);
detU1 = prod(diagU);
detU2 = prod(sort(diagU,'descend'));
detU3 = prod(sort(diagU,'ascend'));
fprintf('Minimum: %+9.5e\n',min(abs(diagU)));
fprintf('Maximum: %+9.5e\n',max(abs(diagU)));
fprintf('Determinant:\n');
fprintf('\tNo Sort: %g\n' ,detU1);
fprintf('\tDescending Sort: %g\n' ,detU2);
fprintf('\tAscending Sort: %g\n\n',detU3);
This produces the output:
Minimum: +1.53111e-13
Maximum: +1.72592e+02
Determinant:
No Sort: Inf
Descending Sort: Inf
Ascending Sort: 0
Notice that the direction of the sort matters, and that no-sorting gives Inf since a true 0 doesn't exist on the diagonal. The descending sort sees the largest values multiplied first, and apparently, they exceed realmax and are never multiplied by a true 0, which would generate a NaN. The ascending sort clumps together all of the near-zero diagonal values with very few large negative values (in truth, a more robust method would sort based on magnitude, but that was not done here), and their multiplication generates a true 0 (meaning that the value falls below the smallest denormalized number available in IEEE-754 arithmetic) that produces the "correct" result.
All that written, and as others have implied, I'll quote the original MATLAB developer and MathWorks co-founder Cleve Moler:
[The determinant] is useful in theoretical considerations and hand calculations, but does not provide a sound basis for robust numerical software.
Ok. So the fact that det(A'*A) is not zero is not a good indication of the (non-)singularity of A'*A.
The determinant depends on the scaling, and a matrix that is clearly non-singular can have a very small determinant. For instance, the matrix
1/2 * I_n
where I_n is the n-by-n identity, has determinant (1/2)^n, which converges (quickly) to 0 as n goes to infinity. But 1/2 * I_n is not, at all, singular.
For this reason, a better way to check the singularity of a matrix is the condition number.
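A quick numpy illustration of both points (the sizes are arbitrary): the determinant of 1/2 * I_n underflows to 0 as n grows, while the condition number stays at 1.
import numpy as np

for n in (50, 500, 1100):
    M = 0.5 * np.eye(n)
    # det = (1/2)^n shrinks toward (and eventually underflows to) 0,
    # while the matrix stays perfectly conditioned.
    print(n, np.linalg.det(M), np.linalg.cond(M))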
In your case, after doing some tests
>> A = rand(500, 1500) ;
>> det(A'*A)
ans =
Inf
You can see that the (computed) determinant is clearly non-zero. But this is actually not surprising, and it should not really bother you: the determinant is hard to compute accurately at this scale, so yes, it is just rounding error. If you want a better approximation, you can do the following
>> s = eig(A'*A) ;
>> prod(s)
ans =
0
There, you see it is closer to zero.
The condition number, on the other hand, is a much better estimator of the (non-)singularity of a matrix. Here, it is
>> cond(A'*A)
ans =
1.4853e+20
And, since it is much larger than 1e+16, the matrix is numerically singular. The reason for the 1e+16 threshold is a bit tedious to explain, but it is mostly due to the precision of double-precision floating-point arithmetic.
I think this is pretty much just a rounding problem: the Inf does not mean you are getting infinity as an answer, it's just that your determinant is really big and exceeded realmax. As Adiel said, A*A.' generates a symmetric matrix and should have a finite numerical value for its determinant. For example, set:
A=rand(5,15)
and you should find that the det of A*A.' is just a numerical value.
So how did your friends get a zero? Well, it's easy to get 0 or Inf for the determinant of large matrices (why you are doing this in the first place, I have no clue), so I think they are just hitting the same or a similar rounding issue.

Merging multiple probability arrays in a Cartesian type of way in Matlab

I have a couple of vectors where each entry denotes a probability value. For example, consider the following two vectors:
a=[0.7 0.3]
b=[0.1 0.9]
Consider a and b as vectors of probabilities. For example, a denotes that a random variable will be 0 with probability 0.7, and it will be 1 with probability 0.3. Similarly, b represents another random variable that will be 0 with probability 0.1, and it will be 1 with probability 0.9.
I want to compute a vector c that captures the probability mass function of the sum of these two random variables, assuming they are independent. In this example, c should be
c=[0.07 0.66 0.27]
In other words, the sum is 0 when both variables are 0, which happens with probability 0.7*0.1=0.07. The sum is 1 when either the first variable is 0 and the second is 1, or the first is 1 and the second is 0; the first case occurs with probability 0.7*0.9=0.63 and the second with probability 0.3*0.1=0.03, so the total is 0.63+0.03=0.66. Finally, the third entry of c corresponds to the case in which both variables equal 1, which happens with probability 0.3*0.9=0.27.
I want to write a code to compute c. In my application, there will be 30 of these a vectors with a length of 100 each. So, scalability definitely matters.
Many thanks!
For your simple example, you can use conv
c=conv(a,b)
however, for your actual case it will be more complicated. You could repeatedly conv the vectors like so (if your 30 pmf vectors are the rows of a matrix a):
A=a(1,:);
for i=2:30
A=conv(A,a(i,:));
end
(Note: This code works but I am not sure whether it will give you the correct results---this is not a topic I know much about, so be careful!)
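As a sanity check on the conv approach, here is a small numpy sketch of the same repeated-convolution idea; it reproduces the c = [0.07 0.66 0.27] example from the question (the 30-by-100 array at the end is made-up data, just to show the shape and total mass of the result):
import numpy as np
from functools import reduce

a = np.array([0.7, 0.3])
b = np.array([0.1, 0.9])
print(np.convolve(a, b))                   # [0.07 0.66 0.27]

# For 30 pmfs of length 100 each (rows of a matrix), fold convolve over the rows.
pmfs = np.random.default_rng(0).dirichlet(np.ones(100), size=30)   # each row sums to 1
c = reduce(np.convolve, pmfs)
print(c.shape, c.sum())                    # (2971,), ~1.0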

Issues with calculating the determinant of a matrix

I am trying to calculate the determinant of the inverse of a matrix. The inverse of the matrix exists. However, when I try to calculate the determinant of the inverse, it gives me an Inf value in MATLAB. What is the reason behind this?
Short answer: given A = inv(B), then det(A)==Inf may have two explanations:
an overflow during the numerical computation of the determinant,
one or more infinite elements in A.
In the first case your matrix is badly scaled, so that det(B) may underflow and det(A) overflow. Remember that det(a*B) == a^N * det(B), where a is a scalar and B is an N-by-N matrix.
In the second case (i.e. nnz(A==inf)>0) matrix B may be "singular to working precision".
PS:
A matrix is nearly singular if it has a large condition number. (A small determinant has nothing to do with singularity, since the magnitude of the determinant is affected by scaling.)
A matrix is singular to working precision if it has a zero pivot in the Gaussian elimination: when computing the inverse, MATLAB has to calculate 1/0, which returns Inf.
In fact, in MATLAB overflow and division-by-zero exceptions are not trapped, so, in accordance with IEEE 754, an Inf value is produced and propagated.
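A small numpy sketch of the first failure mode described above (the matrix here is an illustrative, deliberately badly scaled example): det(B) underflows toward 0 while det(inv(B)) overflows to Inf, even though B is perfectly conditioned.
import numpy as np

n = 400
B = 1e-3 * np.eye(n)             # well conditioned, but badly scaled
A = np.linalg.inv(B)             # exists and equals 1e3 * I

print(np.linalg.det(B))          # (1e-3)**400 underflows to 0.0
print(np.linalg.det(A))          # (1e3)**400 overflows to inf
print(np.linalg.cond(B))         # 1.0 -- nowhere near singular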