Hi, I am running a linear regression of y on x. Both variables are measured in their original units, and I have applied Min-Max scaling to both, so y and x each lie between 0 and 1. I am getting a significant coefficient of 0.5. The question is: what is the appropriate way to interpret 0.5?
1) A 0.5 unit increase in y for a 0.1 unit increase in x?
OR
2) Since both y and x are between 0 and 1, can we interpret it in percentage terms, i.e., a 0.5% increase in y for a 1% increase in x?
Thanks for your comments and feedback.
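For what it's worth, the algebra of Min-Max scaling pins down the relationship between the scaled and raw slopes. Here is a minimal sketch with hypothetical data (all variable names are made up for illustration):
x = 10 + 3*randn(100,1);                 % hypothetical predictor in original units
y = 2*x + randn(100,1);                  % hypothetical response in original units
xs = (x - min(x)) / (max(x) - min(x));   % Min-Max scaled x, in [0,1]
ys = (y - min(y)) / (max(y) - min(y));   % Min-Max scaled y, in [0,1]
b_raw    = polyfit(x,  y,  1);           % slope in original units (first element)
b_scaled = polyfit(xs, ys, 1);           % slope in scaled units
% The scaled slope equals the raw slope times (max(x)-min(x))/(max(y)-min(y)).
% So a scaled coefficient of 0.5 says: moving x across its full observed
% range moves y by half of y's observed range; a 0.1 increase in scaled x
% goes with a 0.05 (not 0.5) increase in scaled y.
check = b_raw(1) * (max(x)-min(x)) / (max(y)-min(y));   % matches b_scaled(1)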
So I understand that scipy.stats.pearsonr() produces a p-value and correlation coefficient for the given observations. By default it performs a two-tailed test, but when I change this to one-tailed I get a wildly different result, especially when testing the less-than-0 tail.
Input:
scipy.stats.pearsonr(x, y)
scipy.stats.pearsonr(x, y, alternative='less')
scipy.stats.pearsonr(x, y, alternative='greater')
Output:
PearsonRResult(statistic=-0.199716724929402, pvalue=1.8904897940408397e-05)
PearsonRResult(statistic=-0.199716724929402, pvalue=0.9999905475510298)
PearsonRResult(statistic=-0.199716724929402, pvalue=9.452448970204198e-06)
So the Pearson correlation coefficient between the arrays is negative and significantly different from 0 in a two-tailed test, but then not significantly negative, yet significantly positive?
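As a point of reference (my addition, not from the original post): for a test statistic whose null distribution is symmetric about 0, as the Pearson r null is, the three p-values are tied together by
p_two_sided = 2 * min(p_less, p_greater)
p_less + p_greater = 1 (for a continuous statistic)
The printed outputs satisfy both identities (2 * 9.4524e-06 = 1.8905e-05, and the two one-sided p-values sum to 1), so they are at least internally consistent; the puzzle is only which tail goes with the sign of the statistic.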
I am trying to make a small angle approximation in MATLAB's symbolic toolbox. This is being used for the equations of motion in a spacecraft control simulation (and yes, I need to linearize, I can't leave them in their more exact form). For those unfamiliar, the small quantity approximation does a few main things that I need. For small quantities delta and gamma,
delta times gamma is approximately 0
delta^2 is approximately 0 (same with higher powers)
sin(delta) is approximately delta
cos(delta) is approximately 1
I have tried using MATLAB's taylor function (link here), but it doesn't seem to be doing what I want except in a very specific scenario (which I am sure is coincidental anyway). A test case is presented below:
syms psiX psiY psiZ rGMag mu Ixx Iyy Izz
QLB = [1,psiZ,-psiY;-psiZ,1,psiX;psiY,-psiX,1]; %linearized version of the rotation matrix from the L frame to the B frame
rG_LVLH = [0;0;rGMag]; %rG vector expressed in the L frame (its magnitude is rGMag)
rG = QLB*rG_LVLH
G = 3*mu/rGMag^5 .* [rG(2)*rG(3)*(Izz-Iyy);rG(1)*rG(3)*(Ixx-Izz);rG(1)*rG(2)*(Iyy-Ixx)]; %gravity-gradient torque
The desired output of the above should have the G vector with a 0 in the third component and symbolic variables left in the other two. This particular example doesn't include a trigonometric example, but I can provide if necessary. Thanks.
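One hedged possibility (my sketch, continuing from the snippet above, not something the asker confirmed): a first-order multivariate Taylor expansion about zero implements exactly the four rules listed, since it keeps terms linear in the small angles and drops products and higher powers.
% Continuing from the snippet above: linearize each component of G in the
% small angles psiX, psiY, psiZ about zero. 'Order',2 truncates before the
% second-order terms, so psiX*psiY, psiX^2, etc. vanish and the third
% component of G collapses to 0 as desired.
smallAngles = [psiX, psiY, psiZ];
G_lin = G;
for k = 1:numel(G)
    G_lin(k) = taylor(G(k), smallAngles, 'Order', 2);
end
The same call also covers the trigonometric rules: taylor(sin(psiX), smallAngles, 'Order', 2) returns psiX, and taylor(cos(psiX), smallAngles, 'Order', 2) returns 1.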
I'm building on my previous question because there is a further issue.
In Matlab I have fitted a normal distribution to my data vector: PD = fitdist(data,'normal'). Now I have a new data point coming in (e.g. x = 0.5) and I would like to calculate its probability.
Using cdf(PD,x) will not work because it gives the probability that the point is smaller than or equal to x (not exactly x). Using pdf(PD,x) gives just the density, not the probability, and so it can be greater than one.
How can I calculate the probability?
If the distribution is continuous then the probability of any point x is 0, almost by definition of a continuous distribution. If the distribution is discrete and, furthermore, the support of the distribution is a subset of the set of integers, then for any integer x its probability is
cdf(PD,x) - cdf(PD,x-1)
More generally, for any random variable X which takes on integer values, the probability mass function f(x) and the cumulative distribution F(x) are related by
f(x) = F(x) - F(x-1)
The right hand side can be interpreted as a discrete derivative, so this is a direct analog of the fact that in the continuous case the pdf is the derivative of the cdf.
I'm not sure if matlab has a more direct way to get at the probability mass function in your situation than going through the cdf like that.
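As a sketch of the cdf-difference route in MATLAB (hypothetical data, with a Poisson fit standing in for whatever integer-valued distribution applies):
data = poissrnd(4, 100, 1);            % hypothetical integer-valued sample
PD = fitdist(data, 'Poisson');         % fit a discrete distribution to it
x = 3;
p_exact = cdf(PD, x) - cdf(PD, x-1);   % P(X == x) via the discrete derivative
% For built-in discrete distributions this agrees with pdf(PD, x), which
% returns the probability mass function in the discrete case.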
In the continuous case, your question doesn't make a lot of sense since, as I said above, the probability is 0. Non-zero probability in this case is something that attaches to intervals rather than individual points. You still might want to ask for the probability of getting a value near x -- but then you have to decide on what you mean by "near". For example, if x is an integer then you might want to know the probability of getting a value that rounds to x. That would be:
cdf(PD, x + 0.5) - cdf(PD, x - 0.5)
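In code, that might look like this (a sketch; PD and x are as in the question, and the half-width is a choice you have to make):
h = 0.5;                                   % half-width of "near x"; 0.5 means "rounds to x"
p_near = cdf(PD, x + h) - cdf(PD, x - h);  % P(x - h < X <= x + h)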
Let's say you have a random variable X that follows the normal distribution with mean mu and standard deviation s.
Let F be the cumulative distribution function for the normal distribution with mean mu and standard deviation s. The probability that the random variable X falls between a and b is P(a < X <= b) = F(b) - F(a).
In Matlab code:
P_a_b = normcdf(b, mu, s) - normcdf(a, mu, s);
Note: observe that the probability that X is exactly equal to 0.5 (or any specific value) is zero! A range of outcomes can have positive probability, but any individual outcome, and indeed any countable set of individual outcomes, has probability zero.
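For a concrete (hypothetical) instance, the standard normal puts about 68% of its mass within one standard deviation of the mean:
mu = 0; s = 1;                                   % standard normal, for illustration
a = -1; b = 1;                                   % one standard deviation either side
P_a_b = normcdf(b, mu, s) - normcdf(a, mu, s);   % approximately 0.6827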
I am working on image classification. I am using a quantity called the prior probability (from Bayes' rule). Its values lie in [0,1], and I need to take its logarithm. However, as you know, the logarithm of zero is -Inf.
For example, take a pixel x in an image I (size 3 by 3) with a cost function such as
Cost(x) = 30 + log(prior(x))
where prior is a 3-by-3 matrix
prior=[ 0 0 0.5;
1 1 0.2;
0.4 0 0]
I =[ 1 2 3;
4 5 6;
7 8 9]
I want to compute the cost at x = 1, which gives
cost(x=1) = 30 + log(0)
Now, log(0) is -Inf, so cost(x=1) is also -Inf. My assumption is that prior = 0 means the given pixel belongs to the background, and prior = 1 means it belongs to the foreground.
My question is: how can I compute log(prior) in a way that respects this assumption?
I am using Matlab. My idea is to replace log(0) with a very large negative value, and I just set it to -9 in my code:
%% Handle log(0): mark the zero entries first
prior(prior==0.0) = NaN;
%% Compute the log
log_prior = log(prior);
%% Treat e^-9 as close enough to 0
log_prior(isnan(log_prior)) = -9;
UPDATE: To clarify what I am doing, consider Bayes' rule. My task is to decide whether a given pixel x belongs to the background (BG) or the foreground (FG). This depends on the probability
P(x∈BG|x)=P(x|x∈BG)P(x∈BG)/P(x)
Here P(x|x∈BG) is the likelihood function, assumed to be well approximated by a Gaussian distribution, P(x∈BG) is the prior term, and P(x) can be ignored because it is constant.
Using Maximum-a-Posteriori (MAP) estimation, we can map the above equation into log space (to eliminate the exponential in the Gaussian):
Cost(x)=log(P(x∈BG|x))=log(P(x|x∈BG))+log(P(x∈BG))
For simplicity, assume log(P(x|x∈BG)) = 30 and write log(P(x∈BG)) as log(prior); my cost function can then be rewritten as
Cost(x)=30+log(prior(x))
The problem is that prior lies in [0,1] and can be exactly 0, in which case its logarithm is -Inf. As chepner said, we could add an eps value:
log(prior+eps)
However, log(eps) is a very large negative number, and it distorts my cost function (which also becomes a very large negative number); the first term (30) then no longer matters. Under my assumption, prior(x) = 0 should mean the pixel x is BG and prior(x) = 1 should mean it is FG. How should I handle log(prior) when I compute my cost function?
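One common workaround (my sketch, not from the original post) is to clamp the prior to a chosen floor rather than eps, so the penalty for a zero prior is finite and on the same scale as the rest of the cost:
p_min = 1e-3;                        % hypothetical floor; tune it to the cost scale
prior_clamped = max(prior, p_min);   % zero-prior pixels now contribute log(1e-3), about -6.9
log_prior = log(prior_clamped);
Cost = 30 + log_prior;               % the likelihood term (30) stays meaningful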
The correct thing to do, before fiddling with Matlab, is to try to understand your problem. Ask yourself: "what does it mean for the prior probability to vanish?" The answer is given by Bayes' theorem, one form of which is:
posterior = likelihood * prior / normalization
So places where the prior is nil are, by definition, places where you are certain that your events (the things whose probabilities you are computing) cannot happen, regardless of their apparent likelihood (i.e. "cost"). So they are not interesting for you. You just recognize that and skip them.
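In MATLAB terms, that might look like the following sketch (reusing the prior matrix from the question):
mask = prior > 0;                     % pixels where the event is possible at all
Cost = -Inf(size(prior));             % zero-prior pixels keep cost = -Inf and are never selected
Cost(mask) = 30 + log(prior(mask));   % evaluate the cost only where it is meaningful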
I have a doubt about ridge regression in Matlab. The documentation at http://www.mathworks.com/help/stats/ridge.html says that ridge regression mean-centers the predictors and scales them to standard deviation 1. However, I can see that it doesn't. For example:
Let my x be
1 1 2
1 3 5
1 9 12
1 12 50
Let my y be
1
2
3
4
It doesn't seem to normalize the xs to zero mean and unit variance. Any clarification on what's going on? I mean, ridge should normalize the data (i.e., x to zero mean and unit variance) and then calculate the coefficients. I was expecting ridge(y,x,0,0) to give me the result of R = inv(x'*x)*x'*y where R is computed on the normalized x and y.
The output must be the same; ridge regression only makes the calculation more stable numerically (less sensitive to multicollinearity).
== UPDATE ==
Now I understand better what you ask :) The documentation says:
b = ridge(y,X,k,scaled) uses the {0,1}-valued flag scaled to determine
if the coefficient estimates in b are restored to the scale of the
original data. ridge(y,X,k,0) performs this additional transformation.
You've set both the third and the fourth parameters to 0, which means that the ridge parameter is zero and the coefficients are restored to the scale of the original data, so the result should be the same as what you get with inv(x'*x)*x'*y (this is what the ridge regression formula becomes when the ridge parameter k is set to 0).
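A quick way to check this (my sketch; note that I drop the constant column from x, since ridge(...,0) supplies its own intercept term as the first coefficient):
X = [1 2; 3 5; 9 12; 12 50];         % the question's predictors without the ones column
y = [1; 2; 3; 4];
b_ridge = ridge(y, X, 0, 0);         % k = 0, scaled = 0: [intercept; raw-scale slopes]
Xd = [ones(size(X,1),1), X];         % explicit design matrix with an intercept
b_ols = Xd \ y;                      % plain least squares; should match b_ridge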