I have some data points with errors in both the x and y coordinates. I therefore want to use Python's ODR tool (scipy.odr) to compute the best-fit slope and the error on this slope. I tried it on my actual data but did not get good results, so I first tried ODR with a simple example, as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.odr import Model, RealData, ODR

# linear model: B[0] is the slope, B[1] the intercept
def linear_func(B, x):
    return B[0]*x + B[1]

x_data = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.array([0.0, 1.0, 2.0, 3.0])
x_err = np.array([1.0, 1.0, 1.0, 1.0])
y_err = np.array([5.0, 5.0, 5.0, 5.0])

linear = Model(linear_func)
data = RealData(x_data, y_data, sx=x_err, sy=y_err)
odr = ODR(data, linear, beta0=[1.0, 0.0])
out = odr.run()
out.pprint()
The pprint() line gives:
Beta: [ 1. 0.]
Beta Std Error: [ 0. 0.]
Beta Covariance: [[ 5.20000039 -7.80000026]
[ -7.80000026 18.1999991 ]]
Residual Variance: 0.0
Inverse Condition #: 0.0315397386692
Reason(s) for Halting:
Sum of squares convergence
The resulting Beta values are 1.0 and 0.0, as I would expect. But why are the standard errors (Beta Std Error) both zero when my errors on the data points are quite large? Can anyone offer some insight?
I see no discrepancy here. Your example model fits your data perfectly, so the weights you pass in do not matter. Moreover, your initial guess beta0=[1.0, 0.0] is already the optimal parameter vector, so the ODR machinery cannot find an iterative improvement and quits after zero iterations. The associated errors are zero because the sum of squares at B=[1, 0] is exactly zero, which makes this solution infinitely better than any other possible solution.
To see what actually happens inside ODR.run(), add odr.set_iprint(init=2, iter=2, final=2) before you run the regression. In particular, the following output confirms that ODR reaches the stopping condition immediately:
--- STOPPING CONDITIONS:
INFO = 1 ==> SUM OF SQUARES CONVERGENCE.
NITER = 0 (NUMBER OF ITERATIONS)
Note that the errors will not be zero, and NITER will be greater than zero, if either your x_data is unequal to y_data or beta0 does not match the optimal solution. In that case, the errors returned by ODR will be nonzero, though still very small.
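For instance (a perturbed variant of the question's example, not from the original post), breaking the perfect fit slightly produces nonzero standard errors:

import numpy as np
from scipy.odr import Model, RealData, ODR

def linear_func(B, x):
    return B[0]*x + B[1]

x_data = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.array([0.0, 1.1, 1.9, 3.0])  # points no longer exactly on a line
x_err = np.ones(4)
y_err = 5.0*np.ones(4)

data = RealData(x_data, y_data, sx=x_err, sy=y_err)
odr = ODR(data, Model(linear_func), beta0=[1.0, 0.0])
out = odr.run()
print(out.beta)     # still close to [1, 0]
print(out.sd_beta)  # nonzero now that the fit is not perfect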
I'd like to build a GP with marginalized hyperparameters.
I have seen from this notebook that this is possible with the HMC sampler provided in GPflow.
However, when I tried to run the following code as a first step of this (NOTE: this is on gpflow 0.5, an older version), some of the returned samples are negative, even though the lengthscale and variance need to be positive (negative values would be meaningless).
import numpy as np
from matplotlib import pyplot as plt
import gpflow
from gpflow import hmc
X = np.linspace(-3, 3, 20)
Y = np.random.exponential(np.sin(X) ** 2)
Y = (Y - np.mean(Y)) / np.std(Y)
k = gpflow.kernels.Matern32(1, lengthscales=.2, ARD=False)
m = gpflow.gpr.GPR(X[:, None], Y[:, None], k)
m.kern.lengthscales.prior = gpflow.priors.Gamma(1., 1.)
m.kern.variance.prior = gpflow.priors.Gamma(1., 1.)
# don't want the likelihood variance to be a hyperparameter now, so fix it
m.likelihood.variance = 1e-6
m.likelihood.variance.fixed = True
m.optimize(maxiter=1000)
samples = m.sample(500)
print(samples)
Output:
[[-0.43764571 -0.22753325]
[-0.50418501 -0.11070128]
[-0.5932655 0.00821438]
[-0.70217714 0.05077999]
[-0.77745654 0.09362291]
[-0.79404456 0.13649446]
[-0.83989415 0.27118385]
[-0.90355789 0.29589641]
...
I don't know HMC sampling in much detail, but I would expect the sampled posterior hyperparameters to be positive. I've checked the code and it seems to be related to the Log1pe transform, though I failed to figure it out myself.
Any hint on this?
It would be helpful if you specified which GPflow version you are using - especially given that from the output you posted it looks like you are running a really old version (pre-1.0), and this is something that has actually been improved since.
What is happening here (in old GPflow) is that the sample() method returns a single S x P array, where S is the number of samples and P is the number of free parameters [e.g. for an M x M matrix parameter with a lower-triangular transform (such as the Cholesky factor of the covariance of the approximate posterior, q_sqrt), only M*(M+1)/2 parameters are actually stored and optimised!]. These are the values in the unconstrained space, i.e. they can take any value whatsoever. Transforms (see the gpflow.transforms module) provide the mapping between this unconstrained value (between plus and minus infinity) and the constrained value (e.g. gpflow.transforms.positive for lengthscales and variances).
In old GPflow, the model provides a get_samples_df() method that takes the S x P array returned by sample() and returns a pandas DataFrame with columns for all the trainable parameters, which is what you want. Or, ideally, you would just use a recent version of GPflow, in which the HMC sampler directly returns the DataFrame!
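For intuition, here is a minimal sketch (my addition, independent of GPflow) of what a positive transform such as Log1pe does: it is essentially the softplus function, ignoring the small lower-bound offset GPflow may add. Mapping the raw samples through it recovers positive hyperparameter values:

import numpy as np

def log1pe(unconstrained):
    # softplus: maps any real number to a strictly positive value
    return np.log1p(np.exp(unconstrained))

# one row of the unconstrained samples from the question's output
raw = np.array([-0.5932655, 0.00821438])
print(log1pe(raw))  # ~[0.44, 0.70], i.e. positive hyperparameter values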
I am trying to produce a random distribution where I control the mean, SD, skewness and kurtosis.
I can solve the mean and SD with some simple maths after the distribution is produced.
Kurtosis I am leaving on the shelf for the moment because it just seems too hard.
Skewness is today's problem.
import numpy as np
from scipy import stats

def convert_to_alpha(s):
    # invert the skew-normal skewness formula (via delta) to get
    # the shape parameter alpha
    d = (np.pi/2*((abs(s)**(2/3))/(abs(s)**(2/3)+((4-np.pi)/2)**(2/3))))**0.5
    a = d/((1-d**2)**.5)
    return a

for skewness_expected in (.5, .9, 1.3):
    alpha = convert_to_alpha(skewness_expected)
    r = stats.skewnorm.rvs(alpha, size=10000)
    print('Skewness expected:', skewness_expected)
    print('Skewness obtained:', stats.skew(r))
    print()
Skewness expected: 0.5
Skewness obtained: 0.47851348006629035
Skewness expected: 0.9
Skewness obtained: 0.8917020428586827
Skewness expected: 1.3
Skewness obtained: (1.2794406116842627+0.01780402125888404j)
I understand that the calculated skewness will generally not match the desired skewness exactly - this is a random sample, after all. But I am confused as to how I can get a distribution with skewness > 1 without falling into complex-number territory. The rvs method appears incapable of handling it, since the parameter alpha becomes a complex number whenever skewness > 1.
How can I fix it so that I can generate distributions with skewness > 1, but not have complex numbers creeping in?
[With credit to Warren Weckesser for pointing me at Wikipedia in order to write the convert_to_alpha function.]
I understand this thread is a year and a half old now, but I've run into this problem recently as well and it never seemed to get answered here. A further problem with converting between alpha from stats.skewnorm and the skewness statistic (excellent function to do that, by the way) is that doing so also alters the measures of central tendency of the distribution, which was problematic for my needs.
I've developed this approach based on the F-distribution (https://en.wikipedia.org/wiki/F-distribution). The end result of a lot of work is the function below, for which you specify the required mean, SD and skewness, and the desired sample size. I can share the work behind it if anyone wishes. The output SD and skew become a little rough at extreme settings, presumably because the F-distribution naturally sits around 1. It is also very problematic for skew values close to zero, in which case there would be no need for this function anyway.
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
def createSkewDist(mean, sd, skew, size):
    # calculate the degrees of freedom 1 required to obtain the
    # specified skewness statistic, derived from simulations
    loglog_slope = -2.211897875506251
    loglog_intercept = 1.002555437670879
    df2 = 500
    df1 = 10**(loglog_slope*np.log10(abs(skew)) + loglog_intercept)

    # sample from the F-distribution
    fsample = np.sort(stats.f(df1, df2).rvs(size=size))

    # adjust the variance by scaling the distance from each point to the
    # distribution mean by a constant, derived from simulations
    k1_slope = 0.5670830069364579
    k1_intercept = -0.09239985798819927
    k2_slope = 0.5823114978219056
    k2_intercept = -0.11748300123471256
    scaling_slope = abs(skew)*k1_slope + k1_intercept
    scaling_intercept = abs(skew)*k2_slope + k2_intercept
    scale_factor = (sd - scaling_intercept)/scaling_slope
    new_dist = (fsample - np.mean(fsample))*scale_factor + fsample

    # flip the distribution if the specified skew is negative
    if skew < 0:
        new_dist = np.mean(new_dist) - new_dist

    # shift the distribution mean to the specified value
    final_dist = new_dist + (mean - np.mean(new_dist))

    return final_dist
'''EXAMPLE'''
desired_mean = 497.68
desired_skew = -1.75
desired_sd = 77.24
final_dist = createSkewDist(mean=desired_mean, sd=desired_sd, skew=desired_skew, size=1000000)
# inspect the plots & moments, try random sample
fig, ax = plt.subplots(figsize=(12,7))
sns.distplot(final_dist, hist=True, ax=ax, color='green', label='generated distribution')
sns.distplot(np.random.choice(final_dist, size=100), hist=True, ax=ax, color='red', hist_kws={'alpha':.2}, label='sample n=100')
ax.legend()
print('Input mean: ', desired_mean)
print('Result mean: ', np.mean(final_dist),'\n')
print('Input SD: ', desired_sd)
print('Result SD: ', np.std(final_dist),'\n')
print('Input skew: ', desired_skew)
print('Result skew: ', stats.skew(final_dist))
Input mean: 497.68
Result mean: 497.6799999999999
Input SD: 77.24
Result SD: 71.69030764848961
Input skew: -1.75
Result skew: -1.6724486459469905
The shape parameter of the skew-normal distribution is not the skewness of the distribution. Check out the Wikipedia page for the skew-normal distribution. The formulas in the table on the right give the expressions for the mean, variance, skewness, etc., in terms of the parameters. You can get these values from the skewnorm object with the stats() method.
For example, here's the skewness of the distribution with shape parameter 2:
In [46]: from scipy.stats import skewnorm, skew
In [47]: skewnorm.stats(2, moments='s')
Out[47]: array(0.45382556395938217)
Generate a couple of samples and find the sample skewness:
In [48]: r = skewnorm.rvs(2, size=10000000)
In [49]: skew(r)
Out[49]: 0.4533209955299838
In [50]: r = skewnorm.rvs(2, size=10000000)
In [51]: skew(r)
Out[51]: 0.4536583726840712
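A related point worth adding (my addition to this answer): the skew-normal family cannot reach a skewness of 1 at all. As the shape parameter grows, the skewness saturates near 0.995, which is why inverting the skewness formula for a target above that bound yields complex alpha values:

from scipy.stats import skewnorm

for a in (1, 2, 5, 10, 100, 1000):
    print(a, skewnorm.stats(a, moments='s'))
# the skewness approaches ~0.9953 as a -> infinity, so no real shape
# parameter gives skewness >= 1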
I want to run a regression analysis on the data below, where x1 and x2 produce the y value. But in this case the y value is the same for every row, so the regression does not work. Why? I would appreciate an explanation.
Your training set shows that the coefficients are all ~0 and the constant is 5. There's no more information in that dataset; you don't need regression to show that.
You did not specify what kind of regression you are running. Depending on the type of regression you are using, you will need the design matrix to be invertible, i.e. its columns must not be linearly dependent.
It seems to work using the normal equations (with the expected results):
import numpy as np
import matplotlib.pyplot as plt
input = np.array([
[2,3,5],
[1,2,5],
[4,2,5],
[1,7,5],
[1,9,5]
])
m = len(input)
X = np.array([np.ones(m), input[:, 0],input[:, 1]]).T # Add Constant to X
y = np.array(input[:, 2]).reshape(-1, 1) # Get the dependant values
betaHat = np.linalg.solve(X.T.dot(X), X.T.dot(y)) # Calculate coefficients
print(betaHat) # Show Constant and coefficients (in that order)
[[ 5.00000000e+00]
[ 5.29208238e-16]
[ 4.32685981e-17]]
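As a side note (my addition, not part of the original answer): np.linalg.lstsq solves the same least-squares problem via a more numerically robust route (SVD) and also copes with rank-deficient design matrices where the normal equations fail:

import numpy as np

# same design as above: a constant column plus x1 and x2
X = np.array([[1., 2., 3.],
              [1., 1., 2.],
              [1., 4., 2.],
              [1., 1., 7.],
              [1., 1., 9.]])
y = np.array([5., 5., 5., 5., 5.]).reshape(-1, 1)

# SVD-based least squares; works even if X.T @ X is singular
betaHat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(betaHat)  # ~[[5.], [0.], [0.]]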
I've been making a routine which measures the phase difference between two spectra using NumPy/Scipy.
I already had the routine written in Matlab, so I basically re-implemented the function and the corresponding unit test using NumPy. However, I found that the unit test fails because scipy.fftpack.fft is introducing some small numerical errors:
import numpy as np
import scipy.fftpack

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])
X = scipy.fftpack.fft(x)
In this case, since the time-domain signal is real and symmetric, the expected output is
[16.0000 -6.8284 0 -1.1716 0 -1.1716 0 -6.8284]
as shown in the following Matlab code:
>> x = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0];
>> X = fft(x)
X =
16.0000 -6.8284 0 -1.1716 0 -1.1716 0 -6.8284
The result should not contain any imaginary components, based on DSP theory. However, the scipy result is as follows:
array([ 16.00000000 +0.00000000e+00j, -6.82842712 -2.22044605e-16j,
0.00000000 -0.00000000e+00j, -1.17157288 -2.22044605e-16j,
0.00000000 +0.00000000e+00j, -1.17157288 +2.22044605e-16j,
0.00000000 +0.00000000e+00j, -6.82842712 +2.22044605e-16j])
Why does scipy.fftpack.fft introduce small imaginary components? I really want to avoid this issue. Could anyone give me a suggestion?
For one thing, scipy.fftpack.fft is guaranteed to always return a complex result, whereas the result of MATLAB's fft function is sometimes real and sometimes complex, depending on whether there is a non-zero imaginary component. However, that doesn't explain why the result of scipy.fftpack.fft actually contains non-zero imaginary components, whereas the result of MATLAB's fft function does not.
I suspect that the underlying reason for the difference has to do with the fact that MATLAB's fft function is apparently based on FFTW, whereas scipy and numpy use FFTPACK due to licensing restrictions.
pyfftw, however, does provide Python bindings to FFTW. If we compare the imaginary components of the results for FFTPACK and FFTW:
import numpy as np
from scipy import fftpack
from pyfftw.interfaces import scipy_fftpack as fftw

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])

Fx1 = fftpack.fft(x)
print(Fx1.imag)
# [ 0.00000000e+00 -2.22044605e-16 -0.00000000e+00 -2.22044605e-16
# 0.00000000e+00 2.22044605e-16 0.00000000e+00 2.22044605e-16]
print(Fx1.imag == 0)
# [ True False True False True False True False]
Fx2 = fftw.fft(x)
print(Fx2.imag)
# [ 0. 0. 0. 0. 0. 0. 0. 0.]
print(Fx2.imag == 0)
# [ True True True True True True True True]
we see that the imaginary component of the FFTW result compares exactly equal to zero, whereas FFTPACK has a tiny amount of floating-point rounding error.
Beyond that, I have no idea why FFTW's implementation suffers less from rounding error than FFTPACK's, but in any case these rounding errors are small enough that they normally don't cause problems (you know you shouldn't test for exact equality between float values, right?).
Usually you would simply take the real component of the result, e.g.:
scipy.fftpack.fft(x).real
If these errors are a problem, you could switch to pyfftw instead of numpy/scipy, but if your code is that sensitive to rounding error it probably means you're doing something wrong anyway.
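One more option worth mentioning (my addition): NumPy's real_if_close drops an imaginary part that is within a few machine epsilons of zero, which mimics MATLAB's behaviour of returning a real array when the imaginary component vanishes:

import numpy as np
import scipy.fftpack

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])
X = np.real_if_close(scipy.fftpack.fft(x))
print(X.dtype)  # float64: the ~2e-16 imaginary residue was discarded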
I'm using the PRTools MATLAB library to train some classifiers, generating test data and testing the classifiers.
I have the following details:
N: total # of test examples
k: # of mis-classifications for each classifier and class
I want to do:
Calculate and plot Bayesian posterior distributions of the unknown probabilities of mis-classification (denoted q), that is, as probability density functions over q itself (so, P(q) will be plotted over q, from 0 to 1).
I have that (math formulae, not matlab code!):
Posterior = Likelihood * Prior / Normalization constant =
P(q|k,N) = P(k|q,N) * P(q|N) / P(k|N)
The prior is set to 1, so I only need to calculate the likelihood and normalization constant.
I know that the likelihood can be expressed as (where B(N,k) is the binomial coefficient):
P(k|q,N) = B(N,k) * q^k * (1-q)^(N-k)
... so the Normalization constant is simply an integral of the posterior above, from 0 to 1:
P(k|N) = B(N,k) * integralFromZeroToOne( q^k * (1-q)^(N-k) )
(The Binomial coefficient ( B(N,k) ) can be omitted though as it appears in both the likelihood and normalization constant)
Now, I've heard that the integral for the normalization constant should be able to be calculated as a series ... something like:
k!(N-k)! / (N+1)!
Is that correct? (I have some lecture notes with this series, but can't figure out if it is for the normalization constant integral, or for the overall distribution of mis-classification (q))
Also, hints are welcome on how to practically calculate this (factorials easily create overflow errors, right?), and on how to produce the final plot (the posterior distribution over q, from 0 to 1).
I really haven't done much with Bayesian posterior distributions (and not for a while), but I'll try to help with what you've given. First,
k!(N-k)! / (N+1)! = 1 / (B(N,k) * (N + 1))
and you can calculate the binomial coefficients in Matlab with nchoosek() though it does say in the docs that there can be accuracy problems for large coefficients. How big are N and k?
Second, according to Mathematica,
integralFromZeroToOne( q^k * (1-q)^(N-k) ) = pi * csc((k-N)*pi) * Gamma(1+k)/(Gamma(k-N) * Gamma(2+N))
where csc() is the cosecant function and Gamma() is the gamma function. Note that Gamma(x) = (x-1)!, which we'll use in a moment. The problem is that we have Gamma(k-N) in the denominator, and k-N is negative; the reflection formula takes care of that, so we end up with:
= (N-k)! * k! / (N+1)!
Apparently, your notes were correct.
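As a quick numerical sanity check (my addition, with hypothetical values of N and k), the integral and the factorial series agree:

from scipy import integrate, special

N, k = 20, 4  # hypothetical counts
lhs, _ = integrate.quad(lambda q: q**k * (1 - q)**(N - k), 0, 1)
rhs = special.factorial(k) * special.factorial(N - k) / special.factorial(N + 1)
print(lhs, rhs)  # both ~9.83e-06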
Let q be the probability of mis-classification. Then the probability that you would observe k mis-classifications in N runs is given by:
P(k|N,q) = B(N,k) q^k (1-q)^(N-k)
You then need to assume a suitable prior for q, which is bounded between 0 and 1. A conjugate prior for the above is the beta distribution. If q ~ Beta(a,b), then the posterior is also a beta distribution. For your info, the posterior is:
f(q|-) ~ Beta(a+k,b+N-k)
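To make this concrete, here is a minimal Python/SciPy sketch (my addition; the question uses MATLAB/PRTools, but the maths is the same). With a uniform prior Beta(1, 1), the posterior is Beta(1+k, 1+N-k), and evaluating its pdf sidesteps any explicit factorials:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

N, k = 100, 7  # hypothetical counts: test examples and mis-classifications
q = np.linspace(0, 1, 500)

# posterior under a uniform Beta(1, 1) prior; the beta pdf normalizes
# internally via log-gamma, so large factorials never overflow
posterior = stats.beta(1 + k, 1 + N - k)

plt.plot(q, posterior.pdf(q))
plt.xlabel('q (probability of mis-classification)')
plt.ylabel('P(q | k, N)')
plt.show()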
Hope that helps.