I am trying to fit a function with scipy curve_fit and then find its maximum with scipy minimize:
from numpy import sin

def func(x, a0, a1, a2, a3, a4, a5, a6, a7, a8,
         a28, a29, a30, a31, a32, a33,
         pol_rate1, pol_rate2, pol_rate3, pol_rate4):
    f = a0*sin(x[0])**pol_rate1 + a1*sin(x[1])**0.5 + a2*sin(x[2])**pol_rate4 + a3*x[3] + \
        a4*x[4] + a5*sin(x[5])**pol_rate2 + a6*x[6] + a7*x[7] + a8*sin(x[8])**pol_rate3
    return f
The thing that I can't understand is that the fitting results (such as R squared) and the optimal values depend on the scale of my data.
For example, the results for x, 2x and 10x are quite different. How can I tell when the results are really stable and trustworthy?
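For reference, here is a minimal runnable sketch of the fit-then-maximize pipeline described above, using a simplified two-variable stand-in for the model (func2, the synthetic xdata/ydata and the bounds are hypothetical placeholders, not the original data):

import numpy as np
from scipy.optimize import curve_fit, minimize

def func2(x, a0, a1, p1):
    # simplified two-variable stand-in for the model above
    return a0*np.sin(x[0])**p1 + a1*x[1]

rng = np.random.default_rng(0)
xdata = rng.uniform(0.5, 1.5, size=(2, 50))   # shape (k, M): one row per input variable
ydata = func2(xdata, 1.3, 0.7, 2.0) + 0.01*rng.standard_normal(50)

popt, pcov = curve_fit(func2, xdata, ydata, p0=[1.0, 1.0, 1.0])

# maximize the fitted surface (within the data range) by minimizing its negative
res = minimize(lambda x: -func2(x, *popt), x0=np.array([1.0, 1.0]),
               bounds=[(0.5, 1.5), (0.5, 1.5)])
print(popt, res.x)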
I've got the following question. I've got the function f(t) = C3*exp(t*x*1i) + C4*exp(-t*x*1i) as the solution of a differential equation (as syms). But I need this solution as a real function (C3*cos + C4*sin). How can I do that? And how can I get the real and imaginary parts of this function? Is there a function in MATLAB that allows me to do it?
You can use rewrite to rewrite the expression in terms of cosines and sines, then collect to collect coefficients in terms of i, giving you your real and imaginary terms:
syms C3 C4 t x
f = C3*exp(t*x*1i) + C4*exp(-t*x*1i);
g = collect(rewrite(f, 'sincos'), i)
g =
(C3*sin(t*x) - C4*sin(t*x))*1i + C3*cos(t*x) + C4*cos(t*x)
You can see from the above that the imaginary term is zero if C3 is equal to C4.
You can rewrite the expression/function in terms of sine and cosine using rewrite. Still, you cannot apply the real and imag functions to get the respective parts in as nice a form as you get in the case of non-symbolic computations. The trick to get the real and imaginary parts of a complex expression is to substitute i with 0 to get the real part, and then subtract the real part from the original expression to get the imaginary part. Use simplify to be sure.
An example:
syms C3 C4 t x
f(t) = C3*exp(t*x*1i) + C4*exp(-t*x*1i);
fsincos = rewrite(f, 'sincos');
realf = simplify(subs(fsincos, i,0));
imagf = simplify(fsincos-realf);
%or you can use the collect function to avoid simplify
>> fsincos
fsincos(t) =
C3*(cos(t*x) + sin(t*x)*1i) + C4*(cos(t*x) - sin(t*x)*1i)
>> realf
realf(t) =
cos(t*x)*(C3 + C4)
>> imagf
imagf(t) =
sin(t*x)*(C3*1i - C4*1i)
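For comparison (my own aside, not part of the original MATLAB answers), the same real/imaginary split can be sketched in Python with sympy; declaring the symbols real lets re and im evaluate directly:

import sympy as sp

C3, C4, t, x = sp.symbols('C3 C4 t x', real=True)
f = C3*sp.exp(sp.I*t*x) + C4*sp.exp(-sp.I*t*x)

g = sp.expand(f.rewrite(sp.cos))  # rewrite the exponentials as cos/sin, then expand
print(sp.re(g))                   # C3*cos(t*x) + C4*cos(t*x)
print(sp.im(g))                   # C3*sin(t*x) - C4*sin(t*x)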
I am trying to calculate some integrals, for example:
syms x y
a = 1/sqrt(2);
b = -5;
c = 62;
d = 1;
f = exp(-x^2-y^2)*(erfc((sym(a) + 1/(x^2+y^2)*(sym(b)*x+sym(d)*y))*sqrt((x^2+y^2)*sym(10.^(c/10))))...
    + erfc((sym(a) - 1/(x^2+y^2)*(sym(b)*x+sym(d)*y))*sqrt((x^2+y^2)*sym(10.^(c/10)))));
h = int(int(f,x,-Inf,Inf),y,-Inf,Inf);
It produces a warning like this:
Warning: Explicit integral could not be found.
Then I tried to use vpa to calculate that integral, and got a result like this:
vpa(int(int(f,x,-Inf,Inf),y,-Inf,Inf),5)
numeric::int(numeric::int(exp(- x^2 - y^2)*(erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 + (5*x - y)/(x^2 + y^2))) + erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 - (5*x - y)/(x^2 + y^2)))), x == -Inf..Inf), y == -Inf..Inf)
I already tried changing the interval [-Inf,Inf] to [-100,100], and got the same kind of result:
numeric::int(numeric::int(exp(- x^2 - y^2)*(erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 + (5*x - y)/(x^2 + y^2))) + erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 - (5*x - y)/(x^2 + y^2)))), x == -100..100), y == -100..100)
My question is: why can't vpa return a numeric value in this case?
Is there something wrong in the above MATLAB code? (I, myself, could not find the bug so far.)
Thank you in advance for your help.
It is unlikely that there is an analytic solution to this integral, so using int may not be a good choice. In some cases vpa can be used for a numeric solution. When this fails (by returning a call to itself) it may be for several reasons: the integral may not exist, the integral may converge too slowly, singularities may cause issues, the integrand may be highly oscillatory or non-smooth, etc. Mathematica also struggles with this integral.
You can try calculating the integral numerically using integral2:
a = 1/sqrt(2);
b = -5;
c = 62;
d = 1;
f = @(x,y)exp(-x.^2-y.^2).*(erfc((a + 1./(x.^2+y.^2).*(b*x+d*y)).*sqrt((x.^2+y.^2)*10^(c/10)))...
    +erfc((a - 1./(x.^2+y.^2).*(b*x+d*y)).*sqrt((x.^2+y.^2)*10^(c/10))));
h = integral2(f,-Inf,Inf,-Inf,Inf)
which returns 5.790631184403967. This compares well with Mathematica's numerical integration using NIntegrate. You can try specifying smaller absolute and relative tolerances for integral2 to get more accurate values, but this will result in much slower compute times.
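If you want an independent cross-check outside MATLAB, a rough sketch with scipy's dblquad (my own aside; the tolerances and runtime are not tuned) should give a value close to the integral2 result above:

import numpy as np
from scipy.integrate import dblquad
from scipy.special import erfc

a, b, c, d = 1/np.sqrt(2), -5.0, 62.0, 1.0
g = 10**(c/10)

def f(y, x):  # dblquad expects f(y, x), with y the inner integration variable
    r2 = x*x + y*y
    if r2 == 0.0:  # a single point; its value does not affect the integral
        return 0.0
    s = np.sqrt(r2*g)
    return np.exp(-r2)*(erfc((a + (b*x + d*y)/r2)*s)
                        + erfc((a - (b*x + d*y)/r2)*s))

h, err = dblquad(f, -np.inf, np.inf, lambda x: -np.inf, lambda x: np.inf)
print(h, err)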
The model I'm working on is a multinomial logit choice model. It's a very specific dataset, so other existing MNLogit libraries don't fit my data.
So basically, it's a very complex function which takes 11 parameters and returns a negative log-likelihood value. I then need to find the parameter values that minimize it using scipy.optimize.minimize.
Here are the problems that I encounter with different methods:
'Nelder-Mead': it works well and always gives me the correct answer. However, it's EXTREMELY slow. For another function with a more complicated setup, it takes 15 hours to get to the optimal point. At the same time, the same function takes only 1 hour on Matlab using fminunc (which uses BFGS by default).
'BFGS': This is the method used by Matlab. It works well for any simple function. However, for the function that I have, it always fails to converge and returns 'Desired error not necessarily achieved due to precision loss.' I've spent lots of time playing around with the options but it still fails to work.
'Powell': It quickly converges successfully but returns a wrong answer. The code is printed below (x0 is the correct answer; Nelder-Mead works for whatever initial value), and you can get the data here: https://www.dropbox.com/s/aap2dhor5jyxy94/data.csv
Thanks!
import pandas as pd
import numpy as np
from scipy.optimize import minimize

# https://www.dropbox.com/s/aap2dhor5jyxy94/data.csv
df = pd.read_csv('data.csv', index_col=0)
dfhh = df.hh
B = df.loc[:, 'b0':'b4'].values         # NT*5
P = df.loc[:, 'p1':'p4'].values         # NT*4
F = df.loc[:, 'f1':'f4'].values         # NT*4
SDV = df.loc[:, 'lagb1':'lagb4'].values

def Li(x):
    b1 = x[0]   # coeff on prices
    b2 = x[1]   # coeff on features
    a = x[2:7]  # remaining 4 values are the alphas
    E = np.exp(a + b1*P + b2*F)     # (1*4) + (NT*4) + (NT*4): builds the (NT*J) matrix of exp() terms
    E = np.insert(E, 0, 1, axis=1)  # (NT*5)
    denom = E.sum(1)
    return -np.log((B * E).sum(1) / denom).sum()

x0 = np.array([-32.31028223, 0.23965953, 0.84739154, 0.25418215, -3.38757007, -0.38036966])
np.random.seed(0)
x0 = x0 + np.random.rand(6)

minL = minimize(Li, x0, method='Nelder-Mead', options={'xatol': 1e-8, 'disp': True})
# minL = minimize(Li, x0, method='BFGS')
# minL = minimize(Li, x0, method='Powell', options={'xtol': 1e-12, 'ftol': 1e-12})
print(minL)
Update: 03/07/14 Simpler Version of the Code
Now Powell works well with very small tolerances; however, Powell is slower than Nelder-Mead in this case. BFGS still fails to work.
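One thing that may be worth trying for the BFGS 'precision loss' failure (my own suggestion, not from the original post): compute the objective in log-space with scipy.special.logsumexp, since rounding noise from the large exp() terms can defeat BFGS's line search. A sketch, assuming B is a one-hot indicator of the chosen alternative, which is what (B * E).sum(1) in the code above implies:

from scipy.special import logsumexp

def Li_stable(x):
    b1, b2 = x[0], x[1]
    a = x[2:7]                        # the four alphas, as above
    V = a + b1*P + b2*F               # utilities, (NT*4)
    V = np.insert(V, 0, 0.0, axis=1)  # outside option has utility 0, (NT*5)
    # log-likelihood per observation: V_chosen - log(sum(exp(V)))
    return -((B * V).sum(1) - logsumexp(V, axis=1)).sum()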
I am fitting data with weights using scipy.odr but I don't know how to obtain a measure of goodness-of-fit or an R squared. Does anyone have suggestions for how to obtain this measure using the output stored by the function?
The res_var attribute of the Output is the so-called reduced Chi-square value for the fit, a popular choice of goodness-of-fit statistic. It is somewhat problematic for non-linear fitting, though. You can look at the residuals directly (out.delta for the X residuals and out.eps for the Y residuals). Implementing a cross-validation or bootstrap method for determining goodness-of-fit, as suggested in the linked paper, is left as an exercise for the reader.
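As a minimal sketch of pulling those quantities out of a fit (assuming out is the Output object returned by odr.ODR(data, model, beta0=...).run()):

# out = odr.ODR(data, model, beta0=[...]).run()
print(out.res_var)  # reduced chi-square of the fit
print(out.delta)    # estimated X residuals
print(out.eps)      # estimated Y residuals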
The output of ODR gives both the estimated parameters beta as well as the standard deviation of those parameters sd_beta. Following p. 76 of the ODRPACK documentation, you can convert these values into a t-statistic with (beta - beta_0) / sd_beta, where beta_0 is the number that you're testing significance with respect to (often zero). From there, you can use the t-distribution to get the p-value.
Here's a working example:
import numpy as np
from scipy import stats, odr
def linear_func(B, x):
    """
    From https://docs.scipy.org/doc/scipy/reference/odr.html
    Linear function y = m*x + b
    """
    # B is a vector of the parameters.
    # x is an array of the current x values.
    # x is in the same format as the x passed to Data or RealData.
    #
    # Return an array in the same format as y passed to Data or RealData.
    return B[0] * x + B[1]
np.random.seed(0)
sigma_x = .1
sigma_y = .15
N = 100
x_star = np.linspace(0, 10, N)
x = np.random.normal(x_star, sigma_x, N)
# the true underlying function is y = 2*x_star + 1
y = np.random.normal(2*x_star + 1, sigma_y, N)
linear = odr.Model(linear_func)
dat = odr.Data(x, y, wd=1./sigma_x**2, we=1./sigma_y**2)
this_odr = odr.ODR(dat, linear, beta0=[1., 0.])
odr_out = this_odr.run()
# degrees of freedom are n_samples - n_parameters
df = N - 2 # equivalently, df = odr_out.iwork[10]
beta_0 = 0 # test if slope is significantly different from zero
t_stat = (odr_out.beta[0] - beta_0) / odr_out.sd_beta[0] # t statistic for the slope parameter
p_val = stats.t.sf(np.abs(t_stat), df) * 2
print('Recovered equation: y={:3.2f}x + {:3.2f}, t={:3.2f}, p={:.2e}'.format(odr_out.beta[0], odr_out.beta[1], t_stat, p_val))
Recovered equation: y=2.00x + 1.01, t=239.63, p=1.76e-137
One note of caution in using this approach on nonlinear problems, from the same ODRPACK docs:
"Note that for nonlinear ordinary least squares, the linearized confidence regions and intervals are asymptotically correct as n → ∞ [Jennrich, 1969]. For the orthogonal distance regression problem, they have been shown to be asymptotically correct as σ∗ → 0 [Fuller, 1987]. The difference between the conditions of asymptotic correctness can be explained by the fact that, as the number of observations increases in the orthogonal distance regression problem one does not obtain additional information for ∆. Note also that Vˆ is dependent upon the weight matrix Ω, which must be assumed to be correct, and cannot be confirmed from the orthogonal distance regression results. Errors in the values of wǫi and wδi that form Ω will have an adverse affect on the accuracy of Vˆ and its component parts. The results of a Monte Carlo experiment examining the accuracy
of the linearized confidence intervals for four different measurement error models is presented in [Boggs and Rogers, 1990b]. Those results indicate that the confidence regions and intervals for ∆ are not as accurate as those for β.
Despite its potential inaccuracy, the covariance matrix is frequently used to construct confidence regions and intervals for both nonlinear ordinary least squares and measurement error models because the resulting regions and intervals are inexpensive to compute, often adequate, and familiar to practitioners. Caution must be exercised when using such regions and intervals, however, since the validity of the approximation will depend on the nonlinearity of the model, the variance and distribution of the errors, and the data itself. When more reliable intervals and regions are required, other more accurate methods should be used. (See, e.g., [Bates and Watts, 1988], [Donaldson and Schnabel, 1987], and [Efron, 1985].)"
As mentioned by R. Ken, the chi-square or variance of the residuals is one of the more commonly used tests of goodness of fit. ODR stores the sum of squared residuals in out.sum_square, and you can verify yourself that out.res_var = out.sum_square/degrees_freedom corresponds to what is commonly called reduced chi-square: i.e. the chi-square test result divided by its expected value.
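For example, with odr_out and df from the working example above, the relation can be checked directly:

import numpy as np
# odr_out and df as computed in the earlier t-statistic example
assert np.isclose(odr_out.res_var, odr_out.sum_square / df)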
As for the other very popular estimator of goodness of fit in linear regression, R squared and its adjusted version, we can define the functions:
import numpy as np

def R_squared(observed, predicted, uncertainty=1):
    """ Returns R squared measure of goodness of fit for predicted model. """
    weight = 1./uncertainty
    return 1. - (np.var((observed - predicted)*weight) / np.var(observed*weight))

def adjusted_R(x, y, model, popt, unc=1):
    """
    Returns adjusted R squared test for optimal parameters popt calculated
    according to W-MN formula, other forms have different coefficients:
    Wherry/McNemar : (n - 1)/(n - p - 1)
    Wherry : (n - 1)/(n - p)
    Lord : (n + p - 1)/(n - p - 1)
    Stein : (n - 1)/(n - p - 1) * (n - 2)/(n - p - 2) * (n + 1)/n
    """
    # Assuming you have a model with ODR argument order f(beta, x)
    # otherwise if model is of the form f(x, a, b, c..) you could use
    # R = R_squared(y, model(x, *popt), uncertainty=unc)
    R = R_squared(y, model(popt, x), uncertainty=unc)
    n, p = len(y), len(popt)
    coefficient = (n - 1)/(n - p - 1)
    adj = 1 - (1 - R) * coefficient
    return adj, R
From the output of your ODR run you can find the optimal values for your model's parameters in out.beta and at this point we have everything we need for computing R squared.
from scipy import odr

def lin_model(beta, x):
    """
    Linear function y = m*x + q
    slope m, constant term/y-intercept q
    """
    return beta[0] * x + beta[1]

linear = odr.Model(lin_model)
# x, y, sigma_x, sigma_y as in the example above
data = odr.RealData(x, y, sx=sigma_x, sy=sigma_y)
init = odr.ODR(data, linear, beta0=[1, 1])
out = init.run()

adjusted_Rsq, Rsq = adjusted_R(x, y, lin_model, popt=out.beta)
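Then, for example (output formatting is illustrative):

print('R squared = {:.4f}, adjusted R squared = {:.4f}'.format(Rsq, adjusted_Rsq))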
I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow.
In general, for a given input s, the value of the logistic loss function is:
log(1 + exp(s))
and the slope of the logistic loss function is:
exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))
In my algorithm, the value of s = X*beta. Here X is a matrix with N data points and P features per data point (i.e. size(X)=[N,P]) and beta is a vector of P coefficients for each feature such that size(beta)=[P 1].
I am specifically interested in calculating the average value and gradient of the logistic function for a given value of beta.
The average value of the logistic function w.r.t. a value of beta is:
L = 1/N * sum(log(1+exp(X*beta)),1)
The average value of the slope of the logistic function w.r.t. a value of beta is:
dL = 1/N * sum((exp(X*beta)./(1+exp(X*beta))' X, 1)'
Note that size(dL) = [P 1].
My issue is that these expressions keep producing numerical overflows. The problem effectively comes from the fact that exp(s)=Inf when s>1000 and exp(s)=0 when s<-1000.
I am looking for a solution such that s can take on any value in floating point arithmetic. Ideally, I would also really appreciate a solution that allows me to evaluate the value and gradient in a vectorized / efficient way.
How about the following approximations:
– For computing L, if s is large, then exp(s) will be much larger than 1:
1 + exp(s) ≅ exp(s)
and consequently
log(1 + exp(s)) ≅ log(exp(s)) = s.
If s is small, then using the Taylor series of exp()
exp(s) ≅ 1 + s
and using the Taylor series of log()
log(1 + exp(s)) ≅ log(2 + s) ≅ log(2) + s / 2.
– For computing dL, for large s
exp(s) ./ (1 + exp(s)) ≅ 1
and for small s
exp(s) ./ (1 + exp(s)) ≅ 1/2 + s / 4.
– The code to compute L could look for example like this:
s = X*beta;
l = log(1+exp(s));
ind = isinf(l);
l(ind) = s(ind);
ind = (l == 0);
l(ind) = log(2) + s(ind) / 2;
L = 1/N * sum(l,1)
I found a good article about this problem.
Cutting through a lot of words, we can simplify the argument to stating that the original expression
log(1 + exp(s))
can be rewritten as
log(exp(s)*(exp(-s) + 1))
= log(exp(s)) + log(exp(-s) + 1)
= s + log(exp(-s) + 1)
This stops overflow from occurring - it doesn't prevent underflow, but by the time that occurs, you have your answer (namely, s). You can't just use this instead of the original, since it will still give you problems. However, we now have the basis for a function that can be written that will be accurate and won't produce over/underflow:
function LL = logistic(s)
    if s < 0
        LL = log(1 + exp(s));
    else
        LL = s + logistic(-s);
    end
end
I think this maintains reasonably good accuracy.
EDIT now to the meat of your question - making this vectorized, and allowing the calculation of the slope as well. Let's take these one at a time:
function LL = logisticVec(s)
    LL = zeros(size(s));
    LL(s<0) = log(1 + exp(s(s<0)));
    LL(s>=0) = s(s>=0) + log(1 + exp(-s(s>=0)));
end
To obtain the average you wanted:
L = sum(logisticVec(X*beta)) / N;
The slope is a little bit trickier; note I believe you may have a typo in your expression (missing a multiplication sign).
dL/dbeta = sum(X .* (exp(X*beta) ./ (1 + exp(X*beta)))) / N;
If we divide top and bottom by exp(X*beta) we get
dL = sum(X ./ (exp(-X*beta) + 1), 1)' / N;  % (N*P) ./ (N*1) via implicit expansion; result is P*1
Once again, the overflow has gone away and we are left with underflow - but since the underflowed value has 1 added to it, the error this creates is insignificant.
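As a side note (not part of the original answers): if you port this to Python, numpy and scipy already provide numerically stable primitives for exactly these two quantities, for example:

import numpy as np
from scipy.special import expit

s = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
L_terms = np.logaddexp(0.0, s)  # log(1 + exp(s)) without overflow
slope = expit(s)                # exp(s)/(1 + exp(s)) = 1/(1 + exp(-s)), stable
print(L_terms, slope)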