BFGS Fails to Converge - scipy

The model I'm working on is a multinomial logit choice model. It's a very specific dataset so other existing MNLogit libraries don't fit with my data.
So basically, it's a very complex function which takes 11 parameters and returns a loglikelihood value. Then I need to find the optimal parameter values that can minimize the loglikelihood using scipy.optimize.minimize.
Here are the problems that I encounter with different methods:
'Nelder-Mead’: it works well, and always give me the correct answer. However, it's EXTREMELY slow. For another function with a more complicated setup, it takes 15 hours to get to the optimal point. At the same time, the same function takes only 1 hour on Matlab using fminunc (which uses BFGS by default)
‘BFGS’: This is the method used by Matlab. It works well for any simply functions. However, for the function that I have, it always fails to converge and returns 'Desired error not necessarily achieved due to precision loss.’. I've spent lots of time playing around with the options but still failed to work.
'Powell': It quickly converges successfully but returns a wrong answer. The code is printed below (x0 is the correct answer, Nelder-Mead works for whatever initial value), and you can get the data here: https://www.dropbox.com/s/aap2dhor5jyxy94/data.csv
Thanks!
import pandas as pd
import numpy as np
from scipy.optimize import minimize
# https://www.dropbox.com/s/aap2dhor5jyxy94/data.csv
df = pd.read_csv('data.csv', index_col=0)
dfhh = df.hh
B = df.ix[:,'b0':'b4'].values # NT*5
P = df.ix[:,'p1':'p4'].values # NT*4
F = df.ix[:,'f1':'f4'].values # NT*4
SDV = df.ix[:,'lagb1':'lagb4'].values
def Li(x):
b1 = x[0] # coeff on prices
b2 = x[1] # coeff on features
a = x[2:7] # take first 4 values as alpha
E = np.exp(a + b1*P + b2*F) # (1*4) + (NT*4) + (NT*4) build matrix (NT*J) for each exp()
E = np.insert(E, 0, 1, axis=1) # (NT*5)
denom = E.sum(1)
return -np.log((B * E).sum(1) / denom).sum()
x0 = np.array([-32.31028223, 0.23965953, 0.84739154, 0.25418215,-3.38757007,-0.38036966])
np.random.seed(0)
x0 = x0 + np.random.rand(6)
minL = minimize(Li, x0, method='Nelder-Mead',options={'xtol': 1e-8, 'disp': True})
# minL = minimize(Li, x0, method='BFGS')
# minL = minimize(Li, x0, method='Powell', options={'xtol': 1e-12, 'ftol': 1e-12})
print minL
Update: 03/07/14 Simpler Version of the Code
Now Powell works well with very small tolerance, however the speed of Powell is slower than Nelder-Mead in this case. BFGS still fails to work.

Related

Definite integration using int command

Firstly, I'm quite new to Matlab.
I am currently trying to do a definite integral with respect to y of a particular function. The function that I want to integrate is
(note that the big parenthesis is multiplying with the first factor - I can't get the latex to not make it look like power)
I have tried plugging the above integral into Desmos and it worked as intended. My plan was to vary the value of x and y and will be using for loop via matlab.
However, after trying to use the int function to calculate the definite integral with the code as follow:
h = 5;
a = 2;
syms y
x = 3.8;
p = 2.*x.^2+2.*a.*y;
q = x.^2+y.^2;
r = x.^2+a.^2;
f = (-1./sqrt(1-(p.^2./(4.*q.*r)))).*(2.*sqrt(q).*sqrt(r).*2.*a-p.*2.*y.*sqrt(r)./sqrt(q))./(4.*q.*r);
theta = int(f,y,a+0.01,h) %the integral is undefined at y=2, hence the +0.01
the result is not quite as expected
theta =
int(-((8*461^(1/2)*(y^2 + 361/25)^(1/2))/5 - (461^(1/2)*y*(8*y + 1444/25))/(5*(y^2 + 361/25)^(1/2)))/((1 - (4*y + 722/25)^2/((1844*y^2)/25 + 665684/625))^(1/2)*((1844*y^2)/25 + 665684/625)), y, 21/10, 5)
After browsing through various posts, the common mistake is the undefined interval but the +0.01 should have fixed it. Any guidance on what went wrong is much appreciated.
The Definite Integrals example in the docs shows exactly this type of output when a closed form cannot be computed. You can approximate it numerically using vpa, i.e.
F = int(f,y,a,h);
theta = vpa(F);
Or you can do a numerical computation directly
theta = vpaintegral(f,y,a,h);
From the docs:
The vpaintegral function is faster and provides control over integration tolerances.

Solve system of differential equation in python

I'm trying to solve a system of differential equations in python.
I have a system composed by two equations where I have two variables, A and B.
The initial condition are that A0=1e17 and B0=0, they change simultaneously.
I wrote the following code using ODEINT:
import numpy as np
from scipy.integrate import odeint
def dmdt(m,t):
A, B = m
dAdt = A-B
dBdt = (A-B)*A
return [dAdt, dBdt]
# Create time domain
t = np.linspace(0, 100, 1)
# Initial condition
A0=1e17
B0=0
m0=[A0, B0]
solution = odeint(dmdt, m0, t)
Apparently I obtain an output different from the expected one but I don't understand the error.
Can someone help me?
Thanks
From A*A'-B'=0 one concludes
B = 0.5*(A^2 - A0^2)
Inserted into the first equation that gives
A' = A - 0.5*A^2 + 0.5*A0^2
= 0.5*(A0^2+1 - (A-1)^2)
This means that the A dynamic has two fixed points at about A0+1 and -A0+1, is growing inside that interval, the upper fixed point is stable. However, in standard floating point numbers there is no difference between 1e17 and 1e17+1. If you want to see the difference, you have to encode it separately.
Also note that the standard error tolerances atol and rtol in the range somewhere between 1e-6 and 1e-9 are totally incompatible with the scales of the problem as originally stated, also highlighting the need to rescale and shift the problem into a more appreciable range of values.
Setting A = A0+u with |u| in an expected scale of 1..10 then gives
B = 0.5*u*(2*A0+u)
u' = A0+u - 0.5*u*(2*A0+u) = (1-u)*A0 - 0.5*u^2
This now suggests that the time scale be reduced by A0, set t=s/A0. Also, B = A0*v. Insert the direct parametrizations into the original system to get
du/ds = dA/dt / A0 = (A0+u-A0*v)/A0 = 1 + u/A0 - v
dv/ds = dB/dt / A0^2 = (A0+u-A0*v)*(A0+u)/A0^2 = (1+u/A0-v)*(1+u/A0)
u(0)=v(0)=0
Now in floating point and the expected range for u, we get 1+u/A0 == 1, so effectively u'(s)=v'(s)=1-v which gives
u(s)=v(s)=1-exp(-s)`,
A(t) = A0 + 1-exp(-A0*t) + very small corrections
B(t) = A0*(1-exp(-A0*t)) + very small corrections
The system in s,u,v should be well-computable by any solver in the default tolerances.

scipy integrate.quad return an incorrect value

i use scipy integrate.quad to calc cdf of normal distribution:
def nor(delta, mu, x):
return 1 / (math.sqrt(2 * math.pi) * delta) * np.exp(-np.square(x - mu) / (2 * np.square(delta)))
delta = 0.1
mu = 0
t = np.arange(4.0, 10.0, 1)
nor_int = lambda t: integrate.quad(lambda x: nor(delta, mu, x), -np.inf, t)
nor_int_vec = np.vectorize(nor_int)
s = nor_int_vec(t)
for i in zip(s[0],s[1]):
print i
while it print as follows:
(1.0000000000000002, 1.2506543424265854e-08)
(1.9563704110140217e-11, 3.5403445591955275e-11)
(1.0000000000001916, 1.2616577562700088e-08)
(1.0842532749783998e-34, 1.9621183122960244e-34)
(4.234531567162006e-09, 7.753407284370446e-09)
(1.0000000000001334, 1.757986959115912e-10)
for some x, it return a value approximate to zero, it should be return 1.
can somebody tell me what is wrong?
Same reason as in why does quad return both zeros when integrating a simple Gaussian pdf at a very small variance? but seeing as I can't mark it as a duplicate, here goes:
You are integrating a function with tight localization (at scale delta) over a very large (in fact infinite) interval. The integration routine can simply miss the part of the interval where the function is substantially different from 0, judging it to be 0 instead. Some guidance is required. The parameter points can be used to this effect (see the linked question) but since quad over an infinite interval does not support it, the interval has to be manually split, like so:
for t in range(4, 10):
int1 = integrate.quad(lambda x: nor(delta, mu, x), -np.inf, mu - 10*delta)[0]
int2 = integrate.quad(lambda x: nor(delta, mu, x), mu - 10*delta, t)[0]
print(int1 + int2)
This prints 1 or nearly 1 every time. I picked mu-10*delta as a point to split on, figuring most of the function lies to the right of it, no matter what mu and delta are.
Notes:
Use np.sqrt etc; there is usually no reason for put math functions in NumPy code. The NumPy versions are available and are vectorized.
Applying np.vectorize to quad is not doing anything besides making the code longer and slightly harder to read. Use a normal Python loop or list comprehension. See NumPy vectorization with integration

How to convert deep learning gradient descent equation into python

I've been following an online tutorial on deep learning. It has a practical question on gradient descent and cost calculations where I been struggling to get the given answers once it was converted to python code. Hope you can kindly help me get the correct answer please
Please see the following link for the equations used
Click here to see the equations used for the calculations
Following is the function given to calculate the gradient descent,cost etc. The values need to be found without using for loops but using matrix manipulation operations
import numpy as np
def propagate(w, b, X, Y):
"""
Arguments:
w -- weights, a numpy array of size (num_px * num_px * 3, 1)
b -- bias, a scalar
X -- data of size (num_px * num_px * 3, number of examples)
Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size
(1, number of examples)
Return:
cost -- negative log-likelihood cost for logistic regression
dw -- gradient of the loss with respect to w, thus same shape as w
db -- gradient of the loss with respect to b, thus same shape as b
Tips:
- Write your code step by step for the propagation. np.log(), np.dot()
"""
m = X.shape[1]
# FORWARD PROPAGATION (FROM X TO COST)
### START CODE HERE ### (≈ 2 lines of code)
A = # compute activation
cost = # compute cost
### END CODE HERE ###
# BACKWARD PROPAGATION (TO FIND GRAD)
### START CODE HERE ### (≈ 2 lines of code)
dw =
db =
### END CODE HERE ###
assert(dw.shape == w.shape)
assert(db.dtype == float)
cost = np.squeeze(cost)
assert(cost.shape == ())
grads = {"dw": dw,
"db": db}
return grads, cost
Following are the data given to test the above function
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]),
np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
Following is the expected output of the above
Expected Output:
dw [[ 0.99993216] [ 1.99980262]]
db 0.499935230625
cost 6.000064773192205
For the above propagate function I have used the below replacements, but the output is not what is expected. Please kindly help on how to get the expected output
A = sigmoid(X)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
dw = (np.dot(X,((A-Y).T)))/m
db = np.sum((A-Y),axis=0)/m
Following is the sigmoid function used to calculate the Activation:
def sigmoid(z):
"""
Compute the sigmoid of z
Arguments:
z -- A scalar or numpy array of any size.
Return:
s -- sigmoid(z)
"""
### START CODE HERE ### (≈ 1 line of code)
s = 1 / (1+np.exp(-z))
### END CODE HERE ###
return s
Hope someone could help me understand how to solve this as I couldn't continue with rest of the tutorials without understanding this. Many thanks
You can calculate A,cost,dw,db as the following:
A = sigmoid(np.dot(w.T,X) + b)
cost = -1 / m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
dw = 1/m * np.dot(X,(A-Y).T)
db = 1/m * np.sum(A-Y)
where sigmoid is :
def sigmoid(z):
s = 1 / (1 + np.exp(-z))
return s
After going through the code and notes a few times was finally able to figure out the error.
First it needs calculating Z and then pass it to the sigmoid function, instead of X
Formula for Z = w(T)X+b. So in python this is calculated as below
Z=np.dot(w.T,X)+b
Then calculate A by passing z to sigmoid function
A = sigmoid(Z)
Then dw can be calculated as below
dw=np.dot(X,(A-Y).T)/m
Calculation of the other variables; cost and derivative of b will be as follows
cost = -1*((np.sum((Y*np.log(A))+((1-Y)*(np.log(1-A))),axis=1))/m)
db = np.sum((A-Y),axis=1)/m
def sigmoid(x):
#You have it right
return 1/(1 + np.exp(-x))
def derivSigmoid(x):
return sigmoid(x) * (1 - sigmoid(x))
error = targetSample - output
#Make sure to keep the sigmoided value around. For instance, an output that has already been sigmoided can be used to get the sigmoid derivative faster (output = sigmoid(x)):
dOutput = output * (1 - output)
Looks like you're already working on the backprop. Just thought I'd help simplify some of the forward prop for you.

Find the optimum combination

I am trying to optimise this: function [ LPS, LCE ] = runProject( Nw, Np, Nb) which calls some other functions I have written before. The idea is to find the optimum combination of Nw, Np, Nb AND keep the LPS=0, while LCE is minimum. Nw, Np, Nb should be positive integers. LCE will also be positive.
function [ LPS, LCE ] = runProject( Nw, Np, Nb)
%
% Detailed explanation goes here
[Pg, Pw, Pp] = Pgener();
[Pb, LPS] = Bat( Pg );
[LCE] = Constr(Pw, Pp, Nb)
end
However, I tried the gamultiobj solver from the Global Optimization Toolbox of matlab2015 (trial version) for a different approach with pareto front, but I got the error:
"Optimization running.
Error running optimization.
Not enough input arguments."
You should write your objective function like the following example:
function scores = rastriginsfcn(pop)
%RASTRIGINSFCN Compute the "Rastrigin" function.
% pop = max(-5.12,min(5.12,pop));
scores = 10.0 * size(pop,2) + sum(pop .^2 - 10.0 * cos(2 * pi .* pop),2);
As you can see, the function accepts all the inputs as a single vector pop.
With such representation I can evaluate the function as follows:
rastriginsfcn([2 3])
>> ans
13
Still for running the optimization from the toolbox you have to mention the number of variables, for instance, in my example it is equal to 2:
[x fval exitflag] = ga(#rastriginsfcn, 2)
It is the same for the multi-objective optimization. Check the following image from MATHWORKS: