scipy integrate.quad return an incorrect value - scipy

i use scipy integrate.quad to calc cdf of normal distribution:
def nor(delta, mu, x):
return 1 / (math.sqrt(2 * math.pi) * delta) * np.exp(-np.square(x - mu) / (2 * np.square(delta)))
delta = 0.1
mu = 0
t = np.arange(4.0, 10.0, 1)
nor_int = lambda t: integrate.quad(lambda x: nor(delta, mu, x), -np.inf, t)
nor_int_vec = np.vectorize(nor_int)
s = nor_int_vec(t)
for i in zip(s[0],s[1]):
print i
while it print as follows:
(1.0000000000000002, 1.2506543424265854e-08)
(1.9563704110140217e-11, 3.5403445591955275e-11)
(1.0000000000001916, 1.2616577562700088e-08)
(1.0842532749783998e-34, 1.9621183122960244e-34)
(4.234531567162006e-09, 7.753407284370446e-09)
(1.0000000000001334, 1.757986959115912e-10)
for some x, it return a value approximate to zero, it should be return 1.
can somebody tell me what is wrong?

Same reason as in why does quad return both zeros when integrating a simple Gaussian pdf at a very small variance? but seeing as I can't mark it as a duplicate, here goes:
You are integrating a function with tight localization (at scale delta) over a very large (in fact infinite) interval. The integration routine can simply miss the part of the interval where the function is substantially different from 0, judging it to be 0 instead. Some guidance is required. The parameter points can be used to this effect (see the linked question) but since quad over an infinite interval does not support it, the interval has to be manually split, like so:
for t in range(4, 10):
int1 = integrate.quad(lambda x: nor(delta, mu, x), -np.inf, mu - 10*delta)[0]
int2 = integrate.quad(lambda x: nor(delta, mu, x), mu - 10*delta, t)[0]
print(int1 + int2)
This prints 1 or nearly 1 every time. I picked mu-10*delta as a point to split on, figuring most of the function lies to the right of it, no matter what mu and delta are.
Notes:
Use np.sqrt etc; there is usually no reason for put math functions in NumPy code. The NumPy versions are available and are vectorized.
Applying np.vectorize to quad is not doing anything besides making the code longer and slightly harder to read. Use a normal Python loop or list comprehension. See NumPy vectorization with integration

Related

Definite integration using int command

Firstly, I'm quite new to Matlab.
I am currently trying to do a definite integral with respect to y of a particular function. The function that I want to integrate is
(note that the big parenthesis is multiplying with the first factor - I can't get the latex to not make it look like power)
I have tried plugging the above integral into Desmos and it worked as intended. My plan was to vary the value of x and y and will be using for loop via matlab.
However, after trying to use the int function to calculate the definite integral with the code as follow:
h = 5;
a = 2;
syms y
x = 3.8;
p = 2.*x.^2+2.*a.*y;
q = x.^2+y.^2;
r = x.^2+a.^2;
f = (-1./sqrt(1-(p.^2./(4.*q.*r)))).*(2.*sqrt(q).*sqrt(r).*2.*a-p.*2.*y.*sqrt(r)./sqrt(q))./(4.*q.*r);
theta = int(f,y,a+0.01,h) %the integral is undefined at y=2, hence the +0.01
the result is not quite as expected
theta =
int(-((8*461^(1/2)*(y^2 + 361/25)^(1/2))/5 - (461^(1/2)*y*(8*y + 1444/25))/(5*(y^2 + 361/25)^(1/2)))/((1 - (4*y + 722/25)^2/((1844*y^2)/25 + 665684/625))^(1/2)*((1844*y^2)/25 + 665684/625)), y, 21/10, 5)
After browsing through various posts, the common mistake is the undefined interval but the +0.01 should have fixed it. Any guidance on what went wrong is much appreciated.
The Definite Integrals example in the docs shows exactly this type of output when a closed form cannot be computed. You can approximate it numerically using vpa, i.e.
F = int(f,y,a,h);
theta = vpa(F);
Or you can do a numerical computation directly
theta = vpaintegral(f,y,a,h);
From the docs:
The vpaintegral function is faster and provides control over integration tolerances.

Return values differ depending on vector size

I noticed that for example the log(x) function returns slightly different values when called with vectors of different sizes in MATLAB.
Here is a minimal working example:
x1 = 0.1:0.1:1;
x2 = 0.1:0.1:1.1;
y1 = log(x1);
y2 = log(x2);
d = y1 - y2(1:length(x1));
d(7)
Executing returns:
>> ans =
-1.6653e-16
The behaviour seems to start when the vector becomes greater than 10 entries.
Although the difference is very small, when being scaled by a lot of operations using large vectors, the errors became big enough to notice.
Does anyone have an idea what is happening here?
The differences exist in x1 and x2 and those errors are propagated and potentially accentuated by log.
max(abs(x1 - x2(1:numel(x1))))
% 1.1102e-16
This is due to the inability of floating point number to represent your data exactly. See here for more information.
Per Suever’s answer, this is because for unfathomable reasons, Matlab’s colon operator [start : step : stop] with floating-point step produces non-bit-exact results even when start and step are the same, and only stop is different.
This is wrong, although it’s not unknown: in a blog post from 2006 (search for “Typical MATLAB Pitfall”), Loren notes that : colon operator can suffer from floating-point accuracy issues.
Numpy/Python does it right:
import numpy as np
np.all(np.arange(0.1,1.0+1e-4, 0.1) == np.arange(0.1, 1.1+1e-4, 0.1)[:-1]) # => True
(np.arange(start, stop, step) doesn’t include stop so I use stop+1e-4 above.)
Julia does it right too:
all(collect(0.1 : 0.1 : 1) .== collect(0.1 : 0.1 : 1.1)[1:10]) # => true
Alternative. Here’s a straightforward guess as to what Numpy’s arange is doing, in Matlab:
function y = arange(start, stop, step)
%ARANGE An alternative to Matlab's colon operator
%
% The API for this function follows Numpy's arange [1].
%
% ARANGE(START, STOP, STEP) produces evenly-spaced values within the half-open
% interval [START, STOP). The resulting vector has CEIL((STOP - START) / STEP)
% elements and is roughly equivalent to (START : STEP : STOP - STEP / 2), but
% may differ from this COLON-based version due to numerical differences.
%
% ARANGE(START, STOP) assumes STEP of 1.0.
%
% [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html
if nargin < 3 || isempty(step)
step = 1.0;
end
len = ceil((stop - start) / step);
y = start + (0 : len - 1) * step;
This function tries to keep things exact until the last possible moment, when it applies the scaling by step and shifting by start. With this, your original two vectors are bit-exact over their shared interval:
y1 = arange(0.1, 1.0 + 1e-4, 0.1);
y2 = arange(0.1, 1.1 + 1e-4, 0.1);
all(y2(1:numel(y1)) == y1) % => 1
And therefore all downstream operations like log are also bit-exact.
I will investigate whether this bug in Matlab is causing any problems in our internal code and check if we should enforce using linspace (which I believe, but have not checked, does not suffer as much from accuracy issues) or something like arange above instead of : for floating-point steps. (arange also can be tricky because, as the docs note, depending on (stop-start)/step, you may get a vector whose last element is greater than stop sometimes—those same docs also recommend using linspace with non-unit steps.)

BFGS Fails to Converge

The model I'm working on is a multinomial logit choice model. It's a very specific dataset so other existing MNLogit libraries don't fit with my data.
So basically, it's a very complex function which takes 11 parameters and returns a loglikelihood value. Then I need to find the optimal parameter values that can minimize the loglikelihood using scipy.optimize.minimize.
Here are the problems that I encounter with different methods:
'Nelder-Mead’: it works well, and always give me the correct answer. However, it's EXTREMELY slow. For another function with a more complicated setup, it takes 15 hours to get to the optimal point. At the same time, the same function takes only 1 hour on Matlab using fminunc (which uses BFGS by default)
‘BFGS’: This is the method used by Matlab. It works well for any simply functions. However, for the function that I have, it always fails to converge and returns 'Desired error not necessarily achieved due to precision loss.’. I've spent lots of time playing around with the options but still failed to work.
'Powell': It quickly converges successfully but returns a wrong answer. The code is printed below (x0 is the correct answer, Nelder-Mead works for whatever initial value), and you can get the data here: https://www.dropbox.com/s/aap2dhor5jyxy94/data.csv
Thanks!
import pandas as pd
import numpy as np
from scipy.optimize import minimize
# https://www.dropbox.com/s/aap2dhor5jyxy94/data.csv
df = pd.read_csv('data.csv', index_col=0)
dfhh = df.hh
B = df.ix[:,'b0':'b4'].values # NT*5
P = df.ix[:,'p1':'p4'].values # NT*4
F = df.ix[:,'f1':'f4'].values # NT*4
SDV = df.ix[:,'lagb1':'lagb4'].values
def Li(x):
b1 = x[0] # coeff on prices
b2 = x[1] # coeff on features
a = x[2:7] # take first 4 values as alpha
E = np.exp(a + b1*P + b2*F) # (1*4) + (NT*4) + (NT*4) build matrix (NT*J) for each exp()
E = np.insert(E, 0, 1, axis=1) # (NT*5)
denom = E.sum(1)
return -np.log((B * E).sum(1) / denom).sum()
x0 = np.array([-32.31028223, 0.23965953, 0.84739154, 0.25418215,-3.38757007,-0.38036966])
np.random.seed(0)
x0 = x0 + np.random.rand(6)
minL = minimize(Li, x0, method='Nelder-Mead',options={'xtol': 1e-8, 'disp': True})
# minL = minimize(Li, x0, method='BFGS')
# minL = minimize(Li, x0, method='Powell', options={'xtol': 1e-12, 'ftol': 1e-12})
print minL
Update: 03/07/14 Simpler Version of the Code
Now Powell works well with very small tolerance, however the speed of Powell is slower than Nelder-Mead in this case. BFGS still fails to work.

Matlab gamma function: I get Inf for large values

I am writing my own code for the pdf of the multivariate t-distribution in Matlab.
There is a piece of code that includes the gamma function.
gamma((nu+D)/2) / gamma(nu/2)
The problem is that nu=1000, and so I get Inf from the gamma function.
It seems I will have to use some mathematical property of the gamma
function to rewrite it in a different way.
Thanks for any suggestions
You can use the function gammaln(x), which is the equivalent of log(gamma(x)) but avoids the overflow issue. The function you wrote is equivalent to:
exp(gammaln((nu+D)/2) - gammaln(nu/2))
The number gamma(1000/2) is larger than the maximum number MATLAB support. Thus it shows 'inf'. To see the maximum number in MATLAB, check realmax. For your case, if D is not very large, you will have to rewrite your formula. Let us assume that in your case 'D' is an even number. Then the formula you have will be: nu/2 * (nu/2 -1) * ....* (nu/2 - D/2 + 1).
sum1 = 1
for i = 1:D/2
sum1 = sum1*(nu/2 - i+1);
end
Then sum1 will be the result you want.

How to calculate the convolution of a function with itself in MATLAB and WolframAlpha?

I am trying to calculate the convolution of
x(t) = 1, -1<=t<=1
x(t) = 0, outside
with itself using the definition.
http://en.wikipedia.org/wiki/Convolution
I know how to do using the Matlab function conv, but I want to use the integral definition. My knowledge of Matlab and WolframAlpha is very limited.
I am still learning Mathematica myself, but here is what I came up with..
First we define the piecewise function (I am using the example from the Wikipedia page)
f[x_] := Piecewise[{{1, -0.5 <= x <= 0.5}}, 0]
Lets plot the function:
Plot[f[x], {x, -2, 2}, PlotStyle -> Thick, Exclusions -> None]
Then we write the function that defines the convolution of f with itself:
g[t_] = Integrate[f[x]*f[t - x], {x, -Infinity, Infinity}]
and the plot:
Plot[g[t], {t, -2, 2}, PlotStyle -> Thick]
EDIT
I tried to do the same in MATLAB/MuPad, I wasn't as successful:
f := x -> piecewise([x < -0.5 or x > 0.5, 0], [x >= -0.5 and x <= 0.5, 1])
plot(f, x = -2..2)
However when I try to compute the integral, it took almost a minute to return this:
g := t -> int(f(x)*f(t-x), x = -infinity..infinity)
the plot (also took too long)
plot(g, t = -2..2)
Note the same could have been done from inside MATLAB with the syntax:
evalin(symengine,'<MUPAD EXPRESSIONS HERE>')
Mathematica actually has a convolve function. The documentation on it has a number of different examples:
http://reference.wolfram.com/mathematica/ref/Convolve.html?q=Convolve&lang=en
I don't know much about Mathematica so I can only help you (partially) about the Matlab part.
To do the convolution with the Matlab conv functions means you do it numerically. What you mean with the integral definition is that you want to do it symbolically. For this you need the Matlab Symbolic Toolbox. This is essentially a Maple plugin for Matlab. So what you want to know is how it works in Maple.
You might find this link useful (wikibooks) for an introduction on the MuPad in Matlab.
I tried to implement your box function as a function in Matlab as
t = sym('t')
f = (t > -1) + (t < 1) - 1
However this does not work when t is ob symbolic type Function 'gt' is not implemented for MuPAD symbolic objects.
You could declare f it as a piecewise function see (matlab online help), this did not work in my Matlab though. The examples are all Maple syntax, so they would work in Maple straight away.
To circumvent this, I used
t = sym('t')
s = sym('s')
f = #(t) heaviside(t + 1) - heaviside(t - 1)
Unfortunately this is not successful
I = int(f(s)*f(t-s),s, 0, t)
gives
Warning: Explicit integral could not be found.
and does not provide an explicit result. You can still evaluate this expression, e.g.
>> subs(I, t, 1.5)
ans =
1/2
But Matlab/MuPad failed to give you and explicit expression in terms of t. This is not really unexpected as the function f is discontinuous. This is not so easy for symbolic computations.
Now I would go and help the computer, fortunately for the example that you asked the answer is very easy. The convolution in your example is simply the int_0^t BoxFunction(s) * BoxFunction(t - s) ds. The integrant BoxFunction(s) * BoxFunction(t - s) is again a box function, just not one that goes from [-1,1] but to a smaller interval (that depends on t). Then you only need to integrate the function f(x)=1 over this smaller interval.
Some of those steps you would have to to by hand first, then you could try to re-enter them to Matlab. You don't even need a computer algebra program at all to get to the answer.
Maybe Matematica or Maple could actually solve this problem, remember that the MuPad shipped with Matlab is only a stripped down version of Maple. Maybe the above might still help you, it should give you the idea how things work together. Try to put in a nicer function for f, e.g. a polynomial and you should get it working.