Why am I getting 'isinstance': Cannot determine Numba type?

I am new to Numba. I am trying to accelerate a fairly complicated solver. However, I keep getting an error such as
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Untyped global name 'isinstance': Cannot determine Numba type of <class 'builtin_function_or_method'>
I wrote a small example to reproduce the same error:
import numba
import numpy as np
from numba import types
from numpy import zeros_like, isfinite
from numpy.linalg import solve
from numpy.random import uniform

@numba.njit(parallel=True)
def foo(A_, b_, M1=None, M2=None):
    x_ = zeros_like(b_)
    r = b_ - A_.dot(x_)
    flag = 1
    if isinstance(M1, types.NoneType):  # Error here
        y = r
    else:
        y = solve(M1, r)
    if not isfinite(y).any():
        flag = 2
    if isinstance(M2, types.NoneType):
        z = y
    else:
        z = solve(M2, y)
    if not isfinite(z).any():
        flag = 2
    return z, flag

N = 10
tmp = np.random.rand(N, N)
A = np.dot(tmp, tmp.T)
x = np.zeros((N, 1), dtype=np.float64)
b = np.vstack([uniform(0.0, 1.0) for i in range(N)])

X_1, info = foo(A, b)
Also, if I change the decorator to generated_jit(), I get the following error:
r = b_ - A_.dot(x_)
AttributeError: 'Array' object has no attribute 'dot'

Numba compiles the function and requires every variable to be statically typed. This means that each variable has exactly one type: a variable cannot be of type NoneType at one point and something else at another, in contrast to CPython, which is dynamically typed. (Dynamic typing is also a major source of CPython's slowness.) Using isinstance inside nopython-JITed Numba functions therefore does not make much sense, and in fact this built-in function is not supported.
That being said, Numba supports optional arguments: specify optional(ArgumentType) in the signature (note that the resulting type of the variable is then optional(ArgumentType), not ArgumentType or NoneType). You can then test whether the argument is set using if yourArgument is None:. I do not know the types of M1 and M2 in your code, but they need to be declared explicitly as optional in the signature.
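For example, here is a minimal sketch (my own reduced example, not the full solver from the question) in which a single optional preconditioner M1 is declared as optional(float64[:, ::1]), so it may be either a C-contiguous 2-D float64 array or None, and the branch tests is None instead of using isinstance:

import numpy as np
from numba import njit, optional, float64

@njit((float64[::1], optional(float64[:, ::1])))
def apply_precond(r, M1):
    # M1 has an optional(...) type: test it with "is None", not isinstance
    if M1 is None:
        return r.copy()               # no preconditioner: y = r
    return np.linalg.solve(M1, r)     # y = M1^{-1} r

r = np.random.rand(4)
print(apply_precond(r, None))         # takes the None branch
print(apply_precond(r, np.eye(4)))    # takes the solve branch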

Related

Passing Argument to a Generator to build a tf.data.Dataset

I am trying to build a TensorFlow dataset from a generator. I have a list of tuples called some_list, where each tuple holds an integer and some text.
When I do not pass some_list as an argument to the generator, the code works fine:
import tensorflow as tf
import random
import numpy as np

some_list = [(1, 'One'), [2, 'Two'], [3, 'Three'], [4, 'Four'],
             (5, 'Five'), [6, 'Six'], [7, 'Seven'], [8, 'Eight']]

def text_gen1():
    random.shuffle(some_list)
    size = len(some_list)
    i = 0
    while True:
        yield some_list[i][0], some_list[i][1]
        i += 1
        if i > size:
            i = 0
            random.shuffle(some_list)

# Not passing any argument
tf_dataset1 = tf.data.Dataset.from_generator(text_gen1, output_types=(tf.int32, tf.string),
                                             output_shapes=((), ()))

for count_batch in tf_dataset1.repeat().batch(3).take(2):
    print(count_batch)
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([7, 1, 2])>, <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'Seven', b'One', b'Two'], dtype=object)>)
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([3, 5, 4])>, <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'Three', b'Five', b'Four'], dtype=object)>)
However, when I try to pass some_list as an argument, the code fails:
def text_gen2(file_list):
    random.shuffle(file_list)
    size = len(file_list)
    i = 0
    while True:
        yield file_list[i][0], file_list[i][1]
        i += 1
        if i > size:
            i = 0
            random.shuffle(file_list)

tf_dataset2 = tf.data.Dataset.from_generator(text_gen2, args=[some_list],
                                             output_types=(tf.int32, tf.string),
                                             output_shapes=((), ()))

for count_batch in tf_dataset2.repeat().batch(3).take(2):
    print(count_batch)
ValueError: Can't convert Python sequence with mixed types to Tensor.
I noticed that when I pass a list of integers as an argument, the code works; however, a list of tuples seems to make it crash. Can someone shed some light on this?
The problem is exactly what the error says: you cannot have heterogeneous data types (int and str) in the same tf.Tensor, and each entry of args is converted to a tensor. I made a few changes and came up with the code below.
Separate some_list into two lists using zip(), i.e. int_list and str_list, and make your generator function accept the two lists.
Also, I don't understand why you're manually shuffling within the generator; you can do it more cleanly using tf.data.Dataset.shuffle().
import tensorflow as tf
import random
import numpy as np

some_list = [(1, 'One'), [2, 'Two'], [3, 'Three'], [4, 'Four'],
             (5, 'Five'), [6, 'Six'], [7, 'Seven'], [8, 'Eight']]

def text_gen2(int_list, str_list):
    for x, y in zip(int_list, str_list):
        yield x, y

tf_dataset2 = tf.data.Dataset.from_generator(
    text_gen2,
    args=list(zip(*some_list)),
    output_types=(tf.int32, tf.string), output_shapes=((), ())
)

i = 0
for count_batch in tf_dataset2.repeat().batch(4).shuffle(buffer_size=6):
    print(count_batch)
    i += 1
    if i > 10:
        break
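As a side note, args=list(zip(*some_list)) works because zip(*some_list) transposes the list of pairs into one tuple of ints and one tuple of strings, each of which TensorFlow can convert to a homogeneous tensor. A tiny standalone illustration:

pairs = [(1, 'One'), (2, 'Two'), (3, 'Three')]
int_part, str_part = zip(*pairs)
print(int_part)  # (1, 2, 3)
print(str_part)  # ('One', 'Two', 'Three')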

Minimize a sympy expression

I have a sympy expression that depends on a variable x, and I want to find the value of x for which the expression is minimized. This is my code so far:
import numpy as np
from sympy import *
from scipy.optimize import minimize as scipy_min

x = Symbol('x')
p = Symbol('p')
f = exp(-(x - p)**2 / 2) / sqrt(2 * pi)
func = lambdify([x, p], f)

def func_np(x):
    return func(x, 2.2)

res = scipy_min(func_np, x, method='Nelder-Mead', tol=1e-6)
However, I am getting the error: can't convert expression to float. Can someone help me with this? Thank you!
The second argument of minimize is an initial guess: a number, not a variable. You are passing a sympy.Symbol, which is definitely not a number. It is fine to minimize a lambdified function; however, be aware that lambdify is (relatively) slow, so it can be better to print(expression) and write a def manually.
import numpy as np
from sympy import *
from scipy.optimize import minimize as scipy_min

x = Symbol('x')
p = Symbol('p')
f = exp(-(x - p)**2 / 2) / sqrt(2 * pi)
func = lambdify([x, p], f)

def func_np(x):
    return func(x, 2.2)

res = scipy_min(func_np, 1.0, method='Nelder-Mead', tol=1e-6)
print(res.x)
yields -37.3. However, that is not a real solution: this particular function tends to 0 as x goes toward ±∞, so the minimizer just slides down one tail until the function is numerically flat.
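If the actual goal was to locate the peak of this Gaussian bump rather than a minimum, one option (an assumption about the intent, reusing func from above) is to minimize the negated function, which recovers the maximum at x = p = 2.2:

res = scipy_min(lambda x: -func(x, 2.2), 1.0, method='Nelder-Mead', tol=1e-6)
print(res.x)  # approximately [2.2], the location of the maximum of f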

Reshape of Inducing Variables - GPflow

I have an SGPR model:
import numpy as np
import gpflow
X, Y = np.random.randn(50, 2), np.random.randn(50, 1)
Z1 = np.random.randn(13, 2)
k = gpflow.kernels.SquaredExponential()
m = gpflow.models.SGPR(data=(X, Y), kernel=k, inducing_variable=Z1)
And I would like to assign a new value to the inducing variable, but with a different shape, like:
Z2 = np.random.randn(29, 2)
m.inducing_variable.Z.assign(Z2)
But if I do, I get:
ValueError: Shapes (13, 2) and (29, 2) are incompatible
Is there a way to reassign the inducing variables without redefining the model?
Context: instead of optimizing the model together with the inducing variables, I would like to optimize the model while keeping the inducing variables out of the optimizer and manually reassigning them at each step of the optimization.
UPDATE: This issue is resolved by https://github.com/GPflow/GPflow/pull/1594, which will become part of the next GPflow patch release (2.1.4).
With that fix, you don't need a custom class. All you need to do is explicitly set the static shape with None along the first dimension:
import tensorflow as tf

inducing_variable = gpflow.inducing_variables.InducingPoints(
    tf.Variable(
        Z1,  # initial value
        trainable=False,  # True does not work - see note below
        shape=(None, Z1.shape[1]),  # or even tf.TensorShape(None)
        dtype=gpflow.default_float(),  # required due to tf's 32-bit default
    )
)
m = gpflow.models.SGPR(data=(X, Y), kernel=k, inducing_variable=inducing_variable)
Then m.inducing_variable.Z.assign(Z2) should work just fine.
Note that in this case Z cannot be trainable, as the TensorFlow optimizers need to know the shape at construction time and don't support dynamic shapes.
Right now (as of GPflow 2.1.2) there is no built-in way to change the shape of the inducing variables of an SGPR model, though it is in principle possible. You can get what you want with your own inducing-variable class, though:
class VariableInducingPoints(gpflow.inducing_variables.InducingPoints):
    def __init__(self, Z, name=None):
        super().__init__(Z, name=name)
        # overwrite with a Variable that has None as the first element of its
        # shape, so we can assign arrays of arbitrary length along this dimension:
        self.Z = tf.Variable(Z, dtype=gpflow.default_float(),
                             shape=(None, Z.shape[1]))

    def __len__(self):
        # dynamic shape, instead of the static shape returned by the
        # InducingPoints parent class
        return tf.shape(self.Z)[0]
and then do
m = gpflow.models.SGPR(
    data=(X, Y), kernel=k, inducing_variable=VariableInducingPoints(Z1)
)
instead. Then your m.inducing_variable.Z.assign() should work as you like it.
(For SVGP, the size of the inducing variable and the distribution defined by q_mu and q_sqrt has to match, as well as be known at construction time, so in this case changing the number of inducing variables is less trivial.)
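With either approach in place, a rough sketch of the loop described in the question (assuming GPflow 2's model.training_loss and a stock TensorFlow optimizer; the random reassignment is just a stand-in for your own update rule) could look like:

import tensorflow as tf

opt = tf.optimizers.Adam()
for _ in range(100):
    # manually reassign the (non-trainable) inducing points at each step
    m.inducing_variable.Z.assign(np.random.randn(29, 2))
    # Z is excluded from m.trainable_variables because trainable=False
    opt.minimize(m.training_loss, var_list=m.trainable_variables)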

How to convert symbol variable to iterable variable?

I am trying to minimize a function using scipy.optimize. Here is my program; the last line is the error message.
import sympy as s
from scipy.optimize import minimize

x, y, z = s.symbols('x y z')
f = lambda z: x**2 - y**2
bnds = ((70, None), (4, 6))
res = minimize(lambda z: fun(*x), (70, 4), bounds=bnds)
<lambda>() argument after * must be an iterable, not Symbol
How do I convert a symbol to an iterable, or define an iterable directly?
In Python, calling a function with f(*x) means f(x[0], x[1], ...). That is, it expects x to be a tuple (or other iterable), and the function should have a definition like
def f(*args):
<use args tuple>
I'm not quite sure what you are trying to do with the sympy code, or why you are using it instead of defining a function in Python/numpy directly.
A function like:
def f(z):
    x, y = z  # expand it to 2 variables
    return x**2 - y**2
should work in a minimize call with:
minimize(f, (10,3))
which will vary x and y starting with (10,3) seeking to minimize the f value.
In [20]: minimize(f, (70,4), bounds=((70,None),(4,6)))
Out[20]:
      fun: 4864.0
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 139.99988369,  -11.99996404])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 9
      nit: 1
   status: 0
  success: True
        x: array([ 70.,   6.])
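If you do want to keep the sympy definition of the objective, one sketch (my own bridge code, not part of the original question) is to lambdify the expression into a plain numeric function and unpack the parameter vector in a small wrapper:

import sympy as s
from scipy.optimize import minimize

x, y = s.symbols('x y')
expr = x**2 - y**2
f_num = s.lambdify((x, y), expr)  # plain numeric function of two scalars

res = minimize(lambda z: f_num(z[0], z[1]), (70, 4), bounds=((70, None), (4, 6)))
print(res.x)  # array([70., 6.]), matching the run above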

scipy.optimize failure with a "vectorized" implementation

I have a 1-D optimization problem coded in two ways: one using a for loop and the other using numpy arrays. The for-loop version works fine, but the numpy one fails.
Actually it is a bit more complicated: the numpy version can work with different starting points (!!), or if I choose another optimization algorithm such as CG.
The two versions (functions and gradients) give the same results, and the returned types are also the same as far as I can tell.
Here is my example; what am I missing?
import numpy as np
from scipy.optimize import minimize

# local params
v1 = np.array([1., 1.])
v2 = np.array([1., 2.])

# local functions
def f1(x):
    s = 0
    for i in range(len(v1)):
        s += (v1[i]*x - v2[i])**2
    return 0.5*s/len(v1)

def df1(x):
    g = 0
    for i in range(len(v1)):
        g += v1[i]*(v1[i]*x - v2[i])
    return g/len(v1)

def f2(x):
    return 0.5*np.sum((v1*x - v2)**2)/len(v1)

def df2(x):
    return np.sum(v1*(v1*x - v2))/len(v1)

x0 = 10.  # x0 = 2 works

# tests...
assert np.abs(f1(x0)-f2(x0)) < 1.e-6 and np.abs(df1(x0)-df2(x0)) < 1.e-6 \
    and np.abs((f1(x0+1.e-6)-f1(x0))/(1.e-6)-df1(x0)) < 1.e-4

# BFGS for f1: OK
o = minimize(f1, x0, method='BFGS', jac=df1)
if not o.success:
    print('FAILURE', o)
else:
    print('SUCCESS min = %f reached at %f' % (f1(o.x[0]), o.x[0]))

# BFGS for f2: failure
o = minimize(f2, x0, method='BFGS', jac=df2)
if not o.success:
    print('FAILURE', o)
else:
    print('SUCCESS min = %f reached at %f' % (f2(o.x[0]), o.x[0]))
The error I get is
A1 = I - sk[:, numpy.newaxis] * yk[numpy.newaxis, :] * rhok
IndexError: invalid index to scalar variable.
but it doesn't really help me, since the same code can work with some other starting values.
I am using a completely fresh Python install (Python 3.5.2, SciPy 0.18.1 and NumPy 1.11.3).
The solver expects the return value of the Jacobian df2 to have the same shape as its input x. Even though you passed a scalar in, it is actually converted into a single-element ndarray. Since you used np.sum, your result became a scalar, and that causes strange things to happen.
Wrap the scalar result of df2 in np.array, and your code should work.
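Concretely, one way to apply that fix (my variant, returning a length-1 array so the shape matches x) is:

def df2(x):
    # return an array with the same shape as x, not a bare scalar
    return np.array([np.sum(v1*(v1*x - v2))/len(v1)])

With this change, the BFGS run for f2 should succeed just like the f1 version.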