Numba: UntypedAttributeError in class method - numba

I have the following class and method that should convolve an array with a kernel.
import numpy as np
from numpy.fft import fft2 as FFT, ifft2 as IFFT
from PIL import Image
from tqdm import trange, tqdm
from numba import jit
from time import sleep

import _kernel

class convolve(object):
    """ contains methods to convolve two images """
    def __init__(self, image_array, kernel):
        self.array = image_array
        self.kernel = kernel

        self.__rangeX_ = self.array.shape[0]
        self.__rangeY_ = self.array.shape[1]
        self.__rangeKX_ = self.kernel.shape[0]
        self.__rangeKY_ = self.kernel.shape[1]

        if (self.__rangeKX_ >= self.__rangeX_ or \
            self.__rangeKY_ >= self.__rangeY_):
            raise ValueError('Must submit suitable sizes for convolution.')

    @jit(nopython=True)
    def spaceConv(self):
        """ normal convolution, O(N^2*n^2). This is usually too slow """

        # pad array for convolution
        offsetX = self.__rangeKX_ // 2
        offsetY = self.__rangeKY_ // 2

        self.array = np.pad(self.array, \
            [(offsetY, offsetY), (offsetX, offsetX)], \
            mode='constant', constant_values=0)

        # this is the O(N^2) part of this algorithm
        for i in xrange(self.__rangeX_ - 2*offsetX):
            for j in xrange(self.__rangeY_ - 2*offsetY):
                # Now O(n^2) portion
                total = 0.0
                for k in xrange(2*offsetX+1):
                    for t in xrange(2*offsetY+1):
                        total += self.kernel[k][t] * self.array[i+k][j+t]
                self.array[i+offsetX][j+offsetY] = total

        return self.array
As an additional note (in case anyone asks), _kernel just generates specific kernels one may want to convolve the image with (e.g. Gaussian, Moffat, etc.), so it has nothing to do with this class.
When I call the above class on an image and kernel, I get the following error:
Traceback (most recent call last):
File "fftconv.py", line 147, in <module>
plt.imshow(conv.spaceConv(), interpolation='none', cmap='gray')
File "/root/anaconda2/lib/python2.7/site-packages/numba/dispatcher.py", line 304, in _compile_for_args
raise e
numba.errors.UntypedAttributeError: Caused By:
Traceback (most recent call last):
File "/root/anaconda2/lib/python2.7/site-packages/numba/compiler.py", line 249, in run
stage()
File "/root/anaconda2/lib/python2.7/site-packages/numba/compiler.py", line 465, in stage_nopython_frontend
self.locals)
File "/root/anaconda2/lib/python2.7/site-packages/numba/compiler.py", line 789, in type_inference_stage
infer.propagate()
File "/root/anaconda2/lib/python2.7/site-packages/numba/typeinfer.py", line 717, in propagate
raise errors[0]
UntypedAttributeError: Unknown attribute "rangeKX" of type pyobject
File "fftconv.py", line 45
[1] During: typing of get attribute at fftconv.py (45)
Failed at nopython (nopython frontend)
Unknown attribute "rangeKX" of type pyobject
File "fftconv.py", line 45
[1] During: typing of get attribute at fftconv.py (45)
This error may have been caused by the following argument(s):
- argument 0: cannot determine Numba type of value <__main__.convolve object at 0xaff5628c>
Usually I'm pretty good at tracing through Python errors to the cause, but because I'm not familiar with the inner workings of Numba, I'm not sure why it doesn't know what type offsetX is. Any suggestions?

One step Numba performs is type inference. This assigns a type to every value present in the function so that it can compile it (in a way that runs fast).
The error means that Numba doesn't understand the first input argument of the function (self in this case). Numba works best on plain functions whose arguments are scalars or arrays (all numeric). One option would be to move the O(n^2) loop into a function of its own, have that function receive the arrays and any other values explicitly, and decorate that function with numba.njit (or numba.jit(nopython=True), which is equivalent), as sketched below.
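A rough sketch of that refactor (the helper name and the separate output buffer are choices made for this illustration, not the author's exact code):
import numpy as np
from numba import njit

@njit
def space_conv(padded, kernel, out):
    # padded: the zero-padded image, kernel: the convolution kernel,
    # out: preallocated result array. All arguments are plain NumPy
    # arrays, so Numba can infer every type without seeing the class.
    kx, ky = kernel.shape
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            total = 0.0
            for k in range(kx):
                for t in range(ky):
                    total += kernel[k, t] * padded[i + k, j + t]
            out[i, j] = total
    return out
The spaceConv method would then just pad the image and delegate the loops to this helper.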
Also worth a try is running the code "as is", just removing the nopython=True. If the performance is good enough, leave it alone :). That may happen, as numba.jit is able to detect loops inside the code that can be compiled in "nopython" mode and automatically does what is needed so that the loop itself is compiled at full speed. The explicit nopython=True keyword disables that loop-lifting fallback, though.

Related

Why am I getting 'isinstance': Cannot determine Numba type?

I am new to Numba. I am trying to accelerate a pretty complicated solver. However, I keep getting an error such as
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Untyped global name 'isinstance': Cannot determine Numba type of <class 'builtin_function_or_method'>
I wrote a small example to reproduce the same error:
import numba
import numpy as np
from numba import types
from numpy import zeros_like, isfinite
from numpy.linalg import solve
from numpy.random import uniform

@numba.njit(parallel=True)
def foo(A_, b_, M1=None, M2=None):
    x_ = zeros_like(b_)
    r = b_ - A_.dot(x_)
    flag = 1
    if isinstance(M1, types.NoneType):  # Error here
        y = r
    else:
        y = solve(M1, r)
    if not isfinite(y).any():
        flag = 2
    if isinstance(M2, types.NoneType):
        z = y
    else:
        z = solve(M2, y)
    if not isfinite(z).any():
        flag = 2
    return z, flag

N = 10
tmp = np.random.rand(N, N)
A = np.dot(tmp, tmp.T)
x = np.zeros((N, 1), dtype=np.float64)
b = np.vstack([uniform(0.0, 1.0) for i in range(N)])

X_1, info = foo(A, b)
Also if I change the decorator to generated_jit() I get the following error:
r = b_ - A_.dot(x_)
AttributeError: 'Array' object has no attribute 'dot'
Numba compiles the function and requires every variable to be statically typed. This means that each variable has only one unique type: a variable cannot be of type NoneType and of some other type at the same time, as opposed to CPython, which is based on dynamic typing. Dynamic typing is also a major source of CPython's slowdown. Thus, using isinstance in nopython JITed Numba functions does not make much sense. In fact, this built-in function is not supported.
That being said, Numba supports optional arguments by specifying optional(ArgumentType) in the signature (note that the resulting type of the variable is optional(ArgumentType), not ArgumentType nor NoneType). You can then test whether the argument is set using if yourArgument is None:. I do not know what the types of M1 and M2 are in your code, but they need to be explicitly declared as optional in the signature.
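For illustration, a minimal sketch under assumed shapes and dtypes (not the original solver): replace the isinstance checks with is None tests. Numba prunes the unreachable branch when an argument is left as None, so typing succeeds; an explicit signature with optional(...) arguments can also be passed to njit, as described above.
import numpy as np
from numba import njit

@njit
def foo(A_, b_, M1=None, M2=None):
    x_ = np.zeros_like(b_)
    r = b_ - A_.dot(x_)
    flag = 1
    if M1 is None:            # instead of isinstance(M1, types.NoneType)
        y = r
    else:
        y = np.linalg.solve(M1, r)
    if not np.isfinite(y).any():
        flag = 2
    if M2 is None:
        z = y
    else:
        z = np.linalg.solve(M2, y)
    if not np.isfinite(z).any():
        flag = 2
    return z, flag

N = 10
tmp = np.random.rand(N, N)
A = tmp @ tmp.T
b = np.random.uniform(0.0, 1.0, size=(N, 1))
X_1, info = foo(A, b)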

SARIMAX statsmodel weird error in Databricks

I'm running a grid search optimization in a Databricks notebook. The same code runs on my local machine, but when I try to run it on Databricks I get a TypeError as follows:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The fitting process I'm running is this (note it uses fixed p,d,q,P,D,Q,m values, as I need to check why no models are being fitted):
exodus_train = np.array(np.random.normal(2, 1, size=(25, 1)))

model = sm.tsa.statespace.SARIMAX(train,
                                  order=[2, 0, 0],
                                  exog=exodus_train,
                                  seasonal_order=[2, 0, 0, 12],
                                  enforce_stationarity=False,
                                  enforce_invertibility=False).fit()
Then it throws a TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<command-1275539631463044> in <module>
4 seasonal_order=[2,0,0,12],
5 enforce_stationarity=False,
----> 6 enforce_invertibility=False).fit()
/databricks/python/lib/python3.7/site-packages/statsmodels/tsa/statespace/mlemodel.py in fit(self, start_params, transformed, cov_type, cov_kwds, method, maxiter, full_output, disp, callback, return_params, optim_score, optim_complex_step, optim_hessian, flags, **kwargs)
430 """
431 if start_params is None:
--> 432 start_params = self.start_params
433 transformed = True
434
/databricks/python/lib/python3.7/site-packages/statsmodels/tsa/statespace/sarimax.py in start_params(self)
966 # Although the Kalman filter can deal with missing values in endog,
967 # conditional sum of squares cannot
--> 968 if np.any(np.isnan(endog)):
969 mask = ~np.isnan(endog).squeeze()
970 endog = endog[mask]
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
In case this happens to someone else: this will happen if your time series values use commas as the decimal separator, or if your column is not a float.
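For example, a small sketch of that cleanup (the series values here are made up for illustration):
import pandas as pd

# a series that was read in with comma decimal separators
train = pd.Series(["1,25", "2,50", "3,75"])
train = train.str.replace(",", ".", regex=False).astype(float)
print(train.dtype)  # float64

# alternatively, handle it at read time:
# df = pd.read_csv("my_series.csv", decimal=",")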

Getting AttributeError: 'OneHotEncoder' object has no attribute '_jdf' in pyspark

I am trying to implement a gradient boosting algorithm on a Kaggle dataset in pyspark for learning purposes. I am facing the error given below:
Traceback (most recent call last):
File "C:/SparkCourse/Gradientboost.py", line 29, in <module>
output=assembler.transform(data)
File "C:\spark\python\lib\pyspark.zip\pyspark\ml\base.py", line 105, in transform
File "C:\spark\python\lib\pyspark.zip\pyspark\ml\wrapper.py", line 281, in _transform
AttributeError: 'OneHotEncoder' object has no attribute '_jdf'
The corresponding code is:
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer,VectorIndexer,OneHotEncoder,VectorAssembler
spark=SparkSession.builder.config("spark.sql.warehouse.dir", "file:///C:/temp").appName("Gradientboostapp").enableHiveSupport().getOrCreate()
data= spark.read.csv("C:/Users/codemen/Desktop/Timeseries Analytics/liver_patient.csv",header=True, inferSchema=True)
#data.show()
print(data.count())
#data.printSchema()
print("After deleting null values")
data=data.na.drop()
print(data.count())
data=StringIndexer(inputCol="Gender",outputCol="GenderIndex").fit(data)
#let onehot encode the data
data=OneHotEncoder(inputCol="GenderIndex",outputCol="gendervec")
usedfeature=["Age","gendervec","Total_Bilirubin","Direct_Bilirubin","Alkaline_Phosphotase","Alamine_Aminotransferase","Aspartate_Aminotransferase","Total_Protiens","Albumin","Albumin_and_Globulin_Ratio"]
#
assembler=VectorAssembler(inputCols=usedfeature,outputCol="features")
output=assembler.transform(data)
output.select("features","category").show()
I have converted the Gender category into numerical form by using StringIndexer, then I tried to perform OneHotEncoding on the GenderIndex value. I get the error when VectorAssembler is applied in the code. Maybe I am missing a very silly concept here. Kindly help me figure it out.
This line of code is incorrect: data=OneHotEncoder(inputCol="GenderIndex",outputCol="gendervec"). You are setting data to be equal to the OneHotEncoder() object, not transforming the data. You need to call a transform to encode the data. It should look like this.
encoder=OneHotEncoder(inputCol="GenderIndex",outputCol="gendervec")
data = encoder.transform(data)
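For context, a sketch of how those steps could fit together end to end, using the column names from the question (note that the StringIndexer line in the question appears to follow the same pattern: fit() returns a model whose transform() still has to be called):
indexer = StringIndexer(inputCol="Gender", outputCol="GenderIndex").fit(data)
data = indexer.transform(data)    # adds the GenderIndex column

encoder = OneHotEncoder(inputCol="GenderIndex", outputCol="gendervec")
data = encoder.transform(data)    # adds the gendervec column

assembler = VectorAssembler(inputCols=usedfeature, outputCol="features")
output = assembler.transform(data)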

Can operations on a numpy.memmap be deferred?

Consider this example:
import numpy as np
a = np.array(1)
np.save("a.npy", a)
a = np.load("a.npy", mmap_mode='r')
print(type(a))
b = a + 2
print(type(b))
which outputs
<class 'numpy.core.memmap.memmap'>
<class 'numpy.int32'>
So it seems that b is not a memmap any more, and I assume that this forces numpy to read the whole a.npy, defeating the purpose of the memmap. Hence my question, can operations on memmaps be deferred until access time?
I believe subclassing ndarray or memmap could work, but don't feel confident enough about my Python skills to try it.
Here is an extended example showing my problem:
import numpy as np

# create 8 GB file
# np.save("memmap.npy", np.empty([1000000000]))

# I want to print the first value using f and memmaps
def f(value):
    print(value[1])

# this is fast: f receives a memmap
a = np.load("memmap.npy", mmap_mode='r')
print("a = ")
f(a)

# this is slow: b has to be read completely; converted into an array
b = np.load("memmap.npy", mmap_mode='r')
print("b + 1 = ")
f(b + 1)
Here's a simple example of an ndarray subclass that defers operations on it until a specific element is requested by indexing.
I'm including this to show that it can be done, but it almost certainly will fail in novel and unexpected ways, and require substantial work to make it usable.
For a very specific case it may be easier than redesigning your code to solve the problem in a better way.
I'd recommend reading over these examples from the docs to help understand how it works.
import numpy as np

class Defered(np.ndarray):
    """
    An array class that defers calculations applied to it, only
    calculating them when an index is requested
    """
    def __new__(cls, arr):
        arr = np.asanyarray(arr).view(cls)
        arr.toApply = []
        return arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        ## Convert all arguments to ndarray, otherwise arguments
        #  of type Defered will cause infinite recursion
        #  also store self as None, to be replaced later on
        newinputs = []
        for i in inputs:
            if i is self:
                newinputs.append(None)
            elif isinstance(i, np.ndarray):
                newinputs.append(i.view(np.ndarray))
            else:
                newinputs.append(i)

        ## Store function to apply and necessary arguments
        self.toApply.append((ufunc, method, newinputs, kwargs))
        return self

    def __getitem__(self, idx):
        ## Get index and convert to regular array
        sub = self.view(np.ndarray).__getitem__(idx)

        ## Apply stored actions
        for ufunc, method, inputs, kwargs in self.toApply:
            inputs = [i if i is not None else sub for i in inputs]
            sub = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)

        return sub
This will fail if modifications are made to it that don't use NumPy's universal functions. For instance, percentile and median aren't based on ufuncs, and would end up loading the entire array. Likewise, if you pass it to a function that iterates over the array, or you index a substantial part of it, the entire array will be loaded.
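For what it's worth, a small usage sketch of the class above (assuming the memmap.npy file from the earlier example exists):
a = np.load("memmap.npy", mmap_mode='r')
d = Defered(a)   # still backed by the memmap
d = d + 1        # the ufunc is recorded, not evaluated
print(d[1])      # only now is that element read and the +1 applied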
This is just how Python works. By default NumPy operations return a new array, so b never exists as a memmap - it is created when + is called on a.
There are a couple of ways to work around this. The simplest is to do all operations in place:
a += 1
This requires loading the memory mapped array for reading and writing,
a = np.load("a.npy", mmap_mode='r+')
Of course this isn't any good if you don't want to overwrite your original array.
In this case you need to specify that b should be memmapped.
b = np.memmap("b.npy", mode='w+', dtype=a.dtype, shape=a.shape)
The assignment can then be done using the out keyword provided by NumPy ufuncs.
np.add(a, 2, out=b)
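Putting those pieces together, a short sketch of the memmap-to-memmap pattern (file names as in the question, assuming a.npy holds a non-trivial array):
import numpy as np

a = np.load("a.npy", mmap_mode='r')                               # read-only memmap
b = np.memmap("b.npy", mode='w+', dtype=a.dtype, shape=a.shape)   # writable memmap
np.add(a, 2, out=b)                                               # results written straight into b
b.flush()                                                         # make sure the data reaches disk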

Python program continuing to run after encountering an error

I'm in the process of creating a program for my linear algebra class with vector and matrix classes, but I'm having trouble stringifying my matrix class in order to print it. The problem is caused by an if statement that adds a comma after an entry in the matrix if it's not the last entry in a row. What's curious is that I've isolated the problem to the part of my program that assigns a variable to the index of the entry at hand, but when I added a line after it that printed that variable to try to figure out what was happening, running the program printed the variable AND THEN gave the error from the line before. Here's the code:
import copy

class vector:
    def __init__(self, entries):
        if type(entries) == list:
            self.elements = []
            self.dimensionality = len(entries)
            for entry in entries:
                self.elements.append(entry)
        if type(entries) == vector:
            self.elements = entries.elements

    def __str__(self):
        buff = "("
        for e in self.elements:
            buff += str(e)
            if self.elements.index(e) < len(self.elements) - 1:
                buff += ", "
        buff += ")"
        return buff

    def __getitem__(self, index):
        return self.elements[index]

    def __len__(self):
        return len(self.elements)

    def __mul__(self, otherVector):
        if self.dimensionality != otherVector.dimensionality:
            raise RuntimeError("Cannot multiply vectors of different dimensions")
        else:
            product = 0
            for e in self.elements:
                product += e * otherVector.elements[self.elements.index(e)]
            return product

    def __eq__(self, otherVariable):
        return size(self) == size(otherVariable)

    def size(x):
        return (x * x)**(1/2)

class matrix:
    def __init__(self, entries):
        for i in entries:
            if len(entries[0]) != len(i):
                raise RuntimeError("All rows of matrix must contain the same number of entries")
        self.elements = []
        for row in entries:
            self.elements.append(vector(row))

    def __str__(self):
        buff = "("
        for row in self.elements:
            buff += str(row)
            a = self.elements.index(row)  # this is the line that prompts the error
            b = len(self.elements) - 1
            print(a)  # but this line executes before the error cuts off the rest of the program
            print(b)
            print(a < b)
            if a < b:
                buff += ", "
        buff += ")"
        return buff

print(matrix([[1,2],[2,3]]))
and here's the error it gives me:
Traceback (most recent call last):
File "/Users/sebpole/Documents/vectors.py", line 127, in <module>
print(matrix([[1,2],[2,3]]))
File "/Users/sebpole/Documents/vectors.py", line 83, in __str__
a = self.elements.index(row)
File "/Users/sebpole/Documents/vectors.py", line 38, in __eq__
return size(self) == size(otherVariable)
NameError: name 'size' is not defined
I fixed that specific error by skipping a definition of the function 'size()' and just writing it into the definition of vector equality. Since it was short, that wasn't a problem, and the program runs fine after that tweak, but I have the following two conceptual questions:
1) What's going on with the line after the error executing before the error did?
2) What was the problem exactly? Why did the program have a problem calling a function I defined a little later? Why did taking the index of a row of a matrix call the definition of equality for that row (a vector)?
Use self.size to refer to a function defined in your class.
The error is coming from the call to print in line 127, and the entire line is not being executed. Did you really see printed output in the console, other than the stack trace?
Conceptually, the line in question, print(matrix([[1,2],[2,3]])) does this:
the matrix instance is created successfully
print calls __str__ on that matrix instance
__str__ calls index on the list of vector instances
index needs to look through the list to find a matching value, and calls __eq__ on each member of the list to find the match
your original __eq__ code calls a missing function named size (which you noticed, and fixed)
I am surprised that this produced any output, other than the error.
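As an aside, one way to avoid the .index()/__eq__ round trip entirely (which also misbehaves when rows or entries compare equal) is to iterate with enumerate; a minimal sketch of the vector __str__ written that way:
def __str__(self):
    buff = "("
    for i, e in enumerate(self.elements):
        buff += str(e)
        if i < len(self.elements) - 1:   # position is known, no equality lookup needed
            buff += ", "
    buff += ")"
    return buff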