Cryptic Python error: 'classobj' object has no attribute '__getitem__'. Why am I getting this?

I really wish I could be more specific here, but I have read through related questions and none of them seem to relate to this issue, and I don't understand it myself. This is for a homework assignment, so I am hesitant to put up all my code for the program; here is a stripped-down version. Run it and you will see the issue.
import copy

class Ordering:
    def __init__(self, tuples):
        self.pairs = copy.deepcopy(tuples)
        self.sorted = []
        self.unsorted = []
        for x in self.pairs:
            self.addUnsorted(left(x))
            self.addUnsorted(right(x))

    def addUnsorted(self, item):
        isPresent = False
        for x in self.unsorted:
            if x == item:
                isPresent = True
        if isPresent == False:
            self.unsorted.append(left(item))
Here I have created a class, Ordering, that takes a list of the form [('A', 'B'), ('C', 'B'), ('D', 'A')] (where 'A' must come before 'B', 'C' must come before 'B', etc.) and is supposed to return it in partially ordered form. I am trying to debug the code to see whether it works correctly, but I have not been able to get that far because of the error message I get back.
When I input the following in my terminal:
print Ordering[('A', 'B'), ('C', 'B'), ('D', 'A')]
I get back the following error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'classobj' object has no attribute '__getitem__'
Why is this?!

To access an element of a list, use square brackets. To instantiate a class, use parens.
In other words, do not use:
print Ordering[('A', 'B'), ('C', 'B'), ('D', 'A')]
Use:
print Ordering((('A', 'B'), ('C', 'B'), ('D', 'A')))
This will generate another error from deeper in the code but, since this is a homework assignment, I will let you think about that one a bit.
How to use __getitem__:
As a minimal example, here is a class that returns squares via __getitem__:
class HasItems(object):
    def __getitem__(self, key):
        return key**2
In operation, it looks like this:
>>> a = HasItems()
>>> a[4]
16
Note the square brackets.
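If you actually wanted Ordering instances to answer to square brackets, you could give the class a __getitem__ that delegates to the stored pairs. A minimal sketch (my own illustration, not something the assignment requires):

class Ordering(object):
    def __init__(self, tuples):
        self.pairs = list(tuples)

    def __getitem__(self, key):
        # delegate indexing to the stored list of pairs
        return self.pairs[key]

# Ordering([('A', 'B'), ('C', 'B')])[0] would then return ('A', 'B')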

Answer to "Why is this?"
Your demo code is not complete (ref. the comment above); still, the .__getitem__ issue clearly comes from the print statement, which applies square brackets to the class object itself, rather than from anything inside the class. A quick REPL session shows both problems:
>>> aList = [ ('A','B'), ('C','D'), ('E','F')] # the stated format of input
>>> aList # validated to be a list
[('A', 'B'), ('C', 'D'), ('E', 'F')]
>>> type( aList ) # cross-validated
<type 'list'>
>>> for x in aList:                  # iterate over members
...     print x, type( x )           # show value and type
...     left( x )                    # call as in the demo-code
...
('A', 'B') <type 'tuple'>
Traceback (most recent call last):   <<< the demo-code never defines left()
  File "<stdin>", line 3, in <module>
NameError: name 'left' is not defined
>>> dir( Ordering ) # .__getitem__ method missing
[ '__doc__', '__init__', '__module__', 'addUnsorted']
>>> dir( aList[0] ) # .__getitem__ method present
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__getslice__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'count', 'index']
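For completeness, the missing helpers would presumably just pick the two halves of each pair. A sketch (an assumption on my part; the posted code never defines them):

def left(pair):
    # presumed meaning: the element that must come first
    return pair[0]

def right(pair):
    # presumed meaning: the element that must come second
    return pair[1]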

Related

Can I use a pandas-like string expression for filtering a DataFrame?

I am considering replacing my use of pandas with polars in a tool that allows users to input predicate expressions for filtering/subsetting data rows. This allows users to use expressions that the pandas.DataFrame.query method can parse, such as "x > 1", as a very simple example.
However, I can't seem to find a way to use the same types of string expressions with polars.DataFrame.filter so that I can swap out pandas for polars without requiring users to change their predicate expressions.
The only thing I've found that's close to my question is this posting: String as a condition in a filter
Unfortunately, that's not quite what I need, as it still requires a string expression like "pl.col('x') > 1" rather than simply "x > 1".
Is there a way to use the simpler ("agnostic") syntax with polars?
Using the example from the polars.DataFrame.filter docs:
>>> df = pl.DataFrame(
... {
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ["a", "b", "c"],
... }
... )
When calling df.filter, I'm forced to use expressions like the following:
pl.col("foo") < 3
(pl.col("foo") < 3) & (pl.col("ham") == "a")
However, I want to be able to use the following string expressions instead, respectively, so that the users of the tool (currently using pandas) do not have to be aware of the polars-specific syntax (thus allowing me to swap libraries without impacting users):
"foo < 3"
"foo < 3 & ham == 'a'"
When I attempt to do so, here's what happens. This is puzzling, because str is one of the supported types for the predicate argument, yet the docs show no examples of what syntax a str predicate should use:
>>> df.filter("foo < 3")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Caskroom/miniconda/base/envs/gedi_subset/lib/python3.10/site-packages/polars/internals/dataframe/frame.py", line 2565, in filter
self.lazy()
File "/usr/local/Caskroom/miniconda/base/envs/gedi_subset/lib/python3.10/site-packages/polars/utils.py", line 391, in wrapper
return fn(*args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/gedi_subset/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 1165, in collect
return pli.wrap_df(ldf.collect())
exceptions.NotFoundError: foo < 3
What I was expecting was the same return value that df.filter(pl.col("foo") < 3) would return.
You could try to use the SQLContext for that.
import polars as pl

ctxt = pl.SQLContext()
df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6, 7, 8],
        "ham": ["a", "b", "c"],
    }
)
ctxt.register("df", df.lazy())

string_expr = "foo < 3 and ham = 'a'"
ctxt.query(f"""
    SELECT * FROM df
    WHERE {string_expr}
""")
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
Note that SQL doesn't use the bitwise & or the == equality operator the way pandas query strings do, so you might need to replace & with and, and == with =.
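If the user-facing strings must stay pandas-flavoured, a naive translation shim could do those substitutions before handing the string to the SQLContext. A rough sketch (filter_with_query is a hypothetical name of mine, and a robust version would need a real expression parser rather than blind string replacement):

import polars as pl

def filter_with_query(df: pl.DataFrame, expr: str) -> pl.DataFrame:
    # naive pandas-query -> SQL rewrite: == becomes =, & / | become AND / OR
    sql_expr = expr.replace("==", "=").replace("&", " AND ").replace("|", " OR ")
    ctxt = pl.SQLContext()
    ctxt.register("df", df.lazy())
    return ctxt.query(f"SELECT * FROM df WHERE {sql_expr}")

filter_with_query(df, "foo < 3 & ham == 'a'")  # same row as the SQL above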

polars LazyFrame.with_context().filter() throws unexpected NotFoundError for column

I have two LazyFrames, df1 and df2.
After filtering df2 according to the max value in df1, I want to concatenate them.
But the combination of with_context() and filter() on LazyFrames raises a NotFoundError.
What's the best way to do this?
import polars as pl

df1 = pl.DataFrame({'foo': [0, 1], 'bar': ['a', 'a']}).lazy()
df2 = pl.DataFrame({'foo': [1, 2, 3], 'bar': ['b', 'b', 'b']}).lazy()
df = pl.concat(
    [
        df1,
        df2.with_context(df1.select(pl.col('foo').alias('foo_')))
           .filter(pl.col('foo') > pl.col('foo_').max())
    ]).collect()
# ---------------------------------------------------------------------------
# NotFoundError                            Traceback (most recent call last)
# <ipython-input-2-cf44deab2d4b> in <module>
#       4 df2 = pl.DataFrame({'foo': [1, 2, 3], 'bar': ['b', 'b', 'b']}).lazy()
#       5
# ----> 6 df = pl.concat(
#       7     [
#       8         df1,
#
# /usr/local/lib/python3.8/dist-packages/polars/utils.py in wrapper(*args, **kwargs)
#     327     def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
#     328         _rename_kwargs(fn.__name__, kwargs, aliases)
# --> 329         return fn(*args, **kwargs)
#     330
#     331     return wrapper
#
# /usr/local/lib/python3.8/dist-packages/polars/internals/lazyframe/frame.py in collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
#    1166             streaming,
#    1167         )
# -> 1168         return pli.wrap_df(ldf.collect())
#
# NotFoundError: foo_
When I assign the comparison result as a column, the error is not raised:
(df2.with_context(df1.select(pl.col('foo').alias('foo_')))
    .with_column((pl.col('foo') > pl.col('foo_').max()).alias('x'))
    .filter(pl.col('x'))).collect()
# OK
But if I drop that column after filter(), the error appears again:
(df2.with_context(df1.select(pl.col('foo').alias('foo_')))
    .with_column((pl.col('foo') > pl.col('foo_').max()).alias('x'))
    .filter(pl.col('x'))
    .drop('x')).collect()
# NotFoundError: foo_
Finally I found that this works. But what is the difference from the previous attempts?
(It seems verbose. Does some other good solution exist?)
(df2.with_context(df1.select(pl.col('foo').alias('foo_')))
    .with_column(pl.col('foo_').max())
    .filter(pl.col('foo') > pl.col('foo_'))
    .drop('foo_')).collect()
# OK
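A simpler alternative might be to evaluate the threshold eagerly and skip with_context altogether; a sketch, assuming collecting the scalar first is acceptable:

# evaluate the max eagerly, then filter against a plain Python value
max_foo = df1.select(pl.col('foo').max()).collect().to_series()[0]
df = pl.concat([df1, df2.filter(pl.col('foo') > max_foo)]).collect()
# OK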
related?
https://stackoverflow.com/a/71108312/7402018

PySpark error when converting DF column to list

I have a problem with my Spark script.
I have a second dataframe, df2, which is a single-column dataframe. What I want to achieve is to return only the rows from df1 where the user is in that list.
I've tried the code below, but I get an error (also below).
Can anyone please advise?
listx = df2.select('user2').collect()

df_agg = df1\
    .coalesce(1000)\
    .filter((df1.dt == 20181029) & (df1.user.isin(listx)))\
    .select('list of fields')
Traceback (most recent call last):
File "/home/keenek1/indev/rax.py", line 31, in <module>
.filter((df1.dt == 20181029) &(df1.imsi.isin(listx)))\
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/column.py", line 444, in isin
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/column.py", line 36, in _create_column_from_literal
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.functions.lit.
: java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [234101953127315]
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
at scala.util.Try.getOrElse(Try.scala:79)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
at org.apache.spark.sql.functions$.lit(functions.scala:96)
at org.apache.spark.sql.functions.lit(functions.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Not sure this is the best answer but:
# two single-column dfs to try to replicate your example:
df1 = spark.createDataFrame([{'a': 10}])
df2 = spark.createDataFrame([{'a': 10}, {'a': 18}])
l1 = df1.select('a').collect()
# l1 = [Row(a=10)] - this is not an accepted value for isin, as it seems:
df2.select('*').where(df2.a.isin(l1)).show()    # this will throw an error
df2.select('*').where(df2.a.isin([10])).show()  # this will NOT throw an error
So something like:
from pyspark.sql import functions as F

l2 = [item.a for item in l1]
# l2 = [10]
df2.where(F.col('a').isin(l2)).show()
(Which is a bit weird to be honest but... there is a ticket for supporting isin with single column dataframes)
Hope this helps, good luck!
edit: this is provided the collected list is a small one :)
Your example would be:
listx = [item.user2 for item in df2.select('user2').collect()]

df_agg = df1\
    .coalesce(1000)\
    .filter((df1.dt == 20181029) & (df1.user.isin(listx)))\
    .select('list of fields')
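If the collected list could get large, a left semi join keeps the whole thing on the cluster instead. A sketch, assuming df2.user2 is meant to match df1.user:

# keep only df1 rows whose user has a match in df2 - no collect() needed
df_agg = (df1
    .filter(df1.dt == 20181029)
    .join(df2, df1.user == df2.user2, how='left_semi')
    .select('list of fields'))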

Keras does not match model with classes

I am new to Keras and I am trying to make a neural network to recognize 38 classes. I created the model below, but it just does not work. I think there is some problem with the last layer. I checked the summary, and the output of the last layer is 38 as it should be. Can someone help me make it work?
My code is:
model = Sequential()
model.add(Convolution2D(16, 5, 5, border_mode='valid', input_shape=(168, 192, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(2, 2))
model.add(Convolution2D(16, 5, 5))
model.add(Activation('relu'))
model.add(MaxPooling2D(2, 2))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(38, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam(0.001), metrics=['accuracy'])
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(168, 192),
    batch_size=38,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(168, 192),
    batch_size=38,
    class_mode='binary')

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=800)
and the error looks like:
ValueError: Error when checking target: expected dense_129 to have shape (None, 38) but got array with shape (38, 1)
According to the Keras documentation for flow_from_directory, the specified directory ('data/train' in your case) should contain one subdirectory per class.
Since the error says the model is getting a target array of shape (38, 1), it looks like you do not have 38 class subfolders within data/train. (Note: do not be confused by the 38 in the first dimension; that is the batch size, which you happen to have set equal to the number of classes, but it does not have to be.)
So you should either rearrange your subfolders into one class per subfolder, or load the data manually and flow it from memory.
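One more thing worth checking (my reading of the error shape, not confirmed in the thread): class_mode='binary' makes the generators yield a single label column, while categorical_crossentropy behind a 38-unit softmax expects one-hot targets of shape (batch, 38). Switching to categorical labels would produce that shape:

# sketch: categorical class_mode yields one-hot labels of shape
# (batch_size, num_classes), matching the 38-unit softmax output
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(168, 192),
    batch_size=38,
    class_mode='categorical')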

How to add meta_data to Pandas dataframe?

I use pandas DataFrames heavily, and I need to attach some metadata to a dataframe, for example to record the birth time of the dataframe, an additional description of it, etc.
I just can't find a reserved field of the DataFrame class to keep such data.
So I changed the core\frame.py file to add a line, _reserved_slot = {}, to solve my issue. I am posting the question here just to ask: is it OK to do so? Or is there a better way to attach metadata to a dataframe/column/row etc.?
#----------------------------------------------------------------------
# DataFrame class

class DataFrame(NDFrame):
    _auto_consolidate = True
    _verbose_info = True
    _het_axis = 1
    _col_klass = Series

    _AXIS_NUMBERS = {
        'index': 0,
        'columns': 1
    }
    _reserved_slot = {}  # Added by bigbug to keep extra data for the dataframe
    _AXIS_NAMES = dict((v, k) for k, v in _AXIS_NUMBERS.iteritems())
EDIT: (demo of the problem with witingkuo's approach)
>>> df = pd.DataFrame(np.random.randn(10,5), columns=list('ABCDEFGHIJKLMN')[0:5])
>>> df
A B C D E
0 0.5890 -0.7683 -1.9752 0.7745 0.8019
1 1.1835 0.0873 0.3492 0.7749 1.1318
2 0.7476 0.4116 0.3427 -0.1355 1.8557
3 1.2738 0.7225 -0.8639 -0.7190 -0.2598
4 -0.3644 -0.4676 0.0837 0.1685 0.8199
5 0.4621 -0.2965 0.7061 -1.3920 0.6838
6 -0.4135 -0.4991 0.7277 -0.6099 1.8606
7 -1.0804 -0.3456 0.8979 0.3319 -1.1907
8 -0.3892 1.2319 -0.4735 0.8516 1.2431
9 -1.0527 0.9307 0.2740 -0.6909 0.4924
>>> df._test = 'hello'
>>> df2 = df.shift(1)
>>> print df2._test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\pandas\core\frame.py", line 2051, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute '_test'
>>>
This is not supported right now. See https://github.com/pydata/pandas/issues/2485. The reason is that the propagation of these attributes is non-trivial. You can certainly assign data, but almost all pandas operations return a new object, where the assigned data will be lost.
Your _reserved_slot will become a class variable. That might not work if you want to assign a different value to each DataFrame. You can probably just assign what you want to the instance directly:
In [6]: import pandas as pd
In [7]: df = pd.DataFrame()
In [8]: df._test = 'hello'
In [9]: df._test
Out[9]: 'hello'
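To see why the class variable is a problem, here is a quick sketch (assuming the patched pandas from the question): the single dict is shared by every DataFrame:

In [10]: df_a = pd.DataFrame()
In [11]: df_b = pd.DataFrame()
In [12]: df_a._reserved_slot['birth'] = '2013-01-01'  # write through one frame...
In [13]: df_b._reserved_slot                          # ...and it shows up in the other
Out[13]: {'birth': '2013-01-01'}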
I think a decent workaround is putting your dataframe into a dictionary with your metadata under other keys. So if you have a dataframe with cashflows, like:
df = pd.DataFrame({'Amount': [-20, 15, 25, 30, 100]},index=pd.date_range(start='1/1/2018', periods=5))
You can create your dictionary with additional metadata and put the dataframe there
out = {'metadata': {'Name': 'Whatever', 'Account': 'Something else'}, 'df': df}
and then access the frame as out['df'] and the metadata as out['metadata'].
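For readers on newer pandas: since version 1.0 there is an experimental DataFrame.attrs dictionary intended for exactly this kind of metadata, though it is not yet guaranteed to survive every operation:

import pandas as pd

df = pd.DataFrame({'Amount': [-20, 15, 25, 30, 100]})
df.attrs['Name'] = 'Whatever'          # experimental, pandas >= 1.0
df.attrs['birth_time'] = '2018-01-01'
print(df.attrs)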