'PipelinedRDD' object has no attribute '_get_object_id' - pyspark

I've got an issue trying to replicate the example I saw here: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-load-data-run-query.
It fails when it reaches this line: hvacTable = sqlContext.createDataFrame(hvac)
and the error it returns is:
'PipelinedRDD' object has no attribute '_get_object_id'
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 333, in createDataFrame
return self.sparkSession.createDataFrame(data, schema, samplingRatio, verifySchema)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1124, in __call__
args_command, temp_args = self._build_args(*args)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1094, in _build_args
[get_command_part(arg, self.pool) for arg in new_args])
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 289, in get_command_part
command_part = REFERENCE_TYPE + parameter._get_object_id()
AttributeError: 'PipelinedRDD' object has no attribute '_get_object_id'
I'm following the example to a T; it's a PySpark notebook in Jupyter.
Why is this error occurring?

You are probably running it on a newer cluster. Please change "sqlContext" to "spark" to get it to work. We'll update the doc article as well.
Also, in Spark 2.x you can now do this operation with DataFrames, which is simpler. You can replace the snippet that creates the hvac table with the following equivalent:
csvFile = spark.read.csv('wasb:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv', header=True, inferSchema=True)
csvFile.write.saveAsTable("hvac")

Why doesn't reload work with a class written from scratch?

I'm running Python 3.6 on Ubuntu 20.04.
I'm writing an evolutionary algorithm. I translate the chromosome values into a class that has the evaluation function, but when I call the newly written class, it doesn't update. I tried using reload, but it gives me an error. Help, please!
def ddWriteStg(individual):
    stg= open('EvalInd.py','w')
    stg.write('class EvalInd:\n\n')
    if individual[1]<individual[3]:
        fast=individual[1]
        slow=individual[3]
    else:
        slow=individual[1]
        fast=individual[3]
    stg.write('\tdef loadParms(self):\n')
    stg.write('\t\tself._fast='+str(fast)+'\n')
    stg.write('\t\tself._slow='+str(slow)+'\n\n')
    stg.write('\tdef __init__(self):\n')
    stg.write('\t\tself.loadParms()\n')
    stg.write('\t\tpass\n')
    stg.write('\tdef evaluate(self):\n')
    stg.write('\t\treturn self._fast+self._slow\n')
    stg.close()
import importlib
def evaluateInd(individual):
    ddWriteStg(individual)
    from EvalInd import EvalInd
    importlib.reload(EvalInd)
    x=EvalInd()
    val=x.evaluate()
    return val
ind=[1,47,1,52]
a=evaluateInd(ind)
print(str(a))
ind=[1,60,1,80]
a=evaluateInd(ind)
print(str(a))
Output: if I don't include the reload line, I get:
99
99
After I include the reload line:
Traceback (most recent call last):
File "test.py", line 65, in <module>
a=evaluateInd(ind)
File "test.py", line 59, in evaluateInd
importlib.reload(EvalInd)
File "/usr/lib/python3.6/importlib/__init__.py", line 139, in reload
raise TypeError("reload() argument must be a module")
TypeError: reload() argument must be a module
Thank you for your help!
A dear friend of mine helped me and figured out the mistake in less than a minute: import the module rather than the class, so that reload receives a module.
import EvalInd
x= EvalInd.EvalInd()
The changes above fix everything.
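As a rough, self-contained sketch of that fix (not the poster's exact script): the module name eval_ind_demo, the scratch directory GEN_DIR, and the helper names are made up for illustration. The sketch also assumes that disabling bytecode caching is acceptable, because a file rewritten within the same second and with the same size could otherwise be shadowed by a stale .pyc.

```python
import importlib
import os
import sys
import tempfile

# Assumption: skip .pyc caching so a quick rewrite is never shadowed by stale bytecode.
sys.dont_write_bytecode = True

# Hypothetical scratch directory that holds the generated module.
GEN_DIR = tempfile.mkdtemp()
sys.path.insert(0, GEN_DIR)


def write_module(individual):
    """Generate eval_ind_demo.py with the two parameters baked in."""
    fast, slow = sorted((individual[1], individual[3]))
    source = (
        "class EvalInd:\n"
        "    def __init__(self):\n"
        f"        self._fast = {fast}\n"
        f"        self._slow = {slow}\n"
        "\n"
        "    def evaluate(self):\n"
        "        return self._fast + self._slow\n"
    )
    with open(os.path.join(GEN_DIR, "eval_ind_demo.py"), "w") as fh:
        fh.write(source)


def evaluate_individual(individual):
    """Rewrite the module, then import it (first call) or reload it (later calls)."""
    write_module(individual)
    importlib.invalidate_caches()
    if "eval_ind_demo" in sys.modules:
        # reload() takes the module object, not the class defined inside it.
        module = importlib.reload(sys.modules["eval_ind_demo"])
    else:
        module = importlib.import_module("eval_ind_demo")
    # Look the class up on the freshly loaded module each time.
    return module.EvalInd().evaluate()


print(evaluate_individual([1, 47, 1, 52]))  # 99
print(evaluate_individual([1, 60, 1, 80]))  # 140
```

The key point is that the class is always reached through the module object, so each call sees the class from the most recent reload.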

PyMongo .. collection.find_one() .. error

I am using PyMongo version 3.2.2 and trying to make a simple query:
from pymongo import MongoClient
connection = MongoClient('localhost', 27017)
db = connection.terra
acris = db.acris
first_item = acris.find_one()
print(first_item["DOCUMENT ID"])
Nothing happens; it waits forever. When I then press Ctrl-C, I get the following traceback:
^CTraceback (most recent call last):
File "first_pymongo_app", line 12, in <module>
acris.find_one()
gotit = waiter.acquire(True, timeout)
The code works for the first 4 lines and fails at the 5th line = acris.find_one()
How can I track down and fix this error, so that PyMongo returns the first document in the collection and I can read the value for the specified key?
Also, note that this code works:
print(acris.find())
It returns this object:
<pymongo.cursor.Cursor object at 0x103324be0>

Why does MongoDB Motor not resolve the data correctly?

I have a huge Tornado app that was written in a blocking manner. I'm trying to convert my db calls to run async, and I'm having lots of issues.
I keep the Mongo calls in a top-level folder called lib, and in the app folder I keep all my views.
The error I'm getting:
Traceback (most recent call last):
File "/Users/marcsantiago/staging_env/lib/python2.7/site-packages/tornado/web.py", line 1445, in _execute
result = yield result
File "/Users/marcsantiago/staging_env/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/Users/marcsantiago/staging_env/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/Users/marcsantiago/staging_env/lib/python2.7/site-packages/tornado/gen.py", line 1017, in run
yielded = self.gen.send(value)
File "/Users/marcsantiago/pubgears/app/admin.py", line 179, in get
notes, start_date, stats, last_updated = self.db_data()
File "/Users/marcsantiago/pubgears/app/admin.py", line 76, in db_data
while (yield chain_slugs_updated.fetch_next):
AttributeError: 'NoneType' object has no attribute 'fetch_next'
So inside the lib folder I have this method.
def get_chains_updated(date):
    slugs = []
    # Chain class can't do aggregate could create a class instance if i want
    cursor = db.chain.aggregate([
        {'$match':{'last_update':{'$gt':date}}},
        {'$group':{'_id':{'site':'$site'}, 'total':{'$sum':'$count'}}}
    ])
    while (yield cursor.fetch_next):
        res = yield cursor.next_object()
        slugs.append(res['_id']['site'])
    yield slugs
Later I call this method in one of my views:
chain_slugs_updated = yield chaindb.get_chains_updated(yesterday)
slugs = []
#for site in chain_slugs_updated:
while (yield chain_slugs_updated.fetch_next):
    site = chain_slugs_updated.next_object()
    slugs.append('%s' % (site, site))
notes.append('<strong>%s</strong> chains have been updated in the past 24 hours (%s).' % (chain_slugs_updated.count(), ', '.join(slugs)))
This is what it used to be when I was using PyMongo:
lib
def get_chains_updated(date):
    slugs = []
    # Chain class can't do aggregate could create a class instance if i want
    results = db.chain.aggregate([
        {'$match':{'last_update':{'$gt':date}}},
        {'$group':{'_id':{'site':'$site'}, 'total':{'$sum':'$count'}}}
    ])
    for res in results:
        slugs.append(res['_id']['site'])
    return slugs
view
chain_slugs_updated = chaindb.get_chains_updated(yesterday)
slugs = []
for site in chain_slugs_updated:
    slugs.append('%s' % (site, site))
notes.append('<strong>%s</strong> chains have been updated in the past 24 hours (%s).' % (len(chain_slugs_updated), ', '.join(slugs)))
I have tons of code to translate to get this async version working correctly, so I would very much appreciate any help. Thanks.
To return a list of objects from get_chains_updated, you must end the coroutine with either return slugs (Python 3 only) or raise gen.Return(slugs) (all Python versions), instead of yield slugs. For more info, see Refactoring Tornado Coroutines.
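To illustrate why the final yield slugs leaves the caller with None, here is a minimal standard-library sketch. It does not use Tornado or Motor: run is a toy stand-in for a coroutine runner, and ROWS is made-up data. It relies only on the Python 3 rule that a generator's return value travels in StopIteration.value; Python 2 generators cannot return a value at all, which is the gap raise gen.Return(slugs) fills.

```python
def run(gen):
    """Toy stand-in for a coroutine runner: echo each yielded value back
    into the generator, then hand back whatever the generator returned."""
    value = None
    try:
        while True:
            value = gen.send(value)
    except StopIteration as stop:
        return stop.value


# Made-up rows shaped like the aggregation results in the question.
ROWS = [{"_id": {"site": "a.example"}}, {"_id": {"site": "b.example"}}]


def slugs_with_yield(rows):
    slugs = []
    for row in rows:
        res = yield row  # stands in for `yield cursor.next_object()`
        slugs.append(res["_id"]["site"])
    yield slugs  # hands the list to the runner, but the generator returns nothing


def slugs_with_return(rows):
    slugs = []
    for row in rows:
        res = yield row
        slugs.append(res["_id"]["site"])
    return slugs  # carried out in StopIteration.value as the result


print(run(slugs_with_yield(ROWS)))   # None
print(run(slugs_with_return(ROWS)))  # ['a.example', 'b.example']
```

Once the coroutine returns a plain list, the calling view can presumably go back to iterating it with an ordinary for loop, as in the PyMongo version.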

Why do I get mysterious "TypeError: can only update value with String or number"?

This is really bizarre. If I run this code (as a nose test), it prints "-0:34:00.0" and all is well
def test_o1(self):
    observer = ephem.Observer()
    observer.lon, observer.lat = math.radians(73.9), math.radians(40.7)
    observer.horizon = '-0:34'
    print observer.horizon
but, if I run this:
def test_o2(self):
    location = UserLocation()
    location.foo()
where UserLocation is:
# Document is mongoengine.Document
class UserLocation(Document):
    [...]
    def foo(self):
        observer = ephem.Observer()
        observer.lon, observer.lat = math.radians(73.9), math.radians(40.7)
        observer.horizon = '-0:34'
I get:
Traceback (most recent call last):
File "/home/roy/deploy/current/python/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/roy/deploy/current/code/pyza/models/test_user_location.py", line 82, in test_o2
location.foo()
File "/home/roy/deploy/current/code/pyza/models/user_location.py", line 134, in foo
observer.horizon = '-0:34'
TypeError: can only update value with String or number
Any idea what might be going on?
Arrggghhh. I figured it out. My UserLocation source file starts with:
from __future__ import unicode_literals
Apparently, _libastro insists on ASCII strings, not unicode.

How can I have cyclic or forward ReferenceField when using reverse_delete_rule in MongoEngine?

This code bombs:
from mongoengine import *
class Employee(Document):
    name = StringField()
    boss = ReferenceField("Employee", reverse_delete_rule = NULLIFY)
Here's the exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "[…]/mongoengine/base.py", line 791, in __new__
new_class = super_new(cls, name, bases, attrs)
File "[…]/mongoengine/base.py", line 630, in __new__
f.document_type.register_delete_rule(new_class,
File "[…]/mongoengine/fields.py", line 757, in document_type
self.document_type_obj = get_document(self.document_type_obj)
File "[…]/mongoengine/base.py", line 136, in get_document
""".strip() % name)
mongoengine.base.NotRegistered: `Employee` has not been registered
in the document registry.
Importing the document class automatically registers it, has it
been imported?
Removing the reverse_delete_rule fixes the problem, but I would like to have this rule.
I tried this, and it works, but it really looks like crap, and I fear that there might be bad side-effects (so far, I have not seen any, though):
from mongoengine import *
class Employee(Document):
    pass  # required for the reverse_delete_rule to work on the boss field,
          # because the Employee class needs to exist.
class Employee(Document):
    name = StringField()
    boss = ReferenceField("Employee", reverse_delete_rule = NULLIFY)
Any ideas? Shouldn't this be considered a bug in MongoEngine?
Try using 'self' instead of 'Employee':
from mongoengine import *
class Employee(Document):
    name = StringField()
    boss = ReferenceField("self", reverse_delete_rule = NULLIFY)
See details: https://mongoengine-odm.readthedocs.org/en/latest/guide/defining-documents.html#reference-fields.