PySpark 'tzinfo' error when using the Cassandra connector

I'm reading from Cassandra using
a = sc.cassandraTable("my_keyspace", "my_table").select("timestamp", "value")
and then want to convert it to a DataFrame:
a.toDF()
and the schema is correctly inferred:
DataFrame[timestamp: timestamp, value: double]
but then when materializing the DataFrame I get the following error:
Py4JJavaError: An error occurred while calling o89372.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 285.0 failed 4 times, most recent failure: Lost task 0.3 in stage 285.0 (TID 5243, kepler8.cern.ch): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in toInternal
return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in <genexpr>
return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
return self.dataType.toInternal(obj)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 190, in toInternal
seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
AttributeError: 'str' object has no attribute 'tzinfo'
which sounds like a string has been handed to pyspark.sql.types.TimestampType.
How could I debug this further?
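Worth checking first is what Python type the connector actually hands back for the timestamp column, since TimestampType.toInternal only accepts datetime objects. A minimal sketch, assuming the same a as above, that the connector's Row type supports attribute access, and that dateutil is available for the workaround:

row = a.first()
print(row)                         # eyeball whether timestamp is a datetime.datetime or a plain str
print(type(row.timestamp))

# If it really is a str, one workaround is to parse it before building the DataFrame:
from dateutil import parser
parsed = a.map(lambda r: (parser.parse(r.timestamp), r.value))
df = sqlContext.createDataFrame(parsed, ["timestamp", "value"])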

Related

pyspark gives 'KeyError: 0'

I want to do image classification with PySpark, but
spark_model = sparkdn.fit(train)
gives
PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "/home/esra/anaconda3/envs/env/lib/python3.9/site-packages/sparkdl/image/imageIO.py", line 158, in resizeImageAsRow
imgAsArray = imageStructToArray(imgAsRow)
File "/home/esra/anaconda3/envs/env/lib/python3.9/site-packages/sparkdl/image/imageIO.py", line 121, in imageStructToArray
imType = imageType(imageRow)
File "/home/esra/anaconda3/envs/env/lib/python3.9/site-packages/sparkdl/image/imageIO.py", line 111, in imageType
return sparkModeLookup[imageRow.mode]
KeyError: 0
Any suggestions? Thanks
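The KeyError: 0 means the value of imageRow.mode (the integer 0) is not one of the keys sparkdl's lookup table expects, which usually points to a mismatch between the image schema sparkdl was built against and the schema of the images you loaded. A quick check of what the estimator is actually being fed, assuming train carries the usual image struct column named "image":

train.printSchema()
train.select("image.mode").distinct().show()   # integer modes here will not match a string-keyed lookup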

AttributeError: module 'pyspark.rdd' has no attribute 'V'

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (hadoop102 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/module/spark/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/opt/module/spark/python/lib/pyspark.zip/pyspark/worker.py", line 71, in read_command
command = serializer._read_with_length(file)
File "/opt/module/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
return self.loads(obj)
File "/opt/module/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
return pickle.loads(obj, encoding=encoding)
AttributeError: module 'pyspark.rdd' has no attribute 'V'
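An unpickling failure inside read_command on the worker like this is often worth checking against a PySpark or Python version mismatch between the driver and the executors. A minimal check, assuming a running SparkContext named sc:

import sys
import pyspark

print(sys.version, pyspark.__version__)            # driver side
print(sc.parallelize([0], 1)
        .map(lambda _: (sys.version, pyspark.__version__))
        .collect())                                # executor side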

Random forest classifier gives out a weird error

def predict(training_data, test_data):
    # TODO: Train random forest classifier from given data
    # Result should be an RDD with the prediction of the random forest for each
    # test data point
    RANDOM_SEED = 13579
    RF_NUM_TREES = 3
    RF_MAX_DEPTH = 4
    RF_NUM_BINS = 32
    model = RandomForest.trainClassifier(training_data, numClasses=2, categoricalFeaturesInfo={}, \
                                         numTrees=RF_NUM_TREES, featureSubsetStrategy="auto", impurity="gini", \
                                         maxDepth=RF_MAX_DEPTH, seed=RANDOM_SEED)
    predictions = model.predict(test_data.map(lambda x: x.features))
    labels_and_predictions = test_data.map(lambda x: x.label).zip(predictions)
    return predictions
I encounter the error below:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 122.0 failed 1 times, most recent failure: Lost task 0.0 in stage 122.0 (TID 226, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "C:\Spark\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 377, in main
File "C:\Spark\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 372, in process
File "C:\Spark\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 393, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "C:\Spark\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\util.py", line 99, in wrapper
return f(*args, **kwargs)
File "<ipython-input-20-170be0983095>", line 12, in <lambda>
File "C:\Spark\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\mllib\linalg\__init__.py", line 483, in __getattr__
return getattr(self.array, item)
AttributeError: 'numpy.ndarray' object has no attribute 'features'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:153)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:148)
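The traceback shows the failure inside lambda x: x.features: the attribute lookup falls through pyspark.mllib.linalg's __getattr__ to the underlying numpy array, i.e. the elements of test_data are bare feature vectors rather than LabeledPoint objects. A hedged sketch of the two usual fixes, depending on what test_data really contains (the label-first layout below is an assumption):

from pyspark.mllib.regression import LabeledPoint

# If test_data already holds feature vectors only, predict on them directly:
predictions = model.predict(test_data)

# If each element bundles the label with the features (label assumed to be first),
# rebuild the RDD as LabeledPoints so the original x.features / x.label code works:
labeled_test = test_data.map(lambda x: LabeledPoint(x[0], x[1:]))
predictions = model.predict(labeled_test.map(lambda p: p.features))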

Expected binary or unicode string, got 0.0

I'm training my first model with TensorFlow, but I keep getting this error:
Expected binary or unicode string, got 0.0
I followed the TensorFlow linear model tutorial (https://www.tensorflow.org/tutorials/wide) and applied it to my own dataset.
This is what I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 953, in _train_model
features, labels = input_fn()
File "<stdin>", line 2, in train_input_fn
File "<stdin>", line 5, in input_fn
File "<stdin>", line 5, in <dictcomp>
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 473, in make_tensor_proto
append_fn(tensor_proto, proto_values)
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 109, in SlowAppendObjectArrayToTensorProto
tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 109, in <listcomp>
tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
File "/home/nick/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got 0.0
Any suggestions? Thanks
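The failing frames are the dict comprehension inside input_fn: tf.constant is asked to build a string tensor for a column declared as categorical, but the underlying values are floats such as 0.0. A hedged check-and-workaround sketch, assuming input_fn reads from a pandas DataFrame named df and that CATEGORICAL_COLUMNS lists the categorical feature names as in the tutorial:

print(df.dtypes)                      # a "categorical" column showing float64 will trigger exactly this error

for col in CATEGORICAL_COLUMNS:
    df[col] = df[col].astype(str)     # feed strings, not floats, into the sparse columns

# Alternatively, declare genuinely numeric columns with
# tf.contrib.layers.real_valued_column instead of a sparse/categorical column.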

Celery log shows cleanup failed

I am using Celery with Django. I see an error when I look at the Celery log for the automatically scheduled cleanup. I am not sure what this means, or what the implications of skipping the cleanup are. Any help is appreciated.
[2013-09-28 23:00:00,204: ERROR/MainProcess] Task celery.backend_cleanup[65af1634-374a-4068-b1a5-749b70f7c78d] raised exception: NotImplementedError('No updates',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery-3.0.15-py2.7.egg/celery/task/trace.py", line 228, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery-3.0.15-py2.7.egg/celery/task/trace.py", line 415, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery-3.0.15-py2.7.egg/celery/app/builtins.py", line 58, in backend_cleanup
app.backend.cleanup()
File "/usr/local/lib/python2.7/dist-packages/djcelery/backends/database.py", line 58, in cleanup
model._default_manager.delete_expired(expires)
File "/usr/local/lib/python2.7/dist-packages/djcelery/managers.py", line 110, in delete_expired
self.get_all_expired(expires).update(hidden=True)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 469, in update
rows = query.get_compiler(self.db).execute_sql(None)
File "/usr/local/lib/python2.7/dist-packages/djangotoolbox/db/basecompiler.py", line 376, in execute_sql
raise NotImplementedError('No updates')
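The traceback means the nightly celery.backend_cleanup task tries to hide expired task results with a bulk UPDATE, and the djangotoolbox (non-relational) database backend does not implement bulk updates. Nothing else depends on that task, so the only consequence is that old results are not pruned. A hedged workaround sketch, assuming the django-celery database result backend; the prune_task_results task and the seven-day cutoff are made up for illustration:

# settings.py: should stop the built-in cleanup task from being scheduled
CELERY_TASK_RESULT_EXPIRES = None

# tasks.py: optional replacement that prunes with a delete instead of an update
from datetime import timedelta
from django.utils import timezone
from celery import task
from djcelery.models import TaskMeta

@task
def prune_task_results():
    cutoff = timezone.now() - timedelta(days=7)
    TaskMeta.objects.filter(date_done__lt=cutoff).delete()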