I am seeing below error when joining two hive tables using pyspark hive context .
error:
""") File
"/usr/hdp/2.3.4.7-4/spark/python/lib/pyspark.zip/pyspark/sql/context.py",
line 552, in sql File
"/usr/hdp/2.3.4.7-4/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in call File
"/usr/hdp/2.3.4.7-4/spark/python/lib/pyspark.zip/pyspark/sql/utils.py",
line 36, in deco File
"/usr/hdp/2.3.4.7-4/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value py4j.protocol.Py4JJavaError: An error
occurred while calling o41.sql. : org.apache.spark.SparkException: Job
cancelled because SparkContext was shut down EX:
lsf.registerTempTable('temp_table')
out = hc.sql(
"""INSERT OVERWRITE TABLE AAAAAA PARTITION (day ='2017-09-20')
SELECT tt.*,ht.id
FROM temp_table tt
JOIN hive_table ht
ON tt.id = ht.id
""")
Also how to parameterize day ?
Related
When executing the query inside Orange 3, I got an ambiguous column error
QUERY
SELECT CAST(REPLACE(memoria_em_uso, 'G','') AS FLOAT),
CAST(REPLACE(memoria_total, 'G','') as FLOAT),
CAST(REPLACE(memoria_livre, 'M','') as FLOAT)
FROM tbl_memoria_hosts
OUTPUT
Error encountered in widget SQL Table:
Traceback (most recent call last):
File "/home/luis/anaconda3/lib/python3.7/site-packages/Orange/data/sql/backend/postgres.py", line 81, in execute_sql_query
cur.execute(query, params)
psycopg2.errors.AmbiguousColumn: column reference "replace" is ambiguous
LINHA 1: SELECT (("replace")::double precision) AS "replace", (("repl...
Note:
All columns in the table are VARCHAR, where the information is written as follows 7.7G
We are trying Spark DataFrame selectExpr and its working for one column, when i add more than one column it throws error.
First one is working, the second one throws error.
Code sample:
df1.selectExpr("coalesce(gtr_pd_am,0 )").show(2)
df1.selectExpr("coalesce(gtr_pd_am,0),coalesce(prev_gtr_pd_am,0)").show()
Error log:
>>> df1.selectExpr("coalesce(gtr_pd_am,0),coalesce(prev_gtr_pd_am,0)").show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/2.6.5.0-292/spark2/python/pyspark/sql/dataframe.py", line 1216, in selectExpr
jdf = self._jdf.selectExpr(self._jseq(expr))
File "/usr/hdp/2.6.5.0-292/spark2/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
File "/usr/hdp/2.6.5.0-292/spark2/python/pyspark/sql/utils.py", line 73, in deco
raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: u"\nmismatched input ',' expecting <EOF>(line 1, pos 21)\n\n== SQL ==\ncoalesce(gtr_pd_am,0),coalesce(prev_gtr_pd_am,0)\n---------------------^^^\n"
check this
df1.selectExpr("coalesce(gtr_pd_am,0)”,”coalesce(prev_gtr_pd_am,0)").show()
You need specify the columns individually
I have a postgres database and I want to run a query and load a table into spark dataframe. some columns of my database are array. For example:
=> select id, f_2 from raw limit 1;
will return
id | f_2
---------+-----------
1 | {{140,130},{NULL,NULL},{NULL,NULL}}
what I want is accessing to 140 (first element of inner array) that is easy in postgres using this query:
=> select id, f_2[1][1] from raw limit 1;
id | f_2
---------+-----------
1 | 140
but I want to load it into spark dataframe and here is my code to load data:
df = sqlContext.sql("""
select id as id,
f_2 as A
from raw
""")
and return this error:
Py4JJavaError: An error occurred while calling o560.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.ClassCastException: [Ljava.lang.Integer; cannot be cast to java.lang.Integer
and then I tried this one:
df = sqlContext.sql("""
select id as id,
f_2[0] as A
from raw
""")
and got same error and then tried this one:
df = sqlContext.sql("""
select id as id,
f_2[0][0] as A
from raw
""")
and return this error:
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
AnalysisException: u"Can't extract value from f_2#32685[0];"
Whenever I try and run zobbpack I generate the error:
psycopg2.IntegrityError: null value in column "zoid" violates not-null constraint
Any ideas what is causing this and how to fix it?
relstorage 1.5.1, postgres 8, plone 4.2.1.1
2012-12-03 13:18:03,485 [zodbpack] INFO Opening storage (RelStorageFactory)...
2012-12-03 13:18:03,525 [zodbpack] INFO Packing storage (RelStorageFactory).
2012-12-03 13:18:03,533 [relstorage] INFO pack: beginning pre-pack
2012-12-03 13:18:03,533 [relstorage] INFO pack: analyzing transactions committed Mon Nov 26 12:31:54 2012 or before
2012-12-03 13:18:03,536 [relstorage.adapters.packundo] INFO pre_pack: start with gc enabled
2012-12-03 13:18:03,759 [relstorage.adapters.packundo] INFO analyzing references from objects in 97907 new transaction(s)
2012-12-03 13:18:03,761 [relstorage.adapters.scriptrunner] WARNING script statement failed: '\n INSERT INTO object_refs_added (tid)\n VALUES (%(tid)s)\n '; parameters: {'tid': 0L}
2012-12-03 13:18:03,761 [relstorage.adapters.packundo] ERROR pre_pack: failed
Traceback (most recent call last):
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 486, in pre_pack
conn, cursor, pack_tid, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 580, in _pre_pack_with_gc
self.fill_object_refs(conn, cursor, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 387, in fill_object_refs
self._add_refs_for_tid(cursor, tid, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 459, in _add_refs_for_tid
self.runner.run_script_stmt(cursor, stmt, {'tid': tid})
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/scriptrunner.py", line 52, in run_script_stmt
cursor.execute(stmt, generic_params)
IntegrityError: null value in column "zoid" violates not-null constraint
Traceback (most recent call last):
File "zodbpack.py", line 86, in <module>
main()
File "zodbpack.py", line 78, in main
skip_prepack=options.reuse_prepack)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/storage.py", line 1114, in pack
adapter.packundo.pre_pack(tid_int, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 486, in pre_pack
conn, cursor, pack_tid, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 580, in _pre_pack_with_gc
self.fill_object_refs(conn, cursor, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 387, in fill_object_refs
self._add_refs_for_tid(cursor, tid, get_references)
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/packundo.py", line 459, in _add_refs_for_tid
self.runner.run_script_stmt(cursor, stmt, {'tid': tid})
File "/usr/local/lib64/python2.6/site-packages/RelStorage-1.5.1-py2.6.egg/relstorage/adapters/scriptrunner.py", line 52, in run_script_stmt
cursor.execute(stmt, generic_params)
psycopg2.IntegrityError: null value in column "zoid" violates not-null constraint
Running zodbpack.py without GC enabled (thanks Martijn) worked and enabled the database to pack down from 3.8Gb to 887Mb.
So my zodbpack-conf.xml looks like:
<relstorage>
pack-gc false
<postgresql>
dsn dbname='zodb' user='user' host='host' password='password'
</postgresql>
</relstorage>
and i run it with:
python zodbpack.py -d 7 zodbpack-conf.xml
Note: after the pack completes you still need to vacuum the database to get the space back. I run this from the command line as:
psql zodb postgresuser
zodb=# SELECT pg_database_size('zodb');
zodb=# vacuum full;
zodb=# SELECT pg_database_size('zodb');
Interestingly, when i run the pack command from the ZMI Control Panel, I still get the same error as previously:
Site Error
An error was encountered while publishing this resource.
Error Type: IntegrityError
Error Value: null value in column "zoid" violates not-null constraint
So i am assuming the ZMI pack uses GC enabled, and that there is still an issue with null values in my site database...
Should I try and run some SQL to clean it out? If an object has no 'zoid' value then is it effectively inaccessible junk? A SQL example for this would be great too :)
I downloaded Cassandra 1.1.1 and launched cqlsh under the version 3
I tried to create a new column family:
CREATE TABLE stats (
pid blob,
period int,
targetid blob,
sum counter,
PRIMARY KEY (pid, period, targetid)
);
But I got this:
Traceback (most recent call last):
File "./cqlsh", line 908, in perform_statement
self.cursor.execute(statement, decoder=decoder)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cursor.py", line 117, in execute
response = self.handle_cql_execution_errors(doquery, prepared_q, compress)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cursor.py", line 132, in handle_cql_execution_errors
return executor(*args, **kwargs)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cassandra/Cassandra.py", line 1583, in execute_cql_query
self.send_execute_cql_query(query, compression)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cassandra/Cassandra.py", line 1593, in send_execute_cql_query
self.oprot.trans.flush()
File "./../lib/thrift-python-internal-only-0.7.0.zip/thrift/transport/TTransport.py", line 293, in flush
self._trans.write(buf)
File "./../lib/thrift-python-internal-only-0.7.0.zip/thrift/transport/TSocket.py", line 117, in write
plus = self.handle.send(buff)
error: [Errno 32] Broken pipe
And on the server console:
Error occurred during processing of message.
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:247)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:140)
at org.apache.cassandra.config.CFMetaData.validate(CFMetaData.java:929)
at org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:131)
at org.apache.cassandra.cql3.statements.CreateColumnFamilyStatement.announceMigration(CreateColumnFamilyStatement.java:83)
at org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:99)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121)
at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237)
at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
I'd suggest reporting bugs at https://issues.apache.org/jira/browse/CASSANDRA.