Spark SQL query execution fails with org.apache.parquet.io.ParquetDecodingException - scala

I am executing a simple CREATE TABLE query in Spark SQL using spark-submit (cluster mode) and receiving org.apache.parquet.io.ParquetDecodingException. I could find only a few details on this issue on the internet; one suggestion was to add the config spark.sql.parquet.writeLegacyFormat=true. The issue still persists after adding this setting.
Below is the query:
spark.sql("""
CREATE TABLE TestTable
STORED AS PARQUET
AS
SELECT Col1,
Col2,
Col3
FROM Stable""")
Error description:
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file maprfs:///path/disputer/1545555-r-00000.snappy.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:461)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:219)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
... 13 more
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt
Spark configuration file:
spark.driver.memory=10G
spark.executor.memory=23G
spark.executor.cores=3
spark.executor.instances=100
spark.dynamicAllocation.enabled=false
spark.yarn.preserve.staging.files=false
spark.yarn.executor.extraJavaOptions=-XX:MaxDirectMemorySize=6144m
spark.sql.shuffle.partitions=1000
spark.shuffle.service=true
spark.yarn.maxAppAttempts=1
spark.broadcastTimeout=36000
spark.debug.maxToStringFields=100
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
spark.network.timeout=600s
spark.sql.parquet.enableVectorizedReader=false
spark.scheduler.listenerbus.eventqueue.capacity=200000
spark.driver.memoryOverhead=1024
spark.yarn.executor.memoryOverhead=5120
spark.executor.extraJavaOptions=-XX:+UseG1GC
spark.driver.extraJavaOptions=-XX:+UseG1GC

This issue was occurring because spark.sql.parquet.enableVectorizedReader was disabled. Setting spark.sql.parquet.enableVectorizedReader=true resolves the issue.
For more details, visit https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-vectorized-parquet-reader.html
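A minimal sketch of the fix, assuming an existing SparkSession named spark as in the question (the same property can instead be set in the Spark configuration file or via --conf on spark-submit):
// Re-enable the vectorized Parquet reader before running the CTAS;
// this overrides the enableVectorizedReader=false entry in the
// configuration file shown above.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
spark.sql("""
CREATE TABLE TestTable
STORED AS PARQUET
AS
SELECT Col1, Col2, Col3
FROM Stable""")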

Related

INSERT INTO PostgreSQL DB sink table with Flink SQL

How do I write a full Flink SQL upsert statement for Postgres if it doesn't support this syntax?
2023-02-13 21:22:47,152 ERROR org.apache.flink.connector.jdbc.internal.JdbcOutputFormat [] - JDBC executeBatch error, retry times = 2
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO flink.flink(id, full_name) VALUES (1, 'test') ON CONFLICT (id) DO UPDATE SET id=EXCLUDED.id, full_name=EXCLUDED.full_name was aborted: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:201) ~[flink-sql-connector-postgres-cdc-2.3.0.jar:2.3.0]
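Postgres is telling you that ON CONFLICT (id) has no unique or exclusion constraint on id to match against. A minimal sketch of a fix on the Postgres side, assuming the sink table flink.flink from the log and that id is meant to be its key:
-- Give the sink table a primary key so that ON CONFLICT (id)
-- has a matching unique constraint to resolve against
-- (the table must not already contain duplicate or NULL ids).
ALTER TABLE flink.flink ADD CONSTRAINT flink_pk PRIMARY KEY (id);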

SQL0443 error when executing a Select with a Trigger on DB2 for iSeries

I have a problem querying a DB2/AS400 table; let's call it TAB1. Since a trigger was added on this table, when I perform a normal SELECT (using the TAB1 key) I get the following error. I never had a problem before the trigger was created.
The query is performed in a batch application (Java 1.6) using Modern Batch and Spring Batch 2.1.8. There is no chance of updating the libraries, since the program is quite old and the customers won't agree.
Anyway, I would say it's a trigger problem (as the SQL codes suggest), but other applications perform different SELECTs on TAB1 and never hit a similar problem. The batch performs roughly 40,000 SELECTs like this, and only about 300 fail with this error.
Any idea, tip, suggestion?
### Error querying database. Cause: java.sql.SQLException: [SQL0443] *N *N
### The error may exist in class path resource [eu/mycompany/el20/dq/as400/dataaccess/mappers/tab1/Tab1Mapper.xml]
### The error may involve eu.mycompany.el20.dq.as400.dataaccess.persistence.tab.Tab1Mapper.selectByExample-Inline
### The error occurred while setting parameters
### SQL: select * from TAB1 WHERE ( D10_SOC = ? and D10_COD_NDG = ? and D10_DATE = ? )
### Cause: java.sql.SQLException: [SQL0443] *N *N
; uncategorized SQLException for SQL []; SQL state [38501]; error code [-443]; [SQL0443] *N *N; nested exception is java.sql.SQLException: [SQL0443] *N *N
Stack trace:
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:83)
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:71)
org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:365)
com.sun.proxy.$Proxy120.selectList(Unknown Source)
org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:195)
org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:124)
org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:90)
org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:40)
com.sun.proxy.$Proxy136.selectByExample(Unknown Source)
eu.mycompany.el20.dq.as400.crud.services.tab.BLSTab1.select(BLSTab1.java:46)
it.mycompany.xframe.dq.batch.steps.programstep.ProgramExecutor.processRecord(ProgramExecutor.java:544)
com.ibm.websphere.batch.devframework.steps.technologyadapters.GenericXDBatchStep.processRecord(GenericXDBatchStep.java:263)
com.ibm.websphere.batch.devframework.steps.technologyadapters.GenericXDBatchStep.processJobStep(GenericXDBatchStep.java:227)
com.ibm.ws.gridcontainer.batch.impl.StepManagerImpl._regularJobBatchLoop(StepManagerImpl.java:1065)
com.ibm.ws.gridcontainer.batch.impl.StepManagerImpl.executeStep(StepManagerImpl.java:390)
com.ibm.ws.gridcontainer.security.actions.ExecuteStepBatchUserPrivilegedAction.executeAction(ExecuteStepBatchUserPrivilegedAction.java:47)
com.ibm.ws.gridcontainer.security.AbstractUserPrivilegedAction.runWithoutSecurity(AbstractUserPrivilegedAction.java:66)
com.ibm.ws.gridcontainer.services.impl.WASRunUnderCredentialServiceImpl.runUnderUserCredential(WASRunUnderCredentialServiceImpl.java:134)
com.ibm.ws.gridcontainer.services.impl.WASRunUnderCredentialServiceImpl.runActionUnderUserCredential(WASRunUnderCredentialServiceImpl.java:386)
com.ibm.ws.gridcontainer.batch.impl.JobManagerImpl._sequentialStepScheduling(JobManagerImpl.java:783)
com.ibm.ws.gridcontainer.batch.impl.JobManagerImpl.executeJob(JobManagerImpl.java:199)
com.ibm.ws.batch.BatchJobControllerWork._runJob(BatchJobControllerWork.java:435)
com.ibm.ws.batch.BatchJobControllerWork.run(BatchJobControllerWork.java:241)
com.ibm.ws.asynchbeans.J2EEContext$RunProxy.run(J2EEContext.java:271)
java.security.AccessController.doPrivileged(AccessController.java:399)
com.ibm.ws.asynchbeans.J2EEContext.run(J2EEContext.java:797)
com.ibm.ws.asynchbeans.WorkWithExecutionContextImpl.go(WorkWithExecutionContextImpl.java:222)
com.ibm.ws.asynchbeans.ABWorkItemImpl.run(ABWorkItemImpl.java:206)
java.lang.Thread.run(Thread.java:790)
The text for message SQL0443 is 'Trigger program or external routine detected an error'.
I would suggest looking at the host database server job for the JDBC connection.
On the IBM i command line, run the command WRKOBJLCK OBJ(<user>) OBJTYPE(*USRPRF) (where <user> is the user profile you're using to do the JDBC connection) and find jobs named QZDASOINIT. These are the database host server jobs.
In these jobs, look at the job log (or the job log spool file) to find the SQL0443 message; around that message you should see what the actual error is.
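A sketch of the sequence on the IBM i command line, assuming JDBCUSER is the profile used for the JDBC connection (the name is a placeholder):
WRKOBJLCK OBJ(JDBCUSER) OBJTYPE(*USRPRF)
/* In the resulting list, locate the QZDASOINIT jobs, use the       */
/* Work with Job option to open each one, display its job log, and  */
/* search around the SQL0443 message for the underlying error.      */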

Issues while using the Snowflake component in Talend

To transfer data from MS SQL Server 2008 to Snowflake I used Talend, but every time I get an error:
java.io.IOException: net.snowflake.client.loader.Loader$ConnectionError: State: CREATE_TEMP_TABLE, SQL compilation error: error line 1 at position 68
invalid identifier '"columnname"'
at org.talend.components.snowflake.runtime.SnowflakeWriter.close(SnowflakeWriter.java:397)
at org.talend.components.snowflake.runtime.SnowflakeWriter.close(SnowflakeWriter.java:52)
at local_project.load_jobnotes_0_1.Load_Jobnotes.tMSSqlInput_1Process(Load_Jobnotes.java:2684)
at local_project.load_jobnotes_0_1.Load_Jobnotes.runJobInTOS(Load_Jobnotes.java:3435)
at local_project.load_jobnotes_0_1.Load_Jobnotes.main(Load_Jobnotes.java:2978)
Caused by: net.snowflake.client.loader.Loader$ConnectionError: State: CREATE_TEMP_TABLE, SQL compilation error: error line 1 at position 68
invalid identifier '"ID"'
at net.snowflake.client.loader.ProcessQueue.run(ProcessQueue.java:349)
at java.lang.Thread.run(Thread.java:748)
Caused by: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error: error line 1 at position 68
The column does exist in my Snowflake DB, yet I still get an error saying the column does not exist.
On analysing which query Talend executes in Snowflake, I found that it tries to create a temporary table to store the data, but in doing so it selects all columns from the table wrapped in double quotes, hence the error invalid identifier '"columnname"'.
If I execute the same query manually without double quotes, it works fine. Can you please let us know what the workaround for this issue is?
The query executed by Talend in Snowflake, for reference:
CREATE TEMPORARY TABLE "Tablename_20171024_115736_814_1"
AS SELECT "column1","column2","column3"
FROM "database"."schema"."table" WHERE FALSE
The issue is most likely due to a case mismatch between the object names in Snowflake and what is being sent through the connector. On the Snowflake side, unquoted object names are stored in UPPER CASE. I suggest you try passing COLUMN1, COLUMN2, etc. and see if that works.
You can also try setting the QUOTED_IDENTIFIERS_IGNORE_CASE parameter to true; it might help.
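A quick way to test the latter at session level (QUOTED_IDENTIFIERS_IGNORE_CASE can also be set at account level):
-- Treat double-quoted identifiers as case-insensitive for this session.
ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE;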
I found that this issue is due to mixed-case database or schema names not being properly applied by Talend. I discovered a workaround by updating the role parameter on the Snowflake connector (illustrated with a screenshot in the original post).

Kafka connect JDBC connector

I am trying to use io.confluent.connect.jdbc.JdbcSourceConnector in bulk mode with the following query:
query = select name, cast(ID as NUMBER(20,2)),status from table_name
Is this possible? If so, am I missing something?
I am getting this exception (org.apache.kafka.connect.runtime.WorkerTask:148):
org.apache.avro.SchemaParseException: Illegal character in: CAST(IDASNUMBER(20,2))
Use an alias for the second column. The alias becomes the name of the target Avro field.
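A minimal sketch of the aliased query; the alias name ID_NUM is just an example, not taken from the original post:
query = select name, cast(ID as NUMBER(20,2)) as ID_NUM, status from table_name
The alias replaces the raw expression text (CAST(IDASNUMBER(20,2))) that Avro rejects as a field name.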

iReport designer 4.5.1 /4.6.0 cannot interact with Hive

I have followed the instructions from here and installed the updated plugin. The error has become:
Query error
Message: net.sf.jasperreports.engine.JRException: Error executing SQL statement for : null
Level: SEVERE
Stack Trace:
Error executing SQL statement for : null
com.jaspersoft.hadoop.hive.HiveFieldsProvider.getFields(HiveFieldsProvider.java:113)
com.jaspersoft.ireport.hadoop.hive.designer.HiveFieldsProvider.getFields(HiveFieldsProvider.java:32)
com.jaspersoft.ireport.hadoop.hive.connection.HiveConnection.readFields(HiveConnection.java:154)
com.jaspersoft.ireport.designer.wizards.ConnectionSelectionWizardPanel.validate(ConnectionSelectionWizardPanel.java:146)
org.openide.WizardDescriptor$7.run(WizardDescriptor.java:1357)
org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
After downgrading to 4.5.0 the error became the following (the connection is verified and I am able to query the table from Hive):
Query error
Message: net.sf.jasperreports.engine.JRException: Query returned non-zero code: 10, cause:
FAILED: Error in semantic analysis: Line 1:14 Table not found 'panstats'
Level: SEVERE
Stack Trace: Query returned non-zero code: 10, cause:
FAILED: Error in semantic analysis: Line 1:14 Table not found 'panstats'
com.jaspersoft.hadoop.hive.HiveFieldsProvider.getFields(HiveFieldsProvider.java:260)
com.jaspersoft.ireport.hadoop.hive.designer.HiveFieldsProvider.getFields(HiveFieldsProvider.java:32)
com.jaspersoft.ireport.hadoop.hive.connection.HiveConnection.readFields(HiveConnection.java:146)
com.jaspersoft.ireport.designer.wizards.ConnectionSelectionWizardPanel.validate(ConnectionSelectionWizardPanel.java:146)
org.openide.WizardDescriptor$7.run(WizardDescriptor.java:1357)
org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
I am using Hive 0.8.1 on OS X Lion 10.7.4.
Is your query as simple as select * from panstats? I suspect that the query is not the problem, but you'll want to confirm that first.
You could try querying that table from a tool like SQuirreL SQL. If that tool also cannot get the data, then it's probably a Hive issue. If it can... then it's probably an issue with iReport or the Hive plugin.
It sounds like Hive is not configured to share metadata. It uses the annoying default configuration with Derby, so outside connections don't get access to your panstats table. I came across this article about configuring Hive earlier this year; it documents using MySQL instead of Derby. If that's indeed the problem, then it's just a Hive configuration issue, and following that article would solve things both for SQuirreL and for iReport.
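If you go the MySQL route, a minimal sketch of the relevant hive-site.xml entries, with placeholder host, database name, and credentials (none of these values come from the article):
<!-- Point the Hive metastore at MySQL instead of the embedded Derby DB. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>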