I'm trying to write a Spark DataFrame to a pre-created PostgreSQL table. I get the following error during the INSERT phase of my job:
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO ref.tableA(a,b) VALUES ('Mike',548758) was aborted. Call getNextException to see the cause.
I also tried to catch the error and call the getNextException method, but I still see the same error in the logs. To write the DataFrame to the corresponding table I used the following process:
val jdbcProps = new java.util.Properties()
jdbcProps.setProperty("driver", Config.psqlDriver)
jdbcProps.setProperty("user", Config.psqlUser)
jdbcProps.setProperty("password", Config.psqlPassword)
jdbcProps.setProperty("stringtype", "unspecified")
df.write
.format("jdbc")
.mode(SaveMode.Append)
.jdbc(Config.psqlUrl, tableName, jdbcProps)
Package versions:
- Spark : 1.6.2
- Scala : 2.10.6
Any ideas?
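For reference, here is a minimal sketch of how the chained exception could be surfaced (an assumption on my side: that the BatchUpdateException survives the trip back to the driver, which is not guaranteed when the failure happens on an executor; the println is just illustrative):
import java.sql.BatchUpdateException

try {
  df.write
    .mode(SaveMode.Append)
    .jdbc(Config.psqlUrl, tableName, jdbcProps)
} catch {
  case e: Throwable =>
    // Walk the cause chain looking for the driver's BatchUpdateException and
    // print the chained SQLException, which carries the real PostgreSQL error.
    var cause: Throwable = e
    while (cause != null) {
      cause match {
        case b: BatchUpdateException if b.getNextException != null =>
          println("Underlying cause: " + b.getNextException.getMessage)
        case _ =>
      }
      cause = cause.getCause
    }
    throw e
}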
I am using PySpark to connect to a Postgres database from Databricks. I can read, create tables, and update them, but I am unable to delete a record.
dfs = spark.read.format('jdbc')\
.option("url", jdbcUrl)\
.option("user", user)\
.option("password", password)\
.option("query", "DELETE FROM meta.test4 WHERE Emp_Id = 1")\
.load()
This snippet results in a syntax error:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "FROM"
How do I delete a record in Postgres?
spark.read is only used for reading data. Internally, it wraps the query in a SELECT * FROM (<query>) so your statement actually becomes:
SELECT * FROM (DELETE FROM meta.test4 WHERE Emp_Id = 1)
which causes the syntax error you described.
If you want to run DML/DDL operations against a remote database, you need to connect explicitly and run a statement using JDBC's Connection and Statement classes. This tutorial provides a nice overview.
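A minimal sketch of that approach, in Scala for brevity (jdbcUrl, user and password are the values from the question; from PySpark the same can be done with any Postgres client library):
import java.sql.DriverManager

// Open a plain JDBC connection and run the DELETE as an update statement.
val conn = DriverManager.getConnection(jdbcUrl, user, password)
try {
  val stmt = conn.createStatement()
  // executeUpdate returns the number of rows the DML statement affected.
  val deleted = stmt.executeUpdate("DELETE FROM meta.test4 WHERE Emp_Id = 1")
  println(s"Deleted $deleted row(s)")
  stmt.close()
} finally {
  conn.close()
}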
There is a Spark component that creates a SQL table out of transformed data. It successfully saves the data into spark-warehouse under the <database_name>.db folder. The component also tries to read from the existing table so it does not blindly overwrite it. While reading, Spark is unable to find any database other than default.
sparkVersion: 2.4
val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .config("spark.debug.maxToStringFields", 100)
  .config("spark.sql.warehouse.dir", "D:/Demo/spark-warehouse/")
  .getOrCreate()
def saveInitialTable(df: DataFrame) {
  df.createOrReplaceTempView(Constants.tempTable)
  spark.sql("create database " + databaseName)
  spark.sql(
    s"""create table if not exists $databaseName.$tableName
       |using parquet partitioned by (${Constants.partitions.mkString(",")})
       |as select * from ${Constants.tempTable}""".stripMargin)
}
def deduplication(dataFrame: DataFrame): DataFrame = {
  if (Try(spark.sql("show tables from " + databaseName)).isFailure) {
    //something
  }
}
After the saveInitialTable function has run successfully, the deduplication function in the second run is still not able to pick up <database_name>.
I am not using Hive explicitly anywhere, just Spark DataFrames and the SQL API.
When I run the REPL in the same directory as spark-warehouse, it also shows only the default database.
scala> spark.sql("show databases").show()
2021-10-07 18:45:57 WARN ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2021-10-07 18:45:57 WARN ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
+------------+
|databaseName|
+------------+
| default|
+------------+
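One thing worth checking, offered only as an assumption since the builder is the only configuration shown: without enableHiveSupport(), Spark 2.4 uses the in-memory catalog, so databases and tables created in one run are not visible to later sessions. A sketch of a session that persists the catalog (assuming the spark-hive module is on the classpath), together with Catalog API checks that avoid wrapping "show tables" in Try:
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "D:/Demo/spark-warehouse/")
  .enableHiveSupport() // persist databases/tables in a Hive metastore instead of the in-memory catalog
  .getOrCreate()

// Catalog API checks instead of wrapping "show tables from ..." in Try
val dbExists  = spark.catalog.databaseExists(databaseName)
val tblExists = dbExists && spark.catalog.tableExists(databaseName, tableName)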
I am invoking spark-shell like this:
spark-shell --jars kafka-clients-0.10.2.1.jar,spark-sql-kafka-0-10_2.11-2.3.0.cloudera1.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar,spark-avro_2.11-2.4.0.jar,avro-1.9.1.jar
After that I read from a Kafka topic using readStream():
val df = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "kafka-1.x:9093,kafka-2.x:9093,kafka-0.x:9093")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.ssl.protocol", "TLSv1.2")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"token\" password=\"XXXXXXXXXX\";")
  .option("subscribe", "test-topic")
  .option("startingOffsets", "latest")
  .load()
Then I read the AVRO Schema File
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("/root/avro_schema.json")))
Then I make the DataFrame which matches the AVRO schema
val DataLineageDF = df.select(from_avro(col("value"),jsonFormatSchema).as("DataLineage")).select("DataLineage.*")
This throws the following error:
java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
I was able to fix this problem by replacing the jar spark-avro_2.11-2.4.0.jar with spark-avro_2.11-2.4.0-palantir.31.jar.
Issue:
DataLineageDF.writeStream.format("console").outputMode("append").trigger(Trigger.ProcessingTime("10 seconds")).start
fails with this error:
Exception in thread "stream execution thread for [id = ad836d19-0f29-499a-adea-57c6d9c630b2, runId = 489b1123-a2b2-48ea-9d24-e6744e0959b0]" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.boxedType(Lorg/apache/spark/sql/types/DataType;)Ljava/lang/String;
which seems to be related to incompatible jars. If anyone has any idea what is going wrong, please comment.
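Not an answer, but a small check that may help narrow down which jars are actually in use at runtime (a sketch; the Avro Schema class is chosen only because it appears in the first NoSuchMethodError):
// Print the Spark version the shell is actually running, and which jar the Avro
// Schema class is loaded from, to spot a shadowed or mismatched dependency.
println(spark.version)
println(classOf[org.apache.avro.Schema].getProtectionDomain.getCodeSource.getLocation)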
I'm writing some code to save a DataFrame to a Hive database using Presto:
df.write.format("jdbc")
.option("url", "jdbc:presto://myurl/hive?user=user/default")
.option("driver","com.facebook.presto.jdbc.PrestoDriver")
.option("dbtable", "myhivetable")
.mode("overwrite")
.save()
This should work, but it actually raises an exception:
java.lang.IllegalArgumentException: Can't get JDBC type for array<string>
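The message comes from Spark's JDBC writer having no mapping from ArrayType to a JDBC type. One possible workaround, as a hedged sketch (my_array_col is a hypothetical column name, and this stores the array as delimited text rather than a real Hive array):
import org.apache.spark.sql.functions.{col, concat_ws}

// Flatten the array<string> column to a plain string before handing the frame to JDBC.
val writable = df.withColumn("my_array_col", concat_ws(",", col("my_array_col")))

writable.write.format("jdbc")
  .option("url", "jdbc:presto://myurl/hive?user=user/default")
  .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
  .option("dbtable", "myhivetable")
  .mode("overwrite")
  .save()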
I have a table in MSSQL named dbo.1table and I need to load it into a DataFrame and later save it as an Avro file, but I can't even load it as a DataFrame.
I tested my code with tables whose names use only the characters a-z and it works. I also tried converting the table name with toString(), but nothing has worked so far.
I expect to get a DataFrame and then save it as an Avro file. Instead I get the following error:
val DFDimAccountOperator = spark.read.format("jdbc")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.option("url", connection)
.option("dbtable", "dbo.1table")
.option("user", userId)
.option("password", pwd).load()
DFDimAccountOperator.write.format("avro").save("conversionTypes/testinAVro13")
Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '.1'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1621)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:592)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:522)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:444)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at com.aon.ukbi.TomsExample$.runJdbcDatasetExample(TomsExample.scala:27)
at com.aon.ukbi.TomsExample$.main(TomsExample.scala:16)
To make a connection between MSSQL and Spark, you need to add the sqljdbc jar to the $SPARK_HOME/jars location, restart your spark-shell, and paste these lines into the Spark shell.
scala> val DFDimAccountOperator = spark.read.format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://xxxxxxxx.xxx:1433;database=xx;user=xxxxxx;password=xxxxxx")
  .option("dbtable", "xxxxxx")
  .load()
Restart and re-run the code (replace the xxxxxx placeholders with appropriate values).
After this, you can write your DataFrame in whichever format you want.
DFDimAccountOperator.write.format("avro").save("conversionTypes/testinAVro13")
Hope this helps. Let me know if you have further queries related to this, and if it solves your purpose, please accept the answer. Happy Hadooping!
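One more thing worth trying, offered as an assumption based on the "Incorrect syntax near '.1'" message: SQL Server identifiers that start with a digit normally need bracket delimiters, and the dbtable value is pasted verbatim into the query Spark generates, so quoting it may be enough:
val DFDimAccountOperator = spark.read.format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", connection)
  .option("dbtable", "[dbo].[1table]") // bracket-quote the identifier that starts with a digit
  .option("user", userId)
  .option("password", pwd)
  .load()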