Create Index through Spark for JDBC - PostgreSQL

I am trying to create an index on a Postgres table through Spark, and the code is as below:
val df3 = sqlContext.read.format("jdbc")
.option("url", "jdbc:postgresql://URL")
.option("user", "user")
.option("password", "password")
.option("dbtable", "(ALTER TABLE abc.test1 ADD PRIMARY KEY (test))as t")
.option("driver", "org.postgresql.Driver")
.option("lowerBound", 1L)
.option("upperBound", 10000000L)
.option("numPartitions", 100)
.option("fetchSize", "1000000")
.load()
The error is
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: syntax error at or near "TABLE"
Just wondering, can we do that, or is the above DataFrame approach wrong? Appreciate your help.
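Spark's JDBC reader only runs statements it can wrap as a subquery (roughly SELECT * FROM (...) alias), so DDL such as ALTER TABLE ... ADD PRIMARY KEY passed through dbtable produces exactly this syntax error. The usual workaround is to execute the DDL over a plain database connection from the driver, outside spark.read. A minimal sketch of that pattern, using Python's stdlib sqlite3 as a stand-in for a Postgres JDBC connection (the table, column, and index names are invented for illustration):

```python
import sqlite3

# Open a direct connection to the database (stand-in for a JDBC connection).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL runs fine on a plain connection; Spark's read path would reject it.
cur.execute("CREATE TABLE test1 (test INTEGER, payload TEXT)")
cur.execute("CREATE UNIQUE INDEX test1_pk ON test1 (test)")
conn.commit()

# Verify the index now exists in the catalog.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
indexes = [row[0] for row in cur.fetchall()]
print(indexes)  # ['test1_pk']
conn.close()
```

Against a real Postgres database the same shape applies: open a connection (e.g. via the JDBC driver or a Python driver), run the ALTER TABLE statement on a cursor, and commit; then use spark.read for the actual reads.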

Related

pyspark insert failed using spark.read method

def QueryDB(sqlQuery):
jdbcUrl = mssparkutils.credentials.getSecret("param1","DBJDBCConntring","param3")
spark=SparkSession.builder.appName("show results").getOrCreate()
jdbcdf = (spark.read.format("jdbc")
.option("url", jdbcUrl)
.option("query", sqlQuery)
.load()
)
return jdbcdf
df= QueryDB("INSERT INTO schema.table1 (column1, column2) output inserted.column1 values('one', 'two')")
df.show()
The notebook runs without any error, but no rows are inserted. Any suggestion or sample code for inserting into the table?
spark.read.format("jdbc") is for reading over JDBC. If you want to insert data through JDBC, you'd want something like this:
# Use append mode so rows are added to the existing table
# (the default save mode errors out if the table already exists).
jdbcDF.write \
.format("jdbc") \
.option("url", "jdbc:postgresql:dbserver") \
.option("dbtable", "schema.tablename") \
.option("user", "username") \
.option("password", "password") \
.mode("append") \
.save()

databricks: truncate table in redshift

I want to truncate a Redshift table from within a Databricks notebook. I can query the table, but when I try to truncate it, I get an error message.
truncate_traffic_sql = "TRUNCATE TABLE my_table"
spark.read \
.format("com.databricks.spark.redshift") \
.option("url", f"jdbc:postgresql://fw-rs-qa.xxxxx.us-east-1.redshift.amazonaws.com:5439/mydb?user={credentials['user']}&password={credentials['password']}") \
.option("query", truncate_traffic_sql) \
.option("tempdir", "s3a://my-bucket/Fish/tmp") \
.option("forward_spark_s3_credentials", "true") \
.load()
Caused by: java.sql.SQLException: Exception thrown in awaitResult:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "TABLE"
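The Redshift connector's query option, like plain Spark JDBC, is embedded as a subquery before it is sent to the database, which is why a TRUNCATE statement triggers this exact syntax error: the database sees SELECT * FROM (TRUNCATE TABLE ...). A quick sketch of the string the reader effectively builds (the wrapping shown is a simplification for illustration, not the connector's exact internal SQL):

```python
def wrap_as_spark_subquery(user_query: str, alias: str = "SPARK_GEN_SUBQ_0") -> str:
    """Approximate how Spark's JDBC read path embeds the user's query."""
    return f"SELECT * FROM ({user_query}) {alias}"

print(wrap_as_spark_subquery("SELECT * FROM my_table"))   # valid SQL
print(wrap_as_spark_subquery("TRUNCATE TABLE my_table"))  # syntax error at or near "TABLE"
```

Because the read path can only express SELECTs, the TRUNCATE has to be sent over a plain JDBC connection from the driver instead of through spark.read.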

How to execute SQL truncate table in pyspark

I am trying to truncate an Oracle table from PySpark using the code below:
truncatesql = """ truncate table mytable """
mape=spark.read \
.format("jdbc") \
.option("url", DB_URL) \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.option("dbtable", truncatesql) \
.load()
but it keeps throwing java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended. How can I truncate a table using a direct SQL query?
Try wrapping your query in parentheses with an alias.
Example:
truncatesql = """(truncate table mytable)e"""
mape=spark.read \
.format("jdbc") \
.option("url", DB_URL) \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.option("dbtable", truncatesql) \
.load()

Writing SQL table directly to file in Scala

Team,
I'm working on Azure databricks, I'm able to write a dataframe to CSV file using the following option:
df2018JanAgg
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("dbfs:/FileStore/output/df2018janAgg.csv")
but I'm seeking an option to write data directly from a SQL table to a CSV file in Scala.
Can someone please let me know if such an option exists?
Thanks,
Srini
Yes, data can be loaded directly from a SQL table into a DataFrame and vice versa. Reference: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
// JDBC -> DataFrame -> CSV
spark.read
.format("jdbc")
.option("url", "jdbc:postgresql:dbserver")
.option("dbtable", "schema.tablename")
.option("user", "username")
.option("password", "password")
.load()
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("dbfs:/FileStore/output/df2018janAgg.csv")
// DataFrame -> JDBC
df.write
.format("jdbc")
.option("url", "jdbc:postgresql:dbserver")
.option("dbtable", "schema.tablename")
.option("user", "username")
.option("password", "password")
.save()
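The same table-to-file round trip can be seen in miniature outside Spark. A sketch using Python's stdlib sqlite3 and csv modules in place of the JDBC source and the CSV writer (the table name and columns are invented for illustration):

```python
import csv
import io
import sqlite3

# A throwaway table standing in for "schema.tablename" on the JDBC side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Read the table and stream it straight to CSV, header row first.
cur = conn.execute("SELECT id, amount FROM sales ORDER BY id")
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([col[0] for col in cur.description])  # column names as header
writer.writerows(cur.fetchall())

print(buf.getvalue())
conn.close()
```

In Spark the chained read-then-write above does the same thing at scale, with the JDBC options selecting the table and the CSV writer's header option playing the role of the header row here.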

Spark JDBC returning dataframe only with column names

I am trying to connect to a Hive table using Spark JDBC, with the following code:
val df = spark.read.format("jdbc").
option("driver", "org.apache.hive.jdbc.HiveDriver").
option("user","hive").
option("password", "").
option("url", jdbcUrl).
option("dbTable", tableName).load()
df.show()
but the return I get is only an empty dataframe with modified columns name, like this:
--------------|---------------|
tableName.uuid|tableName.name |
--------------|---------------|
I've tried reading the dataframe in several ways, but the result is always the same.
I'm using the JDBC Hive driver, and this Hive table is located in an EMR cluster. The code also runs in the same cluster.
Any help will be really appreciated.
Thank you all.
Please set fetchsize in the options; it should work.
Dataset<Row> referenceData
= sparkSession.read()
.option("fetchsize", "100")
.format("jdbc")
.option("url", jdbc.getJdbcURL())
.option("user", "")
.option("password", "")
.option("dbtable", hiveTableName).load();