So i am using pyspark to connect to postgres database from databricks, i can read , i can create table and also update it. but i am unable to delete a record.
dfs = spark.read.format('jdbc')\
.option("url", jdbcUrl)\
.option("user", user)\
.option("password", password)\
.option("query", "DELETE FROM meta.test4 WHERE Emp_Id = 1")\
.load()
this snippet here results in a syntax error
org.postgresql.util.PSQLException: ERROR: syntax error at or near "FROM"
How do i delete a record in postgres?
spark.read is only used for reading data. Internally, it wraps the query in a SELECT * FROM (<query>) so your statement actually becomes:
SELECT * FROM (DELETE FROM meta.test4 WHERE Emp_Id = 1)
and this obviously causes syntax error as you described.
If you want to run DML/DDL operations against remote database, you need to explicitly connect and run a statement using JDBC's Connection and Statement classes. This tutorial provides a nice overview.
Related
Can someone let me know make the first row the header when saving to SQL Server with Databricks
I am currently using the following code to upload / save to SQL in Azure
jdbcUrl = f"jdbc:sqlserver://{DBServer}.database.windows.net:1433;database={DBDatabase};user={DBUser};password={DBPword};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
df = spark.read.csv("/mnt/lake/RAW/OptionsetMetadata.csv")
df.write.mode("overwrite") \
.format("jdbc") \
.option("url", jdbcUrl) \
.option("dbtable", 'UpdatedProducts')\
.save()
The table looks like the following after saving:
JDBC driver creates the table according to the schema. It looks like that you're reading from the CSV file, and don't specify .option("header", "true") when reading. Just add this option to your read operation.
I am trying to delete records (not truncate as it's based on condition) from postgre table, can some one help me what would be pyspark command?
For selecting/Inserting I am using below command :
Build the query
logDetlSql=f"(select * from {dqLogDetlStgTbl} where prc_name ='{inputParam.prc_name}' ) logDetlStg "
Execute to fetch data using spark jdbc
dfsql_log_detl_stg = spark.read.jdbc(url=dq_dburl, table=logDetlSql, properties=connection_properties)#.select(*dqLogTblColList)
Write using spark jdbc
dfsql_log_detl_stg.write.jdbc(url=dq_dburl, mode="append", table=dqLogDetlTbl,properties=connection_properties)
I`m trying to save my data frame in *.orc using jdbc in postgresql. I have an intermediate table created in my localhost on the server I use, but the table is not saved in postgresql.
I would like to find out what extensions postgresql works with (you may not be able to create a *.orc table in it), and that it accepts - a Dataset or sql query from the created table.
I'm using spark.
Properties conProperties = new Properties();
conProperties.setProperty("driver", "org.postgresql.Driver");
conProperties.setProperty("user", srgs[2]);
conProperties.setProperty("password", args[3]);
finalTable.write()
.format("orc")
.mode(SaveMode.Overwrite)
.jdbc(args[1], "dataTable", conProperties);
spark-submit --class com.pack.Main --master yarn --deploy-mode cluster /project/build/libs/mainC-1.0-SNAPSHOT.jar ha-cluster?jdbc:postgresql://MyHostame:5432/nameUser&user=nameUser&password=passwordUser
You cannot keep the .orc format in Postgres. You wouldn't want that either.
See updated write below
finalTable.write
.format("jdbc")
.option("url", srgs[1])
.option("dbtable", "dataTable")
.option("user", srgs[2])
.option("password", args[3])
.save()
I have a table in MSSQL named as dbo.1table and I need to convert it into a dataframe and later on save it as an avro file but I can't even load it as dataframe.
I tested my code with tables named with characters a-z and it works, I tried to convert the table name "toString()" and nothing has worked so far.
I expect to have a dataframe and then save it as an avro file.
Instead I have the folloiwng error:
val DFDimAccountOperator = spark.read.format("jdbc")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.option("url", connection)
.option("dbtable", "dbo.1table")
.option("user", userId)
.option("password", pwd).load()
DFDimAccountOperator.write.format("avro").save("conversionTypes/testinAVro13")
Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '.1'. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1621)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:592)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:522)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:444)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at com.aon.ukbi.TomsExample$.runJdbcDatasetExample(TomsExample.scala:27)
at com.aon.ukbi.TomsExample$.main(TomsExample.scala:16)
connection
For making a connection between MSSQL and Spark you need to add sqljdbc jar into $SPARK_HOME/jars location and restart your spark-shell and paste these line into Spark Shell.
scala> val DFDimAccountOperator = spark.read.format("jdbc").option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") .option("url", "jdbc:sqlserver://xxxxxxxx.xxx:1433;database=xx;user=xxxxxx;password=xxxxxx") .option("dbtable", "xxxxxx").load()
Restart and Re-run Code(replace XXXX with appropiate value)
after this, you can write your data frame whichever format you want.
DFDimAccountOperator.write.format("avro").save("conversionTypes/testinAVro13")
Hope this helps you let me know if you have further queries related to this if its solve your purpose accept the answer HAppy HAdooop
I have values in dataframe , and I have created a table structure in Teradata. My requirement is to load dataframe to Teradata. But I am getting error:
I have tried following code :
df.write.format("jdbc")
.option("driver","com.teradata.jdbc.TeraDriver")
.option("url","organization.td.intranet")
.option("dbtable",s"select * from td_s_zm_brainsdb.emp")
.option("user","userid")
.option("password","password")
.mode("append")
.save()
I got an error :
java.lang.NullPointerException at
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
at
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
... 48 elided
I changed url option to make it similar to jdbc url, and ran following command:
df.write.format("jdbc")
.option("driver","com.teradata.jdbc.TeraDriver")
.option("url","jdbc:teradata//organization.td.intranet,CHARSET=UTF8,TMODE=ANSI,user=G01159039")
.option("dbtable",s"select * from td_s_zm_brainsdb.emp")
.option("user","userid")
.option("password","password")
.mode("append")
.save()
Still i am getting error:
java.lang.NullPointerException at
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
at
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
... 48 elided
I have included following jars:
with --jars option
tdgssconfig-16.10.00.03.jar
terajdbc4-16.10.00.03.jar
teradata-connector-1.2.1.jar
Version of Teradata 15
Spark version 2
Change the jdbc_url and dbtable to the following
.option("url","jdbc:teradata//organization.td.intranet/Database=td_s_zm_brainsdb)
.option("dbtable","emp")
Also note in teradata, there are no row locks, so the above will create a table lock. i.e. it will not be efficient - parallel writes from sparkJDBC are not possible.
Native tools of teradata - fastloader /bteq combinations will work.
Another option - that requires a complicated set up is Teradata Query Grid - this is super fast - Uses Presto behind the scene.
I found actual issue.
JDBC Url should be in following form :-
val jdbcUrl = s"jdbc:teradata://${jdbcHostname}/database=${jdbcDatabase},user=${jdbcUsername},password=${jdbcPassword}"
It was causing exception , because I didnt supply username and password.
Below is code useful while reading data from Teradata table,
df = (spark.read.format("jdbc").option("driver", "com.teradata.jdbc.TeraDriver")
.option("url", "jdbc:teradata//organization.td.intranet/Database=td_s_zm_brainsdb")
.option("dbtable", "(select * from td_s_zm_brainsdb.emp) AS t")
.option("user", "userid")
.option("password", "password")
.load())
This will create data frame in Spark.
For writing data back to database below is statement,
Saving data to a JDBC source
jdbcDF.write \
.format("jdbc") \
.option("url", "jdbc:teradata//organization.td.intranet/Database=td_s_zm_brainsdb") \
.option("dbtable", "schema.tablename") \
.option("user", "username") \
.option("password", "password") \
.save()