Oracle FOR ORDINALITY functionality in pyspark

I am converting existing Oracle code to PySpark. While converting the Oracle JSON code, I came across FOR ORDINALITY. How can I convert this to PySpark?
Thank you.
I tried row_number but it is not working.
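One option is posexplode, which emits a position alongside each exploded element; adding 1 to that position reproduces the 1-based counter that FOR ORDINALITY gives you in Oracle's JSON_TABLE. Below is a minimal sketch, assuming the JSON arrays live in a string column; the column and table names are made up for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one JSON array per row in a string column named "items"
df = spark.createDataFrame([("a", '["x", "y", "z"]')], ["id", "items"])
parsed = df.withColumn("items", F.from_json("items", "array<string>"))

# posexplode returns a 0-based position with each element;
# adding 1 mimics FOR ORDINALITY's 1-based counter
result = (
    parsed.select("id", F.posexplode("items").alias("pos", "item"))
    .withColumn("ordinality", F.col("pos") + 1)
    .drop("pos")
)
result.show()
If you only need a running row counter rather than a per-array-element position, row_number over a window is the closer equivalent, but it needs a deterministic ORDER BY column to give stable numbering.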

Related

How to execute an update query in spark sql temp tables

I am trying the below code but it is throwing some random error that I am unable to understand:
df.registerTempTable("Temp_table")
spark.sql("Update Temp_table set column_a='1'")
Currently, Spark SQL does not support UPDATE statements. The workaround is to create a Delta Lake / Iceberg table from your Spark dataframe and execute your SQL query directly on that table.
For an Iceberg implementation, refer to:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html
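A minimal Delta Lake sketch of that workaround, assuming the delta-spark package and its jars are available to the session; the dataframe and table name are placeholders standing in for the original Temp_table.
from pyspark.sql import SparkSession

# Assumes the Delta Lake jars are on the classpath
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "0"), (2, "0")], ["id", "column_a"])

# Persist the dataframe as a Delta table instead of registering a temp view
df.write.format("delta").mode("overwrite").saveAsTable("temp_table")

# UPDATE is supported on Delta tables
spark.sql("UPDATE temp_table SET column_a = '1'")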

How to convert sql cursor output to spark dataframe in spark?

I got an output using cursor.fetchall().
How can I convert the output into a Spark dataframe and create a Parquet file in PySpark?
You should use a JDBC connection to connect Spark to your database.
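A rough sketch of reading over JDBC and writing Parquet directly, instead of going through a cursor; the URL, driver, table, and output path below are placeholders, and the JDBC driver jar for your database is assumed to be on the classpath.
# Placeholder connection details; swap in the driver/URL for your database
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "schema.table")
    .option("user", "user")
    .option("password", "password")
    .load()
)

# The result is already a Spark dataframe, so writing Parquet is one call
jdbc_df.write.mode("overwrite").parquet("/path/to/output")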

writing psycopg2 query result to pyspark dataframe

Is there a way to directly fetch the contents of a table from a PostgreSQL database into a PySpark dataframe using the psycopg2 library?
The solutions online so far only talk about using a pandas dataframe, but that is not feasible for a very large dataset in Spark, since it would load all the data onto the driver node.
The code I am using is as follows:
conn = psycopg2.connect(database="databasename", user='user', password='pass',
                        host='postgres.host', port='5432')
cur = conn.cursor()
cur.execute("select * from database.table limit 10")
data = cur.fetchall()
The resulting output is a list of tuples, which is difficult to convert to a dataframe.
Any suggestions would be greatly appreciated.
Use Spark's JDBC reader to connect to PostgreSQL directly; it returns a dataframe.
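A sketch of that JDBC route, reusing the connection details from the question; the PostgreSQL JDBC driver jar is assumed to be available, and the partitioning options (partition column and bounds) are illustrative — they are what lets the read run on the executors instead of pulling everything through a single connection on the driver.
postgres_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://postgres.host:5432/databasename")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "database.table")
    .option("user", "user")
    .option("password", "pass")
    # Optional: split the read across executors on a numeric column
    # (column name and bounds below are placeholders)
    .option("partitionColumn", "id")
    .option("lowerBound", "1")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)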

Execute Postgresql Stored Procedure in PySpark

I am working with PySpark in AWS Glue.
I want to execute a stored procedure/function on a PostgreSQL database.
Is it possible?
What is the syntax? Is there any special package needed?
Ankur
You can try using a module like pg8000 to run this function.
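For example, a rough pg8000 sketch; the connection details and procedure name are placeholders.
import pg8000

conn = pg8000.connect(
    user="user", password="password",
    host="postgres.host", port=5432, database="db",
)
cur = conn.cursor()
# Use CALL for a stored procedure, or SELECT for a function that returns rows
cur.execute("CALL my_stored_procedure()")
conn.commit()
cur.close()
conn.close()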
You can also try calling the Postgres function the same way you would select data from a table, using the Spark read function with jdbc as the format. Since Glue uses PySpark under the hood, passing a query that calls the function instead of a table name should do the trick. Just remember to add the JDBC driver to your Glue job.
e.g. you can do this in Spark:
jdbcDF = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db")
    .option("driver", "org.postgresql.Driver")
    .option("query", "SELECT * from function()")
    .option("user", "user")
    .option("password", "password")
    .load()
)

Setting date format parameter on a sqoop-import job

I am having trouble casting a date column to a string using sqoop-import from an oracle database to an HDFS parquet file. I am using the following:
sqoop-import -Doraoop.oracle.session.initialization.statements="alter session set nls_date_format='YYYYMMDD'"
My understanding is that this should execute the above statement before it begins transferring data. I have also tried
-Duser.nls_date_format="YYYYMMDD"
But this doesn't work either; the resulting parquet file still contains the original date format as stored in the table. If it matters, I am running these in a bash script and also casting the same date columns to string using --map-column-java "MY_DATE_COL_NAME=String". What am I doing wrong?
Thanks very much.
Source: Sqoop User Guide
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
You can try casting the date to a string within the import query instead.
For example:
sqoop import --query "select col1, col2, ..., TO_CHAR(MY_DATE_COL_NAME, 'YYYY-MM-DD') FROM TableName WHERE \$CONDITIONS"