I am trying to delete records from a Teradata table and then write into that table, to avoid duplicates.
I have tried several approaches, none of which work.
1) I tried deleting while reading the data, which gives a Teradata syntax error like "'(' expected between table and delete":
spark.read.format('jdbc').option('driver', 'com.TeradataDriver').option('user', 'user').option('password', 'pwd').option('dbtable', 'delete from table').load()
I also tried the following, which gives a syntax error like "something expected between '(' and delete":
option('dbtable', '(delete from table) as td')
2) I tried deleting while writing the data, which also does not work:
df.write.format('jdbc').option('driver', 'com.TeradataDriver').option('user', 'user').option('password', 'pwd').option('dbtable', 'table').option('preactions', 'delete from table').save()
A possible solution is to call a stored procedure that deletes the data:
import teradata

host, username, password = '', '', ''
udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=False)

with udaExec.connect(method="odbc",
                     system=host,
                     username=username,
                     password=password,
                     driver="Teradata Database ODBC Driver 16.20",
                     charset='UTF16',
                     transactionMode='Teradata') as connect:
    # Run the stored procedure that clears the target table before reloading it.
    connect.execute("CALL db.PRC_DELETE()", queryTimeout=0)
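Once the delete has run, the DataFrame can be appended with the regular Spark JDBC writer. The sketch below only illustrates the idea; the url, driver class and table name are placeholders I have assumed, not values from the question:

# Sketch only: url, driver class and table name are assumed placeholders.
url = "jdbc:teradata://host/DATABASE=db"
df.write.format("jdbc") \
    .option("url", url) \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("user", "user") \
    .option("password", "pwd") \
    .option("dbtable", "table") \
    .mode("append") \
    .save()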
First off, can I just say that I am learning Databricks at the time of writing this post, so I'd like simpler, cruder solutions as well as more sophisticated ones.
I am reading a CSV file like this:
df1 = spark.read.format("csv").option("header", True).load(path_to_csv_file)
Then I'm saving it as a Delta Live Table like this:
df1.write.format("delta").save("table_path")
The CSV headers have characters in them like space and & and /, and I get the error:
AnalysisException:
Found invalid character(s) among " ,;{}()\n\t=" in the column names of your
schema.
Please enable column mapping by setting table property 'delta.columnMapping.mode' to 'name'.
For more details, refer to https://docs.databricks.com/delta/delta-column-mapping.html
Or you can use alias to rename it.
The documentation I've seen on the issue explains how to set the column mapping mode to 'name' AFTER a table has been created using ALTER TABLE, but does not explain how to set it at creation time, especially when using the DataFrame API as above. Is there a way to do this?
Is there a better way to get CSV into a new table?
UPDATE:
Reading the docs here and here, and inspired by Robert's answer, I tried this first:
spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")
Still no luck; I get the same error. It's interesting how hard it is for a beginner to write a CSV file with spaces in its headers to a Delta Live Table.
Thanks to Hemant on the Databricks community forum, I have found the answer.
df1.write.format("delta") \
    .option("delta.columnMapping.mode", "name") \
    .option("path", "table_path") \
    .saveAsTable("new_table")
Now I can either query it with SQL or load it into a Spark dataframe:
SELECT * FROM new_table;
delta_df = spark.read.format("delta").load("table_path")
display(delta_df)
SQL Way
This method does the same thing but in SQL.
First, create a CSV-backed table for your CSV file:
CREATE TABLE table_csv
USING CSV
OPTIONS (path '/path/to/file.csv', 'header' 'true', 'mode' 'FAILFAST');
Then create a Delta table using the CSV-backed table:
CREATE TABLE delta_table
USING DELTA
TBLPROPERTIES ("delta.columnMapping.mode" = "name")
AS SELECT * FROM table_csv;
SELECT * FROM delta_table;
I've verified that if I omit the TBLPROPERTIES clause, I get the same error as I did when using Python.
I guess the Python answer would be to use spark.sql and run this from Python; that way I could embed the CSV path variable in the SQL, as in the sketch below.
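For example, a minimal sketch of that idea (the path variable here is a placeholder, not from the original post):

path_to_csv_file = "/path/to/file.csv"  # placeholder path

# Create the CSV-backed table, embedding the path with an f-string.
spark.sql(f"""
    CREATE TABLE table_csv
    USING CSV
    OPTIONS (path '{path_to_csv_file}', header 'true', mode 'FAILFAST')
""")

# Create the Delta table with column mapping enabled.
spark.sql("""
    CREATE TABLE delta_table
    USING DELTA
    TBLPROPERTIES ('delta.columnMapping.mode' = 'name')
    AS SELECT * FROM table_csv
""")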
You can set the option in the Spark Configuration of the cluster you are using. That is how you enable the mode at runtime.
You could also set the config at runtime like this:
spark.conf.set("spark.databricks.<name-of-property>", <value>)
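For the column mapping case, that would presumably be something like the line below. The property name is my assumption, based on the convention that Delta session defaults use the prefix spark.databricks.delta.properties.defaults. plus the table property name without its delta. prefix; it is not verified in this thread:

# Assumed property name; session-level default for delta.columnMapping.mode.
spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")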
Just posting this question and the solution since it took forever for me to figure this out.
Using a CSV file, I was trying to import data into PostgreSQL with pgAdmin. I kept running into the same issue of "extra data after last expected column."
The solution that worked for me (instead of using the Import module):
COPY tablename (columns) FROM 'file location .csv' CSV HEADER;
Since some of the data included multiple commas within a cell, each comma was being counted as the start of a new column.
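If you'd rather drive the same COPY from Python instead of pgAdmin, a minimal sketch with psycopg2 is below; the connection settings, table name and column names are placeholders, not from the original post:

import psycopg2

# Placeholder connection settings and names, for illustration only.
conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
with conn, conn.cursor() as cur, open("file location .csv") as f:
    # copy_expert streams the local file through the client, like psql's \copy.
    cur.copy_expert("COPY tablename (col1, col2) FROM STDIN WITH CSV HEADER", f)
conn.close()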
I am trying to insert data into a temp table by joining two other tables, but for some reason I keep getting the error "String or binary data would be truncated."
While debugging, I realized that no rows are actually being inserted into the table, yet it still throws the error.
To get rid of it, I finally used SET ANSI_WARNINGS OFF inside the stored procedure, and it worked fine. The issue now is that I cannot recompile the stored procedure with this setting in the production database, and I want the problem fixed properly. What is more irritating is that ANSI_WARNINGS is actually OFF by default for the database.
Please let me know what could be the possible solution. It would be of great help.
I am using a Mac laptop and I am trying to copy a local CSV file and import it into a PostgreSQL table. I have used the DELIMITER option and the following query runs:
copy c2013_levinj.va_clips_translation
from local '/Users/jacoblevin/Desktop/clips_translation_table.csv'
Delimiter ',' skip 1 rejectmax 1;
However, each time the query is submitted, I receive a message that says "0 rows fetched." I have tried dropping the table and re-creating it, as well as using a "SELECT *" query. Suffice it to say, I have been unable to pull any data. Does anyone have any ideas what's wrong? Thanks in advance.
What happens if you try this:
copy c2013_levinj.va_clips_translation
from local '/Users/jacoblevin/Desktop/clips_translation_table.csv'
WITH CSV HEADER;
That should be more robust and do what you want.
I have to import a file from an external source into a PostgreSQL table.
I tried to do it with \copy from, but I keep getting errors (additional columns) in the middle of the file.
Is there a way to tell postgresql to ignore lines containing errors during a "\copy from" ?
Thanks
Give it a try with PostgreSQL Loader instead.
No. All data is correct or there is no data at all; those are the two options you have in PostgreSQL.
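Given that all-or-nothing behaviour, one workaround (not mentioned above) is to pre-filter the file before running \copy, dropping lines whose column count doesn't match. A rough Python sketch, assuming a comma-delimited file and a placeholder column count:

import csv

EXPECTED_COLUMNS = 10  # placeholder: set to the real column count of your table

# Write only well-formed rows to a cleaned file, which can then be loaded with \copy.
with open("input.csv", newline="") as src, open("cleaned.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        if len(row) == EXPECTED_COLUMNS:
            writer.writerow(row)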