Spark read delimited csv ignoring escape - scala

I need to read a csv delimited by "|": each column value is a string enclosed in double quotes.
I use this code to read the file
val df = spark.read.option("header", "true").option("delimiter","|").option("escape", "_").option("inferSchema","false").csv("maprfs:///iper/file.txt")
Is there a way to ignore the escape character and not use one at all?
Otherwise, how can I delete a special character (for example "\" or "_") from the csv file and reload it as a dataframe?
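One workaround the question itself suggests is to pre-clean the raw file: strip the offending characters from the text before handing it to the CSV reader. A minimal Python sketch of the cleaning step (the character set and the sample line are assumptions for illustration; reading and rewriting the actual file is omitted):

```python
def strip_chars(text, chars='\\_'):
    # Remove every occurrence of the listed characters
    # (here backslash and underscore, as in the question).
    return text.translate({ord(c): None for c in chars})

cleaned = strip_chars('"foo_bar"|"baz\\qux"')
print(cleaned)  # '"foobar"|"bazqux"'
```

After cleaning, the file can be reloaded with spark.read.csv without an escape character doing any harm.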

Related

postgresql copy from with csv array literal whose delimiter is ','

I'd like to COPY FROM a CSV file with Postgres.
That CSV file has an array literal whose delimiter is ;.
example: a,b,c,{1;2;3}
I handled it by replacing the csv file's delimiter , with |, setting the delimiter option to |, and replacing the array literal's delimiter ; with ,.
example: a|b|c|{1,2,3}
I think there may be an option to set the delimiter of the array literal.
If so, I wouldn't have to replace the delimiters in the csv file.
Is there a smarter way?
There is no option to configure the separator between the elements of an array in its text representation.
But if you have any control over how the CSV file is generated, you can quote the array literal:
a,b,c,"{1,2,3}"
That would work fine with COPY.
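To illustrate why the quoting works, here is a small check with Python's csv module, which follows the same convention as COPY's CSV format: the quoted array literal is parsed as one field even though it contains commas.

```python
import csv
import io

# The quoted fourth field stays intact despite its embedded commas.
row = next(csv.reader(io.StringIO('a,b,c,"{1,2,3}"')))
print(row)  # ['a', 'b', 'c', '{1,2,3}']
```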

How to remove special characters ^# from a dataframe in pyspark

How can I prevent the special characters, i.e. ^#, from being written to the file while writing the dataframe to S3?
Using df.write.option("quote", "") while saving to the file handled the ASCII null character.

customize spark csv line terminator

I am using pyspark code to generate a csv from a dataframe using the code below:
df.repartition(1).write.format('com.databricks.spark.csv').option("header","true").mode("overwrite").save("/user/test")
But when I open the output in Notepad++, it has the default line terminator "\n". I have tried different options, such as setting textinputformat.record.delimiter, but no luck. Is there a way to customize this EOL while exporting a dataframe to csv in Spark? I actually need the EOL to be CRLF ("\r\n"). Appreciate any help. Thanks.
You can use the lineSep option to set a single character as line separator.
(
df.repartition(1).write.format('com.databricks.spark.csv')
.option("header", "true")
.mode("overwrite")
.option("lineSep", "^")
.save("/user/test")
)
Docs source: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option
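Since lineSep takes a single character, it cannot produce the two-character CRLF the question asks for. One hedged workaround is a post-processing pass over the exported file that rewrites the line endings; a minimal sketch of the conversion (file I/O omitted):

```python
def to_crlf(text):
    # Normalize any existing CRLF to LF first so we never
    # double the carriage return, then expand LF to CRLF.
    return text.replace('\r\n', '\n').replace('\n', '\r\n')

print(repr(to_crlf('a,b\nc,d\n')))  # 'a,b\r\nc,d\r\n'
```

The normalization step also makes the function idempotent, so running it twice over the same file is harmless.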

Postgres COPY command with literal delimiter

I was trying to import a CSV file into a PostgreSQL table using the COPY command. The delimiter of the CSV file is comma (,). However, there's also a text field with a comma in the value. For example:
COPY schema.table from '/folder/foo.csv' delimiter ',' CSV header
Here's the content of the foo.csv file:
Name,Description,Age
John,Male\,Tall,30
How to distinguish between the literal comma and the delimiter?
Thanks for your help.
To have the \ recognized as an escape character, it is necessary to use the text format:
COPY schema.table FROM '/folder/foo.csv' WITH (FORMAT text, DELIMITER ',');
But then it is also necessary to delete the first line, as the HEADER option is only valid for the CSV format.
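To see how text-format escaping resolves the ambiguity, here is a toy Python sketch of the splitting rule (this is an illustration of the convention, not Postgres itself): a backslash escapes the next character, so "\," is a literal comma rather than a field separator.

```python
def split_text_format(line, delim=','):
    # Minimal sketch of backslash-escaped field splitting,
    # as in PostgreSQL's text format with DELIMITER ','.
    fields, current, i = [], [], 0
    while i < len(line):
        c = line[i]
        if c == '\\' and i + 1 < len(line):
            # Escaped character: take it literally.
            current.append(line[i + 1])
            i += 2
        elif c == delim:
            fields.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(c)
            i += 1
    fields.append(''.join(current))
    return fields

print(split_text_format(r'John,Male\,Tall,30'))  # ['John', 'Male,Tall', '30']
```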

Trying to import a CSV file into postgres with comma as delimeter

I am trying to import a CSV file into Postgres that has a comma as delimiter. I do:
\COPY products(title, department) from 'toyd.csv' with (DELIMITER ',');
All super cool.
However, title and department are both strings. Some values in these columns contain commas that I don't want interpreted as delimiters, so I wrapped the strings in quotes. But this doesn't work: Postgres still treats them as delimiters. What am I missing?
Here is a snippet from the CSV that causes the problem:
"Light","Reading, Writing & Spelling"
Any ideas?
You aren't using CSV format there, just a comma-delimited one.
Tell it you want FORMAT CSV and it should default to quoted text - you could also change the quoting character if necessary.
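That is, something like \COPY products(title, department) FROM 'toyd.csv' WITH (FORMAT csv). To see that CSV quoting handles the problematic line, here is a quick check with Python's csv module, which applies the same quoting convention:

```python
import csv
import io

# The snippet from the question: the second field contains a comma
# but is quoted, so it parses as a single value.
line = '"Light","Reading, Writing & Spelling"'
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['Light', 'Reading, Writing & Spelling']
```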