How to delete <0xfeff> characters in Delta table? - pyspark

I have some BOM characters in my Delta tables.
Environment: Databricks on Microsoft Azure.
I would like to find these <0xfeff> characters and delete them, but I don't know how to do it.
I tried to catch these characters during ingestion and delete them with PySpark.
Is there a way to do it directly in SQL? Thanks.
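
One way to approach this from SQL is to run an UPDATE per string column that strips U+FEFF with replace(). The snippet below is a minimal sketch that only builds those statements in plain Python. The table and column names are hypothetical, and it assumes the BOM sits inside string column values (not in column names) and that the table accepts Delta UPDATE statements.

```python
from typing import List

BOM = "\ufeff"  # U+FEFF, the byte order mark character


def quote_ident(name: str) -> str:
    """Quote a Spark SQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_bom_cleanup_sql(table: str, columns: List[str]) -> List[str]:
    """Return one UPDATE statement per column that removes U+FEFF from its values."""
    statements = []
    for col in columns:
        c = quote_ident(col)
        # The WHERE clause limits the rewrite to rows that actually contain the BOM.
        statements.append(
            f"UPDATE {table} SET {c} = replace({c}, '{BOM}', '') "
            f"WHERE {c} LIKE '%{BOM}%'"
        )
    return statements


# Hypothetical table and column names, for illustration only.
for stmt in build_bom_cleanup_sql("my_schema.my_table", ["name", "city"]):
    # Show the invisible character as a visible placeholder when printing.
    print(stmt.replace(BOM, "<U+FEFF>"))
    # In a Databricks notebook, the statement could then be run with spark.sql(stmt).
```

The generated statements contain the literal U+FEFF character, so they can also be pasted into a SQL cell, although the character is invisible there. Testing on a copy of the table first seems prudent.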

Related

How to use Glue to remove ''(single quote) in Redshift?

When I load a table from S3 into AWS Redshift using Glue, the table in Redshift contains single quotes ('').
I think they come from white space in the original table. Please help me solve this problem. Thank you very much.

I am migrating data. How do I remove the errors on my column names?

I am migrating SQL tables from MSSQL to PostgreSQL using Pentaho Data Integration and get errors when running the job. I think I have to put double quotes around the column names, but I don't know where to add them.

Font decoding problem when importing records from a table in a DB2 database into IBM Lotus Notes documents using a LotusScript agent

I have an agent written in LotusScript (IBM Domino 9.0.1, Windows 10) that reads records from a DB2 database and writes them to Notes documents. The DB2 table (on CentOS) contains international names in VARCHAR fields, such as "Łódź".
The DB2 database was created as UTF-8 (code page 1208), and Domino natively supports Unicode. Unfortunately, the value loaded into the Notes document is not "Łódź" as it should be, but "? Ód?".
How can I import special characters from DB2 into Domino NSF databases correctly?
Thank you.
To import the table I used the following code taken from OpenNTF XSnippets:
https://openntf.org/XSnippets.nsf/snippet.xsp?id=db2-run-from-lotusscript-into-notes-form
Find where the codepage conversion is happening. Alter the LotusScript to dump the hex of the received data for the column concerned to a file or a dialog box. If the hex codes differ from what is in the column, then it may be your Db2 client that is using the wrong codepage. Are you aware of the DB2CODEPAGE environment variable for Windows? It might help if the Db2 client is doing the codepage conversion.
For example, setting the environment variable DB2CODEPAGE=1208 may help, although careful testing is required to ensure it does not cause other symptoms that are mentioned online.
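
To illustrate the hex comparison, here is a small sketch in Python rather than LotusScript. It shows the UTF-8 bytes that "Łódź" should have, and what happens if the string is pushed through a single-byte Windows code page on the way. The choice of cp1252 is an assumption for illustration; the code page actually used by the client may differ.

```python
def to_hex(value: str, encoding: str = "utf-8") -> str:
    """Return the hex dump of a string encoded with the given encoding."""
    return value.encode(encoding).hex()


def simulate_lossy_conversion(value: str, codepage: str = "cp1252") -> str:
    """Round-trip a string through a code page, replacing unmappable characters with '?'."""
    return value.encode(codepage, errors="replace").decode(codepage)


name = "Łódź"
print(to_hex(name))                     # c581c3b364c5ba (bytes expected from a UTF-8 database)
print(simulate_lossy_conversion(name))  # ?ód? (Ł and ź do not exist in cp1252)
```

If the hex dumped by the agent matches the first line, the data arrives intact and the damage happens later; if characters have already turned into 3f ('?'), the conversion is happening before the data reaches the agent.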

Output a mySQL query to CSV with Postgres

So I have a decent-sized database (roughly 8 million rows) that I need to pull data from. It needs to be output to a CSV that can be opened by Excel. I've tried virtually every solution I found, to no avail.
\copy - Puts all values in a single column, separated by ','.
copy...to...with csv header - same result as above.
copy...into outfile - refuses to work. Claims there's something wrong with my path, even though I used the same path as before.
I'm not the most experienced with SQL, to say the least, but I'll try my best to provide any information necessary.
Have you tried mysqldump?
I had a similar experience, backing up and uploading 11 million rows.
It worked with mysqldump; maybe you can search for an equivalent of mysqldump for PostgreSQL.
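
A separate possibility, which is only an assumption since the question does not confirm it, is that the CSV itself is fine and Excel expects a different list separator (often ';' in some regional settings), so every row lands in one column. The sketch below writes rows with a configurable delimiter using Python's csv module; the sample rows and the file name are hypothetical.

```python
import csv
import io


def rows_to_excel_csv(header, rows, delimiter=";"):
    """Serialize rows to CSV text using the given delimiter."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data, for illustration only.
sample = rows_to_excel_csv(["id", "name"], [[1, "Smith, John"], [2, "Łódź"]])
print(sample)

# To save for Excel, "utf-8-sig" adds a BOM so Excel detects UTF-8:
# with open("export.csv", "w", newline="", encoding="utf-8-sig") as f:
#     f.write(sample)
```

Excel's own text import (Data > From Text/CSV) also lets you pick the delimiter without changing the file.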

Can one upload a csv file to postgres via pgadmin, without specifying column names beforehand?

I'm trying to use the pgAdmin III import tool to upload a .csv file. I don't know the column names or the number of columns beforehand, and would like them to be populated on the fly. I do know that the number of columns is consistent across rows.
In the sense of having a table dynamically created for you from the CSV, no, not with PgAdmin-III or psql.
You'll want to write a quick script for that with your preferred scripting language + its PostgreSQL driver interface, or use an ETL tool like CloverETL, Pentaho Kettle, or Talend Studio.
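
As a rough idea of what such a script could look like, the sketch below reads the header row of a CSV and builds a CREATE TABLE statement with every column typed as text. The table name and sample data are hypothetical, and typing everything as text is a simplifying assumption; executing the statement through a PostgreSQL driver is left out.

```python
import csv
import io


def quote_pg_ident(name: str) -> str:
    """Quote a PostgreSQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, csv_text: str) -> str:
    """Build a CREATE TABLE statement from the first (header) row of CSV text."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Every column is declared as text; types can be refined after loading.
    cols = ", ".join(f"{quote_pg_ident(h.strip())} text" for h in header)
    return f"CREATE TABLE {quote_pg_ident(table)} ({cols});"


# Hypothetical CSV content, for illustration only.
sample = "id,first name,city\n1,Anna,Łódź\n"
print(create_table_sql("imported_data", sample))
# CREATE TABLE "imported_data" ("id" text, "first name" text, "city" text);
```

Once the table exists, the data can be loaded with psql's \copy using the CSV and HEADER options.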