How to use postgresql copy command with ANSI encoding? - postgresql

I use PostgreSQL copy command to make a csv file.
COPY (select * from table) from '/tmp/123456.csv' with csv header encoding 'UTF8';
this command is ok
But I want to make file with ANSI encoding, because I want to open it in Microsoft Excel
COPY (select * from table) from '/tmp/123456.csv' with csv header encoding 'ANSI';
this command can't make csv file

Since you seem to be on Windows, maybe you want to use one of the Windows encodings, for example WIN1252 for western European languages.

Related

PostgreSQL Copy To - CSV filename encoding

I have a database setup with a UTF-8 encoding. Trying to copy a table to csv, where the filename has a special character writes out the filename wrong to disk.
On a Windows 10 localhost PostgreSQL installation:
copy
(select 'tønder')
to 'C:\temp\Sønderborg.csv' (FORMAT CSV, HEADER TRUE, DELIMITER ';', ENCODING 'UTF8');
Names the csv file: Sønderborg.csv and not Sønderborg.csv.
Both
SHOW CLIENT ENCODING;
SHOW SERVER_ENCODING;
returns UTF8
How can one control the csv filename encoding? The encoding inside the csv is ok writing Tønder!
UPDATE
I have run the copy command from pgAdmin, DataGrip and a psql console. DataGrip uses JDBC and will only handle UTF8. All three applications writes the csv filename in wrong encoding. The only difference is that the psql console says the client encoding is WIN1252.
I don't think it's possible to change this behaviour. It looks like Postgres assumes that the filename encoding matches the server_encoding (as suggested on the mailing lists here and here). The only workaround I could find was to run the command while connected to a WIN1252-encoded database, which is probably not very helpful.
If you're trying to run this on the same machine as the server itself, then instead of using the server-side COPY, you can run psql's client-side \copy, which will respect your client_encoding when interpreting the file path:
psql -c "\copy (select 'tønder') to 'C:\temp\Sønderborg.csv' (FORMAT CSV, HEADER TRUE, DELIMITER ';', ENCODING 'UTF8')"
Note that cmd.exe (and even powershell.exe) still uses legacy DOS encodings by default, so you might need to run chcp 1252 to set the console codepage before launching psql.

Does pg_dump preserve all Unicode characters when .sql file is ANSI?

I use
pg_dump.exe -U postgres -f "file-name.sql" database-name
to backup UTF-8 encoded databases on PostgreSQL 8.4 and 9.5, Windows host. Some may have foreign characters such as Chinese, Thai etc stored in Characters columns.
The resulting .sql file shows ANSI encoding when opening in Notepad++ (I'm NOT applying ANSI to opened files by default). How do I know if Unicode characters are always preserved in the dump file? Should I be using an archive (object) backup file instead?
Quote from the manual
By default, the dump is created in the database encoding.
There is no difference in a text file in ANSI encoding and UTF-8 if no extended characters are used. Maybe your dump has no special characters and thus the editor doesn't identify it as UTF-8.
If you want the SQL dump in a specific encoding, use the --encoding=encoding parameter or the PGCLIENTENCODING environment variable

Character with byte sequence 0x9d in encoding 'WIN1252' has no equivalent in encoding 'UTF8'

I am reading a csv file in my sql script and copying its data into a postgre sql table. The line of code is below :
\copy participants_2013 from 'C:/Users/Acrotrend/Desktop/mip_sahil/mip/reelportdata/Participating_Individual_Extract_Report_MIPJunior_2013_160414135957.Csv' with CSV delimiter ',' quote '"' HEADER;
I am getting following error : character with byte sequence 0x9d in encoding 'WIN1252' has no equivalent in encoding 'UTF8'.
Can anyone help me with what the cause of this issue and how can I resolve it?
The problem is that 0x9D is not a valid byte value in WIN1252.
There's a table here: https://en.wikipedia.org/wiki/Windows-1252
The problem may be that you are importing a UTF-8 file and postgresql is defaulting to Windows-1252 (which I believe is the default on many windows systems).
You need to change the character set on your windows command line before running the script with chcp. Or in postgresql you can:
SET CLIENT_ENCODING TO 'utf8';
Before importing the file.
Simply specify encoding 'UTF-8' as the encoding in the \copy command, e.g. (I broke it into two lines for readability but keep it all on the same line):
\copy dest_table from 'C:/src-data.csv'
(format csv, header true, delimiter ',', encoding 'UTF8');
More details:
The problem is that the Client Encoding is set to WIN1252, most likely because it is running on Windows machine but the file has a UTF-8 character in it.
You can check the Client Encoding with
SHOW client_encoding;
client_encoding
-----------------
WIN1252
Any encoding has numeric ranges of valid code. Are you sure so your data are in win1252 encoding?
Postgres is very strict and doesn't import any possible encoding broken files. You can use iconv that can works in tolerant mode, and it can remove broken chars. After cleaning by iconv you can import the file.
I had this problem today and it was because inside of a TEXT column I had fancy quotes that had been copy/pasted from an external source.

COPY function in PostgreSQL

I would like to use the COPY function in PostgreSQL to import a CSV file into a PostgreSQL database.
Where it says the filename in the documentation, does the CSV file have to be stored in a specific location or can it be stored in any location.
For example, copy data_table from '/tmp/outputdata.csv' WITH DELIMITER AS ',' CSV QUOTE AS '"';. Where it says tmp, does that mean the tmp folder in the C: drive. Can it be change to another folder name?
It looks like you are confused by Linux vs. Windows file-path notation. What you have there is a Linux path anchored to root. Windows uses drive letters, which you can specify just as well when you are running on Windows.
If you use Windows notation, take care that you have to escape backslashes if you are not using standard_conforming_strings = on - which is the default in Postgres 9.1 or later, but not in older versions. Like:
COPY data_table from E'C:\\tmp\\outputdata.csv' WITH ...
With standard_conforming_strings = on you can simply write:
COPY data_table from 'C:\tmp\outputdata.csv' WITH ...
Note that a PostgreSQL Windows server also understands default path notation with slashes instead of backslashes.
For SQL COPY FROM / TO you can use any path that the owner of server process (postgres by default) has permission to read / write.
For the \copy meta command of the psql client the permissions of current local user apply.
Yes, of course you can specify whatever location where you have read access. There's no problem changing the path of the file.
Keep attention only on the fact that on windows you have to escape the backslash in this way :
copy data_table from 'c:\\Temp\\outputdata.csv' WITH DELIMITER AS ',' CSV QUOTE AS '"';

Oracle -Character Encoding

I am facing a peculiar problem where i need to update a particular value in database to say 'Hellò'. When i run normal update statement the values are updated fine. But when i put it i a .sql script file and then run the update statement the last character gets replaced by a junk value. Can some one enlighten me on this and how oracle processes script files?
If the update statement works then this isn't an issue with the character encoding in the database so there are two main culprits to look at; your software and your file encoding.
Check that you editor is UTF8 compliant - Notepad for example is not, but Wordpad is, and so are better editors like UltraEdit. You also need to check that the file is saved as UTF8 since this in not always the default and if you edited and saved a file with Notepad, for example, it won't be UTF8 any more.
Once you have a UTF8 file then you must load it into the database via software that supports UTF8, which excludes SQLPlus in Windows 10g. SQL Developer is OK. If you want to use SQLPLus upgrade your client to 11g. You don't say how you are loading it, so check that whatever you are using supported UTF8.