What is the best practice for handling multiple client encodings in Postgres?

I have a DB server and client with encoding UTF8, and the client sometimes writes data in SJIS or LATIN1, which fails with:
ERROR: invalid byte sequence for encoding "UTF8": 0xd5 0x78
When I set client_encoding to SJIS, it worked.
I wonder why this error happened? I thought UTF8 supported both of them, as described in these docs:
https://www.postgresql.org/docs/11/multibyte.html#id-1.6.10.5.7
Is there any way to make it convert automatically, without setting the encoding manually?

No, you need to explicitly tell it what encoding the client uses.
Postgres does support automatic conversion between encodings, but it does not support automatic encoding detection. If the client claims to use UTF-8 but then writes SJIS or LATIN1 bytes, those byte sequences are simply invalid UTF-8 and get rejected.
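As a minimal sketch (assuming a psql or libpq session; SJIS is the value from the question), declare per session what the client actually sends and the server will transcode it to the database encoding:
SET client_encoding = 'SJIS';   -- or 'LATIN1', whichever the client really uses
SHOW client_encoding;           -- verify the active setting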

Related

Can PostgreSQL convert entries to UTF-8 even though the input is Latin1?

I have psql (PostgreSQL) 10.10 and client_encoding is UTF8. The entries are made by an older Delphi version which cannot use UTF8, so the special signs in the DB are not represented as UTF8; a ™ sign is represented by \u0099, for instance. Is it possible to force a conversion when the sign is entered into the database? Switching Delphi is not an option right now. I am sorry if this is a basic question; my knowledge about databases is limited.
It looks like your Delphi client is not using LATIN1, but WINDOWS-1252, because ™ is code point 0x99 in that encoding.
You can change client_encoding per session, and that is what you should do.
Either let your application execute
SET client_encoding = WIN1252;
or set the PGCLIENTENCODING environment variable or specify client_encoding as part of the connect string.
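For example (hypothetical connection values, assuming a libpq-based client such as psql), either of these sets the encoding without touching the application code:
export PGCLIENTENCODING=WIN1252                        # environment variable
psql "host=myhost dbname=mydb client_encoding=WIN1252" # connect-string parameter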

ERROR: invalid byte sequence for encoding "UTF8" inserting in pgadmin

I am having issues while inserting data into Postgres which contains the character é.
While inserting this character through pgAdmin, it parses the character to ETX, while the psql shell parses it to ^C. When I keep the query with the character in a file and pass the file to the psql shell, it gives me an error:
ERROR: invalid byte sequence for encoding "UTF8": 0x82
My Postgres 9.0 DB encoding is set to UTF-8.
Please let me know how to deal with this kind of character.
Thanks,
Rohit.
PS: I am not sure if the character can be seen here properly. It is a box-drawing character, represented as 192 in extended ASCII (code page 437) and as U+2514 in UTF-8.
The simple solution is to find out what encoding your client is actually using and set client_encoding to match.
For example this may fix your problem:
SET client_encoding = 'WIN1252';
If you are on Windows with pgadmin, a client encoding of Windows 1252 would be the most likely cause of the problem.
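If you test from psql rather than pgAdmin, \encoding with no argument shows the session's client encoding and \encoding WIN1252 switches it (WIN1252 being just the likely value from above):
\encoding
\encoding WIN1252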

Character encoding for Postgres API function return values?

I have a 9.0 Postgres server instance and a database using UTF8 character encoding with German_Germany.1252 collation. I'm trying to get my libpq error messages on the client as US-ASCII strings. To this end I do:
PQsetClientEncoding( connection, "SQL_ASCII" );
which returns no error. However, the strings returned from PQerrorMessage() still seem to be UTF8.
Is the return value from PQerrorMessage always guaranteed to be UTF8? No matter the client/server settings?
SQL_ASCII as a client encoding means "pass the bytes through as-is", which is exactly what you don't want. There actually isn't any client encoding that corresponds to plain ASCII. If your messages are in German, then you might want a setting such as LATIN1 or LATIN9. Otherwise, change the message language to English and the messages will be pure ASCII anyway.
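A sketch of those two options in SQL (note that changing lc_messages may require elevated privileges, depending on the server's configuration):
SET client_encoding = 'LATIN1';  -- keep German messages, but in an 8-bit encoding
SET lc_messages = 'C';           -- or request untranslated (English, pure ASCII) messages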

How to send text in UTF-8 using Indy TIdTCPServer in C++Builder

My client J2ME application reads the text input stream using UTF-8:
reader = new InputStreamReader(in,"UTF-8");
and my server, when a client connects, sends text using this statement:
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text,TEncoding::UTF8);
but the resulting text shows weird characters like ?????????????????????????? ?????????????
Where am I going wrong?
Also, when I tried to send a UTF-8 encoded data file like this:
AContext->Connection->IOHandler->WriteFile("c:\\fids.xml");
it's all the same!
Indy 10 completely supports UTF-8 encoding. I've worked with its TIdFTP component myself and successfully uploaded Unicode text files. From what I can make of it, either:
Your connection/transfer type is set to ftASCII rather than ftBinary, or
Your J2ME applet/host platform does not support UTF-8.
'?' characters occur when data is going through a Unicode-to-Ansi conversion to an Ansi charset that does not support the Unicode characters being converted.
What version of C++Builder are you using? In versions prior to CB2009, you should tell Indy the encoding of the AnsiString data that you are passing in. Indy defaults to ASCII (i.e. TIdTextEncoding::ASCII) for most String-based operations. That can be overridden when needed, either with optional AAnsiEncoding parameters, the TIdIOHandler::DefAnsiEncoding property, or the global Idglobal::GIdDefaultAnsiEncoding setting. If you do not specify the correct encoding, the AnsiString data may not be converted to Unicode correctly before then being converted to UTF-8. For example:
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text, TIdTextEncoding_UTF8, TIdTextEncoding_Default);
Or:
AContext->Connection->IOHandler->DefAnsiEncoding = TIdTextEncoding_Default;
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text, TIdTextEncoding_UTF8);
You can optionally also use the TIdIOHandler::DefStringEncoding property if you do not want to specify the UTF-8 encoding on every call:
AContext->Connection->IOHandler->DefStringEncoding = TIdTextEncoding_UTF8;
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text);
Now, with that said, the fact that WriteFile() is also sending data that J2ME is not handling correctly tells me that Indy is not the root of the issue. WriteFile() simply sends the raw file data as-is over the connection, without any interpretation at all. If you send a UTF-8 encoded file, then UTF-8 encoded octets will be sent to J2ME.
I suggest you use a packet sniffer, such as Wireshark, to verify the data that Indy is sending. That will tell you for sure whether Indy is really at fault or not.
PS: Notice in the examples above that I use Indy's TIdTextEncoding macros instead of TEncoding directly. This is because Indy's TIdTextEncoding logic works around some bugs in Embarcadero's TEncoding classes. Also, we're going to phase out direct support for TEncoding in Indy 11 and expand on TIdTextEncoding, so Indy has more control than Embarcadero offers.

Encoding problems with ogr2ogr and a PostGIS/PostgreSQL database

In our organization, we handle GIS content in different file formats. I need to put these files into a PostGIS database, and that is done using ogr2ogr. The problem is, that the database is UTF8 encoded, and the files might have a different encoding.
I found descriptions of how to specify the encoding by adding an options parameter to ogr2ogr, but apparently it doesn't have any effect.
ogr2ogr -f PostgreSQL PG:"host=localhost user=username dbname=dbname \
password=password options='-c client_encoding=latin1'" sourcefile;
The error I receive is:
ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "målsætning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe56c73
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "påvirkning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe57669
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
ERROR 1: INSERT command for new feature failed.
ERROR: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
Currently, my source file is a shapefile, and I'm pretty sure it is Latin1 encoded.
What am I doing wrong here and can you help me?
Kind regards, Casper
Magnus is right and I will discuss the solution here.
I have seen the option to inform PostgreSQL about the character encoding, options='-c client_encoding=xxx', used in many places, but it does not seem to have any effect. If someone knows how this part works, feel free to elaborate.
Magnus suggested to set the environment variable PGCLIENTENCODING to LATIN1. This can, according to a mailing list I queried, be done by modifying the call to ogr2ogr:
ogr2ogr --config PGCLIENTENCODING LATIN1 -f PostgreSQL
PG:"host=hostname user=username dbname=databasename password=password" inputfile
This didn't do anything for me. What worked for me was, before the call to ogr2ogr, to run:
SET PGCLIENTENCODING=LATIN1
It would be great to hear more details from experienced users and I hope it can help others :)
That does sound like it would set the client encoding to LATIN1. Exactly what error do you get?
Just in case ogr2ogr doesn't pass it along properly, you can also try setting the environment variable PGCLIENTENCODING to latin1.
I suggest you double-check that they are actually LATIN1. Simply running file on it will give you a good idea, assuming it's actually consistent within the file. You can also try sending it through iconv to convert it to either LATIN1 or UTF8.
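For example (hypothetical file names; a .dbf is binary, so running iconv over it is only a rough probe):
file sourcefile.dbf                                  # rough guess at the encoding
iconv -f UTF-8 -t UTF-8 sourcefile.dbf > /dev/null   # exits non-zero if not valid UTF-8
iconv -f LATIN1 -t UTF-8 attributes.txt > converted.txt   # convert a text export outright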
You need to write your command line like this:
PGCLIENTENCODING=LATIN1 ogr2ogr -f PostgreSQL PG:"dbname=...
Currently, OGR from GDAL does not perform any recoding of character data during translation between vector formats. The team has prepared the RFC 23.1: Unicode support in OGR document, which discusses support for recoding in OGR drivers. RFC 23 was adopted and the core functionality was released in GDAL 1.6.0. However, most OGR drivers have not been updated yet, including the Shapefile driver.
For the time being, I would describe OGR as encoding agnostic and ignorant: OGR takes what it gets and sends it out without any processing. OGR uses the char type to manipulate textual data. This is fine for handling multi-byte encoded strings (like UTF-8); it's just a plain stream of bytes stored as an array of char elements.
It is advised that developers of OGR drivers return UTF-8 encoded strings of attribute values, but this rule has not been widely adopted across OGR drivers, so this functionality is not end-user ready yet.
On Windows the command is
SET PGCLIENTENCODING=LATIN1
On Linux
export PGCLIENTENCODING=LATIN1
or
PGCLIENTENCODING=LATIN1
Moreover, this discussion helped me:
https://gis.stackexchange.com/questions/218443/ogr2ogr-encoding-on-windows-using-os4geo-shell-with-census-data
On Windows,
SET PGCLIENTENCODING=LATIN1 ogr2ogr...
does not give me any error, but ogr2ogr does not work (cmd.exe's SET consumes the whole rest of the line as the value, so the variable and the command cannot share a line the way they can in a Unix shell). I needed to change the system variable (e.g. System --> Advanced system settings --> Environment variables --> New system variable), reboot the system, and then run
ogr2ogr...
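Alternatively, a two-line batch sketch (hypothetical connection values) that in a plain cmd.exe session should take effect for subsequent commands without a reboot:
SET PGCLIENTENCODING=LATIN1
ogr2ogr -f PostgreSQL PG:"host=localhost user=username dbname=dbname password=password" sourcefile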
I solved this problem using this command:
pg_restore --host localhost --port 5432 --username postgres --dbname {DBNAME} --schema public --verbose "{FILE_PATH to import}"
I don't know if this is the right solution, but it worked for me.
For some reason, I don't know why, I could not import tables with ÅÄÖ in them into the public schema.
When I created a new schema I could import the tables to the new schema.