I have recently started using PostgreSQL to create and update SQL databases. Being rather new to this, I ran into the issue of selecting the correct encoding type when creating a new database. UTF-8 (the default) did not work for me, as the data to be included is in various languages (English, Chinese, Japanese, Russian, etc.) and also includes symbolic characters.
Question: What is the right database encoding type to satisfy my needs?
Any help is highly appreciated.
There are four different encoding settings at play here:
The server side encoding for the database
The client_encoding that the PostgreSQL client announces to the PostgreSQL server. The PostgreSQL server assumes that text coming from the client is in client_encoding and converts it to the server encoding.
The operating system default encoding. This is the default client_encoding set by psql if you don't provide a different one. Other client drivers might have different defaults; e.g. PgJDBC always uses UTF-8.
The encoding of any files or text being sent via the client driver. This is usually the OS default encoding, but it might be a different one; for example, your OS might be set to use UTF-8 by default, but you might be trying to COPY some CSV content that was saved as Latin-1.
You almost always want the server encoding set to UTF-8. It's the rest that you need to change depending on what's appropriate for your situation. You would have to give more detail (exact error messages, file contents, etc.) to get help with the specifics.
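For illustration, here is a minimal sketch of where the first two settings live (the database name mydb is a placeholder):

-- Server-side encoding is fixed when the database is created:
CREATE DATABASE mydb ENCODING 'UTF8' TEMPLATE template0;

-- Client-side, each session announces its own encoding:
SHOW client_encoding;             -- what the server currently assumes
SET client_encoding TO 'LATIN1';  -- only if this client really sends Latin-1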
I am loading data into a QlikView report from different sources, one of them being a Sybase DB. It seems the Sybase DB is using ISO 8859-1 encoding, but there are also Russian characters in it, and QlikView just doesn't display them properly.
I don't see a way to manually define the encoding in QlikView. Is there one?
I tried specifying a Cyrillic charset in the ODBC settings, but that doesn't help either. The funny thing is that in ASE isql (a tool for running queries on Sybase) there is no encoding issue. Can I specify the encoding when selecting data from Sybase?
Sounds like a charset conversion issue. My guess is that your isql has a charset conversion option enabled, but your QlikView session does not.
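If I remember correctly, ASE's isql takes the client character set on the command line via the -J flag, which would explain the difference; something along these lines (the server name and credentials are placeholders):

isql -U myuser -P mypassword -S MYSERVER -J cp1251

If that is what your isql setup relies on, look for the equivalent client-charset setting in the ODBC DSN that QlikView uses.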
I have installed PostgreSQL.
I use pgAdmin III as the admin panel.
When I try to view table contents, I see the following:
How can I avoid this encoding problem?
For a UTF8 database, pgAdmin should always display strings correctly. The most likely explanation is that the data itself is incorrect.
This generally happens when a client application sends data in a format which doesn't match its client_encoding setting. If this is the case, setting client_encoding correctly would prevent this from happening (provided the client application's code page is supported by Postgres). This wouldn't fix the existing data, but it might be possible to repair it with the convert function.
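As a sketch of both halves, assuming the corruption came from a client sending UTF-8 bytes while client_encoding claimed WIN1252 (the table and column names here are hypothetical):

-- Prevention: make the declared client encoding match what the app actually sends:
SET client_encoding TO 'UTF8';

-- Repair: re-encode the mangled text back to its original bytes, then decode
-- those bytes as UTF-8, using the convert_to/convert_from pair:
UPDATE menu_items
SET label = convert_from(convert_to(label, 'WIN1252'), 'UTF8');

Which pair of encodings is right depends on how the corruption happened, so test on a copy of the data first.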
I have a PostgreSQL database. It has a table which stores menu items (labels). These menu items are stored in English. Is there any way to convert these stored items (100+) to Japanese using a localization feature? I ask because my customer's box is UNIX with the locale set to Japanese.
Reading between the lines, I'd say your database is in ISO-8859-1 or WIN1252 encoding, which are single-byte encodings for Western European languages.
If so, while you could transcode to a Japanese-specific encoding, those encodings are mostly quite limited, both in their coverage of English (Roman) characters and in their coverage of Kanji/Hiragana. Japanese doesn't work well in a single-byte encoding. Shift-JIS was an attempt to work around that, but it's an awful text encoding to work with, and PostgreSQL will refuse to use it as a server-side encoding.
Instead, convert the database to UTF-8. This will support all your existing content and all new content; UTF-8 works for any language.
To do so:
CREATE DATABASE mydb_new ENCODING 'UTF8'  -- PostgreSQL spells the encoding name UTF8
    LC_COLLATE 'ja_JP.UTF-8' LC_CTYPE 'ja_JP.UTF-8'
    TEMPLATE template0;  -- required when the encoding/locale differ from template1's
then pg_dump the old database and pg_restore the dump into the new one. Afterwards you can rename the databases to swap them over.
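Concretely, the swap might look like this (assuming the old database is called mydb):

# dump the old database and restore it into the new UTF-8 one
pg_dump -Fc mydb > mydb.dump
pg_restore -d mydb_new mydb.dump

-- then, from psql, with no one connected to either database:
ALTER DATABASE mydb RENAME TO mydb_old;
ALTER DATABASE mydb_new RENAME TO mydb;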
All characters in Latin-1 are valid in UTF-8, so there won't be issues loading the dump.
You (or your customer) might need to generate/install the ja_JP.UTF-8 locale (if on Linux/BSD). How to do that is somewhat distro/platform specific.
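On Ubuntu, for example, it's a one-liner (on Debian the locale first has to be uncommented in /etc/locale.gen):

sudo locale-gen ja_JP.UTF-8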
I'm connecting to a remote Firebird 2.1 DB server and I'm querying data that contains some Cyrillic characters together with some Latin ones.
The problem is that when I deploy the app on the production system, the Cyrillic characters look like this: ÂÚÇÄÓØÍÀ. In addition, when trying to log what comes in from the DB, the Cyrillic content is just skipped in the log file (i.e. I'm not seeing the ÂÚÇÄÓØÍÀ at all).
At this point I'm not sure whether I'm getting inconsistent data from the DB or whether the production environment can't recognize those characters for some reason.
I've been poking around for quite some time now and have run out of ideas, so any hints would be great.
The dev machine I use runs Windows 7 Ultimate SP1. My system locale is Bulgarian.
The production server is accessed via Parallels Plesk Panel, and I'm not sure what's underneath.
If you did not specify any character set in your connection properties, then almost all Firebird drivers default to connection character set NONE. This means Firebird sends the bytes of strings exactly as they are stored in the database, without any conversion; on the other side, the driver uses the default system character set to convert those bytes to strings. If you use multiple systems with different default system character sets, you will get different results.
You should always explicitly specify a connection character set (WIN1251 in your case), unless you really know what you are doing.
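For instance, in Firebird's own isql you can declare the connection character set before connecting (the host, path, and credentials are placeholders):

SET NAMES WIN1251;
CONNECT 'myserver:/data/mydb.fdb' USER 'SYSDBA' PASSWORD 'masterkey';

Most drivers expose the same thing as a connection-string or DSN property; check your driver's documentation for the exact parameter name.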
I had an application that used a Sybase ASA 8 database. However, the application is not working anymore, and the vendor has gone out of business.
Therefore, I've been trying to extract the data from the database, which contains Arabic characters. When I connect to the database and display the contents, the Arabic characters do not display correctly; instead they look something like ÇáÏãÇã, which is incorrect.
I tried to export the data to a text file, with the same result. Saving the text file with UTF-8 encoding didn't help either.
I have no idea what collation the tables are set to. Is there a way to export the data correctly, or convert it to the correct encoding?
The problem was solved by exporting the data from the database using Windows-1252 encoding and then importing it into other applications using Windows-1256 encoding.
When you connect to the database, use the CHARSET=UTF-8 connection parameter. That will tell the server to convert the data to UTF-8 before sending it to the client application. Then you can save the data from the client to a file.
This, of course, is assuming that the data was saved with the correct character set to begin with. If it wasn't, you may be out of luck.
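For reference, in a SQL Anywhere connection string that parameter sits alongside the usual ones; something like this (the engine, database name, and credentials are placeholders):

ENG=myserver;DBN=mydb;UID=dba;PWD=sql;CHARSET=UTF-8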