Retrieve multilingual data (Chinese, Japanese...) from SQL Server 2008 R2 and display in Java webapp - unicode

I have Chinese data in my database and I need to display it in my Java web app. However, I am getting ??? as output.
Database used: SQL Server 2008 R2 (the nvarchar datatype is used to support Unicode data, the database was created with the default collation, SQL_Latin1_General_CP1_CI_AS, and there is no problem storing the data in the database).
Development environment: Windows 7
Treegrid is used to display the data.
I have already:
1. set charset and pageEncoding to UTF-8 in my HTML, JSP and Java pages.
2. Updated my jdbc connection with useUnicode=true;characterEncoding=UTF-8;.
3. Configured Tomcat’s server.xml connector to use UTF-8 (URIEncoding="UTF-8").
I also once set the collation name to Latin1_General_CI_AI, but it is still not working.

Latin1_General_CI_AI --> There's part of your problem. Latin1 has nothing to do with Unicode. Getting "???" means there's an encoding problem somewhere in your toolchain, where your UTF-8 data gets scrambled into another encoding.
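A quick way to find out on which side of the JDBC boundary the text is being flattened (a diagnostic sketch, not part of the original question; the products table, the name column and the already-open Connection are hypothetical): read the NVARCHAR value straight from the ResultSet and print its code points. Chinese text should show code points such as U+4E2D; if every character is already U+003F at this stage, the data is being lost on the database/driver side, otherwise the problem lies further up in the servlet/JSP/HTTP layer.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Diagnostic only: table and column names are placeholders.
public class UnicodeCheck {
    static void dumpCodePoints(Connection con) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT name FROM products WHERE id = ?")) {
            ps.setInt(1, 1);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String value = rs.getString(1); // rs.getNString(1) also works on JDBC 4.0+
                    // U+4E2D, U+6587, ... means the driver returned real Chinese text;
                    // U+003F everywhere means it was already turned into '?'.
                    value.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
                    System.out.println();
                }
            }
        }
    }
}

On the write path, keep in mind that T-SQL string literals need the N prefix (N'中文') to stay Unicode when written to an nvarchar column, and that PreparedStatement.setNString (JDBC 4.0+) explicitly sends the parameter as Unicode.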

Related

Azure database for postgreSQL Flexible Server and Entity Framework Core shows accented spanish words incorrectly

I have created a Flexible Server (Azure Database for PostgreSQL). I am using pgAdmin to connect to this database (encoding UTF8), and I have a table named skills; pgAdmin shows the Spanish words correctly.
SHOW SERVER_ENCODING;
UTF8
SHOW CLIENT_ENCODING;
SQL_ASCII
The problem is that when I connect Entity Framework Core to this database, it shows accented Spanish words incorrectly, for example:
Publicación (Correct)
Publicaci\xf3n (Incorrect)
When I send data from the Web API to the database, it is inserted this way:
Publicación
There is an option in Azure Database for PostgreSQL Flexible Server called Server Parameters.
Inside there is a parameter client_encoding; it was SQL_ASCII and I changed it to UTF8.
After that my pgAdmin shows the accented Spanish words correctly. That was the solution.

How to set Client_Encoding

I'm using NodeJS to get some data from a PostgreSQL database and render it on the web. When I make a query in SQL Shell (psql) everything looks fine, but when I console.log that same data from NodeJS, all the special characters are replaced with gibberish.
The encoding for this database is
- Encoding: UTF8
- Collation: French_France.1252
- Ctype: French_France.1252
I tried to set client_encoding to UTF8, but when I reconnect to the database I find that it is still not set.
I also get this warning each time I connect to the database (just in case it is related):
WARNING: Console code page (850) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
My OS is Windows 8 and PostgreSQL version is 10.x.
After a long search I have found a solution that I want to share now.
Change your ODBC DSN connection string to include this: ConnSettings=SET CLIENT_ENCODING TO 'UTF8';

Informix SE to PostgreSQL data migration

I'm rewriting an old application running on SCO Unix that connects to an Informix SE 7.24 database. The target OS is RHEL 6.3 and the DBMS is PostgreSQL 9.4.
I've already adapted the DDL script and created the empty database, but now I'm looking for a way to migrate the data. Informix and PostgreSQL use two different character sets, CP437 and UTF8.
I've tried exporting the database with the dbexport utility, converting the *.unl files to the new charset, and then loading them with COPY table_name FROM 'table.unl' (DELIMITER '|', ENCODING 'UTF-8', NULL ''). This worked for most of the tables, but when the .unl file grows large (over 1 GB), the import process crashes. What can I do?
You have not shown us the COPY error message.
I have migrated some databases, and one of the easiest ways is to use JDBC, especially with Jython (Python that runs on the JVM). You can see such a migration in my answer to: Blob's migration data from Informix to Postgres. Of course you must change it to use your schema, but with JDBC it is easy to read table names and other schema info.
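For anyone who prefers plain Java over Jython, the same JDBC approach looks roughly like the sketch below: stream rows out of Informix with a forward-only ResultSet and write them into PostgreSQL in batches, so no multi-gigabyte .unl file is ever produced. The connection URLs, credentials and the items table are placeholders, and it assumes both driver jars are on the classpath and the Informix client locale is configured so that CP437 data is decoded into correct Java strings.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: URLs, credentials and table layout are placeholders.
public class MigrateTable {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection(
                "jdbc:informix-sqli://oldhost:1526/olddb:INFORMIXSERVER=se_srv", "user", "pw");
             Connection dst = DriverManager.getConnection(
                "jdbc:postgresql://newhost:5432/newdb", "user", "pw")) {

            dst.setAutoCommit(false);
            try (Statement read = src.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
                 ResultSet rs = read.executeQuery("SELECT id, descr FROM items");
                 PreparedStatement write = dst.prepareStatement(
                    "INSERT INTO items (id, descr) VALUES (?, ?)")) {

                int pending = 0;
                while (rs.next()) {
                    write.setInt(1, rs.getInt(1));
                    write.setString(2, rs.getString(2)); // plain Java String; the PostgreSQL
                                                         // driver sends it out as UTF-8
                    write.addBatch();
                    if (++pending == 1000) {             // small batches keep memory flat
                        write.executeBatch();
                        dst.commit();
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    write.executeBatch();
                    dst.commit();
                }
            }
        }
    }
}

The same loop can be generalised by reading the table list from the source metadata over JDBC, which is the point the answer above makes about reading table names and other schema info.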

PgAdmin III, opening server status gives "invalid byte sequence for encoding UTF8"

I have two Postgres 9.3 servers in synchronous replication.
I needed to restart the slave in order to load a new archive_cleanup_command in recovery.conf.
The server restarted correctly and it's now perfectly in sync with the master server.
But when I open the "Server status" panel for the slave server in PgAdmin III (whose executable is located on the master server), I get errors like this:
invalid byte sequence for encoding “UTF8” plus some hex codes
It might be because I put a tilde ~ in the archive_cleanup_command; it didn't work, so I removed it and then the command worked correctly.
Maybe that ~ has been written somewhere and it's not a valid character... but I also deleted the logs...
Log of the slave server has a lot of lines like the followings:
2015-02-13 11:11:32 CET ERROR: invalid byte sequence for encoding “UTF8”: 0xe8 0x20 0x73
2015-02-13 11:11:32 CET STATEMENT: SELECT pg_file_read('pg_log/postgresql-2015-02-13_111038.log', 0, 50000)
Note that postgresql-2015-02-13_111038.log is the last log, the one from which I got these lines.
The problem you have is that the locale setting lc_messages is set to an encoding that is different to the encoding of the database(s). As a result, some messages are being written into the log using Windows-1252 encoding, while when you try to use PgAdmin to view the log, it tries to interpret the file using UTF-8. Some of the byte sequences written in the log are not valid UTF-8, leading to the error.
In fact, the way in which different locales interact in PostgreSQL can result in mixed encodings in the log file. There is a bug report on this, but it does not look like it has been resolved.
Probably the easiest way to resolve this is to set lc_messages to English_United States.UTF-8.
It would also be preferable to have lc_messages aligned across all of the databases on the server (or at least all using the same encoding).
Be sure to remove any existing log files as they will already contain the incorrect encoding.
It is due to your PostgreSQL log file being corrupted (containing bytes that are not valid UTF-8), as shown by the 'SELECT pg_file_read ...' statement.
If you do a "touch" on your server log (after a backup of your log, perhaps) and reconnect, you will not see this Unicode error anymore, and you will be able to keep using pgAdmin III.

UTF8 encoding issue?

I am having a problem with the characters á, Á, ó, Ó, ú, Ú, í, Í, é, É being stored in our MySQL DB as strange characters. We are using PDO for inserting into the DB.
The odd thing is that I have a local copy of the site on my computer on WAMP, which all works fine, and there is no encoding issue. The live site is on a Linux server, if that possibly makes a difference.
The local DB is a copy of the live DB, so all the encoding is the same in all of the tables.
I have tried setting the PDO encoding:
$pdo = new PDO('mysql:host=' . Settings::DBHostName() . ';charset=utf8;dbname=' . Settings::DBName(), Settings::DBUsername(), Settings::DBPassword(), array(
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
));
Any other suggestions? I can't see why it would work locally but not on our live site.
For PHP versions since 5.3.6 you should set the encoding in the DSN (as in the charset=utf8 example above).
For all older versions, issuing a conventional SET NAMES query is the only choice.
The last parameter of the PDO constructor call is the driver-specific options, given as an array with key => value pairs. The MySQL driver has a PDO::MYSQL_ATTR_INIT_COMMAND option where you can specify a command that is executed every time you connect to the database. You can use the MySQL-specific query SET NAMES utf8 as the value of the init command, to tell MySQL to use UTF-8 as the character set for the connection.
$pdo = new PDO('mysql:host=mysql.host.com;dbname=db', 'database', 'password',
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));