I made a simple SELECT statement, like this:
select t.no_qult, t.desc_qult
from qualities_type t
where t.name_qult = 'Lỗi vải'
The field name_qult uses the UNICODE_FSS charset. The problem is that the query does not work with the Unicode input value Lỗi vải (Vietnamese); it only works when I use the plain text Lá»—i vải.
Does anyone know how to query with a Unicode input value?
Do not use literals. Don't put data into the query text; pass it separately from the query, as "parameters".
This has a number of benefits: more reliable parsing, more type-checking, more safety, and often more speed (you can prepare the query once and then run it many times, changing only the parameter values).
How you code the parameters in SQL queries depends upon the library you use in your programming language for connecting to Firebird. See http://bobby-tables.com/ for some examples. The following are three often used conventions to try:
SELECT .... WHERE t.name_qult = ? -- natively supported by Firebird, index-based access to parameters
SELECT .... WHERE t.name_qult = :NAME_PARAM -- BDE / Delphi style
SELECT .... WHERE t.name_qult = @NAME_PARAM -- MS SQL / .NET style
I do not know which flavours are supported by the languages and programs you use.
IB Expert uses Delphi libraries, hence it uses option #2.
Programs written in Java tend to use option #1.
Additionally, check in your connection options that the "connection charset" is set to UTF-8 or to some Vietnamese codepage that can transfer all those specific characters.
The UNICODE_FSS charset is outdated. It would be better to move to the UTF-8 charset wherever possible.
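A minimal sketch of option #1 (positional ? parameters), written in Python. This is only an illustration: it uses the built-in sqlite3 module as a stand-in so that it runs without a server, and the table contents and the helper name find_quality are made up. A Firebird driver would have its own connect() call and connection charset setting.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch runs without a server.
# The row below is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qualities_type (no_qult INTEGER, desc_qult TEXT, name_qult TEXT)"
)
conn.execute(
    "INSERT INTO qualities_type VALUES (?, ?, ?)",
    (1, "sample description", "Lỗi vải"),
)


def find_quality(connection, name):
    """Look up rows by name, passing the value as a parameter, not a literal."""
    sql = (
        "SELECT t.no_qult, t.desc_qult "
        "FROM qualities_type t "
        "WHERE t.name_qult = ?"
    )
    # The driver transfers the value separately from the SQL text,
    # so the Unicode string never has to survive inside the query string.
    return connection.execute(sql, (name,)).fetchall()


rows = find_quality(conn, "Lỗi vải")
print(rows)
```

The value is handed to the driver as a real string, so the only encoding that still matters is the connection charset.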
I am using the Data Transfer utility for IBM i to create TSV files from my AS400s and then import them into my SQL Server Data Warehouse.
Following this: SO Question about SSIS encoding script, I want to stop using a conversion in the SSIS task and have the data ready from the source.
I have tried various code pages when creating the TSV (1200 etc.), but only 1208 does the trick, and only halfway: it creates UTF-8, which I then have to convert to Unicode as shown in the other question.
Which CCSID do I have to use to get Unicode from the start?
Utility Screenshot:
On IBM i, CCSID support is intended to be seamless. Imagine the situation where the table is in German encoding, your job is in English and you are creating a new table in French - all on a system whose default encoding is Chinese. Use the appropriate CCSID for each of these and the operating system will do the character encoding conversion for you.
Unfortunately, many midrange systems aren't configured properly. Their system default CCSID is 'no CCSID / binary' - a remnant of a time some 20 years ago, before CCSID support. DSPSYSVAL QCCSID will tell you what the default CCSID is for your system. If it's 65535, that's 'binary'. This causes no end of problems, because the operating system can't figure out what the true character encoding is. Because CCSID(65535) was set for many years, almost all the tables on the system have this encoding. All the jobs on the system run under this encoding. When everything on the system is 65535, then the OS doesn't need to do any character conversion, and all seems well.
Then, someone needs multi-byte characters. It might be an Asian language or, as in your case, Unicode. If the system as a whole is 'binary / no conversion', it can be very frustrating because, essentially, the system admins have lied to the operating system about the character encoding that is in effect for the database and jobs.
I'm guessing that you are dealing with a CCSID(65535) environment. I think you are going to have to request some changes. At the very least, create a new / work table using an appropriate CCSID like EBCDIC US English (37). Use a system utility like CPYF to populate this table. Now try to download that, using a CCSID of say, 13488. If that does what you need, then perhaps all you need is an intermediate table to pass your data through.
Ultimately, the right solution is a proper CCSID configuration. Have the admins set the QCCSID system value and consider changing the encoding on the existing tables. After that, the system will handle multiple encodings seamlessly, as intended.
CCSID 13488 on IBM i is Unicode in the UCS-2 form (UTF-16 big-endian). There is no single "Unicode"; there are several Unicode encoding forms. CCSID 1208 is also Unicode, namely UTF-8, so it is not clear what "get Unicode from the start" means: you are already getting Unicode, in UTF-8 form. I looked at your other question, and the function you mention does not say which kind of "Unicode" it expects:
using (StreamWriter writer = new StreamWriter(to, false, Encoding.Unicode, 1000000))
By default, the operating system on IBM i stores most data in EBCDIC database tables, and only a few applications built on this system use Unicode natively. It will translate the data into whatever type of Unicode it supports.
As for SQL Server and Java, I am fairly sure they use UCS-2 type Unicode, so if you use CCSID 13488 on the AS/400 side for the transfer, it may let you avoid the extra conversion from UTF-8, because CCSID 13488 is UCS-2 style Unicode.
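To make the difference between these "Unicodes" concrete, here is a small Python sketch. The sample text and the helper name utf8_to_utf16le are made up for illustration. One detail worth checking on the receiving side: in .NET, Encoding.Unicode means UTF-16 little-endian, so the byte order of the transferred file matters as well.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
text = "Äpfel €"

utf8_bytes = text.encode("utf-8")          # the form a CCSID 1208 transfer produces
utf16_be_bytes = text.encode("utf-16-be")  # 16-bit units, big-endian byte order
utf16_le_bytes = text.encode("utf-16-le")  # 16-bit units, little-endian (.NET Encoding.Unicode)


def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16 little-endian bytes."""
    return data.decode("utf-8").encode("utf-16-le")


print(len(utf8_bytes), len(utf16_be_bytes), len(utf16_le_bytes))
print(utf16_be_bytes[:2].hex(), utf16_le_bytes[:2].hex())
```

All three byte strings carry the same characters; only the encoding form and the byte order differ.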
https://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html
There are two CCSIDs for UTF-8 Unicode on System i: 1208 and 1209. 1208 is UTF-8 with IBM PUA; 1209 is plain UTF-8. See the link above.
In a program I used the RPAD() function to format data coming from a DB2 database.
In one instance the value was Ãmber. The following function:
RPAD('Ãmber',10,' ')
gives 9 characters only.
The character code of 'Ã' is 195. I am not able to understand the reason for this behaviour.
Could someone share their experience?
Thanks
By default, DB2 will consider the length of Ã to be 2, likely because it is counting bytes rather than characters.
values(LENGTH('Ãmber'))
6
You can override this for LENGTH and many other functions by naming the string unit:
values(LENGTH('Ãmber', CODEUNITS16))
5
Unfortunately, RPAD does not take a parameter like this. I'm guessing this might be because the function was added for Oracle compatibility rather than on its own merits.
You could write your own RPAD function as a stored procedure or UDF, or just handle it with a CASE statement if this is the only place where you need it.
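The byte-versus-character effect, and what a character-aware replacement would do, can be sketched in Python. Both function names are made up, the sketch assumes the data is stored as UTF-8 and that the pad is a single one-byte character, and truncating longer values in rpad_chars is a choice made for this sketch; it is not DB2 code.

```python
def rpad_bytes(value: str, width: int, pad: str = " ") -> str:
    """Pad until the UTF-8 encoding is `width` bytes long (byte-based padding)."""
    missing = width - len(value.encode("utf-8"))
    return value + pad * max(missing, 0)


def rpad_chars(value: str, width: int, pad: str = " ") -> str:
    """Pad until the string is `width` characters long; longer values are cut."""
    return value[:width].ljust(width, pad)


byte_padded = rpad_bytes("Ãmber", 10)
char_padded = rpad_chars("Ãmber", 10)
print(len(byte_padded), len(char_padded))
```

The byte-based version adds only four spaces because 'Ã' already occupies two bytes, which matches the nine visible characters from the question.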
MongoDB uses UTF-8 internally; how do I set the output charset? Is there a command similar to MySQL's "set names"? I am using the C++ mongoclient.
I'm not sure if the C++ driver behaves differently, but from what I know you'll always get a UTF-8 encoded result back. So if you want the data converted to another character set, you need to do it yourself (I don't know what options you have in C++).
MongoDB exclusively deals with UTF-8. You can't change either the input or the output character set or encoding. You will need to do that in your application, where you also need to make sure that every string you send to MongoDB is actually UTF-8. None of the drivers currently support anything else, and it's not likely they ever will.
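The application-side conversion could look roughly like this. It is a Python sketch of the idea only; the helper names are made up and no MongoDB driver is involved. The same three steps (validate, convert on the way in, convert on the way out) would apply in C++ with whatever conversion library you have available.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check that bytes are well-formed UTF-8 before sending them to the database."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_charset: str) -> bytes:
    """Convert bytes from the application's charset to UTF-8 (input direction)."""
    return data.decode(source_charset).encode("utf-8")


def from_utf8(data: bytes, target_charset: str) -> bytes:
    """Convert UTF-8 bytes from the database to another charset (output direction)."""
    return data.decode("utf-8").encode(target_charset)


latin1_name = b"Aliz\xe9e"                 # "Alizée" in Latin-1
stored = to_utf8(latin1_name, "latin-1")   # what the database should receive
print(is_valid_utf8(latin1_name), is_valid_utf8(stored))
```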
I have a tableview (linked to a database) and a search bar. When I type something in the search bar, I do a quick search in the database and display the results as I type.
The query looks like this:
SELECT * FROM MyTable WHERE name LIKE '%NAME%'
Everything works fine as long as I use only ASCII characters. What I want is to type ASCII characters and to match their equivalent with diacritics. For instance, if I type "Alizee" I would expect it to match "Alizée".
Is there a way to make the query locale-insensitive? I've read about the COLLATE option in SQL, but it seems to be of no use with SQLite. I've also read that iPhone SDK 3.0 has "Localized collation", but I was unable to find any documentation about what this means...
Thank you.
There are a few options for solving this:
Replacing all accented chars in the query before executing it, e.g.
"Psychédélices" => "Psychedelices"
"À contre-courant" => "A contre-courant"
"Tempête" => "Tempete"
etc. But this only works for the input, so you must not have accented chars in the database itself. Simple solution but far from perfect.
Using a 3rd party library, namely ICU (links below). Not sure if it's the best choice for iPhone though.
Writing one or more custom C functions that will do the comparison. More in the links below.
A few posts here on StackOverflow that discuss the various options:
How to sort text in sqlite3 with specified locale?
Case-insensitive UTF-8 string collation for SQLite (C/C++)
How to implement the accent/diacritic insensitive search in Sqlite?
Also a couple of external links:
SQLite and native UNICODE LIKE support in C/C++
sqlite case and accent insensitive searches
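The custom-function option can be sketched with Python's sqlite3 module, which lets you register an application-defined SQL function in the same spirit as the C functions mentioned above. The function name unaccent, the helper names and the sample rows are made up for this sketch; on the iPhone you would register an equivalent function through the SQLite C API.

```python
import sqlite3
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks removed, e.g. 'é' becomes 'e'."""
    if text is None:
        return None
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the (hypothetical) name unaccent.
conn.create_function("unaccent", 1, strip_diacritics)
conn.execute("CREATE TABLE MyTable (name TEXT)")
conn.executemany(
    "INSERT INTO MyTable (name) VALUES (?)",
    [("Alizée",), ("Psychédélices",), ("Tempête",)],
)


def search(connection, term):
    """Match names that contain term, ignoring diacritics on both sides."""
    pattern = "%" + strip_diacritics(term) + "%"
    sql = "SELECT name FROM MyTable WHERE unaccent(name) LIKE ? ORDER BY name"
    return [row[0] for row in connection.execute(sql, (pattern,))]


print(search(conn, "Alizee"))
```

Because the function runs on every row, such a query cannot use an index on name; storing a pre-stripped copy of the column would be the usual trade-off if the table is large.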
I'm not sure about SQL, but I think you can use the NSDiacriticInsensitivePredicateOption to compare in-memory NSStrings.
An example would be an NSArray full of the strings you're searching over. You could just iterate over the array comparing strings using the NSDiacriticInsensitivePredicateOption as your comparison option and displaying the successful matches.
At least in previous versions of SQL Server, you had to prefix Unicode string constants with an "N" to make them be treated as Unicode. Thus,
select foo from bar where fizz = N'buzz'
(See "Server-Side Programming with Unicode" for SQL Server 2005 "from the horse's mouth" documentation.)
We have an application that is using SQL Compact Edition and I am wondering if that is still necessary. From the testing I am doing, it appears to be unneeded. That is, the following SQL statements both behave identically in SQL CE, but the second one fails in SQL Server 2005:
select foo from bar where foo=N'າຢວ'
select foo from bar where foo='າຢວ'
(I hope I'm not swearing in some language I don't know about...)
I'm wondering if that is because all strings are treated as Unicode in SQL CE, or if perhaps the default code page is now Unicode-aware.
If anyone has seen any official documentation, either yea or nay, I'd appreciate it.
I know I could go the safe route and just add the "N"s, but there's a lot of code that would need to be changed, and if I don't need to, I don't want to! Thanks for your help!
SQL CE was originally developed for Windows CE, which is purely Unicode. As a result, SQL CE already leans heavily toward Unicode and the "N" prefix is unnecessary.
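For contrast, here is a rough Python illustration of why the literal without "N" fails in SQL Server 2005: such a constant is treated as non-Unicode text in the database's code page, and characters the code page cannot represent are lost. The sketch assumes code page 1252 (the one behind the common Latin1 collations) and the helper name is made up; it imitates the effect and does not call SQL Server.

```python
def as_code_page_literal(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page, replacing what does not fit."""
    return text.encode(code_page, errors="replace").decode(code_page)


lao = "\u0eb2\u0ea2\u0ea7"  # the three Lao characters from the question
converted_lao = as_code_page_literal(lao)
converted_ascii = as_code_page_literal("buzz")
print(converted_lao, converted_ascii)
```

Plain ASCII such as 'buzz' survives the conversion, which is why the missing prefix often goes unnoticed until non-Latin data shows up.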