UTF-8 table with lowercase and uppercase and relation between them - unicode

I'm looking for a UTF-8 table (or set of tables) with all the lowercase (small) and uppercase (capital) characters in (hexa-)decimal form, together with the relation between the elements.
So far I found:
https://www.fileformat.info/info/unicode/category/Ll/list.htm
https://www.fileformat.info/info/unicode/category/Lu/list.htm
These are nice lists, though:
there is no relation between the two tables (I could create this myself with some scripting that builds a new table)
the values are Unicode code points and not UTF-8 (hexa-)decimal byte values.
Any suggestions?

My advice is to use an existing library that can do case conversion for you. You mentioned you were targeting C++, so the ICU library is one option.

Related

PostgreSQL SELECT can alter a table?

So I'm new to SQL-like databases, and the place I work at migrated to PostgreSQL. One table's contents were drastically reduced. The point is, I only used SELECT statements and changed the names of the columns with AS. Is there a way I might have changed the table data?
When you migrate from one DBMS to another you must be sure that the objects created are strictly equivalent... The question seems trivial, but it isn't.
As a matter of fact, one important consideration for literals (char/varchar...) is to verify the collation used formerly and the collation you used when creating the new database in PostgreSQL.
Collation in an RDBMS is the way to adjust the behavior of character strings with regard to certain parameters, such as the distinction, or not, of upper and lower case letters, the distinction, or not, of diacritical characters (accents, ligatures...), language-specific sorting, etc. It constitutes a superset of the character encoding.
Did you verify this point when using a WHERE clause to search for literals? If not, try restricting the literal by applying the right collation (COLLATE operator), or use the UPPER function to avoid the distinction between upper and lower case characters...
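For example, a minimal sketch of both approaches, assuming a hypothetical table customers with a text column name (and PostgreSQL 9.1 or later for the COLLATE clause in expressions):
-- customers and name are placeholder names; "C" is the byte-order, case-sensitive collation
SELECT * FROM customers WHERE name COLLATE "C" = 'Dupont';
-- or sidestep case differences entirely by normalizing both sides
SELECT * FROM customers WHERE UPPER(name) = UPPER('dupont');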

DB2 UNLOAD in Unicode with a two-character delimiter

I have to create an UNLOAD job for a DB2 table and save the unload in Unicode. That's no problem.
But unfortunately there are contents in the table columns that match the separators.
For example, I would like the combination #! as a separator, but I can't do that in Unicode.
Can someone tell me how to do this?
Now my statement looks like this:
DELIMITED COLDEL X'3B' CHARDEL X'24' DECPT X'2E'
UNICODE
Thanks a lot for your help.
The delimiter can be a single character (not two characters, as you want).
In this case the chosen solution was to find a single character that did not appear in the data.
When that is not possible, consider a non-delimited output format, or a different technique to get the data to the external system (for example via federation or other SQL-based interchange, or XML, etc.).
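If the data allows it, a sketch of the same control statement with a single-character string delimiter could look like this (X'7C', the Unicode vertical bar '|', is chosen purely as an illustration; use whatever character genuinely never occurs in your data):
DELIMITED COLDEL X'3B' CHARDEL X'7C' DECPT X'2E'
UNICODE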

Scala Slick, Trouble with inserting Unicode into database

When I create a new row of data with several columns that may contain Unicode, the columns that do contain Unicode are being corrupted.
However, if I insert that data directly using the mysql CLI, Slick will retrieve that Unicode data fine.
Is there anything I should add to my table class to tell Slick that this Column may be a Unicode string?
I found the problem: I have to set the character encoding for the connection.
db.default.url="jdbc:mysql://localhost/your_db_name?characterEncoding=UTF-8"
You probably need to configure that on the db schema side by setting the right collation.
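A minimal sketch of that schema-side fix on MySQL, assuming a hypothetical table named articles and the database name from the connection URL above (utf8mb4 is MySQL's full UTF-8 character set):
-- convert existing text columns of one (placeholder) table and set its default character set
ALTER TABLE articles CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- or change the default for the whole schema so new tables inherit it
ALTER DATABASE your_db_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;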

Do text_pattern_ops comparators understand UTF-8?

According to the PostgreSQL 9.2 documentation, if I am using a locale other than the C locale (en_US.UTF-8 in my case), btree indexes on text columns for supporting queries like
SELECT * from my_table WHERE text_col LIKE 'abcd%'
need to be created using text_pattern_ops like so
CREATE INDEX my_idx ON my_table (text_col text_pattern_ops)
Now section 11.9 of the documentation states that this results in a "character by character" comparison. Are these (non-wide) C characters or does the comparison understand UTF-8?
Good question; I'm not totally sure, but my tentative understanding is:
Here PostgreSQL means "real characters" (possibly multibyte), not bytes. The comparison "understands UTF-8" always, with or without this special index.
The point is that, for locales that have special (non-C) collation rules, we normally want to follow those rules (and call the respective locale libraries) when doing comparisons (<, >...) and sorting. But we don't want to use those collations for POSIX regular expression matching and LIKE patterns. Hence the existence of two different types of indexes for text.
The operators in the text_pattern_ops operator class actually do a memcmp() on the strings, so the documentation is perhaps slightly inaccurate talking about characters.
But this doesn't really affect the question whether they support UTF-8. The indexing of pattern matching operations in the described fashion does support UTF-8. The underlying operators don't have to worry about the encoding.
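As a sketch with the my_table/text_col names from the question (the index names are just illustrative): under en_US.UTF-8 you typically end up with two indexes, one following the locale's collation for ordering and equality, and the text_pattern_ops one for anchored LIKE patterns:
-- ordinary btree: supports ORDER BY text_col and equality under the en_US.UTF-8 collation
CREATE INDEX my_idx_order ON my_table (text_col);
-- pattern-ops btree: compares the strings bytewise, so it can serve LIKE 'abcd%' prefix searches
CREATE INDEX my_idx_pattern ON my_table (text_col text_pattern_ops);
-- EXPLAIN should show an index scan on my_idx_pattern for the anchored pattern
EXPLAIN SELECT * FROM my_table WHERE text_col LIKE 'abcd%';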

Storing Unicode characters in a PostgreSQL 8.4 table

I want to store Unicode characters in one of the columns of a PostgreSQL 8.4 database table. I want to store non-English language data, say Indic language texts. I achieved the same in Oracle XE by converting the text to Unicode and storing it in the table using an nvarchar2 column.
In the same way, I want to store Unicode characters of Indic languages (say Tamil or Hindi) in one of the columns of a table. How can I achieve that, and what data type should I use?
Please guide me, thanks in advance.
Just make sure the database is initialized with encoding UTF8. This applies to the whole database in 8.4; later versions are more sophisticated. You might want to check the locale settings too - see the manual for details, particularly around matching with LIKE and text pattern ops.
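A minimal sketch, with placeholder database and table names, of creating a UTF-8 database on 8.4 and storing Hindi and Tamil text in an ordinary text column:
-- create a UTF-8 encoded database (template0 avoids encoding conflicts with the default template)
CREATE DATABASE indic_db WITH ENCODING 'UTF8' TEMPLATE template0;
-- a plain text (or varchar) column is enough; no special "national" type like Oracle's nvarchar2 is needed
CREATE TABLE greetings (id serial PRIMARY KEY, msg text);
INSERT INTO greetings (msg) VALUES ('नमस्ते'), ('வணக்கம்');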