How can I know the character set of data stored in a Sybase database?

Good day,
I have a Sybase ASE 12.5 database on a Windows NT server, and I need to know the character set of some Arabic data stored in it.
I checked the database default character set: it is "CP850".
But the stored data are Arabic, so they must be stored using another character set.
I tried checking the "master..syscharsets" table, but I can't find any popular Arabic charsets.
Command: select id, csid, name, description from master..syscharsets
Result: http://dc414.2shared.com/download/CCfkf_RW/syscharsets_cropped.jpg?tsid=20140507-130321-3ade23f2
Any ideas how to find out the character set of the data?

I think it uses CP850 Multilingual. Try sp_configure "enable unicode conversions" on the server, and also try sp_help tableName to see the column datatypes.
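For reference, a minimal sketch of those checks as run from isql (tableName is a placeholder; sp_helpsort is a stock ASE procedure that reports the server's default sort order and character set):

sp_helpsort
go
sp_configure "enable unicode conversions"
go
sp_help tableName
go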

Related

Data Migration from DB2 to PostgreSQL using AWS DMS - Varchar fields are showing trailing spaces

We are migrating DB2 data to PostgreSQL 11.x using AWS DMS. We have varchar fields in DB2 with trailing spaces, and without any TRIM these fields work fine when used in a WHERE clause; I think DB2 internally trims them, even though they are varchar fields. But after moving to PostgreSQL these fields do not match without TRIM, and sometimes they give unexpected results even with TRIM. Below is the detailed problem.
Source: DB2 - RECIP_NUM -- VARCHAR(10) -- 'ST001 '
select RECIP_NUMBER, SERV_TYPE, LENGTH(SERV_TYPE) AS before_trim_COL_LENGTH, LENGTH(trim(SERV_TYPE)) AS after_trim_COL_LENGTH
from serv_type rst
WHERE SERV_TYPE = 'ST001' -- THIS WORKS FINE WITHOUT TRIM
Output: (screenshot of the DB2 output)
Target: PGSQL -- RECIP_NUM -- VARCHAR(10) -- 'ST001 '
select RECIP_NUMBER, SERV_TYPE, LENGTH(SERV_TYPE) AS COL_LENGTH
from serv_type rst
WHERE trim(SERV_TYPE) = 'ST001' -- THIS IS NOT GIVING ANY OUTPUT WITHOUT TRIM
Output: (screenshot of the PostgreSQL output)
Is there any way we can tell PostgreSQL to ignore the trailing spaces of a VARCHAR Column?
Postgres doesn't follow the SQL standard, which requires the shorter string to be padded, when comparing VARCHAR or TEXT strings; it only pads CHAR strings. Therefore, you can cast to the blank-padded character type to simulate the Db2 behaviour, e.g. ...WHERE SERV_TYPE::char(10) = 'ST001'. (A bare ::char cast would mean char(1) and compare only the first character, so give an explicit length or use ::bpchar.) Note though that this will preclude the use of an index on SERV_TYPE, same as when using trim(SERV_TYPE).
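A minimal, self-contained sketch of the workaround (the demo table name is made up; the question's serv_type table behaves the same way):

CREATE TABLE serv_type_demo (serv_type varchar(10));
INSERT INTO serv_type_demo VALUES ('ST001 ');  -- trailing space, as migrated

-- varchar comparison keeps trailing spaces, so this returns no rows:
SELECT * FROM serv_type_demo WHERE serv_type = 'ST001';

-- blank-padded (bpchar) comparison ignores trailing spaces, so this matches:
SELECT * FROM serv_type_demo WHERE serv_type::bpchar = 'ST001';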

SQL Command to insert Chinese Letters

I have a database with one column of the type nvarchar. If I write
INSERT INTO table VALUES ("玄真")
It shows ¿¿ in the table. What should I do?
I'm using SQL Developer.
Use single quotes, rather than double quotes, to create a text literal; and for an NVARCHAR2/NCHAR text literal you need to prefix it with N.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value NVARCHAR2(20) );
INSERT INTO table_name VALUES (N'玄真');
Query 1:
SELECT * FROM table_name
Results:
| VALUE |
|-------|
| 玄真 |
First, using NVARCHAR might not even be necessary.
The 'N' character data types are for storing data that doesn't 'fit' in the database's defined character set. There's an auxiliary character set, defined as the NCHAR character set. It's kind of a band-aid: once you create a database, it can be difficult to change its character set. The moral of this story: take great care in defining the character set when creating your database, and do not just accept the defaults.
Here's a scenario (LiveSQL) where we're storing a Chinese string in both NVARCHAR and VARCHAR2.
CREATE TABLE SO_CHINESE ( value1 NVARCHAR2(20), value2 varchar2(20 char));
INSERT INTO SO_CHINESE VALUES (N'玄真', '我很高興谷歌翻譯。' )
select * from SO_CHINESE;
Note that both character sets are in the Unicode family. Note also that I told my VARCHAR2 column to hold 20 characters. That's because some characters may require up to 4 bytes of storage, and a plain definition of (20) would leave room for only 5 such characters.
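As a hedged illustration of byte vs. character length semantics (the table and column names are made up; this assumes an AL32UTF8 database, where these Chinese characters occupy 3 bytes each):

CREATE TABLE byte_vs_char (
  b VARCHAR2(6),      -- 6 bytes under the default BYTE length semantics
  c VARCHAR2(6 CHAR)  -- 6 characters, regardless of byte width
);
INSERT INTO byte_vs_char (c) VALUES ('玄真');    -- 2 characters (6 bytes): fits
INSERT INTO byte_vs_char (b) VALUES ('玄真玄');  -- 3 characters = 9 bytes: fails with ORA-12899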
Let's look at the same scenario using SQL Developer and my local database.
And to confirm the character sets:
SQL> clear screen
SQL> set echo on
SQL> set sqlformat ansiconsole
SQL> select *
2 from database_properties
3 where PROPERTY_NAME in
4 ('NLS_CHARACTERSET',
5 'NLS_NCHAR_CHARACTERSET');
PROPERTY_NAME             PROPERTY_VALUE  DESCRIPTION
NLS_NCHAR_CHARACTERSET    AL16UTF16       NCHAR Character set
NLS_CHARACTERSET          AL32UTF8        Character set
First of all, you should establish the Chinese character encoding on your database, for example:
UTF-8, Chinese_Hong_Kong_Stroke_90_BIN, Chinese_PRC_90_BIN, Chinese_Simplified_Pinyin_100_BIN ...
I will show you an example with SQL Server 2008 (Management Studio), which incorporates all of these collations; however, you can find the same character encodings in other databases (MySQL, SQLite, MongoDB, MariaDB...).
Create the database with Chinese_PRC_90_BIN, but you can choose another collation:
Select a Page (left header) > Options > Collation > choose the collation
Create a table with the same collation:
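The same steps can also be expressed in T-SQL; here is a hedged sketch (the database name and the column definition are assumptions; the table name matches the insert below):

CREATE DATABASE ChineseDb COLLATE Chinese_PRC_90_BIN;
GO
USE ChineseDb;
GO
CREATE TABLE ChineseTable (name varchar(20) COLLATE Chinese_PRC_90_BIN);
GO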
Execute the insert statement:
INSERT INTO ChineseTable VALUES ('玄真');

Firebird errors when using Symmetricds

I am using the SymmetricDS free version to replicate my Firebird database. When I tested it by creating a new (blank) DB, it worked fine. But when I configured it on my existing DB (which has data), an error occurred.
I use Firebird-2.5.5.26952_0 32bit & symmetric-server-3.9.5; the OS is Windows Server 2008 Enterprise.
I have searched for a whole day but found nothing to solve this. Anyone please help. Thanks for your time.
UPDATE:
During the initial load, SymmetricDS executes this statement to declare a UDF in the Firebird DB:
declare external function sym_hex blob
returns cstring(32660) free_it
entry_point 'sym_hex' module_name 'sym_udf';
It caused an error because my existing DB charset is UNICODE_FSS, where the max length of CSTRING is 10922. When I worked around this by changing the charset to NONE, it worked fine. But that is not a safe solution; I am still looking for a better one.
One more thing: does anyone know of other open-source tools to replicate a Firebird database? I have tried many from here and only SymmetricDS worked.
The problem seems to be a bug in Firebird where the length of a CSTRING is supposed to be in bytes, but in reality it is counted in characters. Your database seems to have UTF8 (or UNICODE_FSS) as its default character set, which means each character can take up to 4 bytes (3 for UNICODE_FSS). The maximum length of a CSTRING is 32767 bytes, but if Firebird counts it in characters, that suddenly reduces the maximum to 8191 characters (32764 bytes) for UTF8, or 10922 characters (32766 bytes) for UNICODE_FSS.
The workaround to this problem would be to create a database with a different default character set. Alternatively, you could (temporarily) alter the default character set:
For Firebird 3:
1. Set the default character set to a single-byte character set (e.g. NONE). Use of NONE is preferable to avoid unintended transliteration issues:
alter database set default character set NONE;
2. Disconnect (important: you may need to disconnect all current connections because of metadata caching!)
3. Set up SymmetricDS so it creates the UDF
4. Set the default character set back to UTF8 (or UNICODE_FSS):
alter database set default character set UTF8;
5. Disconnect again
When using Firebird 2.5 or earlier, you will need to perform a direct system table update (which is no longer possible in Firebird 3), replacing the alter database statements in steps 1 and 4 above:
Step 1:
update RDB$DATABASE set RDB$CHARACTER_SET_NAME = 'NONE'
Step 4:
update RDB$DATABASE set RDB$CHARACTER_SET_NAME = 'UTF8'
The alternative would be for SymmetricDS to change its initialization to
DECLARE EXTERNAL FUNCTION SYM_HEX
BLOB
RETURNS CSTRING(32660) CHARACTER SET NONE FREE_IT
ENTRY_POINT 'sym_hex' MODULE_NAME 'sym_udf';
Or maybe character set BINARY instead of NONE, as that seems closer to the intent of the UDF.

TYPO3 RTE: Saving mathematical/greek symbols doesn't work

I need to display some mathematical/Greek symbols in the RTE and later in the frontend. Inserting them via copy/paste or the "Insert characters" option works great, but as soon as I save the text, the inserted symbol gets replaced with a question mark and TYPO3 throws the following error:
1: These fields of record 56 in table "tt_content" have not been saved correctly: bodytext! The values might have changed due to type casting of the database.
I think there is an issue with the character set of TYPO3 or my DB, but I don't know where to start looking.
Tested on my 7.6.8 installation and it seems to work OK. When I log in to my MySQL server and run this query:
SELECT default_character_set_name FROM information_schema.SCHEMATA
WHERE schema_name = "7_6_local_typo3_org";
(7_6_local_typo3_org is database name) it returns:
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8 |
+----------------------------+
1 row in set (0.00 sec)
and also collation:
SELECT default_collation_name FROM information_schema.SCHEMATA
WHERE schema_name = "7_6_local_typo3_org";
+------------------------+
| default_collation_name |
+------------------------+
| utf8_general_ci |
+------------------------+
1 row in set (0.00 sec)
Then also I have in my my.cnf (mysql config file):
character-set-server = utf8
collation-server = utf8_general_ci
I had a similar problem when pasting HTML with UTF icons into a Raw-HTML content element in TYPO3 8.7.x, but it works when I encode the symbols, for example:
<span class="menuicon">⌚</span>
Possible reasons for the error message
1: These fields of record X in table "tt_content" have not been saved correctly: bodytext! The values might have changed due to type casting of the database.
in a TYPO3 installation (example installation's version: 10.4.20) can be:
- the MySQL/MariaDB tables of this TYPO3 installation use an inappropriate/outdated character set and/or collation (Step 1 below), or
- this TYPO3 installation is not yet configured to use utf8mb4 for the database (Step 2 below).
TYPO3 has supported utf8mb4 since at least version 9.5. With it comes proper Unicode support, including emojis, mathematical symbols, and Greek letters (e.g. ⌚∰β) in CKEditor bodytext.
I migrated my TYPO3 installation's database and configuration to utf8mb4 in the following way, getting rid of the aforementioned error message and saving and displaying Unicode multibyte characters correctly.
Be sure to apply these migrations in a test environment first, then check existing content and test the usual content-editing scenarios before applying them on a production system, to make sure MySQL/MariaDB converted between the character sets correctly and without data loss (truncation).
Step 1
Update TYPO3 database tables to use utf8mb4 as character set and utf8mb4_unicode_ci as collation.
The following bash loop iterates over all tables in database typo3 and applies these updates. It assumes MySQL/MariaDB root privileges, a password-less socket connection, and a TYPO3 database (table_schema) named typo3. Adapt accordingly. Tested successfully on:
Debian 11 MariaDB Server (10.5.12-MariaDB-0+deb11u1)
Ubuntu 20.04 LTS MySQL Server (8.0.27-0ubuntu0.20.04.1)
for tbl in $(mysql --disable-column-names --batch -e 'select distinct TABLE_NAME from information_schema.tables where table_schema="typo3" and table_type="BASE TABLE";'); do
  echo "updating table $tbl" &&
  mysql -e "ALTER TABLE typo3.${tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done
To ensure that no (string) data gets lost/truncated during this conversion (from a "smaller" encoding to the up-to-four-bytes-per-character utf8mb4 encoding), MySQL/MariaDB automatically widens a text/string column's datatype where necessary, e.g. from TEXT to MEDIUMTEXT.
To restore a TYPO3 (extension) table's column back to its specified datatype, visit TYPO3 backend -> Maintenance -> Analyze Database Structure. This tool allows you to restore those columns' original (smaller) datatypes, which may cause data truncation. I'm not sure whether TYPO3 warns if truncation actually occurs; however, assuming the TYPO3 (extension) developers had utf8mb4 in mind when specifying a column's datatype and the user-provided content of a particular database cell is not too large, truncation should not happen (overview of text/string datatype sizes).
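To spot-check the result of Step 1, one could query the information schema (a sketch, again assuming the database is named typo3):

SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'typo3' AND TABLE_TYPE = 'BASE TABLE';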
Step 2
Configure TYPO3 to use utf8mb4. For example, when leveraging typo3conf/AdditionalConfiguration.php, add the following configuration to AdditionalConfiguration.php:
// ...
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['charset'] = 'utf8mb4';
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['tableoptions']['charset'] = 'utf8mb4';
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['tableoptions']['collate'] = 'utf8mb4_unicode_ci';
// ...

Postgresql order by - danish characters is expanded

I'm trying to make an "order by" statement in an SQL query work, but for some reason Danish special characters are expanded instead of being sorted by their own value.
SELECT roadname FROM mytable ORDER BY roadname
The result:
Abildlunden
Æblerosestien
Agern Alle 1
The result in the middle should come last.
The locale is set to Danish, so it should know the value of the Danish special characters.
What is the collation of your database? (You might also want to give the PostgreSQL version you are using.) Use "\l" from psql to see.
Compare and contrast:
steve#steve#[local] =# select * from (values('Abildlunden'),('Æblerosestien'),('Agern Alle 1')) x(word)
order by word collate "en_GB";
word
---------------
Abildlunden
Æblerosestien
Agern Alle 1
(3 rows)
steve#steve#[local] =# select * from (values('Abildlunden'),('Æblerosestien'),('Agern Alle 1')) x(word)
order by word collate "da_DK";
word
---------------
Abildlunden
Agern Alle 1
Æblerosestien
(3 rows)
The database collation is set when you create the database cluster, from the locale you have set at the time. If you installed PostgreSQL through a package manager (e.g. apt-get) then it is likely taken from the system-default locale.
You can override the collation used in a particular column, or even in a particular expression (as done in the examples above). However, if you're not specifying anything (likely), then the database default will be used (which is itself inherited from the template database when the database is created, and the template database's collation is fixed when the cluster is created).
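As a small sketch of those two overrides, using the table and column from the question (the collation name "da_DK" assumes that collation exists in your cluster; check pg_collation):

-- Per query:
SELECT roadname FROM mytable ORDER BY roadname COLLATE "da_DK";
-- Per column (retyped as text here for the sketch; rewrites the table):
ALTER TABLE mytable ALTER COLUMN roadname TYPE text COLLATE "da_DK";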
If you want to use da_DK as your default collation throughout, and it's not currently your database default, your simplest option might be to dump the database, then drop and re-create the cluster, specifying the collation to initdb (or pg_createcluster or whatever tool you use to create the server)
BTW, the question isn't well-phrased: PostgreSQL is very much not ignoring the "special" characters; it is correctly expanding "Æ" into "AE", which is a correct rule for English. Collating "Æ" at the end is actually closer to the unlocalised behaviour.
Collation documentation: http://www.postgresql.org/docs/current/static/collation.html