TYPO3 RTE: Saving mathematical/Greek symbols doesn't work

I need to display some mathematical/Greek symbols in the RTE and later in the frontend. Inserting them via copy/paste or the "Insert characters" option works great, but as soon as I save the text, the inserted symbol gets replaced with a question mark and TYPO3 throws the following error:
1: These fields of record 56 in table "tt_content" have not been saved correctly: bodytext! The values might have changed due to type casting of the database.
I think there is an issue with the character set of TYPO3 or my DB, but I don't know where to start looking.
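A question mark is the typical result when text is converted into a character set that has no code for a character. As a rough analogy only (Python, not MySQL's own conversion logic; the helper name is made up), encoding with a replacement error handler shows the same symptom:

```python
def store_in_charset(text: str, charset: str) -> str:
    """Round-trip text through a target charset, replacing unsupported characters.

    This only mimics the symptom; MySQL's conversion rules are not modeled here.
    """
    return text.encode(charset, errors="replace").decode(charset)


sample = "\u03b2 \u2230 \u231a"  # the symbols from the question: beta, contour integral, watch
print(store_in_charset(sample, "latin-1"))  # ? ? ?
```

Round-tripping the same sample through "utf-8" leaves it unchanged, which is why the table and connection character sets are the first place to look.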

I tested this on my TYPO3 7.6.8 installation and it seems to work OK. When I log in to MySQL and run this query:
SELECT default_character_set_name FROM information_schema.SCHEMATA
WHERE schema_name = "7_6_local_typo3_org";
(7_6_local_typo3_org is the database name), it returns:
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8 |
+----------------------------+
1 row in set (0.00 sec)
and for the collation:
SELECT default_collation_name FROM information_schema.SCHEMATA
WHERE schema_name = "7_6_local_typo3_org";
+------------------------+
| default_collation_name |
+------------------------+
| utf8_general_ci |
+------------------------+
1 row in set (0.00 sec)
I also have the following in my my.cnf (MySQL config file):
character-set-server = utf8
collation-server = utf8_general_ci

I had a similar problem when pasting HTML with Unicode icons into a Raw HTML content element in TYPO3 8.7.x, but it works when I encode the symbols as HTML entities, for example:
<span class="menuicon">&#8986;</span>

Possible reasons for the error message
1: These fields of record X in table "tt_content" have not been saved correctly: bodytext! The values might have changed due to type casting of the database.
in a TYPO3 installation (version 10.4.20 in this example) are:
the MySQL/MariaDB tables of this TYPO3 installation use an inappropriate/outdated character set and/or collation (Step 1 below);
this TYPO3 installation is not yet configured to use utf8mb4 for the database (Step 2 below).
TYPO3 supports utf8mb4 since at least version 9.5. With it comes proper Unicode support, including emojis, mathematical symbols, and Greek letters (e.g. ⌚∰β) in CKEditor bodytext.
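MySQL's older utf8 character set (an alias of utf8mb3) stores at most 3 bytes per character, so only characters whose UTF-8 encoding is 4 bytes long strictly require utf8mb4. A small Python sketch (the helper name is illustrative) to find such characters in a piece of content:

```python
from __future__ import annotations


def needs_utf8mb4(text: str) -> list[str]:
    """Return the characters whose UTF-8 encoding takes 4 bytes.

    These cannot be stored in MySQL's 3-byte utf8/utf8mb3 character set.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# beta, contour integral, watch, mathematical italic small beta, grinning face
for ch in ["\u03b2", "\u2230", "\u231a", "\U0001d6fd", "\U0001f600"]:
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} bytes")
```

β, ∰ and ⌚ themselves fit in 2 or 3 bytes; characters from the Mathematical Alphanumeric Symbols block (such as U+1D6FD) and most emojis need 4.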
I migrated my TYPO3 installation's database and configuration to utf8mb4 in the following way, getting rid of the aforementioned error message and saving and displaying Unicode multibyte characters correctly.
Be sure to apply these migrations in a test environment first. Check existing content and test the usual content-editing scenarios before applying them to a production system, to make sure MySQL/MariaDB converted between the character sets correctly and without data loss (truncation).
Step 1
Update TYPO3 database tables to use utf8mb4 as character set and utf8mb4_unicode_ci as collation.
The following bash loop goes over all tables in the database typo3 and applies these updates. It assumes MySQL/MariaDB root privileges, a password-less socket connection, and a TYPO3 database (table_schema) named typo3; adapt accordingly. It was tested successfully on:
Debian 11 MariaDB Server (10.5.12-MariaDB-0+deb11u1)
Ubuntu 20.04 LTS MySQL Server (8.0.27-0ubuntu0.20.04.1)
# Assumes root access over the local socket and a TYPO3 database named typo3.
for tbl in $(mysql --disable-column-names --batch -e 'SELECT DISTINCT TABLE_NAME FROM information_schema.tables WHERE table_schema="typo3" AND table_type="BASE TABLE";'); do
  echo "updating table $tbl"
  mysql -e "ALTER TABLE \`typo3\`.\`${tbl}\` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done
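If you prefer to review the statements before running them, a hypothetical Python helper can generate the same ALTER TABLE statements from a list of table names (for example, the output of the information_schema query above). The schema and table names in the usage line are only examples:

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(schema: str, tables: list[str]) -> list[str]:
    """Build one CONVERT TO CHARACTER SET statement per table (dry run only)."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(t)} "
        "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
        for t in tables
    ]


for stmt in conversion_statements("typo3", ["pages", "tt_content"]):
    print(stmt)
```

The printed statements can be saved to a file, reviewed, and then fed to the mysql client.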
To ensure that no string data is lost or truncated during this conversion (from a "smaller" encoding to utf8mb4, which uses up to four bytes per character), MySQL/MariaDB automatically changes a text/string column's data type to a larger one where needed, e.g. from TEXT to MEDIUMTEXT.
To restore a TYPO3 (extension) table's columns to their specified data types, go to TYPO3 backend -> Maintenance -> Analyze Database Structure. This tool lets you restore those columns' original (smaller) data types. This may cause data truncation. I'm not sure whether TYPO3 warns if truncation actually occurs. However, assuming the TYPO3 (extension) developers had utf8mb4 in mind when designing a column's data type, and the content of a particular database cell is not too large, truncation should not happen (see an overview of text/string data type sizes).
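The size arithmetic can be sketched in Python (helper names are illustrative). TEXT holds at most 65,535 bytes, so with utf8mb3 (up to 3 bytes per character) it is guaranteed to hold 21,845 characters, but with utf8mb4 (up to 4 bytes) only 16,383, which is presumably why the conversion promotes such columns. Checking the encoded byte length of existing values shows whether shrinking a column back would truncate them:

```python
# Maximum length in bytes of MySQL's TEXT-family types.
MAX_BYTES = {
    "TINYTEXT": 255,
    "TEXT": 65_535,
    "MEDIUMTEXT": 16_777_215,
    "LONGTEXT": 4_294_967_295,
}


def guaranteed_chars(column_type: str, max_bytes_per_char: int) -> int:
    """Characters that always fit, assuming every character uses the maximum width."""
    return MAX_BYTES[column_type] // max_bytes_per_char


def fits(value: str, column_type: str) -> bool:
    """True if the value's utf8mb4 (UTF-8) encoding fits the column's byte limit."""
    return len(value.encode("utf-8")) <= MAX_BYTES[column_type]


print(guaranteed_chars("TEXT", 3))  # 21845 (utf8mb3)
print(guaranteed_chars("TEXT", 4))  # 16383 (utf8mb4)
```

Running fits() over the bodytext values of a column before shrinking it back to TEXT would flag the rows at risk.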
Step 2
Configure TYPO3 to use utf8mb4. For example, if you use typo3conf/AdditionalConfiguration.php, add the following settings to it:
// ...
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['charset'] = 'utf8mb4';
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['tableoptions']['charset'] = 'utf8mb4';
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['tableoptions']['collate'] = 'utf8mb4_unicode_ci';
// ...

Related

Query failed: collation "numerickn" for encoding "UTF8" does not exist

I have a column (vendor_name) in a PostgreSQL (AWS RDS) table that can contain alphanumeric values. I would like to do a natural sort on this column.
The sample data in the table is as follows:
delta 20221120
delta 20220109
costco delivery 564
costco delivery 561
united 01672519702943
Uber
I created a collation in the database as follows:
CREATE COLLATION IF NOT EXISTS numerickn (provider = icu, locale = 'en-u-kn-true');
When someone sorts on the vendor name column in the UI grid, I dynamically add the following clause to my query:
ORDER BY "vendor" COLLATE "numerickn"
However, it gives the following error, even though I can see that the collation exists in the database:
Error: Query failed: collation "numerickn" for encoding "UTF8" does not exist
I am not sure why it does not work when the collation exists in the database. In my vendor names, numbers can appear anywhere within the string, so there is no fixed pattern.
I could not find out why it did not work in the staging environment, while it worked locally.
In the end, I moved away from the collation approach and implemented the natural sort differently, based on this Stack Overflow question:
PostgreSQL ORDER BY issue - natural sort
I am using Node.js in my API code. My solution is as follows:
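// sortOrder is interpolated into the SQL text, so only allow 'ASC' or 'DESC'.
const direction = String(sortOrder).toUpperCase() === 'DESC' ? 'DESC' : 'ASC';
// String.raw keeps the regex backslashes (\d, \D) intact.
qOrderBy = String.raw` ORDER BY ARRAY(
  SELECT ROW(
    CAST(COALESCE(NULLIF(match[1], ''), '9223372036854775807') AS BIGINT),
    match[2]
  )
  FROM REGEXP_MATCHES(vendor, '(\d*)|(\D*)', 'g')
  AS match ) ${direction}`;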
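The same idea (split the string into runs of digits and non-digits, compare digit runs numerically, and use 9223372036854775807, the maximum BIGINT, as a sentinel for runs without digits) can be sketched in Python to check the expected order outside the database. The helper is illustrative, and casefold() only approximates a case-insensitive database collation:

```python
from __future__ import annotations

import re

# Sentinel mirroring the SQL version: runs without digits sort after numbers.
NO_NUMBER = 9223372036854775807  # maximum BIGINT value


def natural_key(value: str) -> list[tuple[int, str]]:
    """Split into digit/non-digit runs; digit runs compare as integers."""
    key = []
    for digits, text in re.findall(r"(\d+)|(\D+)", value):
        if digits:
            key.append((int(digits), ""))
        else:
            # casefold() roughly imitates a case-insensitive collation
            key.append((NO_NUMBER, text.casefold()))
    return key


vendors = ["delta 20221120", "delta 20220109", "costco delivery 564",
           "costco delivery 561", "united 01672519702943", "Uber"]
for v in sorted(vendors, key=natural_key):
    print(v)
```

With the sample data this prints the two costco rows (561 before 564), the two delta rows in date order, then Uber and united.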

VARCHAR comparison on an indexed column

Postgres behaves differently from what 'common sense' would expect.
Given a table my_table with a VARCHAR(250) column named MyVarcharColumn, and an index IDX_MyvarcharColumn created on MyVarcharColumn, with these settings:
Collation: Default
Postgres version: 11.12
LC_COLLATE: en_US.utf8
Encoding: UTF8
CTYPE: en_US.utf8
The problem is presented below:
Given query (A):
SELECT * FROM my_table t
WHERE 'mystring' = t.MyVarcharColumn
Running the query above returns no records, even though the value 'mystring' is present in my_table.
Workaround:
SELECT * FROM my_table t
WHERE 'mystring' = t.MyVarcharColumn collate "C"
Adding collate "C" makes the query work, but obviously no one wants to add a COLLATE clause to the end of every query.
Second 'workaround':
Rebuilding the database's indexes (REINDEX DATABASE myDB) also makes the query work as expected, without the COLLATE clause.
The question is: is there a way to avoid using the collate statement and/or the REINDEX to make this work without a workaround?
Re-creating the database with a different collation is also not an option at the moment.
Using lower(column_name) to compare isn't an option since it does not use indexes and it would make the query slow.
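The thread does not identify the cause. One mechanism that produces exactly these symptoms (an assumption here) is a B-tree index whose stored order no longer matches the collation rules used to search it, for example after the operating system's collation library changes. A lookup descends the tree by comparisons, so a key can be present yet unreachable; REINDEX re-sorts the entries under the current rules, and a query with collate "C" presumably cannot use the index and falls back to scanning the table. A toy Python model, with binary search standing in for the index descent (all names are hypothetical):

```python
from __future__ import annotations

import bisect
from typing import Callable


def index_lookup(index: list[str], target: str, key: Callable[[str], str]) -> bool:
    """Binary-search a sorted 'index' using the given ordering, like a B-tree descent."""
    keys = [key(k) for k in index]
    i = bisect.bisect_left(keys, key(target))
    return i < len(index) and index[i] == target


def old_rules(s: str) -> str:
    return s  # ordering the index was built with (plain code points)


def new_rules(s: str) -> str:
    return s.casefold()  # changed ordering now used for comparisons


rows = ["Zebra", "mystring", "zulu"]
index = sorted(rows, key=old_rules)  # ['Zebra', 'mystring', 'zulu']

print(index_lookup(index, "mystring", old_rules))  # True: consistent rules
print(index_lookup(index, "mystring", new_rules))  # False: row exists but is missed
print("mystring" in rows)  # True: a full scan still finds it
reindexed = sorted(rows, key=new_rules)  # what REINDEX would do
print(index_lookup(reindexed, "mystring", new_rules))  # True again
```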

Trim/whitespace issue when loading data from a Db2 source into a PostgreSQL DB using Talend Open Source

We are seeing an issue with table values that are populated from DB2 (source) into Postgres (target).
I have included the job details for each component here.
Once the data has been populated using this approach, we run the queries below in the Postgres DB:
SELECT * FROM VMRCTTA1.VMRRCUST_SUMM where cust_gssn_cd='XY03666699' ;
SELECT * FROM VMRCTTA1.VMRRCUST_SUMM where cust_cntry_cd='847' ;
No records are returned. However, when we run the same queries with trim, as below, they work:
SELECT * FROM VMRCTTA1.VMRRCUST_SUMM where trim(cust_gssn_cd)='XY03666699' ;
SELECT * FROM VMRCTTA1.VMRRCUST_SUMM where trim(cust_cntry_cd)='847' ;
We have tried the following ways to overcome this, with no luck:
Used tMap between the source and target components.
Used trim in the source component under Advanced settings.
Changed the data type of cust_cntry_cd in the Postgres DB from char(5) to character varying, which allows values without any length restriction.
Please suggest what is missing, as we have this issue in almost all tables that have character/varchar columns.
We are using TOS.
The data type is probably character(5) in DB2.
That means that the trailing spaces are part of the column and will be migrated. You have to compare with
cust_cntry_cd = '847  '
or cast the right argument to character(5):
cust_cntry_cd = CAST ('847' AS character(5))
Maybe you could delete all spaces in the advanced settings of the tDB2Input component.
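The padding behaviour described in the answer can be modelled in a few lines of Python (helper names are hypothetical; this is not Talend code). A CHAR(5) value is padded with spaces, the padding survives migration into a VARCHAR column, and only a right-trim (or comparing against the padded value) makes the comparison succeed. Right-trimming every string field before loading is what a trim step in the job would need to do:

```python
from __future__ import annotations


def char_n(value: str, n: int) -> str:
    """Mimic CHAR(n) storage: values shorter than n are padded with spaces."""
    if len(value) > n:
        raise ValueError(f"value longer than CHAR({n})")
    return value.ljust(n)


def rtrim_strings(row: dict[str, object]) -> dict[str, object]:
    """Right-trim every string field of a row, leaving other types unchanged."""
    return {k: v.rstrip(" ") if isinstance(v, str) else v for k, v in row.items()}


migrated = char_n("847", 5)  # value as read from a DB2 CHAR(5) column
print(repr(migrated))  # '847  '
print(migrated == "847")  # False, like comparing a varchar in Postgres
print(rtrim_strings({"cust_cntry_cd": migrated, "id": 1}))  # {'cust_cntry_cd': '847', 'id': 1}
```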

Firebird errors when using Symmetricds

I am using the free version of SymmetricDS to replicate my Firebird database. When I tried a demo with a new (blank) DB, it worked fine. But when I configured it on my existing DB (which has data), an error occurred.
I use Firebird-2.5.5.26952_0 (32-bit) and symmetric-server-3.9.5 on Windows Server 2008 Enterprise.
I searched for a whole day but found nothing to solve this. Any help is appreciated. Thanks for your time.
UPDATE:
During the initial load, SymmetricDS executes this statement to declare a UDF in the Firebird DB:
declare external function sym_hex blob
returns cstring(32660) free_it
entry_point 'sym_hex' module_name 'sym_udf';
It causes an error because my existing DB's character set is UNICODE_FSS, for which the maximum length of CSTRING is 10922. When I worked around this by changing the character set to NONE, it worked fine, but that is not a safe solution. I am still looking for a better one.
One more thing: does anyone know other open source tools to replicate a Firebird database? I tried many of the ones listed here, and only SymmetricDS worked.
The problem seems to be a bug in Firebird where the length of CSTRING is supposed to be in bytes, but in reality it uses characters. Your database seems to have UTF8 (or UNICODE_FSS) as its default character set, which means each character can take up to 4 bytes (3 for UNICODE_FSS). The maximum length of CSTRING is 32767 bytes, but if it calculates CSTRING lengths in characters, that suddenly reduces the maximum to 8191 characters (32764 bytes) for UTF8, or 10922 characters (32766 bytes) for UNICODE_FSS.
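The arithmetic in this answer can be checked with a short Python sketch. The byte widths are the maximum bytes per character for each character set, and dividing the byte limit by that width models the bug the answer describes, not documented Firebird semantics:

```python
CSTRING_MAX_BYTES = 32767  # CSTRING limit in bytes, per the answer above

# Maximum bytes per character for the character sets discussed here.
BYTES_PER_CHAR = {"NONE": 1, "UNICODE_FSS": 3, "UTF8": 4}


def max_cstring_chars(charset: str) -> int:
    """Largest CSTRING(n) if the length is (buggily) counted in characters."""
    return CSTRING_MAX_BYTES // BYTES_PER_CHAR[charset]


declared = 32660  # length SymmetricDS uses for sym_hex
for cs in BYTES_PER_CHAR:
    limit = max_cstring_chars(cs)
    print(f"{cs}: max {limit} chars, CSTRING({declared}) ok: {declared <= limit}")
```

Only NONE leaves room for CSTRING(32660), which matches the workaround in the question.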
The workaround to this problem would be to create a database with a different default character set. Alternatively, you could (temporarily) alter the default character set:
For Firebird 3:
Set the default character set to a single-byte character set (e.g. NONE). Using NONE is preferable to avoid unintended transliteration issues.
alter database set default character set NONE;
Disconnect (important: you may need to disconnect all current connections because of metadata caching!)
Set up SymmetricDS so it creates the UDF
Set the default character set back to UTF8 (or UNICODE_FSS)
alter database set default character set UTF8;
Disconnect again
When using Firebird 2.5 or earlier, you will need to perform a direct system table update instead of the alter database statements (this is no longer possible in Firebird 3).
To set the default character set to NONE:
update RDB$DATABASE set RDB$CHARACTER_SET_NAME = 'NONE'
To set it back to UTF8 (or UNICODE_FSS):
update RDB$DATABASE set RDB$CHARACTER_SET_NAME = 'UTF8'
The alternative would be for SymmetricDS to change its initialization to
DECLARE EXTERNAL FUNCTION SYM_HEX
BLOB
RETURNS CSTRING(32660) CHARACTER SET NONE FREE_IT
ENTRY_POINT 'sym_hex' MODULE_NAME 'sym_udf';
Or maybe character set BINARY instead of NONE, as that seems closer to the intent of the UDF.

PostgreSQL ORDER BY - Danish characters are expanded

I'm trying to make an ORDER BY clause in an SQL query work, but for some reason Danish special characters are expanded instead of being sorted by their own value.
SELECT roadname FROM mytable ORDER BY roadname
The result:
Abildlunden
Æblerosestien
Agern Alle 1
The result in the middle should be the last.
The locale is set to danish, so it should know the value of the danish special characters.
What is the collation of your database? (You might also want to say which PostgreSQL version you are using.) Use \l in psql to see it.
Compare and contrast:
steve#steve#[local] =# select * from (values('Abildlunden'),('Æblerosestien'),('Agern Alle 1')) x(word)
order by word collate "en_GB";
word
---------------
Abildlunden
Æblerosestien
Agern Alle 1
(3 rows)
steve#steve#[local] =# select * from (values('Abildlunden'),('Æblerosestien'),('Agern Alle 1')) x(word)
order by word collate "da_DK";
word
---------------
Abildlunden
Agern Alle 1
Æblerosestien
(3 rows)
The database collation is set when you create the database cluster, from the locale you have set at the time. If you installed PostgreSQL through a package manager (e.g. apt-get) then it is likely taken from the system-default locale.
You can override the collation used in a particular column, or even in a particular expression (as done in the examples above). However, if you're not specifying anything (likely), then the database default will be used (which itself is inherited from the template database when the database is created, and the template database's collation is fixed when the cluster is created).
If you want to use da_DK as your default collation throughout, and it's not currently your database default, your simplest option might be to dump the database, then drop and re-create the cluster, specifying the collation to initdb (or pg_createcluster, or whatever tool you use to create the server).
BTW, the question isn't well-phrased. PostgreSQL is very much not ignoring the "special" characters; it is correctly expanding "Æ" into "AE", which is a correct rule for English. Collating "Æ" at the end is actually more like the unlocalised behaviour.
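To make the difference concrete, here is a deliberately simplified Python model of the two rules. It is not the actual glibc or ICU collation data and ignores many details of real collations; the key functions are hypothetical:

```python
from __future__ import annotations

DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"  # Æ, Ø, Å come after Z


def english_like_key(word: str) -> str:
    """Rough English-style rule: expand æ to "ae", treat ø and å as o and a."""
    w = word.casefold()
    return w.replace("æ", "ae").replace("ø", "o").replace("å", "a")


def danish_like_key(word: str) -> list[tuple[int, int]]:
    """Rough Danish-style rule: æ, ø and å are letters of their own after z."""
    key = []
    for ch in word.casefold():
        if ch in DANISH_ALPHABET:
            key.append((1, DANISH_ALPHABET.index(ch)))
        else:
            key.append((0, ord(ch)))  # spaces, digits etc. sort before letters
    return key


words = ["Abildlunden", "Æblerosestien", "Agern Alle 1"]
by_english = sorted(words, key=english_like_key)
# ['Abildlunden', 'Æblerosestien', 'Agern Alle 1']  (like collate "en_GB")
by_danish = sorted(words, key=danish_like_key)
# ['Abildlunden', 'Agern Alle 1', 'Æblerosestien']  (like collate "da_DK")
```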
Collation documentation: http://www.postgresql.org/docs/current/static/collation.html