TYPO3 - Keep special chars in header - special-characters

How do I keep special chars in the header field in TYPO3?
As it is now it changes ß into ss. How can I keep the ß sign without writing codes?

Related

Anonymize arbitrary Unicode text to script-specific placeholders

When reporting bugs in one of my programs (which handles specialized XML files containing Bible text, which may contain copyrighted or otherwise protected material), there is an option to submit an anonymized XML file with the report. As some bugs only happen when the text in the fields is formatted in a certain way, punctuation should be preserved and only letters and digits anonymized.
At the moment, the code first identifies fields that should be anonymized, then iterates over each character and replaces uppercase Latin letters with X, lowercase Latin letters with x, digits with 0, and Greek letters with Χ / χ.
Now I got a report that it does not work with Cyrillic text. I fixed it for now by replacing any Unicode letter with X or x, but that changes the script of the letters unnecessarily and may make bugs unreproducible.
I could now of course add another special case for Cyrillic to solve this, and wait for the next bug report.
But I wondered whether anybody else has already compiled a list or database of "anonymizing" characters for each script available in Unicode, or found a way to extract one from the UnicodeData either included in Java or provided by the Unicode Consortium. Preferably as a Java library, but any other file that can be used from Java would be fine, too.
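The per-character replacement described above could be sketched roughly as follows. This is only an illustration in Python, not the asker's Java code: the `PLACEHOLDERS` table, `script_of` and `anonymize` are hypothetical names, and the script is only approximated by the first word of the Unicode character name. Scripts missing from the table fall back to X / x, as in the asker's interim fix.

```python
import unicodedata

# Hypothetical placeholder table (an assumption, not a standard list):
# script name -> (uppercase placeholder, lowercase placeholder).
PLACEHOLDERS = {
    "LATIN": ("X", "x"),
    "GREEK": ("\u03a7", "\u03c7"),     # Greek capital / small chi
    "CYRILLIC": ("\u0425", "\u0445"),  # Cyrillic capital / small ha
}


def script_of(ch):
    """Approximate the script by the first word of the Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else ""


def anonymize(text):
    """Replace letters and digits, keep punctuation and whitespace."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Nd":            # decimal digit
            out.append("0")
        elif category.startswith("L"):  # any kind of letter
            # Unknown scripts fall back to Latin X / x.
            upper, lower = PLACEHOLDERS.get(script_of(ch), ("X", "x"))
            out.append(upper if category == "Lu" else lower)
        else:
            out.append(ch)
    return "".join(out)


print(anonymize("Abc 12, \u041f\u0440\u0438\u0432\u0435\u0442!"))
```

In Java, `Character.UnicodeScript.of(codePoint)` returns the script of a code point directly, so the name-based guess would not be needed there.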

Character Encoding Issue - Characters Being Replaced with Random Characters after Saving in Textarea

I'm working with a third-party company and I'm trying/hoping to determine the cause of a character encoding issue before I bring it up with them.
This company has a custom drag-and-drop editor for designing websites on their platform. Within the editor they have a Raw HTML widget that I can drag in and add my own content to. The problem is that when I copy HTML from someone's old website, using the inspector tool, and paste it into this widget of theirs, all of the apostrophes and double quotes get replaced with gibberish. I have the same issue when I first paste the content into Notepad, Notepad++ or Sublime Text and then paste it into their Raw HTML editor.
Here's a recording of the issue and a few examples:
https://streamable.com/phwn2
Here are the known characters that get replaced and what they get replaced with:
’ turns into â™
“ turns into âœ
” turns into â
+ turns into (a space)
Å turns into Ã…
" stays as "
' stays as '
Does anyone see a pattern with these characters or know what could be the cause of these characters being replaced?
The website probably uses UTF-8 encoding, and the company's editor might be reading the bytes as something like Windows-1252. In your first example, the right single quote has the UTF-8 encoding e2 80 99. When each of those bytes is read by a program using Windows-1252, you get "Latin small letter a with circumflex" (e2), the euro sign (80; in ISO-8859-1 this byte is an invisible control character instead) and "trade mark sign" (99). I haven't checked the other transformations. If this is the problem, you could work around it by first converting the copied characters to the destination encoding with iconv before pasting them into the company's editor.
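The suspected mechanism can be reproduced in a few lines. This is a minimal Python sketch that assumes the hypothesis above (UTF-8 bytes decoded as Windows-1252); the function names are made up for illustration, and the company's editor may do something different.

```python
def misread_utf8_as_cp1252(text):
    """Encode text as UTF-8, then decode the bytes as Windows-1252."""
    # errors="replace": a few bytes (such as 0x9D) are unassigned in
    # Windows-1252 and would otherwise raise UnicodeDecodeError.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled):
    """Undo the misreading; only works if no byte was lost on the way."""
    return garbled.encode("cp1252").decode("utf-8")


# Right single quote, left double quote, A with ring above.
for ch in ["\u2019", "\u201c", "\u00c5"]:
    utf8_hex = ch.encode("utf-8").hex(" ")
    print(repr(ch), utf8_hex, "->", repr(misread_utf8_as_cp1252(ch)))
```

Each UTF-8 byte comes out as its own character, which is why one curly quote turns into a short run of accented letters and symbols. Where the damage is limited to this misreading, `repair` reverses it.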

Zend 1.12 trims accents and special characters off

I have a problem with my Zend installation.
It strips special characters (with accents) and single quotes before sending my data to the database (MySQL).
In my bootstrap.ini, it's set to use UTF-8
`$dbAdapter->query("SET NAMES 'utf8'");`
Is there any special treatment I need to apply to my form elements before sending them to my database table?
Thanks

Does American/British English use non-ASCII characters?

I am a developer who is working with Chinese characters. I am trying to convert part of my project into English. I am currently rewriting the project's internationalization module.
I am unfamiliar with the conventions for English, so I don't know whether non-ASCII characters are widely used.
If they are, which ones are used frequently?
Standard English spelling uses the en dash (–) and curly quotation marks (“, ”, ‘, ’); American English also uses the em dash (—). Depending on conventions and preferences, several non-ASCII letters may be used, too, especially in words of French or Latin origin, such as é, ë, ç, and æ. Moreover, even in nonspecialized texts, various special characters such as superscript two (²), micro sign (µ), and degree sign (°) may be seen.

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

I am writing a webapp in ZF and am having serious issues with UTF8. It handles multilingual content through Zend Form, and it seems that ZF heavily escapes all of these characters and basically just won't show a field if there are diacritical characters like 'é' and if I use the HTML entity equivalent e.g. &eacute; it gets escaped so that the user will see '&eacute;'.
Zend Form allows for having non escaped data, but trying to use this is confusing, and it seems it'd need to be used all over the place.
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
And if the last question is true, then how do I convert the source text to UTF8? I am comfortable setting up Apache so that it sends a default UTF-8 charset header, and also adding the charset meta tag to the HTML, but even with both in place I am still getting messed-up encoding. I have also tried opening the translation CSV file in TextWrangler on OS X as UTF-8, but that changed nothing.
Thanks!
L
'é' and if I use the HTML entity equivalent e.g. &eacute; it gets escaped so that the user will see '&eacute;'.
This I don't understand. Can you show an example of how it is displayed, as opposed to how it should be displayed?
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
Yup. In more detail: If the data you're displaying and the encoding of the HTML page are both UTF-8, the multi-byte special characters will be displayed correctly.
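As a rough illustration of why no entity conversion is needed, here is a small sketch using Python's standard `html` module. It stands in for whatever escaping Zend Form applies, which is an assumption on my part. Only the characters that are special in HTML get escaped, accented letters pass through unchanged, and text that already contains an entity gets escaped a second time.

```python
import html

# Accented letters are left alone; only &, <, > and quotes are escaped.
text = "caf\u00e9 <b>cr\u00e8me</b> & more"
escaped = html.escape(text)
print(escaped)

# Text that already contains an entity is escaped again, so the browser
# shows the entity source instead of the accented letter.
double_escaped = html.escape("&eacute;")
print(double_escaped)

# Decoding entities back to real characters before storing the text
# avoids the double escaping.
print(html.unescape("&eacute;"))
```

So with UTF-8 end to end, storing the real characters and letting the framework escape only the HTML-special ones is usually enough.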
And if the last question is true, then how do I convert the source text to UTF8?
Advanced editors and IDEs enable you to define what encoding the source file is saved in. You would need to open the file in its current encoding (with special characters being displayed correctly) and save it as UTF-8.
If the content is messed up when you have the right content-type header and/or meta tag specified, then the content is not UTF-8 yet. If you don't get it sorted, post an example of what it looks like here.
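If no editor is at hand, the same re-save can be scripted. The following is a minimal sketch; the function names are made up, and the source encoding has to be known or guessed correctly by the caller.

```python
from pathlib import Path


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path, source_encoding):
    """Re-save a text file as UTF-8, given the encoding it is currently in."""
    raw = Path(path).read_bytes()
    # A wrong source_encoding either raises UnicodeDecodeError or silently
    # produces wrong characters, so check the result by eye afterwards.
    text = raw.decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("translations.csv", "iso-8859-1")` would rewrite a hypothetical Latin-1 file in place, and `is_valid_utf8` gives a quick check of whether a file is already UTF-8 before converting it.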