TextMeshPro does not recognize subscripts and superscripts - unity3d

I am replacing my current Text components with TextMeshPro, and when replacing texts containing characters such as superscripts and subscripts, I am getting the error.
I am aware that in TextMeshPro there is the option <sub> </sub> and <sup> </sup> to create superscripts and subscripts, but I preferred to do it directly with characters, as I did with normal Text.
The doubt comes from the fact that some characters are recognized, and others not:
² (recognized)
⁵ (recognized)
⁶ (not recognized)
The subscripts, on the other hand, do not recognize even one of them.

This thread contains all the information on solving the problem.
https://forum.unity.com/threads/textmeshpro-does-not-recognize-subscripts-and-superscripts.1151834/
In short words,
I changed the default font to a font that supported superscripts and subscripts. From that font I created two TMP fonts, one static with the main ASCII characters and one dynamic for the rest of the characters. Then I added the dynamic font in the static font fallbacks, and fine. Add the static font to the TMP and that's it.

Related

Visually changing how a string of characters is represented in VScode

Prettify symbols is an extension in vscode that changes a sequence of characters visually without affecting what the code does. For example, visually changing --> to ⟶ while the coding language uses -->. However, this extension creates seemingly random symbols throughout the file and is no longer maintained. Therefore, that extension is hardly usable at the moment.
Fira Code uses ligatures to do something similar (or the same I am not sure).
What other ways are there to visually change a string of letters ? I am mostly interested in solutions for vscode. As an example I would like to change
~[\Omega]
to
Ω
visually for the user but the code uses the original ~~[\Omega]
[EDIT: I found this github page that adds ligatures to a font. Unfortunately when one creates a ligature where the "hidden" symbol contains many characters and the visible symbol contains a few symbols, a long trace of spaces replacing the missing characters is left behind. The prettify symbols extension mentioned before does not have these spaces. For those that are still interested in making ligatures with the second link, this Fira code font page shows the names of symbols in Fira code. That might be helpful when making a new font from Fira code using the first link of this edit (which is the second link of the question) ]

The list of unicode unusual characters

Where can I get the complete list of all unicode characters that doesn't behave as simple characters. Examples: character 0x0363 (won't be printed without another one before), character 0x0084 (does weird things when printed). I need just a raw list of such unusual characters to replace them with something harmless to avoid unwanted output effects. Regular characters (those who not in this list) should use exactly one character place when printed (= cursor moved +1 to the right), should not depend on previous or next characters, and should not affect printing style in any way.
Edit because of multiple comments:
I have some unicode string, usually consists of "usual" characters like 0x20-0x7E or cyrillic letters. Also, there are a lot of other unicode characters that are usual and may be safely assumed as having strlen() = 1. The string is printed on the terminal and I should know the resulting position of the cursor. I don't want to use some complex and non-stable libraries to do that, i want to have simplest possible logic to do that. Every problematic character may be replaced with U+0xFFFD or something like "<U+0363>" (ASCII string with its index instead of character itself). I want to have a list of "possibly-problematic" characters to replace. It is acceptable to have some non-problematic characters in this list too, but not much.
There is no simple algorithm for this. You'll likely need a complex, but extremely stable library: libicu, or something based on it. Basically every other library that does this kind of work is based on libicu, which is maintained by the Unicode organization.
If you don't want to use the official library (or something based on their library), you'll need to parse the Unicode Character Database yourself. In particular, you need to look at Character Properties, and parse the files in the UCD.
I believe you're asking for Bidi_Class (i.e. "direction") to be Left_To_Right, Canonical_Combining_Class to be Not_Reordered, and Joining_Type to be Non_Joining.
You probably also want to check the General_Category and avoid M* (Marks) and C* (Other).
This should work for some Emoji, but this whole approach will break a lot of emoji that look simple and are not. Most famously: ❤️, which is two "characters," not one. You may want to filter out Emoji. As a simple starting point, you may want to restrict yourself to the Basic Multilingual Plane (BMP), which are code points 0000-FFFF. Anything above this range is, almost by definition, rare or unusual. The BMP does include some emoji, but most emoji (and all new emoji) are outside the range.
Remember that the glyphs for single characters can still have radically different widths, even in nominally fixed-width fonts. For example, 𒈙 (U+12219 CUNEIFORM SIGN LUGAL OPPOSING LUGAL) is a completely "normal" character in the way you're describing. It is left-to-right. It doesn't depend on or influence characters around it (it's non-combining and non-joining). Its "length in characters" is 1. Its glyph is also extremely wide in most fonts and breaks a lot of layout. I don't know anything in the Unicode database that would warn you of this, since "glyph width" is entirely a function of fonts, not characters, and Unicode explicitly does not consider fonts. (That said, most of the most problematic characters are outside the BMP. Probably the most common exception is DŽ, but many fixed-width fonts have a narrow glyph for it: DŽ.)
Let's write some cuneiform in a fixed-width font.
Normally, every character should line up with a character above.
Here: 𒈙. See how these characters don't align correctly?
Not only is it a very wide glyph, but its width is not even a multiple.
At least not in my font (Mac Safari 15.0).
But DŽ is ok.
Also remember that there are multiple ways to encode the same "character." For example, é can be a "simple" character (U+00E9), or it can be two characters (U+0065, U+0301). So in some cases é may print in your scheme, and in others it won't. I suspect this is fine for your problem, but if it isn't, you're going to need to apply a normalization form (likely NFC).

Using characters larger than 0xFFFF

I have an OpenType font with some optional glyphs selected by features. I've opened it in FontForge and I can see that the associated unicode code point is, for example, 0x1002a.
Is it possible to use this value to render the glyph in iText? I've tried calling showText() with a string containing the corresponding surrogate pairs ("\uD800\uDC2A") but nothing appears.
Is there another way to do this, or am I barking up the wrong tree?

iText -- How do I identify a single font that can print all the characters in a string?

This is wrt iText 2.1.6.
I have a string containing characters from different languages, for which I'd like to pick a single font (among the registered fonts) that has glyphs for all these characters. I would like to avoid a situation where different substrings in the string are printed using different fonts, if I already have one font that can display all these glyphs.
If there's no such single font, I would still like to pick a minimal set of fonts that covers the characters in my string.
I'm aware of FontSelector, but it doesn't seem to try to find a minimal set of fonts for the given text. Correct? How do I do this?
iText 2.1.6 is obsolete. Please stop using it: http://itextpdf.com/salesfaq
I see two questions in one:
Is there a font that contains all characters for all languages?
Allow me to explain why this is impossible:
There are 1,114,112 code points in Unicode. Not all of these code points are used, but the possible number of different glyphs is huge.
A simple font only contains 256 characters (1 byte per font), a composite font uses CIDs from 0 to 65,535.
65,535 is much smaller that 1,114,112, which means that it is technically impossible to have a single font that contains all possible glyphs.
FontSelector doesn't find a minimal set of fonts!
FontSelector doesn't look for a minimal set of fonts. You have to tell FontSelector which fonts you want to use and in which order! Suppose that you have this code:
FontSelector selector = new FontSelector();
selector.addFont(font1);
selector.addFont(font2);
selector.addFont(font3);
In this case, FontSelector will first look at font1 for each specific glyph. If it's not there, it will look at font2, etc... Obviously font1, font2 and font3 will have different glyphs for the same character in common. For instance: a, a and a. Which glyph will be used depends on the order in which you added the font.
Bottom line:
Select a wide range of fonts that cover all the glyphs you need and add them to a FontSelector instance. Don't expect to find one single font that contains all the glyphs you need.

Unicode Keystroke Characters?

Does unicode have characters in it similar to stuff like the things formed by the <kbd> tag in HTML? I want to use it as part of a game to indicate that the user can press a key to perform a certain action, for example:
Press R to reset, or S to open the settings menu.
Are there characters for that? I don't need anything fancy like ⇧ Shift or Tab ⇆, single-letter keys are plenty. I am looking for something that would work somewhat like the Enclosed Alphanumerics subrange.
If there are characters for that, where could I find a page describing them? All the google searches I tried turned only turned up "unicode character keyboard shortcuts" stuff.
If there are not characters for that, how can I display something like that as part of (or at least in line with) a text string in Processing 2.0.1?
(The rendering referred to is not the default rendering of kbd, which simply shows the content in the system’s default monospace font. But e.g. in StackOverflow pages, a style sheet is used to format kbd so that it looks like a keycap.)
Somewhat surprisingly, there is a Unicode way to create something that looks like a character in a keycap: enter the character, then immediately COMBINING ENCLOSING KEYCAP U+20E3.
Font support to this character is very limited but contains a few free fonts. Unfortunately, none of them is a sans-serif font, and the character to be shown inside should normally appear in such a font – after all, real keycaps contains very simple shapes for characters, without serifs. And generally, a character and an enclosing mark should be taken from the same font; otherwise they might be incompatible. However, it seems that taking the normal character from the sans-serif font (FreeSans) in GNU Freefont and the combining mark from the serif font (FreeSerif) of the same source creates a reasonable presentation:
I’m afraid it won’t work here in text, but I’ll try: A⃣ .
Whether this works depends on the use of suitable fonts, as mentioned, but also on the rendering software. Programs have been rather bad at displaying combining marks, but there has been some improvement. I tested this in Word 2007, where it works OK, and also on web browsers (Chrome, Firefox, IE) with good results using code like this:
<style>
.cap { font-family: FreeSerif; }
.cap span { font-family: FreeSans; }
</style>
<span class="cap"><span>A</span>⃣</span>
It isn’t perfect, when using the fonts mentioned. The character in the cap is not quite centered. Moreover, if I try to use the technique e.g. for the character Å (which is present on normal Nordic keyboards), the ring above A extends out of the cap. You could tweak this by setting the font size of the letter in the cap to, say, 85% of the font size of the combining mark, but then the horizontal position of the letter is even more off.
To summarize, it is possible to do such things at the character level, but if you can use other methods, like using a border or a background image for a character, you can probably achieve better rendering.