I'm combining some characters with Unicode combining marks. However, some marks appear shifted to the right when shown in labels. This is not my actual case, but suppose I combine A and ˚: instead of Å, I get A˚. If I copy the text and paste it somewhere else, the character appears correctly (Å).
To combine the characters I use a method that does this:
Character("\(character)\(mark)")
Where character would be a letter and mark an accent or another combining mark.
I read that this may happen because some fonts don't support certain characters. The font I use for my labels where I display the combined stuff is the systemFont.
Why is this happening? How can I prevent combined characters from being shifted to the right?
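One thing worth trying, offered as a sketch rather than a confirmed fix: normalize the combined string to NFC so that a precomposed code point is used whenever Unicode defines one. The example below uses Python's `unicodedata` only for illustration; the helper name `combine` is made up here. (In Swift, Foundation's `precomposedStringWithCanonicalMapping` performs the same normalization.) It also shows that the spacing character ˚ (U+02DA RING ABOVE) is not a combining mark and never composes with the preceding letter, which may matter if the wrong code point is being appended.

```python
import unicodedata

def combine(base: str, mark: str) -> str:
    """Join a base character and a mark, then compose to NFC.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", base + mark)

# U+030A COMBINING RING ABOVE composes with "A" into the single code point U+00C5.
composed = combine("A", "\u030A")
print(composed, [f"U+{ord(c):04X}" for c in composed])

# U+02DA RING ABOVE is a spacing character, so the result stays two code points.
spacing = combine("A", "\u02DA")
print(spacing, [f"U+{ord(c):04X}" for c in spacing])
```

Normalization only helps when a precomposed form exists; for other base/mark pairs the placement still depends on the font's mark positioning.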
I'm looking at formatting a UTF-8 free-text string to fit an exact column width on a terminal. I'm coding various truncation methods (left/middle/right) for long strings. However, when the truncation point falls in the middle of a wide character, such as an emoji, the display column count falls apart. Some form of padding is needed for the leftover 'half-wide' column.
Is there a suitable narrow character that indicates a valid Unicode character is present but there is insufficient display space to show it, as opposed to the replacement character � usually used for invalid Unicode?
Example: on a fixed-width terminal, fit two emojis into the space that would hold 'aaa', e.g. "👨👨". This needs a, preferably standardised, substitute character for the second emoji or wide character, e.g. "👨⋮", to fill that three-column space.
A side issue is working out where decomposed composite characters start and end (and are there combining prefixes?). It looks like the next code point needs to be read to see whether it is still zero-width (e.g. 'o' U+006F followed by the combining diaeresis U+0308, rather than the precomposed ö U+00F6; don't stop after the plain 'o').
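The column counting described above can be sketched roughly as follows, using only Python's standard `unicodedata` module. The function names `char_width` and `fit_right` and the default filler "⋮" are assumptions taken from the example in the question, not a standard. Width is approximated per code point: combining marks and format characters count as 0 columns, East Asian Wide/Fullwidth characters as 2, everything else as 1.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Approximate terminal columns occupied by one code point."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no column
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1

def fit_right(text: str, columns: int, filler: str = "⋮") -> str:
    """Truncate on the right to exactly `columns` display columns.

    Zero-width code points that follow an accepted base character are kept,
    so a decomposed 'o' + U+0308 is not split after the plain 'o'.
    """
    out, used, truncated = [], 0, False
    for ch in text:
        width = char_width(ch)
        if used + width > columns:
            truncated = True
            break
        out.append(ch)
        used += width
    # Fill any leftover column: filler if a character was cut, spaces otherwise.
    pad = filler if truncated else " "
    return "".join(out) + pad * (columns - used)

print(fit_right("👨👨", 3))
print(fit_right("o\u0308x", 1))
```

This is only an approximation: it does not implement full grapheme cluster segmentation, so emoji ZWJ sequences and emoji with skin-tone modifiers are counted wider than most terminals draw them, and terminals themselves disagree on some widths.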
In a Unicode string, each grapheme consists of one or more code points. However, there are some code points, such as the zero-width joiner (ZWJ), which are never a grapheme on their own. The ZWJ is, in itself, invisible. Are all of those "non-grapheme" code points always invisible?
The Unicode representation of the Ogham script is notable for containing a non-invisible whitespace character. (U+1680: OGHAM SPACE MARK)
Tom Scott made an excellent YouTube video on the subject: link
There are many joining characters which are intended to modify a base character. Whether they provide a grapheme on their own is partially an implementation detail, I expect.
Example: o followed by U+0308 COMBINING DIAERESIS produces ö (the glyph in isolation is rendered by your browser as ̈)
List of all code points in this category: https://codepoints.net/search?lb=CM
Recent Unicode versions also have modifier characters that change how the preceding emoji is rendered, famously to add a skin color trait to emojis with human figures or faces. These by definition are not graphemes in their own right, though again, rendering engines are free to figure out a way to represent them if they are encountered in isolation.
Example: 👋 U+1F44B WAVING HAND SIGN followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2 (which in isolation renders as 🏻) produces 👋🏻
Full catalog: https://www.unicode.org/emoji/charts/full-emoji-modifiers.html
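To see how Unicode itself classifies the code points mentioned in this thread, a quick look at their General Category values can help. This is a small sketch using Python's `unicodedata`; the `describe` helper is made up for illustration. The categories shown are properties of the characters, not statements about how any particular renderer draws them.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point, general category, name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

samples = [
    "\u200D",      # zero-width joiner: Cf (format), invisible
    "\u0308",      # combining diaeresis: Mn (nonspacing mark)
    "\U0001F3FB",  # emoji skin-tone modifier: Sk (modifier symbol), visible alone
    "\u1680",      # Ogham space mark: Zs (space separator), yet drawn as a line
]

for ch in samples:
    print(describe(ch))
```

So the categories alone do not answer the visibility question: a format character such as ZWJ is invisible, while the emoji modifier and the Ogham space mark both have visible glyphs.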
Is there any way to display diacritical marks like following without the dotted ring?
◌́
◌̀
◌̃
Each of these items is actually two Unicode characters that the font combines via ligatures or mark-to-base features. The dotted circle is U+25CC, and the marks you have here are U+0301, U+0300, and U+0303. Each of these is designed to combine with the previous character, but there are non-combining versions of each: U+02CA, U+02CB, and U+02DC.
So you can delete the dotted circle in front of each mark (it may be difficult to find, since the marks have a width of zero) and replace it with a space, but the result may display in odd ways depending on what surrounds it:
́
̀
̃
Or use the non-combining versions of these marks:
ˊ
ˋ
˜
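If the text is generated programmatically, the second option can be sketched as a simple substitution. The mapping below covers only the three marks from the question, and the names `SPACING_FORM` and `without_dotted_circle` are made up for this example.

```python
DOTTED_CIRCLE = "\u25CC"

# Combining mark -> spacing (non-combining) counterpart, for the three marks above.
SPACING_FORM = {
    "\u0301": "\u02CA",  # combining acute -> modifier letter acute accent
    "\u0300": "\u02CB",  # combining grave -> modifier letter grave accent
    "\u0303": "\u02DC",  # combining tilde -> small tilde
}

def without_dotted_circle(text: str) -> str:
    """Replace each 'dotted circle + combining mark' pair with the spacing mark."""
    for combining, spacing in SPACING_FORM.items():
        text = text.replace(DOTTED_CIRCLE + combining, spacing)
    return text

print(without_dotted_circle("\u25CC\u0301 \u25CC\u0300 \u25CC\u0303"))
```

Marks that sit on a real base letter (for example e followed by U+0301) are left untouched, because only pairs that start with the dotted circle are replaced.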
What's the Unicode or Segoe UI Symbols (or other font) code for exclamation mark in circle?
There is no single Unicode codepoint for that particular symbol.
Unicode does define a U+20DD COMBINING ENCLOSING CIRCLE codepoint, but most fonts (including Segoe) do not treat it as a combining symbol, but rather as its own character. In Word, for instance, you would have to adjust the character spacing between it and a preceding character (in this case U+0021 EXCLAMATION MARK) to a negative offset to make them overlap (see Using the “Combining Enclosing Circle” character in Word).
Some fonts do support U+20DD in general (see COMBINING ENCLOSING CIRCLE (U+20DD) Font Support), and some of them do treat it as a combining mark (Code2000, GNU FreeFont fonts, STIX fonts, Symbola, XITS, etc), but the resulting overlap may not visually be exactly what you are looking for, depending on the size and alignment of the character it is being combined with.
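For completeness, here is a small sketch of what the combining sequence looks like at the code point level, using Python's `unicodedata`. It confirms that normalization does not collapse the pair into one code point and that U+20DD is classified as an enclosing mark; whether the circle actually wraps the exclamation mark still depends entirely on the font, as described above.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"          # COMBINING ENCLOSING CIRCLE
circled = "!" + ENCLOSING_CIRCLE     # base character first, then the mark

# NFC leaves the sequence as two code points: no precomposed form exists.
nfc = unicodedata.normalize("NFC", circled)
print([f"U+{ord(c):04X}" for c in nfc])

# General Category "Me" means enclosing mark.
print(unicodedata.category(ENCLOSING_CIRCLE))
```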
I am trying to minimize the vertical distance between controls on a programmatically constructed Windows Form (using C#). This involves setting the Height property appropriately.
I have found that if the text of the control does not contain any letters with descenders in them (i.e. does not have any of the characters j, g, p, q or y) then the control Height can be smaller than when it does contain such letters (if it does contain letters with descenders then the descenders are chopped off if the Height isn't enough).
Testing for any of the above five characters works fine as long as the language is English, or English-like, but I need to be able to cater for (just about) any language.
Is there a way, given some arbitrary Unicode character (and perhaps a font) to determine if that Unicode character has a descender or not?
There is no property defined for Unicode characters to indicate the presence of a descender, and it’s really a feature of glyph design rather than of characters. For example, “Q” has a descender in many fonts, and “J” has one in some. Besides, given the context, you should also consider diacritic marks placed below a letter, not just descenders of base letters. And probably diacritics above letters, too.
So you would need to read the font information (when available) about character dimensions, or tentatively draw characters in your software and measure their dimensions.
As a rule of thumb, any line height below 1.1 times the font size will cause problems with some characters and fonts. Using 1 (“setting solid”) is not enough, because characters may in fact extend outside the font size.
In Windows, you can call the GDI function GetPath() to get an array containing the X/Y coordinates of every point making up the outline of the string of glyphs. Search the array for the minimum and maximum X and Y values; that gives you the rectangle exactly enclosing the string, right up to the edge of the letters.
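The min/max search itself is simple; a minimal sketch in Python follows. The outline points here are made-up sample data, not output from a real font; in practice they would come from GetPath() or a similar outline API. The sketch also assumes device coordinates where Y grows downward, so anything with Y greater than the baseline lies below it. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

def bounding_box(points: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the tightest enclosing rectangle."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def extends_below_baseline(points: Sequence[Point], baseline_y: float) -> bool:
    """True if any outline point lies below the baseline (Y grows downward)."""
    _, _, _, bottom = bounding_box(points)
    return bottom > baseline_y

# Made-up outline points for illustration: the last one dips below y = 20.
outline = [(2.0, 5.0), (9.0, 5.0), (9.0, 20.0), (4.0, 26.0)]
print(bounding_box(outline))
print(extends_below_baseline(outline, baseline_y=20.0))
```

The height of the resulting box, rather than a per-character descender test, is what the control height needs to accommodate, which also covers diacritics above and below the letters.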