Why does the red heart emoji require two code points, but the other colored hearts require one?

It appears that the red heart emoji (❤️) "\u2764\uFE0F" requires two Unicode code points, specifically HEAVY BLACK HEART followed by a variation selector. However, blue 💙, green 💚, yellow 💛, and purple 💜 hearts each have their own single code point.
Why is red so different?

For historical reasons. Originally there was only U+2764 HEAVY BLACK HEART, which the first applications that supported emoji decided to render as a red heart. These early applications always rendered U+2764 as emoji. Later it was realized that this was a bad idea, and the variation selectors for emoji were standardized. When additional heart emoji were added, there was no need for another red heart, so it was omitted. Instead there is a separate black heart emoji, U+1F5A4 🖤.
In theory, an application could require that the emoji variation selector be appended to other heart code points as well. But it doesn't make much sense to render a character like PURPLE HEART as a non-emoji. It does make a difference for HEAVY BLACK HEART, though, which is often intended to be rendered as the original plain text character.
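A quick way to see the difference is to inspect the underlying scalars yourself. Here is a minimal sketch in Swift (any language that exposes code points works equally well):

```swift
import Foundation

// Inspect the Unicode scalars behind each heart.
let redHeart  = "\u{2764}\u{FE0F}"   // HEAVY BLACK HEART + VARIATION SELECTOR-16
let blueHeart = "\u{1F499}"          // BLUE HEART: a single code point
for heart in [redHeart, blueHeart] {
    print(heart, heart.unicodeScalars.map { String(format: "U+%04X", $0.value) })
}
// ❤️ ["U+2764", "U+FE0F"]
// 💙 ["U+1F499"]
```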

HEAVY BLACK HEART was added to Unicode decades before emoji. When emoji were incorporated in Unicode 6.0, some already existing characters were simply reused as emoji to avoid unnecessary duplicates. Later, variation sequences were defined for characters that also map to a non-emoji character set, to allow better control over how they display. For example, U+2744 ❄ SNOWFLAKE is originally from Zapf Dingbats (I believe) but was later also made an emoji. So if you want to force the original text-style display you can use VARIATION SELECTOR-15 (resulting in ❄︎), and if you want to force the newer emoji-style display you can use VARIATION SELECTOR-16 (resulting in ❄️).
Note, however, that not many platforms actually support those variation sequences correctly at the moment. Also, not all of them automatically apply the variation selectors when using the emoji keyboard. In theory ❤ and ❄ (and many other emoji) should display as text style by default without VS16, but many applications ignore that as well.
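Swift happens to expose the relevant default-presentation property directly, which makes that claim easy to check. A small sketch (how the last line actually renders still depends on your fonts):

```swift
// U+2744 is an emoji, but its default presentation is text style,
// so VARIATION SELECTOR-16 is needed to request the colorful form.
let snowflake: Unicode.Scalar = "\u{2744}"
print(snowflake.properties.isEmoji)              // true
print(snowflake.properties.isEmojiPresentation)  // false: text style by default
print("\u{2744}\u{FE0E}", "\u{2744}\u{FE0F}")    // ❄︎ vs ❄️, font permitting
```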
I have a list of all code points that can display differently via a variation sequence, on my website, if you're interested. The next Unicode update in June is going to add some more.


Are all "non-grapheme" code points invisible?

In a Unicode string, each grapheme consists of one or more code points. However, there are some code points, such as the ZERO WIDTH JOINER (ZWJ), which are never part of a grapheme. The ZWJ is, in itself, invisible. Are all of those "non-grapheme" code points always invisible?
The Unicode representation of the Ogham script is notable for containing a non-invisible whitespace character. (U+1680: OGHAM SPACE MARK)
Tom Scott made an excellent YouTube video on the subject: link
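If you want to verify this programmatically, here is a small sketch in Swift, which exposes the White_Space property on scalars (how visibly the mark renders still depends on your font):

```swift
// U+1680 carries the White_Space property, yet it is drawn as a visible stroke.
let oghamSpace: Unicode.Scalar = "\u{1680}"   // OGHAM SPACE MARK
print(oghamSpace.properties.isWhitespace)     // true
print("a\u{1680}b")                           // a visible mark between a and b
```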
There are many joining characters which are intended to modify a base character. Whether they provide a grapheme on their own is partially an implementation detail, I expect.
Example: o followed by U+0308 COMBINING DIAERESIS produces ö (the combining mark in isolation is rendered by your browser as ̈)
List of all code points in this category: https://codepoints.net/search?lb=CM
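For what it's worth, grapheme-cluster-aware languages make the distinction easy to observe. A minimal Swift sketch, where String.count counts grapheme clusters:

```swift
// A base letter plus a combining mark forms one grapheme cluster.
let decomposed = "o\u{0308}"             // o + COMBINING DIAERESIS
print(decomposed)                        // ö
print(decomposed.count)                  // 1 grapheme cluster
print(decomposed.unicodeScalars.count)   // 2 code points
// A combining mark in isolation still segments as one (degenerate) cluster;
// how it renders alone is up to the engine, as noted above.
print("\u{0308}".count)                  // 1
```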
Recent Unicode versions also have invisible characters which modify how a previous emoji is being rendered, famously to add e.g. a skin color trait to emojis with human figures or faces. These by definition are not graphemes in their own right, though again, rendering engines are probably free to figure out a way to represent them if they are encountered in isolation.
Example: 👋 U+1F44B WAVING HAND SIGN followed by U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2 (which in isolation renders as 🏻) produces 👋🏻
Full catalog: https://www.unicode.org/emoji/charts/full-emoji-modifiers.html
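A short Swift sketch of the same mechanism, showing that the base emoji and its modifier segment as a single grapheme cluster:

```swift
// An emoji modifier attaches to the preceding base emoji.
let wave      = "\u{1F44B}"            // WAVING HAND SIGN
let lightSkin = "\u{1F3FB}"            // EMOJI MODIFIER FITZPATRICK TYPE-1-2
let combined  = wave + lightSkin
print(combined)                        // 👋🏻
print(combined.count)                  // 1: one grapheme cluster
print(combined.unicodeScalars.count)   // 2 code points
```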

Are there Unicode code points that enable geometric transformations such as rotation and mirroring?

Playing with Unicode's already large set of combining characters and other modifiers lets you go quite far in composing symbols.
Still, there are cases where an arrow exists in only one direction, or a diacritic is available only above a character, but not, for example, below it on the left side.
So are there modifiers/combining characters that can instruct such a composition?
For example, the combining rectangle allows one to make something like a̻. At least in my current terminal, it's rendered as a rectangle above and to the right of the a glyph it combines with, with its longest side oriented horizontally. Now, what if:
the goal is to place the rectangle at the top left, top middle, etc.?
the goal is to rotate the rectangle before it's combined with the main glyph?
the goal is to mirror the rectangle before it's combined with the main glyph?
Obviously the last point doesn't make much difference for a rectangle, but for asymmetric glyphs it would.
No, there is no such mechanism in Unicode. Different positional variants of the same diacritic are encoded as separate characters. For example, U+0307 COMBINING DOT ABOVE, U+0358 COMBINING DOT ABOVE RIGHT, and U+1DF8 COMBINING DOT ABOVE LEFT are all different codepoints. There is currently no way to represent, say, a generic combining dot below right in Unicode.
Similarly, arbitrary Unicode characters cannot be mirrored or rotated. Where such transformations make a meaningful distinction (for example the pair “E” and “Ǝ”), the variants have once again been encoded atomically.
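You can see the atomic encoding at work by printing the three dot variants side by side. A quick Swift sketch (rendering quality varies by font):

```swift
import Foundation

// Each dot position is its own combining character; there is no generic
// "move this mark" modifier that could be applied instead.
let variants = [
    ("COMBINING DOT ABOVE",       "a\u{0307}"),
    ("COMBINING DOT ABOVE RIGHT", "a\u{0358}"),
    ("COMBINING DOT ABOVE LEFT",  "a\u{1DF8}"),
]
for (name, s) in variants {
    print(s, name, s.unicodeScalars.map { String(format: "U+%04X", $0.value) })
}
```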
There are some very specific circumstances where such modifiers can be applied. In Sutton SignWriting, rotation is a productive feature: rotating glyphs is necessary to display text correctly, so a number of rotation modifiers have been defined. For example, U+1D800 SIGNWRITING HAND-FIST INDEX points upwards in its base orientation (𝠀), but by appending U+1DAA1 SIGNWRITING ROTATION MODIFIER-2 you can make it point north-west instead.
For emoji only, Unicode also specifies a mechanism for defining whether a given glyph is supposed to face left or right. For example, “🚗‍⬅️” would be an automobile going to the left and “🚗‍➡️” would be an automobile going to the right. No commercially available fonts presently support this mechanism, however.
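The directional sequences are plain character concatenation: base emoji, ZERO WIDTH JOINER, arrow, VARIATION SELECTOR-16. A sketch in Swift; given the lack of font support, expect to see the pieces rendered separately:

```swift
// Base emoji + ZWJ + arrow + VS16, as described above.
let car         = "\u{1F697}"                            // AUTOMOBILE
let facingLeft  = car + "\u{200D}" + "\u{2B05}\u{FE0F}"  // ZWJ + ⬅️
let facingRight = car + "\u{200D}" + "\u{27A1}\u{FE0F}"  // ZWJ + ➡️
print(facingLeft, facingRight)
```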

Color in the Unicode standard?

Unicode 6.0 added several characters whose names suggest that those characters are supposed to be rendered in a specific color:
RED APPLE U+1F34E
GREEN APPLE U+1F34F
BLUE HEART U+1F499
GREEN HEART U+1F49A
YELLOW HEART U+1F49B
PURPLE HEART U+1F49C
GREEN BOOK U+1F4D7
BLUE BOOK U+1F4D8
ORANGE BOOK U+1F4D9
LARGE RED CIRCLE U+1F534
LARGE BLUE CIRCLE U+1F535
LARGE ORANGE DIAMOND U+1F536
LARGE BLUE DIAMOND U+1F537
SMALL ORANGE DIAMOND U+1F538
SMALL BLUE DIAMOND U+1F539
UP-POINTING RED TRIANGLE U+1F53A
DOWN-POINTING RED TRIANGLE U+1F53B
UP-POINTING SMALL RED TRIANGLE U+1F53C
DOWN-POINTING SMALL RED TRIANGLE U+1F53D
I had thought font symbols were always grayscale.
Did the Unicode authors foresee that these might be rendered in different colors?
Within the official unicode.org PDFs (http://www.unicode.org/charts/PDF/U1F300.pdf), these are portrayed only as having different types of crosshatching.
Is there any current mechanism that would allow specific characters to be rendered in a specific color, based only on their code points and not on any rich-text formatting? (e.g. a color property within TrueType or OpenType font files)
From the Unicode FAQ: Emoji and Dingbats:
Q: What about characters whose name specifies a color?
A: Some of the characters from the core emoji sets have names that include a color term, for example, BLUE HEART or ORANGE BOOK. These color terms in the names do not imply any requirement about how a character must be presented; they are intended only to help identify the corresponding character in the core emoji sets. Even names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of black and white is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. [PE]
There was quite a bit of debate on the mailing lists at the time about whether these should be named with colors or with generic names that didn't reference color, and whether that was setting a bad precedent. The Emoji Symbols: Background Data includes "old names" such as APPLE-1 instead of RED APPLE and BOOK-3 instead of ORANGE BOOK.
The final names use this principle:
Symbols with an inherent color shall bear this color in their name unless the entity denoted by the name identifies the color anyway (e.g., a BANANA is uniquely yellow and therefore does not need to be called YELLOW BANANA, while a RED APPLE must be named so as there are also green apples).
Unicode 6.1 has a feature to change the glyph for the same code point by appending a variation selector (U+FE0E or U+FE0F).
For example, the left-pointing triangle ("\U000025C0", ◀) can be colored by adding "\U0000FE0F" as a suffix (◀️*) and rendered non-colored by adding "\U0000FE0E" ("\U000025C0\U0000FE0E", ◀︎**).
* This is the default on Mac OS X 10.8. ** This is the default on Linux.
From https://learn.microsoft.com/en-us/typography/opentype/spec/otff#tables-related-to-color-fonts:
Tables Related to Color Fonts
COLR: Color table
CPAL: Color palette table
CBDT: Color bitmap data
CBLC: Color bitmap location data
sbix: Standard bitmap graphics
SVG : The SVG (Scalable Vector Graphics) table
In short,
CBDT/CBLC contain colored bitmaps (in PNG). They were proposed by Google.
sbix contains colored bitmaps (in JPG, PNG, or TIFF). It was proposed by Apple.
COLR defines one or more accompanying color glyphs (in vector format) for each glyph, and when they are overlapped they create the final colored glyph. CPAL defines several color themes (dark-on-white, white-on-dark, ...) since COLR is merely paletted images. COLR/CPAL were proposed by Microsoft.
SVG was proposed by Mozilla and Adobe. It may be used with CPAL.
FreeType (used by Android and many Linux systems) has supported CBDT/CBLC and sbix since versions 2.5 and 2.5.1 (released in 2013), and COLR/CPAL since 2.10.0 (released in 2018). DirectWrite (part of Windows) has supported COLR/CPAL since Windows 8.1 (released in 2013) and all four since Windows 10 version 1607 (released in 2016).
Noto Color Emoji (default on Android) uses CBDT/CBLC. Segoe UI Emoji (default on Windows) uses COLR/CPAL. Apple Color Emoji (default on iOS and macOS) uses sbix.
See also https://en.wikipedia.org/wiki/OpenType#Color
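If you're on an Apple platform, you can check which of these tables a font actually carries via Core Text. A sketch, assuming macOS or iOS:

```swift
import CoreText

// List the color-related OpenType tables (per the spec excerpt above)
// that a given font contains.
let font = CTFontCreateWithName("Apple Color Emoji" as CFString, 24, nil)
let colorTables: Set<String> = ["COLR", "CPAL", "CBDT", "CBLC", "sbix", "SVG "]

if let tags = CTFontCopyAvailableTables(font, []) {
    for i in 0..<CFArrayGetCount(tags) {
        // Each array element is the raw four-character tag value itself.
        let tag = UInt32(truncatingIfNeeded: UInt(bitPattern: CFArrayGetValueAtIndex(tags, i)))
        var name = ""
        for shift in stride(from: 24, through: 0, by: -8) {
            name.append(Character(UnicodeScalar((tag >> UInt32(shift)) & 0xFF)!))
        }
        if colorTables.contains(name) {
            print(name)   // Apple Color Emoji is expected to report "sbix"
        }
    }
}
```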
I don't know that there's any standard mechanism for colored fonts, but obviously there are colored fonts. For example, the emoji font in iOS and OS X. Emoji characters in any text view on OS X will result in colored symbols, and they won't be affected by choosing a text color. These emoji even show up in Terminal.app.
(From this page.)

How to center align, ignoring certain characters?

Look at this UILabel. It's center-aligned:
Now look at this UILabel. Although it is technically center-aligned, it really doesn't look that way:
The reason it looks like this is that the center-alignment counts the degree symbol as a third character, thus bumping the other two off to the left a bit. My question is: is there any way to ignore certain characters while center-aligning a label?
Interesting question. The only solution that comes to mind for me is to pad the text string with spaces on the front to cancel out the ignored characters on the back.
That is, to center @"60°" as if it were @"60", set the text to @" 60°". This works well with a fixed-width font, but otherwise is only a rough approximation.
If you like this idea and want to get fancy with it, you can use the NSString method
– stringByPaddingToLength:withString:startingAtIndex:
perhaps in conjunction with – rangeOfCharacterFromSet: or some such method to determine how many spaces to pad with.
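In Swift terms, the padding idea might look like the following sketch; `centeredIgnoringSuffix` is a made-up helper name, not an API:

```swift
import UIKit

// Prepend one space per trailing character the centering should ignore.
// Only a rough approximation outside fixed-width fonts, as noted above.
func centeredIgnoringSuffix(_ text: String, ignoredSuffixLength: Int) -> String {
    String(repeating: " ", count: ignoredSuffixLength) + text
}

let label = UILabel()
label.textAlignment = .center
label.text = centeredIgnoringSuffix("60\u{00B0}", ignoredSuffixLength: 1)  // " 60°"
```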
You could of course measure the text string(s) and compute your own positioning, rather than using text alignment in a larger field.
Assuming you don't want to do that, another idea that comes to mind is to display the string “°60°” with the first character styled with a color of opacity 0 and no shadow.
I don't do iOS development so I don't know how practical these are.

iOS Japanese handwriting input code help please

I have a series of questions about writing code for iOS that includes handwriting recognition of Japanese. I am a beginner, so be gentle and assume I am stupid ...
I'd like to present a Japanese word in hiragana (the Japanese phonetic alphabet), then have the user handwrite the appropriate kanji (Chinese character). This is then compared internally with the correct character, and the user gets feedback (whether they were correct or not).
My questions here revolve around the handwritten input.
I know this type of input is normally possible if one uses the Chinese keyboard.
How can I implement something similar without using the keyboard itself? Are there already library functions for this (I feel there must be, since that input is available on the Chinese keyboard)?
Also, kanji aren't exactly the same as Chinese characters. There are unique characters that the Japanese invented themselves. How would I be able to include these in my handwriting recognition?
We worked on a similar exercise back at university.
The order of the strokes is well defined for kanji, and there are only 8 (?) different stroke types. Basically, each kanji is a well-ordered sequence of strokes. For example, te (hand) is the sequence "short falling backward stroke", then twice the "left-to-right stroke", and finally "long downward stroke with the little tip at the bottom". There are databases that give you this information.
Now the problem is almost reduced to identifying the correct stroke. You will still run into some ambiguities, where you have to take into consideration the spatial relation of some strokes to others.
EDIT: For stroke recognition, we snapped the freehand writing to 45-degree angles (where is the little circle symbol on the keyboard?), thus converting it into a sequence of vectors along one of eight directions. Let's assume that direction zero is from bottom to top, direction 1 from bottom right to top left, direction 2 from right to left, and so on, counterclockwise (CCW).
Then the first stroke of te (手) would be [23]+ (as some write it falling and some horizontal)
The second and third strokes would be 6+,
and the last would be 4+[123] (as with the little tip, every writer uses a different direction).
This coarse snapping was actually enough for us to recognize kanji. Maybe there are more sophisticated ways, but this simple solution managed to recognize about 90% of them. The only handwriting it couldn't grasp was one professor's, but no human except him could read his handwriting either.
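A sketch of that coarse snapping in Swift; `snapDirection` is our own helper, and the numbering follows the convention above (0 = bottom to top, counting counterclockwise):

```swift
import Foundation

// Quantize a stroke's overall direction (dx, dy in a y-up coordinate
// system) into one of eight 45-degree sectors.
func snapDirection(dx: Double, dy: Double) -> Int {
    let angle = atan2(dy, dx) - .pi / 2        // 0 rad now means "straight up"
    let sector = Int((angle / (.pi / 4)).rounded())
    return (sector % 8 + 8) % 8                // wrap into 0...7
}

print(snapDirection(dx: 1, dy: 0))    // left-to-right stroke -> 6
print(snapDirection(dx: 0, dy: 1))    // bottom-to-top stroke -> 0
print(snapDirection(dx: -1, dy: -1))  // falling stroke toward lower left -> 3
```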
EDIT2: It is important that your user "prints" the kanji and doesn't write in calligraphy, since in calligraphy many strokes are merged into one. For example, when writing a kanji with the "rice field" radical in calligraphy, that radical morphs into something completely different, and radicals with a lot of horizontal dashes (like the radical of "speech", iu) just become one long wriggly line.