How do I identify "simple" monochrome emojis like ☺? - unicode

There are a number of Unicode characters categorized as "Extended Pictographic" which have no color and usually appear smaller than "normal" emojis. Here are some examples:
☺ ☻ ♥ ♦ ♣ ♠ ♂ ♀ ♪ ♫ ☼ ↕ ↔
These don't have a full-colour emoji counterpart. Does the Unicode Consortium provide any table or other information that allows me to identify these characters, i.e. to distinguish these monochrome "Extended Pictographic" characters from full-colour "Extended Pictographic" characters? I wasn't able to find such information myself.

The distinction you're seeing is called "emoji presentation" vs "text presentation." Some characters have both, with one or the other as the default.
The file you want is emoji-data. When you say "these don't have a full-colour emoji counterpart," that's not correct for most of them. They just default to text presentation. I'll walk through a few of them to show how to read the properties. One of my favorite exploration tools for this is the Unicode Utilities. You'll want UTS #51 as well.
☺: WHITE SMILING FACE
(Note that in Unicode, WHITE means "not filled in" and BLACK means "filled in" in many cases for historical reasons, going back to Japanese flip phones. They are not actually colors. Similarly, HEAVY means "bold" or "wide.")
ID: 263A
Emoji: yes
Extended_Pictographic: yes
Emoji_Presentation: no
So there is an emoji form of this, but it's not the default. We have to ask for it by adding U+FE0F (VARIATION SELECTOR-16). With that, this character displays as ☺️.
😊 SMILING FACE WITH SMILING EYES
For comparison, see this character, which is a more "traditional" emoji.
ID: 1F60A
Emoji: yes
Extended_Pictographic: yes
Emoji_Presentation: yes
☻: BLACK SMILING FACE
ID: 263B
Emoji: no
Extended_Pictographic: yes
Emoji_Presentation: no (necessarily, since not Emoji)
So, this is not an emoji at all which means it has no "emoji presentation." It is merely "similar in kind to characters with the Emoji property" (i.e. extended pictographic).
↕: UP DOWN ARROW
Another example of an Emoji without Emoji_Presentation. The emoji form is ↕️.
1: DIGIT ONE
And just for a little completeness:
ID: 0031
Emoji: yes
Extended_Pictographic: no
Emoji_Presentation: no
Digits are also emoji, and can take the VS-16 selector:
default: 😊123😊
as emoji: 😊1️2️3️😊
If you want a browsable list of characters with different properties, see the Character Property Index.
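The property walk-through above can be turned into a small lookup. This is a minimal sketch, assuming the `code ; Property # comment` line layout used by emoji-data; the `SAMPLE` text is hand-written from the property values listed in this answer rather than copied from the real file, and `parse_emoji_data` and `classify` are hypothetical helper names.

```python
# Hand-written sample in the emoji-data line format (NOT the real file);
# the values mirror the properties listed in the answer above.
SAMPLE = """\
0030..0039 ; Emoji                  # digits
263A       ; Emoji                  # white smiling face
263A       ; Extended_Pictographic
263B       ; Extended_Pictographic  # black smiling face
1F60A      ; Emoji                  # smiling face with smiling eyes
1F60A      ; Emoji_Presentation
1F60A      ; Extended_Pictographic
"""


def parse_emoji_data(text):
    """Map each property name to the set of code points that have it."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        field, prop = (part.strip() for part in line.split(";")[:2])
        first, _, last = field.partition("..")  # single value or range
        start = int(first, 16)
        end = int(last, 16) if last else start
        props.setdefault(prop, set()).update(range(start, end + 1))
    return props


def classify(ch, props):
    """Describe the default presentation of a single character."""
    cp = ord(ch)
    if cp in props.get("Emoji_Presentation", ()):
        return "emoji by default"
    if cp in props.get("Emoji", ()):
        return "text by default, emoji with VS16"
    if cp in props.get("Extended_Pictographic", ()):
        return "pictographic, not an emoji"
    return "not pictographic"


PROPS = parse_emoji_data(SAMPLE)
for ch in ("\u263A", "\U0001F60A", "\u263B", "1"):
    print(f"U+{ord(ch):04X}: {classify(ch, PROPS)}")
```

To answer the original question for real text, the same parser would be pointed at the full emoji-data file instead of the sample; the characters the asker calls "monochrome" are then the ones that are not in Emoji_Presentation.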

Related

Why do the printed unicode characters change?

The way the unicode symbol is displayed depends on whether I use the White Heavy Check Mark or the Negative Squared Cross Mark before it or not. If I do, the Warning Sign is coloured. If I put a space between the symbols, I get the mono-coloured text-like version.
Why does this behaviour exist and can I force the coloured symbol somehow?
I tried a couple of different REPLs, the behaviour was the same.
; No colour
(str (char 0x274e) " " (char 0x26A0))
; Coloured
(str (char 0x274e) "" (char 0x26A0))
Clojure unicode display.
I expect the symbol to be displayed the same way regardless of which symbol comes before it.
Why does this behaviour exist
A vendor thought it would be a neat idea to render emoji glyphs in colour. The idea caught on.
https://en.wikipedia.org/wiki/Emoji#Emoji_versus_text_presentation
can I force the coloured symbol somehow
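Yes: append U+FE0F VARIATION SELECTOR-16 to request the coloured emoji presentation, or U+FE0E VARIATION SELECTOR-15 to request the monochrome text presentation.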
http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/
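As an illustration of the variation selectors mentioned above, here is a minimal Python sketch (the question uses Clojure, but the idea is the same in any language). `with_presentation` is a hypothetical helper, and whether the result is actually drawn in colour still depends on the terminal and font.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def with_presentation(symbol: str, emoji: bool = True) -> str:
    """Return symbol followed by VS16 (emoji) or VS15 (text)."""
    return symbol + (VS16 if emoji else VS15)


warning = "\u26A0"  # WARNING SIGN
coloured = with_presentation(warning)            # asks for the emoji glyph
plain = with_presentation(warning, emoji=False)  # asks for the text glyph

print([f"U+{ord(c):04X}" for c in coloured])  # ['U+26A0', 'U+FE0F']
```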
Unicode is about characters (code points), not glyphs (think of a glyph as the "image" of a character).
Fonts are free to (and should) merge adjacent characters into a single glyph. In printed Latin script this is not very common (though ligatures such as ff, fi, and ffi exist), leaving aside combining code points, which by definition combine with other characters to produce a single glyph.
Many other scripts require it. Cursive Latin is one case, and most cursive scripts need contextual changes. For example, Arabic has different glyphs for the initial, medial, final, and isolated forms of a character (plus special combinations, as is common in cursive scripts). Indic scripts behave similarly.
So this behaviour has been part of Unicode from the start, and good modern fonts should support it.
It was only a matter of time before emoji used the same functionality, e.g. country-letter pairs rendered as flags, and other common cases.
The Unicode documentation often describes such possibilities, and the special code points that can change behaviour, but it is then up to the font to fulfil the expected behaviour (and to provide good glyphs).
So: a character (as a Unicode code point) does not map one-to-one to a design (glyph).
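The flag case mentioned above is a compact way to see that code points and glyphs are not one-to-one. A minimal sketch, assuming a hypothetical `flag` helper that maps an ASCII country code onto the regional indicator letters starting at U+1F1E6:

```python
import unicodedata

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(country_code: str) -> str:
    """Map ASCII letters A-Z onto the matching regional indicator letters."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A"))
        for c in country_code.upper()
    )


fr = flag("FR")
print(len(fr))  # 2 code points, usually drawn as one flag glyph
print([unicodedata.name(c) for c in fr])
```

The string holds two characters; a font with flag support typically draws them as a single image, while a font without it shows two boxed letters.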

Special characters appear as emoji

I'm developing an Android app with Galaxy S8.
When I write some special characters, they appear to be Galaxy emoji. Things like these: ↔♥◀▶
How can I prevent them from turning into emoji?
Typically, to select between the emoji and the text version of a character, you have to use a variation selector:
Unicode defines variation sequences for many of its emoji to indicate their desired presentation.
Emoji characters can have two main kinds of presentation:
an emoji presentation, with colorful and perhaps whimsical shapes, even animated
a text presentation, such as black & white
— Unicode Technical Report #51: Unicode Emoji
Specifying the desired presentation is done by following the base emoji with either U+FE0E VARIATION SELECTOR-15 (VS15) for text or U+FE0F VARIATION SELECTOR-16 (VS16) for emoji-style.
https://en.wikipedia.org/wiki/Emoji#Emoji_versus_text_presentation
So if you don't want them to be displayed as emoji, append a VS15 after each character.
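Appending VS15 by hand gets tedious for longer strings, so here is a minimal Python sketch of the idea (an Android app would do the same in Java or Kotlin). `force_text` and the `SYMBOLS` list are hypothetical, and whether the platform honours VS15 still depends on its fonts and text renderer.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: emoji presentation


def force_text(text: str, targets: str) -> str:
    """Append VS15 after every character of text that appears in targets.

    A selector that already follows a target character is replaced,
    so the function can be applied repeatedly without stacking selectors.
    """
    out = []
    prev_was_target = False
    for ch in text:
        if prev_was_target and ch in (VS15, VS16):
            prev_was_target = False
            continue  # VS15 was already appended for the target
        out.append(ch)
        prev_was_target = ch in targets
        if prev_was_target:
            out.append(VS15)
    return "".join(out)


SYMBOLS = "\u2194\u2665\u25C0\u25B6"  # the four characters from the question
fixed = force_text("\u2194\u2665", SYMBOLS)
```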

Two different eye emojis?

As far as I knew, there are currently two emojis for eyes. The pair of eyes (U+1F440) with hex code f09f9180 (👀), and a single eye (U+1F441) with hex code f09f9181 (👁).
I now found when using the emojis of the keyboard in my phone that another eye emoji exists, with hex code f09f9181efb88f (👁️).
The gajim messenger on the PC, and the Conversations app on the mobile phone, can display both. The gajim emoji-chooser only contains the short sequence and the Swiftkey-Keyboard Emoji-Chooser only the longer one.
When I copy and paste the emojis, e.g. into the Firefox URL address bar, they look the same (blue eye, while the messengers both display them in black). When I Google for the emojis, I only find pages describing the shorter code point.
Firefox renders both emojis the same, but Vivaldi (Chromium based) shows the one with the shorter code point as narrow black and white emoji and the other one as larger brown eye.
When I Google for the hex dump, I find a lot of emojipedia sites for the shorter dump, and nothing useful at all for the longer one.
Is there somewhere any documentation about the additional emoji? Why aren't both emojis available in both emoji choosers?
f0 9f 91 80 is the UTF-8 encoded form of codepoint U+1F440.
f0 9f 91 81 is the UTF-8 encoded form of codepoint U+1F441.
f0 9f 91 81 ef b8 8f is the UTF-8 encoded form of codepoints U+1F441 U+FE0F.
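These byte sequences can be checked directly. A minimal Python sketch (the variable names are only for illustration; `bytes.hex` with a separator needs Python 3.8 or later):

```python
eyes = "\U0001F440"             # EYES
eye = "\U0001F441"              # EYE
eye_emoji = "\U0001F441\uFE0F"  # EYE + VARIATION SELECTOR-16

for s in (eyes, eye, eye_emoji):
    # Show the UTF-8 bytes of each string as space-separated hex
    print(s.encode("utf-8").hex(" "))

# Going the other way: decode the longer dump back into code points
decoded = bytes.fromhex("f09f9181efb88f").decode("utf-8")
print([f"U+{ord(c):04X}" for c in decoded])  # ['U+1F441', 'U+FE0F']
```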
U+FE0F is a Variation Selector:
Variation Selectors is a Unicode block containing 16 Variation Selector format characters (designated VS1 through VS16). They are used to specify a specific glyph variant for a Unicode character. They are currently used to specify standardized variation sequences for mathematical symbols, emoji symbols, 'Phags-pa letters, and CJK unified ideographs corresponding to CJK compatibility ideographs. At present only standardized variation sequences with VS1, VS15 and VS16 have been defined.
Where U+FE0F is VARIATION SELECTOR-16:
U+FE0F was added to Unicode in version 3.2 (2002). It belongs to the block Variation Selectors in the Basic Multilingual Plane.
This character is a Nonspacing Mark and inherits its script property from the preceding character.
The glyph is not a composition. It has an Ambiguous East Asian Width. In bidirectional context it acts as Nonspacing Mark and is not mirrored. In text U+FE0F behaves as Combining Mark regarding line breaks. It has type Extend for sentence and Extend for word breaks. The Grapheme Cluster Break is Extend.
This codepoint may change the appearance of the preceding character. If that is a symbol, dingbat or emoji, U+FE0F forces it to be rendered as a colorful image as compared to a monochrome text variant. The Unicode standard defines some standardized variants. See also “Unicode symbol as text or emoji” for a discussion of this codepoint.
In other words, U+FE0F tells VS-aware software to render U+1F441 as a colorful emoji instead of as monochromatic text.
The singular ‘👁’ is used as an emoji, but is defined as being text-style (i.e. black-and-white rather than colourful) by default. This isn’t implemented consistently across all platforms, however, so sometimes the character will also display as emoji style instead. In order to explicitly force one or the other style, the characters U+FE0E and U+FE0F can be appended to 👁 to make it appear as text style (👁︎) or emoji style (👁️) respectively. Because of the inconsistencies I mentioned, some devices and applications automatically add U+FE0F to the character (resulting in the longer code your phone keyboard produced), while others leave the character as-is (leaving just the code for the eye itself).
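Because some keyboards add U+FE0F and others do not, code that compares or searches such text may want to ignore the selector. A minimal sketch, assuming hypothetical helpers `strip_presentation_selectors` and `same_symbol`; stripping is meant for comparison only, since it discards the requested presentation.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15
VS16 = "\uFE0F"  # VARIATION SELECTOR-16


def strip_presentation_selectors(text: str) -> str:
    """Remove VS15/VS16 so differently-typed emoji compare equal."""
    return text.replace(VS15, "").replace(VS16, "")


def same_symbol(a: str, b: str) -> bool:
    """Compare two strings while ignoring text/emoji presentation."""
    return strip_presentation_selectors(a) == strip_presentation_selectors(b)


short_form = "\U0001F441"       # as sent by one emoji chooser
long_form = "\U0001F441\uFE0F"  # as sent by the other
print(short_form == long_form)             # False: different code point sequences
print(same_symbol(short_form, long_form))  # True: same base character
```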

What is the difference between the tick symbols U+2713 and U+2714?

Apart from the visual aspect, do these tick symbols have any different semantics?
I mean: one is thin and the other bold. But is there any special meaning for one or the other? Or is it just a matter of choosing one graphical appearance over the other?
Unlike the majority of characters in Unicode, the Dingbats range U+27xx has no particular semantic content. The 'heavy' check mark has no meaning beyond 'a check mark that is visually bolder than the other one'; contrast this with the 'bold' letters in Plane 1, which have a mathematical meaning.
This range of characters comes from the symbol font Zapf Dingbats. Symbol fonts are visual in nature and don't fit well in Unicode, but Zapf Dingbats has historical significance as one of the PostScript core fonts guaranteed to be available on PS printers. Subsequently, characters from Zapf Dingbats have commonly been used in document interchange, making it worthwhile to standardise them.
Apart from the visual aspect.
There's no "apart" here; the visual aspect is king in Unicode. U+2713 is a check mark. U+2714 is a heavy check mark. It should appear as a bolder version of U+2713 if you have a decent font.
These code points are in a block named Dingbats, a set of typographical symbols: check marks, arrows, asterisks, that sort of thing. There's no semantic meaning attached to them. U+2714 is just heavier.
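What the Unicode Character Database records for the two characters can be inspected with Python's standard `unicodedata` module. A minimal sketch, for illustration only:

```python
import unicodedata

for ch in ("\u2713", "\u2714"):
    # Name and general category as recorded in the Unicode Character Database
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Neither character decomposes into the other: they are separate symbols
# that differ only in the weight described by their names.
print(repr(unicodedata.decomposition("\u2714")))  # ''
```

Both are in general category So (Symbol, other), and the only difference the database records is the word HEAVY in the name.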

What is the unicode variation selector

I was wondering: what are the Unicode Variation Selectors U+FE00 to U+FE0F used for?
Example: ︀︁︂︂ 
The Unicode standard talks about this. Here's a bit of the relevant section from 3.2.0, annex 28 (I'm sure there are more recent versions around; this is the first I found):
Unicode characters can be represented by a wide variety of glyphs, as discussed in Chapter 2, General Structure in The Unicode Standard, Version 3.0. Occasionally the need arises in text processing to restrict or change the set of glyphs that are to be used to represent a character. Normally such changes are indicated by choice of font or style in rich-text documents. In special circumstances, such a variation from the normal range of appearance needs to be expressed side-by-side in the same document in plain-text contexts, where it is impossible or inconvenient to exchange formatted text. For example, in languages employing the Mongolian script, sometimes a specific variant range of glyphs is needed for a specific textual purpose for which the range of “generic” glyphs is considered inappropriate. The variation selectors are used when characters have essentially the same semantic.
Variation selectors provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character. They also provide a mechanism for specifying variants, such as for CJK Ideographs and Mongolian, that have essentially the same semantic but have substantially different ranges of glyphs. A variation sequence, which always consists of a base character followed by the variation selector, may be specified as part of the Unicode Standard. That sequence is referred to as a variant of the base character. The variation selector affects only the appearance of the base character,* and only in the variation sequences defined in this Standard. The variation selector is not used as a general code extension mechanism.
(It goes on...)
You may also be interested in the Standardized Variants (this time from 6.0.0).
This is not a complete answer to the question, but it's pertinent to emoji and variation selectors:
The ❤ character (U+2764 code point) is a Unicode character from 1993.
But the ❤️ emoji is actually the ❤ (U+2764) character followed by Variation Selector-16 (U+FE0F).
Why?
Exclusively speaking about Emojis (documentation):
VS15 and VS16 are reserved to determine whether or not a character
should be displayed as an emoji. [...]
Emoji variation sequences contain VS16 (U+FE0F) for emoji-style (with color) or VS15 (U+FE0E) for text style (monochrome)
If a character (or symbol, glyph, etc.) is also intended to be an emoji, Variation Selector-16 tells the renderer to display it as an emoji. If the same character is followed by Variation Selector-15, it tells the renderer to display it as plain text. If no variation selector is appended, the default presentation depends on the Unicode specification. For Emoticons the default is emoji. For other characters like ❤, the default is text...
Another example from Emoticons (Unicode_block)'s documentation:
Each emoticon has two variants:
U+FE0E (VARIATION SELECTOR-15) selects text presentation (e.g. 😊︎ 😐︎ ☹︎)
U+FE0F (VARIATION SELECTOR-16) selects emoji-style (e.g. 😊️ 😐️ ☹️).
If there is no variation selector appended, the default is the
emoji-style. Example:
U+1F610 (NEUTRAL FACE) 😐
U+1F610 (NEUTRAL FACE), U+FE0E (VARIATION SELECTOR-15) 😐︎
U+1F610 (NEUTRAL FACE), U+FE0F (VARIATION SELECTOR-16) 😐️
Note: VS15 and VS16 are not required for a valid emoji. Many emoji have no variation selector.
Your guess is as good as mine.. but according to this source...
has got it...
Emoji Character Encoding Data Hints: (1) In iOS 5 / OSX 10.7, the underlying code that the Apple OS generates for this emoji was changed. (2) The code generated for this emoji was changed slightly in iOS 7 / OSX 10.9 (a variation selector was added) to make it easier for this emoji to be identified and shown in OSX and iOS. We don't mind Apple, thank you! We just love our emojis!
Their chart goes on to note that this "new", post-10.9 version
has a UTF-8 Character Count of 2 vs the previous 1... if that helps.
The Variation Selectors range was introduced with version 3.2 of the Unicode Standard, and is located in Plane 0, the Basic Multilingual Plane. Further selectors can be found in the Variation Selectors Supplement range.
Most Unicode characters can be represented by a wide variety of glyphs, and in rich text a particular glyph can be indicated by choosing a particular font or style. This mechanism is not available in plain text, and so variation selectors have been introduced as a way of indicating that the glyphs applicable to a particular character should be changed or restricted. The base character is followed by the variation selector, the combination being called a variation sequence. This is not intended to be a general-purpose mechanism, and the only permitted variation sequences are those defined in the Standardized Variants file, which forms part of the Unicode Character Database.
From http://www.alanwood.net/unicode/variation_selectors.html
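The two ranges described above can be recognised with a simple range check. A minimal sketch, assuming a hypothetical helper `variation_selector_number`; it covers only the Variation Selectors block (U+FE00 to U+FE0F) and the Variation Selectors Supplement (U+E0100 to U+E01EF), not script-specific selectors such as the Mongolian ones.

```python
import unicodedata


def variation_selector_number(ch: str):
    """Return N for VARIATION SELECTOR-N, or None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:        # Variation Selectors block: VS1..VS16
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:      # Supplement: VS17..VS256
        return cp - 0xE0100 + 17
    return None


for ch in ("\uFE0E", "\uFE0F", "\U000E0100"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), variation_selector_number(ch))
```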