Special characters appear as emoji - unicode

I'm developing an Android app and testing it on a Galaxy S8.
When I write some special characters, they are displayed as Galaxy emoji. Things like these: ↔♥◀▶
How can I prevent them from turning into emoji?

Typically, to select between the emoji and the text version of a character, you have to use a variation selector:
Unicode defines variation sequences for many of its emoji to indicate their desired presentation.
Emoji characters can have two main kinds of presentation:
an emoji presentation, with colorful and perhaps whimsical shapes, even animated
a text presentation, such as black & white
— Unicode Technical Report #51: Unicode Emoji
Specifying the desired presentation is done by following the base emoji with either U+FE0E VARIATION SELECTOR-15 (VS15) for text or U+FE0F VARIATION SELECTOR-16 (VS16) for emoji-style.
https://en.wikipedia.org/wiki/Emoji#Emoji_versus_text_presentation
So if you don't want them to be displayed as emoji, append VS15 (U+FE0E) directly after each character.
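As a rough illustration of the advice above, here is a minimal Python sketch (the same escape, "\uFE0E", also works in Java and Kotlin string literals). The function name and the set of target characters are assumptions chosen for this example. Whether the text glyph is actually shown still depends on the font and renderer of the device.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

# Hypothetical set of characters to protect; extend it as needed.
TARGETS = "↔♥◀▶"


def force_text_presentation(text, targets=TARGETS):
    """Append VS15 after every target character.

    A selector that already follows a target character is replaced,
    so calling the function twice gives the same result.
    """
    out = []
    prev_was_target = False
    for ch in text:
        if prev_was_target and ch in (VS15, VS16):
            # VS15 was already appended; drop the existing selector.
            prev_was_target = False
            continue
        out.append(ch)
        prev_was_target = ch in targets
        if prev_was_target:
            out.append(VS15)
    return "".join(out)


label = force_text_presentation("Back ◀ ▶ Next")
```

Characters outside the target set are left untouched, so ordinary emoji elsewhere in the string keep their default presentation.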

Related

How do I identify "simple" monochrome emojis like ☺?

There are a number of Unicode characters categorized as "Extended Pictographic" which have no color and usually appear smaller than "normal" emojis. Here are some examples:
☺ ☻ ♥ ♦ ♣ ♠ ♂ ♀ ♪ ♫ ☼ ↕ ↔
These don't have a full-colour emoji counterpart. Does the Unicode Consortium provide any table or other information that allows me to identify these characters, i.e. to distinguish these monochrome "Extended Pictographic" characters from full-colour "Extended Pictographic" characters? I wasn't able to find such information myself.
The distinction you're seeing is called "emoji presentation" vs "text presentation." Some characters have both, and may have one or the other be the default.
The file you want is emoji-data. When you say "these don't have a full-colour emoji counterpart," that's not correct for most of them. They just default to text presentation. I'll walk through a few of these to show how to read their properties. One of my favorite exploration tools for this is the Unicode Utilities. You'll want UTS#51 as well.
☺: WHITE SMILING FACE
(Note that in Unicode, WHITE means "not filled in" and BLACK means "filled in" in many cases for historical reasons, going back to Japanese flip phones. They are not actually colors. Similarly, HEAVY means "bold" or "wide.")
ID: 263A
Emoji: yes
Extended_Pictographic: yes
Emoji_Presentation: no
So there is an emoji form of this, but it's not the default. We have to ask for it by adding U+FE0F (VARIATION SELECTOR-16). With that, this character displays as ☺️.
😊: SMILING FACE WITH SMILING EYES
For comparison, see this character, which is a more "traditional" emoji.
ID: 1F60A
Emoji: yes
Extended_Pictographic: yes
Emoji_Presentation: yes
☻: BLACK SMILING FACE
ID: 263B
Emoji: no
Extended_Pictographic: yes
Emoji_Presentation: no (necessarily, since not Emoji)
So, this is not an emoji at all which means it has no "emoji presentation." It is merely "similar in kind to characters with the Emoji property" (i.e. extended pictographic).
↕: UP DOWN ARROW
Another example of an Emoji without Emoji_Presentation. The emoji form is ↕️.
1: DIGIT ONE
And just for a little completeness:
ID: 0031
Emoji: yes
Extended_Pictographic: no
Emoji_Presentation: no
Digits are also emoji and can take the VS16 selector:
default: 😊123😊
as emoji: 😊1️2️3️😊
If you want a browsable list of characters with different properties, see the Character Property Index.
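The property walk-through above can be approximated in code. The following Python sketch parses text in the `codepoints ; property # comment` layout used by emoji-data and classifies a character by its default presentation. The sample lines are hand-written and simplified to match the properties listed above; they are not a verbatim excerpt of the real file, and the function names are made up for this example.

```python
# Simplified, hand-written sample in the emoji-data layout (an assumption,
# not copied from the real file). Use the actual file for real work.
SAMPLE = """\
0030..0039 ; Emoji                 # digits
263A       ; Emoji                 # white smiling face
1F60A      ; Emoji                 # smiling face with smiling eyes
1F60A      ; Emoji_Presentation
263A..263B ; Extended_Pictographic
1F60A      ; Extended_Pictographic
"""


def parse_emoji_data(text):
    """Map each code point to the set of property names listed for it."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        cps, prop = [part.strip() for part in line.split(";", 1)]
        first, _, last = cps.partition("..")  # single value or range
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        for cp in range(lo, hi + 1):
            props.setdefault(cp, set()).add(prop)
    return props


def classify(ch, props):
    """Describe the default presentation of a single character."""
    p = props.get(ord(ch), set())
    if "Emoji" not in p:
        return "not an emoji"
    if "Emoji_Presentation" in p:
        return "emoji, emoji presentation by default"
    return "emoji, text presentation by default"


props = parse_emoji_data(SAMPLE)
for ch in "☺😊☻1":
    print(f"U+{ord(ch):04X}", classify(ch, props))
```

Characters in the second category are the ones that look like "simple" monochrome symbols until U+FE0F is appended.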

Swift: Unicode transformations: How to generate a rainbow infinity symbol

In Xcode, developing for iOS, "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}" is a rainbow flag.
"\u{1F3F3}" is a white flag, and "\u{1F308}" is a rainbow. The middle symbols "\u{FE0F}\u{200D}" are invisible symbols used to join these two together to make the rainbow flag symbol.
I am trying to combine Unicode characters to make a rainbow infinity symbol, but I'm not exactly sure how to implement this.
I'm not sure if there is an already existing Unicode character or Apple API I can use to do this, but I would appreciate learning how to do it.
I wouldn't mind having an infinity symbol over the rainbow flag either (like the Apple anti-LGBT flag incident) as an alternative.
Emoji fonts are still just fonts. If they don’t contain a specific glyph, then they cannot display that glyph. The reason “🏳️‍🌈” looks like a rainbow flag is because someone drew a picture of a rainbow flag and then defined their font in such a way that the sequence <U+1F3F3, U+FE0F, U+200D, U+1F308> would be displayed using that specific image. Much like how someone first had to define the precise shape of the letter “A” in their font and then apply that glyph to the codepoint U+0041.
There is no image-rendering code that instinctively knows how to apply the colours of 🌈 to the shape of 🏳️ and then automatically generates a new glyph on the fly. It’s all explicitly pre-defined.
U+200D is the so-called Zero Width Joiner (ZWJ), so emoji sequences using that character are appropriately named Zero Width Joiner Sequences. They were originally invented by Apple to support emoji that weren’t part of the Unicode standard (in particular, variants of 💏, 💑, and 👪️ with different gender configurations), but later other vendors jumped on board as well and nowadays they are officially part of Unicode as an alternative way for defining new emoji without having to encode entirely new characters. Currently, about a third of all officially recommended emoji are ZWJ sequences.
In theory, any person can make up their own ZWJ sequences just by joining existing characters together (as was their original intent). In your case, “♾️+ZWJ+🌈” or <U+267E, U+FE0F, U+200D, U+1F308> would be an obvious sequence for a rainbow-coloured infinity symbol. You just have to create your own font containing the glyph you want, and then distribute that font to other people so that they can see the same glyph as you. There are just a few problems:
Making fonts with colourful glyphs is not easy. I couldn’t tell you whether there even exist freely available tools for that task.
There are four different formats for emoji fonts (used by Apple, Google, Microsoft, and Mozilla respectively) and they generally do not work on each other’s platforms, so you would need to create not just one, but several fonts unless you don’t care about people on other operating systems.
Installing your own fonts is not possible on most mobile phones, so your custom emoji would mostly only be available to desktop users.
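To make the sequences discussed above concrete, here is a small Python sketch that assembles them from code points and lists what each string contains. The variable and function names are illustrative only. The second sequence is the hypothetical one suggested above; without a font that defines a glyph for it, it will most likely render as two separate symbols.

```python
VS16 = "\uFE0F"  # VARIATION SELECTOR-16
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# <U+1F3F3, U+FE0F, U+200D, U+1F308>: supported by common emoji fonts.
rainbow_flag = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"

# <U+267E, U+FE0F, U+200D, U+1F308>: a made-up sequence that only
# works with a custom font defining a glyph for it.
rainbow_infinity = "\u267E" + VS16 + ZWJ + "\U0001F308"


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(rainbow_flag))
print(code_points(rainbow_infinity))
```

Both strings are four code points long; the difference between them lies entirely in whether the installed font has a glyph for the whole sequence.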

Two different eye emojis?

As far as I knew, there are currently two emojis for eyes. The pair of eyes (U+1F440) with hex code f09f9180 (👀), and a single eye (U+1F441) with hex code f09f9181 (👁).
I now found when using the emojis of the keyboard in my phone that another eye emoji exists, with hex code f09f9181efb88f (👁️).
The gajim messenger on the PC, and the Conversations app on the mobile phone, can display both. The gajim emoji-chooser only contains the short sequence and the Swiftkey-Keyboard Emoji-Chooser only the longer one.
When I copy and paste the emojis, e.g. into the Firefox address bar, they look the same (a blue eye, whereas both messengers display them in black). When I Google for the emojis, I only find pages describing the shorter code point.
Firefox renders both emojis the same, but Vivaldi (Chromium based) shows the one with the shorter code point as narrow black and white emoji and the other one as larger brown eye.
When I Google for the hex dump, I find a lot of emojipedia sites for the shorter dump, and nothing useful at all for the longer one.
Is there somewhere any documentation about the additional emoji? Why aren't both emojis available in both emoji choosers?
f0 9f 91 80 is the UTF-8 encoded form of codepoint U+1F440.
f0 9f 91 81 is the UTF-8 encoded form of codepoint U+1F441.
f0 9f 91 81 ef b8 8f is the UTF-8 encoded form of codepoints U+1F441 U+FE0F.
U+FE0F is a Variation Selector:
Variation Selectors is a Unicode block containing 16 Variation Selector format characters (designated VS1 through VS16). They are used to specify a specific glyph variant for a Unicode character. They are currently used to specify standardized variation sequences for mathematical symbols, emoji symbols, 'Phags-pa letters, and CJK unified ideographs corresponding to CJK compatibility ideographs. At present only standardized variation sequences with VS1, VS15 and VS16 have been defined.
Where U+FE0F is VARIATION SELECTOR-16:
U+FE0F was added to Unicode in version 3.2 (2002). It belongs to the block Variation Selectors in the Basic Multilingual Plane.
This character is a Nonspacing Mark and inherits its script property from the preceding character.
The glyph is not a composition. It has an Ambiguous East Asian Width. In bidirectional context it acts as Nonspacing Mark and is not mirrored. In text U+FE0F behaves as Combining Mark regarding line breaks. It has type Extend for sentence and Extend for word breaks. The Grapheme Cluster Break is Extend.
This codepoint may change the appearance of the preceding character. If that is a symbol, dingbat or emoji, U+FE0F forces it to be rendered as a colorful image as compared to a monochrome text variant. The Unicode standard defines some standardized variants. See also “Unicode symbol as text or emoji” for a discussion of this codepoint.
In other words, U+FE0F tells VS-aware software to render U+1F441 as a colorful emoji instead of as monochromatic text.
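The byte sequences listed above can be reproduced with a few lines of Python. This is only a quick check, and the helper names are invented for this example.

```python
def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


eyes = "\U0001F440"          # pair of eyes
eye = "\U0001F441"           # single eye
eye_emoji = eye + "\uFE0F"   # single eye followed by VS16

for s in (eyes, eye, eye_emoji):
    print(utf8_hex(s), code_points(s))

# Going the other way: decode the longer hex dump from the question.
decoded = bytes.fromhex("f09f9181efb88f").decode("utf-8")
```

Decoding the longer dump yields two code points, which is why searching for it as a single emoji finds nothing.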
The singular ‘👁’ is used as an emoji, but is defined as being text-style (i.e. black-and-white rather than colourful) by default. This isn’t implemented consistently across all platforms, however, so sometimes the character will also display as emoji style instead. In order to explicitly force one or the other style, the characters U+FE0E and U+FE0F can be appended to 👁 to make it appear as text style (👁︎) or emoji style (👁️) respectively. Because of the inconsistencies I mentioned, some devices and applications automatically add U+FE0F to the character (resulting in the longer code your phone keyboard produced), while others leave the character as-is (leaving just the code for the eye itself).
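Because some keyboards append U+FE0F and others do not, two strings that look alike may not compare equal. One possible workaround, sketched here in Python with invented helper names, is to ignore U+FE0E and U+FE0F when comparing. This is meant for comparison or search only; the stripped string should not be displayed, since removing the selectors can change how a sequence renders.

```python
PRESENTATION_SELECTORS = {"\uFE0E", "\uFE0F"}  # VS15 and VS16


def strip_presentation_selectors(text):
    """Remove VS15 and VS16 so only the base characters remain."""
    return "".join(ch for ch in text if ch not in PRESENTATION_SELECTORS)


def same_symbol(a, b):
    """Compare two strings while ignoring text/emoji presentation selectors."""
    return strip_presentation_selectors(a) == strip_presentation_selectors(b)


short_eye = "\U0001F441"          # as produced by one emoji chooser
long_eye = "\U0001F441\uFE0F"     # as produced by the other

print(short_eye == long_eye)            # False: different code point sequences
print(same_symbol(short_eye, long_eye)) # True: same base character
```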

Unicode Keystroke Characters?

Does Unicode have characters similar to what the <kbd> tag in HTML produces? I want to use them as part of a game to indicate that the user can press a key to perform a certain action, for example:
Press R to reset, or S to open the settings menu.
Are there characters for that? I don't need anything fancy like ⇧ Shift or Tab ⇆, single-letter keys are plenty. I am looking for something that would work somewhat like the Enclosed Alphanumerics subrange.
If there are characters for that, where could I find a page describing them? All the Google searches I tried only turned up "unicode character keyboard shortcuts" stuff.
If there are not characters for that, how can I display something like that as part of (or at least in line with) a text string in Processing 2.0.1?
(The rendering referred to is not the default rendering of kbd, which simply shows the content in the system’s default monospace font. But e.g. in StackOverflow pages, a style sheet is used to format kbd so that it looks like a keycap.)
Somewhat surprisingly, there is a Unicode way to create something that looks like a character in a keycap: enter the character, then immediately COMBINING ENCLOSING KEYCAP U+20E3.
Font support for this character is very limited but includes a few free fonts. Unfortunately, none of them is a sans-serif font, and the character to be shown inside should normally appear in such a font – after all, real keycaps contain very simple shapes for characters, without serifs. And generally, a character and an enclosing mark should be taken from the same font; otherwise they might be incompatible. However, it seems that taking the normal character from the sans-serif font (FreeSans) in GNU Freefont and the combining mark from the serif font (FreeSerif) of the same source creates a reasonable presentation:
I’m afraid it won’t work here in text, but I’ll try: A⃣ .
Whether this works depends on the use of suitable fonts, as mentioned, but also on the rendering software. Programs have been rather bad at displaying combining marks, but there has been some improvement. I tested this in Word 2007, where it works OK, and also on web browsers (Chrome, Firefox, IE) with good results using code like this:
<style>
.cap { font-family: FreeSerif; }
.cap span { font-family: FreeSans; }
</style>
<span class="cap"><span>A</span>⃣</span>
It isn’t perfect, when using the fonts mentioned. The character in the cap is not quite centered. Moreover, if I try to use the technique e.g. for the character Å (which is present on normal Nordic keyboards), the ring above A extends out of the cap. You could tweak this by setting the font size of the letter in the cap to, say, 85% of the font size of the combining mark, but then the horizontal position of the letter is even more off.
To summarize, it is possible to do such things at the character level, but if you can use other methods, like using a border or a background image for a character, you can probably achieve better rendering.
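At the string level, the COMBINING ENCLOSING KEYCAP approach described above amounts to appending U+20E3 to each key letter. Here is a minimal Python sketch; the helper name is made up for this example, and the result only looks like a keycap if the font and renderer support the combining mark.

```python
import unicodedata

KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP


def keycap(key):
    """Return a single character followed by the enclosing keycap mark."""
    if len(key) != 1:
        raise ValueError("expected exactly one character")
    return key + KEYCAP


hint = f"Press {keycap('R')} to reset, or {keycap('S')} to open the settings menu."
print(hint)

# U+20E3 is an enclosing mark (general category "Me"), which is why it
# attaches to the character that precedes it.
print(unicodedata.category(KEYCAP))
```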

What is the unicode variation selector

I was wondering: what are the Unicode variation selectors U+FE00 to U+FE0F used for?
Example: ︀︁︂︂ 
The Unicode standard talks about this. Here's a bit of the relevant section from 3.2.0, annex 28 (I'm sure there are more recent versions around; this is the first I found):
Unicode characters can be represented by a wide variety of glyphs, as discussed in Chapter 2, General Structure in The Unicode Standard, Version 3.0. Occasionally the need arises in text processing to restrict or change the set of glyphs that are to be used to represent a character. Normally such changes are indicated by choice of font or style in rich-text documents. In special circumstances, such a variation from the normal range of appearance needs to be expressed side-by-side in the same document in plain-text contexts, where it is impossible or inconvenient to exchange formatted text. For example, in languages employing the Mongolian script, sometimes a specific variant range of glyphs is needed for a specific textual purpose for which the range of “generic” glyphs is considered inappropriate. The variation selectors are used when characters have essentially the same semantic.
Variation selectors provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character. They also provide a mechanism for specifying variants, such as for CJK Ideographs and Mongolian, that have essentially the same semantic but have substantially different ranges of glyphs. A variation sequence, which always consists of a base character followed by the variation selector, may be specified as part of the Unicode Standard. That sequence is referred to as a variant of the base character. The variation selector affects only the appearance of the base character, and only in the variation sequences defined in this Standard. The variation selector is not used as a general code extension mechanism.
(It goes on...)
You may also be interested in the Standardized Variants (this time from 6.0.0).
This is not a complete answer to the question, but it's pertinent to emoji and variation selectors:
The ❤ character (U+2764 code point) is a Unicode character from 1993.
But the ❤️ emoji is actually the ❤ (U+2764) character followed by Variation Selector-16 (U+FE0F).
Why?
Exclusively speaking about Emojis (documentation):
VS15 and VS16 are reserved to determine whether or not a character
should be displayed as an emoji. [...]
Emoji variation sequences contain VS16 (U+FE0F) for emoji-style (with color) or VS15 (U+FE0E) for text style (monochrome)
If a character is also intended to be usable as an emoji, Variation Selector-16 tells the renderer to display it as an emoji, while Variation Selector-15 tells the renderer to display it as plain text. If no variation selector is appended, the default presentation depends on the Unicode specification: for characters in the Emoticons block the default is emoji, and for other characters like ❤ the default is text.
Another example from Emoticons (Unicode_block)'s documentation:
Each emoticon has two variants:
U+FE0E (VARIATION SELECTOR-15) selects text presentation (e.g. 😊︎ 😐︎ ☹︎)
U+FE0F (VARIATION SELECTOR-16) selects emoji-style (e.g. 😊️ 😐️ ☹️).
If there is no variation selector appended, the default is the
emoji-style. Example:
U+1F610 (NEUTRAL FACE) 😐
U+1F610 (NEUTRAL FACE), U+FE0E (VARIATION SELECTOR-15) 😐︎
U+1F610 (NEUTRAL FACE), U+FE0F (VARIATION SELECTOR-16) 😐️
Note: VS15 and VS16 are not mandatory for a valid emoji. Many emoji are used without variation selectors.
Your guess is as good as mine.. but according to this source...
has got it...
Emoji Character Encoding Data Hints: (1) In iOS 5 / OSX 10.7, the underlying code that the Apple OS generates for this emoji was changed. (2) The code generated for this emoji was changed slightly in iOS 7 / OSX 10.9 (a variation selector was added) to make it easier for this emoji to be identified and shown in OSX and iOS. We don't mind Apple, thank you! We just love our emojis!
Their chart goes on to note that this "new", post-10.9 version
has a UTF-8 Character Count of 2 vs the previous 1... if that helps.
The Variation Selectors range was introduced with version 3.2 of the Unicode Standard, and is located in Plane 0, the Basic Multilingual Plane. Further selectors can be found in the Variation Selectors Supplement range.
Most Unicode characters can be represented by a wide variety of glyphs, and in rich text a particular glyph can be indicated by choosing a particular font or style. This mechanism is not available in plain text, and so variation selectors have been introduced as a way of indicating that the glyphs applicable to a particular character should be changed or restricted. The base character is followed by the variation selector, the combination being called a variation sequence. This is not intended to be general-purpose mechanism, and the only permitted variation sequences are those defined in the Standardized Variants file, which forms part of the Unicode Character Database.
From http://www.alanwood.net/unicode/variation_selectors.html
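As a small companion to the description above, this Python sketch maps a character to its variation selector number: VS1 to VS16 occupy U+FE00 to U+FE0F, and the Variation Selectors Supplement continues with VS17 to VS256 at U+E0100 to U+E01EF. The function name is invented for this example, and the sketch does not check whether a given sequence is actually listed in the Standardized Variants file.

```python
def variation_selector_number(ch):
    """Return n if ch is VARIATION SELECTOR-n, otherwise None."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:        # Variation Selectors block (BMP)
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:      # Variation Selectors Supplement
        return cp - 0xE0100 + 17
    return None


def describe_selectors(text):
    """List (base character, selector number) pairs found in text."""
    pairs = []
    prev = None
    for ch in text:
        n = variation_selector_number(ch)
        if n is not None and prev is not None:
            pairs.append((prev, n))
        prev = ch
    return pairs


print(describe_selectors("\u2764\uFE0F and \u2764\uFE0E"))
```

For the heart example discussed earlier, this reports one use of VS16 and one use of VS15 on the same base character.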