Swift: Unicode transformations: How to generate a rainbow infinity symbol

In Xcode, developing for iOS, "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}" is a rainbow flag.
"\u{1F3F3}" is a white flag, and "\u{1F308}" is a rainbow. The middle code points "\u{FE0F}\u{200D}" are invisible characters used to join these two together to make the rainbow flag symbol.
I am trying to combine Unicode characters to make a rainbow infinity symbol, but I'm not exactly sure how to implement this.
I'm not sure whether there is an existing Unicode character or Apple API I can use to do this, but I would appreciate learning how.
I wouldn't mind having an infinity symbol over the rainbow flag either (like the Apple anti-LGBT flag incident) as an alternative.
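For reference, here is a minimal Swift sketch (standard library only) that prints the scalars making up that flag string; the hex values it prints are exactly the four code points described above, and on recent Swift versions the whole ZWJ sequence counts as a single Character:

// Inspect the scalars that make up the rainbow flag emoji.
// Expected output: 1F3F3 FE0F 200D 1F308
let rainbowFlag = "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}"
for scalar in rainbowFlag.unicodeScalars {
    print(String(scalar.value, radix: 16, uppercase: true), terminator: " ")
}
print()
// Swift treats the ZWJ sequence as one grapheme cluster:
print(rainbowFlag.count)  // 1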

Emoji fonts are still just fonts. If they don’t contain a specific glyph, then they cannot display that glyph. The reason “🏳️‍🌈” looks like a rainbow flag is because someone drew a picture of a rainbow flag and then defined their font in such a way that the sequence <U+1F3F3, U+FE0F, U+200D, U+1F308> would be displayed using that specific image. Much like how someone first had to define the precise shape of the letter “A” in their font and then apply that glyph to the codepoint U+0041.
There is no image-rendering code that instinctively knows how to apply the colours of 🌈 to the shape of 🏳️ and then automatically generates a new glyph on the fly. It’s all explicitly pre-defined.
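As a rough illustration that a font either has a glyph or it doesn't, CoreText can be asked whether a particular font's character map covers given characters. This is only a sketch: it checks raw per-character coverage, not whether the shaping engine will merge a ZWJ sequence into one composed glyph, and "Apple Color Emoji" is simply the name of the emoji font on Apple platforms.

import CoreText
import CoreGraphics

// Does this font have glyphs for these UTF-16 code units?
let font = CTFontCreateWithName("Apple Color Emoji" as CFString, 24, nil)
let units = Array("\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}".utf16)
var glyphs = [CGGlyph](repeating: 0, count: units.count)
let covered = CTFontGetGlyphsForCharacters(font, units, &glyphs, units.count)
print(covered)  // true only if every code unit maps to some glyph in this font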
U+200D is the so-called Zero Width Joiner (ZWJ), so emoji sequences using that character are appropriately named Zero Width Joiner Sequences. They were originally invented by Apple to support emoji that weren’t part of the Unicode standard (in particular, variants of 💏, 💑, and 👪️ with different gender configurations), but later other vendors jumped on board as well and nowadays they are officially part of Unicode as an alternative way for defining new emoji without having to encode entirely new characters. Currently, about a third of all officially recommended emoji are ZWJ sequences.
In theory, anyone can make up their own ZWJ sequences just by joining existing characters together (which was their original purpose). In your case, “♾️+ZWJ+🌈”, i.e. <U+267E, U+FE0F, U+200D, U+1F308>, would be the obvious sequence for a rainbow-coloured infinity symbol (see the Swift sketch after the list below). You just have to create your own font containing the glyph you want, and then distribute that font to other people so that they can see the same glyph as you. There are just a few problems:
Making fonts with colourful glyphs is not easy. I couldn’t tell you whether there even exist freely available tools for that task.
There are four different formats for emoji fonts (used by Apple, Google, Microsoft, and Mozilla respectively) and they generally do not work on each other’s platforms, so you would need to create not just one, but several fonts unless you don’t care about people on other operating systems.
Installing your own fonts is not possible on most mobile phones, so your custom emoji would mostly only be available to desktop users.
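For completeness, this is what assembling the proposed sequence looks like in Swift. The string itself is perfectly valid; it just renders as ♾️ followed by 🌈 unless a font explicitly defines a combined glyph for it (a sketch, since the sequence is not a registered emoji):

// The hypothetical "rainbow infinity" ZWJ sequence: ♾️ + ZWJ + 🌈.
let rainbowInfinity = "\u{267E}\u{FE0F}\u{200D}\u{1F308}"
print(rainbowInfinity)
// ["267E", "FE0F", "200D", "1F308"]
print(rainbowInfinity.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) })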

Related

Why Julia returns "\uf8ff" when I use  (Apple logo) unicode?

I thought Julia supports raw unicode input, such as:
julia> test = "π£¢∞§"
"π£¢∞§"
julia> 😘 = 1 ;
julia> print(😘 )
1
However, it seems Julia does not support  (the Apple logo).
julia>  = 123
ERROR: syntax: invalid character ""
julia> test = ""
"\uf8ff"
I wonder what the underlying reason for that is, and whether there is a way I can use the  character in Julia?
I believe this link more properly explains the case of the Unicode character that you see as Apple's logo.
The problem is that the Unicode value used is one of several that are set aside for private use. That means that each operating system, or application, or implementation is free to use those Unicode characters for anything they want. It just so happens that Apple has chosen to use the Unicode character U+F8FF (decimal value 63743, written on the web as &#63743; or &#xF8FF;) as the Apple logo. But some Windows fonts put in a Windows logo. And some other fonts put in a Klingon mummification glyph. Or elven script. Or anything they want. And if it isn't defined in your local font, you'll just see a square.
My opinion is that Julia simply doesn't use this special value for anything. This also explains why your "π£¢∞§" characters work nicely: they are proper Unicode characters, much more widely supported across platforms.
As a side note, I too see a simple square instead of the Apple logo here.
Edit
Here is a list of Unicode characters supported by Julia.
To expand on Alex's answer...
Apple's logo () isn't an official Unicode symbol. I think there are very few commercial logos and symbols in the main Unicode tables.
However, Unicode provides some 'anything goes' areas (called PUAs - private use areas) that companies and individuals can fill with their own symbols, so that their users can access certain special glyphs. The main PUA is U+E000 to U+F8FF. Depending on which font you're using, you'll find all kinds of stuff assigned to these codes. On a Mac, I can usually get the Apple logo at "\uf8ff", with the right font selected, but not the Ubuntu symbol or the Windows logo, unless I choose another font. (There's also a fallback mechanism, whereby if you request a code point that the current font doesn't have, the OS will find a suitable substitute in another font and use that.)
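To make the PUA point concrete in the language of the original question: the range check is trivial, and CoreText exposes the fallback lookup described above (CTFontCreateForString returns a font able to draw the string). This is only a sketch; "Helvetica" is just an arbitrary base font, and which family comes back for U+F8FF depends entirely on the fonts installed on the machine.

import CoreText

// U+E000...U+F8FF is the Basic Multilingual Plane's Private Use Area.
let appleLogo: Unicode.Scalar = "\u{F8FF}"
print((0xE000...0xF8FF).contains(Int(appleLogo.value)))  // true

// Ask CoreText which installed font it would fall back to for this character.
let base = CTFontCreateWithName("Helvetica" as CFString, 12, nil)
let fallback = CTFontCreateForString(base, String(appleLogo) as CFString, CFRange(location: 0, length: 1))
print(CTFontCopyFamilyName(fallback))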
In Julia, you can only use certain Unicode characters for variable names. Julia wouldn't allow anything from the private use area anyway, unless some fonts were distributed to every computer and everyone agreed on who had which Unicode point. (Mathematica makes extensive use of PUA symbols in their notebooks, because they can and do install their own fonts, and can then access various glyphs from the PUA in the notebook with guaranteed results.)
You are allowed to use emoji characters as variable names, so you could try the Emoji apple (🍎), rather than the Apple apple.

Why isn't there a font that contains all Unicode glyphs?

Pretty much as the title says. Rendering all of Unicode correctly, what with composite characters, characters that affect other characters, and ligatures, is really hard; I understand that. We have fonts that seem to be designed for maximum Unicode symbol support (Symbola, Code2001, others) and specialized fonts for certain planes or character ranges (BabelStone Han, others).
I don't know much about the underlying technical details of fonts. Is there a maximum size? Is it a copyright problem? Is essentially redrawing all ~110,000 extant glyphs too hard? I understand style concerns, but why not fall back to a 'default' font that has glyphs for everything? They're on unicode.org; redrawing them all would be pretty hard work, but then you'd have a guaranteed fallback font for everything. If you got the rights to some pre-existing fonts you could just composite them, and that should help a lot. Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
What is that reason?
"Why would you even want that?" questions aside, from a programming perspective there's a very simple reason: the OpenType spec only affords an addressable glyph index space of one USHORT, so one font can only support 16 bits worth of glyphs identifiers, or 65,536 glyphs max. (And note the terminology: a "glyph" is not the same as a "character" or "letter")
The current version of Unicode, v8 as of this answer, contains 120,737 assigned code points, or almost twice as many as fit in a modern font (2021 edit: v13 upped this number to 143,859). In fact, Unicode hasn't been able to fit in a modern OpenType font since 2001, with the release of Unicode 3.1, which upped the number of code points from 49,259 to 94,205.
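The arithmetic behind that claim is simple enough to check (a throwaway Swift sketch; the counts are the ones quoted above, and one character can require more than one glyph, so this is a lower bound):

// One OpenType font addresses glyphs with a 16-bit index.
let maxGlyphsPerFont = Int(UInt16.max) + 1   // 65,536
let assignedCodePoints = 143_859             // Unicode 13.0, per the figure above
print(assignedCodePoints > maxGlyphsPerFont)                  // true
print(Double(assignedCodePoints) / Double(maxGlyphsPerFont))  // ≈ 2.2 fonts' worth, at minimum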
"So what about font collections?" I hear you ask. Why not use multiple fonts and support all unicode that way? Well now, you've just described Adobe's Sans Pro, and Google's Noto (which are the same font).
As for the "how hard can it be": a uniform style for all glyphs in Unicode, across 129 established written scripts on this planet, each with their own typesetting rules? Incredibly hard. You may think fonts are just files with pictures for letters, so that when someone types a letter, that picture shows up: that is not how fonts work, and it isn't how fonts have worked since the late 1980s.
Modern fonts are the typographic equivalent of a game ROM: sure, it's not much use without the hardware or software to run that ROM on, but all the things that actually matter are in the ROM. Similarly, modern fonts contain all the information for typesetting. Not just pictures: they contain the metadata, the metrics, the positioning and substitution rules for arbitrary sequences, with separate rule sets for each written script that OpenType supports, mandatory and optional ligatures, language-specific character replacements for letters at the start/middle/final position in a word, or in isolation, character repositioning relative to arbitrarily complex sequences of other characters either before or after it, arbitrarily complex sequence replacements with other arbitrarily complex sequences, possible bitmap fallbacks for small-point rendering, hinting instructions on how to properly rasterize vector graphics that are inherently not aligned to any particular pixel grid, and more. A modern font is a ridiculously complex application that a font engine consults to figure out how to typeset sequences of code points.
Making a (set of) Unicode-encompassing font(s) that looks good for all contexts is a vast team effort.
So: "Why isn't there a font that contains all Unicode glyphs?", because that's been technically impossible since 2001. We can, and do, make font families that cover all of Unicode, but with 129 different scripts all with their own typesetting rules, it's a lot of work, and almost (almost) not worth the effort compared to only covering a subset of all languages.
And as for this:
Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
Just because you didn't know about them doesn't mean they don't exist; millions of people are familiar with them. They exist =)
They're even open source, go out and thank the people who made them!
There is GNU Unifont. It aims to contain all Unicode, except Apple Emoji.
You will probably find what you are looking for at the following links.
Unicode Character Table
HTML Character Entity References
Huge List of Unicode Symbols
List of Unicode Characters of Category “Other Symbol”
This last one is handy for finding a particular character, since you can draw the shape you are searching for:
Unicode Character Recognition
Can't enter unicode character with Alt+ even with EnableHexNumpad
Basic Questions
Q: How many characters are in Unicode?
A: The short answer is that as of Version 13.0, the Unicode Standard contains 143,859 characters. The long answer is rather more complicated, because of all the different kinds of characters that people might be interested in counting.
Unicode font
A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet.
Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters (143,859 characters, with Unicode 13.0).
...
No single "Unicode font" includes all the characters defined in the present revision of ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode).
As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
Enjoy!

How UTF8/Unicode adapt to new writing systems?

An example to clarify my question:
The Hongkongers' native language is Cantonese; however, we all write in a different language: Mandarin Chinese. The two languages are kind of similar, and Hongkongers are educated to write in Mandarin Chinese.
Cantonese doesn't have a writing system. We are still happy with Mandarin as our written language, but suppose that one day Hongkongers decided to develop a 'Cantonese script' containing characters that don't yet exist: how should UTF-8/Unicode/fonts change to accommodate these new characters?
I mean, who would change the UTF-8/Unicode/font standards? How exactly would Linux/Windows have to be modified in order to display these newly created characters?
(The example is just to make my question clear. We're not talking about politics ;D )
The Unicode coding space has over 1,000,000 code points, and only about 10% of them have been allocated, so there is a lot of room for new characters (even though some areas of the coding space have been set apart for use other than added characters). The Unicode Consortium, working in close cooperation with the relevant body at ISO, assigns code points to new characters on the basis of proposals that demonstrate actual usage or, in some cases, plans with a solid basis and widespread support.
Thus, if a new script were designed and there was a large community that would seriously use it, it would be added, with its characters, into Unicode after due proposals and discussion.
It would then be up to font manufacturers to add glyphs for such characters. This might take a long time, but if there is strong enough need, new fonts and enhancements to existing fonts would emerge.
No change to UTF-8 or other Unicode transfer encodings would be needed. They already encode the entire coding space, whether code points are assigned to characters or not.
Rendering software would need no modifications, unless there are some specialties in the writing system. Normal characters would be rendered just fine, as soon as suitable fonts are available.
However, if the characters added were outside the Basic Multilingual Plane (BMP), the “16-bit subset of Unicode”, both rendering and processing (and input) would be problematic. Many programming languages and programs effectively treat Unicode as if it were a 16-bit code and run into problems (possibly solvable, but still) when characters outside the BMP are used. If the writing system had, say, 10,000 characters, it is quite possible that it would have to be allocated outside the BMP.
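The BMP issue is easy to demonstrate in Swift, the language of the original question: a single scalar outside the BMP occupies two UTF-16 code units (a surrogate pair), which is exactly what trips up software that assumes one 16-bit unit per character (a sketch, using 😀 U+1F600 as an arbitrary non-BMP character):

let bmp: Unicode.Scalar = "∞"       // U+221E, inside the BMP
let nonBMP: Unicode.Scalar = "😀"   // U+1F600, outside the BMP
print(String(bmp).utf16.count)      // 1 code unit
print(String(nonBMP).utf16.count)   // 2 code units: a surrogate pair
print(nonBMP.value > 0xFFFF)        // true: it does not fit in 16 bits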
The Unicode committee adds new characters as they see fit. Then fonts add support for the new characters. Operating systems should not require changes simply to display the new characters. Typing the characters would generally require updates or plug-ins to an operating system's input methods.

Which is the difference between the tick symbol U+2713 and U+2714

Apart from the visual aspect... do those tick symbols have any different semantics?
I mean: one is thin and the other bold. But... is there any special meaning for one or the other? Or is it just a matter of using one graphical style or another?
Unlike the majority of characters in Unicode, the Dingbats range U+27xx has no particular semantic content. The 'heavy' check mark has no meaning beyond 'a check mark that is visually bolder than the other one'; contrast this with the mathematical 'bold' letters, which do carry a distinct mathematical meaning.
This range of characters comes from the symbol font Zapf Dingbats. Symbol fonts are visual in nature and don't fit well in Unicode, but Zapf Dingbats has historical significance as one of the PostScript core fonts guaranteed to be available on PS printers. Subsequently, characters from Zapf Dingbats have commonly been used in document interchange, making it worthwhile to standardise them.
Apart from the visual aspect.
There's no 'apart' here; the visual aspect is King in Unicode. U+2713 is a check mark. U+2714 is a heavy check mark. It should appear as a bolder version of U+2713 if you have a decent font.
These codepoints are in a group named Dingbats, a group of typographical symbols. Chess pieces, arrows, asterisks, that sort of thing. There's no semantic meaning attached to them. It is just heavier.
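If you want to see how the standard itself labels the two, the character names are exposed programmatically; a small Swift sketch (the names come from the Unicode Character Database, and the only documented difference is the visual weight):

let thin: Unicode.Scalar = "\u{2713}"
let heavy: Unicode.Scalar = "\u{2714}"
print(thin.properties.name ?? "?")   // CHECK MARK
print(heavy.properties.name ?? "?")  // HEAVY CHECK MARK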

Where to get a reference image for any unicode code point?

I am looking for an online service (or collection of images) that can return an image for any unicode code point.
Unicode.org does not have an image for each one; consider, for example:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=31cf
EDIT: I need to use these images programmatically, so the code chart PDFs provided at unicode.org are not useful.
The images in the PDF are copyrighted, so there are legal issues around extracting them. (I am not a lawyer.) I suspect that those legal issues prevent a simple solution from being provided, unless someone wants to go to the trouble of drawing all of those images. It might happen, but seems unlikely.
Your best bet is to download a selection of fonts that collectively cover the entire range of characters, and display the characters using those fonts. There are two difficulties with this approach: combining characters and invisible characters.
The combining characters can easily be detected from the Unicode database, and you can supply a base character (such as NBSP) to use for displaying them. (There is a special code point intended for this purpose, but I can't find it at the moment.)
Invisible characters could be displayed with a dotted square box containing the abbreviation for the character. Those you may have to locate manually and construct the necessary abbreviations. I am not aware of any shortcuts for that.
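A sketch of how such a pipeline might classify code points, using the Swift standard library's Unicode property data: combining marks get a visible base prepended (U+25CC DOTTED CIRCLE is a common choice; the NBSP suggested above also works), and characters that typically produce no visible glyph are flagged so the caller can draw the boxed abbreviation. The displayForm helper and the category-based "likely invisible" test are assumptions for illustration, not a complete rule.

// Classify a scalar so it can be turned into a reference image.
func displayForm(of scalar: Unicode.Scalar) -> String {
    let props = scalar.properties
    switch props.generalCategory {
    case .nonspacingMark, .spacingMark, .enclosingMark:
        // Combining character: show it on a dotted-circle base (U+25CC).
        return "\u{25CC}" + String(scalar)
    case .format, .control, .spaceSeparator:
        // Likely invisible: return a label to be drawn in a dotted box.
        let label = props.name ?? props.nameAlias ?? "U+" + String(scalar.value, radix: 16, uppercase: true)
        return "[" + label + "]"
    default:
        return String(scalar)
    }
}

print(displayForm(of: "\u{0301}"))  // combining acute accent on a dotted circle
print(displayForm(of: "\u{200D}"))  // [ZERO WIDTH JOINER]
print(displayForm(of: "A"))         // A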