Font for "math bold script" unicode charset - unicode

I wouldn't believe I have been stuck on this for one hour, but it seems the fonts for extended unicode characters are not easyly available as TTF / OTF for use on computers, especially with graphic software where unicode fallback doesn't work
especifically I looking for the so called Math bold script
somehting like : 𝓓𝓮𝓶𝓸 𝓯𝓸𝓷𝓽 𝓐𝓑𝓒𝓖𝓟 𝓮𝓻𝓽𝓷𝓭 (<- those are extended chars)
as in https://textfancy.com/font-converter/
as imagen at: https://snipboard.io/fNYd7w.jpg
(becouse I am not sure we all see the same glyphs)
Note: what I am looking for, is a standrd TTF font, which normal glyphs are equal to those extended glyphs, meaning that the A looks like the 𝓐, B like 𝓑, and so on. So I could use the font as normal font in every software.

The STIX math fonts support the Unicode Mathematical Alphanumeric Symbols block.
https://www.stixfonts.org/
https://github.com/stipub/stixfonts
(Note: the variable fonts don't include support for that block of characters; only the static fonts do.)
Please note the intended use of those Unicode characters, as pointed out in the STIX project:
The sans serif, fraktur, script, etc., alphabets in Plane 1 (U+1D400-U+1D4FF) are intended to be used only as technical symbols.

Related

Where are the unicode characters on the disk and what's the mapping process?

There are several unicode relevant questions has been confusing me for some time.
For these reasons as follow I think the unicode characters are existed on disk.
Execute echo "\u6211" in terminal, it will print the glyph corresponding to the unicode code point U+6211.
There's a concept of UCD (unicode character database), and We can download it's latest version. UCD latest
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
So if the unicode characters does existed on the disk , then :
Where is it ?
How can I upgrade it ?
What's the process of mapping the unicode code point to a glyph ?
If I use a specific font, then what's the process of mapping the unicode code point to a glyph ?
If not, then what's the process of mapping the unicode code point to a glyph ?
It will very appreciated if someone could shed light on these problems.
Execute echo "\u6211" in terminal, it will print the glyph corresponding to the unicode code point U+6211.
That's echo -e in bash.
› echo "\u6211"
\u6211
› echo -e "\u6211"
我
Where is it ?
In the font file.
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
How can I upgrade it ?
Installing/upgrading a suitable font with the emojis should be enough. I don't have macOS, so I cannot verify this.
I use "Noto Color Emoji" version 2.011/20180424, it works fine.
What's the process of mapping the unicode code point to a glyph ?
The application (e.g. text editor) provides the font rendering subsystem (Quartz? on macOS) with Unicode text and a font name. The font renderer analyses the codepoints of the text and decides whether this is simple text (e.g. Latin, Chinese, stand-alone emojis) or complex text (e.g. Latin with many marks, Thai, Arabic, emojis with zero-width joiners). The renderer finds the corresponding outlines in the font file. If the file does not have the required glyph, the renderer may use a similar font, or use a configured fallback font for a poor substitute (white box, black question mark etc.). Then the outlines undergo shaping to compose a complex glyph and line-breaking. Finally, the font renderer hands off the result to the display system.
Apart from the shaping, very little of this has to do with Unicode or encoding. Font rendering already used to work that way before Unicode existed, of course font files and rendering was much simpler 30 years ago. Encoding only matters when someone wants to load or save text from an application.
Summary: investigate
Truetype/Opentype font editing software so you can see what's contained in the files
font renderers, on Linux look at the libraries pango and freetype.
Generally speaking, operating system components that use text use the Unicode character set. In particular, font files use the Unicode character set. But, not all font files support all the Unicode codepoints.
When a codepoint is not supported by one font, the system might fallback to another that does. This is particularly true of web browsers. But ultimately if the codepoint is not supported, an unfilled rectangle is rendered. (There is no character for that because it's not a character. In fact, if you were able to copy and paste it as text, it should be the original character that couldn't be rendered.)
In web development, the web page can either supply or give the location of fonts that should work for the codepoints it uses.
Other programs typically use the operating system's rendering facilities and therefore the fonts available through it. How to install a font in an operating system is not a programming question (unless you are including a font in an installer for your program). For more information on that, you could see if the question fits with the Ask Different (Apple) Stack Exchange site.

Why isn't there a font that contains all Unicode glyphs?

Pretty much as the title says. Rendering all of the unicode format correctly what with composite characters and characters that affect other characters and ligatures is really hard, I understand that. We have fonts that seem to be designed for maximum Unicode symbol support(Symbola, Code2001, others) and specialized fonts for certain planes or character ranges(BabelStone Han, others).
I don't know much about the underlying technical details for fonts. Is there a maximum size? Is it a copyright problem? Is essentially redrawing all ~110,000 extant glyphs too hard? I understand style concerns, but why not fall back to a 'default' font that had glyphs for everything? They're on unicode.org, redrawing them all would be pretty hard work but then you'd have a guaranteed fallback font for everything. If you got rights to some pre-existing fonts you could just composite them and that should help a lot. Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
What is that reason?
"Why would you even want that?" questions aside, from a programming perspective there's a very simple reason: the OpenType spec only affords an addressable glyph index space of one USHORT, so one font can only support 16 bits worth of glyphs identifiers, or 65,536 glyphs max. (And note the terminology: a "glyph" is not the same as a "character" or "letter")
The current version of Unicode, v8 as of this answer, contains 120,737 assigned code points, or almost twice as many as fit in a modern font (2021 edit: v13 upped this number to 143,859). In fact, Unicode hasn't been able to fit in a modern OpenType font since 2001, with the release of Unicode 3.1, which upped the number of code points from 49,259 to 94,205.
"So what about font collections?" I hear you ask. Why not use multiple fonts and support all unicode that way? Well now, you've just described Adobe's Sans Pro, and Google's Noto (which are the same font).
As for the "how hard can it be": a uniform style for all glyphs in Unicode, across 129 established written scripts on this planet, each with their own typesetting rules? Incredibly hard. You may think fonts are just files with pictures for letters, and someone types a letter, that picture shows up: that is not how fonts work, and isn't how fonts have worked since the late 1980's.
Modern fonts are the typographic equivalent of a game ROM: sure, it's not much use without the hardware or software to run that ROM on, but all the things that actually matter are in the ROM. Similarly, modern fonts contain all the information for typesetting. Not just pictures, they contain the metadata, the metrics, the positioning and substitutions rules for arbitrary sequences, with separate rule sets for each written script that OpenType supports, mandatory and optional ligatures, language-specific character replacements for letters at the start/middle/final position in a word, or in isolation, character repositioning relative to arbitarily complex sequences of other characters either before or after it, arbitrarily complex sequence replacements with other arbitrarily complex sequences, possible bitmap fallbacks for small-point rendering, hinting instructions on how to properly rasterize vector graphics that are inherently not aligned to any particular pixel grid, and more. A modern font is a ridiculously complex application, that a font engine consults to figure out how to typeset sequences of code points.
Making a (set of) Unicode-encompassing font(s) that looks good for all contexts is a vast team effort.
So: "Why isn't there a font that contains all Unicode glyphs?", because that's been technically impossible since 2001. We can, and do, make font families that cover all of Unicode, but with 129 different scripts all with their own typesetting rules, it's a lot of work, and almost (almost) not worth the effort compared to only covering a subset of all languages.
And as for this:
Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
Just because you didn't know about them, doesn't mean they don't exist, with millions of people who are familiar with them. They exist =)
They're even open source, go out and thank the people who made them!
There is GNU Unifont. It aims to contain all Unicode, except Apple Emoji.
You will probably find what you are looking for at the following links.
Unicode Character Table
HTML Character Entity References
Huge List of Unicode Symbols
List of Unicode Characters of Category “Other Symbol
This other is funny for particular character since you can draw what you search:
Unicode Character Recognition
Can't enter unicode character with Alt+ even with EnableHexNumpad
Basic Questions
Q: How many characters are in Unicode?
A: The short answer is that as of Version 13.0, the Unicode Standard contains 143,859 characters. The long answer is rather more complicated, because of all the different kinds of characters that people might be interested in counting.
Unicode font
A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet.
Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters (143,859 characters, with Unicode 13.0).
...
No single "Unicode font" includes all the characters defined in the present revision of ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode).
As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
Enjoy!

What range of unicode characters should be kept in a #font-face web font for a US based website with a US audience?

As part of optimizing a web development project, we need to strip out unnecessary characters that are never going to be used to reduce the size of font files. I have searched Google and found nothing canonical on the subject of which characters are required and which are safe to remove.
I've found the following ranges that may be of interest:
0020 — 007F Basic Latin
00A0 — 00FF Latin-1 Supplement
0100 — 017F Latin Extended-A
0180 — 024F Latin Extended-B
0250 — 02AF IPA Extensions
02B0 — 02FF Spacing Modifier Letters
0300 — 036F Combining Diacritical Marks
27C0 — 27EF Miscellaneous Mathematical Symbols-A
It seems that the most aggressive approach would be to only keep "Basic Latin", 0020 — 007F, which provides upper and lower-case letters, numbers and a few basic symbols, like the $, +, (, ), etc.
Latin-1 Supplement contains some extra goodies like Trademark and Copyright symbols and fractions.
Latin Extended-A and -B contain letters with accent marks, and since our copy is in English, I'm not sure if these will ever be needed.
If we use only that ranges (0020 — 007F) and (00A0 — 00FF), will we run into problems down the line with missing characters, should some user decide to post a comment in Spanish (for example)? Or will the browser fall back to a default font for characters that aren't included the web font?
The point of a web-font is to make the main bodies of text and headlines look pretty, which the basic latin set should cover, but I don't know if there are hidden "gotchas" with stripping down to just the "Basic Latin" range, like accented characters showing as diamond question marks instead of falling back to a system font, etc.
What range of unicode characters should be kept in a #font-face web
font for a US based website with a US audience? Are there any best practices or guidelines for striping unnecessary characters from a font for web use?
I would recommend subsetting to one of the common "code page" definitions that support US/Western Europe. Most code page definitions pre-date Unicode and typically have the bits and pieces needed for various regional support without including entire Unicode blocks. Suggestions:
Windows Code Page 1252
ISO/IEC 8859-1 "Latin 1"*
ISO/IEC 8859-15
*This is the same as Unicode Ranges 0020-007F Basic Latin + 00A0-00FF Latin-1 Supplement
These include much more than is strictly required for US English, though as noted above, several accented characters commonly appear in English text (é, ñ, as well as other punctuation marks and symbols). These sets include those characters, so you should be in good shape for the vast majority of text for a U.S. audience. Note also that in most fonts, these characters are typically "composites", which means that they use a reference to the components (e.g. 'é' is built from references to 'e' and '´'); as such, they don't normally require as much size to store them, so retaining them usually won't incur a major size penalty.
If you might encounter European financial text, I'd suggest either Windows 1252 or ISO/IEC 8859-15 which include the Euro currency symbol.
I don't know if there are hidden "gotchas" with stripping down to just the "Basic Latin" range, like accented characters showing as diamond question marks instead of falling back to a system font
Any characters that don't exist in the font you are using will fall back to any default font the browser can find with the characters in. This will likely be ugly when interleaved with other characters from your custom font, but modern OSes provide decent font coverage for commonly-used characters from the above blocks so typically it will still be readable.
So you should include characters based on whether you think they'll be used commonly enough that having them rendered in an ugly font is a deal-breaker. For what it's worth, a pretty minimal set I have used before for a similar purpose is ¡£°±²³¿ÉËÑéëñ‘’“”–—•€™, but your site's exactly requirements may vary. (For example, if you coöpted the New-Yorker-style diaeresis you would certainly want äëïöü.)
(How exactly default fallback fonts work varies between browsers and was famously troublesome in older versions of IE, and IE Mobile. But the basic accented Latin letters are pretty safe.)

Unicode fonts for Japanese

I am creating a game. I have some UI with text. Recently we wanted to add Japanese language version but I have problem with fonts. I use stb_freetype to rasterize fonts and I support Unicode so it should not be a problem. But most fonts doesn't seem to contain Janapese characters, on Windows I've found that Arial Unicode does. But its size is 26 MB, that's much more than our complete game!
I've seen Unicode and fonts but it doesn't cover my questions completly.
So basically I'm asking about 2 things:
Does Janapese fonts have different typefaces? I mean, Western fonts have serif, sans-serif or more exotic versions. Does this apply also to Asian fonts?
I probably would use system font rather than providing such a big file myself. I know how to locate Arial Unicode on Windows, but our game have also versions for Mac OSX, Linux and iOS. Where can I find Unicode fonts (and which ones should I use) on those platforms? Especially I'd be intrested about Linux, because this is least familiar platform for us.
most fonts doesn't seem to contain Janapese characters, on Windows I've found that Arial Unicode does. But its size is 26 MB, that's much more than our complete game!
Arial Unicode contains a lot more than just Japanese. It's also in general not a very good font: it is made to cover a lot of Unicode code points, but it is missing many features needed to actually render some languages properly. Not to mention it is not freely redistributable.
I suggest looking at the free Japanese fonts used by Linux distributions. For example VLGothic is 3.7MB and compresses down to just 2.2MB, which would be much more palatable. See also: Takao, Motoya, Togoshi.
Does Janapese fonts have different typefaces? I mean, Western fonts have serif, sans-serif or more exotic versions. Does this apply also to Asian fonts?
Certainly. Japanese (and other Han-derived fonts in general) vary widely, just as Latin does. Generally fonts might be categorised as:
Gothic: typically unstressed, without line-endings, with little sign of the original brushed nature of the characters. Most similar to Latin ‘sans-serif’ fonts—indeed, the name ‘Gothic‘ is taken from exactly that tradition.
Often used as default screen fonts as they render well in reduced detail. As well as square-ended Gothic Kaku, there's Gothic Maru which uses rounded features, matching well with Latin rounded sans.
Minchō: has serif-like endings stylised from the brush strokes, and strong vertical stress. Often formal in appearance. Most similar to Latin ‘serif’ fonts, typically paired with a transitional serif design. Often the default Japanese font for word processing, paired with Times New Roman.
Kyōkasho (‘textbook’): formal handwritten style, clear and readable, but less straight-edged than Mincho. Most similar to a legible Latin pen-written script font; might also usefully be paired with a more characterful serif.
Kaisho: traditional brushed style, but still regular and legible, somewhat formal. Not usually so good at low screen resolutions. Might be paired with a semi-serif or brushed script Latin face.
Gyōsho: cursive brushed style, less clear, typically for display purposes. Also Sōsho takes this further, to generally-illegible lengths.
Display fonts. There are some everyday handwritten-like styles, but typically fewer really wacky novelty fonts. Presumably because the amount of work involved in creating a font to cover the huge number of common Kanji makes it not worth it. You may also find novelty fonts that contain only the kana and Latin (rōmaji) characters, with few or no kanji.
The file size is not the only reason to avoid redistributing Microsoft technology to Linux or Mac!
1.Does Janapese fonts have different typefaces? I mean, Western fonts have serif, sans-serif or more exotic versions. Does this apply also to Asian fonts?
CJK (Chinese, Japanese, Korean) fonts do have different typefaces. Some fonts are more caligraphic and others are more plain. It is roughly analogous to the difference between san-serif and serif. (There is no notion of italics or bold in CJK fonts. I understand that there are conventions for expressing emphasis, but I don't recall what they are.)
You can see these font differences on Windows by comparing Arial Unicode MS with MingLiU or MS Mincho. The Arial version is very "plain" in both Latin characters and CJK idiographs.
As with Latin characters, I believe the distinction in typefaces is purely for visual appeal, and does not strongly imply any difference in meaning.
I can not help you with your second question.
Yeah you need to license a font for distirbution. You can't simply use the one out of your computer...For file size you should just pull the specific code pages. typically arial unicode is $3500 to embedd in a game...however thats the price for all 50,000 characters....
Source: Monotype Imagaing
Yes. As a simple example, just look at the differences between MS PGothic and MS PMincho, which should be available if you have Microsoft Word installed.
Unfortunately I have no experience with Unicode fonts on OSX or Linux, so I can't help you there.

how to generate Chinese Characters using Postscript?

Does anyone knows how to generate Chinese characters using Postscript or related tools? I'd like to use unicode to represent Chinese characters but it seems that Postscript doesn't support unicode, yet. In addition, I'd like to specify several fonts to generate the same character.
Thus, I have two questions:
1. how to use unicode in Postscript? Or how to enumerate Chinese Character set in the postscript way?
2. How to specify the fonts configurations using Postscript?
At last, in case postscript cannot do this job, what tools should I turn to for my purpose?
Thank you very much!
-Jin
In Adobe's official PostScript language specification there is no specific support for Unicode fonts. (And this is the final version of the spec for PS Level 3, valid since its publication in 1999 -- PostScript as a language is no longer developed...)
However, PostScript supports (since Level 2) multi-byte fonts (2-, 3- and 4-bytes) in a generic way (see 'CID'). All PostScript fonts need an "encoding": an encoding basically is a table telling at which index position of a font which glyph description for a given character can be found. So while there are no Unicode fonts as such, there are multi-byte CID fonts which provide ranged subsets of Unicode.
Also, there are no freely re-distributable CMaps. (A CMap .) If you need a CMap, you have to derive it from the Windows codepage and the matching Adobe CMap.
If you just look for a "super-simple" method to use Unicode text strings with no need of checking for ranges, language etc.: sorry to disappoint you. There is no way. That would be a pipe dream.
Have a look at CID-keyed fonts instead. These are designed to include a large number of glyphs. (Page 364ff in PLRM)
Update: Linked to the correct page with CID font description.