Emacs: Font setup for displaying Unicode characters in OS X

I'm trying to display special Unicode characters in Emacs, in particular the mathematical alphanumeric symbol 𝓮. Specifically, describe-char reports:
position: 283 of 317 (89%), column: 0
character: 𝓮 (displayed as 𝓮) (codepoint 120046, #o352356, #x1d4ee)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1D4EE
syntax: w which means: word
category: .:Base, L:Left-to-right (strong)
buffer code: #xF0 #x9D #x93 #xAE
file code: #xF0 #x9D #x93 #xAE
(encoded by coding system utf-8-unix)
display: no font available
Unicode data:
Name: MATHEMATICAL BOLD SCRIPT SMALL E
Category: Letter, Lowercase
Combining class: Ll
Bidi category: Ll
Decomposition: font e
Character code properties: customize what to show
name: MATHEMATICAL BOLD SCRIPT SMALL E
general-category: Ll (Letter, Lowercase)
decomposition: (font 101) (font 'e')
There are text properties here:
fontified t
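The code point and byte values reported above can be cross-checked outside Emacs; here is a small Python sketch, purely for illustration:

```python
# Verify the describe-char figures for MATHEMATICAL BOLD SCRIPT SMALL E.
ch = "\U0001D4EE"

print(ord(ch))                      # 120046 (== 0x1D4EE)
print(ch.encode("utf-8").hex(" "))  # f0 9d 93 ae, matching the buffer/file code
```

So the buffer contents are fine; the problem is purely one of font coverage at display time.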
I'm using GNU Emacs 24 (a recent nightly binary). The text above displays fine in my browser and in TextEdit; however, the special characters come up empty when viewed in Emacs.
I read this in an old Emacs 22 manual: "A fontset does not necessarily specify a font for all character codes. If a fontset specifies no font for a certain character, or if it specifies a font that does not exist on your system, then it cannot display that character. It will display an empty box instead." This is exactly the behavior I am observing.
It seems I may need to build a fontset to be able to display such arbitrary characters (starting with the .Xdefaults or .Xresources files).
How can I identify which font families I will need to include in the fontset to display math symbols (most online examples cover scripts like Latin, Chinese, etc.)? I couldn't even find any examples of .Xdefaults or .Xresources files.
Am I on the right track? Is there an easier/more obvious way to do this?

I have the same problem, and I don't have a general solution either. Here's my approach to fixing a single character (or potentially a range),
assuming that you have the character in a buffer and it's not displaying.
Some experimentation showed that Menlo is a useful source of characters, as is FreeSerif.
Put the cursor before the non-displayed character.
M-x describe-char. This gives you a lot of information about the character, including a line of the form "code point in charset: 0x2055".
Somewhere in your .emacs or related files, use this function. It can potentially fix a whole range of characters by snagging them from the FreeSerif family or something else, but I don't have good choices for anything but a few characters.
(defun bbextra-fix-fontset-font (from &optional to family)
  "Make characters FROM to TO come from FAMILY.
Default value of TO is FROM, and of FAMILY is FreeSerif (which
seems to have some of the characters)."
  (set-fontset-font t (cons from (or to from))
                    (font-spec :family (or family "FreeSerif"))))
;; Here are the characters I have fixed.
(bbextra-fix-fontset-font #x2042)
(bbextra-fix-fontset-font #x2023)
(bbextra-fix-fontset-font #x203D)
(bbextra-fix-fontset-font #x2055)
;; These come from Menlo
(bbextra-fix-fontset-font #x2620 #x2620 "Menlo") ; skull and crossbones
(bbextra-fix-fontset-font #x266C #x266C "Menlo") ; 16th notes
(bbextra-fix-fontset-font #x2695 #x2695 "Menlo") ; asclepion
(bbextra-fix-fontset-font #x2624 #x2624 "Menlo") ; caduceus

The function set-fontset-font may be used to specify which font to use for any range of characters; e.g.,
(set-fontset-font t '(#x1d4ee . #x1d4ee) (font-spec :family "FreeSerif"))

There was a known bug in Emacs on Mac OS X with displaying characters beyond the BMP. See, for example, my bug report on the Emacs bug tracker.
After reporting this bug, I received an e-mail suggesting the "Mac port" version of Emacs, which apparently does display non-BMP characters.
The bug was subsequently fixed, in Emacs 24.4 and later.

Related

Unit Separator "us"

I've seen the unit separator represented as different symbols (I've provided links to each one). What's the difference between each one? I'm working on a project and the only symbol that works is the "us" symbol.
Unit Separator Symbol #1:
Unit Separator Symbol #2:
Unit Separator Symbol #3:
The unit separator is one of the many ASCII control codes, dating back to very early computing. FS, GS, RS, and US can be used to split data into files, groups, records, and units (e.g. on a serial console).
Unicode (i.e. the modern world) still interprets these as control characters, so they have no real symbol of their own.
And then things get complex. Text processors, shaping engines, and fonts may handle control characters differently: either purely as controls, possibly ignoring them if they carry no semantics for that renderer, or by trying to display something. One common form is to use U+241F (SYMBOL FOR UNIT SEPARATOR) from the Unicode block Control Pictures (U+2400 – U+243F), which includes symbols for all the ASCII control codes. Note: fonts display these differently, some as boxed text with an abbreviation, some as small letters set diagonally.
Note that old fonts (with 256 symbols) reused the control-character slots for extra symbols; see e.g. the default DOS code page, https://en.wikipedia.org/wiki/Code_page_437, where you can see your symbol: the black triangle. ("Black" in a font name means filled, not just the sides/contour.) There were also special methods for printing these glyphs (instead of interpreting them as control characters), and different systems placed different symbols on the control codes.
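The mapping from an ASCII control code to its Control Pictures stand-in is just an offset of U+2400; a small Python sketch, for illustration:

```python
# Visible stand-ins for ASCII control codes live at U+2400 + code.
def control_picture(ch):
    code = ord(ch)
    if code == 0x7F:              # DEL has its own symbol
        return "\u2421"
    if code < 0x20:
        return chr(0x2400 + code)
    raise ValueError("not an ASCII control character")

US = "\x1f"                       # ASCII UNIT SEPARATOR
print(control_picture(US))        # U+241F SYMBOL FOR UNIT SEPARATOR
```

How that U+241F symbol actually looks (boxed abbreviation, diagonal letters, etc.) is then up to the font.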

Why do the printed unicode characters change?

The way the Unicode symbol is displayed depends on whether the White Heavy Check Mark or the Negative Squared Cross Mark comes directly before it. If one does, the Warning Sign is coloured. If I put a space between the symbols, I get the mono-coloured, text-like version.
Why does this behaviour exist and can I force the coloured symbol somehow?
I tried a couple of different REPLs, the behaviour was the same.
; No colour
(str (char 0x274e) " " (char 0x26A0))
; Coloured
(str (char 0x274e) "" (char 0x26A0))
Clojure unicode display.
I expect the symbol being displayed the same way regardless of which symbol comes before it.
Why does this behaviour exist
A vendor thought it would be a neat idea to render emoji glyphs in colour. The idea caught on.
https://en.wikipedia.org/wiki/Emoji#Emoji_versus_text_presentation
can I force the coloured symbol somehow
Append U+FE0E VARIATION SELECTOR-15 for the text (monochrome) presentation, or U+FE0F VARIATION SELECTOR-16 for the emoji (coloured) presentation.
http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/
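A minimal sketch of the two selectors in Python (the code points are from the Unicode standard; whether you actually see colour depends on your terminal and fonts):

```python
WARNING = "\u26a0"                 # WARNING SIGN
text_style = WARNING + "\ufe0e"    # VS-15: request monochrome text glyph
emoji_style = WARNING + "\ufe0f"   # VS-16: request coloured emoji glyph

print(text_style, emoji_style)     # rendering depends on the terminal
print(len(emoji_style))            # 2: the selector is a separate code point
```

The selector travels with the string like any other character, which is why inserting a space between two symbols (as in the Clojure example above) can change which presentation the renderer picks.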
Unicode is about characters (code points), not glyphs (think of a glyph as the "image" of a character).
Fonts are free to (and should) merge nearby characters into a single glyph. In printed Latin script this is not very common (though we have ligatures such as ff, fi, ffi), leaving aside combining code points, which by definition combine with other characters to produce a single glyph.
Many other scripts require it. This starts with cursive Latin, but most cursive scripts require such changes. Arabic, for example, has different glyphs for the initial, medial, final, and isolated forms of a character (plus special combinations, as is common in cursive scripts). Indic scripts behave similarly.
So this behaviour is built into the foundations of Unicode, and good modern fonts are expected to support it.
Emoji adopted the same machinery relatively recently, e.g. regional-indicator letters combining into country flags, among other common cases.
The Unicode documentation often describes these possibilities and the special code points that can change behaviour, but it is then the font's task to fulfil the expected behaviour (and to provide good glyphs).
So: a character (a Unicode code point) does not map one-to-one to a design (a glyph).
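The country-flag case mentioned above is a concrete example: two regional-indicator code points that a capable renderer shapes into a single flag glyph.

```python
# U+1F1FA (letter U) + U+1F1F8 (letter S): most renderers shape
# this pair into a single US-flag glyph.
flag = "\U0001f1fa\U0001f1f8"
print(flag)
print(len(flag))   # 2 code points, typically one visible glyph
```

A renderer without flag support falls back to showing the two indicator letters separately, which again illustrates that the glyph is the font's decision, not Unicode's.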

What character is this:?

EDIT
While I was writing the question, the character I am asking about displayed fine, but after posting it no longer shows up. As it does not appear here, please look it up on the original site.
EDIT2
I looked for Unicode characters associated with "alien" and found no matching ones. Here is how they compare side by side:
I found that some texts in my database contain a character like this. I am not sure how it is rendered with different fonts and environments, so here is an image of how I see it:
I tried to identify it in different ways. For example, when I paste it into Sublime Text, it automatically shows up as the control character <0x85>. When I tried to identify it with different Unicode detectors (http://www.babelstone.co.uk/Unicode/whatisit.html, https://unicode-table.com/en/, https://unicode-search.net/unicode-namesearch.pl), their conclusions were pretty much the same:
Unicode code point: U+0085
UTF-8 encoding: c2 85 hexadecimal
194 133 decimal
0302 0205 octal
Unicode character name: <control>
Unicode 1.0 character name (deprecated): NEXT LINE (NEL)
https://unicode-search.net/unicode-namesearch.pl
also included this information
HTML encoding: &#x85; hexadecimal
&#133; decimal
(my browser renders both references as "…"), which gave me a vague hint as to how "…" could have become this invisible character. But that is not the main problem here.
My question is: how is it possible that a control character shows up like this, and what is the actual glyph used to represent it?
I tried sketching it at http://shapecatcher.com/ to identify it, but without success. I did not find such a glyph in any Unicode table.
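One plausible round trip from an ellipsis to U+0085, sketched in Python: text containing "…" gets encoded as Windows-1252, and the resulting byte is later reinterpreted as a raw Unicode code point (which is what a Latin-1 decode does):

```python
ellipsis = "\u2026"                    # the "..." character
as_cp1252 = ellipsis.encode("cp1252")  # b'\x85'
misread = as_cp1252.decode("latin-1")  # '\x85' == U+0085 NEXT LINE (NEL)
print(as_cp1252, hex(ord(misread)))
print(misread.encode("utf-8"))         # b'\xc2\x85', the UTF-8 form seen above
```

This is only one possible corruption path, but it matches both the c2 85 bytes and the ellipsis hint in the question.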
The alien symbol is not a Unicode character, but it is in Microsoft's Webdings font, with character code 0x85. Running Start > Run > charmap, then selecting Webdings from the Font drop list, opens this window:
If I click that alien character in the leftmost column, the message Character Code : 0x85 is shown at the bottom of the window.
I can even copy that character from the Character Map and paste it into Microsoft Wordpad:
The Webdings symbols were included in Unicode 7.0: "pictographic symbols (including many emoji), geometric symbols, arrows, and ornaments originating from the Wingdings and Webdings sets." You would therefore expect the alien symbol to be in Unicode as well. However, I don't think the version of Webdings that was used as a source included that alien symbol: Windows 10 also ships a ttf file for Webdings (version 5.01), and it does not include the alien symbol either:
So presumably what originally caught your attention was some text being rendered with an older version of the Webdings font which included that alien symbol.
The glyph is 👽 U+1F47D EXTRATERRESTRIAL ALIEN. I don't know why your system misrenders a control character.

What is the difference between an encoding and a font

An encoding is a mapping that gives each character or symbol a unique value.
If a character is not present in the encoding, it won't display correctly no matter what font you use,
like Lucida Console, Arial, or Terminal.
But the problem is that the Terminal font shows line-drawing characters while the other fonts do not.
My question is why Terminal behaves differently from the other fonts.
Please note:
Windows 7
Locale: English
For the impatient, the relevant link is at the bottom of this answer.
Encoding is maping that gives characters or symbols a unique value.
No, that describes a character set, which maps certain characters to code points (to use the Unicode terminology). Let's ignore this for now.
If a character is not present in encoding no matter what font you use it won't display correct fonts Like Lucida console, arial or terminal
Font files map Unicode code points to glyphs. Not all code points may be mapped in a specific font; somebody has to create all those symbols. Again, let's ignore this.
Not all binary encodings map to code points within a certain character set; this is probably what you mean.
But problem is terminal font is showing line draw characters but other font is not showing line draw characters
Your terminal seems to operate on a different character set, probably the "OEM" or "IBM PC" character set (code page 437) instead of a Unicode-compliant character set or Windows-1252 / ISO 8859-1 / Latin-1.
If it is the latter, you are out of luck unless you can switch your output terminal to another character set, as Windows-1252 doesn't contain the box-drawing characters at all.
Solutions:
If possible, try to set the output to the OEM / IBM PC character set.
If your terminal is Unicode-capable, convert the output instead: read it in (decode it) using the OEM character set, then re-encode it as Unicode; the line-drawing bytes map onto the Unicode Box Drawing block.
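A sketch of that decode/re-encode step in Python, assuming the program's output really is in code page 437:

```python
# Bytes a CP437 program would emit for the top of a double-line box.
oem = bytes([0xC9, 0xCD, 0xBB])
text = oem.decode("cp437")                # decode using the OEM charset
print(text)
print([hex(ord(c)) for c in text])        # Unicode Box Drawing code points
print(text.encode("utf-8"))               # re-encode for a Unicode terminal
```

Decoding the same bytes as Windows-1252 would instead yield accented letters and guillemets, which is exactly the garbling described in the question.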

How do I select a specific font for display for a particular set of Unicode codepoints in Emacs?

I am using:
GNU Emacs 24.3.50.1 (i386-apple-darwin11.4.2, NS apple-appkit-1138.51) of 2013-05-27 on Celestra.local
I can get Chinese characters to display, but they use a "Song" font whose typeface is difficult to read (it's analogous to a serif font). How do I tell Emacs to select a specific face to display Chinese characters? All ASCII characters are displayed in Inconsolata.
I looked at EmacsWiki:FontSets and added this to my .emacs file:
(set-fontset-font "fontset-standard"
                  (cons (decode-char 'ucs #x4E00)
                        (decode-char 'ucs #x9FFF))
                  "-*-SimHei-*-*-*-*-14-*-*-*-*-*-iso10646-1")
but my Chinese characters are still displayed in the default fallback font.
Try "fontset-default" rather than "fontset-standard", as fontset-default will be the fallback for all fontsets that do not specify something more specific.
I would also use script names rather than a character range, but either will work:
(set-fontset-font "fontset-default" 'han "SimHei-14")
You may need to add other scripts such as 'cjk-misc for full coverage.