fontspec (xetex/luatex) and fontconfig - xelatex

LuaTeX and XeTeX pass on a warning from the fontspec package, and I'm not sure whether I should worry. The following minimal example generates the warning:
\documentclass{report}
\usepackage{fontspec}
\setmainfont[Script=Cyrillic]{Arial}
\begin{document}
\end{document}
The warning is
fontspec warning: "script-not-exist"
Font 'Arial' does not contain script 'Cyrillic'.
Replacing 'Arial' with, for example, 'Charis SIL', results in no warning.
I believe I've traced the problem back to fc-list/fontconfig. In the output of fc-list for the Arial font, I get the following 'capability' line:
capability: "otlayout:arab otlayout:hebr otlayout:latn"(s)
This line does not mention Cyrillic (which would be the code 'cyrl', according to the \newfontscript{Cyrillic}{cyrl} line in fontspec-xetex.sty). Fonts for which I do not get this warning, such as Charis SIL, explicitly mention Cyrillic in their capabilities lines:
capability: "ttable:Silf otlayout:DFLT otlayout:cyrl otlayout:latn"(s)
Unfortunately, documentation of this "capability" line in the output of fc-list is limited to the line
capability String List of layout capabilities in the font
(that from https://www.freedesktop.org/software/fontconfig/fontconfig-user.html and a couple other places on the web)
My question is basically, should I worry about this warning message? Arial (and Charis) list 'ru', the ISO 639-1 code for Russian, in the 'lang' field. Just what is this "layout" capability for the Cyrillic script that Charis SIL supports, but Arial apparently does not, and why does it matter?
BTW, this is with the TeX Live 2017 distro.

Related

Font for "math bold script" unicode charset

I can't believe I've been stuck on this for an hour, but it seems the fonts for extended Unicode characters are not easily available as TTF/OTF for use on computers, especially with graphics software where Unicode fallback doesn't work.
Specifically, I'm looking for the so-called "Math bold script",
something like: 𝓓𝓮𝓶𝓸 𝓯𝓸𝓷𝓽 𝓐𝓑𝓒𝓖𝓟 𝓮𝓻𝓽𝓷𝓭 (<- those are extended chars)
as on https://textfancy.com/font-converter/
and as pictured at https://snipboard.io/fNYd7w.jpg
(because I am not sure we all see the same glyphs).
Note: what I am looking for is a standard TTF font whose normal glyphs are equal to those extended glyphs, meaning that the A looks like the 𝓐, B like 𝓑, and so on, so I could use the font as a normal font in every piece of software.
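As an aside, the "fancy" strings above are not styled ASCII but separate codepoints in the Mathematical Alphanumeric Symbols block. A small sketch of the mapping (the helper name `to_bold_script` is made up for illustration; bold-script capitals start at U+1D4D0, lowercase at U+1D4EA):

```python
# Sketch: map ASCII letters into the Mathematical Bold Script range.
# This only changes the codepoints; it does not change the font used
# to render them, which is what the question is actually after.

def to_bold_script(text: str) -> str:
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D4D0 + ord(ch) - ord("A")))  # bold script capitals
        elif "a" <= ch <= "z":
            out.append(chr(0x1D4EA + ord(ch) - ord("a")))  # bold script lowercase
        else:
            out.append(ch)  # leave non-letters untouched
    return "".join(out)

print(to_bold_script("Demo font ABCGP"))  # → 𝓓𝓮𝓶𝓸 𝓯𝓸𝓷𝓽 𝓐𝓑𝓒𝓖𝓟
```

This is why an ordinary TTF font whose A *looks like* 𝓐 is a different thing entirely: the codepoints stay plain ASCII and the styling comes from the font.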
The STIX math fonts support the Unicode Mathematical Alphanumeric Symbols block.
https://www.stixfonts.org/
https://github.com/stipub/stixfonts
(Note: the variable fonts don't include support for that block of characters; only the static fonts do.)
Please note the intended use of those Unicode characters, as pointed out in the STIX project:
The sans serif, fraktur, script, etc., alphabets in Plane 1 (U+1D400-U+1D4FF) are intended to be used only as technical symbols.

Where are the unicode characters on the disk and what's the mapping process?

Several Unicode-related questions have been confusing me for some time.
For the following reasons, I think the Unicode characters exist on disk:
Execute echo "\u6211" in a terminal and it will print the glyph corresponding to the Unicode code point U+6211.
There's a concept of the UCD (Unicode Character Database), and we can download its latest version. UCD latest
Some newer Unicode characters, like the latest emoji, cannot be displayed on my Mac until I upgrade macOS.
So, if the Unicode characters do exist on disk:
Where is it?
How can I upgrade it?
What's the process of mapping a Unicode code point to a glyph?
If I use a specific font, what's the process of mapping the code point to a glyph?
If not, what's the process then?
It would be much appreciated if someone could shed light on these questions.
Execute echo "\u6211" in a terminal and it will print the glyph corresponding to the Unicode code point U+6211.
That's echo -e in bash.
› echo "\u6211"
\u6211
› echo -e "\u6211"
我
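The same resolution can be observed from Python, which makes the split clear: the escape resolves to a codepoint that lives in the text, while the glyph drawn for it lives in a font file.

```python
import unicodedata

ch = "\u6211"  # the escape resolves to a single codepoint in the string
print(ch)      # the terminal and its font decide which glyph to draw
print(f"U+{ord(ch):04X}", unicodedata.category(ch))  # → U+6211 Lo
```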
Where is it?
In the font file.
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
How can I upgrade it?
Installing/upgrading a suitable font with the emojis should be enough. I don't have macOS, so I cannot verify this.
I use "Noto Color Emoji" version 2.011/20180424; it works fine.
What's the process of mapping the Unicode code point to a glyph?
The application (e.g. a text editor) provides the font-rendering subsystem (Quartz? on macOS) with Unicode text and a font name. The font renderer analyses the codepoints of the text and decides whether this is simple text (e.g. Latin, Chinese, stand-alone emoji) or complex text (e.g. Latin with many marks, Thai, Arabic, emoji with zero-width joiners). The renderer finds the corresponding outlines in the font file. If the file does not have the required glyph, the renderer may use a similar font, or fall back to a configured substitute glyph (white box, black question mark, etc.). Then the outlines undergo shaping (to compose complex glyphs) and line-breaking. Finally, the font renderer hands off the result to the display system.
Apart from the shaping, very little of this has to do with Unicode or encoding. Font rendering already worked this way before Unicode existed, though font files and rendering were much simpler 30 years ago. Encoding only matters when someone wants to load or save text from an application.
Summary: investigate
Truetype/Opentype font editing software so you can see what's contained in the files
font renderers; on Linux, look at the libraries Pango and FreeType.
Generally speaking, operating system components that use text use the Unicode character set. In particular, font files use the Unicode character set. But, not all font files support all the Unicode codepoints.
When a codepoint is not supported by one font, the system might fallback to another that does. This is particularly true of web browsers. But ultimately if the codepoint is not supported, an unfilled rectangle is rendered. (There is no character for that because it's not a character. In fact, if you were able to copy and paste it as text, it should be the original character that couldn't be rendered.)
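The fallback walk described above can be sketched in a few lines of Python. This is a toy model with made-up font names and coverage sets, not a real renderer; `None` stands for the .notdef case where the unfilled rectangle gets drawn:

```python
# Toy sketch of font fallback: each "font" is just the set of
# codepoints it covers, and we walk the fallback list until one
# covers the character.

def pick_font(ch, fallback_chain):
    for name, coverage in fallback_chain:
        if ord(ch) in coverage:
            return name
    return None  # no font covers it: the renderer draws the .notdef box

chain = [
    ("LatinFont", set(range(0x0000, 0x0250))),   # hypothetical Latin coverage
    ("CJKFont",   set(range(0x4E00, 0x9FFF))),   # hypothetical CJK coverage
]

for ch in "A我\u2714":
    print(ch, "->", pick_font(ch, chain))
```

Note that the character itself is unchanged either way, which is why copying the rectangle and pasting it elsewhere yields the original character.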
In web development, the web page can either supply or give the location of fonts that should work for the codepoints it uses.
Other programs typically use the operating system's rendering facilities and therefore the fonts available through it. How to install a font in an operating system is not a programming question (unless you are including a font in an installer for your program). For more information on that, you could see if the question fits with the Ask Different (Apple) Stack Exchange site.

Displaying Unicode characters in PDF produced by Apache FOP

I have an XML file containing a list of names, some of which use characters/glyphs which are not represented in the default PDF font (Helvetica/Arial):
<name>Paul</name>
<name>你好</name>
I'm processing this file using XSLT and Apache FOP to produce a PDF file which lists the names. Currently I'm getting the following warning on the console and the Chinese characters are replaced by ## in the PDF:
Jan 30, 2016 11:30:56 AM org.apache.fop.events.LoggingEventListener processEvent WARNING: Glyph "你" (0x4f60) not available in font "Helvetica".
Jan 30, 2016 11:30:56 AM org.apache.fop.events.LoggingEventListener processEvent WARNING: Glyph "好" (0x597d) not available in font "Helvetica".
I've looked at the documentation and it seems to suggest that the options available are:
Use an OpenType font - except this isn't supported by FOP.
Switch to a different font just for the non-ASCII parts of text.
I don't want to use different fonts for each language, because there will be PDFs that have a mixture of Chinese and English, and as far as I know there's no way to work out which is which in XSLT/XSL-FO.
Is it possible to embed a single font to cover all situations? At the moment I just need English and Chinese, but I'll probably need to extend that in future.
I'm using Apache FOP 2.1 and Java 1.7.0_91 on Ubuntu. I've seen some earlier questions on a similar topic but most seem to be using a much older version of Apache FOP (e.g. 0.95 or 1.1) and I don't know if anything has been changed/improved in the meantime.
Edit: My question is different (I think) to the suggested duplicate. I've switched to using the Ubuntu Font Family using the following code in my FOP config:
<font kerning="yes" embed-url="../fonts/ubuntu/Ubuntu-R.ttf" embedding-mode="full">
<font-triplet name="Ubuntu" style="normal" weight="normal"/>
</font>
<font kerning="yes" embed-url="../fonts/ubuntu/Ubuntu-B.ttf" embedding-mode="subset">
<font-triplet name="Ubuntu" style="normal" weight="bold"/>
</font>
However, I'm still getting the 'glyph not available' warning:
Jan 31, 2016 10:22:59 AM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "你" (0x4f60) not available in font "Ubuntu".
Jan 31, 2016 10:22:59 AM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "好" (0x597d) not available in font "Ubuntu".
I know Ubuntu Regular has these two glyphs because it's my standard system font.
Edit 2: If I use GNU Unifont, the glyphs display correctly. However, it seems to be a font aimed more at console use than at documents.
If you cannot find a suitable font supporting both Chinese and English (or you found one but don't much like its Latin glyphs), remember that font-family can contain a comma-separated list of names, to be used in that order.
So, you can list your desired font for English text first, and then the one for the Chinese text:
<!-- this has # instead of the missing Chinese glyphs -->
<fo:block font-family="Helvetica" space-after="1em" background-color="#AAFFFF">
Paul 你好</fo:block>
<!-- this has all the glyphs, but I don't like its latin glyphs -->
<fo:block font-family="SimSun" space-after="1em" background-color="#FFAAFF">
Paul 你好</fo:block>
<!-- the best of both worlds! -->
<fo:block font-family="Helvetica, SimSun" space-after="1em" background-color="#FFFFAA">
Paul 你好</fo:block>
The answer to my question is to either use GNU Unifont, which:
Supports Chinese and English.
Is available under a free licence.
'Just works' if you add it to the FOP config file.
Or alternatively produce separate templates for English and Chinese PDFs and use different fonts for each.
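If you go the GNU Unifont route, registering it in the FOP config follows the same pattern as the Ubuntu fonts above. A sketch (the embed-url path and the triplet name are assumptions for illustration and must match your installation):

```xml
<!-- Sketch: register GNU Unifont in fop.xconf; adjust embed-url
     to wherever the TTF actually lives on your system. -->
<font kerning="yes" embed-url="../fonts/unifont/unifont.ttf" embedding-mode="subset">
  <font-triplet name="Unifont" style="normal" weight="normal"/>
</font>
```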

How can I get Mocha's Unicode output to display properly in a Windows console?

When I run Mocha, it tries to show a check mark or an X for a passing or a failing test run, respectively. I've seen great-looking screenshots of Mocha's output. But those screenshots were all taken on Macs or Linux. In a console window on Windows, these characters both show up as a nondescript empty-box character, the classic "huh?" glyph.
If I highlight the text in the console window and copy it to the clipboard, I do see actual Unicode characters; I can paste the fancy characters into a textbox in a Web browser and they render just fine (✔, ✖). So the Unicode output is getting to the console window just fine; the problem is that the console window isn't displaying those characters properly.
How can I fix this so that all of Mocha's output (including the ✔ and ✖) displays properly in a Windows console?
By pasting the characters into LinqPad, I was able to figure out that they were 'HEAVY CHECK MARK' (U+2714) and 'HEAVY MULTIPLICATION X' (U+2716). It looks like neither character is supported in any of the console fonts (Consolas, Lucida Console, or Raster Fonts) that are available in Windows 7. In fact, out of all the fonts that ship with Windows 7, only a handful support these characters (Meiryo, Meiryo UI, MS Gothic, MS Mincho, MS PGothic, MS PMincho, MS UI Gothic, and Segoe UI Symbol). The ones starting with "MS" are all fixed-width (monospace) fonts, but they all look awful at the font sizes typical of a console. And the others are out, since the console requires fixed-width fonts.
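(If you don't have LinqPad handy, Python's unicodedata module does the same identification:)

```python
# Identify unknown characters by codepoint and Unicode name.
import unicodedata

for ch in "\u2714\u2716":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# → U+2714 HEAVY CHECK MARK
# → U+2716 HEAVY MULTIPLICATION X
```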
So you'll need to download a font. I like DejaVu Sans Mono -- it's free, it looks good at console sizes, it's easy to tell the 0 from the O and the 1 from the I from the l, and it's got all kinds of fancy Unicode symbols, including the check and X that Mocha uses.
Unfortunately, it's a bit of a pain to install a new console font, but it's doable. (Steps adapted from this post by Scott Hanselman, but extended to include the non-obvious subtleties of 000.)
Steps:
Download the DejaVu fonts. Unzip the files. Go into the "ttf" directory you just unzipped, select all the files, right-click and "Install".
Run Regedit, and go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont.
Add a new string value. Give it a name that's a string of zeroes one longer than the longest string of zeroes that's already there. For example, on my Windows 7 install, there's already a value named 0 and one named 00, so I had to name the new one 000.
Double-click on your new value, and set its value to DejaVu Sans Mono.
Reboot. (Yes, this step is necessary, at least on OSes up to and including Windows 7.)
Now you can open a console window, open the window menu, go to Defaults > Font tab, and "DejaVu Sans Mono" should be available in the Font list box. Select it and OK.
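Steps 2 through 4 can equivalently be captured in a .reg file, assuming (as in step 3) that 000 is the next free value name on your machine:

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
"000"="DejaVu Sans Mono"
```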
Now Mocha's output will display in all its glory.
Update: this issue has now been fixed. Starting from Mocha 1.7.0, fallbacks are used for symbols that don't exist in default console fonts (√ instead of ✔, × instead of ✖, etc.). It's not as pretty as it could be, but it surely beats empty-box placeholder symbols.
For details, see the related pull request: https://github.com/visionmedia/mocha/pull/641

Using Unicode in fancyvrb’s VerbatimOut

Problem
VerbatimOut from the “fancyvrb” package doesn’t play nicely with UTF-8 characters.
Minimal working example:
\documentclass{minimal}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}
Error message
When compiled using pdflatex mini, this gives the error
File ended while scanning use of \UTFviii#three#octets.
A different error occurs when the sole occurrence of é above is replaced by something else, e.g. é */:
Package inputenc Error: Unicode char \u8:### not set up for use with LaTeX.
– indicating that in this case, LaTeX succeeds in reading a multi-byte UTF-8 character but doesn't know what to do with it (i.e. it's the wrong character).
In fact, when I open the produced .test file manually, it contains the character é, but in Latin-1 encoding!
Proof: when I open the files in a hex editor, I get the following:
Original file: C3 A9 (corresponds to LATIN SMALL LETTER E WITH ACUTE in UTF-8)
Written file: E9 (corresponds to é in Latin-1)
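The two byte sequences from the hex editor can be reproduced directly in Python, which confirms they are the same character in two encodings:

```python
# 'é' in the two encodings seen in the hex editor.
e_acute = "é"
print(e_acute.encode("utf-8"))    # b'\xc3\xa9'  (what the source file contains)
print(e_acute.encode("latin-1"))  # b'\xe9'      (what VerbatimOut wrote)
```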
Question
How to set VerbatimOut up correctly?
filecontents* (from "filecontents") shows that it can work. Unfortunately, I don't understand either package's code, so I cannot fix fancyvrb by replicating the logic from filecontents manually.
I also cannot use filecontents* instead of VerbatimOut because the former doesn’t work within a \newenvironment, while the latter does.
(Oh, by the way: vanilla Verbatim instead of VerbatimOut also works as expected. The error seems to occur when writing the file, not when reading the verbatim input)
Is your end goal to write symbols and accents in Verbatim? Because you can do that like this:
\documentclass{article}
\usepackage{fancyvrb}
\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
\'{e} \~{e} \`{e} \^{e}
\end{Verbatim}
\end{document}
The commandchars option allows the \ { } characters to work as they normally would.
Source: http://ctan.mirror.garr.it/mirrors/CTAN/macros/latex/contrib/fancyvrb/fancyvrb.pdf
This is still unfixed? I'll take another look. What exactly do you want: your package to use VerbatimOut, or for it not to interfere with it?
Tests
TeX Live 2009's xelatex compiles this fine. With pdflatex, version
This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009)
I get an error message that is rather more useful than the one you got:
! Argument of \UTFviii#three#octets has an extra }.
\par
l.8 é
? i \makeatletter\show\UTFviii#three#octets
! Undefined control sequence.
\GenericError ...
#4 \errhelp \#err# ...
l.8 é
If I were to make a wild guess, I'd say that inputenc with pdftex uses the pdftex primitives to do some hairy storing and restoring of character tables, and some table somewhere has a rare mistake in it.
Possibly related
I saw a post by Vladimir Volovich in the pdf-tex mailing list archives, all the way back from 2003, that discusses a conflict between inputenc & fancyvrb, and posts a patch to "solve the problem". Who knows, maybe he faced the same problem? It might be worth emailing him.
XeTeX has much better Unicode support. The following run through xelatex produces “é” both in \jobname.test and the output PDF.
\documentclass{minimal}
\usepackage{fontspec}
\tracingonline=1
\usepackage{fancyvrb}
\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}
\input{\jobname.test}
\end{document}
fontspec loads the Latin Modern fonts, which have Unicode support. The standard TeX Computer Modern fonts don’t have the right tables for Unicode support.
If you use a character that does not have a glyph in the current font, by default XeTeX writes a blank space to the PDF and prints a warning in the log but not on the terminal. \tracingonline=1 prints the warning to the terminal.
On http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Main.LiterateAgda, they suggest that you should use
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
in the preamble. I successfully used this to insert Unicode into a verbatim environment.
\documentclass{article}
\usepackage{fancyvrb}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\newenvironment{MonVerbatim}{%
  % Give every byte in the range 128-255 catcode 11 ("letter"), so the
  % bytes of multi-byte UTF-8 sequences are written out unchanged
  % instead of being expanded by inputenc's active characters:
  \count0=128\relax %
  \loop
    \catcode\count0=11\relax
    \advance\count0 by 1\relax
    \ifnum\count0<256
  \repeat
  \VerbatimOut[commandchars=\\\{\}]{VerbatimText.tex}%
}{\endVerbatimOut}
\newcommand\test{A command producing accented characters éà}
\begin{document}
\begin{MonVerbatim}
A little bit text in verbatim mode éà_].
\test
\end{MonVerbatim}
Followed by some accented character éà.
\end{document}
This code is working for me with TeX Live 2018 and pdflatex. You should probably avoid changing catcodes if you are using a Unicode (16-bit) TeX engine (lualatex or xelatex). You can use the package "iftex" to check which TeX engine is in use.
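A minimal sketch of such a guard, assuming the iftex package's \ifPDFTeX conditional (the catcode loop is the same one used in the environment above):

```latex
% Sketch: only apply the catcode changes on 8-bit engines.
\usepackage{iftex}
\ifPDFTeX
  % 8-bit engine: make bytes 128-255 "letters" so UTF-8 byte
  % sequences survive verbatim writing.
  \count0=128\relax
  \loop
    \catcode\count0=11\relax
    \advance\count0 by 1\relax
    \ifnum\count0<256
  \repeat
\fi
```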