TM tag in TOC doesn't match TM in content - dita

We noticed a situation (PDF) where the reg mark in the TOC entry generated the symbol a bit higher than it appears in the content.
Here's a screencap. The top is the TOC and you can see the registered mark is much higher than the bottom example which is the Chapter title and content.
This is using DITA-OT 1.7.3 default PDF2 plugin.
I looked through topic.fo and both use:
<fo:inline line-height="100%" font-family="Helvetica, Arial Unicode MS" baseline-shift="20%" font-size="smaller">®</fo:inline>
But they each use a different wrapper.
The TOC uses:
<fo:inline end-indent="14pt" keep-together.within-line="auto" line-height-shift-adjustment="disregard-shifts" font-family="Helvetica, Arial Unicode MS">
The Content uses:
<fo:inline baseline-shift="20%" font-size="75%" line-height-shift-adjustment="disregard-shifts">
I looked through the TOC styling a bit but I'm not seeing where to make the adjustment.
I'm guessing the toc needs: baseline-shift="20%" font-size="75%"
This is a minor issue but is there any insight on where to make the adjustments so they're consistent?

I'm not sure that making the change you describe would give you what you expect. I have a feeling if you make those changes to the TOC wrapper, you may wind up with the entire content of the TOC smaller. You may want to change the line-height-shift-adjustment for the TOC wrapper to a value of consider-shifts to see what that does.

I think you are also mistaken because in no way does that image of the TOC depict font-family as Helvetica or Arial Unicode as the font used.

Related

Errors using ps2ascii on some files

What does FC_WEIGHT refer to? Please advise: Although a text file was produced it is large and consists largely of numbers which makes it hard to proofread. I need relatively good confidence the output matches the input. If there is a fix please point me to it and bring joy to my dull drab existence.
entered the command
ps2ascii /Users/dwstclair/Desktop/untitled3/stmt_20181130.pdf a.txt
The result was:
DEBUG: FC_WEIGHT didn't match
On the off chance a default font was missing on my system
I added DroidSansFallback.ttf (no joy)
Basically, I wouldn't use ps2ascii. Its long been deprecated and doesn't even ship in more recent versions of Ghostscript.
Instead consider using the txtwrite device. It works with a wider range of input (in particular it can use ToUnicode CMaps in PDF files, which ps2ascii cannot) and is capable of producing output in other than ASCII, which is quite useful. Even if you aren't working with non-Latin languages, the ability to preserve ligatures (eg fi, ffi, ffl etc) is convenient.
The actual answer to your question is 'don't worry about it'.
FC_WEIGHT refers to the weight of a font (light, bold, regular, ExtraBold etc). This message can only arise when you are using FontConfig, and Ghostscript is enumerating the available fonts from font config, trying to find a match for a missing font in the input. This means that a candidate font did not match the target font's weight.
Since you aren't going to use the font, it doesn't affect you.

Why isn't there a font that contains all Unicode glyphs?

Pretty much as the title says. Rendering all of the unicode format correctly what with composite characters and characters that affect other characters and ligatures is really hard, I understand that. We have fonts that seem to be designed for maximum Unicode symbol support(Symbola, Code2001, others) and specialized fonts for certain planes or character ranges(BabelStone Han, others).
I don't know much about the underlying technical details for fonts. Is there a maximum size? Is it a copyright problem? Is essentially redrawing all ~110,000 extant glyphs too hard? I understand style concerns, but why not fall back to a 'default' font that had glyphs for everything? They're on unicode.org, redrawing them all would be pretty hard work but then you'd have a guaranteed fallback font for everything. If you got rights to some pre-existing fonts you could just composite them and that should help a lot. Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
What is that reason?
"Why would you even want that?" questions aside, from a programming perspective there's a very simple reason: the OpenType spec only affords an addressable glyph index space of one USHORT, so one font can only support 16 bits worth of glyphs identifiers, or 65,536 glyphs max. (And note the terminology: a "glyph" is not the same as a "character" or "letter")
The current version of Unicode, v8 as of this answer, contains 120,737 assigned code points, or almost twice as many as fit in a modern font (2021 edit: v13 upped this number to 143,859). In fact, Unicode hasn't been able to fit in a modern OpenType font since 2001, with the release of Unicode 3.1, which upped the number of code points from 49,259 to 94,205.
"So what about font collections?" I hear you ask. Why not use multiple fonts and support all unicode that way? Well now, you've just described Adobe's Sans Pro, and Google's Noto (which are the same font).
As for the "how hard can it be": a uniform style for all glyphs in Unicode, across 129 established written scripts on this planet, each with their own typesetting rules? Incredibly hard. You may think fonts are just files with pictures for letters, and someone types a letter, that picture shows up: that is not how fonts work, and isn't how fonts have worked since the late 1980's.
Modern fonts are the typographic equivalent of a game ROM: sure, it's not much use without the hardware or software to run that ROM on, but all the things that actually matter are in the ROM. Similarly, modern fonts contain all the information for typesetting. Not just pictures, they contain the metadata, the metrics, the positioning and substitutions rules for arbitrary sequences, with separate rule sets for each written script that OpenType supports, mandatory and optional ligatures, language-specific character replacements for letters at the start/middle/final position in a word, or in isolation, character repositioning relative to arbitarily complex sequences of other characters either before or after it, arbitrarily complex sequence replacements with other arbitrarily complex sequences, possible bitmap fallbacks for small-point rendering, hinting instructions on how to properly rasterize vector graphics that are inherently not aligned to any particular pixel grid, and more. A modern font is a ridiculously complex application, that a font engine consults to figure out how to typeset sequences of code points.
Making a (set of) Unicode-encompassing font(s) that looks good for all contexts is a vast team effort.
So: "Why isn't there a font that contains all Unicode glyphs?", because that's been technically impossible since 2001. We can, and do, make font families that cover all of Unicode, but with 129 different scripts all with their own typesetting rules, it's a lot of work, and almost (almost) not worth the effort compared to only covering a subset of all languages.
And as for this:
Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
Just because you didn't know about them, doesn't mean they don't exist, with millions of people who are familiar with them. They exist =)
They're even open source, go out and thank the people who made them!
There is GNU Unifont. It aims to contain all Unicode, except Apple Emoji.
You will probably find what you are looking for at the following links.
Unicode Character Table
HTML Character Entity References
Huge List of Unicode Symbols
List of Unicode Characters of Category “Other Symbol
This other is funny for particular character since you can draw what you search:
Unicode Character Recognition
Can't enter unicode character with Alt+ even with EnableHexNumpad
Basic Questions
Q: How many characters are in Unicode?
A: The short answer is that as of Version 13.0, the Unicode Standard contains 143,859 characters. The long answer is rather more complicated, because of all the different kinds of characters that people might be interested in counting.
Unicode font
A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet.
Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters (143,859 characters, with Unicode 13.0).
...
No single "Unicode font" includes all the characters defined in the present revision of ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode).
As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
Enjoy!

pandoc-generated docx misses italic variables in equations

I have the following segment of Markdown with embedded LaTeX equations:
# Fisher's linear discriminant
\newcommand{\cov}{\mathrm{cov}}
\newcommand{\A}{\mathrm{A}}
\renewcommand{\B}{\mathrm{B}}
\renewcommand{\T}{^\top}
The first method to find an optimal linear discriminant was proposed by Fisher
(1936), using the ratio of the between-class variance to the within-class variance
of the projected data, $d(\vec x)$, as a criterion. Expressed in terms of the
sample properties, the $p$-dimensional centroids $\bar {\vec x}_\A$ and
$\bar {\vec x}_\B$ and the $p \times p$ covariance matrices
$S_A = \cov_i ( \vec x_{\A i} )$ and $S_B = \cov_i ( \vec x_{\B i} )$, the
optimal direction is given by
$$
\vec w = \left ( \frac{ S_A + S_B }{2} \right ) ^{-1}
~ ( \bar {\vec x}_\B - \bar {\vec x}_\A ).
$$
When I convert it with pandoc to LaTeX and compile it with xelatex, I get the expected text with nicely rendered math. When I convert it with pandoc to MS Word using
pandoc test.text -o test.docx
and open it in MS Office Word 2007, I get the following:
Only those parts of the equations that are symbols or upright text get rendered correctly, while variable names in italics are replaced by a question mark in a box.
How can I make this work?
In Word 2007, I see a result similar to yours, except that here, I don't see the "question marks in boxes" characters, just space.
If I then take one of the expressions, and use your trick of going to linear display and back, the characters reappear for that expression.
If I save and re-open, the other expressions still do not display correctly, but if I save and look at the XML, I notice that
the Math font has been changed to Cambria Math
additional run parameter (w:rPr) XML specifying the Cambria Math
font has been inserted in many of the runs (w:r) inside the oMath
elements, even in the oMath expressions that do not display
correctly. However, in the oMath expression that now displays
correctly, this extra XML has been applied to every run. In the
others, it has only been applied to some runs (I think I can see the
pattern but I'm running out of time here right now...)
If I manually add the XML to the other runs and re-open the
document, the expressions appear correctly. Or at least, they do in
the one case I have tried.
Since Word 2010 displays the resuls correctly, I can only assume that it does not rely on these explicit font settings, whereas Word 2007 does. This doesn't really help you yet, because altering all those w:r elements would be even harder than what you are already doing. But it is possible that a default style/font needs to be set, either somewhere higher in the XML hierarchy, or perhaps elsewhere in the .zip (perhaps in fontTable.xml or styles.xml). I'm not familiar enough with Word's XML structures to guess what, if anything might be missing, but may be able to have a look tomorrow.
I suppose another possibility is that you just have to have all these extra rPr elements for this to work in Word 2007, which would suggest that pandoc may have been written for Word 2010, not 2007. (I don't know anything about the tool).
As an example, where you have
<m:r>
<m:t>(</m:t>
</m:r>
what you need is
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>(</m:t>
</m:r>
I did the following to get rid of the font issue:
Create a new empty word document.
Copy all content to the new document.
Choose Match Source Format.
As discussed above, Windows doesn't have the font Lucida Grande, so substituting the Math Font with Cambria Math should work.
Rename the test.docx to test.zip
vim test.zip and select test/word/settings.xml
find and change Lucida Grande to Cambria Math
save and rename zip to docx. This results in something like this docx.
You can then also supply that file as a sort of docx template to pandoc with the --reference-docx option.

How can I substitute one glyph for another in an OpenType PostScript OTF font file?

I'm trying to use fonts from the Nitti Basic family for programming. These fonts are packaged as OpenType PostScript OTF files.
Its U+002D (HYPHEN-MINUS) glyph works well as a hyphen, but not so well as a minus.
For example, it doesn't line up with the horizontal bar of the plus sign.
On the other hand, Nitti's glyph for U+2212 (MINUS) is perfect as a minus (of course), and this is what I need when programming. It's not feasible for me to actually use codepoint U+2212; after all, U+002D is what you get when you press the minus sign on the keyboard and it's what programming languages use for subtraction.
So instead I'd like to steal the glyph from U+2212 and use it for U+002D, so that that character looks like a minus sign.
How can I do it?
Update: Yes, it is possible to use U+002D as a hyphen in source code.
As mentioned above, a minus sign is what I need.
I agree with Jukka, there are tools to do this.
However, please don't forget that a font is usually protected by very similar contracts as software. In this case the link you provided for example points to a legal document that reads (amongst much other):
"Except as permitted herein, you may not rename, modify, adapt,
translate, reverse engineer, decompile, disassemble, alter or
otherwise copy the Bold Monday Font Software."
Notice the fact that you're not permitted legally to change this font. If you read the rest of the agreement you'll see a lot of restrictions on the actual use of the font as well. Make sure you're not breaking your license by what you are doing...
For posterity, here's how to do it:
Obtain Adobe's AFDKO font tools and install them.
Put the OTF files into an empty directory.
Run ttx *.otf to convert the OTF files to TTX (XML).
Edit each TTX file in a text editor:
In the cmap section, change occurrences of hyphen to minus. This table maps characters to glyphs. Character U+002D was originally mapped to the hyphen glyph; this change maps it to the minus glyph.
Over the whole file, change ocurrences of NittiBasic to NittiBasicM and Nitti Basic to Nitti Basic M. This will distinguish the modified version of the font from the original once it's installed.
Rename the TTX files, replacing Nitti Basic with Nitti Basic M.
Run ttx -b *.ttx to convert the TTX files back to OTF.
Finally, install the newly-created OTF files.
Tools like FontForge can be used to edit a font in a simple manner.
Note that in programming, too, HYPHEN-MINUS has multiple uses: as a minus sign, but also (in some languages) as allowed in identifiers, as well as in comments, where it usually appears in the role of hyphen. In some uses, a HYPHEN glyph will look odd.

A set of typefaces that cover the whole Unicode character range

Does anybody know a set of typefaces that altogether cover the whole Unicode character range? we know that it is impossible to display all unicode characters using just one or two fonts. But probably, we can find a set of fonts using them the whole Unicode range could be displayed. Does anybody have any experience?
Thank you so much in advance.
One way to find such set of fonts is to look into Windows Font Linking. If you take a look at the registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink you'll see fonts that "link" to cover the complete Unicode set.
as far as i know Arial Unicode is one of the full.
Everson Mono covers a large portion of the Unicode characters, and SIL International makes a lot of different fonts for minority languages.