Changing font size in groff does nothing - manpage

I have an unordered list defined in groff, but the system I'm on can't render the Unicode bullet character, so I wanted to use a small o instead. I tried \s-2 to decrease the type size by two points, yet when I open the file with man, the o doesn't look any smaller than the text around it.
Here's the code I'm using:
.SH SYNOPSIS
.LP
\fIFiles\fP:
.br
.IP \s-2o 4
\fI\s+2File 1\fP
.IP
\fIFile 2\fP


How can a MATLAB program test whether MATLAB can render a particular font?

I would like to use some special characters in a MATLAB figure. How can my program ensure the fonts are available before using them?
listfonts() is not reliable. It claims "Zapf Dingbats" is available on my machine, but it is not (and text() renders using a default font instead). listfonts() always includes the standard PostScript fonts. I suppose that's because they are always available for PostScript output, but I'm interested in displayed figures. Likewise uisetfont() and MATLAB -> Preferences -> Fonts -> Custom list "Zapf Dingbats", but render the sample using a default font.
Just looking for the font file doesn't work, either. For example, "Webdings" works fine on my main machine. However, on a second machine, "Webdings" is installed (there's a file /Library/Fonts/Webdings.ttf, and Word can use it), but MATLAB substitutes a default font.
I thought of one test: Create a small figure with one marker symbol, use print() to write it to a .png file, read in the file as data, compute a hash, and compare that hash with a stored value. Is there a less clumsy method?
I found Unicode equivalents for most of the symbols I need, that work for both of my test machines. However, they too apparently depend on my having the right fonts installed. For example, there are many Unicode versions of a square. Hex codes 2588, 25a0, and 25fc work here, but 25fe, 2b1b, and 2bc0 are rendered as blank. Is there a way to tell whether these characters are available?
I'm running R2017b under macOS version 10.13.5, and "set | grep LANG" displays "LANG=en_US.UTF-8".

Page numbers in scribble/acmart

I'm creating a pdf with the scribble/acmart language. How can I add page numbers to my document?
Make a LaTeX file with the line \settopmatter{printfolios=true}
If the file is named texstyle.tex, invoke Scribble with the command:
scribble ++style texstyle.tex --pdf FILE.scrbl
The rendered FILE.pdf should have page numbers.
(If you already had a ++style file, just add the \settopmatter line to that.)
The solution Ben gave is one way. But you can actually do this without modifying your texstyle.tex file.
If you add the following lines to your document, the appropriate topmatter will be added to your pdf file:
@para[#:style 'pretitle]{
@elem[#:style "settopmatter"]{
printfolios=true}}
You can see it doing this by running:
> scribble --latex myfile.scrbl
If you do this, you will notice the following line in the generated LaTeX file:
\settopmatter{printfolios=true}\titleAndVersionAndAuthors{Hello}{6.9.0.4}{\SNumberOfAuthors{1}\SAuthor{World}}
(Where Hello and World are the title and author of your paper, and the \title... macro runs \maketitle.)
This works because the 'pretitle style (when given to a paragraph), pulls its entire body above the title.
And whenever a string is given as the style for an element, it maps to a LaTeX command.
That is, this scribble code:
@elem[#:style "mycommand"]{Thebody}
Maps to:
\mycommand{Thebody}
The result of composing these two forms together is to drag this to the top of the file.
And because you've done this in scribble rather than latex, you can use Racket's semantics to add page numbers. For example, if you use your own #lang, you can now have the language decide whether or not you want pages.

Defining what is a line in Tesseract

I'm working on document recognition for scanned bank statements. The statements that I have are organized by lines, such as the one attached. Because Tesseract does such a good job at detecting the areas of text, it breaks the lines in the middle. I'm assuming this is because of the large white space between the first block in the line (blurred for privacy reasons) and the next one ('EUR' or 'COURS').
In the hocr file, the bbox of all the elements in the line are within 2px or so, so I could potentially rebuild a line myself. However, this seems more like a hack. Is there a way to tell Tesseract that lines should be as wide as the document itself? Or would there be another way to go about it? I've tried playing with the psm option, but with no luck.
-psm 6 -- Assume a single uniform block of text -- should work. If not, you may want to use the older version 2.0x, which does not perform page layout analysis.
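If the page segmentation mode doesn't help, the fallback mentioned in the question (rebuilding lines from hOCR boxes that agree vertically within a couple of pixels) could look roughly like the sketch below. It assumes the word boxes have already been pulled out of the hOCR file as (text, x0, y0, x1, y1) tuples; the function name and the 3px tolerance are made up for illustration and are not part of Tesseract.

```python
from typing import Dict, List, Tuple

# (text, x0, y0, x1, y1), as found in an hOCR "bbox x0 y0 x1 y1" title attribute.
Word = Tuple[str, int, int, int, int]


def group_into_lines(words: List[Word], tolerance: int = 3) -> List[str]:
    """Merge words whose vertical centres lie within `tolerance` pixels."""
    lines: List[Dict] = []
    # Walk the words from top to bottom, ordered by vertical centre.
    for word in sorted(words, key=lambda w: (w[2] + w[4]) / 2):
        centre = (word[2] + word[4]) / 2
        if lines and abs(centre - lines[-1]["centre"]) <= tolerance:
            # Close enough to the current line: same row of the statement.
            lines[-1]["words"].append(word)
        else:
            # Too far down: start a new line anchored at this word's centre.
            lines.append({"centre": centre, "words": [word]})
    # Within each line, order the words left to right by x0.
    return [
        " ".join(w[0] for w in sorted(line["words"], key=lambda w: w[1]))
        for line in lines
    ]
```

Each line is anchored at the centre of its first word, so a strongly skewed scan would probably need deskewing first or a larger tolerance.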

Is there an option to control output page orientation (using knitr->pander->pandoc->docx)

I am playing with Tal's intro to producing Word tables with as little overhead as possible in real-world situations. (Please see there for reproducible examples. Thanks, Tal!) In real applications, tables are often too wide to print on a portrait-oriented page, but you might not want to split them.
Sorry if I have overlooked this in the pandoc or pander documentation, but how do I control page orientation (portrait/landscape) when writing from R to a Word .docx file?
I should maybe add that I started using knitr+markdown, and I am not yet familiar with LaTeX syntax. But I'm trying to pick up as much as possible while getting my stuff done.
I am pretty sure the docx writer has no section breaks implemented. Also, as far as I understand, --reference-docx allows for customizing styles and not the page layout (but I might be wrong here). This is from pandoc's guide on --reference-docx:
--reference-docx=FILE
Use the specified file as a style reference in producing a docx file.
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets are used in the new docx. If no
reference docx is specified on the command line, pandoc will look for
a file reference.docx in the user data directory (see --data-dir). If
this is not found either, sensible defaults will be used. The
following styles are used by pandoc: [paragraph] Normal, Title,
Authors, Date, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5,
Block Quote, Definition Term, Definition, Body Text, Table Caption,
Image Caption; [character] Default Paragraph Font, Body Text Char,
Verbatim Char, Footnote Ref, Link.
These styles are saved in the /word/styles.xml component of the docx document.
The page layout on the other hand is saved in the /word/document.xml component in the <w:sectPr> tag, but pandoc's docx writer ignores this part as far as I can tell.
The docx writer builds by default a continuous document, with elements such as headers, paragraphs, simple tables and so on ... much like a html output.
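To see what a given docx actually contains, one can look at the <w:sectPr> elements directly. The following is only a sketch using the Python standard library: the function name is made up, and it relies on the WordprocessingML convention that orientation is stored in the w:orient attribute of <w:pgSz>, with portrait as the default when the attribute is absent.

```python
import xml.etree.ElementTree as ET
from typing import List

# Main WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def section_orientations(document_xml) -> List[str]:
    """Return the orientation of each <w:sectPr>, in document order."""
    root = ET.fromstring(document_xml)
    orientations = []
    for sect in root.iter(f"{{{W_NS}}}sectPr"):
        page_size = sect.find(f"{{{W_NS}}}pgSz")
        orient = None
        if page_size is not None:
            orient = page_size.get(f"{{{W_NS}}}orient")
        # A missing w:orient attribute means portrait.
        orientations.append(orient or "portrait")
    return orientations
```

The XML can be read from a real file with zipfile.ZipFile("document.docx").read("word/document.xml"), since a docx is a zip archive. A file written by pandoc would be expected to show a single section.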
Option #1 (doesn't solve the page orientation problem):
The only page layout option that you can define through styles is the pageBreakBefore which will add a page break before a certain style
Option #2 (seems elegant but hasn't been tested):
Recently the custom writer has been added that allows for a custom lua script, where you should be able to define how certain Pandoc blocks will be written into the output file ... meaning you could potentially define section breaks and page layout for a specific block inserting the sectPr tag into the document. I haven't tried this out but it would be worth investigating. On pandoc github you can check out a sample lua script file for custom html output.
However, this means you have to have lua installed and learn the language, and it is up to you whether you think it's worth the time investment.
Option #3 (a couple of clicks in Word might just do):
As you will probably spend quite some time setting up how to insert sections, choosing the right size and margins, and figuring out how to fit the table to such a layout, I recommend that you use pandoc to write your document.docx, then open it in Word and do the layout by hand:
select the table you want on the landscape page
go to Layout > Margins
> select Apply to: Selected text
> choose Page Setup > select Landscape
Now a new section with a landscape orientation should surround your table.
You will probably also want to style the table and table caption a little (font size, ...) to achieve the best result. All text styling can already be applied with pandoc, where --reference-docx comes in handy.
Option #4 (in situation when you can just use pdf instead of docx):
As far as I could figure out, pandoc does a good job with tables in md -> docx (alignment, style, ...), while in tex -> docx it sometimes had trouble. However, if pdf output is an option for you, LaTeX will be your greatest friend. For example, your problem is solved as easily as just using
\usepackage{pdflscape}
and adding this around your table
\begin{landscape}
...
\end{landscape}
These are the options that I could think of so far.
I would always recommend using the pdf format for reports, as you can style it to your liking with latex and the layout will stay the way you want it to be.
However, I also know that for various reasons Word documents are still the main way of reviewing manuscripts in many fields, so I would most likely just go with option 3, mostly because it is a lazy and quick solution and because I usually don't have many documents with tons of giant tables with awkward placement and styling.
Good luck ;-)
Based on Taleb's answer here and some officer package functions, I created a little gist that one can use like this:
---
title: "Example"
author: "Dan Chaltiel"
output:
  word_document:
    pandoc_args:
      '--lua-filter=page-break.lua'
---
I'm in portrait
\endLandscape
I'm in landscape
\endPortrait
I'm in portrait again
With page-break.lua being the file hosted here: https://gist.github.com/DanChaltiel/e7505e62341093cfdc489265963b6c8f
This is far from perfect (for instance it won't work without the last portrait section), but it is quite useful sometimes.

how to remove characters from a font file?

I've downloaded the open source DejaVu font and want to use it as a web font, but even after converting it I get a large file. The website will only be in a few languages (Arabic, French, Amazigh), so I don't need most of the characters.
Is there a way to browse the font file and delete the ranges of Unicode characters that I won't need?
Using FontForge, you may open Element->Font Info->Unicode Ranges. You will see all available ranges and you can select a whole Unicode range with a single click. Then, you can tune your selection and delete using Encoding->Detach & Remove Glyphs.
Also, you can use Edit->Select->Select by Script.
The easiest method I found is to use the pyftsubset tool from FontTools. Here's an example:
$ pyftsubset NotoSans-Regular.ttf \
--unicodes=U+0400-045F,U+0490-0491,U+04B0-04B1,U+2116 \
--output-file=NotoSans-Regular.cyrillic.woff2 \
--flavor=woff2
Note: woff2 output requires Brotli.
I wrote a simple script around it which automates the whole process including generation of a CSS file after splitting the font file. You may find it here: https://github.com/johncf/ttf2web
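For the languages in the question, the --unicodes list could be generated rather than typed by hand. This is just a sketch that builds the same kind of command line as the example above. The block table, the function names and the DejaVuSans.ttf file name are my own assumptions for illustration; it also assumes Amazigh is written in Tifinagh, and French additionally needs œ/Œ (U+0152-0153), which lies outside Latin-1.

```python
from typing import List, Tuple

# Inclusive code point ranges of some Unicode blocks (illustrative selection).
BLOCKS = {
    "basic_latin": (0x0020, 0x007E),
    "latin1_supplement": (0x00A0, 0x00FF),
    "oe_ligatures": (0x0152, 0x0153),  # Œ and œ, used in French
    "arabic": (0x0600, 0x06FF),
    "tifinagh": (0x2D30, 0x2D7F),
}


def unicodes_option(ranges: List[Tuple[int, int]]) -> str:
    """Format ranges the way the example above writes them: U+0400-045F,..."""
    return "--unicodes=" + ",".join(f"U+{lo:04X}-{hi:04X}" for lo, hi in ranges)


def pyftsubset_command(font: str, output: str, blocks: List[str]) -> List[str]:
    """Build an argument list suitable for subprocess.run()."""
    ranges = [BLOCKS[name] for name in blocks]
    return [
        "pyftsubset",
        font,
        unicodes_option(ranges),
        f"--output-file={output}",
        "--flavor=woff2",
    ]


if __name__ == "__main__":
    cmd = pyftsubset_command(
        "DejaVuSans.ttf",  # assumed input file name
        "DejaVuSans.subset.woff2",
        ["basic_latin", "latin1_supplement", "oe_ligatures", "arabic", "tifinagh"],
    )
    print(" ".join(cmd))
```

Passing the printed list to subprocess.run() would run the tool, provided fonttools and Brotli are installed.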