Non-latin symbols in chapter title - cyrillic

I have the following asciidoc source file:
:doctype: book
Перший розділ
-----------------
Chapters can contain sub-sections nested up to three deep.
The following error comes up:
asciidoctor: WARNING: b.txt: line 3: unterminated listing block
M+ 1mn is not a known font.
I run it with the following command:
asciidoctor -r asciidoctor-pdf -b pdf b.txt -a pdf-style=my.yml -a pdf-fontsdir="/usr/share/fonts/truetype/msttcorefonts"
Where my.yml has this content:
extends: default
font:
  catalog:
    Times_New_Roman:
      normal: Times_New_Roman.ttf
      bold: Times_New_Roman_Bold.ttf
      italic: Times_New_Roman_Italic.ttf
      bold_italic: Times_New_Roman_Bold_Italic.ttf
base:
  font-family: Times_New_Roman
Apparently, the problem is with the Ukrainian characters. How should I fix it?
EDIT:
This happens only if the chapter title is in Cyrillic; with Cyrillic in the body it works fine:
:doctype: book
The First Chapter
-----------------
Розділи можуть містити підрозділи.

I think the problem is that you are using 4 extra dashes in the document title.
Перший розділ
-----------------
It should be:
Перший розділ
-------------
Please note that setext (two-line) titles are not recommended; you should use this syntax instead:
= Перший розділ
:doctype: book
Chapters can contain sub-sections nested up to three deep.

How to write in two columns like a table in Linux man pages?

I'm creating a custom man page for my C library, and I'd like to do a thing like this
LIST OF FUNCTIONS                        |<--- terminal window side
                                         |
Function      Description                |
function1     function1's description    |
function2     function2's description    |
              which is longer than the   |<--- here if the text
              first one                  |     overlaps out of the window,
function3     function3's description    |     it auto-aligns to Description
...           ...                        |
How could I do that?
I think you need a combination of https://tldp.org/HOWTO/Man-Page/q3.html and groff (https://www.linuxjournal.com/article/1158):
.SH DESCRIPTION
.B foo
frobnicates the bar library by tweaking internal
symbol tables. By default it parses all baz segments
and rearranges them in reverse order by time for the
.BR xyzzy (1)
linker to find them. The symdef entry is then compressed
using the WBG (Whiz-Bang-Gizmo) algorithm.
All files are processed in the order specified.
There is a command on the linuxjournal site with the following:
$ groff -Tascii -man coffee.man | more
The groff man page starts with the following:
The man macro package for groff is used to produce manual pages
(“man pages”) like the one you are reading.
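One way to get this two-column layout is the .TP (tagged paragraph) macro of the man macro package: the line after .TP is the tag, and the text that follows is set as an indented paragraph that wraps under itself. Below is a minimal Python sketch that generates such a section. The function name render_function_list, the indent of 12 and the sample entries are illustrative assumptions, not something taken from the linked pages:

```python
def render_function_list(entries, indent=12):
    """Return man-page source listing (name, description) pairs.

    Each pair becomes a .TP tagged paragraph: the name is the tag and the
    description is the indented body, which the formatter wraps for us.
    """
    lines = [".SH LIST OF FUNCTIONS"]
    for name, description in entries:
        lines.append(".TP {}".format(indent))  # indent is measured in ens
        lines.append(".B {}".format(name))     # tag line, set in bold
        # A body line starting with '.' or "'" would be read as a request,
        # so protect it with the zero-width \& escape.
        if description.startswith((".", "'")):
            description = "\\&" + description
        lines.append(description)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical entries, mirroring the layout in the question.
    print(render_function_list([
        ("function1", "function1's description"),
        ("function2", "function2's description which is longer than the first one"),
    ]), end="")
```

Saving the output to a file and viewing it with the groff command quoted above should show long descriptions wrapping under the description column rather than under the function name.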

ruamel.yaml.cmd rt breaks lists, if containing long string, or hash

I just noticed that the command-line tool, called like this: "ruamel.yaml.cmd rt --save $YAML_FILE", breaks lists that contain either long strings or hashes:
Example list containing a hash:
Source:
telegraf::inputs:
cpu:
- percpu: true
totalcpu: true
report_active: true
output:
telegraf::inputs:
cpu:
- percpu: true
totalcpu: true
report_active: true
Example list containing a long string:
Source:
rsyslog::config::snippets:
00_forward:
ensure: 'present'
lines:
- 'if $syslogfacility != 1 then {'
- 'action(Name="collector-syslog" Type="omfwd" Target="%{hiera("rsyslog_server")}" Port="514" Action.ResumeInterval="5" Protocol="tcp")'
- '}'
output:
rsyslog::config::snippets:
00_forward:
ensure: present
lines:
- if $syslogfacility != 1 then {
- action(Name="collector-syslog" Type="omfwd" Target="%{hiera("rsyslog_server")}"
Port="514" Action.ResumeInterval="5" Protocol="tcp")
- '}'
I already created a bug report for this, but it was deleted with a comment pointing to https://yaml.readthedocs.io/en/latest/example.html?highlight=indent#output-of-dump-as-a-string.
But I am not sure how this code snippet should help me with the command-line tool.
Or is the tool deprecated, and I have to roll my own?
The automatic detection of the indent seems incorrect for your input, as that input is inconsistent (your mappings are indented 2 positions and your sequences 4 positions with an offset for the block sequence indicator of 2). ruamel.yaml.cmd as on PyPI doesn't support different indentation levels for sequences and mappings (ruamel.yaml didn't when that was written, it does now).
Apart from that, you cannot set the line width for the output in older versions of ruamel.yaml.cmd (before 2020-12-01); those versions use the default of 80 characters for wrapping.
I recommend you upgrade to 0.5.6 and use the command line options:
yaml rt --indent 2 --width 1024 --save <yourfile>
The appropriate repository for ruamel.yaml.cmd is https://sourceforge.net/p/ruamel-yaml-cmd/code/ci/default/tree/ . A bug report on ruamel.yaml, which can only be used from a Python program, should include the minimal source code of the program that reproduces the error; if that is not provided, issues will be removed, as announced on its create issue page.

use perl to extract specific output lines

I'm endeavoring to create a system to generalize rules from input text. I'm using reVerb to create my initial set of rules. For instance, I run the following command[*]:
$ echo "Bananas are an excellent source of potassium." | ./reverb -q | tr '\t' '\n' | cat -n
which generates output of the form:
1 stdin
2 1
3 Bananas
4 are an excellent source of
5 potassium
6 0
7 1
8 1
9 6
10 6
11 7
12 0.9999999997341693
13 Bananas are an excellent source of potassium .
14 NNS VBP DT JJ NN IN NN .
15 B-NP B-VP B-NP I-NP I-NP I-NP I-NP O
16 bananas
17 be source of
18 potassium
I'm currently piping the output to a file, which includes the preceding white space and numbers as depicted above.
What I'm really after is just the simple rule at the end, i.e. lines 16, 17 & 18. I've been trying to create a script to extract just that component and put it into a new file in the form of a Prolog clause, i.e. be source of(bananas, potassium).
Is that feasible? Can Prolog rules contain white space like that?
I think I'm locked into getting all that output from reVerb so, what would be the best way to extract the desirable component? With a Perl script? Or maybe sed?
*Later I plan to replace this with a larger input file as opposed to just single sentences.
This seems wasteful. Why not leave the tabs as they are, and use:
$ echo "Bananas are an excellent source of potassium." \
| ./reverb -q | cut --fields=16,17,18
And yes, you can have rules like this in Prolog. See the answer by @mat. You need to know a bit of Prolog before you move on, I guess.
It is easier, however, to just make the string a valid name for a predicate:
be_source_of with underscores instead of spaces
or 'be source of' with spaces, and enclosed in single quotes.
You can probably use awk to do what you want with the three fields. See for example the printf command in awk. Or, you can parse it again from Prolog directly. Both are beyond the scope of your current question, I feel.
sed -n ':cycle
$!{N
4,$D
b cycle
}
s/\(.*\)\n\(.*\)\n\(.*\)/\2 (\1,\3)/p' YourFile
If the numbers are part of the output and not just there for reference, replace the last sed action with
s/^ *[0-9]\{1,\} \{1,\}\(.*\)\n *[0-9]\{1,\} \{1,\}\(.*\)\n *[0-9]\{1,\} \{1,\}\(.*\)/\2 (\1,\3)/p
assuming the last 3 lines are the source of your "rules".
Regarding the Prolog part of the question:
Yes, Prolog facts can contain whitespace like this, with suitable operator declarations present.
For example:
:- op(700, fx, be).
:- op(650, fx, source).
:- op(600, fx, of).
Example query and its result, to let you see the shape of terms that are created with this syntax:
?- write_canonical(be source of(a, b)).
be(source(of(a,b))).
Therefore, with these operator declarations, a fact like:
be source of(a, b).
is exactly the same as stating:
be(source(of(a,b))).
Depending on use cases and other definitions, it may even be an advantage to create this kind of facts (i.e., facts of the form be/1 instead of source_of/2). If this is the only kind of facts you need, you can simply write:
source_of(a, b).
This creates no redundant wrappers and is easier to use.
Or, as Boris suggested, you can use single quotes as in 'be source of'/2.
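The extraction and quoting steps discussed in these answers can also be combined in one small script. The following is only a sketch under assumptions: it expects reVerb's tab-separated output with the normalized arguments and relation in fields 16 to 18 (as in the numbered listing in the question), and the names quote_atom and reverb_line_to_fact are made up for illustration. It always single-quotes the atoms, which is valid Prolog even when the text contains spaces:

```python
import sys


def quote_atom(text):
    """Wrap text in single quotes, escaping backslashes and quotes for Prolog."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def reverb_line_to_fact(line):
    """Turn one tab-separated reVerb line into a Prolog fact, or None if too short."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 18:
        return None
    # Fields 16, 17, 18 (1-based): normalized arg1, relation, arg2.
    arg1, relation, arg2 = fields[15:18]
    return "{}({}, {}).".format(
        quote_atom(relation), quote_atom(arg1), quote_atom(arg2)
    )


if __name__ == "__main__" and not sys.stdin.isatty():
    # Intended use: ./reverb -q < input.txt | python this_script.py > rules.pl
    for raw_line in sys.stdin:
        fact = reverb_line_to_fact(raw_line)
        if fact is not None:
            print(fact)
```

For the banana sentence this yields the fact 'be source of'('bananas', 'potassium')., which can be consulted without any operator declarations.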

Find Duplicate Function names in different files

I have been merging all of the source-code files used by various developers/CAD drafters for the past 15 or so years. It appears that everyone worked off the same code base until about 7 years ago, when everyone seems to have made a local copy of all the files and used/edited them locally.
I have successfully/painfully merged all of their files with the same names back together. However, I am finding that sometimes, files with different names contain functions with the same names and parameters. Tools that are expecting one implementation of a function may end up calling a different one depending on which files were loaded when.
Is there a simple way to search all of the files for repeated function names?
For Example, a function looks like this:
(defun MyInStr (SearchIn SearchFor)
...
)
How could I search all files for (defun MyInStr (SearchIn SearchFor)?
I would suggest using ctags to generate the TAGS file, then searching it for duplicate lines:
$ ctags -R
$ sort TAGS -o - | uniq -c | grep -v '^ *1 '
The above will produce output like this:
...
3 defun MyInStr (SearchIn SearchFor)
...
which will tell you that MyInStr is re-defined 3 times in the codebase with the identical signature.
You can also extract just the function name using sed, or do more complicated processing of the TAGS file with perl, lisp, python or any other scripting tool.
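As an illustration of the scripting route, here is a rough Python sketch that scans the files directly instead of going through a TAGS file. It is not part of the ctags approach above. The *.lsp file pattern, the case-insensitive comparison of names and the function name find_duplicate_defuns are assumptions for the example, and it does not try to skip commented-out code:

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches "(defun NAME" and captures NAME; whitespace after "(" is tolerated.
DEFUN_RE = re.compile(r"\(\s*defun\s+([^\s()]+)", re.IGNORECASE)


def find_duplicate_defuns(root, pattern="*.lsp"):
    """Map each function name defined more than once to its (file, line) list."""
    locations = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in DEFUN_RE.finditer(line):
                # Assumption: names are compared case-insensitively.
                locations[match.group(1).lower()].append((str(path), lineno))
    return {name: places for name, places in locations.items() if len(places) > 1}
```

Calling find_duplicate_defuns on the top-level source directory returns a dictionary whose keys are the repeated names and whose values list every file and line where each one is defined.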

Replace matches of one regex expression with matches from another, across two files

I am currently helping a friend reorganise several hundred images on a database driven website. I have generated a list of the new, reorganised image paths offline and would like to replace each matching image reference in the sql export of the database with the new paths.
EDIT: Here is an example of what I am trying to achieve
The new_paths_list.txt is a file that I generated using a batch script after I had organised all of the existing images into folders. Prior to this all of the images were in just a few folders. A sample of this generated list might be:
image/data/product_photos/telephones/snom/snom_xyz.jpg
image/data/product_photos/telephones/gigaset/giga_xyz.jpg
A sample of my_exported_db.sql (the database exported from the website) might be:
...
,(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),
...
The result I want is my_exported_db.sql to be:
...
,(110,32,'data/product_photos/telephones/snom/snom_xyz.jpg',3),(213,50,'data/product_photos/telephones/gigaset/giga_xyz.jpg',0),
...
Some pseudo code to illustrate:
1/ Find the first image name in my_exported_db.sql, such as 'snom_xyz.jpg'.
2/ Find the same image name in new_paths_list.txt
3/ If it is present, copy the whole line (the path and filename)
4/ Replace the whole path of this image in my_exported_db.sql with the copied line
5/ Repeat for all other image names in my_exported_db.sql
A regular expression that appears to match image names is:
([^)''"/])+\.(?:jpg|jpeg|gif|png)
and one to match image names, complete with path (for relative or absolute) is:
\bdata[^)''"\s]+\.(?:jpg|jpeg|gif|png)
I have looked around and have seen that Sed or Awk may be capable of doing this, but some pointers would be greatly appreciated. I understand that this will only work accurately if there are no duplicated filenames.
You can use sed to convert new_paths_list.txt into a set of sed replacement commands:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt > rules.sed
The file rules.sed will look like this:
s#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#
s#data/giga_xyz.jpg#image/data/product_photos/telephones/gigaset/giga_xyz.jpg#
Then use sed again to translate my_exported_db.sql:
sed -i -f rules.sed my_exported_db.sql
I think in some shells it's possible to combine these steps and do without rules.sed:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt | sed -i -f - my_exported_db.sql
but I'm not certain about that.
EDIT:
If the images are in several directories under data/, make this change:
sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" new_paths_list.txt > rules.sed
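For comparison, the pseudocode from the question can be sketched in Python as a lookup keyed on the file name. This is an illustration under assumptions rather than a drop-in replacement for the sed rules: it reuses the path regex from the question, strips the leading image/ so the result matches the desired output shown there, relies on file names being unique, and the names build_lookup and rewrite_sql are invented for the example:

```python
import re
from pathlib import PurePosixPath

# Path regex from the question: "data", then anything up to an image extension.
IMAGE_PATH_RE = re.compile(r"\bdata[^)'\"\s]+\.(?:jpg|jpeg|gif|png)")


def build_lookup(new_paths):
    """Map each file name to its new path, without the leading 'image/'."""
    lookup = {}
    for line in new_paths:
        path = line.strip()
        if not path:
            continue
        stored = path[len("image/"):] if path.startswith("image/") else path
        lookup[PurePosixPath(path).name] = stored
    return lookup


def rewrite_sql(sql, lookup):
    """Replace every matched image path whose file name is in the lookup."""
    def replace(match):
        name = PurePosixPath(match.group(0)).name
        # Paths with no entry in the list are left untouched.
        return lookup.get(name, match.group(0))
    return IMAGE_PATH_RE.sub(replace, sql)
```

Reading new_paths_list.txt line by line into build_lookup and passing the contents of my_exported_db.sql through rewrite_sql would then give the rewritten export, which can be written to a new file so the original stays intact.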