Convert Doxygen XML output to DITA - doxygen

Is there a way to take in doxygen output and convert it to dita files, which can then be rendered by e.g. Oxygen XML Author?

Related

Having problems opening DITA files in OxygenXML which contain special characters

I am having problems opening files which contain special characters like é, è, ë, ê, à, á, ö, etc. The error message I get from OxygenXML is:
File encoding (UTF8) does not support all characters from the current file.
To ignore these errors or to replace invalid characters follow the link below to change the "Encoding errors handling" option value from REPORT to IGNORE or REPLACE.
The strange thing is: when I alter the file (by swapping the 'ó' for an 'o', for instance), I can import the file both in OxygenXML and in FontoXML. Afterwards I can correct it again and save the file. But I don't see a difference between the original file and the altered file.
This is the original file:
<p id="id-9f3a1788-a751-4f48-ed9c-9e19447ad3b0">Ze is zó zenuwachtig, dat ze bijna aan de ... moet .</p>
And this is the saved corrected file (from FontoXML, in this case - just to show the added instructions):
<p id="id-9f3a1788-a751-4f48-ed9c-9e19447ad3b0">Ze is
z<?fontoxml-change-addition-start author-id="erik.verhaar" change-id="6f6bb382-3d43-4c5b-b35f-f857d729cf22" timestamp="1627473671530"?>ó<?fontoxml-change-addition-end change-id="6f6bb382-3d43-4c5b-b35f-f857d729cf22"?><?fontoxml-change-deletion author-id="erik.verhaar" change-id="0296c77c-863b-421f-bf5c-c0901c7a2751" text="ó" timestamp="1627473669483"?>
zenuwachtig, dat ze bijna aan de ... moet .</p>
What is the difference between the original ó and the corrected one? And how can I change my original files so they can be imported in OxygenXML?
Thanks!!
Text files (XML, for example) are saved on disk as bytes, but they are edited and presented as characters. An encoding takes care of converting bytes to characters (sometimes multiple bytes map to a single character) when the document is opened, and of converting characters back to bytes when the document is saved.
There are many encodings, but with the most popular ones (like UTF-8), characters in the 0-127 ASCII range (such as a-z and A-Z) are saved as a single byte. Characters outside that range, for example e-acute (é), usually get saved as multiple bytes, depending on the encoding used for saving.
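To make this concrete, here is a small Python sketch (illustrative only; the byte values shown are what the standard codecs produce):
print("a".encode("utf-8"))    # b'a' - one byte for an ASCII character
print("é".encode("utf-8"))    # b'\xc3\xa9' - two bytes in UTF-8
print("é".encode("cp1250"))   # b'\xe9' - a single byte in the CP1250 code page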
When an XML document is opened, Oxygen attempts to work out which encoding to use for reading it. If the XML document has a heading (XML declaration) like this:
<?xml version='1.0' encoding='UTF-8'?>
then Oxygen uses the encoding specified in the heading. If the XML document lacks the heading, Oxygen falls back to UTF-8. Basically, Oxygen implements the XML specification when it comes to detecting the encoding of an XML file:
https://www.w3.org/TR/xml/#sec-guessing
In your case Oxygen detected the encoding as UTF-8 and started to use UTF-8 to convert bytes to characters. It then encountered a sequence of bytes that is not valid UTF-8. Oxygen does not continue loading the file, because in such cases you could end up with corrupt content when saving it back.
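This failure is easy to reproduce in Python (a sketch; it assumes the offending bytes came from a single-byte code page such as CP1250):
raw = "é".encode("cp1250")  # b'\xe9', a single byte that is not valid UTF-8
raw.decode("utf-8")         # raises UnicodeDecodeError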
In my opinion, the other editing tool you used to create the XML files was not XML-aware: it did not actually save the XML as UTF-8, even though the heading in the XML document said so.
We do not actually know which encoding that other editing tool used to save the XML. One thing you could try would be to reopen the XML document in that tool and change its encoding declaration from:
<?xml version='1.0' encoding='UTF-8'?>
to:
<?xml version='1.0' encoding='CP1250'?>
because I suspect that the other editing tool actually saved the XML document using the default platform encoding, which on Windows is usually CP1250.
Then save the XML document in the other editing tool and try to re-open it in Oxygen. If it works, change the encoding declaration back to UTF-8 and save the document in Oxygen, so that it is properly written using the UTF-8 encoding.
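If the other tool cannot do that round trip, the repair can also be scripted. A minimal Python sketch, assuming the real encoding is indeed CP1250 and using a placeholder file name:
# Decode the bytes using the suspected real encoding...
with open("topic.dita", encoding="cp1250") as f:
    text = f.read()
# ...and write them back as genuine UTF-8, so the content finally
# matches the encoding="UTF-8" declaration in the heading.
with open("topic.dita", "w", encoding="utf-8") as f:
    f.write(text)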
This older set of slides I made about XML encoding might also be useful to you:
https://www.oxygenxml.com/events/2018/large_xml_documents.pdf

How to programmatically get the fragments called in an RTF template?

I need to programmatically find the fragments that are called by each RTF template.
So, for example in the figure, I would need to get the "GlossaryTermsAcronyms" fragment for the H2_terms_acronyms template.
I can't seem to find any query or script solution to do this. But this should be possible, right?
Unfortunately that is (almost) impossible.
The information is stored in the t_documents.bincontent column, as binary-encoded RTF.
Somewhere in that RTF there should be a reference to the template fragments that are used.
If you can figure out how to decode the bincontent to get to the actual RTF code of your template, you might have a chance.
Binary fields in EA are usually stored as a zipped text file.
If the field is included in an XML file (or in an XML string in the database), it will additionally be base64 encoded.
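If you want to experiment anyway, here is a rough Python sketch of the decoding step (everything in it is an assumption: how the blob was exported, the zlib layer, and the naive text search at the end):
import zlib

# 'raw' stands for the t_documents.bincontent blob, exported beforehand
# with whatever database access you use for your EA repository.
with open("bincontent.bin", "rb") as f:
    raw = f.read()

try:
    # EA binary fields are usually zlib-compressed text.
    rtf = zlib.decompress(raw).decode("latin-1", errors="replace")
except zlib.error:
    # Fall back to treating the blob as uncompressed RTF.
    rtf = raw.decode("latin-1", errors="replace")

# Naive check: does a known fragment name appear in the decoded RTF?
print("GlossaryTermsAcronyms" in rtf)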

How to convert a TextMate Grammar (XML flavor) to either YAML or JSON flavor

TextMate grammars (.tmLanguage files) are sometimes expressed in XML format.
I would like to convert them to a more readable format (i.e. JSON or YAML) to integrate into a VS Code Syntax Highlighting Extension.
To clarify what I mean: the same grammar can be expressed in XML format, in an equivalent YAML format, or in JSON format.
I could write a script in Python to do that, but it would save me some time if such a converter already exists.
Thanks
The TextMate Languages extension has some commands built in for this; they are listed in its readme.
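If you would still rather script it yourself, the XML flavor is an Apple property list, so Python's standard library can parse it. A minimal sketch (file names are placeholders; the YAML step assumes PyYAML is installed):
import json
import plistlib

import yaml  # PyYAML, only needed for the YAML output

with open("grammar.tmLanguage", "rb") as f:
    grammar = plistlib.load(f)  # parses the XML plist into dicts and lists

with open("grammar.tmLanguage.json", "w") as f:
    json.dump(grammar, f, indent=2)

with open("grammar.tmLanguage.yaml", "w") as f:
    yaml.safe_dump(grammar, f, sort_keys=False)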

Starting format for text file to be converted to other formats

I need to write a document containing images, text, hyperlinks... and then convert it to PDF and DOC (and in the future it may need to be converted to more file formats).
What's the best "starting format" for this document?
Doc or Docx might be the best file format for creating a document containing images, text, hyperlinks, and many other elements. Once created, it's easy to convert files in .doc/.docx format into other file formats, such as image, PDF, or HTML, by using OpenXML or even a commercial library like Spire.Doc.
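For illustration, the conversion step with the Python edition of Spire.Doc might look roughly like this (the class and method names follow the vendor's published samples, but treat them as assumptions and check the current documentation):
from spire.doc import Document, FileFormat  # assumed import path

doc = Document()
doc.LoadFromFile("manual.docx")               # the authored .docx source
doc.SaveToFile("manual.pdf", FileFormat.PDF)  # export the same content as PDF
doc.Close()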

AsciiDoc to PDF fop link issue

I have some asciidoc source that I am converting to chunked HTML and PDF for documentation. The document contains external links, as follows:
ASCIIDOC source:
https://some-url-here.tld[Link Text]
AsciiDoc is correctly generating the following DocBook XML representation:
<simpara>
<ulink url="https://some-url-here.tld">Link Text</ulink>
</simpara>
xsltproc is translating this XML to .fo as follows:
<fo:block space-before.optimum="1em" space-before.minimum="0.8em" space-before.maximum="1.2em">
<fo:basic-link external-destination="url(https://some-url-here.tld)">Link Text</fo:basic-link>
<fo:inline hyphenate="false">
[<fo:basic-link external-destination="url(https://https://some-url-here.tld)">https://https://some-url-here.tld</fo:basic-link>]
</fo:inline>
</fo:block>
Which renders like this in the PDF:
Link Text [ https://some-url-here.tld ]
Rather than:
Link Text
which is a link to https://some-url-here.tld
I am using AsciiDoc 8.6.9 with the DocBook 1.7.0 XSL stylesheets.
DocBook-XSL has a parameter called ulink.show, with a default value of 1 (true). If you change the parameter value to 0, you will get the output you want.
Reference: http://docbook.sourceforge.net/release/xsl/current/doc/fo/ulink.show.html
Btw, DocBook-XSL 1.70 is a rather old version, but the parameter is available.
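For example, when running xsltproc directly, the parameter can be set on the command line (stylesheet path and file names are placeholders):
xsltproc --param ulink.show 0 /path/to/docbook-xsl/fo/docbook.xsl article.xml > article.fo
Alternatively, set <xsl:param name="ulink.show" select="0"/> once in a customization layer.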