Pandoc: Jupyter (.ipynb) to Word (.docx) conversion gives incorrect results

Jupyter notebook (ipynb) to Word-format document (.docx) conversion just is not working correctly. I have tried several approaches using jupyter nbconvert, pandoc, and commercial document format converters. So far, none have produced appropriate results. I have to believe there exists some way for pandoc to do the conversion correctly. Thank you so much for your assistance on this.
The Word-format document should preserve these aspects of the Jupyter notebook:
Headings numbered
Code cells hidden
LaTeX math expressions presented correctly
Tables with images presented correctly
Data.frame presented in same format as in Jupyter (pretty row-banded table)
Kable data.frame presented in same format as in Jupyter (pretty row-banded table)
Attached are a test Jupyter notebook and examples of some incorrect results.
PrintTest.ipynb is the Jupyter notebook to be converted to Word format.
PrintTest.jpg is a screenshot of the Jupyter notebook.
PrintTest_1.html (from Jupyter, File > Download as > HTML Embedded (.html)):
No heading numbers
Code cells exposed
PrintTest_2.html (jupyter nbconvert PrintTest.ipynb --to=html --template=toc2 --output PrintTest_2.html):
Sidebar table of contents exposed
PrintTest_3.docx (pandoc PrintTest.ipynb -o PrintTest_3.docx):
LaTeX math expression presented as plain text
Table with image not presented
Code cells exposed
Data.frame presented as plain text
Kable data.frame not presented
PrintTest_4.docx (pandoc PrintTest_2.html -o PrintTest_4.docx):
LaTeX math expression presented as plain text
Table with image not centered
Data.frame presented as plain text
Kable data.frame presented as plain text
PrintTest_5.docx (pandoc PrintTest.ipynb --mathjax -o PrintTest_5.docx):
Same as PrintTest_3.docx
PrintTest_6.docx (pandoc PrintTest_2.html --mathjax -o PrintTest_6.docx):
Same as PrintTest_4.docx

This post, by a team working on writing in Jupyter and exporting to Word, might interest you.

Pandoc has a -N (--number-sections) flag that numbers the headings for some output formats (docx, HTML, ...).
Example:
pandoc jupyter_file.ipynb -s -N -o new_word_file.docx
Add --toc if you also want to include a table of contents:
pandoc jupyter_file.ipynb -s -N --toc -o new_word_file.docx
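When converting many notebooks from a script, the same flags can be assembled programmatically. A minimal Python sketch, assuming pandoc is installed and on PATH; build_pandoc_cmd and convert are hypothetical helper names, not part of pandoc:

```python
import shutil
import subprocess


def build_pandoc_cmd(src, dst, number_sections=True, toc=False):
    """Assemble the pandoc argument list: -s (standalone),
    -N (--number-sections) to number headings, optional --toc."""
    cmd = ["pandoc", src, "-s"]
    if number_sections:
        cmd.append("-N")
    if toc:
        cmd.append("--toc")
    cmd += ["-o", dst]
    return cmd


def convert(src, dst, **opts):
    # Fail early with a clear message if pandoc is not installed.
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    # check=True raises CalledProcessError if pandoc exits non-zero.
    subprocess.run(build_pandoc_cmd(src, dst, **opts), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert() to actually run pandoc.
    print(" ".join(build_pandoc_cmd("jupyter_file.ipynb", "new_word_file.docx", toc=True)))
```

Note that -N only numbers the headings; it does not hide code cells or change how math and tables are rendered.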

Related

Is it possible for Pandoc to detect cross-references when converting from Word?

I have a minimal Word (.docx) document that contains a figure and a cross-reference to it (see image below; Figure 1: Some figure). When I Ctrl+click on that cross-reference, it takes me back to the image.
But when I convert this document to markdown or native format using pandoc -s file.docx -t markdown -o file.md or pandoc -s file.docx -t native -o file.txt, the cross-reference is parsed as regular text.
I am wondering whether Pandoc can detect and parse such cross-references in a .docx document, using a filter or some other method. Note that cross-references to tables, chapters, and subchapters are also relevant for me.
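As a starting point for a filter or a pre-processing script, it helps to see how Word stores such a cross-reference. In WordprocessingML the target (for example a caption) is typically a bookmark (w:bookmarkStart, often named _Ref...), and the cross-reference is a REF field that names that bookmark. A minimal stdlib sketch that lists both from word/document.xml; find_cross_refs and find_cross_refs_in_docx are hypothetical helpers, and the sketch assumes each field instruction sits in a single w:instrText element (Word can split one instruction across several runs):

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}local form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# "REF <bookmark>" field; \b keeps PAGEREF/NOTEREF from matching.
REF_RE = re.compile(r"\bREF\s+(\S+)")


def find_cross_refs(document_xml):
    """Return (bookmark names, REF field targets) from document.xml content."""
    root = ET.fromstring(document_xml)
    bookmarks = {el.get(W + "name") for el in root.iter(W + "bookmarkStart")}
    # Complex fields keep the instruction in w:instrText;
    # simple fields keep it in the w:instr attribute of w:fldSimple.
    instructions = [el.text or "" for el in root.iter(W + "instrText")]
    instructions += [el.get(W + "instr", "") for el in root.iter(W + "fldSimple")]
    targets = []
    for instr in instructions:
        match = REF_RE.search(instr)
        if match:
            targets.append(match.group(1))
    return bookmarks, targets


def find_cross_refs_in_docx(path):
    # A .docx file is a zip archive; the body lives in word/document.xml.
    with zipfile.ZipFile(path) as zf:
        return find_cross_refs(zf.read("word/document.xml"))
```

Matching each target against the bookmark set tells you which figure, table, or heading a cross-reference points to; turning that into pandoc links would still need a custom filter.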

Remove cells from Jupyter Notebook with nbconvert

The recommendations in How to hide one specific cell (input or output) in IPython Notebook? don't work.
On Windows I run the following:
jupyter nbconvert a.ipynb --TagRemovePreprocessor.remove_cell_tags="{'remove_cell'}"
but get this error:
traitlets.traitlets.TraitError: The 'remove_cell_tags' trait of a TagRemovePreprocessor instance must be a set, but a value of type 'unicode' (i.e. u"{'remove_cell'}") was specified.
I also tried '{"remove_cell"}'
I am using nbconvert 5.4.0
Any ideas how to do this?
You need to enable the TagRemovePreprocessor before you call it.
The code below shows how to enable it and how to enclose your tags in a list, so you can exclude more than one tag if you wish. To exclude a single tag, just put one element in the list, e.g. ['remove_cell'].
The parameter --to html is not required if you are converting to HTML (HTML is the default). If you want to convert to Python, for example, change --to html to --to python.
jupyter nbconvert a.ipynb --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['remove_cell', 'other_tag_to_remove']" --to html
Note that the TagRemovePreprocessor is only available in nbconvert 5.3 and above: https://nbconvert.readthedocs.io/en/latest/changelog.html?highlight=TagRemovePreprocessor
It needs some extra quoting to work:
--TagRemovePreprocessor.remove_cell_tags={\"remove_cell\"}
However, beware of an ongoing issue with notebook-to-notebook conversion: it seems that in this case preprocessors, including tag removal, do not run. See more in this SO question:
jupyter nbconvert --to notebook not excluding raw cells
Update: not tested on Windows, only on Linux.
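If the command-line quoting keeps fighting you (especially in the Windows shell), another option is to drop the tagged cells yourself before calling nbconvert or pandoc. In nbformat v4 files, cell tags are stored as a list under each cell's metadata "tags" key. A minimal stdlib sketch; remove_tagged_cells and strip_file are hypothetical helper names:

```python
import json


def remove_tagged_cells(nb, tags=("remove_cell",)):
    """Return a copy of notebook dict nb without cells carrying any of tags."""
    drop = set(tags)
    kept = [
        cell
        for cell in nb.get("cells", [])
        # Cells without metadata or tags are always kept.
        if not drop.intersection(cell.get("metadata", {}).get("tags", []))
    ]
    # Shallow copy: the input notebook's cell list is left untouched.
    return {**nb, "cells": kept}


def strip_file(src, dst, tags=("remove_cell",)):
    """Read an .ipynb file, remove tagged cells, write the result to dst."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(remove_tagged_cells(nb, tags), f, indent=1, ensure_ascii=False)
```

For example, strip_file("a.ipynb", "a_clean.ipynb") writes a copy without the cells tagged remove_cell, which can then be passed to jupyter nbconvert or pandoc.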

refman.rtf fails to use equation from '*.md' in doxygen

I use '*.md' files to generate '(index).html' and '(refman*).rtf' documentation with doxygen 1.8.14.
A mathematical equation in '*.md' renders correctly in the HTML output but not in 'refman.rtf'. Other parts, such as paragraphs, convert correctly from *.md to RTF.
I suspect the RTF output does not recognize the equation part of the *.md document.
Does doxygen's RTF generation read the *.md files?
Do I need to change any tag to make *.md work with RTF output?
Formulas do not work in RTF output, not only for Markdown but also for "normal" doxygen input.
From the documentation:
Doxygen allows you to put LaTeX formulas in the output (this works
only for the HTML and LaTeX output, not for the RTF nor for the man
page output).
A workaround, at the moment only for non-inline formulas, is the following:
Create an image of the formula, e.g. in a dummy doxygen run without MathJax (USE_MATHJAX = NO); this produces an image with a name like 'form_0.png'.
In the source, place an \if construct like:
\if rtf_run
\image rtf form_0.png
\else
\f... with the formula
\endif
One now has to run doxygen twice:
once for the output without rtf, i.e. without setting rtf_run in ENABLED_SECTIONS
once for rtf output by setting rtf_run in ENABLED_SECTIONS
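The two runs can be scripted without maintaining two Doxyfiles: doxygen reads its configuration from standard input when given - as the config file name, and when a tag is set more than once the later value is used. A minimal Python sketch under those assumptions, also assuming the base Doxyfile does not itself list rtf_run in ENABLED_SECTIONS; doxyfile_with_overrides, run_doxygen, and run_both are hypothetical names:

```python
import subprocess


def doxyfile_with_overrides(base_config, overrides):
    """Append KEY = VALUE lines after the base config so they take precedence."""
    lines = [base_config.rstrip("\n")]
    lines += [f"{key} = {value}" for key, value in overrides.items()]
    return "\n".join(lines) + "\n"


def run_doxygen(config_text):
    # "doxygen -" reads the configuration from standard input.
    subprocess.run(["doxygen", "-"], input=config_text, text=True, check=True)


def run_both(doxyfile_path="Doxyfile"):
    with open(doxyfile_path, encoding="utf-8") as f:
        base = f.read()
    # Run 1: regular output; rtf_run is not enabled, so the \else branch
    # (the real formula) is used.
    run_doxygen(doxyfile_with_overrides(base, {"GENERATE_RTF": "NO"}))
    # Run 2: RTF only, with rtf_run enabled so the \image rtf branch is used.
    run_doxygen(doxyfile_with_overrides(base, {
        "ENABLED_SECTIONS": "rtf_run",
        "GENERATE_RTF": "YES",
        "GENERATE_HTML": "NO",
        "GENERATE_LATEX": "NO",
    }))
```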
EDIT June 5, 2018: I've just pushed a proposed patch as GitHub pull request 756. With it, formulas are rendered as PNG images and included in the RTF documentation.
EDIT 2018/06/10: The pull request has been merged into the master version on GitHub.

asciidoc: is there a way to create an anchor that will be visible in libreoffice writer?

TL;DR
What is the correct way to create an anchor in DocBook, and is there a way to make the anchor visible in Writer?
Background
I am trying to split up documentation that was previously in single OpenOffice documents into smaller asciidoc documents, which are both included in the main OpenOffice document and converted to HTML, PDF, or both.
I have this mostly working. I use asciidoctor to create HTML, asciidoctor-pdf to create PDF, and a combination of asciidoctor and pandoc to create .odt files. I also tried the Python implementation of asciidoc but found its interface less usable.
Round tripping between asciidoc and odt is obviously not possible. This is sort of a fusion where the master document is word processed but pieces of content that can be produced independently (think man pages - in fact that is one of several use cases) are included.
asciidoc to html:
asciidoctor -b html5 foo.adoc -o foo.html
asciidoc to pdf:
asciidoctor-pdf -b pdf foo.adoc -o foo.pdf
asciidoc to odt
asciidoctor -b docbook foo.adoc -o foo.docbook
pandoc --base-header-level=3 -V date:"" -V title:"" -f docbook foo.docbook -o foo.odt
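The two-step chain can be wrapped in a small script so that it stops at the first failing step. A minimal sketch assuming asciidoctor and pandoc are on PATH; adoc_to_odt_cmds and adoc_to_odt are hypothetical names. Newer pandoc releases deprecate --base-header-level in favour of --shift-heading-level-by (a base level of 3 corresponds to a shift of 2), so check which one your version accepts:

```python
import subprocess


def adoc_to_odt_cmds(stem, base_header_level=3):
    """Return the two commands for <stem>.adoc -> <stem>.docbook -> <stem>.odt."""
    docbook = f"{stem}.docbook"
    return [
        ["asciidoctor", "-b", "docbook", f"{stem}.adoc", "-o", docbook],
        [
            "pandoc",
            f"--base-header-level={base_header_level}",
            # The shell form -V date:"" passes the literal argument "date:".
            "-V", "date:",
            "-V", "title:",
            "-f", "docbook", docbook,
            "-o", f"{stem}.odt",
        ],
    ]


def adoc_to_odt(stem, base_header_level=3):
    for cmd in adoc_to_odt_cmds(stem, base_header_level):
        subprocess.run(cmd, check=True)  # raises on the first failing step
```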
As an extra complication, with pandoc I have to blank out the date and title and set the heading level appropriate for where the section will be inserted.
I insert the resulting .odt into the main document using "insert section" in OpenOffice.
Note that the main document is not a master document as I could not find a way of creating a master document without also automatically splitting the file on h1 boundaries.
I have two main problems to resolve with this setup. I would like the headings in the asciidoc document to be usable as cross-reference targets and also to create entries for them in the alphabetical index (actually, the first heading would be sufficient). Is there a way to do this?
Index markers in asciidoc do not result in entries in .odt file being created.
I am able to cross reference content in the inserted section using "insert reference/heading" and referencing the uniquely named header. However, whenever I use "update all" these cross references are invalidated. They are shown as "Error: Reference source not found".
[On a separate note I would also like a way to find broken cross references automatically]
I am currently using libreoffice - Version: 4.3.7.2
I am not averse to switching versions or flavours (i.e. Apache) if one behaves better than the other.
I'm not sure whether the answer lies in the asciidoc or the DocBook part of the chain. I would accept an answer that automatically inserts an index entry at the start of the inserted section (top of the .adoc/DocBook file).
I am also open to changing my toolchain to something that will work.
For example I tried the asciidoc-odt backend and fell foul of https://github.com/dagwieers/asciidoc-odf/issues/47 which does not inspire confidence.
Using asciidoc-odt I avoid the need to create an intermediate docbook file. However, I still can't get the anchor to appear.
I can get a macro to create an anchor but at present I haven't figured out how to run the macro from the command line.
To create an anchor in DocBook, make an inline anchor in the .adoc file. For example, giving this to asciidoctor:
[[X1]]Section1
---------------
produced this:
<title>
<anchor xml:id="X1" xreflabel="[X1]"/>
Section1
</title>
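To confirm that asciidoctor actually emitted the anchor, and as one way to catch broken references before the pandoc step, the DocBook file can be checked with the standard library. A minimal sketch; anchor_ids and broken_xrefs are hypothetical helpers, and they only inspect the intermediate DocBook, not the cross-references inside Writer:

```python
import xml.etree.ElementTree as ET

# xml:id is in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    # Strip a "{namespace}" prefix so plain and DocBook 5 namespaced tags both match.
    return tag.rsplit("}", 1)[-1]


def anchor_ids(docbook_xml):
    """Return the xml:id of every <anchor> element, in document order."""
    root = ET.fromstring(docbook_xml)
    return [el.get(XML_ID) for el in root.iter() if local_name(el.tag) == "anchor"]


def broken_xrefs(docbook_xml):
    """Return linkend values of <xref>/<link> elements with no matching xml:id."""
    root = ET.fromstring(docbook_xml)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    return [
        el.get("linkend")
        for el in root.iter()
        if local_name(el.tag) in ("xref", "link")
        and el.get("linkend") is not None
        and el.get("linkend") not in ids
    ]
```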
Conversely, putting this on separate lines did not create an anchor tag in my test:
[[X1]]
Section 1
Now for some bad news. From the Pandoc User's Guide:
Internal links are currently supported for HTML formats (including HTML slide shows and EPUB), LaTeX, and ConTeXt.
I interpret this to mean that currently, Pandoc does not create internal links in Writer. When I tried it, the link was ignored.
Note: It looks like I did not answer all of your questions. If you want to ask more about LibreOffice cross references and headings (the big bold paragraph towards the end of the question), maybe you could make a separate question just for that part.

display math on nbviewer?

I wrote an IPython notebook that displays its equations correctly on my local machine. However, when I paste the ipynb file into a Gist and view it with nbviewer, some math equations disappear. What causes the problem, and is there a way to fix it? The ipynb has the following LaTeX code in a Markdown cell:
\begin{align}
F(P)=f_L(P)+f_G(P_{i,j})+f_{elec}(P,\phi_{ext},\phi_{int})\qquad (1)
\end{align}
The problem is that nbviewer uses nbconvert to convert the ipynb to HTML. nbconvert in turn uses pandoc for the conversion, and pandoc strips raw LaTeX (the align environment in your case) when converting Markdown to HTML.
You can try wrapping the raw LaTeX in $ delimiters so that pandoc treats it as math; however, not all constructs are supported and converted (see the GitHub issue for more details).
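Applying that suggestion by hand to many cells is tedious, so here is a minimal stdlib sketch that wraps bare align/align* environments in $$ in every Markdown cell of a loaded notebook dict. wrap_align and wrap_notebook are hypothetical names, and whether a wrapped environment then renders still depends on the nbconvert/pandoc versions involved:

```python
import re

# A bare \begin{align}...\end{align} (or align*) block that is not already
# directly preceded by "$". DOTALL lets the body span several lines.
ALIGN_RE = re.compile(r"(?<!\$)(\\begin\{(align\*?)\}.*?\\end\{\2\})", re.DOTALL)


def wrap_align(markdown):
    """Wrap each bare align environment in $$ ... $$."""
    return ALIGN_RE.sub(r"$$\1$$", markdown)


def wrap_notebook(nb):
    """Apply wrap_align to every Markdown cell of a notebook dict, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            src = cell["source"]
            # nbformat allows source as a single string or a list of lines.
            text = "".join(src) if isinstance(src, list) else src
            cell["source"] = wrap_align(text)
    return nb
```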