Is there an option to control output page orientation (using knitr->pander->pandoc->docx) - ms-word

I am playing with Tal's intro to producing Word tables with as little overhead as possible in real-world situations. (Please see the reproducible examples there. Thanks, Tal!) In real applications, tables are often too wide to print on a portrait-oriented page, but you might not want to split them.
Sorry if I have overlooked this in the pandoc or pander documentation, but how do I control page orientation (portrait/landscape) when writing from R to a Word .docx file?
I should maybe add that I started out using knitr+markdown, and I am not yet familiar with LaTeX syntax. But I'm trying to pick up as much as possible while getting my stuff done.

I am pretty sure the docx writer has no section breaks implemented. Also, as far as I understand, --reference-docx allows for customizing styles but not the page layout (though I might be wrong here). This is from pandoc's guide on --reference-docx:
--reference-docx=FILE
Use the specified file as a style reference in producing a docx file.
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets are used in the new docx. If no
reference docx is specified on the command line, pandoc will look for
a file reference.docx in the user data directory (see --data-dir). If
this is not found either, sensible defaults will be used. The
following styles are used by pandoc: [paragraph] Normal, Title,
Authors, Date, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5,
Block Quote, Definition Term, Definition, Body Text, Table Caption,
Image Caption; [character] Default Paragraph Font, Body Text Char,
Verbatim Char, Footnote Ref, Link.
These are styles that are saved in the /word/styles.xml component of the docx document.
The page layout on the other hand is saved in the /word/document.xml component in the <w:sectPr> tag, but pandoc's docx writer ignores this part as far as I can tell.
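To make the <w:sectPr> point concrete, here is a minimal Python sketch of post-processing a docx after pandoc has written it. It assumes word/document.xml contains a self-closing <w:pgSz .../> element with w:w and w:h attributes; the names to_landscape and rewrite_docx are made up for illustration. Note that it turns every section landscape, so on its own it does not solve the single-table case.

```python
import re
import zipfile

# Matches a self-closing page-size element such as <w:pgSz w:w="12240" w:h="15840"/>.
PGSZ = re.compile(r'<w:pgSz\b[^>]*/>')


def _attr(tag, name):
    """Return the value of attribute `name` inside `tag`, or None if absent."""
    m = re.search(r'\s%s="([^"]*)"' % re.escape(name), tag)
    return m.group(1) if m else None


def to_landscape(document_xml):
    """Rewrite every <w:pgSz> so the long side is the width and orient is landscape."""
    def rewrite(match):
        tag = match.group(0)
        w, h = _attr(tag, "w:w"), _attr(tag, "w:h")
        if w is None or h is None:
            return tag  # leave unusual elements untouched
        long_side, short_side = max(int(w), int(h)), min(int(w), int(h))
        # Assumption: other attributes (e.g. w:code) can be dropped in this sketch.
        return '<w:pgSz w:w="%d" w:h="%d" w:orient="landscape"/>' % (long_side, short_side)
    return PGSZ.sub(rewrite, document_xml)


def rewrite_docx(src, dst):
    """Copy the docx archive `src` to `dst`, patching only word/document.xml."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = to_landscape(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)
```

With hypothetical file names, rewrite_docx("report.docx", "report-landscape.docx") would produce a landscape copy of a pandoc-generated report.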
By default the docx writer builds a continuous document, with elements such as headers, paragraphs, simple tables and so on, much like HTML output.
Option #1 (doesn't solve the page orientation problem):
The only page layout option that you can define through styles is pageBreakBefore, which adds a page break before any paragraph that uses a given style.
Option #2 (seems elegant but hasn't been tested):
Recently a custom writer has been added that accepts a custom Lua script, in which you should be able to define how particular pandoc blocks are written into the output file. That means you could potentially define section breaks and page layout for a specific block by inserting the sectPr tag into the document. I haven't tried this out, but it would be worth investigating. On pandoc's GitHub you can check out a sample Lua script for custom HTML output.
However, this means you have to have Lua installed and learn the language, and it is up to you whether you think it's worth the time investment.
Option #3 (a couple of clicks in Word might just do):
As you will probably spend quite some time working out how to insert sections, what the right size and margins would be, and how to fit the table to such a layout, I recommend that you use pandoc to write your document.docx, then open it in Word and do the layout by hand:
select the table you want on the landscape page
go to Layout > Margins
> select Apply to: Selected text
> choose Page Setup > select Landscape
Now a new section with a landscape orientation should surround your table.
You will probably also want to style the table and table caption a little (font size, ...) to achieve the best result (all text styling can already be applied with pandoc, where --reference-docx comes in handy).
Option #4 (in situations where you can just use pdf instead of docx):
As far as I could figure out, pandoc does a good job with tables in md -> docx (alignment, style, ...), while in tex -> docx it sometimes had trouble. However, if your situation allows for pdf output, LaTeX will be your greatest friend. For example, your problem is solved as easily as just using
\usepackage{pdflscape}
and adding this around your table
\begin{landscape}
...
\end{landscape}
These are the options that I could think of so far.
I would always recommend using the pdf format for reports, as you can style it to your liking with latex and the layout will stay the way you want it to be.
However, I also know that for various reasons Word documents are still the main way of reviewing manuscripts in many fields, so I would most likely just go with my suggested option 3, mostly because it is a lazy and quick solution and because I usually don't have many documents with tons of giant tables with awkward placement and styling.
Good luck ;-)

Based on Taleb's answer here and some officer package functions, I created a little gist that one can use like this:
---
title: "Example"
author: "Dan Chaltiel"
output:
  word_document:
    pandoc_args:
      '--lua-filter=page-break.lua'
---
I'm in portrait
\endLandscape
I'm in landscape
\endPortrait
I'm in portrait again
With page-breaks.lua being the file hosted here: https://gist.github.com/DanChaltiel/e7505e62341093cfdc489265963b6c8f
This is far from perfect (for instance it won't work without the last portrait section), but it is quite useful sometimes.

Related

How do I get a MediaWiki page to look like a Word doc with a standard template defined by the Docs dept?

We have a standard word template used by the Doc dept. When they have finished a doc, they archive it in pdf. It is immediately obsolete.
My proposed solution is to use MediaWiki transclusions to compile a doc from reusable 'idea pages'. The analogy is to have reusable text the way we have reusable code. So if a step in a process is to 'Plug in the D* thing', there would be a wiki page for that. It would be included by reference (transclusion) in any document that needs that information, and it is maintained in one place, eliminating a search through all the docs it might appear in when it changes.
I have prototyped it this far, and from a git diff between tags, I can produce a list of system tests for that tag by wrapping the lines of the output with transclusion brackets.
Now I am looking to make the document look and feel like the Word standard doc for archival purposes. I wish to print to pdf and have the standard word styles apply.
I am tempted to:
Copy a really ugly word style sheet and trim it of unused stuff.
Use templates to impose styles on mediawiki stuff (makes ugly markup)
Use a magic style converter. I am hoping for this.
Any ideas?

Inserting non-text items in VS code similar to RTF editors?

I use VS Code for lots of things, from actual programming to just taking notes or writing summaries. A few features I would really love would be
ability to insert images in between the text
live, cell based auto formatting for text that forms a small table
Both of these can be done in rich text editors like ms word by inserting just images or an excel table.
Now I realize that VS code is rather file format agnostic and it wouldn't make a whole lot of sense to somehow have an excel table in the middle of a .js file but I think there is a way.
For example, JSDoc is a kind of add-on format that lives entirely within js comments. Same could be done with tables and/or images. Of course there wouldn't be a universal way to encode this but just like JSDoc it could be adapted to different language environments, be it a php file or c file or plain text.
For example, in raw text format, an auto formatted table of data could look something like this within a javascript file:
const my_data = [
/*!!! auto_format_csv(Title, Description, Weight) !!!*/
"Apple", "A nice fruit", 1,
"Car", "A motorized vehicle", 2000
/*!!! auto_format_end !!!*/
];
and while editing the file in vs code, it could look something like this:
So to the question part: are there perhaps any extensions that already do this sort of thing? If not, is it possible to create such an extension with the liberties given to extensions as of right now?
I know that vs code is based on electron and open source so in theory, everything is possible but I want to have these features as easily as possible so having framework support for this would likely help a lot.
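Independently of how an extension would hook into the editor, the formatting step itself is small. The following is a rough Python sketch, not an existing extension: it looks for the marker comments proposed above and pads the cells between them so the columns line up. The function names align_tables and _align are invented for this illustration, and the cell pattern assumes cells are either double-quoted strings or bare tokens such as numbers.

```python
import re

START = re.compile(r'/\*!!!\s*auto_format_csv\(.*?\)\s*!!!\*/')
END = re.compile(r'/\*!!!\s*auto_format_end\s*!!!\*/')
# A cell is either a double-quoted string or a bare token such as a number.
CELL = re.compile(r'"(?:[^"\\]|\\.)*"|[^\s,]+')


def _align(rows):
    """Pad each cell (plus its trailing comma) to the widest cell in its column."""
    cells = [CELL.findall(row) for row in rows]
    ncols = max((len(r) for r in cells), default=0)
    widths = [max(len(r[i]) for r in cells if i < len(r)) for i in range(ncols)]
    aligned = []
    for row, parsed in zip(rows, cells):
        indent = row[: len(row) - len(row.lstrip())]
        # Every cell gets a comma, including the last one; a trailing comma
        # is valid inside a JavaScript array literal.
        padded = [(c + ",").ljust(widths[i] + 1) for i, c in enumerate(parsed)]
        aligned.append(indent + " ".join(padded).rstrip())
    return aligned


def align_tables(source):
    """Return `source` with the rows between each marker pair column-aligned."""
    out, block = [], None  # block collects rows while inside a marked region
    for line in source.split("\n"):
        if block is None:
            out.append(line)
            if START.search(line):
                block = []
        elif END.search(line):
            out.extend(_align(block))
            out.append(line)
            block = None
        else:
            block.append(line)
    if block is not None:
        out.extend(block)  # unterminated region: emit its rows unchanged
    return "\n".join(out)
```

Everything outside the markers is passed through untouched, which is what would let the same idea work in a php, c or plain text file with a different comment syntax.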

The auto numbering/bullets of the Source MS Word document do not automatically align with those of the Destination MS Word document

This happens when a user inserts one Word document (the source) into another Word document (the destination): the auto-numbering/bullets don't carry over.
I want to insert text here, and have the source's auto-numbering continue as it does in the destination:
Step 1
But it is inserted above without auto-numbering:
Step 2
I have an answer from MS that it's a feature of MS Word :(.
So the question is: is it possible to align the above automatically by using Word automation (via C# and the .NET Interop Word APIs)?
I mean aligning the source document's contents with the destination document's auto-numbering. The same applies to bullets.
Please help - we are open to any suggestions/ recommendations.
Both documents need to have the same style definitions for both the Paragraph Style of the text being copied and also the List Style that organizes the numbering or bullets. Using automation, you can copy styles between the files before actually moving the numbered or bulleted text.
I agree with John.
The basic idea is to use the Define New MultiLevel List Dialog to define a new multilevel list in which each level you would be using is attached to an existing paragraph style. A paragraph style can be attached to only one level in one list. Then use the styles to apply the numbering.
Once you have them set up, you apply the numbering by using the styles, not the numbering controls.
Setting up the numbering linked to styles can seem a bit convoluted. Step-by-step instructions for doing it in Windows can be found here:
http://www.shaunakelly.com/word/numbering/numbering20072010.html
Backup: http://web.archive.org/web/20130510174814/http://www.shaunakelly.com/word/numbering/numbering20072010.html
For a Mac, John has a page showing the Mac controls to accomplish it.
http://www.brandwares.com/bestpractices/2016/06/outline-numbering-in-word-for-os-x/
Backup link: http://web.archive.org/web/20200912134758/http://www.brandwares.com/bestpractices/2016/06/outline-numbering-in-word-for-os-x/
These describe the only known ways to have consistent numbering in Word in heavily-edited / co-authored documents.

asciidoc: is there a way to create an anchor that will be visible in libreoffice writer?

TL;DR
What is the correct way to create an anchor in DocBook, and is there a way to make the anchor visible in Writer?
Background
I am trying to split up documentation that was previously in single open office documents into smaller asciidoc documents which are both included in the main open office document and also converted to either or both of html & pdf.
I have this mostly working. I use asciidoctor to create html, asciidoctor-pdf to create pdf, and a combination of asciidoctor and pandoc to create .odt files. I also tried the Python implementation of asciidoc but found the interface less usable.
Round tripping between asciidoc and odt is obviously not possible. This is sort of a fusion where the master document is word processed but pieces of content that can be produced independently (think man pages - in fact that is one of several use cases) are included.
asciidoc to html:
asciidoctor -b html5 foo.adoc -o foo.html
asciidoc to pdf:
asciidoctor-pdf -b pdf foo.adoc -o foo.pdf
asciidoc to odt:
asciidoctor -b docbook foo.adoc -o foo.docbook
pandoc --base-header-level=3 -V date:"" -V title:"" -f docbook foo.docbook -o foo.odt
With pandoc I have to nullify the date and title and set the header-level as desired for the section to be inserted as an extra complication.
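The two-step odt conversion can be wrapped in a small script so it is run the same way for every included section. This is only a sketch in Python, assuming asciidoctor and pandoc are both on the PATH; odt_commands and build_odt are hypothetical helper names. Passing "date:" and "title:" as separate arguments is what the shell form -V date:"" expands to. Newer pandoc releases deprecate --base-header-level in favour of --shift-heading-level-by, so the flag may need adjusting for your version.

```python
import subprocess
from pathlib import Path


def odt_commands(adoc, header_level=3):
    """Return the asciidoctor and pandoc command lines for one .adoc file."""
    src = Path(adoc)
    docbook = src.with_suffix(".docbook")
    odt = src.with_suffix(".odt")
    return [
        ["asciidoctor", "-b", "docbook", str(src), "-o", str(docbook)],
        ["pandoc", "--base-header-level=%d" % header_level,
         "-V", "date:", "-V", "title:",   # blank out date and title
         "-f", "docbook", str(docbook), "-o", str(odt)],
    ]


def build_odt(adoc, header_level=3):
    """Run both steps in order, stopping at the first failure."""
    for cmd in odt_commands(adoc, header_level):
        subprocess.run(cmd, check=True)
```

Calling build_odt("foo.adoc") would then reproduce the two commands shown above.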
I insert the resulting .odt into the main document using insert section inside open office.
Note that the main document is not a master document as I could not find a way of creating a master document without also automatically splitting the file on h1 boundaries.
I have two main problems to resolve with this set-up. I would like to add headings in the asciidoc document as cross references and also create entries for them in the alphabetical index (actually the first heading would be sufficient). Is there a way to do this?
Index markers in asciidoc do not result in entries in .odt file being created.
I am able to cross reference content in the inserted section using "insert reference/heading" and referencing the uniquely named header. However, whenever I use "update all" these cross references are invalidated. They are shown as "Error: Reference source not found".
[On a separate note I would also like a way to find broken cross references automatically]
I am currently using libreoffice - Version: 4.3.7.2
I am not averse to switching versions or flavours (i.e. apache) if one behaves better than the other.
I'm not sure if the answer is in the asciidoc or docbook parts of the chain. I would accept an answer which inserts a index entry at the start of the inserted section (top of the .adoc/docbook file) automatically.
I am also open to changing my toolchain to something that will work.
For example I tried the asciidoc-odt backend and fell foul of https://github.com/dagwieers/asciidoc-odf/issues/47 which does not inspire confidence.
Using asciidoc-odt I avoid the need to create an intermediate docbook file. However, I still can't get the anchor to appear.
I can get a macro to create an anchor but at present I haven't figured out how to run the macro from the command line.
To create an anchor in DocBook, make an inline anchor in the .adoc file. For example, giving this to asciidoctor:
[[X1]]Section1
---------------
produced this:
<title>
<anchor xml:id="X1" xreflabel="[X1]"/>
Section1
</title>
Conversely, putting this on separate lines did not create an anchor tag in my test:
[[X1]]
Section 1
Now for some bad news. From the Pandoc User's Guide:
Internal links are currently supported for HTML formats (including HTML slide shows and EPUB), LaTeX, and ConTeXt.
I interpret this to mean that currently, Pandoc does not create internal links in Writer. When I tried it, the link was ignored.
Note: It looks like I did not answer all of your questions. If you want to ask more about LibreOffice cross references and headings (the big bold paragraph towards the end of the question), maybe you could make a separate question just for that part.

Conversion between docx / doc / rtf and lightweight markup

I am looking for a tool or set of tools to convert between file formats D and M where
D is a format handled by MSWord, in order of preference, docx, doc, rtf
M is a lightweight markup, such as markdown, textile, txt2tags, it can be an esoteric one
there is a way to generate html from M
conversion is two-way, it's done both from D to M, and from M to D
utf-8 encoding is handled properly
the content is simple, paragraphs, some simple formatting like bold and italics, maybe lists
the tools are platform-independent
What I've found so far
TeX, LaTeX -- too heavyweight
docx2txt -- too lightweight, it supports no formatting at all
html -- MSWord produces bloated html
a few one-way conversions, like doc to mediawiki
UPDATE:
The use case is a document workflow between technical and non-technical people
I, the technical guy edit a document in plain text, put it into version control, etc.
I send it to my manager or other non-technical people
They add comments, make changes to it using their Word, then they send it back to me
I want to simply grok their changes, make my changes, put it into version control, without having to use Word
I think that Pandoc more than meets all of these requirements.
http://pandoc.org
Adam, I've used docx4j to convert docx to html, edit the html in CKEditor, and then use docx4j to convert the html back to docx. My process made some assumptions about the css (ie it was designed to handle docx4j's clean html, and editing in CKEditor).
You don't say whether there is a way to generate M from HTML?
This is probably hard to do two-way, since you will have impedance mismatches between the various formats.
The best world I can think of would be a sort of Wiki / Word hybrid: Maybe you can get Google Wave to do that for you?
Another solution that might work is a CMS like Plone (did they ever add WYSIWYG capability? I stopped caring after version 1). Keep your documents there. Let the system handle changes, annotations etc. You can automate retrieval of the source (should be reStructuredText) and commit that to your source control if you have to.
This script I wrote might help you in your workflow:
https://github.com/matb33/docx2md
It is a command-line PHP script that will only work with .docx files. It will extract the XML, run some XSL transformations, and provide you the result in Markdown format.
I encourage you to send me .docx files that don't convert accurately. I'd love to make this script as robust and reliable as possible.