From Word to LaTeX using pandoc, problem with citation - ms-word

My main purpose in using pandoc is to make Word documents from LaTeX files, so that I can share them with my colleagues for review. I am new to pandoc, so I used a straightforward example.
I used pandoc to create a docx file from a simple LaTeX .tex file containing a single citation. The docx file was created successfully. However, when I reversed the process and created a LaTeX file from the newly created docx file, the citations were copied into the new .tex file as plain text, without any LaTeX citation command. Is there any way to transfer citations from a docx file to a LaTeX file, and store them in some kind of .bib file, through pandoc?
pandoc -s input.tex --bibliography=b.bib -o output.docx
pandoc -s output.docx --bibliography=b.bib -o input.tex

Change the Word reference style to "export bibtex", and at least you can copy and paste the references into a BibTeX file. References inside the text come out more or less okay. References in footnotes don't work so well.
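Since the citations come back from the docx as plain text, one workaround is to put the \cite commands back in a small post-processing step. The sketch below is not a pandoc feature: it assumes you keep a hand-written mapping from the rendered citation text to the BibTeX keys in b.bib, and both the mapping entries and the function name are hypothetical.

```python
from typing import Mapping


def restore_citations(tex: str, rendered_to_key: Mapping[str, str]) -> str:
    """Replace rendered citation strings with \\cite{key} commands."""
    # Replace longer strings first, so that a longer rendered citation is
    # never partially overwritten by a shorter entry it happens to contain.
    for rendered in sorted(rendered_to_key, key=len, reverse=True):
        tex = tex.replace(rendered, "\\cite{" + rendered_to_key[rendered] + "}")
    return tex


# Hypothetical mapping: adjust it to how your citation style renders
# and to the keys that actually exist in your .bib file.
mapping = {
    "(Smith 2020)": "smith2020",
    "(Doe and Roe 2019)": "doe2019",
}
```

You would read the generated input.tex, pass its text through restore_citations, and write the result back. Anything the mapping does not list is left untouched, so the output still needs a manual check.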

Related

Import docx file into emacs org-mode

I have been scouring the web for hours now and haven't found a complete answer. I want to convert my .org file to .docx (and .docx to .org) while preserving the sections and tables. I have tried using pandoc through PowerShell to do this, but I don't think I am doing it correctly.
Here is the pandoc command I type:
pandoc report7a.org -s -o report7.docx
It shows the error:
pandoc.exe: Cannot decode byte '\xfe': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream
I have little experience with doing stuff like this.
Here is an image of the .org file I want to convert
I think your editor put a byte order mark (BOM) at the beginning of the file. Check this post on how to deal with it in Emacs.
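The byte '\xfe' in the error is consistent with a UTF-16 byte order mark (FE FF or FF FE), and pandoc expects UTF-8 input. As a minimal sketch, assuming the file really does start with a BOM, the following Python re-encodes the content as UTF-8 without a BOM. The function names are hypothetical, and UTF-32 is deliberately not handled.

```python
import codecs
from pathlib import Path

# Byte order marks paired with the codec that decodes the rest of the file.
# UTF-32 is omitted to keep the sketch short.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def strip_bom_to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, with any leading BOM removed."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding).encode("utf-8")
    # No BOM found: leave the bytes untouched.
    return raw


def convert_file(src: str, dst: str) -> None:
    """Read src, strip its BOM, and write the UTF-8 result to dst."""
    Path(dst).write_bytes(strip_bom_to_utf8(Path(src).read_bytes()))
```

For the file in the question this could be called as convert_file("report7a.org", "report7a-utf8.org"), where the output name is only an example, after which pandoc is run on the new file.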

Using Pandoc to read Docx, to capture contents of a docx textbox

When I use pandoc to read a docx file, it ignores text boxes. They do not seem to be copied to the intermediate format.
How can I make pandoc read the textbox into the intermediary format, so that I can write a filter to include it in the output?
I'm the maintainer of the docx reader in pandoc. I don't think it currently deals with text boxes -- and I'm not sure there's an elegant way to represent them in pandoc's intermediate format. But if you post an issue on pandoc's GitHub issue tracker, along with a sample docx file, I'll take a look and see if it's possible to add it to the reader.

Can I inline an org file inside another org file?

In LaTeX we can split a big document (paper.tex) into several .tex files (abstract.tex, intro.tex, ...) which can be inserted inline using \input, e.g. \input{abstract.tex}.
Is there a similar facility in org-mode?
Just use the include keyword, i.e.
#+INCLUDE: "abstract.org"
#+INCLUDE: "intro.org"
This will inline abstract.org and intro.org in the current org file on export. See also the org-mode documentation on include files.
Untested, but I think you can simply add the LaTeX command \input{abstract.tex} anywhere you want in the .org file. When exporting to LaTeX, the .tex file will then be rendered into the final file.
PS: another option is include files, which are mentioned in the comment.

Adding a LaTeX preamble to an Rmd file?

I'd like to include some latex code in my .Rmd file so that when I do:
knit('file.Rmd')
pandoc('file.md', format='latex')
then the .tex file is processed with
\usepackage{mathpazo}
in the preamble of the LaTeX file. Is this possible by somehow adding the preamble text directly into the .Rmd file, similar to what can be done with embedded configs:
https://github.com/yihui/knitr-examples/blob/master/088-pandoc-embedded.Rmd
(lines 5--13).
Or do I have to create some extra templates in my homedir for pandoc to find?
Thanks, Stephen
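One possibility, which this thread does not confirm, is to skip the template route and hand the preamble to pandoc directly: its -H / --include-in-header option inserts a file's contents into the header of the output, which for LaTeX output is the preamble. The sketch below only wraps that call in Python. The helper file name preamble.tex and the function names are assumptions, and pandoc must be on the PATH for convert to run.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(markdown_file: str, header_file: str, output_file: str) -> list:
    """Build a pandoc call that injects header_file into the LaTeX preamble."""
    return [
        "pandoc",
        markdown_file,
        "--to=latex",
        # The contents of header_file are placed in the document header.
        "--include-in-header=" + header_file,
        "--output=" + output_file,
    ]


def convert(markdown_file: str, preamble: str, output_file: str) -> None:
    """Write the preamble to a helper file and run pandoc with it."""
    # "preamble.tex" is a hypothetical name for the helper file.
    header = Path("preamble.tex")
    header.write_text(preamble, encoding="utf-8")
    command = build_pandoc_command(markdown_file, str(header), output_file)
    subprocess.run(command, check=True)
```

After knitting file.Rmd to file.md, a call such as convert("file.md", "\\usepackage{mathpazo}\n", "file.tex") would stand in for the pandoc('file.md', format='latex') step.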

How to generate Microsoft Word documents using Sphinx

Sphinx supports a few output formats:
Multiple HTML files (with html or dirhtml)
LaTeX, which is useful for creating .pdf or .ps
text
How can I obtain output in a Microsoft Word file instead?
With another doc generator I managed to generate a single html output file and then convert it to Microsoft Word format using the Word application.
Unfortunately I don't know a way to generate either Word or the HTML single-page format.
The solution I use is the singlehtml builder, as andho mentioned in the comment, followed by converting the HTML to docx using pandoc.
The following sample assumes the generated HTML is located at _build/singlehtml/index.html
make singlehtml
cd _build/singlehtml/
pandoc -o index.docx index.html
There is a Sphinx extension for generating docx format (which I haven't tested) and a newer one (which I also haven't tested, but which looks more actively maintained).
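To convert reStructuredText files to MS Word .doc, I use rst2odt and then unoconv. See the following script:
#!/bin/sh
# Convert the reStructuredText file given as $1 to .doc via an intermediate .odt
rst2odt "$1" "$1.odt"
unoconv -f doc "$1.odt"
rm "$1.odt"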
With rst2odt you can use your own stylesheet. unoconv comes with OpenOffice and also lets you apply an OpenOffice style (template) during the conversion. Simply edit a converted document, change styles, add headers and footers, save it as an ODF Text Document Template (OTT), and use it as part of the conversion, like:
unoconv -f doc -t template.ott "$1.odt"
You can then reuse that template for later conversions.
I realize this is an old question, but I found that LibreOffice supports the following way of doing conversion (assuming soffice.exe is in your path):
soffice.exe --invisible --convert-to doc myInputFile.odt
Some things I have read say to use the --headless option rather than --invisible. Both seem to work on Windows.
You can start with the rst2odt.py script and then do the above to convert to an MS Word document.
Here is a link with additional start up options for LibreOffice:
http://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
Here is a link with file types supported by OpenOffice which, I believe, LibreOffice should also support:
http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_3_0
This answer is not a command-line solution and is obviously not the best, but it simply works for me and saves me time. After generating the HTML file, open it in a browser, copy the entire page (Ctrl+A and Ctrl+C), then start Microsoft Office (or use the live version if, like me, you don't have Microsoft Windows) and paste (Ctrl+V) into it.
The best option might be rst -> odt -> doc.
Convert the Sphinx documents into OpenOffice format.
Then open the odt with OpenOffice and save it as Word. But I don't know how to do this automatically.
This is a workaround using Calibre (https://calibre-ebook.com), which includes a powerful converter. It worked well, and most of the formatting is preserved:
Generate epub output in Sphinx: make epub
Import the epub output into Calibre, then convert epub to docx using the built-in ebook converter.
Answer is too late for the original question, but people looking at the same problem may find this useful.
I don't know what Sphinx is, but you could create an RTF file or HTML file or something similar.
See the following blog post for more information and approaches: OFFICE AUTOMATION
and from there: How to use ASP to generate a Rich Text Format (RTF) document to stream to Microsoft Word
This article describes how you can generate Rich Text Format (RTF) files with ASP script and then stream those files to Microsoft Word. This technique provides an alternative to server-side Automation of Microsoft Word for run-time document generation.
You don't use ASP script (who does :-) ), but it gives you the idea.