Don't escape bibtex citation in pandoc conversion - ms-word

I have to work with Word users who don't use BibTeX. I am trying to create a workflow where those users simply type [#citation] directly in the Word doc. I then use pandoc to convert that Word doc back to markdown so I can more readily use BibTeX. Here is what is in the Word doc (foo.docx):
Sentence with a citation [#foo]
Then I run this pandoc code:
pandoc -s foo.docx -t markdown -o foo.md
The resulting markdown is:
Sentence with a citation \[\#foo\]
Because of these escaped characters, I can't actually generate the citations. I'm happy to entertain other ideas to make this work, but merging markdown and Office users always seems to be a bit friction-y. The ultimate question is: how do I preserve citations when converting from Word to markdown?
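One possible workaround, sketched here rather than taken from any answer, is to post-process foo.md and undo the escaping for anything shaped like \[\#key\]. The function name and the pattern are assumptions made for illustration. Keys that pandoc escapes internally (for example an underscore written as \_) are deliberately left alone by this pattern and would need extra handling.

```python
import re

# Matches an escaped citation such as \[\#foo\] in pandoc's markdown output.
# The key may not contain "]", a backslash, or whitespace (assumption for this sketch).
ESCAPED_CITATION = re.compile(r"\\\[\\#([^\]\\\s]+)\\\]")


def unescape_citations(markdown: str) -> str:
    """Turn \\[\\#key\\] back into [#key]; everything else is left untouched."""
    return ESCAPED_CITATION.sub(r"[#\1]", markdown)


if __name__ == "__main__":
    print(unescape_citations(r"Sentence with a citation \[\#foo\]"))
```

Running this over the converted file after the pandoc step would restore the bracketed keys without touching other escaped brackets in the text.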

pandoc markdown to docx - keep list on one page

I have a markdown list like so:
* Question A
- Answer 1
- Answer 2
- Answer 3
I need to ensure that all the answers (1 - 3) appear on the same page as Question A when I convert the markdown document to docx using pandoc. How can I do this?
Use custom styles in your Markdown and then define those styles in a custom docx template.
It's important to note that Pandoc's documentation states (emphasis added):
Because pandoc’s intermediate representation of a document is less
expressive than many of the formats it converts between, one should
not expect perfect conversions between every format and every other.
Pandoc attempts to preserve the structural elements of a document, but
not formatting details...
Of course, Markdown has no concept of "pages" or "page breaks," so that is not something Pandoc can handle by default. However, Pandoc is aware of docx styles. As the documentation explains:
By default, pandoc’s docx output applies a predefined set of styles
for blocks such as paragraphs and block quotes, and uses largely
default formatting (italics, bold) for inlines. This will work for
most purposes, especially alongside a reference.docx file. However, if
you need to apply your own styles to blocks, or match a preexisting
set of styles, pandoc allows you to define custom styles for blocks
and text using divs and spans, respectively.
If you define a div or span with the attribute custom-style, pandoc
will apply your specified style to the contained elements. So, for
example using the bracketed_spans syntax,
[Get out]{custom-style="Emphatically"}, he said.
would produce a docx file with “Get out” styled with character style
Emphatically. Similarly, using the fenced_divs syntax,
Dickinson starts the poem simply:
::: {custom-style="Poetry"}
| A Bird came down the Walk---
| He did not know I saw---
:::
would style the two contained lines with the Poetry paragraph style.
If the styles are not yet in your reference.docx, they will be defined
in the output file as inheriting from normal text. If they are already
defined, pandoc will not alter the definition.
If you don't want to define the style manually, but would like it applied to every list automatically (or perhaps to every list which follows a specific pattern), you could define a custom filter which applied the style(s) to every matching element in the document.
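As a rough illustration of such a filter, here is a minimal sketch of a JSON filter that uses only the Python standard library. It assumes pandoc's JSON AST shape (a top-level "blocks" list, and Div nodes of the form [attr, blocks] with attr = [id, classes, key-value pairs]). The style name KeepTogetherList and the function name are hypothetical. Whether Word then applies that paragraph style to the list paragraphs the way you want is something to verify in the generated docx.

```python
import json
import sys

# Hypothetical style name; it still has to be defined in the reference docx.
STYLE_NAME = "KeepTogetherList"


def wrap_lists(blocks):
    """Return a new block list where each top-level BulletList sits inside
    a Div carrying the custom-style attribute."""
    wrapped = []
    for block in blocks:
        if block.get("t") == "BulletList":
            attr = ["", [], [["custom-style", STYLE_NAME]]]
            wrapped.append({"t": "Div", "c": [attr, [block]]})
        else:
            wrapped.append(block)
    return wrapped


def main():
    # pandoc sends the document as JSON on stdin and reads the result from stdout.
    doc = json.load(sys.stdin)
    doc["blocks"] = wrap_lists(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing (say wrap_lists.py), it would be passed to pandoc with the --filter option alongside --reference-doc.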
Of course, that only adds the style names to the output. You still need to define the styles (tell Word how to display elements assigned those styles). As the documentation for the --reference-doc option explains:
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets and document properties (including
margins, page size, header, and footer) are used in the new docx. If
no reference docx is specified on the command line, pandoc will look
for a file reference.docx in the user data directory (see --data-dir).
If this is not found either, sensible defaults will be used.
To produce a custom reference.docx, first get a copy of the default
reference.docx: pandoc --print-default-data-file reference.docx >
custom-reference.docx. Then open custom-reference.docx in Word, modify
the styles as you wish, and save the file.
When modifying custom-reference.docx in Word, you can also add the new custom styles that you have used in your Markdown. As @CindyMeister points out in a comment:
Word would handle this using styles, where the Question style would
have the paragraph setting "Keep with Next". the Answer style would
have this as well. A third style, for the last entry, would NOT have
the setting activated. In addition, all three styles would have the
paragraph setting "Keep together" activated.
Finally, when using pandoc to convert your Markdown to a Word docx file, use the option --reference-doc=custom-reference.docx and your custom style definitions will be included in the generated docx file. As long as you also properly identify which elements in the Markdown document get which styles, you should have a list which doesn't get broken across a page break, as long as the entire list fits on one page.

Markdown metadata format

Is there a standard or convention for embedding metadata in a Markdown formatted post, such as the publication date or post author for conditional rendering by the renderer?
Looks like this Yaml metadata format might be it.
There are all kinds of strategies, e.g. an accompanying file mypost.meta.edn, but I'm hoping to keep it all in one file.
There are two common formats that look very similar but are actually different in some very specific ways. And a third which is very different.
YAML Front Matter
The Jekyll static site generator popularized YAML front matter, which is delimited by YAML section markers. Yes, the dashes are actually part of the YAML syntax. The metadata is defined using any valid YAML syntax. Here is an example from the Jekyll docs:
---
layout: post
title: Blogging Like a Hacker
---
Note that YAML front matter is not parsed by the Markdown parser, but is removed prior to parsing by Jekyll (or whatever tool you're using) and could actually be used to request a different parser than the default Markdown parser for that page (I don't recall if Jekyll does that, but I have seen some tools which do).
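To make the "removed prior to parsing" step concrete, here is a minimal sketch of how a tool might split the front matter from the Markdown body. The function name is hypothetical and the sketch does not parse the YAML itself; the returned string would be handed to a YAML library of your choice.

```python
def split_front_matter(text: str):
    """Return (front_matter, body). front_matter is None when the document
    does not start with a '---' line or the block is never closed."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the two markers is YAML; the rest is Markdown.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # Opening marker without a closing one: treat the whole file as Markdown.
    return None, text
```

Only the body would then be passed on to the Markdown parser.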
MultiMarkdown Metadata
The older and simpler MultiMarkdown Metadata is actually incorporated into a few Markdown parsers. While it has more recently been updated to optionally support YAML delimiters, traditionally the metadata ends and the Markdown document begins at the first blank line (if the first line is blank, there is no metadata). And while the syntax looks very similar to YAML, only key-value pairs are supported, with no implied types. Here is an example from the MultiMarkdown docs:
Title: A Sample MultiMarkdown Document
Author: Fletcher T. Penney
Date: February 9, 2011
Comment: This is a comment intended to demonstrate
metadata that spans multiple lines, yet
is treated as a single value.
CSS: http://example.com/standard.css
The MultiMarkdown parser includes a bunch of additional options which are unique to that parser, but the key-value metadata is used across multiple parsers. Unfortunately, I have never seen any two which behaved exactly the same. Without the Markdown rules defining such a format everyone has done their own slightly different interpretation resulting in a lot of variety.
The one thing that is more common is support for YAML delimiters and basic key-value definitions.
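As one possible reading of the traditional key-value rules described above, here is a minimal sketch. It assumes that continuation lines are indented, that keys are compared case-insensitively, and that a first line without a colon means there is no metadata. As noted, real parsers differ on these details, so this is not a reimplementation of any particular one.

```python
def parse_key_value_metadata(text: str):
    """Return (metadata, body). Metadata ends at the first blank line."""
    lines = text.split("\n")
    metadata = {}
    key = None
    end = len(lines)
    for i, line in enumerate(lines):
        if line.strip() == "":
            end = i  # the first blank line closes the metadata block
            break
        if line[0].isspace() and key is not None:
            # Indented line: continuation of the previous value.
            metadata[key] += " " + line.strip()
        elif ":" in line:
            raw_key, value = line.split(":", 1)
            key = raw_key.strip().lower()
            metadata[key] = value.strip()  # always a plain string, no implied types
        else:
            # Not a key-value line: assume the document has no metadata.
            return {}, text
    return metadata, "\n".join(lines[end + 1:])
```

Note that a document whose first line is ordinary prose containing a colon would be misread as metadata, which is one of the ambiguities that makes implementations diverge.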
Pandoc Title Block
For completeness, there is also the Pandoc Title Block. It has a very different syntax and is not easily confused with the other two. To my knowledge, it is only supported by Pandoc (if enabled), and it only supports three types of data: title, author, and date. Here is an example from the Pandoc documentation:
% title
% author(s) (separated by semicolons)
% date
Note that Pandoc Title Blocks are one of two styles supported by Pandoc. Pandoc also supports YAML Metadata as described above.
A workaround that uses standard syntax and is compatible with all other viewers.
I was also looking for a way to add application-specific metadata to Markdown files while making sure that existing viewers such as VS Code and GitHub pages ignore the added metadata. Using extended Markdown syntax is not a good idea either, because I want my files to render correctly in different viewers.
So here is my solution: at the beginning of the Markdown file, use the following syntax to add metadata:
[_metadata_:author]:- "daveying"
[_metadata_:tags]:- "markdown metadata"
This is the standard syntax for link references. They will not be rendered, while your application can still extract the data.
The - after : is just a placeholder for the URL. I don't use the URL as the value because URLs cannot contain spaces, and I have scenarios that require array values.
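The extraction side could look roughly like this. It is a sketch under the assumption that each entry sits on its own line in exactly the form shown above; the function name is made up for illustration.

```python
import re

# One entry per line: [_metadata_:key]:- "value"
METADATA_LINE = re.compile(
    r'^\[_metadata_:([^\]]+)\]:-[ \t]+"(.*)"[ \t]*$', re.MULTILINE
)


def extract_reference_metadata(markdown: str) -> dict:
    """Collect every [_metadata_:key]:- "value" line into a dict."""
    return {key: value for key, value in METADATA_LINE.findall(markdown)}
```

Splitting a value such as "markdown metadata" into an array is then up to the application.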
Most Markdown renderers seem to support this YAML format for metadata at the top of the file:
---
layout: post
published-on: 1 January 2000
title: Blogging Like a Boss
---
Content goes here.
The most consistent form of metadata that I've found for Markdown is actually HTML meta tags, since most Markdown interpreters recognize HTML tags and will not render meta tags, meaning that metadata can be stored in a way that will not show up in rendered HTML.
<title>Hello World</title>
<meta name="description" content="The quick brown fox jumped over the lazy dog.">
<meta name="author" content="John Smith">
## Heading
Markdown content begins here
You can try this in something like GitHub Gist or StackEdit.
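If you go this route, reading the metadata back does not need a Markdown parser at all. Below is a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, and it only collects title and meta name/content pairs.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "name" in attributes:
            self.metadata[attributes["name"]] = attributes.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text outside <title> is the Markdown content and is ignored here.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_meta_tags(markdown: str) -> dict:
    collector = MetaTagCollector()
    collector.feed(markdown)
    collector.close()
    return collector.metadata
```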
Correct.
Use the yaml front matter key-value syntax — like MultiMarkdown supports — but (ab)use the official markdown URL syntax to add your metadata.
… my workaround looks like this:
---
[//]: # (Title: My Awesome Title)
[//]: # (Author: Alan Smithee)
[//]: # (Date: 2018-04-27)
[//]: # (Comment: This is my awesome comment. Oh yah.)
[//]: # (Tags: #foo, #bar)
[//]: # (CSS: https://path-to-css)
---
Put this block at the top of your .md doc, with no blank line between the top of the doc and the first ---.
Your fake yaml won't be included when you render to HTML, etc. … it only appears in the .md.
You can also use this technique for adding comments in the body of a markdown doc.
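Reading these values back is a matter of matching the comment lines. Here is a minimal sketch that assumes every entry has exactly the [//]: # (Key: value) shape shown above; the function name is an illustration only.

```python
import re

# One entry per line: [//]: # (Key: value)
COMMENT_LINE = re.compile(
    r"^\[//\]: # \(([^:()]+):\s*(.*)\)[ \t]*$", re.MULTILINE
)


def extract_comment_metadata(markdown: str) -> dict:
    """Collect every '[//]: # (Key: value)' line into a dict."""
    return {key.strip(): value.strip() for key, value in COMMENT_LINE.findall(markdown)}
```

Comments in the body that do not follow the Key: value pattern are simply skipped.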
This is not a standard way, but works with Markdown Extra.
I wanted something that worked in the parser, but also didn't leave any clutter when I browse the files on Bitbucket where I store the files.
So I use Abbreviations from the Markdown Extra syntax.
*[blog-date]: 2018-04-27
*[blog-tags]: foo,bar
then I parse them with regexp:
^\*\[blog-date\]:\s*(.+)\s*$
As long as I don't write the exact keywords in the text, they leave no trace. So use some prefix obscure enough to hide them.
I haven't seen this mentioned elsewhere here or in various blogs discussing the subject, but in a project for my personal website, I've decided to use a simple JSON object at the top of each markdown file to store metadata. It's a little more cumbersome to type compared to some of the more textual formats above, but it's super easy to parse. Basically I just use a regex such as ^\s*({.*?})\s*(.*)$ (with the s option on, so that . also matches newlines) to capture the JSON and the markdown content, then parse the JSON with the language's standard method. This makes it pretty easy to allow arbitrary meta fields.
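One caveat with that regex: the non-greedy .*? stops at the first closing brace, so it breaks as soon as the metadata contains a nested object. A minimal alternative sketch in Python lets the JSON decoder find the end of the object itself. The function name is hypothetical; raw_decode is part of the standard json module.

```python
import json


def split_json_header(text: str):
    """Return (metadata, markdown) for a document that starts with a JSON object.
    Raises json.JSONDecodeError if the document does not start with valid JSON."""
    stripped = text.lstrip()
    # raw_decode parses one JSON value and reports the index where it ended.
    metadata, end = json.JSONDecoder().raw_decode(stripped)
    return metadata, stripped[end:].lstrip()
```

Everything after the decoded object is returned unchanged (minus leading whitespace) as the markdown content.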

Escape newline in source

There's a huge wiki page in our dokuwiki I have the pleasure of having to edit, the problem is that most of it is a giant table.
I know you can insert a newline into the result without writing a newline in the markup (which would be interpreted as a paragraph change), but all I want to do is put line breaks in the source and not have it affect the wiki page at all (so it's easier for editors to read, like an html table I suppose, where literal newlines are ignored).
So is there some syntax available to escape a newline in dokuwiki, not unlike \ for bash or ^ for DOS?
What you ask is afaik not possible in the dokuwiki core.
One alternative would be the edittable-plugin. It lets you edit your table within an actual table-editor. However the current version is somewhat flaky in very wide tables. Should that be an issue for you, then you could try an older version from June 2015.

Pandoc: Prevent conversion of special characters in LaTeX

I am converting an MS Word document (.docx) to LaTeX (.tex) with Pandoc. The .docx file contains backslashes and brackets, which Pandoc converts to the corresponding LaTeX commands (e.g. \textbackslash). I do not want that.
How can I prevent Pandoc from converting special characters?
I think pandoc is actually doing what you want. You cannot have plain backslashes in LaTeX since they would be interpreted as commands, so instead you have to use \textbackslash{}, which is the command to print a simple plain backslash in LaTeX. Try generating a PDF with LaTeX and you'll see what I mean.
If you actually want to include LaTeX commands in your Word file, I think that's not possible. (How would pandoc know whether the word user wanted to write a backslash or a LaTeX command?) However, you can transform your word doc to markdown, adjust it (in pandoc markdown you actually can include raw TeX), then export it to LaTeX.
pandoc input.docx -o file.md
# edit file.md now
pandoc file.md -o output.tex
For a more automated solution, you could look into pandoc filters. Then it's up to you how to solve the ambiguity of backslashes...

Difference between code and verbatim in Org-mode?

What is the intended difference between ~code~ and =verbatim= markup in Org-mode? Exporting to HTML in both cases yields <code> tags.
Same for LaTeX...
Though, as they are fontified differently in your buffer, you can use them for different semantics.
Personally, I use "code" for var/func names, commands to be typed, etc; and "verbatim" for paths or file names.
I would have loved to have the same number of markups as there are in Texinfo, but that's not the case...
In Org 8.0 (ox-* exporters) there are a few differences.
In LaTeX
Code comes out as \verb{sep}content{sep}, where {sep} is found as an appropriate delimiter.
Verbatim comes out as \texttt{content} with certain characters escaped/protected.
In HTML and ODT
Code and Verbatim are treated identically
In Texinfo
The same behaviour is followed as in LaTeX.