asciidoc: is there a way to create an anchor that will be visible in libreoffice writer? - libreoffice

Tl;dr;
What is the correct way to create an anchor in DocBook, and is there a way to make the anchor visible in Writer?
Background
I am trying to split up documentation that was previously in single open office documents into smaller asciidoc documents which are both included in the main open office document and also converted to either or both of html & pdf.
I have this mostly working. I use asciidoctor to create html, asciidoctor-pdf to create pdf, and a combination of asciidoctor and pandoc to create .odt files. I also tried the Python implementation of asciidoc but found the interface less usable.
Round tripping between asciidoc and odt is obviously not possible. This is sort of a fusion where the master document is word processed but pieces of content that can be produced independently (think man pages - in fact that is one of several use cases) are included.
asciidoc to html:
asciidoctor -b html5 foo.adoc -o foo.html
asciidoc to pdf:
asciidoctor-pdf -b pdf foo.adoc -o foo.pdf
asciidoc to odt
asciidoctor -b docbook foo.adoc -o foo.docbook
pandoc --base-header-level=3 -V date:"" -V title:"" -f docbook foo.docbook -o foo.odt
With pandoc I have to blank out the date and title and set the header level to match the section being inserted, which is an extra complication.
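The two-step .adoc to .odt conversion above could be wrapped in a small helper so that it can be repeated for many files. This is only a sketch: the function names `build_commands` and `convert` are made up for illustration, and the flags are copied from the commands in the question rather than checked against any particular asciidoctor or pandoc version.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(adoc_path, header_level=3):
    """Return the asciidoctor and pandoc command lines for one .adoc file."""
    adoc = Path(adoc_path)
    docbook = adoc.with_suffix(".docbook")
    odt = adoc.with_suffix(".odt")
    return [
        ["asciidoctor", "-b", "docbook", str(adoc), "-o", str(docbook)],
        [
            "pandoc",
            # Flag as used in the question; newer pandoc releases may prefer
            # --shift-heading-level-by instead (assumption, check your version).
            f"--base-header-level={header_level}",
            "-V", "date:",   # the shell form date:"" passes an empty value
            "-V", "title:",
            "-f", "docbook",
            str(docbook),
            "-o", str(odt),
        ],
    ]


def convert(adoc_path, header_level=3):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(adoc_path, header_level):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        subprocess.run(cmd, check=True)
```

Calling `convert("foo.adoc")` would then produce foo.docbook and foo.odt next to the source file, assuming both tools are installed.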
I insert the resulting .odt into the main document using insert section inside open office.
Note that the main document is not a master document as I could not find a way of creating a master document without also automatically splitting the file on h1 boundaries.
I have two main problems to resolve with this set-up. I would like to add headings in the asciidoc document as cross references and also create entries for them in the alphabetical index (actually the first heading would be sufficient). Is there a way to do this?
Index markers in asciidoc do not result in entries in .odt file being created.
I am able to cross reference content in the inserted section using "insert reference/heading" and referencing the uniquely named header. However, whenever I use "update all" these cross references are invalidated. They are shown as "Error: Reference source not found".
[On a separate note I would also like a way to find broken cross references automatically]
I am currently using libreoffice - Version: 4.3.7.2
I am not averse to switching version or flavours (i.e. apache) if one behaves better than the other.
I'm not sure if the answer is in the asciidoc or docbook parts of the chain. I would accept an answer which inserts an index entry at the start of the inserted section (top of the .adoc/docbook file) automatically.
I am also open to changing my toolchain to something that will work.
For example I tried the asciidoc-odt backend and fell foul of https://github.com/dagwieers/asciidoc-odf/issues/47 which does not inspire confidence.
Using asciidoc-odt I avoid the need to create an intermediate docbook file. However, I still can't get the anchor to appear.
I can get a macro to create an anchor but at present I haven't figured out how to run the macro from the command line.

To create an anchor in DocBook, make an inline anchor in the .adoc file. For example, giving this to asciidoctor:
[[X1]]Section1
---------------
produced this:
<title>
<anchor xml:id="X1" xreflabel="[X1]"/>
Section1
</title>
Conversely, putting this on separate lines did not create an anchor tag in my test:
[[X1]]
Section 1
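To see which of the two forms actually produced an anchor, the DocBook output can be inspected with a few lines of Python. This is a minimal sketch: `anchor_ids` is a hypothetical helper, and the sample string wraps the output shown above in a `section` element with the DocBook 5 namespace, which is an assumption about what the full file looks like.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def anchor_ids(docbook_xml):
    """Return the xml:id of every <anchor> element, namespaced or not."""
    root = ET.fromstring(docbook_xml)
    ids = []
    for element in root.iter():
        # Strip a namespace such as {http://docbook.org/ns/docbook}.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "anchor" and XML_ID in element.attrib:
            ids.append(element.attrib[XML_ID])
    return ids


sample = """<section xmlns="http://docbook.org/ns/docbook">
<title>
<anchor xml:id="X1" xreflabel="[X1]"/>
Section1
</title>
</section>"""

print(anchor_ids(sample))  # prints ['X1']
```

An empty list for a converted file would indicate that the anchor was lost before pandoc ever saw it.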
Now for some bad news. From the Pandoc User's Guide:
Internal links are currently supported for HTML formats (including HTML slide shows and EPUB), LaTeX, and ConTeXt.
I interpret this to mean that currently, Pandoc does not create internal links in Writer. When I tried it, the link was ignored.
Note: It looks like I did not answer all of your questions. If you want to ask more about LibreOffice cross references and headings (the big bold paragraph towards the end of the question), maybe you could make a separate question just for that part.

Related

Merge 2 pdf files and preserve forms

I'd like to merge at least 2 PDF files into one while preserving all the form elements in the original PDFs. The form elements include text fields, radio buttons, check boxes, drop down menus and others. Please have a look at this sample PDF file with forms:
http://foersom.com/net/HowTo/data/OoPdfFormExample.pdf
Now try to merge it with any other arbitrary PDF file.
Can you do it?
EDIT: As for the implementation, I'd ideally prefer a command line solution on a Linux platform using open source tools such as 'ghostscript', or any other tool that you think is appropriate to solve this task.
Of course, everybody is welcome to supply any working solution to this problem, including a coded solution that involves writing a script which makes some API calls to a pdf-processing library. However, I'd suggest to take the path of least resistance first (CMD Solution).
Best Regards
EDIT #2: Well, there are indeed several CMD tools that merge PDFs. However, AFAIK these tools don't seem to preserve the forms in the original PDFs! They appear to simply concatenate the printouts of all those PDFs into a single printout, which is then presented as a single PDF.
Furthermore, if you print a PDF file with forms to a file, you lose all the forms in it. This is clearly not what I'm looking for.
I have found success using pdftk, which is open-source software that runs on Linux and can be called from your terminal.
To concatenate multiple pdfs into one (and preserve form-fillable elements), you can use the following command:
pdftk input1.pdf input2.pdf cat output output-file.pdf
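If the merge has to be scripted, the same command can be assembled and run from Python. This is just a thin wrapper sketch around the command line shown above: the names `pdftk_cat_command` and `merge_pdfs` are made up for illustration, and nothing here goes beyond the `cat output` form already used.

```python
import shutil
import subprocess


def pdftk_cat_command(inputs, output):
    """Build the pdftk command line that concatenates the input PDFs."""
    if len(inputs) < 2:
        raise ValueError("need at least two input files to merge")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_pdfs(inputs, output):
    """Run pdftk; raises if it is not installed or exits with an error."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk is not on PATH")
    subprocess.run(pdftk_cat_command(inputs, output), check=True)
```

For example, `merge_pdfs(["input1.pdf", "input2.pdf"], "output-file.pdf")` runs exactly the command given above.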

How do I create some kind of table of content in GitHub wiki?

If you look here: http://en.wikipedia.org/wiki/Stack_Overflow
You'll notice there's a little "Content" section, if you click on one of the links, it will send you to a specific section on the page.
How do I do this in GitHub wiki? With Markdown or whatever they use?
It is nicely demonstrated in the Table of Contents of the Markdown Cheatsheet.
##### Table of Contents
[Headers](#headers)
[Emphasis](#emphasis)
...snip...
<a name="headers"/>
## Headers
If you hover over a header in a GitHub Markdown file, you'll see a little link symbol to the left of it; you can also use that link. The format for that link is <project URL>#<header name>. The <header name> must be all lower case.
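The header name part of such a link can be approximated in code. The sketch below assumes the commonly observed rules (lower case, punctuation dropped, spaces turned into hyphens); it is not GitHub's actual implementation, and `github_slug` and `header_link` are hypothetical names.

```python
import re


def github_slug(title):
    """Approximate the fragment GitHub derives from a heading's text."""
    slug = title.strip().lower()
    # Drop everything except word characters, spaces and hyphens.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each remaining space becomes a hyphen.
    return slug.replace(" ", "-")


def header_link(project_url, title):
    """Combine a page URL and a heading into <project URL>#<header name>."""
    return f"{project_url}#{github_slug(title)}"


print(github_slug("Table of Contents"))  # prints table-of-contents
```

Duplicate headings and unusual characters are handled differently by the real renderer, so hovering over the header remains the reliable way to confirm a link.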
GitHub cannot generate a TOC directly, but there are alternatives.
You can automatically generate TOC via Online tool:
Generate TOC Table of Contents from GitHub Markdown or Wiki Online
or via Local tool:
github-markdown-toc
Visual Studio Code
If you happen to use Visual Studio Code, there is easy-to-use extension called Markdown All in One that can make the TOC for any .md file in an instant.
Just open Command Palette (Ctrl-Shift-P) -> Markdown: Create Table of Contents
Auto-update messes your edited TOC?
As an additional tip, you might want to turn the "automatic TOC updates on save" OFF by using
"markdown.extension.toc.updateOnSave": false,
in your Visual Studio Settings (Command Palette -> Preferences: Open Settings (JSON)).
One possible (semi-automated) solution is Eugene Kalinin's github-markdown-toc. This tool essentially crunches through your README.md file and snarfs out #'s headings to create a TOC.
Download the script https://github.com/ekalinin/github-markdown-toc
Feed your README.md to the script (as noted in Eugene's README.md)
cat README.md | bash github-markdown-toc
Cut and paste generated TOC and place it at the top of your README.md file
Note that this bash implementation only works on Linux (from what I can tell).
As a side note, there is also a golang implementation, which is probably more of a hassle to get working.
Update Aug. 2021:
After ToC in README (see March 2021 below), you now have:
Table of content for Wikis
For Wikis we now automatically generate a table of contents based on the Markdown headings.
As illustrated here:
Do you wiki?
We just added an automatic table of contents to the sidebar to help with navigation
You can now (March 2021) check out what the CEO of GitHub Nat Friedman just announced
GitHub now automatically creates a table of contents for your http://README.md files from your headers.
After much consideration, we made this a feature of the viewer, not a concern of the editor: no special markdown to insert.
So... it does not modify your markdown (README.md or other .md files) to insert or update your text: it only provides a menu which allows quick access to your text sections based on markdown headers.
That may or may not be what you are after.
https://github.com/jonschlinkert/markdown-toc
git clone your-repo.wiki.git (add the .wiki right before .git to clone the wiki)
npm i -g markdown-toc
Insert <!-- toc --> (case sensitive) in your wiki's markdown
markdown-toc -i my-wiki-markdown.md (-i will edit it in place)
Profit
Update: I think maybe https://github.com/thlorenz/doctoc is more popular now.
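The marker-based approach in the steps above can be illustrated with a small sketch. This is not how markdown-toc itself is implemented; it only shows the idea of rewriting the text between `<!-- toc -->` and a closing `<!-- tocstop -->` marker, and the function name `replace_toc` is hypothetical.

```python
import re

# Everything from the opening marker to the closing marker, across lines.
TOC_BLOCK = re.compile(r"<!-- toc -->.*?<!-- tocstop -->", re.DOTALL)


def replace_toc(markdown, toc_lines):
    """Swap whatever sits between the markers for a fresh list of entries."""
    new_block = "<!-- toc -->\n\n" + "\n".join(toc_lines) + "\n\n<!-- tocstop -->"
    if TOC_BLOCK.search(markdown):
        # A function replacement keeps backslashes in entries literal.
        return TOC_BLOCK.sub(lambda _match: new_block, markdown, count=1)
    # First run: only the opening marker exists, so expand it in place.
    return markdown.replace("<!-- toc -->", new_block, 1)
```

Because the old block is replaced rather than appended to, running it again with the same entries leaves the file unchanged.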
Currently it's not possible to do that using markdown syntax (.md). There is ongoing unofficial discussion about automatically generating table of contents TOC on rendered markdown files like README.md which lists some of the ideas.
However there are some other workarounds such as:
Use AsciiDoc instead as per suggestion from this comment. For example:
:toc: macro
:toc-title:
:toclevels: 99
# Title
toc::[]
## A
### A2
## B
### B2
Check the example at littlebits/react-popover (README.adoc).
Online Table Of Content Generator (raychenon/play-table-of-contents)
arthurhammer/github-toc - browser extension that adds a table of contents to GitHub repos
If you are not in the position to stick with Markdown, you can do as below:
on GitHub/wiki: switch Markdown to MediaWiki. Use __TOC__ Syntax. See sample.
on GitHub/repo: switch Markdown to AsciiDoc. Use :toc: Syntax. See demo.
on GitHub/repo: switch Markdown to reStructuredText. Use .. contents:: Syntax.
However, using Markdown files in GitHub/repo, you can get it on GitHub Pages like in Wikipedia
when Jekyll is activated, it generates GitHub Pages using Kramdown by default
Kramdown comes with Table Of Content. Use {:toc} Syntax. See the explanation.
You can choose the Edit mode "MediaWiki" which will generate a toc for the headers, e.g.
== First ==
== Second ==
GitHub has its own way of generating id=".." attributes for h1, h2, h3, etc. headers in the HTML it renders from Markdown (Bitbucket, for example, uses a slightly different pattern for slugifying header titles into id="slug"), so rather than reinvent the wheel it is handy to use a library that has reverse engineered this process.
I found one quite good library for this task called markdown-toc.
For me it seems the best solution because I always have installed node on my machine.
Just execute npx markdown-toc -i file.md.
And it looks like it is one of more popular tools for this task - at least in node.js ecosystem.
ls
cat <<EOF >> test.md | tee
## Table of Contents
<!-- toc -->
- old toc 1
- old toc 2
- old toc 3
<!-- tocstop -->
## abc
This is a b c.
## xyz
This is x y z.
EOF
ls
cat test.md
npx markdown-toc -i test.md
cat test.md
output:
Pandoc
The swiss army knife of markup:
cat README.md | pandoc --from markdown --toc -s --to markdown -
You can use mdtoc (I am the author).
Once installed, simply run:
mdtoc path/to/file.md
One more Markdown TOC tool, implemented in Perl (which always ships with Linux and Git for Windows, and optionally with Cygwin, so there are no dependencies on extra packages):
https://github.com/ildar-shaimordanov/git-markdown-toc
I guess my tool works similarly, or almost similarly, to ekalinin/github-markdown-toc mentioned above by other people. I have never compared them because his tool is implemented in Go, which doesn't exist on my system. The main goal of my script is to provide a good solution for creating a TOC locally: no connection to any external hosts and so on, it only reads a local file (README.md by default), creates the TOC and embeds it into the file.
Example:
[Go to Delete](#delete_lines)
#delete_lines
code here, will be pointed here
See: https://guides.github.com/features/mastering-markdown/
And, to make a nested outline:
* 1\. [Go to Delete](#delete_lines)
* 1.1\. item
* 1.2\. item
* 1.2\. item
* 2\. item
See: https://meta.stackexchange.com/questions/85474/how-to-write-nested-numbered-lists
And for more info and complex linking:
https://stackoverflow.com/questions/6695439/how-to-link-to-a-named-anchor-in-multimarkdown#:~:text=In%20standard%20Markdown%2C%20place%20an,%5Blink%20text%5D(%23abcd)%20.

Is there an option to control output page orientation (using knitr->pander->pandoc->docx)

I am playing with Tal's intro to producing Word tables with as little overhead as possible in real world situations. (Please see there for reproducible examples. Thanks, Tal!) In real applications, tables are too wide to print on a portrait-oriented page, but you might not want to split them.
Sorry if I have overlooked this in the pandoc or pander documentation, but how do I control page orientation (portrait/landscape) when writing from R to a Word .docx file?
I should maybe add that I started using knitr+markdown, and I am not yet familiar with LaTeX syntax. But I'm trying to pick up as much as possible while getting my stuff done.
I am pretty sure the docx writer has no section breaks implemented. Also, as far as I understand, --reference-docx allows for customizing styles and not the page layout (but I might be wrong here). This is from pandoc's guide on --reference-docx:
--reference-docx=FILE
Use the specified file as a style reference in producing a docx file.
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets are used in the new docx. If no
reference docx is specified on the command line, pandoc will look for
a file reference.docx in the user data directory (see --data-dir). If
this is not found either, sensible defaults will be used. The
following styles are used by pandoc: [paragraph] Normal, Title,
Authors, Date, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5,
Block Quote, Definition Term, Definition, Body Text, Table Caption,
Image Caption; [character] Default Paragraph Font, Body Text Char,
Verbatim Char, Footnote Ref, Link.
These are styles that are saved in the /word/styles.xml component of the docx document.
The page layout on the other hand is saved in the /word/document.xml component in the <w:sectPr> tag, but pandoc's docx writer ignores this part as far as I can tell.
By default the docx writer builds a continuous document, with elements such as headers, paragraphs, simple tables and so on, much like HTML output.
Option #1 (doesn't solve the page orientation problem):
The only page layout option that you can define through styles is the pageBreakBefore which will add a page break before a certain style
Option #2 (seems elegant but hasn't been tested):
Recently the custom writer has been added that allows for a custom lua script, where you should be able to define how certain Pandoc blocks will be written into the output file ... meaning you could potentially define section breaks and page layout for a specific block inserting the sectPr tag into the document. I haven't tried this out but it would be worth investigating. On pandoc github you can check out a sample lua script file for custom html output.
However, this means you have to have Lua installed and learn the language, and it is up to you whether you think it's worth the time investment.
Option #3 (a couple of clicks in Word might just do):
As you will probably spend quite some time setting up how to insert sections, choosing the right size and margins, and figuring out how to fit the table to such a layout, I recommend that you use pandoc to write your document.docx, then open it in Word and do the layout by hand:
select the table you want on the landscape page
go to Layout > Margins
> select Apply to: Selected text
> choose Page Setup > select Landscape
Now a new section with a landscape orientation should surround your table.
What you would probably also want to do anyway is style the table and table caption a little (font size, ...) to achieve the best result (all text styling can already be applied with pandoc, where --reference-docx comes in handy).
Option #4 (in situation when you can just use pdf instead of docx):
As far as I could figure out, pandoc does a good job with tables in md -> docx (alignment, style, ...); in tex -> docx it sometimes had trouble. However, if your situation allows for PDF output, LaTeX will be your greatest friend. For example, your problem is solved as easily as just using
\usepackage{pdflscape}
and adding this around your table
\begin{landscape}
...
\end{landscape}
These are the options that I could think of so far.
I would always recommend using the pdf format for reports, as you can style it to your liking with latex and the layout will stay the way you want it to be.
However, I also know that for various reasons Word documents are still the main way of reviewing manuscripts in many fields, so I would most likely just go with my suggested option 3, mostly because it is a lazy and quick solution and because I usually don't have many documents with tons of giant tables with awkward placement and styling.
Good luck ;-)
Based on Taleb's answer here and some officer package functions, I created a little gist that one can use like this:
---
title: "Example"
author: "Dan Chaltiel"
output:
  word_document:
    pandoc_args:
      '--lua-filter=page-break.lua'
---
I'm in portrait
\endLandscape
I'm in landscape
\endPortrait
I'm in portrait again
With page-break.lua being the file hosted here: https://gist.github.com/DanChaltiel/e7505e62341093cfdc489265963b6c8f
This is far from perfect (for instance it won't work without the last portrait section), but it is quite useful sometimes.

Automatic TOC in github-flavoured-markdown

Is it possible to generate an automatic Table of Contents using Github Flavoured Markdown?
I created two options to generate a toc for github-flavored-markdown:
DocToc Command Line Tool (source) requires node.js
Installation:
npm install -g doctoc
Usage:
doctoc . to add table of contents to all markdown files in the current and all sub directories.
DocToc WebApp
If you want to try it online first, go to the doctoc site,
paste the link of the markdown page and it will generate a table of
content that you can insert at the top of your markdown file.
Github Wikis and Anchors
As Matthew Flaschen pointed out in the comments below, for its wiki pages GitHub previously didn't generate the anchors that doctoc depends on.
UPDATE: However, they fixed this issue.
GitHub Pages (which is basically a wrapper for Jekyll) appears to use kramdown, which implements all of Maruku, and therefore has support for an automatically generated table of contents via a toc attribute:
* auto-gen TOC:
{:toc}
The first line just starts an unordered list and is actually thrown away.
This results in a nested set of unordered lists, using the headers in the document.
Note: this should work for GitHub Pages, not GitHub Flavored Markdown (GFM) as used in comments or wiki pages. AFAIK a solution doesn't exist for that.
If you edit Markdown files with Vim, you can try this plugin vim-markdown-toc.
The usage is simple, just move your cursor to the place you want to append Table of Contents and run :GenTocGFM, done!
Features:
Generate toc for Markdown files. (Support GitHub Flavored Markdown and Redcarpet)
Update existing toc.
Auto update toc on save.
Update March 2021: GitHub added an official workaround
READMEs now show a ToC like this as you scroll down into them:
demo: https://github.com/cirosantilli/test-git-web-interface/tree/master/d
It does not render inside the document as I wanted for better Ctrl + F, but it is better than nothing.
It also works for non-README files now, e.g.: https://github.com/cirosantilli/test-git-web-interface/blob/master/md.md
They also added a repository setting to enable or disable that. It's so weird, who would ever want to disable it? Under https://github.com/cirosantilli/test-git-web-interface/settings Features:
Table of contents
Autogenerate table of contents for Markdown files in this repository. The table of contents will be displayed near the top of the file.
Original answer
It's not possible, except for the workarounds proposed.
I proposed Kramdown TOC extension and other possibilities to support@github.com and Steven! Ragnarök replied with the usual:
Thanks for the suggestion and links. I'll add it to our internal feature request list for the team to see.
Let's upvote this question until it happens.
Another workaround is to use Asciidoc instead of Markdown, which does render TOCs. I've moved to this approach for my content nowadays.
It's not automatic, but it uses Notepad++ regular expressions:
Replace all first by the second (removes all lines not having headers)
^##(#?)(#?)(.*?)$(.|\r|\n)*?(?=^##|\z)
-\1\2 [\3](#\3)\n
Then (converts headers III to spaces)
-##
-
Then (converts headers II to spaces)
-#
-
Then (remove unused chars at the beginning and at the end of link title)
\[ *((?:(?![ .:#!\?;]*\])[^#])*)[ #:!\?;]*\]
[\1]
Then (convert last tokens lowercase and dash instead of spaces)
\]([^ \r\n]*) ([^\r\n ]*)
]\L\1-\2
Remove unused final pounds and initial dashes:
(?:()[-:;!\?#]+$|(\]#)-)
\1\2
Remove useless chars in links:
(\].*?)(?:\(|\))
\1
And finally add parenthesis around final links:
\](?!\()(.*?)$
\]\(\1\)
And voilà! You can even put this in a global macro if you repeat it often enough.
Github Flavored Markdown uses RedCarpet as their Markdown engine.
From the RedCarpet repo:
:with_toc_data - add HTML anchors to each header in the output HTML,
to allow linking to each section.
It seems that you'd need to get at the renderer level to set this flag, which obviously isn't possible on GitHub. However, with the latest update to GitHub Pages, it seems that automatic anchoring is turned on for headers, creating linkable headings. Not exactly what you want, but it might help you create a TOC for your doc a bit more easily (albeit manually).
A very convenient way to achieve a table of contents for a markdown file when working with Visual Studio Code is the extension Markdown-TOC.
It can add a toc to existing markdown files and even keep the toc up-to-date on saving.
It is possible to generate a webpage automatically with http://documentup.com/ from the README.md file. It's not creating a TOC, but for many it might solve the reason for wanting to create a TOC.
Another alternative to Documentup is Flatdoc: http://ricostacruz.com/flatdoc/
Gitdown is a markdown preprocessor for Github.
Using Gitdown you can:
Generate Table of Contents
Find dead URLs and Fragment Identifiers
Include variables
Include files
Get file size
Generate Badges
Print Date
Print information about the repository itself
Gitdown streamlines common tasks associated with maintaining a documentation page for a GitHub repository.
Using it is straightforward:
var Gitdown = require('gitdown');
Gitdown
// Gitdown flavored markdown.
.read('.gitdown/README.md')
// GitHub compatible markdown.
.write('README.md');
You can either have it as a separate script or have it as part of the build script routine (such as Gulp).
Use coryfklein/doctoc, a fork of thlorenz/doctoc that does not add "generated with DocToc" to every table of contents.
npm install -g coryfklein/doctoc
Most of the other answers require installing some tool.
I found a quick and easy online solution: https://imthenachoman.github.io/nGitHubTOC.
For any markdown input it generates a table of contents.
You can specify minimum and maximum heading level.
The source code is located at https://github.com/imthenachoman/nGitHubTOC
My colleague @schmiedc and I have created a GreaseMonkey script that installs a new TOC button to the left of the h1 button, which uses the excellent markdown-js library to add/refresh a table of contents.
The advantage over solutions like doctoc is that it integrates into GitHub's wiki editor and does not need users to work on their command-line (and require users to install tools like node.js). In Chrome, it works by drag 'n dropping into the Extensions page, in Firefox you will need to install the GreaseMonkey extension.
It will work with plain markdown (i.e. it does not handle code blocks correctly, as that is a GitHub extension to markdown). Contributions welcome.
This is not a direct answer to this question, as so many people have provided workarounds. I don't think generating a TOC has been officially supported by GitHub to date. If you want GitHub to render a Table of Contents on their GFM preview pages automatically, please participate in the discussion on the official feature request issue.
Shameless "borrow" of this SO answer.
You can do this with Pandoc.
pandoc -s --toc input.md -o input_toc.md
Note: the order of the input and output files is important here.
Currently it's not possible using markdown syntax (see the ongoing discussion at GitHub), however you can use some external tools such as:
Online Table Of Content Generator (raychenon/play-table-of-contents)
arthurhammer/github-toc - browser extension that adds a table of contents to GitHub repos
Alternatively use AsciiDoc instead (e.g. README.adoc), e.g.
:toc: macro
:toc-title:
:toclevels: 99
# Title
## A
### A2
## B
### B2
as suggested in this comment. Check the demo here.
For Github's Texteditor Atom check out this awesome plugin (or "package" in Atom-lingo), which generates "TOC (table of contents) of headlines from parsed markdown" files:
markdown-toc
Once installed as Atom-package you can use the shortcut ctrl-alt-c to insert a TOC based on your markdown-doc-structure at the current cursor position...
Atom Keybindings
markdown-toc gives you the following default key-bindings to control the plugin in Atom:
ctrl-alt-c => create TOC at cursor position
ctrl-alt-u => update TOC
ctrl-alt-r => delete TOC
Plugin Features (from the project's README)
Auto linking via anchor tags, e.g. # A 1 → #a-1
Depth control [1-6] with depthFrom:1 and depthTo:6
Enable or disable links with withLinks:1
Refresh list on save with updateOnSave:1
Use ordered list (1. ..., 2. ...) with orderedList:0
Here's a shell script I threw together today for this. Might need to tweak it for your needs, but it should be a good starting point.
cat README.md \
| sed -e '/```/ r pf' -e '/```/,/```/d' \
| grep "^#" \
| tail -n +2 \
| tr -d '`' \
| sed 's/# \([a-zA-Z0-9`. -]\+\)/- [\1](#\L\1)/' \
| awk -F'(' '{for(i=2;i<=NF;i++)if(i==2)gsub(" ","-",$i);}1' OFS='(' \
| sed 's/^####/ /' \
| sed 's/^###/ /' \
| sed 's/^##/ /' \
| sed 's/^#//'
If anyone knows a better way to do those final # replacements, please add a comment. I tried various things and wasn't happy with any, so I just brute forced it.
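One alternative to the chain of sed substitutions is to compute the indentation from the heading level directly. The following Python sketch takes that route under a few assumptions: headings use the `#` style, fenced code blocks start with three backticks, the first heading is the document title and is skipped (mirroring `tail -n +2` above), and the slug rule is only an approximation of GitHub's. The names `build_toc` and `slug` are hypothetical.

```python
import re

# One to six hashes, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slug(title):
    """Approximate GitHub's anchor: lower case, punctuation dropped, hyphens for spaces."""
    cleaned = re.sub(r"[^\w\- ]", "", title.strip().lower())
    return cleaned.replace(" ", "-")


def build_toc(markdown, skip_first=True):
    """Return a nested bullet list linking to every heading outside code fences."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced block
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            entries.append((len(match.group(1)), match.group(2)))
    if skip_first:
        entries = entries[1:]  # treat the first heading as the page title
    if not entries:
        return ""
    top = min(level for level, _ in entries)
    # Two spaces of indentation per level below the shallowest heading.
    return "\n".join(
        f"{'  ' * (level - top)}- [{title}](#{slug(title)})"
        for level, title in entries
    )
```

Something like `print(build_toc(open("README.md").read()))` would then print the list for pasting at the top of the file.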
There's now a GitHub Action accomplishing this:
https://github.com/marketplace/actions/toc-generator
Specify location of TOC (option)
e.g. README.md
<!-- START doctoc -->
<!-- END doctoc -->
Setup workflow
e.g. .github/workflows/toc.yml
on: push
name: TOC Generator
jobs:
  generateTOC:
    name: TOC Generator
    runs-on: ubuntu-latest
    steps:
      - uses: technote-space/toc-generator@v2
Update 2022-02
In VSCode, check out extension "Markdown All in One". It will generate and update the TOC of markdown automatically.
Install Extension.
Place cursor at where you want to insert TOC.
Run command "Markdown All in One: Create Table of Contents"
Enjoy!

Microsoft Word to Org-mode

I am trying to bring a Microsoft Word document into Emacs using org-mode. I have copied the Word document and pasted it into Emacs. I would like to reproduce headings like 7.1.2.4 in org-mode format and then link the TOC to the appropriate headings. How can I do that? Any suggestions? Has this been done in any programming language, such as Perl?
Thanks.
There is ODT2ORG (https://bitbucket.org/josemaria.alkala/odt2org/wiki/Home) which lets you import odt files in org-mode.
Use Openoffice/Libreoffice to produce an .odt from your .doc.
Use odt2org to get an .org.
About the headings: I am not entirely sure I understand you.
There is org-toc.el included in org-mode that provides a separate buffer with a TOC of your current document (like in Reftex). All the entries there are already links to the individual headings. Also, an exported document will have a TOC included by default without your intervention.
Orgmode does not support automatically numbered headings (yet). However, if you want to export your document to html, docbook, latex, or pdf, your headings will appear numbered and nested (you can tweak the settings quite a lot).
I doubt that you will get your intended result purely automatically but it should work 70% automatically, especially if you have latex installed and simply want to have a good-looking pdf in the end. Convert doc to odt, convert odt to org, open and type "C-c C-e d".
Another option: Save as an HTML file, then use Pandoc to convert the HTML to an .org file.
I've converted loads of Word documents into Org files. It takes minutes to do it by hand.
If you want cross-references, use internal links (4.2 in the current manual).
The * and ** style headings are always likely to be there in Org. Think of the use case where exports are compiled from #+INCLUDEd files, or you have done a selective export using tags. Any kind of single sourcing technology isn't going to display the numbering.
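For text that was pasted from Word with its heading numbers intact, the conversion to star-style headings can be scripted. This is a rough sketch under the assumption that every line starting with a dotted number such as 7.1.2.4 is a heading; body lines that happen to start with a number would be converted too, so the result needs a quick review. The function name `to_org_headings` is made up for illustration.

```python
import re

# A dotted number (7, 7.1, 7.1.2.4, optionally with a trailing dot) and a title.
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def to_org_headings(text):
    """Turn '7.1.2.4 Title' lines into Org headings with one star per level."""
    converted = []
    for line in text.splitlines():
        match = NUMBERED.match(line)
        if match:
            depth = match.group(1).count(".") + 1
            converted.append("*" * depth + " " + match.group(2))
        else:
            converted.append(line)
    return "\n".join(converted)


print(to_org_headings("7 Intro\n7.1.2.4 Details\nBody text"))
```

The numbers themselves are dropped here, on the assumption that the exporter renumbers the headings as described above.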
There is a ruby gem which converts doc to md. With pandoc you can convert to org.
https://github.com/benbalter/word-to-markdown