Easy way to get source files from doxygen documentation?

I have found the source for a game that I am interested in modifying, and it's in the form of doxygen "documentation".
Under "files" I get html files with names not related to the cpp and h files, and also line numbers.
I was wondering if there is a way to convert the doxygen documentation into regular cpp and h files.

How many files are there? If not too many, a manual copy-paste perhaps...
html2text is a python program that is pretty good at converting html to, well, text.
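If there are too many files for copy-paste, here is a rough sketch using only Python's standard library. It assumes the source-browser pages use the usual doxygen layout, where each source line sits in a <div class="line"> and the line number in a <span class="lineno">; check one of your html files first, since the markup differs between doxygen versions. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class DoxygenSourceExtractor(HTMLParser):
    """Collect the text of each <div class="line">, skipping line-number spans."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []        # recovered source lines
        self._current = None   # text pieces of the line being read, or None
        self._div_depth = 0    # nesting depth of <div> inside the current line
        self._skip_depth = 0   # > 0 while inside a <span class="lineno">

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None:
            if tag == "div" and "line" in classes:
                self._current = []
                self._div_depth = 1
            return
        if tag == "div":
            self._div_depth += 1
        elif tag == "span" and (self._skip_depth or "lineno" in classes):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == "span" and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:
                text = "".join(self._current)
                # Assumption: a single non-breaking space may separate
                # the line number from the code; drop it if present.
                if text.startswith("\xa0"):
                    text = text[1:]
                self.lines.append(text.replace("\xa0", " "))
                self._current = None

    def handle_data(self, data):
        if self._current is not None and not self._skip_depth:
            self._current.append(data)


def doxygen_html_to_source(html):
    """Return the source text recovered from one doxygen source page."""
    parser = DoxygenSourceExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(line.rstrip() for line in parser.lines) + "\n"
```

You would still have to map each html file back to its original cpp/h name yourself, for example from the page title.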


Merge 2 pdf files and preserve forms

I'd like to merge at least 2 PDF files into one while preserving all the form elements in the original PDFs. The form elements include text fields, radio buttons, check boxes, drop down menus and others. Please have a look at this sample PDF file with forms:
Now try to merge it with any other arbitrary PDF file.
Can you do it?
EDIT: As for the implementation, I'd ideally prefer a command-line solution on a Linux platform using open source tools such as 'ghostscript', or any other tool that you think is appropriate for this task.
Of course, everybody is welcome to supply any working solution to this problem, including a coded solution that involves writing a script which makes some API calls to a PDF-processing library. However, I'd suggest taking the path of least resistance first (CMD solution).
Best Regards
EDIT #2: Well, there are indeed several CMD tools that merge PDFs. However, AFAIK these tools don't seem to preserve the forms in the original PDFs! They appear to simply concatenate the printouts of all those PDFs into a single printout, which is then presented as a single PDF.
Furthermore, if you print a PDF file with forms to a file, you lose all the forms in it. This is clearly not what I'm looking for.
I have had success using pdftk, an open-source tool that runs on Linux and can be called from your terminal.
To concatenate multiple pdfs into one (and preserve form-fillable elements), you can use the following command:
pdftk input1.pdf input2.pdf cat output output-file.pdf
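If you need this in a script rather than typed by hand, a minimal Python wrapper around the same command could look like the sketch below. The function names are made up for this example, and it assumes pdftk is installed and on your PATH.

```python
import shutil
import subprocess


def pdftk_cat_command(inputs, output):
    """Build the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if len(inputs) < 2:
        raise ValueError("need at least two input PDFs to merge")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_pdfs(inputs, output):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk was not found on PATH")
    subprocess.run(pdftk_cat_command(inputs, output), check=True)
```

For example, merge_pdfs(["input1.pdf", "input2.pdf"], "output-file.pdf") runs exactly the command shown above.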

Two closely matching files: get corresponding lines?

I'm in a situation where I'm programmatically generating LaTeX code, and I want my Synctex to point to the correct lines in the original file.
The generation is basically doing template expansion, so the original files are nearly identical to the generated ones, but with some snippets expanded.
I'm wondering, is there a diff tool or library that will easily give me the line number of the original file that corresponds to a given line in the generated one? Can this be extracted from a normal Unix diff somehow?
This is part of a build script, so ideally something easy to run, like bash or python, is preferred to something that needs to be compiled.
Google’s diff-match-patch lib is a neat solution to questions like these: https://github.com/google/diff-match-patch
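If you would rather not add a dependency, Python's built-in difflib can produce the same kind of mapping. This is a sketch using difflib instead of diff-match-patch, with a made-up function name. It assumes that pointing every line of an expanded snippet at the first original line of the block it replaced is good enough for Synctex.

```python
import difflib


def map_generated_to_original(original_lines, generated_lines):
    """Return {generated_line_no: original_line_no}, both 1-based."""
    matcher = difflib.SequenceMatcher(
        None, original_lines, generated_lines, autojunk=False)
    mapping = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines map one-to-one.
            for offset in range(j2 - j1):
                mapping[j1 + offset + 1] = i1 + offset + 1
        elif tag == "replace":
            # An expanded snippet: point every generated line at the
            # first original line of the block it replaced.
            for j in range(j1, j2):
                mapping[j + 1] = i1 + 1
        elif tag == "insert":
            # No counterpart: fall back to the preceding original line
            # (or line 1 when the insertion is at the very top).
            for j in range(j1, j2):
                mapping[j + 1] = max(i1, 1)
        # "delete" covers no generated lines, so there is nothing to map.
    return mapping
```

Read both files with readlines(), pass the two lists in, and look up the generated line number in the resulting dict.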

Include *prewritten* documentation in Doxygen

To distinguish this question from Doxygen: Adding a custom link under the "Related Pages" section which has an accepted answer that is not a real answer to the question, I specifically add prewritten to the question.
What I want:
Write one document tex file (without preamble, since this file will be \input-ed into a full document)
Import the document into Doxygen's HTML output.
Using Doxygen to produce the tex file will probably not work, since it does too much layout work [as of 2015 this holds for its HTML output too, e.g. empty table rows]. If Doxygen takes some other input that can easily be transformed into LaTeX, that will do.
You can easily add an existing LaTeX file to your Doxygen documentation using \latexonly\input{yourfile}\endlatexonly.
I would assume you put it under, e.g., a Doxygen \page.

Microsoft Word to Org-mode

I am trying to bring a Microsoft Word document into Emacs using org-mode. I have copied the Word document and pasted it into Emacs. I would like the headings to be in org-mode format,
and then to link the TOC to the appropriate headings. How can I do that? Any suggestions? Has anyone done this in a programming language such as Perl?
There is ODT2ORG (https://bitbucket.org/josemaria.alkala/odt2org/wiki/Home) which lets you import odt files in org-mode.
Use Openoffice/Libreoffice to produce an .odt from your .doc.
Use odt2org to get an .org.
About the headings: I am not entirely sure I understand you.
There is org-toc.el, included in org-mode, which provides a separate buffer with a TOC of your current document (like in Reftex). All the entries there are already links to the individual headings. Also, an exported document will have a TOC included by default without your intervention.
Orgmode does not support automatically numbered headings (yet). However, if you want to export your document to html, docbook, latex, or pdf, your headings will appear numbered and nested (you can tweak the settings quite a lot).
I doubt that you will get your intended result purely automatically but it should work 70% automatically, especially if you have latex installed and simply want to have a good-looking pdf in the end. Convert doc to odt, convert odt to org, open and type "C-c C-e d".
Another option: Save as an HTML file, then use Pandoc to convert the HTML to an .org file.
I've converted loads of Word documents into Org files. It takes minutes to do it by hand.
If you want cross-references, use internal links (4.2 in the current manual).
The * and ** style headings are always likely to be there in Org. Think of the use case where exports are compiled from #+INCLUDEd files, or you have done a selective export using tags. Any kind of single sourcing technology isn't going to display the numbering.
There is a ruby gem which converts doc to md. With pandoc you can convert to org.

Conversion between docx / doc / rtf and lightweight markup

I am looking for a tool or set of tools to convert between file formats D and M where
D is a format handled by MSWord, in order of preference, docx, doc, rtf
M is a lightweight markup, such as markdown, textile, txt2tags, it can be an esoteric one
there is a way to generate html from M
conversion is two-way, it's done both from D to M, and from M to D
utf-8 encoding is handled properly
the content is simple, paragraphs, some simple formatting like bold and italics, maybe lists
the tools are platform-independent
What I've found so far
TeX, LaTeX -- too heavyweight
docx2txt -- too lightweight, it supports no formatting at all
html -- MSWord produces bloated html
a few one-way conversions, like doc to mediawiki,
The use case is a document workflow between technical and non-technical people
I, the technical guy edit a document in plain text, put it into version control, etc.
I send it to my manager or other non-technical people
They add comments, make changes to it using their Word, then they send it back to me
I want to simply grok their changes, make my changes, put it into version control, without having to use Word
I think that Pandoc more than meets all your requirements.
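A rough sketch of the round trip, scripted in Python so it can sit next to your version control hooks. It assumes pandoc is installed and on your PATH; the function names are made up for this example, and you should check on a real document whether the converted formatting is good enough for your workflow.

```python
import subprocess


def pandoc_command(source, target, from_format, to_format):
    """Build the pandoc argument list for one conversion."""
    return ["pandoc", "--from", from_format, "--to", to_format,
            "--output", target, source]


def docx_to_markdown(docx_path, md_path):
    """D to M: turn the Word file you received into markdown."""
    subprocess.run(
        pandoc_command(docx_path, md_path, "docx", "markdown"), check=True)


def markdown_to_docx(md_path, docx_path):
    """M to D: turn your markdown into a Word file to send out."""
    subprocess.run(
        pandoc_command(md_path, docx_path, "markdown", "docx"), check=True)
```

Generating html from M is the same call with "html" as the target format.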
Adam, I've used docx4j to convert docx to html, edit the html in CKEditor, and then use docx4j to convert the html back to docx. My process made some assumptions about the css (ie it was designed to handle docx4j's clean html, and editing in CKEditor).
You don't say whether there is a way to generate M from HTML?
This is probably hard to do two-way, since you will have impedance mismatches between the various formats.
The best world I can think of would be a sort of Wiki / Word hybrid: Maybe you can get Google Wave to do that for you?
Another solution that might work is a CMS like Plone (did they ever add WYSIWYG capability? I stopped caring after version 1). Keep your documents there. Let the system handle changes, annotations etc. You can automate retrieval of the source (should be ReStructuredText) and commit that to your source control if you have to.
This script I wrote might help you in your workflow:
It is a command-line PHP script that will only work with .docx files. It will extract the XML, run some XSL transformations, and provide you the result in Markdown format.
I encourage you to send me .docx files that don't convert accurately. I'd love to make this script as robust and reliable as possible.
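To illustrate the general idea of pulling the XML out of a .docx and turning it into Markdown, here is a small Python sketch. It is not the PHP script and uses no XSL; it walks the XML directly. It relies on a .docx being a zip archive whose main text lives in word/document.xml, and it assumes simple content: paragraphs with bold and italic runs only. The function names are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_to_markdown(run):
    """Convert one <w:r> run to text, wrapping bold/italic in markers."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    props = run.find("w:rPr", NS)
    if not text.strip() or props is None:
        return text
    # Simplification: any <w:b>/<w:i> element counts as "on",
    # even if it carries an attribute that switches it off.
    if props.find("w:b", NS) is not None:
        text = f"**{text}**"
    if props.find("w:i", NS) is not None:
        text = f"*{text}*"
    return text


def docx_paragraphs_to_markdown(docx_file):
    """Return the paragraphs of a .docx (path or file object) as Markdown."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W}}}p"):
        line = "".join(run_to_markdown(r) for r in para.findall("w:r", NS))
        if line.strip():
            paragraphs.append(line)
    return "\n\n".join(paragraphs) + "\n"
```

Lists, headings, tables and tracked changes are ignored here; handling them is where the real work is.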