Generate .hhk file From Word Document - ms-word

I am trying to convert MS Word file to chm file. I have a well organized word document. But,I could not figure out how to word saved as a html file to chm file. I know I can add html file to created project but there are some issue such that I could not solve how to convert ms word table of content file to index file in html help workshop program. I would be very happy If someone provide some example about conversion of word documents.(I am trying to achieve this thorough HTML Help Workshop program)
Best regards,

Converting a Word document to CHM format is difficult without special (often expensive) tools and has a learning curve.
You should think about whether the PDF format is not sufficient. But the CHM format - integrated in the Windows operating system - has of course some popular functions.
I recommend to read through Search and Index not working after converting from Word 2016 to CHM.
As I mentioned in my answer I never used chmProcessor before (because using other tools) but surprisingly seems to be a good one for converting Word documents in a simple way.
Please try chmProcessor for your needs. You may want to ask a new question here on SO later.
Edit:
Maybe you have additional interest in the following CodeProject article:
How to Easily Write a User's Guide for Your Application using Different File Extensions

Related

Word 2010 additional file format

I'm not sure whether this is the best approach for this or whether I perhaps should ask the question more clearer.
What I want to do is to create an additional file output - e.g. if the user uses Word to create a description consisting of known tags, I want to be able to save this as bbcode.
Now I do have an idea of how to do this, but is there a way to say add another file format to the "Save file"-dialog box and have it run a parser and file writer, that'd read the current document and export it using known bbcode-tags (that perhaps would be adjustable from some configuration window)?
The result would be a file containing bbcode as well as the text information that the user has entered.
How would I hook up my addin to the file output dialog? Is there a way to do this? I'm not sure it's custom XML since I won't be using the XML at all.
Thanks in advance and please excuse my poor English.
Edit: after having a look at the Word 2010 AddIn-project, I figured, that I'm looking for a way to define my own "export"-format. I'd like to export the BBCode to a .txt (or even .bbcode) file. The Microsoft.Office.Interop.Word.WdExportFormat seems to have its own fixed enumeration. Is there a way to add an export-format?
There is some code for this here:
phpbb.com/community/viewtopic.php?f=17&t=395554

Best way to get a database friendly list of Veteran Affairs Hospital

I sincerely apologize if this isn't the proper forum to discuss this, but I wasn't sure where to go or what would be the best option.
Basically, I'm trying to find a database friendly list of veteran affairs hospitals. The closest thing that I've been able to find is www.va.gov/ofcadmin/docs/CATB.pdf as it has all the information I'm looking for:
Region
Address
City in a separate column
Zip Code in a separate column
State
Facility # (also known as StationID)
VISN
Symbol
I've tried exporting that PDF out into CSV but it's a complete nightmare to get working. So, I was curious if anyone had any ideas or insights into how I could accomplish this task.
First, here's a CSV file containing the data found in CATB.pdf. The very first line contains the column headers, and the rest of the file contains the contents.
http://tmp.alexloney.com/CATB.csv
Now, for the more detailed explanation...I took the PDF you provided a link to, converted it to an HTML document using Adobe Acrobat, then I used a lot of Regular Expressions to parse the file and clean it up. Once the file was cleaned up enough, I was able to write a program to parse through the remainder of the file, grab the state and region, and spit it all out in a nicely formatted CSV.
Hope that helps you!
I believe that PDFILL has an option in it that will convert a PDF file to Excell. Once in Excell you should have no problem converting to a CSV file.

How to convert Word 2007 document to PDF using Apache FOP

I am currently using Apache FOP and have a stylesheet (possibly from RenderX) that converts Word 2003 XML documents (Saved as XML option) to PDF. However, this does not work for Word 2007 XML documents.
I am looking for options and/or suggestions on how to accomplish one of the following tasks -
Get a stylesheet that will transform Word 2007 XML file to:
Word 2003 XML or
PDF using FOP (using a stylesheet to create xsl-fo)
I am also open to any other options you might have. If possible I would like to do this with little to no cost. However, I am limited to using Java so a C# type option is not possible.
Thanks,
You could try docx4j, an open source Java library (ASL v2) which uses FOP to create PDFs from docx files.
I'm not aware of any style sheets that do this transformation. It would be reasonably sophisticated. If you end up having to engineer another way of doing it, you might want to look at JODConverter (straight conversion - might be your best bet), the OpenOffice UNO API (very manual), JODReports or Docmosis (both can produce documents in various formats). All can produce PDFs from a Java environment. I think they all have free versions.
Hope that helps.

Understanding WordProcessingML tags and avoid unnecessary tags

I am using MS Word API to generate .docx which contains the data fetched from DB, in which i am applying the respective styles, fonts, symbols, etc. If the data fetched from the DB is quite huge, then there is a problem in displaying those data in the .docx file. I found that internally MS Word 2007 will write some content through tags which may not be needed to display the data. Hence i am figuring out what are the necessary MS Word tags needed when converting into a .xml file. So that i can avoid unnecessary tags and build only the respective tags which are needed to display the data. Hence i am planning to write my own .xml with the MS Word tags which are needed, than generating a .XML from .docx file
My queries are:-
1) Whether it is right that the MS Word will generate some tags which may not be needed during the conversion of .docx to document.xml? That makes it heavy? If so what are the tags , so that i can avoid them when write by own .xml file.
2) Please send links to understand about the MS Word tags and its advantages, which tags are needed and which are not ?
3) Whether my approach to write a new .xml similar to document.xml (.docx conversion) is worthy one to go forward so that i can build the .xml with the tags i needed , so that i can improve the performance of the data display?
Please shed some light into it and thanks in advance..
Thanks,
Rithu
You'll want to learn WordprocessingML in much more detail to do this. It certainly isn't impossible, but it is quite a learning curve to start with. Probably the best place to start is with this eBook. If you go the manual route, you'll need a zip technology. If you're in Visual Studio, you can make the writing of all of this easier by using the Open XML SDK.
As to your questions on 'unnecessary tags', it's hard to believe that there would be much at all in the file that is unnecessary. But that depends on what you consider not needed - for example, if a word is caught as mispelled, there will be "dirty=1" attribute on the Run tag. If you're okay with displaying mispelled words, then that could be considered unnecessary. Really depends on what you're displaying for and in what.

How to generate Microsoft Word documents using Sphinx

Sphinx supports a few output formats:
Multiple HTML files (with html or dirhtml)
Latex which is useful for creating .pdf or .ps
text
How can I obtain output in a Microsoft Word file instead?
With another doc generator I managed to generate a single html output file and then convert it to Microsoft Word format using the Word application.
Unfortunately I don't know a way to generate either Word or the HTML single-page format.
The solution I use is singlehtml builder like andho mentioned in the comment, then convert the html to docx using pandoc.
The following sample assumes the generated html would be located at _build/singlehtml/index.html
make singlehtml
cd _build/singlehtml/
pandoc -o index.docx index.html
There is a Sphinx extension for generating docx format (which I haven't tested) and a newer one (which I also haven't tested, but looks like it is more actively maintained)
To convert files in restructured text to MSdoc, I use rst2odt and next unoconv. Look next script:
#!/bin/sh
rst2odt $1 $1.odt
unoconv -f doc $1.odt
rm $1.odt
With rst2odt you can use your own stylesheet: unoconv comes with OpenOffice and also allows to apply an Open Office style (template) during the conversion. Simply edit a converted document, change styles, add headers and footers, save that as an ODF Text Document Template (OTT) and use this as part of the conversion, like:
unoconv -f doc -t template.ott $1.odt
to use that template for various conversions later on.
I realize this is an old question, but I found that LibreOffice supports the following way of doing conversion (assuming soffice.exe is in your path):
soffice.exe --invisible --convert-to doc myInputFile.odt
Some things I have read say to use the --headless option rather than --invisible. Both seem to work on Windows.
You can start with the rst2odt.py script and then do the above to convert to an MS Word document.
Here is a link with additional start up options for LibreOffice:
http://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
Here is a link with file types supported by OpenOffice which, I believe, LibreOffice should also support:
http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_3_0
This answer is not a command line solution and it is not obviously the best, but it simply works for me and save my time. After generating html file 1, you can open the generated html with a browser and copy the entire page (Crtl + a and Ctrl+ c) and then run Microsoft Office(or use live version if you don't have Microsoft Windows, like me) and paste (Ctrl+v) to it.
The best option might be rst -> odt -> doc
Convert the sphinx documents into openoffice format.
Then convert open the odt with openoffice and saved to Word. But I don't know how to do this automatically.
This is a workaround using Calibre (https://calibre-ebook.com), which includes a powerful converter. This worked well and most of the formatting are preserved:
Generate epub output in Sphinx make epub
Import epub output into Calibre and then convert epub to docx using inbuilt ebook converter.
Answer is too late for the original question, but people looking at the same problem may find this useful.
I don't now what Sphinx is, but you could create a rtf file or html file or something similar.
See the following blogpost for more information/approaches : OFFICE AUTOMATION
and from there : How to use ASP to generate a Rich Text Format (RTF) document to stream to Microsoft Word
This article describes how you can generate Rich Text Format (RTF) files with ASP script and then stream those files to Microsoft Word. This technique provides an alternative to server-side Automation of Microsoft Word for run-time document generation.
You don't use ASP script (who does :-) ), but for the idea.