Script for parsing winword doc document - ms-word

MS Office Word document has the following structure:
Title(line, font size 14)
Description(paragraph, font size 12)
Some other paragraphs of text(font size 12)
I need a script to extract the Titles and Descriptions from this document (and, for example, put them into an Excel table). Any ideas about such a script?

Well, you may need to clarify the question a little more. But as far as scripting goes, here are some options:
1) Use VBA (it's part of Word). You could easily write a little VBA macro that loads a doc, parses that info, writes it to a CSV file and closes the doc, then repeats for the next doc.
2) VBScript (basically the same as above, but with VBScript as the language; similar to VBA but not exactly the same).
3) JavaScript (same idea as VBScript above).
4) PowerShell (a .NET way to do the same thing).
Personally, if this is a one-off deal, I'd go with VBA and be done with it.
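Whichever of these you pick, the pairing logic is the same: a 14-point paragraph starts a new title, and the first 12-point paragraph after it is its description. Here is a minimal sketch of that logic in Python, assuming the paragraphs have already been read out of the document as (text, font size) pairs. Reading them from Word itself is not shown, and `extract_titles` and `to_csv` are hypothetical names made up for this illustration.

```python
import csv
import io

TITLE_SIZE = 14        # font size of title lines, per the question
DESCRIPTION_SIZE = 12  # font size of description paragraphs


def extract_titles(paragraphs):
    """Pair each title with the first 12-point paragraph that follows it.

    `paragraphs` is an iterable of (text, font_size) tuples in document order.
    """
    rows = []
    pending_title = None
    for text, size in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        if size == TITLE_SIZE:
            if pending_title is not None:
                rows.append((pending_title, ""))  # previous title had no description
            pending_title = text
        elif size == DESCRIPTION_SIZE and pending_title is not None:
            rows.append((pending_title, text))
            pending_title = None  # later 12-point paragraphs are ordinary text
    if pending_title is not None:
        rows.append((pending_title, ""))
    return rows


def to_csv(rows):
    """Render the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Title", "Description"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing CSV rather than a real workbook keeps the sketch dependency-free; Excel opens such a file directly.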

Related

Generate .hhk file From Word Document

I am trying to convert an MS Word file to a CHM file. I have a well-organized Word document, but I could not figure out how to turn a Word file saved as HTML into a CHM file. I know I can add the HTML file to a project, but there are some issues I could not solve, such as how to convert the Word table of contents into an index file in the HTML Help Workshop program. I would be very happy if someone could provide an example of converting Word documents. (I am trying to achieve this through the HTML Help Workshop program.)
Best regards,
Converting a Word document to CHM format is difficult without special (often expensive) tools, and it has a learning curve.
You should consider whether the PDF format would be sufficient. That said, the CHM format, which is integrated into the Windows operating system, does of course have some popular features.
I recommend reading through "Search and Index not working after converting from Word 2016 to CHM".
As I mentioned in my answer there, I had never used chmProcessor before (because I use other tools), but surprisingly it seems to be a good one for converting Word documents in a simple way.
Please try chmProcessor for your needs. You may want to ask a new question here on SO later.
Edit:
Maybe you have additional interest in the following CodeProject article:
How to Easily Write a User's Guide for Your Application using Different File Extensions

Word 2010 additional file format

I'm not sure whether this is the best approach for this, or whether I should perhaps ask the question more clearly.
What I want to do is to create an additional file output - e.g. if the user uses Word to create a description consisting of known tags, I want to be able to save this as bbcode.
Now, I do have an idea of how to do this, but is there a way to, say, add another file format to the "Save file" dialog box and have it run a parser and file writer that would read the current document and export it using known bbcode tags (which perhaps would be adjustable from some configuration window)?
The result would be a file containing bbcode as well as the text information that the user has entered.
How would I hook up my add-in to the file output dialog? Is there a way to do this? I'm not sure custom XML is the answer, since I won't be using the XML at all.
Thanks in advance and please excuse my poor English.
Edit: after having a look at the Word 2010 AddIn project, I figured out that I'm looking for a way to define my own "export" format. I'd like to export the BBCode to a .txt (or even .bbcode) file. The Microsoft.Office.Interop.Word.WdExportFormat seems to have its own fixed enumeration. Is there a way to add an export format?
There is some code for this here:
phpbb.com/community/viewtopic.php?f=17&t=395554
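Independent of how the export gets triggered, the "parser and file writer" part of the question can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the `Run` class is a hypothetical stand-in for a stretch of uniformly formatted text, not Word's object model, and the `TAGS` table plays the role of the adjustable configuration. It does not address hooking into the save dialog.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A stretch of text with uniform formatting (hypothetical model)."""
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False


# Mapping from formatting flag to BBCode tag; this could come from configuration.
TAGS = (("bold", "b"), ("italic", "i"), ("underline", "u"))


def run_to_bbcode(run):
    """Wrap the run's text in one BBCode tag pair per active flag."""
    result = run.text
    for attribute, tag in TAGS:
        if getattr(run, attribute):
            result = f"[{tag}]{result}[/{tag}]"
    return result


def document_to_bbcode(paragraphs):
    """Convert paragraphs (lists of runs) to BBCode text, one paragraph per line."""
    return "\n".join(
        "".join(run_to_bbcode(run) for run in runs) for runs in paragraphs
    )
```

The resulting string can then be written to a .txt or .bbcode file with ordinary file I/O.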

Is there a way to show all docvariables from a word file?

I have a Microsoft Word file which contains several DocVariables.
In our application we fill/replace these DocVariables with content.
With the shortcut Alt+F9 I can switch to a mode in which I can see the DocVariables.
But in the document I have now, there are DocVariables which I cannot see.
Is there a way/mode in Word 2007 in which I can see all the DocVariables that are defined in the Word file?
As far as I know, there is no way to do this with MS Word's built-in features. You could write a custom VBA script that gets a list of all the DocVariables. But even easier than that, I use the following program when I need to do what you are describing: http://gregmaxey.mvps.org/word_tip_pages/cc_var_bm_doc_prop_tools_addin.html
It is a free add-in for Word that has done the job very well the times I have used it.
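If the file is saved in the .docx format (the default in Word 2007), the variables can also be read without Word, because a .docx is a zip archive and the document variables are stored as `w:docVar` elements in its `word/settings.xml` part. The following is a minimal sketch in Python under that assumption; it does not handle the legacy binary .doc format, and `list_doc_variables` is a hypothetical helper name. It lists every variable defined in the file, whether or not a DOCVARIABLE field in the text refers to it.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx parts
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_doc_variables(docx_file):
    """Return {name: value} for the document variables stored in a .docx file.

    `docx_file` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        if "word/settings.xml" not in archive.namelist():
            return {}  # no settings part, so no variables
        root = ET.fromstring(archive.read("word/settings.xml"))
    variables = {}
    for var in root.iter(f"{{{W_NS}}}docVar"):
        name = var.get(f"{{{W_NS}}}name")
        value = var.get(f"{{{W_NS}}}val")
        variables[name] = value
    return variables
```

Calling `list_doc_variables` with the path to a document returns a dictionary that can simply be printed.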

How to save a document in ms word 2003 using command prompt?

Please help: how can I save a document in MS Word 2003 using the command prompt?
The only things I know how to do in cmd are making a directory (mkdir), opening MS Word (winword), hiding rar files in jpeg files, and moving files from one directory to another.
You can open a Word document from the command prompt (starting a new Word process), but there is no easy way of sending commands to a running instance of Word from a simple command-line script. If you want to save Word documents programmatically, you can, for example, use VBA ("macros") or VBScript for it. But saving only makes sense if you have changed the Word document programmatically first, so I suggest that you first make yourself comfortable with VBA.
AFAIK there's no direct way to send a command from the command line to Word's UI. You have to employ a tool or trick here:
Using an autostart macro is sufficient if you want to convert data, for example opening a txt or html file from the command line and saving it as a doc file with the autostart macro. It may even work to shut Word down again within that autostart macro.
Another possibility is a kind of Windows GUI recorder like AutoIT. This can create scripts, or exes containing a script, that replay actions you have previously shown it yourself (and much, much more). Take a look at their pages at http://www.autoitscript.com/autoit3/.
And a third possibility is Word's ActiveX interface, which can be accessed from any programming system (even AutoIT).
Greetings from Germany!
LuI

How can I search and replace in a PDF document using Perl?

Does anyone know of a free Perl program (command line preferable), module, or any other way to search and replace text in a PDF file without opening it in an editor?
Basically, I want to write a program (in Perl, preferably) to automate replacing certain words (e.g. our old address) in a few hundred PDF files. I could use any program that supports command-line arguments. I know there are many modules on CPAN that manipulate or create PDFs, but they don't have (that I've seen) any sort of simple search and replace.
Thanks in advance for any and all advice!!!
Take a look at CAM::PDF. More specifically the changeString method.
How did you generate those PDFs in the first place? Doing the search-and-replace in the original sources and regenerating the PDFs seems more viable. Directly editing PDFs can be very difficult, and I'm not aware of any free tools that can do it easily.