detect changes in MS Word document programmatically - ms-word

How can I detect changes in a particular section of a Word document? There could be many changes in other parts of the document, which should be ignored. I only need to know whether the document changed in that particular section. Is there any way of finding this out?

The WordApplication object has the property MODIFIED, so you can use:
if xxxx.Modified then showmessage( 'doc edited' )
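A document-wide flag like this cannot say which part of the document changed. One possible way to approach the per-section question is to fingerprint only the text of the section of interest and compare fingerprints between two versions. The following is a minimal Python sketch under stated assumptions: `section_fingerprint` and `make_doc` are hypothetical helpers, the section is selected by paragraph index (a real document might delimit it with bookmarks or headings instead), and only visible text is compared, not formatting.

```python
import hashlib
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def section_fingerprint(document_xml, start, end):
    """Hash the text of body paragraphs start..end-1 (hypothetical helper).

    document_xml is the content of word/document.xml, which can be read
    from a .docx with zipfile.ZipFile(path).read("word/document.xml").
    """
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W}}}body")
    paragraphs = body.findall(f"{{{W}}}p")[start:end]
    # Join the w:t text nodes of each paragraph; formatting is ignored.
    lines = ["".join(t.text or "" for t in p.iter(f"{{{W}}}t")) for p in paragraphs]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def make_doc(texts):
    """Build a tiny stand-in for word/document.xml (illustration only)."""
    paras = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in texts)
    return f'<w:document xmlns:w="{W}"><w:body>{paras}</w:body></w:document>'


old = make_doc(["Intro", "Clause A", "Clause B", "Outro"])
edited_elsewhere = make_doc(["Intro, reworded", "Clause A", "Clause B", "Outro"])
edited_inside = make_doc(["Intro", "Clause A", "Clause B, amended", "Outro"])

# Paragraphs 1 and 2 are treated as the section of interest.
print(section_fingerprint(old, 1, 3) == section_fingerprint(edited_elsewhere, 1, 3))
print(section_fingerprint(old, 1, 3) == section_fingerprint(edited_inside, 1, 3))
```

Edits outside the chosen range leave the fingerprint unchanged, so they are ignored, while an edit inside the range changes it.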

Related

DOCVARIABLE in ms word Field has disappeared, and yet still appears to be functioning. How can I get it back?

First off, sorry if this is really basic, but I've been working with fields in a Word document for the past few days and I'm finding them quite counterintuitive. I have a document with over 100 images, and I am sourcing those images using the INCLUDEPICTURE field. Inside that field there is a DOCVARIABLE which contains the path to the image. I set this up to display all 1000 images. I then copied this Word file and made a new one because I had a second set of images to display. So I copied and pasted a section of the image name in the field codes and replaced it with a new name, e.g. all "image_a" instances were replaced with "image_b", so instead of seeing "image_a_1.png" and "image_a_2.png", the field codes now show "image_b_1.png" and "image_b_2.png" etc. This has successfully retrieved the correct images, so the document looks good.
However, after doing this I noticed that the codes in the fields have changed. Beforehand they appeared like this:
{ INCLUDEPICTURE "{ DOCVARIABLE "var_doc_path" }folderwithpics\\image_a_1.pgn" \d }
Now, however, after the copy and paste, this is what appears:
{ INCLUDEPICTURE "folderwithpics\\image_b_1.pgn" \* MERGEFORMAT \d }
The DOCVARIABLE is no longer displayed. What's weird is that the correct image is still sourced and displayed in the Word document, so it seems that the DOCVARIABLE, which is essential for the field to reference the correct path, is still active.
There is a problem, though: in a new Word document, I need to use INCLUDEPICTURE to source all of the 1000 images again, and they aren't getting displayed. I have to go back and manually enter the full path for each of the images in order for the new Word document to access them.
I think this must have something to do with the fact that the correct path is no longer displayed. Can anyone help me? I think I need to get the document to display { DOCVARIABLE "var_doc_path" } in the INCLUDEPICTURE field again.
As a side note, if anyone can recommend a good guide on working with fields, that would be a great help. Thanks!
Unless you copied the document via Windows or via SaveAs, rather than simply copying & pasting content from one document to another, the new document will not contain the Document Variable. By using the \d switch, Word is referencing a copy of the image stored in the document metadata rather than the one in the filepath it can no longer access via the DOCVARIABLE field.
FWIW, the \* MERGEFORMAT switch does nothing useful in an INCLUDEPICTURE field.

manipulating Microsoft Word DOCX files that have links and track changes using Python

I have been using the excellent python-docx package to read, modify, and write Microsoft Word files. The package supports extracting the text from each paragraph. It also allows accessing a paragraph a "run" at a time, where the run is a set of characters that have the same font information. Unfortunately, when you access a paragraph by runs, you lose the links, because the package does not support links. The package also does not support accessing change tracking information.
My problem is that I need to access change tracking information. Or, more specifically, I need to copy paragraphs that have change tracking indicated from one document to another.
I've tried doing this at the XML level. For example, this code snippet appends the contents of file1.docx to file2.docx:
from docx import Document
doc1 = Document("file1.docx")
doc2 = Document("file2.docx")
doc2.element.body.append(doc1.element.body)
doc2.save("file2-appended.docx")
For complicated files, when I try to open the resulting file on my Mac, I get this error:
But if I click OK, the contents are there. The manipulation also works without problem for very simple files.
What am I missing?
The .element attribute is really an "internal" interface and should be named ._element. In most other places I have named it that. What you're getting there is the root element of the document part. You can see what it is by calling:
print(doc2.element.xml)
That element has one and only one w:body element below it, which is what you get with doc2.element.body (.xml will work on that too, btw, if you want to inspect that element).
What your code is doing is appending one body element at the end of another w:body element and thereby forming invalid XML. The WordprocessingML vocabulary is quite strict about what element can follow another and how many and so forth. The only surprise for me is that it actually sometimes works for you, I take it :)
If you want to manipulate the XML directly, which is what the ._element attribute is there for, you need to do it carefully, in view of the (complex) WordprocessingML XML Schema.
Unlike when you stick to the published API, there's no safety net once ._element (or .element) appears in your code.
The body XML can also contain references to relationships with external document parts, like images and hyperlinks. These are only valid within the document in which they appear. This might explain why some files can be repaired.
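To illustrate the structural point, here is a minimal sketch that copies the children of one w:body into another instead of nesting a whole w:body inside it. It uses only the standard library on in-memory XML rather than python-docx, so `append_body_content` is a hypothetical helper and not part of any package. It assumes the trailing w:sectPr has to stay the last child of the target body, and it does nothing about relationships (images, hyperlinks), which would still need separate handling.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def append_body_content(target_root, source_root):
    """Copy block-level children of the source w:body into the target w:body."""
    target_body = target_root.find(f"{{{W}}}body")
    source_body = source_root.find(f"{{{W}}}body")
    sect_pr_tag = f"{{{W}}}sectPr"

    # Insert before the target's trailing w:sectPr, if it has one.
    children = list(target_body)
    insert_at = len(children)
    if children and children[-1].tag == sect_pr_tag:
        insert_at -= 1

    for child in source_body:
        if child.tag == sect_pr_tag:
            continue  # keep only the target's section properties
        # deepcopy keeps nested content such as w:ins / w:del markup intact
        target_body.insert(insert_at, copy.deepcopy(child))
        insert_at += 1
    return target_root


def tiny_doc(text):
    """Build a one-paragraph stand-in for a document part (illustration only)."""
    return ET.fromstring(
        f'<w:document xmlns:w="{W}"><w:body>'
        f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p><w:sectPr/>"
        f"</w:body></w:document>"
    )


doc1 = tiny_doc("from file1")
doc2 = tiny_doc("from file2")
merged = append_body_content(doc2, doc1)
tags = [child.tag.split("}")[1] for child in merged.find(f"{{{W}}}body")]
print(tags)  # ['p', 'p', 'sectPr']
```

The result still has exactly one w:body, with the copied paragraph placed ahead of the section properties.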

Word 2010 - How can I prevent Word from updating linked images

I'm working at a mental illness medical facility.
For our documents we use Word templates that contain the header and footer as a linked image.
The advantage is that if we have to change something in the header, we change one image and all documents pick up the change.
Now we have the problem that this image changes even if the document is in read-only mode or has been released.
This also affects documents that are sent to patients or doctors and then a copy is printed for the patient record. If the image changes at this point in time, the documents are different and document authenticity is no longer guaranteed.
Is it somehow possible to prevent Word from updating the image when it is in read-only mode?
EDIT:
The setting "Update links on Open" (File->Options->Advanced->General) is turned off.
Go to File > Options > Advanced, scroll down to the General section, and uncheck the box for "Update automatic links at open."
Hope this helps...
Or you can try this macro:
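Sub AutoOpen()
    ' Runs automatically when the document is opened.
    With Options
        ' Do not refresh fields or linked content when printing.
        .UpdateFieldsAtPrint = False
        .UpdateLinksAtPrint = False
    End With
    ' Refresh the document's fields once on open.
    ActiveDocument.Fields.Update
End Sub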

Open XML: add custom non-visible data to paragraph/table

Is there a way to store additional data for a paragraph that would persist after the user opens and saves the document in MS Word?
I've been using CustomXML for this, but it turns out that this is no longer possible because MS Word strips all CustomXML elements from the document structure.
Every single paragraph or a table has an ID that I would like to "pair back" to my data-source.
So later when I read the docx again I can identify origins of every unchanged paragraph/table in the document.
A possibility would be to insert a SET field. This creates a bookmark in the document to which you can assign information. There's no way to protect it from the user removing it, however. A DATA field might also be a possibility.
Unlike "vanish" (which I believe is equivalent to "hidden" font format) the information would not display if the user is in the habit of displaying non-printing information. It will display, however, if the user toggles on field codes (Alt+F9).
You can have a divId on a paragraph, and in xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" there are attributes w14:textId and w14:paraId.
For example:
<w:p w14:textId="81a184ad" w14:paraId="81a184ad" >
<w:pPr>
<w:divId w:val="124349312"/>
See [MS-DOCX] for details.
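Reading those attributes back could look like the sketch below, which maps each w14:paraId to the paragraph's text using only the Python standard library. `paragraph_ids` is a hypothetical helper and the sample XML is made up for illustration. Whether Word keeps a given paraId stable across edits and saves is an assumption that should be verified before relying on it.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def paragraph_ids(document_xml):
    """Map w14:paraId -> paragraph text for every paragraph that has one."""
    root = ET.fromstring(document_xml)
    result = {}
    for p in root.iter(f"{{{W}}}p"):
        para_id = p.get(f"{{{W14}}}paraId")
        if para_id is not None:
            # Concatenate the text of all runs in the paragraph.
            result[para_id] = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
    return result


# Made-up stand-in for word/document.xml.
sample = (
    f'<w:document xmlns:w="{W}" xmlns:w14="{W14}"><w:body>'
    '<w:p w14:paraId="81A184AD" w14:textId="81A184AD">'
    "<w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>No id here</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

ids = paragraph_ids(sample)
print(ids)  # {'81A184AD': 'First paragraph'}
```

The IDs found this way could then be paired back to the external data source, and paragraphs without an ID are simply skipped.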
Alternatively, have a look at content controls, which you can wrap around paragraphs and tables (or put inside them). These have an ID property; they also let you store arbitrary text in their tag property. The string is limited in length to something like 120 chars.
A rather noddy solution, but have you considered using a custom run for your data and hiding it from display using vanish?
<w:rPr>
<w:vanish />
</w:rPr>
Adding vanish to the run properties hides the run from display, and you might use this to store custom data without affecting the output of the document.

MS Word 2007 - How to set up placeholder text to mimic text but not formatting

I'm probably biting off more than I can chew with this particular problem, but I'll try to be as specific as possible in case it's within my scope. Disclaimer: I'm not terribly experienced with MS Word, beyond simple data entry/some formatting, and I have absolutely zero experience working with macros or VBasic. Unfortunately, I'm afraid the solution to my problem will come in the form of one of those last two.
THE GOAL:
What I want to do is to have placeholder text throughout my template document that will change content but not formatting when the first instance of it is changed. Basically, I'm writing a template for support manuals for a software suite. Each app has certain similar features like the menu bar, data entry screen, diagnostic log screen, transaction history, etc., so I am pre-writing those sections and using placeholders when I need to insert certain app specific properties.
I started off using the Insert->Quick Parts->Document Property->Subject tool which I used as a placeholder for the app name. I set the Property to [Subject] and then used Insert->Quick Parts->Field->Subject throughout the document, wherever I needed to include the app name. This worked fine in this case because the app name will always be capitalized. I simply change the text in the first [Subject] (which is content controlled) and update the fields throughout the document, and they all match nicely, easy-peasy, work done, go home and drink beer, right?
Not quite.
Our software handles part tracking via scanners and SQL Server, so while the interface and menu in the apps remains largely unchanged, the parts they track change from app to app. Because of this, I need to change the part name when I reference it within the text of the manuals; for example, if I'm working in ToiletPap.app and our TP is tracked by the roll, I need every mention of [Component] to be changed to roll. If I'm working in LightBulbs.app, I need [Component] to say bulb.
My first efforts went toward creating a custom doc property called Component using the Advanced tab under the Document Properties drop-down menu. I then created a plain-text content control around my first [Component] titled Component and made my next [Component] a field with modified code: {COMPONENT \* MERGEFORMAT}. This comes from copying what I can find when [Subject] works. This didn't work at all; updating the text in the first CC doesn't change the Component doc prop, and my fields return "!Undefined Bookmark, COMPONENT".
I got close to what I need by using the [Comments] doc property, set initially to [Component]. I used it just like [Subject], but (this is when I realized that capitalization was going to be an issue) when I mention my [component] in-text, as often as not, I need it to be lowercase instead of upper.
I've looked on MS's forums and a few others as well as here on SO, and I can't find anyone who's trying to do the same thing, much less an answer to how. Please keep in mind when answering, it would be a great help to me if you would include step-by-step instructions on how to enter/implement the code you provide because, as I mentioned, I have no idea how to go about editing macros/VBasic for MS Word.
To restate and summarize my overall question: How can I use a placeholder that displays the text "[Component]" so that, when I change the first instance of [Component] to something else, say "hopper", every subsequent instance of [Component] is updated to hopper but maintains its current capitalization and formatting scheme?
Apologies for the length of the request, but I wanted to make sure I explained the situation as accurately as possible. Thanks in advance for your consideration and responses.
I managed to solve this one after a couple extra hours of tinkering. I didn't need macros or VBasic, either.
On the first instance of [component] I created a plain-text content control to act as a container (not a necessity, but it makes it look nicer; it will likely cause a problem eventually, but for now it's working as intended) and bookmarked it. Then, for all other instances of [component], I selected each and used Insert->Quick Parts->Field->Ref with the following field code:
REF Text1 \*Lower
Where "Text1" is my bookmark and "\*Lower" indicates all lower case. The \*Lower can be replaced with \*Upper or \*FirstCap to indicate all upper case or capitalize the first letter, respectively. Now each field reflects the text of the first with the capitalization appropriate to each field's location within the document. Just like using the doc prop with [Subject], Ctrl+A followed by F9 is needed to update all fields within the document.
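As a rough illustration of what the three switches do to the bookmarked text, here is a small Python sketch. `apply_format_switch` is a hypothetical helper that only mimics the behaviour described above; it is not how Word implements fields, and leaving the remaining letters untouched for \*FirstCap is an assumption.

```python
def apply_format_switch(text, switch):
    """Mimic the \\*Lower, \\*Upper and \\*FirstCap switches on plain text."""
    # Accept both "\*Lower" and "\* Lower" spellings, case-insensitively.
    name = switch.lstrip("\\*").strip().lower()
    if name == "lower":
        return text.lower()
    if name == "upper":
        return text.upper()
    if name == "firstcap":
        # Assumption: only the first letter is changed.
        return text[:1].upper() + text[1:]
    raise ValueError(f"unsupported switch: {switch!r}")


bookmark_text = "Hopper"  # what the bookmarked first instance currently says
print(apply_format_switch(bookmark_text, "\\*Lower"))     # hopper
print(apply_format_switch(bookmark_text, "\\*Upper"))     # HOPPER
print(apply_format_switch("hopper", "\\* FirstCap"))      # Hopper
```

Each REF field carries its own switch, which is why the same bookmark can appear lowercase mid-sentence and capitalized at the start of one.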