OPEN XML add custom not visible data to paragraph/table - ms-word

Is there a way to store additional data for a paragraph, that would be persisted after user opens and saves a document in MS Word.
Ive been using CusotmXML for this, but it turns out that this is no logner possible due to the fact that MS-Word strips all CusotmXML elements from the document structure.
Every single paragraph or a table has an ID that I would like to "pair back" to my data-source.
So later when I read the docx again I can identify origins of every unchanged paragraph/table in the document.

A possibility would be to insert a SET field. This creates a bookmark in the document to which you can assign information. There's no way to protect it from the user removing it, however. A DATA field might also be a possibility.
Unlike "vanish" (which I believe is equivalent to "hidden" font format) the information would not display if the user is in the habit of displaying non-printing information. It will display, however, if the user toggles on field codes (Alt+F9).

You can have a divId on a paragraph, and in xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" there are attributes w14:textId and w14:paraId.
For example:
<w:p w14:textId="81a184ad" w14:paraId="81a184ad" >
<w:pPr>
<w:divId w:val="124349312"/>
See [MS-Docx] for details.
Alternatively, have a look at content controls, which you can wrap around paragraphs and tables (or put inside them). These have an ID property; they also let you store arbitrary text in their tag property. The string is limited in length to something like 120 chars.

A rather noddy solution, but have you cosidered using a custom run for your data and hiding it from displaying using Vanish
<w:rPr>
<w:vanish />
</w:rPr>
Adding vanish to run properties will hide the run from displaying and you might use this to store custom data with out affecting the output of the document.

Related

DOCVARIABLE in ms word Field has disappeared, and yet still appears to be functioning. How can I get it back?

First off, sorry if this is really basic, but I've been working with fields in a word document for the past few days and I'm finding them quite counterintuitive. I have a document with over 100 images, and I am sourceing those images using the INCLUDEPICTURE field. Inside that field there is a DOCVARIABLEwhich contains the path to the image. I set this up to display all 1000 images. I then copied this word file and made a new one because I had a second set of images to display. SoI copied and pasted a section of the image name in the field codes and replaced it with a new name, e.g. all "image_a" instances were replaced with "image_b" so instead of seeing "image_a_1.png" and "image_a_2-png", the field codes now show "image_b_1.png" and "image_b_2.png" etc. and this has successfully retrieved the correct images so the document looks good.
However after doing this I have noticed that the codes in the fields has now changed. beforehand at the start the appeared like this:
{ INCLUDEPICTURE "{ DOCVARIABLE "var_doc_path" }folderwithpics\\image_a_1.pgn" \d }
now however after the copy and paste this is what appears:
{ INCLUDEPICTURE "folderwithpics\\image_b_1.pgn" \* MERGEFORMAT \d }
The doc variable is no longer displayed. What's weird that is that the correct image is still sourced and displayed in the word document, so it seems that the docvarible which is essential for the field to reference the correct path, is still active.
There is a problem though, which is that in a new word document, I need to use INCLUDEIMAGE to source all of the 1000 images again into this new document, and they aren't getting displayed. I need to go back and manually enter in the full path for each of the images in order for the new word document to access those image.
I think this must have something to do with the fact that the correct path is no longer displayed. Can anyone help me? I think I need to get the document to display { DOCVARIABLE "var_doc_path" } in the INCLUDEPICTURE field again.
As a side note if anyone has a good guide they can reccommend on working with fields I think that would be a great help. Thanks!
Unless you copied the document via Windows or via SaveAs, rather than simply copying & pasting content from one document to another, the new document will not contain the Document Variable. By using the \d switch, Word is referencing a copy of the image stored in the document metadata rather than the one in the filepath it can no longer access via the DOCVARIABLE field.
FWIW, the \* MERGEFORMAT switch does nothing useful in an INCLUDEPICTURE field.

manipulating Microsoft Word DOCX files that have links and track changes using Python

I have been using the excellent python-docx package to read, modify, and write Microsoft Word files. The package supports extracting the text from each paragraph. It also allows accessing a paragraph a "run" at a time, where the run is a set of characters that have the same font information. Unfortunately, when you access a paragraph by runs, you lose the links, because the package does not support links. The package also does not support accessing change tracking information.
My problem is that I need to access change tracking information. Or, more specifically, I need to copy paragraphs that have change tracking indicated from one document to another.
I've tried doing this at the XML level. For example, this code snippet appends the contents of file1.docx to file2.docx:
from docx import Document
doc1 = Document("file1.docx")
doc2 = Document("file2.docx")
doc2.element.body.append(doc1.element.body)
doc2.save("file2-appended.docx")
When I try to open the file on my Mac for complicated files, I get this error:
But if I click OK, the contents are there. The manipulation also works without problem for very simple files.
What am I missing?
The .element attribute is really an "internal" interface and should be named ._element. In most other places I have named it that. What you're getting there is the root element of the document part. You can see what it is by calling:
print(doc2.element.xml)
That element has one and only one w:body element below it, which is what you get when with doc2.element.body (.xml will work on that too, btw, if you want to inspect that element).
What your code is doing is appending one body element at the end of another w:body element and thereby forming invalid XML. The WordprocessingML vocabulary is quite strict about what element can follow another and how many and so forth. The only surprise for me is that it actually sometimes works for you, I take it :)
If you want to manipulate the XML directly, which is what the ._element attribute is there for, you need to do it carefully, in view of the (complex) WordprocessingML XML Schema.
Unlike when you stick to the published API, there's no safety net once ._element (or .element) appears in your code.
Inside the body XML can be relationships to external document parts, like images and hyperlinks. These will only be valid within the document in which they appear. This might explain why some files can be repaired.

Are Sphinx Excerpts inserted into the data and thus accessible by MySql?

In reading more about Excerpts:
http://sphinxsearch.com/wiki/doku.php?id=php_api_docs#buildexcerpts_documents_index_words_options
I am still unclear if the inserted tags are internally used by sphinx to format the final display text or if they actually added to the mysql:
before_match is a string to insert before each set of matching words. The default is '<b>'.
In other words if I changed the string to some non display htlm e.g. <!-- START --> for before_match and <!-- END --> for after_match I could then search on those in mysql or is the search still just a zone inside the index?
No, buildExerpts does not touch the database at all. It does not even change the sphinx index either.
You simply pass it a block of text (or multiple) and a text query. And then it forms a new block of text for each, highlighting the provided terms. The new block(s) of text are provided by return.
... if you want to do something with that block of text (so can search it otherwise!) would then have save it somewhere for example.
(the 'inserted tags' you talk about, are just added in the new block of text, which is generally considered to be HTML, so would then just be displayed to the end user)

MS Word 2007 - How to set up placeholder text to mimic text but not formatting

I'm probably biting off more than I can chew with this particular problem, but I'll try to be as specific as possible in case it's within my scope. Disclaimer: I'm not terribly experienced with MS Word, beyond simple data entry/some formatting, and I have absolutely zero experience working with macros or VBasic. Unfortunately, I'm afraid the solution to my problem will come in the form of one of those last two.
THE GOAL:
What I want to do is to have placeholder text throughout my template document that will change content but not formatting when the first instance of it is changed. Basically, I'm writing a template for support manuals for a software suite. Each app has certain similar features like the menu bar, data entry screen, diagnostic log screen, transaction history, etc., so I am pre-writing those sections and using placeholders when I need to insert certain app specific properties.
I started off using the Insert->Quick Parts->Document Property->Subject tool which I used as a placeholder for the app name. I set the Property to [Subject] and then used Insert->Quick Parts->Field->Subject throughout the document, wherever I needed to include the app name. This worked fine in this case because the app name will always be capitalized. I simply change the text in the first [Subject] (which is content controlled) and update the fields throughout the document, and they all match nicely, easy-peasy, work done, go home and drink beer, right?
Not quite.
Our software handles part tracking via scanners and SQL Server, so while the interface and menu in the apps remains largely unchanged, the parts they track change from app to app. Because of this, I need to change the part name when I reference it within the text of the manuals; for example, if I'm working in ToiletPap.app and our TP is tracked by the roll, I need every mention of [Component] to be changed to roll. If I'm working in LightBulbs.app, I need [Component] to say bulb.
My first efforts went toward creating a custom doc property called Component using the Advanced tab under the Document Properties dropmenu. I then created a plaintext content control around my first [Component] titled Component and made my next [Component] a field with modified code: {COMPONENT * MERGEFORMAT}. This comes from copying what I can find when [Subject] works. This didn't work at all; updating the text in the first CC doesn't change the Content doc prop, and my fields return "!Undefined Bookmark, COMPONENT".
I got close to what I need by using the [Comments] doc property, set initially to [Component]. I used it just like [Subject], but (this is when I realized that capitalization was going to be an issue) when I mention my [component] in-text, as often as not, I need to to be lowercase instead of upper.
I've looked on MS's forums and a few others as well as here on SO, and I can't find anyone who's trying to do the same thing, much less an answer to how. Please keep in mind when answering, it would be a great help to me if you would include step-by-step instructions on how to enter/implement the code you provide because, as I mentioned, I have no idea how to go about editing macros/VBasic for MS Word.
To restate and summarize my overall question: How can I use a placeholder that displays the text "[Component]" so that, when I change the first instance of [Component] to something else, say "hopper", every subsequent instance of [Component] is updated to hopper but maintains its current capitalization and formatting scheme?
Apologies for the length of the request, but I wanted to make sure I explained the situation as accurately as possible. Thanks in advance for your consideration and responses.
I managed to solve this one after a couple extra hours of tinkering. I didn't need macros or VBasic, either.
On the first instance of [component] I created a plain-text content control to act as a container (not a necessity, but it makes it look nicer. Will likely cause a problem eventually, but for now, it's working as intended) and bookmarked it. Then, for all other instances of [container] I selected each and used Insert->Quick Parts->Field->Ref with the following field code:
REF Text1 \*Lower
Where "Text1" is my bookmark and "*Lower" indicates all lower case. The *Lower can be replaced with *Upper or *FirstCap to indicate all upper case or capitalize the first letter respectively. Now, each field reflects the text of the first with the capitalization appropriate to each field's location within the document. Just like using the doc prop with [Subject], ^a -> f9 is needed to update all fields within the document.

How to add a simple text label in a jqGrid form?

When using the Add or Edit form from the pager I'm wondering how a simple static label can be added in the form without it creating any additional columns in it's affect on colNames[]'s and colModel[]'s. For example I have a quite simple typical Add form which opens from the pager containing a few label's and form elements: Name, Email, Web Site, etc., and then the lower section of the form has a few drop down menus containing the number 1 through 10 with the idea being to ask the user to pick a value between 1 and 10 to put a value on the importance to them about the product or service which is listed beside it. Just above this section I want to add some text only to give a brief instruction asking the user to "Choose the importance of the following products and services using the scale: [1=Low interest --- 10=Very high interest]". I cannot figure out how to get a text label inserted in the form without having to define a column with a formoption{} etc which is not needed for just some descriptive text. I know about the "bottominfo: 'some text'" for adding text to the bottom of the form but I need to insert some text similar to that mid-way (or other positions) in the form without it affecting the tabular structure of the grid. Is this even possible? TIA.
You can modify Edit or Add forms inside of afterShowForm. The ids of the form fields are like "tr_Name". There consist from "tr_" prefix and the corresponding column name.
I modified the code example from my old answer so that in the Add dialod there exist an additional line with the bold text "Additional Information:". In the "Edit" dialog (like one want in the original question) the input field for one column is disabled. You can see the example live here. I hope that a working code example can say more as a lot of words.