I have created a document template (Test1.docx file) which has several MERGEFields in it. When I open the template in xml, I see some of the fields as SimpleFields and some as FieldCodes. Now, if I make some changes to the template (like adding some new template fields) and save it as Test2.docx and now when I open Test2.docx as an xml file, the field which came as SimpleField in the Test1.docx file's xml comes as FieldCode in the Test2.docx file's xml, even though I have not made any changes to that specific field.
When does Open XML represent a merge field as a simple field, and when as a field code?
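For reference, the two forms can be told apart directly in document.xml: the simple form is a single w:fldSimple element carrying the instruction in its w:instr attribute, while the field-code (complex) form spreads the field over several runs marked by w:fldChar, with the instruction in w:instrText. Both are valid WordprocessingML for the same field, so code that reads the template probably needs to handle either. The sketch below is only an illustration using the Python standard library; the function name and the sample XML are made up for the example, and it assumes each complex field keeps its instruction in a single w:instrText (Word can split it across runs).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def list_merge_fields(document_xml):
    """Return (simple, complex_) lists of field instruction strings."""
    root = ET.fromstring(document_xml)
    # Simple form: one <w:fldSimple w:instr="..."> wrapping the result runs.
    simple = [el.get(f"{{{W}}}instr", "").strip()
              for el in root.iter(f"{{{W}}}fldSimple")]
    # Complex form: fldChar begin/separate/end runs, code held in <w:instrText>.
    complex_ = [el.text.strip()
                for el in root.iter(f"{{{W}}}instrText") if el.text]
    return simple, complex_

# Hypothetical fragment containing one field of each kind.
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body><w:p>
<w:fldSimple w:instr=" MERGEFIELD FirstName "><w:r><w:t>FirstName</w:t></w:r></w:fldSimple>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> MERGEFIELD LastName </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r><w:t>LastName</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p></w:body></w:document>"""

simple, complex_ = list_merge_fields(SAMPLE)
print("simple: ", simple)
print("complex:", complex_)
```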
One of my customers has reported seeing the following message when working in documents created from templates that I provided:
This file has Custom XML elements that are no longer supported in Word. Saving the file will remove these elements permanently.
I understand that this message relates to Custom XML Markup (https://learn.microsoft.com/en-us/office/troubleshoot/word/custom-xml-elements-not-supported). I can confirm that there is no Custom XML Markup in the document.
Any idea what could be causing this error to display?
I looked at document.xml of my template. There are "w:sz", "w:szCs", "w:lang", and "w:b" elements inside "w:p" and "w:r". I opened my doc.docx and only saved it using DocumentFormat.dll. In the output I see that the appropriate text is in the "w:t" element of a "w:r" of a "w:p" element, etc. I see that the "w:r" has a "w:rPr" containing "w:b" and "w:rFonts" (with w:ascii, w:cs, w:hAnsi). "w:p" also has some properties. But "w:sz" and "w:szCs" are not present. I tried to create RunStyleProperties and prepend them to an existing paragraph, as in "how can I change the font open xml", except that I did not create a run; I used foreach over the existing paragraphs and runs. I also tried to recreate the methods described in "https://learn.microsoft.com/en-us/office/open-xml/how-to-apply-a-style-to-a-paragraph-in-a-word-processing-document", but none of my attempts worked: the style was not applied, and the text still has default values.
I have been using the excellent python-docx package to read, modify, and write Microsoft Word files. The package supports extracting the text from each paragraph. It also allows accessing a paragraph a "run" at a time, where the run is a set of characters that have the same font information. Unfortunately, when you access a paragraph by runs, you lose the links, because the package does not support links. The package also does not support accessing change tracking information.
My problem is that I need to access change tracking information. Or, more specifically, I need to copy paragraphs that have change tracking indicated from one document to another.
I've tried doing this at the XML level. For example, this code snippet appends the contents of file1.docx to file2.docx:
from docx import Document
doc1 = Document("file1.docx")
doc2 = Document("file2.docx")
doc2.element.body.append(doc1.element.body)
doc2.save("file2-appended.docx")
When I open the result on my Mac, complicated files produce this error:
But if I click OK, the contents are there. The manipulation also works without problem for very simple files.
What am I missing?
The .element attribute is really an "internal" interface and should be named ._element. In most other places I have named it that. What you're getting there is the root element of the document part. You can see what it is by calling:
print(doc2.element.xml)
That element has one and only one w:body element below it, which is what you get with doc2.element.body (.xml will work on that too, btw, if you want to inspect that element).
What your code is doing is appending one body element at the end of another w:body element and thereby forming invalid XML. The WordprocessingML vocabulary is quite strict about what element can follow another and how many and so forth. The only surprise for me is that it actually sometimes works for you, I take it :)
If you want to manipulate the XML directly, which is what the ._element attribute is there for, you need to do it carefully, in view of the (complex) WordprocessingML XML Schema.
Unlike when you stick to the published API, there's no safety net once ._element (or .element) appears in your code.
Inside the body XML can be relationships to external document parts, like images and hyperlinks. These will only be valid within the document in which they appear. This might explain why some files can be repaired.
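A minimal sketch of a more schema-friendly merge, offered as an assumption rather than anything python-docx provides: copy the children of one w:body into the other, and keep the destination's w:sectPr as the last child. The helper names are made up for illustration. It is written against the standard-library ElementTree API so it runs on its own; the lxml elements behind doc.element.body support the same list-style calls (iteration, insert, deepcopy), so the same function should apply there. Tracked-change markup (w:ins, w:del) sits inside the copied paragraphs and comes along with them, but relationship ids for images and hyperlinks are not remapped here.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W}}}sectPr"

def append_body_content(dst_body, src_body):
    """Copy src_body's children into dst_body, ahead of dst's final sectPr."""
    children = list(dst_body)
    # w:sectPr, when present, has to remain the last child of w:body.
    if children and children[-1].tag == SECT_PR:
        insert_at = len(children) - 1
    else:
        insert_at = len(children)
    for child in src_body:
        if child.tag == SECT_PR:
            continue  # keep the destination's own section properties
        dst_body.insert(insert_at, copy.deepcopy(child))
        insert_at += 1
    return dst_body

def make_body(paragraph_texts):
    """Build a tiny w:body for the demo (hypothetical helper)."""
    body = ET.Element(f"{{{W}}}body")
    for text in paragraph_texts:
        p = ET.SubElement(body, f"{{{W}}}p")
        r = ET.SubElement(p, f"{{{W}}}r")
        ET.SubElement(r, f"{{{W}}}t").text = text
    ET.SubElement(body, SECT_PR)
    return body

dst = append_body_content(make_body(["one"]), make_body(["two", "three"]))
print([child.tag.split("}")[1] for child in dst])  # ['p', 'p', 'p', 'sectPr']
```

With python-docx the call would look like append_body_content(doc2.element.body, doc1.element.body), followed by doc2.save(...).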
For Word report generation, I am looking at binding XML to content controls to see whether it is any easier than using Word Interop with hard-coded index references to content controls to assign values to them.
However, I don't really understand how to do it.
My workflow is to enter information in Excel and then generate an XML file that populates the content controls. However, what I have read goes the other way round: the Word Content Control Toolkit and descriptions where the XML is populated by a user entering information in Word, after which a programmer unzips the docx file to retrieve the XML file.
How can I populate content controls with XML?
There are samples on generating Word documents from Word templates, XML and data bound content controls at http://worddocgenerator.codeplex.com/
Set up the mapped content controls in the 'template' docx using the content control toolkit or similar. Do this using a sample XML file containing your Excel data.
Now you have that template document, at run time you can inject your XML file into it (ie replace the custom xml part it contains, with your instance data), in C# or Java or whatever.
When the user opens the document in Word 2007/2010, the information in the custom XML part will automatically be copied into the bound controls, and visible to the user.
Note that content control data binding doesn't easily support repeating data (eg populating table rows) in Word 2007/2010, though there are ways to do it.
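The injection step can be sketched with nothing more than a zip library, since a docx is a zip package. The following is a rough illustration in Python, not a complete solution: the function name is made up, and the part name customXml/item1.xml is an assumption about where the toolkit stored the sample XML (check your own template). The replacement XML needs the same root element and namespace as the sample, because the bindings are XPath expressions against it.

```python
import io
import zipfile

def replace_custom_xml(template, output, xml_bytes,
                       part_name="customXml/item1.xml"):
    """Copy the docx at `template` to `output`, swapping one part's content.

    `template` and `output` may be file paths or file-like objects.
    """
    replaced = False
    with zipfile.ZipFile(template) as src, \
         zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == part_name:
                dst.writestr(part_name, xml_bytes)  # the new instance data
                replaced = True
            else:
                dst.writestr(info, src.read(info.filename))  # copy unchanged
    if not replaced:
        raise KeyError(f"{part_name} not found in template")
    return output

# Demo on an in-memory stand-in for a template (not a complete docx).
template = io.BytesIO()
with zipfile.ZipFile(template, "w") as z:
    z.writestr("word/document.xml", "<document/>")
    z.writestr("customXml/item1.xml", "<report><title>sample</title></report>")

result = replace_custom_xml(template, io.BytesIO(),
                            b"<report><title>Q3</title></report>")
with zipfile.ZipFile(result) as z:
    print(z.read("customXml/item1.xml").decode())
```

For real files the call would be along the lines of replace_custom_xml("template.docx", "report.docx", data), where data holds the bytes of the XML generated from Excel; those file names are placeholders.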
I want to create a Word 2007 document without using the object model, so I would prefer to create it using the Open XML format. So far I have been able to create the document. Now I want to add a content control to it and map it to XML. Can anybody guide me on this?
Anoop,
You said that you are able to create the document using the Open XML SDK. With that assumption, you can use the following code to create the content control and add it to the Body element of your document.
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Paragraph to be placed inside the rich text content control
Run run = new Run(new Text("Insert any text here") { Space = SpaceProcessingModeValues.Preserve });
Paragraph paragraph = new Paragraph(run);
// Name and tag of the control (SdtAlias was called Alias in early SDK CTPs)
SdtProperties sdtPr = new SdtProperties(
    new SdtAlias { Val = "MyContentControl" },
    new Tag { Val = "_myContentControl" });
SdtContentBlock sdtCBlock = new SdtContentBlock(paragraph);
SdtBlock sdtBlock = new SdtBlock(sdtPr, sdtCBlock);
// Add this content control to the body of the Word document;
// path is where your Word 2007 file is
using (WordprocessingDocument wDoc = WordprocessingDocument.Open(path, true))
{
    Body mBody = wDoc.MainDocumentPart.Document.Body;
    mBody.AppendChild(sdtBlock);
    wDoc.MainDocumentPart.Document.Save();
}
I hope this answers a part of your question. I did not understand what you meant by "Map it to XML". Did you mean to say you want to create a CustomXmlBlock and add the content control to it?
Have a look for the Word Content Control Toolkit on www.codeplex.com.
Here is a very brief explanation on how to do what you are attempting.
You need to have access to the developer tab on the Word ribbon. To get this working click on the Office (Round thingy) in the top left hand corner and Select Word Options at the bottom of the menu. On the first options page there is a checkbox to show the developer toolbar.
Use the developer toolbar to add the Content controls you want on the page. Click the properties button in the Content controls section of the developer bar and set the name and tag properties (I stick to naming the name and tag fields with the same name).
Save and close the word document.
Open the Content Control Toolkit and then open your document with the toolkit. Use the left-hand pane to create some custom XML to link to your controls.
Now use the bind view to drag and drop the mappings between your custom xml and the custom controls that are displayed in the right panel of the toolkit.
You can use the openxml sdk 1.0 or 2.0 (still in ctp) to open your word document in code and access the custom xml file that is contained as part of the word document.
If you want to have a look at how your Word document looks as XML, make a copy of your Word document and rename it to, say, "a.zip". Double-click the zip file and then navigate the folder structure. The main content of the Word document is held under the word folder in a file called "document.xml". The custom XML part of the document is held under the customXml folder and is generally found in the file named "item1.xml".
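The same inspection can be done from a script without renaming anything, because zip libraries do not care about the file extension. A small sketch in Python follows; the function name is made up, the two default part names are the ones mentioned above, and it assumes the parts are UTF-8 encoded.

```python
import io
import zipfile

def read_parts(docx, names=("word/document.xml", "customXml/item1.xml")):
    """Return {part name: XML text} for the requested parts that exist."""
    with zipfile.ZipFile(docx) as z:
        present = set(z.namelist())
        return {n: z.read(n).decode("utf-8") for n in names if n in present}

# Demo on an in-memory stand-in for a document (not a complete docx).
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as z:
    z.writestr("word/document.xml", "<document/>")
    z.writestr("customXml/item1.xml", "<root><name>value</name></root>")

for name, xml in read_parts(package).items():
    print(name, "->", xml)
```

Passing a path such as "a.docx" instead of the in-memory object reads a real file the same way.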
I hope this brief explanation gets you up and running.