OpenXmlDoc Word: How to mark an item for indexing? - openxml

I am using DocumentFormat.OpenXml to generate a Word document programmatically.
In the conceptual content for Word, I can't find any mention of marking items so that they can be included in the document index. What classes can I use for this?
I created a very simple document containing just "This is an apple." with "an apple" marked for indexing with Main entry: apple. The XML content is as follows:
<w:p w:rsidR="000975CB" w:rsidRDefault="00B83C06" w:rsidP="007969F3">
<w:r>
<w:t xml:space="preserve">This is </w:t>
</w:r>
<w:r w:rsidR="007969F3"><w:t>an apple</w:t></w:r>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> XE "</w:instrText></w:r>
<w:r w:rsidRPr="00246108"><w:instrText>apple</w:instrText></w:r>
<w:r><w:instrText xml:space="preserve">" </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
<w:bookmarkStart w:id="0" w:name="_GoBack"/>
<w:bookmarkEnd w:id="0"/>
<w:r><w:t>.</w:t></w:r>
</w:p>
<w:p w:rsidR="007969F3" w:rsidRPr="007969F3" w:rsidRDefault="007969F3" w:rsidP="007969F3"/>
...

An item marked for indexing is a field code (w:instrText) of type XE. A field code has to sit between a pair of complex field characters (w:fldChar), one of type begin and one of type end.
The pertinent markup in the sample document is:
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> XE "</w:instrText></w:r>
<w:r><w:instrText>apple</w:instrText></w:r>
<w:r><w:instrText xml:space="preserve">" </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
The three field code runs shown above can be combined into a single run (the begin and end w:fldChar runs are still required):
<w:r><w:instrText>XE "apple"</w:instrText></w:r>
Sample code to produce the above:
Tuple<Run,Run,Run> MarkEntryForIndex(string item)
{
    // run 0: start of the complex field
    Run run0 = new Run();
    FieldChar fieldChar1 = new FieldChar() { FieldCharType = FieldCharValues.Begin };
    run0.Append(fieldChar1);
    // run 1: the XE field code with the main entry in quotes
    Run run1 = new Run();
    FieldCode fieldCode1 = new FieldCode() { Space = SpaceProcessingModeValues.Preserve };
    fieldCode1.Text = " XE \"" + item + "\"";
    run1.Append(fieldCode1);
    // run 2: end of the complex field
    Run run2 = new Run();
    FieldChar fieldChar2 = new FieldChar() { FieldCharType = FieldCharValues.End };
    run2.Append(fieldChar2);
    return Tuple.Create(run0, run1, run2);
}
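A usage sketch may help here: the three runs have to be appended to the paragraph right after the run that holds the indexed text. The class name IndexEntrySketch and the method BuildSampleParagraph are hypothetical names introduced for illustration; MarkEntryForIndex is repeated in compact form only so the block compiles on its own. It assumes the DocumentFormat.OpenXml package is referenced.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class IndexEntrySketch
{
    // Same three-run XE field as MarkEntryForIndex above, written compactly.
    public static Tuple<Run, Run, Run> MarkEntryForIndex(string item)
    {
        var begin = new Run(new FieldChar { FieldCharType = FieldCharValues.Begin });
        var code = new Run(new FieldCode(" XE \"" + item + "\"")
        {
            Space = SpaceProcessingModeValues.Preserve
        });
        var end = new Run(new FieldChar { FieldCharType = FieldCharValues.End });
        return Tuple.Create(begin, code, end);
    }

    // Rebuilds the "This is an apple." paragraph from the question (hypothetical helper).
    public static Paragraph BuildSampleParagraph()
    {
        var paragraph = new Paragraph();
        paragraph.Append(new Run(new Text("This is ") { Space = SpaceProcessingModeValues.Preserve }));
        paragraph.Append(new Run(new Text("an apple")));

        // The field runs follow the text they index.
        var field = MarkEntryForIndex("apple");
        paragraph.Append(field.Item1, field.Item2, field.Item3);

        paragraph.Append(new Run(new Text(".")));
        return paragraph;
    }
}
```

The resulting Paragraph can then be appended to the document Body like any other paragraph.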

Related

Why doesn't LibreOffice show AltChunks in a docx document?

I use OpenXML to create a Word document, and use AltChunks to copy part of another document into it. Here is a simple snippet:
const string _altChunkID = "AltChunkID1111";
var chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, _altChunkID);
string html =
@"<html>
<head/>
<body>
<h1>Html Heading</h1>
<p>This is an html document in a string literal.</p>
</body>
</html>";
using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
using (StreamWriter stringStream = new StreamWriter(chunkStream))
stringStream.Write(html);
chunk.FeedData(ms); // note: ms is not declared anywhere in this snippet
var altChunk = new AltChunk {Id = _altChunkID};
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
And it works: I open the document in Word and see this AltChunk. But LibreOffice doesn't show the AltChunk. Only after I open the document in Word and save it does LibreOffice start to show it.
So what am I doing wrong? Why does Microsoft Word show the document properly while LibreOffice doesn't?

Insert OOXML comment with track changes

In my Word add-in I am inserting comments by replacing the selected text with OOXML that contains the comment.
With "Track Changes" turned on Word registers this as 3 actions: delete+insert+comment. It even inserts a paragraph break but I am unsure if that is related.
Is there a way to only have it register as a comment action like when inserting a comment using Word functionality?
Using rangeObject.insertOoxml I tried to insert at the beginning and at the end of the selection, without any luck, since the two OOXML inserts do not seem to be related to each other (which makes sense):
Word.run(function (context) {
var range = context.document.getSelection();
var preBody = '<?xml version="1.0" encoding="UTF-8"?><pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"><pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512"><pkg:xmlData><Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml" /></Relationships></pkg:xmlData></pkg:part><pkg:part pkg:name="/word/_rels/document.xml.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="256"><pkg:xmlData><Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml" /></Relationships></pkg:xmlData></pkg:part><pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"><pkg:xmlData><w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body><w:p>';
var postBody = '</w:p></w:body></w:document></pkg:xmlData></pkg:part><pkg:part pkg:name="/word/comments.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"><pkg:xmlData><w:comments xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><w:comment w:id="0" w:author="Some User" w:date="2016-10-26T10:11:05" w:initials="SU"><w:p><w:r><w:t>My comment</w:t></w:r></w:p></w:comment></w:comments></pkg:xmlData></pkg:part></pkg:package>';
var before = preBody + '<w:commentRangeStart w:id="0" />' + postBody;
var afterBody = '<w:commentRangeEnd w:id="0" /><w:r><w:commentReference w:id="0" /></w:r>';
var after = preBody + afterBody + postBody;
range.insertOoxml(before, Word.InsertLocation.start);
range.insertOoxml(after, Word.InsertLocation.end);
return context.sync().then(function () {
console.log('OOXML added to the beginning and the end of the range.');
});
})
.catch(function (error) {
console.log('Error: ' + JSON.stringify(error));
if (error instanceof OfficeExtension.Error) {
console.log('Debug info: ' + JSON.stringify(error.debugInfo));
}
});
This is a great question, thanks for asking it. When you use the insertOoxml method you are actually writing into the document's body, hence the behavior you observe when track changes is active; that is by design. There is no workaround for this as of now. This issue will be solved when we support comments as a main feature of the API (rather than inserting OOXML as a workaround). Make sure to add your request or vote for an existing one in our UserVoice! https://officespdev.uservoice.com/forums/224641-feature-requests-and-feedback/category/163566-add-in-word
By the way, there was a bug we fixed a few months ago where the insertOoxml method inserted an extra paragraph, so make sure to update Word to get the latest bug fixes. If this is still happening, please share your Office version number.

How to remove a Bookmark from a document?

Because OOXML documents don't seem to follow proper XML rules, a Bookmark consists of a BookmarkStart, a BookmarkEnd and an arbitrary number of elements in-between; not a hierarchy but a single flow of elements which have to be traversed in the right order:
<w:bookmarkStart w:id="4" w:name="Author"/>
<w:r w:rsidR="009878B3"><w:rPr><w:sz w:val="28"/></w:rPr><w:t>&lt;</w:t></w:r>
<w:r w:rsidR="005E0909"><w:rPr><w:sz w:val="28"/></w:rPr><w:t xml:space="preserve"> </w:t></w:r>
<w:r w:rsidR="009878B3"><w:rPr><w:sz w:val="28"/></w:rPr><w:t>Author&gt;</w:t></w:r>
<w:bookmarkEnd w:id="4"/>
I already ran into this problem in a related question: https://stackoverflow.com/questions/28219201/how-to-get-the-text-of-a-bookmark-as-a-single-string
But this question is, how can I remove the Bookmark entirely from the document, without breaking anything? Do I have to iterate through siblings from the BookmarkStart until I reach a BookmarkEnd? Is there some useful API method which makes up for the failure to use XML properly whereby one would just have a Bookmark node which could be deleted?!
You can remove the BookmarkStart via the API and leave any contents intact. Remove() only detaches the element it is called on, so remove the BookmarkEnd with the same id as well to avoid leaving an orphaned end marker. In C# something like this should work:
public static void RemoveBookmark(string filename, string bookmarkName)
{
    using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(filename, true))
    {
        Body body = wordDocument.MainDocumentPart.Document.Body;
        // find a matching BookmarkStart based on name
        BookmarkStart start = body.Descendants<BookmarkStart>().FirstOrDefault(b => b.Name == bookmarkName);
        if (start == null)
        {
            throw new Exception(string.Format("Bookmark {0} not found", bookmarkName));
        }
        // find the BookmarkEnd that shares the same id, if there is one
        BookmarkEnd end = body.Descendants<BookmarkEnd>().FirstOrDefault(e => e.Id?.Value == start.Id?.Value);
        // remove both markers; the runs between them are left untouched
        start.Remove();
        end?.Remove();
        wordDocument.MainDocumentPart.Document.Save();
    }
}

OpenXML editing docx

I have a dynamically generated docx file.
I need to write some text exactly at the end of a page.
With Microsoft.Interop I insert paragraphs before the text:
int kk = objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing);
while (objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing) != kk + 1)
{
objWord.Selection.TypeParagraph();
}
objWord.Selection.TypeBackspace();
But I can't use the same approach with Open XML, because the page count is calculated only by Word.
Using Interop is not an option, because it is far too slow.
There are two options for doing this in Open XML.
1. Create a content control (placeholder) from the Developer tab in Microsoft Word at the end of your document. You can then access this content control programmatically and place any text in it.
2. Append text directly to your Word document, so that it is inserted at the end of the existing text. With this approach you write everything else to the document first, and once you are done you append to the document in the following way:
public void WriteTextToWordDocument(string documentPath)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        Body body = mainPart.Document.Body;
        Paragraph paragraph = new Paragraph();
        Run run = new Run();
        Text myText = new Text("Append this text at the end of the word document");
        run.Append(myText);
        paragraph.Append(run);
        // the body-level section properties, if present, must stay the last child of the body
        SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
        if (sectPr != null)
            body.InsertBefore(paragraph, sectPr);
        else
            body.Append(paragraph);
        // save the changes; the using block closes the document
        mainPart.Document.Save();
    }
}
I haven't tested the above code but hope it will give you an idea of dealing with word document in OpenXML.
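For the first option, a minimal sketch of filling a content control could look like the following. It assumes the control was given a tag in Word; the tag value "EndOfPageText", the class name ContentControlSketch and the method SetContentControlText are hypothetical names chosen for illustration. It assumes the DocumentFormat.OpenXml package is referenced and, like the code above, it is a sketch rather than a tested solution.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlSketch
{
    // Replaces the text of the first content control whose tag equals tagName.
    // Returns false when no such control, or no text inside it, is found.
    public static bool SetContentControlText(Body body, string tagName, string newText)
    {
        SdtElement control = body.Descendants<SdtElement>()
            .FirstOrDefault(sdt => sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == tagName);
        if (control == null)
        {
            return false;
        }

        // Reuse the first Text node so the run formatting of the placeholder is kept,
        // and remove any further Text nodes that belonged to the old placeholder text.
        var texts = control.Descendants<Text>().ToList();
        if (texts.Count == 0)
        {
            return false;
        }
        texts[0].Text = newText;
        foreach (Text extra in texts.Skip(1))
        {
            extra.Remove();
        }
        return true;
    }
}
```

Called as ContentControlSketch.SetContentControlText(mainPart.Document.Body, "EndOfPageText", "some text") inside the using block, followed by mainPart.Document.Save().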

merge word documents to a single document

I used the code from the link below to merge Word files into a single file:
http://devpinoy.org/blogs/keithrull/archive/2007/06/09/updated-how-to-merge-multiple-microsoft-word-documents.aspx
However, looking at the output file I realized that the header image of the first document was not copied. How can we merge documents while preserving format and content?
I suggest using GroupDocs.Merger Cloud for merging multiple Word documents into a single Word document; it keeps the formatting and contents of the source documents. It is a platform-independent REST API solution that does not depend on any third-party tool or software.
Sample C# code:
var configuration = new GroupDocs.Merger.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance_Document = new GroupDocs.Merger.Cloud.Sdk.Api.DocumentApi(configuration);
var apiInstance_File = new GroupDocs.Merger.Cloud.Sdk.Api.FileApi(configuration);
var pathToSourceFiles = @"C:/Temp/input/";
var remoteFolder = "Temp/";
var joinItem_list = new List<JoinItem>();
try
{
DirectoryInfo dir = new DirectoryInfo(pathToSourceFiles);
System.IO.FileInfo[] files = dir.GetFiles();
foreach (System.IO.FileInfo file in files)
{
var request_upload = new GroupDocs.Merger.Cloud.Sdk.Model.Requests.UploadFileRequest(remoteFolder + file.Name, File.Open(file.FullName, FileMode.Open));
var response_upload = apiInstance_File.UploadFile(request_upload);
var item = new JoinItem
{
FileInfo = new GroupDocs.Merger.Cloud.Sdk.Model.FileInfo
{ FilePath = remoteFolder + file.Name }
};
joinItem_list.Add(item);
}
var options = new JoinOptions
{
JoinItems = joinItem_list,
OutputPath = remoteFolder + "Merged_Document.docx"
};
var request = new JoinRequest(options);
var response = apiInstance_Document.Join(request);
Console.WriteLine("Output file path: " + response.Path);
}
catch (Exception e)
{
Console.WriteLine("Exception while Merging Documents: " + e.Message);
}
That code is inserting a page break after each file.
Since sections control headers, if a second or subsequent document has a header, you'll probably want to keep the original section properties and insert them after your first document.
If you look at your original document as a docx, you'll probably see that its section is defined by a document-level section properties element (a w:sectPr that is the last child of w:body).
The easiest way around your problem may be to create a second section properties element inside the last paragraph (which contains the header information). Then this should just stay there when the documents are merged (ie other paragraphs added after it).
That's the theory. See also http://www.pcreview.co.uk/forums/thread-898133.php
But I haven't tried it; it assumes InsertFile behaves as I expect it should.
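The idea of putting a second section properties element inside the last paragraph can be sketched with the Open XML SDK as follows. This is an untested illustration of the theory above, not a complete merge: the class name SectionSketch and the method CopySectionPropertiesIntoLastParagraph are hypothetical, and the DocumentFormat.OpenXml package is assumed to be referenced. Header references inside a w:sectPr point to header parts by relationship id, so whatever performs the merge still has to carry those parts over.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SectionSketch
{
    // Copies the body-level section properties into the last paragraph of the body,
    // so that the section (and its header settings) ends with that paragraph.
    // Returns false when the body has no section properties to copy.
    public static bool CopySectionPropertiesIntoLastParagraph(Body body)
    {
        SectionProperties bodySectPr = body.Elements<SectionProperties>().LastOrDefault();
        if (bodySectPr == null)
        {
            return false;
        }

        Paragraph lastParagraph = body.Elements<Paragraph>().LastOrDefault();
        if (lastParagraph == null)
        {
            // No paragraph yet: create an empty one in front of the section properties.
            lastParagraph = new Paragraph();
            body.InsertBefore(lastParagraph, bodySectPr);
        }

        ParagraphProperties pPr = lastParagraph.ParagraphProperties;
        if (pPr == null)
        {
            // Paragraph properties have to be the first child of the paragraph.
            pPr = new ParagraphProperties();
            lastParagraph.PrependChild(pPr);
        }

        // Deep copy; the original body-level element stays where it is.
        pPr.SectionProperties = (SectionProperties)bodySectPr.CloneNode(true);
        return true;
    }
}
```

The method would be called on the first document's Body before the other documents are merged in after it.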