Retrieve reference to comment OpenXML - openxml

I am trying to pull out the text from a Word document that is referenced by a comment in OpenXML. I can easily get the text of a comment, but not the paragraph text in the document that the comment is referencing.
The image I attached shows a comment and the related text. I am having a lot of trouble finding an example of how to get the referenced text. How can I get this text?

The solution is to get the Id of the comment which as you said you already know how to retrieve, and then search the document for a CommentRangeStart element with the same Id. When you have found it, you can loop over .NextSibling() until you hit a CommentRangeEnd element.
The elements between CommentRangeStart and CommentRangeEnd is the referenced part, which obviously can be multiple runs, paragraphs, images, whatever. So you will have to handle the collected elements somehow afterwards.
I made a test document looking like this:
I've made this code to test it:
using (var wordDoc = WordprocessingDocument.Open(#"c:\test\test.docx", true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
var document = mainPart.Document;
var comments = mainPart.WordprocessingCommentsPart.Comments.ChildElements;
foreach(Comment comment in comments)
{
string commentId = comment.Id;
string commentText = comment.InnerText;
OpenXmlElement rangeStart = document.Descendants<CommentRangeStart>().Where(c => c.Id == commentId).FirstOrDefault();
List<OpenXmlElement> referenced = new List<OpenXmlElement>();
rangeStart = rangeStart.NextSibling();
while(!(rangeStart is CommentRangeEnd))
{
referenced.Add(rangeStart);
rangeStart = rangeStart.NextSibling();
}
Console.WriteLine("Comment Id " + commentId + " with text \"" + " " + commentText + "\" references =>");
foreach (var ele in referenced)
{
if(!string.IsNullOrWhiteSpace(ele.InnerText))
{
Console.WriteLine(" " + ele.InnerText);
}
}
}
Console.ReadKey();
}
Which produces this output
I hope it helps!

I could not get your solution to work. However I found a workaround.
OpenXmlElement rangeStart = document.Descendants<CommentRangeStart>().Where(c => c.Id == commentId).FirstOrDefault();
bool breakLoop = false;
rangeStart = rangeStart.Parent;
while (true) // Looping through items between commentRangeStart and commentRangeEnd.
{
if (rangeStart.NextSibling() == null)
{
break;
}
foreach (var ele in rangeStart.ChildElements)
{
if (!(ele is CommentRangeEnd))
{
if (!(string.IsNullOrWhiteSpace(ele.InnerText)))
{
referenced.Add(ele);
}
}
else
{
breakLoop = true;
}
if (breakLoop)
break;
}
rangeStart = rangeStart.NextSibling();
}
Hence, instead of looping through the paragraph in which the CommenRageStart exists, since one comment may be built up of several paragraphs, I use the parent node in order to trace back and forth between the paragraphs. Finnaly, as I reach the CommentRangeEnd I can break the loop and process the data however is required.

Related

How to loop a search over a long string?

In order to get around the 255 character search limitation in the desktop Word api I'm breaking long strings into searchable chunks of 254 characters and pushing them into an object "oSearchTerms". Then I'm attempting to iterate over oSearchTerms, search for the text, highlight it, then search for the next chunk and do the same until all items in oSearchTerms have been highlighted. The problem is it's not looping. It goes through the first iteration successfully but stops.
I've tried copious context.sync() calls, return true, return context.sync(), etc, which you'll see commented out below, to no avail.
I should also point out that it's not showing any errors. The loop just isn't looping.
Do I have to convert this over to an async function? I'd like to stick with ES5 and not use fat arrow functions.
What am I missing?
var fullSearchTerm = "As discussed earlier, one of the primary objectives of these DYH rules is to ensure that operators have at least one source of XYZ-approved data and documents that they can use to comply with operational requirements The objective would be defeated if the required data and documents were not, in fact, approved and Only by retaining authority to approve these materials can we ensure that they comply with applicable requirements and can be relied upon by operators to comply with operational rules which We believe there are differences between EXSS ICA and other ICA that necessitate approval of EVIS ICA."
function findTextMatch() {
Word.run(function(context) {
OfficeExtension.config.extendedErrorLogging = true;
var oSearchTerms = [];
var maxChars = 254;
var lenFullSearchTerm = fullSearchTerm.length;
var nSearchCycles = Math.ceil(Number((lenFullSearchTerm / maxChars)));
console.log("lenFullSearchTerm: " + lenFullSearchTerm + " nSearchCycles: " + nSearchCycles);
// create oSearchTerms object containing search terms
// leaves short strings alone but breaks long strings into
// searchable 254 character chunks
for (var i = 0; i < nSearchCycles; i++) {
var posStart = i * maxChars;
var mySrch = fullSearchTerm.substr(posStart, maxChars);
console.log( i +" mySrch: "+ mySrch);
var oSrch = {"searchterm":mySrch};
oSearchTerms.push(oSrch);
}
console.log("oSearchTerms.length: " + oSearchTerms.length +" oSearchTerm: "+ JSON.stringify(oSearchTerms));
// Begin search loop
// iterate over oSearchTerms, find and highlight each searchterm
for (var i = 0; i < oSearchTerms.length; i++) {
console.log("oSearchTerms["+i+"].searchterm: " + JSON.stringify(oSearchTerms[i].searchterm));
var searchResults = context.document.body.search(oSearchTerms[i].searchterm, { matchCase: true });
console.log("do context.sync() ");
context.load(searchResults);
return context.sync()
.then(function(){
console.log("done context.sync() ");
console.log("searchResults: "+ JSON.stringify(searchResults));
if(typeof searchResults.items !== undefined){
console.log("i: "+i+ " searchResults: "+searchResults.items.length);
// highlight each result
for (var j = 0; j < searchResults.items.length; j++) {
console.log("highlight searchResults.items["+j +"]");
searchResults.items[j].font.highlightColor = "red";
}
}
else{
console.log("typeof searchResults.items == undefined");
}
// return true;
// return context.sync();
});
//.then(context.sync);
//return true;
} // end search loop
})
.catch( function (error) {
console.log('findTextMatch Error: ' + JSON.stringify(error));
if (error instanceof OfficeExtension.Error) {
console.log('findTextMatch Debug info: ' + JSON.stringify(error));
}
});
}
I recommend that you not have a context.sync inside a loop. That can be a performance hit and it makes the code hard to reason about. Please see my answer to: Document not in sync after replace text and this sample: Word Add-in Stylechecker for a design pattern that avoids this. The pattern can be used with ES5 syntax if you want.
If you implement this pattern, you may find that the problem has gone away, or at least you will be able to see clearer where the cause might be.

Is it necessary to create <span> elements to register event listeners

I have a working web app that reads local .txt files and displays the content in a div element. I create a span element out of each word because I need to be able to select any word in the document and create an EEI (Essential Elements of Information) from the text. I then register a click handler on the containing div and let the event bubble up. The three functions below show reading the file, and parsing it, and populating the text div with spans:
function readInputFile(evt) {
reset();
var theFile = evt.target.files[0];
if(theFile) {
$("#theDoc").empty(); //Clean up any old docs loaded
var myReader = new FileReader();
var ta = document.getElementById("theDoc");
myReader.onload = function(e) {
parseTheDoc(e.target.result);
initialMarkup();
};
myReader.readAsText(theFile);
} else {
alert("Can not read input file: readInputFile()");
}
}
function parseTheDoc(docContents) {
var lines = docContents.split("\n");
var sentWords =[];
for(var i = 0; i < lines.length; i++) {
sentWords = lines[i].split(" ");
words = words.concat(sentWords);
words.push("<br>");
}
//examineWords(words);
createSpans(words);
}
function createSpans() {
for (var i = 0; i < words.length; i++) {
var currentWord = words[i];
if(currentWord !== "<br>") {
var $mySpan = $("<span />");
$mySpan.text(currentWord + " ");
$mySpan.attr("id", "word_" + i);
$("#theDoc").append($mySpan);
buildDocVector(currentWord, i, $mySpan);
}
else {
var $myBreak = $("<br>");
$myBreak.attr("id", "word_" + i);
$("#theDoc").append($myBreak);
buildDocVector("br", i, $myBreak);
}
}
//console.log("CreateSpans: Debug");
}
So basically a simple fileReader, split on \n, then tokenize on white space. I then create a span for each word, and a br element for each \n. It's not beautiful, but it satisfies the requirement, and works. My question is, is there a more efficient way of doing this? It just seems expensive to create all these spans, but my requirement is to annotate the doc and map any selected word to a data model/ontology. I can't think of a way to allow the user to select any word, or combination of words (control click) and then perform operations on them. This works, but with large docs (100 pages) I start having performance/memory issues. I understand this is more a design question and may not be appropriate, but I'd really like to know if there are more performant solutions.

Insert multiple lines of text into a Rich Text content control with OpenXML

I'm having difficulty getting a content control to follow multi-line formatting. It seems to interpret everything I'm giving it literally. I am new to OpenXML and I feel like I must be missing something simple.
I am converting my multi-line string using this function.
private static void parseTextForOpenXML(Run run, string text)
{
string[] newLineArray = { Environment.NewLine, "<br/>", "<br />", "\r\n" };
string[] textArray = text.Split(newLineArray, StringSplitOptions.None);
bool first = true;
foreach (string line in textArray)
{
if (!first)
{
run.Append(new Break());
}
first = false;
Text txt = new Text { Text = line };
run.Append(txt);
}
}
I insert it into the control with this
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, string text)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
element.Descendants<Text>().First().Text = text;
element.Descendants<Text>().Skip(1).ToList().ForEach(t => t.Remove());
return doc;
}
I call it with something like...
doc.InsertText("Primary", primaryRun.InnerText);
Although I've tried InnerXML and OuterXML as well. The results look something like
Example AttnExample CompanyExample AddressNew York, NY 12345 or
<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>Example Attn</w:t><w:br /><w:t>Example Company</w:t><w:br /><w:t>Example Address</w:t><w:br /><w:t>New York, NY 12345</w:t></w:r>
The method works fine for simple text insertion. It's just when I need it to interpret the XML that it doesn't work for me.
I feel like I must be super close to getting what I need, but my fiddling is getting me nowhere. Any thoughts? Thank you.
I believe the way I was trying to do it was doomed to fail. Setting the Text attribute of an element is always going to be interpreted as text to be displayed it seems. I ended up having to take a slightly different tack. I created a new insert method.
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, Paragraph paragraph)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
OpenXmlElement cc = element.Descendants<Text>().First().Parent;
cc.RemoveAllChildren();
cc.Append(paragraph);
return doc;
}
It starts the same, and gets the Content Control by searching for it's Tag. But then I get it's parent, remove the Content Control elements that were there and just replace them with a paragraph element.
It's not exactly what I had envisioned, but it seems to work for my needs.

Custom properties are not updated for the word via openXML

I am trying to update custom properties of word document thru Open XML programming but it seems the updated properties are not getting saved properly for the word document. So when I opening document after successful execution of the update custom property code, I am getting the message box which is "This document contains field that may refer to other files; Do you want to update the fields in the Document?" If I am pressing 'NO' button then all the update properties would not be saved to the document. If we are going for yes option then it will update properties but I need to save the properties explicitly. Please suggest to save properties to the document without getting confirmation message or corrupting the document. :)
the code snippet is given as below,
public void SetCustomValue(
WordprocessingDocument document, string propname, string aValue)
{
CustomFilePropertiesPart oDocCustomProps = document.CustomFilePropertiesPart;
Properties props = oDocCustomProps.Properties;
if (props != null)
{
//logger.Debug("props is not null");
foreach (var prop in props.Elements<CustomDocumentProperty>())
{
if (prop != null && prop.Name == propname)
{
//logger.Debug("Setting Property: " + prop.Name + " to value: " + aValue);
prop.Remove();
var newProp = new CustomDocumentProperty();
newProp.FormatId = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}";
newProp.Name = prop.Name;
VTLPWSTR vTLPWSTR1 = new VTLPWSTR();
vTLPWSTR1.Text = aValue;
newProp.Append(vTLPWSTR1);
props.AppendChild(newProp);
props.Save();
}
}
int pid = 2;
foreach (CustomDocumentProperty item in props)
{
item.PropertyId = pid++;
}
props.Save();
}
}
I am using .Net framework 3.5 with Open XML SDK 2.0 and Office 2013.
Try this one
var CustomeProperties = xmlDOc.CustomFilePropertiesPart.Properties;
foreach (CustomDocumentProperty customeProperty in CustomeProperties)
{
if (customeProperty.Name == "DocumentName")
{
customeProperty.VTLPWSTR = new VTLPWSTR("My Custom Name");
}
else if (customeProperty.Name == "DocumentID")
{
customeProperty.VTLPWSTR = new VTLPWSTR("FNP.SMS.IQC");
}
else if (customeProperty.Name == "DocumentLastUpdate")
{
customeProperty.VTLPWSTR = new VTLPWSTR(DateTime.Now.ToShortDateString());
}
}
//Open Word Setting File
DocumentSettingsPart settingsPart = xmlDOc.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();
//Update Fields
UpdateFieldsOnOpen updateFields = new UpdateFieldsOnOpen();
updateFields.Val = new OnOffValue(true);
settingsPart.Settings.PrependChild<UpdateFieldsOnOpen>(updateFields);
settingsPart.Settings.Save();
you have to update your document fields on open.

Relational Queries In parse.com (Unity)

I found example in parse.com. I have 2 objects : Post and Comment, in the Comment objects have a collumn: "parent" pointer to Post obj and I want to join them:
var query = ParseObject.GetQuery ("Comment");
// Include the post data with each comment
query = query.Include("parent");
query.FindAsync().ContinueWith(t => {
IEnumerable<ParseObject> comments = t.Result;
// Comments now contains the last ten comments, and the "post" field
// contains an object that has already been fetched. For example:
foreach (var comment in comments)
{
// This does not require a network access.
string o= comment.Get<string>("content");
Debug.Log(o);
try {
string post = comment.Get<ParseObject>("parent").Get<string>("title");
Debug.Log(post);
} catch (Exception ex) {
Debug.Log(ex);
}
}
});
It worked!
And then, I have 2 objects: User and Gamescore, in the Gamescore objects have a collumn: "playerName" pointer to Post obj I want join them too:
var query = ParseObject.GetQuery ("GameScore");
query.Include ("playerName");
query.FindAsync ().ContinueWith (t =>{
IEnumerable<ParseObject> result = t.Result;
foreach (var item in result) {
Debug.Log("List score: ");
int score = item.Get<int>("score");
Debug.Log(score);
try {
var obj = item.Get<ParseUser>("playerName");
string name = obj.Get<string>("profile");
//string name = item.Get<ParseUser>("playerName").Get<string>("profile");
Debug.Log(name);
} catch (Exception ex) {
Debug.Log(ex);
}
}
});
but It isn't working, Please help me!
Why didn't you do the following like you did your first example:
query = query.Include ("playerName");
you just have -
query.Include ("playerName");
One solution would be to ensure that your ParseUser object is properly fetched. ie:
var obj = item.Get<ParseUser>("playerName");
Task t = obj.FetchIfNeededAsync();
while (!t.IsCompleted) yield return null;
Then you can do this without worrying:
string name = obj.Get<string>("profile");
But that will be another potential request to Parse, which is unfortunate. It seems that query.Include ("playerName") isn't properly working in the Unity version of Parse?
I believe you're supposed to use multi-level includes for this, like .Include("parent.playerName") in your first query.