How to replace text with a MS Word web add-in by preserving the formatting? - ms-word

I'm working on a simple grammar correction web add-in for MS Word. Basically, I want to get the selected text, make minimal changes and update the document with the corrected text. Currently, if I use 'text' as the coercion type, I lose formatting. If there is a table or image in the selected text, they are also gone!
As I understand from the investigation I've been doing so far, openxml is the way to go. But I couldn't find any useful example on the web. How can I manipulate text by preserving the original formatting data? How can I ignore non-text paragraphs? I want to be able to do this with the Office JavaScript API:

I would do something like this:
// get data as OOXML
Office.context.document.getSelectedDataAsync(Office.CoercionType.Ooxml, function (result) {
if (result.status === "succeeded") {
var selectionAsOOXML = result.value;
var bodyContentAsOOXML = selectionAsOOXML.match(/<w:body.*?>(.*?)<\/w:body>/)[1];
// perform manipulations to the body
// it can be difficult to do to OOXML but with som regexps it should be possible
bodyContentAsOOXML = bodyContentAsOOXML.replace(/error/g, 'rorre'); // reverse the word 'error'
// insert the body back in to the OOXML
selectionAsOOXML = selectionAsOOXML.replace(/(<w:body.*?>)(.*?)<\/w:body>/, '$1' + bodyContentAsOOXML + '<\/w:body>');
// replace the selected text with the new OOXML
Office.context.document.setSelectedDataAsync(selectionAsOOXML, { coercionType: Office.CoercionType.Ooxml }, function (asyncResult) {
if (asyncResult.status === "failed") {
console.log("Action failed with error: " + asyncResult.error.message);
}
});
}
});

Related

I need to extract text from a pdf file using itext7 or itextsharp and put html tag for bold around all the words using bold font

I am using iText7 and I want to extract all the texts from a pdf and put html tag for bold ( <b>...</b> ) around all the words that uses bold fonts and save it in text file. Any pointers? I am able to independently extract text and also extract all the bold words but not able to co-relate the two.
Here is the code snippet I am using for extracting the text:
PdfDocument MyDocument = new PdfDocument(new PdfReader("C:\\MyTest.pdf"));
string MyText = PdfTextExtractor.GetTextFromPage(MyDocument.GetPage(1), new
SimpleTextExtractionStrategy());
Here is the code I am using for extracting all the words using the bold font:
MyRectangle = new Rectangle(0, 0, 50, 100);
CustomFontFilter fontFilter = new CustomFontFilter(MyRectangle);
FilteredEventListener listener = new FilteredEventListener();
LocationTextExtractionStrategy extractionStrategy =
listener.AttachEventListener(new LocationTextExtractionStrategy(), fontFilter);
PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
parser.ProcessPageContent(MyDocument.GetPage(1));
String MyBoldTextList = extractionStrategy.GetResultantText();
//------
class CustomFontFilter : TextRegionEventFilter
{
public CustomFontFilter(iText.Kernel.Geom.Rectangle filterRect) : base(filterRect){ }
override public bool Accept(IEventData data, EventType type)
{
if (type == EventType.RENDER_TEXT){
TextRenderInfo renderInfo = (TextRenderInfo)data;
PdfFont font = renderInfo.GetFont();
if (font!=null)
return font.GetFontProgram().GetFontNames().GetFontName().Contains("Bold");
}
return false;
}
}
The problem is that the pdf in question here is a multi-column document. SimpleTextExtractionStrategy brings the text in perfect order but if I use the LocationStrategy, it messes up texts by jumping from one column to next column in each line. I am not able to find any way to get the list of bold words using SimpleTextExtractionStrategy. In LocationStrategy, the list that I get is not in the right order so I am unable to co-relate it.
So to summarize:
You want to extract all the text from a pdf and put the html tag for bold (<b>...</b>) around all the text that uses bold fonts.
Your PDFs allow normal text extraction (without those <b> tags) using the SimpleTextExtractionStrategy. The LocationTextExtractionStrategy on the other hand cannot be used as it messes up the order of the multi-column text.
Bold text in your PDFs can properly be recognized by your CustomFontFilter, i.e. by the
font.GetFontProgram().GetFontNames().GetFontName().Contains("Bold")
condition.
Thus, one way to implement your task would be to extend the SimpleTextExtractionStrategy to check every chunk received using the CustomFontFilter condition and insert <b> tags where required.
For example like this:
public class BoldTaggingSimpleTextExtractionStrategy : SimpleTextExtractionStrategy
{
FieldInfo textField = typeof(TextRenderInfo).GetField("text", BindingFlags.NonPublic | BindingFlags.Instance);
bool currentlyBold = false;
public override void EventOccurred(IEventData data, EventType type)
{
if (type.Equals(EventType.RENDER_TEXT))
{
TextRenderInfo renderInfo = (TextRenderInfo)data;
string fontName = renderInfo.GetFont()?.GetFontProgram()?.GetFontNames()?.GetFontName();
if (fontName != null && fontName.Contains("Bold"))
{
if (!currentlyBold)
{
textField.SetValue(renderInfo, "<b>" + renderInfo.GetText());
currentlyBold = true;
}
}
else if (currentlyBold)
{
AppendTextChunk("</b>");
currentlyBold = false;
}
}
base.EventOccurred(data, type);
}
}
As you see I used reflection here. I did so because (A) TextRenderInfo does not allow public setting of the text and (B) AppendTextChunk must not be used before the first chunk is processed by base.EventOccurred - there the size of a StringBuilder containing the collected text chunks is used to check whether the chunk currently processed is the first one or not; if something is in that builder before at least one chunk has been processed, one gets a NullReferenceException. There are other work-arounds for that but reflection here means but one more line of code.

Microsoft Office add-in javascript: search by matchPrefix:true- how to get the full word for the matched prefix

I am trying to follow the article here
Also, adding the code here, since links can always move or get modified or go down.
// Run a batch operation against the Word object model.
Word.run(function (context) {
// Queue a command to search the document based on a prefix.
var searchResults = context.document.body.search('pattern', {matchPrefix: true});
// Queue a command to load the search results and get the font property values.
context.load(searchResults, 'font');
// Synchronize the document state by executing the queued commands,
// and return a promise to indicate task completion.
return context.sync().then(function () {
console.log('Found count: ' + searchResults.items.length);
// Queue a set of commands to change the font for each found item.
for (var i = 0; i < searchResults.items.length; i++) {
searchResults.items[i].font.color = 'purple';
searchResults.items[i].font.highlightColor = '#FFFF00'; //Yellow
searchResults.items[i].font.bold = true;
}
// Synchronize the document state by executing the queued commands,
// and return a promise to indicate task completion.
return context.sync();
});
})
.catch(function (error) {
console.log('Error: ' + JSON.stringify(error));
if (error instanceof OfficeExtension.Error) {
console.log('Debug info: ' + JSON.stringify(error.debugInfo));
}
});
It works fine, but I don't get the full word back.
So, if the word is patternABCDEFGH, the matched word in searchResults by doing
var text = searchResults.items[i].text;
console.log('Matching text:' + text);
All I get back is pattern, how do I get the full word back ?
The option MatchPrefix doesn't search every word that starts with that combination of letters - it searches only that combination of letters when it is at the beginning of a word. So it will only find the characters "pattern" in this case and not the entire "patternMatching" or "patternABCDEFGH".
As Juan mentioned, in order to get entire words that begin with a certain letter combination you need Word's variation of regex: wildcard search.
The wildcard pattern would look like this: [P,p]attern*>
This assumes you want both upper and lower case "p", followed by any character until the end of a word.
var searchResults = context.document.body.search('[P,p]attern*>', {matchWildcards: true});

Protractor - Create a txt file as report with the "Expect..." result

I'm trying to create a report for my scenario, I want to execute some validations and add the retults in a string, then, write this string in a TXT file (for each validation I would like to add the result and execute again till the last item), something like this:
it ("Perform the loop to search for different strings", function()
{
browser.waitForAngularEnabled(false);
browser.get("http://WebSite.es");
//strings[] contains 57 strings inside the json file
for (var i = 0; i == jsonfile.strings.length ; ++i)
{
var valuetoInput = json.Strings[i];
var writeInFile;
browser.wait;
httpGet("http://website.es/search/offers/list/"+valuetoInput+"?page=1&pages=3&limit=20").then(function(result) {
writeInFile = writeInFile + "Validation for String: "+ json.Strings[i] + " Results is: " + expect(result.statusCode).toBe(200) + "\n";
});
if (i == jsonfile.strings.length)
{
console.log("Executions finished");
var fs = require('fs');
var outputFilename = "Output.txt";
fs.writeFile(outputFilename, "Validation of Get requests with each string:\n " + writeInFile, function(err) {
if(err)
{
console.log(err);
}
else {
console.log("File saved to " + outputFilename);
}
});
}
};
});
But when I check my file I only get the first row writen in the way I want and nothing else, could you please let me know what am I doing wrong?
*The validation works properly in the screen for each of string in my file used as data base
**I'm a newbie with protractor
Thank you a lot!!
writeFile documentation
Asynchronously writes data to a file, replacing the file if it already
exists
You are overwriting the file every time, which is why it only has 1 line.
The easiest way would probably (my opinion) be appendFile. It writes to a file without overwriting existing data and will also create the file if it doesnt exist in the first place.
You could also re-read that log file, store that data in a variable, and re-write to that file with the old AND new data included in it. You could also create a writeStream etc.
There are quite a few ways to go about it and plenty of other answers
on SO specifically on those functions that can provide more info.
Node.js Write a line into a .txt file
Node.js read and write file lines
Final note, if you are using Jasmine you can also create a custom jasmine reporter. They have methods that contain exactly what you want (status Pass/Fail, actual vs expected values etc) and it's fairly easy to set up with Protractor

xPages REST Service Results into Combobox or Typeahead Text Field

I've read all the documentation I can find and watched all the videos I can find and don't understand how to do this. I have set up an xPages REST Service and it works well. Now I want to place the results of the service into either a combobox or typeahead text field. Ideally I would like to know how to do it for both types of fields.
I have an application which has a view containing a list of countries, another view containing a list of states, and another containing a list of cities. I would like the first field to only display the countries field from the list of data it returns in the XPages REST Service. Then, depending upon which country was selected, I would like the states for that country to be listed in another field for selection, etc.
I can see code for calling the REST Service results from a button, or from a dojo grid, but I cannot find how to call it to populate either of the types of fields identified above.
Where would I call the Service for the field? I had thought it would go in the Data area, but perhaps I've just not found the right syntax to use.
November 6, 2017:
I have been following your suggestion, but am still lost as can be. Here's what I currently have in my code:
x$( "#{id:ApplCountry}" ).select2({
placeholder: "select a country",
minimumInputLength: 2,
allowClear : true,
multiple: false,
ajax: {
dataType: 'text/plain',
url: "./Application.xsp/gridData",
quietMillis: 250,
data: function (params) {
return {
search:'[name=]*'+params.term+'*',
page: params.page
};
},
processResults: function (data, page) {
var data = $.map(data, function (obj) {
obj.id = obj.id || obj["#entityid"];
obj.text = obj.text || obj.name;
return obj;
});
},
return {results: data};
}
}
});
I'm using the dataType of 'text/plain' because that was what I understood I should use when gathering data from a domino application. I have tried changing this to json but it makes no difference.
I'm using processResults because I understand this is what should be used in version 4 of select2.
I don't understand the whole use of the hidden field, so I've stayed away from that.
No matter what I do, although my REST service works if I put it directly in the url, I cannot get any data to display in the field. All I want to display in the field is the country code of the document, which is in the field named "name" (not my choice, it's how it came before I imported the data from MySQL.
I have read documentation and watched videos, but still don't really understand how everything fits together. That was my problem with the REST service. If you use it in Dojo, you just put the name of the service in a field on the Dojo element and it's done, so I don't understand why all the additional coding for another type of domino element. Shouldn't it work the same way?
I should point out that at some points it does display the default message, so it does find the field. Just doesn't display the country selections.
I think the issue may be that you are not returning SelectItems to your select2, and that is what it is expecting. When I do something like you are trying, I actually use a bean to generate the selection choices. You may want to try that or I'm putting in the working part of my bean below.
The Utils.getItemValueAsString is a method I use to return either the string value of a field, or if it is not on the document/empty/null an empty string. I took out an if that doesn't relate to this, so there my be a mismatch, but I hope not.
You might be able to jump directly to populating the arrayList, but as I recall I needed to leverage the LinkedHashMap for something.
You should be able to do the same using SSJS, but since that renders to Java before executing, I find this more efficient.
For label/value pairs:
LinkedHashMap lhmap = new LinkedHashMap();
Document doc = null;
Document tmpDoc = null;
allObjects.addElement(doc);
if (dc.getCount() > 0) {
doc = dc.getFirstDocument();
while (doc != null) {
lhmap.put(Utils.getItemValueAsString(doc, LabelField, true), Utils.getItemValueAsString(doc, ValueField, true));
}
tmpDoc = dc.getNextDocument(doc);
doc.recycle();
doc = tmpDoc;
}
}
List<SelectItem> options = new ArrayList<SelectItem>();
Set set = lhmap.entrySet();
Iterator hsItr = set.iterator();
while (hsItr.hasNext()) {
Map.Entry me = (Map.Entry) hsItr.next();
// System.out.println("after: " + hStr);
SelectItem option = new SelectItem();
option.setLabel(me.getKey() + "");
option.setValue(me.getValue() + "");
options.add(option);
}
System.out.println("About to return from generating");
return options;
}
I ended up using straight up SSJS. Worked like a charm - very simple.

OpenXML editing docx

I have a dynamically generated docx file.
Need write the text strictly to end of page.
With Microsoft.Interop i insert Paragraphs before text:
int kk = objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing);
while (objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing) != kk + 1)
{
objWord.Selection.TypeParagraph();
}
objWord.Selection.TypeBackspace();
But i can't use same code with Open XML, because pages.count calculated only by word.
Using interop impossible, because it so slowwwww.
There are 2 options of doing this in Open XML.
create Content Place holder from Microsoft Office Developer Tab at the end of your document and now you can access this Content Place Holder programatically and can place any text in it.
you can append text driectly to your word document where it will be inserted at the end of your text. In this approach you got to write all the stuff to your document first and once you are done than you can append your document the following way
//
public void WriteTextToWordDocument()
{
using(WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
Body body = mainPart.Document.Body;
Paragraph paragraph = new Paragraph();
Run run = new Run();
Text myText = new Text("Append this text at the end of the word document");
run.Append(myText);
paragraph.Append(run);
body.Append(paragraph);
// dont forget to save and close your document as in the following two lines
mainPart.Document.Save();
doc.Close();
}
}
I haven't tested the above code but hope it will give you an idea of dealing with word document in OpenXML.
Regards,