Office.js Word Add-In: Performance Issue with Updating Values in Large Tables - ms-word

Summary:
Updating values in large Word tables (larger than about 10 by 10) is very slow.
Performance degrades sharply with table size: the update time roughly quadruples each time the row count doubles (see the timings below).
I'm using myTable.values = arrNewValues, where arrNewValues is a 2D array. I've also tried
myTable.addRows("end", rows, arrNewValues). Updating via getOoxml() and
insertOoxml() performs well, but I ran into other issues with it that I haven't
been able to resolve.
The slow performance seems to be caused by screen updating (the same issue exists in VBA, where it is solved with Application.ScreenUpdating = False). I believe it is critically important to add the ability to temporarily turn off screen updating in Office.js.
Is there another way to improve table updating performance?
Background:
My add-in (https://analysisplace.com/Solutions/Document-Automation) performs document automation (updates content in a variety of Word docs). Many customers want to be able to update text in largish tables. Some documents have dozens of tables (appendices). I have run into the issue where updating these documents is unacceptably slow (well over a minute) due to the table updates.
Update time by table size:
2 rows by 10 columns: 0.33 seconds
4 rows by 10 columns: 0.52 seconds
8 rows by 10 columns: 1.5 seconds
16 rows by 10 columns: 5.5 seconds
32 rows by 10 columns: 20.8 seconds
64 rows by 10 columns: 88 seconds
Sample Office.js Code (Script Lab):
function updateTableCells() {
  Word.run(function (context) {
    // Load the existing table values to determine the row/column counts.
    var firstTable = context.document.body.tables.getFirst();
    firstTable.load("values");
    return context.sync().then(
      function () {
        var rows = firstTable.values.length;
        var cols = firstTable.values[0].length;
        console.log(getTimeElapsed() + " rows " + rows + " cols " + cols);
        // Build a replacement 2D array of cell values.
        var arrNewValues = [];
        for (var row = 0; row < rows; row++) {
          arrNewValues[row] = [];
          for (var col = 0; col < cols; col++) {
            arrNewValues[row][col] = 'r' + row + ':c' + col;
          }
        }
        console.log(getTimeElapsed() + " Before setValues");
        // Write all values in one batch; this is the slow operation.
        firstTable.values = arrNewValues;
        return context.sync().then(
          function () {
            console.log(getTimeElapsed() + " Done");
          });
      });
  })
  .catch(OfficeHelpers.Utilities.log);
}
Sample Word VBA Code:
Without ScreenUpdating = False, VBA performance is similar to the Office.js performance. With ScreenUpdating = False, the update is effectively instant.
Sub PopulateTable()
    Application.ScreenUpdating = False
    Dim nrRow As Long, nrCol As Long
    Dim tbl As Word.Table
    Set tbl = ThisDocument.Tables(1)
    For nrRow = 1 To 32
        For nrCol = 1 To 10
            tbl.Cell(nrRow, nrCol).Range.Text = "c" & nrRow & ":" & nrCol
        Next nrCol
    Next nrRow
    Application.ScreenUpdating = True ' restore screen updating when done
End Sub
Article explaining slow performance: see "Improving Performance When Automating Tables": https://msdn.microsoft.com/en-us/library/aa537149(v=office.11).aspx?cs-save-lang=1&cs-lang=vb#code-snippet-3
Posts indicating there is no "ScreenUpdating = False" in Office.js: "ScreenUpdating Office-js taskpane" and "Equivalent to Application.ScreenUpdating Property in office-js Excel add-in".
Sounds like we won't see it any time soon.
Post related to updating tables via getOoxml() and insertOoxml(): "Word Office.js: issues with updating tables in ContentControls using getOoxml() and insertOoxml()".

This is probably not the answer you're looking for, but I have been working with a Word add-in for validation of software, and we are talking about updating 500-1000 rows with lots of little formatting changes.
Anyway, one thing I found that helped is to scroll somewhere else in the document before you make the changes to the table. Just having the table in view slows the update 10-20x; with it scrolled out of view, the update is not always instant, but close.
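Building on that idea, here is a minimal sketch (my own, under the assumption that the table is not at the very start of the document; timing and helpers omitted) that moves the selection to the start of the document before writing, so the table is less likely to be in view while it updates:

function updateTableCellsOffscreen() {
  Word.run(function (context) {
    // Move the selection (and hence the visible viewport) away from the table.
    context.document.body.getRange("Start").select();
    var table = context.document.body.tables.getFirst();
    table.load("values");
    return context.sync().then(function () {
      // Build replacement values matching the current table dimensions.
      var arrNewValues = table.values.map(function (rowValues, row) {
        return rowValues.map(function (unused, col) {
          return 'r' + row + ':c' + col;
        });
      });
      table.values = arrNewValues;
      return context.sync();
    });
  }).catch(function (error) {
    console.log(error);
  });
}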

Related

Script is taking 11 - 20 seconds to look up an item in an 18,000 row data set

I have two Google sheets workbooks.
One is the "master" source of lookup data with a key based on manufacturer item #, which could be anything from 1234 to A-01/234-Name_1. This sheet, referenced via SpreadsheetApp.openByUrl, has 18,000 rows and 13 columns. The key column has been converted to plain text and the sheet is sorted by this column.
The second is the "template" where people enter item #s that they need to look up against the master, typically 20 - 1500 items at a time.
The script is in the template. It is very slow and routinely times out after 30 minutes. It was written by someone else and I am new to Apps Script, but I think I've managed to understand what the script is doing and where the bottleneck is occurring.
It does a bunch of stuff, but this is the meat of the lookup:
var numrows = master.getDataRange().getNumRows();
var masterdata = master.getDataRange().getValues();
var itemnumberlist = template.getDataRange().getValues();
var retreiveddata = [];
// iterate through the manf item number list to find all matches in the
// master and return those matches to another sheet
for (i = 1; i < template.getDataRange().getValues().length; i++) {
  for (j = 0; j < numrows; j++) {
    if (masterdata[j][1].toString() === itemnumberlist[i][1].toString()) {
      retreiveddata.push(masterdata[j]);
      anothersheet.appendRow(masterdata[j]);
    }
  }
}
I used Logger.log() to determine that each time through the i loop is taking 11 - 19 seconds, which just seems insane.
I've been doing some google searching and I've tried a couple of different things...
First I tried moving the writing of found data out of the for loop so the script would be doing all of its reading first and then writing in one big chunk, but I couldn't get it exactly right. My two attempts are below.
var mycounter = 0;
for (i = 0; i < template.getDataRange().getValues().length; i++) {
  for (j = 0; j < numrows; j++) {
    if (masterdata[j][0].toString() === itemnumberlist[i][0].toString()) {
      retreiveddata.push(masterdata[j]);
      mycounter = mycounter + 1;
    }
  }
}

// Attempt 1
// var myrange = retreiveddata.length;
// for (k = 0; k < myrange; k++) {
//   anothersheet.appendRow(retreiveddata.pop([k]));
// }

// Attempt 2
var myotherrange = anothersheet.getRange(2, 1, myothercounter, 13);
myotherrange.setValues(retreiveddata);
I can't remember for sure, because this was on Friday, but I think both attempts resulted in the script trying to write the entire master file into "anothersheet".
So I temporarily set this aside and decided to try something else. I was trying to recreate the issue in a couple of sample spreadsheets, but I was unable to do so. The same script is getting through my 15,000 row sample "master" file in less than 1 second per lookup. The only thing I can think of is that I used a random number as my key instead of a weird text string.
That led me to think that maybe I could use a hash algorithm on both the master data and the values to be looked up, but this is presenting a whole other set of issues.
I borrowed these functions from another forum post:
function GetMD5Hash(value) {
  var rawHash = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, value);
  var txtHash = '';
  for (j = 0; j < rawHash.length; j++) {
    var hashVal = rawHash[j];
    if (hashVal < 0)
      hashVal += 256;
    if (hashVal.toString(16).length == 1)
      txtHash += "0";
    txtHash += hashVal.toString(16);
    Utilities.sleep(100);
  }
  return txtHash;
}

function RangeGetMD5Hash(input) {
  if (input.map) {                // Test whether input is an array.
    return input.map(GetMD5Hash); // Recurse over array if so.
    Utilities.sleep(100);
  } else {
    return GetMD5Hash(input);
  }
}
It literally took me all day to get the hash value for all 18,000 item #s in my master spreadsheet. Neither GetMD5Hash nor RangeGetMD5Hash will return a value consistently. I can only do a few rows at a time. Sometimes I get "Loading..." indefinitely. Sometimes I get "#Name" with a message about GetMD5Hash being undefined (despite the fact that it worked on the previous row). And sometimes I get "#Error" with a message about an internal error.
This method actually reduces the lookup time of each item to 2 - 3 seconds (much better, but not great). However, I can't get the hash function to consistently work on the input data.
At this point I'm so frustrated and behind on my other work that I thought I'd reach out to the smart people on these forums and hope for some sort of miracle response.
To summarize, I'm looking for suggestions on these three items:
What am I doing wrong in my attempt to move the write out of the for loop?
Is there a way to get my hash value faster or utilize a different method to accomplish the same goal?
What else can I try to help speed up the script?
Any suggestions you can offer would be greatly appreciated!
-Mandy
It sounds like you hit on the right approach with attempting to move the appendRow() call out of the loop. Anytime you are reading or writing to a spreadsheet you can expect the individual call to take 1 to 2 seconds, so this will eat up a lot of time when you get matches. Storing the matches in an array and writing them all at once is the way to go.
Another thing I notice is that your script calls getValues() in the actual for loop condition statement. The condition statement is executed each time on each iteration of the loop, so this is potentially wasting a lot of time even when you don't have matches.
A final tweak that may be helpful depending on your desired behaviour. You can stop the inner for loop after it finds the first match, which, if you only care about the first match or know there will only be one match, will save you a lot of iterations. To do this, put "break" immediately after the retreiveddata.push(masterdata[j]); line.
To fix the getValues issue, change:
for (i = 1; i < template.getDataRange().getValues().length; i++) {
To:
for (i = 1; i < itemnumberlist.length; i++) {
And here is that fix along with the appendRow fix, including the break call:
for (i = 1; i < itemnumberlist.length; i++) {
  for (j = 0; j < numrows; j++) {
    if (masterdata[j][0].toString() === itemnumberlist[i][0].toString()) {
      retreiveddata.push(masterdata[j]);
      break; // stop searching after first match, move on to next item
    }
  }
}

// make sure you have data to write before trying to write it
if (retreiveddata.length > 0) {
  var myotherrange = anothersheet.getRange(2, 1, retreiveddata.length, retreiveddata[0].length);
  myotherrange.setValues(retreiveddata);
}
If you are re-using the same sheet for "anothersheet" on each execution, you may also want to call anothersheet.clear() to erase any existing data before you write your fresh results.
I would pass on the hashing approach altogether; comparing strings is comparing strings, so whether they are hashes or actual part numbers, I wouldn't expect a significant difference.
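As an aside (not part of the fixes above), if the nested scan itself ever becomes the bottleneck, building a plain-object index over the master keys makes each lookup a constant-time property access instead of a full scan of the master data. A minimal sketch using the same variable names:

// Build the index once: key string -> master row.
// (If multiple master rows share a key, the last one wins.)
var masterIndex = {};
for (var j = 0; j < masterdata.length; j++) {
  masterIndex[masterdata[j][0].toString()] = masterdata[j];
}
// Each lookup is now a single property access.
for (var i = 1; i < itemnumberlist.length; i++) {
  var match = masterIndex[itemnumberlist[i][0].toString()];
  if (match) {
    retreiveddata.push(match);
  }
}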

MongoDB : Slow text search when searching a very frequent term

I have a collection of about 1 million documents (mainly movies), and I created a text index on a field. Almost all searches work fine: less than 20 ms to get a result. The exception is searching for a very frequent term, which can take up to 3000 ms!
For example:
if I search for 'pulp' (only 40 documents contain it), it takes 1 ms
if I search for 'movie' (750,000 documents contain it), it takes 3000 ms
When profiling the request, explain('executionStats') shows that all the 'movie' documents are scanned. I tried many indexes, sorting + limiting, and hinting, but all 750,000 documents are still scanned and the result is still slow to come.
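For reference, the profiling call was along these lines (the collection name is a placeholder):

db.movies.find({ $text: { $search: "movie" } }).explain("executionStats")
// executionStats.totalDocsExamined comes back at ~750,000 for the frequent term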
Is there a strategy for searching very frequent terms faster?
I ended up building my own stop-word list by coding something like this:
import pymongo
from bson.code import Code

# NB: max occurrences of a word in the collection, after which it is
# considered a stop word.
NB_MAX_COUNT = 20000
STOP_WORDS_FILE = 'stop_words.py'

db = ...  # connection to the database

mapfn = Code("""function() {
    var words = this.field_that_is_text_indexed;
    if (words) {
        // quick lowercase to normalize per your requirements
        words = words.toLowerCase().split(/[ \/]/);
        for (var i = words.length - 1; i >= 0; i--) {
            // might want to remove punctuation, etc. here
            if (words[i]) {        // make sure there's something
                emit(words[i], 1); // store a 1 for each word
            }
        }
    }
};""")

reducefn = Code("""function(key, values) {
    var count = 0;
    values.forEach(function(v) {
        count += v;
    });
    return count;
};""")

with open(STOP_WORDS_FILE, 'w') as fh:
    fh.write('# -*- coding: utf-8 -*-\n'
             'stop_words = [\n')
    result = db.mycollection.map_reduce(mapfn, reducefn, 'words_count')
    for doc in result.find({'value': {'$gt': NB_MAX_COUNT}}):
        fh.write("'%s',\n" % doc['_id'])
    fh.write(']\n')
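For completeness, here is a minimal sketch of how the generated list might be applied at query time, in the mongo shell (the stop-word contents and collection name are placeholders):

// Hypothetical excerpt of the generated stop-word list.
var stopWords = ["movie", "film", "the"];

// Strip over-frequent terms so the $text index only sees selective ones.
function buildTextSearch(userInput) {
  var terms = userInput.toLowerCase().split(/\s+/).filter(function (w) {
    return stopWords.indexOf(w) === -1;
  });
  return { $text: { $search: terms.join(" ") } };
}

db.mycollection.find(buildTextSearch("pulp movie")).limit(20);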

Getting the total number of records in PagedList

The datagrid that I use on the client is based on SQL row number; it also requires a total number of pages for its paging. I also use PagedList on the server.
SQL Profiler shows that PagedList makes two DB calls: the first to get the total number of records and the second to get the current page. The problem is that I can't find a way to extract that total number of records from the PagedList, so currently I have to make an extra call to get the total. That makes three calls per request, two of which are absolutely identical. I understand that I probably won't be able to get rid of the call that fetches the total, but I hate to make it twice. Here is an extract from my code; I'd really appreciate any help:
var t = from c in myDb.MyTypes.Filter<MyType>(filterXml) select c;
response.Total = t.Count(); // my first call to get the total

double d = uiRowNumber / uiRecordsPerPage;
int page = (int)Math.Ceiling(d) + 1;

var q = from c in myDb.MyTypes.Filter<MyType>(filterXml).OrderBy(someOrderString)
        select new ReturnType
        {
            Something = c.Something
        };
response.Items = q.ToPagedList(page, uiRecordsPerPage);
PagedList has a .TotalItemCount property which reflects the total number of records in the set (not the number in a particular page). Thus response.Items.TotalItemCount should do the trick.

How to automatically generate sequent numbers when using a form

Ahab stated in 2010: the complex-looking number based on the timestamp has one important property: the number cannot change when rows are deleted or inserted.
As long as the submitted data is not changed by inserting or deleting rows, the simple formula =ArrayFormula(ROW(A2:A) - 1) may be the easiest one to use.
For other situations there is no nice reliable solution. :(
Now we live in 2015. Maybe times have changed?
I need a reliable way to number entries using a form.
Maybe a script can do the trick? A script that can add 1 to each entry?
That certain entry has to keep that number even when rows are deleted or inserted.
I created this simple spreadsheet in which I added 1, 2, and 3 manually; please have a look:
https://docs.google.com/spreadsheets/d/1H9EXns8-7m9oLbCrTyIZhLKXk6TGxzWlO9pOvQSODYs/edit?usp=sharing
The script has to find the maximum of the former entries, which is 3, and then add 1 automatically.
Who can help me with this?
Grtz, Bij
Maybe a script can do the trick? A script that can add 1 to each entry?
Yes, that would be what you need to resort to. I took the liberty of entering this in your example spreadsheet:
function onEdit(e) {
  var watchColumns = [1, 2]; // when text is entered in any of these columns, auto-numbering will be triggered
  var autoColumn = 3;
  var headerRows = 1;
  var watchSheet = "Form";

  var range = e.range;
  var sheet = range.getSheet();
  if (e.value !== undefined && sheet.getName() == watchSheet) {
    if (watchColumns.indexOf(range.getColumn()) > -1) {
      var row = range.getRow();
      if (row > headerRows) {
        var autoCell = sheet.getRange(row, autoColumn);
        if (!autoCell.getValue()) {
          var data = sheet.getDataRange().getValues();
          var temp = 1;
          for (var i = headerRows, length = data.length; i < length; i++)
            if (data[i][autoColumn - 1] > temp)
              temp = data[i][autoColumn - 1];
          autoCell.setValue(temp + 1);
        }
      }
    }
  }
}
For me the best way is to create a query in a second sheet that pulls everything from the form responses into the second column and onward, then use the first column for numbering.
In your second sheet B1 you would use:
=QUERY(Form!1:1004)
In your second sheet A2 you would use:
=ARRAYFORMULA(if(B2:B="",,Row(B2:B)-1))
I made a second sheet in your example spreadsheet; have a look at it.

How to quickly add many columns to GWT DataGrid

I am currently trying to create a DataGrid that can take an entity with a list of values as a row. Each value in the list is in its own column in the DataGrid. The entities' lists of values may have different sizes, so the DataGrid will have a variable number of columns. I have noticed that when I create the DataGrid and loop over adding each column to it, the time it takes to add the columns does not grow linearly.
Here is the code I was using to test how quickly the columns are added:
DataGrid<String> table = new DataGrid<String>();
table.setPageSize(25);

int NUM_COLUMNS = 40;
for (int i = 0; i < NUM_COLUMNS; i++) {
    GWT.log("Adding column " + i);
    TextColumn<String> nameColumn = new TextColumn<String>() {
        public String getValue(String object) {
            return object;
        }
    };
    table.addColumn(nameColumn, "Column " + i);
    table.setColumnWidth(nameColumn, 100, Unit.PX);
}

ArrayList<String> data = new ArrayList<String>();
for (int i = 0; i < 10; i++) {
    data.add("row " + i);
}
table.setRowCount(data.size(), true);
table.setRowData(0, data);
table.setWidth("100");
This took about 48 seconds, give or take a second, every time I ran it. Fewer than 10 columns load fairly quickly, but as the number of columns grows, the time it takes to load them grows dramatically.
Is there another way to add columns to the DataGrid that would be quicker? Thanks in advance.
One question you might want to ask yourself is whether there's a better way to do it. A table with 40 columns seems (IMO) inefficient. In general, you're going to see significant performance loss when loading more than ~15 columns in a DataGrid, and FlexTable isn't any better.
I've worked with DataGrid quite a bit and haven't seen any of the behavior you're talking about, though in my case they typically only have 10 or fewer columns with several thousand rows. (Data is of course paged and not being jammed in all at once.)
One thing I've noticed that does speed it up is pre-rendering. Are you adding the table to the DOM before adding all these columns, or are you adding the columns first? Lots of time can be spent waiting for the DOM to update. If you're adding the table to the page after rendering everything, you're probably looking at the best speed you'll get, since there's no built-in function for adding multiple columns simultaneously.